statsmodels is an excellent project and an important part of the Python scientific stack. But due to resource constraints, its maintainers cannot push out bugfixes often enough for my needs. sm2 is a fork focused on bugfixes and addressing technical debt.
Ideally sm2 will be a drop-in replacement for statsmodels. In places where this fails, feel free to open an issue.
With luck, fixes made here will eventually be ported upstream.
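As a rough sketch of what "drop-in replacement" could look like in user code, the helper below prefers ``sm2`` when it is installed and falls back to ``statsmodels`` otherwise. The helper name ``load_stats_backend`` is hypothetical, and the top-level package names are assumptions; only the standard library is used.

```python
import importlib
import importlib.util


def load_stats_backend(candidates=("sm2", "statsmodels")):
    """Import and return the first installed top-level package in `candidates`.

    The default names are assumptions about how the two projects are
    installed; pass a different tuple if your environment differs.
    """
    for name in candidates:
        # find_spec returns None when a top-level package is not installed.
        if importlib.util.find_spec(name) is not None:
            return importlib.import_module(name)
    raise ImportError("none of %r could be imported" % (candidates,))


# Typical use (not executed here, since neither package may be installed):
#     backend = load_stats_backend()
#     print(backend.__name__)
```

If code written against one package fails under the other, that is the kind of incompatibility worth reporting as an issue.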
- sm2 contains a subset of the functionality of statsmodels. The first big difference is that statsmodels is more feature-complete.
- Test coverage statistics reported for sm2 are meaningful (:issue:`4331`)
- An enormous amount of code cleanup has been done in sm2. Thousands of lines of unused, untested, or deprecated code have been removed. Many thousands of flake8 formatting issues have been cleaned up.
- ``MultinomialResults.params`` and ``predict`` will have correct column and row labels (:issue:`4541`)
- ``VARResults.cov_params`` will correctly return a ``DataFrame`` instead of raising ``ValueError``.
- ``VARResults.acf`` will return correct results (:issue:`4572`)
- The ``ArmaProcess`` class does not have a ``nobs`` attribute.
- ``tsa.stattools.acf`` will always return ``(acf, confint, qstat, pvalue)`` here instead of a different subset of these depending on the inputs.
- ``stats.diagnostic.acorr_ljungbox`` will always return ``(qljungbox, pval, qboxpierce, pvalbp)`` here instead of a different subset of these depending on the inputs.
- ``summary2`` methods have not been ported from upstream and will raise ``NotImplementedError``.
- ``VARResults.test_whiteness`` has been superseded upstream by ``test_whiteness_new``, as the older method was not an actual statistical test (:issue:`4036`). ``sm2`` replaces the older version entirely and keeps only the name ``test_whiteness``.
- ``ARModel.fit`` incorrectly sets ``model.df_resid`` upstream. That has been fixed here.
- ``GenericLikelihoodModelResults.__init__`` incorrectly sets ``model.df_resid`` and ``model.df_model``. That has been fixed here.
- ``GeneralizedLinearModel.fit`` incorrectly sets ``self.mu`` and ``self.scale``. This has been fixed here. (:issue:`4032`)
- ``LikelihoodModelResults._get_robustcov_results`` incorrectly ignores the ``use_self`` argument. This has been fixed here. (:issue:`4401`)
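To illustrate why a fixed return shape (as described for ``tsa.stattools.acf`` and ``acorr_ljungbox`` above) is easier on callers, here is a toy sketch in pure Python. It is not sm2's or statsmodels' implementation: the function names are hypothetical, and filling the entries that were not computed with ``None`` is an assumption made only for this illustration.

```python
def sample_acf(x, nlags):
    """Sample autocorrelations r_0..r_nlags of a sequence (requires nlags < len(x))."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(nlags + 1)]


def ljung_box_q(acf, n):
    """Ljung-Box Q statistics for lags 1..len(acf)-1."""
    q, total = [], 0.0
    for k in range(1, len(acf)):
        total += acf[k] ** 2 / (n - k)
        q.append(n * (n + 2) * total)
    return q


def acf_variable(x, nlags, qstat=False):
    """Toy with an input-dependent return shape: callers must branch on flags."""
    acf = sample_acf(x, nlags)
    if qstat:
        return acf, ljung_box_q(acf, len(x))
    return acf


def acf_fixed(x, nlags, qstat=False):
    """Toy with a fixed 4-tuple return: (acf, confint, qstat, pvalue).

    Assumption for this sketch only: entries that were not computed are None.
    confint and pvalue are never computed in this toy.
    """
    acf = sample_acf(x, nlags)
    q = ljung_box_q(acf, len(x)) if qstat else None
    return acf, None, q, None
```

With the fixed shape, ``acf, confint, qstat, pvalue = acf_fixed(x, nlags)`` unpacks the same way regardless of which options were passed.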
Issues and Pull Requests are welcome. If you are looking for a place to start, here are some suggestions:
- Search for comments starting with ``# TODO:`` or ``# FIXME:``

  - Some comments are copied from upstream and should have these labels but are missing them. If you find a comment that should have one of these labels (or is just unclear), add the label.

- Many tests from upstream are marked with ``pytest.mark.not_vetted`` to reflect the fact that they haven't been reviewed since being ported from statsmodels. To "vet" a test, try to determine:

  - Is this a "smoke test"? If so, it should be marked with ``pytest.mark.smoke``.
  - Is this a test for a specific bug? Can an Issue reference (e.g. ``# GH#1234``) be included?
  - Is there something specific being tested? If so, the test name should be made informative and often a comment should be added (e.g. ``# test function foo.bar in case where baz argument is near-singular``)
  - Is this testing results produced by statsmodels/sm2 against results produced by another package? If so, it should be clear how those results were produced. The original authors put a lot of effort into producing these comparisons; they should be reproducible.

- There are some spots where tests are meager and could use some attention:

  - ``tsa.vector_ar.irf``
  - ``regression._prediction``
  - ``stats.sandwich_covariance``

- As of 2018-03-19 there are still 390 flake8 warnings/errors. For many of these, fixing them requires figuring out what the writer's intention was upstream.
- As of 2018-03-19 about 20% of statsmodels has been ported to sm2 (though a much larger percentage of the usable, non-redundant, non-deprecated code). If there are portions of statsmodels that you want or need, don't be shy.
- If there is a change you particularly like, make a Pull Request upstream to get it implemented directly in statsmodels.
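For the first two suggestions above, a small script can list candidate locations. The following is a minimal sketch using only the standard library; the function names ``scan_source`` and ``scan_tree`` are hypothetical, and the patterns are simply the labels and marker named in the list. It will not find comments that are missing their label, which still takes reading the code.

```python
import re
from pathlib import Path

# Patterns taken from the suggestions above.
PATTERNS = {
    "todo": re.compile(r"#\s*TODO:"),
    "fixme": re.compile(r"#\s*FIXME:"),
    "not_vetted": re.compile(r"pytest\.mark\.not_vetted"),
}


def scan_source(text):
    """Return (line_number, label, stripped_line) for each pattern match in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label, line.strip()))
    return hits


def scan_tree(root):
    """Yield (path, line_number, label, line) for every .py file under root."""
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, label, line in scan_source(text):
            yield path, lineno, label, line


# Typical use from a checkout (not executed here):
#     for path, lineno, label, line in scan_tree("."):
#         print(f"{path}:{lineno}: [{label}] {line}")
```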