# statsmodels: Statistical Modeling and Econometrics in Python

> A Python library providing classes and functions for estimating statistical models, performing tests, and exploring data, with a focus on transparency and completeness of results.

## Quick Use

Save as a script file and run:

```python
import numpy as np
import statsmodels.api as sm

# Simulate 100 observations with two predictors (seeded for reproducibility)
rng = np.random.default_rng(0)
X = sm.add_constant(rng.standard_normal((100, 2)))  # prepend an intercept column

# True coefficients: intercept 1, slopes 0.5 and -0.3, plus Gaussian noise
y = X @ np.array([1, 0.5, -0.3]) + rng.standard_normal(100) * 0.5

# Fit ordinary least squares and print the full results table
model = sm.OLS(y, X).fit()
print(model.summary())
```

## Introduction

statsmodels complements scikit-learn by focusing on classical statistical inference rather than prediction. It provides detailed model summaries with coefficients, standard errors, p-values, and confidence intervals, which is the output statisticians and economists expect from tools like R or Stata.

## What statsmodels Does

- Fits linear and generalized linear models with comprehensive diagnostic output
- Implements time-series analysis including ARIMA, VAR, state-space models, and seasonal decomposition
- Provides nonparametric methods like kernel density estimation and lowess smoothing
- Runs hypothesis tests (t-test, F-test, Granger causality, unit root tests)
- Generates publication-ready regression tables and diagnostic plots

## Architecture Overview

statsmodels follows a model-fit-results pattern. You specify a model class (OLS, Logit, ARIMA), call `.fit()` to estimate parameters, and receive a results object that exposes coefficients, residuals, information criteria, and statistical tests. Under the hood, estimation relies on NumPy linear algebra routines and `scipy.optimize`.

## Self-Hosting & Configuration

- Install via pip: `pip install statsmodels`
- Depends on NumPy, SciPy, pandas, and patsy (used for formula-based model specification)
- Use R-style formulas: `sm.OLS.from_formula("y ~ x1 + x2", data=df)`
- Configure the optimizer and covariance estimator per model through arguments to `.fit()` (for example `method`, `maxiter`, `cov_type`)
- Works in Jupyter notebooks with rich HTML output for model summaries

## Key Features

- Comprehensive model summaries matching R/Stata output with AIC, BIC, R-squared, and residual diagnostics
- Time-series toolbox with ARIMA, SARIMAX, VAR, and exponential smoothing
- Robust covariance estimators (HC0-HC3, HAC, clustered) for valid inference under heteroscedasticity, autocorrelation, or within-cluster correlation
- Mixed-effects models for hierarchical and panel data
- Survival analysis with Kaplan-Meier and Cox proportional hazards

## Comparison with Similar Tools

- **scikit-learn**: focused on prediction accuracy; statsmodels provides inference statistics (p-values, confidence intervals)
- **R (stats package)**: the gold standard for statistical computing; statsmodels brings similar functionality to the Python ecosystem
- **SciPy (scipy.stats)**: provides individual tests; statsmodels offers full model estimation and diagnostics
- **linearmodels**: extends statsmodels with panel data and IV models; statsmodels covers the broader foundation

## FAQ

**Q: When should I use statsmodels instead of scikit-learn?**
A: Use statsmodels when you need to understand relationships (coefficients, significance, confidence intervals) rather than just predict outcomes.

**Q: Does statsmodels support regularized regression?**
A: Yes. The OLS and GLM classes support elastic net regularization via `fit_regularized()`, though scikit-learn may be more convenient for pure prediction tasks.

**Q: Can I use statsmodels for time-series forecasting?**
A: Yes. ARIMA, SARIMAX, and other state-space models are available, along with order-selection helpers such as `arma_order_select_ic`; the model order is otherwise chosen by the user.

**Q: How does the formula API work?**
A: Use patsy-style formulas like `"y ~ x1 + x2 + x1:x2"` to specify models declaratively from a DataFrame, similar to R.

## Sources

- https://github.com/statsmodels/statsmodels
- https://www.statsmodels.org

---

Source: https://tokrepo.com/en/workflows/asset-c95fc338

Author: Script Depot