# statsmodels — Statistical Modeling and Econometrics in Python

> A Python library providing classes and functions for estimation of statistical models, performing tests, and exploring data with a focus on transparency and completeness of results.

## Install

Install from PyPI with pip (pip install statsmodels), then save the Quick Use example below as a script file and run it.

## Quick Use
```python
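import numpy as np
import statsmodels.api as sm

# Simulate 100 observations with two predictors (seeded for reproducibility)
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))
X = sm.add_constant(X)  # prepend an intercept column of ones
y = X @ np.array([1, 0.5, -0.3]) + rng.standard_normal(100) * 0.5

# Fit ordinary least squares; .fit() returns a results object, not the model
results = sm.OLS(y, X).fit()
print(results.summary())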
```

## Introduction
statsmodels complements scikit-learn by focusing on classical statistical inference rather than prediction. It provides detailed model summaries with coefficients, standard errors, p-values, and confidence intervals — the output statisticians and economists expect from tools like R or Stata.

## What statsmodels Does
- Fits linear and generalized linear models with comprehensive diagnostic output
- Implements time-series analysis including ARIMA, VAR, state-space models, and seasonal decomposition
- Provides nonparametric methods like kernel density estimation and lowess smoothing
- Runs hypothesis tests (t-test, F-test, Granger causality, unit root tests)
- Generates publication-ready regression tables and diagnostic plots

## Architecture Overview
statsmodels follows a model-fit-results pattern. You specify a model class (OLS, Logit, ARIMA), call .fit() to estimate parameters, and receive a results object whose attributes and methods expose coefficients, residuals, information criteria, and statistical tests. Under the hood, estimation uses scipy.optimize and NumPy linear algebra routines.

## Self-Hosting & Configuration
- Install via pip: pip install statsmodels
- Depends on NumPy, SciPy, pandas, and patsy for formula-based model specification
- Use R-style formulas: sm.OLS.from_formula("y ~ x1 + x2", data=df)
- Configure optimizer parameters and covariance estimators per model
- Works in Jupyter notebooks with rich HTML output for model summaries

## Key Features
- Comprehensive model summaries matching R/Stata output with AIC, BIC, R-squared, and residual diagnostics
- Time-series toolbox with ARIMA, SARIMAX, VAR, and exponential smoothing
- Robust covariance estimators (HC0-HC3, HAC, clustered) for correct inference under heteroscedasticity
- Mixed-effects models for hierarchical and panel data
- Survival analysis with Kaplan-Meier and Cox proportional hazards

## Comparison with Similar Tools
- **scikit-learn** — focused on prediction accuracy; statsmodels provides inference statistics (p-values, confidence intervals)
- **R (stats package)** — the gold standard for statistical computing; statsmodels brings similar functionality to the Python ecosystem
- **SciPy (scipy.stats)** — provides individual tests; statsmodels offers full model estimation and diagnostics
- **linearmodels** — extends statsmodels with panel data and IV models; statsmodels covers the broader foundation

## FAQ
**Q: When should I use statsmodels instead of scikit-learn?**
A: Use statsmodels when you need to understand relationships (coefficients, significance, confidence intervals) rather than just predict outcomes.

**Q: Does statsmodels support regularized regression?**
A: Yes. OLS and GLM classes support elastic net regularization via fit_regularized(), though scikit-learn may be more convenient for pure prediction tasks.

**Q: Can I use statsmodels for time-series forecasting?**
A: Yes. ARIMA, SARIMAX, and other state-space models are implemented, along with order-selection helpers such as arma_order_select_ic that compare candidate model orders by information criteria.

**Q: How does the formula API work?**
A: Use patsy-style formulas like "y ~ x1 + x2 + x1:x2" to specify models declaratively from a DataFrame, similar to R.
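
The formula from the answer above can be tried on a small simulated DataFrame. The column names and true coefficients are assumptions for illustration; the lowercase smf.ols function is the formula-based counterpart of sm.OLS.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data with a genuine x1:x2 interaction effect of 0.8
rng = np.random.default_rng(5)
df = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
df["y"] = (
    1.0
    + 0.5 * df["x1"]
    - 0.3 * df["x2"]
    + 0.8 * df["x1"] * df["x2"]
    + rng.normal(scale=0.5, size=200)
)

# The intercept is added automatically; x1:x2 adds only the interaction term
res = smf.ols("y ~ x1 + x2 + x1:x2", data=df).fit()
print(res.params)
```

Because the model is built from a DataFrame, the coefficients come back as a labeled pandas Series, so terms can be looked up by name, for example res.params["x1:x2"].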

## Sources
- https://github.com/statsmodels/statsmodels
- https://www.statsmodels.org

---
Source: https://tokrepo.com/en/workflows/asset-c95fc338
Author: Script Depot