`duckreg` : very fast out-of-memory regressions with `duckdb`

A Python package to run stratified/saturated regressions out-of-memory with duckdb. The package is a wrapper around the `duckdb` package and provides a simple interface for running regressions on datasets too large to fit in memory: it reduces the data to a set of summary statistics and runs weighted least squares with frequency weights on them. Robust standard errors are computed from sufficient statistics, while clustered standard errors are computed using the cluster bootstrap.
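The compression idea can be illustrated with a minimal in-memory NumPy sketch. It is not duckreg's implementation: in the package the grouping runs inside duckdb (conceptually a query such as `SELECT d, g, COUNT(*), SUM(y), SUM(y*y) FROM tbl GROUP BY d, g`), while here `np.unique` stands in for that step. The function names `compress` and `fit_compressed` are hypothetical, and the sketch assumes the HC0 form of the robust variance; the package may apply a different finite-sample correction. Because every row in a cell shares the same regressors, frequency-weighted least squares on the cell means reproduces full-data OLS exactly, and the per-cell sums of $y$ and $y^2$ are enough to recover the robust "meat" matrix.

```python
import numpy as np

def compress(x, y):
    """Collapse rows sharing the same regressor values into sufficient statistics.

    Returns the unique design rows x_g, the cell sizes n_g, and the
    within-cell sums of y and y**2 (hypothetical helper, for illustration).
    """
    cells, inv, n = np.unique(x, axis=0, return_inverse=True, return_counts=True)
    inv = inv.ravel()  # the shape of `inv` differs across NumPy versions
    sum_y = np.bincount(inv, weights=y, minlength=len(cells))
    sum_y2 = np.bincount(inv, weights=y**2, minlength=len(cells))
    return cells, n.astype(float), sum_y, sum_y2

def fit_compressed(cells, n, sum_y, sum_y2):
    """WLS on cell means with frequency weights n_g, plus HC0 robust SEs (assumed variant)."""
    xtx = cells.T @ (cells * n[:, None])          # sum_g n_g x_g x_g' = X'X
    beta = np.linalg.solve(xtx, cells.T @ sum_y)  # sum_g n_g x_g ybar_g = X'y
    fitted = cells @ beta
    # Within-cell residual sum of squares from sufficient statistics:
    # sum_{i in g} (y_i - f_g)^2 = sum_y2 - 2 f_g sum_y + n_g f_g^2
    rss = sum_y2 - 2 * fitted * sum_y + n * fitted**2
    meat = cells.T @ (cells * rss[:, None])
    bread = np.linalg.inv(xtx)
    vcov = bread @ meat @ bread
    return beta, np.sqrt(np.diag(vcov))

# Simulated example: intercept, a binary treatment, and a 5-level covariate
rng = np.random.default_rng(0)
N = 100_000
d = rng.integers(0, 2, N)
g = rng.integers(0, 5, N)
x = np.column_stack([np.ones(N), d, g]).astype(float)
y = 1.0 + 2.0 * d + 0.5 * g + rng.normal(size=N)

stats = compress(x, y)  # at most 10 cells instead of 100,000 rows
beta, se = fit_compressed(*stats)
```

The regression itself only ever touches the handful of compressed rows, which is what lets the approach scale: the expensive pass over the raw data is a single aggregation.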

See the examples in `notebooks/introduction.ipynb`.

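Install (preferably in a virtual environment) with

`pip install git+https://github.com/apoorvalal/duckreg.git` (or `uv pip install git+https://github.com/apoorvalal/duckreg.git` if you use uv)

or clone this repository with `git clone` and install it in editable mode with `pip install -e .`.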

The package currently supports the following regression specifications:

- `DuckRegression`: general linear regression, which compresses the data to averages of $y$ stratified by all unique values of the $x$ variables.
- `DuckMundlak`: Mundlak regression, which compresses the data to averages of $y$ stratified by the unique values of $(1, w, \bar{w}_{i,\cdot}, \bar{w}_{\cdot,t})$, where $w$ is a covariate (typically the treatment).
- `DuckDoubleDemeaning`: double-demeaning regression, which compresses the data to averages of $y$ by all values of $w$ after demeaning it by $\bar{w}_{i,\cdot}$, $\bar{w}_{\cdot,t}$, and $\bar{w}$, i.e. by the unique values of $\ddot{w}_{it} = w_{it} - \bar{w}_{i,\cdot} - \bar{w}_{\cdot,t} + \bar{w}$.
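The Mundlak and double-demeaning compressions can be sketched in NumPy on a simulated balanced panel with staggered treatment adoption. This is an illustration under stated assumptions, not the package's code: the helpers `group_mean`, `mundlak`, and `double_demeaning` are hypothetical names, the grouping that duckreg pushes into duckdb is done here with `np.unique`, and point estimates only are shown. Because $w$, $\bar{w}_{i,\cdot}$, and $\bar{w}_{\cdot,t}$ are all determined by a unit's adoption date and the period, the panel collapses to at most (cohorts × periods) strata. In a balanced panel the coefficient on $w$ from either regression equals the two-way fixed-effects estimate (a Frisch-Waugh-Lovell argument), which makes a convenient sanity check.

```python
import numpy as np

def group_mean(ids, v):
    """Broadcast the mean of v within each id back to the rows."""
    return (np.bincount(ids, weights=v) / np.bincount(ids))[ids]

def _compress(x, y):
    """Average y within strata defined by the unique rows of x."""
    cells, inv, n = np.unique(x, axis=0, return_inverse=True, return_counts=True)
    inv = inv.ravel()  # the shape of `inv` differs across NumPy versions
    ybar = np.bincount(inv, weights=y, minlength=len(cells)) / n
    return cells, n.astype(float), ybar

def _wls(cells, n, ybar):
    """Frequency-weighted least squares on the stratum means."""
    xtwx = cells.T @ (cells * n[:, None])
    return np.linalg.solve(xtwx, cells.T @ (n * ybar))

def mundlak(unit, time, w, y):
    # Strata: unique values of (1, w, wbar_i, wbar_t)
    x = np.column_stack([np.ones_like(w), w, group_mean(unit, w), group_mean(time, w)])
    cells, n, ybar = _compress(x, y)
    return _wls(cells, n, ybar), len(cells)

def double_demeaning(unit, time, w, y):
    # Strata: unique values of w - wbar_i - wbar_t + wbar
    w_dd = w - group_mean(unit, w) - group_mean(time, w) + w.mean()
    x = np.column_stack([np.ones_like(w), w_dd])
    cells, n, ybar = _compress(x, y)
    return _wls(cells, n, ybar), len(cells)

# Simulated balanced panel with staggered adoption
rng = np.random.default_rng(42)
n_units, n_periods = 300, 8
unit = np.repeat(np.arange(n_units), n_periods)
time = np.tile(np.arange(n_periods), n_units)
cohort = rng.integers(2, n_periods + 1, n_units)  # cohort == n_periods means never treated
w = (time >= cohort[unit]).astype(float)
y = rng.normal(size=n_units)[unit] + rng.normal(size=n_periods)[time] + 1.5 * w + rng.normal(size=unit.size)

beta_m, cells_m = mundlak(unit, time, w, y)            # 2,400 rows -> a few dozen strata
beta_dd, cells_dd = double_demeaning(unit, time, w, y)
```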

