smallpond

A lightweight data processing framework built on DuckDB and 3FS.

Features

🚀 High-performance data processing powered by DuckDB
🌍 Scalable to handle PB-scale datasets
🛠️ Easy operations with no long-running services

Installation

Python 3.8 to 3.12 is supported.

pip install smallpond

Quick Start

# Download example data
wget https://duckdb.org/data/prices.parquet

import smallpond

# Initialize session
sp = smallpond.init()

# Load data
df = sp.read_parquet("prices.parquet")

# Process data
df = df.repartition(3, hash_by="ticker")
df = sp.partial_sql("SELECT ticker, min(price), max(price) FROM {0} GROUP BY ticker", df)

# Save results
df.write_parquet("output/")

# Show results
print(df.to_pandas())
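
The Quick Start repartitions by ticker before running the per-partition query. As a rough illustration of why that ordering matters, here is a pure-Python sketch that does not use smallpond or DuckDB. It shows that once rows are hash-partitioned on the grouping key, every ticker lands in exactly one partition, so a GROUP BY run independently on each partition yields the same result as one run over the whole dataset. The sample rows, the CRC32-based hash, and the helper names (hash_partition, min_max_by_ticker, partitioned_min_max) are all assumptions made for this illustration. They are not smallpond's actual implementation or API.

```python
from zlib import crc32

# Hypothetical in-memory rows standing in for prices.parquet: (ticker, price)
ROWS = [
    ("AAPL", 10.0), ("MSFT", 20.0), ("AAPL", 12.5),
    ("GOOG", 30.0), ("MSFT", 18.0), ("GOOG", 31.5),
]


def hash_partition(rows, npartitions):
    """Assign each row to a partition using a stable hash of its ticker.

    CRC32 is an assumption for this sketch; smallpond may hash differently.
    """
    partitions = [[] for _ in range(npartitions)]
    for ticker, price in rows:
        idx = crc32(ticker.encode("utf-8")) % npartitions
        partitions[idx].append((ticker, price))
    return partitions


def min_max_by_ticker(rows):
    """Plain-Python stand-in for:
    SELECT ticker, min(price), max(price) FROM rows GROUP BY ticker
    """
    result = {}
    for ticker, price in rows:
        lo, hi = result.get(ticker, (price, price))
        result[ticker] = (min(lo, price), max(hi, price))
    return result


def partitioned_min_max(rows, npartitions=3):
    """Run the aggregation on each partition separately and combine results.

    No cross-partition merge of min/max is needed because a given ticker
    never appears in more than one partition.
    """
    combined = {}
    for part in hash_partition(rows, npartitions):
        combined.update(min_max_by_ticker(part))
    return combined


if __name__ == "__main__":
    for ticker, (lo, hi) in sorted(partitioned_min_max(ROWS).items()):
        print(ticker, lo, hi)
```

If the rows were split without regard to ticker, the per-partition results would be partial aggregates that need a second merge step. Hash partitioning on the grouping key avoids that step.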

Documentation

For detailed guides and API reference:

Performance

We evaluated smallpond using the GraySort benchmark (script) on a cluster of 50 compute nodes and 25 storage nodes running 3FS. The benchmark sorted 110.5 TiB of data in 30 minutes and 14 seconds, an average throughput of 3.66 TiB/min.

Details can be found in 3FS - Gray Sort.

Development

pip install .[dev]

# run unit tests
pytest -v tests/test*.py

# build documentation
pip install .[docs]
cd docs
make html
python -m http.server --directory build/html

License

This project is licensed under the MIT License.
