Releases: allenai/dolma
v0.9.2
What's Changed
- Remove unnecessary spawn in tokenizer, fix config with multiple paths by @soldni in #67
- Add tagger_modules option to tagger cli by @peterbjorgensen in #69
- Feature to get the complement of a hash sample by @IanMagnusson in #72
- Fix Hardcoded Tokenizer by @soldni in #71
- Fix a few issues of the FixedBucketsValTracker by @peterbjorgensen in #73
- Add attribute correlations by @Muennighoff in #68
- Porting missing code filtering rules to dolma repo by @soldni in #86
- Disable cache in CI to prevent build failures by @soldni in #90
- Reddit processing code by @drschwenk in #74
- update readme by @kyleclo in #95
- code/reasoning evaluation script by @benbogin in #94
- Add The Stack statistics by @Muennighoff in #92
- Fixing Build Config Issues by @soldni in #99
New Contributors
- @peterbjorgensen made their first contribution in #69
- @IanMagnusson made their first contribution in #72
- @drschwenk made their first contribution in #74
- @benbogin made their first contribution in #94
Full Changelog: v0.9.1...v0.9.2
v0.9.1
What's Changed
- Fix Jekyll Docs Build by @soldni in #55
- Adding Citation text back to README by @soldni in #56
- Bump rustix from 0.37.20 to 0.37.25 by @dependabot in #59
- Documentation on BaseParallelProcessor by @soldni in #62
- Add download instruction by @Muennighoff in #63
- Fix spawn method for multiprocessing by @soldni in #64
- Fix hardcoded URL by @soldni in #65
- Fix Accidental Override of Boolean Value by @soldni in #66
New Contributors
- @Muennighoff made their first contribution in #63
Full Changelog: v0.9.0...v0.9.1
v0.9.0
What's Changed
- Skipping AWS checks when aws access key is not available by @soldni in #28
- env variable is not passed to tests by @soldni in #29
- Fix make by @chris-ha458 in #24
- Fix make more by @chris-ha458 in #31
- ff by @soldni in #36
- Adding C4 example, dryrun mode, profiling taggers by @soldni in #37
- Only run Python style checks on source and tests by @soldni in #38
- fix rust parts by @chris-ha458 in #23
- Add rust unit tests by @chris-ha458 in #35
- Bump webpki from 0.22.0 to 0.22.2 by @dependabot in #52
- Adding Tokenizer, Writing Documentation, Misc Bugs & CLI improvements by @soldni in #54
New Contributors
- @chris-ha458 made their first contribution in #24
- @dependabot made their first contribution in #52
Full Changelog: v0.8.0...v0.9.0
v0.8.0
v0.7.0
v0.6.5
v0.6.4
v0.6.3
What's Changed
- Mixer can use s3 or local paths by @rodneykinney in #14
New Contributors
- @rodneykinney made their first contribution in #14
Full Changelog: v0.6.2...v0.6.3
v0.6.2
What's Changed
- Add Dirk as an author by @dirkgr in #2
- README, tests by @kyleclo in #1
- ff main by @soldni in #4
- Kylel/test taggers by @kyleclo in #5
- ff soldni/cli branch by @soldni in #7
- add tests for data types by @kyleclo in #8
- ff by @soldni in #9
- CLI for dolma by @soldni in #6
- Readme and instructions by @soldni in #11
- Update README.md by @ianand in #12
- Fixing Build Issues by @soldni in #13
New Contributors
- @dirkgr made their first contribution in #2
- @kyleclo made their first contribution in #1
- @soldni made their first contribution in #4
- @ianand made their first contribution in #12
Full Changelog: https://github.com/allenai/dolma/commits/v0.6.2