allenai/dolma3

Dolma 3

Dolma 3 consists of three datasets constructed for the OLMo 3 family of models: Dolma 3 Mix, a diverse 5.9T-token pre-training dataset; Dolma 3 Dolmino Mix, a 100B-token mid-training dataset targeting performance improvements in math, code, QA, instruction, and thinking; and Dolma 3 Longmino Mix, a 50B-token long-context dataset. This repository contains the descriptions and code necessary for reconstructing the Dolma 3 datasets.

For further details, please refer to the OLMo 3 paper and the OLMo 3 website.
