LocalLLaMA
noneabove1182
12mo ago
HUGE dataset released for open source use
together.ai
30T tokens, 20.5T in English, allegedly high quality. Can't wait to see people start putting it to use!
Related GitHub: https://github.com/togethercomputer/RedPajama-Data