
Can anyone share what a training dataset would look like for something like this? What are some use cases?


It downloads a dataset (TinyStories) from Hugging Face [1]. On that page you can drill down into the structure and content of the source data.

[1] https://huggingface.co/datasets/roneneldan/TinyStories


Karpathy's nanoGPT has a full training pipeline using Shakespeare. [1]

The use case for this is learning from a simple example.

[1] https://github.com/karpathy/nanoGPT
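
To make "what a training dataset looks like" concrete, here is a rough sketch of the character-level setup that small GPT examples like this typically use. It is not nanoGPT's actual code: the inline corpus is a stand-in for a real text file, and `encode`, `decode`, `get_batch`, the 90/10 split, and the block and batch sizes are all names and values I picked for illustration.

```python
import random

# Stand-in corpus (assumption): a real run would read a text file,
# e.g. Shakespeare or TinyStories, instead of repeating one line.
text = "To be, or not to be, that is the question.\n" * 20

# Character-level vocabulary: each distinct character gets an integer id.
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}


def encode(s):
    """Map a string to a list of integer token ids."""
    return [stoi[c] for c in s]


def decode(ids):
    """Map a list of integer token ids back to a string."""
    return "".join(itos[i] for i in ids)


# The "dataset" is just one long sequence of ids, split into train and val.
data = encode(text)
split = int(0.9 * len(data))  # 90/10 split, chosen for illustration
train_data, val_data = data[:split], data[split:]


def get_batch(data, block_size, batch_size, rng):
    """Sample (input, target) windows; the target is the input shifted by one."""
    starts = [rng.randrange(len(data) - block_size) for _ in range(batch_size)]
    x = [data[s:s + block_size] for s in starts]
    y = [data[s + 1:s + block_size + 1] for s in starts]
    return x, y


rng = random.Random(0)
x, y = get_batch(train_data, block_size=8, batch_size=4, rng=rng)
for xi, yi in zip(x, y):
    print(repr(decode(xi)), "->", repr(decode(yi)))
```

So there are no labels in the usual sense: each training example is a window of text, and the target is the same window shifted one character to the right (next-token prediction).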



