In this data story, we explore the abstracts of papers related to large language models (LLMs) published on arxiv.org since its inception.
The metadata is sourced from Cornell's arXiv dataset on Kaggle, which lists almost 1 million papers.
We extracted all Computation and Language (cs.CL) papers whose title mentions "LLM".
We then used ChatGPT to cluster the titles and identify topics and sub-topics automatically.
This visual explores which topic was mentioned where in the document.
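
The filtering step described above can be sketched roughly as follows. This is a minimal illustration, not the code actually used for this story. It assumes the metadata is read as JSON Lines (one JSON object per line) with a `title` string and a space-separated `categories` string. It also assumes that a paper counts as cs.CL if that category appears anywhere in its category list, and that "mentions LLM" means a case-sensitive whole-word match on "LLM" or "LLMs". The function names are hypothetical.

```python
import json
import re

# Assumption: match "LLM" or "LLMs" as a whole word, case-sensitive.
LLM_PATTERN = re.compile(r"\bLLMs?\b")


def is_llm_cl_paper(record):
    """Return True if the record lists cs.CL and its title mentions LLM."""
    # Assumption: categories are space-separated, e.g. "cs.CL cs.AI".
    categories = record.get("categories", "").split()
    # Titles may contain line breaks; collapse all whitespace runs first.
    title = " ".join(record.get("title", "").split())
    return "cs.CL" in categories and bool(LLM_PATTERN.search(title))


def filter_papers(lines):
    """Yield matching records from an iterable of JSON Lines strings."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if is_llm_cl_paper(record):
            yield record
```

Because `filter_papers` accepts any iterable of strings, an open file handle can be passed directly, so the full metadata file never has to be loaded into memory at once.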