LLM Benchmark Dataset

A project to make the LiveBench dataset available on Kaggle.

LiveBench is a benchmark for large language models (LLMs) that prevents test contamination through monthly updates sourced from recent material. It ensures objective evaluation using verifiable answers and includes 18 tasks across six categories, with plans for continuous updates.

You can explore the website, the original paper, the code, and the datasets on livebench.ai. The authors make the data available through two main sources:

- the interactive leaderboard on the livebench.ai website, and
- the datasets published on Hugging Face.

While the website presents the data in a visual and interactive format, it is difficult to export for direct analysis. The data available on Hugging Face, on the other hand, lacks information such as the model provider, earlier evaluations, and the most recent commits to the dataset used by the website.
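
For reference, the Hugging Face copies can be loaded directly with the `datasets` library. The repository name and split below are assumptions about how LiveBench publishes its question sets; check the LiveBench organization on Hugging Face for the exact names.

```python
from datasets import load_dataset

# Assumed repository name and split: LiveBench publishes its question sets on
# Hugging Face, but the exact dataset names should be verified on the
# organization page before use.
coding_questions = load_dataset("livebench/coding", split="test")

print(coding_questions)          # dataset summary (number of rows, columns)
print(coding_questions[0].keys())  # fields available for a single question
```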

This dataset aims to make the LiveBench time-series data available in the format presented on the website. To achieve this, it gathers and processes data from the website’s GitHub repository and the files used by the live version of the website, ensuring that the data is always up-to-date.
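
A minimal sketch of that pipeline, assuming a hypothetical URL for the JSON file that backs the live leaderboard (the real path has to be taken from the website's repository):

```python
import pandas as pd
import requests

# Hypothetical URL: the actual path to the leaderboard data files must be
# taken from the website's repository; it is only a placeholder here.
LEADERBOARD_URL = "https://livebench.ai/table_data.json"

response = requests.get(LEADERBOARD_URL, timeout=30)
response.raise_for_status()

# Flatten the raw leaderboard records into a table and store them in a
# Kaggle-friendly CSV file.
df = pd.json_normalize(response.json())
df.to_csv("livebench_leaderboard.csv", index=False)
print(df.head())
```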

Materials


- Dataset
- Code