
UCI Machine Learning Repository

The UCI Machine Learning Repository is one of the most iconic resources for the machine learning community. Created at the University of California, Irvine, it brings together hundreds of public datasets used for experimenting, teaching, and benchmarking machine learning algorithms.

Size

Several hundred datasets of various sizes, in CSV, ARFF, and other formats

Licence

Free for academic use. For commercial use, check the license of each individual dataset.

Description


The UCI repository includes:

  • Several hundred datasets, categorized by task type (classification, regression, clustering)
  • Various formats: CSV, ARFF, TXT, etc.
  • Metadata for each dataset (source, description, variable types, etc.)
  • A simple interface to explore, download, and use files directly
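ARFF (Attribute-Relation File Format, from the Weka project) keeps the variable metadata in the file header next to the data itself. As a rough illustration of how the two parts fit together, the sketch below parses a small ARFF string using only the Python standard library. It assumes a simple dense file: no sparse rows and no quoted attribute names containing spaces. The function name `read_arff` is hypothetical; for real files, `scipy.io.arff.loadarff` is a more complete option.

```python
import csv
import io


def read_arff(text: str) -> tuple[list[tuple[str, str]], list[list[str]]]:
    """Parse a simple dense ARFF document (hypothetical helper).

    Returns (attributes, rows): attributes is a list of (name, type) pairs
    taken from the @attribute lines; rows holds the data values as strings.
    Assumes no sparse rows and no spaces inside attribute names.
    """
    attributes: list[tuple[str, str]] = []
    data_lines: list[str] = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and '%' comments.
        if not line or line.startswith("%"):
            continue
        lowered = line.lower()
        if in_data:
            data_lines.append(line)
        elif lowered.startswith("@attribute"):
            # "@attribute name type": the type may itself contain spaces,
            # e.g. a nominal list "{a, b}", so split at most twice.
            _, name, attr_type = line.split(None, 2)
            attributes.append((name, attr_type.strip()))
        elif lowered.startswith("@data"):
            in_data = True
        # @relation and any other header lines are ignored here.
    # Data rows are comma-separated; the csv module handles quoted values.
    reader = csv.reader(io.StringIO("\n".join(data_lines)), skipinitialspace=True)
    rows = [row for row in reader]
    return attributes, rows


# Toy ARFF document for illustration only.
sample = """% tiny illustrative file
@relation iris_sample
@attribute sepallength real
@attribute sepalwidth real
@attribute class {Iris-setosa,Iris-versicolor}
@data
5.1,3.5,Iris-setosa
7.0,3.2,Iris-versicolor
"""
attrs, rows = read_arff(sample)
print(attrs)
print(rows)
```

Keeping the attribute types alongside the rows lets later code convert `real` columns to floats and treat the brace-delimited nominal list as the set of allowed class labels.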

What is this repository for?


It is used for:

  • Experimenting and testing machine learning models
  • Validating tabular data processing pipelines
  • Training supervised models on concrete cases (classification, regression)
  • Teaching data science and machine learning algorithms
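A typical first experiment is to load a small classification file, hold part of it out, and measure a simple baseline. The sketch below uses a nearest-centroid classifier, chosen only because it fits in a few lines; any model could take its place. The sample rows follow the layout of the repository's Iris file (`iris.data`: four numeric features, then the class label, no header) and are included inline only so the example runs offline. In practice you would read the downloaded file instead. The function names are hypothetical.

```python
import csv
import io
import math
from collections import defaultdict

# A few sample rows in the iris.data layout: 4 numeric features, then the label.
RAW = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.9,3.1,4.9,1.5,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
7.1,3.0,5.9,2.1,Iris-virginica
"""


def load_rows(text):
    """Split each non-empty CSV row into (features, label)."""
    data = []
    for row in csv.reader(io.StringIO(text)):
        if row:  # skip blank lines, which some files contain at the end
            data.append(([float(v) for v in row[:-1]], row[-1]))
    return data


def fit_centroids(train):
    """Return {label: mean feature vector} computed from the training rows."""
    grouped = defaultdict(list)
    for features, label in train:
        grouped[label].append(features)
    return {
        label: [sum(col) / len(col) for col in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(centroids, features):
    """Assign the label of the closest centroid (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


data = load_rows(RAW)
# Hold out every third row as a tiny test set; a real experiment would
# shuffle and use a larger split or cross-validation.
train = [r for i, r in enumerate(data) if i % 3 != 2]
test = [r for i, r in enumerate(data) if i % 3 == 2]
centroids = fit_centroids(train)
accuracy = sum(predict(centroids, x) == y for x, y in test) / len(test)
print(f"held-out accuracy: {accuracy:.2f}")
```

On a sample this small the accuracy figure says little about the model; the point is the load, split, fit, and evaluate loop, which carries over unchanged to the full file or to a stronger model.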

Can it be enriched or improved?


Yes, this resource can be enriched:

  • By offering cleaned or pre-processed versions of the most popular datasets
  • By annotating certain datasets with secondary tasks (for example, anomaly detection)
  • By cross-referencing UCI datasets with real-world data sources for hybrid use cases
  • By creating explanatory notebooks or standardized benchmarks for the most widely used datasets
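Producing a cleaned version of a dataset often starts with missing values. Several files in the repository encode them with a `?` placeholder, sometimes with surrounding whitespace, rather than leaving the field empty. The sketch below assumes that convention and a list-of-strings row layout. It replaces `?` in numeric columns with the column median and in other columns with the most frequent value. The helper name `impute_missing` and the median/mode strategy are illustrative assumptions; the right strategy depends on the dataset and should be documented alongside any published cleaned version.

```python
import statistics
from collections import Counter

MISSING = "?"  # assumed missing-value marker


def impute_missing(rows, numeric_cols):
    """Return a copy of rows with MISSING replaced column by column.

    Numeric columns (indices in numeric_cols) are converted to float and
    filled with the median of the observed values; other columns are filled
    with the most frequent observed value. Assumes every column has at least
    one observed (non-missing) value.
    """
    n_cols = len(rows[0])
    fills = {}
    for c in range(n_cols):
        # Strip whitespace so that entries like " ?" are also caught.
        observed = [r[c].strip() for r in rows if r[c].strip() != MISSING]
        if c in numeric_cols:
            fills[c] = statistics.median([float(v) for v in observed])
        else:
            fills[c] = Counter(observed).most_common(1)[0][0]
    cleaned = []
    for r in rows:
        new_row = []
        for c, value in enumerate(r):
            value = value.strip()
            if value == MISSING:
                new_row.append(fills[c])
            elif c in numeric_cols:
                new_row.append(float(value))
            else:
                new_row.append(value)
        cleaned.append(new_row)
    return cleaned


# Toy rows for illustration only (columns 0 and 2 are numeric).
rows = [
    ["1.0", "red", "10"],
    ["?", "blue", "20"],
    ["3.0", " ?", "?"],
    ["5.0", "blue", "40"],
]
cleaned = impute_missing(rows, numeric_cols={0, 2})
print(cleaned)
```

Computing the fill values from the observed entries only, before any replacement, keeps the imputation from being influenced by the order of the rows.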

🔗 Source: UCI Machine Learning Repository

Frequently Asked Questions

Is the repository still relevant despite the emergence of more modern sources?

Yes, it remains a standard reference for learning, quick validation of algorithms, and educational projects. Its diversity and simplicity make it an ideal starting point.

Can these datasets be used in production?

Not directly. Most are small and intended for experimentation or teaching. For production projects, more representative data is recommended.

Are there newer alternatives?

Yes, platforms like Kaggle Datasets, OpenML, or Hugging Face Datasets offer modern datasets that are often larger or annotated for specific tasks.
