Our selection of the best datasets for developing artificial intelligence in agriculture


Modern agriculture is undergoing a major transformation thanks to the integration of technologies such as artificial intelligence and computer vision. At the heart of this revolution are agricultural datasets, essential for training models and developing innovative solutions. Here is a selection of the 15 best datasets that can contribute to the advancement of research and technological application in agriculture.
Dataset types in agriculture
1. Images of Crops and Diseases
These datasets contain images of healthy and diseased plants, allowing computer vision models to detect and diagnose crop diseases. For example, the PlantVillage Dataset offers a vast collection of images for various crops and diseases.
2. Soil Data
Including information on soil composition, texture, and fertility, these datasets help optimize fertilizer use and improve land management.
3. Meteorological Data
Historical and real-time climate data can be used to forecast crop yields and plan agricultural activities.
4. Satellite Imagery
Satellite images provide an overview of fields, helping to monitor crop health and plant cover and to detect anomalies.
5. Yield Data
This data makes it possible to analyze crop productivity based on various factors such as agricultural practices, soil conditions, and climate.
Our top 15 datasets for agriculture
1. PlantVillage Dataset
Description : One of the largest datasets of images of healthy and diseased plant leaves, covering 38 classes (crop and condition pairs, including healthy leaves) across 14 different crops.
Use : Detection and classification of plant diseases using computer vision.
Link : 🔗 PlantVillage
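As a minimal sketch of how the class labels might be prepared for training: many public copies of PlantVillage store images in one folder per class, named in the form `Crop___Condition`. That naming scheme is an assumption about the copy you download, and the function names and example folders below are hypothetical.

```python
from collections import defaultdict


def parse_class_folder(folder_name: str) -> tuple[str, str]:
    """Split a 'Crop___Condition' folder name into (crop, condition)."""
    crop, sep, condition = folder_name.partition("___")
    if not sep:
        raise ValueError(f"unexpected folder name: {folder_name!r}")
    return crop.replace("_", " "), condition.replace("_", " ")


def group_by_crop(folder_names):
    """Map each crop to the list of conditions found for it."""
    groups = defaultdict(list)
    for name in folder_names:
        crop, condition = parse_class_folder(name)
        groups[crop].append(condition)
    return dict(groups)


# Hypothetical folder names following the assumed naming scheme.
example_folders = ["Tomato___Late_blight", "Tomato___healthy", "Apple___Apple_scab"]
classes_by_crop = group_by_crop(example_folders)
```

Grouping classes by crop like this makes it easy to check that each crop has a healthy class alongside its disease classes before training a classifier.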
2. Agriculture-Vision Dataset
Description : Includes over 94,000 annotated aerial images of agricultural fields with various anomalies such as weed clusters, standing water, and planter skips.
Use : Detection of anomalies in crops to improve agricultural surveillance.
Link : 🔗 Agriculture-Vision
3. Open Images Dataset for Agriculture
Description : A subset of the Open Images Dataset specially annotated for agricultural objects, including machines, farm animals, and crops.
Use : Recognition and detection of agriculture-specific objects in images.
Link : 🔗 Open Images
4. Sentinel-2 Satellite Imagery
Description : High-resolution multispectral satellite imagery dataset provided by the European Space Agency, covering the entire globe.
Use : Crop monitoring, plant health analysis, thematic mapping of soils and crops.
Link : 🔗 ESA Sentinel-2
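Plant health analysis on multispectral imagery often relies on vegetation indices such as NDVI, computed from the red and near-infrared bands (B04 and B08 in Sentinel-2). The sketch below shows the formula on a tiny patch; the reflectance values are made up for illustration and the function name is hypothetical.

```python
def ndvi(red: float, nir: float) -> float:
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    total = nir + red
    if total == 0:
        return 0.0  # avoid division by zero on empty or masked pixels
    return (nir - red) / total


# Made-up surface reflectance values for a 2x2 patch (not real Sentinel-2 data).
red_band = [[0.05, 0.06], [0.20, 0.25]]
nir_band = [[0.45, 0.50], [0.25, 0.27]]

# Apply the index pixel by pixel.
ndvi_patch = [
    [ndvi(r, n) for r, n in zip(red_row, nir_row)]
    for red_row, nir_row in zip(red_band, nir_band)
]
```

Higher values generally indicate denser green vegetation, while values close to zero are typical of bare soil. In practice the same formula would be applied to full raster arrays rather than nested lists.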
5. Soil Moisture Active Passive (SMAP) Dataset
Description : Satellite data providing accurate measurements of soil moisture on a global scale.
Use : Irrigation management, drought forecasting, climate modeling.
Link : 🔗 NASA SMAP
6. FAO Statistical Database (FAOSTAT)
Description : FAO's global statistical database on agriculture, including production, trade, prices and land use.
Use : Analysis of global agricultural trends, economic research, and assessment of the impact of the Common Agricultural Policy on agricultural practices and land use.
Link : 🔗 FAOSTAT
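A trend analysis on this kind of data can be as simple as computing year-over-year changes for one country and one commodity. The sketch below assumes a CSV export with `Area`, `Item`, `Element`, `Year`, `Unit`, and `Value` columns; the column names, the function name, and all rows are assumptions for illustration, not real statistics.

```python
import csv
import io

# Invented rows in the shape of a FAOSTAT-style CSV export; the column names
# and numbers are assumptions for illustration, not real statistics.
SAMPLE_CSV = """Area,Item,Element,Year,Unit,Value
Exampleland,Wheat,Production,2019,t,1000
Exampleland,Wheat,Production,2020,t,1100
Exampleland,Wheat,Production,2021,t,990
"""


def year_over_year_change(csv_text, area, item, element="Production"):
    """Return {year: percent change versus the previous year} for one series."""
    rows = [
        (int(r["Year"]), float(r["Value"]))
        for r in csv.DictReader(io.StringIO(csv_text))
        if r["Area"] == area and r["Item"] == item and r["Element"] == element
    ]
    rows.sort()  # order by year
    changes = {}
    for (_, prev_value), (year, value) in zip(rows, rows[1:]):
        if prev_value == 0:
            continue  # percent change is undefined from a zero baseline
        changes[year] = (value - prev_value) / prev_value * 100
    return changes


changes = year_over_year_change(SAMPLE_CSV, "Exampleland", "Wheat")
```

With a real export, you would read the downloaded file instead of the inline string and check its actual column names first.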
7. USDA National Agricultural Statistics Service (NASS)
Description : Detailed data on agriculture in the United States, including yields, farming practices, and economic statistics, collected and published by USDA NASS.
Use : Market research, agricultural planning, policy analysis.
Link : 🔗 USDA NASS
8. DeepWeeds Dataset
Description : A dataset of over 17,000 annotated images of weeds common in Australian agricultural environments.
Use : Development of automatic weed detection systems.
Link : 🔗 DeepWeeds
9. European Soil Database (ESDB)
Description : A detailed database on the characteristics of European soils, including texture, chemical composition, and physical properties.
Use : Sustainable soil management, land use planning.
Link : 🔗 ESDAC
10. Radiant MLhub Agriculture Datasets
Description : A platform offering open datasets for machine learning in Earth observation, with a focus on agriculture.
Use : Crop classification, change detection, plant health analysis.
Link : 🔗 Radiant MLhub
11. Crop Yield Prediction Dataset (Kaggle)
Description : Data including historical crop yields, weather conditions, and soil characteristics.
Use : Modeling and prediction of crop yields.
Link : 🔗 Kaggle Crop Yield Prediction
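To give an idea of what yield modeling can look like at its simplest, here is a sketch of an ordinary least squares fit with a single predictor. The rainfall and yield figures are made up, the choice of rainfall as the only feature is an assumption, and `fit_line` is a hypothetical helper; a real model would use many more variables from the dataset.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up seasonal rainfall (mm) and yield (t/ha) pairs, for illustration only.
rainfall_mm = [300, 400, 500, 600]
yield_t_ha = [2.0, 2.5, 3.0, 3.5]

slope, intercept = fit_line(rainfall_mm, yield_t_ha)

# Predicted yield for a season with 450 mm of rainfall.
predicted = slope * 450 + intercept
```

A single-variable line is only a baseline: it gives a reference point against which richer models using weather and soil features can be compared.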
12. Global Food Prices Database
Description : Data on global food prices, collected by the World Food Programme. The dataset records prices for a range of food items.
Use : Economic analysis, food security study.
Link : 🔗 WFP Data
13. CropDeep Dataset
Description : A collection of images for crop recognition and disease detection, covering several plant species.
Use : Classification of crops, diagnosis of diseases.
Link : 🔗 CropDeep
14. CGIAR Big Data Platform
Description : Open datasets on agriculture in developing countries, covering crops, climate, soil, and more.
Use : Research in sustainable agriculture, adaptation to climate change.
Link : 🔗 CGIAR Platform
15. UCI Machine Learning Repository - Mushroom Dataset
Description : Detailed data on the physical characteristics of different species of mushrooms.
Use : Classification of mushrooms as edible or poisonous, toxicological studies.
Link : 🔗 UCI Mushroom Dataset
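The UCI file is comma-separated, with the class in the first field (`e` for edible, `p` for poisonous) followed by single-letter codes for each categorical attribute. Before training a classifier, those codes need to be turned into numbers. The sketch below one-hot encodes them; the rows are invented and shortened to three attributes, and the function names are hypothetical.

```python
import csv
import io

# Illustrative rows in the style of the UCI file: the first field is the class
# (e = edible, p = poisonous), followed by single-letter attribute codes.
# Rows are invented and shortened to three attributes for this example.
SAMPLE = """p,x,s,n
e,x,y,w
e,b,s,w
"""


def load_rows(text):
    """Return (labels, features); a label is 1 for poisonous, 0 for edible."""
    labels, features = [], []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        labels.append(1 if row[0] == "p" else 0)
        features.append(row[1:])
    return labels, features


def one_hot(features):
    """One-hot encode categorical columns; returns (vectors, column names)."""
    # Each output column is a (attribute index, code) pair seen in the data.
    columns = sorted({(i, v) for row in features for i, v in enumerate(row)})
    index = {col: k for k, col in enumerate(columns)}
    vectors = []
    for row in features:
        vec = [0] * len(columns)
        for i, v in enumerate(row):
            vec[index[(i, v)]] = 1
        vectors.append(vec)
    return vectors, columns


labels, features = load_rows(SAMPLE)
vectors, columns = one_hot(features)
```

The resulting vectors and labels can then be passed to whichever classifier you prefer.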
Where can I find datasets for agriculture?
Finding the right dataset for your agricultural projects can be a challenge. Here are some sources and platforms where you can access a variety of agricultural datasets:
Specialized platforms
- Kaggle: An online platform that hosts a multitude of public datasets, including those related to agriculture. Here you can find data for computer vision, predictive analytics, and more.
Link: 🔗 Kaggle datasets
- Radiant MLhub: Provides open datasets for machine learning in Earth observation, with a focus on agriculture and environmental sustainability.
Link: 🔗 Radiant MLhub
Research institutions and universities
- CGIAR: CGIAR (formerly the Consultative Group on International Agricultural Research) offers an open data platform that facilitates access to a multitude of agricultural datasets, especially for developing countries.
Link: 🔗 CGIAR Big Data Platform
- INRAE: The National Research Institute for Agriculture, Food and the Environment in France publishes data and resources for agricultural research.
Link: 🔗 INRAE Data
Governmental and international organizations
- FAO: The Food and Agriculture Organization of the United Nations provides a variety of global statistical datasets on agriculture through its FAOSTAT database.
Link: 🔗 FAOSTAT
- USDA: The U.S. Department of Agriculture offers detailed datasets on American agriculture that are useful for comparative analysis or market research.
Link: 🔗 USDA NASS
Online communities and social networks
- GitHub: Many researchers and developers share their datasets and source code on GitHub, which can be a great resource for finding specific data.
Link: 🔗 GitHub
- Specialized forums: Platforms like Stack Overflow, Reddit, or LinkedIn groups can be useful for asking for dataset recommendations or sharing resources with the community.
Conclusion
Access to high-quality datasets is key to stimulating innovation in agriculture. The agricultural datasets presented in this article offer valuable resources for the research and development of technological solutions aimed at improving the productivity, sustainability, and efficiency of the agricultural sector. By exploiting this data, agri-food leaders and innovative startups are helping to address global challenges related to food security and climate change.