OpenAI HumanEval
OpenAI HumanEval is an evaluation dataset dedicated to Python code generation. It contains 164 problems, each with a function signature, an explanatory docstring, a canonical solution, and unit tests. The problems were hand-written to ensure they do not appear in model training corpora, allowing for reliable evaluation.
Description
The OpenAI HumanEval dataset includes 164 Python programming problems. Each example contains a function signature, a docstring describing the expected behavior, the body of the canonical solution, and unit tests to validate the generated code. The dataset is designed to assess the ability of models to generate correct, functional code.
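One convenient way to inspect the problems is through the Hugging Face `datasets` library, which hosts the dataset under the id `openai_humaneval`. The snippet below is a minimal sketch assuming that distribution and its published field names (`task_id`, `prompt`, `canonical_solution`, `test`, `entry_point`).

```python
# Sketch: loading HumanEval from the Hugging Face Hub and inspecting one problem.
from datasets import load_dataset

humaneval = load_dataset("openai_humaneval", split="test")

problem = humaneval[0]
print(problem["task_id"])             # e.g. "HumanEval/0"
print(problem["prompt"])              # function signature + docstring
print(problem["canonical_solution"])  # reference implementation (function body)
print(problem["test"])                # unit tests defining check(candidate)
print(problem["entry_point"])         # name of the function to implement
```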
What is this dataset for?
- Evaluate the quality of models for automatically generating Python code, typically reported with the pass@k metric (see the sketch after this list).
- Serve as a basis for fine-tuning specialized programming models.
- Test the robustness of models in understanding and producing complex functions.
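Results on HumanEval are usually reported as pass@k: the probability that at least one of k sampled completions passes the unit tests. The sketch below reproduces the unbiased estimator from the original HumanEval paper, where n is the number of samples generated for a problem and c the number that pass.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # fewer than k failing samples, so some passing sample is always drawn
    # Compute C(n-c, k) / C(n, k) as a numerically stable product.
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Example: 200 samples per problem, 42 of them pass -> estimated pass@10
print(pass_at_k(n=200, c=42, k=10))
```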
Can it be enriched or improved?
Yes, it is possible to add new problems or extend the unit tests to cover more cases (a sketch of the expected field layout is shown below). You can also diversify the programming languages or increase task complexity for more advanced training.
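As a rough guide, a new problem can follow the same field layout as the released JSONL file. The example below is hypothetical (the `Custom/0` task id and the `humaneval_extra.jsonl` file name are placeholders); only the field names mirror the official release.

```python
import json

# Hypothetical extra problem, mirroring the field layout of the official JSONL.
new_problem = {
    "task_id": "Custom/0",
    "prompt": 'def is_palindrome(s: str) -> bool:\n    """Return True if s reads the same forwards and backwards."""\n',
    "entry_point": "is_palindrome",
    "canonical_solution": "    return s == s[::-1]\n",
    "test": (
        "def check(candidate):\n"
        "    assert candidate('level') is True\n"
        "    assert candidate('python') is False\n"
    ),
}

with open("humaneval_extra.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(new_problem) + "\n")
```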
🔎 In summary
🧠 Recommended for
- NLP/code researchers
- AI developers
- Programming educators
🔧 Compatible tools
- Classic ML frameworks
- Python environment
- Jupyter notebooks
💡 Tip
Always execute generated code in a sandboxed environment to avoid the risks of running arbitrary code.
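A minimal sketch of this precaution is shown below, assuming the `prompt`, `test`, and `entry_point` fields described above; `run_candidate` is a hypothetical helper, and a subprocess with a timeout is only a first layer of isolation, not a full sandbox (prefer a container or VM with no network access for untrusted code).

```python
import os
import subprocess
import sys
import tempfile

def run_candidate(prompt: str, completion: str, test: str, entry_point: str,
                  timeout: float = 5.0) -> bool:
    """Write the candidate solution and its unit tests to a temporary file
    and run them in a separate Python process with a hard timeout."""
    program = prompt + completion + "\n" + test + f"\ncheck({entry_point})\n"
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, path], capture_output=True, timeout=timeout
        )
        return result.returncode == 0  # tests passed without assertion errors
    except subprocess.TimeoutExpired:
        return False
    finally:
        os.unlink(path)
```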
Frequently Asked Questions
What is the main particularity of the HumanEval dataset?
Its problems were hand-written specifically so that they do not appear in model training data, ensuring a fair evaluation of code generation models.
How many examples does this dataset contain?
It includes 164 Python programming problems, each with unit tests.
Is it possible to add your own problems to HumanEval?
Yes, the dataset can be enriched with new problems or tests, which makes it possible to adapt the difficulty and diversity of the tasks.