Prompt Injections Dataset
The Prompt Injections dataset contains examples of prompt injections designed to manipulate or bypass LLMs. It covers various techniques, such as prompt leaking, jailbreaking, and mode switching, in multiple languages.
Over 1,000 text examples, multilingual (7 languages), distributed as a CSV file (or similar format)
Apache 2.0
Description
This dataset brings together more than 1,000 examples of prompt injections in seven languages (English, French, German, Spanish, Italian, Portuguese, and Romanian). These examples illustrate techniques for bypassing and manipulating language models, making it possible to better understand and counter such attacks. A quick way to inspect the corpus is sketched below.
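As a first look, the CSV can be inspected with pandas. The file name `prompt_injections.csv` and the `text` and `language` column names are assumptions for illustration; check the distributed file for its actual schema.

```python
import pandas as pd

# Load the CSV distribution of the dataset (file name assumed).
df = pd.read_csv("prompt_injections.csv")

print(df.shape)   # expect more than 1,000 rows
print(df.head())

# If a language column exists, verify coverage of the 7 languages.
if "language" in df.columns:
    print(df["language"].value_counts())
```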
What is this dataset for?
- Improve the robustness of LLMs against malicious injections
- Train models to detect and neutralize prompt injections (see the sketch after this list)
- Study the different methods used to attack language models
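As a minimal sketch of the detection use case, the snippet below trains a simple injection classifier with scikit-learn. Everything here is an assumption for illustration: the file names `prompt_injections.csv` and `benign_prompts.csv`, the `text` column, and in particular the benign examples, which the user must supply since this corpus contains only attacks.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Positive class: injection examples from this dataset (column name assumed).
injections = pd.read_csv("prompt_injections.csv")["text"]
# Negative class: harmless prompts collected separately by the user.
benign = pd.read_csv("benign_prompts.csv")["text"]

texts = pd.concat([injections, benign])
labels = [1] * len(injections) + [0] * len(benign)

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)

# Character n-grams transfer reasonably well across the 7 languages,
# avoiding language-specific tokenization.
clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5)),
    LogisticRegression(max_iter=1000),
)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```

Fine-tuning a transformer would be the natural next step, but a pipeline like this is enough to establish a baseline.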
Can it be enriched or improved?
Yes, this corpus can be supplemented with more recent examples or with examples specific to particular usage contexts. Additional annotation of the nature of each attack would also increase its value.
🔎 In summary
🧠 Recommended for
- AI security researchers
- LLM developers
- NLP analysts
🔧 Compatible tools
- Hugging Face (see the loading sketch below)
- PyTorch
- TensorFlow
- Jupyter notebooks
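For example, the CSV distribution can be loaded with the Hugging Face `datasets` library and then fed into PyTorch or TensorFlow training loops (the file path is a placeholder):

```python
from datasets import load_dataset

# Load the local CSV as a Hugging Face dataset (path assumed).
ds = load_dataset("csv", data_files="prompt_injections.csv", split="train")

print(ds)      # features and row count
print(ds[0])   # inspect one example
```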
💡 Tip
Handle this data responsibly and avoid any malicious use; its purpose is to strengthen the security of LLM-based systems.
Frequently Asked Questions
What injection techniques are covered by this dataset?
Prompt leaking, jailbreaking, mode switching, and other LLM bypass methods.
Is this dataset only in English?
No, it is multilingual, covering 7 languages: English, French, German, Spanish, Italian, Portuguese, and Romanian.
Can this dataset be used to train a commercial model?
Yes, the Apache 2.0 license permits commercial use, provided its conditions (such as retaining the license and copyright notices) are met.