Prompt Injections Data Set

The Prompt Injections dataset contains examples of prompt injections designed to manipulate or bypass LLMs. It includes various techniques, such as prompt leaking, jailbreaking, and switching, in multiple languages.

Download dataset

Size

Over 1000 text examples, multilingual (7 languages), CSV file or similar

Licence

Apache 2.0

Description

‍

This dataset brings together more than 1000 examples of prompt injections in several languages (English, French, German, Spanish, Italian, Portuguese, Romanian) in several languages. These examples illustrate techniques for bypassing and manipulating language models, making it possible to better understand and counter these attacks.

‍

What is this dataset for?

‍

Improving the robustness of LLMs in the face of malicious injections
Train models to detect and neutralize prompt injections
Study the different methods of attacking language models

‍

Can it be enriched or improved?

‍

Yes, this corpus can be supplemented by recent examples or examples specific to certain contexts of use. An additional annotation on the nature of the attacks can also improve its value.

‍

🔎 In summary

Criterion	Evaluation
🧩 Ease of use	⭐⭐⭐⭐⭐ (Simple, clear format and text-only)
🧼 Need for cleaning	⭐⭐⭐⭐⭐ (Very low – ready-to-use data)
🏷️ Annotation richness	⭐⭐✩✩✩ (Basic – examples without complex annotation)
📜 Commercial license	✅ Yes (Apache 2.0)
👨‍💻 Beginner friendly	✅ Yes, accessible for researchers and developers
🔁 Fine-tuning ready	🛡️ Useful for fine-tuning in model safety and control
🌍 Cultural diversity	⚡ Multilingual – 7 languages represented