Cybersecurity Heimdall v1.1
Structured textual data set to train aligned and secure models in the field of defensive cybersecurity.
Description
Cybersecurity Heimdall v1.1 is an instructional training dataset dedicated to defensive cybersecurity. It contains over 21,000 realistic dialogues (triples). System
/ User
/ helper
), built from more than 100,000 public technical sources. Each exchange is designed to follow security standards such as OWASP, NIST CSF, or MITRE ATT&CK, while integrating explicit denials for malicious requests.
What is this dataset for?
- Train specialized language models in defensive cybersecurity
- Improving the ethical alignment of LLMs on sensitive technical issues
- Serve as a benchmark in QA, classification or synthesis tasks in computer security
Can it be enriched or improved?
Yes. It is possible to add scenarios linked to regional standards (RGPD, ISO 27001), multilingual translations or additional annotations (risk level, type of attack). The triplet structure allows easy customization, adapted to supervised fine-tuning.
🔎 In summary
🧠 Recommended for
- Cybersecurity researchers
- AI security engineers
- Cybersecurity Agent creators
🔧 Compatible tools
- Hugging Face Transformers
- TRL
- QLora
- DeepSpeed
- LangChain
💡 Tip
Use system fields to inject ethical constraints and reinforce the automatic refusal of offensive prompts.
Frequently Asked Questions
Does this dataset include examples of red teaming?
No, it focuses on defensive approaches. Offensive tactics are not present in order to maintain a secure and ethical framework.
Can this dataset be used in a professional setting?
Yes, the Apache 2.0 license allows commercial or industrial use, provided you meet the license conditions.
Is it multilingual?
No, it's mostly in English. However, it can be enriched with translations for multilingual cybersecurity projects.