Titanium 2.1 — DevOps Dataset and LLM Model Architecture

A dataset of technical prompts for language models, covering DevOps, cloud computing, shell scripting, and software architecture.

Size

31,700 prompt/response pairs, in JSON format

License

Apache 2.0

Description

Titanium 2.1 — DeepSeek R1 is a corpus of 31,700 synthetic prompts covering complex software architecture, DevOps, and cloud scenarios. Responses are generated automatically by the DeepSeek R1 model and simulate concrete cases from the software life cycle: design, infrastructure scripts, multi-cloud management (Azure, AWS, GCP), Terraform, and more.
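
Since the corpus ships as JSON prompt/response pairs, loading it might look like the sketch below. The field names "prompt" and "response" are assumptions, not a documented schema; check the actual keys after downloading.

```python
import json

def load_pairs(path):
    # Assumes a JSON array of objects with "prompt" and "response"
    # fields (hypothetical schema -- verify against the real file).
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    return [(r["prompt"], r["response"]) for r in records]
```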

What is this dataset for?

  • Evaluate the performance of LLMs on concrete DevOps tasks
  • Train specialized models in cloud infrastructure and automation
  • Test technical reasoning skills in software architecture

Can it be enriched or improved?

Yes, you can add human annotations on the quality or correctness of responses, include variant prompts with additional constraints, or create multi-stage scenarios to simulate a complete DevOps pipeline.
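
One way to attach such human annotations is sketched below, again assuming a dict-per-record layout with a hypothetical "annotation" field that is not part of the original dataset.

```python
def annotate(record, correctness, notes=""):
    # Return an enriched copy of the record; the original is left
    # untouched so the raw dataset stays intact.
    enriched = dict(record)
    enriched["annotation"] = {"correctness": correctness, "notes": notes}
    return enriched
```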

🔎 In summary

Criterion | Evaluation
🧩 Ease of use | ⭐⭐⭐✩✩ (Medium – requires manual filtering and evaluation)
🧼 Need for cleaning | ⭐✩✩✩✩ (High – unfiltered responses, variable quality)
🏷️ Annotation richness | ⭐⭐✩✩✩ (Limited – no native quality annotations)
📜 Commercial license | ✅ Yes (Apache 2.0)
👨‍💻 Beginner friendly | ⚠️ No – advanced technical content
🔁 Fine-tuning ready | 🎯 Excellent for specialized technical models
🌍 Cultural diversity | ⚠️ Low – mostly Anglo-Saxon content

🧠 Recommended for

  • LLM DevOps fine-tuning
  • Cloud reasoning tests
  • AI platforms for system engineers

🔧 Compatible tools

  • LangChain
  • OpenAI API
  • vLLM
  • DeepSeek
  • Manual annotation with Label Studio

💡 Tip

Use an LLM judge (e.g. GPT-4) to score and rank responses before fine-tuning.
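
A minimal sketch of that ranking step is below. The `judge` callable stands in for whatever scorer you use (a GPT-4 call in practice); it is an assumption for illustration, not something shipped with the dataset.

```python
def rank_by_score(records, judge, keep_top=0.5):
    # `judge` maps a record to a numeric quality score; keep only the
    # top fraction of records for fine-tuning.
    scored = sorted(records, key=judge, reverse=True)
    k = max(1, int(len(scored) * keep_top))
    return scored[:k]
```

Sorting once and slicing keeps the step cheap even for the full 31,700-record corpus; the expensive part is the judge calls themselves, which you would typically cache.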

Frequently Asked Questions

Is this dataset suitable for training models for multi-cloud tasks?

Yes, it covers scenarios on Azure, AWS, and GCP, and can be used to train agents specialized in infrastructure management.

Have the answers been validated manually?

No, all responses are generated automatically. It is advisable to filter the responses or rate their quality before use.
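
A cheap heuristic pre-filter can run before any manual review. The sketch below assumes a "response" field (hypothetical schema) and simply drops empty or very short responses.

```python
def prefilter(records, min_chars=50):
    # Drop records whose response is missing or too short to be a
    # usable DevOps answer; threshold is an arbitrary starting point.
    return [r for r in records
            if len(r.get("response", "").strip()) >= min_chars]
```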

Is it suitable for business use?

Yes, the Apache 2.0 license allows commercial use, provided you validate the content before putting it into production.
