Rust: Java Test - Code comparison dataset

A comparative dataset of Rust and Java code, useful for training or testing models that generate, compile, or translate code.

Size

68,167 rows (181 MB), available in text or Parquet format

Licence

MIT

Description

Rust: Java Test contains over 68,000 rows of tests, snippets, or code pairs in Rust and Java. It is suited to code processing tasks, cross-language evaluation, and automatic code generation with specialized LLMs.

What is this dataset for?

Train models that translate or generate code between Rust and Java
Evaluate compilation behavior, security, or readability across two distinct languages
Test automation pipelines for programming tasks
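
As a rough illustration of the first use case, the sketch below turns rows into prompt/completion pairs for a Rust-to-Java translation model and holds out a small test split. The field names `rust` and `java`, the prompt wording, and the sample rows are assumptions made for this example only; the dataset's actual schema is not described here and should be checked after download.

```python
import random


def make_translation_pairs(rows, src="rust", tgt="java"):
    """Build prompt/completion pairs, skipping rows where either side is empty.

    `src` and `tgt` are hypothetical column names, not the confirmed schema.
    """
    pairs = []
    for row in rows:
        source = (row.get(src) or "").strip()
        target = (row.get(tgt) or "").strip()
        if not source or not target:
            continue  # a pair is only usable when both languages are present
        pairs.append({
            "prompt": f"Translate this {src.capitalize()} code to {tgt.capitalize()}:\n{source}\n",
            "completion": target,
        })
    return pairs


def split_train_test(pairs, test_fraction=0.1, seed=0):
    """Shuffle deterministically and hold out a fraction for evaluation."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction)) if shuffled else 0
    return shuffled[n_test:], shuffled[:n_test]


# Made-up rows standing in for records read from the text or Parquet file.
sample_rows = [
    {"rust": "fn add(a: i32, b: i32) -> i32 { a + b }",
     "java": "static int add(int a, int b) { return a + b; }"},
    {"rust": "fn is_even(n: i32) -> bool { n % 2 == 0 }",
     "java": "static boolean isEven(int n) { return n % 2 == 0; }"},
    {"rust": "fn main() {}", "java": ""},  # incomplete pair, dropped
]

pairs = make_translation_pairs(sample_rows)
train, test = split_train_test(pairs)
```

Swapping the `src` and `tgt` arguments gives the Java-to-Rust direction from the same rows.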

Can it be enriched or improved?

Yes. This dataset can be enriched with other languages or with metadata such as compilation time, typical errors, or development context. It can also be annotated manually (quality, performance, readability) for more advanced uses.
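
One possible way to attach such metadata is sketched below: each row gets simple size measurements plus empty slots for compilation time and manual annotations. The column names (`rust`, `java`) and every added field name are hypothetical choices for this example, not part of the dataset as published.

```python
def enrich_row(row, rust_field="rust", java_field="java"):
    """Return a copy of `row` with size metadata and empty annotation slots.

    Field names are assumptions; adapt them to the real schema.
    """
    enriched = dict(row)  # leave the original record untouched
    for lang, field in (("rust", rust_field), ("java", java_field)):
        code = row.get(field) or ""
        enriched[f"{lang}_lines"] = len(code.splitlines())
        enriched[f"{lang}_chars"] = len(code)
    # Placeholders to be filled by a build step or by human annotators.
    enriched["compile_time_ms"] = None
    enriched["annotations"] = {
        "quality": None,
        "performance": None,
        "readability": None,
    }
    return enriched


# Made-up record for illustration.
example = {"rust": "fn main() {\n}\n", "java": "class A {}"}
enriched_example = enrich_row(example)
```

Keeping the annotation slots explicit, even when empty, makes it easy to see later which rows have been reviewed and which have not.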

🔎 In summary

Criterion	Evaluation
🧩 Ease of use	⭐⭐⭐⭐⭐ (Easy to load in a notebook or IDE)
🧼 Need for cleaning	⭐⭐⭐⭐✩ (Low – may require syntax normalization)
🏷️ Annotation richness	⭐⭐✩✩✩ (Limited – no technical meta-info provided by default)
📜 Commercial license	✅ Yes (MIT)
👨‍💻 Beginner friendly	⚠️ Moderate – requires programming knowledge
🔁 Fine-tuning ready	🎯 Useful for code-generating LLMs
🌍 Cultural diversity	⚠️ Neutral – code-focused, no identified cultural bias