Circa - Interpreting indirect answers in conversation

The <strong>Circa</strong> dataset contains English dialogues built around polar (yes/no) questions and their indirect answers. Exchanges are set in 10 distinct social situations, and each indirect answer is labeled by several annotators with its most likely interpretation.

Size

Roughly 34,000 question-answer pairs, JSON format

License

CC-BY 4.0

Description

Circa is a linguistic corpus for studying how indirect answers to polar (yes/no) questions are interpreted across social contexts. Each example pairs a polar question asked by one speaker (X) with an indirect answer given by another (Y), along with multiple annotations indicating the most likely interpretation.
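
To make the structure of an example concrete, here is a minimal Python sketch of how one record might be loaded and how its multiple annotations could be reduced to a single label by majority vote. The field names (`context`, `question_x`, `answer_y`, `judgements`), the label strings, and the two sample records are assumptions invented for illustration; they are not the dataset's documented schema, so check the actual file before relying on them.

```python
import json
from collections import Counter

# Hypothetical record layout: field names, label strings and both sample
# records are illustrative assumptions, not the dataset's documented schema.
SAMPLE_JSON = """
[
  {
    "context": "X asks about Y's food preferences.",
    "question_x": "Do you like spicy food?",
    "answer_y": "I put hot sauce on everything.",
    "judgements": ["Yes", "Yes", "Yes", "Probably yes", "Yes"]
  },
  {
    "context": "X asks about Y's weekend plans.",
    "question_x": "Want to go for a hike on Saturday?",
    "answer_y": "My knee has been acting up.",
    "judgements": ["No", "Probably no", "No", "Yes", "Probably no"]
  }
]
"""


def majority_label(judgements, min_votes=3):
    """Return the most frequent label if at least `min_votes` annotators chose it, else None."""
    if not judgements:
        return None
    label, count = Counter(judgements).most_common(1)[0]
    return label if count >= min_votes else None


records = json.loads(SAMPLE_JSON)
for record in records:
    # Attach one aggregated interpretation per question-answer pair.
    record["label"] = majority_label(record["judgements"])
    print(record["question_x"], "|", record["answer_y"], "->", record["label"])
```

The `min_votes` threshold is a design choice rather than something prescribed by the dataset: pairs on which annotators disagree are left without a label here, so they can be filtered out or inspected separately.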

What is this dataset for?

Training NLP models to detect implicit meaning in indirect responses
Studying conversational interactions in a social context
Improving how virtual assistants understand non-explicit answers
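
As a rough illustration of the first use case, the Python sketch below turns question-answer pairs into (text, label id) training examples and makes a reproducible train/test split. It assumes each example already carries a single aggregated `label`; the field names, label strings, and sample sentences are hypothetical and only stand in for the real data.

```python
import random

# Hypothetical examples: field names, labels and sentences are assumptions
# made for illustration, not records taken from the dataset.
examples = [
    {"question_x": "Do you like spicy food?", "answer_y": "I put hot sauce on everything.", "label": "Yes"},
    {"question_x": "Are you coming tonight?", "answer_y": "I have an early flight.", "label": "No"},
    {"question_x": "Did you enjoy the film?", "answer_y": "I watched it twice.", "label": "Yes"},
    {"question_x": "Is the new job going well?", "answer_y": "It is a job.", "label": None},
]


def to_training_pairs(examples):
    """Drop unlabeled examples and map label strings to integer ids."""
    labeled = [ex for ex in examples if ex.get("label") is not None]
    label_names = sorted({ex["label"] for ex in labeled})
    label_to_id = {name: i for i, name in enumerate(label_names)}
    pairs = [
        (f"Q: {ex['question_x']} A: {ex['answer_y']}", label_to_id[ex["label"]])
        for ex in labeled
    ]
    return pairs, label_to_id


def split(pairs, test_fraction=0.25, seed=0):
    """Shuffle with a fixed seed and hold out a fraction (at least one pair) for testing."""
    shuffled = pairs[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


pairs, label_to_id = to_training_pairs(examples)
train, test = split(pairs)
print(label_to_id)
print(len(train), "training pairs,", len(test), "test pair(s)")
```

The resulting pairs can be fed to whatever classifier or fine-tuning pipeline is in use; concatenating question and answer into one string is only one possible input format.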

Can it be enriched or improved?

Yes. The dataset can be extended with additional social contexts and finer-grained annotations of tone or emotion. Multilingual versions would also be valuable, since the current data is English-only.

🔎 In summary

Criterion	Evaluation
🧩 Ease of use	⭐⭐⭐⭐⭐ (Simple JSON data, easy to manipulate)
🧼 Need for cleaning	⭐⭐⭐⭐⭐ (Clean data, little preprocessing needed)
🏷️ Annotation richness	⭐⭐⭐⭐✩ (Multi-criteria annotations on interpretation)
📜 Commercial license	✅ Yes (CC-BY 4.0)
👨‍💻 Beginner friendly	⚠️ Moderate – understanding of conversational context needed
🔁 Fine-tuning ready	🎯 Suitable for fine-tuning conversational models
🌍 Cultural diversity	⚠️ Limited to English, Western social contexts