Circa - Interpreting indirect answers in conversation

The Circa dataset contains English dialogues built around polar (yes/no) questions and their indirect answers. Exchanges are drawn from 10 distinct social situations, and each indirect answer is interpreted by several annotators.

Size

Several thousand question-answer pairs, JSON format

Licence

CC-BY 4.0

Description

Circa is a linguistic corpus for studying how indirect answers to polar (yes/no) questions are interpreted in various social contexts. Each example pairs a polar question asked by one person (X) with an indirect answer given by another (Y), along with multiple annotations indicating the likely interpretation.
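
To make the structure of an example concrete, here is a minimal sketch of how one record might be represented after loading the JSON. The field names (`context`, `question_x`, `answer_y`, `judgements`) and the sample values are assumptions chosen for illustration, not the dataset's documented schema; check the actual keys in the downloaded file before relying on them.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class CircaExample:
    """One question/answer exchange. All field names are hypothetical."""
    context: str           # the social situation the exchange comes from
    question_x: str        # polar question asked by speaker X
    answer_y: str          # indirect answer given by speaker Y
    judgements: List[str]  # one interpretation label per annotator


def parse_example(raw: str) -> CircaExample:
    """Build a CircaExample from a JSON string using the assumed keys."""
    record = json.loads(raw)
    return CircaExample(
        context=record["context"],
        question_x=record["question_x"],
        answer_y=record["answer_y"],
        judgements=list(record["judgements"]),
    )


# Invented record, for illustration only (not taken from the dataset).
sample = json.dumps({
    "context": "X wants to know about Y's food preferences.",
    "question_x": "Do you like spicy food?",
    "answer_y": "I put hot sauce on everything.",
    "judgements": ["Yes", "Yes", "Yes", "Probably yes", "Yes"],
})

example = parse_example(sample)
print(example.question_x, "->", example.answer_y)
```

Keeping the per-annotator labels as a list, rather than collapsing them to a single label at load time, leaves both majority voting and soft labelling available later.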

What is this dataset for?

  • Train NLP models to detect implicit meaning in indirect answers
  • Study conversational interactions in a social context
  • Improve how virtual assistants understand non-explicit answers

Can it be enriched or improved?

Yes, the dataset can be extended by adding other social contexts, languages, or finer annotations on tone or emotion. Multilingual versions would also be beneficial.

🔎 In summary

Criterion | Evaluation
🧩 Ease of use | ⭐⭐⭐⭐⭐ (Simple JSON data, easy to manipulate)
🧼 Need for cleaning | ⭐⭐⭐⭐⭐ (Clean data, little preprocessing needed)
🏷️ Annotation richness | ⭐⭐⭐⭐✩ (Multi-criteria annotations on interpretation)
📜 Commercial license | ✅ Yes (CC-BY 4.0)
👨‍💻 Beginner friendly | ⚠️ Moderate (understanding of conversational context needed)
🔁 Fine-tuning ready | 🎯 Suitable for fine-tuning conversational models
🌍 Cultural diversity | ⚠️ Limited to English, Western social contexts

🧠 Recommended for

  • Conversational NLP researchers
  • Virtual assistant developers
  • Computational linguists

🔧 Compatible tools

  • Hugging Face
  • PyTorch
  • TensorFlow
  • spaCy

💡 Tip

Use all of the annotations for each example, not just the majority label, to better calibrate the confidence a model assigns to each interpretation.
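
One way to act on this tip is to turn the annotator votes into a probability distribution (a "soft label") and train against that instead of a single hard label. The sketch below is an illustration under assumptions: the label strings are invented, and the function name `soft_label` is hypothetical rather than part of any library.

```python
from collections import Counter
from typing import Dict, List


def soft_label(judgements: List[str]) -> Dict[str, float]:
    """Convert annotator votes into a label -> probability mapping."""
    if not judgements:
        raise ValueError("at least one judgement is required")
    counts = Counter(judgements)
    total = len(judgements)
    # Each label's probability is its share of the annotator votes.
    return {label: n / total for label, n in counts.items()}


# Illustrative votes from five annotators (invented, not from the dataset).
votes = ["Yes", "Yes", "Yes", "Probably yes", "Yes"]
dist = soft_label(votes)
print(dist)
```

A distribution such as this can serve as the target of a cross-entropy loss, so that examples on which annotators disagree pull the model toward less confident predictions.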

Frequently Asked Questions

What type of questions does this dataset contain?

It mainly contains closed-ended questions (yes/no) asked in a variety of social situations.

How are indirect responses annotated?

Each answer is annotated by five annotators, and the label chosen by the majority is taken as the primary interpretation.
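
As a rough illustration of majority aggregation, the sketch below returns the label chosen by more than half of the annotators, or `None` when no label reaches that threshold. The strict "more than half" rule and the label strings are assumptions made for the example; the dataset's own aggregation rule may differ in detail.

```python
from collections import Counter
from typing import List, Optional


def majority_label(judgements: List[str]) -> Optional[str]:
    """Return the label held by a strict majority of annotators, else None."""
    if not judgements:
        return None
    label, count = Counter(judgements).most_common(1)[0]
    # Strict majority: more than half of all votes (an assumed threshold).
    return label if count > len(judgements) / 2 else None


# Invented votes: three of five annotators agree, so a majority exists.
agreed = majority_label(["Yes", "Yes", "No", "Yes", "No"])

# Invented votes: no label has more than half, so the result is None.
split = majority_label(["Yes", "Yes", "No", "No", "Other"])

print(agreed, split)
```

Examples for which the function returns `None` can be filtered out or handled separately, depending on how much annotator disagreement the downstream task should tolerate.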

Can the dataset be used for languages other than English?

Not currently: the dialogues are in English only. The dataset could, however, be extended or adapted to other languages and social contexts.
