SQuAD (Stanford Question Answering Dataset)
SQuAD (Stanford Question Answering Dataset) is a reference text dataset for training and evaluating natural language comprehension models. It pairs excerpts from Wikipedia with specific questions whose answers appear directly in the passages provided.
Over 100,000 question and answer pairs, in JSON format
Free for academic research. Commercial use may require reviewing the terms of use
Description
The SQuAD dataset includes:
- Over 100,000 question and answer pairs (version 1.1)
- Text passages from Wikipedia pages
- Human annotations in which answers are contiguous snippets of the text (span-based)
- A structured JSON format that is easy to use for supervised training
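The span-based JSON structure can be sketched as follows; the record, question, and character offsets here are invented for illustration, not taken from the actual dataset:

```python
# Minimal SQuAD 1.1-style record (illustrative values, abridged structure).
record = {
    "context": "SQuAD was released by researchers at Stanford University in 2016.",
    "qas": [
        {
            "id": "example-0001",
            "question": "Who released SQuAD?",
            "answers": [
                # answer_start is a character offset into the context string
                {"text": "researchers at Stanford University", "answer_start": 22}
            ],
        }
    ],
}

def extract_span(context: str, answer: dict) -> str:
    """Recover the annotated answer text from its character offset."""
    start = answer["answer_start"]
    return context[start:start + len(answer["text"])]

answer = record["qas"][0]["answers"][0]
# The recovered span matches the annotated answer text
assert extract_span(record["context"], answer) == answer["text"]
```

Because every answer is anchored by a character offset, models trained on SQuAD can be supervised to predict start and end positions rather than generate free text.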
What is this dataset for?
SQuAD is widely used for:
- Training question answering models in NLP
- Evaluating the performance of models on natural language comprehension tasks
- Fine-tuning large language models for practical applications (voice assistants, conversational bots, search engines)
- Experimenting with methods for extracting, reformulating, or synthesizing answers
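For the evaluation use case, SQuAD's official metrics are exact match and token-level F1. A simplified sketch of both (omitting the official script's lowercasing, article removal, and punctuation normalization) might look like:

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer.

    Simplified version of the SQuAD evaluation metric: tokens are
    compared as-is, without the official script's normalization.
    """
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

def exact_match(prediction: str, reference: str) -> bool:
    """Strict string equality after trimming whitespace."""
    return prediction.strip() == reference.strip()
```

For example, `token_f1("Stanford University", "at Stanford University")` gives precision 1.0 and recall 2/3, hence an F1 of 0.8, while `exact_match` would score it as wrong; reporting both captures partial credit for near-miss spans.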
Can it be enriched or improved?
Yes, SQuAD can be enriched by:
- Adding more complex questions (multiple, implicit, or reformulated answers)
- Introducing content from sources other than Wikipedia for better generalization
- Evaluating on derived tasks: long answers, open-ended generation, or justified answers
- Translating and adapting it for multilingual or specialized versions (medical, legal...)
Tools like Haystack, Hugging Face Transformers, or LangChain are commonly used to exploit or extend SQuAD in modern NLP pipelines.
🔗 Source: SQuAD Dataset
Frequently Asked Questions
What is the difference between SQuAD 1.1 and 2.0?
SQuAD 1.1 only contains questions whose answers are always present in the text. SQuAD 2.0 adds unanswerable questions to test the ability of models to recognize the absence of relevant information.
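In the SQuAD 2.0 JSON, unanswerable questions carry an `is_impossible` flag and an empty answer list. A minimal sketch of splitting a batch on that flag (the entries below are invented for illustration):

```python
# Abridged, invented SQuAD 2.0-style entries: is_impossible marks
# questions with no supporting answer in the passage.
qas = [
    {"id": "q1", "question": "When was the bridge built?",
     "is_impossible": False,
     "answers": [{"text": "1886", "answer_start": 40}]},
    {"id": "q2", "question": "Who demolished the bridge?",
     "is_impossible": True,
     "answers": []},
]

# Partition into answerable and unanswerable questions; .get() with a
# default keeps the code compatible with 1.1-style entries lacking the flag.
answerable = [q for q in qas if not q.get("is_impossible", False)]
unanswerable = [q for q in qas if q.get("is_impossible", False)]
```

A SQuAD 2.0 model must therefore learn two behaviors: extract the span when one exists, and abstain (predict "no answer") when the flag would be true.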
Can SQuAD be used for free generation models like GPT?
Yes. Although originally designed for extraction, SQuAD can be adapted for training or evaluating generative models by using the context as a prompt and the answer as a target.
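One way to perform such an adaptation is to template each extractive example into a prompt/target pair; the template below is a hypothetical sketch, not a canonical format:

```python
def to_prompt_target(context: str, question: str, answer: str) -> tuple[str, str]:
    """Turn an extractive SQuAD example into a (prompt, target) pair
    for a generative model. The template is illustrative only."""
    prompt = f"Context: {context}\nQuestion: {question}\nAnswer:"
    target = f" {answer}"  # leading space so target continues the prompt
    return prompt, target

prompt, target = to_prompt_target(
    "SQuAD pairs Wikipedia passages with questions.",
    "What does SQuAD pair passages with?",
    "questions",
)
```

At evaluation time, the model's generated continuation is then compared to the gold answer, typically with the same exact match and F1 metrics used for extractive systems.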
Are there multilingual alternatives to SQuAD?
Yes, several datasets are inspired by it, such as XQuAD, MLQA, or TyDi QA, which offer multilingual versions or versions adapted to specific languages.