WebClick — Multimodal benchmark for web browsing

WebClick is a multimodal benchmark dataset designed to assess the ability of models and agents to understand and navigate web interfaces. It contains screenshots annotated with natural language instructions and specific click areas.

Size

1,639 PNG/JPEG images, text instructions, bounding box coordinates in JSON

Licence

Apache 2.0

Description

WebClick contains 1,639 website screenshots, each annotated with a natural language instruction and a precise bounding box marking the target click area. The data comes from real tasks carried out by human users and agents, covering web browsing, online shopping, and calendar management.
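As a rough illustration of how such annotations might be read, here is a minimal Python sketch. The field names ("image", "instruction", "bbox"), the (x_min, y_min, x_max, y_max) box layout, and the example record are assumptions made for this sketch, not the dataset's documented schema; check the actual JSON before relying on them.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ClickSample:
    """One annotated screenshot (hypothetical layout, see the note above)."""
    image_path: str
    instruction: str
    bbox: tuple[float, float, float, float]  # assumed (x_min, y_min, x_max, y_max)


def parse_samples(raw_json: str) -> list[ClickSample]:
    """Parse a JSON array of annotation records into ClickSample objects."""
    samples = []
    for rec in json.loads(raw_json):
        x_min, y_min, x_max, y_max = (float(v) for v in rec["bbox"])
        # Reject boxes whose corners are in the wrong order.
        if x_min > x_max or y_min > y_max:
            raise ValueError(f"malformed bounding box: {rec['bbox']}")
        samples.append(
            ClickSample(rec["image"], rec["instruction"], (x_min, y_min, x_max, y_max))
        )
    return samples


# Made-up record used only to exercise the parser; not real WebClick data.
EXAMPLE_JSON = (
    '[{"image": "shots/0001.png", '
    '"instruction": "Open the calendar", '
    '"bbox": [100, 40, 180, 72]}]'
)

if __name__ == "__main__":
    for sample in parse_samples(EXAMPLE_JSON):
        print(sample.instruction, sample.bbox)
```

Validating the box order at load time keeps malformed records from silently skewing any later evaluation.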

What is this dataset for?

  • Evaluate the understanding of user interfaces by multimodal models
  • Test the ability to accurately locate clicks in response to natural language instructions
  • Develop and benchmark intelligent agents for automated web browsing
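One simple way to score click localization is to count a prediction as correct when the predicted point falls inside the annotated bounding box. The sketch below assumes pixel coordinates and boxes given as (x_min, y_min, x_max, y_max); the function names are illustrative, and this is not necessarily the metric used by the dataset's authors.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed (x_min, y_min, x_max, y_max)
Point = Tuple[float, float]              # (x, y) in the same pixel space as the box


def click_in_box(click: Point, box: Box) -> bool:
    """Return True if the click lies inside the box, edges included."""
    x, y = click
    x_min, y_min, x_max, y_max = box
    return x_min <= x <= x_max and y_min <= y <= y_max


def click_accuracy(clicks: Sequence[Point], boxes: Sequence[Box]) -> float:
    """Fraction of predicted clicks that land inside their target box."""
    if len(clicks) != len(boxes):
        raise ValueError("clicks and boxes must have the same length")
    if not clicks:
        return 0.0
    hits = sum(click_in_box(c, b) for c, b in zip(clicks, boxes))
    return hits / len(clicks)


if __name__ == "__main__":
    predicted = [(10.0, 10.0), (50.0, 50.0)]
    targets = [(0.0, 0.0, 20.0, 20.0), (0.0, 0.0, 20.0, 20.0)]
    print(click_accuracy(predicted, targets))
```

A hit-or-miss criterion like this is easy to interpret, but it ignores how far a missed click was from the target; a distance-based score could complement it.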

Can it be enriched or improved?

This dataset can be enriched with additional annotations, such as complex interactive elements or multi-page contexts. Integrating data from other web environments would likely make models trained on it more robust.

🔎 In summary

Criterion: Evaluation
🧩 Ease of use: ⭐⭐⭐⭐⭐ (Good, JSON format and images easy to use)
🧼 Need for cleaning: ⭐⭐⭐⭐⭐ (Minimal, precise and rigorous annotations)
🏷️ Annotation richness: ⭐⭐⭐⭐⭐ (Excellent, including natural language instructions and exact bounding boxes)
📜 Commercial license: ✅ Yes (Apache 2.0)
👨‍💻 Beginner friendly: 🌟 Yes, well-documented and structured dataset
🔁 Fine-tuning ready: 🎯 Perfect for training multimodal UI/language models
🌍 Cultural diversity: ⚠️ Primarily English, wide variety of websites

🧠 Recommended for

  • Multimodal AI researchers
  • Web agent developers
  • R&D, UX and automated navigation teams

🔧 Compatible tools

  • PyTorch
  • TensorFlow
  • Hugging Face
  • Visual annotation tools

💡 Tip

Use advanced spatial grounding techniques to maximize click location accuracy.
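One practical detail that affects click accuracy: many vision models resize the screenshot before inference, so a predicted click has to be mapped back to the original resolution before it is compared with the annotated box. The sketch below assumes a plain resize with no padding or cropping; the function name and argument layout are hypothetical.

```python
from typing import Tuple


def rescale_click(
    click: Tuple[float, float],
    model_size: Tuple[int, int],
    image_size: Tuple[int, int],
) -> Tuple[float, float]:
    """Map a click from the model's input resolution to the original screenshot.

    Sizes are (width, height). Assumes the screenshot was resized directly to
    model_size, without letterboxing or cropping.
    """
    x, y = click
    model_w, model_h = model_size
    image_w, image_h = image_size
    if model_w <= 0 or model_h <= 0:
        raise ValueError("model_size must be positive")
    return (x * image_w / model_w, y * image_h / model_h)


if __name__ == "__main__":
    # A click at (50, 25) on a 100x100 model input, mapped to a 1920x1080 screenshot.
    print(rescale_click((50.0, 25.0), (100, 100), (1920, 1080)))
```

If the preprocessing pads the image to preserve aspect ratio, the padding offset would need to be subtracted before scaling.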

Frequently Asked Questions

What data is provided in WebClick?

Website screenshots, natural language instructions, and precise coordinates of bounding boxes.

Is this dataset suitable for creating intelligent agents for web browsing?

Yes, it lets you train and evaluate agents that can understand instructions and interact with web interfaces.

What are the usage scenarios covered by WebClick?

Agent-assisted browsing, online shopping, calendar management, and other complex web interactions.
