CelebA

CelebA (CelebFaces Attributes Dataset) is a landmark computer vision dataset built around human faces. Thanks to its rich annotations, it is widely used for facial recognition, image generation, and facial attribute analysis.


Size

Over 200,000 face images in JPEG format, annotations in TXT files

Licence

Free for academic, non-commercial research use under the terms of the CelebA license

Description

The CelebA dataset includes:

202,599 JPEG images of celebrity faces
40 annotated attributes per image
5 landmarks per face for facial alignment
Segmentation masks of facial components in the CelebAMask-HQ version
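
The five landmarks make alignment possible because they pin down the position of the eyes. As a minimal sketch, assuming the landmarks come in the order left eye, right eye, nose, left mouth corner, right mouth corner, each as an (x, y) pixel pair in image coordinates (the helper names below are hypothetical), the tilt of the eye line can be measured like this:

```python
import math

# Hypothetical helpers. Assumed landmark order: left eye, right eye, nose,
# left mouth corner, right mouth corner; each an (x, y) pixel pair with
# y pointing down, as in image coordinates.

def eye_alignment_angle(landmarks):
    """Angle in degrees of the line from the left eye to the right eye.

    0.0 means the eyes are already level; any other value is the tilt
    an aligner would need to undo.
    """
    (lx, ly), (rx, ry) = landmarks[0], landmarks[1]
    return math.degrees(math.atan2(ry - ly, rx - lx))


def eye_center(landmarks):
    """Midpoint between the two eyes, a common rotation and crop anchor."""
    (lx, ly), (rx, ry) = landmarks[0], landmarks[1]
    return ((lx + rx) / 2.0, (ly + ry) / 2.0)


# Made-up coordinates for illustration (not taken from the dataset)
pts = [(70, 110), (110, 120), (90, 135), (72, 152), (108, 160)]
print(round(eye_alignment_angle(pts), 2))  # 14.04
print(eye_center(pts))                     # (90.0, 115.0)
```

An aligner would then rotate the image about the eye midpoint so that this angle becomes zero; the sign to pass depends on the rotation convention of the image library used.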


CelebA is recognized for the diversity of the faces it contains, in terms of facial features, ages and accessories, which makes it a resource of choice for training robust, generalizable models.


What is this dataset for?

CelebA is commonly used for:

Training facial recognition models
Analyzing and classifying facial attributes
Training GANs (Generative Adversarial Networks) to generate synthetic face images
Evaluating face detection models and attribute-editing models (adding a smile, removing glasses, etc.)
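
For attribute classification, each image's 40 attributes are usually turned into a multi-label target vector. The sketch below assumes the plain-text layout of the attribute file in the original release (first line: image count; second line: attribute names; then one line per image with the file name followed by values of 1 or -1). `parse_attributes` is a hypothetical helper, and the sample uses only three attributes for brevity.

```python
# Minimal sketch, assuming the attribute file layout described above.
def parse_attributes(text):
    lines = text.strip().splitlines()
    count = int(lines[0])            # number of images listed in the file
    names = lines[1].split()         # attribute names, one per column
    labels = {}
    for line in lines[2:2 + count]:
        parts = line.split()         # split() also absorbs repeated spaces
        filename, values = parts[0], parts[1:]
        if len(values) != len(names):
            raise ValueError(
                f"{filename}: expected {len(names)} values, got {len(values)}"
            )
        # Map 1 / -1 to 1 / 0 so the vector can serve as a multi-label target
        labels[filename] = [1 if int(v) == 1 else 0 for v in values]
    return names, labels


sample = """2
Smiling Eyeglasses Male
000001.jpg  1 -1 -1
000002.jpg -1  1  1
"""
names, labels = parse_attributes(sample)
print(names)                 # ['Smiling', 'Eyeglasses', 'Male']
print(labels["000002.jpg"])  # [0, 1, 1]
```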


Can it be enriched or improved?

Yes, CelebA can be improved in a number of ways:

By adding new attributes specific to certain populations or cultural expressions
By combining with other face datasets to improve demographic diversity
By refining segmentation masks for finer-grained segmentation tasks
By integrating CelebA into multimodal pipelines (voice + image, text + image) for wider applications


🔗 Source: CelebA Dataset


Frequently Asked Questions

Can I use CelebA to test face generation models?

Yes, CelebA is well suited to this. It is widely used as a reference dataset for training and evaluating GANs, thanks to the quality and variety of its faces.

How can I manage the biases present in this dataset?

CelebA has been criticized for an imbalanced representation of certain ethnic origins and genders. To limit this bias, it is recommended to supplement it with more representative datasets or to reweight samples during training.
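
One common way to adjust the weights is inverse-frequency reweighting: each sample receives a weight inversely proportional to the size of its group, so every group contributes the same total weight. The sketch assumes you already have one group label per image (here, hypothetically, the value of the `Male` attribute); `inverse_frequency_weights` is an illustrative helper, not part of any CelebA tooling.

```python
from collections import Counter


def inverse_frequency_weights(groups):
    """Per-sample weights n / (k * count[g]) for n samples in k groups.

    Each group's weights sum to n / k, so all groups carry equal total
    weight and the weights sum to n overall.
    """
    counts = Counter(groups)
    n, k = len(groups), len(counts)
    return [n / (k * counts[g]) for g in groups]


# Hypothetical group labels: value of the Male attribute (1 or -1) per image
groups = [1, -1, -1, -1]
weights = inverse_frequency_weights(groups)
print([round(w, 3) for w in weights])  # [2.0, 0.667, 0.667, 0.667]
```

The resulting weights can be passed, for instance, to PyTorch's `WeightedRandomSampler` or used to scale a per-sample loss.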

Is there a version with segmentation masks?

Yes, the CelebAMask-HQ version includes high-quality segmentation annotations for training models on fine-grained facial segmentation tasks.
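
Assuming the annotations are available as one binary mask per facial component (skin, hair, eyes, and so on), a segmentation network usually needs them merged into a single label map. A minimal NumPy sketch follows; `merge_component_masks` and the class ids are hypothetical choices, not values defined by the dataset.

```python
import numpy as np


def merge_component_masks(masks, class_ids):
    """Combine same-shaped binary masks into one integer label map.

    Pixels covered by no mask stay 0 (background). Where masks overlap,
    later entries overwrite earlier ones, so list broad regions (e.g. skin)
    before small ones (e.g. eyes).
    """
    if len(masks) != len(class_ids):
        raise ValueError("need one class id per mask")
    label_map = np.zeros(masks[0].shape, dtype=np.uint8)
    for mask, cid in zip(masks, class_ids):
        label_map[mask.astype(bool)] = cid
    return label_map


# Toy 3x3 masks with hypothetical class ids: 1 = skin, 4 = eye
skin = np.array([[0, 1, 1], [0, 1, 1], [0, 0, 0]])
eye = np.array([[0, 0, 1], [0, 0, 0], [0, 0, 0]])
print(merge_component_masks([skin, eye], [1, 4]))
# [[0 1 4]
#  [0 1 1]
#  [0 0 0]]
```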

Similar datasets

MNIST

LUNA16

TCIA Dataset (The Cancer Imaging Archive)