Datasets for Computer Vision

Computer vision is an area of artificial intelligence that enables machines to interpret and understand visual information. Like any other AI application, computer vision requires a large amount of data to give accurate results, and datasets provide the training material these algorithms learn from.

A well-prepared and well-maintained dataset allows a model to learn from examples, recognize patterns, and then make predictions about unseen data. The quality of a dataset therefore matters a great deal, as it directly affects the performance and robustness of computer vision applications.
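
To judge how a model behaves on unseen data, part of the dataset is usually held back from training. The following is a minimal sketch of such a holdout split; the function name `train_test_split`, the 20% test share, and the fixed seed are illustrative assumptions rather than anything prescribed by a particular dataset.

```python
import random


def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle the samples and hold back a fraction of them for testing.

    The default fraction and seed are illustrative assumptions.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(samples)
    # A seeded generator makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    # The first n_test shuffled samples form the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]


train, test = train_test_split(range(100))
print(len(train), len(test))
```

Many of the datasets listed later ship with a fixed training and testing split, in which case that official split should be preferred so that results stay comparable.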

Types of Datasets in Computer Vision

The field of computer vision is vast and covers many applications that make human life easier. To meet the different requirements of these applications, datasets fall into several categories based on the type of visual data they contain.

  • Image Datasets: These datasets contain static images. The images are often annotated, and the annotations act as labels for supervised machine learning tasks. Annotations can be object boundaries, categories, or other relevant information. Examples include face recognition, object detection, and scene understanding datasets.
  • Video Datasets: Videos are sequences of images (frames) that give a sense of motion when played back in quick succession. Video datasets contain a variety of videos, which may be labeled or unlabeled. These datasets are essential for tasks like action recognition, video segmentation, and tracking.
  • 3D Datasets: These datasets capture the three-dimensional structure of objects or scenes. They include point clouds, 3D meshes, and volumetric data, and are used in applications like 3D reconstruction and autonomous driving.
  • Synthetic Datasets: Synthetic data is not captured from the real world; instead, it is generated using computer graphics and simulation techniques. These datasets are valuable for training models in scenarios where collecting real-world data is challenging or impractical.
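
The difference between these data types shows up in the shape of the arrays that hold them. The sketch below uses typical layouts as an assumption (axis order and sizes vary between libraries and datasets) to show that a video is a stack of image frames and a point cloud is a list of 3D coordinates.

```python
from math import prod

# Illustrative shapes; the sizes and axis order are assumptions, not a standard.
IMAGE_SHAPE = (224, 224, 3)        # height, width, color channels
VIDEO_SHAPE = (16, 224, 224, 3)    # frames, height, width, color channels
POINT_CLOUD_SHAPE = (2048, 3)      # points, (x, y, z) coordinates


def num_values(shape):
    """Number of raw values needed to store one sample of this shape."""
    return prod(shape)


def video_duration_seconds(num_frames, fps):
    """Playback time of a clip with the given frame count and frame rate."""
    return num_frames / fps


for name, shape in [("image", IMAGE_SHAPE), ("video", VIDEO_SHAPE),
                    ("point cloud", POINT_CLOUD_SHAPE)]:
    print(name, shape, num_values(shape))
```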

As mentioned earlier, data is at the heart of AI, and finding quality data to train a model can be really difficult. For computer vision tasks in particular, it may take days or weeks to find a dataset that suits your purpose. We have therefore listed some popular computer vision datasets below.

Popular Computer Vision Datasets for Image Classification

ImageNet

Dataset link: https://www.image-net.org/update-mar-11-2021.php

It is one of the most popular datasets, with more than 14 million hand-annotated images categorized into thousands of classes. The images are organized according to the WordNet hierarchy, and each node of the hierarchy is depicted by hundreds to thousands of images. For a subset of the images, object-level annotations provide a bounding box around the (visible part of the) indicated object.

CIFAR-10 and CIFAR-100

Dataset link: https://www.cs.toronto.edu/~kriz/cifar.html

The CIFAR-10 dataset, a labeled subset of the 80 Million Tiny Images dataset, consists of 60,000 32×32 color images in 10 classes, with 6,000 images per class. The classes are familiar items: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. It is mainly used for training and benchmarking machine learning and computer vision models.

CIFAR-100 is like CIFAR-10, but it contains 100 classes with 600 images each. The 100 classes are grouped into 20 superclasses, so every image carries both a fine label (its class) and a coarse label (its superclass). It therefore offers a more fine-grained categorization than CIFAR-10.
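
The CIFAR page linked above describes the Python version of the data as rows of 3,072 values per image: the first 1,024 values are the red channel, the next 1,024 the green channel, and the last 1,024 the blue channel, each stored row by row. The sketch below rearranges one such row into a 32×32 grid of (red, green, blue) pixels. The function name is illustrative, and the input is a synthetic row rather than real CIFAR data.

```python
def cifar_row_to_pixels(row):
    """Turn one flat CIFAR row (3072 values) into a 32x32 grid of (r, g, b) tuples."""
    side = 32
    plane = side * side  # 1024 values per color channel
    if len(row) != 3 * plane:
        raise ValueError("expected 3072 values per image")
    red = row[:plane]
    green = row[plane:2 * plane]
    blue = row[2 * plane:]
    # Each channel is stored in row-major order: index = y * 32 + x.
    return [
        [(red[y * side + x], green[y * side + x], blue[y * side + x])
         for x in range(side)]
        for y in range(side)
    ]


# Synthetic example: every pixel gets red=1, green=2, blue=3.
fake_row = [1] * 1024 + [2] * 1024 + [3] * 1024
image = cifar_row_to_pixels(fake_row)
print(image[0][0])
```

Both datasets have the same total size, since 10 × 6,000 and 100 × 600 both equal 60,000 images.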

MNIST

Dataset link: https://git-disl.github.io/GTDLBench/datasets/mnist_datasets/

The MNIST dataset has 70,000 grayscale images of handwritten digits from 0 to 9, each 28 pixels by 28 pixels. It is divided into 60,000 training images and 10,000 testing images. MNIST is used as a standard benchmark for new machine learning algorithms and techniques, particularly in image classification.
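
MNIST is traditionally distributed in the IDX binary format: a big-endian header (magic number 2051 for image files, 2049 for label files) followed by one unsigned byte per pixel or label. The sketch below parses such files after decompression. It assumes the original IDX layout; copies of MNIST hosted elsewhere may use other formats such as CSV. The byte strings at the bottom are synthetic stand-ins for real files.

```python
import struct


def parse_idx_images(raw):
    """Parse an IDX image file into (list of flat images, rows, cols)."""
    magic, count, rows, cols = struct.unpack(">IIII", raw[:16])
    if magic != 2051:
        raise ValueError("not an IDX image file")
    size = rows * cols
    pixels = raw[16:]
    if len(pixels) != count * size:
        raise ValueError("file length does not match header")
    # Each image is a run of rows * cols unsigned bytes (0 to 255).
    images = [list(pixels[i * size:(i + 1) * size]) for i in range(count)]
    return images, rows, cols


def parse_idx_labels(raw):
    """Parse an IDX label file into a list of integer labels."""
    magic, count = struct.unpack(">II", raw[:8])
    if magic != 2049:
        raise ValueError("not an IDX label file")
    labels = list(raw[8:])
    if len(labels) != count:
        raise ValueError("file length does not match header")
    return labels


# Synthetic stand-in: two 2x2 "images" and their labels.
fake_images = struct.pack(">IIII", 2051, 2, 2, 2) + bytes(range(8))
fake_labels = struct.pack(">II", 2049, 2) + bytes([7, 3])
print(parse_idx_images(fake_images))
print(parse_idx_labels(fake_labels))
```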

Fashion MNIST

Dataset link: https://github.com/zalandoresearch/fashion-mnist

Fashion MNIST is a dataset of 70,000 grayscale images, each 28 pixels by 28 pixels, covering ten types of clothing: T-shirt/top, trouser, pullover, dress, coat, sandal, shirt, sneaker, bag, and ankle boot. It is intended as a drop-in replacement for the original MNIST dataset, but because many of the fashion articles vary widely and resemble one another, it is more challenging to classify.

Popular Computer Vision Datasets for Object Detection

COCO (Common Objects in Context)

Dataset link: https://cocodataset.org/#home

This dataset, released by Microsoft, has 328k images annotated for tasks like object detection, segmentation, and image captioning. Its complex scenes and diverse object categories make it a standard benchmark for various computer vision tasks.

Pascal VOC

Dataset link: http://host.robots.ox.ac.uk/pascal/VOC/

Pascal Visual Object Classes (VOC) is a benchmark dataset for object detection and image segmentation. It contains 20 object classes, grouped into people, animals, vehicles, and indoor objects. Pascal VOC provides annotations for object bounding boxes, object segmentation masks, and object classes.
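
Detection datasets do not all store bounding boxes the same way. COCO annotations give a box as [x, y, width, height], while Pascal VOC annotations give its corners as xmin, ymin, xmax, ymax. The sketch below converts the COCO form to corners and computes intersection over union (IoU), the overlap measure commonly used to compare a predicted box with an annotated one. The function names are illustrative, and the one-pixel offset from VOC's 1-based pixel indexing is ignored for simplicity.

```python
def coco_to_corners(box):
    """Convert a COCO-style [x, y, width, height] box to (xmin, ymin, xmax, ymax)."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


def iou(a, b):
    """Intersection over union of two corner-format boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Width and height of the overlap, clamped to zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


annotated = coco_to_corners([10, 20, 30, 40])
predicted = (15, 25, 45, 65)
print(annotated, iou(annotated, predicted))
```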

Open Images Dataset

Dataset link: https://storage.googleapis.com/openimages/web/index.html

Open Images is a large-scale dataset released by Google that contains about 9 million images annotated with image-level labels, object bounding boxes, object segmentation masks, visual relationships, and localized narratives. It covers a wide range of object classes and is used for many computer vision tasks, such as object detection, segmentation, and visual relationship detection.

Popular Computer Vision Datasets for Image Segmentation

Cityscapes

Dataset link: https://www.cityscapes-dataset.com/

The Cityscapes dataset focuses on the semantic understanding of urban street scenes. It consists of 5,000 high-resolution images with fine pixel-level annotations and 20,000 images with coarse annotations. The images were captured in 50 different cities over several months, in daytime and in good to medium weather conditions, and they support tasks such as semantic segmentation, instance segmentation, and object detection in urban settings.

ADE20K

Dataset link: https://groups.csail.mit.edu/vision/datasets/ADE20K/

ADE20K, the dataset behind the MIT Scene Parsing Benchmark, contains more than 20,000 images covering a wide range of scenes and objects. Every image comes with dense pixel-level annotations of objects and stuff (background regions such as sky or road). The dataset is used for tasks such as scene understanding, object detection, and instance segmentation.

CamVid

Dataset link: http://mi.eng.cam.ac.uk/research/projects/VideoRec/CamVid/

The CamVid database consists of driving video sequences in which every pixel is labeled with a semantic object class. It provides 701 high-resolution color images labeled at the pixel level across 32 classes. The dataset is used for research in autonomous driving and is a common benchmark for semantic segmentation.
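
Pixel-level labels like those in the segmentation datasets above are typically compared with a model's output using mean intersection over union (mIoU). The following is a minimal sketch that treats a label map as a nested list of class indices. It skips classes that appear in neither map and does not handle the "ignore" or void labels that some datasets define.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU between two label maps given as nested lists of class indices."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # The pixel belongs to class p in both maps.
                inter[p] += 1
                union[p] += 1
            else:
                # The pixel counts toward the union of both classes involved.
                union[p] += 1
                union[t] += 1
    # Average only over classes that occur in at least one of the two maps.
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


predicted = [[0, 0],
             [1, 1]]
annotated = [[0, 1],
             [1, 1]]
print(mean_iou(predicted, annotated, num_classes=2))
```

In this toy example class 0 has an IoU of 1/2 and class 1 an IoU of 2/3, so the mean is 7/12.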

Popular Computer Vision Datasets for Face Recognition

LFW (Labeled Faces in the Wild)

Dataset link: https://vis-www.cs.umass.edu/lfw/

LFW is composed of more than 13,000 labeled face images collected from the web. It was designed for studying unconstrained face recognition, with no restrictions on pose, expression, or illumination. Each face is labeled with the name of the person pictured, and the dataset comes with standard evaluation protocols, based on pairs of matched and mismatched faces, for measuring face verification accuracy.

CelebA

Dataset link: https://mmlab.ie.cuhk.edu.hk/projects/CelebA.html

CelebA (CelebFaces Attributes) consists of over 200,000 celebrity images with 40 binary attribute labels per image, covering properties such as apparent age, gender, and facial features. Each face is also annotated with five landmark locations: the two eyes, the nose, and the two corners of the mouth. CelebA is used for tasks like face attribute recognition, face detection, and generative modeling.

Popular Computer Vision Datasets for Human Pose Estimation

MPII Human Pose Dataset

Dataset link: http://human-pose.mpi-inf.mpg.de/

It comprises about 25,000 images containing over 40,000 people with annotated body joints and is dedicated to human pose estimation, the task of locating the positions of human joints in an image. The images cover a wide range of everyday human activities, and the dataset is widely used for training and evaluating pose estimation algorithms.

COCO Keypoints

Dataset link: https://cocodataset.org/#keypoints-2020

This is the part of the larger COCO dataset that carries annotations for keypoint detection. It contains more than 200K images and over 250K person instances, each annotated with 17 keypoints that cover the joints of the limbs and facial features.
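
In the COCO annotation format, the 17 keypoints of a person are stored as one flat list of 51 numbers: an (x, y, v) triplet per keypoint, where v is 0 for "not labeled", 1 for "labeled but not visible", and 2 for "labeled and visible". The sketch below unpacks such a list into named keypoints. The function name and the example values are illustrative.

```python
COCO_KEYPOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]


def decode_keypoints(flat):
    """Map a flat COCO keypoint list to {name: (x, y, is_visible)}.

    Keypoints with visibility flag 0 (not labeled) are left out.
    """
    if len(flat) != 3 * len(COCO_KEYPOINT_NAMES):
        raise ValueError("expected 51 values (17 keypoints x 3)")
    decoded = {}
    for i, name in enumerate(COCO_KEYPOINT_NAMES):
        x, y, v = flat[3 * i:3 * i + 3]
        if v > 0:
            decoded[name] = (x, y, v == 2)
    return decoded


# Made-up example: only the nose (visible) and left eye (occluded) are labeled.
example = [0] * 51
example[0:3] = [100, 50, 2]
example[3:6] = [110, 45, 1]
print(decode_keypoints(example))
```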

Popular Computer Vision Datasets for Autonomous Driving

KITTI

Dataset link: https://www.cvlibs.net/datasets/kitti/raw_data.php

KITTI is a dataset developed by the Karlsruhe Institute of Technology and the Toyota Technological Institute at Chicago for autonomous driving research. It is a collection of images, LiDAR scans, and other sensor data recorded in real driving scenarios. It supports tasks like stereo vision, optical flow, visual odometry, 3D object detection, tracking, and depth estimation.
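
KITTI's LiDAR scans are distributed as raw binary files in which each point takes four 32-bit floats: x, y, z, and reflectance. The sketch below reads such a buffer with the standard library. Little-endian byte order is assumed here, and the function name is illustrative. The example bytes are synthetic rather than taken from a real scan.

```python
import struct

POINT_SIZE = 16  # four float32 values: x, y, z, reflectance


def parse_velodyne_scan(raw):
    """Split a raw LiDAR buffer into (x, y, z, reflectance) tuples."""
    if len(raw) % POINT_SIZE != 0:
        raise ValueError("buffer length is not a multiple of 16 bytes")
    # Assumption: values are stored as little-endian 32-bit floats.
    return [struct.unpack_from("<4f", raw, offset)
            for offset in range(0, len(raw), POINT_SIZE)]


# Synthetic buffer holding two points.
fake_scan = struct.pack("<8f", 1.0, 2.0, 3.0, 0.5, -1.0, 0.0, 4.0, 0.25)
for point in parse_velodyne_scan(fake_scan):
    print(point)
```

For real scans with over a hundred thousand points each, an array library would be a faster way to read the same layout.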

ApolloScape

Dataset link: https://apolloscape.auto/

ApolloScape is another open dataset for autonomous driving. It provides high-resolution images and point clouds along with dense labels for 2D and 3D object detection, lane marking segmentation, and scene understanding. The dataset covers multiple urban settings and weather conditions.

Popular Computer Vision Datasets for Medical Imaging

ChestX-ray14

Dataset link: https://www.v7labs.com/open-datasets/chestx-ray14

The ChestX-ray14 dataset, released by the NIH Clinical Center, contains 112,120 frontal-view X-ray images of 30,805 patients. Each image can carry any of 14 thoracic disease labels, including pneumonia, emphysema, and fibrosis. The dataset is used to train and evaluate models for disease diagnosis in medical images.
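
Because one X-ray can show several diseases at once, ChestX-ray14 is a multi-label problem, and each image is usually encoded as a vector of 14 zeros and ones. The sketch below assumes the findings arrive as a single string separated by "|" characters, with "No Finding" marking a healthy image, as in the metadata file distributed with the NIH release. Treat that input format and the function name as assumptions.

```python
CHEST_XRAY14_LABELS = [
    "Atelectasis", "Cardiomegaly", "Effusion", "Infiltration", "Mass",
    "Nodule", "Pneumonia", "Pneumothorax", "Consolidation", "Edema",
    "Emphysema", "Fibrosis", "Pleural_Thickening", "Hernia",
]


def encode_findings(finding_labels):
    """Turn a '|'-separated findings string into a 14-element 0/1 vector."""
    present = set(finding_labels.split("|"))
    unknown = present - set(CHEST_XRAY14_LABELS) - {"No Finding"}
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    # One slot per disease; several slots can be 1 for the same image.
    return [1 if name in present else 0 for name in CHEST_XRAY14_LABELS]


print(encode_findings("Cardiomegaly|Effusion"))
print(encode_findings("No Finding"))
```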

ISIC (International Skin Imaging Collaboration)

Dataset link: https://challenge.isic-archive.com/data/

ISIC maintains a large public archive of dermoscopic images of skin lesions, numbering in the tens of thousands, with annotations for skin diseases such as melanoma. It supports research on automated dermoscopic image analysis for skin cancer and provides data for lesion segmentation, disease classification, and analysis of skin conditions.

Other Popular Computer Vision Datasets

Kinetics-700

Dataset link: https://github.com/cvdfoundation/kinetics-dataset

This massive video dataset contains about 650,000 clips covering 700 human action classes. The videos show both human-human interactions, such as embracing, and human-object interactions, such as playing instruments. Each action class includes at least 700 video clips, and each clip lasts roughly ten seconds and is annotated with a single action class.

Cityscapes

Dataset link: https://www.cityscapes-dataset.com/

Cityscapes is a dataset of stereo video sequences recorded in street scenes across 50 different cities. The footage was recorded over several months, in daytime and in good to medium weather conditions. The dataset includes semantic, instance-wise, and dense pixel annotations for 30 classes grouped into 8 categories. It offers 5,000 frames with fine pixel-level annotations and 20,000 coarsely annotated frames.

LabelMe-12-50k

Dataset link: https://www.ais.uni-bonn.de/download/datasets.html

This dataset has 50,000 JPEG images in 12 classes (40,000 for training and 10,000 for testing). The images are extracted from LabelMe. Classes include objects such as people, cars, trees, and keyboards. In both the training and the testing set, 50% of the images show a centered object and 50% show a randomly selected region of an image (referred to as “clutter”). This dataset is suitable for object recognition.

Applications of Computer Vision Datasets

Computer vision datasets are used in a wide range of applications that rely on AI to improve how they work and how accurate they are.

  • Healthcare: Datasets like ChestX-ray14 facilitate the development of algorithms for medical image analysis, including disease detection and diagnosis.
  • Autonomous Vehicles: A self-driving car needs data to learn to perceive and navigate its environment. Datasets such as the Waymo Open Dataset and ApolloScape are used for this.
  • Retail: In retail, visual datasets help in creating systems for inventory management, automated checkout, and customer behavior analysis.
  • Security and Surveillance: Face recognition datasets like LFW (Labeled Faces in the Wild) are used to develop systems for identity verification and surveillance.
  • Agriculture: Computer vision can improve agriculture in multiple ways. Datasets capturing crop images and environmental conditions help in precision farming, allowing farmers to automate crop monitoring and disease detection.

Challenges with Computer Vision Datasets

  • Data Quality: Computer vision tasks need high-quality annotated data, because poorly labeled data leads to inaccurate models. In applications such as disease detection, such errors are critical because they affect patients' health.
  • Bias and Fairness: It is important that the dataset covers diverse scenarios. This helps prevent biased models that perform poorly on underrepresented groups.
  • Scalability: Large datasets require substantial storage and computational resources, which can be a barrier for many researchers.
  • Privacy and Ethics: Collecting visual data can raise privacy concerns and ethical issues that must be addressed, especially when people appear in the data.
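
A simple first check for the bias problem described above is to count how often each class or group appears in the labels. The sketch below flags groups whose share of the dataset falls below a threshold. The 10% threshold, the function name, and the example labels are illustrative assumptions, and a balanced count alone does not guarantee a fair model.

```python
from collections import Counter


def underrepresented(labels, min_share=0.1):
    """Return the groups whose share of all samples is below min_share."""
    counts = Counter(labels)
    total = sum(counts.values())
    return sorted(group for group, n in counts.items() if n / total < min_share)


# Made-up label list: one dominant class and two rare ones.
labels = ["car"] * 90 + ["bicycle"] * 8 + ["wheelchair"] * 2
print(underrepresented(labels))
```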

Conclusion

By now you should have a good understanding of the role datasets play in computer vision research and development. They are essential not only for training and testing but also for building accurate models, provided the dataset is large enough. Researchers still face many challenges in collecting and maintaining data. However, with advances in the field of AI, many techniques are being developed to make this process smoother and quicker.