Different types of computer vision in real-life use cases.
Artificial Intelligence (AI) is a broad topic for research and discussion. In this article, I want to talk about various types of computer vision, such as Image Classification, Object Localisation, Object Detection, and Image Segmentation. Which one do you need for your use case?
Computer vision is a subfield of AI that enables computers and systems to process visual data, such as images and videos, in order to detect, track, and classify objects. We can train a machine learning model in different ways, and depending on how the model is trained, computer vision can be considered part of different subfields of AI.
- Computer vision is a subfield of AI in general
Based on Wikipedia’s definition, we can consider AI to be intelligence demonstrated by machines. Imagine that you show different emojis to a system, such as happy and sad faces, and the machine can distinguish them. We can call what this machine does AI, but at the same time, we have a very simple rule-based technology under the hood: the algorithm is a loop of "if" statements that checks the color (black or white) and the coordinates of individual pixels.
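A toy sketch of this rule-based idea, assuming a tiny hand-made pixel grid rather than a real image, could look like the following (the grids and the mouth-corner rule are invented purely for illustration):

```python
# Toy rule-based "AI": the emoji is a small grid of black (1) and white (0)
# pixels, and a loop of "if" statements inspects pixel colors at known
# coordinates to label the face.

HAPPY = [
    [0, 1, 0, 1, 0],   # eyes
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 1],   # mouth corners up
    [0, 1, 1, 1, 0],   # mouth bottom
]

SAD = [
    [0, 1, 0, 1, 0],   # eyes
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],   # mouth top
    [1, 0, 0, 0, 1],   # mouth corners down
]

def classify_emoji(pixels):
    """Label a tiny black-and-white emoji as 'happy' or 'sad' with simple rules."""
    for y, row in enumerate(pixels):
        for x, color in enumerate(row):
            # Rule: if the mouth corners (row 2, columns 0 and 4) are black,
            # the mouth curves upwards, so the face is happy.
            if y == 2 and x in (0, 4) and color == 1:
                return "happy"
    return "sad"

print(classify_emoji(HAPPY))  # happy
print(classify_emoji(SAD))    # sad
```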
- Computer vision is a subfield of Machine Learning (ML)
Imagine another case in which you collect thousands of images of dogs and cats and use them to train your model to recognize patterns and distinguish between them. You know in advance what is depicted in the images and teach your model by providing the correct answers. This type of learning is called supervised learning, and this type of computer vision belongs to the ML subfield.
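As a rough illustration, a minimal supervised training loop could look like the sketch below. It assumes TensorFlow/Keras is installed and that the labeled images live in data/train/cat and data/train/dog folders; the folder layout, image size, and tiny network are assumptions for the example, not a recipe from this article.

```python
import tensorflow as tf

# Labels are inferred from the folder names, i.e. we provide the "correct answers".
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/train", image_size=(128, 128), batch_size=32)

# A deliberately small convolutional network for two classes: cat and dog.
model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 255),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(2, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=5)  # the model learns patterns from labeled examples
```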
- Computer vision is a subfield of Deep Learning
In this subfield, your model is more independent. You prepare the data for training, and the model analyzes it and finds patterns by itself using unsupervised learning.
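For a flavor of the unsupervised idea, here is a small sketch that groups images by similarity without any labels. It assumes scikit-learn and NumPy are installed, and the random array simply stands in for your own pre-loaded, equally sized grayscale images.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
images = rng.random((100, 32, 32))          # placeholder for real image data
features = images.reshape(len(images), -1)  # flatten each image into a vector

# No labels are given: the algorithm groups the images purely by similarity.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)
print(kmeans.labels_[:10])  # cluster assignments discovered without any labels
```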
AI can perform various tasks, and for each task, it uses a specific type of data. In general, we can use images, text, audio files, and tabular data as input to train the ML model.
For tasks that are part of computer vision, we use images. Since a video is a sequence of frames, i.e. images, we also consider it legitimate input data.
The way computers can see is different from the way people see. The requirements for a person to recognize an object are good eyesight and familiarity with the object.
We can all recognize the dog in the left image, but not everyone knows what’s on the right one.
A computer "sees" in a broadly similar way, but what it is able to recognize depends on how you have trained your model. We can name four different types of computer vision: Image Classification, Object Localisation, Object Detection, and Image Segmentation.
- Image Classification — is a supervised ML problem in which a model can recognize one or multiple classes. For example, we train a model with two classes, "dog" and "cat", to distinguish between them and show a label.
- Object Localisation — is a regression problem in which the model determines the approximate location of the object of interest. Instead of showing the object’s class (you already know it), it gives you the x and y coordinates of the central point of the object and the height and width to draw a bounding box around it.
- Object Detection — is a complex problem that combines both Image Classification and Object Localisation. It recognizes one or multiple classes, such as "Dog" and "Cat", and it gives you the coordinates of their central points. It also gives you the height and width to draw bounding boxes around them to show their size and location.
- Image Segmentation — another complex problem that requires the machine to recognize an object and show its location. The first part of the task is the same: we use Image Classification to identify one or multiple classes, such as "Dog" and "Cat". But for the second part, when we need to show the object's location, the approach is different. Instead of identifying the central point and box boundaries, it highlights each object in the image as a pixel mask. The sketch after this list illustrates what these different output formats can look like.
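The following small sketch, with made-up numbers for a single image containing one dog, shows what the outputs of classification, localisation/detection, and segmentation can look like in practice:

```python
import numpy as np

# Image Classification: a label, often with a probability per class.
classification = {"dog": 0.93, "cat": 0.07}

# Object Localisation / Detection: center point plus width and height of a box.
cx, cy, w, h = 220.0, 160.0, 80.0, 120.0

# Converting the center-based box to corner coordinates for drawing:
x_min, y_min = cx - w / 2, cy - h / 2
x_max, y_max = cx + w / 2, cy + h / 2
print("bounding box:", (x_min, y_min, x_max, y_max))

# Image Segmentation: a pixel mask the same size as the image, where True marks
# the pixels that belong to the object (here just a filled rectangle for brevity).
mask = np.zeros((320, 480), dtype=bool)
mask[int(y_min):int(y_max), int(x_min):int(x_max)] = True
print("object covers", int(mask.sum()), "pixels")
```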
Let’s look at some examples to understand better which type of computer vision is more suitable for your use case.
Grocery store mobile app
Imagine that you are in a grocery store with a wide variety of fruits and vegetables. You are familiar with some of them; for example, there are Granny Smith, Pink Lady, and Cripps Red apples. But some of the fruits you have never seen before. You can train an ML model to recognize different varieties of fruits and vegetables, connect the model to a mobile application, point the camera at a fruit, and see its name.
You can try to make your own digital guide. Have you heard about the Peltarion platform? It’s a low-code tool that helps you to train ML models. They have a bunch of tutorials on their website, and one of them is the use case that we discussed above. Go through the tutorial Find similar images of fruits. You don’t need coding or technical skills to complete it.
Apple orchard
Imagine that you are a farmer and you have an apple orchard. You produce apple juice, so every ripe apple is raw material for your juice. If an apple has ripened and fallen to the ground, and you have not picked it up, it may begin to rot, which means you have lost a potential source of juice.
That is a problem that can be solved with Object Localisation. We can use surveillance cameras to collect data and train a model to recognize an apple and point us to its location, so we know whether it is still on the tree or has fallen to the ground. If it is on the ground, it is time to go and pick it up before it starts to rot.
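A hedged sketch of that decision, assuming the localisation model returns a center-based bounding box and that the ground line in the camera view has been calibrated by hand (both the ground line and the numbers are invented for illustration):

```python
GROUND_LINE_Y = 600  # assumed pixel row where the ground starts in this camera view

def apple_has_fallen(cx, cy, w, h, ground_line_y=GROUND_LINE_Y):
    """Return True if the apple's bounding box lies below the calibrated ground line."""
    box_bottom = cy + h / 2
    return box_bottom >= ground_line_y

# Example: an apple detected low in the frame triggers a "go and pick it up" signal.
if apple_has_fallen(cx=310, cy=590, w=40, h=40):
    print("Apple on the ground: time to collect it before it rots.")
```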
Protection of wild animals
In Maasai Mara, a large national game reserve in Kenya, surveillance cameras with night vision and object detection are used to protect elephants from poachers. The system can detect humans in the video and send an alert signal to the ranger team, which helps them take action immediately and save the lives of animals.
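The alerting logic itself can be very simple once a detection model is in place. In the sketch below, detections_in_frame and notify_rangers are hypothetical placeholders; a real system would use a trained detection model and the reserve's own messaging infrastructure.

```python
def detections_in_frame(frame):
    """Placeholder: a real model would return (label, confidence, box) tuples."""
    return [("elephant", 0.97, (100, 200, 300, 400)),
            ("person", 0.88, (500, 220, 560, 400))]

def notify_rangers(message):
    print("ALERT:", message)  # stand-in for an SMS, radio, or API call

def monitor(frame, person_threshold=0.8):
    for label, confidence, box in detections_in_frame(frame):
        # Night-vision footage of the reserve should contain animals, not people,
        # so a confident "person" detection is treated as a possible poacher.
        if label == "person" and confidence >= person_threshold:
            notify_rangers(f"Human detected with confidence {confidence:.2f} at {box}")

monitor(frame=None)
```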
You can learn more about this use case from the documentary The Age of A.I., which you can watch on YouTube. It reviews a lot of other use cases as well, so if you want to learn more about AI applications in various fields, I definitely recommend watching it.
Social distance monitoring framework
Surveillance cameras with object detection can be used to monitor the distance between people. This use case appeared during the COVID-19 pandemic, when health authorities strongly recommended keeping a distance to prevent the spread of infection.
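Once the detector returns one ground point per person, the distance check itself is straightforward. The sketch below assumes those points have already been mapped to meters via the camera's calibration or depth estimation, and the 1.5 m threshold is just an example value:

```python
from itertools import combinations
from math import dist

MIN_DISTANCE_M = 1.5  # assumed recommended distance between people

# Each detected person is reduced to a single ground point, in meters.
people = {"p1": (0.0, 0.0), "p2": (1.0, 0.5), "p3": (4.0, 3.0)}

# Flag every pair of people standing closer than the recommended distance.
for (id_a, pos_a), (id_b, pos_b) in combinations(people.items(), 2):
    if dist(pos_a, pos_b) < MIN_DISTANCE_M:
        print(f"{id_a} and {id_b} are too close: {dist(pos_a, pos_b):.2f} m")
```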
You can read about this case in the article Social Distancing Detector using Deep Learning and Depth Perception written by Osama Fawad.
Skin cancer detection
Image Segmentation is currently widely applied in healthcare. It helps analyze specific parts of the human body and improves the accuracy of diagnosis and the efficiency of treatment.
For example, a human life can be at risk because of skin lesions, and it is difficult to distinguish benign skin lesions from melanoma at an early stage with the naked eye. It is possible to build an ML model trained on skin images, each with a corresponding segmentation mask image that indicates the boundaries of the lesions. This case is well described as a tutorial on the Peltarion platform. Try following the Skin cancer detection tutorial to create a model that can generate segmentation masks for images of skin lesions.
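To give a feel for how such a model is evaluated, here is a small sketch that compares a predicted lesion mask with the ground-truth mask using the Dice coefficient, a common measure of segmentation quality (the tiny arrays are placeholders, not real data):

```python
import numpy as np

def dice(pred, truth):
    """Dice coefficient: 1.0 means the masks overlap perfectly, 0.0 not at all."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    total = pred.sum() + truth.sum()
    return 2.0 * intersection / total if total else 1.0

truth = np.zeros((8, 8), dtype=bool)
truth[2:6, 2:6] = True          # annotated lesion area
pred = np.zeros((8, 8), dtype=bool)
pred[3:7, 2:6] = True           # model's predicted lesion area

print(f"Dice score: {dice(pred, truth):.2f}")  # 0.75 for this example
```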
All types of computer vision are useful. It is important to choose the right one for your case, which means understanding the main goal, the sources of data, the amount of effort, and the computational power required. In the references below, you can find more information about AI and become familiar with other use cases.
References
- Elements of AI — learn more about AI-related subfields
- Peltarion platform — low-code tool to train ML models
- Find similar images of fruits — Image Classification tutorial on the Peltarion platform
- The Age of A.I. — learn more about ways of AI applications in different industries
- Social Distancing Detector using Deep Learning and Depth Perception
- Skin cancer detection — Image Segmentation tutorial on the Peltarion platform