Computer vision is a field of Artificial Intelligence that emphasizes training computers using AI algorithms so that they can capture, analyze, and comprehend essential information about images. Computer vision imitates parts of the human brain connected to vision to interpret images in the same way as we do. The advancements in artificial intelligence, deep learning, and neural networks have made this technology a center of attention among technologists and mathematicians for the past few years. Experts envision a huge potential in this technology, and they are optimistic that in the future computers will be able to perceive images and videos better than humans.
Why Is Computer Vision Important?
Smartphones are an integral part of our lives. From birthday events and college annual functions to workplace celebrations and vacation selfies, people store a great deal of visual information, such as photos and videos, on their smartphones. Besides this, many social media platforms like Instagram and Pinterest thrive on the pictures and videos shared by millions of users every day. YouTube is probably the second-largest search engine after Google: billions of people watch videos on it every day, and hundreds of hours of video are uploaded every minute.
This indicates that a large amount of the information on the internet is in the form of images and videos. Indexing text is relatively easy; to index visual information, however, we need algorithms that recognize the content of images and videos. Before computer vision was in the picture, search engines relied on the meta descriptions provided by the users who uploaded the files. With time, computer vision algorithms have become more sophisticated. Today, some algorithms decipher visual information with around 99% accuracy on specific, well-defined tasks. Besides this, computer vision can be applied across industries to increase the efficiency of business operations.
How Does Computer Vision Work?
Computer vision works by recognizing patterns. We train models to decipher visual information by exposing them to as many labeled images as possible. For example, if you expose the computer to thousands of images of parrots, it will apply various algorithms to analyze the colors, shapes, and distances between objects in the photos in order to learn what a parrot looks like. After the training is finished, you can feed it any new image, and the computer will tell you from its experience whether or not it is an image of a parrot.
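To make the train-then-predict idea concrete, here is a minimal sketch. It is not how production systems work: it assumes tiny made-up "images" of four pixel values each and uses a simple nearest-prototype rule instead of a neural network. The function names and the data are hypothetical.

```python
# Toy "images": each is a flat list of pixel intensities in [0, 1].
# All names and data here are made up for illustration.

def mean_image(images):
    """Average a list of equally sized images pixel by pixel."""
    n = len(images)
    return [sum(pixels) / n for pixels in zip(*images)]

def train(labeled_images):
    """Learn one average 'prototype' image per label."""
    by_label = {}
    for image, label in labeled_images:
        by_label.setdefault(label, []).append(image)
    return {label: mean_image(imgs) for label, imgs in by_label.items()}

def squared_distance(a, b):
    """Sum of squared pixel differences between two images."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict(prototypes, image):
    """Return the label whose prototype is closest to the image."""
    return min(prototypes,
               key=lambda label: squared_distance(prototypes[label], image))

training_set = [
    ([0.9, 0.8, 0.1, 0.2], "parrot"),
    ([0.8, 0.9, 0.2, 0.1], "parrot"),
    ([0.1, 0.2, 0.9, 0.8], "not parrot"),
    ([0.2, 0.1, 0.8, 0.9], "not parrot"),
]
prototypes = train(training_set)
print(predict(prototypes, [0.85, 0.75, 0.15, 0.25]))
```

The more labeled examples the model sees, the better its prototypes represent each label, which is the same intuition behind training on thousands of parrot photos.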
A special type of neural network that makes computer vision possible is the Convolutional Neural Network (CNN). A CNN treats an image as a grid of pixels and slides small matrices of weights, known as filters (or kernels), across it. At each position, the network multiplies the filter weights by the pixels underneath and sums the results, which measures how strongly that patch of the image matches the pattern the filter encodes. CNNs are the basis of object detection algorithms such as SSD (Single Shot MultiBox Detector) and YOLO (You Only Look Once). There are three main types of layers in a Convolutional Neural Network: convolutional layers, pooling layers, and fully connected layers. Each layer performs a specific task on its input. The first layer of a CNN is always a convolutional layer, in which filters are applied to the image. This first layer picks up low-level patterns such as edges. As the data passes through deeper layers, the network starts to recognize larger structures, such as the faces of animals.
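The following sketch shows what a convolutional layer followed by a pooling layer does to a tiny image. It is written in plain Python for readability and makes some assumptions: the 6x6 image and the vertical-edge filter are hand-picked for illustration, whereas a trained CNN learns its filter weights from data.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    Like most CNN libraries, this does not flip the kernel, so strictly
    speaking it computes a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def relu(feature_map):
    """Replace negative responses with zero."""
    return [[max(0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size block."""
    return [
        [
            max(feature_map[r + i][c + j]
                for i in range(size) for j in range(size))
            for c in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for r in range(0, len(feature_map) - size + 1, size)
    ]

# 6x6 image: dark on the left (0), bright on the right (1).
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]

# Hand-picked filter that responds to dark-to-bright vertical edges.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = relu(convolve2d(image, vertical_edge))
pooled = max_pool(feature_map)
print(feature_map)  # strong responses only where the edge is
print(pooled)       # smaller summary of the same responses
```

The feature map is non-zero only in the columns where the window straddles the dark-to-bright boundary, which is what "detecting an edge" means here. Pooling then shrinks the map while keeping the strongest responses.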
Computer Vision Techniques
Computer vision is giving rise to many cutting-edge applications that are revolutionizing our daily lives. Before discussing those applications, we will see some major computer vision techniques because they are the basis of many CV applications.
Image Classification
It refers to assigning a label to the entire photograph and is also known as object classification or image recognition. For example, if you are shown an image of a maple tree, you can instantly recognize it. Have you ever wondered how you immediately recognize it? The answer is straightforward. You have seen trees before, so when you come across the picture of a tree, your brain immediately tells you that it belongs to the tree category. A single image can belong to multiple categories; for instance, a tree also belongs to the categories of plants and living things. While our brains can easily classify images, computers cannot do so without being trained with deep learning methods. Billions of images are uploaded to the internet every day, and it is implausible that anyone could categorize each image manually. Image classification techniques automate the process by quickly labeling pictures and assigning them to common categories.
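As a rough illustration of the final step of a classifier, the sketch below turns raw scores into probabilities with a softmax and picks the most likely label. The labels, scores, and category hierarchy are hypothetical values chosen for the example; in practice the scores would come from a trained network.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical labels, raw scores, and category hierarchy for illustration.
labels = ["maple tree", "cat", "car"]
scores = [4.0, 1.0, 0.5]
broader_categories = {
    "maple tree": ["plant", "living thing"],
    "cat": ["animal", "living thing"],
    "car": ["vehicle"],
}

probabilities = softmax(scores)
best = max(range(len(labels)), key=lambda i: probabilities[i])
predicted = labels[best]

# The single predicted label also implies its broader categories.
print(predicted, round(probabilities[best], 3), broader_categories[predicted])
```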
Object Detection
It extends image classification with localization. A single photograph may contain multiple objects, so a technique is needed to identify each object in the scene and mark where it is, usually with a bounding box. A commonly used dataset for object detection is PASCAL Visual Object Classes, or PASCAL VOC.
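Detectors are usually judged by how well their predicted boxes overlap the hand-labeled ones. A standard measure is intersection over union (IoU), sketched below. The box format `(x_min, y_min, x_max, y_max)` and the example coordinates are assumptions made for this illustration; a threshold of 0.5 is a common choice for counting a detection as correct.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes (x_min, y_min, x_max, y_max)."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Width and height are clamped at 0 for boxes that do not overlap.
    intersection = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

# Hypothetical predicted and hand-labeled boxes for the same object.
predicted_box = (10, 10, 50, 50)
labeled_box = (10, 10, 50, 40)

overlap = iou(predicted_box, labeled_box)
is_correct = overlap >= 0.5  # common threshold, assumed here
print(overlap, is_correct)
```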
Object Tracking
This technique is used to track one or multiple moving objects in a video. It has two categories: generative and discriminative. The generative approach describes the visible features of the object and minimizes the reconstruction error while searching for it. The discriminative approach is more precise and learns to differentiate between the object and its background. It is also referred to as tracking by detection.
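One simple way to picture tracking by detection is this: detect objects in every frame, then link each detection to the nearest object from the previous frame so that it keeps the same ID. The sketch below assumes the detections are already available as boxes and uses a greedy nearest-centroid rule with a made-up distance limit. Real trackers use far more robust matching.

```python
def centroid(box):
    """Center point of a box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)

def distance(p, q):
    """Euclidean distance between two points."""
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def update_tracks(tracks, detections, max_distance=30.0):
    """Greedily match each existing track to its nearest unused detection.

    tracks: dict of track_id -> last known box
    detections: list of boxes found in the new frame
    Unmatched detections start new tracks; unmatched tracks are dropped.
    """
    updated = {}
    unused = list(detections)
    for track_id, box in tracks.items():
        if not unused:
            break
        nearest = min(unused,
                      key=lambda d: distance(centroid(box), centroid(d)))
        if distance(centroid(box), centroid(nearest)) <= max_distance:
            updated[track_id] = nearest
            unused.remove(nearest)
    next_id = max(list(tracks) + list(updated), default=-1) + 1
    for box in unused:
        updated[next_id] = box
        next_id += 1
    return updated

# Hypothetical detections for three consecutive frames.
frame_1 = [(0, 0, 10, 10), (100, 100, 110, 110)]
frame_2 = [(102, 101, 112, 111), (3, 1, 13, 11)]  # same objects, new order
frame_3 = [(6, 2, 16, 12), (104, 102, 114, 112), (200, 0, 210, 10)]

tracks = update_tracks({}, frame_1)
tracks = update_tracks(tracks, frame_2)
tracks = update_tracks(tracks, frame_3)
print(tracks)
```

Each object keeps its ID from frame to frame even though the detector reports the boxes in a different order, and the object that first appears in the third frame gets a new ID.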
Semantic Segmentation
It is also known as object segmentation. It splits the entire picture into groups of pixels that can be labeled and categorized, which in effect draws an outline around each identified region. In other words, semantic segmentation assigns a category to every pixel that forms the picture. For example, it not only detects an animal in the picture but also tells where the edges of the animal are.
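The per-pixel idea can be sketched as follows. The example assumes a model has already produced a score for each class at every pixel of a tiny 2x3 image. The scores and class names are made up, and the segmentation step simply picks the highest-scoring class per pixel.

```python
# Hypothetical class names; scores at each pixel follow this order.
CLASSES = ["background", "animal"]

# Made-up per-pixel class scores for a 2x3 image.
scores = [
    [[0.9, 0.1], [0.3, 0.7], [0.2, 0.8]],
    [[0.8, 0.2], [0.6, 0.4], [0.1, 0.9]],
]

def segment(score_map):
    """Assign each pixel the index of its highest-scoring class."""
    return [
        [max(range(len(pixel)), key=lambda k: pixel[k]) for pixel in row]
        for row in score_map
    ]

label_map = segment(scores)
for row in label_map:
    print([CLASSES[k] for k in row])
```

The result is a label map with the same height and width as the image, and the border between "animal" and "background" pixels marks the edges of the animal.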
Instance Segmentation
It separates the different instances of each class. For example, it labels three cats with three different colors. In simple classification, we feed an image to a computer so that it can describe what is in the picture. Separating instances is a much more complicated task, because visual information often contains multiple objects and different backgrounds. In instance segmentation, we need not only to categorize these objects but also to detect their boundaries, their differences in color and shape, and their relationship with each other.
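As a toy illustration of giving each instance its own label, the sketch below assumes that a binary "cat" mask is already available and that the cats do not touch each other. Under that assumption, each connected blob of pixels can be numbered separately with a flood fill. Real instance segmentation relies on learned models, because touching or overlapping objects cannot be separated this way.

```python
from collections import deque

def label_instances(mask):
    """Give every 4-connected blob of 1s in a binary mask its own id."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] == 1 and labels[r][c] == 0:
                # Found an unlabeled pixel: start a new instance here.
                next_id += 1
                labels[r][c] = next_id
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x),
                                   (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] == 1
                                and labels[ny][nx] == 0):
                            labels[ny][nx] = next_id
                            queue.append((ny, nx))
    return labels, next_id

# Hypothetical mask: 1 marks a "cat" pixel, 0 marks background.
cat_mask = [
    [1, 1, 0, 0, 0, 1],
    [1, 0, 0, 1, 0, 1],
    [0, 0, 1, 1, 0, 0],
]

labels, count = label_instances(cat_mask)
for row in labels:
    print(row)
print("instances found:", count)
```

Each distinct number in the output plays the role of one of the "three different colors" mentioned above.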
Real-Life Applications of Computer Vision
Computer vision is revolutionizing many industries, including retail, automotive, healthcare, and agriculture. Innovation and greater convenience are the hallmarks of this technology. In this section, we look at some real-life applications of computer vision.
Retail Industry
Computer vision is enhancing the customer experience in the retail industry by giving customers valuable information about products. For example, when a company launches a new product, customers may be skeptical about buying it. With a computer vision-based mobile application, however, customers can get essential information about the product, which can influence their buying decision. Besides this, it is helping retailers optimize their business operations by automating data collection and improving the payment and compliance processes. The technology can also reduce losses from theft by employing networked cameras that keep a close eye on the retail store and immediately detect suspicious activity. Besides improving the security of stores, computer vision is now widely used to improve sales and marketing operations.
Take the giant online retailer, Amazon, for example. One of Amazon's projects is the Amazon Go store, which is a prime example of using computer vision cameras to enhance the payment process. The store employs “Just Walk Out” technology so that customers do not have to wait in a long queue to pay for their items. Customers open the Android or iOS application before walking into the store. Cameras installed throughout the store monitor not only the items picked up from the shelves but also the person picking them. If a customer puts an item back on the shelf, the intelligent system removes that item from the customer’s virtual basket. As the name implies, the main idea of the store is that customers can simply leave once they are done shopping. The application sends them an online receipt, and they pay for the purchased items using their Amazon accounts. Although the store has no cashiers, employees work behind the scenes to monitor the algorithms and constantly train them.
Self-Driving Cars
According to the WHO, road accidents will become the seventh leading cause of death worldwide by the year 2030. A vast percentage of road accident deaths are caused by human error and negligence. Companies are incorporating computer vision technology in cars to make the whole driving experience much safer for people. For instance, Waymo, formerly known as the Google self-driving car project, is using sensor technology to build self-driving cars. These cars are expected to make the driving experience safer and lead to fewer accidents in the future. The trained software system and sensors in Waymo cars are capable of monitoring, across a full 360 degrees, the motion of pedestrians, motorcyclists, cyclists, and other vehicles. The software is trained so that it can follow traffic rules and regulations and identify obstacles, for instance, an object in the middle of the road. It also recognizes the signals made by people in other vehicles to anticipate their movement. These self-driving cars are trained using deep networks so that they can handle situations on the road as we do, such as giving way to ambulances, slowing down for pedestrians, and making space for cars that are parking.
Besides Waymo, Tesla has also launched three autopilot car models so far. Tesla vehicles are equipped with a camera system of eight cameras, also known as Tesla Vision, along with twelve ultrasonic sensors and radar. The cameras enable a 360-degree view around the car. The ultrasonic sensors help the car pinpoint soft and hard objects in its way, while the radar ensures visibility during heavy rain, fog, and dust.
Self-driving cars are intended to reduce road accidents. In 2018, however, a Tesla car was involved in a fatal accident while its autopilot mode was activated. Reports suggest that it was the fault of the driver, who did not put his hands on the wheel even after repeated warnings. The technology of self-driving cars using computer vision is still evolving, and the companies that are pioneers in this area are continuously upgrading their models. One such improvement in the Tesla model stops the car if the driver does not respond after three repeated warnings to put their hands on the wheel.
Automated Customer Service
The recent trend of installing smart home devices has overwhelmed the customer service departments of companies because of the volume of people calling for assistance. Often, many people have the same issue, and the issue is so small that customers could resolve it themselves. However, discovering the issue requires a deep understanding of these devices. Companies are now thinking of developing self-service customer support using computer vision technology. Although this is an emerging computer vision application, researchers are hoping to see positive results soon.
Smart cameras equipped with computer vision technology will detect the issue, and the connected mobile application will guide the customer to resolve it by providing clear and concise visual instructions. The mobile application will also monitor customers and alert them if they stray from the correct steps.
Automated Data Collection
Companies must gather customer data to offer promotions and examine buying decisions. They employ different techniques to collect this data, from customer feedback forms to indirect tracking. Collecting customer data manually is both time-consuming and inefficient. Computer vision can improve the efficiency of the whole process by employing facial recognition. The purchase patterns of customers help companies determine how popular a specific product is and which products are popular in which geographic locations. In this way, they can create more personalized products for customers based on their interests, values, and geographic location. Although the technology offers undeniable benefits, some organizations and people have raised concerns about the use of facial recognition systems to collect consumer data. They are of the view that it is unethical for companies to collect customers' data without their consent. Furthermore, the likelihood of data misuse has made this technology more controversial.
Manufacturing Industry
The manufacturing industry is a pioneer in adopting Artificial Intelligence to automate its operations. Gone are the days when humans worked long hours in factories to manufacture goods, assemble different parts, and maintain the machinery. Now, robots have taken charge of many operations in this industry, resulting in an unprecedented increase in efficiency and cost-effectiveness.
Computer vision is offering the following benefits to this industry:
- Predictive Maintenance: Cameras and sensors trained for computer vision identify defects in the machinery before humans can see them. Thus, the right action at the right time prevents the shutdown of the machinery and saves a lot of cost and time.
- Product Inspection: Many industries need to count their products before packing them. For example, a biscuit manufacturer should put the same number of biscuits in every packet as it states in its advertisements. Computer vision can automate the whole procedure of package inspection by taking pictures of the biscuits. These pictures are then passed to a specialized computer that analyzes them to ensure that no biscuit is broken and that each one has the right shape and width.
- Compliance: Governments often apply strict rules and regulations on certain industries for the welfare of people. The industries that manufacture such products should follow the regulations, from well-printed ingredients and package quality to manufacturing and expiry dates. Any discrepancy in these operations can lead to strict legal penalties on the companies. Manually inspecting every product and its package is impossible. Cameras can take pictures of the items and the packages to ensure that the company is complying with all the rules.
Healthcare Industry
In the healthcare industry, computer vision is delivering miraculous results by saving patients’ lives. Some of the applications of computer vision in the healthcare industry are discussed below:
- Accurate Diagnosis: The technology minimizes the probability of human error by offering an accurate diagnosis to the patient. Computer vision software can identify hidden underlying causes of a patient's symptoms that the doctor may overlook. Besides this, doctors sometimes suggest expensive tests and procedures that are not required. Due to the high accuracy of computer vision algorithms, these kinds of extra medical expenditures can be avoided. Experts are hopeful that in the future we might see a considerable improvement in the diagnostic capability of computer vision algorithms.
- Timely Diagnosis: Life-threatening diseases such as cancer should be diagnosed at an early stage. Computer vision algorithms are capable of detecting even the smallest anomalies; therefore, they can detect symptoms at an early stage, when the disease is still treatable.
- Optimize Medical Operations: Computer vision automates many healthcare processes, thus saving time for doctors. Consequently, doctors have more time for face-to-face interaction with patients, which is good for the doctor-patient relationship. By optimizing medical processes, computer vision can lessen the burden on the healthcare system because doctors will have time to examine more patients than usual.
- Better Health Advice: Computer vision can help physicians provide useful advice to patients by monitoring their health and fitness. For example, a camera can take pictures of a person to analyze their body fat. Based on the analysis of these images by a computer vision system, a physician can advise the person to reduce their body fat.
Many companies have developed intelligent medical devices and equipment using computer vision technology. For example, a company, Gauss Surgical, has built blood monitoring solutions using computer vision that monitor blood loss during medical procedures. These solutions capture the images of the patient’s blood loss during a medical procedure which are then processed by the algorithms to accurately estimate the loss. Another application developed by Amazon Web Services (AWS) employs a DeepLens camera to help the patients examine and manage a skin disease known as psoriasis. This application is known as DermLens and it is the prime example of using computer vision technology in healthcare applications.
Agricultural Industry
In the agricultural industry, computer vision systems are employed to categorize food products, identify defects or damages, and analyze them based on their shapes, colors, and sizes.
Computer vision is transforming the agricultural industry in the following ways:
- Detecting Plant Diseases: Computer vision algorithms can detect plant diseases at an early stage, thus saving cost, time, and resources. Sometimes the disease symptoms in plants are so subtle that even industry experts overlook them. If the right action is not taken from the beginning, these diseases turn into a huge problem at a later stage.
- Help in Phenotyping: Plant phenotyping means describing plants' physical features and characteristics. Since visual features are fundamental to phenotyping in plants, computer vision algorithms can do it with greater precision than humans.
- Grading Fruits and Vegetables: Traditionally, humans spend a lot of time and effort grading fruits and vegetables according to their quality. Computer vision algorithms can analyze images of fruits and vegetables and grade them, based on their training, with greater precision than humans. For example, if we feed two images to these algorithms, one of high-grade bananas and the other of defective bananas, the software can easily distinguish between the two and assign each a grade based on its experience.
Some industries are embracing computer vision technology at a faster pace than others. The reason behind slow adoption is perhaps the evolving nature of the technology. The role of humans in the industries using CV is not entirely wiped out, because we still need humans to develop the algorithms and detect mistakes made by them.
Q&A
Why do we need computer vision?
Smartphones are an important part of our lives. Every day, people upload and share tons of visual information. Gone are the days when the internet was mostly text-based. Today, with this mammoth amount of visual data, we need algorithms that can extract information from these pictures and analyze them. Moreover, useful applications of computer vision are spread across many industries such as retail, automotive, healthcare and agriculture.