Unsupervised learning: how AIs are trained to tell eagles from airplanes

Big Data & AI Fair 2023, Paris – Generative AI, of course, dominates many company meetings, perhaps to the point of making people forget that AI research is advancing far from the buzz around ChatGPT. Unsupervised learning, for example, is a highly promising technique. This variant of machine learning, in which the data are not labeled (unlike in supervised learning), extracts classes or groups of objects with common characteristics without the help of a supervisor.

The ambition of this technique is to discover the structures underlying unlabeled data, and it is a way to test how far artificial intelligence can go in terms of performance. On this point, Armand Joulin, a researcher in artificial intelligence, presented work at the Big Data & AI Fair on training visual recognition AIs with an unsupervised learning technique, one already used by Meta for voice recognition.

“Image recognition by an AI began with supervised learning,” he recalls. “With this technique, we give an image to a machine, and we ask it to do a task with this image, such as recognizing a subject. To do this, we label the images and teach the machine to recognize the labels”.
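As an illustration of the labeled workflow described above, here is a minimal sketch of supervised classification. It is not the researcher's system: the nearest-centroid classifier, the function names `train` and `predict`, and the two-number "feature vectors" standing in for images are all assumptions chosen to keep the example small.

```python
# Minimal supervised-learning sketch: a nearest-centroid classifier.
# The feature vectors and labels are made-up toy data, not real images.

def train(examples):
    """Average the feature vectors of each label into one centroid per label."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: dist(centroids[label]))


# Every training image comes with a human-provided label.
labeled_images = [
    ([0.9, 0.1], "eagle"),
    ([0.8, 0.2], "eagle"),
    ([0.1, 0.9], "airplane"),
    ([0.2, 0.8], "airplane"),
]
centroids = train(labeled_images)
print(predict(centroids, [0.85, 0.15]))  # prints "eagle"
```

The point of the sketch is the dependency on labels: without the `"eagle"` and `"airplane"` strings supplied by a person, `train` has nothing to learn from.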

How to do object recognition on video?

“It is a very effective but time-consuming method. It also has limitations, because switching from image recognition to video, for example, requires retraining an AI.” In addition to cameras, modern smartphones increasingly incorporate other sensors, such as infrared. “Here too, doing infrared image recognition requires building a new AI,” says the researcher.

Above all, classification is just one use case of visual recognition. Copy detection to fight plagiarism, style transfer and even captioning are all tasks that can be asked of a machine, provided that we train it each time for that specific task, with a new neural network.

Armand Joulin is therefore trying to find a method that lets a machine learn functions that can be used everywhere, without having to train new neural networks. He sets out two prerequisites:

When training the machine, it must be “educated” on different media (video, photo, selfie, radio,…).
The training needs a simple task, for example writing a text from an image.

“But the problem,” he says, “is that we often describe two slightly different images in the same way.” Hence the idea of asking the machine not to describe what an image shows, but to describe the differences between two visuals. To do this, the researcher has the AI compare many images and thus teaches it to note the differences.

Recognizing a cat photo is good, but recognizing the differences between two cat photos to describe them is much better.
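The idea of learning by comparing images can be sketched with a contrastive objective. The article does not name the training objective actually used, so the InfoNCE-style loss below is an assumption, picked because it is a common way to express "tell this image apart from the others". The names `cosine` and `contrastive_loss`, the `temperature` value and the hand-written embedding vectors are hypothetical; in practice a neural network would produce the embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: small when the anchor is closer to the positive
    than to every negative, large otherwise. No labels are involved."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    # Negative log-probability of picking the positive among all candidates.
    return log_sum - logits[0]


# Toy embeddings (assumed): two views of the same photo, plus two other photos.
view_a = [1.0, 0.1]
view_b = [0.9, 0.2]
others = [[0.1, 1.0], [-0.5, 0.8]]

good = contrastive_loss(view_a, view_b, others)                  # correct pairing
bad = contrastive_loss(view_a, others[0], [view_b, others[1]])   # wrong pairing
print(good < bad)  # prints True
```

Training would adjust the network so that the loss for correct pairings goes down, which forces the embeddings to capture what distinguishes one image from another.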

This technique, which works “without supervision”, is more effective than supervised learning, he says.

With this discrimination technique, the AI can classify images, without labels, by what differs or is similar between them. This is an unsupervised form of classification.
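Grouping images without labels can be illustrated with a small clustering sketch. This is a stand-in, not the method presented at the fair: k-means is used here only because it is a simple, well-known way to group vectors by similarity, and the function name `kmeans`, the deterministic initialization and the toy feature vectors are assumptions.

```python
def kmeans(points, k, iterations=10):
    """Group points into k clusters by similarity alone; no labels are used."""
    # Deterministic start for this sketch: the first k points act as centroids.
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            assignments[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments


# Toy feature vectors (assumed); nothing says which ones are birds or planes.
features = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8], [0.85, 0.1]]
groups = kmeans(features, k=2)
print(groups)  # prints [0, 1, 0, 1, 0]
```

The output is a set of anonymous group numbers: the machine finds that some images belong together, but a name such as "eagle" would still have to come from elsewhere.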

“With this technology, the machine identifies the most discriminating part of what it is given, whether image or video,” he explains.

The AI can recognize differences, but also similarities, across very different media. This is one of the abilities of an AI trained with unsupervised learning.

“We have passed the milestone of what AIs cannot do”

It is also possible to ask it what planes and birds have in common across different media, such as photos and videos, but also 3D objects.

“With this system, it is possible to have object recognition on video: the AI identifies the differences and similarities between airplanes and eagles,” he says.

He concludes: “We have now surpassed the milestone of what AI cannot do in terms of visual recognition.”