Gato, DeepMind and the race towards general artificial intelligence

Gato is a new multimodal AI system from DeepMind that can perform hundreds of different tasks, all with the same neural network.

Some think that the path to human-level artificial intelligence is now mapped out and that it is just a question of increasing computational resources. Others are more cautious, arguing that many requirements are still missing. Either way, its extraordinary ability to handle very different tasks sets Gato apart from other AI systems: it is not yet the general artificial intelligence that everyone is waiting for, but it is innovative in the way it processes very different kinds of data with a single deep learning architecture.

Narrow AI and general AI

Until now, one of the main distinctions in the complex world of artificial intelligence has been the one between weak AI, also called "narrow" AI, and strong AI, also called "general" AI. It was a fairly simple way to settle the question of thinking machines right away. Narrow AI is the kind of artificial intelligence that performs only one task, such as planning a route, providing relevant search results, or holding a written conversation. General AI, on the other hand, is the kind of artificial intelligence we see in movies, which thinks like a human being and performs many tasks simultaneously, creating useful synergies between them. The acronym for these human-like machines is AGI, Artificial General Intelligence. For most researchers it is a chimera: theoretically possible, but not something we will reach any time soon.

Today, however, this distinction is starting to creak and is becoming harder and harder to explain. In recent years research has pushed towards increasingly generalist artificial intelligence models, without actually arriving at AGI. The result is a kind of middle ground, populated by AI models that can perform so many tasks of different kinds that they can no longer be described as "narrow" AI, but which do not show the causal intelligence or awareness that many experts consider inherent in an AGI.

Multimodal AI

We can call this type of artificial intelligence "generalist" or, perhaps more correctly, "multimodal", as there are several ways to interact with it. To give an example, a multimodal AI system would be able to find the weather forecast for our area (search and select the best result), tell us that it will rain today (natural language processing and speech synthesis) and check whether we are going out with or without an umbrella (machine vision). Furthermore, one of the main characteristics of a multimodal system is that it "ingests" data of different types, for example images and text, and knows how to draw useful information from both. As a result it will seem to us that we are dealing with a real intelligence, when in reality there is only a battery of AI models working in synergy with each other.

The DeepMind Zoo

As for research towards multimodal AI, in recent weeks the London company DeepMind, which, remember, is part of the Google galaxy, has released two AI systems that have generated a lot of discussion. The first is called Flamingo, and is a model capable of solving "multimodal tasks", that is, tasks whose input may arrive through different modalities, such as images, video and text, even in combination with each other. Flamingo is a visual language model (VLM) that can handle classification, captioning and image-based question answering, given only a few input/output examples (so-called "few-shot learning").

The purpose of the model is to "understand" what is happening in an image or video, describing it correctly in words and correctly answering questions about what it "sees".

Connectionism and intelligence?

Gato isn't always the best AI model for a given task. Its control of a Sawyer robot (a robot consisting of an arm with many "joints") is of a good standard, but its captioning is only mediocre, and its performance on some Atari games falls short of that of other, dedicated AI models. DeepMind states that on 450 tasks (out of the 604 it was trained on) Gato is more accurate than human experts "more than half the time". This is a somewhat convoluted way of saying that, out of a total of 604 tasks, at least 154 return very poor results, while on the remaining 450 Gato does better than a human expert a good half of the time and worse the other half.
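For readers who want to check the arithmetic behind that reading, here is a minimal sketch in Python. It only restates the two figures quoted above (604 and 450). The variable names are illustrative, and the percentage is a simple derived share, not a figure published by DeepMind.

```python
# Figures quoted in the article; variable names are hypothetical.
TOTAL_TASKS = 604            # tasks Gato was trained on
TASKS_MEETING_CLAIM = 450    # tasks covered by DeepMind's statement

# Tasks left outside the claim.
remaining_tasks = TOTAL_TASKS - TASKS_MEETING_CLAIM

# Share of all tasks covered by the claim.
share_meeting_claim = TASKS_MEETING_CLAIM / TOTAL_TASKS

print(f"Tasks outside the claim: {remaining_tasks}")        # 154
print(f"Share covered by the claim: {share_meeting_claim:.1%}")  # 74.5%
```

In other words, roughly three quarters of the tasks fall under the claim, and about one quarter do not.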

The road to generalization

The achievements of recent weeks are the fruit of a commitment that DeepMind has pursued for many years. Let's not forget that the company's goal is to "solve the problem of intelligence", developing ever more general systems capable of tackling a wide range of different problems. That is what the company calls artificial general intelligence, and that is where it wants to go. Last year a step in this direction was taken with Perceiver, a multimodal model based on the Transformer architecture and capable of handling different types of input, such as images, text, video, sound and 3D data. The creators of Gato themselves think that Perceiver could be useful for further expanding the number of modalities of future general systems.

Article excerpted from a post by Luca Sambucci.