How Google DeepMind is learning like a child: DeepMind uses videos to teach itself about the world

The latest project out of Google DeepMind teaches an AI to understand what’s happening in the world around it. To do so, the team has turned to a vast catalogue of video clips.

So far, DeepMind’s AI projects have all looked inwards, exploring how AIs can write, interpret their virtual environments, categorise images or even grasp the difficulties of movement. This time, however, the DeepMind team has taught an AI to look outwards and understand what’s going on in the real world it’s now a part of.

The project lets an AI teach itself to recognise a range of visual and audio concepts by watching short snippets of video. So far it has worked out what it means to mow a lawn or tickle someone, yet at no point in its training has it been taught the words to describe what it’s seeing or hearing. It’s coming to understand these actions entirely by itself.

The team behind this project is following a similar path to the one DeepMind took when it taught an AI to interpret its surroundings via the Symbol-Concept Association Network. Instead of using labels to tell the AI what each object it’s looking at is, this project has the AI teach itself, learning to recognise images and sounds by matching up what it can see with what it can hear.

This method of learning is much like the way humans learn to make sense of the world around them.

The system started out as two separate neural networks: one handled image recognition, the other audio. The image network was shown stills from videos, while the audio network was given one-second sound clips taken from the same point in the same video. The AI was trained on 60 million of these still-and-audio pairs, drawn from 400,000 videos.
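To make that setup concrete, here’s a minimal, hypothetical sketch of the two sub-networks described above, written in PyTorch. The layer sizes, input resolutions and embedding dimension are assumptions for illustration, not DeepMind’s actual design.

```python
# Hypothetical sketch of the two sub-networks described above (not DeepMind's
# actual architecture): one encoder for video stills, one for one-second audio
# clips, each mapping its input to an embedding vector of the same size.
import torch
import torch.nn as nn

class FrameEncoder(nn.Module):
    """Embeds a single RGB video frame (assumed 3 x 224 x 224)."""
    def __init__(self, embed_dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, embed_dim)

    def forward(self, frame):
        return self.fc(self.features(frame).flatten(1))

class AudioEncoder(nn.Module):
    """Embeds a one-second audio clip given as a spectrogram (assumed 1 x 128 x 100)."""
    def __init__(self, embed_dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, embed_dim)

    def forward(self, spectrogram):
        return self.fc(self.features(spectrogram).flatten(1))
```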

A third network then compared these images with the audio clips to learn which sounds corresponded to which video stills. From this, it learnt to recognise audio and visual concepts including crowds, tap dancing and running water, without ever being given a specific label for any of them. That doesn’t mean it suddenly knew the words to describe such an action; instead, it meant you could show it a new picture of someone clapping, for instance, and it would know that a clip of clapping sounds should pair with it.
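One plausible way to wire up that third, comparison network – again a hypothetical sketch rather than DeepMind’s published architecture – is to concatenate the two embeddings and train a binary “do these belong together?” classifier, using mismatched frame-and-audio pairs as negative examples.

```python
# Hypothetical sketch of the third, comparison network: it takes a frame
# embedding and an audio embedding and predicts whether they came from the
# same moment of the same video. Negatives are made by shuffling the audio
# within a batch. Layer sizes and training details are assumptions.
import torch
import torch.nn as nn

class CorrespondenceNet(nn.Module):
    def __init__(self, embed_dim=128):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(2 * embed_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),  # logit: "these two belong together"
        )

    def forward(self, frame_emb, audio_emb):
        return self.classifier(torch.cat([frame_emb, audio_emb], dim=1))

def training_step(frame_enc, audio_enc, corr_net, frames, spectrograms):
    """One self-supervised step: real pairs are positives, shuffled pairs negatives."""
    f = frame_enc(frames)                   # (batch, embed_dim)
    a = audio_enc(spectrograms)             # (batch, embed_dim)
    a_wrong = a[torch.randperm(a.size(0))]  # mismatched audio for negatives
    logits = torch.cat([corr_net(f, a), corr_net(f, a_wrong)])
    labels = torch.cat([torch.ones(f.size(0), 1), torch.zeros(f.size(0), 1)])
    return nn.functional.binary_cross_entropy_with_logits(logits, labels)
```

No label ever names the concept; the only training signal is whether the picture and the sound genuinely occurred together.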

This kind of unsupervised self-learning gives an AI the tools to operate in the real world, learning about what’s happening around it from what it sees and hears. That thought may worry some people but, for now, you can rest safe in the knowledge that everything going on at DeepMind HQ is happening well away from the internet, and away from anything that actually interacts with the real world.

It is, however, just one way that the robots could learn to rise up and enslave us all.

What Google DeepMind has already taught AI

Google DeepMind interprets its surroundings like a child

In a bid to speed up how DeepMind solves problems and deals with complex situations, Google has turned to the human mind for inspiration. By teaching DeepMind to use conceptual tools, just as a human brain would, the team hopes it can learn to tackle a far wider array of problems with ease.

Google’s DeepMind team summarises this with an example of how we create objects out of raw materials to build tools that solve a problem – such as building an abacus out of clay, reeds and wood to help count large numbers. AI minds, however, don’t think like this.

AIs retain knowledge, but traditionally they can’t make the mental leap of combining familiar concepts into something entirely new and different. Now, though, thanks to a new neural network component called the Symbol-Concept Association Network (SCAN), DeepMind’s AI can mimic human vision to understand visual concept hierarchies.

In its new, snappily named paper, “SCAN: Learning Abstract Hierarchical Compositional Visual Concepts”, the DeepMind team outline how they’ve managed to replicate human-like thought processes in an AI brain.

Essentially, DeepMind now makes sense of its visual world much like a human child does. Its range of vision is limited and objects are brought into its line of sight. It interprets an object such as an apple, hat or suitcase in terms of its physical properties – colour, shape, size – and even its position and lighting within the space.

DeepMind then combines this with lexical confirmation – descriptions of what it’s seeing. So, if it’s shown a red apple in front of a blue wall, researchers tell the AI that it’s seeing a “red apple. Blue wall.” This means the DeepMind AI doesn’t simply look at an apple and compare it to other apple images stored in an archive; it learns what an apple actually looks like.
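SCAN’s published training couples a concept encoder to the latent space of a vision model; the sketch below is a deliberately simplified, hypothetical version of that pairing step rather than SCAN’s actual objective. It assumes a small attribute vocabulary and a PyTorch encoder that maps a description such as “red apple. Blue wall.” into the same space as the image embeddings, so the two can be pulled together during training.

```python
# Simplified, hypothetical sketch of pairing a symbolic description such as
# "red apple. Blue wall." with the embedding of the image it describes.
# The vocabulary, encoder sizes and training target are illustrative
# assumptions, not SCAN's actual formulation.
import torch
import torch.nn as nn

VOCAB = ["red", "blue", "green", "apple", "hat", "suitcase", "wall", "floor"]

def description_to_multihot(description: str) -> torch.Tensor:
    """Turn a description like 'red apple. Blue wall.' into a 0/1 vector over VOCAB."""
    words = description.lower().replace(".", "").split()
    return torch.tensor([1.0 if w in words else 0.0 for w in VOCAB])

class SymbolEncoder(nn.Module):
    """Maps a multi-hot description vector into the same space as image embeddings."""
    def __init__(self, embed_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(len(VOCAB), 64), nn.ReLU(),
                                 nn.Linear(64, embed_dim))

    def forward(self, symbols):
        return self.net(symbols)

# Training would pull SymbolEncoder()(description_to_multihot("red apple. Blue wall."))
# towards the embedding of the matching image, so "red apple" becomes grounded in
# what red apples actually look like rather than in an archive of labelled photos.
```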

SCAN knows what each component is – the colours of the wall and the floor, and the base object, such as a suitcase – and it understands how to differentiate them from one another. Therefore, when asked to produce a nonsense object known as a “woog”, SCAN creates what it thinks a woog should look like from the information it has already learnt. It’s a green object, on a pink floor, in front of a yellow wall, apparently.
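As a toy illustration of that compositional trick (not SCAN’s actual mechanism), a brand-new concept like a “woog” can be defined purely in terms of attribute symbols the model has already grounded, so it can be imagined without a single example. The attribute names below are made up for this example.

```python
# Toy illustration (not SCAN itself) of composing a new concept from known,
# already-grounded attribute symbols: a "woog" is defined entirely by
# attributes the model has seen before, so it can be imagined without ever
# having seen one.
known_attributes = {
    "green": {"object_colour": "green"},
    "pink floor": {"floor_colour": "pink"},
    "yellow wall": {"wall_colour": "yellow"},
}

def compose(*symbols):
    """Merge the groundings of several known symbols into one new concept."""
    concept = {}
    for symbol in symbols:
        concept.update(known_attributes[symbol])
    return concept

woog = compose("green", "pink floor", "yellow wall")
print(woog)  # {'object_colour': 'green', 'floor_colour': 'pink', 'wall_colour': 'yellow'}
```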

Google DeepMind has learnt how to walk

In a move that will, almost certainly, get AI naysayers feeling nervous, DeepMind has managed to teach itself how to walk. This doesn’t mean the supercomputer is standing up and running around the DeepMind office, but it does mean that the AI understands how walking works and the art of self-balance and motor control.

You may think this isn’t anything all that tricky compared to the likes of Boston Dynamics’ various walking robots, but what DeepMind is up to goes far beyond that.

Instead of simply telling a robot how to walk, DeepMind’s AI is learning to control digital limbs for itself. It’s learning how to walk, and how to understand its own momentum and physical space, so it can overcome tasks in complex environments. It’s the same versatility that lets humans rock climb and run hurdles but also walk down the street normally – we’re not made for a single purpose.

Traditionally, teaching a robot to walk has required motion-capture data to be fed into the system. Not only does this make it hard for an AI to adapt to new situations, it’s also time-consuming. DeepMind instead managed to train an AI to walk forward without falling over, and to make its way across diverse digital landscapes that demanded running, jumping, turning and crouching.
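The key shift is in the training signal. Rather than imitating motion-capture data, the agent is rewarded simply for making forward progress without falling over, and reinforcement learning works out the gait on its own. The sketch below shows the flavour of such a reward function; the terms and threshold are illustrative assumptions, not the exact function DeepMind used.

```python
# Minimal sketch of a "walk forward without falling" reward for a simulated
# walker. The specific terms and threshold are illustrative assumptions.
def locomotion_reward(forward_velocity: float,
                      torso_height: float,
                      fallen: bool,
                      min_height: float = 0.9) -> float:
    """Reward forward progress; a fall (or a collapsed torso) earns nothing."""
    if fallen or torso_height < min_height:
        return 0.0           # falling earns nothing, so staying upright pays
    return forward_velocity  # the faster the agent moves forward, the better

# In training, this reward is summed over each episode in a physics simulator,
# and a policy-gradient method adjusts the walker's joint torques to maximise
# it across many procedurally varied terrains.
```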

In another experiment, the DeepMind team also discovered that the AI had taught itself a way to naturally transition between two distinct walking styles without any human input.

SCAN and the learned-motion research were entirely separate projects at DeepMind, but both point towards a new era of AI development. Instead of simply being fed a load of information to parse, the machine is learning about the world around it in much the same way the human mind does.

Understandably, that’s a rather frightening thought and one of the reasons that Elon Musk wants more regulation around AI development. Still, there’s nothing creepier than watching an AI pretend to walk like a human…
