DeepMind’s AI Learns The Piano From The Masters of The Past


Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. Today, we will listen to a new AI from DeepMind that is capable of creating beautiful piano music. Since there are many other algorithms that do this, to put things into perspective, let's talk about the two key differentiating factors
that set this method apart from previously existing techniques. One, music is typically learned from high-level representations, such as the score or MIDI data. This is a precise representation of what needs to be played, but it doesn't tell us how to play it. These small nuances are what make the music come alive, and this is exactly what is missing from most synthesis techniques. This new method is able to learn these nuances, and it generates not MIDI signals but raw audio waveforms.
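To make the contrast between the two representations concrete, here is a small, purely illustrative Python sketch. None of it comes from the paper: the note-event dictionary, the 16 kHz sample rate, and the synthetic decaying sine tone are all assumptions chosen only to show how little a symbolic note event says compared to the raw samples a model like this one has to predict.

```python
import numpy as np

# Symbolic representation: one event says *what* to play, but nothing about
# the touch, timing, or dynamics that make a performance come alive.
note_event = {"pitch": 60, "velocity": 80, "start_s": 0.0, "duration_s": 0.5}  # middle C

# Raw audio representation: thousands of samples per second that also
# encode *how* the note is played.
sample_rate = 16_000
t = np.arange(int(sample_rate * note_event["duration_s"])) / sample_rate
waveform = 0.3 * np.sin(2 * np.pi * 261.63 * t) * np.exp(-3.0 * t)  # decaying middle-C tone

print(f"1 MIDI-style event vs. {len(waveform)} raw audio samples for the same note")
```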
And two, it is better at retaining stylistic consistency. Most previous techniques create music that is consistent on a shorter time scale, but they do not take into consideration what was played 30 seconds ago, and therefore the results lack the high-level structure that is the hallmark of quality songwriting. However, this new method shows stylistic consistency over longer time periods.

Let's give it a quick listen and talk about the architecture of this learning algorithm after that. While we listen, I'll show you the composers it has learned from to produce this. I have never heard any AI-generated music before with such articulation, and the harmonies are also absolutely amazing. Truly stunning results.
It uses an architecture that goes by the name autoregressive discrete autoencoder. This contains an encoder module that takes a raw audio waveform and compresses it down into an internal representation, and a decoder module that is responsible for reconstructing the raw audio from this internal representation. Both of them are neural networks. The autoregressive part means that the algorithm looks at previous time steps in the learned audio signal when producing new samples, and it is implemented in the decoder module. Essentially, this is what gives the algorithm longer-term memory to remember what it played earlier.
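Based on that description, here is a minimal, purely illustrative PyTorch sketch of an autoregressive discrete autoencoder. This is not DeepMind's implementation: the layer sizes, the vector-quantization bottleneck with a straight-through estimator, the single causal-convolution decoder layer, and the toy reconstruction step are all assumptions chosen just to keep the data flow visible.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Downsamples a raw waveform into a shorter sequence of latent vectors."""
    def __init__(self, hidden=64, latent=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, hidden, kernel_size=4, stride=4), nn.ReLU(),
            nn.Conv1d(hidden, latent, kernel_size=4, stride=4),
        )

    def forward(self, wav):                      # wav: (batch, 1, samples)
        return self.net(wav)                     # (batch, latent, samples / 16)

class Quantizer(nn.Module):
    """Snaps each latent vector to its nearest codebook entry, making the
    internal representation discrete (straight-through gradient estimator)."""
    def __init__(self, num_codes=256, latent=32):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, latent)

    def forward(self, z):                        # z: (batch, latent, time)
        flat = z.permute(0, 2, 1)                                    # (batch, time, latent)
        dist = torch.cdist(flat, self.codebook.weight.unsqueeze(0))  # distance to every code
        idx = dist.argmin(dim=-1)                                    # one discrete code per step
        quantized = self.codebook(idx).permute(0, 2, 1)
        quantized = z + (quantized - z).detach()                     # straight-through trick
        return quantized, idx

class CausalDecoder(nn.Module):
    """Autoregressive decoder: a left-padded (causal) convolution makes every
    output sample depend only on *past* audio samples plus the discrete codes."""
    def __init__(self, latent=32, hidden=64):
        super().__init__()
        self.upsample = nn.Upsample(scale_factor=16)   # stretch codes back to audio rate
        self.causal = nn.Conv1d(latent + 1, hidden, kernel_size=3)
        self.out = nn.Conv1d(hidden, 1, kernel_size=1)

    def forward(self, codes, past_wav):
        cond = self.upsample(codes)
        x = torch.cat([cond, past_wav], dim=1)
        x = nn.functional.pad(x, (2, 0))               # pad on the left only => causality
        return self.out(torch.relu(self.causal(x)))

# One teacher-forced reconstruction step on random "audio", just to show the data flow.
wav = torch.randn(1, 1, 1600)                          # 0.1 s of fake audio at 16 kHz
past = nn.functional.pad(wav, (1, 0))[..., :-1]        # waveform shifted back by one sample
encoder, quantizer, decoder = Encoder(), Quantizer(), CausalDecoder()
codes, indices = quantizer(encoder(wav))
reconstruction = decoder(codes, past)
loss = torch.mean((reconstruction - wav) ** 2)         # reconstruction objective
print(reconstruction.shape, loss.item())
```

A full-scale system would use a far deeper stack of dilated causal convolutions for the decoder rather than this single layer, but the division of labor sketched here, encoder, discrete bottleneck, and autoregressive decoder, is the idea described above.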
Since you have seen the dataset the algorithm learned from while the music was playing, I am also really curious how we can exert artistic control over the output by changing the dataset. Essentially, you can likely change what the student learns by changing the textbooks used to teach them. For now, let's marvel at one more sound sample. This is already incredible, and I can only imagine what we will be able to do not ten years from now, but just a year from now. Thanks for watching and for your generous support, and I'll see you next time!