Build an AI Composer – Machine Learning for Hackers #2


I actually didn’t play anything. You just heard AI-generated music. Hello World, welcome to Sirajology! In this episode, we’re going to train a neural network to compose music all by itself. Machine-generated music! The technical term for this is ‘music language modeling’, and it has a long history of research behind it: Markov models and restricted Boltzmann machines, which kind of sounds like something out of Half-Life or BioShock. Hold on, babe. I’ve got to go save the world using my restricted Boltzmann machine. Music is how we communicate our emotions and passions, and it’s completely based on mathematical relationships. Octaves, chords, scales, keys: all of it is math. At the lowest level, music is a series of sound waves that create pockets of air pressure, and the pitch we hear depends on the frequency of changes in this air pressure. We’ve created notation to help us map these sounds into an instruction set. So if machine learning is all about feeding data into models to find patterns and make predictions, could we use it to generate music all by itself? Absofruitly. We’re going to build an app that learns how to compose British folk music by training on a dataset of British folk music. We’ll be using TensorFlow, the sickest machine learning library ever, to do this in just 10 lines of Python. We’ll be following the tried-and-true four-step machine learning methodology
to do this. Collect a dataset, build the model, train the model, and test the model. To start
off, we’ll want to collect our dataset. So let’s import the urllib module, which
will let us download a file from the web. Once we’ve imported it, we can call the urlretrieve method to do just that. We’ll pass it two parameters: the link to the dataset and the name we’ll give the downloaded file. We’re using the Nottingham dataset for this demo, which is a collection of 1,000 British folk songs in MIDI format. MIDI is perfect for us since it encodes all the note and time information exactly how it would be written in music notation. It comes in a zip file, so we’ll want to unzip it as well. We can do this programmatically using the zipfile module. We’ll extract the data from the zip and place it in the data directory.
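Here’s a minimal sketch of those two steps (assuming Python 3, where urlretrieve lives in urllib.request; the URL below is a placeholder for the real dataset link in the video description):

```python
import os
import urllib.request
import zipfile

# Placeholder URL -- substitute the actual dataset link
DATASET_URL = "http://example.com/Nottingham.zip"
ZIP_NAME = "dataset.zip"

# Step 1: download the zipped MIDI collection from the web
urllib.request.urlretrieve(DATASET_URL, ZIP_NAME)

# Step 2: extract everything into a local 'data' directory
os.makedirs("data", exist_ok=True)
with zipfile.ZipFile(ZIP_NAME) as archive:
    archive.extractall("data")
```

We’ve got our data; it’s time to create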
the model. But before we do that, we need to think about how we want to represent our input data. There are 88 possible pitches in a MIDI file, so we could use one vector representation per note. But let’s be more specific. At each time step in the music, there are two things happening: there’s the main tune, or melody, and then there are the supporting notes, or harmony. Let’s represent each as a vector. And to make things easier, we’ll make two assumptions. The first is that the melody is monophonic, meaning only one note is played at each time step. The second is that the harmony at each time step can be classified into a chord class. So that’s two different vectors, one for melody and one for harmony. We’ll then combine them into one vector for each time step.
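To make that concrete, here’s a toy sketch of the encoding (the chord vocabulary size of 24 is an assumption for illustration, not the dataset’s actual chord count):

```python
import numpy as np

NUM_PITCHES = 88   # possible melody pitches at each time step
NUM_CHORDS = 24    # hypothetical size of the chord-class vocabulary

def encode_time_step(melody_pitch, chord_class):
    """Combine a one-hot melody vector and a one-hot chord vector
    into a single input vector for one time step."""
    melody = np.zeros(NUM_PITCHES)
    melody[melody_pitch] = 1.0   # monophonic: exactly one pitch is on
    harmony = np.zeros(NUM_CHORDS)
    harmony[chord_class] = 1.0   # the harmony collapsed to one chord class
    return np.concatenate([melody, harmony])

# e.g. middle C (index 39 on an 88-key layout) over chord class 5
vector = encode_time_step(39, 5)
print(vector.shape)  # (112,)
```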
We can just import our ML helper class and then call the create_model method to do this. Music plays out over a period of time; it’s a sequence of notes. So we need to use a sequence learning model: it has to accept a sequence of notes as input and output a new sequence of notes. Plain old neural nets can’t do this. They accept fixed-size inputs, like an image or a number. We’ll need a special kind of neural network: a recurrent neural network. Yeah! Those can deal with sequences, since data doesn’t just flow one way; it loops. This allows the network to have a kind of short-term memory. Yeah, that’ll work. But wait. We want our network to remember not just the most recent music it’s heard, but all the music it’s heard. Like, a piece of music can have multiple themes in different parts of it (hopeful, melancholic, angry), and if the network only remembers the most recent part, which was cheery, it’s just going to compose cheery stuff. We need a special type of recurrent neural network called a long short-term memory network. Super specific, I know. This type of network has a short-term memory that is LONG: it can remember things from way back in the sequence of data, and it uses everything it remembers to generate new sequences. We can add this model to our code with just one line using our helper class.
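The helper class hides all of this, but for intuition, defining an LSTM sequence model in present-day TensorFlow could look roughly like this (a tf.keras sketch with assumed layer sizes, not the repo’s actual code):

```python
import tensorflow as tf

INPUT_DIM = 112     # melody one-hot + chord one-hot, per the toy encoding above
HIDDEN_UNITS = 128  # assumed LSTM size

# Sequence-in, sequence-out: at every time step, predict the
# note vector for the next time step.
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(HIDDEN_UNITS, return_sequences=True,
                         input_shape=(None, INPUT_DIM)),
    tf.keras.layers.Dense(INPUT_DIM, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```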
That call will also generate the sequences and the chord mapping and write them to a file in the data folder. This is just a serialized byte-stream representation of our music that we’re going to train our model with. Now that we have our model,
we can go ahead and train it. You might be thinking, wait, this is a little too easy; isn’t there more to it? Well, yeah: every machine learning model has a set of what are called ‘hyperparameters’. These are the parameters that we humans set for how our model operates, like knobs on a control panel. How many layers do we want? How many iterations for training? How many neurons? You could play around with these, turning all the knobs in different ways to perfect your end result, but chances are someone somewhere has already solved the problem you’re working on, and you can just use an existing model with pre-tuned hyperparameters to build something awesome.
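For a sense of what those knobs look like, here’s a hypothetical set of hyperparameters (illustrative names and values, not the pre-tuned settings the episode relies on):

```python
# Hypothetical hyperparameters -- illustrative values only
hyperparameters = {
    "num_layers": 2,         # how many stacked LSTM layers
    "hidden_units": 128,     # how many neurons per layer
    "num_epochs": 50,        # how many full passes over the training data
    "batch_size": 64,        # how many sequences per training step
    "learning_rate": 0.001,  # how big each weight update is
    "dropout": 0.5,          # regularization to avoid memorizing the dataset
}
```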
So now we’re ready to train our model. We can just call the train_model method of our recurrent neural net class to do this. This’ll get the network to start consuming the input data piece by piece. It took me about two hours to train it on my 2013 MacBook Pro. But you don’t have to wait until it’s completely done training to test it out. Just wait until you see the “Best loss so far encountered, saving model.” message. Once you see that, you can run ‘rnn_sample’ in the terminal with the --config_file flag, pointing it at the newly generated config file in your models folder. That will generate a new song using the newly trained model you’ve just
created. To generate music, we just sample the melody and harmony at each time step and plug them into our trained model. The model will then predict what the next notes will be. The collection of all the predicted notes is our newly generated song.
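Here’s a sketch of what that sampling loop could look like in code, reusing the toy encoding from earlier (the rnn_sample script does the real version of this; the sampling helper below is illustrative):

```python
import numpy as np

NUM_PITCHES = 88  # as in the toy encoding above
NUM_CHORDS = 24

def sample_next(probabilities):
    """Sample one melody pitch and one chord class from the model's
    output distribution, and re-encode them as one combined vector."""
    melody_probs = probabilities[:NUM_PITCHES] / probabilities[:NUM_PITCHES].sum()
    chord_probs = probabilities[NUM_PITCHES:] / probabilities[NUM_PITCHES:].sum()
    vector = np.zeros(NUM_PITCHES + NUM_CHORDS)
    vector[np.random.choice(NUM_PITCHES, p=melody_probs)] = 1.0
    vector[NUM_PITCHES + np.random.choice(NUM_CHORDS, p=chord_probs)] = 1.0
    return vector

def generate_song(model, seed, num_steps=200):
    """Feed the model its own predictions, one time step at a time."""
    song = list(seed)  # a short seed of combined note vectors
    for _ in range(num_steps):
        context = np.array(song)[np.newaxis, :, :]  # shape (1, time, features)
        probabilities = model.predict(context, verbose=0)[0, -1]
        song.append(sample_next(probabilities))
    return np.array(song)
```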
Let’s listen in to what I’ve generated. So it sounds nice; it could be better, but it gives off that British folk vibe. There are definitely some improvements that could be made. The time signature is kinda sporadic, and in terms of long-term structure, there seems to be a lack of repeated themes and phrases. The solution may well be more data and more computing power; it usually is when it comes to machine learning with deep neural nets. Machine learning can help us learn the fundamental nature of how music works in ways that we haven’t even thought about. I’ve got links below, check ’em out. And I’ve gotta go fix a runtime error, so thanks for watching!