One of the areas of interest for the Magenta project is to empower individual expression. But how do you personalize a machine learning model and make it your own?
Training your own model like Music Transformer, MusicVAE or SketchRNN from scratch requires lots of data (millions of data points), lots of compute power (on specialized hardware like GPUs/TPUs), and hyperparameter sorcery. What if you only have a laptop with a couple minutes of training data?
Without a lot of data, a model just memorizes the training data and doesn’t generalize to produce varied outputs – it would be like trying to learn all of music theory from a single song. Fine-tuning on a smaller dataset is a popular approach, but this still requires a lot of computation to modify the full network. However, since models like MusicVAE and SketchRNN learn a latent space, we can overcome this by training a separate “personalized” model to only generate from the parts of latent space we want.
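To make the latent-space idea concrete, here is a minimal, hypothetical Python sketch. The `encode`/`decode` functions below are toy stand-ins, not MusicVAE's actual API, and a per-dimension Gaussian fit stands in for the small personalized model: encode a handful of user examples into latent vectors, fit the Gaussian over them, then sample new latents from that "personalized" region for the big pretrained model to decode.

```python
import random

# Toy stand-ins for a pretrained model's encoder/decoder;
# in practice these would be e.g. MusicVAE's encode/decode methods.
def encode(example):
    # Map an example to a 4-dimensional latent vector (illustrative only).
    return [float(sum(example)) / len(example) + i for i in range(4)]

def decode(z):
    return [round(v, 2) for v in z]

def fit_gaussian(latents):
    """Fit a per-dimension mean/std to a handful of user latents."""
    dims = len(latents[0])
    means = [sum(z[d] for z in latents) / len(latents) for d in range(dims)]
    stds = [
        (sum((z[d] - means[d]) ** 2 for z in latents) / len(latents)) ** 0.5 + 1e-3
        for d in range(dims)
    ]
    return means, stds

def sample_personalized(means, stds, rng):
    """Sample a latent near the user's examples and decode it."""
    z = [rng.gauss(m, s) for m, s in zip(means, stds)]
    return decode(z)

user_examples = [[60, 62, 64], [60, 64, 67], [62, 65, 69]]  # a few user inputs
latents = [encode(x) for x in user_examples]
means, stds = fit_gaussian(latents)
print(sample_personalized(means, stds, random.Random(0)))
```

The point of the sketch: the expensive pretrained model is frozen, and only the tiny sampler over its latent space is fit to the user's data, which is why training can run in seconds on a laptop.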
Here, we introduce this approach for quickly training a small personalized model to control a larger pretrained latent variable model, based on prior work by Engel et al. To show its application to creative interactions, we implement it in TensorFlow.js as a standalone application, so that model training happens in real time, in the browser, close to the end user. The model is also available in Magenta.js.
Read full post.
Last March we launched the Bach Doodle with the goal to make music composition more approachable. Users entered their own melodies, and used a machine learning model to harmonize them in the style of Bach chorales. We compiled the melodies and harmonizations users submitted into a new open source dataset.
When we put this dataset together, I was excited to find out what was in it, and most importantly, whether any of you entered the same melody. I spent some time creating a set of interactive visualizations to dig into this, and the results were super interesting! (Spoilers: more of you know the Pirates of the Caribbean theme than the Star Wars one, but neither holds a candle to Megalovania from Undertale.)
Read full post.
The quality of outputs produced by deep generative models for music has seen a dramatic improvement in the last few years. Most of these models, however, operate in “offline” mode: they can take as long as they’d like before they come up with a melody!
For those of us that perform live music, this is a deal-breaker, as anything making music with us on stage has to be on the beat and in harmony. In addition to this, the generative models available tend to be agnostic to the style of a performer which could make their integration into a live set fairly awkward.
In this post I describe how I have begun exploring what it takes to integrate out-of-the-box generative musical models into live performance. To do this, I make use of two “old” (by today’s standards) Magenta models: DrumsRNN and MelodyRNN.
You can read about the details in my ICCC paper, check out my open-source code for Python, and try it out yourself in my web app.
ICCC 2019 Paper
Read full post.
Editorial Note: At I/O 2019, we announced collaborations with two musical artists during our talk session: YACHT and The Flaming Lips. You can read about these collaborations on the Google Blog, but we are also excited about revealing more details of this work on our blog in the coming months!
In this post, Deeplocal, a creative studio we partnered with, discusses the development of “Fruit Genie”, an AI-assisted instrument debuted by The Flaming Lips on stage at I/O. They also provide instructions for you to build your own Fruit Genie at home!
We (Deeplocal) had the opportunity to work with Google’s Magenta team and The Flaming Lips to create an AI-assisted performance as part of the headline concert at I/O 2019. The result was Fruit Genie: a real-time intelligent musical instrument which combines Magenta’s Piano Genie model with a physical interface consisting of fruit (or whatever else you can dream up)!
Here is a video summary of the experience, but keep reading to get into some of the nitty-gritty details and behind-the-scenes process!
In this post we introduce the GrooVAE class of models (pronounced “groovay”) for generating and controlling expressive drum performances, along with a new dataset to train the model. We can use GrooVAE to add character to stiff electronic drum beats, to come up with drums that match your sense of groove on another instrument (or just tapping on a table), and to explore different ways of playing the same drum pattern.
Here is a GrooVAE generating drums to match the groove of a bassline with the Drumify plugin of Magenta Studio:
You can learn all about the project in our ICML paper, download the dataset and the code for Python and Magenta.js, and try out GrooVAE in the Colab notebook and the Groove and Drumify plugins in Magenta Studio:
Ableton Live Plugin
ICML 2019 Paper
Groove MIDI Dataset
Read full post.
Editorial Note: Matt is a Senior Scientist at Pandora and was one of the attendees at last year’s WiMIR workshop. In this post, he shares his experience and one of the demos that came out of the workshop.
This year, following the conference of the International Society for Music Information Retrieval, over 70 people attended the inaugural WiMIR Workshop. It was a fantastic event to wrap up a week of non-stop music, math and machine learning. It was great to see a lot of familiar and unfamiliar faces with a wide variety of experience, and we were particularly impressed with the organisation. The day involved a quality poster session containing 18 posters from women researchers in the field, discussion groups over lunch, and workshop groups with enough time to have an in-depth, productive and insightful conversation.
Benedikte Wallace, Gabriel Meseguer-Brocal, Jan Van Balen, Karin Dressler, Matt McCallum, and Amy Hung formed the group “Building Collaborations Among Artists, Coders and Machine Learning” led by Douglas Eck. The focus of this workshop group was to explore building bridges between machine learning researchers, developers and artists by building communities and discussing the technical hurdles that must be overcome in constructing bridges between these different areas of expertise. These technical hurdles might include UX design, open source development practices, and meaningful research directions.
Have you seen today’s Doodle? Join us to celebrate J.S. Bach’s 334th birthday with the first AI-powered Google Doodle. You can create your own melody, and the machine learning model will harmonize it in Bach’s style.
In this blog post, we introduce Coconet, the machine learning model behind the Doodle. We started working on this model 3 years ago, the summer when Magenta launched. At the time we were using machine learning (ML) only to generate melodies. It’s hard to write a good melody, let alone counterpoint, where multiple melodic lines need to sound good together. Like every music student, we turned to Bach for help! Using a dataset of 306 chorale harmonizations by Bach, we were able to train machine learning models to generate polyphonic music in the style of Bach.
Read full post.
In this post, we introduce GANSynth, a method for generating high-fidelity audio with Generative Adversarial Networks (GANs).
Colab Notebook
🎵 Audio Examples
📝 ICLR 2019 Paper
GitHub Code
Why generate audio with GANs?
GANs are a state-of-the-art method for generating high-quality images. However, researchers have struggled to apply them to more sequential data such as audio and music, where autoregressive (AR) models such as WaveNets and Transformers dominate by predicting a single sample at a time. While this aspect of AR models contributes to their success, it also means that sampling is painfully serial and slow, and techniques such as distillation or specialized kernels are required for real-time generation.
Rather than generate audio sequentially, GANSynth generates an entire sequence in parallel, synthesizing audio significantly faster than real-time on a modern GPU and ~50,000 times faster than a standard WaveNet. Unlike the WaveNet autoencoders from the original paper that used a time-distributed latent code, GANSynth generates the entire audio clip from a single latent vector, allowing for easier disentanglement of global features such as pitch and timbre. Using the NSynth dataset of musical instrument notes, we can independently control pitch and timbre. You can hear this in the samples below, where we first hold the timbre constant, and then interpolate the timbre over the course of the piece:
Read full post.
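Because each clip comes from a single latent vector, smooth timbre changes reduce to interpolating between latent vectors while holding the pitch conditioning fixed. Here is a minimal sketch of spherical interpolation, one common choice for traversing a Gaussian latent space; this is illustrative only and may differ from GANSynth's actual demo code.

```python
import math

def slerp(z0, z1, t):
    """Spherical interpolation between two latent vectors.
    t=0 returns z0, t=1 returns z1, values in between sweep the arc."""
    dot = sum(a * b for a, b in zip(z0, z1))
    n0 = math.sqrt(sum(a * a for a in z0))
    n1 = math.sqrt(sum(b * b for b in z1))
    omega = math.acos(max(-1.0, min(1.0, dot / (n0 * n1))))
    if omega < 1e-6:  # vectors nearly parallel: fall back to the endpoint
        return list(z0)
    s0 = math.sin((1 - t) * omega) / math.sin(omega)
    s1 = math.sin(t * omega) / math.sin(omega)
    return [s0 * a + s1 * b for a, b in zip(z0, z1)]

# Interpolating timbre latents while a separate pitch label stays fixed:
z_a, z_b = [1.0, 0.0], [0.0, 1.0]
midpoint = slerp(z_a, z_b, 0.5)
```

Feeding each interpolated latent (plus the fixed pitch conditioning) to the generator would yield the gradual timbre morphs heard in the samples.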
Magenta Studio is a collection of music creativity tools built on Magenta’s open source models, available both as standalone applications and as plugins for Ableton Live. They use cutting-edge machine learning techniques for music generation.
Each of the plugins lets you utilize Magenta.js models directly within Ableton Live. The plugins read and write MIDI from Ableton’s MIDI clips. Or if you don’t have Ableton, you can just use MIDI files from your desktop.
The first release includes 5 apps: Generate, Continue, Interpolate, Groove, and Drumify.
Generate uses MusicVAE to randomly generate 4 bar phrases from its model of melodies and drum patterns learned from hundreds of thousands of songs. Continue can extend a melody or drum pattern you give it, and Interpolate can combine features of your inputs to produce new ideas or create musical transitions between phrases. Groove is like many “humanize” plugins, but it adds human-like timing and velocity to drum parts based on learned models of performances by professional drummers. Drumify is similar to Groove, but it can turn any sequence into an accompanying drum performance.
Given how simple these plugins make it to interact with complex machine learning models, we plan on using this platform to release more Magenta.js tools and models in the future.
You can read more about what these 5 plugins do and try them out yourself at g.co/magenta/studio, but in this blog post we want to focus a bit more on how we created these tools and why we did it in the first place.
Read full post.
Editorial Note: One of the most rewarding experiences when putting out something for the world to use is to see someone build upon it. This is why we were very excited to see that a year after we open-sourced the code and model checkpoints for an arbitrary image stylization network architecture, Reiichiro Nakano had ported the model to TensorFlow.js. We reached out to Rei after noticing his demo online and he graciously accepted to contribute his code and model checkpoint to Magenta.js as the seed of our new image library @magenta/image. In this post, he shares his experience porting a deep learning model to TensorFlow.js, as well as optimizing it for a fast browser experience.
Shortly after deeplearn.js was released in 2017, I used it to port one of my favorite deep learning algorithms, neural style transfer, to the browser. One year later, deeplearn.js has evolved into TensorFlow.js, libraries for easy browser-based style transfer have been released, and my original demo no longer builds. So I started looking for a new project.
One of the main points of feedback I received from the community was that people wanted to provide their own style images to be used for stylization. Most style transfer models in the browser, including mine, are based on Johnson et al. (2016), which requires training a separate neural network for each style image. This means that in order to create pastiches of their own artwork, artists would have to train a separate model and port it to the browser, a process that requires a powerful GPU, several hours of training, and non-trivial technical know-how. A more desirable solution would be a model that can already perform fast style transfer on any pair of content and style images, ported to the browser.
Read full post.
Generating long pieces of music is a challenging problem, as music contains structure at multiple timescales, from millisecond timings to motifs to phrases to repetition of entire sections. We present Music Transformer, an attention-based neural network that can generate music with improved long-term coherence. Here are three piano performances generated by the model:
I’m a musician and a creative technologist with Google’s Pie Shop, an experience design studio tasked with translating the complex concepts behind emerging technologies at Google into tangible exhibits. For the last year or so I’ve been thinking about and designing tools that help musicians make use of Magenta’s musical models.
The project began as a browser-based tool, but this summer the Pie Shop team and I also turned it into an interactive installation in the form of a latent space of melodies that you can walk on.
As a musician – someone who spent a lot of time studying and attempting to master music theory – I was initially very skeptical about applying machine learning to music. However, as a technologist and composer who uses computers as part of my music making, I saw pretty quickly how artistically interesting the idea of a musical palette could be.
Read full post.
MAESTRO (MIDI and Audio Edited for Synchronous TRacks and Organization) is a dataset composed of over 172 hours of virtuosic piano performances captured with fine alignment (~3 ms) between note labels and audio waveforms. This new dataset enables us to train a suite of models capable of transcribing, composing, and synthesizing audio waveforms with coherent musical structure on timescales spanning six orders of magnitude (~0.1 ms to ~100 s), a process we call Wave2Midi2Wave.
Here’s an excerpt of music composed by a Music Transformer model by Huang et al. trained on MIDI data transcribed from the piano audio in the dataset and then synthesized using a WaveNet model also trained using MAESTRO.
We are making MAESTRO available under a Creative Commons Attribution Non-Commercial Share-Alike license. More information and download links are on the MAESTRO dataset webpage.
Full details about the dataset and our Wave2Midi2Wave process are available in our paper: Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset.
Read full post.
We introduce Piano Genie, an intelligent controller that maps 8-button input to a full 88-key piano in real time:
Piano Genie is in some ways reminiscent of video games such as Rock Band and Guitar Hero that are accessible to novice musicians, with the crucial difference that users can freely improvise on Piano Genie rather than re-enacting songs from a fixed repertoire. You can try it out yourself via our interactive web demo!
Read full post.
Inspired by Steve Reich’s Music for 18 Musicians, I used machine learning to create a visual to go along with it:
It uses videos recorded from train windows, with landscapes that move from right to left, to train a machine learning (ML) algorithm. First, the algorithm learns to predict the next frame of the videos by analyzing examples. Then it produces a frame from an initial picture, another frame from the one just generated, and so on: the output becomes the input of the next step. So, except for the initial image that I chose, all the other frames were generated by the algorithm. In other words, the process is a feedback loop made of an artificial neural network.
Read full post.
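The feedback loop is simple to sketch in code. Below, a toy Python stand-in: `predict_next_frame` just rotates a list of pixel values to mimic a left-moving landscape, whereas the real predictor is a trained neural network.

```python
def predict_next_frame(frame):
    """Toy next-frame model: shift pixel values one step left,
    wrapping around (a stand-in for a trained neural network)."""
    return frame[1:] + frame[:1]

def generate_video(first_frame, n_frames):
    """Feedback loop: each generated frame becomes the next input."""
    frames = [first_frame]
    for _ in range(n_frames - 1):
        frames.append(predict_next_frame(frames[-1]))
    return frames

# Only the first frame is given; everything after is model output.
video = generate_video([0, 1, 2, 3], 4)
```

Because each output is fed straight back in, any artifact the model introduces compounds over time, which is part of what gives these generated videos their drifting, dreamlike quality.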
Many of the generative models in Magenta.js require music to be input as a symbolic representation like MIDI. But what if you only have audio?
Try out the demo app Piano Scribe shown below to see the library in action for yourself. If you don’t have recordings of a piano handy, you can try singing to it, and it will do its best!
Learn how to use the library in your own app in the documentation and share what you make using #madewithmagenta!
Read full post.
Previously, we introduced MusicVAE, a hierarchical variational autoencoder over musical sequences. In this post, we demonstrate the use of MusicVAE to model a particular type of sequence: individual measures of General MIDI music with optional underlying chords.
General MIDI is a symbolic music representation that uses a standard set of 128 instrument sounds; this restriction to predefined instruments like “Honky-Tonk Piano” and “SynthStrings 1” often results in a cheesy sound reminiscent of old video game music. We use General MIDI here as a basic representation to explore polyphonic music generation with multiple instruments, not because we expect it to make a comeback.
With that out of the way, here is a CodePen that demonstrates a few of the things you can do with such a model:
Read full post.
I’m one of those people who always loved music but never became a musician, and was left feeling vaguely wistful about what could have been. That is until a couple of years ago, when something connected and I found a way to make a lot more room for music in my life while not straying too far from the path I was already on professionally.
The key realization was that even though I was not a musician, I could take my existing skills and interests in software development and design and use them as a lens to point toward music. This illuminated the direction I’ve been heading in ever since: Exploring intersections between music, software, design, and AI - and having a blast doing it.
Read full post.
Here is a simple demo we made with it that plays an endless stream of MusicVAE samples:
Read full post.
When a painter creates a work of art, she first blends and explores color options on an artist’s palette before applying them to the canvas. This process is a creative act in its own right and has a profound effect on the final work.
Musicians and composers have mostly lacked a similar device for exploring and mixing musical ideas, but we are hoping to change that. Below we introduce MusicVAE, a machine learning model that lets us create palettes for blending and exploring musical scores.
As an example, listen to this gradual blending of two different melodies, A and B. Throughout the post, we’ll explain how this morph was achieved.
Part of the goal of Magenta is to close the loop between artistic creativity and machine learning. Earlier this year, we released NSynth (Neural Audio Synthesis), a new approach to audio synthesis using neural networks. To make the algorithm more accessible to musicians, we created playable interfaces such as the Sound Maker and the Ableton Live plugin. We’ve been delighted to see the creative uses of the algorithm, from industrial dubstep to scenic atmospherics.
As an experiment in making machine learning even more tactile, immediate, playable, and fun, we’ve collaborated with Creative Lab to create NSynth Super: an open source hardware version of the instrument. Accessibility and community are key to our mission, and this hardware release is no different. On GitHub, you’ll find instructions and a list of materials and tools you’ll need to make your own NSynth Super. We’re excited to hear the new sounds and music you create with it. Learn more at g.co/nsynthsuper.
Read full post.
Update (10/30/18): Read about improvements and a new dataset in The MAESTRO Dataset and Wave2Midi2Wave!
Onsets and Frames is our new model for automatic polyphonic piano music transcription. Using this model, we can convert raw recordings of solo piano performances into MIDI.
For example, have you ever made a recording of yourself improvising at the piano and later wanted to know exactly what you played? This model can automatically transcribe that piano recording into a MIDI pianoroll that could be used to play the same music on a synthesizer or as a starting point for sheet music. Automatic transcription opens up many new possibilities for analyzing music that isn’t readily available in notated form and for creating much larger training datasets for generative models.
We’re able to achieve a new state of the art by using CNNs and LSTMs to predict pitch onset events and then using those predictions to condition framewise pitch predictions.
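To illustrate the conditioning idea, here is a toy decoding sketch in the spirit of the model, much simplified (the real model works with learned probabilities per pitch and more careful thresholding): a note may begin only where an onset is predicted, and is sustained while the framewise prediction stays active.

```python
def decode_notes(onset_probs, frame_probs, threshold=0.5):
    """Toy sketch of onset-conditioned decoding for a single pitch:
    a note starts only at a predicted onset, and is sustained while
    the framewise activity prediction stays above threshold."""
    notes = []   # (start_frame, end_frame) pairs
    start = None
    for t, (onset, frame) in enumerate(zip(onset_probs, frame_probs)):
        if start is None:
            if onset > threshold and frame > threshold:
                start = t          # an onset opens a new note
        elif frame <= threshold:
            notes.append((start, t))  # framewise activity ended
            start = None
    if start is not None:
        notes.append((start, len(onset_probs)))
    return notes

# One pitch over 6 frames: onset spike at frame 1, activity through frame 3.
onsets = [0.1, 0.9, 0.2, 0.1, 0.0, 0.0]
frames = [0.2, 0.8, 0.9, 0.7, 0.1, 0.0]
print(decode_notes(onsets, frames))  # → [(1, 4)]
```

Note how frames 2 and 3 are active without an onset: they extend the existing note rather than starting spurious new ones, which is the benefit of conditioning frame decoding on onset events.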
Editorial Note: We’re excited to feature a guest blog post by another member of our extended community, Hanoi Hantrakul, whose team recently won the Outside Lands Hackathon by building an interactive application based on NSynth.
Read full post.
We present Performance RNN, an LSTM-based recurrent neural network designed to model polyphonic music with expressive timing and dynamics. Here’s an example generated by the model:
Update (01/03/19): Try out the new magic-sketchpad game!
Try the sketch-rnn demo.
For mobile users on a cellular data connection: The size of this first demo is around 5 MB of data. Every time you change the model in the demo, you will use another 5 MB of data.
We made an interactive web experiment that lets you draw together with a recurrent neural network model called sketch-rnn.
Read full post.
Editorial Note: One of the best parts of working on the Magenta project is getting to interact with the awesome community of artists and coders. Today, we’re very happy to have a guest blog post by one of those community members, Parag Mital, who has implemented a fast sampler for NSynth to make it easier for everyone to generate their own sounds with the model.
Read full post.
I review (with animations!) backprop and truncated backprop through time (TBPTT), and introduce a multi-scale adaptation of TBPTT to hierarchical recurrent neural networks that has logarithmic space complexity. I wished to use this to study long-term dependencies, but the implementation got too complicated and kind of collapsed under its own weight. Finally, I lay out some reasons why long-term dependencies are difficult to deal with, going above and beyond the well-studied sort of gradient vanishing that is due to system dynamics.
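To fix the vocabulary before the animations: plain TBPTT processes the sequence in windows of length k, backpropagating within each window and cutting gradient flow between windows. A schematic Python sketch follows; the `step` function is a toy stand-in, and the backward/detach calls (marked in a comment) belong to a real autograd framework.

```python
def run_tbptt(sequence, k, step, state):
    """Schematic TBPTT loop: `step` advances the RNN one timestep;
    after every window of k steps, a real implementation would call
    backward() on the window loss and detach the carried state."""
    window_losses = []
    for start in range(0, len(sequence), k):
        window = sequence[start:start + k]
        window_loss = 0.0
        for x in window:
            state, loss = step(state, x)
            window_loss += loss
        window_losses.append(window_loss)
        # here: window_loss.backward(); state = state.detach()
    return window_losses, state

# Toy step: state is a running sum, loss is the squared state.
def step(state, x):
    new_state = state + x
    return new_state, float(new_state ** 2)

losses, final = run_tbptt([1, -1, 2, -2, 3], k=2, step=step, state=0)
```

The truncation is what makes memory cost independent of sequence length, and it is exactly this window boundary that the multi-scale adaptation rearranges to keep gradient information flowing over longer horizons at logarithmic cost.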
Last summer at Magenta, I took on a somewhat ambitious project. Whereas most of Magenta was working on the symbolic level (scores, MIDI, pianorolls), I felt that this left out several important aspects of music, such as timbre and phrasing. Instead, I decided to work on a generative model of real audio.
Read full post.
Sketch-RNN, a generative model for vector drawings, is now available in Magenta. For an overview of the model, see the Google Research blog from April 2017, Teaching Machines to Draw (David Ha). For the technical machine learning details, see the arXiv paper A Neural Representation of Sketch Drawings (David Ha and Douglas Eck).
Vector drawings of flamingos from our Jupyter notebook.
In a previous post, we described the details of NSynth (Neural Audio Synthesis), a new approach to audio synthesis using neural networks. We hinted at further releases to enable you to make your own music with these technologies. Today, we’re excited to follow through on that promise by releasing a playable set of neural synthesizer instruments:
- An interactive AI Experiment made in collaboration with Google Creative Lab that lets you interpolate between pairs of instruments to create new sounds.
- A MaxForLive Device that integrates into both Max MSP and Ableton Live. It allows you to explore the space of NSynth sounds through an intuitive grid interface. [DOWNLOAD]
One of the goals of Magenta is to use machine learning to develop new avenues of human expression. And so today we are proud to announce NSynth (Neural Synthesizer), a novel approach to music synthesis designed to aid the creative process.
Unlike a traditional synthesizer which generates audio from hand-designed components like oscillators and wavetables, NSynth uses deep neural networks to generate sounds at the level of individual samples. Learning directly from data, NSynth provides artists with intuitive control over timbre and dynamics and the ability to explore new sounds that would be difficult or impossible to produce with a hand-tuned synthesizer.
The acoustic qualities of the learned instrument depend on both the model used and the available training data, so we are delighted to release improvements to both:
- A dataset of musical notes an order of magnitude larger than other publicly available corpora.
- A novel WaveNet-style autoencoder model that learns codes that meaningfully represent the space of instrument sounds.
A full description of the dataset and the algorithm can be found in our arXiv paper.
Read full post.
Magenta was first announced to the public nearly one year ago at Moogfest, a yearly music festival in Durham, NC that brings together artists, futurist thinkers, inventors, entrepreneurs, designers, engineers, scientists, and musicians to explore emerging sound technologies.
This year we will be returning to continue the conversation, share what we’ve built in the last year, and help you make music with Magenta.
Read full post.
Google Creative Lab just released A.I. Duet, an interactive experiment which lets you play a music duet with the computer. You no longer need code or special equipment to play along with a Magenta music generation model. Just point your browser at A.I. Duet and use your laptop keyboard or a MIDI keyboard to make some music. You can learn more by reading Alex Chen’s Google Blog post. A.I. Duet is a really fun way to interact with a Magenta music model. As A.I. Duet is open source, it can also grow into a powerful tool for machine learning research. I learned a lot by experimenting with the underlying code.
Read full post.
We are excited to announce our new RL Tuner algorithm, a method for enhancing the performance of an LSTM trained on data using Reinforcement Learning (RL). We create an RL reward function that teaches the model to follow certain rules, while still allowing it to retain information learned from data. We use RL Tuner to teach concepts of music theory to an LSTM trained to generate melodies. The two videos below show samples from the original LSTM model, and the same model enhanced using RL Tuner.
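The combination of data likelihood and rules can be sketched as a single reward function. This is an illustrative simplification, not the paper's exact formulation: the log-probability under the data-trained LSTM keeps the model close to what it learned, while a music-theory term rewards rule-following, with a weighting constant balancing the two.

```python
import math

def rl_tuner_reward(note_rnn_prob, theory_reward, c=0.5):
    """Sketch of a combined RL Tuner-style reward: log-probability of the
    chosen note under the data-trained model, plus a weighted music-theory
    term. The weighting (c) here is illustrative, not the paper's value."""
    return math.log(note_rnn_prob) + c * theory_reward

# A note the trained model finds likely, which also resolves to the tonic:
r = rl_tuner_reward(note_rnn_prob=0.4, theory_reward=1.0, c=0.5)
```

With c too large the model abandons its learned style for mechanical rule-following; too small and it ignores the rules, so the balance is the interesting design knob.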
Read full post.
Vincent Dumoulin, Jonathon Shlens, and Manjunath Kudlur have extended image style transfer by creating a single network which performs more than one stylization of an image. The paper has also been summarized in a Google Research Blog post. The source code and trained models behind the paper are being released here.
The model creates a succinct description of a style. These descriptions can be combined to create new mixtures of styles. Below is a picture of Picabo stylized with a mixture of 3 different styles. Adjust the sliders below the image to create more styles.
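Mixing amounts to taking a convex combination of the per-style descriptions. A minimal plain-Python sketch (in the real model, what gets blended are the learned normalization parameters that describe each style):

```python
def mix_styles(style_params, weights):
    """Blend style description vectors as a convex combination:
    normalize the weights, then take the weighted average per dimension."""
    total = sum(weights)
    norm = [w / total for w in weights]
    dims = len(style_params[0])
    return [
        sum(norm[s] * style_params[s][d] for s in range(len(style_params)))
        for d in range(dims)
    ]

# Three toy 2-d style descriptions, with the third weighted twice as heavily:
styles = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
blended = mix_styles(styles, [1, 1, 2])
```

Moving the sliders under the image corresponds to changing the weights, and the blended description drives a single stylization pass through the network.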
(or Learning Music Learned From Music)
A few days ago, DeepMind posted audio synthesis results that included .wav files generated from a training data set of hours of solo piano music. Each wave file (near the bottom of their post) is 10 seconds long, and sounds very much like piano music. I took a closer look at these samples.
Read full post.
The Magenta team is happy to announce our first step toward providing an easy-to-use interface between musicians and TensorFlow. This release makes it possible to connect a TensorFlow model to a MIDI controller and synthesizer in real time.
Don’t have your own MIDI keyboard? There are many free software components you can download and use with our interface. Find out more details on setting up your own TensorFlow-powered MIDI rig in the README.
Read full post.
One of the difficult problems in using machine learning to generate sequences, such as melodies, is creating long-term structure. Long-term structure comes very naturally to people, but it’s very hard for machines. Basic machine learning systems can generate a short melody that stays in key, but they have trouble generating a longer melody that follows a chord progression, or follows a multi-bar song structure of verses and choruses. Likewise, they can produce a screenplay with grammatically correct sentences, but not one with a compelling plot line. Without long-term structure, the content produced by recurrent neural networks (RNNs) often seems wandering and random.
But what if these RNN models could recognize and reproduce longer-term structure?
Read full post.
This past June, Magenta, in partnership with the Artists and Machine Intelligence group, hosted the Music, Art and Machine Intelligence (MAMI) Conference in San Francisco. MAMI brought together artists and researchers to share their work and explore new ideas in the burgeoning space intersecting art and machine learning.
Read full post.
Magenta’s primary goal is to push the envelope forward in research on music and art generation. Another goal of ours is to teach others about that research. This includes disseminating important works in the field in one place, a resource that, if curated, will be valuable to the community for years to come.
Read full post.
We are excited to release our first tutorial model, a recurrent neural network that generates music. It serves as an end-to-end primer on how to build a recurrent network in TensorFlow. It also demonstrates a sampling of what’s to come in Magenta. In addition, we are releasing code that converts MIDI files to a format that TensorFlow can understand, making it easy to create training datasets from any collection of MIDI files.
Read full post.
We’re happy to announce Magenta, a project from the Google Brain team that asks: Can we use machine learning to create compelling art and music? If so, how? If not, why not? We’ll use TensorFlow, and we’ll release our models and tools in open source on our GitHub. We’ll also post demos, tutorial blog postings and technical papers. Soon we’ll begin accepting code contributions from the community at large. If you’d like to keep up on Magenta as it grows, you can follow us on our blog, watch our GitHub repo, and join our discussion group.
Read full post.