Results
This section presents the results of the project team's efforts. Every objective we undertook achieved some degree of success, some more than others. In the end, we produced a functional system that can separate a song into its component stems, perform pitch detection on each stem individually, and transcribe the results into a MIDI file.
Hybrid Demucs
After the first progress report, we began researching machine learning algorithms that implement source separation (also referred to as "stem splitting"). We eventually came across a tutorial published by PyTorch that implements Hybrid Demucs in a much simpler fashion than anything else we had found.
Hybrid Demucs is a machine learning algorithm derived from a U-Net convolutional neural network, loosely based on the Wave-U-Net architecture designed for audio processing. The in-depth math behind the algorithm can be found at this GitHub page. For our project, we used a pretrained Hybrid Demucs model to speed up the early stages of the project, since source separation was only one of our goals. Following this tutorial by PyTorch, we were able to implement a working stem splitter in Python.
Overall, Hybrid Demucs greatly outperformed our expectations, producing separated music reliably and relatively quickly. Provided below is an example of a song undergoing source separation through the Hybrid Demucs model. The output files consist of vocals, bass, drums, and other, where "other" contains the remaining instruments in the song, generally piano or guitar.
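As a point of reference, here is a minimal sketch of how the pretrained model can be driven through torchaudio, following the PyTorch tutorial. The input file name is hypothetical, and a long song should be processed in overlapping chunks as the tutorial does; feeding a full track through at once can exhaust memory.

```python
import torch
import torchaudio
from torchaudio.pipelines import HDEMUCS_HIGH_MUSDB_PLUS

bundle = HDEMUCS_HIGH_MUSDB_PLUS                # pretrained Hybrid Demucs weights
model = bundle.get_model().eval()

waveform, sr = torchaudio.load("song.wav")      # hypothetical input file
waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)
if waveform.shape[0] == 1:                      # the model expects stereo input
    waveform = waveform.repeat(2, 1)

# Normalize, separate, then undo the normalization (as in the tutorial).
ref = waveform.mean(0)
mix = ((waveform - ref.mean()) / ref.std()).unsqueeze(0)
with torch.no_grad():
    stems = model(mix)[0] * ref.std() + ref.mean()  # (num_sources, channels, time)

# model.sources is ["drums", "bass", "other", "vocals"]
for name, stem in zip(model.sources, stems):
    torchaudio.save(f"{name}.wav", stem, bundle.sample_rate)
```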
MATLAB Pitch
Once we had implemented a functional song splitter using Hybrid Demucs, we moved on to the next task: detecting pitch in a music signal. After installing MATLAB's Audio Toolbox for audio and signal processing, we used the toolbox's pitch function, which estimates the fundamental frequency of a given audio signal frame by frame. We used this function in conjunction with a step function and smoothing to produce discrete pitches that correspond to musical notes.
The main downside of this function is that it does not work when more than one note is played at a time, so it was merely a stepping stone toward our final goal of polyphonic pitch detection (detecting multiple notes played simultaneously).
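To illustrate the smoothing and stepping idea, here is a minimal Python sketch of the post-processing applied to a frame-by-frame pitch track; the median-filter length is a hypothetical tuning parameter, and the Python port of our pipeline works along these lines.

```python
import numpy as np
from scipy.signal import medfilt
import librosa

def quantize_pitch_track(f0_hz, kernel=9):
    """Median-smooth an f0 track (Hz per frame) and snap it to MIDI note numbers."""
    smoothed = medfilt(f0_hz, kernel)        # suppress single-frame glitches
    midi = librosa.hz_to_midi(smoothed)      # continuous MIDI pitch values
    return np.round(midi).astype(int)        # step to the nearest semitone

# Example: a slightly noisy A4 (440 Hz) track quantizes to MIDI note 69.
noisy_a4 = 440 * (1 + 0.01 * np.random.randn(100))
print(quantize_pitch_track(noisy_a4))
```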

Basic Pitch
To solve the problem of polyphonic pitch detection, we came across yet another machine learning tool that could reportedly perform polyphonic pitch detection and output transcriptions: Basic Pitch, a Python library created by Spotify that automatically generates basic music transcriptions. The neural network model is a lightweight architecture developed by Spotify's Audio Intelligence Lab. Basic Pitch was quite easy to integrate into our project, as it is contained in a single Python package, and running the algorithm takes only a few lines of code.
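For instance, a minimal transcription through Basic Pitch's Python API looks roughly like this (the file names are hypothetical):

```python
from basic_pitch.inference import predict

# predict() returns the raw model output, a PrettyMIDI object, and the note events.
model_output, midi_data, note_events = predict("piano.wav")
midi_data.write("piano_transcription.mid")
```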
However, unlike Hybrid Demucs, Basic Pitch fell short of our expectations for a neural-network-based transcription tool. As expected, simple piano songs with few simultaneous notes generally work well with the algorithm; however, any music with more complexity, or with multiple different instruments, proves very challenging for it. Therefore, although Basic Pitch is part of our project and we have provided demo code for using it, we moved away from this approach and toward the custom implementation of polyphonic pitch detection described below.

Custom Polyphonic Pitch Detection
Since we were not satisfied with the results from Basic Pitch, we spent the rest of our project time developing a custom polyphonic (multiple-note) pitch detection algorithm. We started by analyzing the Fourier transform plots of simple chords. The difficulty of polyphonic pitch detection presented itself immediately in the form of harmonics: frequencies that are generated on top of the "real" fundamental frequencies of a note.
For example, a C4 note has a peak at around 262 Hz in the frequency domain. However, there will also be noticeable peaks at the integer multiples of that fundamental, so a single C4 generates frequency peaks near 262 Hz, 523 Hz, 785 Hz, 1047 Hz, and so on. When you start to add in other notes with all of their harmonics as well, you get a plot with a number of peaks that are not "real" notes being played.
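To make this concrete, the toy example below synthesizes a C4-like tone with three weaker harmonics and reads the peaks straight off the FFT. The tone is placed at exactly 262 Hz so the peaks land on whole FFT bins; a real C4 sits closer to 261.6 Hz.

```python
import numpy as np

sr = 44100
t = np.arange(sr) / sr                      # one second of samples
f0 = 262.0                                  # C4, rounded to a whole FFT bin
# Fundamental plus three progressively weaker integer harmonics.
tone = sum(0.6 ** k * np.sin(2 * np.pi * (k + 1) * f0 * t) for k in range(4))

spectrum = np.abs(np.fft.rfft(tone))
freqs = np.fft.rfftfreq(len(tone), 1 / sr)  # 1 Hz per bin for a 1 s signal
print(np.sort(freqs[np.argsort(spectrum)[-4:]]))  # [ 262.  524.  786. 1048.]
```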
After analyzing different types of chords and notes, we noticed a general pattern in the magnitudes of the harmonics: a harmonic generally has a lower magnitude than the real note that produces it. In other words, the height of a peak differs noticeably depending on whether that peak is caused by a harmonic or by a real pitch. Using this observation, we wrote a MATLAB script to remove harmonics from a chord, with the function only needing to check the magnitudes of the Fourier transform peaks. After some iteration, we settled on a crude function that can remove harmonics from a chord while preserving the real notes.
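Our script is written in MATLAB, but the core idea can be sketched in a few lines of Python; the peak threshold, frequency tolerance, and magnitude ratio below are hypothetical stand-ins for the values we tuned.

```python
import numpy as np
from scipy.signal import find_peaks

def fundamental_frequencies(signal, sr, tol=0.03, ratio=0.8):
    """Keep spectral peaks that look like fundamentals; drop likely harmonics.

    A peak is treated as a harmonic if it sits near an integer multiple of an
    already-kept lower peak and is weaker than that peak by the given ratio.
    """
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), 1 / sr)
    idx, _ = find_peaks(spectrum, height=spectrum.max() * 0.05)

    kept = []                                        # (freq, magnitude) pairs
    for f, m in sorted(zip(freqs[idx], spectrum[idx])):
        if f < 20:                                   # skip DC / sub-audible bins
            continue
        is_harmonic = any(
            round(f / fk) >= 2
            and abs(f - round(f / fk) * fk) <= tol * f
            and m < ratio * mk
            for fk, mk in kept
        )
        if not is_harmonic:
            kept.append((f, m))
    return [f for f, _ in kept]
```

Run on the synthetic C4 tone from the previous example, this returns only the 262 Hz fundamental.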

MIDI Files
Once we had three semi-functional pitch detection algorithms, we moved on to generating sheet music from an array of pitches. After trying several packages that did not do what we needed, and after attempting to draw and plot our own sheet music (not easy), we found that Python can create MIDI files. A MIDI (Musical Instrument Digital Interface) file is a standardized format for storing and editing digital music with software. Using the Pretty_MIDI package for Python, we were able to translate the output of our pitch detection algorithms into a MIDI file that is easily read by common music software such as MuseScore.
(We used the librosa Python package several times in this project to handle Fast Fourier Transforms, load audio signals, and much more. In particular, to add a note to a MIDI file, you must provide the pitch, start time, and stop time of that note. Using librosa's onset detection, we could find when each note in a signal starts and the time between notes, giving us the start time and rough length of every note in the signal.)
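A minimal sketch of that translation step is below. For brevity it uses a monophonic YIN estimate as a stand-in for the pitch detector (the real pipeline uses the polyphonic detector described above), and the file names are hypothetical.

```python
import numpy as np
import librosa
import pretty_midi

y, sr = librosa.load("other.wav", sr=None)                     # a separated stem
onsets = librosa.onset.onset_detect(y=y, sr=sr, units="time")  # note start times
bounds = np.append(onsets, len(y) / sr)                        # segment boundaries

pm = pretty_midi.PrettyMIDI()
piano = pretty_midi.Instrument(program=0)                      # acoustic grand piano

for start, end in zip(bounds[:-1], bounds[1:]):
    seg = y[int(start * sr):int(end * sr)]
    if len(seg) < 2048:                                        # too short to analyze
        continue
    # Monophonic stand-in: median YIN estimate snapped to the nearest semitone.
    f0 = librosa.yin(seg, fmin=65.0, fmax=1050.0, sr=sr)
    pitch = int(round(librosa.hz_to_midi(float(np.median(f0)))))
    piano.notes.append(pretty_midi.Note(velocity=100, pitch=pitch,
                                        start=float(start), end=float(end)))

pm.instruments.append(piano)
pm.write("transcription.mid")                                  # readable by MuseScore
```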
Putting it all Together
At this point in the project, we had all of the separate functionality working rather well, and it was time to combine the functions into one cohesive script. As the majority of our code was written in Python, we translated all of the MATLAB pitch detection code to Python so that it could be used without worrying about cross-platform integration.
Our final script integrates all of these previous functions: first, source separation using Hybrid Demucs, quickly plotting some spectrograms and writing out the separated audio waveforms; then polyphonic pitch detection on each stem other than drums (the pitch detection algorithm does not handle impulse-like signals well); and finally, converting the detected pitches into a MIDI file and opening that file in MuseScore to be read as sheet music!
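Schematically, the pipeline chains the stages sketched in the previous sections. The condensed version below substitutes Basic Pitch for our custom detector purely to stay short and self-contained; it is not the actual main.py, it assumes a stereo input, and the file names are hypothetical.

```python
import torch
import torchaudio
from torchaudio.pipelines import HDEMUCS_HIGH_MUSDB_PLUS
from basic_pitch.inference import predict

bundle = HDEMUCS_HIGH_MUSDB_PLUS
model = bundle.get_model().eval()

waveform, sr = torchaudio.load("minuet_in_g.wav")   # hypothetical stereo file
waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)
ref = waveform.mean(0)
mix = ((waveform - ref.mean()) / ref.std()).unsqueeze(0)
with torch.no_grad():
    stems = model(mix)[0] * ref.std() + ref.mean()

for name, stem in zip(model.sources, stems):
    torchaudio.save(f"{name}.wav", stem, bundle.sample_rate)
    if name == "drums":                     # impulse-like; skip pitch detection
        continue
    _, midi_data, _ = predict(f"{name}.wav")
    midi_data.write(f"{name}.mid")          # open the .mid files in MuseScore
```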
Below, on the left, are the original sheet music and audio for Minuet in G Major; on the right are the output sheet music and audio waveforms from the transcription process.


Below is a sample output of the main.py file, which ties the entire project together. In this example, Minuet in G Major is run through the program. Although the song contains no bass, vocals, or drums, the program still runs and places the entire song in the "other" category, corresponding to piano.
