
Key Concepts

Before diving in, it's important to cover some key concepts pertaining to music theory and signal processing.

Pitch

Pitch is a categorization of sounds into notes on a musical scale. For example, the note A4 (the A in the fourth octave) is commonly used to tune orchestras and is generally defined as a sound wave oscillating at a frequency of 440 cycles per second, or 440 hertz (Hz).

How do we determine pitch? Simply put, the human ear perceives pitch relatively rather than absolutely (unless you're one of the very few people who have perfect pitch). This means we can take a reference note like A4 and compare other sounds against it: frequencies below 440 Hz correspond to lower notes (G4, A3, etc.), whereas frequencies above 440 Hz correspond to higher notes (B4, C5, etc.).

Octave

An octave is a marker of the level that each pitch sits at. Think of it like a parking garage: if vehicles are assigned to parking spaces but there are only so many spots, what do we do when we run out? (Spoiler: we build another floor.) In most Western music notation, there are only 12 spots on a scale that pitches can be assigned to. On a piano keyboard, these 12 spots are split into 7 white keys and 5 black keys. The white keys correspond to the notes C, D, E, F, G, A, and B. The black keys that sit "between" white keys are named in relation to the white keys around them (the black key between C and D is called C# or Db depending on the piece of music, but we won't get into that for now).

Once a note passes the end of the scale, it moves to the beginning of the next octave. Higher octaves indicate higher pitches, and the octave number is written after the note name, so a C in the fourth octave is notated as C4. Conveniently (or inconveniently for us), a note that is one octave higher has double the frequency of the same note in the lower octave. For example, A5 is 880 Hz, twice the 440 Hz of A4.

[Image: diagram of an octave; source cited below]

Wikipedia Contributors, “Octave,” Wikipedia, Feb. 18, 2019. https://en.wikipedia.org/wiki/Octave
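To make the note and octave naming concrete, here is a minimal sketch that converts between note names and frequencies. It assumes twelve-tone equal temperament with A4 = 440 Hz (each of the 12 spots is an equal frequency ratio apart, so 12 steps double the frequency) and uses sharp names only. The function names are hypothetical and chosen for illustration.

```python
import math

# The 12 "spots" in one octave, counted from C (sharp names only).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
A4_HZ = 440.0  # assumed tuning reference


def note_to_frequency(name, octave):
    """Frequency in Hz of a note such as ("C", 4), assuming equal temperament."""
    semitones_from_a4 = (
        NOTE_NAMES.index(name) - NOTE_NAMES.index("A") + 12 * (octave - 4)
    )
    return A4_HZ * 2 ** (semitones_from_a4 / 12)


def frequency_to_note(freq_hz):
    """Name of the nearest note, e.g. 440.0 -> "A4"."""
    semitones_from_a4 = round(12 * math.log2(freq_hz / A4_HZ))
    index = NOTE_NAMES.index("A") + semitones_from_a4  # steps counted from C4
    octave = 4 + index // 12  # every 12 steps moves up one "floor"
    return f"{NOTE_NAMES[index % 12]}{octave}"


print(frequency_to_note(440.0))       # A4
print(frequency_to_note(220.0))       # A3, one octave below A4
print(note_to_frequency("A", 5))      # 880.0, double the frequency of A4
```

Rounding to the nearest semitone means slightly out-of-tune input still maps to a note name; a real tuner would also report how far off the input is.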

Harmonics

The fundamental frequency (notated as f0) tells us the pitch of a note. However, a sound usually contains additional frequencies, called harmonics, that sit above the fundamental. These are integer multiples of f0: 2 × f0, 3 × f0, and so on, with f0 itself counted as the first harmonic. Recall that a note one octave higher has double the frequency; this is the same relation as between the first harmonic (f0) and the second harmonic (2 × f0) of a note.
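As a small illustration of the multiples, the sketch below lists the first few harmonics of a note. The helper name `harmonics` is hypothetical, and it follows the convention that the first harmonic is the fundamental itself.

```python
def harmonics(f0, count):
    """Return the first `count` harmonics of f0 (the first one is f0 itself)."""
    return [k * f0 for k in range(1, count + 1)]


a4_harmonics = harmonics(440.0, 4)
print(a4_harmonics)  # [440.0, 880.0, 1320.0, 1760.0]

# The second harmonic is exactly double the fundamental: one octave up.
print(a4_harmonics[1] / a4_harmonics[0])  # 2.0
```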

Polyphony

Polyphony is the playing of multiple notes at once. When several notes are sounded together as a group, they are also known as a chord. Pitch detection algorithms generally work better on monophonic passages of music, where only one note is played at a time.

Stems

A stem is a collection of audio sources mixed together so that they can be processed later as a group. In this project specifically, a stem contains all the instruments that fit into one category. A music stem splitter splits a song into multiple stems belonging to different categories, such as bass, drums, and vocals.

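Fast Fourier Transform (FFT)

The fast Fourier transform (FFT) is an efficient algorithm for computing the discrete Fourier transform (DFT) on a computer. Its importance stems from its ability to quickly and reliably transform data into the Fourier basis, that is, to convert a sampled signal into the frequencies it contains. For N samples, an FFT needs on the order of N log N operations, compared with N^2 for a direct DFT.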
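To show what "transforming to the Fourier basis" buys us for pitch, here is a minimal sketch of a textbook recursive radix-2 FFT in plain Python, used to find the strongest frequency in a test tone. This is for illustration only and is not necessarily the implementation used in this project; the function names, sample rate, and test signal are assumptions.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 FFT; the number of samples must be a power of two."""
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])  # transform of even-indexed samples
    odd = fft(samples[1::2])   # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def dominant_frequency(samples, sample_rate):
    """Frequency in Hz of the strongest bin below half the sample rate."""
    spectrum = fft(samples)
    half = len(samples) // 2
    magnitudes = [abs(value) for value in spectrum[:half]]
    peak_bin = max(range(1, half), key=lambda k: magnitudes[k])  # skip bin 0
    return peak_bin * sample_rate / len(samples)


# Assumed test signal: one second of a pure 440 Hz sine (A4).
SAMPLE_RATE = 8192
signal = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(SAMPLE_RATE)]
print(dominant_frequency(signal, SAMPLE_RATE))  # 440.0
```

The sample count equals the sample rate here so that each bin is exactly 1 Hz wide and 440 Hz falls on a bin. With real audio the peak usually lands between bins, and harmonics can be stronger than the fundamental, which is one reason simple peak picking is not enough for pitch detection.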

Convolutional Neural Network (CNN)

Convolutional neural networks (CNNs) are a type of machine learning model primarily used for recognition tasks, such as identifying patterns in images. For a more in-depth explanation, please visit this website.

Summation of Residual Harmonics (SRH)

The summation of residual harmonics (SRH) method scores each candidate fundamental frequency f using the spectrum of a residual signal. For every harmonic k = 2 to N, it takes the difference between the energy at the k-th harmonic (k × f) and the energy at the (k - 0.5)-th harmonic ((k - 0.5) × f, halfway between two harmonics), and adds these differences up. It then adds this sum to the energy at f itself. The candidate with the highest score is taken as the pitch. The energy of a signal is the area under the curve of its squared magnitude, in other words an integral.
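The scoring step can be sketched as follows, reading the description as score(f) = E(f) + sum over k = 2..N of [E(k × f) - E((k - 0.5) × f)]. This is a simplified sketch under stated assumptions: it takes an already computed spectrum as a plain list with bins 1 Hz apart, it does not compute the residual signal itself, and the function names, candidate range, and synthetic spectrum are all hypothetical.

```python
def energy_at(spectrum, freq_hz):
    """Spectrum value at the bin nearest to freq_hz (bins assumed 1 Hz apart)."""
    index = round(freq_hz)
    return spectrum[index] if 0 <= index < len(spectrum) else 0.0


def srh_score(spectrum, f, n_harmonics=5):
    """E(f) plus, for k = 2..N, E(k*f) minus E((k - 0.5)*f)."""
    score = energy_at(spectrum, f)
    for k in range(2, n_harmonics + 1):
        score += energy_at(spectrum, k * f) - energy_at(spectrum, (k - 0.5) * f)
    return score


def estimate_f0(spectrum, f_min=100, f_max=500, n_harmonics=5):
    """Integer candidate frequency in Hz with the highest SRH score."""
    candidates = range(f_min, f_max + 1)
    return max(candidates, key=lambda f: srh_score(spectrum, f, n_harmonics))


# Assumed synthetic spectrum: 3000 bins of 1 Hz with peaks at harmonics of 220 Hz.
spectrum = [0.0] * 3000
for k in range(1, 14):
    spectrum[220 * k] = 1.0 / k

print(estimate_f0(spectrum))  # 220
```

Subtracting the energy halfway between harmonics is what penalizes a candidate one octave too high: for 440 Hz, the in-between positions (660 Hz, 1100 Hz, and so on) land on real harmonics of 220 Hz, so its score drops.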
