Fractal of Periodic Musical Elements
a re-taxonomy of music for machine learning
Time-independent music notation
To use the taxonomy, which I will propose in the next paragraph, with machine learning, a music notation must first be established that machine learning libraries can read and that can be synthesized from elements of the taxonomy. In programming languages, multidimensional arrays and tensors are a way of representing mathematical matrices. Machine learning libraries train models on data (images, videos, music files, etc.) stored in such matrices. Arrays and tensors are written in a similar way. The main practical difference is that a tensor can be stored in the memory of an accelerator such as a GPU, whereas an ordinary array is held in main memory and processed by the CPU. A mathematical matrix is a table of numbers arranged in rows and columns. As an array, it is written in square brackets with its elements separated by commas. For instance, a single row, [1, 2, 3, 4, 5], is a one-dimensional array. Arrays can be nested inside other arrays: a two-dimensional array with two rows of five columns each is written [ [1,2,3,4,5], [1,2,3,4,5] ].
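As a minimal illustration in Python (chosen only for readability; the text does not prescribe a language), nested lists can stand in for arrays, and a small helper can report their shape, that is, the number of elements along each dimension. NumPy's `np.array(...).shape` reports the same tuple; the helper `shape` below is hypothetical and assumes rectangular nesting.

```python
def shape(array):
    """Return the size of each dimension of a rectangular nested list."""
    dims = []
    current = array
    while isinstance(current, list):
        dims.append(len(current))
        if not current:  # an empty list has no deeper dimension to inspect
            break
        current = current[0]  # rectangular nesting: the first element is representative
    return tuple(dims)


one_row = [1, 2, 3, 4, 5]                        # one-dimensional array
two_rows = [[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]]    # two rows of five columns

print(shape(one_row))   # (5,)
print(shape(two_rows))  # (2, 5)
```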
The notation that I would like to propose is neither a score nor an audio file, but is inspired by both. An audio file stores a continuous signal as a discrete one by measuring it at a fixed sampling rate. At sufficiently high sampling rates, such as the 44,100 Hz of a CD recording, the discreteness of the signal is not perceptible to the human ear, and it sounds as if it were continuous [11]. In my notation, inspired by the sampling rate of an audio recording, musical data is written in "samples": one-dimensional arrays containing two types of values. I call these values "frequencies" (for their pitch) and "amplitudes" (for their loudness), but they should not be understood as physical frequencies and amplitudes in the sense of digital signal processing. I use these names because the way they represent durations and dynamics is inspired by, although it does not mimic, the behavior of real continuous signals. I call my notation time-independent because it represents only the order in which equally spaced samples of frequencies and their amplitudes are to be played. The durations and dynamics of the notes can be read from this data according to the following rules.
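Because the notation is time-independent, real time only enters when a piece is played back. The Python sketch below assumes a playback tempo in beats per minute and a sample length expressed as a fraction of a beat; neither parameter is part of the notation itself, and the function names `seconds_per_sample` and `onset_times` are hypothetical.

```python
from fractions import Fraction


def seconds_per_sample(bpm, sample_in_beats):
    """Real-time length of one notation sample at a tempo chosen for playback.

    bpm: tempo in beats per minute (a playback choice, not stored in the notation)
    sample_in_beats: length of one sample as a fraction of a beat
    """
    return Fraction(60) / bpm * sample_in_beats


def onset_times(num_samples, bpm, sample_in_beats):
    """Onset time, in seconds, of each of the equally spaced samples."""
    step = seconds_per_sample(bpm, sample_in_beats)
    return [i * step for i in range(num_samples)]


# The same four samples rendered at two tempos: only the spacing changes.
print([float(t) for t in onset_times(4, 120, Fraction(1, 4))])  # [0.0, 0.125, 0.25, 0.375]
print([float(t) for t in onset_times(4, 60, Fraction(1, 4))])   # [0.0, 0.25, 0.5, 0.75]
```

The order of the samples is fixed by the data; only the tempo chosen at playback turns that order into seconds.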
Rhythm
If a pianist reading classical music notation saw two notes of the same pitch and duration written one after another, he would play them as two separate notes with a short but noticeable break in between. The same notation played by a computer, sending MIDI notes to a synthesizer that generates a simple sine wave, may sound like one continuous tone. In my notation, a pitch that occurs in two or more consecutive samples is therefore read as a single note whose duration extends until the first sample in which that pitch is absent. If two notes of the same pitch and duration are to be performed one after another, a rest must occupy at least one sample between them.
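The reading rule for rhythm can be sketched as a small decoder. This is a minimal illustration, not the author's implementation: it assumes each sample is a Python list of pitches (here frequencies), represents a rest as `None`, and ignores amplitudes. The hypothetical function `samples_to_notes` returns each note as (pitch, first sample, length in samples).

```python
def samples_to_notes(samples):
    """Merge consecutive samples that repeat a pitch into single notes.

    samples: list of samples; each sample is a list of pitches, None marks a rest.
    Returns (pitch, start_sample, duration_in_samples) tuples sorted by start, then pitch.
    """
    notes = []
    active = {}  # pitch -> index of the sample where the sounding note started
    # A trailing empty sample closes every note still sounding at the end.
    for index, sample in enumerate(samples + [[]]):
        present = {p for p in sample if p is not None}
        for pitch in list(active):
            if pitch not in present:  # the pitch is absent: its note ends here
                start = active.pop(pitch)
                notes.append((pitch, start, index - start))
        for pitch in present:
            active.setdefault(pitch, index)  # a new note starts only if not already sounding
    return sorted(notes, key=lambda n: (n[1], n[0]))


print(samples_to_notes([[261.62], [261.62], [261.62]]))          # [(261.62, 0, 3)]
print(samples_to_notes([[261.62], [261.62], [None], [261.62]]))  # [(261.62, 0, 2), (261.62, 3, 1)]
```

With a rest between them, the repeated pitch decodes as two notes; without it, as one longer note.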
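In this way, a simple two-dimensional array, with a sampling rate chosen for each piece, is enough to write any combination of harmony and rhythm. The duration of each sample has to equal the shortest note value in the notated composition, provided every other value is a whole multiple of it; otherwise (for example, when sixteenth notes and triplet eighth notes occur together) it must be a smaller common divisor of all the values. For instance, if a triplet eighth note is the shortest value in a piece, three consecutive samples of the same pitch represent a quarter note. Musical rests occupy positions in the matrix as array elements marked “undefined”. Their durations are joined in the same way: consecutive rest samples form a single, longer rest.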
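Choosing the sample length can be phrased as finding the greatest common divisor of all note values. The Python sketch below measures durations in quarter notes as exact `Fraction` values; that unit and the helper names `fraction_gcd`, `sample_length`, and `samples_needed` are assumptions made for illustration.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def fraction_gcd(a, b):
    """Largest fraction of which both a and b are whole multiples."""
    # Rewrite both over the common denominator a.den * b.den, then take the gcd of numerators.
    return Fraction(gcd(a.numerator * b.denominator, b.numerator * a.denominator),
                    a.denominator * b.denominator)


def sample_length(durations):
    """Longest sample length that divides every duration (durations in quarter notes)."""
    return reduce(fraction_gcd, durations)


def samples_needed(duration, sample):
    """Number of consecutive samples a note or rest of the given duration occupies."""
    count = Fraction(duration) / sample
    if count.denominator != 1:
        raise ValueError("duration is not a whole number of samples")
    return count.numerator


QUARTER = Fraction(1)
TRIPLET_EIGHTH = Fraction(1, 3)
SIXTEENTH = Fraction(1, 4)

triplet_sample = sample_length([QUARTER, TRIPLET_EIGHTH])
print(triplet_sample)                           # 1/3
print(samples_needed(QUARTER, triplet_sample))  # 3

mixed_sample = sample_length([QUARTER, TRIPLET_EIGHTH, SIXTEENTH])
print(mixed_sample)                             # 1/12
print(samples_needed(TRIPLET_EIGHTH, mixed_sample), samples_needed(SIXTEENTH, mixed_sample))  # 4 3
```

When only triplet eighths and quarters occur, the sample equals the triplet eighth and a quarter note spans three samples. Adding sixteenths forces a sample of a twelfth of a quarter note, because neither value is a multiple of the other.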
Dynamics
In physics, two sine waves of the same frequency and phase, when superimposed, are perceived as a single tone of that pitch whose amplitude equals the sum of their amplitudes. More generally, Fourier analysis describes any waveform as a sum of sinusoidal components. In my notation, loudness is represented in a similar way: by placing several frequencies of the same pitch on the same sample, whose amplitudes then add up. The softest frequency in a notated piece should have an amplitude of 1, and each louder frequency should be given a correspondingly higher amplitude.
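Both halves of this rule can be checked with a short Python sketch. `superimposed_peak` samples the sum of in-phase sine waves numerically (its frequency, rate, and duration defaults are arbitrary illustrative choices) and shows that the peak approaches the sum of the amplitudes. `superimpose` is a hypothetical helper that overlays two scores in the [ [ frequencies ], [ amplitudes ] ] layout described in the Structure section, summing the amplitudes of any pitch that appears on the same sample in both; it assumes scores of equal length and does not handle rests.

```python
import math


def superimposed_peak(amplitudes, frequency=440.0, rate=44100, duration=0.01):
    """Peak absolute value of the sum of in-phase sine waves sharing one frequency."""
    n = int(rate * duration)
    return max(
        abs(sum(a * math.sin(2 * math.pi * frequency * t / rate) for a in amplitudes))
        for t in range(n)
    )


def superimpose(score_a, score_b):
    """Overlay two scores sample by sample, summing amplitudes of shared frequencies.

    Each score is [frequencies, amplitudes] with the same number of samples; no rests.
    """
    freqs_a, amps_a = score_a
    freqs_b, amps_b = score_b
    if len(freqs_a) != len(freqs_b):
        raise ValueError("scores must have the same number of samples")
    out_freqs, out_amps = [], []
    for fa, aa, fb, ab in zip(freqs_a, amps_a, freqs_b, amps_b):
        combined = {}  # frequency -> summed amplitude, kept in first-seen order
        for f, a in zip(fa + fb, aa + ab):
            combined[f] = combined.get(f, 0) + a
        out_freqs.append(list(combined))
        out_amps.append(list(combined.values()))
    return [out_freqs, out_amps]


print(round(superimposed_peak([1, 1]), 3))  # 2.0

line_1 = [[[261.62], [293.66]], [[1], [1]]]
line_2 = [[[261.62], [329.62]], [[1], [1]]]
print(superimpose(line_1, line_2))  # [[[261.62], [293.66, 329.62]], [[2], [1, 1]]]
```

The shared C on the first sample doubles in amplitude, as in Fig. 2, while the different pitches on the second sample simply form a two-note chord.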
Structure
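All information about the pitches used and their dynamics is notated in the structure [ [ frequencies ], [ amplitudes ] ], in which each sample of frequencies is paired with a sample of amplitudes of the same length. For example, a I-IV-V-I progression of triads in C major (C, F, G, C), whose amplitudes rise over the first three chords and return to 1 on the last, could be notated as follows: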
[ <- opening array of arrays
[ <- opening array for frequencies
[ 261.62, 329.62, 391.99 ], <- sample of frequencies 1
[ 349.22, 440.00, 523.25 ], <- sample of frequencies 2
[ 391.99, 493.88, 587.32 ], <- sample of frequencies 3
[ 261.62, 329.62, 391.99 ] <- sample of frequencies 4
], <- closing array for frequencies
[ <- opening array for amplitudes
[ 1, 1, 1 ], <- sample of amplitudes 1
[ 2, 2, 2 ], <- sample of amplitudes 2
[ 3, 3, 3 ], <- sample of amplitudes 3
[ 1, 1, 1 ] <- sample of amplitudes 4
] <- closing array for amplitudes
] <- closing array of arrays
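The annotated structure above can be written directly as nested Python lists. The sketch below is illustrative only: the helper names `validate` and `events` are hypothetical, the check that amplitudes are at least 1 follows the rule from the Dynamics section, and rests are not handled.

```python
C_MAJOR_CADENCE = [
    [  # frequencies, one list per sample
        [261.62, 329.62, 391.99],
        [349.22, 440.00, 523.25],
        [391.99, 493.88, 587.32],
        [261.62, 329.62, 391.99],
    ],
    [  # amplitudes, aligned element by element with the frequencies
        [1, 1, 1],
        [2, 2, 2],
        [3, 3, 3],
        [1, 1, 1],
    ],
]


def validate(score):
    """Check that every frequency has exactly one matching amplitude of at least 1."""
    frequencies, amplitudes = score
    if len(frequencies) != len(amplitudes):
        raise ValueError("frequency and amplitude arrays have different sample counts")
    for i, (fs, amps) in enumerate(zip(frequencies, amplitudes)):
        if len(fs) != len(amps):
            raise ValueError(f"sample {i}: {len(fs)} frequencies but {len(amps)} amplitudes")
        if any(a < 1 for a in amps):
            raise ValueError(f"sample {i}: amplitudes must be at least 1")


def events(score):
    """Yield (sample_index, frequency, amplitude) triples in playback order."""
    validate(score)
    for i, (fs, amps) in enumerate(zip(*score)):
        for f, a in zip(fs, amps):
            yield i, f, a


print(list(events(C_MAJOR_CADENCE))[:3])  # [(0, 261.62, 1), (0, 329.62, 1), (0, 391.99, 1)]
```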
Following these rules for reading rhythm and dynamics, the notation allows complex structures of harmony, rhythm, and dynamics to be built up by simple additive synthesis of matrices, that is, by superimposing them sample by sample. What is more, MIDI-like notes based on integers could be used instead of frequencies. Frequencies can be mapped onto integers with associative arrays. In programming languages, associative arrays are data types that store collections of key-value pairs, for instance: [ MIDI note number 60, frequency 261.62 ]. However, when calculating the values of a musical scale, it is important to know that frequencies grow exponentially with pitch (in twelve-tone equal temperament, f = 440 · 2^((n - 69)/12), where n is the MIDI note number), whereas the note numbers themselves grow linearly (y = x) [12]. For this reason, in my notation, I call the system using frequencies “exponential” and the system using integers “linear”. I explain this further in the next paragraph.
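A minimal Python sketch of such an associative array, where a `dict` plays that role. It assumes twelve-tone equal temperament with A4 = 440 Hz at MIDI note 69; the names `midi_to_frequency`, `NOTE_TO_FREQ`, and `FREQ_TO_NOTE` are hypothetical. Rounding gives 261.63 Hz for note 60, whereas the frequencies listed in this text appear to be truncated to two decimals (261.62).

```python
def midi_to_frequency(note, a4=440.0):
    """Equal-tempered frequency in Hz of a MIDI note number (A4 = note 69)."""
    return a4 * 2 ** ((note - 69) / 12)


# Associative arrays in both directions over the 88-key piano range (notes 21..108).
NOTE_TO_FREQ = {n: round(midi_to_frequency(n), 2) for n in range(21, 109)}
FREQ_TO_NOTE = {f: n for n, f in NOTE_TO_FREQ.items()}

print(NOTE_TO_FREQ[60])     # 261.63
print(FREQ_TO_NOTE[440.0])  # 69

# Linear vs exponential: each step of 1 in note number multiplies the frequency
# by the same ratio, 2 ** (1 / 12), instead of adding a fixed number of hertz.
for n in range(60, 64):
    ratio = midi_to_frequency(n + 1) / midi_to_frequency(n)
    print(f"{n} -> {n + 1}: frequency ratio {ratio:.4f}")
```

Equal steps on the linear note-number scale thus correspond to equal ratios on the exponential frequency scale, and twelve such steps double the frequency.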
Fig. 2. In music notation: the two upper staves contain notes of the same pitch at the same position. Superimposed, the amplitude of this pitch doubles. On a spectrogram: the two upper staves played together by a synthesizer sound like the bottom stave.