Probabilistically Modeling Scale Theory

Matt Chiu, Eastman School of Music

This paper draws from research in natural language processing (NLP) -- a branch of computational linguistics -- to develop a probabilistic, machine-learning model for studying scales and macroharmonies. Although the word2vec algorithm is usually applied to text, applying it to a windowed musical corpus derives vector representations for scales and macroharmonic collections (represented as pitch-class sets). These vector representations, known as embeddings, are learned from the contextual and syntactic placement of each collection. Comparing embeddings of the major diatonic scales learned from Mozart pieces in the Yale Classical Archives corpus returns a circle-of-fifths analog. Deriving embeddings for other composers -- Liszt, Saint-Saëns, and Debussy -- reveals stylistic differences in scale treatment. We then demonstrate one analytical application of embeddings with an analysis of Lili Boulanger's "Parfois, je suis triste."
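The idea of deriving embeddings from the contexts in which a collection appears can be sketched in miniature. The toy script below is not the paper's pipeline: it uses a handful of hypothetical scale tokens rather than the Yale Classical Archives corpus, and it substitutes a simple count-based method (windowed co-occurrence, PPMI weighting, and truncated SVD) for word2vec's neural training, since both families of methods produce embeddings in which collections sharing contexts land near one another.

```python
import numpy as np

# Toy "corpus": each piece is a sequence of scale/collection labels.
# These tokens are hypothetical, not drawn from any real corpus.
pieces = [
    ["C-maj", "G-maj", "C-maj", "F-maj", "C-maj"],
    ["G-maj", "D-maj", "G-maj", "C-maj", "G-maj"],
    ["F-maj", "C-maj", "F-maj", "Bb-maj", "F-maj"],
]

vocab = sorted({tok for piece in pieces for tok in piece})
idx = {tok: i for i, tok in enumerate(vocab)}

# Count co-occurrences within a symmetric window -- the same notion of
# context that word2vec's sliding window captures.
window = 2
counts = np.zeros((len(vocab), len(vocab)))
for piece in pieces:
    for i, tok in enumerate(piece):
        lo, hi = max(0, i - window), min(len(piece), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[idx[tok], idx[piece[j]]] += 1

# Positive pointwise mutual information, then truncated SVD, yields
# low-dimensional embeddings (a count-based stand-in for word2vec).
total = counts.sum()
row = counts.sum(axis=1, keepdims=True)
col = counts.sum(axis=0, keepdims=True)
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log(counts * total / (row * col))
ppmi = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)

U, S, _ = np.linalg.svd(ppmi)
dim = 2
embeddings = U[:, :dim] * S[:dim]  # one 2-D vector per collection

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Collections that share contexts (e.g. fifth-related keys in this toy
# data) should receive similar embeddings.
sim = cosine(embeddings[idx["C-maj"]], embeddings[idx["G-maj"]])
```

In the paper's actual setting, word2vec learns such vectors by predicting context tokens with a small neural network, and the resulting geometry (e.g. the circle-of-fifths analog for Mozart) emerges from the corpus rather than from any hand-coded music theory.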