Speech synthesis is an area of computer science with many practical uses, such as enabling people with visual impairments to access text and providing more human-like feedback from information systems. A related area of research is text-to-song, where systems comparable to those used in text-to-speech map text to melodic units of song.
This paper discusses how a text-to-song algorithm can be developed and which parameters affect the emotion that is communicated. Fifty participants listened to music generated with our algorithm. The results show that tempo and mode both strongly influence the communicated emotion: a melody performed at a tempo of 250 bpm was perceived as significantly happier than a performance at 120 bpm, and a melody in a major tonality was perceived as significantly happier than one in a minor tonality.
Combining these parameters strengthened the effects. A fast tempo together with a major tonality produced a performance that was perceived as even happier, and the opposite was observed when a slow tempo was combined with a minor tonality. When a fast tempo was combined with a minor tonality, the average response was neutral, with answers distributed across the whole spectrum from sad to happy.
A slow tempo combined with a major tonality gave almost identical results. We concluded that generating emotionally expressive song with an algorithm is possible, but that the methodology can be improved to convey emotions even more clearly.
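To illustrate the kind of parameterization discussed above, the following is a minimal Python sketch, not the authors' implementation: it maps a target emotion to the tempo and mode values studied here and renders each word of a text as a note. The emotion-to-parameter table mirrors the reported tendencies (fast tempo plus major tonality reads as happy, slow tempo plus minor tonality as sad); the syllable handling and scale-degree choice are hypothetical simplifications.

```python
# Hypothetical sketch of a tempo/mode parameterization for text-to-song.
# Only the (tempo, mode) pairing reflects the findings summarized above;
# the word-to-scale-degree mapping is illustrative, not the studied algorithm.

MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]   # semitone offsets of a major scale
MINOR_STEPS = [0, 2, 3, 5, 7, 8, 10]   # natural minor scale

EMOTION_PARAMS = {
    "happy": {"tempo_bpm": 250, "steps": MAJOR_STEPS},
    "sad":   {"tempo_bpm": 120, "steps": MINOR_STEPS},
}

def text_to_song(text: str, emotion: str, root_midi: int = 60):
    """Return a list of (midi_pitch, duration_seconds) pairs for the text."""
    params = EMOTION_PARAMS[emotion]
    beat_seconds = 60.0 / params["tempo_bpm"]  # one quarter note per word
    notes = []
    for i, _word in enumerate(text.split()):
        degree = params["steps"][i % len(params["steps"])]
        notes.append((root_midi + degree, beat_seconds))
    return notes

if __name__ == "__main__":
    for pitch, dur in text_to_song("twinkle twinkle little star", "happy"):
        print(f"MIDI {pitch}, {dur:.2f} s")
```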
Source: KTH
Authors: Gramfors, Dexter | Johansson, Andreas