Separating the singing voice from the music accompaniment is an interesting research topic because the singing voice carries abundant information, such as melody, the singer’s characteristics, lyrics, and emotion. All of this information is useful for music information retrieval, singer identification, melody extraction, audio content analysis, and even karaoke gaming.
At the same time, it is a challenging topic because existing methods are still not very practical. Repetition is a distinctive characteristic of music: most songs have a repeating accompaniment structure over which the singer lays varying vocals. This work studies the repeating structure of music and implements a separation algorithm based on the repeating pattern of the music background. Using the repeating pattern to extract the singing voice from music has the advantage of being simple, fast, blind, and automatic.
Our basic algorithm has four key issues to address during implementation.
- Repeating period identification: finding the repeating period in the mixture.
- Repeating segment modeling: using the repeating period to divide the music into several segments and defining the repeating segment.
- Repeating pattern extraction: using the repeating segment model to separate the repeating music background from the singing voice in the mixture.
- Result analysis: determining the effectiveness of the algorithm by calculating the energy of the mixture signal, the original vocal signal, the original music signal, the separated vocal signal, and the separated music signal.
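The energy calculation in the result analysis step can be sketched as follows. This is a minimal illustration, assuming that energy means the sum of squared samples; the function name `signal_energy` and the toy sample values are hypothetical stand-ins and are not taken from the original work.

```python
def signal_energy(samples):
    """Energy of a discrete signal, taken here as the sum of squared samples."""
    return sum(x * x for x in samples)


# Toy stand-ins for the five signals named in the list above (hypothetical values).
original_vocal = [0.0, 0.5, -0.5, 0.25]
original_music = [0.2, -0.2, 0.2, -0.2]
mixture = [v + m for v, m in zip(original_vocal, original_music)]
separated_music = [0.2, -0.1, 0.1, -0.2]  # imperfect estimate of the music
separated_vocal = [x - m for x, m in zip(mixture, separated_music)]

signals = {
    "mixture": mixture,
    "original vocal": original_vocal,
    "original music": original_music,
    "separated vocal": separated_vocal,
    "separated music": separated_music,
}
energies = {name: signal_energy(samples) for name, samples in signals.items()}

for name, energy in energies.items():
    print(f"{name}: {energy:.4f}")
```

Comparing the energies of the separated signals with those of the original signals gives a rough indication of how much of each source ends up in each output.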
Repeating Period Identification:
With a time interval of 0.04 seconds, 2048 samples, and a sampling frequency of 44100 Hz, we calculate the short-time Fourier transform (STFT) of the mixture signal in MATLAB to obtain the mixture spectrogram for the whole song. We then apply autocorrelation to the mixture spectrogram, that is, we compare each segment with its lagged version over successive time intervals to measure the similarity within the segment, and use this similarity to identify the repeating period.
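The period search described above can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not the original MATLAB implementation: the spectrogram is assumed to be given as a list of rows (one per frequency bin) of power values, the autocorrelations of all rows are averaged, and the lag with the largest average is taken as the period. The helper names are hypothetical. The first lines also show one possible reading of the parameters, in which 0.04 s at 44100 Hz (1764 samples) is rounded up to the next power of two, 2048; the source does not state this explicitly.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n."""
    p = 1
    while p < n:
        p *= 2
    return p


# Assumed reading of the parameters: 0.04 s * 44100 Hz = 1764 samples -> 2048.
window_samples = next_power_of_two(round(0.04 * 44100))


def autocorrelation(values, max_lag):
    """Unbiased autocorrelation of one sequence for lags 0..max_lag."""
    n = len(values)
    return [
        sum(values[t] * values[t + lag] for t in range(n - lag)) / (n - lag)
        for lag in range(max_lag + 1)
    ]


def mean_autocorrelation(power_spectrogram, max_lag):
    """Average the per-frequency-bin autocorrelations over all bins."""
    rows = [autocorrelation(row, max_lag) for row in power_spectrogram]
    return [sum(column) / len(rows) for column in zip(*rows)]


def estimate_period(power_spectrogram, max_lag):
    """Lag (in frames, at least 1) with the highest mean autocorrelation."""
    similarity = mean_autocorrelation(power_spectrogram, max_lag)
    return max(range(1, max_lag + 1), key=lambda lag: similarity[lag])


# Toy power spectrogram: two frequency bins, 16 frames, repeating every 4 frames.
toy_spectrogram = [
    [4, 1, 0, 1] * 4,
    [0, 2, 2, 0] * 4,
]
period_frames = estimate_period(toy_spectrogram, max_lag=6)
print(window_samples, period_frames)
```

In this sketch `max_lag` is kept below twice the true period so that integer multiples of the period do not compete with it; a real song would need a more careful peak-picking rule.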
After executing the algorithm, we expect to obtain two separated signals from the original mixture audio: a music signal and a voice signal. We expect to hear a clear difference between the music signal and the voice signal as long as a repeating pattern exists in the original mixture audio.
The spectrograms of the original voice and the separated voice are similar, which indicates that the singing voice is indeed extracted from the music. Comparing the spectrograms of the original music and the separated music, we find more high-frequency components in the separated music than in the original music, because the voice in the mixture audio cannot be removed completely.
This algorithm depends on identifying the repeating pattern in music to separate the singing voice from the music. Consequently, it is highly sensitive to the repeating period of the music, and identifying the repeating pattern accurately is the core of the algorithm. As long as we can obtain the repeating period of a mixture, we can effectively filter the singing voice from the mixture audio. The disadvantage of this algorithm is that it still leaves some music in the separated voice signal, because only the strongly repeating parts of the music are separated.
However, although this algorithm cannot recover the original voice signal completely, its advantages are still notable. On average, only about 17.1% contamination from the other channel remains in the separated voice. The suppression reaches 5:1.71. Since our algorithm does not delve into the complex frameworks of music and is applicable to most mixture audio, it has the advantage of being fast, simple, blind, and automatic.
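How a mixture might be split into these two signals once the repeating period is known can be sketched as follows. The source does not specify how the repeating segment is defined, so this sketch assumes an element-wise median over the segments, with the music estimate capped at the mixture value and the voice taken as the remainder. The function names and the toy magnitude values are hypothetical.

```python
from statistics import median


def repeating_segment(spectrogram, period):
    """Element-wise median over consecutive segments of `period` frames.

    Assumption: the median is used as the repeating segment model.
    """
    n_frames = len(spectrogram[0])
    return [
        [
            median(row[t] for t in range(offset, n_frames, period))
            for offset in range(period)
        ]
        for row in spectrogram
    ]


def split_music_and_voice(spectrogram, period):
    """Split a magnitude spectrogram into repeating (music) and residual (voice) parts."""
    model = repeating_segment(spectrogram, period)
    music, voice = [], []
    for row, model_row in zip(spectrogram, model):
        # The music estimate can never exceed what is present in the mixture.
        music_row = [min(value, model_row[t % period]) for t, value in enumerate(row)]
        music.append(music_row)
        voice.append([value - m for value, m in zip(row, music_row)])
    return music, voice


# Toy magnitude spectrogram: one frequency bin, accompaniment 1, 3 repeating
# every 2 frames, with extra "vocal" energy added in the fourth frame.
toy_mixture = [[1, 3, 1, 5, 1, 3]]
music, voice = split_music_and_voice(toy_mixture, period=2)
print(music)
print(voice)
```

The median is a reasonable choice under the assumption that the vocals vary from segment to segment while the accompaniment repeats, so the vocals act as outliers. Turning the two spectrograms back into audio would additionally require an inverse STFT, which is outside this sketch.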
Source: Cornell University
Author: Tengli Fu