TY - GEN
T1 - Enhancing music information retrieval by incorporating image-based local features
AU - Kaliciak, Leszek
AU - Horsburgh, Ben
AU - Song, Dawei
AU - Wiratunga, Nirmalie
AU - Pan, Jeff
PY - 2012
Y1 - 2012
N2 - This paper presents a novel approach to music genre classification. Having represented music tracks in the form of two-dimensional images, we apply the "bag of visual words" method from visual IR in order to classify the songs into 19 genres. By switching to the visual domain, we can abstract from musical concepts such as melody, timbre and rhythm. We obtained a classification accuracy of 46% (against a 5% theoretical baseline for random classification), which is comparable with existing state-of-the-art approaches. Moreover, the novel features characterize properties of the signal different from those captured by standard methods. Combining them should therefore further improve the performance of existing techniques. The motivation behind this work was the hypothesis that 2D images of music tracks (spectrograms) perceived as similar would correspond to the same music genres. Conversely, it is possible to treat real-life images as spectrograms and utilize music-based features to represent these images in vector form. This points to an interesting interchangeability between visual and music information retrieval.
AB - This paper presents a novel approach to music genre classification. Having represented music tracks in the form of two-dimensional images, we apply the "bag of visual words" method from visual IR in order to classify the songs into 19 genres. By switching to the visual domain, we can abstract from musical concepts such as melody, timbre and rhythm. We obtained a classification accuracy of 46% (against a 5% theoretical baseline for random classification), which is comparable with existing state-of-the-art approaches. Moreover, the novel features characterize properties of the signal different from those captured by standard methods. Combining them should therefore further improve the performance of existing techniques. The motivation behind this work was the hypothesis that 2D images of music tracks (spectrograms) perceived as similar would correspond to the same music genres. Conversely, it is possible to treat real-life images as spectrograms and utilize music-based features to represent these images in vector form. This points to an interesting interchangeability between visual and music information retrieval.
KW - Co-occurrence matrix
KW - Colour moments
KW - Fourier transform
KW - K-means algorithm
KW - Local features
UR - http://www.scopus.com/inward/record.url?scp=84871580503&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-35341-3_19
DO - 10.1007/978-3-642-35341-3_19
M3 - Conference contribution
AN - SCOPUS:84871580503
SN - 9783642353406
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 226
EP - 237
BT - Information Retrieval Technology - 8th Asia Information Retrieval Societies Conference, AIRS 2012, Proceedings
T2 - 8th Asia Information Retrieval Societies Conference, AIRS 2012
Y2 - 17 December 2012 through 19 December 2012
ER -