
Chronological Ordering of the Audio Data using 2D Spectrograms

Authors: Alfimtsev A.N., Nazarova S.I.
Published: 17.06.2015
Published in issue: #3(102)/2015  
DOI: 10.18698/0236-3933-2015-3-127-139

 
Category: Instrument Engineering, Metrology, Information-Measuring Instruments and Systems | Chapter: Instrumentation and Methods to Transform Images and Sound  
Keywords: pattern recognition, two-dimensional spectrogram, feature set, similarity matrix, chronological ordering

The paper introduces an automatic quantitative method for analyzing speech fragments and ordering them chronologically. Each audio fragment is first represented as a two-dimensional spectrogram. A large set of 1025 numerical descriptors, extracted from both the raw spectrograms and their transforms, is then analyzed. The similarity between two audio fragments is computed using a variation of the Weighted K-Nearest Neighbor scheme, and a similarity tree is built to visualize the differences between fragments. Speech fragments of well-known politicians were used for the study. The proposed method proved effective for the chronological ordering of the audio fragments. It may open new ways of developing software systems for the automated processing of audio archives and the analysis of speech characteristics.
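The pipeline described in the abstract (spectrogram, descriptors, pairwise similarity) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the four toy descriptors, the synthetic test tones and the similarity formula 1 / (1 + d) are all assumptions made for illustration, and a plain weighted Euclidean distance stands in for the paper's variation of the Weighted K-Nearest Neighbor scheme.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram (frequency x time) from a Hann-windowed short-time FFT."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T

def descriptors(spec):
    """Four toy descriptors (assumed); the paper uses 1025 that are not listed here."""
    log_spec = np.log1p(spec)
    return np.array([
        log_spec.mean(),                 # overall energy level
        log_spec.std(),                  # spread of energy
        log_spec.max(),                  # peak energy
        log_spec.mean(axis=1).argmax(),  # dominant frequency bin
    ], dtype=float)

def similarity_matrix(features, weights=None):
    """Pairwise similarity 1 / (1 + d), d being a weighted Euclidean distance.

    This is a simplified stand-in, not the scheme used in the paper.
    """
    features = np.asarray(features, dtype=float)
    lo, hi = features.min(axis=0), features.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    scaled = (features - lo) / span         # each descriptor scaled to [0, 1]
    if weights is None:
        weights = np.ones(scaled.shape[1])
    diff = scaled[:, None, :] - scaled[None, :, :]
    dist = np.sqrt((weights * diff ** 2).sum(axis=-1))
    return 1.0 / (1.0 + dist)

# Synthetic one-second tones (assumed data) sampled at 8 kHz.
rate = 8000
t = np.arange(rate) / rate
fragments = [np.sin(2 * np.pi * f * t) for f in (250.0, 500.0, 1000.0)]

specs = [spectrogram(x) for x in fragments]
feats = [descriptors(s) for s in specs]
sim = similarity_matrix(feats)
print(np.round(sim, 3))
```

In a fuller treatment, the uniform weights would be replaced by per-descriptor weights reflecting how informative each descriptor is, and the resulting similarity matrix would feed the construction of the similarity tree.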
