Abstract
This paper discusses the machine learning models and methods used in environmental sound classification systems. A sound classification subsystem could be deployed in various Smart City and Smart Farming systems, in healthcare, and in other domains. An automated sound classification system can be decomposed into the following subsystems: audio representation, feature extraction, the classifier, and accuracy estimation. This paper deals with a convolutional neural network with an attention mechanism for sound classification and an autoencoder for data augmentation. The objective of the paper is to develop an optimal sound classifier for datasets with few observations.
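The four-subsystem decomposition named in the abstract can be illustrated with a minimal sketch. This is not the paper's method: a nearest-centroid classifier stands in for the convolutional network with attention, the autoencoder-based augmentation is omitted, and synthetic noisy tones stand in for real environmental recordings. All function names, class names, and parameter values below are hypothetical and chosen only for illustration.

```python
import numpy as np


def log_spectrogram(signal, frame_len=256, hop=128):
    """Audio representation: log-magnitude short-time spectrum, shape (frames, bins)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([signal[i * hop: i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.log1p(np.abs(np.fft.rfft(frames, axis=1)))


def extract_features(spec):
    """Feature extraction: per-bin mean and standard deviation over time."""
    return np.concatenate([spec.mean(axis=0), spec.std(axis=0)])


class NearestCentroid:
    """Classifier stand-in (assumption): predicts the label of the closest class centroid."""

    def fit(self, X, y):
        self.labels_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.labels_])
        return self

    def predict(self, X):
        dists = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.labels_[dists.argmin(axis=1)]


def accuracy(y_true, y_pred):
    """Accuracy estimation: fraction of correctly classified clips."""
    return float(np.mean(y_true == y_pred))


def make_tone(freq, rng, sample_rate=8000, duration=0.5):
    """Synthetic placeholder clip: a sine tone with additive Gaussian noise."""
    t = np.arange(int(sample_rate * duration)) / sample_rate
    return np.sin(2 * np.pi * freq * t) + 0.1 * rng.standard_normal(t.size)


def make_dataset(n_per_class, rng):
    """Builds a small two-class feature matrix, mimicking a dataset with few observations."""
    X, y = [], []
    for label, freq in ((0, 440.0), (1, 1760.0)):
        for _ in range(n_per_class):
            X.append(extract_features(log_spectrogram(make_tone(freq, rng))))
            y.append(label)
    return np.array(X), np.array(y)


rng = np.random.default_rng(0)
X_train, y_train = make_dataset(5, rng)
X_test, y_test = make_dataset(5, rng)

model = NearestCentroid().fit(X_train, y_train)
test_accuracy = accuracy(y_test, model.predict(X_test))
print(f"test accuracy: {test_accuracy:.2f}")
```

Keeping each subsystem behind its own function makes it possible to replace one stage, for example swapping the stand-in classifier for a neural network, without touching the others.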