Analysis of the influence of sound signal processing parameters on the quality of voice command recognition
Keywords: speech recognition, voice commands, mel-cepstral coefficients, dynamic time warping
Abstract
Introduction. Voice control of devices requires recognition of single (isolated) voice commands. This control method typically demands high reliability (at least 95% recognition accuracy). Voice commands are also often spoken in noisy conditions. None of the currently known speech recognition methods and algorithms makes it clear which sound signal parameters give the best results.
Main part. The first stage of voice recognition is preprocessing and extraction of acoustic features. These features have several useful properties: they are easy to compute, they give a compact representation of a voice command, and they are resistant to noise interference. At the next stage, the given command is looked up in a reference dictionary. To obtain the mel-frequency cepstral coefficients (MFCC), the input signal is divided into frames. Each frame is weighted by a window function and processed by the discrete Fourier transform. The resulting frequency-domain representation of the signal is divided into bands by a set of triangular filters. The last step is a discrete cosine transform. The dynamic time warping (DTW) method yields a value that is inversely related to the degree of similarity between the given command and a reference: the smaller the value, the closer the match.
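The two stages described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the number of filters (20) and of cepstral coefficients (12), the 50% frame overlap, and the Euclidean frame distance are all assumptions made for illustration. The logarithm of the filter energies is the usual MFCC step before the cosine transform, although the abstract does not spell it out. The default frame length of 64 ms (512 samples at 8 kHz) is chosen only so that each frame fits a 512-point transform exactly.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    mel_points = np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):      # rising slope
            fbank[i - 1, k] = (k - left) / (centre - left)
        for k in range(centre, right):     # falling slope
            fbank[i - 1, k] = (right - k) / (right - centre)
    return fbank

def mfcc(signal, sample_rate=8000, frame_ms=64, hop_ms=32,
         n_fft=512, n_filters=20, n_ceps=12):
    """Return an array of shape (frames, n_ceps). All defaults are assumptions."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    window = np.hamming(frame_len)
    fbank = mel_filterbank(n_filters, n_fft, sample_rate)
    # DCT-II basis (unnormalised), built by hand to keep the sketch NumPy-only.
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window   # window weighting
        power = np.abs(np.fft.rfft(frame, n_fft)) ** 2     # DFT power spectrum
        energies = np.log(fbank @ power + 1e-10)           # log filter energies
        feats.append(dct @ energies)                       # cosine transform
    return np.array(feats)

def dtw_distance(a, b):
    """Accumulated cost of the best alignment; smaller means more similar."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])        # Euclidean frame distance
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]
```

Under these assumptions, a command would be recognized by computing `mfcc` for the input, calling `dtw_distance` against the features of every reference in the dictionary, and picking the reference with the smallest value.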
Conclusions. The research shows that, for voice command recognition, the best trade-off between quality and performance is achieved with the following sound signal processing parameters: 8 kHz sample rate, frame duration of 70–120 ms, a Hamming window function, and 512 Fourier transform points.
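To make these parameters concrete, the short sketch below converts them into sample counts and frequency resolution. The constant and function names are illustrative only. At 8 kHz the recommended frames are 560 to 960 samples long, which is more than 512; the abstract does not state how such frames are mapped onto a 512-point transform, so the sketch only reports the numbers.

```python
SAMPLE_RATE_HZ = 8000   # sample rate from the conclusions
N_FFT = 512             # number of Fourier transform points from the conclusions

def frame_samples(duration_ms, sample_rate_hz=SAMPLE_RATE_HZ):
    """Number of samples in a frame of the given duration."""
    return sample_rate_hz * duration_ms // 1000

shortest_frame = frame_samples(70)       # 560 samples
longest_frame = frame_samples(120)       # 960 samples
bin_width_hz = SAMPLE_RATE_HZ / N_FFT    # 15.625 Hz per spectral bin
nyquist_hz = SAMPLE_RATE_HZ / 2          # 4000.0 Hz highest representable frequency
```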