Analysis of the influence of sound signal processing parameters on the quality voice command recognition

Authors

DOI:

https://doi.org/10.20535/RADAP.2014.56.34-41

Keywords:

speech recognition, voice commands, mel-frequency cepstral coefficients, dynamic time warping

Abstract

Introduction. Voice control of devices requires recognition of single (isolated) voice commands. This control method typically demands high reliability (recognition accuracy of at least 95%). Voice commands are also often spoken in noisy environments. None of the currently known speech recognition methods and algorithms makes it possible to determine unambiguously which sound signal parameters give the best results.
The main part. At the first level, voice recognition involves preprocessing and extraction of acoustic features that have a number of useful properties: they are easy to compute and provide a compact, noise-resistant representation of the voice commands. At the next level, the given command is looked up in the reference dictionary. To obtain mel-frequency cepstral coefficients (MFCC), the input file is divided into frames. Each frame is weighted by a window function and processed by the discrete Fourier transform. The resulting frequency-domain representation of the signal is divided into bands using a set of triangular filters. The last step is a discrete cosine transform. The dynamic time warping (DTW) method yields a value that is inversely related to the degree of similarity between the given command and a reference.
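The pipeline described above (framing, windowing, Fourier transform, triangular mel filters, cosine transform, then DTW matching) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the 20 filters, the 12 coefficients, the non-overlapping frames, and the zero-padding or truncation of a frame to `n_fft` samples are all choices made for the example and are not taken from the paper. It uses only the Python standard library, so the naive DFT is slow for long signals.

```python
import cmath
import math


def hamming(n):
    """Hamming window of length n (n > 1)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def power_spectrum(frame, n_fft):
    """Naive DFT power spectrum, bins 0..n_fft/2.

    Assumption: the frame is zero-padded or truncated to n_fft samples.
    """
    x = (list(frame) + [0.0] * n_fft)[:n_fft]
    spec = []
    for k in range(n_fft // 2 + 1):
        s = sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft))
        spec.append(abs(s) ** 2)
    return spec


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    edges_hz = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * f / sample_rate)) for f in edges_hz]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):   # rising slope
            row[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):   # falling slope
            row[k] = (hi - k) / (hi - mid)
        bank.append(row)
    return bank


def mfcc(frame, sample_rate=8000, n_fft=512, n_filters=20, n_coeffs=12):
    """MFCC vector for one frame: window -> DFT -> mel filters -> log -> DCT-II."""
    window = hamming(len(frame))
    spec = power_spectrum([s * w for s, w in zip(frame, window)], n_fft)
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    # Floor the energies so that an empty filter does not produce log(0).
    log_e = [math.log(max(sum(h * p for h, p in zip(row, spec)), 1e-10)) for row in bank]
    return [
        sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters) for m, e in enumerate(log_e))
        for c in range(1, n_coeffs + 1)
    ]


def features(signal, frame_len, **kwargs):
    """Split a signal into non-overlapping frames (an assumption) and compute MFCCs."""
    return [
        mfcc(signal[i:i + frame_len], **kwargs)
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def dtw_distance(a, b):
    """Classic DTW cost between two feature sequences; lower means more similar."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]
```

To recognise a command with this sketch, one would compute `features(...)` for the spoken command and for every dictionary entry, then pick the entry with the smallest `dtw_distance`. Identical sequences give a distance of zero, which matches the abstract's description of the DTW value as the inverse of similarity.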
Conclusions. The research shows that, for voice command recognition, the best trade-off between quality and performance is achieved with the following sound signal processing parameters: 8 kHz sample rate, frame duration of 70–120 ms, a Hamming window function, and 512 Fourier samples.
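As a quick check of what these parameters imply, the short sketch below converts them into samples per frame and frequency resolution. The helper names are hypothetical and exist only for this illustration. Note that a 70–120 ms frame at 8 kHz holds more than 512 samples; the abstract does not say how such frames are mapped onto 512 Fourier samples, so no assumption about that is made here.

```python
SAMPLE_RATE_HZ = 8000   # from the conclusions
N_FFT = 512             # number of Fourier samples, from the conclusions


def samples_per_frame(duration_ms, sample_rate_hz=SAMPLE_RATE_HZ):
    """Number of samples in a frame of the given duration."""
    return round(sample_rate_hz * duration_ms / 1000)


def bin_spacing_hz(n_fft=N_FFT, sample_rate_hz=SAMPLE_RATE_HZ):
    """Spacing between adjacent DFT bins."""
    return sample_rate_hz / n_fft


print(samples_per_frame(70), samples_per_frame(120))  # 560 960
print(bin_spacing_hz())                               # 15.625
print(SAMPLE_RATE_HZ / 2)                             # 4000.0 (Nyquist frequency)
```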

Author Biographies

L. P. Dyuzhayev, National Technical University of Ukraine, Kyiv Polytechnic Institute, Kiev

Cand. of Sci. (Techn), Assoc. Prof.

V. Yu. Koval, LLC «Central Industrial Group», Kiev

Коваль В.Ю.

Published

2014-04-03

How to Cite

Dyuzhayev, L. P. and Koval, V. (2014) “Analysis of the influence of sound signal processing parameters on the quality voice command recognition”, Visnyk NTUU KPI Seriia - Radiotekhnika Radioaparatobuduvannia, 0(56), pp. 34-41. doi: 10.20535/RADAP.2014.56.34-41.

Issue

Section

Computing methods in radio electronics