Algorithm for Processing Audio Signals Using Machine Learning




drone, small unmanned aerial vehicle, spectrum, signal processing, signal detection, convolutional neural networks, deep learning


Small unmanned aerial vehicles (UAVs) are developing rapidly and are being adopted across many industries to make people’s lives easier. Their use also carries risks, such as unauthorized surveillance of critical infrastructure and the delivery of explosive devices, which pose a significant threat to public and national security. The acoustic method is a promising direction for addressing this problem: it analyzes the sound characteristics and Doppler shift signatures of UAVs using microphone arrays and machine learning techniques. The aim of this article is to develop an algorithm for effective detection and classification of drone audio signals using a deep learning convolutional neural network (CNN), to construct its architecture, and to evaluate its performance. Before the drone audio dataset is fed into the neural network, the quality of the audio recordings is improved through normalization, Wiener filtering, and segmentation. The audio is segmented into frames with a duration of 25 ms and a 50% overlap, and a Hamming window is applied to each frame for better accuracy in the time domain, as temporal precision is crucial in audio signal processing. The resulting data are divided into training, validation, and test sets in a 60/20/20 ratio. Next, the data are represented by a simplified set of features: mel-spectrograms are extracted from each frame of the processed audio signals to capture their temporal and spectral characteristics. The frequency range of analysis corresponds to the operating frequency range of the microphone model (20 Hz to 20 kHz), with a frequency resolution of 50 Hz and 30 mel frequency bands. Using the training data and the extracted audio features, a neural network architecture is developed to investigate the performance of the drone detection and classification algorithm. It consists of 10 pairs of convolutional layers, ReLU activation, batch normalization, and max-pooling layers.
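The segmentation step described above (25 ms frames, 50% overlap, Hamming window) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 48 kHz sampling rate, the function name frame_signal, and the random placeholder signal are all assumptions, since the abstract does not state them.

```python
import numpy as np

def frame_signal(x, sample_rate, frame_ms=25.0, overlap=0.5):
    """Split a 1-D signal into overlapping, Hamming-windowed frames."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))  # samples per frame
    hop = int(round(frame_len * (1.0 - overlap)))            # step between frame starts
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    # Index matrix: row i selects samples [i * hop, i * hop + frame_len)
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    # Multiply every frame by the same Hamming window
    return x[idx] * np.hamming(frame_len)

sample_rate = 48_000  # assumed; the source does not give the sampling rate
signal = np.random.default_rng(0).standard_normal(sample_rate)  # 1 s placeholder
frames = frame_signal(signal, sample_rate)
print(frames.shape)
```

Each row of the result is one windowed frame, which would then be passed on to the feature extraction stage. Trailing samples that do not fill a whole frame are dropped in this sketch; padding the signal instead is an equally plausible choice.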
The number of these convolutional and pooling layers is determined by the size of the pooling window along the time dimension. They are followed by flattening, dropout, fully connected, and Softmax layers. A classification layer is applied to normalize the output data and obtain the final probabilities. The Adam optimizer is chosen for model training. Based on the dataset, the initial learning rate is set to 0.001 and is gradually decreased by a factor of 10 after 75% of the epochs to improve convergence. The accuracy of the input data recognition reaches 99%, and the F1 score of the trained model is 0.93, indicating a high level of overall architecture performance. The maximum distance at which the algorithm effectively detects drones is 200 m.
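The learning-rate schedule mentioned above can be illustrated with a short sketch. It assumes the simplest reading, a single step drop once 75% of the epochs have elapsed; the abstract describes the decrease as gradual, so the actual schedule may differ. The function name learning_rate and the total of 40 epochs are hypothetical.

```python
def learning_rate(epoch, total_epochs, initial_lr=1e-3, drop_factor=10.0, drop_at=0.75):
    """Return the learning rate for a zero-based epoch index.

    Assumption: one step drop by `drop_factor` after `drop_at` of all epochs.
    """
    drop_epoch = int(drop_at * total_epochs)  # first epoch that uses the reduced rate
    if epoch >= drop_epoch:
        return initial_lr / drop_factor
    return initial_lr

total_epochs = 40  # hypothetical; the source does not state the number of epochs
schedule = [learning_rate(e, total_epochs) for e in range(total_epochs)]
print(schedule[29], schedule[30])
```

In a training loop, the returned value would be assigned to the optimizer at the start of each epoch; most deep learning frameworks also provide built-in step schedulers that serve the same purpose.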

Author Biographies

S. O. Sokolskyi, National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Kyiv, Ukraine

Postgraduate Student

A. V. Movchanyuk, National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Kyiv, Ukraine

Candidate of Technical Sciences, Associate Professor



How to Cite

Sokolskyi, S. O. and Movchanyuk, A. V. (2023) “Algorithm for Processing Audio Signals Using Machine Learning”, Visnyk NTUU KPI Seriia - Radiotekhnika Radioaparatobuduvannia, (93), pp. 39-51. doi: 10.20535/RADAP.2023.93.39-51.



Telecommunication, navigation, radar systems, radiooptics and electroacoustics