• Title/Summary/Keyword: Spectrogram

Search Result 234, Processing Time 0.03 seconds

Deep Learning Music Genre Classification System Model Improvement Using Generative Adversarial Networks (GAN) (생성적 적대 신경망(GAN)을 이용한 딥러닝 음악 장르 분류 시스템 모델 개선)

  • Bae, Jun
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.24 no.7
    • /
    • pp.842-848
    • /
    • 2020
  • Music markets have entered the era of streaming. In order to select and propose music that suits the taste of music consumers, there is an active demand and research on an automatic music genre classification system. We propose a method to improve the accuracy of genre unclassified songs, which was a lack of the previous system, by using a generative adversarial network (GAN) to further develop the automatic voting system for deep learning music genre using Softmax proposed in the previous paper. In the previous study, if the spectrogram of the song was ambiguous to grasp the genre of the song, it was forced to leave it as an unclassified song. In this paper, we proposed a system that increases the accuracy of genre classification of unclassified songs by converting the spectrogram of unclassified songs into an easy-to-read spectrogram using GAN. And the result of the experiment was able to derive an excellent result compared to the existing method.

Environmental Sound Classification for Selective Noise Cancellation in Industrial Sites (산업현장에서의 선택적 소음 제거를 위한 환경 사운드 분류 기술)

  • Choi, Hyunkook;Kim, Sangmin;Park, Hochong
    • Journal of Broadcast Engineering
    • /
    • v.25 no.6
    • /
    • pp.845-853
    • /
    • 2020
  • In this paper, we propose a method for classifying environmental sound for selective noise cancellation in industrial sites. Noise in industrial sites causes hearing loss in workers, and researches on noise cancellation have been widely conducted. However, the conventional methods have a problem of blocking all sounds and cannot provide the optimal operation per noise type because of common cancellation method for all types of noise. In order to perform selective noise cancellation, therefore, we propose a method for environmental sound classification based on deep learning. The proposed method uses new sets of acoustic features consisting of temporal and statistical properties of Mel-spectrogram, which can overcome the limitation of Mel-spectrogram features, and uses convolutional neural network as a classifier. We apply the proposed method to five-class sound classification with three noise classes and two non-noise classes. We confirm that the proposed method provides improved classification accuracy by 6.6% point, compared with that using conventional Mel-spectrogram features.

Multi-Emotion Recognition Model with Text and Speech Ensemble (텍스트와 음성의 앙상블을 통한 다중 감정인식 모델)

  • Yi, Moung Ho;Lim, Myoung Jin;Shin, Ju Hyun
    • Smart Media Journal
    • /
    • v.11 no.8
    • /
    • pp.65-72
    • /
    • 2022
  • Due to COVID-19, the importance of non-face-to-face counseling is increasing as the face-to-face counseling method has progressed to non-face-to-face counseling. The advantage of non-face-to-face counseling is that it can be consulted online anytime, anywhere and is safe from COVID-19. However, it is difficult to understand the client's mind because it is difficult to communicate with non-verbal expressions. Therefore, it is important to recognize emotions by accurately analyzing text and voice in order to understand the client's mind well during non-face-to-face counseling. Therefore, in this paper, text data is vectorized using FastText after separating consonants, and voice data is vectorized by extracting features using Log Mel Spectrogram and MFCC respectively. We propose a multi-emotion recognition model that recognizes five emotions using vectorized data using an LSTM model. Multi-emotion recognition is calculated using RMSE. As a result of the experiment, the RMSE of the proposed model was 0.2174, which was the lowest error compared to the model using text and voice data, respectively.

A General Acoustic Drone Detection Using Noise Reduction Preprocessing (환경 소음 제거를 통한 범용적인 드론 음향 탐지 구현)

  • Kang, Hae Young;Lee, Kyung-ho
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.32 no.5
    • /
    • pp.881-890
    • /
    • 2022
  • As individual and group users actively use drones, the risks (Intrusion, Information leakage, and Sircraft crashes and so on) in no-fly zones are also increasing. Therefore, it is necessary to build a system that can detect drones intruding into the no-fly zone. General acoustic drone detection researches do not derive location-independent performance by directly learning drone sound including environmental noise in a deep learning model to overcome environmental noise. In this paper, we propose a drone detection system that collects sounds including environmental noise, and detects drones by removing noise from target sound. After removing environmental noise from the collected sound, the proposed system predicts the drone sound using Mel spectrogram and CNN deep learning. As a result, It is confirmed that the drone detection performance, which was weak due to unstudied environmental noises, can be improved by more than 7%.

A Novel Approach to COVID-19 Diagnosis Based on Mel Spectrogram Features and Artificial Intelligence Techniques

  • Alfaidi, Aseel;Alshahrani, Abdullah;Aljohani, Maha
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.9
    • /
    • pp.195-207
    • /
    • 2022
  • COVID-19 has remained one of the most serious health crises in recent history, resulting in the tragic loss of lives and significant economic impacts on the entire world. The difficulty of controlling COVID-19 poses a threat to the global health sector. Considering that Artificial Intelligence (AI) has contributed to improving research methods and solving problems facing diverse fields of study, AI algorithms have also proven effective in disease detection and early diagnosis. Specifically, acoustic features offer a promising prospect for the early detection of respiratory diseases. Motivated by these observations, this study conceptualized a speech-based diagnostic model to aid in COVID-19 diagnosis. The proposed methodology uses speech signals from confirmed positive and negative cases of COVID-19 to extract features through the pre-trained Visual Geometry Group (VGG-16) model based on Mel spectrogram images. This is used in addition to the K-means algorithm that determines effective features, followed by a Genetic Algorithm-Support Vector Machine (GA-SVM) classifier to classify cases. The experimental findings indicate the proposed methodology's capability to classify COVID-19 and NOT COVID-19 of varying ages and speaking different languages, as demonstrated in the simulations. The proposed methodology depends on deep features, followed by the dimension reduction technique for features to detect COVID-19. As a result, it produces better and more consistent performance than handcrafted features used in previous studies.

Recognition of Overlapped Sound and Influence Analysis Based on Wideband Spectrogram and Deep Neural Networks (광역 스펙트로그램과 심층신경망에 기반한 중첩된 소리의 인식과 영향 분석)

  • Kim, Young Eon;Park, Gooman
    • Journal of Broadcast Engineering
    • /
    • v.23 no.3
    • /
    • pp.421-430
    • /
    • 2018
  • Many voice recognition systems use methods such as MFCC, HMM to acknowledge human voice. This recognition method is designed to analyze only a targeted sound which normally appears between a human and a device one. However, the recognition capability is limited when there is a group sound formed with diversity in wider frequency range such as dog barking and indoor sounds. The frequency of overlapped sound resides in a wide range, up to 20KHz, which is higher than a voice. This paper proposes the new recognition method which provides wider frequency range by conjugating the Wideband Sound Spectrogram and the Keras Sequential Model based on DNN. The wideband sound spectrogram is adopted to analyze and verify diverse sounds from wide frequency range as it is designed to extract features and also classify as explained. The KSM is employed for the pattern recognition using extracted features from the WSS to improve sound recognition quality. The experiment verified that the proposed WSS and KSM excellently classified the targeted sound among noisy environment; overlapped sounds such as dog barking and indoor sounds. Furthermore, the paper shows a stage by stage analyzation and comparison of the factors' influences on the recognition and its characteristics according to various levels of noise.

Study on Discrimination between Natural Earthquakes and Man-made Explosions using Wonju KSRS Data (원주 KSRS 자료를 이용한 자연지진과 인공지진 구별에 관한 연구)

  • Kang, Ik-Bum;Kim, Sung-Bae;Suh, Man-Cheol;Jun, Myung-Soon
    • Journal of the Korean Geophysical Society
    • /
    • v.3 no.1
    • /
    • pp.25-36
    • /
    • 2000
  • 3-D Spectrograms for 22 events are drawn to discern about whether those are earthquakes or explosions. Generally, in case of explosions relative to the case of earthquakes, amplitude of P phase is more dominantly shown. According to the results on logarithm of spectral ratio of P (Pn, Pg)/Lg after removing free-surface effects from 3-D (U-D, N-S, E-W) seismogram, $-1.2{\sim}-0.9$ is shown for earthquakes and $-0.7{\sim}-0.1$ if shown for explosions. This result is consistent with previous researches (Kim Park, 1997) that -0.6 of spectral ratio between P and Lg after taking logarithm may be the criterion for the discrimination between earthquakes and explosions in Korea. In addition, Complexity is applied to two events as another discrimination method. The value of Complexity of explosion is much smaller than that of earthquake. This may be due to well-developed P-wave in explosion compared to that in earthquake. This result is in accordance with that of 3-D Spectrogram.

  • PDF

Open and Short Circuit Switches Fault Detection of Voltage Source Inverter Using Spectrogram

  • Ahmad, N.S.;Abdullah, A.R.;Bahari, N.
    • Journal of international Conference on Electrical Machines and Systems
    • /
    • v.3 no.2
    • /
    • pp.190-199
    • /
    • 2014
  • In the last years, fault problem in power electronics has been more and more investigated both from theoretical and practical point of view. The fault problem can cause equipment failure, data and economical losses. And the analyze system require to ensure fault problem and also rectify failures. The current errors on these faults are applied for identified type of faults. This paper presents technique to detection and identification faults in three-phase voltage source inverter (VSI) by using time-frequency distribution (TFD). TFD capable represent time frequency representation (TFR) in temporal and spectral information. Based on TFR, signal parameters are calculated such as instantaneous average current, instantaneous root mean square current, instantaneous fundamental root mean square current and, instantaneous total current waveform distortion. From on results, the detection of VSI faults could be determined based on characteristic of parameter estimation. And also concluded that the fault detection is capable of identifying the type of inverter fault and can reduce cost maintenance.

The Effect of Helium Gas Intake on the Characteristics Change of the Acoustic Organs for Voice Signal Analysis Parameter Application (음성신호 분석 요소의 적용으로 헬륨가스 흡입이 음성 기관의 특성 변화에 미치는 영향)

  • Kim, Bong-Hyun;Cho, Dong-Uk
    • The KIPS Transactions:PartB
    • /
    • v.18B no.6
    • /
    • pp.397-404
    • /
    • 2011
  • In this paper, we were carried out experiments to apply parameter of voice analysis to measure changing characteristic articulator according to inhale the helium gas. The helium gas was used to overcome air embolism nitrogen gas to deal a fatal blow in body nitrogen gas by diver. However, the helium gas has been much trouble interpretation about abnormal voice of diver to cause squeaky voice of low articulation. Therefor, we was carried out experiments about pitch and spectrogram measurement, analysis based on to influence in acoustic organs before and after of inhaled helium gas.

A Study on Speaker Identification Parameter Using Difference and Correlation Coeffieicent of Digit_sound Spectrum (숫자음의 스펙트럼 차이값과 상관계수를 이용한 화자인증 파라미터 연구)

  • Lee, Hoo-Dong;Kang, Sun-Mee;Chang, Moon-Soo;Yang, Byung-Gon
    • Speech Sciences
    • /
    • v.11 no.3
    • /
    • pp.131-142
    • /
    • 2004
  • Speaker identification system basically functions by comparing spectral energy of an individual production model with that of an input signal. This study aimed to develop a new speaker identification system from two parameters from the spectral energy of numeric sounds: difference sum and correlation coefficient. A narrow-band spectrogram yielded more stable spectral energy across time than a wide-band one. In this paper, we collected empirical data from four male speakers and tested the speaker identification system. The subjects produced 18 combinations of three-digit numeric. sounds !en times each. Five productions of each three-digit number were statistically averaged to make a model for each speaker. Then, the remaining five productions were tested on the system. Results showed that when the threshold for the absolute difference sum was set to 1200, all the speakers could not pass the system while everybody could pass if set to 2800. The minimum correlation coefficient to allow all to pass was 0.82 while the coefficient of 0.95 rejected all. Thus, both threshold levels can be adjusted to the need of speaker identification system, which is desirable for further study.

  • PDF