• Title/Summary/Keyword: Mel-Frequency Cepstral Coefficients (MFCC)

Classification of Phornographic Videos Using Audio Information (오디오 신호를 이용한 음란 동영상 판별)

  • Kim, Bong-Wan;Choi, Dae-Lim;Bang, Man-Won;Lee, Yong-Ju
    • Proceedings of the KSPS conference / 2007.05a / pp.207-210 / 2007
  • As the Internet becomes pervasive in daily life, harmful content online has grown into a serious problem, and pornographic video is particularly harmful to children. Many existing filtering systems rely on keyword-based or image-based methods. The main purpose of this paper is to devise a system that classifies pornographic videos based on audio information. As feature vectors we use MFCC together with Mel-Cepstrum Modulation Energy (MCME), the modulation energy calculated on the time trajectory of the Mel-frequency cepstral coefficients (MFCC), and as the classifier we use a Gaussian Mixture Model (GMM). In experiments, the proposed system correctly classified 97.5% of the pornographic data and 99.5% of the non-pornographic data. We expect the proposed method can serve as a component of a more accurate classification system that uses video and audio information simultaneously.
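
The abstract does not give the MCME formula, so the following is only a rough sketch of the general idea (energy of one cepstral coefficient's time trajectory inside a modulation-frequency band), not the authors' definition. The function name `modulation_energy`, the frame rate, and the band edges are assumptions chosen for illustration.

```python
import cmath
import math


def modulation_energy(trajectory, frame_rate_hz, band_hz):
    """Energy of one cepstral coefficient's time trajectory within a
    modulation-frequency band, using a plain DFT (assumed formulation)."""
    n = len(trajectory)
    mean = sum(trajectory) / n
    centered = [v - mean for v in trajectory]  # drop the DC component
    low, high = band_hz
    energy = 0.0
    for k in range(1, n // 2 + 1):
        freq = k * frame_rate_hz / n  # modulation frequency of DFT bin k
        if low <= freq <= high:
            coeff = sum(
                centered[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n)
            )
            energy += abs(coeff) ** 2 / n
    return energy


# Hypothetical input: 100 frames at 100 frames/s of a single MFCC
# coefficient that oscillates at 4 Hz.
frame_rate = 100.0
traj = [math.sin(2 * math.pi * 4.0 * t / frame_rate) for t in range(100)]

in_band = modulation_energy(traj, frame_rate, (2.0, 8.0))
out_of_band = modulation_energy(traj, frame_rate, (20.0, 40.0))
print(in_band, out_of_band)
```

In this toy case nearly all of the energy falls in the 2 to 8 Hz band, because the trajectory oscillates at 4 Hz. A real system would compute such energies per coefficient over sliding windows and append them to the MFCC vector.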

Classification of Phornographic Video with using the Features of Multiple Audio (다중 오디오 특징을 이용한 유해 동영상의 판별)

  • Kim, Jung-Soo;Chung, Myung-Bum;Sung, Bo-Kyung;Kwon, Jin-Man;Koo, Kwang-Hyo;Ko, Il-Ju
    • Proceedings of the HCI Society of Korea Conference (한국HCI학회:학술대회논문집) / 2009.02a / pp.522-525 / 2009
  • This paper proposes a content-based method for classifying pornographic video, a serious social problem and an adverse side effect of the Internet. Audio data is used to extract features from the video: the frequency spectrum, autocorrelation, and MFCC. Sounds that may indicate harmful content are extracted, and a video is classified as pornographic by measuring what percentage of its whole audio track consists of such sounds. In the experiments, classification performance was measured for each feature individually and compared with the result of using multiple features together. Using multiple features gave better results than using any single audio feature alone.

Proposed Efficient Architectures and Design Choices in SoPC System for Speech Recognition

  • Trang, Hoang;Hoang, Tran Van
    • Journal of IKEEE / v.17 no.3 / pp.241-247 / 2013
  • This paper presents the design of a System on Programmable Chip (SoPC) based on a Field Programmable Gate Array (FPGA) for speech recognition, using Mel-Frequency Cepstral Coefficients (MFCC) for speech feature extraction and Vector Quantization (VQ) for recognition. The speech recognition system is implemented in three steps: feature extraction, codebook training, and recognition. In the feature extraction step, the input voice data is transformed into spectral components, from which the main features are extracted using the MFCC algorithm. In the recognition step, the spectral features obtained in the first step are processed and compared with the trained components using VQ. In our experiment, Altera's DE2 board with a Cyclone II FPGA is used to implement the recognition system, which can recognize 64 words. The execution speed of each block in the system is surveyed by counting the clock cycles it takes to execute. Recognition accuracy is also measured under different system parameters. These execution-speed and accuracy results can help designers choose the best configuration for speech recognition on SoPC.
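
The recognition step described above can be sketched in software as follows. This is a minimal illustration of VQ-based matching, assuming one codebook per word and Euclidean distance; it is not the paper's FPGA design. The names `distortion` and `recognize` and the toy 2-D codebooks are hypothetical, and codebook training is omitted.

```python
import math


def distortion(features, codebook):
    """Average distance from each feature vector to its nearest codeword."""
    total = 0.0
    for vec in features:
        total += min(math.dist(vec, code) for code in codebook)
    return total / len(features)


def recognize(features, codebooks):
    """Return the word whose codebook gives the lowest average distortion."""
    return min(codebooks, key=lambda word: distortion(features, codebooks[word]))


# Hypothetical 2-D stand-ins for MFCC codebooks of two words; real
# codebooks would come from the training step.
codebooks = {
    "yes": [(0.0, 0.0), (1.0, 1.0)],
    "no": [(5.0, 5.0), (6.0, 4.0)],
}
utterance = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8)]
print(recognize(utterance, codebooks))
```

The cost of this step grows with the number of words times the codebook size, which is one reason per-block cycle counts matter when choosing a configuration.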

Speaker Verification with the Constraint of Limited Data

  • Kumari, Thyamagondlu Renukamurthy Jayanthi;Jayanna, Haradagere Siddaramaiah
    • Journal of Information Processing Systems / v.14 no.4 / pp.807-823 / 2018
  • The performance of a speaker verification system depends on the utterance of each speaker, from which the important information must be captured. Under the constraint of limited data, where the training and testing data last only a few seconds, speaker verification becomes a challenging task. The feature vectors extracted by single frame size and rate (SFSR) analysis are not sufficient for training and testing speakers. This leads to poor speaker modeling during training and may not yield good decisions during testing. The problem can be addressed by increasing the number of feature vectors obtained from training and testing data of the same duration. To that end we use multiple frame size (MFS), multiple frame rate (MFR), and multiple frame size and rate (MFSR) analysis techniques for speaker verification under limited-data conditions. These techniques extract relatively more feature vectors during training and testing, which improves modeling and testing with limited data. To demonstrate this, we use mel-frequency cepstral coefficients (MFCC) and linear prediction cepstral coefficients (LPCC) as features. A Gaussian mixture model (GMM) and a GMM-universal background model (GMM-UBM) are used for modeling the speaker. The database used is NIST-2003. The experimental results indicate that MFS, MFR, and MFSR analysis perform markedly better than SFSR analysis, and that LPCC-based MFSR analysis performs better than the other analysis and feature extraction techniques.
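
To see why varying the frame size and rate yields more feature vectors from the same utterance, the frame counts can be worked out directly. The sketch below assumes that frames from every (size, shift) combination are simply pooled. The sampling rate, duration, and specific sizes and shifts are illustrative assumptions, not the paper's settings.

```python
def frame_starts(num_samples, frame_size, frame_shift):
    """Start indices of all full frames of `frame_size` samples."""
    return list(range(0, num_samples - frame_size + 1, frame_shift))


def count_frames(num_samples, sizes, shifts):
    """Total frames when every (size, shift) pair is analysed and pooled."""
    return sum(
        len(frame_starts(num_samples, size, shift))
        for size in sizes
        for shift in shifts
    )


# Assumed settings: 3 s of speech at 8 kHz. SFSR uses 20 ms frames with a
# 10 ms shift; the extra sizes and shifts are illustrative only.
n = 3 * 8000
sfsr = count_frames(n, sizes=[160], shifts=[80])
mfs = count_frames(n, sizes=[80, 160, 240], shifts=[80])
mfr = count_frames(n, sizes=[160], shifts=[20, 40, 80])
mfsr = count_frames(n, sizes=[80, 160, 240], shifts=[20, 40, 80])
print(sfsr, mfs, mfr, mfsr)
```

Under these assumptions each multi-resolution scheme produces several times as many frames as SFSR, and each frame yields one MFCC or LPCC vector for GMM training.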

Feature Extraction Algorithm for Underwater Transient Signal Using Cepstral Coefficients Based on Wavelet Packet (웨이브렛 패킷 기반 캡스트럼 계수를 이용한 수중 천이신호 특징 추출 알고리즘)

  • Kim, Juho;Paeng, Dong-Guk;Lee, Chong Hyun;Lee, Seung Woo
    • Journal of Ocean Engineering and Technology / v.28 no.6 / pp.552-559 / 2014
  • In general, the number of underwater transient signals available for research on automatic recognition is very limited. Data-dependent feature extraction is one of the most effective methods in this case. Therefore, we suggest WPCC (wavelet packet cepstral coefficients) as a feature extraction method. A wavelet packet best tree is formed for each data set using an entropy-based cost function. Then the terminal nodes of all the best trees are counted to build a common wavelet best tree, which corresponds to a flexible, non-uniform filter bank reflecting the characteristics of the data set. A GMM (Gaussian mixture model) is used to classify five classes of underwater transient data sets. The error rate of WPCC is compared with that of MFCC (Mel-frequency cepstral coefficients). The error rates of WPCC-db20, WPCC-db40, and MFCC are 0.4%, 0%, and 0.4%, respectively, when the training data consist of six of the nine pieces of data in each class. When the training data consist of only three pieces, however, WPCC-db20 and WPCC-db40 show error rates of 2.98% and 1.20%, respectively, while MFCC shows 7.14%. This shows that WPCC is less sensitive to the number of training data pieces than MFCC and could therefore be a more appropriate method for underwater transient recognition. These results may help in developing an automatic recognition system for underwater transient signals.
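
The best-tree selection mentioned above can be illustrated with a small sketch. It uses the Haar wavelet as a simple stand-in for the db20 and db40 wavelets named in the abstract, and a Shannon-type additive entropy cost. The abstract does not specify the exact cost function, so both choices are assumptions. Node names are strings of `a` (approximation) and `d` (detail) steps from the root.

```python
import math


def haar_split(x):
    """One Haar analysis step: (approximation, detail) at half length.

    Assumes an even number of samples; a trailing odd sample is dropped.
    """
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail


def entropy_cost(x):
    """Shannon-type additive cost: -sum(c^2 * log(c^2)), skipping zeros."""
    return -sum(c * c * math.log(c * c) for c in x if c != 0.0)


def best_tree(x, max_level, path=""):
    """Return (cost, terminal nodes) of the lowest-cost wavelet packet tree.

    A node is split only when its children's total cost is strictly lower.
    """
    own = entropy_cost(x)
    if len(path) >= max_level or len(x) < 2:
        return own, [path]
    approx, detail = haar_split(x)
    cost_a, leaves_a = best_tree(approx, max_level, path + "a")
    cost_d, leaves_d = best_tree(detail, max_level, path + "d")
    if cost_a + cost_d < own:
        return cost_a + cost_d, leaves_a + leaves_d
    return own, [path]


# A constant signal keeps all its energy in the approximation branch, so
# only that branch is worth splitting further.
cost, leaves = best_tree([1.0] * 8, max_level=3)
print(leaves)
```

Counting how often each terminal node appears across the best trees of many signals is then one way to arrive at a common tree for the whole data set, as the abstract describes.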

Applying the Bi-level HMM for Robust Voice-activity Detection

  • Hwang, Yongwon;Jeong, Mun-Ho;Oh, Sang-Rok;Kim, Il-Hwan
    • Journal of Electrical Engineering and Technology / v.12 no.1 / pp.373-377 / 2017
  • This paper presents a voice-activity detection (VAD) method for sound sequences with various SNRs. For real-time VAD applications, it is inadequate to rely on post-processing to remove burst clippings from the VAD output decision. To tackle this problem, we build on the bi-level hidden Markov model, in which a state layer is inserted into a typical hidden Markov model (HMM), and formulate a robust VAD method that requires no additional post-processing. In the method, a forward-inference-ratio test is devised to detect the speech endpoints, and Mel-frequency cepstral coefficients (MFCC) are used as the features. Our experimental results show that, across different SNRs, the proposed approach outperforms conventional methods.

Active Sonar Target/Nontarget Classification Using Real Sea-trial Data (실제 해상 실험 데이터를 이용한 능동소나 표적/비표적 식별)

  • Seok, J.W.
    • Journal of Korea Multimedia Society / v.20 no.10 / pp.1637-1645 / 2017
  • Target/nontarget classification can be divided into shape estimation of the target by analyzing the reflected echo signal, and type classification of the target using acoustic features. In an active sonar system, feature vectors are extracted from the signal reflected from the target, and a classification algorithm is applied to determine whether the received signal comes from a target or not. However, received sonar signals can be distorted in the underwater environment, and the spatio-temporal characteristics of active sonar signals change with the aspect of the target. In addition, it is very difficult to collect real sea-trial data for research. In this paper, target/nontarget classification is performed using real sea-trial data. Feature vectors are extracted using MFCC (Mel-Frequency Cepstral Coefficients) and filterbank energies in the Fourier spectrum and wavelet domains. For performance verification, classification experiments were performed using backpropagation neural network classifiers.

A Study on Speech Recognition System Using Continuous HMM (연속분포 HMM을 이용한 음성인식 시스템에 관한 연구)

  • Kim, Sang-Duck;Lee, Geuk
    • Proceedings of the Korea Multimedia Society Conference / 1998.10a / pp.221-225 / 1998
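  • In this paper, we design and implement a Korean isolated-word recognition system based on the continuous-density hidden Markov model (HMM). To train and evaluate the system, we built a speech database of 10 isolated words drawn from the domain of voice commands for car navigation. As speech feature parameters we used MFCCs (Mel Frequency Cepstral Coefficients), delta MFCCs, and energy. HMMs were built with 18 phoneme-like units (PLUs) extracted from the training data as the recognition units, and were extended to triphone models in order to model co-articulation. The recognizer was evaluated on both speech data used in training and speech data uttered by speakers who did not take part in training, and achieved an average recognition rate of 97.5%.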

A Study on the HMM Structure for Classifying Dog Breeds (개의 품종 분류를 위한 HMM 구조의 연구)

  • Lim, Seong-Min;Kim, Yoon-Joong
    • Proceedings of the Korean Information Science Society Conference / 2012.06b / pp.477-479 / 2012
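  • A dog's vocalization produces characteristic formants determined by the physical properties of its vocal tract, and these physical properties differ by breed; we therefore studied the classification of dog breeds by modeling dog vocalizations with HMMs (Hidden Markov Models). As frequency features we used a 39-dimensional vector in total: 12 MFCCs (Mel Frequency Cepstral Coefficients), 1 energy component, 13 delta coefficients, and 13 acceleration coefficients. To design an HMM structure suited to dog breed classification, we proposed four structures (the basic left-right model, left-right model, left-right model 2, and forward-backward model) and compared their performance experimentally. Of these, the forward-backward model proved to be the most suitable. This model has the following advantages. (1) As with the basic left-right model, even when the input contains only one or two vocalizations, the path from the first state to the last can take as few as three transition steps, so data with few vocalizations can be processed. (2) Signals containing many repeated vocalizations can also be processed: because the model also allows backward state transitions, it can handle signals in which a vocalization is repeated five or more times.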
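
The 39-dimensional layout (12 + 1 + 13 + 13) can be made concrete with a short sketch. It assumes the common regression formula for delta coefficients with a window of two frames on each side and edge frames repeated. The abstract does not state which delta formula was used, and the names `deltas` and `stack_features` are hypothetical.

```python
def deltas(frames, width=2):
    """Regression-based time derivative of each coefficient.

    Edge frames are handled by repeating the first/last frame.
    """
    denom = 2 * sum(n * n for n in range(1, width + 1))
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        row = []
        for k in range(len(frames[t])):
            num = sum(
                n * (frames[min(t + n, last)][k] - frames[max(t - n, 0)][k])
                for n in range(1, width + 1)
            )
            row.append(num / denom)
        out.append(row)
    return out


def stack_features(mfcc, energy):
    """Build [12 MFCC + energy | 13 delta | 13 acceleration] per frame."""
    static = [list(c) + [e] for c, e in zip(mfcc, energy)]
    delta = deltas(static)
    accel = deltas(delta)  # acceleration = delta of the delta coefficients
    return [s + d + a for s, d, a in zip(static, delta, accel)]


# Hypothetical input: 5 frames whose 12 MFCCs and energy rise by 1 per frame.
mfcc = [[float(t)] * 12 for t in range(5)]
energy = [float(t) for t in range(5)]
features = stack_features(mfcc, energy)
print(len(features), len(features[0]))
```

Each frame's vector holds the 13 static values first, then their 13 deltas, then 13 accelerations, giving the 39 dimensions fed to the HMM.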

Personal Information Extraction Using A Microphone Array (마이크로폰어레이를 이용한 사용자 정보추출)

  • Kim, Hye-Jin;Yoon, Ho-Sub
    • The Journal of Korea Robotics Society / v.3 no.2 / pp.131-136 / 2008
  • This paper proposes a method to extract personal information using a microphone array. Age and gender are useful pieces of personal information, particularly about customers. On the basis of this information, service applications for robots can satisfy users by offering services adapted to the special needs of specific user groups, which may include adults and children as well as females and males. We applied a Gaussian Mixture Model (GMM) as the classifier and Mel Frequency Cepstral Coefficients (MFCCs) as the voice feature. The major aim of this paper is to discover the voice source parameters of age and gender and to classify these two characteristics simultaneously. In a ubiquitous environment, voices obtained from selected channels of a microphone array help reduce background noise.
