• Title/Summary/Keyword: Sound recognition


Development of Context Awareness and Service Reasoning Technique for Handicapped People (멀티 모달 감정인식 시스템 기반 상황인식 서비스 추론 기술 개발)

  • Ko, Kwang-Eun;Sim, Kwee-Bo
    • Journal of the Korean Institute of Intelligent Systems / v.19 no.1 / pp.34-39 / 2009
  • Human emotion is impulsive and expresses intentions and needs unconsciously, so it carries rich information about the context of users in ubiquitous computing environments or intelligent robot systems. Indicators from which a user's emotion can be inferred include facial images, voice signals, and biological signal spectra. In this paper, we generate separate facial and vocal emotion recognition results from facial images and speech to make emotion recognition more convenient and efficient. We also extract the best-fitting features from the image and sound data to raise the recognition rate, and implement a multi-modal emotion recognition system based on feature fusion. Finally, using the recognition results, we propose a service reasoning method for the ubiquitous computing environment based on a Bayesian network and a ubiquitous context scenario.
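The fusion-then-reasoning pipeline the abstract describes can be sketched as follows. Everything here is a hypothetical illustration: the emotion labels, the fusion weights, and the service conditional-probability table are invented for the example, not taken from the paper.

```python
# Late score fusion of two modalities, followed by inference over a
# one-edge Bayesian network (emotion -> service). All numbers below
# are hypothetical stand-ins for trained recognizer outputs.

EMOTIONS = ["happy", "sad", "angry"]

def fuse(face_scores, voice_scores, w_face=0.6, w_voice=0.4):
    """Weighted late fusion of per-emotion scores from two modalities."""
    fused = {e: w_face * face_scores[e] + w_voice * voice_scores[e]
             for e in EMOTIONS}
    total = sum(fused.values())
    return {e: v / total for e, v in fused.items()}  # normalize to a posterior

# P(service should activate | emotion): a hypothetical CPT of the
# service node in a minimal Bayesian network.
P_SERVICE_GIVEN_EMOTION = {"happy": 0.1, "sad": 0.8, "angry": 0.6}

def p_service(emotion_posterior):
    """Marginalize the service node over the fused emotion posterior."""
    return sum(emotion_posterior[e] * P_SERVICE_GIVEN_EMOTION[e]
               for e in EMOTIONS)

face = {"happy": 0.2, "sad": 0.7, "angry": 0.1}    # facial recognizer output
voice = {"happy": 0.3, "sad": 0.5, "angry": 0.2}   # voice recognizer output
posterior = fuse(face, voice)
print(round(p_service(posterior), 3))
```

A real system would replace the score dictionaries with classifier outputs and the single CPT with the paper's context-scenario network.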

Dual CNN Structured Sound Event Detection Algorithm Based on Real Life Acoustic Dataset (실생활 음향 데이터 기반 이중 CNN 구조를 특징으로 하는 음향 이벤트 인식 알고리즘)

  • Suh, Sangwon;Lim, Wootaek;Jeong, Youngho;Lee, Taejin;Kim, Hui Yong
    • Journal of Broadcast Engineering / v.23 no.6 / pp.855-865 / 2018
  • Sound event detection is a research area that models human auditory cognition by recognizing the events in an environment containing multiple acoustic events and determining the onset and offset time of each. DCASE, a research community for acoustic scene classification and sound event detection, runs challenges to encourage participation and stimulate sound event detection research. However, the dataset provided by the DCASE Challenge is small compared with ImageNet, the representative dataset for visual object recognition, and few open acoustic datasets exist. In this study, we collected sound events that can occur indoors and outdoors on a larger scale and annotated them to construct a dataset. Furthermore, to improve sound event detection performance, we developed a dual CNN structured detection system that adds a supplementary neural network to a convolutional neural network to determine whether any sound event is present. Finally, we conducted comparative experiments against the baseline systems of DCASE 2016 and 2017.
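The role of the supplementary network can be illustrated with a small numpy sketch: the main classifier's per-frame class probabilities are gated by a separate "any event present?" score before thresholding. The arrays below are invented stand-ins for CNN outputs; the actual architectures and threshold in the paper may differ.

```python
import numpy as np

# Per-frame class probabilities from the main event classifier
# (rows = frames, columns = event classes).
event_probs = np.array([[0.7, 0.2, 0.1],    # frame 0
                        [0.6, 0.3, 0.1],    # frame 1
                        [0.2, 0.1, 0.1],    # frame 2 (background)
                        [0.1, 0.8, 0.1]])   # frame 3
# Per-frame presence score from the supplementary network.
presence = np.array([0.9, 0.9, 0.1, 0.95])

def detect(event_probs, presence, threshold=0.5):
    """Gate class scores by presence, then threshold to binary activity."""
    gated = event_probs * presence[:, None]
    return (gated >= threshold).astype(int)

print(detect(event_probs, presence))
```

The presence gate suppresses frame 2 even though its class scores are nonzero; onset and offset times then fall out of the first and last active frame per class.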

Establishment of the Korean Standard Vocal Sound into Character Conversion Rule (한국어 음가를 한글 표기로 변환하는 표준규칙 제정)

  • 이계영;임재걸
    • Journal of the Institute of Electronics Engineers of Korea CI / v.41 no.2 / pp.51-64 / 2004
  • The purpose of this paper is to establish the Standard Korean Vocal Sound into Character Conversion Rule (Standard VSCC Rule) by reversely applying the Korean Standard Pronunciation Rule, which regulates how written Hangeul sentences are read aloud. The Standard VSCC Rule plays a crucial role in Korean speech recognition. The general method of speech recognition is to find, among the standard voice patterns, the one most similar to the input voice pattern; each standard voice pattern is an average of several sample voice patterns. If the unit of the standard voice pattern is a word, the number of entries will exceed a few million (taking inflection and postpositional particles into account). So many entries require a huge database and impractically many comparisons in the search for the most similar pattern. Therefore, the unit of the standard voice pattern should be the syllable. In that case, we must resolve the differences between Korean vocal sounds and the written characters: converting a sequence of Korean vocal sounds into a sequence of characters requires our Standard VSCC Rule. Using the rule, we have implemented a system that converts Korean vocal sounds into Hangeul characters. The Korean Standard Pronunciation Rule consists of 30 items; to show the soundness and completeness of our Standard VSCC Rule, we tested the conversion system with data sets reflecting all 30 items. The test results are presented in this paper.
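The core idea, inverting a pronunciation rule table so spoken forms map back to spellings, can be sketched in a few lines. The two romanized rules below are simplified toy stand-ins, not items from the actual 30-item Korean Standard Pronunciation Rule.

```python
# Toy reverse rule application: the forward table reads spellings into
# sounds; the VSCC direction rewrites sounds back into spellings.

FORWARD = {
    "km": "ngm",   # toy nasalization: spelled k+m is pronounced ng+m
    "kn": "ngn",   # toy nasalization: spelled k+n is pronounced ng+n
}
INVERSE = {spoken: written for written, spoken in FORWARD.items()}

def sounds_to_spelling(sound):
    """Rewrite pronounced clusters back to their spelled forms."""
    for spoken, written in INVERSE.items():
        sound = sound.replace(spoken, written)
    return sound

# e.g. the pronounced form "kungmul" ([궁물]) recovers the spelling
# "kukmul" (국물) under these toy rules.
print(sounds_to_spelling("kungmul"))
```

In reality the forward mapping is many-to-one, so a full VSCC rule set must also resolve ambiguity among candidate spellings; the sketch ignores that.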

On the speaker's position estimation using TDOA algorithm in vehicle environments (자동차 환경에서 TDOA를 이용한 화자위치추정 방법)

  • Lee, Sang-Hun;Choi, Hong-Sub
    • Journal of Digital Contents Society / v.17 no.2 / pp.71-79 / 2016
  • This study compares the performance of sound source localization methods used for reliable automobile control by improving the voice recognition rate in a vehicle environment, and suggests how to improve them. Sound source localization generally relies on the TDOA algorithm, computed in one of two ways: a cross-correlation function in the time domain, or GCC-PHAT calculated in the frequency domain. GCC-PHAT is known to be more robust to echo and noise than the plain cross-correlation function. We compared the two methods in a vehicle environment full of echo and vibration noise, and additionally applied a median filter, which improved both estimators and reduced their variance. According to the experimental results, the two methods perform almost identically on voice signals; on a song signal, however, GCC-PHAT achieves a recognition rate about 10% higher than the cross-correlation function. With the median filter added, the cross-correlation function's recognition rate improved by up to 11%, and both methods showed stable variance values.
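GCC-PHAT itself is standard and can be sketched with numpy alone: the cross-power spectrum is normalized to unit magnitude (keeping only phase), so its inverse FFT has a sharp peak at the true lag. The signal, sample rate, and delay below are synthetic, not the paper's recordings.

```python
import numpy as np

fs = 16000
rng = np.random.default_rng(1)
sig = rng.standard_normal(fs)                       # 1 s test source
delay = 40                                          # true delay in samples
mic1 = sig
mic2 = np.concatenate([np.zeros(delay), sig[:-delay]])

def gcc_phat(x, y):
    """Estimate the delay of y relative to x, in samples, via GCC-PHAT."""
    n = len(x) + len(y)                             # zero-pad to avoid wrap
    X, Y = np.fft.rfft(x, n), np.fft.rfft(y, n)
    R = Y * np.conj(X)                              # cross-power spectrum
    R /= np.abs(R) + 1e-12                          # PHAT: keep phase only
    cc = np.fft.irfft(R, n)
    max_shift = len(x)
    cc = np.concatenate([cc[-max_shift:], cc[:max_shift + 1]])
    return int(np.argmax(np.abs(cc))) - max_shift

print(gcc_phat(mic1, mic2))
```

The paper's median-filter refinement would correspond to collecting per-frame delay estimates and taking their median, which discards outlier frames corrupted by echo or vibration noise.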

Comparison of ICA Methods for the Recognition of Corrupted Korean Speech (잡음 섞인 한국어 인식을 위한 ICA 비교 연구)

  • Kim, Seon-Il
    • 전자공학회논문지 IE / v.45 no.3 / pp.20-26 / 2008
  • Two independent component analysis (ICA) algorithms were applied to the recognition of speech signals corrupted by car engine noise. Speech recognition was performed with a hidden Markov model (HMM) on the estimated signals, and recognition rates were compared with those of the original, uncorrupted speech. The two ICA methods used to estimate the speech signals were the FastICA algorithm, which maximizes negentropy, and the information-maximization approach, which maximizes the mutual information between inputs and outputs to make the outputs maximally independent. The word recognition rate for Korean news sentences spoken by a male anchor is 87.85%; across various signal-to-noise ratios (SNR), the estimated signals show an average performance drop of 1.65% with FastICA and 2.02% with information maximization. There is little difference between the methods.
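The first of the two methods, FastICA by negentropy maximization, can be sketched in numpy. The sources below are synthetic stand-ins (a square wave for speech, uniform noise for the engine); this is an illustrative reimplementation with a tanh contrast, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 4000)
s1 = np.sign(np.sin(2 * np.pi * 7 * t))       # square wave ("speech")
s2 = rng.uniform(-1.0, 1.0, t.size)           # uniform ("engine noise")
S = np.vstack([s1, s2])
A = np.array([[1.0, 0.6],
              [0.5, 1.0]])                    # unknown mixing matrix
X = A @ S                                     # observed mixtures

# Whitening: zero mean, identity covariance.
Xc = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(Xc @ Xc.T / Xc.shape[1])
Z = E @ np.diag(d ** -0.5) @ E.T @ Xc

# Symmetric FastICA iterations with the tanh contrast function.
W = rng.standard_normal((2, 2))
for _ in range(200):
    G = np.tanh(W @ Z)
    W = G @ Z.T / Z.shape[1] - np.diag((1 - G ** 2).mean(axis=1)) @ W
    U, _, Vt = np.linalg.svd(W)
    W = U @ Vt                                # symmetric decorrelation

Y = W @ Z                                     # estimated sources
corr = np.abs(np.corrcoef(np.vstack([Y, S]))[:2, 2:])
print(corr.round(2))                          # one entry per row near 1.0
```

ICA recovers sources only up to permutation and sign, which is why the check uses absolute correlations rather than comparing signals directly.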

Development of Realtime Phonetic Typewriter (실시간 음성타자 시스템 구현)

  • Cho, W.Y.;Choi, D.I.
    • Proceedings of the KIEE Conference / 1999.11c / pp.727-729 / 1999
  • We have developed a realtime phonetic typewriter on an IBM PC with a sound card, running on Windows 95. In this system, speech signal analysis, neural network learning, labeling of output neurons, and visualization of recognition results are all performed in real time. The speech processing development environment provides various functions such as editing, saving, and loading speech data and displaying spectrograms in 3-D or gray level. Recognition experiments on Korean phones achieved 71.42% accuracy for 13 basic consonants and 90.01% for 7 basic vowels.


Ortho-phonic Alphabet Creation by the Musical Theory and its Segmental Algorithm (악리론으로 본 정음창제와 정음소 분절 알고리즘)

  • Chin, Yong-Ohk;Ahn, Cheong-Keung
    • Speech Sciences / v.8 no.2 / pp.49-59 / 2001
  • Phoneme segmentation is a very difficult problem in speech sound processing, because a segmentation algorithm must cope with many kinds of allophones and coarticulation; as a result, system configurations for speech recognition and voice retrieval are complex. To address this, we discuss the possibility of a new segmentation algorithm based on the minus-or-plus-a-third tripartitioning (삼분손익) of the twelve temperaments (12 율려), first proposed by Prof. T. S. Han, which is close to both oriental and western musical theory. He has also identified 3 consonant and 3 vowel phonemes in Hunminjungum (훈민정음), invented by King Sejong in the 15th century. In this paper, we propose to name these ortho-phonic phonemes (OPP/정음소), carrying the meaning of 'absoluteness and independency'. The OPP is also applicable to other languages, for example via the IPA. Finally, we argue that this algorithm applies consistently across languages and is useful for building voice recognition and retrieval engines.


A Study about the Construction of Intelligence Data Base for Micro Defect Evaluation (미소 결함 평가를 위한 지능형 데이터베이스 구축에 관한 연구)

  • 김재열
    • Proceedings of the Korean Society of Machine Tool Engineers Conference / 2000.04a / pp.585-590 / 2000
  • Recently, the need to measure and manage thin-film thickness accurately has grown in both industry and medicine. Ultrasonic signal processing is a promising NDE method for detecting microdefects and measuring the thickness of thin films below the limit of ultrasonic distance resolution in opaque materials, providing useful information that a conventional measuring system cannot obtain. In this research, taking as the acoustic analysis model a thin film below the ultrasonic distance resolution limit sandwiched between three substances, we demonstrated the usefulness of an ultrasonic signal processing technique that uses ultrasonic frequency information for NDE measurement of thin-film thickness, sound velocity, and step height, regardless of interference phenomena. Effective numerical information was deduced and quantified from the image. Pattern recognition of defective input images was then performed with a neural network: input patterns composed of various numeral combinations were learned by the network, the feasibility of pattern recognition was confirmed on artificially defective input data generated by simulation, and application to unknown input patterns was also examined.


Environments of Hoarseness in Children (소아애성에 영향을 주는 환경에 대한 연구)

  • 안철민;박상준;이건영
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.8 no.2 / pp.173-177 / 1997
  • Speech movements are an acquired activity, determined neither by instinct nor by biological inheritance. A child listens to the sounds of surrounding people, observes their speech movements, and tries to imitate them, thereby acquiring a specific phonation pattern. We assumed that parental influence is very important in the development of speech movements, since the parents are the first people the baby contacts, and that the parents' recognition of voice changes in their child is likewise important. Social environments such as kindergarten, school, and friends can also influence a child's voice. We therefore investigated the state of the voice, parental influence, and social environmental factors. On the basis of this study, we found that parents' recognition of their child's voice changes, the child's faulty vocal habits, and social environmental factors all influence the child's voice, and we conclude that every effort should be made toward early detection of voice changes and proper treatment.


Performance Comparison of Classification Algorithms in Music Recognition using Violin and Cello Sound Files (바이올린과 첼로 연주 데이터를 이용한 분류 알고리즘의 성능 비교)

  • Kim Jae Chun;Kwak Kyung sup
    • The Journal of Korean Institute of Communications and Information Sciences / v.30 no.5C / pp.305-312 / 2005
  • Three classification algorithms are tested on musical instrument sounds. Several classification algorithms are introduced, and among them the performance of the Bayes rule, NN, and k-NN is evaluated. ZCR, mean, variance, and average peak level feature vectors are extracted from instrument sample files and used as the data set for the classification system. The instruments are the violin, baroque violin, and baroque cello. The experimental results show that the NN algorithm outperforms the other algorithms in musical instrument classification.
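The feature-extraction-plus-nearest-neighbour pipeline can be sketched in numpy. The "recordings" below are synthetic sines standing in for violin and cello clips, and the class labels are invented for the example; only the feature set (ZCR, mean, variance, peak level) follows the abstract.

```python
import numpy as np

def features(x):
    """ZCR, mean, variance, and peak level of one audio clip."""
    zcr = np.mean(np.abs(np.diff(np.sign(x)))) / 2   # zero-crossing rate
    return np.array([zcr, x.mean(), x.var(), np.abs(x).max()])

def knn_predict(train_X, train_y, x, k=1):
    """Majority vote among the k nearest training feature vectors."""
    d = np.linalg.norm(train_X - x, axis=1)
    idx = np.argsort(d)[:k]
    vals, counts = np.unique(train_y[idx], return_counts=True)
    return vals[np.argmax(counts)]

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 8000)
def clip(freq, noise):
    return np.sin(2 * np.pi * freq * t) + noise * rng.standard_normal(t.size)

train = [clip(440, 0.05), clip(441, 0.05),   # high-pitched ("violin"-like)
         clip(65, 0.3), clip(66, 0.3)]       # low-pitched ("cello"-like)
labels = np.array([0, 0, 1, 1])
train_X = np.vstack([features(c) for c in train])

test_clip = clip(442, 0.05)
print(knn_predict(train_X, labels, features(test_clip)))
```

With k=1 this is the NN classifier the paper found best; larger k gives the k-NN variant it compares against.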