• Title/Abstract/Keywords: phonetic system

313 search results (processing time: 0.023 s)

Classification of Pornographic Videos Based on the Audio Information

  • 김봉완;최대림;이용주
    • 대한음성학회지:말소리 / No. 63 / pp.139-151 / 2007
  • As the Internet becomes prevalent in our lives, harmful content such as pornographic videos has been increasing online and has become a very serious problem. To address it, many filtering systems exist, mainly based on keyword- or image-based methods. The main purpose of this paper is to devise a system that classifies pornographic videos based on the audio information. As the feature vector, we use the mel-frequency cepstral coefficients (MFCC) together with the mel-cepstrum modulation energy (MCME), a modulation energy calculated on the time trajectory of the MFCC. For the classifier, we use the well-known Gaussian mixture model (GMM). The experimental results showed that the proposed system correctly classified 98.3% of pornographic data and 99.8% of non-pornographic data. We expect that the proposed method can be applied to a more accurate classification system that uses both video and audio information.

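The GMM decision step described in the abstract above can be sketched as a log-likelihood comparison between two class models. This is a minimal illustration, not the authors' implementation: the function names, the diagonal-covariance assumption, and the toy 2-D parameters are all hypothetical, and real feature vectors would be MFCC and MCME frames rather than the made-up points used here.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(frames, gmm):
    """Total log-likelihood of a frame sequence under a GMM.

    gmm is a list of (weight, mean, var) tuples (assumed layout).
    """
    total = 0.0
    for x in frames:
        comps = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        peak = max(comps)  # log-sum-exp trick for numerical stability
        total += peak + math.log(sum(math.exp(c - peak) for c in comps))
    return total

def classify(frames, gmm_target, gmm_other):
    """Return True if the target-class GMM explains the frames better."""
    return gmm_log_likelihood(frames, gmm_target) > gmm_log_likelihood(frames, gmm_other)

# Toy 2-D models with made-up parameters (not trained on real audio).
GMM_TARGET = [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])]
GMM_OTHER = [(1.0, [5.0, 5.0], [1.0, 1.0])]

print(classify([[0.2, 0.1], [0.9, 1.1]], GMM_TARGET, GMM_OTHER))
```

In practice the two models would be trained on labelled audio, and the decision could use a threshold on the log-likelihood ratio instead of a plain comparison.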

Rule-based Named Entity (NE) Recognition from Speech

  • 김지환
    • 대한음성학회지:말소리 / No. 58 / pp.45-66 / 2006
  • In this paper, a rule-based (transformation-based) NE recognition system is proposed. The system uses Brill's rule inference approach. Its performance is compared with that of IdentiFinder, one of the most successful stochastic systems. In the baseline case (no punctuation and no capitalisation), both systems show almost equal performance. They also perform similarly when additional information such as punctuation, capitalisation and name lists is available. The performance of both systems degrades linearly with the number of speech recognition errors, and their rates of degradation are almost equal. These results show that automatic rule inference is a viable alternative to the HMM-based approach to NE recognition while retaining the advantages of a rule-based approach.

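To illustrate how a transformation-based tagger applies its learned rules, the sketch below runs an ordered rule list over an initial tag sequence. It covers rule application only, not rule inference, and the single rule template (retag a word when the previous word matches a trigger), the tag names, and the example rules are hypothetical; the abstract does not describe the templates the system actually uses.

```python
def apply_rules(words, tags, rules):
    """Apply ordered transformation rules to an initial tag sequence.

    Each rule is (from_tag, to_tag, trigger_word): change from_tag to to_tag
    when the previous word equals trigger_word (an assumed rule template).
    """
    tags = list(tags)  # work on a copy so the caller's list is untouched
    for from_tag, to_tag, trigger in rules:
        for i in range(1, len(words)):
            if tags[i] == from_tag and words[i - 1] == trigger:
                tags[i] = to_tag
    return tags

# Lower-cased, unpunctuated input, as in the baseline condition.
words = ["mr", "smith", "flew", "to", "paris"]
initial = ["O", "O", "O", "O", "O"]
rules = [("O", "PERSON", "mr"), ("O", "LOCATION", "to")]

print(apply_rules(words, initial, rules))
# → ['O', 'PERSON', 'O', 'O', 'LOCATION']
```

Because the rules are applied in order, a later rule can refine tags assigned by an earlier one, which is what makes the learned rule list readable and editable by hand.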

Performance Evaluation of English Word Pronunciation Correction System

  • 김무중;김효숙;김병기
    • 대한음성학회: Conference Proceedings / 대한음성학회 May 2003 Conference Proceedings / pp.71-74 / 2003
  • In this paper, we present some experimental results obtained with a computer-based English pronunciation correction system for Korean speakers. The aim of the system is to detect incorrectly pronounced phonemes in spoken words and to give corrective feedback to users. Speech data were collected from 254 native speakers and 411 Koreans and then used for phoneme modeling and testing. We built two types of acoustic phoneme models: a native speaker model and a Korean speaker model. We also built language models to reflect mispronunciations that commonly occur among Korean speakers. The detection rate was over 90% for insertion, deletion and replacement of phonemes, but under 75% for diphthong splitting and accents.


Readability Enhancement of English Speech Recognition Output Using Automatic Capitalisation Classification

  • 김지환
    • 대한음성학회지:말소리 / No. 61 / pp.101-111 / 2007
  • A modified speech recogniser is proposed for automatic capitalisation generation to improve the readability of English speech recognition output. In this modified speech recogniser, every word in the vocabulary is duplicated: once in a de-capitalised form and again in a capitalised form. In addition, its language model is re-trained on mixed-case texts. To evaluate the performance of the proposed system, automatic capitalisation generation experiments were performed on 3 hours of Broadcast News (BN) test data using the modified HTK BN transcription system. The proposed system produced an F-measure of 0.7317 for automatic capitalisation generation, with an SER of 48.55, a precision of 0.7736 and a recall of 0.6942.

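The reported figures can be cross-checked under the assumption that the F-measure is the balanced harmonic mean of precision and recall (the abstract does not state the weighting). The function name below is illustrative.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)

# Precision and recall as reported in the abstract.
f = f_measure(0.7736, 0.6942)
print(round(f, 3))
```

The result agrees with the reported 0.7317 to within the rounding of the published precision and recall.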

A Development of Image Transfer Remote Maintenance Monitoring System for Hand Held Device

  • 김동완;박성원
    • 전기학회논문지P / Vol. 58, No. 3 / pp.276-284 / 2009
  • In this paper, we develop an image transfer remote maintenance monitoring system for hand-held devices that can compensate for human error. Human errors tend to occur when workers exchange information to inspect and maintain power plant equipment under adverse conditions, such as confined spaces and long distances within the plant. Workers also cannot converse with one another in noisy places such as a power plant. We therefore build a hand-held device that is compact and allows conversation in noisy places. The developed system can improve productivity by increasing plant operation time. It is composed of an advanced hardware (H/W) system and a software (S/W) system. The H/W system consists of a media server unit, communication equipment with a hand-held device, a portable camera, a microphone and a headset. The S/W system consists of a database system and a client PC (personal computer) real-time monitoring system, which includes a server GUI (graphical user interface) program, a wireless monitoring program and a wired Ethernet communication program. The client GUI program is a total solution program comprising a PC camera program, a phonetic (voice) conversation program and others. We analyzed the required items and investigated the applicable parts of the image transfer remote maintenance monitoring system with a hand-held device. We also investigated the linkage of communication protocols for the developed prototype, and developed a software tool for two-way communication and real-time recording of voice with images. We confirmed its efficiency through a field test in preventive maintenance of a power plant.

Acoustic Modeling and Energy-Based Postprocessing for Automatic Speech Segmentation

  • 박혜영;김형순
    • 대한음성학회지:말소리 / No. 43 / pp.137-150 / 2002
  • Speech segmentation at the phoneme level is important for corpus-based text-to-speech synthesis. In this paper, we examine acoustic modeling methods to improve the performance of an automatic speech segmentation system based on the Hidden Markov Model (HMM). We compare monophone and triphone models, and evaluate several model training approaches. In addition, we employ an energy-based postprocessing scheme to correct frequent boundary location errors between silence and speech sounds. Experimental results show that our system places 71.3% and 84.2% of boundaries correctly given tolerances of 10 ms and 20 ms, respectively.

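The tolerance-based figure quoted above can be computed along the following lines. This is a sketch under the assumption that automatic and reference boundaries are already paired one-to-one and expressed in milliseconds; the function name and the sample boundary values are made up for illustration.

```python
def boundary_accuracy(auto_ms, ref_ms, tolerance_ms):
    """Percentage of automatic boundaries within tolerance of the reference.

    Assumes the two lists are aligned: auto_ms[i] corresponds to ref_ms[i].
    """
    if len(auto_ms) != len(ref_ms):
        raise ValueError("boundary lists must have the same length")
    if not ref_ms:
        return 0.0
    hits = sum(1 for a, r in zip(auto_ms, ref_ms) if abs(a - r) <= tolerance_ms)
    return 100.0 * hits / len(ref_ms)

# Hypothetical boundary times in ms (not data from the paper).
auto = [102, 250, 395, 560]
ref = [100, 262, 400, 585]

print(boundary_accuracy(auto, ref, 10))  # → 50.0
print(boundary_accuracy(auto, ref, 20))  # → 75.0
```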

A Phoneme Duration Modeling in a Speech Recognition System Based on Decision Tree State Tying

  • 구명완;김호경
    • 대한음성학회: Conference Proceedings / 대한음성학회 November 2002 Conference Proceedings / pp.197-200 / 2002
  • In this paper, we propose a phoneme duration modeling method for a speech recognition system based on decision tree state tying. We assume that phone duration has a Gamma distribution. In training mode, we model the mean and variance of each state duration in the context-independent phone model based on decision tree state tying. In recognition mode, we obtain the mean and variance of each context-dependent phone duration from the state duration information obtained during training. We compare the proposed method with conventional methods. Our method performs well compared with conventional methods.

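One way to turn stored duration means and variances into a Gamma duration score is moment matching, sketched below. Two points are assumptions not taken from the abstract: that a phone's mean and variance are the sums over its states (which treats state durations as independent), and that the Gamma parameters are fitted by matching moments. All names and numbers are illustrative.

```python
import math

def phone_stats(state_stats):
    """Combine per-state (mean, var) pairs into phone-level mean and variance.

    Assumes independent state durations, so means and variances add.
    """
    mean = sum(m for m, _ in state_stats)
    var = sum(v for _, v in state_stats)
    return mean, var

def gamma_params(mean, var):
    """Moment-matched Gamma parameters: mean = shape*scale, var = shape*scale^2."""
    return mean * mean / var, var / mean

def gamma_log_pdf(d, shape, scale):
    """Log density of a Gamma(shape, scale) distribution at duration d."""
    if d <= 0:
        return float("-inf")
    return ((shape - 1.0) * math.log(d) - d / scale
            - math.lgamma(shape) - shape * math.log(scale))

# Hypothetical three-state phone, durations in frames.
mean, var = phone_stats([(2.0, 4.0), (4.0, 8.0), (2.0, 4.0)])
shape, scale = gamma_params(mean, var)
print(shape, scale)  # → 4.0 2.0
```

A recogniser could add `gamma_log_pdf(duration, shape, scale)`, suitably weighted, to the acoustic score of each phone hypothesis; how the paper combines the scores is not stated in the abstract.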

Speech Enhancement Using Microphone Array with MMSE-STSA Estimator Based Post-Processing

  • 권홍석;손종목;배건성
    • 대한음성학회: Conference Proceedings / 대한음성학회 November 2002 Conference Proceedings / pp.187-190 / 2002
  • In this paper, a speech enhancement system using a microphone array with MMSE-STSA (Minimum Mean Square Error-Short Time Spectral Amplitude) estimator based post-processing is proposed. Speech enhancement is first carried out by conventional delay-and-sum beamforming (DSB). A new MMSE-STSA estimator is then obtained by refining the MMSE-STSA estimators from each microphone, and it is applied to the output of the conventional DSB to obtain additional speech enhancement. Computer simulations with white and pink noise show that the proposed system is superior to other approaches.

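The first stage named in the abstract, delay-and-sum beamforming, can be sketched in the time domain as follows. The sketch assumes known, non-negative integer sample delays per channel and equal-length channels; the function name and test signal are illustrative, and the MMSE-STSA post-filter is not shown.

```python
def delay_and_sum(channels, delays):
    """Align each channel by its arrival delay (in samples) and average.

    channels: list of equal-length sample lists, one per microphone.
    delays: non-negative integer delay of the source in each channel.
    Only the region where every shifted channel has data is returned.
    """
    n = len(channels[0])
    out_len = n - max(delays)
    return [
        sum(ch[t + d] for ch, d in zip(channels, delays)) / len(channels)
        for t in range(out_len)
    ]

# Toy example: the same ramp reaches the second microphone 2 samples later.
mic0 = [0, 1, 2, 3, 4, 5]
mic1 = [0, 0, 0, 1, 2, 3]

print(delay_and_sum([mic0, mic1], [0, 2]))
# → [0.0, 1.0, 2.0, 3.0]
```

After alignment the speech adds coherently while uncorrelated noise is averaged down, which is why a post-filter is typically applied to the residual noise at the beamformer output.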

Design and Implementation of Vocal Interface-Inventory Management System

  • 박세진;권철홍
    • 대한음성학회: Conference Proceedings / 대한음성학회 November 2002 Conference Proceedings / pp.119-122 / 2002
  • This paper focuses on building a database of commercial stock using XML syntax and examines how to build a system, combining XML and XSLT, that provides voice access to client-server databases. The use of XSLT has several advantages. Most importantly, it can transform one type of data into different formats. A vocal interface reduces the space and time limits imposed on users who are off the premises and need an instant connection to their database. In this way, users can check stock list information without being constrained by such limits. PCs, PDAs and cellular phones are some examples of mobile connections. VoiceXML is used to create the vocal applications. With VoiceXML services, users gain immediate access to data through voice input and the DTMF signals of the telephone.


Implementation of Wideband Waveform Interpolation Coder for TTS DB Compression

  • 양희식;한민수
    • 대한음성학회지:말소리 / Vol. 55 / pp.143-158 / 2005
  • An adequate compression algorithm is essential to achieve a high-quality embedded TTS system. In this paper, after investigating many speech coders, we propose a waveform interpolation coder for TTS corpus compression. Unlike speech coders in communication systems, compression rate and quality are more important factors in TTS DB compression than other performance criteria. We therefore select the waveform interpolation algorithm because it provides good speech quality at a high compression rate, at the cost of complexity. The implemented coder has a bit rate of 6 kbps with a quality degradation of 0.47. This performance indicates that waveform interpolation is adequate for TTS DB compression, subject to some further study.
