• Title/Summary/Keyword: Korean Speech Engineering Systems

Automatic Detection of Korean Accentual Phrase Boundaries

  • Lee, Ki-Yeong;Song, Min-Suck
    • The Journal of the Acoustical Society of Korea / v.18 no.1E / pp.27-31 / 1999
  • Recent linguistic research has brought into focus the relations between prosodic structure and syntactic, semantic, or phonological structure. Most of this work shows that prosodic information helps in understanding syntactic, semantic, and discourse structure. However, this finding has not yet been integrated into Korean speech recognition or understanding systems. As a step toward integrating prosodic information into a speech recognition system, this study proposes a technique for automatically detecting Korean accentual phrase boundaries using one-stage DP and a normalized pitch pattern. To build the normalized pitch pattern, the study proposes a modified normalization method for spoken Korean. The experiment uses 192 spoken sentences from 12 male speakers of standard Korean, containing 720 accentual phrases; 74.4% of the accentual phrase boundaries are correctly detected, with a false detection rate of 14.7%.

Virtual Dialog System Based on Multimedia Signal Processing for Smart Home Environments (멀티미디어 신호처리에 기초한 스마트홈 가상대화 시스템)

  • Kim, Sung-Ill;Oh, Se-Jin
    • Journal of the Korean Institute of Intelligent Systems / v.15 no.2 / pp.173-178 / 2005
  • This paper presents a virtual dialog system that aims to make living environments more convenient. Its main emphasis is the multimedia signal processing behind the system, which builds on speech recognition, speech synthesis, video processing, and sensor signal processing. As essential modules of the dialog system, we incorporated a real-time speech recognizer based on HM-Net (Hidden Markov Network) and a speech synthesizer. In addition, we adopted a real-time motion detector based on changes in pixel brightness, as well as a touch sensor used to start the system. In the experimental evaluation, the proposed system proved relatively easy to use for controlling electric appliances while sitting on a sofa, although noisy environments kept its performance below the simulation results.
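
The abstract above mentions a motion detector driven by changes in pixel brightness but gives no details. The following is a minimal sketch of one common way to do this (frame differencing), not the authors' implementation. The function name `motion_detected` and the thresholds `pixel_delta` and `changed_ratio` are hypothetical, and frames are assumed to be equal-sized 2D lists of grayscale values in the range 0 to 255.

```python
def motion_detected(prev_frame, curr_frame, pixel_delta=25, changed_ratio=0.02):
    """Report motion when enough pixels changed brightness between two frames.

    pixel_delta and changed_ratio are illustrative thresholds, not values
    taken from the paper.
    """
    total = 0
    changed = 0
    for prev_row, curr_row in zip(prev_frame, curr_frame):
        for p, c in zip(prev_row, curr_row):
            total += 1
            # Count a pixel as changed when its brightness moved noticeably.
            if abs(c - p) > pixel_delta:
                changed += 1
    # Motion is declared when the changed fraction reaches the ratio threshold.
    return total > 0 and changed / total >= changed_ratio
```

A per-pixel threshold followed by a ratio threshold keeps isolated sensor noise from triggering the detector; both values would need tuning for a real camera.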

A Nonuniform Sampling Technique and Its Application to Speech Coding (비균등 표본화 기법과 음성 부호화로의 응용)

  • Iem, Byeong-Gwan
    • Journal of the Korean Institute of Intelligent Systems / v.24 no.1 / pp.28-32 / 2014
  • For a signal such as speech, which has a piecewise linear shape over very short time periods, a nonuniform sampling method based on inflection point detection (IPD) is proposed to reduce the data rate. The method exploits the geometrical characteristics of the signal more fully than the existing sampling method based on local maxima/minima detection (MMD). As a result, the signal reconstructed by interpolating the IPD-sampled data resembles the original speech more closely. Computer simulation shows that the proposed IPD-based method yields an improvement of about 9~23 dB over the existing MMD method. To show the usefulness of the IPD technique, it is applied to speech coding and compared with continuously variable slope delta modulation (CVSD). Each nonuniformly sampled value is binary coded and marked with a one-bit flag set to 1. Non-inflection samples are not sent; only a flag bit set to 0 is sent for each of them. The method shows improvements of 0.3 ~ 9 dB in SNR and 0.5 ~ 1.3 in mean opinion score (MOS) over CVSD.
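
The flag-bit framing and interpolation-based reconstruction described in this abstract can be sketched as follows. This is an illustration under assumptions, not the paper's coder: the abstract does not give the exact inflection point rule, so `select_breakpoints` is a hypothetical stand-in that keeps the samples where the discrete second difference is nonzero (where the slope of a piecewise linear signal changes), plus the two endpoints. The names `encode` and `decode` and the tolerance `tol` are also hypothetical, and the kept values are left unquantized.

```python
def select_breakpoints(x, tol=1e-9):
    """Stand-in selection rule: keep endpoints and samples where the slope changes."""
    n = len(x)
    keep = [False] * n
    if n:
        keep[0] = keep[-1] = True
    for i in range(1, n - 1):
        # Discrete second difference; zero on a straight segment.
        if abs(x[i + 1] - 2 * x[i] + x[i - 1]) > tol:
            keep[i] = True
    return keep


def encode(x, keep):
    """One flag per sample; a value is sent only where the flag is 1."""
    flags = [1 if k else 0 for k in keep]
    values = [v for v, k in zip(x, keep) if k]
    return flags, values


def decode(flags, values):
    """Rebuild the signal by linear interpolation between flagged samples.

    Assumes the first and last samples are flagged, as select_breakpoints ensures.
    """
    idx = [i for i, f in enumerate(flags) if f]
    out = [0.0] * len(flags)
    if len(idx) == 1:
        out[idx[0]] = float(values[0])
    for k in range(len(idx) - 1):
        i0, i1 = idx[k], idx[k + 1]
        v0, v1 = values[k], values[k + 1]
        for i in range(i0, i1 + 1):
            out[i] = v0 + (v1 - v0) * (i - i0) / (i1 - i0)
    return out
```

For an exactly piecewise linear input, this stand-in rule reconstructs the signal without error; on real speech the tolerance would trade data rate against reconstruction quality.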

Recent Approaches to Dialog Management for Spoken Dialog Systems

  • Lee, Cheong-Jae;Jung, Sang-Keun;Kim, Kyung-Duk;Lee, Dong-Hyeon;Lee, Gary Geun-Bae
    • Journal of Computing Science and Engineering / v.4 no.1 / pp.1-22 / 2010
  • The field of spoken dialog systems is growing rapidly because improvements in speech technology make it feasible to build systems that people can easily operate to access useful information through spoken language. Among the components of a spoken dialog system, dialog management plays major roles such as discourse analysis, database access, error handling, and system action prediction. This survey covers design issues and recent approaches to dialog management techniques for modeling dialogs. We also explain user simulation techniques for the automatic evaluation of spoken dialog systems.

Context-Awareness based Home Assistant using Open Source Home Py (오픈 소스 Home Py를 이용한 상황인식 홈 비서)

  • Lee, Se-Hoon;Kim, Ju-Yeon;Moon, Sung-Hyun;Lim, Su-Young;Lee, Yoon-Su
    • Proceedings of the Korean Society of Computer Information Conference / 2016.07a / pp.135-136 / 2016
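  • This paper presents a project that uses the open-source Home Py to provide a conversational service through the Telegram service, enabling two-way communication between things and people, and that uses context-awareness services to control a home system flexibly. Smart homes in which appliances are controlled from a smartphone have become a reality, but their difficult operation makes them inconvenient for people with disabilities, the elderly, children, and pregnant women. To address this problem, we propose a more intelligent smart home system that uses context awareness to control the things appropriate to each situation.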

A Variable Step-Size Adaptive Feedback Cancellation Algorithm based on GSAP in Digital Hearing Aids (가변 스텝 크기 적응 필터와 음성 검출기를 이용한 보청기용 피드백 제거 알고리즘)

  • An, Hongsub;Park, Gyuseok;Song, Jihyun;Lee, Sangmin
    • The Transactions of The Korean Institute of Electrical Engineers / v.62 no.12 / pp.1744-1749 / 2013
  • Acoustic feedback is perceived as whistling or howling and is a major complaint of hearing aid users. Acoustic feedback cancellation is important in hearing aids because acoustic feedback degrades device performance by reducing the maximum insertion gain. Adaptive systems for estimating the acoustic feedback path and feedback suppression algorithms have been proposed to solve this problem. LMS (least mean squares) is a typical feedback cancellation algorithm because of its computational efficiency. However, its convergence performance suffers when the input signal is highly correlated. In this paper, we propose a new variable step-size normalized LMS algorithm that uses VAD (voice activity detection) to overcome this limitation of the LMS algorithm. The VAD algorithm is GSAP (global speech absence probability), and the feedback cancellation algorithm is normalized LMS. Using the VAD, the proposed algorithm applies different step sizes to voice and non-voice segments to achieve high stability, fast convergence, and low misalignment for correlated inputs such as speech. In simulations with a speech signal mixed with white noise, the proposed algorithm outperforms the traditional algorithm in stability, convergence speed, and misalignment.
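
The idea of switching the normalized LMS step size on a voice activity decision can be sketched as follows. This is a simplified illustration, not the authors' algorithm: the GSAP detector is not implemented, so per-sample voice flags are supplied as an input, and the function name `vss_nlms` and the values of `taps`, `mu_voice`, `mu_noise`, and `eps` are assumptions. The abstract says only that the step sizes differ between voice and non-voice; using the smaller one during voice is a choice made here. The demo signals below the function are synthetic.

```python
import random


def vss_nlms(reference, observed, voice_flags, taps=4,
             mu_voice=0.05, mu_noise=0.5, eps=1e-8):
    """Normalized LMS whose step size depends on a per-sample voice flag.

    reference: signal driving the feedback path (e.g. loudspeaker output)
    observed:  signal containing the feedback (e.g. microphone input)
    """
    w = [0.0] * taps      # estimated feedback path
    buf = [0.0] * taps    # most recent reference samples, newest first
    errors = []
    for x, d, is_voice in zip(reference, observed, voice_flags):
        buf = [x] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))   # predicted feedback
        e = d - y                                    # feedback-cancelled output
        mu = mu_voice if is_voice else mu_noise      # VAD-controlled step size
        norm = eps + sum(b * b for b in buf)         # input power normalization
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        errors.append(e)
    return w, errors


# Synthetic demo: white-noise reference through a known 4-tap path,
# with the voice flag alternating every 200 samples.
rng = random.Random(0)
true_path = [0.5, -0.3, 0.1, 0.0]
reference = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
observed = [
    sum(h * reference[n - k] for k, h in enumerate(true_path) if n - k >= 0)
    for n in range(len(reference))
]
voice_flags = [(n // 200) % 2 == 1 for n in range(len(reference))]
w_est, errors = vss_nlms(reference, observed, voice_flags)
```

In this noise-free demo the estimate `w_est` should approach `true_path`; with real speech, the voice segments are where the correlated input makes the step-size choice matter.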

Classification of Phornographic Videos Using Audio Information (오디오 신호를 이용한 음란 동영상 판별)

  • Kim, Bong-Wan;Choi, Dae-Lim;Bang, Man-Won;Lee, Yong-Ju
    • Proceedings of the KSPS conference / 2007.05a / pp.207-210 / 2007
  • As the Internet has become prevalent in daily life, harmful content on it has increased and become a very serious problem. Pornographic video in particular is harmful to children. To counter this, many filtering systems rely on keyword-based or image-based methods. The main purpose of this paper is to devise a system that classifies pornographic videos based on audio information. As the feature vector we use Mel-frequency cepstral coefficients (MFCC) together with Mel-Cepstrum Modulation Energy (MCME), the modulation energy calculated on the time trajectory of the MFCC, and as the classifier we use a Gaussian Mixture Model (GMM). In the experiments, the proposed system correctly classified 97.5% of the pornographic data and 99.5% of the non-pornographic data. We expect the proposed method can serve as a component of a more accurate classification system that uses video and audio information simultaneously.

An Improved Coverless Text Steganography Algorithm Based on Pretreatment and POS

  • Liu, Yuling;Wu, Jiao;Chen, Xianyi
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.4 / pp.1553-1567 / 2021
  • Steganography is currently a hot research topic in information security and privacy protection. However, most previous steganography methods are not effective against steganalysis and attacks because they usually work by modifying covers. In this paper, we propose an improved coverless text steganography algorithm based on pretreatment and part of speech (POS), in which Chinese character components are used as locating marks, the POS is used to hide the number of keywords, and the retrieval of stego-texts is optimized by pretreatment. Experiments verify that our algorithm performs well in embedding capacity, embedding success rate, and extraction accuracy, given appropriate lengths of locating marks and a large-scale text database.

Voice Command through Facial Recognition Smart Mirror System (얼굴인식을 통한 음성 명령 스마트 거울 시스템)

  • Lee, Se-Hoon;Kim, Su-Min;Park, Hyun-Gyu
    • Proceedings of the Korean Society of Computer Information Conference / 2019.01a / pp.253-254 / 2019
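  • This paper presents a home control method based on speech recognition, using the Google Speech API and the Open CV library, so that the home and nearby electric heating appliances can be controlled more easily from a mirror, the object most often within the user's range of activity at home. This offers the convenience of controlling devices by voice while keeping both hands free, for example when applying makeup on a busy morning.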

Recognition of Emotional states in Speech using Hidden Markov Model (HMM을 이용한 음성에서의 감정인식)

  • Kim, Sung-Ill;Lee, Sang-Hoon;Shin, Wee-Jae;Park, Nam-Chun
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2004.10a / pp.560-563 / 2004
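  • This paper describes a new approach to recognizing human emotional states such as anger, happiness, calm, sadness, and surprise. The approach uses continuous hidden Markov models (HMMs) that include discrete durations. To this end, we first define emotional feature parameters from the input speech signal. In this study, we use prosodic parameters such as the pitch signal, energy, and their respective derivatives, and train HMMs on them. In addition, emotion models based on maximum a posteriori (MAP) estimation are used for speaker adaptation. Experimental results show that the emotion recognition rate for speech increases gradually as the number of adaptation samples grows.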
