• Title/Abstract/Keywords: Parts of Speech

Development of TTS for a Human-Robot Interface

  • 배재현;오영환
    • 대한음성학회:학술대회논문집 / Proceedings of the 대한음성학회 2006 Spring Conference / pp.135-138 / 2006
  • Communication between humans and robots is an important part of human-robot interaction, and speech is an easy and intuitive means of communication for humans. Using speech to communicate with a robot lets people interact with it in a familiar way. In this paper, we developed a TTS system for human-robot interaction. The synthesis algorithms were modified to make efficient use of the robot's restricted resources, and the synthesis database was reconstructed for efficiency. As a result, we reduced the computation time with only slight degradation of speech quality.

A Reliable Pitch Determination Algorithm (PDA) Based on Dyadic Wavelet Transform (DyWT)

  • Kim, Nam-Hoon;Kang, Yong-Sung;Ko, Han-Seok
    • 음성과학 / Vol. 7, No. 4 / pp.3-10 / 2000
  • This paper presents a time-based Pitch Determination Algorithm (PDA) for the reliable estimation of the Pitch Period (PP) in speech signals. Based on the Dyadic Wavelet Transform (DyWT), the proposed PDA detects Glottal Closure Instants (GCI) and uses them to determine the pitch period. We also examine the problems of conventional DyWT-based PDAs and compare their performance with that of the proposed method. The effectiveness of the proposed method is tested on real speech signals containing transitions between voiced and unvoiced intervals, where the energy of the voiced signal is unsteady. The results show that the proposed method performs well in estimating GCI positions in both the unsteady and the steady parts.

Implementation of Extracting Specific Information by Sniffing Voice Packet in VoIP

  • Lee, Dong-Geon;Choi, WoongChul
    • International journal of advanced smart convergence / Vol. 9, No. 4 / pp.209-214 / 2020
  • VoIP technology is widely used for exchanging voice or image data over IP networks. VoIP, often called Internet telephony, sends and receives voice data over RTP during a session. However, voice data carried over RTP is at risk of exposure, because RTP does not specify encryption of the original data. We implement programs that extract meaningful information, that is, the information the program user wants to obtain, from a user's dialogue. The implementation has two parts: a client part, which takes as input the keyword for the information the user wants, and a server part, which sniffs the packets and performs speech recognition. For speech recognition we use the Google Speech API from Google Cloud, which is based on machine learning. Finally, we discuss the usability and limitations of the implementation with an example.

Study of Speech Recognition System Using the Java

  • 최광국;김철;최승호;김진영
    • 한국음향학회지 / Vol. 19, No. 6 / pp.41-46 / 2000
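  • In this paper, we implemented a speech recognition system in Java using the continuous-density HMM algorithm and a browser-embedded model. The system is designed so that speech analysis, processing, and recognition can be carried out on the web. On the client, a Java applet performs endpoint detection and extracts MFCCs, energy, and delta coefficients, which are sent to the server through a socket. The server performs recognition using an HMM recognizer and a training DB, and the recognized result is sent back to the client and displayed as text. Although the system has a high error rate because it is a platform-independent system built over a network, the work is significant in bringing speech recognition to the multimedia field and shows the potential to become a new information and communication service in the future.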

Real-time Implementation of Multi-channel AMR Speech Coder

  • 지덕구;박만호;김형중;윤병식;최송인
    • 한국음향학회지 / Vol. 20, No. 8 / pp.19-23 / 2001
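  • With the development of high-speed, low-power DSPs (Programmable Digital Signal Processors), systems and terminals in mobile communications are increasingly implemented with DSPs. This paper discusses a multi-channel real-time implementation of the AMR (Adaptive Multi-rate) speech coder on a DSP. The AMR speech coding algorithm was implemented on the TMS320C6202, a 32-bit fixed-point DSP chip running at 250 MHz. For real-time operation, cross compilation, linear assembly optimization, and TMS320C62xx assembly optimization were carried out. The AMR speech coder includes speech data input/output functions and a function for communicating with an external CPU. The coder was developed on a DSP EVM board, and its operation and functions were verified on the asynchronous IMT-2000 system under development at ETRI.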

Decision-Tree-Based Markov Model for Phrase Break Prediction

  • Kim, Sang-Hun;Oh, Seung-Shin
    • ETRI Journal / Vol. 29, No. 4 / pp.527-529 / 2007
  • In this paper, a decision-tree-based Markov model for phrase break prediction is proposed. The model takes advantage of the non-homogeneous-features-based classification ability of decision tree and temporal break sequence modeling based on the Markov process. For this experiment, a text corpus tagged with parts-of-speech and three break strength levels is prepared and evaluated. The complex feature set, textual conditions, and prior knowledge are utilized; and chunking rules are applied to the search results. The proposed model shows an error reduction rate of about 11.6% compared to the conventional classification model.

Specifics of Speech Development of Children with Cerebral Palsy

  • Zavitrenko, Dolores;Rizhniak, Renat;Snisarenko, Iryna;Pasichnyk, Natalia;Babenko, Tetyana;Berezenko, Natalia
    • International Journal of Computer Science & Network Security / Vol. 22, No. 11 / pp.157-162 / 2022
  • Cerebral palsy is one of the most serious disorders of children's psychophysical development. It manifests as disturbances of motor function, which are often combined with speech disorders, other complications in the formation of higher mental functions, and frequently a decrease in intelligence. This article discusses speech disorders in children with cerebral palsy. Emphasis is placed on several important aspects that should be borne in mind when investigating the specifics of speech development in these children. In particular, speech disorders in cerebral palsy stem not only from damage to certain brain structures but also from the delayed formation or underdevelopment of those parts of the cerebral cortex that are of major importance for linguistic and mental activity. These are the ontogenetically young regions of the cerebral cortex, which develop most rapidly after birth (premotor, frontal, parieto-temporal). It is important to take into account that children with cerebral palsy have disturbances of phonemic perception. Often they cannot distinguish sounds by ear, repeat syllable sequences, or isolate sounds in words. In dysarthria, there are disturbances in the pronunciation of vowels and consonants, speech tempo, voice modulation, breathing, and phonation, as well as poor coordination of breathing, voicing, and articulation. As a result, we identified the main features and specifics of the speech development of children with cerebral palsy and described the conditions necessary for the full development of language. Language disturbances in children with cerebral palsy depend on the localization and severity of the brain damage. Pathology that limits the child's ability to move and to learn about the world plays a major role in the mechanism of speech disorders.

Statistical Survey of Vocabulary in Korean Textbook for Elementary School 6th-Grade

  • 김종영;김철수
    • 한국콘텐츠학회논문지 / Vol. 12, No. 5 / pp.515-524 / 2012
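  • This study surveyed statistics (total number of syllables, number of syllable types, syllable frequency, number of eojeols, number of eojeol types, average eojeol length, eojeol frequency, parts of speech, etc.) on the vocabulary appearing in four sixth-grade elementary school Korean textbooks (6-1 Reading; 6-1 Speaking, Listening, and Writing; 6-2 Reading; 6-2 Speaking, Listening, and Writing). The texts contain 194,683 Hangul syllables of 1,290 types, with an average syllable frequency of 150.9. They contain 70,185 eojeols of 22,647 types, with an average eojeol frequency of 3.1. The average eojeol length is 2.8 syllables, and the longest eojeol has 10 syllables. As for parts of speech, nouns are slightly more frequent in the Reading textbooks and verbs in the Speaking, Listening, and Writing textbooks.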
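
The counts reported in this abstract (syllable tokens and types, eojeol tokens and types, average frequencies, average eojeol length) can be illustrated with a small sketch. This is not the authors' tool. It assumes that an eojeol is a whitespace-delimited unit and that only precomposed Hangul syllables (U+AC00 to U+D7A3) are counted. The function names `is_hangul_syllable` and `vocabulary_stats` are hypothetical.

```python
from collections import Counter


def is_hangul_syllable(ch: str) -> bool:
    """True for a precomposed Hangul syllable (U+AC00..U+D7A3)."""
    return "\uac00" <= ch <= "\ud7a3"


def vocabulary_stats(text: str) -> dict:
    """Syllable and eojeol statistics for a text.

    Assumption: eojeols are split on whitespace; punctuation is not stripped.
    """
    eojeols = text.split()
    syllables = [ch for ch in text if is_hangul_syllable(ch)]
    syllable_types = Counter(syllables)
    eojeol_types = Counter(eojeols)
    # Length of each eojeol measured in Hangul syllables only.
    lengths = [sum(is_hangul_syllable(ch) for ch in e) for e in eojeols]
    return {
        "syllable_tokens": len(syllables),
        "syllable_types": len(syllable_types),
        # Average frequency = tokens / types.
        "mean_syllable_frequency": len(syllables) / max(1, len(syllable_types)),
        "eojeol_tokens": len(eojeols),
        "eojeol_types": len(eojeol_types),
        "mean_eojeol_frequency": len(eojeols) / max(1, len(eojeol_types)),
        "mean_eojeol_length": sum(lengths) / max(1, len(eojeols)),
        "longest_eojeol": max(lengths, default=0),
    }


if __name__ == "__main__":
    print(vocabulary_stats("나는 학교에 간다 나는 집에 간다"))
```

The reported averages are consistent with this tokens-over-types definition: 194,683 / 1,290 ≈ 150.9 and 70,185 / 22,647 ≈ 3.1.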

A Study on the Improvement of DTW with Speech Silence Detection

  • 김종국;조왕래;배명진
    • 음성과학 / Vol. 10, No. 4 / pp.117-124 / 2003
  • Speaker recognition is a technology that confirms a speaker's identity using the characteristics of his or her speech. It is classified into speaker identification and speaker verification: the former picks out the speaker from a preregistered group and recognizes the word, while the latter verifies a speaker who claims an identity. Because it extracts speaker information from speech to confirm an individual's identity, this approach has become one of the most efficient technologies as services over the telephone network have spread. Several problems, however, must be solved for real applications. First, a reliable method is needed to reject impostors, because recognition is not performed only for preregistered customers. Second, the characteristics of speech change over time, which severely degrades the recognition rate and inconveniences users as the number of times they must utter the text increases. Finally, characteristics common among speakers cause incorrect recognition results. Silent parts in the middle of speech also lower the identification rate. In this paper, we propose improving the identification rate by removing silent parts before running the identification algorithm. Speech regions are detected using the zero-crossing rate and signal energy, which locate the start and end points of the speech, and the DTW algorithm is then applied. As a result, the proposed method improves the recognition rate by about 3% compared with conventional methods.
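
The idea described in this abstract, dropping silent frames with energy and zero-crossing-rate tests before running DTW, can be sketched roughly as follows. This is only an illustration under assumptions, not the paper's implementation. The frame size, thresholds, the two-dimensional feature (frame energy and zero-crossing rate), and the function names `frame_features`, `drop_silence`, and `dtw_distance` are all hypothetical choices. The input is assumed to be at least one frame long.

```python
import numpy as np


def frame_features(x, frame_len=200, hop=100):
    """Per-frame [energy, zero-crossing rate]; assumes len(x) >= frame_len."""
    n = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop: i * hop + frame_len] for i in range(n)])
    energy = np.mean(frames ** 2, axis=1)
    neg = np.signbit(frames)
    zcr = np.mean(neg[:, 1:] != neg[:, :-1], axis=1)
    return np.column_stack([energy, zcr])


def drop_silence(feats, energy_ratio=0.05, zcr_thresh=0.25):
    """Keep frames judged to be speech (thresholds are illustrative)."""
    energy, zcr = feats[:, 0], feats[:, 1]
    peak = energy.max()
    loud = energy > energy_ratio * peak
    # Weak but rapidly sign-changing frames are kept as possible unvoiced sounds.
    unvoiced = (zcr > zcr_thresh) & (energy > 0.2 * energy_ratio * peak)
    return feats[loud | unvoiced]


def dtw_distance(a, b):
    """Classic DTW with Euclidean local cost and steps (1,0), (0,1), (1,1)."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]


if __name__ == "__main__":
    sr = 8000
    t = np.arange(1600) / sr
    tone = np.sin(2 * np.pi * 200 * t + 0.3)
    # Same utterance with a silent gap inserted in the middle.
    gapped = np.concatenate([tone[:800], np.zeros(800), tone[800:]])
    ref, test = frame_features(tone), frame_features(gapped)
    print("with silence:   ", dtw_distance(ref, test))
    print("silence removed:", dtw_distance(ref, drop_silence(test)))
```

In this toy case the silent frames in the middle of the second signal inflate the DTW distance, and removing them brings the two sequences closer, which is the effect the abstract attributes to silence removal.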

Improving transformer-based speech recognition performance using data augmentation by local frame rate changes

  • 임성수;강병옥;권오욱
    • 한국음향학회지 / Vol. 41, No. 2 / pp.122-129 / 2022
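  • This paper proposes a method for improving the performance of a transformer-based speech recognizer using data augmentation that locally adjusts the frame rate. First, the start time and length of the part to be augmented are selected at random from the original speech data. Then, the frame rate of the selected part is changed to a new frame rate using linear interpolation. Experiments on the Wall Street Journal and LibriSpeech speech data showed that, although convergence takes longer than for the baseline, recognition accuracy improves in most cases. To further improve performance, various parameters such as the length and speed of the changed part were optimized. The proposed method achieved relative performance improvements of 11.8 % and 14.9 % over the baseline on the Wall Street Journal and LibriSpeech speech data, respectively.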
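
The two steps in this abstract (pick a random start and length, then resample that part with linear interpolation) might look like the sketch below. It is a guess at the general shape of the technique, not the authors' code. It assumes the augmentation is applied to a feature matrix of shape (frames, dims) with at least two frames. The parameter names and ranges (`max_len`, `rate_range`) and the function name `local_frame_rate_change` are hypothetical.

```python
import numpy as np


def local_frame_rate_change(feats, rng, max_len=50, rate_range=(0.8, 1.2)):
    """Resample one randomly chosen segment of `feats` (T x D) to a new rate.

    A rate above 1 shortens the segment (faster speech); below 1 lengthens it.
    """
    T = feats.shape[0]
    seg_len = int(rng.integers(2, min(max_len, T) + 1))   # random length
    start = int(rng.integers(0, T - seg_len + 1))         # random start
    rate = rng.uniform(*rate_range)
    new_len = max(2, int(round(seg_len / rate)))

    seg = feats[start:start + seg_len]
    # Fractional source positions for each output frame, endpoints preserved.
    src = np.linspace(0, seg_len - 1, new_len)
    lo = np.floor(src).astype(int)
    hi = np.minimum(lo + 1, seg_len - 1)
    w = (src - lo)[:, None]
    warped = (1 - w) * seg[lo] + w * seg[hi]              # linear interpolation

    return np.concatenate([feats[:start], warped, feats[start + seg_len:]], axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = np.arange(300, dtype=float).reshape(100, 3)
    out = local_frame_rate_change(feats, rng)
    print(feats.shape, "->", out.shape)
```

Frames outside the selected segment are left untouched, so only the local speaking rate changes while the rest of the utterance keeps its original timing.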