Search | Korea Science

Segmentation of Continuous Speech based on PCA of Feature Vectors (주요고유성분분석을 이용한 연속음성의 세그멘테이션)

신옥근
- The Journal of the Acoustical Society of Korea / v.19 no.2 / pp.40-45 / 2000
In speech corpus generation and speech recognition, it is sometimes necessary to segment the input speech without any prior knowledge. One way to perform this kind of segmentation, often called blind segmentation or acoustic segmentation, is to find the boundaries that minimize the Euclidean distances among the feature vectors within each segment. However, using this metric alone is prone to errors because the feature vectors fluctuate or vary within a segment. In this paper, we introduce principal component analysis to take the trend of the feature vectors into account: the proposed distance measure is the distance between each feature vector and its projection onto the principal components. The proposed measure is applied within the LBDP (level building dynamic programming) algorithm in a continuous speech segmentation experiment. The results were promising, with a 3-6% reduction in deletion rate compared with the pure Euclidean measure.
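The difference between the two segment costs described in this abstract can be illustrated with a small sketch. It is not the paper's implementation: it assumes that only the first principal axis is used, estimates that axis by power iteration, and omits the LBDP boundary search entirely. The function names (`euclidean_cost`, `pca_residual_cost`, `leading_axis`) are hypothetical.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def leading_axis(centered, iterations=50):
    """Approximate the first principal axis by power iteration.

    Starts from the longest centered vector; returns None when every
    vector coincides with the mean (no direction to estimate).
    """
    start = max(centered, key=lambda c: sum(x * x for x in c))
    norm = math.sqrt(sum(x * x for x in start))
    if norm == 0.0:
        return None
    axis = [x / norm for x in start]
    for _ in range(iterations):
        # Apply the scatter matrix sum(c c^T) to the axis without building it.
        product = [0.0] * len(axis)
        for c in centered:
            weight = sum(ci * ai for ci, ai in zip(c, axis))
            for k, ck in enumerate(c):
                product[k] += weight * ck
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            break
        axis = [x / norm for x in product]
    return axis


def euclidean_cost(segment):
    """Sum of squared Euclidean distances from each vector to the segment mean."""
    mu = mean_vector(segment)
    return sum(sum((x - m) ** 2 for x, m in zip(v, mu)) for v in segment)


def pca_residual_cost(segment):
    """Sum of squared distances from each vector to its projection on the
    line through the segment mean along the first principal axis (assumed)."""
    mu = mean_vector(segment)
    centered = [[x - m for x, m in zip(v, mu)] for v in segment]
    axis = leading_axis(centered)
    total = 0.0
    for c in centered:
        along = sum(ci * ai for ci, ai in zip(c, axis)) if axis else 0.0
        # Clamp tiny negative values caused by floating-point rounding.
        total += max(sum(ci * ci for ci in c) - along * along, 0.0)
    return total
```

For a segment whose features drift steadily, such as `[[t, 2 * t] for t in range(5)]`, the Euclidean cost is large while the residual cost is close to zero, which is the kind of within-segment trend the abstract says the plain Euclidean measure handles poorly.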

Welfare Information and Communication (복지정보통신)

정성기
- Product Safety / s.67 / pp.66-69 / 1999
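As social consideration for and interest in the elderly and people with disabilities continue to grow, technological advances intended to bring them practical benefits, and the industrial changes that follow, are under way. Examples range from traffic lights equipped with audible signals, now a common sight on the street, to public telephones and automobiles. Among these, however, policy measures on information and communication services for people with disabilities and the elderly remain inadequate.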

Multi-Modal User Distance Estimation System based on Mobile Device (모바일 디바이스 기반의 멀티 모달 사용자 거리 추정 시스템)

Oh, Byung-Hun;Hong, Kwang-Seok
- The Journal of the Institute of Internet, Broadcasting and Communication / v.14 no.2 / pp.65-71 / 2014
This paper presents a multi-modal user distance estimation system that uses the mono camera and mono microphone built into a mobile device. In the image-based method, the user's distance is estimated through a skin color region extraction step, a noise removal step, and a face and eye region detection step. In the speech-based method, the absolute difference between the sample values of the input speech is calculated. The largest peak of the calculated difference is selected, and the samples before and after the peak are designated as the ROI (Region of Interest). An FFT (Fast Fourier Transform) is applied to the ROI samples, and the magnitude in the frequency domain is calculated. The magnitude is compared with a distance model to calculate the likelihood. The user distance is then estimated by adding the sorted values with weights. Experimental results show that the multi-modal method gives better measurements than either single modality.
https://doi.org/10.7236/JIIBC.2014.14.2.65
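The speech-side front end described in this abstract (difference signal, peak, ROI, spectrum magnitude) might look roughly like the sketch below. Several points are assumptions rather than details from the paper: the difference is taken between adjacent samples, the ROI is a fixed window of `half_width` samples on each side of the peak, and a naive DFT stands in for an FFT so that the example needs only the standard library. The distance model and likelihood weighting are not shown.

```python
import cmath


def select_roi(samples, half_width):
    """Return the samples around the largest adjacent-sample difference."""
    # Absolute difference between each sample and the next one (assumed).
    diffs = [abs(b - a) for a, b in zip(samples, samples[1:])]
    # Index of the first occurrence of the largest difference.
    peak = max(range(len(diffs)), key=diffs.__getitem__)
    start = max(0, peak - half_width)
    end = min(len(samples), peak + half_width + 1)
    return samples[start:end]


def dft_magnitude(frame):
    """Magnitude spectrum via a naive O(n^2) DFT (stand-in for an FFT)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n)
    ]


if __name__ == "__main__":
    signal = [0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0]
    roi = select_roi(signal, half_width=2)
    print(roi)
    print(dft_magnitude(roi))
```

In a real system the magnitude vector would then be scored against per-distance reference models; how those models are built is not specified in the abstract.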

Comparison of Speech Enhancement Methods Using Multiresolutional Signal Analysis (다해상도 신호해석을 이용한 음성개선 방식 비교)

한미경;석종원;배건성
- Proceedings of the IEEK Conference / 1998.10a / pp.1251-1254 / 1998
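In this paper, we apply multiresolution signal analysis methods that have recently been widely studied, namely the wavelet transform, wavelet packets, and the cosine packet algorithm, to speech enhancement and compare their performance. We also compare them with the conventional spectral subtraction method. SNR and cepstral distance were used as the performance measures. In the experiments, cosine packets gave the best results in terms of SNR. In terms of cepstral distance, cosine packets and wavelet packets gave much better results, and subjective listening tests likewise rated cosine packets best. Conventional spectral subtraction, by contrast, produced output speech of noticeably lower quality than the other methods because of musical noise.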

Design of voice warning system using bluetooth and ultrasonic sensor (블루투스 및 초음파 센서를 이용한 위험감지 음성 시스템 설계)

Park, Joon-Hoon;Kim, Jin-Min;Park, Min-Kyu
- Proceedings of the KIEE Conference / 2008.07a / pp.1531-1532 / 2008
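This paper designs a system concerned with the safety of visually impaired pedestrians with respect to obstacles. When an obstacle appears ahead of the walker, an ultrasonic sensor measures the distance to it, the measured distance data are transmitted and received over short-range wireless communication, and the visually impaired user is finally warned in advance by voice, which helps them cope with potential hazards. By making use of IT-related technologies, the system is intended to realize human-centered IT.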

Speech Visualization of Korean Vowels Based on the Distances Among Acoustic Features (음성특징의 거리 개념에 기반한 한국어 모음 음성의 시각화)

Pok, Gouchol
- The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.12 no.5 / pp.512-520 / 2019
It is quite useful to represent speech visually for learners of foreign languages as well as for hearing-impaired people who cannot hear speech directly, and a number of studies have been presented in the literature. They remain, however, at the level of representing the characteristics of speech with colors or showing the changing shape of the lips and mouth through animation. As a result, those methods cannot tell users how far their pronunciation is from the standard one, and they make it technically difficult to develop a system in which users can correct their pronunciation interactively. To address these drawbacks, this paper proposes a speech visualization model based on the relative distance between the user's speech and the standard one, and suggests practical implementation directions by applying the proposed model to the visualization of Korean vowels. The method extracts three formants, F1, F2, and F3, from the speech signal and feeds them into Kohonen's SOM to map the results onto a 2-D screen, representing each speech sound as a point on the screen. We present a real system implemented with open source formant analysis software on the speech of a Korean instructor and several foreign students studying Korean, in which the user interface for the screen display was built with JavaScript.
https://doi.org/10.17661/jkiiect.2019.12.5.512

A Speaker Dependent Speech Recognition Method Using LSP Parameters for Small Training Data (적은 훈련 데이터를 이용한 LSP 파라메터 기반의 화자종속 음성인식에 관한 연구)

곽수주
- Proceedings of the Acoustical Society of Korea Conference / 1998.06e / pp.373-376 / 1998
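With advances in communications, the use of mobile handsets is increasing, and with it the demand for speech recognition on handsets. Handsets use low-bit-rate speech coders, and recognition performance degrades when speech recognition is performed on speech from such coders. To address this problem, this paper compares distance measures based on LSP parameters and examines Dynamic Time Warping (DTW) and a modified Hidden Markov Model (HMM) as speaker-dependent recognition methods that can be used with small amounts of training data. With the QCELP speech coder, speaker-dependent recognition using only two training utterances per vocabulary word achieved a recognition rate above 95%.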
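As a rough illustration of template-based speaker-dependent recognition with DTW, the sketch below aligns an utterance against a few stored templates per word and picks the closest word. It is a textbook DTW, not the paper's method: plain Euclidean distance between frames stands in for the LSP-based distance measures the paper compares, and the names `dtw_distance` and `recognize` are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])  # Euclidean stand-in
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats a frame
                cost[i][j - 1],      # b advances, a repeats a frame
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def recognize(utterance, templates):
    """Return the word whose nearest template has the smallest DTW cost.

    `templates` maps each word to a list of training utterances.
    """
    return min(
        templates,
        key=lambda word: min(dtw_distance(utterance, t) for t in templates[word]),
    )
```

With two templates per word, as in the experiment reported above, `templates` would hold two sequences under each key. DTW absorbs differences in speaking rate, which is one reason it suits very small training sets.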

Speech Recognition Improvement Using Extraction Selective Observation in DHMM (선별적인 관측열 추출을 통한 DHMM 음성인식의 성능 개선)

김우창;조선호;고수정;이정현
- Proceedings of the Korean Information Science Society Conference / 2000.10b / pp.374-376 / 2000
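DHMM, one of the algorithms used in speech recognition systems, uses a codebook to extract the features of speech frames as an observation sequence and then performs training and recognition on the speech patterns. However, because voiced and unvoiced sounds differ greatly in their characteristics, using a single codebook leads, through codebook error, to codebook indices of entirely different character being used as the DHMM observation sequence. In this paper, voiced and unvoiced speech are handled selectively: separate codebooks are built to extract the observation sequence, and a distance comparison between the preceding and the current observation is used to normalize the observation sequence along the time axis before it is used for recognition. In experiments, the proposed recognition method improved on the conventional method by 5.33%.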

Implementation of Speech Recognition System for Car Navigation (차량 항법용 음성 인식 시스템 구현)

김지성
- Proceedings of the Acoustical Society of Korea Conference / 1998.06c / pp.51-54 / 1998
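This paper studies efficient noise removal methods for improving the performance of a recognition system, using a database recorded in a car-noise environment. First, we compare the recognition performance of feature vectors known to be robust to noise and to changes in the surrounding environment, and we confirm the improvement in system performance through recognition experiments using weighted cepstral distance measures. In the experiments, MFCC and root-cepstrum gave higher recognition rates than the LPC cepstrum used as the baseline system. For measuring the distance between cepstra, weighted cepstral distance functions such as RPS and BPL helped improve recognition performance. In addition, applying a simple noise removal technique called cepstral mean subtraction improved recognition performance in the car-noise environment. Finally, for a real-time implementation of a speech recognition system for car navigation, we compare recognition performance across several configurations and propose an optimal system that takes memory size and execution time into account.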
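Cepstral mean subtraction, mentioned above as a simple noise removal technique, can be sketched in a few lines: the per-coefficient mean over all frames of an utterance is subtracted from every frame. This is a generic illustration under the assumption that the whole utterance is available at once; the function name is hypothetical and the paper's exact variant is not described in the abstract.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the utterance-level mean of each cepstral coefficient.

    `frames` is a list of equal-length cepstral vectors, one per frame.
    """
    count = len(frames)
    means = [sum(column) / count for column in zip(*frames)]
    return [[c - m for c, m in zip(frame, means)] for frame in frames]


if __name__ == "__main__":
    utterance = [[1.0, 2.0], [3.0, 4.0]]
    print(cepstral_mean_subtraction(utterance))
```

Because a fixed channel or slowly varying convolutional distortion appears as an additive offset in the cepstral domain, removing the mean cancels much of it.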

A Study on Processing of Speech Recognition Korean Words (한글 단어의 음성 인식 처리에 관한 연구)

Nam, Kihun
- The Journal of the Convergence on Culture Technology / v.5 no.4 / pp.407-412 / 2019
In this paper, we propose a technique for speech recognition processing of Korean words. Speech recognition is a technology that converts acoustic signals from sensors such as microphones into words or sentences. Most foreign languages pose less difficulty for speech recognition. Korean, on the other hand, consists of vowels and final consonants, so it is inappropriate to use the letters obtained from the voice synthesis system as they are. Improving the conventional speech recognition structure can enable correct word recognition. To solve this problem, a new algorithm was added to the existing speech recognition structure to increase the recognition rate. The word is first preprocessed and the result is tokenized. After the results of the Levenshtein distance algorithm and the hashing algorithm are combined, the normalized word is output through the consonant comparison algorithm. The final word is compared with the standardized table: it is output if it exists in the table and registered in the table if it does not. The experimental environment was developed as a smartphone application. The proposed structure improves the recognition rate by 2% for standard language and 7% for dialect.
https://doi.org/10.17703/JCCT.2019.5.4.407
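The Levenshtein distance step named in this abstract can be sketched as follows. This is the standard dynamic-programming edit distance, followed by a hypothetical `closest_entry` lookup against a word table. It compares strings code point by code point, so each Hangul syllable block counts as one symbol; whether the paper works on syllables or on decomposed jamo is not stated, and the hashing and consonant comparison stages are not shown.

```python
def levenshtein(source, target):
    """Minimum number of insertions, deletions, and substitutions."""
    # previous[j] holds the distance between source[:i-1] and target[:j].
    previous = list(range(len(target) + 1))
    for i, s in enumerate(source, start=1):
        current = [i]
        for j, t in enumerate(target, start=1):
            current.append(min(
                previous[j] + 1,             # delete s
                current[j - 1] + 1,          # insert t
                previous[j - 1] + (s != t),  # substitute or match
            ))
        previous = current
    return previous[-1]


def closest_entry(word, table):
    """Return the table entry with the smallest edit distance to `word`."""
    return min(table, key=lambda entry: levenshtein(word, entry))


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(closest_entry("학고", ["학교", "학생", "교실"]))
```

A recognizer output that is one edit away from a table entry, such as a single misrecognized syllable, would be mapped back to that entry by `closest_entry`.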
