Search | Korea Science

Factors for Speech Signal Time Delay Estimation (음성 신호를 이용한 시간지연 추정에 미치는 영향들에 관한 연구)

Kwon, Byoung-Ho;Park, Young-Jin;Park, Youn-Sik
- Transactions of the Korean Society for Noise and Vibration Engineering
- v.18 no.8
- pp.823-831
- 2008
Because it requires only a light computational load and a small database, sound source localization based on the time delay of arrival (the TDOA method) is applied in many research fields, such as robot auditory systems and teleconferencing. Time delay estimation, the most important part of the TDOA method, has been studied extensively. However, studies of the factors that affect time delay estimation are insufficient, especially for real-environment applications. In 1997, Brandstein and Silverman reported that the performance of time delay estimation deteriorates as the reverberation time of a room increases. Yet even when the reverberation time is the same, estimation performance differs depending on which part of the signal is used. To find the reason, we studied and analyzed the factors affecting time delay estimation using speech signals and room impulse responses. As a result, we found that the performance of time delay estimation changes with the R/D ratio and the signal characteristics even when the reverberation time is the same. We also define a performance index (PI) that shows a tendency similar to the R/D ratio, and propose a method that uses the PI to improve time delay estimation performance.
https://doi.org/10.5050/KSNVN.2008.18.8.823
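Time delay estimation in the TDOA method amounts to finding the lag at which two microphone signals line up best. The sketch below shows the simplest version, picking the peak of the plain cross-correlation over a bounded lag range. This is an illustrative assumption, not the authors' estimator; weighted variants such as GCC-PHAT are common in practice, and the function names are hypothetical.

```python
def estimate_delay(x, y, max_lag):
    """Return the lag k (in samples), |k| <= max_lag, that maximizes the
    cross-correlation r[k] = sum_n x[n] * y[n + k].

    A positive result means y is a delayed copy of x.
    """
    best_lag, best_val = 0, float("-inf")
    for k in range(-max_lag, max_lag + 1):
        acc = 0.0
        for n in range(len(x)):
            m = n + k
            if 0 <= m < len(y):  # only overlapping samples contribute
                acc += x[n] * y[m]
        if acc > best_val:
            best_lag, best_val = k, acc
    return best_lag


def tdoa_seconds(x, y, fs, max_lag):
    """Convert the sample lag into a time delay in seconds at sample rate fs."""
    return estimate_delay(x, y, max_lag) / fs


# Usage: y is x delayed by 5 samples.
x = [0.0] * 50 + [1.0, -2.0, 3.0] + [0.0] * 50
y = [0.0] * 5 + x[:-5]
print(estimate_delay(x, y, 20))  # 5
```

In a reverberant room the reflections add extra correlation peaks next to the direct-path peak, which is one intuitive way to see why the R/D ratio affects how reliably this argmax lands on the true delay.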

Korean Lexical Disambiguation Based on Statistical Information (통계정보에 기반을 둔 한국어 어휘중의성해소)

박하규;김영택
- The Journal of Korean Institute of Communications and Information Sciences
- v.19 no.2
- pp.265-275
- 1994
Lexical disambiguation is one of the most basic tasks in natural language processing applications such as speech recognition/synthesis, information retrieval, and corpus tagging. This paper describes a Korean lexical disambiguation mechanism in which disambiguation is performed on the basis of statistical information collected from corpora. In this mechanism, the token tags corresponding to the results of morphological analysis are used instead of part-of-speech tags to allow more detailed disambiguation. The proposed lexical selection function shows considerably high accuracy because it reflects lexical characteristics of Korean such as the concordance of endings or postpositions. Two disambiguation methods, a unique selection method and a multiple selection method, are provided so that they can be applied appropriately depending on the application area.
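The abstract does not give the selection function, so the following is only a minimal sketch, assuming that each candidate morphological analysis is scored by its corpus frequency after a given context tag (here a hypothetical previous-token tag). It shows the difference between unique selection (keep the single best analysis) and multiple selection (keep every analysis above a threshold). The tag names, counts, and threshold are made up for illustration.

```python
from collections import Counter


def selection_scores(candidates, context_tag, counts):
    """Estimate P(candidate | context_tag) from corpus counts.

    counts maps (context_tag, candidate) -> frequency; add-one smoothing keeps
    unseen candidates from getting zero probability.
    """
    denom = sum(counts[(context_tag, c)] for c in candidates) + len(candidates)
    return {c: (counts[(context_tag, c)] + 1) / denom for c in candidates}


def unique_selection(candidates, context_tag, counts):
    """Pick the single most probable analysis."""
    scores = selection_scores(candidates, context_tag, counts)
    return max(candidates, key=lambda c: scores[c])


def multiple_selection(candidates, context_tag, counts, threshold=0.1):
    """Keep every analysis whose probability reaches the threshold."""
    scores = selection_scores(candidates, context_tag, counts)
    return [c for c in candidates if scores[c] >= threshold]


# Hypothetical analyses of the ambiguous form "나는" and made-up counts.
candidates = ["나/NP+는/JX", "날/VV+는/ETM", "나/VV+는/ETM"]
counts = Counter({("BOS", "나/NP+는/JX"): 30, ("BOS", "날/VV+는/ETM"): 5})
print(unique_selection(candidates, "BOS", counts))    # 나/NP+는/JX
print(multiple_selection(candidates, "BOS", counts))  # ['나/NP+는/JX', '날/VV+는/ETM']
```

A `Counter` is used so that unseen (context, candidate) pairs simply count as zero.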

Human Voice, This Mystery

Horiuchi, Terumichi
- Proceedings of the KSPS conference
- 1996.10a
- pp.378-378
- 1996
Human beings and chimpanzees are very much alike, and scientists say there is only a 1% difference between them. Contrary to our expectations, the difference lies not in the brain but in the trachea (windpipe). The human trachea is bigger and longer than the chimpanzee's, which means that more air is inhaled and exhaled with each breath. There are interesting descriptions of breath in the Bible. Genesis says that God made a man out of soil and breathed life-giving breath into his nostrils, and the man began to live. Elsewhere it says that life exists between the incoming breath and the outgoing breath. Thus breath plays a key role in our life. In Hebrew and Greek, breath and spirit are the same word: in Hebrew it is ‘Ruach’ and in Greek ‘Pneuma’. With breath and the mouth organs, human beings produced voice, and through heritage and learning we train our voice to reach the level of language that conveys our culture. My contention is that we should recognize the gift of voice and train it so that it can properly function as a tool for conveying our thought and culture. This is a kind of speech practice, and it may be called speechology. It includes the following practical methods: 1. Read aloud. 2. Encourage recitation. 3. Practice public speaking as much as possible. 4. Learn the theories of phonetics, such as pronunciation, accent, intonation, prominence, and assimilation.

Classification of Diphthongs using Acoustic Phonetic Parameters (음향음성학 파라메터를 이용한 이중모음의 분류)

Lee, Suk-Myung;Choi, Jeung-Yoon
- The Journal of the Acoustical Society of Korea
- v.32 no.2
- pp.167-173
- 2013
This work examines the classification of diphthongs as part of a distinctive feature-based speech recognition system. Acoustic measurements related to the vocal tract and the voice source are examined, and analysis of variance (ANOVA) results show that vowel duration, energy trajectory, and formant variation are significant. A balanced error rate of 17.8% is obtained for 2-way diphthong classification on the TIMIT database, and in 4-way classification, error rates of 32.9%, 29.9%, and 20.2% are obtained for /aw/, /ay/, and /oy/, respectively. Adding the acoustic features to the widely used Mel-frequency cepstral coefficients also improves classification.
https://doi.org/10.7776/ASK.2013.32.2.167
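The balanced error rate reported above averages the error rate of each class with equal weight, so a rare class counts as much as a frequent one. A minimal sketch of that computation follows; the class labels and predictions are made up and do not come from the paper's TIMIT experiments.

```python
def per_class_error_rates(y_true, y_pred):
    """Error rate of each true class: misclassified items / items in class."""
    rates = {}
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        errors = sum(1 for i in idx if y_pred[i] != c)
        rates[c] = errors / len(idx)
    return rates


def balanced_error_rate(y_true, y_pred):
    """Unweighted mean of the per-class error rates."""
    rates = per_class_error_rates(y_true, y_pred)
    return sum(rates.values()) / len(rates)


# Hypothetical labels: 8 tokens of class "A", 2 of class "B".
y_true = ["A"] * 8 + ["B"] * 2
y_pred = ["A"] * 7 + ["B"] + ["B", "A"]
print(balanced_error_rate(y_true, y_pred))  # 0.3125
```

Here the plain error rate is 2/10 = 0.2, while the balanced error rate is (1/8 + 1/2) / 2 = 0.3125, because the single mistake in the small class "B" weighs as much as the whole error rate of class "A". The same `per_class_error_rates` helper yields per-vowel figures like those quoted for /aw/, /ay/, and /oy/.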

Evaluation of Teaching English Intonation through Native Utterances with Exaggerated Intonation (억양이 과장된 원어민 발화를 통한 영어 억양 교육과 평가)

Yoon, Kyu-Chul
- Phonetics and Speech Sciences
- v.3 no.1
- pp.35-43
- 2011
The purpose of this paper is to evaluate the viability of employing the intonation exaggeration technique proposed in [4] in teaching English prosody to university students. Fifty-six female university students, twenty-two in a control group and thirty-four in an experimental group, participated in a teaching experiment as part of their regular coursework over a five-and-a-half-week period. As study material, the experimental group was given a set of synthesized utterances whose intonation contours had been exaggerated, whereas the control group was given the same set without any intonation modification. Recordings were made both before and after the teaching experiment, and one sentence set was chosen for analysis. The parameters analyzed were the pitch range, the words containing the highest and lowest pitch points, and a 3-dimensional comparison of the three prosodic features [2]. An AXB test and a subjective rating test were also performed, along with a qualitative screening of the individual intonation contours. The results showed that the experimental group performed slightly better, in that their intonation contours were more similar to that of the model native speaker's utterance. This suggests that the intonation exaggeration technique can be employed in teaching English prosody to students.
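Two of the parameters above, the pitch range and the location of the highest and lowest pitch points, can be read directly off an F0 contour. The sketch below assumes a frame-level F0 track in Hz with unvoiced frames marked as 0, and reports the range both in Hz and in semitones (12 × log2 of the max/min ratio). The paper's exact measurement procedure is not stated in the abstract, so this is only an illustration with made-up values.

```python
import math


def pitch_range(f0):
    """Return (min_hz, max_hz, range_hz, range_semitones) over voiced frames.

    Frames with f0 <= 0 are treated as unvoiced and ignored.
    """
    voiced = [f for f in f0 if f > 0]
    lo, hi = min(voiced), max(voiced)
    return lo, hi, hi - lo, 12.0 * math.log2(hi / lo)


def extreme_frames(f0):
    """Indices of the highest and lowest voiced frames (to locate the words)."""
    voiced = [i for i, f in enumerate(f0) if f > 0]
    return max(voiced, key=lambda i: f0[i]), min(voiced, key=lambda i: f0[i])


contour = [200.0, 0.0, 250.0, 400.0, 0.0, 300.0]  # Hz, made-up values
print(pitch_range(contour))     # (200.0, 400.0, 200.0, 12.0)
print(extreme_frames(contour))  # (3, 0)
```

The semitone scale makes ranges comparable between speakers with different baseline pitch, which matters when comparing learners with a model native speaker.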

wheelchair system design on speech recognition function (음성인식 기능을 탑재한 다기능 휠체어 시스템 설계 및 구현)

김정훈;류홍석;강재명;강성인;김관형;이상배
- Proceedings of the Korean Institute of Intelligent Systems Conference
- 2002.05a
- pp.1-5
- 2002
The purpose of this paper is to develop a speech recognition module for a wheelchair for the convenience of people with disabilities. For this system, we used a TMS320C32 as the main processor; in the pre-processing stage, we removed noise by applying a Wiener filter that takes the characteristics of the noise environment into account, and extracted 12 feature parameters per frame using LPC and cepstrum analysis. In the recognition stage, we then implemented a hybrid that combines DTW (dynamic time warping), which is conventionally used for isolated-word recognition, with an NN (neural network) to reduce recognition errors. In this research, we achieved a recognition rate of more than 96% on isolated words when the DTW and hybrid forms were each tested in a noisy environment.
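DTW-based isolated-word recognition compares an input feature sequence against one stored template per word and picks the word whose template can be time-aligned with the lowest total cost. A minimal sketch of that template matching follows, assuming Euclidean local distance between feature vectors and the basic three-way step pattern without slope constraints; the word templates are toy one-dimensional vectors, not LPC cepstra, and the neural-network half of the paper's hybrid is not shown.

```python
import math


def dtw_distance(a, b):
    """DTW distance between two sequences of feature vectors.

    Local cost is Euclidean distance; each cell may be reached from the
    left, from below, or diagonally, with no slope constraints.
    """
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def recognize(utterance, templates):
    """Return the template word with the smallest DTW distance."""
    return min(templates, key=lambda w: dtw_distance(utterance, templates[w]))


templates = {"go": [[0.0], [1.0], [2.0]], "stop": [[2.0], [1.0], [0.0]]}
spoken = [[0.0], [0.0], [1.0], [2.0], [2.0]]  # a slower "go"
print(recognize(spoken, templates))  # go
```

The slower utterance still aligns with the "go" template at zero cost because DTW lets one template frame match several input frames, which is what makes it suited to isolated words spoken at varying speeds.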

A Digital Audio Response System Based on the RELP Algorithm (RELP 방식을 이용한 디지털 음성 응답기)

김상용;은종관
- Journal of the Korean Institute of Telematics and Electronics
- v.21 no.6
- pp.7-16
- 1984
This paper describes the overall development procedure of a digital audio response system, built specifically to answer subscribers' inquiries about telephone numbers. The system is based on the residual excited linear prediction (RELP) algorithm and incorporates a pitch predictive loop. Its major advantage over similar systems is that it produces high-quality synthetic speech while using a relatively small memory. The hardware, which consists of a speech synthesizer, a controller, and an I/O part, was built using 2900 series bit-slice microprocessors and an INTEL 8085 microprocessor. The system is capable of real-time processing, is reliable, and is adaptable to other applications.
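RELP coding rests on linear prediction: each sample is predicted from the previous p samples, and the prediction residual (rather than the raw waveform) is what excites the synthesis filter. The sketch below shows only that core step, computing predictor coefficients with the Levinson-Durbin recursion on the autocorrelation sequence and then the residual. The low-pass coding of the residual and the pitch predictive loop mentioned in the abstract are not shown, and the function names are illustrative.

```python
def autocorrelation(x, max_lag):
    """r[k] = sum_n x[n] * x[n + k] for k = 0..max_lag."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for predictor coefficients a[1..p], x_hat[n] = sum_k a[k] x[n-k].

    Returns (coefficients, final prediction error energy).
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def lpc_residual(x, coeffs):
    """Prediction residual e[n] = x[n] - x_hat[n] (samples before n=0 taken as 0)."""
    p = len(coeffs)
    return [
        x[n] - sum(coeffs[k - 1] * x[n - k] for k in range(1, p + 1) if n - k >= 0)
        for n in range(len(x))
    ]


x = [0.9 ** n for n in range(200)]  # impulse response of a one-pole filter
coeffs, _ = levinson_durbin(autocorrelation(x, 1), order=1)
residual = lpc_residual(x, coeffs)
print(round(coeffs[0], 6), round(residual[0], 6))  # 0.9 1.0
```

For this one-pole test signal the recovered coefficient is 0.9 and the residual collapses back to the original impulse, which illustrates why a compactly coded residual plus a few coefficients can regenerate the waveform.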

A Neural Network Based Korean Segmental Duration Modeling Using Tonal Information of Phonemes (음소별 성조 정보를 이용한 신경망 기반의 한국어 음소 지속시간 모델링)

김은경;이상호;오영환
- The Journal of the Acoustical Society of Korea
- v.18 no.6
- pp.84-88
- 1999
Accurate estimation of segmental duration is crucial for natural-sounding text-to-speech synthesis. To predict Korean segmental durations, conventional methods used phonemic context, part-of-speech context, and position within the prosodic phrase. In this paper, the tonal information of phonemes is employed for more accurate prediction. After defining two non-boundary tones and six boundary tones, we annotated a tonal label on each syllable of 400 sentences. To predict segmental duration using tonal information, we constructed neural networks with a real-valued output node that predicts phonemic duration and trained them with the backpropagation algorithm. Experimental results showed that the proposed features are effective for predicting Korean segmental durations; the correlation coefficient between the observed and predicted durations was 0.863.
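The 0.863 figure above is a correlation coefficient between observed and predicted durations. Assuming it is the usual Pearson coefficient (the abstract does not say), it can be computed as below; the duration values are made up and are not the paper's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


observed = [62.0, 48.0, 95.0, 71.0, 55.0]   # ms, made-up values
predicted = [60.0, 52.0, 88.0, 75.0, 57.0]  # ms, made-up values
print(round(pearson_r(observed, predicted), 3))
```

Note that the coefficient measures how well predictions track the ups and downs of the observed durations; a model that is off by a constant offset would still score 1.0, so it is usually reported alongside an absolute error measure.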

The Therapeutic Effects of SKTCLP® in Patients with Mutational Dysphonia (생리적 발성 기법의 변성발성장애 치료 적용 효과)

Kim, Seong-Tae;Nam, Soon-Yuhl
- Phonetics and Speech Sciences
- v.3 no.2
- pp.99-105
- 2011
Treatment using vegetative phonation is typically useful for patients with mutational dysphonia, but it has not yet been studied. This study attempts to identify the effect of SKTCLP®, which uses throat clearing and laughing, in patients with mutational dysphonia. The study, which was designed by the author, included 26 patients aged 14 to 32 years (mean: 18.7 years) who had been diagnosed with mutational dysphonia between January 2007 and June 2010. Voice therapy for these patients included SKTCLP® and ranged from two to seven sessions (mean: 3.8 sessions). Results were evaluated before and after therapy by videostroboscopy, perceptual evaluation on the GRBAS scale, aerodynamic tests, and acoustic analysis. Most patients could phonate at a low pitch from the beginning and sustain a normal-pitch sound in the last session. At the first session we observed a glottic gap and anterior-posterior compression of the superior laryngeal part; the glottic gap was reduced after therapy, and these patients had complete glottic closure after treatment. The acoustic and aerodynamic measures after treatment showed significant decreases in F0, jitter, shimmer, SFF, and SPI, and significant increases in MPT, Psub, and vocal efficiency (p<.05). SKTCLP® may be a useful treatment method for managing mutational dysphonia. This technique may also be useful in improving the voice quality of other functional dysphonias with a glottal chink, or of functional aphonia.
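Jitter and shimmer, two of the acoustic measures above, quantify cycle-to-cycle irregularity of the glottal period and of the peak amplitude. The sketch below uses the common "local" definitions (as in Praat): the mean absolute difference between consecutive values divided by the mean value, expressed in percent. Which exact jitter and shimmer variants the study used is not stated in the abstract, and the period and amplitude values are made up.

```python
def local_perturbation(values):
    """Mean absolute difference between consecutive values divided by their mean."""
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


def jitter_local_percent(periods):
    """Cycle-to-cycle variation of the glottal period, in percent."""
    return 100.0 * local_perturbation(periods)


def shimmer_local_percent(amplitudes):
    """Cycle-to-cycle variation of the peak amplitude, in percent."""
    return 100.0 * local_perturbation(amplitudes)


periods = [0.0100, 0.0102, 0.0099, 0.0101]  # seconds, made-up values
amplitudes = [0.80, 0.78, 0.82, 0.79]       # arbitrary units, made-up values
print(jitter_local_percent(periods), shimmer_local_percent(amplitudes))
```

A perfectly periodic voice gives 0% on both measures, so a decrease after therapy indicates steadier vocal fold vibration.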

Voice Activity Detection in Noisy Environment based on Statistical Nonlinear Dimension Reduction Techniques (통계적 비선형 차원축소기법에 기반한 잡음 환경에서의 음성구간검출)

Han Hag-Yong;Lee Kwang-Seok;Go Si-Yong;Hur Kang-In
- Journal of the Korea Institute of Information and Communication Engineering
- v.9 no.5
- pp.986-994
- 2005
This paper proposes a likelihood-based nonlinear dimension reduction method for speech feature parameters in order to build a voice activity detector that adapts to noisy environments. The proposed method uses the nonlinear values of the Gaussian probability density function, with new parameters, for the speech and nonspeech classes. We applied a likelihood ratio test to detect speech segments and compared its performance with that of the linear discriminant analysis technique. In experiments, the proposed method gave results similar to those of Gaussian mixture models.
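The likelihood ratio test used for speech detection compares how probable a frame's feature is under a speech model versus a nonspeech model. A minimal sketch follows, assuming a single scalar feature per frame (for example log energy) and one Gaussian per class; the paper's nonlinear dimension reduction step and its actual features are not reproduced, and the class parameters and frames are made up.

```python
import math


def gaussian_logpdf(x, mean, std):
    """Log density of a univariate normal distribution."""
    return -0.5 * math.log(2.0 * math.pi * std * std) - (x - mean) ** 2 / (2.0 * std * std)


def llr_vad(frames, speech, noise, threshold=0.0):
    """Label each frame True (speech) when the log-likelihood ratio exceeds threshold.

    speech and noise are (mean, std) pairs for a scalar frame feature.
    """
    decisions = []
    for x in frames:
        llr = gaussian_logpdf(x, *speech) - gaussian_logpdf(x, *noise)
        decisions.append(llr > threshold)
    return decisions


frames = [2.1, 1.8, 9.5, 11.0, 2.4]  # e.g. log energy per frame, made-up values
print(llr_vad(frames, speech=(10.0, 2.0), noise=(2.0, 1.0)))
# [False, False, True, True, False]
```

Raising `threshold` trades missed speech for fewer false alarms; in a noisy environment it would typically be tuned on held-out data.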
