통합 검색 | Korea Science

TV 가이드 영역에서의 음성기반 멀티모달 사용 유형 분석 (Speech-Oriented Multimodal Usage Pattern Analysis for TV Guide Application Scenarios)

김지영;이경님;홍기형
- 대한음성학회지:말소리
- /
- 제58호
- /
- pp.101-117
- /
- 2006
The development of efficient multimodal interfaces and fusion algorithms requires knowledge of usage patterns that show how people use multiple modalities. We analyzed multimodal usage patterns for TV-guide application scenarios (or tasks). In order to collect usage patterns, we implemented a multimodal usage pattern collection system having two input modalities: speech and touch-gesture. Fifty-four subjects participated in our study. Analysis of the collected usage patterns shows a positive correlation between the task type and multimodal usage patterns. In addition, we analyzed the timing between speech-utterances and their corresponding touch-gestures that shows the touch-gesture occurring time interval relative to the duration of speech utterance. We believe that, for developing efficient multimodal fusion algorithms on an application, the multimodal usage pattern analysis for the given application, similar to our work for TV guide application, have to be done in advance.
PDF

러시아어 발화시 억양의 역할 (On the Role of the Phatic Function of Intonation in Russian)

박근우
- 음성과학
- /
- 제4권1호
- /
- pp.81-89
- /
- 1998
This paper investigates the phatic function of intonation in Russian by recording and analysing 11 female native speakers of standard Moscow Russian. This paper shows that differences in intonation pattern of a sentence are associated with differences in degree of listener's involvement in the speech. Intonation pattern of an utterance having phatic function appears to be determined by 1) the speaker's readiness to talk to evoke the listener's attention ; 2) the speaker's intention to continue the communication. Some emphasis is placed on the relationship between intonation pattern of an utterance and speaker-listener interaction.
PDF

SPEECH SYNTHESIS USING LARGE SPEECH DATA-BASE

Lee, Kyu-Keon;Mochida, Takemi;Sakurai, Naohiro;Shirai, Katasuhiko
- 한국음향학회:학술대회논문집
- /
- 한국음향학회 1994년도 FIFTH WESTERN PACIFIC REGIONAL ACOUSTICS CONFERENCE SEOUL KOREA
- /
- pp.949-956
- /
- 1994
In this paper, we introduce a new speech synthesis method for Japanese and Korean arbitrary sentences using the natural speech data-base. Also, application of this method to a CAI system is discussed. In our synthesis method, a basic sentence and basic accent-phrases are selected from the data-base against a target sentence. Factors for those selections are phrase dependency structure (separation degree), number of morae, type of accent and phonemic labels. The target pitch pattern and phonemic parameter series are generated using those selected basic units. As the pitch pattern is generated using patterns which are directly extracted form real speech, it is expected to be more natural than any other pattern which is estimated by any model. Until now, we have examined this method on Japanese sentence speech and affirmed that the synthetic sound preserves human-like features fairly well. Now we extend this method to Korean sentence speech synthesis. Further more, we are trying to apply this synthesis unit to a CAI system.
PDF

순환 신경망 모델을 이용한 한국어 음소의 음성인식에 대한 연구 (A Study on the Speech Recognition of Korean Phonemes Using Recurrent Neural Network Models)

김기석;황희영
- 대한전기학회논문지
- /
- 제40권8호
- /
- pp.782-791
- /
- 1991
In the fields of pattern recognition such as speech recognition, several new techniques using Artifical Neural network Models have been proposed and implemented. In particular, the Multilayer Perception Model has been shown to be effective in static speech pattern recognition. But speech has dynamic or temporal characteristics and the most important point in implementing speech recognition systems using Artificial Neural Network Models for continuous speech is the learning of dynamic characteristics and the distributed cues and contextual effects that result from temporal characteristics. But Recurrent Multilayer Perceptron Model is known to be able to learn sequence of pattern. In this paper, the results of applying the Recurrent Model which has possibilities of learning tedmporal characteristics of speech to phoneme recognition is presented. The test data consist of 144 Vowel+ Consonant + Vowel speech chains made up of 4 Korean monothongs and 9 Korean plosive consonants. The input parameters of Artificial Neural Network model used are the FFT coefficients, residual error and zero crossing rates. The Baseline model showed a recognition rate of 91% for volwels and 71% for plosive consonants of one male speaker. We obtained better recognition rates from various other experiments compared to the existing multilayer perceptron model, thus showed the recurrent model to be better suited to speech recognition. And the possibility of using Recurrent Models for speech recognition was experimented by changing the configuration of this baseline model.

말씨와 활동성의 체질특성 문항에 대한 통계적 분석 (A Statistical Analysis of the questionnaire concerning Sasang Constitutional Characteristics on 'Pattern of speech and activity')

문성택;이시우;김홍기;김종열
- 한국한의학연구원논문집
- /
- 제13권1호통권19호
- /
- pp.85-92
- /
- 2007
To evaluate the suitability and effectiveness of the questionnaire concerning personal properties on 'pattern of speech and activity' according to the Sasang constitution that were used in Iksan Wonkwang Oriental Medicine, we analyzed the data of 1,335 patients obtained through the electronic chart in the aspect of 'relative discrimination ability' to Sasang constitutions and 'response ratio' using statistical Package SPSS. In categories of 'speech pattern', No.2 (speak mildly and softly) was effectively discriminating Soeum type. No.4 (talkative) and No.7 (speak fast) were effective factors for the discrimination of Soyang type, though No.4 (talkative) was needed to be improved in response ratio. The category of 'activity pattern' has shown high response ratio but low discriminating power. However, No.2 (keep staying home but avoid going out) in this category was effectively discriminating Soeum type. The discriminating power of 'pattern of speech and activity' for the age group less than 20 years old was too low, so it is necessary to develop the questionnaire for the elementary to high school students as well as for the preschoolers.
PDF

유창성과 비유창성 화자의 발성 종결-개시 차이에 관한 예비연구 (A Preliminary Study on Differences of Phonatory Offset-Onset between the Fluency and a Dysfluency)

한지연;이옥분
- 대한음성학회:학술대회논문집
- /
- 대한음성학회 2006년도 춘계 학술대회 발표논문집
- /
- pp.109-112
- /
- 2006
This study investigated the acoustical characteristics of phonatory offset-onset mechanisms. And this study shows the comparative results between non-stutterers (N=3) and a stutterer (N=1). Phonatory offset-onset means a laryngeal articulatory in the connected speech. In the phonetic context V_V), pattern 0(there is no changes) appeared in all subjects, and pattern 4(this indicate the trace of glottal fry and closure in spectrogram)was only in a Stutterer. In high vowels(/i/, /u/), pattern 3 and 4 appeared only in a stutterer. Although there is no common pattern among the non-stutterers, individual's preference pattern was founded. This study offers the key to an understanding of physiological movement on a block of stutter.
PDF

스펙트럼 패턴 기반의 잡음 환경에 강인한 음성의 끝점 검출 기법 (Spectral Pattern Based Robust Speech Endpoint Detection in Noisy Environments)

박진수;이윤재;이인호;고한석
- 말소리와 음성과학
- /
- 제1권4호
- /
- pp.111-117
- /
- 2009
In this paper, a new speech endpoint detector in noisy environment is proposed. According to the previous research, the energy feature in the speech region is easily distinguished from that in the speech absent region. In conventional method, the endpoint can be found by applying the edge detection filter that finds the abrupt changing point in feature domain. However, since the frame energy feature is unstable in noisy environment, the accurate edge detection is not possible. Therefore, in this paper, the novel feature extraction method based on spectrum envelop pattern is proposed. Then, the edge detection filter is applied to the proposed feature for detection of the endpoint. The experiments are performed in the car noise environment and a substantial improvement was obtained over the conventional method.
PDF

자유 대화에서의 한국어 원어민 화자와 한국어 고급 학습자들의 발화 속도 비교 (A Comparative Study on the Speech Rate of Advanced Korean(L2) Learners and Korean Native Speakers in Conversational Speech)

홍민경
- 한국어교육
- /
- 제29권3호
- /
- pp.345-363
- /
- 2018
The purpose of this study is to compare the speech rate of advanced Korean(L2) learners and Korean native speakers in spontaneous utterances. Specifically, the current study investigated the difference of the two groups' speech pattern according to utterance length. Eight advanced Korean(L2) learners and eight Korean native speakers participated in this study. The data were collected by recording their conversation and physical measurements (speaking rate, articulatory rates, pause and several types of speech disfluency) were taken on extracted 120 utterances from 12 out of the 16 participants. The findings show that advanced Korean learners' speech pattern is similar to that of Koreans in the short-length utterance. However, in the long-length utterance, two groups show different speech patterns; while the articulatory rate of Korean native speakers increased in the long-length utterance, that of Korean learners decreased. This suggests that the frequency of speech disfluency factors might affect this result.

Speech Recognition by Neural Net Pattern Recognition Equations with Self-organization

Kim, Sung-Ill;Chung, Hyun-Yeol
- The Journal of the Acoustical Society of Korea
- /
- 제22권2E호
- /
- pp.49-55
- /
- 2003
The modified neural net pattern recognition equations were attempted to apply to speech recognition. The proposed method has a dynamic process of self-organization that has been proved to be successful in recognizing a depth perception in stereoscopic vision. This study has shown that the process has also been useful in recognizing human speech. In the processing, input vocal signals are first compared with standard models to measure similarities that are then given to a process of self-organization in neural net equations. The competitive and cooperative processes are conducted among neighboring input similarities, so that only one winner neuron is finally detected. In a comparative study, it showed that the proposed neural networks outperformed the conventional HMM speech recognizer under the same conditions.
PDF KSCI

음성 질의 기반 디지털 사진 검색 기법 (A Query-by-Speech Scheme for Photo Albuming)

김태성;서영주;이용주;김회린
- 대한음성학회지:말소리
- /
- 제57호
- /
- pp.99-112
- /
- 2006
In this paper, we introduce two retrieval methods for photos with speech documents. We compare the pattern of speech query with those of speech documents recorded in digital cameras, and measure the similarities, and retrieve photos corresponding to the speech documents which have high similarity scores. As the first approach, a phoneme recognition scheme is used as the pre-processor for the pattern matching, and in the second one, the vector quantization (VQ) and the dynamic time warping (DTW) are applied to match the speech query with the documents in signal domain itself. Experimental results show that the performance of the first approach is highly dependent on that of phoneme recognition while the processing time is short. The second method provides a great improvement of performance. While the processing time is longer than that of the first method due to DTW, but we can reduce it by taking approximated methods.
PDF

검색결과 412건 처리시간 0.023초

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

자세히 찾기

이미지 검색 (β)