Search | Korea Science

Speech-Oriented Multimodal Usage Pattern Analysis for TV Guide Application Scenarios (TV 가이드 영역에서의 음성기반 멀티모달 사용 유형 분석)

Kim Ji-Young;Lee Kyong-Nim;Hong Ki-Hyung
- MALSORI / no.58 / pp.101-117 / 2006
The development of efficient multimodal interfaces and fusion algorithms requires knowledge of usage patterns that show how people use multiple modalities. We analyzed multimodal usage patterns for TV-guide application scenarios (tasks). To collect usage patterns, we implemented a multimodal usage pattern collection system with two input modalities: speech and touch gesture. Fifty-four subjects participated in our study. Analysis of the collected usage patterns shows a positive correlation between task type and multimodal usage patterns. In addition, we analyzed the timing between speech utterances and their corresponding touch gestures, which shows when each touch gesture occurs relative to the duration of the speech utterance. We believe that developing efficient multimodal fusion algorithms for an application requires a multimodal usage pattern analysis for that application, similar to our work on the TV guide application, to be done in advance.
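The timing measure mentioned in this abstract can be illustrated with a minimal sketch. It assumes that each utterance and gesture is logged as a timestamp in seconds. The function name and the normalization (0 at speech onset, 1 at speech offset) are assumptions made for illustration, not the authors' definition.

```python
def relative_gesture_time(speech_start, speech_end, gesture_time):
    """Position of a touch gesture relative to a speech utterance.

    Hypothetical normalization: 0.0 means the gesture coincides with
    speech onset, 1.0 with speech offset. Values below 0 or above 1
    mean the gesture fell before or after the utterance.
    """
    duration = speech_end - speech_start
    if duration <= 0:
        raise ValueError("speech_end must be later than speech_start")
    return (gesture_time - speech_start) / duration
```

For example, a gesture at 3.0 s during an utterance spanning 2.0 s to 4.0 s falls at the midpoint of the utterance.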

On the Role of the Phatic Function of Intonation in Russian (러시아어 발화시 억양의 역할)

Park, Kun-Woo
- Speech Sciences / v.4 no.1 / pp.81-89 / 1998
This paper investigates the phatic function of intonation in Russian by recording and analysing 11 female native speakers of standard Moscow Russian. It shows that differences in the intonation pattern of a sentence are associated with differences in the degree of the listener's involvement in the speech. The intonation pattern of an utterance with a phatic function appears to be determined by 1) the speaker's readiness to talk to evoke the listener's attention and 2) the speaker's intention to continue the communication. Some emphasis is placed on the relationship between the intonation pattern of an utterance and speaker-listener interaction.

SPEECH SYNTHESIS USING LARGE SPEECH DATA-BASE

Lee, Kyu-Keon;Mochida, Takemi;Sakurai, Naohiro;Shirai, Katasuhiko
- Proceedings of the Acoustical Society of Korea Conference / 1994.06a / pp.949-956 / 1994
In this paper, we introduce a new speech synthesis method for arbitrary Japanese and Korean sentences using a natural speech data-base. We also discuss the application of this method to a CAI system. In our synthesis method, a basic sentence and basic accent-phrases are selected from the data-base to match a target sentence. The factors used for these selections are phrase dependency structure (separation degree), number of morae, type of accent and phonemic labels. The target pitch pattern and phonemic parameter series are generated from the selected basic units. Because the pitch pattern is generated from patterns extracted directly from real speech, it is expected to be more natural than a pattern estimated by a model. So far, we have examined this method on Japanese sentence speech and confirmed that the synthetic sound preserves human-like features fairly well. We now extend this method to Korean sentence speech synthesis. Furthermore, we are trying to apply this synthesis unit to a CAI system.

A Study on the Speech Recognition of Korean Phonemes Using Recurrent Neural Network Models (순환 신경망 모델을 이용한 한국어 음소의 음성인식에 대한 연구)

김기석;황희영
- The Transactions of the Korean Institute of Electrical Engineers / v.40 no.8 / pp.782-791 / 1991
In fields of pattern recognition such as speech recognition, several new techniques using Artificial Neural Network Models have been proposed and implemented. In particular, the Multilayer Perceptron Model has been shown to be effective in static speech pattern recognition. Speech, however, has dynamic or temporal characteristics, and the most important point in implementing speech recognition systems for continuous speech with Artificial Neural Network Models is learning these dynamic characteristics, together with the distributed cues and contextual effects that result from them. The Recurrent Multilayer Perceptron Model is known to be able to learn sequences of patterns. In this paper, we present the results of applying the Recurrent Model, which is capable of learning the temporal characteristics of speech, to phoneme recognition. The test data consist of 144 Vowel + Consonant + Vowel speech chains made up of 4 Korean monophthongs and 9 Korean plosive consonants. The input parameters of the Artificial Neural Network model are the FFT coefficients, residual error and zero crossing rates. The baseline model showed a recognition rate of 91% for vowels and 71% for plosive consonants for one male speaker. In various other experiments we obtained better recognition rates than with the existing multilayer perceptron model, showing the recurrent model to be better suited to speech recognition. We also explored the potential of Recurrent Models for speech recognition by varying the configuration of this baseline model.

A Statistical Analysis of the questionnaire concerning Sasang Constitutional Characteristics on 'Pattern of speech and activity' (말씨와 활동성의 체질특성 문항에 대한 통계적 분석)

Moon, Seong-Taek;Lee, Si-Woo;Kim, Hong-Gie;Kim, Jong-Yeol
- Korean Journal of Oriental Medicine / v.13 no.1 s.19 / pp.85-92 / 2007
To evaluate the suitability and effectiveness of the questionnaire on personal properties concerning 'pattern of speech and activity' according to Sasang constitution, as used at Iksan Wonkwang Oriental Medicine, we analyzed data from 1,335 patients obtained through the electronic chart in terms of 'relative discrimination ability' among Sasang constitutions and 'response ratio', using the statistical package SPSS. In the 'speech pattern' category, No.2 (speak mildly and softly) effectively discriminated the Soeum type. No.4 (talkative) and No.7 (speak fast) were effective factors for discriminating the Soyang type, though the response ratio of No.4 (talkative) needs to be improved. The 'activity pattern' category showed a high response ratio but low discriminating power. However, No.2 in this category (stay home and avoid going out) effectively discriminated the Soeum type. The discriminating power of 'pattern of speech and activity' for the age group under 20 years old was too low, so a questionnaire needs to be developed for elementary to high school students as well as for preschoolers.

A Preliminary Study on Differences of Phonatory Offset-Onset between the Fluency and a Dysfluency (유창성과 비유창성 화자의 발성 종결-개시 차이에 관한 예비연구)

Han Ji-Yeon;Lee Ok-Bun
- Proceedings of the KSPS conference / 2006.05a / pp.109-112 / 2006
This study investigated the acoustical characteristics of phonatory offset-onset mechanisms and compares the results for non-stutterers (N=3) and a stutterer (N=1). Phonatory offset-onset refers to laryngeal articulation in connected speech. In the phonetic context (V_V), pattern 0 (no change) appeared in all subjects, and pattern 4 (a trace of glottal fry and closure in the spectrogram) appeared only in the stutterer. In high vowels (/i/, /u/), patterns 3 and 4 appeared only in the stutterer. Although there was no common pattern among the non-stutterers, individual preferred patterns were found. This study offers a key to understanding the physiological movement during a stuttering block.

Spectral Pattern Based Robust Speech Endpoint Detection in Noisy Environments (스펙트럼 패턴 기반의 잡음 환경에 강인한 음성의 끝점 검출 기법)

Park, Jin-Soo;Lee, Yoon-Jae;Lee, In-Ho;Ko, Han-Seok
- Phonetics and Speech Sciences / v.1 no.4 / pp.111-117 / 2009
In this paper, a new speech endpoint detector for noisy environments is proposed. According to previous research, the energy feature in the speech region is easily distinguished from that in the speech-absent region. In the conventional method, the endpoint is found by applying an edge detection filter that locates abrupt changes in the feature domain. However, since the frame energy feature is unstable in noisy environments, accurate edge detection is not possible. Therefore, this paper proposes a novel feature extraction method based on the spectral envelope pattern. The edge detection filter is then applied to the proposed feature to detect the endpoint. Experiments were performed in a car noise environment, and a substantial improvement was obtained over the conventional method.
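The conventional energy-based approach that this abstract describes can be sketched as follows. This is only an illustration of the baseline idea (frame energy followed by an edge detection filter), not the proposed spectral envelope feature. The function names, the non-overlapping framing, and the simple difference-of-means filter are all assumptions.

```python
import math

def frame_log_energy(samples, frame_len):
    """Log energy of consecutive non-overlapping frames."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame)
        energies.append(math.log(energy + 1e-10))  # small floor avoids log(0)
    return energies

def edge_response(feature, half_width):
    """Mean of the following frames minus mean of the preceding frames."""
    response = [0.0] * len(feature)
    for i in range(half_width, len(feature) - half_width):
        before = feature[i - half_width:i]
        after = feature[i:i + half_width]
        response[i] = (sum(after) - sum(before)) / half_width
    return response

def detect_endpoints(samples, frame_len=160, half_width=3):
    """Return (first speech frame, first frame after speech).

    The strongest rising edge marks the start and the strongest
    falling edge marks the end. Assumes a single speech segment.
    """
    energy = frame_log_energy(samples, frame_len)
    response = edge_response(energy, half_width)
    start = max(range(len(response)), key=response.__getitem__)
    end = min(range(len(response)), key=response.__getitem__)
    return start, end
```

On clean input the rising and falling edges are sharp. With noisy input the frame energy fluctuates and the extrema of the filter response become unreliable, which is the weakness the abstract attributes to the conventional method.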

A Comparative Study on the Speech Rate of Advanced Korean(L2) Learners and Korean Native Speakers in Conversational Speech (자유 대화에서의 한국어 원어민 화자와 한국어 고급 학습자들의 발화 속도 비교)

Hong, Minkyoung
- Journal of Korean language education / v.29 no.3 / pp.345-363 / 2018
The purpose of this study is to compare the speech rate of advanced Korean (L2) learners and Korean native speakers in spontaneous utterances. Specifically, the study investigated how the two groups' speech patterns differ according to utterance length. Eight advanced Korean (L2) learners and eight Korean native speakers participated. The data were collected by recording their conversations, and physical measurements (speaking rate, articulatory rate, pauses and several types of speech disfluency) were taken on 120 utterances extracted from 12 of the 16 participants. The findings show that the advanced Korean learners' speech pattern is similar to that of native speakers in short utterances. In long utterances, however, the two groups show different speech patterns: the articulatory rate of Korean native speakers increased, while that of Korean learners decreased. This suggests that the frequency of speech disfluency factors might affect this result.
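The two rate measures named in this abstract are commonly defined as in the sketch below: speaking rate counts pauses in the elapsed time, while articulatory (articulation) rate excludes them. These are the usual textbook definitions. The abstract does not state the exact procedure the study used, so the function name and the syllables-per-second unit are assumptions.

```python
def speech_rates(n_syllables, total_duration, pause_durations):
    """Return (speaking_rate, articulation_rate) in syllables per second.

    speaking_rate     = syllables / total time, pauses included
    articulation_rate = syllables / time spent speaking, pauses excluded
    """
    pause_time = sum(pause_durations)
    phonation_time = total_duration - pause_time
    if total_duration <= 0 or phonation_time <= 0:
        raise ValueError("durations must leave positive speaking time")
    return n_syllables / total_duration, n_syllables / phonation_time
```

For an utterance of 12 syllables lasting 4.0 s with two 0.5 s pauses, the speaking rate is 3.0 syllables per second and the articulation rate is 4.0.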

Speech Recognition by Neural Net Pattern Recognition Equations with Self-organization

Kim, Sung-Ill;Chung, Hyun-Yeol
- The Journal of the Acoustical Society of Korea / v.22 no.2E / pp.49-55 / 2003
We attempted to apply modified neural net pattern recognition equations to speech recognition. The proposed method has a dynamic process of self-organization that has proved successful in recognizing depth perception in stereoscopic vision. This study shows that the process is also useful in recognizing human speech. In the processing, input vocal signals are first compared with standard models to measure similarities, which are then passed to a process of self-organization in the neural net equations. Competitive and cooperative processes are carried out among neighboring input similarities, so that only one winner neuron is finally detected. A comparative study showed that the proposed neural networks outperformed the conventional HMM speech recognizer under the same conditions.

A Query-by-Speech Scheme for Photo Albuming (음성 질의 기반 디지털 사진 검색 기법)

Kim Tae-Sung;Suh Young-Joo;Lee Yong-Ju;Kim Hoi-Rin
- MALSORI / no.57 / pp.99-112 / 2006
In this paper, we introduce two retrieval methods for photos with speech documents. We compare the pattern of a speech query with those of speech documents recorded in digital cameras, measure the similarities, and retrieve the photos corresponding to the speech documents with high similarity scores. In the first approach, a phoneme recognition scheme is used as the pre-processor for the pattern matching. In the second, vector quantization (VQ) and dynamic time warping (DTW) are applied to match the speech query with the documents in the signal domain itself. Experimental results show that the performance of the first approach is highly dependent on that of phoneme recognition, while its processing time is short. The second method provides a great improvement in performance. Its processing time is longer than that of the first method because of DTW, but it can be reduced by using approximate methods.
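The DTW matching step mentioned in this abstract can be sketched with the standard textbook recurrence. This is not the authors' pipeline: the VQ step is omitted, the sequences are plain numbers rather than speech feature vectors, and the names `dtw_distance` and `best_match` are hypothetical.

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Dynamic time warping distance between two sequences.

    cost[i][j] holds the cheapest alignment cost of a[:i] with b[:j].
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            # extend the cheapest of: insertion, deletion, match
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]

def best_match(query, documents):
    """Index of the document sequence closest to the query under DTW."""
    return min(range(len(documents)),
               key=lambda k: dtw_distance(query, documents[k]))
```

The nested loops cost time proportional to the product of the two sequence lengths for every query-document pair, which is consistent with the longer processing time the abstract reports for the DTW-based method.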

Search Result 411, Processing Time 0.026 seconds
