• Title/Summary/Keyword: Automatic Speaker Recognition (자동 화자 인식)

Search results: 48

Comparative Study on Cognitive Scheme of Movement Verbs (이동동사의 인지 도식에 관한 비교 연구)

  • 오현금;남기춘
    • Proceedings of the Korean Society for Cognitive Science Conference
    • /
    • 2002.05a
    • /
    • pp.59-64
    • /
    • 2002
  • This study compares research on lexical representation in cognitive psychology and cognitive linguistics, focusing on cognitive schemas of motion verbs. Efforts to represent human linguistic knowledge schematically arose from the judgment that research attending only to the syntactic surface of language cannot capture its semantic structure, and so came to emphasize semantic categorization. The present study concentrates on visual image schemas. Image schemas are tied to the perception of spatial relations, movement, shape, and the like. Representations expressed as images rest fundamentally on analogical and metaphorical principles that shape how we perceive the world and how we act in it. In this respect, the speaker of an utterance constructs a verbalized representation from his or her capacity for subjective action and from the conceptualization he or she has formed. Schemas centered on semantic representation grounded in cognitive principles include those of Langacker, Lakoff, and Talmy. In France, mathematicians such as R. Thom, interested in qualitative phenomena, established morphodynamic theory (morphodynamique), which has provided a mathematical foundation for recent cognitive research. The schemas of R. Thom and J. Petitot-Cocorda, and that of B. Pottier, regarded as the founder of structural semantics, belong to this line. The Applicative and Cognitive Grammar (Grammaire Applicative et Cognitive) proposed by J.-P. Descles, unlike other cognitive grammars, developed mathematical operations based on operator-operand relations that can be used in automatic information processing. Verb meaning is explained by semantic-cognitive schemas, which are formalized expressions composed of different operators and operands. Human cognitive functions are expressed through language, and language plays a central role in communication, thought, and cognitive learning. Because human language processing is a highly complex process, interdisciplinary research is needed across related fields such as psycholinguistics, cognitive linguistics, formal linguistics, neuroanatomy, and artificial intelligence.

  • PDF

Concept-based Translation System in the Korean Spoken Language Translation System (한국어 대화체 음성언어 번역시스템에서의 개념기반 번역시스템)

  • Choi, Un-Cheon;Han, Nam-Yong;Kim, Jae-Hoon
    • The Transactions of the Korea Information Processing Society
    • /
    • v.4 no.8
    • /
    • pp.2025-2037
    • /
    • 1997
  • The concept-based translation system, a part of the Korean spoken language translation system, translates spoken utterances from a Korean speech recognizer into English, Japanese, or Korean in a travel-planning task. Our system relies on semantic rather than syntactic categories in order to process spontaneous speech, which tends to be ungrammatical and subject to recognition errors. Utterances are parsed into concept structures, and the generation module produces a sentence in the specified target language. For Korean processing, we have developed a token separator using base words and an automatic grammar corrector. We have also developed postprocessors for each target language to improve the readability of the generation results.

  • PDF
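As a rough illustration of the concept-based approach above, the sketch below maps an utterance to a concept frame by keyword spotting and then realizes it from a target-language template. The concept labels, keyword lists, and templates are invented for illustration and are not taken from the paper.

```python
# Toy concept-based translation: semantic concepts, not syntax, drive generation.
# All concept names, keywords, and templates below are hypothetical.

CONCEPT_PATTERNS = {
    "request_room": ["예약", "reserve", "room"],
    "ask_price": ["얼마", "price", "cost"],
}

TEMPLATES = {
    ("request_room", "en"): "I would like to reserve a room.",
    ("request_room", "ja"): "部屋を予約したいのですが。",
    ("ask_price", "en"): "How much does it cost?",
}

def parse_concept(utterance):
    """Map a (possibly ungrammatical) utterance to a concept label."""
    for concept, keywords in CONCEPT_PATTERNS.items():
        if any(k in utterance for k in keywords):
            return concept
    return "unknown"

def generate(concept, target_lang):
    """Realize the concept as a sentence in the target language."""
    return TEMPLATES.get((concept, target_lang), "")

print(generate(parse_concept("방을 예약하고 싶어요"), "en"))
```

Because matching is on semantic keywords rather than full parses, a disfluent or partially misrecognized utterance can still land on the right concept, which is the point the abstract makes about spontaneous speech.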

Gender Classification of Speakers Using SVM

  • Han, Sun-Hee;Cho, Kyu-Cheol
    • Journal of the Korea Society of Computer and Information
    • /
    • v.27 no.10
    • /
    • pp.59-66
    • /
    • 2022
  • This study classifies the gender of speakers by analyzing feature vectors extracted from voice data. It offers the convenience of recognizing a customer's gender automatically, without a manual classification step, when a service is requested by voice, such as over a phone call. Moreover, after gender classification with the learning model, frequently requested services can be analyzed per gender and customized recommendations offered accordingly. From male and female voice data with silent intervals removed, the study extracts feature vectors using MFCC (Mel-Frequency Cepstral Coefficients) and trains SVM (Support Vector Machine) models. When the learning model classified the gender of the voice data, the recognition rate was 94%.
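The MFCC-plus-SVM pipeline described above can be sketched in miniature. Since real MFCC extraction needs an audio library, the toy below trains a linear SVM with the Pegasos sub-gradient method on fabricated two-dimensional "voice features" (mean pitch and one cepstral-like coefficient) standing in for MFCC statistics; all numbers and feature names are hypothetical.

```python
import random

# Linear SVM trained with the Pegasos sub-gradient method on synthetic
# 2-D features. Labels: +1 = female-like voice, -1 = male-like voice.
# The data are fabricated for illustration only.

def pegasos_train(data, labels, lam=0.01, epochs=200, seed=0):
    rng = random.Random(seed)
    w = [0.0, 0.0]
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(data)), len(data)):
            t += 1
            eta = 1.0 / (lam * t)                    # decaying step size
            x, y = data[i], labels[i]
            margin = y * (w[0] * x[0] + w[1] * x[1])
            w = [(1 - eta * lam) * wj for wj in w]   # regularization shrink
            if margin < 1:                           # hinge-loss violation
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w

def predict(w, x):
    return 1 if w[0] * x[0] + w[1] * x[1] >= 0 else -1

# (pitch in Hz, cepstral-like coefficient) -- made-up values
feats = [(210, 1.2), (220, 1.0), (190, 0.8), (120, -0.5), (110, -0.9), (130, -0.2)]
labels = [1, 1, 1, -1, -1, -1]
scaled = [(p / 100.0 - 1.6, c) for p, c in feats]    # crude feature scaling
w = pegasos_train(scaled, labels)
print(predict(w, (2.2 - 1.6, 1.1)), predict(w, (1.15 - 1.6, -0.7)))
```

In practice one would extract full MFCC vectors per frame (e.g. with an audio library) and use a kernel SVM from a machine-learning toolkit; the sketch only shows the decision-boundary idea.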

Method of Automatically Generating Metadata through Audio Analysis of Video Content (영상 콘텐츠의 오디오 분석을 통한 메타데이터 자동 생성 방법)

  • Sung-Jung Young;Hyo-Gyeong Park;Yeon-Hwi You;Il-Young Moon
    • Journal of Advanced Navigation Technology
    • /
    • v.25 no.6
    • /
    • pp.557-561
    • /
    • 2021
  • Metadata has become essential for recommending video content to users, yet it is still generated manually by video content providers. This paper studies a method of generating metadata automatically in place of the existing manual input process. Extending the emotion-tag extraction of our previous study, we investigate automatically generating genre and country-of-production metadata from movie audio. The genre was extracted from the audio spectrogram using a ResNet34 artificial neural network as a transfer-learning model, and the language of the speakers in the movie was detected through speech recognition. These results confirm the feasibility of generating metadata automatically through artificial intelligence.
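The assembly step of the pipeline above can be sketched as follows: a genre predicted from the spectrogram (stubbed here, since the real system uses a ResNet34 classifier), a production country inferred from the detected spoken language, and emotion tags carried over from the earlier work. The mappings, function names, and heuristic are all hypothetical simplifications.

```python
# Sketch of assembling auto-generated video metadata from audio analysis.
# predict_genre / detect_language are stubs standing in for the paper's
# ResNet34 classifier and speech-recognition step; mappings are invented.

LANG_TO_COUNTRY = {"ko": "South Korea", "en": "United States", "ja": "Japan"}

def predict_genre(spectrogram):
    """Stub for the spectrogram-based genre classifier."""
    return "drama"

def detect_language(transcript):
    """Toy language detection: any Hangul syllable means Korean."""
    return "ko" if any("가" <= ch <= "힣" for ch in transcript) else "en"

def build_metadata(spectrogram, transcript, emotion_tags):
    lang = detect_language(transcript)
    return {
        "genre": predict_genre(spectrogram),
        "language": lang,
        "country": LANG_TO_COUNTRY.get(lang, "unknown"),
        "emotions": emotion_tags,
    }

meta = build_metadata(None, "안녕하세요", ["calm"])
print(meta["country"])
```

The point of the sketch is the data flow, not the models: each analysis step contributes one field to the metadata record that a recommender would consume.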

An Efficient Lipreading Method Based on Lip's Symmetry (입술의 대칭성에 기반한 효율적인 립리딩 방법)

  • Kim, Jin-Bum;Kim, Jin-Young
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.37 no.5
    • /
    • pp.105-114
    • /
    • 2000
  • In this paper, we concentrate on an efficient method to decrease the large amount of pixel data to be processed in image-transform-based automatic lipreading. It has been reported that the image-transform-based approach, which obtains a compressed representation of the speaker's mouth, yields better lipreading performance than the lip-contour-based approach. However, this approach produces so many lip feature parameters that recognition requires much data and computation time. To reduce the data to be computed, we propose a simple method that folds the lip image at its vertical center, based on the symmetry of the lips. In addition, principal component analysis (PCA) is used for a fast algorithm, and HMM word-recognition results are reported. Compared with the normal method using the 16×16 lip image, the proposed method using the folded lip image reduces the number of feature parameters by 22~47% and improves hidden Markov model (HMM) word recognition rates by 2~3%.

  • PDF
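The folding step described above is simple to sketch: each column of the lip image is averaged with its mirror across the vertical center, halving the pixel count fed to the later transform/PCA stage. The 4×4 "image" below is synthetic, standing in for the paper's 16×16 lip images.

```python
# Fold a W-wide image at its vertical center line, exploiting left/right
# symmetry: column j is averaged with column W-1-j, so only W/2 columns remain.

def fold_lip_image(img):
    h, w = len(img), len(img[0])
    half = w // 2
    return [[(row[j] + row[w - 1 - j]) / 2.0 for j in range(half)]
            for row in img]

# Synthetic, roughly left/right-symmetric "lip image"
img = [
    [10, 20, 22, 12],
    [30, 40, 38, 28],
    [15, 25, 27, 17],
    [ 5, 15, 13,  7],
]
folded = fold_lip_image(img)
print(folded[0])  # [11.0, 21.0]
```

For a 16×16 image this leaves a 16×8 array, which matches the paper's reported 22~47% reduction in downstream feature parameters once the transform is applied.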

A Speech Translation System for Hotel Reservation (호텔예약을 위한 음성번역시스템)

  • 구명완;김재인;박상규;김우성;장두성;홍영국;장경애;김응인;강용범
    • The Journal of the Acoustical Society of Korea
    • /
    • v.15 no.4
    • /
    • pp.24-31
    • /
    • 1996
  • In this paper, we present a speech translation system for hotel reservation, KT-STS (Korea Telecom Speech Translation System). KT-STS is a speech-to-speech translation system which translates a spoken utterance in Korean into one in Japanese. The system has been designed around the task of hotel reservation (dialogues between a Korean customer and a hotel reservation desk in Japan). It consists of a Korean speech recognition system, a Korean-to-Japanese machine translation system, and a Korean speech synthesis system. The Korean speech recognizer is an HMM (Hidden Markov Model)-based, speaker-independent, continuous-speech recognizer with a vocabulary of about 300 words. A bigram language model is used as the forward language model, and dependency grammar is used for the backward language model. For machine translation, we use dependency grammar and a direct transfer method. The Korean speech synthesizer uses demiphones as the synthesis unit and a method of periodic waveform analysis and reallocation. KT-STS runs in nearly real time on a SPARC20 workstation with one TMS320C30 DSP board. In speech recognition tests, we achieved a word recognition rate of 94.68% and a sentence recognition rate of 82.42%. On Korean-to-Japanese translation tests, we achieved a translation success rate of 100%. We also carried out an international joint experiment in which our system was connected over a leased line with a system developed by KDD in Japan.

  • PDF
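The forward bigram language model mentioned above can be illustrated with maximum-likelihood bigram probabilities estimated from a toy corpus. Real recognizers smooth these counts and work on much larger data; the corpus and estimates here are purely illustrative.

```python
from collections import Counter

# Maximum-likelihood bigram LM: P(w2 | w1) = count(w1, w2) / count(w1).
# Toy hotel-reservation corpus; no smoothing, unlike a production LM.

corpus = [
    ["<s>", "i", "want", "a", "room", "</s>"],
    ["<s>", "i", "want", "a", "single", "room", "</s>"],
]

bigrams = Counter()
unigrams = Counter()
for sent in corpus:
    for w1, w2 in zip(sent, sent[1:]):
        bigrams[(w1, w2)] += 1
        unigrams[w1] += 1   # counts of w1 as a bigram history

def p_bigram(w1, w2):
    """P(w2 | w1) by maximum likelihood; 0 for unseen histories."""
    return bigrams[(w1, w2)] / unigrams[w1] if unigrams[w1] else 0.0

print(p_bigram("i", "want"))   # 1.0
print(p_bigram("a", "room"))   # 0.5
```

During decoding, such probabilities score candidate word sequences in the forward pass, while the dependency grammar re-scores hypotheses as the backward model.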

Speech Recognition Using Linear Discriminant Analysis and Common Vector Extraction (선형 판별분석과 공통벡터 추출방법을 이용한 음성인식)

  • 남명우;노승용
    • The Journal of the Acoustical Society of Korea
    • /
    • v.20 no.4
    • /
    • pp.35-41
    • /
    • 2001
  • This paper describes linear discriminant analysis and common vector extraction for speech recognition. A voice signal contains the psychological and physiological properties of the speaker as well as dialect differences, acoustic environment effects, and phase differences. For these reasons, the same word spoken by different speakers can sound very different. This property of the speech signal makes it very difficult to extract common properties within the same speech class (word or phoneme). Linear-algebraic methods such as the KLT (Karhunen-Loeve Transformation) are generally used to extract common properties from speech signals, but this paper uses the common vector extraction suggested by M. Bilginer et al. Their method extracts the optimized common vector from the speech signals used for training, and it attains 100% recognition accuracy on the training data used for common vector extraction. Despite these characteristics, the method has some drawbacks: large numbers of speech signals cannot be used for training, and the discriminant information among common vectors is not defined. This paper suggests an improved method that reduces the error rate by maximizing the discriminant information among common vectors, together with a novel method for normalizing the size of the common vectors. The results show improved performance of the algorithm and a recognition accuracy 2% better than the conventional method.

  • PDF
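The discriminant-information idea above can be illustrated with the classical two-class Fisher discriminant, which finds the projection w = Sw⁻¹(m₁ − m₂) maximizing between-class separation relative to within-class scatter. This is a textbook LDA sketch on synthetic 2-D "speech features", not the paper's combined common-vector algorithm.

```python
# Two-class Fisher linear discriminant in 2-D, from first principles.
# Data are synthetic feature vectors invented for illustration.

def mean(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(2)]

def scatter(vectors, m):
    """Within-class scatter matrix: sum of outer products of deviations."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for v in vectors:
        d = [v[0] - m[0], v[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_direction(class1, class2):
    """w = Sw^{-1} (m1 - m2), with the 2x2 inverse written out by hand."""
    m1, m2 = mean(class1), mean(class2)
    s1, s2 = scatter(class1, m1), scatter(class2, m2)
    sw = [[s1[i][j] + s2[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    dm = [m1[0] - m2[0], m1[1] - m2[1]]
    return [inv[0][0] * dm[0] + inv[0][1] * dm[1],
            inv[1][0] * dm[0] + inv[1][1] * dm[1]]

c1 = [(4.0, 2.0), (4.2, 2.1), (3.9, 1.9)]
c2 = [(1.0, 0.5), (1.2, 0.7), (0.9, 0.4)]
w = fisher_direction(c1, c2)
proj1 = [w[0] * x + w[1] * y for x, y in c1]
proj2 = [w[0] * x + w[1] * y for x, y in c2]
print(min(proj1) > max(proj2))  # classes separate along w
```

The paper's contribution, as the abstract describes it, is to inject exactly this kind of discriminant information into the common-vector framework, which by itself defines no separation between the class common vectors.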

A Method of Generating Table-of-Contents for Educational Video (교육용 비디오의 ToC 자동 생성 방법)

  • Lee Gwang-Gook;Kang Jung-Won;Kim Jae-Gon;Kim Whoi-Yul
    • Journal of Broadcast Engineering
    • /
    • v.11 no.1 s.30
    • /
    • pp.28-41
    • /
    • 2006
  • Due to the rapid development of multimedia appliances, the growing amount of multimedia data demands automatic video analysis techniques. In this paper, a method of ToC (table-of-contents) generation is proposed for educational video content. The proposed method consists of two parts: scene segmentation followed by scene annotation. First, the video sequence is divided into scenes by the proposed scene segmentation algorithm, which exploits the characteristics of educational video. Then each shot in a scene is annotated with its scene type, the existence of enclosed captions, and the main speaker of the shot. The ToC generated by the proposed method represents the structure of a video as a hierarchy of scenes and shots and describes each scene and shot by the extracted features. Hence the generated ToC helps users grasp the content of a video at a glance and access a desired position easily. The automatically generated ToC can also be refined by manual editing, which takes far less time than producing a detailed description of the video content from scratch. Experimental results showed that the proposed method can generate ToCs for educational video with high accuracy.
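The scene/shot hierarchy described above can be sketched as a small data structure rendered into an indented table of contents. The scene types, annotation fields, and timestamps below are hypothetical simplifications of the paper's annotations.

```python
# Render a two-level ToC: numbered scenes, each containing annotated shots
# (main speaker, presence of an enclosed caption). Field names are invented.

def build_toc(scenes):
    lines = []
    for i, scene in enumerate(scenes, 1):
        lines.append(f"{i}. {scene['type']} ({scene['start']})")
        for j, shot in enumerate(scene["shots"], 1):
            caption = ", caption" if shot["caption"] else ""
            lines.append(f"   {i}.{j} speaker: {shot['speaker']}{caption}")
    return "\n".join(lines)

scenes = [
    {"type": "lecture", "start": "00:00", "shots": [
        {"speaker": "instructor", "caption": True},
        {"speaker": "slide", "caption": False},
    ]},
    {"type": "discussion", "start": "07:30", "shots": [
        {"speaker": "students", "caption": False},
    ]},
]
print(build_toc(scenes))
```

In the full system the scene list is produced by the segmentation algorithm and the per-shot fields by the annotation stage; the ToC renderer itself is the easy final step, and the same structure supports jumping to a shot's start time.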