• Title/Summary/Keyword: 어휘학습시스템

Search Result 109, Processing Time 0.035 seconds

Improve Performance of Phrase-based Statistical Machine Translation through Standardizing Korean Allomorph (한국어의 이형태 표준화를 통한 구 기반 통계적 기계 번역 성능 향상)

  • Lee, Won-Kee;Kim, Young-Gil;Lee, Eui-Hyun;Kwon, Hong-Seok;Jo, Seung-U;Cho, Hyung-Mi;Lee, Jong-Hyeok
    • 한국어정보학회:학술대회논문집
    • /
    • 2016.10a
    • /
    • pp.285-290
    • /
    • 2016
  • 한국어는 형태론적으로 굴절어에 속하는 언어로서, 어휘의 형태가 문장 속에서 문법적인 기능을 하게 되고, 형태론적으로 풍부한 언어라는 특징 때문에 조사나 어미와 같은 기능어들이 다양하게 내용어들과 결합한다. 이와 같은 특징들은 한국어를 대상으로 하는 구 기반 통계적 기계번역 시스템에서 데이터 부족문제(Data Sparseness problem)를 더욱 크게 부각시킨다. 하지만, 한국어의 몇몇 조사와 어미는 함께 결합되는 내용어에 따라 의미는 같지만 두 가지의 형태를 가지는 이형태로 존재한다. 따라서 본 논문에서 이러한 이형태들을 하나로 표준화하여 데이터부족 문제를 완화하고, 베트남-한국어 통계적 기계 번역에서 성능이 개선됨을 보였다.

  • PDF

A Extraction of Definitional Answer Sentence for a Definitional Question-Answering System (정의형 질의응답시스템을 위한 정의형 정답 문장 추출)

  • Ko, Byeong Il;Kang, Yu Hwan;Shin, Seung Eun;S, Young Hoon
    • Proceedings of the Korea Contents Association Conference
    • /
    • 2004.11a
    • /
    • pp.470-475
    • /
    • 2004
  • In this paper, we propose a method to extract a definitional answer sentence for a Definitional Question-Answering System. definitional answer sentence patterns are manually constructed with restriction rules to patterns, and a ranking information of the pattern using its frequency from the corpus. answer sentence pattern consists of the syntactic structure of a definitional answer sentence, and clue words. this system show 83% accuracy for untrained corpus.

  • PDF

Improve Performance of Phrase-based Statistical Machine Translation through Standardizing Korean Allomorph (한국어의 이형태 표준화를 통한 구 기반 통계적 기계 번역 성능 향상)

  • Lee, Won-Kee;Kim, Young-Gil;Lee, Eui-Hyun;Kwon, Hong-Seok;Jo, Seung-U;Cho, Hyung-Mi;Lee, Jong-Hyeok
    • Annual Conference on Human and Language Technology
    • /
    • 2016.10a
    • /
    • pp.285-290
    • /
    • 2016
  • 한국어는 형태론적으로 굴절어에 속하는 언어로서, 어휘의 형태가 문장 속에서 문법적인 기능을 하게 되고, 형태론적으로 풍부한 언어라는 특징 때문에 조사나 어미와 같은 기능어들이 다양하게 내용어들과 결합한다. 이와 같은 특징들은 한국어를 대상으로 하는 구 기반 통계적 기계번역 시스템에서 데이터 부족 문제(Data Sparseness problem)를 더욱 크게 부각시킨다. 하지만, 한국어의 몇몇 조사와 어미는 함께 결합되는 내용어에 따라 의미는 같지만 두 가지의 형태를 가지는 이형태로 존재한다. 따라서 본 논문에서 이러한 이형태들을 하나로 표준화하여 데이터부족 문제를 완화하고, 베트남-한국어 통계적 기계 번역에서 성능이 개선됨을 보였다.

  • PDF

A Study on the Artificial Neural Networks for the Sentence-level Prosody Generation (문장단위 운율발생용 인공신경망에 관한 연구)

  • 신동엽;민경중;강찬구;임운천
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • autumn
    • /
    • pp.53-56
    • /
    • 2000
  • 무제한 어휘 음성합성 시스템의 문-음성 합성기는 합성음의 자연감을 높이기 위해 여러 가지 방법을 사용하게되는데 그중 하나가 자연음에 내재하는 운을 법칙을 정확히 구현하는 것이다. 합성에 필요한 운율법칙은 언어학적 정보를 이용해 구현하거나, 자연음을 분석해 구한 운을 정보로부터 운율 법칙을 추출하여 합성에 이용하고 있다. 이와 같이 구한 운을 법칙이 자연음에 존재하는 운율 법칙을 전부 반영하지 못했거나, 잘못 구현되는 경우에는 합성음의 자연성이 떨어지게 된다. 이런 점을 고려하여 우리는 자연음의 운율 정보를 이용해 인공 신경망을 훈련시켜, 문장단위 운율을 발생시킬 수 있는 방식을 제안하였다. 운율의 세 가지 요소는 피치, 지속시간, 크기 변화가 있는데, 인공 신경망은 문장이 입력되면, 각 해당 음소의 지속시간에 따른 피치 변화와 크기 변화를 학습할 수 있도록 설계하였다. 신경망을 훈련시키기 위해 고립 단어 군과 음소균형 문장 군을 화자로 하여금 발성하게 하여, 녹음하고, 분석하여 구한 운을 정보를 데이터베이스로 구축하였다. 문장 내의 각 음소에 대해 지속시간과 피치 변화 그리고 크기 변화를 구하고, 곡선적응 방법을 이용하여 각 변화 곡선에 대한 다항식 계수와 초기치를 구해 운을 데이터베이스를 구축한다. 이 운을 데이터베이스의 일부를 인공 신경망을 훈련시키는데 이용하고, 나머지를 이용해 인공 신경망의 성능을 평가한 결과 운을 데이터베이스를 계속 확장하면 좀더 자연스러운 운율을 발생시킬 수 있음을 관찰하였다.

  • PDF

In Out-of Vocabulary Rejection Algorithm by Measure of Normalized improvement using Optimization of Gaussian Model Confidence (미등록어 거절 알고리즘에서 가우시안 모델 최적화를 이용한 신뢰도 정규화 향상)

  • Ahn, Chan-Shik;Oh, Sang-Yeob
    • Journal of the Korea Society of Computer and Information
    • /
    • v.15 no.12
    • /
    • pp.125-132
    • /
    • 2010
  • In vocabulary recognition has unseen tri-phone appeared when recognition training. This system has not been created beginning estimation figure of model parameter. It's bad points could not be created that model for phoneme data. Therefore it's could not be secured accuracy of Gaussian model. To improve suggested Gaussian model to optimized method of model parameter using probability distribution. To improved of confidence that Gaussian model to optimized of probability distribution to offer by accuracy and to support searching of phoneme data. This paper suggested system performance comparison as a result of recognition improve represent 1.7% by out-of vocabulary rejection algorithm using normalization confidence.

Spam Filter by Using X2 Statistics and Support Vector Machines (카이제곱 통계량과 지지벡터기계를 이용한 스팸메일 필터)

  • Lee, Song-Wook
    • The KIPS Transactions:PartB
    • /
    • v.17B no.3
    • /
    • pp.249-254
    • /
    • 2010
  • We propose an automatic spam filter for e-mail data using Support Vector Machines(SVM). We use a lexical form of a word and its part of speech(POS) tags as features and select features by chi square statistics. We represent each feature by TF(text frequency), TF-IDF, and binary weight for experiments. After training SVM with the selected features, SVM classifies each e-mail as spam or not. In experiment, the selected features improve the performance of our system and we acquired overall 98.9% of accuracy with TREC05-p1 spam corpus.

A Design and Implementation of Music & Image Retrieval Recommendation System based on Emotion (감성기반 음악.이미지 검색 추천 시스템 설계 및 구현)

  • Kim, Tae-Yeun;Song, Byoung-Ho;Bae, Sang-Hyun
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.47 no.1
    • /
    • pp.73-79
    • /
    • 2010
  • Emotion intelligence computing is able to processing of human emotion through it's studying and adaptation. Also, Be able more efficient to interaction of human and computer. As sight and hearing, music & image is constitute of short time and continue for long. Cause to success marketing, understand-translate of humanity emotion. In this paper, Be design of check system that matched music and image by user emotion keyword(irritability, gloom, calmness, joy). Suggested system is definition by 4 stage situations. Then, Using music & image and emotion ontology to retrieval normalized music & image. Also, A sampling of image peculiarity information and similarity measurement is able to get wanted result. At the same time, Matched on one space through pared correspondence analysis and factor analysis for classify image emotion recognition information. Experimentation findings, Suggest system was show 82.4% matching rate about 4 stage emotion condition.

A Document Sentiment Classification System Based on the Feature Weighting Method Improved by Measuring Sentence Sentiment Intensity (문장 감정 강도를 반영한 개선된 자질 가중치 기법 기반의 문서 감정 분류 시스템)

  • Hwang, Jae-Won;Ko, Young-Joong
    • Journal of KIISE:Software and Applications
    • /
    • v.36 no.6
    • /
    • pp.491-497
    • /
    • 2009
  • This paper proposes a new feature weighting method for document sentiment classification. The proposed method considers the difference of sentiment intensities among sentences in a document. Sentiment features consist of sentiment vocabulary words and the sentiment intensity scores of them are estimated by the chi-square statistics. Sentiment intensity of each sentence can be measured by using the obtained chi-square statistics value of each sentiment feature. The calculated intensity values of each sentence are finally applied to the TF-IDF weighting method for whole features in the document. In this paper, we evaluate the proposed method using support vector machine. Our experimental results show that the proposed method performs about 2.0% better than the baseline which doesn't consider the sentiment intensity of a sentence.

An Automatic Post-processing Method for Speech Recognition using CRFs and TBL (CRFs와 TBL을 이용한 자동화된 음성인식 후처리 방법)

  • Seon, Choong-Nyoung;Jeong, Hyoung-Il;Seo, Jung-Yun
    • Journal of KIISE:Software and Applications
    • /
    • v.37 no.9
    • /
    • pp.706-711
    • /
    • 2010
  • In the applications of a human speech interface, reducing the error rate in recognition is the one of the main research issues. Many previous studies attempted to correct errors using post-processing, which is dependent on a manually constructed corpus and correction patterns. We propose an automatically learnable post-processing method that is independent of the characteristics of both the domain and the speech recognizer. We divide the entire post-processing task into two steps: error detection and error correction. We consider the error detection step as a classification problem for which we apply the conditional random fields (CRFs) classifier. Furthermore, we apply transformation-based learning (TBL) to the error correction step. Our experimental results indicate that the proposed method corrects a speech recognizer's insertion, deletion, and substitution errors by 25.85%, 3.57%, and 7.42%, respectively.

A Study on the Direction Future of Cataloging Education (차세대 목록 교육의 방향성에 관한 연구)

  • Cho, Jane
    • Journal of Korean Library and Information Science Society
    • /
    • v.41 no.2
    • /
    • pp.127-145
    • /
    • 2010
  • Outsourcing, importing of publishing metadata and revitalizing copy cataloging have reduced importance of traditional cataloging. Request of interoperability between other communities and business integration of related system also have changed the meaning of library catalog. Furthermore, newly declared principle and rules are totally different to existing AACR, MARC, measures to cataloging education for next generation seems to be urgently needed. In this study, firstly put together a series of discussion about future cataloging and new role of cataloging librarian, and secondly basis on it, suggest direction of cataloging education course which divided two sectors. One is for students who are undergraduated, and another is for current cataloger at working level. In basic training, it should contain principle of knowledge organization and diverse resources and its relationship, encoding scheme and its practice. The other hand, in re-education training, it should include that re-recognition about new concept of bibliographic world, changing vocabulary and encoding scheme, furthermore metadata scheme about diverse resources which library have accepted, and its integration.

  • PDF