• Title/Summary/Keyword: translation probability (번역 확률)


Performance Improvement by Cluster Analysis in Korean-English and Japanese-English Cross-Language Information Retrieval (한국어-영어/일본어-영어 교차언어정보검색에서 클러스터 분석을 통한 성능 향상)

  • Lee, Kyung-Soon
    • The KIPS Transactions:PartB / v.11B no.2 / pp.233-240 / 2004
  • This paper presents a method that implicitly resolves ambiguity using dynamic incremental clustering in Korean-to-English and Japanese-to-English cross-language information retrieval (CLIR). Its main objective is to show that document clusters can effectively resolve the ambiguity that increases tremendously in translated queries, while also taking into account the context of all the terms in a document. In the proposed framework, a Korean or Japanese query is first translated into English by looking up bilingual dictionaries, and documents are then retrieved for the translated query terms with either the vector space retrieval model or the probabilistic retrieval model. For the top-ranked retrieved documents, query-oriented document clusters are created incrementally, and the weight of each retrieved document is recalculated using the clusters. In experiments on the TREC test collection, the method achieved improvements of 39.41% and 36.79% over translated queries without ambiguity resolution in Korean-to-English CLIR, and of 17.89% and 30.46% in Japanese-to-English CLIR, with the vector space model and the probabilistic model, respectively. Compared with blind feedback in Korean-to-English CLIR, the method achieved a 12.30% improvement for all translation queries. These results indicate that cluster analysis helps to resolve ambiguity.
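
The pipeline described in this abstract (dictionary translation, initial retrieval, incremental clustering of the top-ranked documents, re-weighting) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's implementation: the toy dictionary, the term-frequency cosine scoring, the single-pass clustering with a hypothetical `threshold`, and the additive re-weighting rule are all assumptions made here.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def translate_query(terms, dictionary):
    """Keep every dictionary translation, so the ambiguity stays unresolved."""
    translated = Counter()
    for term in terms:
        for candidate in dictionary.get(term, []):
            translated[candidate] += 1
    return translated


def incremental_clusters(doc_vecs, threshold):
    """Single-pass clustering: join the most similar cluster or start a new one."""
    clusters = []
    for idx, vec in enumerate(doc_vecs):
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"centroid": Counter(vec), "members": [idx]})
        else:
            best["centroid"].update(vec)  # summed vector serves as the centroid
            best["members"].append(idx)
    return clusters


def rerank(query, docs, threshold=0.3, top_k=10):
    """Re-weight top-ranked documents by how well their cluster matches the query."""
    vecs = [Counter(d.split()) for d in docs]
    scores = [cosine(query, v) for v in vecs]
    ranked = sorted(range(len(docs)), key=lambda i: -scores[i])[:top_k]
    clusters = incremental_clusters([vecs[i] for i in ranked], threshold)
    new_scores = {}
    for cluster in clusters:
        cluster_sim = cosine(query, cluster["centroid"])
        for member in cluster["members"]:
            doc_id = ranked[member]
            # Assumed rule: original score plus query-cluster similarity.
            new_scores[doc_id] = scores[doc_id] + cluster_sim
    return sorted(new_scores, key=lambda i: -new_scores[i])


# Toy data (hypothetical): "은행" is ambiguous between "bank" and "ginkgo".
dictionary = {"은행": ["bank", "ginkgo"], "금리": ["interest", "rate"]}
docs = [
    "bank interest rate loan bank",
    "ginkgo tree leaf ginkgo tree",
    "interest rate loan money bank",
    "tree leaf forest",
]
query = translate_query(["은행", "금리"], dictionary)
reranked = rerank(query, docs)
```

In this toy setup the two finance documents fall into one cluster whose centroid matches the translated query well, so both gain more from re-weighting than the documents that match only the spurious translation "ginkgo".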

A Model of English Part-Of-Speech Determination for English-Korean Machine Translation (영한 기계번역에서의 영어 품사결정 모델)

  • Kim, Sung-Dong;Park, Sung-Hoon
    • Journal of Intelligence and Information Systems / v.15 no.3 / pp.53-65 / 2009
  • Part-of-speech determination is necessary for resolving part-of-speech ambiguity in English-Korean machine translation. Part-of-speech ambiguity causes high parsing complexity and makes accurate translation difficult. To solve this problem, part-of-speech ambiguity must be resolved after lexical analysis and before parsing. This paper proposes the CatAmRes model, which resolves part-of-speech ambiguity, and compares its performance with that of other part-of-speech tagging methods. The CatAmRes model determines the part of speech using the probability distribution from Bayesian network training together with statistical information, both based on the Penn Treebank corpus. The proposed model consists of the Calculator and the POSDeterminer. The Calculator computes the degree of appropriateness of each part of speech, and the POSDeterminer determines the part of speech of the word based on the calculated values. In the experiment, we measure the performance using sentences from the WSJ, Brown, and IBM corpora.


A Commentary on the Linear Cryptanalysis of DES (III) (DES의 선형 해독법에 관한 해설(III))

  • Kim, Kwang-Jo (김광조)
    • Review of KIISC / v.4 no.1 / pp.30-43 / 1994
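  • This commentary translates and introduces a paper, presented at SCIS'94 (held from January 27 to January 30, 1994), that proposes a method for speeding up the linear cryptanalysis of DES, which Matsui of Mitsubishi Electric Corporation, Japan, was the first in the world to publish in 1993. With this method, given $2^{43}$ random plaintexts and their ciphertexts, the unknown 56-bit key of DES can be recovered with a success probability of 80%. In addition, to verify the theory, an experiment was run for 50 days on 12 workstations (PA-RISC, 99 MHz, 125 MIPS), and the known-plaintext attack on DES is reported to have succeeded.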


The Study on the Model of Extracting Collocations from Corpus in Korean Using the Statistical Tools (통계 기법을 이용한 연어 추출 모형 연구)

  • Ahn, Sung-Min
    • Annual Conference on Human and Language Technology / 2010.10a / pp.162-165 / 2010
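  • Among the phrase-level information that arises from co-occurrence, research on collocations can contribute substantially to the development of applied linguistics. A collocation is a phrasal construction with a high co-occurrence probability in which the words stand in a restricted combinatory relation. Research on such collocational constructions is attracting growing interest, particularly in fields such as machine translation and lexicography. This study proposes using several statistical techniques, including the t-test, mutual information, and conditional probability, to extract collocations. We examined how collocation extraction changes when each technique is applied and looked for the most suitable technique, with the aim of suggesting a direction for future collocation extraction.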
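
The three statistics named in this abstract (t-test, mutual information, conditional probability) can be illustrated for adjacent word pairs as below. This is a minimal sketch using common textbook approximations (pointwise mutual information, and a t-score whose variance is approximated by the observed bigram probability); the function name `collocation_scores` and the toy sentence are assumptions, and the paper's exact formulation may differ.

```python
import math
from collections import Counter


def collocation_scores(tokens):
    """Score each adjacent word pair with a t-score, PMI, and P(w2 | w1)."""
    n = len(tokens)
    n_bigrams = n - 1
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (w1, w2), c12 in bigrams.items():
        p12 = c12 / n_bigrams                               # observed bigram probability
        expected = (unigrams[w1] / n) * (unigrams[w2] / n)  # probability under independence
        # t-score: (observed - expected) / sqrt(variance / N), with variance ~ p12
        t_score = (p12 - expected) / math.sqrt(p12 / n_bigrams)
        pmi = math.log2(p12 / expected)                     # pointwise mutual information
        cond = c12 / unigrams[w1]                           # conditional probability P(w2 | w1)
        scores[(w1, w2)] = {"t": t_score, "pmi": pmi, "cond": cond}
    return scores


tokens = "strong tea and strong tea or weak tea".split()
scores = collocation_scores(tokens)
strong_tea = scores[("strong", "tea")]
```

On a realistic corpus the measures rank candidates differently: PMI tends to favor rare pairs, while the t-score favors frequent ones, which is one reason to compare several techniques side by side.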


Enhancing Performance of Bilingual Lexicon Extraction through Refinement of Pivot-Context Vectors (중간언어 문맥벡터의 정제를 통한 이중언어 사전 구축의 성능개선)

  • Kwon, Hong-Seok;Seo, Hyung-Won;Kim, Jae-Hoon
    • Journal of KIISE:Software and Applications / v.41 no.7 / pp.492-500 / 2014
  • This paper presents a performance enhancement of automatic bilingual lexicon extraction by refining pivot-context vectors under the standard pivot-based approach, which is a very effective method for less-resourced language pairs. We gradually improve performance through two refinements of the pivot-context vectors: the first filters out unhelpful elements of the vectors and revises their values using bidirectional translation probabilities estimated by Anymalign, and the second removes non-noun elements from the original vectors. Experiments were conducted on two language pairs, Korean-Spanish and Korean-French, in both directions. The experimental results show that, for high-frequency words, our method achieves at least 48.5% at top 1 and up to 88.5% at top 20, and for low-frequency words, at least 43.3% at top 1 and up to 48.9% at top 20.
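
As a rough illustration of the pivot-based idea, source and target words can each be represented by a vector over pivot-language words, refined, and then compared. The sketch below assumes the pivot-context vectors are already given as dictionaries; the names `refine` and `rank_translations`, the weight threshold, the noun whitelist, and the toy vectors are all hypothetical, and the re-weighting with Anymalign translation probabilities is not reproduced.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse pivot-context vectors."""
    dot = sum(w * b[p] for p, w in a.items() if p in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def refine(vec, min_weight=0.1, nouns=None):
    """Drop low-weight elements and, if a noun set is given, non-noun elements."""
    return {
        pivot: weight
        for pivot, weight in vec.items()
        if weight >= min_weight and (nouns is None or pivot in nouns)
    }


def rank_translations(src_vec, tgt_vecs, top_n=20):
    """Rank target words by similarity of their pivot-context vectors."""
    scored = [(cosine(src_vec, vec), word) for word, vec in tgt_vecs.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in scored[:top_n]]


# Toy vectors over English pivot words (hypothetical weights).
pivot_nouns = {"house", "home", "dog"}
source = refine({"house": 0.7, "home": 0.5, "the": 0.05}, nouns=pivot_nouns)
targets = {
    "casa": refine({"house": 0.8, "home": 0.4}, nouns=pivot_nouns),
    "perro": refine({"dog": 0.9, "the": 0.05}, nouns=pivot_nouns),
}
candidates = rank_translations(source, targets)
```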

A Case Study of Creativity Development Using Simpson's Paradox for Mathematically Gifted Students (Simpson의 패러독스를 활용한 영재교육에서 창의성 발현 사례 분석)

  • Lee, Jung-Yeon;Lee, Kyeong-Hwa
    • Journal of Educational Research in Mathematics / v.20 no.3 / pp.203-219 / 2010
  • Several studies have reported on how mathematically gifted students develop superior ability or creativity in geometry and algebra, and on what they develop. However, there is a lack of studies in the area of probability, although there have been a few trials of probability education for mathematically gifted students. Moreover, less attention has been paid to strategies for developing gifted students' creativity. Based on a literature review, this study drew three teaching strategies for creativity development: cognitive conflict, multiple representations, and social interaction. We designed a series of tasks by reconstructing the so-called Simpson's paradox to meet these strategies. The findings showed that the gifted students made considerable improvement in creativity while participating in reflective thinking and active discussion, making internal and external connections, translating representations, and investigating basic assumptions.


Domain Adaptation Method for LHMM-based English Part-of-Speech Tagger (LHMM기반 영어 형태소 품사 태거의 도메인 적응 방법)

  • Kwon, Oh-Woog;Kim, Young-Gil
    • Journal of KIISE:Computing Practices and Letters / v.16 no.10 / pp.1000-1004 / 2010
  • A large number of current language processing systems use a part-of-speech tagger for preprocessing, and most of them require a tagger with the highest possible accuracy. In particular, exploiting domain-specific advantages to improve translation quality has become a hot issue in the machine translation community. This paper addresses a method for customizing an HMM- or LHMM-based English tagger from the general domain to a specific domain. The proposed method semi-automatically customizes the output and transition probabilities of the HMM or LHMM using a domain-specific raw corpus. In experiments on customization to the patent domain, our LHMM tagger adapted by the proposed method shows a word tagging accuracy of 98.87% and a sentence tagging accuracy of 78.5%. Compared with the general tagger, our tagger improved the word tagging accuracy by 2.24 percentage points (ERR: 66.4%) and the sentence tagging accuracy by 41.0 percentage points (ERR: 65.6%).
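
The ERR figures in this abstract can be checked with simple arithmetic, assuming that ERR denotes error rate reduction and that the reported gains are absolute differences in accuracy; neither assumption is stated in the abstract. The helper name `error_rate_reduction` is hypothetical.

```python
def error_rate_reduction(baseline_acc, adapted_acc):
    """Relative reduction in error rate (percent), given accuracies in percent."""
    baseline_err = 100.0 - baseline_acc
    adapted_err = 100.0 - adapted_acc
    return 100.0 * (baseline_err - adapted_err) / baseline_err


# Baseline accuracies are inferred as (adapted accuracy - reported gain).
word_err = error_rate_reduction(98.87 - 2.24, 98.87)
sentence_err = error_rate_reduction(78.5 - 41.0, 78.5)
```

Under these assumptions both values come out close to the reported 66.4% and 65.6%, which suggests the gains are indeed absolute differences in accuracy.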

Prediction of KBO playoff Using the Deep Neural Network (DNN을 활용한 'KBO' 플레이오프진출 팀 예측)

  • Ju-Hyeok Park;Yang-Jae Lee;Hee-Chang Han;Yoo-Lim Jun;Yoo-Jin Moon
    • Proceedings of the Korean Society of Computer Information Conference / 2023.01a / pp.315-316 / 2023
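  • This paper proposes a method for designing and implementing a Deep Neural Network (DNN) system that uses deep learning to predict the probability of advancing to the next season's playoffs in the KBO (Korea Baseball Organization). Season-by-season KBO data were collected from 1999 onward and analyzed; the analysis confirmed that, among the season statistics, average runs scored per game, batter OPS, pitcher WHIP, and similar figures have a significant effect on season results. In model design, using the relu, tanh, and sigmoid functions gave higher accuracy than using the linear and softmax functions. Predicting the actual 2022 season results yielded an accuracy of 88%. The results showed that season results are good when highly weighted variables, such as the number of wild pitches and home runs allowed, have favorable values. The system designed in this paper is considered applicable to roster construction not only for KBO clubs but for any baseball team.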


Automatic Recognition of Translation Phrases Enclosed with Parenthesis in Korean-English Mixed Documents (한영 혼용문에서 괄호 안 대역어구의 자동 인식)

  • Lee, Jae-Sung;Seo, Young-Hoon
    • The KIPS Transactions:PartB / v.9B no.4 / pp.445-452 / 2002
  • In Korean-English mixed documents, translated technical words are usually accompanied by their full forms or original words enclosed in parentheses. This paper presents a collective method for recognizing and extracting such translation phrases using a base translation dictionary. To handle title words and translation words that are not registered in the dictionary, a phonetic similarity matching method, a translation partial matching method, and a compound word matching method are newly proposed. The results of each method were measured in F-measure (with alpha set to 0.4): exact matching of dictionary terms, used as the baseline method, scored 23.8%; the hybrid method of translation partial matching and phonetic similarity matching scored 75.9%; and the compound word matching method including the hybrid method scored 77.3%, which is 3.25 times better than the baseline method.
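
A weighted F-measure with an alpha parameter is commonly written as F = 1 / (alpha / P + (1 - alpha) / R). The sketch below assumes this is the form used with alpha = 0.4; the abstract does not give the formula, so the weighting direction is an assumption. It also checks the stated ratio between the best method and the baseline.

```python
def f_measure(precision, recall, alpha=0.4):
    """Weighted harmonic mean of precision and recall (assumed alpha form)."""
    if precision == 0 or recall == 0:
        return 0.0
    return 1.0 / (alpha / precision + (1.0 - alpha) / recall)


# Ratio of the reported best score to the reported baseline score.
improvement_ratio = 77.3 / 23.8
```

With alpha below 0.5, this form gives recall more weight than precision; `improvement_ratio` rounds to the 3.25 quoted in the abstract.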

Automatic Construction of Foreign Word Transliteration Dictionary from English-Korean Parallel Corpus (영-한 병렬 코퍼스로부터 외래어 표기 사전의 자동 구축)

  • Lee, Jae Sung
    • The Journal of Korean Association of Computer Education / v.6 no.2 / pp.9-21 / 2003
  • This paper proposes a system that automatically constructs a transliteration dictionary from an English-Korean parallel corpus. The system works in three steps: first, it extracts all nouns from the Korean documents; second, it filters the transliterated foreign-word nouns out of them with a language identification method; and finally, it extracts the corresponding English words using a probabilistic alignment method. In particular, the fact that a corresponding English word exists in most cases is used to extract the purely transliterated part from a Korean word phrase, which usually appears combined with Korean endings (Eomi) or particles (Josa). Moreover, the words in the two different alphabet systems are compared phonetically and directly, without converting them to the same alphabet system. The experiment showed that performance was influenced by the first and second preprocessing steps: the most efficient model among the manually preprocessed ones showed 85.4% recall and 91.0% precision, and the most efficient model among the fully automated ones achieved 68.3% recall and 89.2% precision.
