• Title/Summary/Keyword: translation probability

Discriminative Models for Automatic Acquisition of Translation Equivalences

  • Zhang, Chun-Xiang;Li, Sheng;Zhao, Tie-Jun
    • International Journal of Control, Automation, and Systems / v.5 no.1 / pp.99-103 / 2007
  • Translation equivalence is important for bilingual lexicography, machine translation systems, and cross-lingual information retrieval. Extracting equivalences from bilingual sentence pairs is a data mining problem. In this paper, discriminative learning methods are employed to filter translation equivalences. Discriminative features, including translation literality, phrase alignment probability, and phrase length ratio, are used to evaluate equivalences. A random sample of 1,000 equivalences is filtered and then evaluated. Experimental results indicate that the support vector machine achieves a precision of 87.8% and a recall of 89.8%.
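
The abstract above reports precision and recall but no combined score. As a small worked illustration (not part of the paper), the harmonic-mean F1 implied by those two figures could be computed as follows; the helper name `f1_score` is hypothetical.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported in the abstract for the support vector machine.
svm_f1 = f1_score(0.878, 0.898)
print(f"F1 = {svm_f1:.3f}")
```

Because the two figures are close to each other, the harmonic mean lands between them, just under 0.89.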

Building a Korean-English Parallel Corpus by Measuring Sentence Similarities Using Sequential Matching of Language Resources and Topic Modeling (언어 자원과 토픽 모델의 순차 매칭을 이용한 유사 문장 계산 기반의 위키피디아 한국어-영어 병렬 말뭉치 구축)

  • Cheon, JuRyong;Ko, YoungJoong
    • Journal of KIISE / v.42 no.7 / pp.901-909 / 2015
  • This paper proposes a method for building a Korean-English parallel corpus from Wikipedia by finding similar sentences based on language resources and topic modeling. We first apply language resources (a Wiki-dictionary, numbers, and the Daum online dictionary) to match words sequentially. The Wiki-dictionary is constructed from Wikipedia titles. To take advantage of Wikipedia, we use the translation probabilities in the Wiki-dictionary for word matching. In addition, we improve the accuracy of the sentence similarity measure by using word distributions based on topic modeling. In the experiments, a previous study achieved an F1-score of 48.4% using only language resources with linear combination, and 51.6% when topic modeling over entire word distributions was added. The proposed sequential matching, which adds translation probability to the language resources, achieved 58.3%, which is 9.9 percentage points better than the previous study. Combining the proposed sequential matching of language resources with topic modeling that considers important word distributions achieved 59.1%, which is 7.5 percentage points better than the previous study.

Intra-Sentence Segmentation using Maximum Entropy Model for Efficient Parsing of English Sentences (효율적인 영어 구문 분석을 위한 최대 엔트로피 모델에 의한 문장 분할)

  • Kim Sung-Dong
    • Journal of KIISE:Software and Applications / v.32 no.5 / pp.385-395 / 2005
  • Long sentence analysis has been a critical problem in machine translation because of its high complexity. Intra-sentence segmentation methods have been proposed to reduce parsing complexity. This paper presents an intra-sentence segmentation method based on a maximum entropy probability model to increase the coverage and accuracy of the segmentation. We construct the rules for choosing candidate segmentation positions by a learning method that uses the lexical context of the words tagged as segmentation positions. We also generate a model that assigns a probability value to each candidate segmentation position. The lexical contexts are extracted from a corpus tagged with segmentation positions and are incorporated into the probability model. We construct training data from Wall Street Journal sentences and evaluate the intra-sentence segmentation on sentences from four different domains. The experiments show about 88% accuracy and about 98% coverage of the segmentation. The proposed method also improves parsing efficiency by 4.8 times in speed and 3.6 times in space.

An Use of the Patterns for an Efficient Example-Based Machine Translation (효율적인 예제 기반 기계번역을 위한 패턴의 사용)

  • Lee, Gi-Yeong;Kim, Han-U
    • Journal of the Institute of Electronics Engineers of Korea CI / v.37 no.3 / pp.1-11 / 2000
  • Example-based machine translation is a new paradigm for resolving various problems caused by the rules of conventional rule-based machine translation. In pure example-based machine translation, however, it is very hard to find similar examples that match input sentences using a parallel corpus of reasonable size. This problem causes large overheads in the process of sentence generation. This paper proposes a new method of English-Korean transfer using both patterns and examples. The patterns are composed of sentence patterns and phrase patterns. The meta parts of the patterns make example-based machine translation more practical by raising the probability of finding similar examples. The use of patterns and examples can reduce ambiguities in source language analysis and yield high-quality machine translation. Experimental results with a test corpus are discussed.

A Model of English Part-Of-Speech Determination for English-Korean Machine Translation (영한 기계번역에서의 영어 품사결정 모델)

  • Kim, Sung-Dong;Park, Sung-Hoon
    • Journal of Intelligence and Information Systems / v.15 no.3 / pp.53-65 / 2009
  • Part-of-speech determination is necessary for resolving part-of-speech ambiguity in English-Korean machine translation. Part-of-speech ambiguity causes high parsing complexity and makes accurate translation difficult. To solve the problem, part-of-speech ambiguity must be resolved after lexical analysis and before parsing. This paper proposes the CatAmRes model, which resolves part-of-speech ambiguity, and compares its performance with that of other part-of-speech tagging methods. The CatAmRes model determines the part of speech using the probability distribution from Bayesian network training and statistical information, both of which are based on the Penn Treebank corpus. The proposed CatAmRes model consists of Calculator and POSDeterminer. Calculator calculates the degree of appropriateness of each part of speech, and POSDeterminer determines the part of speech of the word based on the calculated values. In the experiments, we measure the performance using sentences from the WSJ, Brown, and IBM corpora.

Syntactic Category Prediction for Improving Parsing Accuracy in English-Korean Machine Translation (영한 기계번역에서 구문 분석 정확성 향상을 위한 구문 범주 예측)

  • Kim Sung-Dong
    • The KIPS Transactions:PartB / v.13B no.3 s.106 / pp.345-352 / 2006
  • A practical English-Korean machine translation system should be able to translate long sentences quickly and accurately. The intra-sentence segmentation method has been proposed and has contributed to speeding up syntactic analysis. This paper proposes a syntactic category prediction method that uses decision trees to obtain accurate parsing results. In parsing with segmentation, each segment is parsed separately and the results are combined to generate the sentence structure. Syntactic category prediction helps select more accurate analysis structures after the partial parsing, and thus improves parsing accuracy. We construct features for predicting syntactic categories from the parsed Wall Street Journal corpus and generate decision trees. In the experiments, we compare the performance with predictions by human-built rules, trigram probability, and neural networks. We also show how much the category prediction contributes to improving translation quality.

Characterization of some classes of distributions related to operator semi-stable distributions

  • Joo, Sang-Yeol;Choi, Gyeong-Suk
    • Proceedings of the Korean Statistical Society Conference / 2002.11a / pp.221-225 / 2002
  • For a positive integer m, operator m-semi-stability and strict operator m-semi-stability of probability measures on $R^{d}$ are defined. Operator m-semi-stability is a generalization of the definition of operator semi-stability with exponent Q. Translation of strictly operator m-semi-stable distributions is discussed.

CHARACTERIZATION OF STRICTLY OPERATOR SEMI-STABLE DISTRIBUTIONS

  • Choi, Gyeong-Suk
    • Journal of the Korean Mathematical Society / v.38 no.1 / pp.101-123 / 2001
  • For a linear operator Q from $R^{d}$ into $R^{d}$ and 0$\alpha$ and parameter b on the other. Characterization of strictly (Q,b)-semi-stable distributions among (Q,b)-semi-stable distributions is made. Existence of (Q,b)-semi-stable distributions which are not translations of strictly (Q,b)-semi-stable distributions is discussed.

Characterization of Some Classes of Distributions Related to Operator Semi-stable Distributions

  • Joo, Sang Yeol;Yoo, Young Ho;Choi, Gyeong Suk
    • Communications for Statistical Applications and Methods / v.10 no.1 / pp.177-189 / 2003
  • For a positive integer m, operator m-semi-stability and strict operator m-semi-stability of probability measures on $R^{d}$ are defined. Operator m-semi-stability is a generalization of the definition of operator semi-stability with exponent Q. A characterization of strictly operator m-semi-stable distributions among operator m-semi-stable distributions is given. Translation of strictly operator m-semi-stable distributions is discussed.

Influence of non-Gaussian characteristics of wind load on fatigue damage of wind turbine

  • Zhu, Ying;Shuang, Miao
    • Wind and Structures / v.31 no.3 / pp.217-227 / 2020
  • Based on translation models, both Gaussian and non-Gaussian wind fields are generated using the spectral representation method to investigate the influence of non-Gaussian characteristics and the directivity effect of wind load on the fatigue damage of wind turbines. Dynamic responses are calculated using the blade aerodynamic model and multi-body dynamics. Using linear damage accumulation theory and linear crack propagation theory, crack initiation life and crack propagation life are discussed in detail, taking into account the joint probability density distribution of wind direction and mean wind speed. The results show that non-Gaussian characteristics of wind load have less influence on the fatigue life of wind turbines in areas with smaller annual mean wind speeds, whereas the influence becomes significant as the annual mean wind speed increases. When the annual mean wind speeds are 7 m/s and 9 m/s at a hub height of 90 m, the crack initiation lives under softening non-Gaussian wind decrease by 10% compared with Gaussian wind fields or at higher hub height. The study indicates that considering softening non-Gaussian characteristics of wind inflows can significantly decrease the estimated fatigue life, and that neglecting them can result in non-conservative fatigue life estimates for areas with higher annual mean wind speeds.
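
Linear damage accumulation is commonly formulated as the Palmgren-Miner rule, in which damage is the sum of applied cycles divided by cycles to failure at each stress level. The sketch below illustrates that rule, plus one way damage could be weighted by a joint probability of wind direction and mean wind speed. It is an illustration under stated assumptions, not the paper's procedure: every number, bin label, and function name here is hypothetical.

```python
from typing import Dict, Tuple

Bin = Tuple[str, float]  # (wind direction label, mean wind speed in m/s), hypothetical bins


def miner_damage(applied: Dict[str, float], to_failure: Dict[str, float]) -> float:
    """Palmgren-Miner sum: D = sum(n_i / N_i) over stress levels."""
    return sum(n / to_failure[level] for level, n in applied.items())


def expected_damage(joint_prob: Dict[Bin, float], damage: Dict[Bin, float]) -> float:
    """Probability-weighted damage over (direction, speed) bins."""
    if abs(sum(joint_prob.values()) - 1.0) > 1e-9:
        raise ValueError("joint probabilities must sum to 1")
    return sum(p * damage[b] for b, p in joint_prob.items())


# Hypothetical cycle counts and S-N capacities for two stress levels.
d = miner_damage({"low": 1e4, "high": 1e3}, {"low": 1e6, "high": 1e5})

# Hypothetical joint probabilities and per-bin damage values.
probs = {("N", 7.0): 0.6, ("E", 9.0): 0.4}
per_bin = {("N", 7.0): 0.02, ("E", 9.0): 0.05}
e = expected_damage(probs, per_bin)
print(d, e)
```

Under this rule, failure is conventionally taken to occur when the accumulated damage reaches 1, so a lower fatigue life corresponds to a higher damage rate per unit time.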