• Title/Summary/Keyword: translation Partial matching

Automatic Recognition of Translation Phrases Enclosed with Parenthesis in Korean-English Mixed Documents (한영 혼용문에서 괄호 안 대역어구의 자동 인식)

  • Lee, Jae-Sung;Seo, Young-Hoon
    • The KIPS Transactions: Part B / v.9B no.4 / pp.445-452 / 2002
  • In Korean-English mixed documents, translated technical terms are usually accompanied by their full forms or original words enclosed in parentheses. This paper presents a combined method for recognizing and extracting such translation phrases using a base translation dictionary. To handle headwords and translations that are not registered in the dictionary, three methods are newly proposed: phonetic similarity matching, translation partial matching, and compound word matching. Each method was evaluated by F-measure (alpha set to 0.4): exact matching of dictionary terms, used as the baseline, scored 23.8%; the hybrid of translation partial matching and phonetic similarity matching scored 75.9%; and compound word matching combined with the hybrid method scored 77.3%, which is 3.25 times the baseline score.
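The abstract does not say which form of the F-measure the alpha belongs to. A minimal sketch, assuming van Rijsbergen's weighted harmonic mean F = 1 / (alpha/P + (1 - alpha)/R), shows how alpha = 0.4 would weight precision and recall, and checks the reported ratio of 3.25. The function name and the sample precision and recall values are illustrative and do not come from the paper.

```python
def f_measure(precision: float, recall: float, alpha: float = 0.4) -> float:
    """Weighted harmonic mean of precision and recall (assumed form)."""
    if precision <= 0.0 or recall <= 0.0:
        return 0.0
    return 1.0 / (alpha / precision + (1.0 - alpha) / recall)


# Illustrative values only: with alpha = 0.4, recall carries the larger weight (0.6).
print(round(f_measure(0.9, 0.6), 3))

# Ratio of the reported scores: compound word matching vs. the exact-match baseline.
print(round(77.3 / 23.8, 2))
```

Under this assumed form, alpha = 0.5 reduces to the familiar balanced F1 score.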

Automatic partial shape recognition system using adaptive resonance theory (적응공명이론에 의한 자동 부분형상 인식시스템)

  • 박영태;양진성
    • Journal of the Korean Institute of Telematics and Electronics B / v.33B no.3 / pp.79-87 / 1996
  • A new method is presented for recognizing and locating partially occluded or overlapping two-dimensional objects regardless of their size, translation, and rotation. Dominant points approximating the occluding contours of objects are generated by finding local maxima of a smoothed k-cosine function and are then used to guide the contour segment matching procedure. Primitives between the dominant points are produced by projecting the local contours onto the line between the dominant points. Robust classification of primitives, which is crucial for reliable partial shape matching, is performed using adaptive resonance theory (ART2). Matched primitives with similar scale factors and rotation angles are detected in the Hough space to identify the presence of the given model in the object scene. Finally, the translation vector is estimated by minimizing the mean squared error of the matched contour segment pairs. This model-based matching algorithm may be used in diverse factory automation applications, since models can be added or changed simply by training ART2 adaptively, without modifying the matching algorithm.

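The dominant-point step in the abstract above can be sketched as follows. This is a rough illustration, assuming the usual definition of the k-cosine (the cosine of the angle at contour point i between the vectors to points i+k and i-k) and omitting the smoothing the abstract mentions. The function names and the square test contour are hypothetical, not taken from the paper.

```python
import math


def k_cosine(contour, i, k):
    """Cosine of the angle at point i between the vectors to points i+k and i-k."""
    n = len(contour)
    px, py = contour[i]
    ax, ay = contour[(i + k) % n][0] - px, contour[(i + k) % n][1] - py
    bx, by = contour[(i - k) % n][0] - px, contour[(i - k) % n][1] - py
    return (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))


def dominant_points(contour, k):
    """Indices where the k-cosine is a strict local maximum on a closed contour."""
    n = len(contour)
    cos = [k_cosine(contour, i, k) for i in range(n)]
    return [i for i in range(n)
            if cos[i] > cos[i - 1] and cos[i] > cos[(i + 1) % n]]


def square_contour(side):
    """Closed contour of an axis-aligned square, one point per unit step."""
    return ([(x, 0) for x in range(side)]
            + [(side, y) for y in range(side)]
            + [(x, side) for x in range(side, 0, -1)]
            + [(0, y) for y in range(side, 0, -1)])


print(dominant_points(square_contour(6), k=2))  # indices of the four corners
```

On a straight run the two vectors point in opposite directions and the k-cosine is -1, while at a sharp corner it rises toward 1, which is why local maxima mark candidate dominant points.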

Context-Weighted Metrics for Example Matching (문맥가중치가 반영된 문장 유사 척도)

  • Kim, Dong-Joo;Kim, Han-Woo
    • Journal of the Institute of Electronics Engineers of Korea CI / v.43 no.6 s.312 / pp.43-51 / 2006
  • This paper proposes a metric for example matching in example-based machine translation from English to Korean. The metric, which serves as a similarity measure, is based on the edit-distance algorithm and is used to retrieve the example sentences most similar to a given query. It makes use of simple information such as the lemma and part-of-speech of typographically mismatched words. The edit-distance algorithm cannot fully reflect the context of matched word units. In other words, as long as the matched word units appear in order, a fully matching context is treated as contributing the same to similarity as a partially matching context in which mismatched word units intervene. To overcome this drawback, we propose a context-weighting scheme that uses the contiguity of matched word units to capture the full context. We normalize the metric in order to convert the edit-distance measure of dissimilarity into a similarity measure, to apply the context-weighted metric to the example matching problem, and to rank examples by similarity. In addition, we generalize previous methods that use linguistic information into one representative system. To verify the proposed context-weighted metric, we carry out an experiment comparing it with the generalized previous methods.
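The drawback described in this abstract can be illustrated with a small sketch. It uses the standard word-level Levenshtein distance and a plain normalization by sentence length; the contiguity measure (longest run of consecutive shared words) is only a hypothetical stand-in for the paper's context-weighting scheme, which the abstract does not specify. The example sentences are invented.

```python
def edit_distance(a, b):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Normalize the distance into a similarity in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def longest_contiguous_match(a, b):
    """Length of the longest run of consecutive tokens shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else 0)
            best = max(best, curr[j])
        prev = curr
    return best


query = "he reads a long book".split()
contiguous = "he reads a short story".split()  # matched words are adjacent
scattered = "he writes a long letter".split()  # matched words are interrupted

# Plain edit distance cannot tell the two candidates apart.
print(similarity(query, contiguous), similarity(query, scattered))
# A contiguity measure can.
print(longest_contiguous_match(query, contiguous),
      longest_contiguous_match(query, scattered))
```

Both candidates differ from the query by two substitutions, so they receive the same normalized similarity, yet the first shares a three-word run with the query and the second only a two-word run. A context-weighted metric would be expected to rank the first higher.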

The Development of an Automatic Indexing System based on a Thesaurus (시소러스를 기반으로 하는 자동색인 시스템에 관한 연구)

  • 임형묵;정상철
    • Korean Journal of Cognitive Science / v.4 no.1 / pp.213-242 / 1993
  • During the past decades, several automatic indexing systems have been developed, such as single-term indexing, phrase indexing, and thesaurus-based indexing systems. Among these, single-term indexing has been known to be superior to the others despite the simplicity with which it extracts meaningful terms. Thesaurus-based indexing, on the other hand, has been regarded as producing a low retrieval rate, mainly because thesauri usually do not have enough index terms, so that much of the text data fails to be indexed if it does not match any index term in the thesaurus. This paper develops a thesaurus-based indexing system, THINS, that yields a higher retrieval rate than other systems by analyzing the text data syntactically and matching it partially with index terms in the thesaurus. First, the system analyzes the input text syntactically using the machine translation system MATES/EK and extracts noun phrases. After deleting stop words from the noun phrases and stemming the remaining words, it tries to index them with similar index terms in the thesaurus as far as possible. We conduct an experiment with the CACM data set that compares the retrieval effectiveness of THINS with that of the single-term-based system under HYKIS, a thesaurus-based information retrieval system. THINS yields about 10 percent higher precision than the single-term-based system, while showing 8 to 9 percent lower recall. This result shows that THINS is a substantial improvement over previous thesaurus-based systems, which yield 25 or 30 percent lower precision than the single-term-based system. We also argue that the relatively lower recall is caused by the fact that CRCS, the thesaurus included in the CACM data set, is very incomplete, having only a little over one thousand terms; THINS is therefore expected to produce a much higher rate if it is used with a currently available large thesaurus.
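The last stage of the pipeline in this abstract (stop-word removal, stemming, then partial matching against thesaurus terms) can be sketched roughly as below. Everything here is an assumption made for illustration: the stop-word list, the crude plural-stripping stemmer, the word-overlap matching rule, and the toy thesaurus are not from the paper, and the syntactic analysis by MATES/EK is not modeled.

```python
# Hypothetical stop-word list; the paper's actual list is not given.
STOP_WORDS = {"a", "an", "the", "of", "for", "and", "in", "on"}


def stem(word):
    """Crude stand-in for a stemmer: strip a plural 's' from longer words."""
    if len(word) > 3 and word.endswith("s") and not word.endswith(("ss", "us")):
        return word[:-1]
    return word


def normalize(phrase):
    """Lowercase, drop stop words, and stem the remaining words."""
    return {stem(w) for w in phrase.lower().split() if w not in STOP_WORDS}


def partial_matches(noun_phrase, thesaurus):
    """Thesaurus terms that share at least one normalized word with the phrase."""
    words = normalize(noun_phrase)
    return [term for term in thesaurus if words & normalize(term)]


# Toy thesaurus for illustration only.
thesaurus = ["information retrieval", "automatic indexing", "machine translation"]

# The phrase is not itself a thesaurus entry, so exact matching would not index it.
print(partial_matches("the retrieval systems", thesaurus))
```

A phrase that has no exact entry in the thesaurus still receives an index term through the shared word, which is the effect the abstract credits for the improved retrieval rate.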