• Title/Summary/Keyword: Korean Machine Translation Data


Extracting Korean-English Parallel Sentences from Wikipedia (위키피디아로부터 한국어-영어 병렬 문장 추출)

  • Kim, Sung-Hyun;Yang, Seon;Ko, Youngjoong
    • Journal of KIISE: Software and Applications / v.41 no.8 / pp.580-585 / 2014
  • This paper reports a series of experiments on extracting Korean-English parallel sentences from Wikipedia data, drawing on methods previously proposed for other languages. We use two approaches. The first uses translation probabilities extracted from existing resources such as the Sejong parallel corpus. The second uses dictionaries such as a Wiki dictionary consisting of Wikipedia titles and MRDs (machine-readable dictionaries). Experimental results show that the system using Wikipedia data improves significantly over one using only the existing resources, reaching an F1-score of 57.6%. We additionally conduct experiments using a topic model. Although this experiment shows a relatively lower F1-score of 51.6%, it appears worthy of further study.
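
The dictionary-based approach and the F1 evaluation mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' system: the scoring rule (fraction of Korean tokens with a dictionary translation in the English sentence), the function names, and the toy dictionary are all assumptions made for illustration.

```python
def dictionary_overlap(ko_tokens, en_tokens, bilingual_dict):
    """Fraction of Korean tokens with at least one dictionary
    translation present in the English sentence (assumed scoring rule)."""
    if not ko_tokens:
        return 0.0
    en_set = {token.lower() for token in en_tokens}
    hits = sum(1 for token in ko_tokens
               if bilingual_dict.get(token, set()) & en_set)
    return hits / len(ko_tokens)


def f1_score(predicted_pairs, gold_pairs):
    """Harmonic mean of precision and recall over extracted sentence pairs."""
    predicted, gold = set(predicted_pairs), set(gold_pairs)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy dictionary, not taken from the paper's resources.
TOY_DICT = {"고양이": {"cat"}, "우유": {"milk"}, "마신다": {"drink", "drinks"}}

if __name__ == "__main__":
    ko = ["고양이", "우유", "마신다"]
    print(dictionary_overlap(ko, "The cat drinks milk".split(), TOY_DICT))
    print(dictionary_overlap(ko, "The dog sleeps".split(), TOY_DICT))
    print(f1_score({1, 2, 3}, {2, 3, 4}))
```

In such a setup a candidate pair would be accepted when its score exceeds a tuned threshold; the threshold itself is another assumption and is not reported in the abstract.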

A study on extraction of aspect and modality information in Korean (한국어의 시상과 양상 정보추출에 관한 연구)

  • 이수현;한광록
    • Korean Journal of Cognitive Science / v.1 no.2 / pp.255-257 / 1989
  • This paper proposes a method for extracting aspect and modality information from the predicative part, which consists of a main verbal and auxiliary verbals. Data expressed by compound predicates with many consecutive verbals are collected and analyzed into thirty-six structural forms of the predicative part. In the final analysis, an extraction function for conceptual information is derived to find the aspect and modality connoted in each structure. The information obtained by this function reduces the individual ambiguity of an auxiliary verbal and offers a detailed meaning in the syntactic and semantic analysis of a machine translation system or an inference machine.

A New Approach to CAD/CAM Systems Data Exchange Using Plug-in Technology

  • Chernopyatov Y.A.;Chung W.j.;Lee C.M.
    • International Journal of Precision Engineering and Manufacturing / v.6 no.4 / pp.8-13 / 2005
  • Interoperability has long been a problem for CAD/CAM systems. Since the 1980s, national and international organizations have addressed the issue by developing and releasing standards for the exchange of geometric and non-geometric design data. The task of interpreting these standards and implementing them in products falls to CAD/CAM vendors. This task is a balancing act between users' needs, available development resources, and the technical specifications of the standards. This paper explores one area of CAD/CAM systems development, namely the implementation of effective exchange-file translators. A new approach is introduced, which encloses all the translation operations for each exchange format in a separate DLL, thus making a 'plug-in.' This plug-in can then be used together with the CAD/CAM system or with specialized translation software. The approach makes it possible to create new translators rapidly and to obtain reliable, efficient, and reusable program code. The second part of the paper concerns problems that may arise in translator development. These difficulties often come from misunderstanding of the exchange standards or from ambiguity in the standards themselves. All examples come from the authors' practical experience with CAD/CAM systems.
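
The plug-in idea in the abstract above, one self-contained translator per exchange format behind a common interface, can be sketched as follows. The paper packages each translator as a DLL; this sketch substitutes plain Python classes and a registry, and every name in it (`TranslatorPlugin`, `PluginRegistry`, the "toy-points" format) is hypothetical.

```python
from abc import ABC, abstractmethod


class TranslatorPlugin(ABC):
    """Common interface every format translator implements (assumed design)."""

    format_name: str

    @abstractmethod
    def export(self, model: dict) -> str:
        """Serialize the host system's model into this plug-in's format."""


class PluginRegistry:
    """Host-side lookup table; the host never depends on a concrete format."""

    def __init__(self):
        self._plugins = {}

    def register(self, plugin: TranslatorPlugin) -> None:
        self._plugins[plugin.format_name.lower()] = plugin

    def export(self, model: dict, format_name: str) -> str:
        try:
            plugin = self._plugins[format_name.lower()]
        except KeyError:
            raise ValueError(f"no translator plug-in for {format_name!r}") from None
        return plugin.export(model)


class ToyPointListPlugin(TranslatorPlugin):
    """Hypothetical format: one 'x y z' line per point."""

    format_name = "toy-points"

    def export(self, model: dict) -> str:
        return "\n".join(f"{x} {y} {z}" for x, y, z in model["points"])


if __name__ == "__main__":
    registry = PluginRegistry()
    registry.register(ToyPointListPlugin())
    print(registry.export({"points": [(0, 0, 0), (1, 2, 3)]}, "toy-points"))
```

Adding support for another format then means registering one more plug-in, without touching the host code, which is the reuse benefit the abstract describes.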

Data Availability Scheduling for Synthesis Beyond Basic Block Scope

  • Kim, Jongsoo
    • Journal of Electrical Engineering and Information Science / v.3 no.1 / pp.1-7 / 1998
  • High-level synthesis of digital circuits calls for automatic translation of a behavioral description into a structural design entity represented in terms of components and connections. One of the critical steps in high-level synthesis is to determine a particular scheduling algorithm that will assign behavioral operations to control states. A new scheduling algorithm for high-level synthesis, called Data Availability Scheduling (DAS), is presented. It can determine an appropriate scheduling algorithm and minimize the number of states required using data availability and dependency conditions extracted from the behavioral code, taking into account the resource constraint in each control state. The DAS algorithm is efficient because data availability conditions, together with conditional and wait statements, break the behavioral code into manageable pieces that are analyzed independently. The output is the number of states in a finite state machine, and the results are better than those of previous algorithms.
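
To make the scheduling problem in the abstract above concrete, here is a minimal sketch of generic resource-constrained list scheduling. It is not the DAS algorithm, whose details the abstract does not give. It only shows how operations are assigned to control states under data dependencies and a per-state resource limit. The function name and the example dependency graph are assumptions.

```python
def list_schedule(deps, max_ops_per_state):
    """Assign each operation to a control state (0-based).

    deps maps an operation name to the set of operations whose results
    it needs. An operation becomes ready once all of its predecessors
    sit in earlier states; at most max_ops_per_state ready operations
    are placed in each state (ties broken alphabetically).
    """
    state_of = {}
    remaining = set(deps)
    state = 0
    while remaining:
        ready = sorted(
            op for op in remaining
            if all(p in state_of and state_of[p] < state for p in deps[op])
        )
        if not ready:
            raise ValueError("cyclic or unknown dependency")
        chosen = ready[:max_ops_per_state]
        for op in chosen:
            state_of[op] = state
        remaining -= set(chosen)
        state += 1
    return state_of


# Hypothetical behavioral code: c needs a and b, d needs c, e is independent.
EXAMPLE_DEPS = {"a": set(), "b": set(), "c": {"a", "b"}, "d": {"c"}, "e": set()}

if __name__ == "__main__":
    schedule = list_schedule(EXAMPLE_DEPS, max_ops_per_state=2)
    print(schedule)
    print("states needed:", max(schedule.values()) + 1)
```

With two operations allowed per state the example needs three states. Tightening the limit to one operation per state raises the count to five, which shows how the resource constraint drives the number of states.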


Translation of Korean Object Case Markers to Mongolian's Suffixes (한국어 목적격조사의 몽골어 격 어미 번역)

  • Setgelkhuu, Khulan;Shin, Joon Choul;Ock, Cheol Young
    • KIPS Transactions on Software and Data Engineering / v.8 no.2 / pp.79-88 / 2019
  • Machine translation (MT), especially Korean-Mongolian MT, has recently attracted much attention because of its necessity in an age of globalization. Korean and Mongolian share the same SOV sentence structure, and arbitrarily changing the word order does not change the meaning of a sentence because of postpositional particles. These particles are attached after words to indicate their grammatical relationship to the clause or to make them more specific in meaning. Hence, particles play an important role in translation between Korean and Mongolian. However, one Korean particle can be translated into several Mongolian particles, which is a major issue for Korean-Mongolian MT systems. To address this issue, we propose a method that combines UTagger with a Korean-Mongolian particle table. UTagger is a system that performs morphological analysis, POS tagging, and homograph disambiguation for Korean text. The Korean-Mongolian particle table was constructed manually to match Korean particles with their Mongolian counterparts. An experiment on a test set extracted from the National Institute of Korean Language's Korean-Mongolian Learner's Dictionary shows that our method achieved an accuracy of 88.38%, improving on the result of using only UTagger by 41.48%.
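
The table-lookup step described in the abstract above might look roughly like the sketch below. It assumes the tagger has already identified the object particle and its governing verb. The table keys, the case labels (`ACC`, `COM`), and the single override entry are illustrative placeholders, not entries from the authors' manually built table, and no UTagger API is called.

```python
# Default mapping for the Korean object particle: the accusative case.
DEFAULT_CASE = "ACC"

# Hypothetical override table keyed by (particle, governing verb lemma).
# Illustrative entry: the verb meaning "to meet" is assumed here to
# govern the comitative case on the Mongolian side.
PARTICLE_TABLE = {
    ("을", "만나다"): "COM",
    ("를", "만나다"): "COM",
}


def map_object_particle(particle, governing_verb, table=PARTICLE_TABLE):
    """Return the target-side case label for a Korean object particle."""
    if particle not in ("을", "를"):
        raise ValueError(f"not an object particle: {particle!r}")
    return table.get((particle, governing_verb), DEFAULT_CASE)


if __name__ == "__main__":
    print(map_object_particle("를", "만나다"))
    print(map_object_particle("을", "먹다"))
```

The point of the design is that disambiguation comes from the verb lemma supplied by the tagger, so the table can stay a plain lookup with a sensible default.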

A Study on the Performance Analysis of Entity Name Recognition Techniques Using Korean Patent Literature

  • Gim, Jangwon
    • Journal of Advanced Information Technology and Convergence / v.10 no.2 / pp.139-151 / 2020
  • Entity name recognition is a part of information extraction that extracts entity names from documents and classifies their types. Entity name recognition technologies are widely used in natural language processing applications such as information retrieval, machine translation, and query response systems. Various deep learning-based models exist to improve entity name recognition performance, but few studies have compared and analyzed these models on Korean data. In this paper, we compare and analyze, on Korean data, the performance of CRF, LSTM-CRF, BiLSTM-CRF, and BERT, which are actively used to identify entity names. We also evaluate whether the embedding models widely used in recent natural language processing tasks improve the performance of the entity name recognition model. Experiments on patent data and a Korean corpus confirmed that BiLSTM-CRF using the FastText method showed the highest performance.

A Study on Usage Frequency of Translated English Phrase Using Google Crawling

  • Kim, Kyuseok;Lee, Hyunno;Lim, Jisoo;Lee, Sungmin
    • Proceedings of the Korea Information Processing Society Conference / 2020.11a / pp.689-692 / 2020
  • People have long studied English with online dictionaries, looking up the meanings of words or example sentences. These days, as AI technologies such as machine learning develop, documents can be translated in real time with the Kakao, Papago, and Google translators, among others. However, problems with translation accuracy remain. AI assistants can be used for real-time interpreting, so systems of this kind are being used to translate web pages and papers into Korean. In this paper, we study the usage frequency of combined English phrases from dictionaries by analyzing the number of Google search results for each. We expect the results to help people use English more fluently.

Rational B-spline Approximation of Point Data For Reverse Engineering (점 데이타의 Rational B-spline 근사를 통한 역공학)

  • Lee, Hyun-Zic;Ko, Tae-Jo;Kim, Hee-Sool
    • Journal of the Korean Society for Precision Engineering / v.16 no.5 s.98 / pp.160-168 / 1999
  • This paper describes a reverse engineering method that machines a free-form shape without a descriptive model. A portable five-axis 3D CMM was used to digitize point data from a physical model. After the digitized point data of the geometric shape were approximated by rational B-spline curves, a surface was constructed by the skinning method of the cross-sectional design technique. Since the surface was segmented into fifteen parts, surface merging was also implemented to ensure continuity at the surface boundaries. Finally, the composite surface was transferred to a commercial CAD/CAM system through IGES translation in order to machine the modeled geometric shape.
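
For readers unfamiliar with the curve type named in the abstract above, the following sketch evaluates a point on a rational B-spline curve with the standard Cox-de Boor recursion. It covers evaluation only, not the paper's approximation (fitting) of digitized points. The function names and the quarter-circle example data are assumptions chosen because the expected result is easy to check.

```python
import math


def basis(i, degree, u, knots):
    """Cox-de Boor recursion for the B-spline basis function N_{i,degree}(u)."""
    if degree == 0:
        if knots[i] <= u < knots[i + 1]:
            return 1.0
        # Close the last non-empty span so the curve end point is reachable.
        if u == knots[-1] and knots[i] < knots[i + 1] == knots[-1]:
            return 1.0
        return 0.0
    left = right = 0.0
    span_left = knots[i + degree] - knots[i]
    if span_left > 0:
        left = (u - knots[i]) / span_left * basis(i, degree - 1, u, knots)
    span_right = knots[i + degree + 1] - knots[i + 1]
    if span_right > 0:
        right = ((knots[i + degree + 1] - u) / span_right
                 * basis(i + 1, degree - 1, u, knots))
    return left + right


def rational_bspline_point(u, control_points, weights, knots, degree):
    """Weighted basis sum of the control points divided by the weight sum."""
    dim = len(control_points[0])
    numerator = [0.0] * dim
    denominator = 0.0
    for i, (point, weight) in enumerate(zip(control_points, weights)):
        b = basis(i, degree, u, knots) * weight
        denominator += b
        for k in range(dim):
            numerator[k] += b * point[k]
    return tuple(c / denominator for c in numerator)


# Quarter of the unit circle as a degree-2 rational curve (textbook example).
CONTROL_POINTS = [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
WEIGHTS = [1.0, math.sqrt(2) / 2, 1.0]
KNOTS = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]

if __name__ == "__main__":
    for u in (0.0, 0.25, 0.5, 1.0):
        x, y = rational_bspline_point(u, CONTROL_POINTS, WEIGHTS, KNOTS, 2)
        print(u, (x, y), "radius:", math.hypot(x, y))
```

The weights are what distinguish the rational form from a plain B-spline: with all weights equal to 1 the same control points give a parabola-like arc, while the middle weight of sqrt(2)/2 yields an exact circular arc.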


An Angle-Binder Drawbead Simulator for Measuring Drawbead Forces on Inclined Binder Surface (경사진 바인더면의 드로우비드력을 측정하기 위한 모의실험장치)

  • Yang, W.H.;Choi, K.Y.
    • Proceedings of the Korean Society for Technology of Plasticity Conference / 2009.05a / pp.180-184 / 2009
  • A novel set of experimental test tooling for measuring pulling and holding forces for drawbeads on binders inclined at a wide range of angles is introduced. A mechanical design featuring a single load cell, a male-female drawbead set, translation and rotation degrees of freedom, and a screw-driven clamping system has been incorporated into a standard tensile test machine. Restraining and holding force data with respect to draw-in displacement can be downloaded directly to a PC in real time for data processing. The proposed experimental system represents a significant breakthrough in drawbead simulation technology due to its relatively low cost, clever design, and versatility. The system is shown to yield excellent experimental data suitable for verifying theory and numerical model predictions.


Intra-Sentence Segmentation using Maximum Entropy Model for Efficient Parsing of English Sentences (효율적인 영어 구문 분석을 위한 최대 엔트로피 모델에 의한 문장 분할)

  • Kim Sung-Dong
    • Journal of KIISE: Software and Applications / v.32 no.5 / pp.385-395 / 2005
  • Long sentence analysis has been a critical problem in machine translation because of its high complexity. Intra-sentence segmentation methods have been proposed to reduce parsing complexity. This paper presents an intra-sentence segmentation method based on a maximum entropy probability model to increase the coverage and accuracy of the segmentation. We construct the rules for choosing candidate segmentation positions by a learning method using the lexical context of the words tagged as segmentation positions. We also generate a model that assigns a probability value to each candidate segmentation position. The lexical contexts are extracted from a corpus tagged with segmentation positions and are incorporated into the probability model. We construct training data using sentences from the Wall Street Journal and test the intra-sentence segmentation on sentences from four different domains. The experiments show about 88% accuracy and about 98% coverage of the segmentation. The proposed method also improves parsing efficiency by 4.8 times in speed and 3.6 times in space.
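
The probability model named in the abstract above has the usual conditional maximum entropy form, in which each label's score is a weighted sum of active features and the scores are normalized with a softmax. The sketch below shows only that scoring step. The feature strings, labels, and weight values are hypothetical, and training the weights is not covered.

```python
import math


def maxent_probability(active_features, weights,
                       labels=("segment", "no_segment")):
    """p(label | context) for a conditional maximum entropy model.

    weights maps (feature, label) pairs to real-valued parameters;
    pairs that are absent contribute zero to the score.
    """
    scores = {
        label: sum(weights.get((feature, label), 0.0)
                   for feature in active_features)
        for label in labels
    }
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores.values())
    exp_scores = {label: math.exp(score - top)
                  for label, score in scores.items()}
    normalizer = sum(exp_scores.values())
    return {label: value / normalizer for label, value in exp_scores.items()}


# Hypothetical weight: the word "and" at a candidate position favors a split.
TOY_WEIGHTS = {("word=and", "segment"): math.log(3)}

if __name__ == "__main__":
    print(maxent_probability(["word=and", "prev_tag=NN"], TOY_WEIGHTS))
    print(maxent_probability(["word=the"], TOY_WEIGHTS))
```

A candidate position would then be accepted as a segmentation point when the probability of the "segment" label is high enough; the decision threshold is again an assumption.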