• Title/Summary/Keyword: Translation-Based Language Model


Neural Machine translation specialized for Coronavirus Disease-19(COVID-19) (Coronavirus Disease-19(COVID-19)에 특화된 인공신경망 기계번역기)

  • Park, Chan-Jun;Kim, Kyeong-Hee;Park, Ki-Nam;Lim, Heui-Seok
    • Journal of the Korea Convergence Society / v.11 no.9 / pp.7-13 / 2020
  • Since the World Health Organization (WHO) declared Coronavirus Disease-19 (COVID-19) a pandemic, COVID-19 has been a global concern and deaths continue to mount. Addressing it requires countries to share information and countermeasures related to COVID-19. However, linguistic boundaries have prevented the smooth exchange and sharing of information. In this paper, we propose a Neural Machine Translation (NMT) model specialized for the COVID-19 domain. Centering on English, Transformer-based bidirectional models were built for French, Spanish, German, Italian, Russian, and Chinese. Measured by BLEU score, the models showed significantly higher performance than the commercial system in all language pairs.

A Query Processing Model based on the XML View in Relational Databases (관계형 데이터베이스에서 XML 뷰 기반의 질의 처리 모델)

  • Jung, Chai-Young;Choi, Kyu-Won;Kim, Young-Ok;Kim, Young-Kyun;Kang, Hyun-Syug;Bae, Jong-Min
    • The KIPS Transactions:PartD / v.10D no.2 / pp.221-232 / 2003
  • This paper addresses the query processing component of a wrapper system, based on XML views, for relational databases in a database integration setting. The schema of a relational database is represented as an XML Schema, as proposed by the W3C. Users submit queries over the XML Schema using the XML query language XQuery. The wrapper system to be developed supports a user-defined XML view, and XQuery is also used as the view definition language. For this environment, this paper suggests a new XML query processing model. We propose an algorithm for composing an XML view with a user query, an algorithm for translating XQuery into SQL, and an XML template construction algorithm for generating XML documents.

The Use of MSVM and HMM for Sentence Alignment

  • Fattah, Mohamed Abdel
    • Journal of Information Processing Systems / v.8 no.2 / pp.301-314 / 2012
  • In this paper, two new approaches to aligning English-Arabic sentences in bilingual parallel corpora are presented, based on Multi-Class Support Vector Machine (MSVM) and Hidden Markov Model (HMM) classifiers. A feature vector is extracted from the text pair under consideration. This vector contains text features such as length, punctuation score, and cognate score values. A set of manually prepared training data was used to train the MSVM and the HMM, and another set of data was used for testing. The MSVM and HMM results outperform those of the length-based approach. Moreover, these new approaches are valid for any language pair and are quite flexible, since the feature vector may contain fewer, more, or different features than the ones used in the current research, such as a lexical matching feature or Hanzi characters in Japanese-Chinese texts.
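
The feature vector described in this abstract can be illustrated with a minimal sketch. The abstract names the features (length, punctuation score, cognate score) but does not define them, so the three formulas below are assumptions chosen for illustration, and the function names are hypothetical rather than taken from the paper.

```python
import string
from collections import Counter


def length_ratio(src: str, tgt: str) -> float:
    """Assumed length feature: shorter character length over longer one."""
    longer = max(len(src), len(tgt))
    return 1.0 if longer == 0 else min(len(src), len(tgt)) / longer


def punctuation_score(src: str, tgt: str) -> float:
    """Assumed punctuation feature: share of punctuation marks found in both."""
    p_src = Counter(ch for ch in src if ch in string.punctuation)
    p_tgt = Counter(ch for ch in tgt if ch in string.punctuation)
    total = max(sum(p_src.values()), sum(p_tgt.values()))
    if total == 0:
        return 1.0  # neither sentence has punctuation, so nothing mismatches
    return sum((p_src & p_tgt).values()) / total


def _tokens(text: str) -> set:
    """Lowercased whitespace tokens with surrounding punctuation stripped."""
    stripped = (tok.strip(string.punctuation).lower() for tok in text.split())
    return {tok for tok in stripped if tok}


def cognate_score(src: str, tgt: str) -> float:
    """Assumed cognate feature: tokens (e.g. numbers) identical in both texts."""
    t_src, t_tgt = _tokens(src), _tokens(tgt)
    smaller = min(len(t_src), len(t_tgt))
    return 0.0 if smaller == 0 else len(t_src & t_tgt) / smaller


def feature_vector(src: str, tgt: str) -> list:
    """Feature vector for one candidate sentence pair."""
    return [length_ratio(src, tgt),
            punctuation_score(src, tgt),
            cognate_score(src, tgt)]
```

A vector such as `feature_vector("Call 911 now, please!", "Llame al 911 ahora, por favor!")` would then be passed to a classifier that decides whether the pair is aligned; the classifier itself is not sketched here.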

Spoken-to-written text conversion for enhancement of Korean-English readability and machine translation

  • HyunJung Choi;Muyeol Choi;Seonhui Kim;Yohan Lim;Minkyu Lee;Seung Yun;Donghyun Kim;Sang Hun Kim
    • ETRI Journal / v.46 no.1 / pp.127-136 / 2024
  • The Korean language has written (formal) and spoken (phonetic) forms that differ in their application, which can lead to confusion, especially when dealing with numbers and embedded Western words and phrases. This fact makes it difficult to automate Korean speech recognition models due to the need for a complete transcription training dataset. Because such datasets are frequently constructed using broadcast audio and their accompanying transcriptions, they do not follow a discrete rule-based matching pattern. Furthermore, these mismatches are exacerbated over time due to changing tacit policies. To mitigate this problem, we introduce a data-driven Korean spoken-to-written transcription conversion technique that enhances the automatic conversion of numbers and Western phrases to improve automatic translation model performance.

A Study on the Performance Analysis of Entity Name Recognition Techniques Using Korean Patent Literature

  • Gim, Jangwon
    • Journal of Advanced Information Technology and Convergence / v.10 no.2 / pp.139-151 / 2020
  • Entity name recognition is a part of information extraction that extracts entity names from documents and classifies their types. It is widely used in natural language processing applications such as information retrieval, machine translation, and question answering systems. Various deep learning-based models have been proposed to improve entity name recognition performance, but few studies have compared and analyzed these models on Korean data. In this paper, we compare and analyze the performance of CRF, LSTM-CRF, BiLSTM-CRF, and BERT, which are actively used for entity name recognition, on Korean data. We also evaluate whether the embedding models widely used in recent natural language processing tasks improve the performance of entity name recognition models. Experiments on patent data and a Korean corpus confirmed that BiLSTM-CRF with FastText embeddings achieved the highest performance.

Conceptual Transformation for Code Generation from SDL-92 to Object-oriented Languages (SDL-92에서 객체지향 언어의 코드 생성을 위한 개념 변환)

  • Lee, Si-Young;Lee, Dong-Gill;Lee, Joon-Kyung;Kim, Sung-Ho
    • Journal of KIISE:Software and Applications / v.27 no.5 / pp.473-487 / 2000
  • SDL-92, a language for the specification and description of systems, retained its communication method based on processes and signals when it adopted object-oriented concepts, in order to accommodate existing specification and description documents and existing users. This causes problems when programs are generated automatically in object-oriented languages, which are based on methods and objects: some SDL-92 concepts have no counterpart, and there are accompanying side effects concerning visibility and the communication method. In this paper, we therefore present a general object-oriented language model based on methods and objects, study the problems that arise in the transformation from SDL-92 to the proposed model, and propose conceptual transformation methods to solve them. The proposed transformation method can exploit the built-in parallelism of objects and guarantees compiler-level portability of the translated program by translating into the syntax of the target language.


Development of a Translator for Automatic Generation of Ubiquitous Metaservice Ontology (유비쿼터스 메타서비스 온톨로지 자동 생성을 위한 번역기 개발)

  • Lee, Mee-Yeon;Lee, Jung-Won;Park, Seung-Soo;Cho, We-Duke
    • Journal of the Korea Society of Computer and Information / v.14 no.1 / pp.191-203 / 2009
  • To provide dynamic services for users in ubiquitous computing environments by considering context in real time, our previous work proposed the Metaservice concept, its description specification, and a process for building a Metaservice library. However, that process generates separate models (UML-, OWL-, and OWL-S-based) at each step and provides no established method for translating between them. Moreover, it presupposes the help of experts in various ontology languages, ontology editing tools, and the proposed Metaservice specification. In this paper, we design a translation process from a domain ontology in OWL to a Metaservice library in OWL-S and develop a visual tool that enables non-experts to generate consistent models and construct a Metaservice library. The purpose of the Metaservice library translation process is to maintain consistency across all models and to automatically generate OWL-S code for the Metaservice library by integrating the existing OWL model and the Metaservice model.

Deep-Learning Approach for Text Detection Using Fully Convolutional Networks

  • Tung, Trieu Son;Lee, Gueesang
    • International Journal of Contents / v.14 no.1 / pp.1-6 / 2018
  • Text, as one of the most influential inventions of humanity, has played an important role in human life since ancient times. The rich and precise information embodied in text is very useful in a wide range of vision-based applications: text extracted from images can support automatic annotation, indexing, language translation, and assistance systems for impaired persons. Natural-scene text detection is therefore an important and active research topic in computer vision and document analysis. Previous methods perform poorly because of numerous false-positive and false-negative regions. In this paper, a supervised, fully convolutional network (FCN)-based method is used to localize text regions. The model was trained directly on images, with pixel values as inputs and binary ground truth as labels. The method was evaluated on the ICDAR-2013 dataset and proved to be comparable to other feature-based methods. It could expedite future research on deep-learning-based text detection.

Deletion-Based Sentence Compression Using Sentence Scoring Reflecting Linguistic Information (언어 정보가 반영된 문장 점수를 활용하는 삭제 기반 문장 압축)

  • Lee, Jun-Beom;Kim, So-Eon;Park, Seong-Bae
    • KIPS Transactions on Software and Data Engineering / v.11 no.3 / pp.125-132 / 2022
  • Sentence compression is a natural language processing task that generates concise sentences that preserve the important meaning of the original sentence. To produce grammatically appropriate compressions, early studies used human-defined linguistic rules. Because sequence-to-sequence models perform well on various natural language processing tasks, such as machine translation, other studies have applied them to sentence compression. However, rule-based approaches require every rule to be defined by humans, and sequence-to-sequence approaches require a large amount of parallel data for model training. To address these challenges, Deleter, a sentence compression model that leverages the pre-trained language model BERT, was proposed. Because Deleter compresses sentences using a perplexity-based score computed with BERT, it requires neither linguistic rules nor a parallel dataset. However, because Deleter considers only perplexity, it does not reflect the linguistic information of the words in a sentence. Furthermore, since the data used to pre-train BERT differ greatly from compressed sentences, this can lead to incorrect sentence compression. To address these problems, this paper proposes a method that quantifies the importance of linguistic information and reflects it in perplexity-based sentence scoring. In addition, by fine-tuning BERT on a corpus of news articles, which often contain proper nouns and often omit unnecessary modifiers, we allow BERT to measure perplexity in a way appropriate for sentence compression. Evaluations on English and Korean datasets confirm that the proposed method improves the sentence compression performance of sentence-scoring-based models.
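
The perplexity-driven deletion idea summarized in this abstract can be sketched as a greedy loop. This is only an illustration under loud assumptions: a toy add-one-smoothed unigram model stands in for BERT, the greedy one-token-at-a-time search is a simplification, and the linguistic-importance weighting proposed in the paper is not modeled. All names (`train_unigram`, `perplexity`, `compress`) are hypothetical.

```python
import math
from collections import Counter


def train_unigram(corpus):
    """Return a log-probability function from a toy unigram model.

    Assumption: add-one smoothing with one extra slot for unseen tokens.
    This replaces the BERT-based scorer purely for illustration.
    """
    counts = Counter(tok for sent in corpus for tok in sent.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 reserves probability mass for unknown tokens

    def logprob(tok):
        return math.log((counts[tok] + 1) / (total + vocab))

    return logprob


def perplexity(tokens, logprob):
    """Perplexity of a non-empty token sequence under the unigram model."""
    return math.exp(-sum(logprob(t) for t in tokens) / len(tokens))


def compress(sentence, logprob, target_len):
    """Greedily delete the token whose removal gives the lowest perplexity."""
    tokens = sentence.split()
    # Never shrink below one token, so perplexity is always well defined.
    while len(tokens) > max(1, target_len):
        candidates = [tokens[:i] + tokens[i + 1:] for i in range(len(tokens))]
        tokens = min(candidates, key=lambda c: perplexity(c, logprob))
    return " ".join(tokens)
```

With a model trained on `["the cat sat", "the cat ran", "the dog sat"]`, calling `compress("the fluffy cat sat", logprob, 3)` drops the unseen word "fluffy", because removing it lowers perplexity the most. A scorer that also weighted linguistic importance would adjust the candidate scores before the `min` step.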

Design and implementation of a EER-based Visual Product Information Modeler (EER기반의 시각적 상품정보 모델링 에디터의 설계와 구현)

  • Tark, Moon-Hee;Kim, Kyung-Hwa;Shim, Jun-Ho
    • The Journal of Society for e-Business Studies / v.12 no.3 / pp.97-106 / 2007
  • Ontology is a core technology for realizing the Semantic Web, and OWL (Web Ontology Language) has been positioned as a standard language. Representing domain knowledge directly in OWL, however, requires technical expertise. Based on our experience of analyzing the fundamental relationships among concepts in the e-catalog domain, we have developed a visual product information modeler called PROMOD. The modeling editor automatically generates OWL code for the given product information. We employ an Extended Entity-Relationship (EER) model for conceptual modeling, enriched with modeling elements specialized for the product domain. In this paper, we present our translation schemes from the EER model to OWL code and describe how the modeling editor is designed and implemented. We also provide a scenario that demonstrates the editor in practice.
