• Title/Summary/Keyword: Transfer Dictionary

English-Korean Transfer Dictionary Extension Tool in English-Korean Machine Translation System (영한 기계번역 시스템의 영한 변환사전 확장 도구)

  • Kim, Sung-Dong
    • KIPS Transactions on Software and Data Engineering / v.2 no.1 / pp.35-42 / 2013
  • Developing an English-Korean machine translation system requires constructing information about both languages, and the amount of information in the English-Korean transfer dictionary is especially critical to translation quality. Newly created words are out-of-vocabulary words that appear untranslated in the output sentence, which decreases translation quality. Compound nouns also complicate lexical and syntactic analysis, and they are difficult to translate accurately when the transfer dictionary lacks information about them. To improve the quality of English-Korean machine translation, the transfer dictionary must be expanded continuously by collecting out-of-vocabulary words and frequently used compound nouns. This paper proposes a method for expanding the transfer dictionary that consists of constructing a corpus from internet newspapers, extracting words that are not in the existing dictionary along with frequently used compound nouns, attaching meanings to the extracted words, and integrating them into the transfer dictionary. We also develop a tool that supports this expansion. Expanding the dictionary is critical to improving the machine translation system but requires much human effort. The developed tool can be useful for continuously expanding the transfer dictionary and is therefore expected to contribute to better translation quality.
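
The extraction step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' tool: the function name `extract_candidates`, the `min_count` cutoff, and the use of adjacent word pairs as a crude stand-in for compound-noun detection (which would normally need part-of-speech tagging) are all assumptions made for illustration.

```python
import re
from collections import Counter


def extract_candidates(corpus, dictionary, min_count=2):
    """Return (out-of-vocabulary words, frequent adjacent word pairs).

    corpus     : list of English sentences (e.g. collected from news text)
    dictionary : set of lowercase words already in the transfer dictionary
    min_count  : hypothetical frequency cutoff for keeping a candidate
    """
    oov_counts = Counter()
    pair_counts = Counter()
    for sentence in corpus:
        tokens = re.findall(r"[a-z]+", sentence.lower())
        # Words missing from the existing dictionary are expansion candidates.
        oov_counts.update(t for t in tokens if t not in dictionary)
        # Adjacent pairs approximate compound nouns (no POS tagging here).
        pair_counts.update(zip(tokens, tokens[1:]))
    oov_words = sorted(w for w, c in oov_counts.items() if c >= min_count)
    pairs = sorted(" ".join(p) for p, c in pair_counts.items() if c >= min_count)
    return oov_words, pairs


corpus = [
    "The chatbot uses machine translation.",
    "A chatbot needs machine translation data.",
]
dictionary = {"the", "a", "uses", "needs", "machine", "translation", "data"}
print(extract_candidates(corpus, dictionary))
```

The candidates returned here would still need meanings attached by a human before being merged into the dictionary, which is the step the abstract says the tool supports.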

The Construction of Korean-to-English Verb Dictionary for Phrase-to-Phrase Translations (구절 변환을 위한 한영 동사 사전 구성)

  • Ok, Cheol-Young;Kim, Yung-Taek
    • Annual Conference on Human and Language Technology / 1991.10a / pp.44-57 / 1991
  • In transfer-based machine translation, the transfer dictionary determines the complexity of the transfer phase and the quality of translation, depending on the types and precision of the information it supplies. Human translators render a source sentence correctly and naturally by using the phrase-level translation information in a human-readable dictionary. In this paper, we propose a verb transfer dictionary in which this information is organized in a machine-readable format that a Korean-to-English machine translation system can use. The proposed dictionary first provides the criteria by which an appropriate target verb is selected in phrase-to-phrase translation without additional semantic analysis in the transfer phase. Second, it provides the concrete sentence structure of each target verb, which resolves expressive gaps between the two languages and reduces the complexity of the structural transfers required in word-to-word translation.

Transfer Dictionary for A Token Based Transfer Driven Korean-Japanese Machine Translation (토큰기반 변환중심 한일 기계번역을 위한 변환사전)

  • Yang Seungweon
    • Journal of Korea Society of Industrial Information Systems / v.9 no.3 / pp.64-70 / 2004
  • Korean and Japanese have the same sentence structure because they belong to the same language family. Transfer-driven machine translation is therefore the most efficient approach for translating between them. This paper introduces a method for creating a transfer dictionary for Token Based Transfer Driven Korean-Japanese Machine Translation (TB-TDMT). If the transfer dictionaries are well constructed, shallow parsing can replace traditional parsing and the unnecessary effort it involves. The semi-parser builds a dependency tree containing the minimum information needed by the output generation module. We constructed the transfer dictionaries using a corpus obtained from the ETRI spoken language database. Our system was tested with 900 utterances collected from the travel planning domain. Its success rate is 92% in the restricted testing environment and 81% in the unrestricted testing environment.

Efficient and Dynamic Authenticated Dictionary Design Using RSA One-way Accumulator (RSA 일방향 어큐뮬레이터를 이용한 효율적이고 동적인 인증 딕셔너리 설계)

  • Kim, Soon-Seok;Lee, Yong-Hee;Lee, Kang-Woo
    • Journal of the Korea Institute of Information and Communication Engineering / v.12 no.4 / pp.651-660 / 2008
  • Public networks such as the Internet are widely used to exchange sensitive data that require strong security, such as legally valid documents and business transactions. At the same time, public-key certificates are used for sensitive data interchange from the viewpoint of data integrity and authentication. However, the public key infrastructure (PKI) environment has weaknesses in data transfer capacity and security. This paper uses the RSA one-way accumulator to realize an efficient and dynamic authenticated dictionary, in which untrusted directories provide cryptographically verifiable answers to membership queries on a set maintained by a trusted source.
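
The idea of a membership query answered with a verifiable witness can be sketched with the textbook RSA accumulator. This is a toy illustration, not the paper's design: the modulus `N`, the base `G`, and the function names are assumptions, the modulus is far too small for real use, and set elements are assumed to be distinct primes.

```python
from math import prod

# Toy parameters (assumptions for illustration only). A real deployment
# needs a large RSA modulus whose factorisation is unknown to the directory.
N = 61 * 53
G = 2


def accumulate(elements):
    """Trusted source: fold the whole set into one accumulator value."""
    return pow(G, prod(elements), N)


def witness(elements, x):
    """Directory: membership witness for x, built from all other elements."""
    others = [e for e in elements if e != x]
    return pow(G, prod(others), N)


def verify(acc, wit, x):
    """User: check wit^x == acc (mod N) without seeing the rest of the set."""
    return pow(wit, x, N) == acc


members = [3, 5, 7]            # elements are assumed to be distinct primes
acc = accumulate(members)      # published (and signed) by the trusted source
w = witness(members, 5)        # returned by an untrusted directory
print(verify(acc, w, 5))
```

The check works because raising the witness to the power `x` restores the missing factor in the exponent, so only the accumulator value needs to come from the trusted source.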

Named entity recognition using transfer learning and small human- and meta-pseudo-labeled datasets

  • Kyoungman Bae;Joon-Ho Lim
    • ETRI Journal / v.46 no.1 / pp.59-70 / 2024
  • We introduce a high-performance named entity recognition (NER) model for written and spoken language. To overcome challenges related to labeled data scarcity and domain shifts, we use transfer learning to leverage our previously developed KorBERT as the base model. We also adopt a meta-pseudo-label method using a teacher/student framework with labeled and unlabeled data. Our model presents two modifications. First, the student model is updated with an average loss from both human- and pseudo-labeled data. Second, the influence of noisy pseudo-labeled data is mitigated by considering feedback scores and updating the teacher model only when below a threshold (0.0005). We achieve the target NER performance in the spoken language domain and improve that in the written language domain by proposing a straightforward rollback method that reverts to the best model based on scarce human-labeled data. Further improvement is achieved by adjusting the label vector weights in the named entity dictionary.

Rain Detection via Deep Convolutional Neural Networks (심층 컨볼루셔널 신경망 기반의 빗줄기 검출 기법)

  • Son, Chang-Hwan
    • Journal of the Institute of Electronics and Information Engineers / v.54 no.8 / pp.81-88 / 2017
  • This paper proposes a method of detecting rain regions from a single image. More specifically, it presents a supervised way of training a deep convolutional neural network on collected rain and non-rain patches. It is also shown that the proposed rain detection method based on a deep convolutional neural network can provide better performance than the conventional rain detection method based on dictionary learning. Moreover, it is confirmed that applying the proposed rain detection to rain removal can lead to some improvement in detail representation in the low-frequency regions of the rain-removed images. Additionally, this paper introduces a rain transfer method that inserts rain patterns into original images, thereby producing rain effects in the resulting images. The proposed rain transfer method could be used to augment rain patterns when constructing a rain database.

A Semantic Analysis of Human Body Russian Slang (사람의 신체에 대한 러시아어 슬랭의 의미론적 분석)

  • Kim, Sung Wan
    • Cross-Cultural Studies / v.31 / pp.241-262 / 2013
  • In this study, we select and analyze the slang recorded in Elistratov's "Dictionary of Russian slang". The analysis leads to the following conclusions. First, because slang is a social and psychological phenomenon that appears universally in all languages, its study calls for strict criteria for analysis. Unlike literary language, slang expressions listed in a dictionary can become obsolete because native speakers use them only for a short period. Therefore, in subsequent research on actual data, the words targeted for analysis have to be validated. Second, the analysis deals mostly with metaphor rather than metonymy. Semantic derivations resulting from metonymy are used very frequently in real life, but because this study mainly analyzes words, fewer metonymic words were found than expected. Third, the basic types of metaphor appear as similarity of form, function, and location, and subjectivity intervenes in various ways in similarity of emotional impression. Fourth, metonymy is divided into three cases: the part meaning the whole, the whole meaning the part, and something meaning the reality of where it exists. Fifth, in slang as well as in literary language, the 'transitional process' is the most active way of developing new meanings, and there are two methods of transferring a primary meaning to a secondary meaning.

A Hybrid Method of Verb disambiguation in Machine Translation (기계번역에서 동사 모호성 해결에 관한 하이브리드 기법)

  • Moon, Yoo-Jin;Martha Palmer
    • The Transactions of the Korea Information Processing Society / v.5 no.3 / pp.681-687 / 1998
  • The paper presents a hybrid method for disambiguating verb meaning in machine translation. The presented verb translation algorithm performs the concept-based method and the statistics-based method simultaneously. It uses a collocation dictionary, WordNet, and statistical information extracted from a corpus. In the transfer phase of machine translation, it tries to find the target word for the source verb. If it fails, it refers to WordNet and tries to find the target word by calculating word similarities between the logical constraints of the source sentence and those in the collocation dictionary. At the same time, it refers to the statistical information extracted from the corpus and tries to find the target word by calculating co-occurrence similarity. The experimental results show that the algorithm translates verbs more accurately than the other algorithms and improves the accuracy of verb translation by 24.8% compared to the collocation-based method.

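The lookup-then-fallback flow described in the hybrid verb disambiguation abstract above can be sketched in Python as follows. Everything concrete here is hypothetical: the toy tables, the English placeholder names used for target verbs, and in particular the choice to add the two scores together, since the abstract does not say how the concept-based and statistics-based results are combined. The `SIMILARITY` table merely stands in for a WordNet-based similarity measure.

```python
# Hypothetical collocation dictionary: (source verb, object) -> target verb.
COLLOCATIONS = {
    ("run", "company"): "manage",
    ("run", "race"): "compete_in",
}
# Hypothetical stand-in for WordNet similarity between two object nouns.
SIMILARITY = {
    ("business", "company"): 0.9,
    ("business", "race"): 0.1,
}
# Hypothetical corpus co-occurrence scores: (target verb, object).
COOCCURRENCE = {
    ("manage", "business"): 0.8,
    ("compete_in", "business"): 0.05,
}


def translate_verb(verb, obj):
    """Pick a target verb; return None when no candidate scores above zero."""
    # Step 1: exact match in the collocation dictionary.
    exact = COLLOCATIONS.get((verb, obj))
    if exact is not None:
        return exact
    # Step 2: fall back to similarity plus co-occurrence (assumed: summed).
    best, best_score = None, 0.0
    for (known_verb, known_obj), target in COLLOCATIONS.items():
        if known_verb != verb:
            continue
        score = (SIMILARITY.get((obj, known_obj), 0.0)
                 + COOCCURRENCE.get((target, obj), 0.0))
        if score > best_score:
            best, best_score = target, score
    return best


print(translate_verb("run", "race"))
print(translate_verb("run", "business"))
```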

Development of the Rule-based Smart Tourism Chatbot using Neo4J graph database

  • Kim, Dong-Hyun;Im, Hyeon-Su;Hyeon, Jong-Heon;Jwa, Jeong-Woo
    • International Journal of Internet, Broadcasting and Communication / v.13 no.2 / pp.179-186 / 2021
  • We have developed a smart tourism app and Instagram and YouTube content to provide personalized tourism information and travel product information to individual tourists. In this paper, we develop a rule-based smart tourism chatbot with the khaiii (Kakao Hangul Analyzer III) morphological analyzer and the Neo4J graph database. In the proposed chatbot system, we use a morpheme analyzer, a proper noun dictionary including tourist destination names, and a general noun dictionary containing words frequently used in tourist information searches to understand the intention of the user's question. The tourism knowledge base built on the Neo4J graph database provides adequate answers to tourists' questions. The Neo4J nodes are Area, based on the tourist destination address; Contents, holding tourist information as properties; and Service, containing service attribute data frequently used for search. A Neo4J query is created from the analyzed intention of a tourist's question together with the properties of nodes and relationships in the Neo4J database. The answer is obtained by searching the tourism knowledge base. We create the tourism knowledge base from more than 1,300 items of Jeju tourism information used in the smart tourism app. We plan to develop a multilingual smart tour chatbot using named entity recognition (NER), intention classification with a conditional random field (CRF), and transfer learning with pretrained language models.
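
How a rule-based step might turn analyzed morphemes into a graph query can be sketched as below. Only the node labels `Contents` and `Service` come from the abstract. The dictionary contents, the `HAS_SERVICE` relationship, the `name` property, and the function `build_query` are hypothetical, and the sketch only builds the Cypher text and its parameters rather than connecting to a database.

```python
# Hypothetical proper noun dictionary of tourist destination names.
PROPER_NOUNS = {"hallasan", "seongsan"}
# Hypothetical general noun dictionary: search keyword -> Service property.
GENERAL_NOUNS = {"parking": "parking", "hours": "opening_hours"}


def build_query(morphemes):
    """Map analyzed morphemes to (Cypher query, parameters), or None."""
    place = next((m for m in morphemes if m in PROPER_NOUNS), None)
    attribute = next(
        (GENERAL_NOUNS[m] for m in morphemes if m in GENERAL_NOUNS), None
    )
    if place is None or attribute is None:
        return None  # intention not understood by these rules
    # The property name is taken from the whitelist above, never from raw
    # user text; the place name is passed as a query parameter.
    query = (
        "MATCH (c:Contents {name: $place})-[:HAS_SERVICE]->(s:Service) "
        f"RETURN s.{attribute} AS answer"
    )
    return query, {"place": place}


print(build_query(["hallasan", "parking", "where"]))
```

In a real system the morpheme list would come from the morphological analyzer, and the returned query and parameters would be passed to a database session for execution.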