• Title/Summary/Keyword: Data Translation

Filter-mBART Based Neural Machine Translation Using Parallel Corpus Filtering (병렬 말뭉치 필터링을 적용한 Filter-mBART기반 기계번역 연구)

  • Moon, Hyeonseok;Park, Chanjun;Eo, Sugyeong;Park, JeongBae;Lim, Heuiseok
    • Journal of the Korea Convergence Society / v.12 no.5 / pp.1-7 / 2021
  • In the latest trend of machine translation research, a model is pretrained on a large monolingual corpus and then fine-tuned on a parallel corpus. Although many studies increase the amount of data used in the pretraining stage, it is hard to say that more data is required to improve machine translation performance. In this study, through an experiment based on the mBART model with parallel corpus filtering, we propose that high-quality data can yield better machine translation performance even with a smaller amount of data. We propose that the quality of data is more important to consider than its amount, and that this can serve as a guideline for building a training corpus.

CNN-based Sign Language Translation Program for the Deaf (CNN기반의 청각장애인을 위한 수화번역 프로그램)

  • Hong, Kyeong-Chan;Kim, Hyung-Su;Han, Young-Hwan
    • Journal of the Institute of Convergence Signal Processing / v.22 no.4 / pp.206-212 / 2021
  • Society continues to develop, and communication methods are developing in many ways. However, these advances in communication are aimed at the non-disabled and do little for the deaf. Therefore, in this paper, a CNN-based sign language translation program is designed and implemented to help deaf people communicate. The program translates sign language images captured through a webcam into their meaning based on the data. It uses 24,000 pieces of self-produced Korean vowel data and applies U-Net segmentation to train an effective classification model. In the implemented program, 'ㅋ' showed the best performance among all sign language data with 97% accuracy and a 99% F1-score, while 'ㅣ' showed the highest performance among vowel data with 94% accuracy and a 95.5% F1-score.

A Novel Inhibitor of Translation Initiation Factor eIF5B in Saccharomyces cerevisiae

  • Ah-Ra Goh;Yi-Na Kim;Jae Hyeun Oh;Sang Ki Choi
    • Journal of Microbiology and Biotechnology / v.34 no.6 / pp.1348-1355 / 2024
  • The eukaryotic translation initiation factor eIF5B is a bacterial IF2 ortholog that plays an important role in ribosome joining and stabilization of the initiator tRNA on the AUG start codon during the initiation of translation. We identified the fluorophenyl oxazole derivative 2,2-dibromo-1-(2-(4-fluorophenyl)benzo[d]oxazol-5-yl)ethanone quinolinol as an inhibitor of fungal protein synthesis using an in vitro translation assay in a fungal system. Mutants resistant to this compound were isolated in Saccharomyces cerevisiae and were demonstrated to contain amino acid substitutions in eIF5B that conferred the resistance. These results suggest that eIF5B is a target of this potential antifungal compound and that mutation of eIF5B can confer resistance. Subsequent identification of 16 other mutants revealed that primary mutations clustered mainly on domain 2 of eIF5B and secondarily on domain 4. Domain 2 has been implicated in the interaction with the small ribosomal subunit during initiation of translation. The tested translation inhibitor could act by weakening the functional contact between eIF5B and the ribosome complex. These data provide the basis for the development of a new family of antifungals.

Some nonparametric test procedure for the multi-sample case

  • Park, Hyo-Il;Kim, Ju-Sung
    • Journal of the Korean Data and Information Science Society / v.20 no.1 / pp.237-250 / 2009
  • We consider a nonparametric test procedure for the multi-sample problem with grouped data. We construct the test statistics based on the scores obtained from the likelihood ratio principle and derive the limiting distribution under the null hypothesis. We also illustrate our procedure with an example and obtain the asymptotic properties under Pitman translation alternatives. We then offer some concluding remarks. Finally, we derive the covariance between components in the Appendix.

A Study on multi-translation system for e-business collaboration (e-비즈니스 협업에 적합한 다중변환 시스템 연구)

  • Ahn, Kyeong-Rim;Chung, Jin-Wook
    • Journal of Internet Computing and Services / v.7 no.6 / pp.123-130 / 2006
  • In e-business, transactions originally took place within a single business entity or a single marketplace. They have since grown into more complex forms. In particular, the need for business collaboration between business entities or marketplaces has been rising as a core topic. Because exchanged documents come in various formats, format translation between documents is a very important factor. In this paper, we define ebXML as the basic format of exchanged documents for object-oriented business transactions. We also design a multi-format translation system to support the translation of various document formats. The proposed system is designed with a model-driven method and can be constructed with various structures depending on the system environment. It is designed so that any document format can be handled by adding the corresponding parsing module. We also increase the reusability of data by using a common data set. Finally, we show the superiority of the proposed system by comparing its performance with that of the legacy system for various format translations.

Korean Text to Gloss: Self-Supervised Learning approach

  • Thanh-Vu Dang;Gwang-hyun Yu;Ji-yong Kim;Young-hwan Park;Chil-woo Lee;Jin-Young Kim
    • Smart Media Journal / v.12 no.1 / pp.32-46 / 2023
  • Natural Language Processing (NLP) has grown tremendously in recent years. Typically, bilingual and multilingual translation models have been widely deployed in machine translation and have gained vast attention from the research community. In contrast, few studies have focused on translating between spoken and sign languages, especially for non-English languages. Prior work on Sign Language Translation (SLT) has shown that a mid-level sign gloss representation enhances translation performance. Therefore, this study presents a new large-scale Korean sign language dataset, the Museum-Commentary Korean Sign Gloss (MCKSG) dataset, which includes 3828 pairs of Korean sentences and their corresponding sign glosses used in Museum-Commentary contexts. In addition, we propose a translation framework based on self-supervised learning, where the pretext task is a text-to-text task from a Korean sentence to its back-translated versions; the pre-trained network is then fine-tuned on the MCKSG dataset. Using self-supervised learning helps to overcome the shortage of sign language data. In our experiments, the proposed model outperforms a baseline BERT model by 6.22%.

An Alignment based technique for Text Translation between Traditional Chinese and Simplified Chinese

  • Sue J. Ker;Lin, Chun-Hsien
    • Proceedings of the Korean Society for Language and Information Conference / 2002.02a / pp.147-156 / 2002
  • Aligned parallel corpora have proved very useful in many natural language processing tasks, including statistical machine translation and word sense disambiguation. In this paper, we describe an alignment technique for extracting transfer mappings from a parallel corpus. While building our system and collecting data, we observed that three types of translation approaches can be used. We focus especially on lexical translation between Traditional Chinese and Simplified Chinese text and on a method for extracting transfer mappings for machine translation.

A System Model for Storage Independent Use of SPARQL-to-SQL Translation Algorithm (SPARQL-to-SQL 변환 알고리즘의 저장소 독립적 활용을 위한 시스템 모델)

  • Son, Ji-Seong;Jeong, Dong-Won;Baik, Doo-Kwon
    • Journal of KIISE:Computing Practices and Letters / v.14 no.5 / pp.467-471 / 2008
  • With active research on Web ontology, various storages and query languages have been developed to store Web ontologies. As SPARQL usage increases and most storages are based on relational databases, the need to develop SPARQL-to-SQL translation algorithms has become an issue. Although several translation algorithms have been proposed, the following problems remain: they do not fully support SPARQL clauses, and they depend on a specific storage model. This paper proposes a new model for using a specific translation algorithm independently of the storage.

SemFilter: A Simple and Efficient Semantic XML Message Filtering (SemFilter: 단순하며 효율적인 시맨틱 XML 메시지 필터링)

  • Kim, Jae-Hoon;Park, Seog
    • Journal of KIISE:Computing Practices and Letters / v.14 no.7 / pp.680-693 / 2008
  • Recent studies on XML filtering assume that all data sources follow a single global schema defined in a filtering system. However, beyond this simple assumption, a filtering system can provide a service that allows data publishers to have their own schemas; hence, the data sources become heterogeneous. The number of data sources in a filtering system is expected to be large, and the data sources are dynamic, that is, they are frequently published and updated and frequently disappear. In this paper, we introduce a simple and efficient XPath query translation method for such a dynamic environment. The method is especially targeted at queries that are composed based only on users' knowledge and experience, without graphical guidance from the global schema. When a user queries a large amount of heterogeneous data, there is a high possibility that the query is not consistent with the local schema assumed by the user. Our query translation method also provides a function to handle this problem. Experimental results for query translation performance show that our method has reasonable performance and is more practical than the existing method.

A Brief Verification Study on the Normalization and Translation Invariant of Measurement Data for Seaport Efficiency : DEA Approach (항만효율성 측정 자료의 정규성과 변환 불변성 검증 소고 : DEA접근)

  • Park, Ro-Kyung;Park, Gil-Young
    • Journal of Korea Port Economic Association / v.23 no.2 / pp.109-120 / 2007
  • The purpose of this paper is to verify two problems (normalization for differing input and output data, and translation invariance for negative data) that arise in measuring seaport DEA (data envelopment analysis) efficiency. The main result is as follows: normalization and translation invariance in the BCC model for measuring seaport efficiency were verified using data on 26 Korean seaports in 1995 with two inputs (berthing capacity, cargo handling capacity) and three outputs (import cargo throughput, export cargo throughput, number of ship calls). The main policy implication of this paper is that the port management authority should collect and publish more specific data on the inputs and outputs of seaports, taking into account both negative values (e.g., the number of accidents in each seaport) and positive values, so that scholars can analyze efficiency, because normalization and translation invariance in the data were verified.
