Search | Korea Science

Study on the development of automatic translation service system for Korean astronomical classics by artificial intelligence - Focused on development results and test operation (천문 고문헌 특화 인공지능 자동번역 서비스 시스템 개발 연구 - 개발 결과 및 시험 운영 위주)

Seo, Yoon Kyung;Kim, Sang Hyuk;Ahn, Young Sook;Choi, Go-Eun;Choi, Young Sil;Baik, Hangi;Sun, Bo Min;Kim, Hyun Jin;Choi, Byung Sook;Lee, Sahng Woon;Park, Raejin
- The Bulletin of The Korean Astronomical Society, v.45 no.1, pp.56.1-56.1, 2020
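Korean historical documents contain a wide variety of astronomical records written in classical Chinese, and using them for scholarly purposes requires considerable cost and time for professional translators. We therefore introduce a scholarly tool, an artificial intelligence translator built with neural-network machine learning, that automatically translates classical Chinese sentences into Korean, even if only at the level of a first draft. The translator was developed by the Korea Astronomy and Space Science Institute (KASI) jointly with the Institute for the Translation of Korean Classics, through the 2019 Information and Communication Technology-based public service promotion project organized by the National Information Society Agency. The aim of this study is to build a corpus of astronomical classics, that is, machine-learning data specialized for the historical astronomy domain, and on that basis to develop a specialized automatic translation model and offer a translation service. Three systems were built for this purpose. The first is a cloud-based public service for automatic translation of old documents that anyone can use over the web without logging in. The second is a cloud-based platform for institutions on which each participating institution can create and manage its corpus and domain-specific translation models. The third is AITHA, a system that uses the developed automatic translation Application Programming Interface (API) to provide a service within KASI. As for results, a sampling inspection of the 60,760 entries in the astronomical classics corpus found a quality purity of at least 99.9%. Of the 20 specialized translation models produced, the representative model scored 40.02 on Bilingual Evaluation Understudy (BLEU), an algorithm for evaluating the quality of machine-translated text, and 4.05 out of 5.0 in human evaluation by experts. This is sufficient for the first-draft translation level originally targeted, and the developed systems are currently in internal test operation. The study is significant in that it lowers the translation barrier for historical astronomical records, a specialized class of old documents, and can thereby support scholarly access and a range of research. Moreover, because historical astronomy is the first pilot case for the platform for spreading AI automatic translation, synergy can be expected as other disciplines join. The automatic translator will evolve into a better scholarly tool as more training data and training accumulate.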
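For readers unfamiliar with the BLEU figure quoted above, the following is a minimal sketch of how a BLEU score is computed for one sentence. It assumes whitespace-separated tokens, a single reference translation, and no smoothing; the abstract does not state which implementation, tokenization, or aggregation the authors used, so the function names and example sentences here are illustrative only.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Single-reference, sentence-level BLEU on a 0-100 scale, no smoothing."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        # Clipped counts: a candidate n-gram is credited at most as often
        # as it occurs in the reference.
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        if overlap == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty for candidates that are not longer than the reference.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    # Geometric mean of the n-gram precisions, scaled to 0-100.
    return 100 * bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    reference = "the star appeared in the east".split()
    candidate = "the star appeared in the west".split()
    print(round(bleu(candidate, reference), 2))
```

Reported scores such as 40.02 are normally corpus-level, with n-gram counts pooled over all test sentences before the precisions are taken, so a per-sentence sketch like this one is not directly comparable.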

Braille Translator Using Technology (OCR 기술을 활용한 점자 번역기)

Jang, Chan-hee;Kim, Duk-Woen;Jang, Jun-Ki;Jo, Jun-Hee;Park, Hyun Joo;Kim, Joong Jae
- Proceedings of the Korea Information Processing Society Conference, 2019.10a, pp.566-569, 2019
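This system is a device that scans pages and renders them as braille (raised dots) so that visually impaired people can read ordinary books as if they were braille books. When a page is scanned with the camera module, the program recognizes and interprets the characters on the page and translates them into braille. The translated braille is rendered by solenoid motors so that a visually impaired person can actually read it.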
https://doi.org/10.3745/PKIPS.y2019m10a.566
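The step from recognized text to braille output can be sketched as follows. This is only an illustration under stated assumptions: the letter table is a small hypothetical subset of English braille (the paper's own mapping, presumably for Korean braille, is not given in the abstract), and `pin_states` is an assumed stand-in for whatever signal drives the six solenoid pins of one cell.

```python
# Dot numbers (1-6) for a few Latin letters in English braille.
# Hypothetical subset for illustration only; Korean braille is not covered.
LETTER_DOTS = {
    "a": (1,),
    "b": (1, 2),
    "c": (1, 4),
    "d": (1, 4, 5),
    "e": (1, 5),
}


def to_cell(dots):
    """Convert raised-dot numbers to a Unicode braille character."""
    # Unicode braille patterns start at U+2800; dot n sets bit n-1.
    mask = sum(1 << (d - 1) for d in dots)
    return chr(0x2800 + mask)


def pin_states(dots):
    """Return six booleans for dots 1..6; True means raise that pin."""
    return [d in dots for d in range(1, 7)]


def translate(text):
    """Map recognized text to braille cells; unknown characters become blank cells."""
    return "".join(to_cell(LETTER_DOTS.get(ch, ())) for ch in text.lower())


if __name__ == "__main__":
    print(translate("abc"))
    print(pin_states(LETTER_DOTS["c"]))
```

Keeping the dot numbers as the single source of truth means the same table can feed both a screen preview (the Unicode cell) and the hardware (the per-pin states).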

Implementation of Korean Honorific Converter Using OpenNMT (OpenNMT를 활용한 한글 존댓말 변환기의 구현)

Jeong, Jun-Nyeong;Kim, Sang-Yeong;Kim, Seong-Tae;Lee, Jeong-Jae;Jung, Yuchul
- Proceedings of the Korean Society of Computer Information Conference, 2021.01a, pp.141-142, 2021
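Recent neural machine translation produces more natural translations. This paper proposes a technique that uses machine translation methods to convert informal speech (banmal) into honorific speech (jondaetmal). To this end, we crawled DCInside message boards and combined the result with AI-HUB data to build our own dataset of about 20,000 entries, and ran experiments using four Korean preprocessing techniques together with the LSTM and Transformer modules of the OpenNMT framework. Through these experiments we identified the best combination for converting informal expressions into honorific ones, obtaining a maximum BLEU score of 66.53 in validation.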

An Analysis System of Prepositional Phrases in English-to-Korean Machine Translation (영한 기계번역에서 전치사구를 해석하는 시스템)

Gang, Won-Seok
- The Transactions of the Korea Information Processing Society, v.3 no.7, pp.1792-1802, 1996
The analysis of prepositional phrases in English-to-Korean machine translation faces problems in PP-attachment resolution, semantic analysis, and the acquisition of information. This paper presents an analysis system for prepositional phrases that addresses these problems. The system consists of a hybrid PP-attachment resolution system, a semantic analysis system, and a semantic feature generator that automatically generates input information. The automatic generation of semantic features makes the analysis of prepositional phrases more objective. The semantic analysis system makes it possible to generate natural Korean expressions by selecting the semantic roles of prepositional phrases. The hybrid PP-attachment resolution system combines the merits of rule-based and neural network-based methods.

Design of Hybrid Debugging System for Java Programs (자바 프로그램을 위한 복합 디버깅 시스템의 설계)

Kouh, Hoon-Joon
- The Journal of the Korea Contents Association, v.9 no.1, pp.81-88, 2009
In previous work, we presented HDTS for locating logical errors in Java programs. HDTS locates an erroneous method in an execution tree using an algorithmic program debugging technique, and then locates the erroneous statement within that method using step-wise program debugging. It applies program slicing to the execution tree to remove statements and nodes that are unnecessary for debugging, and so reduces the number of debugging steps. In this paper, we design the HDTS system for debugging Java programs. We define a small subset of the Java language and design a translator that translates Java source code and a virtual machine that runs Java programs. We also design a GUI (Graphical User Interface) for debugging.
https://doi.org/10.5392/JKCA.2009.9.1.081
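The first stage described in this abstract, locating an erroneous method in an execution tree, can be sketched generically as top-down algorithmic debugging. The sketch below is not HDTS itself: the `Node` class, the `is_correct` oracle (in practice the programmer answering yes or no), and the example tree are all hypothetical, and the step-wise and slicing stages are not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One method call in the execution tree."""
    name: str
    children: list = field(default_factory=list)


def locate_buggy_node(node, is_correct):
    """Top-down algorithmic debugging over an execution tree.

    `node` is assumed to have produced a wrong result. `is_correct` is a
    hypothetical oracle that judges whether a call's result is right.
    Returns a node whose result is wrong while all its children are right.
    """
    for child in node.children:
        if not is_correct(child):
            # The error lies somewhere in this subtree; descend into it.
            return locate_buggy_node(child, is_correct)
    # Every child is correct, so the error is in this method's own body.
    return node


if __name__ == "__main__":
    tree = Node("main", [
        Node("parse"),
        Node("compute", [Node("add"), Node("mul")]),
    ])
    wrong = {"main", "compute", "mul"}  # calls whose results the oracle rejects
    print(locate_buggy_node(tree, lambda n: n.name not in wrong).name)
```

Each oracle question eliminates a whole subtree, which is why pruning irrelevant nodes beforehand, as the abstract describes with slicing, shortens the session.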

A Corpus-based Hybrid Translation System for Limited Domain (제한된 도메인을 위한 코퍼스 기반의 하이브리드 번역 시스템)

Kang, Un-Gu;Kim, Sung-Hyun;Lee, Byung-Mun;Lee, Young-Ho
- Journal of KIISE:Software and Applications, v.37 no.11, pp.826-836, 2010
This paper proposes a hybrid machine translation system that integrates SMT, RBMT, and PBMT in a serial manner. The SMT component in our project is implemented as a quasi-syntax-based system in which monotone search is performed on a preprocessed string of the foreign language. Preprocessing includes rule-based reordering, NE recognition, clausal splitting, and attaching pattern translation information at the end of the input text. For long and complex sentences, clausal splitting turned out to produce better translations than normal input.

Three-Phase English Syntactic Analysis for Improving the Parsing Efficiency (영어 구문 분석의 효율 개선을 위한 3단계 구문 분석)

Kim, Sung-Dong
- KIPS Transactions on Software and Data Engineering, v.5 no.1, pp.21-28, 2016
The performance of an English-Korean machine translation system depends heavily on its English parser. The parser in this paper is part of a rule-based English-Korean MT system that includes many syntactic rules and performs chart-based parsing. Because of the many syntactic rules, the parser generates too many structures, so much time and memory are required. The rule-based parser has difficulty analyzing and translating long sentences that contain commas because they cause high parsing complexity. In this paper, we propose a 3-phase parsing method with sentence segmentation to efficiently translate the long sentences that commonly appear. Each phase of the syntactic analysis applies its own independent syntactic rules in order to reduce parsing complexity. For this purpose, we classify the syntactic rules into 3 classes and design a 3-phase parsing algorithm. In particular, the syntactic rules in the 3rd class are for sentence structures composed with commas. We present an automatic method for acquiring 3rd-class rules from the syntactic analysis of a corpus, with which we aim to continuously improve parsing coverage. The experimental results show that the proposed 3-phase parsing method is superior to the prior parsing method, which uses only intra-sentence segmentation, in parsing speed and memory efficiency while maintaining translation quality.
https://doi.org/10.3745/KTSDE.2016.5.1.21

Design of Translator for generating Secure Java Bytecode from Thread code of Multithreaded Models (다중스레드 모델의 스레드 코드를 안전한 자바 바이트코드로 변환하기 위한 번역기 설계)

김기태;유원희
- Proceedings of the Korea Society for Industrial Systems Conference, 2002.06a, pp.148-155, 2002
Multithreaded models improve the efficiency of parallel systems by combining inner parallelism, asynchronous data availability, and the locality of the von Neumann model. Such a model executes thread code that is generated by a compiler and whose quality depends on the generation method. However, multithreaded models have the drawback that the execution model is restricted to a specific platform. Java, by contrast, is platform independent, so if thread code can be translated to Java bytecode, the advantages of multithreaded models become available on many platforms. Java executes Java bytecode, the intermediate language format of the Java virtual machine. In the translator, Java bytecode plays the role of an intermediate language and the Java virtual machine works as the back end. However, Java bytecode translated from multithreaded models has the drawback that it is not secure. In this paper, thread code is made platform independent so that it can execute on the Java virtual machine. We design and implement a translator that translates the thread code of multithreaded models to Java bytecode and checks the resulting bytecode for security problems.

Translation and Validity test of the FIM instrument and Guide (FIM도구 및 지침서 번역과 타탕도 검증 연구)

Hwang, Ok-Nam;Cho, Kap-Chul
- The Korean Journal of Rehabilitation Nursing, v.4 no.2, pp.232-239, 2001
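The purpose of this paper is to translate the FIM instrument (in English), which measures rehabilitation function, into Korean and to verify its cultural validity, that is, whether it is suited to Korean culture. To this end, we introduce the FIM instrument and, for validity testing, use two of the five steps for verifying cross-cultural equivalence introduced by Flaherty et al. (1988): content validity testing by an expert panel and a back-translation procedure. The results showed that, because FIM is not a psychosocial instrument but one used to measure the function of rehabilitation patients, its terms and sentences are relatively concise, and adjectives or metaphors that could cause confusion in translation are rarely used, so the meanings were found to be equivalent. However, in translating this extensive 47-page instrument, the researcher was found to have left five sentences untranslated, and these were retranslated. To convey the meaning more accurately, the sentence 'no accidents' was spelled out as 'no accidents of wetting clothes or bedding through incontinence' and translated as 'no urinary accidents' or 'no bowel accidents'. Two major differences arising from differences in lifestyle emerged, indicating that using this instrument in Korea requires not only reliability testing but also some unavoidable modification. The two lifestyle differences were the difference in eating habits and the difference between ondol (heated-floor) and bed culture. First, Koreans use chopsticks rather than a fork at meals. A disabled person who cannot use their hands well, however, uses a fork instead of chopsticks, so we propose that fork use in this case be regarded as use of an assistive device and scored as 6 rather than 7. Second, because Koreans have an ondol culture, a disabled person living in a traditional-style house can use a wheelchair if the house has been remodeled, but otherwise can move to the bed, bathroom, or toilet in a sitting position without a wheelchair. In such cases, we propose closely examining the functional level of patients who can move while sitting and converting it into a measurable score.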

High-Quality Multimodal Dataset Construction Methodology for ChatGPT-Based Korean Vision-Language Pre-training (ChatGPT 기반 한국어 Vision-Language Pre-training을 위한 고품질 멀티모달 데이터셋 구축 방법론)

Jin Seong;Seung-heon Han;Jong-hun Shin;Soo-jong Lim;Oh-woog Kwon
- Annual Conference on Human and Language Technology, 2023.10a, pp.603-608, 2023
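This study examines the need to build a large-scale vision-language multimodal dataset for training Korean Vision-Language Pre-training models. Korean vision-language multimodal datasets are currently scarce, and high-quality data is difficult to obtain. We therefore propose a dataset construction methodology that uses machine translation to translate foreign-language (English) vision-language data into Korean and, on that basis, applies generative AI. Among the various caption generation methods, we propose a new method that uses ChatGPT to automatically generate natural, high-quality Korean captions. This can ensure better caption quality than existing machine translation methods, and several translation results are ensembled to build the multimodal dataset effectively. In addition, we introduce Caption Projection Consistency, an evaluation method based on semantic similarity, compare the English-to-Korean caption projection performance of various translation systems, and present criteria for evaluating it. Finally, this study presents a new methodology for building a Korean image-text multimodal dataset using ChatGPT and demonstrates English-to-Korean caption projection performance superior to that of representative machine translators. Our work thus points to a way of automatically building scarce high-quality Korean datasets at scale, and we expect this method to contribute to improving the performance of deep learning-based Korean Vision-Language Pre-training models.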
