• Title/Summary/Keyword: Korean parsing


An implementation of parser for special syntax processing in Korea (한국어 특수구문 처리를 위한 파서의 구현)

  • Kim, Jae-Mun;Lee, Sang-Kuk;Lee, Sang-Jo
    • Journal of the Korean Institute of Telematics and Electronics B / v.31B no.11 / pp.124-135 / 1994
  • In this paper, we propose a Korean syntactic analysis system for processing special syntactic constructions. HPSG, which handles syntactic and semantic analysis together through unification, is chosen for grammar description. A head-driven unidirectional active chart parser, which is efficient for processing Korean, is used as the parsing mechanism. The parser can analyze not only general sentence structures, which consist of complement-head, adjunct-head, and head-head structures, but also special constructions such as auxiliary verb sentences, causative sentences, and passive sentences.


Korean Dependency Parsing Using Online Learning (온라인 학습을 이용한 한국어 의존구문분석)

  • Lee, Yong-Hun;Lee, Jong-Hyeok
    • Proceedings of the Korean Information Science Society Conference / 2010.06c / pp.299-304 / 2010
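  • This paper proposes a Korean dependency parsing method that uses online learning. We adapt the graph-based dependency parsing method that ranked first in CoNLL-X to Korean and, taking the agglutinative nature of Korean into account, present a feature set suited to the language. In particular, to treat an edge of the dependency tree as a dependency between partial trees rather than between individual words, we use the function-word information shared by the partial trees as additional features. Furthermore, by exploiting the head-final property and projectivity of Korean, we were able to use an O($n^3$) CYK algorithm instead of the Eisner (1996) algorithm and thereby find the global optimum. The optimal weight vector for the features is learned with the averaged perceptron algorithm of Collins (2002), an online learning method, which makes model training fast. Applied to the Korean Information Base (KIBS) corpus, the proposed model achieved a high eojeol-level accuracy of 88.42%.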
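
The abstract above says that head-finality and projectivity allow a CYK-style O($n^3$) search for the best dependency tree. A minimal sketch of one such dynamic program is given here as an illustration under assumptions, not as the authors' implementation. The function name `parse_head_final` and the arc-score matrix `score[d][h]` are hypothetical, and the numbers stand in for the scores that a learned feature-weight model would produce.

```python
from typing import List, Tuple


def parse_head_final(score: List[List[float]]) -> Tuple[float, List[int]]:
    """Best projective tree in which every head lies to the right of its dependent.

    score[d][h] is a hypothetical arc score for word d depending on word h (d < h).
    Returns (total score, heads); the last word is the root and keeps head -1.
    """
    n = len(score)
    if n == 0:
        return 0.0, []
    # best[i][j]: best score of a subtree that covers words i..j and is rooted at j.
    best = [[0.0] * n for _ in range(n)]
    split = [[-1] * n for _ in range(n)]
    for width in range(1, n):
        for i in range(n - width):
            j = i + width
            best[i][j] = float("-inf")
            for k in range(i, j):
                # k is the leftmost dependent of j inside i..j;
                # its own subtree covers exactly i..k.
                cand = best[i][k] + best[k + 1][j] + score[k][j]
                if cand > best[i][j]:
                    best[i][j] = cand
                    split[i][j] = k
    # Recover the head of every word from the stored split points.
    heads = [-1] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i == j:
            continue
        k = split[i][j]
        heads[k] = j
        stack.append((i, k))
        stack.append((k + 1, j))
    return best[0][n - 1], heads


# Toy three-word sentence with made-up arc scores.
toy_scores = [
    [0.0, 1.0, 5.0],
    [0.0, 0.0, 2.0],
    [0.0, 0.0, 0.0],
]
total, heads = parse_head_final(toy_scores)
print(total, heads)
```

The three nested loops give the cubic running time. Because every span is rooted at its rightmost word, a single chart indexed by span boundaries is enough in this sketch.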


Design and Implementation of an SGML Document Presentation System based on DSSSL (DSSSL에 기반한 SGML 문서 표현 시스템의 설계 및 구현)

  • Kim, Chang-Soo;Jung, Hoe-Kyung;Yun, Bo-Hyun;Kang, Hyung-Yu
    • Journal of KIISE:Computing Practices and Letters / v.6 no.5 / pp.477-486 / 2000
  • This paper presents the design and implementation of an SGML document presentation system that formats SGML documents based on DSSSL. ISO/IEC proposed DSSSL as the technical standard for formatting and transforming SGML documents. This paper shows the design of the system in accordance with the model defined in DSSSL. The system, which supports Korean, can parse arbitrary DTDs, SGML documents, and DSSSL style sheets, and contains a formatter that can handle various specifications, such as graphs, lists, and pictures, as well as text. It will satisfy users' needs to exchange SGML documents that carry format information between different types, and will greatly contribute to building an SGML document processing environment.


An efficient compression method of metadata using BiM (BiM을 이용한 메타데이터의 효율적인 부호화 방법)

  • 양승준;남제호;김영태;강경옥
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2001.11b / pp.199-202 / 2001
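  • ISO/IEC 15938-1 (MPEG-7 Systems) provides BiM (binary format for MPEG-7), a binary representation for the efficient transmission and storage of metadata about multimedia content. Because textual representations of the metadata that describe multimedia content generally require large amounts of storage and transmission resources, conversion to a binary format is needed for efficient compression. Text formats also have the drawback of being poorly suited to streaming delivery, as in broadcast environments. BiM enables streaming delivery by supporting the encoding of a content description either as a whole or divided into two or more access units (AUs). The resulting structure is packet-based, with headers expressed in binary format, and allows a flexible transmission order. It also has the advantage of providing random access without parsing the entire bitstream. Because of these advantages, the TV-Anytime Forum, an international private-sector standardization body led by the broadcasting industry that is standardizing technology for using metadata in broadcasting, is considering BiM as one method that satisfies its requirements for compressing metadata about broadcast content. This paper introduces the BiM of MPEG-7 Systems and describes an experiment in which TV-Anytime Forum metadata was encoded into binary format with BiM, together with the results.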


Development of Japanese to Korean Machine Translation System ATOM Using Personal Computer II - Syntactic/Semantic Analysis and Generation Process - (PC를 이용한 일·한 번역 시스템 ATOM의 개발에 관한 연구 (II) - 구문해석과 생성과정을 중심으로 -)

  • Kim, Young-Sum;Kim, Han-Woo;Choi, Byung-Uk
    • Journal of the Korean Institute of Telematics and Electronics / v.25 no.10 / pp.1193-1201 / 1988
  • In this paper, we describe syntactic and semantic parsing methods that use case frames. The case structures are based on the obligatory cases of verbs, and we use a small set of partial-grammar rules based on simple sentences to represent them. We also improve efficiency by constructing independent procedures for particle classification and for resolving the ambiguity of major particles, given the importance of Japanese particle processing in generation. In addition, we construct a generation table that takes into account the possible combinations of verbs and auxiliary verbs for processing sentence-final phrases. As a result, we can generate more natural translated sentences through unique decisions based on syntactic analysis information, and simplify the generation process.


A data management system for microbial genome projects

  • Ki-Bong Kim;Hyeweon Nam;Hwajung Seo and Kiejung Park
    • Proceedings of the Korean Society for Bioinformatics Conference / 2000.11a / pp.83-85 / 2000
  • Many microbial genome sequencing projects have been under way in genome centers around the world since the first genome, Haemophilus influenzae, was sequenced in 1995. The deluge of microbial genome sequence data demands a new, highly automated data flow system so that genome researchers can manage and analyze their own bulky sequence data from the low level to the high level. To this end, we developed an automatic data management system for microbial genome projects, which consists mainly of a local database, analysis programs, and a user-friendly interface. We designed and implemented the local database for large-scale sequencing projects; it makes systematic and consistent data management and retrieval possible and is tightly coupled with the analysis programs and a web-based user interface. That is, the results of the analysis programs can be parsed and stored in the local database, and users can retrieve data at any stage of data processing through the web-based graphical user interface. The analysis programs in our system cover contig assembly, homology search, and ORF prediction, which are essential in genome projects. All but the contig assembly program are in the public domain. These programs are connected with each other by a number of utility programs. As a result, this system will maximize cost and time efficiency in genome research.


The Design and Implementation of Validation RDF Authoring Tool (Validation RDF 저작 툴의 설계 및 구현)

  • Cho, Sung-Hoon;Lee, Moo-Hoon;Son, Duk-Ju;Cho, Hyun-Kyu;Song, Byung-Ryul;Lee, Chan-Sub;Choi, Eui-In
    • Proceedings of the Korean Information Science Society Conference / 2002.10e / pp.208-210 / 2002
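  • As use of the Web has become widespread, the number of resources on the Web has also increased rapidly. In this situation, RDF (Resource Description Framework), which can describe Web resources efficiently, has opened the way to managing them systematically. However, current authoring environments for RDF documents amount to low-level text editing and do not support parsing or validation checking. This paper therefore designs and implements an RDF authoring tool that improves the authoring environment, efficiently performs parsing and validation checking of RDF and XML documents, and can generate Web resources described in RDF in n-triple format, so that validated Web resources can be created and processed efficiently.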
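
As a rough illustration of the n-triple output mentioned in the abstract above, the sketch below writes one RDF statement as an N-Triples line. It is not the tool described in the paper. The helper names `escape_literal` and `to_ntriples_line` and the example IRIs are hypothetical, and the sketch assumes already-valid IRIs and plain string literals (no blank nodes, language tags, or datatypes).

```python
def escape_literal(text: str) -> str:
    """Escape the characters that may not appear raw inside an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")  # backslash first, so later escapes are not doubled
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples_line(subject_iri: str, predicate_iri: str, obj: str,
                     obj_is_literal: bool = False) -> str:
    """Format one (subject, predicate, object) statement as a single N-Triples line."""
    if obj_is_literal:
        obj_term = '"' + escape_literal(obj) + '"'
    else:
        obj_term = "<" + obj + ">"
    return "<" + subject_iri + "> <" + predicate_iri + "> " + obj_term + " ."


# Hypothetical example resource and property IRIs.
print(to_ntriples_line("http://example.org/doc",
                       "http://example.org/title",
                       "RDF authoring", obj_is_literal=True))
```

Each statement occupies exactly one line ending in ` .`, which is why the format is convenient as an output target once a document has been parsed and validated.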


Zero-anaphora resolution in Korean based on deep language representation model: BERT

  • Kim, Youngtae;Ra, Dongyul;Lim, Soojong
    • ETRI Journal / v.43 no.2 / pp.299-312 / 2021
  • High performance in zero anaphora resolution (ZAR) is necessary for fully understanding texts in Korean, Japanese, Chinese, and various other languages. Deep-learning-based models are being employed to build ZAR systems, owing to the success of deep learning in recent years. However, the goal of building a high-quality ZAR system is still far from being achieved, even with these models. To enhance current ZAR techniques, we fine-tuned a pretrained bidirectional encoder representations from transformers (BERT) model. BERT is a general language representation model that enables systems to use deep bidirectional contextual information in natural language text. It makes extensive use of the attention mechanism of the Transformer sequence-transduction model. In our model, classification is performed simultaneously for all the words in the input word sequence to decide whether each word can be an antecedent. We seek end-to-end learning by disallowing any use of hand-crafted or dependency-parsing features. Experimental results show that, compared with other models, our approach can significantly improve ZAR performance.

A Study on the Application of Knowledge-based Service in Procurement Engineering (구매엔지니어링을 위한 지식기반 서비스 적용 방안에 관한 연구)

  • Kim, Jinil;Cha, Jaemin;Shin, Joonguk;Yeum, Choongseup
    • Journal of the Korean Society of Systems Engineering / v.14 no.2 / pp.67-72 / 2018
  • In a plant EPC (Engineering, Procurement and Construction) project, procurement engineering has a profound effect on the profitability of the project. Procurement specifications must be well written for procurement engineering to work properly. Until now, procurement specifications have been written from the experience of the person in charge, because there was no system to support procurement engineering. To address this situation, we are developing a procurement engineering management support system (PeMSS). This paper describes how to implement a knowledge-based service in the procurement engineering management support system. First, we briefly introduce the PeMSS, the fields where the knowledge base is applied, and how to apply it. The parts that require a knowledge-based service are parsing the requirements in PDF (Portable Document Format) files and managing the documents provided by equipment suppliers.

Deep Window Detection in Street Scenes

  • Ma, Wenguang;Ma, Wei
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.2 / pp.855-870 / 2020
  • Windows are key components of building facades. Detecting windows, which is crucial to 3D semantic reconstruction and scene parsing, is a challenging task in computer vision. Early methods tried to solve window detection with hand-crafted features and traditional classifiers. However, these methods cannot handle the diversity of window instances in real scenes and suffer from heavy computational costs. Recently, object detection algorithms based on convolutional neural networks have attracted much attention due to their good performance. Unfortunately, directly training them for challenging window detection does not achieve satisfactory results. In this paper, we propose an approach for window detection. It involves an improved Faster R-CNN architecture for window detection, featuring a window region proposal network, RoI feature fusion, and a context enhancement module. In addition, a post-optimization process based on the regular distribution of windows is designed to refine the detection results obtained by the improved deep architecture. Furthermore, we present a newly collected dataset, which is the largest one for window detection in real street scenes to date. Experimental results on both existing datasets and the new dataset show that the proposed method has outstanding performance.