• Title/Summary/Keyword: 자동태깅 (automatic tagging)

108 search results

A Study on the Integration of Recognition Technology for Scientific Core Entities (과학기술 핵심개체 인식기술 통합에 관한 연구)

  • Choi, Yun-Soo;Jeong, Chang-Hoo;Cho, Hyun-Yang
    • Journal of the Korean Society for Information Management / v.28 no.1 / pp.89-104 / 2011
  • Large-scaled information extraction plays an important role in advanced information retrieval as well as question answering and summarization. Information extraction can be defined as a process of converting unstructured documents into formalized, tabular information, and consists of named-entity recognition, terminology extraction, coreference resolution and relation extraction. Since these elementary technologies have so far been studied independently, it is not trivial to integrate all the necessary processes of information extraction, owing to the diversity of their input/output formats and operating environments. As a result, it is difficult to process scientific documents so as to extract both named entities and technical terms at once. In order to extract these entities from scientific documents automatically and at once, we developed a framework for scientific core entity extraction which embraces all the pivotal language processors: a named-entity recognizer and a terminology extractor.
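The integration idea the abstract describes can be sketched minimally: each component emits spans in one shared (start, end, label) format so the framework can merge their output. The gazetteer and the hyphen rule below are invented stand-ins for the paper's trained recognizers:

```python
def recognize_named_entities(text):
    # Toy gazetteer-based NER (assumption: the real system uses trained models).
    gazetteer = {"Seoul": "LOCATION", "KISTI": "ORGANIZATION"}
    spans = []
    for name, label in gazetteer.items():
        idx = text.find(name)
        if idx != -1:
            spans.append((idx, idx + len(name), label))
    return spans

def extract_terminology(text):
    # Toy terminology extractor: flags hyphenated technical terms.
    spans = []
    for token in text.split():
        if "-" in token:
            idx = text.find(token)
            spans.append((idx, idx + len(token), "TERM"))
    return spans

def extract_core_entities(text):
    # Merge both recognizers' spans into one sorted list.
    spans = recognize_named_entities(text) + extract_terminology(text)
    return sorted(spans)

result = extract_core_entities("KISTI studies named-entity recognition in Seoul")
```

The point of the shared span format is that downstream consumers need not know which recognizer produced a given entity.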

Design and Application of XTML Script Language based on XML (XML을 이용한 스크립트 언어 XTML 의 설계 및 응용)

  • Jeong, Byeong-Hui;Park, Jin-U;Lee, Su-Yeon
    • Journal of KIISE: Computing Practices and Letters / v.5 no.6 / pp.816-833 / 1999
  • Output documents of existing word processors, which are centered on style information or presentation attributes, can be structured by converting them into XML (Extensible Markup Language) documents that reflect hierarchical logical structure such as title, abstract, chapter and paragraph, making them useful both for document interchange and for the Internet. The conversion requires a complicated process called auto-tagging, in which the logical elements of output documents are inferred from style information and text sequences; this differs from various kinds of simple conversion. In this paper, to express a transformation script language that can convert the flat, style- or presentation-oriented structure of various documents into the hierarchical logical structure of XML under various DTD (Document Type Definition) environments, we define XTML (XML Transformation Markup Language) as a DTD, write transformation scripts as instances of that DTD, and apply them to auto-tagging. XTML and its DTD are represented in XML syntax. In particular, to execute the transformation algorithms efficiently, that is, to handle existing XML documents effectively, XTML stores a document in a tree structure named GROVE and provides functions and a variety of command interfaces to process, store and manipulate the GROVE.
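The auto-tagging step the abstract describes, inferring logical elements from style information, can be sketched with the standard library; the style names and the style-to-tag mapping here are illustrative, not part of XTML itself:

```python
import xml.etree.ElementTree as ET

# A flat, style-oriented document as a word processor might export it.
flat = ET.fromstring(
    '<doc>'
    '<p style="Heading1">Introduction</p>'
    '<p style="Body">Some text.</p>'
    '</doc>'
)

STYLE_TO_TAG = {"Heading1": "title", "Body": "para"}  # assumed mapping

# Rebuild a hierarchical, logical XML tree from the style attributes.
logical = ET.Element("article")
for p in flat:
    tagged = ET.SubElement(logical, STYLE_TO_TAG[p.get("style")])
    tagged.text = p.text

xml_out = ET.tostring(logical, encoding="unicode")
```

A real auto-tagger must also infer nesting (chapters containing paragraphs) rather than emit a flat sequence, which is where a tree store like GROVE comes in.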

Noun Sense Disambiguation Based-on Corpus and Conceptual Information (말뭉치와 개념정보를 이용한 명사 중의성 해소 방법)

  • 이휘봉;허남원;문경희;이종혁
    • Korean Journal of Cognitive Science / v.10 no.2 / pp.1-10 / 1999
  • This paper proposes a noun sense disambiguation method based on corpus and conceptual information. Previous research has restricted the use of linguistic knowledge to the lexical level. Since knowledge extracted from a corpus is stored in the words themselves, such methods require a large amount of space for the knowledge and suffer a low recall rate. In contrast, we resolve noun sense ambiguity by using concept co-occurrence information extracted from an automatically sense-tagged corpus. In one experimental evaluation the method achieved, on average, a precision of 82.4%, an improvement over the baseline of 14.6%. Considering that the test corpus is completely disjoint from the learning corpus, this is a promising result.
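The selection step can be sketched as scoring each candidate sense by how often its concept co-occurs with the concepts of context nouns; the concepts and counts below are invented stand-ins for what would be extracted from the sense-tagged corpus:

```python
from collections import defaultdict

# Hypothetical concept co-occurrence counts learned from a tagged corpus.
cooccur = defaultdict(int)
cooccur[("financial_institution", "money")] = 12
cooccur[("river_edge", "money")] = 1
cooccur[("river_edge", "water")] = 9

def disambiguate(senses, context_concepts):
    # Pick the sense whose concept co-occurs most with the context concepts.
    def score(sense):
        return sum(cooccur[(sense, c)] for c in context_concepts)
    return max(senses, key=score)

# Disambiguating "bank" given a monetary context.
best = disambiguate(["financial_institution", "river_edge"], ["money"])
```

Storing counts over concepts rather than surface words is what lets the method generalize to nouns unseen in training, which is the paper's answer to the space and recall problems.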


Design and Implementation of Tag Clustering System for Efficient Image Retrieval in Web2.0 Environment (Web2.0 환경에서의 효율적인 이미지 검색을 위한 태그 클러스터링 시스템의 설계 및 구현)

  • Lee, Si-Hwa;Lee, Man-Hyoung;Hwang, Dae-Hoon
    • Journal of Korea Multimedia Society / v.11 no.8 / pp.1169-1178 / 2008
  • Most information in Web 2.0 is created by users and can be classified by tags, which are also created and added by users. However, as related work on automatic tagging techniques and tag-cloud construction shows, research on classifying information and resources effectively by tags still falls short of what users need. In this paper, we propose and implement a clustering system that maps the tags of resources collected from the Web to one another according to their relationships and groups the mapping results into clusters for image retrieval. In addition, we analyze our system's efficiency by comparing its image retrieval results with those returned by the Flickr website.
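The cluster-then-retrieve idea can be illustrated with a toy relatedness measure over tag co-occurrence; the image data, the choice of Jaccard similarity and the threshold are assumptions for illustration, not the paper's exact design:

```python
# Toy tag-to-image index.
images = {
    "img1": {"sea", "beach", "sunset"},
    "img2": {"sea", "beach"},
    "img3": {"city", "night"},
}

def jaccard(a, b):
    # Co-occurrence similarity of two tags over the image collection.
    inter = sum(1 for tags in images.values() if a in tags and b in tags)
    union = sum(1 for tags in images.values() if a in tags or b in tags)
    return inter / union if union else 0.0

def cluster_for(tag, threshold=0.5):
    # All tags sufficiently related to the query tag form one cluster.
    all_tags = set().union(*images.values())
    return {t for t in all_tags if jaccard(tag, t) >= threshold}

def retrieve(tag):
    # Return images matching any tag in the query tag's cluster.
    cluster = cluster_for(tag)
    return sorted(name for name, tags in images.items() if tags & cluster)

result = retrieve("sea")
```

Expanding the query through the cluster is what lets a search for one tag also surface images annotated only with related tags.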


KTARSQI: The Annotation of Temporal and Event Expressions in Korean Text (KTARSQI: 한국어 텍스트의 시간 및 사건 표현 주석)

  • Im, Seohyun;Kim, Yoon-Shin;Jo, Yoomi;Jang, Hayun;Ko, Minsoo;Nam, Seungho;Shin, Hyopil
    • Annual Conference on Human and Language Technology / 2009.10a / pp.130-135 / 2009
  • Extracting information about time and events is an important part of natural-language-processing applications such as information extraction and question-answering systems. Nevertheless, this line of research has not yet begun in earnest for Korean. Building on the results of the American TARSQI project, the KTARSQI project was launched in 2008 with the goal of developing a specification language (KTimeML), an annotated corpus (KTimeBank), and an automatic tagging system (KTarsqi Toolkit: KTTK) for the annotation, extraction, and inference of temporal and event expressions in Korean text. This paper gives an overall introduction to the goals and tasks of the KTARSQI project and, as a result of the work carried out so far, adds a discussion of the specification and annotation of the event tag.
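The automatic tagging the toolkit aims at can be illustrated by a toy tagger that wraps date strings in TIMEX3-style markup loosely following TimeML; only ISO-like dates are covered here, and a real system would also normalize relative expressions:

```python
import re

def tag_dates(text):
    # Wrap each ISO-like date in a TIMEX3 tag with a normalized value.
    def repl(m):
        return f'<TIMEX3 type="DATE" value="{m.group(0)}">{m.group(0)}</TIMEX3>'
    return re.sub(r"\d{4}-\d{2}-\d{2}", repl, text)

tagged = tag_dates("The project started on 2008-03-01.")
```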


Unsupervised Semantic Role Labeling for Korean Adverbial Case (비지도 학습을 기반으로 한 한국어 부사격의 의미역 결정)

  • Kim, Byoung-Soo;Lee, Yong-Hun;Na, Seung-Hoon;Kim, Jun-Gi;Lee, Jong-Hyeok
    • Annual Conference on Human and Language Technology / 2006.10e / pp.32-39 / 2006
  • This paper addresses semantic role labeling, the problem of mapping syntactic relations to semantic relations in Korean language processing. For Korean it is hard to obtain a large training corpus, and building one requires much time and effort. Instead of tagging a training corpus directly, we therefore build one automatically using a case-frame dictionary and apply a simple probabilistic model with a modified self-training algorithm that learns the model incrementally. Experiments show an average precision of 81.81% over four adverbial case markers, and the modified self-training method improves on the original in both performance and running time.
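A self-training loop of this general shape can be sketched as follows; the seed labels stand in for those derived from the case-frame dictionary, and the count-based model and confidence threshold are illustrative, not the paper's exact algorithm:

```python
from collections import Counter

seed = [("school", "LOC"), ("school", "LOC"), ("hammer", "INS")]
unlabeled = ["school", "hammer", "school"]

def train(examples):
    # P(role | noun) from counts, a stand-in for the paper's model.
    counts = {}
    for noun, role in examples:
        counts.setdefault(noun, Counter())[role] += 1
    return counts

def self_train(seed, unlabeled, threshold=0.6, rounds=3):
    labeled = list(seed)
    for _ in range(rounds):
        model = train(labeled)
        remaining = []
        for noun in unlabeled:
            if noun in model:
                role, freq = model[noun].most_common(1)[0]
                if freq / sum(model[noun].values()) >= threshold:
                    # Confident prediction: promote it to training data.
                    labeled.append((noun, role))
                    continue
            remaining.append(noun)
        unlabeled = remaining
    return train(labeled)

model = self_train(seed, unlabeled)
```

The threshold is what keeps low-confidence predictions from polluting the training set as the model grows.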


Coreference Resolution for Korean using Mention Pair with SVM (SVM 기반의 멘션 페어 모델을 이용한 한국어 상호참조해결)

  • Choi, Kyoung-Ho;Park, Cheon-Eum;Lee, Changki
    • KIISE Transactions on Computing Practices / v.21 no.4 / pp.333-337 / 2015
  • In this paper, we present a coreference resolution system for Korean that uses a mention-pair model with an SVM. The system can also extract mentions from documents that include automatically tagged named-entity information, dependency trees, and POS tags. For training and evaluation we built a corpus of 214 documents with coreference tags, drawn from online news and Wikipedia: 14 documents from online news and 200 question-and-answer documents from Wikipedia. On this corpus the system achieved 55.68% MUC-F1, 57.19% B-cubed-F1, and 61.75% CEAF-E-F1.
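The mention-pair formulation can be sketched as one feature vector per pair of mentions, scored by a binary classifier; the features are typical of such systems, and the fixed linear weights below are illustrative stand-ins for what a trained SVM would learn:

```python
def features(m1, m2):
    # Typical mention-pair features: string match, head match, distance.
    return [
        1.0 if m1["text"].lower() == m2["text"].lower() else 0.0,
        1.0 if m1["head"] == m2["head"] else 0.0,
        abs(m1["sent"] - m2["sent"]),
    ]

# Stand-in for learned SVM weights and bias (invented values).
WEIGHTS, BIAS = [2.0, 1.5, -0.5], -1.0

def coreferent(m1, m2):
    # Linear decision over the pair's feature vector.
    score = sum(w * f for w, f in zip(WEIGHTS, features(m1, m2))) + BIAS
    return score > 0

m1 = {"text": "Obama", "head": "Obama", "sent": 0}
m2 = {"text": "obama", "head": "Obama", "sent": 2}
link = coreferent(m1, m2)
```

A full resolver then groups pairwise links into entity chains, which is what MUC, B-cubed and CEAF-E score.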

Glomerular Detection for Diagnosis of Lupus Nephritis using Deep Learning (딥러닝을 활용한 루푸스 신염 진단을 위한 생검 조직 내 사구체 검출)

  • Jung, Jehyun;Ha, Sukmin;Lim, Jongwoo;Kim, Hyunsung;Park, Hosub;Myung, Jaekyung
    • Proceedings of the Korean Society of Computer Information Conference / 2022.07a / pp.85-87 / 2022
  • To diagnose lupus nephritis accurately, glomeruli must be found through histological examination of needle-biopsy tissue from the kidney and the degree of inflammation of each must be classified, which costs clinicians much time and effort. To overcome this limitation, we automatically detect glomeruli in specimen images using the YOLOv5 algorithm, which applies a deep-learning approach based on convolutional neural networks (CNN) to detection and segmentation. Slide images from lupus nephritis patients were tagged to create training and test data. As a result, most glomeruli in high-resolution specimen images could be detected with precision and recall above 0.9. This automates glomerulus detection within the kidney and lays the groundwork for grading glomerular inflammation in future work.
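Detection precision and recall of the kind reported above are computed by matching predicted boxes to ground-truth boxes by intersection-over-union (IoU); a minimal sketch, with the threshold and boxes invented for illustration:

```python
def iou(a, b):
    # Boxes are (x1, y1, x2, y2); returns intersection-over-union.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def precision_recall(preds, gts, thr=0.5):
    # Greedily match each prediction to an unmatched ground-truth box.
    matched = set()
    tp = 0
    for p in preds:
        for i, g in enumerate(gts):
            if i not in matched and iou(p, g) >= thr:
                matched.add(i)
                tp += 1
                break
    return tp / len(preds), tp / len(gts)

p, r = precision_recall([(0, 0, 10, 10), (20, 20, 30, 30)],
                        [(1, 1, 10, 10), (50, 50, 60, 60)])
```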


PPEditor: Semi-Automatic Annotation Tool for Korean Dependency Structure (PPEditor: 한국어 의존구조 부착을 위한 반자동 말뭉치 구축 도구)

  • Kim, Jae-Hoon;Park, Eun-Jin
    • The KIPS Transactions: Part B / v.13B no.1 s.104 / pp.63-70 / 2006
  • In general, a corpus contains much linguistic information and is widely used in the fields of natural language processing and computational linguistics. The creation of such a corpus, however, is expensive, labor-intensive and time-consuming. To alleviate this problem, annotation tools for building corpora rich in linguistic information are indispensable. In this paper, we design and implement an annotation tool for establishing a Korean dependency tree-tagged corpus. The ideal would be to create the corpus fully automatically, without annotators' intervention, but in practice this is impossible. The proposed tool is therefore semi-automatic, like most other annotation tools, and is designed for editing the errors generated by basic analyzers such as a part-of-speech tagger and a (partial) parser. It is also designed to avoid repetitive work during error editing and to be easy and convenient to use. Using the proposed annotation tool, 10,000 Korean sentences, each containing over 20 words, were annotated with dependency structures; eight annotators worked 4 hours a day for 2 months. We are confident that this yields accurate and consistent annotations as well as reduced labor and time.

A Study on the Integration of Information Extraction Technology for Detecting Scientific Core Entities based on Large Resources (대용량 자원 기반 과학기술 핵심개체 탐지를 위한 정보추출기술 통합에 관한 연구)

  • Choi, Yun-Soo;Cheong, Chang-Hoo;Choi, Sung-Pil;You, Beom-Jong;Kim, Jae-Hoon
    • Journal of Information Management / v.40 no.4 / pp.1-22 / 2009
  • Large-scaled information extraction plays an important role in advanced information retrieval as well as question answering and summarization. Information extraction can be defined as a process of converting unstructured documents into formalized, tabular information, and consists of named-entity recognition, terminology extraction, coreference resolution and relation extraction. Since these elementary technologies have so far been studied independently, it is not trivial to integrate all the necessary processes of information extraction, owing to the diversity of their input/output formats and operating environments. As a result, it is difficult to process scientific documents so as to extract both named entities and technical terms at once. In this study, we define scientific core entities as a set of 10 types of named entities and technical terminologies in a biomedical domain. In order to extract these entities from scientific documents automatically and at once, we develop a framework for scientific core entity extraction which embraces all the pivotal language processors: a named-entity recognizer, a coreference resolver and a terminology extractor. Each module of the integrated system has been evaluated on various corpora as well as KEEC 2009. The system will be utilized in various information-service areas such as information retrieval, question answering (Q&A), document indexing, dictionary construction, and so on.