• Title/Summary/Keyword: 자연어 (natural language)

Collision Cause-Providing Ratio Prediction Model Using Natural Language Processing Analytics (자연어 처리 기법을 활용한 충돌사고 원인 제공 비율 예측 모델 개발)

  • Ik-Hyun Youn;Hyeinn Park;Chang-Hee Lee
    • Journal of the Korean Society of Marine Environment & Safety / v.30 no.1 / pp.82-88 / 2024
  • As the modern maritime industry advances rapidly through technological change, data processing technology is emphasized as a key driver of that development. Natural language processing is a technology that enables machines to understand and process human language. Using this methodology, we aim to develop a model that analyzes the rulings of the Marine Safety Tribunal, learns the cause-providing ratios of previously adjudicated ship collisions, and predicts those ratios when a new written judgment is entered. The model calculated the cause-providing ratios of an accident using the navigation applied at the time of the accident and the weights of key keywords that affect the cause-providing ratios. In this way, the accuracy of the developed model could be analyzed and its practical applicability reviewed, and the model could be used to prevent the recurrence of collisions and to resolve disputes between the parties involved in marine accidents.

A Collaborative Framework for Discovering the Organizational Structure of Social Networks Using NER Based on NLP (NLP기반 NER을 이용해 소셜 네트워크의 조직 구조 탐색을 위한 협력 프레임 워크)

  • Elijorde, Frank I.;Yang, Hyun-Ho;Lee, Jae-Wan
    • Journal of Internet Computing and Services / v.13 no.2 / pp.99-108 / 2012
  • Many methods have been developed to improve the accuracy of extracting information from vast amounts of data. This paper combines several natural language processing methods, such as NER (named entity recognition), sentence extraction, and part-of-speech tagging, to carry out text analysis. The data source comprises texts obtained from the web using a domain-specific data extraction agent. A framework for extracting information from unstructured data was developed using these natural language processing methods. We simulated the performance of our work in extracting and analyzing texts to detect organizational structures. The simulation shows that our approach outperformed other NER classifiers such as MUC and CoNLL on information extraction.

Biomarker Detection of Specific Disease using Word Embedding (단어 표현에 기반한 연관 바이오마커 발굴)

  • Youn, Young-Shin;Kim, Yu-Seop
    • 한국어정보학회:학술대회논문집 / 2016.10a / pp.317-320 / 2016
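  • One important step in a machine-learning-based natural language processing module is representing words as the module's input. In contrast to the one-hot form, whose vectors are large and carry no notion of similarity between words, word representations (word embeddings) encode words as vectors that express similarity. They have drawn much attention because they improve the performance of machine learning models for natural language processing tasks and have yielded gains in several natural language processing fields. In this paper, we use Word2Vec, CCA, and GloVe to examine how well each word representation model classifies the categories of a corpus built from the abstracts of 106,552 PubMed biomedical papers. The fine-grained categories are disease names, disease symptoms, and ovarian cancer markers. To examine classification ability, we map the word representations to two dimensions with t-SNE and visualize them. Using cosine similarity on the two-dimensional mappings, we compute the similarity between diseases and biomarkers. For the 20 pairs with the highest similarity, we search Google Scholar for related papers to check whether they are actually being studied, and score the search results. The experiments show that research on 85% of the top 20 pairs is in fact proceeding toward identifying the relationship between the disease and the biomarker, whereas the remaining 15% of pairs have received little substantive research.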
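
The similarity-ranking step described in this abstract can be illustrated with a minimal sketch. The coordinates, the names `disease_a`, `marker_x`, and so on, and the helper functions are hypothetical placeholders, not data or code from the paper; the sketch only shows how cosine similarity can rank disease and biomarker pairs once each word has a low-dimensional vector.

```python
import math
from itertools import product


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for zero vectors (an assumption of this sketch)
    return dot / (norm_u * norm_v)


def top_pairs(disease_vecs, marker_vecs, k=20):
    """Rank every (disease, marker) pair by cosine similarity, highest first."""
    scored = [
        (cosine_similarity(dv, mv), d, m)
        for (d, dv), (m, mv) in product(disease_vecs.items(), marker_vecs.items())
    ]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [(d, m, s) for s, d, m in scored[:k]]


# Hypothetical 2-D coordinates standing in for t-SNE output; not data from the paper.
diseases = {"disease_a": (1.0, 0.0), "disease_b": (0.0, 1.0)}
markers = {"marker_x": (2.0, 0.0), "marker_y": (1.0, 1.0)}

ranked = top_pairs(diseases, markers, k=3)
for disease, marker, similarity in ranked:
    print(f"{disease} / {marker}: {similarity:.3f}")
```

In the study itself the top 20 pairs were then checked manually against Google Scholar; that verification step is not modeled here.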

Knowledge Based Question Answering System Using Fuzzy Logic (지식 기반형 fuzzy 질의 응답 시스템)

  • 이현주;오경환
    • Korean Journal of Cognitive Science / v.2 no.2 / pp.309-339 / 1990
  • The most common way that people communicate is by speaking or writing natural languages. But to use computers with current technology, people must learn artificial programming languages. If computers could understand what people mean when they speak or type in natural language, people could use computers more easily and naturally. But there is a problem: the language people use is vague. For example, a conventional computer system cannot handle subjective notions such as 'tall' or 'young', so people must specify an exact threshold such as 'more than 25 years of age'. We have developed a knowledge-based natural language question answering system that uses a blackboard model to handle sentences containing fuzzy concepts. The goal of this research is to develop a portable question answering system that serves as an interface to database systems or understanding systems.

Automatic Correction of Errors in Annotated Corpus Using Kernel Ripple-Down Rules (커널 Ripple-Down Rule을 이용한 태깅 말뭉치 오류 자동 수정)

  • Park, Tae-Ho;Cha, Jeong-Won
    • Journal of KIISE / v.43 no.6 / pp.636-644 / 2016
  • An annotated corpus is important for understanding natural language with machine learning methods. In this paper, we propose a new method for automating error reduction in annotated corpora. We use Ripple-Down Rules (RDR) to reduce errors and a kernel to extend RDR for NLP. We applied our system to Korean Wikipedia and blog corpora to find the types of annotation errors. Experimental results from various perspectives on the Korean Wikipedia and blog corpora are reported to evaluate the effectiveness and efficiency of the proposed approach. The proposed approach can be used to reduce errors in large corpora.

Efficient Classification of User's Natural Language Question Types using Word Semantic Information (단어 의미 정보를 활용하는 이용자 자연어 질의 유형의 효율적 분류)

  • Yoon, Sung-Hee;Paek, Seon-Uck
    • Journal of the Korean Society for Information Management / v.21 no.4 s.54 / pp.251-263 / 2004
  • In a question-answering system, the question analysis module finds the question points in the user's natural language questions, classifies the question types, and extracts useful information for the answer. This paper proposes a technique for classifying question types based on focus words extracted from questions and on word semantic information, instead of complicated rules or huge knowledge resources. It also shows how to find the question type when there are no focus words, and how useful synonym and postfix information is for enhancing the performance of the classification module.

Natural Language Interface for Composite Web Services (복합 웹 서비스를 위한 자연어 인터페이스)

  • Lim, Jong-Hyun;Lee, Kyong-Ho
    • Journal of KIISE:Computing Practices and Letters / v.16 no.2 / pp.144-156 / 2010
  • With the wide spread of Web services in various fields, there is growing interest in building composite Web services. However, it is very difficult for ordinary users to specify how to compose services, so a convenient interface for generating and invoking composite Web services is required. This paper proposes a natural language interface for invoking services. The proposed interface provides a way to describe users' requests for composite Web services in natural language. A user with no technical knowledge of Web services can describe requests for composite Web services through the proposed interface. The proposed method extracts a complex workflow from the requests and finds appropriate Web services. Experimental results show that the proposed method extracts a sophisticated workflow from complex sentences with many phrases and control constructs.

Discriminator of Similar Documents Using Syntactic and Semantic Analysis (구문의미분석를 이용한 유사문서 판별기)

  • Kang, Won-Seog;Hwang, Do-Sam;Kim, Jung H.
    • The Journal of the Korea Contents Association / v.14 no.3 / pp.40-51 / 2014
  • Owing to the importance of document copyright, the need to detect document duplication and plagiarism is increasing. Many studies have sought to meet this need, but duplicate detection remains difficult because of technological limitations in natural language processing. This thesis designs and implements a discriminator of similar documents based on natural language processing techniques. The system discriminates similar documents using morphological analysis, syntactic analysis, and weights on low-frequency terms and idioms. To evaluate the system, we analyze the correlation between human discrimination and term-based discrimination, and between human discrimination and the proposed discrimination. This analysis shows that the proposed discrimination needs improvement. Future research should define document types and improve the processing technique appropriate for each type.

A Study on the Natural Language Generation by Machine Translation (영한 기계번역의 자연어 생성 연구)

  • Hong Sung-Ryong
    • Journal of Digital Contents Society / v.6 no.1 / pp.89-94 / 2005
  • In machine translation, the goal of natural language generation is to produce a target sentence that conveys the meaning of the source sentence, using the parse tree of the source sentence and target expressions. It provides the generator with linguistic structures, word mappings, parts of speech, and lexical information. The purpose of this study is to investigate the characteristics of Korean that could be used to establish algorithms for speech recognition and speech synthesis. This is part of realizing a plan for automatic machine translation. The MT process is divided into morphemic analysis, semantic analysis, and syntactic construction.

Experiments on Pseudo Relevance Feedback in Probabilistic Information Retrieval Model (확률적 정보 검색 모델에서의 유사 적합성 피드백 실험)

  • Cho, Bong-Hyun;Lee, Chang-Kee;An, Joo-Hui;Lee, Gary Geun-Bae
    • Annual Conference on Human and Language Technology / 2001.10d / pp.183-190 / 2001
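  • This paper shows how much various pseudo relevance feedback methods can improve the performance of a retrieval system, using the probabilistic natural language retrieval system POSNIR/E, and proposes a way of performing pseudo relevance feedback suited to probabilistic information retrieval systems. POSNIR/E is an English natural language retrieval system built on POSNIR, a Korean natural language retrieval system. As a query expansion method for improving performance, the system uses pseudo relevance feedback at the retrieval stage. At this stage, query terms are extracted from the user query, which has been tagged by an English tagger, and an initial retrieval is performed. For pseudo relevance feedback, the keywords appearing in the top 5 documents of the initial retrieval results are sorted in descending order of importance, and the top 10 keywords are added to the initial query terms. The final retrieval is performed with the expanded query. Retrieval performance was evaluated with several TSV functions on the TREC evaluation test collection WT10g and the TREC-9 query relevance judgments. The experiments show that, when pseudo relevance feedback is used, applying not only the CF component of the probabilistic model but also components such as TF to the TSV function can contribute to improved performance. They also show that performance improves when compound terms as well as single terms are used as index terms and query terms.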
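
The feedback loop described in this abstract (initial retrieval, keyword selection from the top-ranked documents, query expansion, final retrieval) can be sketched as follows. This is a minimal illustration under stated assumptions, not POSNIR/E itself: the toy corpus, the `score` function (raw term overlap), and the use of plain term frequency as the keyword weight are stand-ins for the paper's probabilistic model and TSV functions.

```python
from collections import Counter


def score(query_terms, doc_terms):
    """Toy relevance score: how often the query terms occur in the document."""
    counts = Counter(doc_terms)
    return sum(counts[t] for t in set(query_terms))


def retrieve(query_terms, docs):
    """Return document ids sorted by descending toy score."""
    return sorted(docs, key=lambda d: score(query_terms, docs[d]), reverse=True)


def expand_query(query_terms, docs, n_docs=5, n_terms=10):
    """Pseudo relevance feedback: treat the top n_docs as relevant and
    append their n_terms highest-weighted new terms to the query."""
    top_docs = retrieve(query_terms, docs)[:n_docs]
    weights = Counter()
    for d in top_docs:
        # Term frequency is an assumed stand-in for a TSV function.
        weights.update(t for t in docs[d] if t not in query_terms)
    # Sort by weight (descending), then alphabetically so ties are deterministic.
    ordered = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return list(query_terms) + [t for t, _ in ordered[:n_terms]]


# Hypothetical toy corpus; the paper evaluated on TREC WT10g.
docs = {
    "d1": ["ship", "collision", "ship", "radar"],
    "d2": ["ship", "radar", "harbor"],
    "d3": ["stock", "market"],
}
query = ["ship"]

expanded = expand_query(query, docs, n_docs=2, n_terms=2)
final_ranking = retrieve(expanded, docs)
print(expanded)
print(final_ranking)
```

The defaults `n_docs=5` and `n_terms=10` mirror the top 5 documents and top 10 keywords mentioned in the abstract; the usage lines pass smaller values only because the toy corpus is tiny.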
