• Title/Summary/Keyword: 온톨로지 추출 (ontology extraction)

A study on the nation images of the big three exporting countries in East Asia shown in Wikipedia English-Edition (영어 위키피디아 페이지뷰를 통한 한중일 국가 인지도 비교)

  • Lee, Youngwhan;Chun, Heuiju;Sawng, Youngwha
    • Journal of the Korean Data and Information Science Society
    • /
    • v.26 no.5
    • /
    • pp.1071-1085
    • /
    • 2015
  • The researchers sought a way to extract a near-real-time online nation image from social media. Drawing on previous studies of nation images and on the categories defined in Wikipedia, they constructed an ontology that reflects the characteristics of a nation image. Separately, they compared data sets from various social media and selected the page views of the English edition of Wikipedia. The ontology was applied to the most recent six years of data for the three big exporting countries of East Asia: China, Japan, and Korea. To compare the nation images, correspondence analysis was used to show the images in the areas of politics, society, culture, and economy. The extracted nation images proved to be reasonable representations of the three countries. The researchers checked them against a few known government policies and confirmed that the method could help government officials set foreign policies that boost the nation's exports and could serve as a key performance index for those policies.

Development of Information Extraction System from Multi Source Unstructured Documents for Knowledge Base Expansion (지식베이스 확장을 위한 멀티소스 비정형 문서에서의 정보 추출 시스템의 개발)

  • Choi, Hyunseung;Kim, Mintae;Kim, Wooju;Shin, Dongwook;Lee, Yong Hun
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.4
    • /
    • pp.111-136
    • /
    • 2018
  • In this paper, we propose a methodology for extracting answer information for queries from various types of unstructured documents collected from multiple sources on the web, in order to expand a knowledge base. The proposed methodology consists of the following steps. 1) Collect relevant documents from Wikipedia, Naver encyclopedia, and Naver news for queries separated into "subject-predicate" form, and classify the proper documents. 2) Determine whether each sentence is suitable for information extraction and derive a confidence. 3) Based on the predicate feature, extract the information from the proper sentence and derive the overall confidence of the extraction result. To evaluate the performance of the information extraction system, we selected 400 queries from SK Telecom's artificial intelligence speaker. The proposed system showed higher performance than the baseline model. The contribution of this study is a sequence tagging model based on bi-directional LSTM-CRF that uses the predicate feature of the query; with it we developed a robust model that maintains high recall even on various types of unstructured documents collected from multiple sources. Information extraction for knowledge base expansion must take into account the heterogeneous characteristics of source-specific document types. Compared with the baseline model, the proposed methodology extracted information effectively from various types of unstructured documents. Previous research had the limitation that performance was poor when extracting information from document types different from the training data.
In addition, by predicting the suitability of documents and sentences before the extraction step, this study prevents unnecessary extraction attempts on documents that do not contain the answer information. It is meaningful that we provide a method that maintains precision even in an actual web environment. Information extraction for knowledge base expansion targets unstructured documents on the real web, so there is no guarantee that a given document contains the correct answer. When question answering is performed on the real web, previous machine reading comprehension studies have the limitation of low precision because they frequently attempt to extract an answer even from documents that contain no correct answer. The policy of predicting the suitability of documents and sentences for information extraction is meaningful in that it helps maintain extraction performance even in a real web environment. The limitations of this study and future research directions are as follows. The first is a problem of data preprocessing. In this study, the unit of knowledge extraction is determined through morphological analysis based on the open source KoNLPy Python package, and the extraction result can be wrong when the morphological analysis is not performed properly. To improve the extraction results, a more advanced morpheme analyzer needs to be developed. The second is a problem of entity ambiguity. The information extraction system of this study cannot distinguish identical names that refer to different entities. If several people with the same name appear in the news, the system may not extract information about the person intended by the query.
Future research needs to take measures to distinguish people who share the same name. The third is a problem of evaluation query data. In this study, we selected 400 user queries collected from SK Telecom's interactive artificial intelligence speaker to evaluate the performance of the information extraction system. We developed an evaluation data set of 2,800 documents (400 questions * 7 articles per question: 1 Wikipedia, 3 Naver encyclopedia, 3 Naver news) by judging whether each document contains a correct answer. To ensure the external validity of the study, it is desirable to use more queries to measure the performance of the system, but this is a costly activity that must be done manually. Future research needs to evaluate the system on more queries. It is also necessary to develop a Korean benchmark data set for information extraction systems that answer queries from multi-source web documents, so that results can be evaluated more objectively.
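The suitability checks described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' system: the function `extract_with_gating`, the two thresholds, and the three scorer callables are hypothetical, and combining the confidences by multiplication is an assumption, since the abstract does not say how the overall confidence is derived.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Extraction:
    answer: str
    confidence: float


def extract_with_gating(
    documents: List[List[str]],                      # each document is a list of sentences
    doc_score: Callable[[List[str]], float],         # hypothetical document suitability model
    sent_score: Callable[[str], float],              # hypothetical sentence suitability model
    tagger: Callable[[str], Optional[Tuple[str, float]]],  # stand-in for the sequence tagger
    doc_threshold: float = 0.5,                      # assumed cut-off values
    sent_threshold: float = 0.5,
) -> Optional[Extraction]:
    """Return the highest-confidence answer, or None if nothing passes the gates."""
    best: Optional[Extraction] = None
    for sentences in documents:
        d = doc_score(sentences)
        if d < doc_threshold:
            continue  # skip documents unlikely to contain the answer
        for sentence in sentences:
            s = sent_score(sentence)
            if s < sent_threshold:
                continue  # skip sentences unsuitable for extraction
            tagged = tagger(sentence)
            if tagged is None:
                continue
            answer, tag_conf = tagged
            overall = d * s * tag_conf  # assumption: product of the three confidences
            if best is None or overall > best.confidence:
                best = Extraction(answer, overall)
    return best
```

Returning `None` when no document or sentence passes the gates is what keeps the system from forcing an answer out of a document that does not contain one.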

SPARQL-DL Processor to Extract OWL Ontologies from Relational Databases (관계형 데이터베이스로부터 OWL 온톨로지를 추출하기 위한 SPARQL-DL 프로세서)

  • Choi, Ji-Woong;Kim, Myung-Ho
    • Journal of the Korea Society of Computer and Information
    • /
    • v.20 no.3
    • /
    • pp.29-45
    • /
    • 2015
  • This paper proposes an implementation of SPARQL-DL, a query language for OWL ontologies, for query answering over OWL ontologies that are virtually generated from existing RDBs. The proposed SPARQL-DL processor internally translates input SPARQL-DL queries into SQL queries and then executes the translated queries. This query processing method has two advantages. First, no separate repository is required to store the OWL ontologies generated from RDBs. Second, a large ABox generated from an RDB instance can be served without tableau-algorithm-based reasoners, which have difficulty reasoning over large ABoxes. Our query rewriting algorithm is designed to create one SQL query from each input SPARQL-DL query, to minimize the overhead of establishing connections with RDBs.
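A toy illustration of rewriting one query into one SQL statement might look like the sketch below. It is not the paper's algorithm: the mapping tables `CLASS_TO_TABLE` and `PROPERTY_TO_COLUMN`, the `employee` schema, and the `translate` function are hypothetical, and the sketch only handles `Type` and `PropertyValue` atoms that share one subject and resolve to a single table.

```python
import sqlite3

# Hypothetical mapping: an OWL class is backed by (table, key column),
# and a datatype property by (table, value column).
CLASS_TO_TABLE = {"Employee": ("employee", "id")}
PROPERTY_TO_COLUMN = {"hasName": ("employee", "name")}


def translate(atoms, select_vars):
    """Rewrite a list of query atoms into exactly one SQL SELECT statement."""
    table = None
    var_to_col = {}
    for atom in atoms:
        kind = atom[0]
        if kind == "Type":                 # ("Type", "?x", "Employee")
            _, var, cls = atom
            atom_table, col = CLASS_TO_TABLE[cls]
        elif kind == "PropertyValue":      # ("PropertyValue", "?x", "hasName", "?n")
            _, _subject, prop, var = atom  # the subject is assumed to be the shared one
            atom_table, col = PROPERTY_TO_COLUMN[prop]
        else:
            raise ValueError(f"unsupported atom: {kind}")
        if table not in (None, atom_table):
            raise ValueError("this sketch handles a single table only")
        table = atom_table
        var_to_col[var] = col
    # Identifiers come from the fixed mapping above, never from user input.
    columns = ", ".join(f"{var_to_col[v]} AS {v.lstrip('?')}" for v in select_vars)
    return f"SELECT {columns} FROM {table}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?)", [(1, "Kim"), (2, "Lee")])

sql = translate(
    [("Type", "?x", "Employee"), ("PropertyValue", "?x", "hasName", "?n")],
    ["?x", "?n"],
)
rows = conn.execute(sql).fetchall()
```

Because the whole query becomes a single statement, the database is contacted once per query, which is the overhead the abstract aims to minimize.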

Effective Indexing for Evolving Data Collection by Using Ontology (온톨로지를 이용한 변화하는 데이터의 효과적인 인덱싱 방법)

  • Kim, Jong Wook;Bae, Myung Soo
    • Journal of Korea Multimedia Society
    • /
    • v.17 no.2
    • /
    • pp.240-247
    • /
    • 2014
  • Data created and shared on the Web is characterized by a massive amount of user-generated content across various applications and by content that evolves dynamically with user interests. Thus, in order to benefit from Web data, it is essential to provide (a) mechanisms that enable scalable processing of large data collections and (b) organization schemes that reduce the navigational overhead within complex and dynamically growing content. Of these two pressing needs, this paper addresses the second by developing an indexing scheme that leverages ontologies to reduce the time and effort needed to access the relevant piece of information. In particular, considering the evolving nature of Web content, the proposed technique computes, from a large existing ontology, the sub-ontology that best matches a given data collection. Case studies show that the proposed indexing scheme indeed helps organize dynamically evolving content.
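One simple way to picture a sub-ontology that matches a data collection is sketched below. It is a deliberately simplified stand-in, not the paper's technique: the abstract does not describe how the best match is computed, so this version just keeps every concept that occurs in the collection plus its ancestors. The `parent_of` representation and the function name `sub_ontology` are assumptions.

```python
from typing import Dict, Iterable, Optional


def sub_ontology(
    parent_of: Dict[str, Optional[str]],  # concept -> parent concept (None for the root)
    matched: Iterable[str],               # terms found in the data collection
) -> Dict[str, Optional[str]]:
    """Keep matched concepts and all of their ancestors; drop everything else."""
    keep = set()
    for concept in matched:
        if concept not in parent_of:
            continue  # term is not a concept in the ontology
        node: Optional[str] = concept
        # Walk up to the root, stopping early on an already-kept ancestor.
        while node is not None and node not in keep:
            keep.add(node)
            node = parent_of.get(node)
    return {c: parent_of[c] for c in keep}
```

When the collection changes, the function can be rerun on the new set of matched terms, so the index follows the content as it evolves.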

A Model-Driven Approach for Converting UML Model to OWL-S Ontology (UML 모델을 OWL-S 온톨로지로 변환하기 위한 모델지향접근방식)

  • Kim, Il-Woong;Lee, Kyong-Ho
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.13 no.3
    • /
    • pp.179-192
    • /
    • 2007
  • Based on ontologies, semantic Web services enable the discovery, selection, and composition of services. OWL-S is a de facto standard ontology for describing the semantics of Web services. Due to the difficulty of the OWL-S grammar, the learning curve for constructing OWL-S descriptions manually can be steep. This paper presents an efficient method for generating OWL-S descriptions from UML diagrams, which are widely used for software design and development. The proposed method is based on UML profiles and generates an OWL-S ontology from sequence or activity diagrams, which represent the behavior of a business process. Specifically, an XMI file extracted from UML diagrams is transformed into an OWL-S description via an XSLT script. Experimental results with a large volume of UML diagrams show that the proposed method handles the control flow of complex processes and is superior to previous methods.

Design of Knowledge Model of Nursing Diagnosis based on Ontology (온톨로지에 기반한 간호진단 지식모델의 설계)

  • Lee, In-Keun;Kim, Hwa-Sun;Lee, Sung-Hee
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.22 no.4
    • /
    • pp.468-475
    • /
    • 2012
  • Nurses have performed their nursing practice according to standard guidelines such as NANDA, NIC, and NOC, and have recorded information on the nursing process in EMR systems. In particular, NANDA, a nursing diagnosis taxonomy, has difficulty expressing nursing diagnoses in detail because it represents abstract concepts of nursing diagnosis. As a result, hospitals in Korea have developed and used their own lists of nursing diagnoses without referring to the international standard terminologies, which has delayed the computerization of nursing records. We therefore propose an ontology development methodology for nursing diagnosis based on NANDA and SNOMED-CT. An ontology developed systematically under this methodology from the nursing diagnosis terminologies frequently used in each hospital enables knowledge expansion and interoperable exchange of nursing records between EMR systems. We developed an ontology using 112 nursing diagnosis terms defined by extracting and refining the nursing diagnosis information recorded at Kyungpook National University Hospital. We also confirmed the content validity and usefulness of the developed ontology through expert assessment and experiment.

Semantic Video Retrieval Based On User Preference (사용자 선호도를 고려한 의미기반 비디오 검색)

  • Jung, Min-Young;Park, Sung-Han
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.46 no.4
    • /
    • pp.127-133
    • /
    • 2009
  • To ensure access to rapidly growing video collections, video indexing is becoming more and more essential. A video database should be built for fast searching and for extracting accurate features from video information with increasingly complex characteristics. Moreover, the video indexing structure should support efficient retrieval of interesting content that reflects user preferences. In this paper, we propose a semantic video retrieval method based on user preference. Unlike the proposed method, previous methods do not consider user preferences. Furthermore, conventional methods return results by simple text matching against the user's query, which does not support semantic search. To overcome these limitations, we develop a method for user preference analysis and present a method of video ontology construction for semantic retrieval. The simulation results show that the proposed algorithm performs better than previous methods in terms of semantic video retrieval based on user preferences.

Construction of the Digital Archive System from the Records of Westerners Who Stayed in Korea during the Enlightenment Period of Chosun (개화기 조선 체류 서양인 기록물의 디지털 아카이브 시스템 구축)

  • Chung, Heesun;Kim, Heesoon;Song, Hyun-Sook;Lee, Myeong-Hee
    • Journal of the Korean BIBLIA Society for library and Information Science
    • /
    • v.27 no.4
    • /
    • pp.229-249
    • /
    • 2016
  • This study was conducted to create a digital archive of local cultural content compiled from the records of westerners who stayed in Korea during the Enlightenment Period of Chosun. The information was gathered from 22 records, and 10 main subjects, 40 sub-subjects, and 239 mini-subjects were derived through the subject classification scheme. Item analysis was conducted using 38 metadata elements, and the input data types were classified and stored as a database in Excel. Finally, a web-based digital archiving system was developed for searching and providing information through various access points. Suggestions for future research were to expand the archive contents through continued discovery of westerners' records, to build an integrated information system of Korean digital archives incorporating individual archive systems, to standardize classification schemes and develop a multidimensional classification system that considers facet structure in cultural heritage areas, to keep contents consistent through standardization of the metadata format, and to build an ontology using semantic search functions and data mining functions.

Travel Information Retrieval System based on Bridge XMDR (브리지 XMDR 기반의 여행정보 검색 시스템)

  • Kim Ik-Han;Kook Yoon-Kyu;Eum Young-Hyun;Jung Kye-Dong;Choi Young-Keun
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2006.06c
    • /
    • pp.103-105
    • /
    • 2006
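  • Because enterprises today have distributed organizations, and their databases are likewise distributed according to each organization's purpose, it is difficult to support interoperability through sharing and collaboration among them; a metadata-level standard is therefore needed to link them in a consistent form. In addition, an EAI system in a collaborative trading environment can be expected to yield many benefits, such as efficient retrieval and cost reduction, by organically integrating and sharing the knowledge managed in various information systems. However, because existing systems are managed and shared for specific purposes, integration and sharing are in practice quite difficult. The XMDR presented in this paper therefore combines ontology and metadata; by introducing ontology and thesaurus concepts to express the various standards in a consistent form, it can maintain consistency in metadata sharing and information system integration for integrating data-level information. The bridge XMDR retrieval system presented in this paper consists of three layers: the raw data layer, the XMDR layer, and the bridge XMDR layer. The XMDR layer defines an XMDR composed of a standard ontology, which defines the standards and relationships for attribute representation in the distributed databases, a category classification ontology, and a location ontology, which provides site information. The bridge XMDR layer uses a hybrid integration approach that extracts shared domain attributes for sharing information between XMDRs, enabling semantic integration across business tasks.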

Livestock Telemedicine System Prediction Model for Human Healthy Life (인간의 건강한 삶을 위한 가축원격 진료 예측 모델)

  • Kang, Yun-Jeong;Lee, Kwang-Jae;Choi, Dong-Oun
    • Journal of Korea Entertainment Industry Association
    • /
    • v.13 no.8
    • /
    • pp.335-343
    • /
    • 2019
  • Healthy living is an essential element of human happiness. Quality food provides the basis for life, and the health of livestock, which provide meat and dairy products, has a direct impact on human health. In the case of calves, diarrhea is the cause of all diseases. In this paper, we use sensors to measure a calf's biometric data in order to diagnose calf diarrhea. The collected biometric data are preprocessed so that they can be used as meaningful information. We measure calf birth history and calf biometrics. The ontology is constructed by inputting environmental information on housing together with biochemistry, immunity, and measurement information of the human body for disease management. We build a knowledge base that predicts calf diarrhea through logical reasoning. Diarrhea is predicted with the knowledge base on the name, cause, timing, and symptoms of livestock diseases. These knowledge bases can be expressed as domain ontologies for the parent ontology and for prediction, and as a result, treatment and prevention methods can be suggested.