• Title/Abstract/Keywords: domain ontology model

Search results: 78 items; processing time: 0.024 seconds

Improving Bidirectional LSTM-CRF model Of Sequence Tagging by using Ontology knowledge based feature

  • 진승희;장희원;김우주
    • 지능정보연구 / Vol. 24, No. 1 / pp. 253-266 / 2018
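  • This study proposes a new method that applies sequence tagging to improve the performance of named entity recognition (NER) in question answering (QA) systems. To extract an answer stored in a database from a user's query, the human-language query must be converted into a database language, such as Structured Query Language (SQL), that a computer can understand; NER is the step that identifies the classes and data names in the database that appear in the user's query. The conventional approach, which recognizes named entities by searching the database for the words contained in the query, cannot use context to disambiguate homonyms or multi-word phrases. When multiple search results exist, it returns all of them, so a query can have several interpretations and the time complexity of the computation grows. To overcome these drawbacks, this study uses a neural-network-based method that reflects the contextual meaning of the query. It also aims to identify, through context, words that were not seen during training, which is a known weakness of neural-network-based methods. Introducing the Bidirectional LSTM-CRF model, a state-of-the-art technique in sequence tagging, addresses the shortcomings of neural network models, and for words not seen during training, ontology-based features are used to perform context-aware inference. Experiments were conducted on an ontology knowledge base in the music domain, and performance was evaluated. To evaluate the proposed method, L-Bidirectional LSTM-CRF, accurately, the evaluation used queries containing not only words included in training but also words not included in training. The results confirmed that the L-Bidirectional LSTM-CRF model can recognize named entities in queries containing unseen words without being retrained, and that overall NER performance improved.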
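One way to picture an ontology-based feature is as a per-token lookup against the knowledge base, as in the minimal sketch below. The abstract does not give the actual feature design, so the lexicon `ONTOLOGY_LEXICON`, the function `ontology_features`, and the longest-match strategy are all hypothetical.

```python
from typing import Dict, List

# Hypothetical mini-lexicon for a music domain: surface form -> ontology class.
# A real system would query the ontology knowledge base instead.
ONTOLOGY_LEXICON: Dict[str, str] = {
    "beatles": "Artist",
    "yesterday": "Track",
    "abbey road": "Album",
}


def ontology_features(tokens: List[str], max_len: int = 3) -> List[str]:
    """Return one ontology-class feature per token ("O" when nothing matches).

    Uses a greedy longest-match lookup over n-grams of up to max_len tokens.
    """
    feats = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest phrase first so "abbey road" wins over "abbey".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(t.lower() for t in tokens[i:i + n])
            cls = ONTOLOGY_LEXICON.get(phrase)
            if cls is not None:
                for j in range(i, i + n):
                    feats[j] = cls
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return feats
```

Such features could be supplied to a tagger alongside word embeddings, so that a word absent from the training data still carries a class signal as long as the ontology knows it.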

Proposal of the development plan for the ROK military data strategy and shared data model through the US military case study

  • 이학래;김완주;임재성
    • 한국정보통신학회논문지 / Vol. 25, No. 6 / pp. 757-765 / 2021
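  • The US Department of Defense has recognized that the multi-domain operations included in its 2018 National Security Strategy are possible only if timely data sharing among C4I systems is achieved first, and it is working to solve this problem. In the ROK military as well, several studies have raised problems with data interoperability and standardization among C4I systems, and a new approach is needed to resolve them. This study derives solutions by analyzing the roughly 20 years of work the US Department of Defense has carried out to implement its data strategy since establishing it in 2003. It then proposes a data strategy suited to the operating environment of ROK military C4I systems, the development of a data model, the selection of standards for data sharing, and a procedure for developing shared data, with the aim of improving data-sharing capability among ROK military C4I systems.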

Expressed Sequence Tag Analysis of the Erythrocytic Stage of Plasmodium berghei

  • Seok, Ji-Woong;Lee, Yong-Seok;Moon, Eun-Kyung;Lee, Jung-Yub;Jha, Bijay Kumar;Kong, Hyun-Hee;Chung, Dong-Il;Hong, Yeon-Chul
    • Parasites, Hosts and Diseases / Vol. 49, No. 3 / pp. 221-228 / 2011
  • Rodent malaria parasites, such as Plasmodium berghei, are practical and useful model organisms for human malaria research because of their similarity to human malaria parasites in structure, physiology, and life cycle. Exploiting the available genetic sequence information, we constructed a cDNA library from the erythrocytic stages of P. berghei and analyzed its expressed sequence tags (ESTs). A total of 10,040 ESTs were generated and assembled into 2,462 clusters. These EST clusters were compared against public protein databases, and 48 putative new transcripts were identified, most of which were hypothetical proteins of unknown function. Genes encoding ribosomal or membrane proteins and purine nucleotide phosphorylases formed highly abundant clusters in P. berghei. Protein domain analyses and Gene Ontology functional categorization showed that translation/protein folding, metabolism, protein degradation, and multiple families of variant antigens were the most prevalent categories. The ESTs collected here and their bioinformatic analysis will be useful resources for identifying drug targets and vaccine candidates and for validating gene predictions in P. berghei.

A Development of Ontology-Based Law Retrieval System: Focused on Railroad R&D Projects

  • 원민재;김동희;정해민;이상근;홍준석;김우주
    • 한국전자거래학회지 / Vol. 20, No. 4 / pp. 209-225 / 2015
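  • Because R&D projects in the railroad and transportation field are closely tied to many laws and regulations, there are cases in which successfully completed R&D cannot be commercialized or put to practical use because of legal constraints. To prevent such cases, this paper presents a model of a law retrieval system that can find the laws related to R&D projects in the railroad and transportation field. When an R&D plan describing a project is entered into the system, morphological analysis is performed on the content of its summary, leaving only the nouns. Using the shared legal information service provided by the National Law Information Center (국가법령정보센터), the legal terms among those nouns are identified, and the relationships between each legal term and the laws that define it are stored in an ontology, an intelligent knowledge base. The laws stored in the ontology go through an additional index calculation developed in this study, are ranked by their degree of relevance to the R&D project, and are then presented to the user. Users can thus search for laws that may affect their R&D, consult them when deciding on a research direction before a project starts, or use them as reference material while the project is under way. Ultimately, by preventing in advance the cases in which railroad and transportation R&D projects fail or cannot be put to practical use because of laws, the technical and economic benefits expected from the invested budget can be fully realized.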

Genome-wide survey and expression analysis of F-box genes in wheat

  • Kim, Dae Yeon;Hong, Min Jeong;Seo, Yong Weon
    • 한국작물학회:학술대회논문집 / 한국작물학회 2017, 9th Asian Crop Science Association conference / pp. 141-141 / 2017
  • The ubiquitin-proteasome pathway is the major regulatory mechanism for selective protein degradation in a number of cellular processes and involves three steps: (1) ATP-dependent activation of ubiquitin by the E1 enzyme, (2) transfer of activated ubiquitin to E2, and (3) transfer of ubiquitin to the protein to be degraded by the E3 complex. F-box proteins are subunits of the SCF complex and determine its specificity for the target substrate to be degraded. F-box proteins regulate many important biological processes, such as embryogenesis, floral development, plant growth and development, biotic and abiotic stress responses, hormonal responses, and senescence. However, little is known about the F-box genes in wheat. The draft genome sequence of wheat (IWGSC Reference Sequence v1.0 assembly) was used for a genome-wide survey of the F-box gene family in wheat. The Hidden Markov Model (HMM) profiles of the F-box (PF00646), F-box-like (PF12937), F-box-like 2 (PF13013), FBA (PF04300), FBA_1 (PF07734), FBA_2 (PF07735), FBA_3 (PF08268), and FBD (PF08387) domains were downloaded from the Pfam database and searched against the IWGSC Reference Sequence v1.0 assembly. RNA-seq paired-end libraries from different stages of wheat (seedling, tillering, booting, day after flowering (DAF) 1, DAF 10, DAF 20, and DAF 30) were constructed and sequenced on an Illumina HiSeq2000 for expression analysis of F-box protein genes. Basic analyses, including HISAT, HTSeq, DESeq, gene ontology analysis, and KEGG mapping, were conducted to identify differentially expressed genes (DEGs) from the various stages and to annotate them. About 950 F-box domain proteins identified by Pfam were mapped to the wheat reference genome sequence by blastX (e-value < 0.05). Among them, more than 140 putative F-box protein genes were selected using cut-offs of fold change > 2, p-value < 0.01, and FDR < 0.01. Expression profiles of the selected F-box protein genes were shown by heatmap analysis, and the 144 putative F-box protein genes were clustered by expression pattern using average linkage and squared Euclidean distance. This work may provide valuable basic information for further investigation of the protein degradation mechanism of the ubiquitin-proteasome system involving F-box proteins during wheat developmental stages.
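The selection thresholds and the distance measure mentioned in this abstract can be illustrated with a small sketch. It assumes fold change is given as a plain ratio (the abstract does not say whether a log scale or down-regulation was also considered), and the names `GeneStat`, `select_degs`, and `squared_euclidean` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class GeneStat:
    gene: str
    fold_change: float  # assumed to be a plain ratio, not log2
    p_value: float
    fdr: float


def select_degs(stats: List[GeneStat],
                min_fold: float = 2.0,
                max_p: float = 0.01,
                max_fdr: float = 0.01) -> List[str]:
    """Keep genes with fold change > 2, p-value < 0.01, and FDR < 0.01."""
    return [s.gene for s in stats
            if s.fold_change > min_fold
            and s.p_value < max_p
            and s.fdr < max_fdr]


def squared_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two expression profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))
```

The pairwise distances produced by `squared_euclidean` are the kind of input an average-linkage hierarchical clustering step would consume.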

Real-time and Parallel Semantic Translation Technique for Large-Scale Streaming Sensor Data in an IoT Environment

  • 권순현;박동환;방효찬;박영택
    • 정보과학회 논문지 / Vol. 42, No. 1 / pp. 54-67 / 2015
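  • In Internet of Things (IoT) environments, research on integration with Semantic Web technology has recently been active, with the aim of increasing the value of the sensor data generated and improving data interoperability. To that end, semantic annotation of sensor data is essential for fusing sensor data with service domain knowledge. However, existing semantic translation techniques convert static metadata into semantic data (RDF) and cannot properly handle the real-time, large-volume nature of IoT environments. This paper therefore presents a technique that converts large-scale streaming sensor data generated in an IoT environment into semantic data through real-time parallel processing. The technique defines translation rules for semantic conversion and, using these rules together with an ontology-based sensor model, translates sensor data into semantic form in real time and in parallel and stores the result in a semantic repository. To improve performance, each translation task is processed in parallel using Apache Storm, a real-time big data analytics framework. A system implementing the technique was built, and the proposed technique was validated through a performance evaluation using Korea Meteorological Administration AWS observation data, a large-scale streaming sensor data set.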
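The rule-driven conversion of a sensor reading into RDF can be sketched as follows. This is only an illustration under stated assumptions: the `http://example.org/sensor#` namespace, the `RULES` table, and both function names are hypothetical, and a standard-library thread pool stands in for Apache Storm, which the paper actually uses.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

EX = "http://example.org/sensor#"          # hypothetical namespace
XSD = "http://www.w3.org/2001/XMLSchema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical translation rules: reading field -> (predicate IRI, datatype IRI).
RULES = {
    "temperature": (EX + "hasTemperature", XSD + "double"),
    "humidity": (EX + "hasHumidity", XSD + "double"),
    "timestamp": (EX + "observedAt", XSD + "dateTime"),
}


def to_triples(reading: Dict[str, str]) -> List[str]:
    """Translate one reading into N-Triples lines (values are not escaped)."""
    subject = f"<{EX}observation/{reading['id']}>"
    triples = [f"{subject} <{RDF_TYPE}> <{EX}Observation> ."]
    for field, (predicate, datatype) in RULES.items():
        if field in reading:
            triples.append(
                f'{subject} <{predicate}> "{reading[field]}"^^<{datatype}> .'
            )
    return triples


def translate_stream(readings: List[Dict[str, str]], workers: int = 4) -> List[str]:
    """Translate readings in parallel; a thread pool stands in for Storm here."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so output order is deterministic.
        return [t for triples in pool.map(to_triples, readings) for t in triples]
```

Because each reading is translated independently of the others, the work distributes naturally across parallel workers, which is the property the abstract relies on.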

Semantic Process Retrieval with Similarity Algorithms

  • 이홍주
    • Asia pacific journal of information systems / Vol. 18, No. 1 / pp. 79-96 / 2008
  • One of the roles of Semantic Web services is to execute dynamic intra-organizational services, including the integration and interoperation of business processes. Since different organizations design their processes differently, the retrieval of similar semantic business processes is necessary in order to support inter-organizational collaboration. Most approaches for finding services that have certain features and support certain business processes have relied on some type of logical reasoning and exact matching. This paper presents our approach of using imprecise matching to expand the results from an exact matching engine when querying the OWL (Web Ontology Language) MIT Process Handbook. The MIT Process Handbook is an electronic repository of best-practice business processes. The Handbook is intended to help people (1) redesign organizational processes, (2) invent new processes, and (3) share ideas about organizational practices. In order to use the MIT Process Handbook for process retrieval experiments, we had to export it into an OWL-based format. We model the Process Handbook meta-model in OWL and export the processes in the Handbook as instances of the meta-model. Next, we need a sizable number of queries and their corresponding correct answers in the Process Handbook. Many previous studies devised artificial datasets composed of randomly generated numbers without real meaning and used subjective ratings for correct answers and for similarity values between processes. To generate a semantics-preserving test data set, we use mutation operators to create 20 variants of each target process that are syntactically different but semantically equivalent. These variants represent the correct answers for the target process. We devise diverse similarity algorithms based on the values of process attributes and the structures of business processes.
We use simple similarity algorithms for text retrieval, such as TF-IDF and Levenshtein edit distance, to devise our approaches, and we utilize a tree edit distance measure because semantic processes appear to have a graph structure. We also design similarity algorithms that consider the similarity of process structure, such as part processes, goals, and exceptions. Since we can identify relationships between a semantic process and its subcomponents, this information can be utilized for calculating similarities between processes. Dice's coefficient and Jaccard similarity measures are utilized to calculate the portion of overlap between processes in diverse ways. We perform retrieval experiments to compare the performance of the devised similarity algorithms. We measure retrieval performance in terms of precision, recall, and F-measure, the harmonic mean of precision and recall. The tree edit distance shows the poorest performance on all measures. TF-IDF and the method combining the TF-IDF measure with Levenshtein edit distance perform better than the other devised methods. These two measures focus on the similarity between the names and descriptions of processes. In addition, we calculate a rank correlation coefficient, Kendall's tau b, between the number of process mutations and the ranking of similarity values among the mutation sets. In this experiment, similarity measures based on process structure, such as Dice's, Jaccard, and derivatives of these measures, show greater coefficients than measures based on the values of process attributes. However, the Lev-TFIDF-JaccardAll measure, which considers process structure and attribute values together, performs reasonably well in both experiments. For retrieving semantic processes, it therefore appears better to consider diverse aspects of process similarity, such as process structure and the values of process attributes.
We generate semantic process data and a data set for retrieval experiments from the MIT Process Handbook repository. We suggest imprecise query algorithms that expand the retrieval results from an exact matching engine such as SPARQL, and we compare the retrieval performance of the similarity algorithms. As for limitations and future work, we need to perform experiments with other data sets from other domains. Also, since there are many similarity values from diverse measures, we may find better ways to identify relevant processes by applying these values simultaneously.
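The set-overlap, edit-distance, and evaluation measures named in this abstract follow standard textbook definitions, sketched below. How the paper maps processes to sets or strings is not stated, so treating a process as a set of subcomponent names is an assumption, as are the function names.

```python
from typing import Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity: |A ∩ B| / |A ∪ B| (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dice(a: Set[str], b: Set[str]) -> float:
    """Dice's coefficient: 2|A ∩ B| / (|A| + |B|) (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def levenshtein(s: str, t: str) -> int:
    """Levenshtein edit distance, computed row by row."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def precision_recall_f(retrieved: Set[str],
                       relevant: Set[str]) -> Tuple[float, float, float]:
    """Precision, recall, and F-measure (their harmonic mean)."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure
```

For example, `levenshtein("kitten", "sitting")` returns 3, and `jaccard({"a", "b", "c"}, {"b", "c", "d"})` returns 0.5.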

A Ranking Algorithm for Semantic Web Resources: A Class-oriented Approach

  • 노상규;박현정;박진수
    • Asia pacific journal of information systems / Vol. 17, No. 4 / pp. 31-59 / 2007
  • We frequently use search engines to find relevant information on the Web but still end up with too much information. To address this problem of information overload, ranking algorithms have been applied to various domains. As more information becomes available in the future, ranking search results effectively and efficiently will become more critical. In this paper, we propose a ranking algorithm for Semantic Web resources, specifically RDF resources. Traditionally, the importance of a particular Web page is estimated based on the number of keywords found in the page, which is subject to manipulation. In contrast, link analysis methods such as Google's PageRank capitalize on the information inherent in the link structure of the Web graph. PageRank considers a page highly important if it is referred to by many other pages. The degree of importance also increases if the importance of the referring pages is high. Kleinberg's algorithm is another link-structure-based ranking algorithm for Web pages. Unlike PageRank, Kleinberg's algorithm utilizes two kinds of scores: the authority score and the hub score. If a page has a high authority score, it is an authority on a given topic and many pages refer to it. A page with a high hub score links to many authoritative pages. As mentioned above, the link-structure-based ranking method has played an essential role in the World Wide Web (WWW), and its effectiveness and efficiency are now widely recognized. On the other hand, because the Resource Description Framework (RDF) data model forms the foundation of the Semantic Web, any information in the Semantic Web can be expressed as an RDF graph, which makes ranking algorithms for RDF knowledge bases highly important. The RDF graph consists of nodes and directional links, similar to the Web graph. As a result, the link-structure-based ranking method seems highly applicable to ranking Semantic Web resources.
However, the information space of the Semantic Web is more complex than that of the WWW. For instance, the WWW can be considered one huge class, i.e., a collection of Web pages, with only a single recursive property, i.e., a 'refers to' property corresponding to hyperlinks. The Semantic Web, by contrast, encompasses various kinds of classes and properties, and consequently, ranking methods used in the WWW should be modified to reflect the complexity of its information space. Previous research addressed the problem of ranking query results retrieved from RDF knowledge bases. Mukherjea and Bamba modified Kleinberg's algorithm in order to rank Semantic Web resources. They defined the objectivity score and the subjectivity score of a resource, which correspond to Kleinberg's authority score and hub score, respectively. They concentrated on the diversity of properties and introduced property weights to control the influence of one resource on another depending on the characteristics of the property linking the two resources. A node with a high objectivity score is the object of many RDF triples, and a node with a high subjectivity score is the subject of many RDF triples. They developed several kinds of Semantic Web systems to validate their technique and presented experimental results verifying the applicability of their method to the Semantic Web. Despite their efforts, however, some limitations remained, which they reported in their paper. First, their algorithm is useful only when a Semantic Web system represents most of the knowledge pertaining to a certain domain. In other words, the ratio of links to nodes should be high, or resources overall should be described in some detail, for their algorithm to work properly.
Second, the Tightly-Knit Community (TKC) effect, the phenomenon in which pages that are less important but densely connected receive higher scores than pages that are more important but sparsely connected, remains problematic. Third, a resource may have a high score not because it is actually important, but simply because it is very common and consequently has many links pointing to it. In this paper, we examine these ranking problems from a novel perspective and propose a new algorithm that can solve the problems identified in previous studies. Our proposed method is based on a class-oriented approach. In contrast to the predicate-oriented approach adopted by previous research, under our approach a user determines the weight of a property by comparing its significance relative to the other properties when evaluating the importance of resources in a specific class. This approach stems from the idea that most queries are meant to find resources belonging to the same class in the Semantic Web, which consists of many heterogeneous classes in RDF Schema. It closely reflects the way people evaluate things in the real world and should prove superior to the predicate-oriented approach for the Semantic Web. Our proposed algorithm can resolve the TKC effect and can also shed light on the other limitations reported in previous research. In addition, we propose two ways to incorporate data-type properties, which had not been employed before even when they bear on the importance of a resource. We designed an experiment to show the effectiveness of our proposed algorithm and the validity of its ranking results, which had not been attempted in previous research. We also conducted a comprehensive mathematical analysis, which was overlooked in previous research. The mathematical analysis enabled us to simplify the calculation procedure.
Finally, we summarize our experimental results and discuss further research issues.
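The authority/hub iteration described in this abstract, and the idea of weighting links by property, can be sketched as below. This is a generic Kleinberg-style iteration over RDF-like triples with optional per-property weights; it is not the exact formulation of Mukherjea and Bamba, nor the class-oriented algorithm proposed in the paper, and the function names and the weight scheme are assumptions.

```python
import math
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)


def _normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale scores to unit Euclidean length (left unchanged if all zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0.0:
        return dict(scores)
    return {k: v / norm for k, v in scores.items()}


def weighted_hits(triples: List[Triple],
                  weights: Optional[Dict[str, float]] = None,
                  iterations: int = 50
                  ) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (authority, hub) scores; unlisted properties get weight 1.0."""
    weights = weights or {}
    nodes = {s for s, _, _ in triples} | {o for _, _, o in triples}
    auth = {n: 1.0 for n in nodes}
    hub = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: weighted sum of hub scores of nodes pointing here.
        new_auth = {n: 0.0 for n in nodes}
        for s, p, o in triples:
            new_auth[o] += weights.get(p, 1.0) * hub[s]
        # Hub: weighted sum of authority scores of nodes pointed to.
        new_hub = {n: 0.0 for n in nodes}
        for s, p, o in triples:
            new_hub[s] += weights.get(p, 1.0) * new_auth[o]
        auth, hub = _normalize(new_auth), _normalize(new_hub)
    return auth, hub
```

In this sketch the authority score plays the role of the objectivity score (a node that is the object of many triples) and the hub score that of the subjectivity score (a node that is the subject of many triples). Setting a property's weight to 0 removes its links from the ranking.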