• Title/Abstract/Keywords: Domain-Specific Information

Domain Adaptation Method for LHMM-based English Part-of-Speech Tagger

  • 권오욱;김영길
    • Journal of KIISE: Computing Practices and Letters / Vol. 16, No. 10 / pp. 1000-1004 / 2010
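  • Part-of-speech (POS) taggers are widely used as preprocessors in language processing systems, so improving tagger performance can contribute substantially to the overall performance of such systems. For highly complex language processing systems such as machine translation, recent work aims to build systems that perform well in a specific domain. This paper proposes a method for adapting an LHMM- or HMM-based English POS tagger trained on the general domain to a specific domain so that it achieves high performance there. The proposed domain adaptation method uses a raw corpus from the target domain to adjust, semi-automatically, the pre-trained transition and emission probabilities of the HMM or LHMM to fit that domain. In an experiment adapting to the patent domain, the method achieved a word-level tagging accuracy of 98.87% and a sentence-level tagging accuracy of 78.5%. Compared with the unadapted tagger, this is an improvement of 2.24 percentage points in word-level accuracy (ERR: 6.4%) and 41.0 percentage points in sentence-level accuracy (ERR: 65.6%).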
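
The sentence-level figures in this abstract can be checked with a little arithmetic. The sketch below assumes that ERR stands for error reduction rate, that is, the relative drop in error rate between the unadapted and the adapted tagger; the function name and this reading of ERR are assumptions, not taken from the paper.

```python
def error_reduction_rate(baseline_acc: float, adapted_acc: float) -> float:
    """Relative reduction in error rate; accuracies are given in percent."""
    baseline_err = 100.0 - baseline_acc
    adapted_err = 100.0 - adapted_acc
    return (baseline_err - adapted_err) / baseline_err * 100.0


# Sentence-level figures from the abstract: 78.5% after adaptation,
# reported as a 41.0-point gain over the unadapted tagger.
adapted = 78.5
gain = 41.0
baseline = adapted - gain  # 37.5% sentence-level accuracy before adaptation

sentence_err = error_reduction_rate(baseline, adapted)
print(round(sentence_err, 1))  # 65.6
```

Under that assumption the computed value matches the 65.6% reported for sentence-level tagging.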

Domain Specific Annotation of Digital Documents through Keyphrase Extraction

  • 이람 파티마;이영구;이승룡
    • Korea Information Processing Society: Conference Proceedings / Korea Information Processing Society 2011 Spring Conference / pp. 1389-1391 / 2011
  • In this paper, we propose a methodology for annotating digital documents through keyphrase extraction using a domain-specific taxonomy. A limitation of existing keyphrase extraction algorithms is that the output keyphrases may contain irrelevant information along with relevant information, so the quality of the generated keyphrases does not meet the required level of accuracy. Our proposed approach exploits the semantic relationships and hierarchical structure of the classification scheme to filter out irrelevant keyphrases suggested by the Keyphrase Extraction Algorithm (KEA++). In our experiments, the proposed algorithm achieved high precision and low recall.
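
The filtering idea in this abstract can be illustrated with a toy taxonomy. This is a minimal sketch, not the authors' algorithm: the taxonomy, the candidate phrases, and the rule "keep a phrase only if it sits under the target domain's root" are all hypothetical choices made for illustration.

```python
# Hypothetical toy taxonomy: each term maps to its parent (None marks a root).
TAXONOMY = {
    "computer science": None,
    "machine learning": "computer science",
    "neural networks": "machine learning",
    "databases": "computer science",
    "agriculture": None,
    "crop rotation": "agriculture",
}


def ancestors(term):
    """Return the chain of parents from a term up to its root."""
    chain = []
    parent = TAXONOMY.get(term)
    while parent is not None:
        chain.append(parent)
        parent = TAXONOMY.get(parent)
    return chain


def filter_keyphrases(candidates, domain_root):
    """Keep candidates that appear in the taxonomy under domain_root."""
    kept = []
    for phrase in candidates:
        key = phrase.lower()
        if key not in TAXONOMY:
            continue  # not a taxonomy term: treated as irrelevant
        if key == domain_root or domain_root in ancestors(key):
            kept.append(phrase)
    return kept


# Stand-in for the output of a keyphrase extractor.
candidates = ["Neural networks", "crop rotation", "Databases", "last year"]
filtered = filter_keyphrases(candidates, "computer science")
print(filtered)  # ['Neural networks', 'Databases']
```

A strict filter of this kind drops anything outside the taxonomy, which is one way a system could end up with high precision and low recall.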

On the Significance of Domain-Specific Pretrained Language Models for Log Anomaly Detection

  • 레리사 아데바 질차;김득훈;곽진
    • Korea Information Processing Society: Conference Proceedings / Korea Information Processing Society 2024 Spring Conference / pp. 337-340 / 2024
  • Pretrained language models (PLMs) are extensively utilized to enhance the performance of log anomaly detection systems. Their effectiveness lies in their capacity to extract valuable semantic information from logs, thereby strengthening detection performance. Nonetheless, discrepancies in the distribution of log messages hinder the development of robust and generalizable detection systems. This study investigates the structural and distributional variation across various log message datasets, underscoring the crucial role of domain-specific PLMs in overcoming this challenge and devising robust and generalizable solutions.
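
One simple way to get a feel for distributional variation between log datasets is to compare their vocabularies. The sketch below uses Jaccard overlap of whitespace tokens on two made-up sets of log lines; the measure, the tokenization, and the sample messages are assumptions for illustration and are not the analysis performed in the study.

```python
def vocabulary(log_lines):
    """Collect the set of lowercase whitespace-separated tokens."""
    vocab = set()
    for line in log_lines:
        vocab.update(line.lower().split())
    return vocab


def jaccard_overlap(logs_a, logs_b):
    """Share of the combined vocabulary that both datasets have in common."""
    vocab_a, vocab_b = vocabulary(logs_a), vocabulary(logs_b)
    union = vocab_a | vocab_b
    if not union:
        return 0.0
    return len(vocab_a & vocab_b) / len(union)


# Made-up log lines standing in for two different systems.
storage_logs = ["receiving block from node", "deleting block file"]
kernel_logs = ["kernel panic on node", "cache parity error corrected"]

overlap = jaccard_overlap(storage_logs, kernel_logs)
print(overlap)
```

A low overlap means a model tuned on one system's logs sees mostly unfamiliar tokens on the other, which is the kind of gap the abstract attributes to differences in log distribution.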

Representation of Event-Based Ontology Models: A Comparative Study

  • Ali, Ashour;Noah, Shahrul Azman Mohd;Zakaria, Lailatul Qadri
    • International Journal of Computer Science & Network Security / Vol. 22, No. 7 / pp. 147-156 / 2022
  • Ontologies are knowledge containers in which information about a specified domain can be shared and reused. An event happens at a specific time and place, and involves actors who engage in it and exhibit specific action features. Several ontology models, called Event-Based Models, are built around events: the event is an individual entity or concept connected with other entities to describe the underlying ontology, because the event can be composed of spatiotemporal extents. However, current event-based ontologies are inadequate to bridge the gap between spatiotemporal extents and participants when describing a specific domain event. This paper reviews, describes, and compares the existing event-based ontologies, covering the ways they represent events and how those events have been modelled, constructed, and integrated with the ontologies. The primary criterion for comparison is the ability to represent the spatial and temporal extent of an event and its participants.
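
The three ingredients used as the comparison criterion (temporal extent, spatial extent, participants) can be sketched as a small data structure. The class and field names below are hypothetical and do not correspond to any of the ontologies reviewed in the paper.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict


@dataclass(frozen=True)
class Place:
    """Spatial extent, reduced here to a named point."""
    name: str
    latitude: float
    longitude: float


@dataclass
class Event:
    """An event tied to a time interval, a place, and its participants."""
    label: str
    start: datetime
    end: datetime
    place: Place
    participants: Dict[str, str] = field(default_factory=dict)  # actor -> role

    def overlaps(self, other: "Event") -> bool:
        """True when the two time intervals intersect."""
        return self.start < other.end and other.start < self.end

    def shared_participants(self, other: "Event") -> set:
        """Actors who take part in both events."""
        return set(self.participants) & set(other.participants)


hall = Place("Main Hall", 37.5663, 126.9779)
keynote = Event("keynote", datetime(2022, 7, 1, 9), datetime(2022, 7, 1, 10),
                hall, {"actor_a": "speaker", "actor_b": "attendee"})
panel = Event("panel", datetime(2022, 7, 1, 9, 30), datetime(2022, 7, 1, 11),
              hall, {"actor_b": "moderator"})

print(keynote.overlaps(panel), keynote.shared_participants(panel))
```

Keeping participants as a mapping from actor to role lets the same actor hold different roles in different events, which a flat list of names would not capture.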

Guiding Practical Text Classification Framework to Optimal State in Multiple Domains

  • Choi, Sung-Pil;Myaeng, Sung-Hyon;Cho, Hyun-Yang
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 3, No. 3 / pp. 285-307 / 2009
  • This paper introduces DICE, a Domain-Independent text Classification Engine. DICE is robust, efficient, and domain-independent in terms of software and architecture. Each module of the system is clearly modularized and encapsulated for extensibility. The clear modular architecture allows for simple and continuous verification and facilitates changes in multiple cycles, even after its major development period is complete. Those who want to make use of DICE can easily implement their ideas on this test bed and optimize it for a particular domain by simply adjusting the configuration file. Unlike other publicly available toolkits or development environments targeted at general-purpose classification models, DICE specializes in text classification and offers a number of useful functions specific to it. This paper focuses on ways to locate the optimal states of a practical text classification framework by using the various adaptation methods provided by the system, such as feature selection, lemmatization, and classification models.

SPARQL Query Automatic Transformation Method based on Keyword History Ontology for Semantic Information Retrieval

  • Jo, Dae Woong;Kim, Myung Ho
    • Journal of the Korea Society of Computer and Information / Vol. 22, No. 2 / pp. 97-104 / 2017
  • Semantic information retrieval requires, first, building a domain ontology and, second, converting the users' search keywords into a standard query language such as SPARQL. In this paper, we propose a method that automatically converts the users' search keywords into SPARQL queries and performs effectively in a specific domain such as law. When there are multiple keywords, our method constructs a keyword history ontology by associating each keyword with a series of information. The constructed keyword history ontology is then converted into a SPARQL query. The proposed transformation method produces the query statement deemed most appropriate for the user's intended keywords. Our study builds on existing legal ontologies, supplementing and reconstructing their schema for use in the experiments. In addition, we design and implement a semantic search tool for the legal domain and conduct experiments with it. The proposed method makes keyword-based semantic information retrieval possible in the legal domain, and it can be applied to other domains as well.
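
To make the keyword-to-query step concrete, here is a deliberately naive sketch that turns a list of keywords into a SPARQL SELECT query by matching them against `rdfs:label` values. It does not implement the keyword history ontology described in the abstract; the function name and the `http://example.org/law#Statute` class IRI are placeholders invented for this example.

```python
def keywords_to_sparql(keywords, class_iri="http://example.org/law#Statute"):
    """Build a SPARQL query whose labels must contain every keyword."""
    if not keywords:
        raise ValueError("at least one keyword is required")
    filters = []
    for keyword in keywords:
        # Escape backslashes and quotes so the keyword stays inside the literal.
        safe = keyword.lower().replace("\\", "\\\\").replace('"', '\\"')
        filters.append(f'CONTAINS(LCASE(STR(?label)), "{safe}")')
    condition = " && ".join(filters)
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?item ?label WHERE {\n"
        f"  ?item a <{class_iri}> ;\n"
        "        rdfs:label ?label .\n"
        f"  FILTER({condition})\n"
        "}"
    )


query = keywords_to_sparql(["Contract", "damages"])
print(query)
```

A plain string match like this ignores how keywords relate to one another, which is the gap an ontology-driven conversion is meant to close.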

An Identification of the Image Retrieval Domain from the Perspective of Library and Information Science with Author Co-citation and Author Bibliographic Coupling Analyses

  • 윤정원;정은경;변지혜
    • Journal of the Korean Society for Library and Information Science / Vol. 49, No. 4 / pp. 99-124 / 2015
  • As improvements in digital technologies increase the use of images in various fields, the domain of image retrieval has evolved and become a growing topic of research in the Library and Information Science (LIS) field. The purpose of this study is to identify the knowledge structure of the image retrieval domain, using author co-citation analysis and author bibliographic coupling as analytical tools, in order to understand the domain's past and present. The data set for this study is 245 articles with 8,031 cited articles in the field of image retrieval from 1998 to 2013, drawn from the Web of Science citation database. The author co-citation analysis, which reflects the past of the image retrieval domain, shows that the intellectual structure of image retrieval in the LIS field consists predominantly of user-oriented approaches but also includes some areas influenced by the CBIR area. More specifically, the user-oriented approach contains six specific areas: image needs, information seeking, image needs and search behavior, image indexing and access, indexing of image collection, and web image search. The CBIR approaches, on the other hand, contain feature-based image indexing, shape-based indexing, and IR & CBIR. The recent trends revealed by the author bibliographic coupling analysis show that the domain is expanding to the emerging areas of medical images, multimedia, and ontology- and tag-based indexing, reflecting a new paradigm of the information environment.
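
The two measures named in this abstract can be sketched in a few lines. Co-citation counts how often two authors are cited together by the same article, while bibliographic coupling counts the references two authors' works share. The data below is invented, and the study's actual procedure (thresholds, normalization, clustering) is not reproduced here.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count, per author pair, the citing articles that cite both authors."""
    counts = Counter()
    for cited_authors in reference_lists:
        for pair in combinations(sorted(set(cited_authors)), 2):
            counts[pair] += 1
    return counts


def coupling_strength(refs_a, refs_b):
    """Number of references that two authors' works have in common."""
    return len(set(refs_a) & set(refs_b))


# Invented data: each inner list holds the authors cited by one article.
citing_articles = [
    ["Smith", "Lee", "Chen"],
    ["Lee", "Chen"],
    ["Smith", "Park"],
]
pairs = cocitation_counts(citing_articles)
print(pairs[("Chen", "Lee")])  # 2

# Invented data: reference identifiers used by two authors.
shared = coupling_strength(["r1", "r2", "r3"], ["r2", "r3", "r4"])
print(shared)  # 2
```

Co-citation depends on later articles citing the pair, so it reflects how a field has been received over time, whereas coupling is fixed once the works are published and can surface recent activity.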

ONTOLOGY DESIGN FOR THE EFFICIENT CUSTOMER INFORMATION RETRIEVAL

  • Gu, Mi-Sug;Hwang, Jeong-Hee;Ryu, Keun-Ho
    • Korean Society of Remote Sensing: Conference Proceedings / Korean Society of Remote Sensing, Proceedings of ISRS 2005 / pp. 345-348 / 2005
  • Because current web search engines estimate document similarity from word frequency, they return many documents that are irrelevant to the user query. The semantic web is emerging as a future web that addresses this kind of problem. Semantic web services can be provided through an ontology, which specifies the knowledge in a particular domain and defines the concepts of that knowledge and the relationships between them. In this paper, to search for information about potential customers for home-delivery marketing, we model the specific domain in order to generate the ontology, and we study how to retrieve information using it. We thus generate an ontology that defines the domain of potential customers and develop a search robot that collects customer information.

Neural Network-based Decision Class Analysis with Incomplete Information

  • 김재경;이재광;박경삼
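    • Korea Intelligent Information Systems Society: Conference Proceedings / Korea Intelligent Information Systems Society 1999 Spring Joint Conference: Knowledge Management and Knowledge Engineering / pp. 281-287 / 1999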
  • Decision class analysis (DCA) is viewed as a classification problem in which a set of input data (situation-specific knowledge) and output data (a topological leveled influence diagram (ID)) is given. Situation-specific knowledge is usually provided by a decision maker (DM) with the help of domain expert(s), but it is not easy for the DM to know the situation-specific knowledge of a decision problem exactly. This paper presents a methodology for sensitivity analysis of DCA under incomplete information. The purpose of sensitivity analysis in DCA is to identify the effects of incomplete situation-specific frames whose uncertainty affects the importance of each variable in the resulting model. For this purpose, the suggested methodology consists of two procedures: a generative procedure and an adaptive procedure. An interactive procedure based on the sensitivity analysis is also suggested for building a well-formed ID. These procedures are formally explained and illustrated with a raw material purchasing problem.

A Study on Five Levels of Security Risk Assessment Model Design for Ensuring the u-Healthcare Information System

  • 노시춘
    • Convergence Security Journal / Vol. 13, No. 4 / pp. 11-17 / 2013
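  • Every u-healthcare system has security vulnerabilities, and these vulnerabilities pose potential risks both locally and over the network. The smart environment, ad-hoc networking, and wireless communication environment of medical information technology are major factors that increase u-healthcare security vulnerability. The u-healthcare medical information system is divided into four domains: the user terminal segment, the public communication network infrastructure segment, the networking segment, and the intranet segment. Vulnerabilities are assessed separately for each domain because the countermeasures differ from domain to domain. A five-level security risk assessment framework for u-healthcare systems is needed in order to design a domain-specific vulnerability diagnosis scheme and devise security measures. The proposed model offers a more systematic way to carry out the security vulnerability diagnosis of USN-based medical information networks, which until now has been conducted in a vague manner.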