• Title/Summary/Keyword: Informal / Retrieval

Search Results: 8

An Exploratory Study on the Expansion of Academic Information Services Based on Automatic Semantic Linking Between Academic Web Resources and Information Services (웹 정보의 자동 의미연계를 통한 학술정보서비스의 확대 방안 연구)

  • Jeong, Do-Heon;Yu, So-Young;Kim, Hwan-Min;Kim, Hye-Sun;Kim, Yong-Kwang;Han, Hee-Jun
    • Journal of Information Management
    • /
    • v.40 no.1
    • /
    • pp.133-156
    • /
    • 2009
  • In this study, we link informal Web resources to KISTI NDSL's collections using automatic semantic indexing and tagging, and examine the feasibility of a service that recommends related documents based on the similarity between KISTI's formal information resources and informal Web resources. We collect and index Web resources and link them semantically to KISTI's collections through STEAK for NDSL retrieval. Macro precision, the retrieval precision per subject category, is 62.6%; micro precision, the retrieval precision per query, is 66.9%; and the experts' evaluation score is 76.7. This study shows the feasibility of semantically linking NDSL retrieval results with Web information resources and of expanding the coverage of information services to informal resources. (A minimal similarity-linking sketch follows this entry.)
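
The recommendation step above hinges on a similarity measure between formal records and informal Web documents. The abstract does not describe STEAK's internals, so the following is only a minimal TF-IDF/cosine-similarity sketch of such linking; the document lists and the top_k parameter are illustrative assumptions, not part of the original system.

    # Minimal sketch: recommend informal Web documents for a formal record
    # via TF-IDF cosine similarity. Not the STEAK system itself; the two
    # document lists below are assumed inputs.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    formal_docs = ["semantic indexing of scholarly records ..."]   # NDSL records (assumed)
    web_docs = ["blog post on semantic tagging ...",
                "tutorial on automatic indexing ..."]              # crawled pages (assumed)

    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform(formal_docs + web_docs)  # shared vocabulary
    formal_vecs = matrix[:len(formal_docs)]
    web_vecs = matrix[len(formal_docs):]

    def recommend(record_index, top_k=2):
        """Indices of the top_k Web documents most similar to one formal record."""
        sims = cosine_similarity(formal_vecs[record_index], web_vecs).ravel()
        return sims.argsort()[::-1][:top_k]

    print(recommend(0))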

Information Retrieval using Word Sense Disambiguation based on Statistical Method (통계기반 의미중의성 해소를 이용한 정보검색)

  • Hur, Jeong;Kim, Hyun-Jin;Jang, Myung-Gil
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2002.04b
    • /
    • pp.508-510
    • /
    • 2002
  • With the development of the Internet, natural language processing techniques have been widely applied to information retrieval in order to satisfy users' needs over the exponentially growing volume of digital information. In this paper, we apply one such technique, word sense disambiguation (WSD), to information retrieval. Experiments on nine ambiguous words over the 120,000-document HANTEC collection yielded a precision of 67.8%. The experiments show that WSD errors have a considerable impact on retrieval precision. We also discuss several problems that can arise when WSD is applied to information retrieval, and find that the fundamental solution lies in advancing WSD technology itself. (A minimal WSD sketch follows this entry.)

  • PDF
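
The statistical WSD step described above selects one sense of an ambiguous word from its surrounding context. The paper's scoring details are not given in the abstract, so the sketch below is a minimal overlap-based disambiguator under that assumption; the senses and training contexts are invented for illustration.

    # Minimal sketch of statistical word sense disambiguation: choose the
    # sense whose training context overlaps most with the query context.
    # The senses and example contexts are hypothetical, not the paper's data.
    from collections import Counter

    sense_contexts = {
        "bank/finance": "deposit money account loan interest",
        "bank/river": "river water shore erosion flood",
    }

    def disambiguate(context):
        """Score each sense by word overlap with the given context."""
        ctx = Counter(context.lower().split())
        def overlap(sense_words):
            vocab = Counter(sense_words.split())
            return sum(min(ctx[w], vocab[w]) for w in ctx)
        return max(sense_contexts, key=lambda s: overlap(sense_contexts[s]))

    print(disambiguate("the loan officer at the bank checked my account"))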

3D Knowledge Retrieval for e-Learning Construction (이러닝 구축을 위한 3D 지식 검색)

  • Kim, Gui-Jung;Han, Jung-Soo
    • The Journal of the Korea Contents Association
    • /
    • v.10 no.7
    • /
    • pp.63-69
    • /
    • 2010
  • This research focuses on supporting both formal and informal learning in real time. In industry and in education, learning and training follow a worker's current situation and business context, and 3D immersive knowledge visualization that tracks this situation and the flow of work is effective for individual ability and learning progress. To support this, workers receive compound knowledge-coaching advice in real time. We therefore developed a realistic 3D-based knowledge retrieval method that makes multidimensional relations easy to identify and retrieve.

Feature Selection for a Hangul Text Document Classification System (한글 텍스트 문서 분류시스템을 위한 속성선택)

  • Lee, Jae-Sik;Cho, You-Jung
    • Proceedings of the Korea Intelligent Information Systems Society Conference
    • /
    • 2003.05a
    • /
    • pp.435-442
    • /
    • 2003
  • An information retrieval system is a tool that helps users find the information they need within a vast volume of information. It must deliver the information a user requests more accurately, more effectively, and more efficiently. To do so, it is essential to select, from the countless features in a document, only those that represent the document well. This study therefore focuses on improving the performance of an existing Hangul document classification system (CB_TFIDF) [1] in two respects: accuracy and speed. Among the various feature selection techniques applied to English text document classification, we take three well-known ones, namely Information Gain, Odds Ratio, and Document Frequency Thresholding, build a selective case base with each, apply them to the Hangul text document classification system, and compare their performance, and then suggest the feature selection technique best suited to Hangul document classification along with guidelines for feature selection. (A feature-scoring sketch follows this entry.)

  • PDF
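
Two of the three feature-selection scores named above can be stated compactly. The sketch below computes document frequency and information gain for terms in a toy labeled corpus; the corpus and term choices are illustrative assumptions, and Odds Ratio is omitted for brevity.

    # Minimal sketch of document frequency (DF) and information gain (IG)
    # as feature-selection scores. The tiny labeled corpus is invented.
    import math
    from collections import Counter

    docs = [("economy market stocks", "econ"),
            ("market trade prices", "econ"),
            ("soccer match goal", "sport"),
            ("goal keeper match", "sport")]

    def df(term):
        return sum(1 for text, _ in docs if term in text.split())

    def entropy(subset):
        total = len(subset)
        if total == 0:
            return 0.0
        counts = Counter(label for _, label in subset)
        return -sum(c / total * math.log2(c / total) for c in counts.values())

    def info_gain(term):
        """IG(t) = H(class) - weighted H(class | t present) - weighted H(class | t absent)."""
        present = [d for d in docs if term in d[0].split()]
        absent = [d for d in docs if term not in d[0].split()]
        n = len(docs)
        return entropy(docs) - len(present) / n * entropy(present) \
               - len(absent) / n * entropy(absent)

    for t in ("market", "goal", "prices"):
        print(t, df(t), round(info_gain(t), 3))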

A Study on the Quantitative Analysis of Scientific Communication (학술 커뮤니케이션의 수량학적 분석에 관한 연구)

  • Kim Hyun-hee
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.14
    • /
    • pp.93-130
    • /
    • 1987
  • Scientific communication is an information exchange activity between scientists. It is carried out in a variety of informal and formal ways: basically, informal communication takes place by word of mouth, whereas formal communication occurs via the written word. Science is a highly interdependent activity in which each scientist builds upon the work of colleagues past and present; consequently, science depends heavily on scientific communication. In this study, three mathematical models, namely the Brillouin measure, the logistic equation, and the Markov chain, are examined. These models provide a means of describing and predicting the behavior of the scientific communication process, and they can be applied to construct quality-filtering algorithms for subject literature that identify its synthesized elements (authors, papers, and journals). Each suggests a different type of application. Quality filtering for authors can be useful to funding agencies for identifying the individuals doing the best work in a given area or subarea. Quality filtering with respect to papers can be useful in constructing information retrieval and dissemination systems for the community of scientists interested in the field. Quality filtering of journals can be a basis for establishing small, high-quality libraries built around local interests in a variety of situations, ranging from the collection of an individual scientist or physician to research centers to developing countries. The objective of this study is to establish the theoretical framework for informetrics, defined as the quantitative analysis of scientific communication, by investigating mathematical models of scientific communication. (A worked Brillouin-measure example follows this entry.)

  • PDF
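
Of the three models named above, the Brillouin measure has the most compact closed form, H = (log2 N! - sum_i log2 ni!) / N for a collection of N items split into groups of size ni. The sketch below computes it with lgamma to avoid large factorials; the per-author paper counts are made-up data, and the base-2 logarithm is one common convention.

    # Worked example of the Brillouin measure H = (log2 N! - sum log2 ni!) / N.
    # lgamma(n + 1) = ln(n!), converted here to base 2. Counts are invented.
    import math

    def log2_factorial(n):
        return math.lgamma(n + 1) / math.log(2)

    def brillouin(counts):
        """Diversity of a finite collection, e.g. papers per author."""
        n = sum(counts)
        return (log2_factorial(n) - sum(log2_factorial(c) for c in counts)) / n

    # e.g. five authors contributing 10, 6, 3, 2, 1 papers to a journal
    print(round(brillouin([10, 6, 3, 2, 1]), 3))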

Query Context Information-Based Translation Models for Korean-Japanese Cross-Language Information Retrieval (한-일 교차언어검색에서의 질의 문맥 정보를 이용한 대역어 변환 확률 모델)

  • Lee, Gyu-Chan;Kang, In-Su;Na, Seung-Hoon;Lee, Jong-Hyeok
    • Annual Conference on Human and Language Technology
    • /
    • 2005.10a
    • /
    • pp.97-104
    • /
    • 2005
  • Cross-language retrieval requires a translation step to put queries and documents into the same language, and lexical ambiguity in this step can produce multiple target-language equivalents for a single word, distorting the user's information need and degrading retrieval performance. In this paper, we present a method that resolves this ambiguity by computing the probability of a translated query from the query's context information, and we analyze the method's strengths and weaknesses by comparing how its performance varies with query length, degree of ambiguity, and the proportion of ambiguous words in the query. We also suggest directions for future work to overcome the current weaknesses. (A context-scoring sketch follows this entry.)

  • PDF
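
A common way to use query context for translation disambiguation is to prefer the combination of translation candidates that cohere best with one another. The paper's probability model is not spelled out in the abstract, so the sketch below substitutes a simple pairwise co-occurrence score; the bilingual dictionary and co-occurrence table are toy assumptions, with English standing in for the target language for readability.

    # Minimal sketch of context-based translation selection for CLIR: pick
    # the candidate combination with the highest pairwise coherence. The
    # dictionary and co-occurrence scores below are invented.
    from itertools import product

    translations = {              # source term -> target-language candidates
        "은행": ["bank", "pear_tree"],
        "계좌": ["account"],
    }
    cooc = {                      # symmetric co-occurrence scores (assumed)
        ("bank", "account"): 0.9,
        ("pear_tree", "account"): 0.01,
    }

    def pair_score(a, b):
        return cooc.get((a, b), cooc.get((b, a), 0.0))

    def best_translation(query_terms):
        """Choose one translation per term, maximizing pairwise coherence."""
        candidates = [translations[t] for t in query_terms]
        def coherence(combo):
            return sum(pair_score(a, b)
                       for i, a in enumerate(combo) for b in combo[i + 1:])
        return max(product(*candidates), key=coherence)

    print(best_translation(["은행", "계좌"]))  # -> ('bank', 'account')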

Towards Improving Causality Mining using BERT with Multi-level Feature Networks

  • Ali, Wajid;Zuo, Wanli;Ali, Rahman;Rahman, Gohar;Zuo, Xianglin;Ullah, Inam
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.10
    • /
    • pp.3230-3255
    • /
    • 2022
  • Causality mining is a significant area of interest in NLP that benefits many everyday applications, including decision making, business risk management, question answering, future event prediction, scenario generation, and information retrieval. For prior statistical and non-statistical techniques over web sources, mining such causality was a challenging and open problem: they required hand-crafted linguistic patterns for feature engineering, were subject to domain knowledge, and demanded much human effort, and those studies focused on explicit causality while overlooking implicit, ambiguous, and heterogeneous causality. In contrast, we present Bidirectional Encoder Representations from Transformers (BERT) integrated with Multi-level Feature Networks (MFN), called BERT+MFN, for causality recognition in noisy and informal web datasets without human-designed features. In our model, MFN consists of a three-column knowledge-oriented network (TC-KN), a bi-LSTM, and a Relation Network (RN) that mine causality information at the segment level, while BERT captures semantic features at the word level. We perform experiments on Alternative Lexicalization (AltLexes) datasets, and the experimental outcomes show that our model outperforms baseline causality and text mining techniques. (A minimal BERT-baseline sketch follows this entry.)
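
The model above fine-tunes BERT jointly with MFN; as a point of reference, the sketch below shows only a plain BERT sentence-classification baseline for causality recognition using the Hugging Face transformers API. It omits the paper's TC-KN, bi-LSTM, and Relation Network components, and the checkpoint name and label mapping are assumptions.

    # Minimal BERT baseline for causality recognition as binary sentence
    # classification. NOT the paper's BERT+MFN; fine-tuning on labeled
    # causal/non-causal sentences is required before outputs are meaningful.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2)   # 0 = non-causal, 1 = causal (assumed)

    sentence = "The flight was cancelled because of the storm."
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = model(**inputs).logits.softmax(dim=-1)
    print(probs)   # arbitrary until the classification head is fine-tuned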

Knowledge Extraction Methodology and Framework from Wikipedia Articles for Construction of Knowledge-Base (지식베이스 구축을 위한 한국어 위키피디아의 학습 기반 지식추출 방법론 및 플랫폼 연구)

  • Kim, JaeHun;Lee, Myungjin
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.1
    • /
    • pp.43-61
    • /
    • 2019
  • With the Fourth Industrial Revolution, the development of artificial intelligence technologies has accelerated rapidly, and AI research has been actively conducted in fields such as autonomous vehicles, natural language processing, and robotics. Since the 1950s this research has focused on solving cognitive problems related to human intelligence, such as learning and problem solving, and thanks to recent interest in the technology and work on various algorithms the field has achieved more technological advances than ever. The knowledge-based system, a sub-domain of artificial intelligence, aims to enable AI agents to make decisions using machine-readable, processable knowledge constructed from complex and informal human knowledge and rules in various fields. A knowledge base is used to optimize information collection, organization, and retrieval, and recently it has been used together with statistical artificial intelligence such as machine learning. More recently, the purpose of a knowledge base has been to express, publish, and share knowledge on the web by describing and connecting web resources such as pages and data; such knowledge bases support intelligent processing in many areas of artificial intelligence, for example the question answering system of a smart speaker. However, building a useful knowledge base is time-consuming and still requires much expert effort. Much recent research and technology in knowledge-based artificial intelligence uses DBpedia, one of the biggest knowledge bases, which aims to extract structured content from the varied information in Wikipedia. DBpedia contains information extracted from Wikipedia such as titles, categories, and links, but the most useful knowledge comes from Wikipedia infoboxes, user-created summaries of an article's unifying aspects. This knowledge is generated by the mapping rules between infobox structures and the DBpedia ontology schema defined in the DBpedia Extraction Framework; because it derives from semi-structured infobox data created by users, DBpedia can expect high reliability in terms of the accuracy of its knowledge. However, since only about 50% of all pages in Korean Wikipedia contain an infobox, DBpedia is limited in terms of knowledge scalability. This paper proposes a method to extract knowledge from text documents according to the ontology schema using machine learning. To demonstrate the appropriateness of this method, we describe a knowledge extraction model that follows the DBpedia ontology schema by learning from Wikipedia infoboxes. Our knowledge extraction model consists of three steps: classifying a document into ontology classes, classifying the sentences appropriate for extracting triples, and selecting values and transforming them into RDF triple structure. The structure of Wikipedia infoboxes is defined by infobox templates, which provide standardized information across related articles, and the DBpedia ontology schema can be mapped to these templates. Based on these mapping relations, we classify the input document according to infobox categories, that is, ontology classes. After determining the classification of the input document, we classify each sentence as appropriate or not according to the attributes belonging to that classification. Finally, we extract knowledge from the sentences classified as appropriate and convert it into triples.
To train the models, we generated a training data set from a Wikipedia dump by adding BIO tags to sentences, and trained about 200 classes and about 2,500 relations for knowledge extraction. Furthermore, we ran comparative experiments with CRF and Bi-LSTM-CRF for the knowledge extraction process. Through the proposed process, structured knowledge can be utilized by extracting knowledge according to the ontology schema from text documents, and the methodology can significantly reduce the effort experts spend constructing instances according to the ontology schema. (A BIO-to-triple sketch follows this entry.)
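
The final step above, converting appropriately classified sentences into triples, can be illustrated in isolation. The sketch below turns a BIO-tagged sentence into an RDF-style triple; the tag scheme, the subject URI, and the example sentence are assumptions for illustration, not the paper's training data.

    # Minimal sketch of the triple-construction step: collect BIO spans and
    # emit (subject, relation, value) triples. All example data is invented.
    def bio_to_spans(tokens, tags):
        """Collect (label, text) spans from a BIO tag sequence."""
        spans, label, buf = [], None, []
        for token, tag in zip(tokens, tags):
            if tag.startswith("B-"):
                if label:
                    spans.append((label, " ".join(buf)))
                label, buf = tag[2:], [token]
            elif tag.startswith("I-") and label == tag[2:]:
                buf.append(token)
            else:
                if label:
                    spans.append((label, " ".join(buf)))
                label, buf = None, []
        if label:
            spans.append((label, " ".join(buf)))
        return spans

    tokens = ["Seoul", "is", "the", "capital", "of", "South", "Korea", "."]
    tags = ["B-capital", "O", "O", "O", "O", "O", "O", "O"]

    subject = "dbr:South_Korea"                    # assumed subject URI
    for relation, value in bio_to_spans(tokens, tags):
        print((subject, "dbo:" + relation, value)) # (dbr:South_Korea, dbo:capital, 'Seoul')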