• Title/Summary/Keyword: keyword-based system (키워드-기반 시스템)

Search Result 517, Processing Time 0.029 seconds

Boolean Query Formulation From Korean Natural Language Queries using Syntactic Analysis (구문분석에 기반한 한글 자연어 질의로부터의 불리언 질의 생성)

  • Park, Mi-Hwa;Won, Hyeong-Seok;Lee, Geun-Bae
    • Journal of KIISE:Software and Applications / v.26 no.10 / pp.1219-1229 / 1999
  • It is widely reported that trained searchers achieve high retrieval effectiveness with boolean queries, because a structured query built from operators such as AND, OR, and NOT represents the user's information need more accurately. Ordinary users, however, are not accustomed to expressing what they want in boolean form. This paper proposes a method that automatically transforms a user's Korean natural language query into an extended boolean query, aiming at both retrieval effectiveness and user convenience. First, the query is parsed with a KCCG (Korean Combinatory Categorial Grammar) parser, and the resulting syntactic trees are simplified by extracting operator and keyword information so that the logical relationships between keywords are captured. Next, plausible noun phrases are identified in the simplified tree and added to it as additional keywords, and weights are assigned to the keywords. Finally, the simplified tree is converted into a boolean query using mapping rules and linguistic heuristics, and retrieval is performed. To limit the damage caused by a single incorrect top-ranked parse, the paper also proposes an N-BEST average method, which generates a boolean query from each of the top N syntactic trees and searches with each. In experiments on KTSET2.0, a test collection for information retrieval, the proposed method outperformed a traditional vector space model based natural language query system by 23% and, surprisingly, manually constructed boolean queries by 8%.
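The N-BEST average idea can be illustrated with a small Python sketch. This is not the authors' implementation: it assumes that each of the top N parse trees has already been converted into a boolean query and executed, giving one document-to-score mapping per query, and that the final score is a plain average in which a missing document counts as 0. The function name `n_best_average` and the sample scores are hypothetical.

```python
from collections import defaultdict


def n_best_average(score_lists):
    """Average per-document scores over the result lists of N queries.

    score_lists: list of dicts mapping doc_id -> retrieval score, one dict
    per boolean query (one query per candidate parse tree). A document that
    is missing from a list is treated as scoring 0 for that query.
    """
    if not score_lists:
        return []
    totals = defaultdict(float)
    for scores in score_lists:
        for doc_id, score in scores.items():
            totals[doc_id] += score
    n = len(score_lists)
    averaged = {doc_id: total / n for doc_id, total in totals.items()}
    # Rank by averaged score, highest first; ties are broken by doc_id.
    return sorted(averaged.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical scores from two queries built from the top 2 parse trees.
ranked = n_best_average([
    {"d1": 0.9, "d2": 0.4},
    {"d1": 0.5, "d3": 0.6},
])
```

Averaging means that a document retrieved by only one of the N queries is still ranked, but lower than a document that several candidate parses agree on.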

Microplastics Intellectual Network Analysis based on Bigdata (빅데이터 기반한 미세플라스틱 지적네트워크 분석)

  • Kim, Younghee;Chang, Kwanjong
    • Journal of Convergence for Information Technology / v.12 no.4 / pp.239-259 / 2022
  • Since 2019, research on microplastics has been actively conducted around the world, so analyzing the differences between domestic (Korean) and foreign microplastics research can serve as a milestone in setting the direction of domestic research. In this study, microplastics papers were extracted from KCI and WoS, and the differences between domestic and foreign studies were analyzed using big-data-based network analysis methods: author keyword co-occurrence analysis, paper co-citation analysis, and author co-citation analysis. The analysis of research topics confirmed that Korea needs additional studies on effects on the human body and on the treatment of microplastics in daily life. The analysis of citation depth, which examines research quality, gave 2.25 for overseas research and 1.39 for Korea, indicating that Korea still lags. In the analysis of the composition of the joint research front, where various researchers participate and share information, 3 of the 22 clusters in Korea are star type, whereas all 19 overseas clusters have a mesh structure, confirming that information flow and sharing are insufficient in specific research fields in Korea. These results confirm the need to expand microplastics research topics, improve research quality, and improve the research promotion system in which various researchers participate. In addition, if an automation program based on topic modeling is developed, it will be possible to build a system capable of real-time analysis.
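Author keyword co-occurrence analysis starts from counting how often two keywords are assigned to the same paper. The following Python sketch shows that counting step only, under the assumption that each paper is given as a list of author keywords; the function name and the sample keywords are hypothetical and are not taken from the study.

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(papers):
    """Count how often each unordered keyword pair appears in the same paper.

    papers: iterable of keyword lists, one list per paper. Keywords are
    lower-cased and de-duplicated within a paper before pairs are formed.
    """
    counts = Counter()
    for keywords in papers:
        unique = sorted({k.strip().lower() for k in keywords if k.strip()})
        for pair in combinations(unique, 2):
            counts[pair] += 1
    return counts


# Hypothetical author keywords for three papers.
cooc = keyword_cooccurrence([
    ["Microplastics", "Toxicity"],
    ["microplastics", "toxicity", "Wastewater"],
    ["Wastewater", "microplastics"],
])
```

The resulting pair counts can be used as edge weights of a keyword network, on which clustering or centrality measures are then computed.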

Concept-based Question Analysis for Accurate Answer Extraction (정확한 해답 추출을 위한 개념 기반의 질의 분석)

  • Shin, Seung-Eun;Kang, Yu-Hwan;Ahn, Young-Min;Park, Hee-Guen;Seo, Young-Hoon
    • The Journal of the Korea Contents Association / v.7 no.1 / pp.10-20 / 2007
  • This paper describes a concept-based question analysis method that analyzes concepts, which matter more than keywords, for accurate answer extraction. The idea is that well-defined concepts make it possible to extract correct answers from paragraphs with differing structures, because the concepts that occur in questions of the same answer type are similar. That is, we analyze the syntactic and semantic role of each word or phrase in a question in order to retrieve more relevant documents and extract more accurate answers from them. For each answer type, we define a concept frame composed of the concepts that commonly occur in that type of question, and we analyze a user's question by filling the concept frame with words or phrases. Empirical results show that our concept-based question analysis extracts more accurate answers than any other conventional approach. The concept-based approach has the additional merits that it is a language-universal model and can be combined with arbitrary conventional approaches.

Data value extraction through comparison of online big data analysis results and water supply statistics (온라인 빅 데이터 분석 결과와 상수도 통계 비교를 통한 데이터 가치 추출)

  • Hong, Sungjin;Yoo, Do Guen
    • Proceedings of the Korea Water Resources Association Conference / 2021.06a / pp.431-431 / 2021
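  • With the arrival of the Fourth Industrial Revolution, there is strong interest in extracting value through data analysis for the planning, operation, and management of infrastructure. In the open government data index, which evaluates data availability, accessibility, and government support, Korea scored 0.93 out of 1 and ranked first among OECD member countries (as of 2019), well above the average of 0.60. However, access to officially published and distributed infrastructure information, and to information that requires in-depth research and analysis, remains limited. In particular, water supply systems, a representative type of infrastructure, are mostly designated as national critical facilities, which constrains the acquisition and analysis of diverse information, and the related national statistics, the Water Supply Statistics, do not provide details such as the location and cause of abnormal events like leakage accidents. In this study, web crawling and big data analysis techniques were used to survey all news about water supply leakage accidents in a local government over a past period, and the resulting accident count was compared with the officially recognized Water Supply Statistics. Extracting independent leakage accident articles requires removing duplicate articles, establishing leak-related keywords, and removing related articles from outside the water supply field; these steps were implemented in R. In addition, natural language processing based information extraction was applied to the news articles to obtain not only the number of leakage accidents but also the date, location, cause, extent of damage, and size of the affected pipe, showing a way to extract and link more value than the information given in the Water Supply Statistics. When the methodology was applied to metropolitan city A in Korea, news article analysis yielded a number of accidents similar in scale to the number of leaks reported in the Water Supply Statistics. Because it enables the extraction of additional information, the proposed methodology is expected to be highly useful in the future.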
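The article filtering steps (duplicate removal, leak keyword matching, exclusion of articles outside the water supply field) can be sketched as below. The study implemented these steps in R; this Python version is only an illustrative stand-in. The keyword lists, the article fields `title` and `body`, and the function names are all hypothetical, and simple substring matching is assumed.

```python
import re

# Hypothetical keyword lists; the study's actual keywords are not given.
LEAK_KEYWORDS = ("leak", "burst")
EXCLUDE_KEYWORDS = ("gas", "oil")


def normalize(text):
    """Collapse whitespace and lower-case so near-identical copies match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def filter_leak_articles(articles):
    """Keep unique articles that mention a leak and no excluded topic.

    articles: list of dicts with 'title' and 'body' strings.
    """
    seen = set()
    kept = []
    for article in articles:
        text = normalize(article["title"] + " " + article["body"])
        if text in seen:
            continue  # step 1: drop duplicate articles
        seen.add(text)
        if not any(k in text for k in LEAK_KEYWORDS):
            continue  # step 2: require a leak-related keyword
        if any(k in text for k in EXCLUDE_KEYWORDS):
            continue  # step 3: drop articles outside the water supply field
        kept.append(article)
    return kept


# Hypothetical news items.
sample = [
    {"title": "Water main leak floods road", "body": "A 300 mm pipe burst downtown."},
    {"title": "Water main leak  floods road", "body": "A 300 mm pipe burst downtown."},
    {"title": "Gas leak reported", "body": "A gas line leak closed the street."},
    {"title": "City budget approved", "body": "Council passes budget."},
]
kept = filter_leak_articles(sample)
```

Exact-match de-duplication after normalization is the simplest choice; syndicated articles that were lightly edited would need a fuzzier similarity test.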

Semantic Web based Multi-Dimensional Information Analysis System on the National Defense Weapons (시맨틱 웹 기반 국방무기 다차원 정보 분석 시스템)

  • Choi, Jung-Hwoan;Park, Jeong-Ho;Kim, Pyung;Lee, Seungwoo;Jung, Hanmin;Seo, Dongmin
    • The Journal of the Korea Contents Association / v.12 no.11 / pp.502-510 / 2012
  • As defense science and technology advance, smart weapons are being developed continually. Collecting and analyzing information on future strategic weapons from around the world has become a greater priority as information sharing has become more active, so a system that manages and analyzes heterogeneous defense intelligence is required. The Semantic Web is a next-generation knowledge management technology for integrating, searching, and navigating heterogeneous knowledge resources, and it has recently been widely used in intelligent information management systems. Because it supports semantic information retrieval as well as simple keyword search, the Semantic Web supports highly reliable analysis. In this paper, we propose a Semantic Web based multi-dimensional information analysis system for national defense weapons that constructs an ontology for weapon information such as weapon specifications, nations, manufacturers, and technologies, and that searches for and analyzes specific weapons based on the ontology. The proposed system supports semantic search and multi-dimensional information analysis based on the relations between weapon specifications. It also improves the efficiency of acquiring smart weapon information because it is built on an ontology based on military experts' knowledge and on various weapon-related web documents, together with an intelligent search service.

Design and Implementation of Information Retrieval System Based on Ontology Using Semantic Web (시맨틱 웹을 이용한 온톨로지 기반의 정보검색 시스템 설계 및 구현)

  • Seo, Woo-Jin;Rhyu, Kyeong-Taek
    • Journal of Digital Convergence / v.17 no.1 / pp.209-217 / 2019
  • The purpose of this paper is to lay the foundation for a search system by building and using an online search engine suited to the search domain that enables search, conversion, integration, and sharing of information. The ontology is used to infer hierarchical relationships, deduce objects based on that hierarchy, and extract attributes in order to search the areas relevant to the data the user wants. To search for information in this way, an information retrieval system was implemented that takes keywords related to 'qualifications' as input. The implemented system arranges the meaning and relationships of each attribute online so that the general public can search for information quickly, easily, and accurately. The implementation results were compared with two other search engines, Naver and Daum, the two major search engines. The search engine of this study, built with an ontology suited to the search domain and performing searches using the Semantic Web, was evaluated as producing excellent results. However, a more formalized online location is thought to be necessary to increase the accuracy and reliability of search engines and to cover more comprehensive categories of search terms.

A Study on an Efficient e-learning Content Creation and Maintenance Method (효과적인 e-learning 콘텐츠 생성 및 관리기법에 관한 연구)

  • Cho, Soo-Hyun;Kim, Young-Hak;Kim, Myoung-Hwan
    • Journal of the Korea Society of Computer and Information / v.13 no.3 / pp.15-25 / 2008
  • Recently, with the growing use of e-learning, instructors develop new online courses using a variety of content and store the results on their computers. This content must be updated with new information over time, and new content can also be produced by reusing it. However, instructors need a lot of time to search, edit, and manage content stored in various places on their computers. Currently, e-learning content management tools that perform these functions efficiently in a PC environment leave much to be desired. Therefore, this paper proposes an e-learning content creation and management system that efficiently manages a variety of content stored in different locations on an instructor's computer and makes it easy to develop new online courses. The proposed system can be widely used by instructors to develop content in a PC environment. For performance evaluation, the proposed system was compared with a previous system in terms of the retrieval time for content keywords, and the experiment showed that our system is much better than the previous one.

Optical Character Recognition based Security Document Image File Management System (광학문자인식 기반 보안문서 이미지 파일 관리 시스템)

  • Jeong, Pil-Seong;Cho, Yang-Hyun
    • Journal of the Korea Convergence Society / v.10 no.3 / pp.7-14 / 2019
  • With the development of information and communication technology, documents containing corporate information can be accessed and managed anytime and anywhere using smart devices. As the work environment shifts to smart work, the scope of information distribution expands, and more effort is needed to manage security. This paper proposes a file sharing system that enables users of smart devices to manage and share files through mutual cooperation. In the proposed system, a user uploading a file can add a partner to share files with; the algorithm splits the file so that one part is kept by the users and the other part is stored on the server. The file to be uploaded is converted to base64, split into encrypted files among the users, and then transmitted to the server when it is to be shared. Files are viewed with a dedicated application, which makes them easy to manage and control and provides high security. Using the system developed with the proposed algorithm, even SMEs (small and medium-sized enterprises) that cannot spend much on security can build a highly efficient system.
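The base64 conversion and splitting step can be illustrated with a short Python sketch. It is a simplified assumption-laden example, not the paper's algorithm: the file is encoded with the standard library `base64` module and cut into roughly equal parts, and the encryption of each part described in the abstract is deliberately omitted. The function names are hypothetical.

```python
import base64


def split_file_bytes(data, parts):
    """Base64-encode data and cut the result into `parts` chunks.

    Returns a list of byte strings. Encryption of the chunks, which the
    abstract mentions, is not shown here.
    """
    if parts < 1:
        raise ValueError("parts must be at least 1")
    encoded = base64.b64encode(data)
    size = -(-len(encoded) // parts)  # ceiling division
    return [encoded[i * size:(i + 1) * size] for i in range(parts)]


def join_file_parts(chunks):
    """Reassemble the chunks in order and decode back to the original bytes."""
    return base64.b64decode(b"".join(chunks))


# Hypothetical payload split between a user device and a server.
original = b"confidential report"
chunks = split_file_bytes(original, 2)
restored = join_file_parts(chunks)
```

Base64 encoding by itself provides no confidentiality; in a scheme like the one described, security would come from encrypting the parts and from no single party holding all of them.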

The implementation of the depth search system for relations of contents information based on Ajax (콘텐츠 정보의 연관성을 고려한 Ajax기반의 깊이 검색 시스템 구현)

  • Kim, Woon-Yong;Park, Seok-Gyu
    • Journal of Advanced Navigation Technology / v.12 no.5 / pp.516-523 / 2008
  • Recently, the Web has been built on collective intelligence and is growing quickly. User-created content has become mainstream in this environment, so an efficient technique for searching that content is required. Current search techniques rely mainly on keywords. The Semantic Web, based on the similarity and relationships of language, and the use of user tags in Web 2.0 have also been actively researched. In general, the participatory web contains a large amount of user-created content in various forms and classifications. It is therefore necessary to classify this content and search it efficiently. In this paper, we propose a depth search technique that considers the relationships among the tags that describe user content. The proposed technique is expected to reduce the time spent searching through unwanted content and to increase the efficiency of content search by using a word suggestion service over tag groups.

Intelligent Spam-mail Filtering Based on Textual Information and Hyperlinks (텍스트정보와 하이퍼링크에 기반한 지능형 스팸 메일 필터링)

  • Kang, Sin-Jae;Kim, Jong-Wan
    • Journal of the Korean Institute of Intelligent Systems / v.14 no.7 / pp.895-901 / 2004
  • This paper describes a two-phase intelligent method for filtering spam mail based on textual information and hyperlinks. Since the body of a spam mail contains little text, it provides insufficient hints for distinguishing spam from legitimate mail. To resolve this problem, we follow the hyperlinks contained in the email body, fetch the contents of the remote web pages, and extract hints (i.e., features) from both the original email body and the fetched web pages. We divide the hints into two kinds of information: definite information (sender's information and definite spam keyword lists) and less definite textual information (words or phrases, and particular features of the email). In filtering spam mail, definite information is used first, and then less definite textual information is applied. In our experiment, the method that fetches web pages improved the F-measure by 9.4% over the method that uses only the original email header and body.
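The two-phase order (definite information first, less definite textual information second) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' system: the blocked sender set, the keyword list, the `score_text` callable standing in for a trained text classifier, and the 0.5 threshold are all hypothetical, and fetching of linked web pages is left out (the `text` argument is assumed to already contain the email body plus any fetched page text).

```python
def classify_mail(sender, text, blocked_senders, spam_keywords, score_text,
                  threshold=0.5):
    """Classify a mail as 'spam' or 'legitimate' in two phases.

    Phase 1 uses definite information: a known spam sender or a definite
    spam keyword decides immediately. Phase 2 applies a less definite text
    score, supplied by the caller as score_text(text) -> float in [0, 1].
    """
    # Phase 1: definite information.
    if sender.lower() in blocked_senders:
        return "spam"
    lowered = text.lower()
    if any(keyword in lowered for keyword in spam_keywords):
        return "spam"
    # Phase 2: less definite textual information.
    return "spam" if score_text(text) >= threshold else "legitimate"


# Hypothetical stand-in for a trained classifier.
def toy_score(text):
    return 0.9 if "free" in text.lower() else 0.1


blocked = {"bulk@example.com"}
keywords = {"lottery winner"}
r1 = classify_mail("bulk@example.com", "Hello", blocked, keywords, toy_score)
r2 = classify_mail("amy@example.com", "Meeting at noon", blocked, keywords, toy_score)
r3 = classify_mail("amy@example.com", "Get it FREE today", blocked, keywords, toy_score)
```

Checking definite information first keeps the cheap, high-precision rules ahead of the statistical step, so the text scorer only runs on mails that the rules cannot decide.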