Search | Korea Science

BERT Sparse: Keyword-based Document Retrieval using BERT in Real time (BERT Sparse: BERT를 활용한 키워드 기반 실시간 문서 검색)

Kim, Youngmin;Lim, Seungyoung;Yu, Inguk;Park, Soyoon
- Annual Conference on Human and Language Technology / 2020.10a / pp.3-8 / 2020
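Document retrieval is one of the important, long-studied areas of natural language processing. BM25, one of the existing keyword-based retrieval algorithms, has clear performance limits, while deep-learning-based semantic retrieval algorithms lose information when a document is compressed into a vector. We therefore propose a new document retrieval model, BERT Sparse. BERT Sparse matches documents using the keywords contained in the query, but it uses BERT when encoding documents so that the context and meaning of the query are also reflected, with the aim of overcoming the limits of existing keyword-based retrieval algorithms. The retrieval speed of BERT Sparse is similar to that of keyword-based models such as BM25, which is fast enough for real-time service, and its Recall@5 is 93.87%, 19% higher than the BM25 algorithm. Finally, combining BERT Sparse with an MRC model yielded an F1 score of 81.87% in an open-domain QA setting.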
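
For orientation, the BM25 baseline and the Recall@5 metric named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the parameter values `k1=1.5` and `b=0.75`, the non-negative IDF variant, the single-gold-document reading of recall, and the toy documents are all assumptions.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per document for a tokenized query."""
    n_docs = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs_tokens for term in set(d))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # Non-negative IDF variant (assumed here).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def recall_at_k(ranked_ids, gold_id, k=5):
    """1.0 if the gold document is among the top-k ranked ids, else 0.0."""
    return 1.0 if gold_id in ranked_ids[:k] else 0.0

# Toy corpus (hypothetical), already tokenized by whitespace.
docs = [
    "bert encodes documents with context".split(),
    "bm25 ranks documents by keyword overlap".split(),
    "storm processes streams in real time".split(),
]
scores = bm25_scores("keyword documents".split(), docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

Averaging `recall_at_k` over a set of queries gives a corpus-level Recall@k under this reading of the metric.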

Hierarchical extended Chord to Support Efficient Keyword Search (효율적인 키워드 검색을 지원하기 위한 계층적 확장 Chord)

이승은;진명희;김경석
- Proceedings of the Korean Information Science Society Conference / 2004.10c / pp.574-576 / 2004
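Current research on peer-to-peer (P2P) file-sharing systems focuses on how to efficiently locate the node that stores a particular data item. DHT-based structures are scalable and support exact-match queries, but they handle queries that are not exact matches inefficiently. This paper proposes a mechanism for providing scalable keyword-based file search in DHT-based P2P file-sharing systems. The proposed structure introduces high-capacity "super peers" to reduce the excessive storage consumption and network traffic caused by common keywords that are shared by many files.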

Keyword Spotting on Hangul Document Images Using Character Feature Models (문자 별 특징 모델을 이용한 한글 문서 영상에서 키워드 검색)

Park, Sang-Cheol;Kim, Soo-Hyung;Choi, Deok-Jai
- The KIPS Transactions:PartB / v.12B no.5 s.101 / pp.521-526 / 2005
In this paper, we propose a keyword spotting system as an alternative search system for poor-quality Korean document images and compare the proposed system with an OCR-based document retrieval system. The system consists of character segmentation, feature extraction for the query keyword, and word-to-word matching. In the character segmentation step, we propose an effective method for removing the connectivity between adjacent characters and a character segmentation method that minimizes the variance of character widths. In the query creation step, the feature vector for the query is constructed by combining character models by typeface. In the matching step, word-to-word matching is applied based on character-to-character matching. We demonstrate that the proposed keyword spotting system is more efficient than the OCR-based one for searching for a keyword in Korean document images, especially when document quality is quite poor and the point size is small.
https://doi.org/10.3745/KIPSTB.2005.12B.5.521

Real-Time SNS Data Analysis System Based on Storm (스톰을 기반으로 한 실시간 SNS 데이터 분석 시스템)

Lee, Hyeon-Gyeong;Go, Gi-Cheol;Son, Yeong-Seong;Kim, Jong-Bae
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2015.05a / pp.435-436 / 2015
Businesses place increasing importance on SNS in order to analyze and maximize advertising efficiency. In particular, Hadoop-based keyword extraction analysis has received attention. Existing keyword extraction analyses mostly rely on MapReduce processing. As a result, the database cannot be updated in real time, as an SNS system requires. In this study, we point out the limitations of the existing model and propose a new model that uses Storm to analyze data in real time.

Software Testing by a keyword driven test automation method and Effects (키워드 기반 자동 테스트 구현 및 적용 사례)

이영석;하영민
- Proceedings of the Korean Information Science Society Conference / 2001.04a / pp.604-606 / 2001
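Because change is inherent to software, simply applying commercial test tools rarely delivers the practical benefits of automated testing. To address this problem, various automated testing techniques that are unaffected by change have been tried. The most notable of these is keyword-driven automated testing (Keyword Driven Automated Test), whose ultimate goal is to make test assets easier to maintain as the target software changes. Keyword-driven automated testing is more efficient across the whole process, from building and running test assets to maintaining them in response to changes, and it can be reused for testing other projects and products by adding or redefining only a few functions. To apply keyword-driven automated testing, the tests must be developed with an existing test tool, using the tool's own programming language, to fit the target software.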
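
The general idea of keyword-driven testing can be sketched as a table of keyword rows interpreted by a small runner. The abstract does not describe the authors' tool or keyword set, so the `Calculator` application, the keyword names, and the `run_test` helper below are all hypothetical.

```python
# Hypothetical application under test: a tiny in-memory calculator.
class Calculator:
    def __init__(self):
        self.value = 0

    def add(self, n):
        self.value += n

    def clear(self):
        self.value = 0

# Keyword implementations: the only code that touches the application.
def kw_enter(app, amount):
    app.add(int(amount))

def kw_reset(app):
    app.clear()

def kw_verify(app, expected):
    assert app.value == int(expected), f"expected {expected}, got {app.value}"

# Keyword table (hypothetical names). If the application changes,
# only these functions need updating; the test rows stay the same.
KEYWORDS = {"Enter": kw_enter, "Reset": kw_reset, "Verify": kw_verify}

def run_test(rows, app):
    """Execute each (keyword, *args) row in order; return the number of steps run."""
    for keyword, *args in rows:
        KEYWORDS[keyword](app, *args)
    return len(rows)

# A test case is plain data: one keyword and its arguments per row.
test_case = [
    ("Enter", "2"),
    ("Enter", "3"),
    ("Verify", "5"),
    ("Reset",),
    ("Verify", "0"),
]
steps = run_test(test_case, Calculator())
```

Keeping test cases as data rows is what makes them cheap to maintain: a change in the target software is absorbed by the keyword functions rather than by every test.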

Presenting the possibility of using water pipe network data through R-based data mining analysis (R기반 데이터마이닝 분석을 통한 상수관망 자료 활용가능성 제시)

Hong, Sung Jin;Lee, Chan Wook;Yoo, Do Guen
- Proceedings of the Korea Water Resources Association Conference / 2020.06a / pp.236-236 / 2020
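Data mining is a technique widely used to exploit big data. As the importance of big data has grown, data mining has been applied successfully in production, finance, telecommunications, and other fields, but applications to water supply facilities are rare. In this study, we used R to collect related news articles in order to obtain data that is otherwise hard to secure, and performed classification and clustering (K-means) analysis, which are key data mining functions. For example, precise leak analysis of water pipes requires detailed data such as pipe diameter and installation year, but such data is not easy to obtain. From this perspective, we performed cluster analysis on the main keywords of articles retrieved with keywords such as water outage and leakage in water pipe networks, and thereby obtained and analyzed detailed pipe network data. The results show that articles found with the outage and leakage keywords can yield information on damaged pipes, such as pipe diameter, and the approach is expected to serve as one way to supplement hard-to-obtain data. However, further research is needed on the amount of data and on keyword selection for more refined cluster analysis.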
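
The K-means clustering step mentioned above can be illustrated with a small sketch. The study itself used R; Python is used here only for illustration, and the per-article keyword counts, the choice of `k=2`, and the fixed initial centroids are assumptions rather than details from the paper.

```python
def kmeans(points, k, iterations=20):
    """Lloyd's algorithm, using the first k points as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical per-article counts of the keywords ("leak", "outage").
articles = [(5, 0), (0, 4), (4, 1), (1, 5), (6, 1), (0, 6)]
labels, centroids = kmeans(articles, k=2)
```

With these toy counts, articles dominated by "leak" and articles dominated by "outage" fall into separate clusters; real article data would need a larger keyword vocabulary and a less arbitrary initialization.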

Keyword-Based Query Translation using Ontology Structure (온톨로지 구조를 활용한 키워드 기반 질의 변환)

Song, Hyun-Je;Noh, Tae-Gil;Park, Seong-Bae;Park, Se-Young
- Journal of KIISE:Computing Practices and Letters / v.15 no.12 / pp.953-957 / 2009
This paper proposes a keyword-based query translation system for the semantic web. Using the relationship between keywords and ontology structure information, the system converts keyword-based queries into queries written in a formal query language appropriate for the semantic web. As a result, casual web users can not only express queries easily but also obtain better results.

A Study on the Performance Evaluation of Semantic Retrieval Engines (시맨틱검색엔진의 성능평가에 관한 연구)

Noh, Young-Hee
- Journal of the Korean BIBLIA Society for Library and Information Science / v.22 no.2 / pp.141-160 / 2011
This study proposed a knowledge base and search engine for libraries that hold large-scale data. For this purpose, three knowledge-base components (triple ontology, concept-based knowledge base, inverted file) were constructed, and three search engines (the Jena search engine for rule-based reasoning, a concept-based search engine, and a keyword-based Lucene retrieval engine) were implemented to measure their performance. The concept-based retrieval engine showed the best performance, followed by the ontology-based Jena retrieval engine and then the ordinary keyword search engine.
https://doi.org/10.14699/kbiblia.2011.22.2.141

Design of an Efficient Keyword-based Retrieval System Using Concept lattice (개념 망을 이용한 키워드 기반의 효율적인 정보 검색 시스템 설계)

Ma, Jin;Jeon, In ho;Choi, Young keun
- Journal of Internet Computing and Services / v.16 no.3 / pp.43-57 / 2015
This paper proposes a method for efficient information retrieval using concept lattices. Because the system is designed on ordinary concept lattices, its approach is the same as that of ontologies, but it proposes new concept lattices that establish collaborative relations between objects and concepts so that users can search for information more efficiently. The proposed system can be summarized as follows. First, it supports collaborative search using three kinds of concept lattices: keyword concept lattices, which focus on the input keywords; expert concept lattices, recommended by experts; and theme concept lattices. Based on these three lattices, it helps users find the information they want more efficiently. In addition, by combining the expert concepts with the keyword concepts and providing users with keyword and category frequencies, the system can recommend keywords related to the search terms users enter. It can also use the theme concept lattices to inform users of the keywords and categories used in the themes they are interested in. Second, when a keyword entered by the user is not present, the system provides keywords related to the input keyword so that the user can still reach the search goal through a secondary search. Third, most information is managed in a distributed manner, so it not only uses different forms of expression but also changes over time. Accordingly, this paper uses XMDR for efficient data access and integration of distributed information, and proposes a new technique and retrieval system for integrating dispersed data.
https://doi.org/10.7472/jksii.2015.16.3.43

Keyword Extraction Using Syntactic Information of Question (질의문의 구문정보를 이용한 키워드 추출)

양수정;서영훈
- Proceedings of the Korea Contents Association Conference / 2003.11a / pp.190-194 / 2003
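Keywords extracted from a natural-language question often differ in how much they contribute to answer extraction, yet it is difficult to assign them relative weights. To solve this problem, this paper uses the syntactic information of the question sentence to divide keywords into central keywords and general keywords, and proposes a method for weighting keywords on that basis. Question types are analyzed from a question corpus to extract syntactic patterns, and the extracted syntactic information is used to extract keywords from a question. With these keywords, a Boolean search over central and general keywords across a large document collection extracts the paragraphs likely to contain the answer, and the paragraphs are then ranked by measuring the similarity between the question and each extracted paragraph. The proposed system is expected to improve the accuracy of extracting paragraphs that contain the answer.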
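
The filter-then-rank pipeline described above might look roughly like the sketch below. The abstract does not give the Boolean query form, the weights, or the similarity measure, so the rule that every central keyword must appear, the weights `2.0` and `1.0`, and the weighted-overlap score are assumptions made purely for illustration.

```python
def retrieve(paragraphs, central, general, w_central=2.0, w_general=1.0):
    """Boolean-filter paragraphs on central keywords, then rank by weighted overlap."""
    ranked = []
    for idx, text in enumerate(paragraphs):
        tokens = set(text.lower().split())
        # Boolean filter (assumed): every central keyword must be present.
        if not set(central) <= tokens:
            continue
        score = (
            w_central * len(set(central) & tokens)
            + w_general * len(set(general) & tokens)
        )
        ranked.append((score, idx))
    # Highest score first; ties keep the original paragraph order.
    ranked.sort(key=lambda item: (-item[0], item[1]))
    return [idx for _, idx in ranked]

# Toy paragraphs and keyword split (hypothetical).
paragraphs = [
    "the capital of korea is seoul",
    "seoul hosted the olympics in 1988",
    "korea has a capital city",
]
order = retrieve(paragraphs, central=["capital"], general=["korea", "seoul"])
```

Here the second paragraph is dropped by the Boolean filter because it lacks the central keyword, and the remaining paragraphs are ordered by how many weighted keywords they contain.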

Search Result 1,104, Processing Time 0.039 seconds
