• Title/Summary/Keyword: BM25 algorithm

Search results: 8

A Research on Enhancement of Text Categorization Performance by using Okapi BM25 Word Weight Method (Okapi BM25 단어 가중치법 적용을 통한 문서 범주화의 성능 향상)

  • Lee, Yong-Hun; Lee, Sang-Bum
    • Journal of the Korea Academia-Industrial cooperation Society / v.11 no.12 / pp.5089-5096 / 2010
  • Text categorization, which classifies documents according to given criteria, is one of the important features of an information retrieval system. Categorization methods generally classify target documents by extracting important index words and assigning weights to them. The choice of weighting algorithm therefore matters greatly, since the performance and accuracy of text categorization depend on it. In this paper, we introduce an enhanced text categorization method that improves the word-weighting technique. Okapi BM25 has proven effective in several information retrieval engines; we applied it to categorization and showed that it performs well. It is compared with other word-weighting methods: TF-IDF, TF-ICF, and TF-ISF. The experiments use the Reuters-21578 collection with SVM and KNN classifiers. The modified Okapi BM25 shows the best performance.
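The comparison in this abstract comes down to which weighting function turns a document's term counts into feature values for SVM or KNN. A minimal sketch of the standard Okapi BM25 term weight next to a plain TF-IDF baseline follows. It assumes whitespace-tokenized documents, the common defaults k1 = 1.2 and b = 0.75, and the non-negative IDF variant log(1 + (N - n + 0.5)/(n + 0.5)). The function names and toy documents are hypothetical, and the paper's modified BM25 is not reproduced because the abstract does not describe the modification.

```python
import math
from collections import Counter

def bm25_weights(docs, k1=1.2, b=0.75):
    """Return one {term: BM25 weight} dict per tokenized document.

    IDF uses the non-negative variant log(1 + (N - n + 0.5) / (n + 0.5)).
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    df = Counter(term for d in docs for term in set(d))  # document frequency
    out = []
    for d in docs:
        tf = Counter(d)
        # Length normalization: documents longer than average are damped.
        norm = k1 * (1 - b + b * len(d) / avgdl)
        out.append({
            t: math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5)) * f * (k1 + 1) / (f + norm)
            for t, f in tf.items()
        })
    return out

def tfidf_weights(docs):
    """Plain TF-IDF baseline: raw term frequency times log(N / df)."""
    n_docs = len(docs)
    df = Counter(term for d in docs for term in set(d))
    return [{t: f * math.log(n_docs / df[t]) for t, f in Counter(d).items()} for d in docs]

# Hypothetical toy corpus, not drawn from Reuters-21578.
docs = [
    "oil price rise oil output".split(),
    "wheat crop export".split(),
    "oil export deal".split(),
]
print(bm25_weights(docs)[0])
print(tfidf_weights(docs)[0])
```

Unlike raw TF, the BM25 term-frequency factor f(k1 + 1)/(f + k1(1 - b + b·dl/avgdl)) saturates toward k1 + 1 as f grows, which is the usual argument for why it can produce better classifier features than TF-IDF.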

BERT Sparse: Keyword-based Document Retrieval using BERT in Real time (BERT Sparse: BERT를 활용한 키워드 기반 실시간 문서 검색)

  • Kim, Youngmin; Lim, Seungyoung; Yu, Inguk; Park, Soyoon
    • Annual Conference on Human and Language Technology / 2020.10a / pp.3-8 / 2020
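  • Document retrieval is an important, long-studied area of natural language processing. BM25, one of the existing keyword-based retrieval algorithms, has clear performance limitations, while semantic retrieval algorithms based on deep learning lose information when documents are compressed into vectors. We therefore propose a new document retrieval model called BERT Sparse. BERT Sparse matches documents using the keywords contained in the query, but encodes documents with BERT so that the context and meaning of the query are also reflected, with the aim of overcoming the limitations of existing keyword-based retrieval algorithms. BERT Sparse retrieves about as fast as keyword-based models such as BM25, making real-time service feasible, and reaches 93.87% Recall@5, 19% higher than BM25. Finally, combining BERT Sparse with an MRC model yields an F1 score of 81.87% in an open-domain QA setting.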
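Retrieval quality in this abstract is reported as Recall@5. In open-domain QA this metric is commonly computed as the fraction of queries for which at least one relevant document appears among the top k results. The paper may define it differently (for example, as recall averaged over relevant documents), so treat the following as a sketch under that assumption; the document ids and rankings are hypothetical.

```python
def recall_at_k(ranked_ids, relevant_ids, k=5):
    """Fraction of queries with at least one relevant document in the top k.

    ranked_ids: one list per query of document ids sorted by score, best first.
    relevant_ids: one set per query of gold document ids.
    """
    if len(ranked_ids) != len(relevant_ids):
        raise ValueError("need one ranking and one gold set per query")
    hits = sum(
        1
        for ranking, gold in zip(ranked_ids, relevant_ids)
        if any(doc_id in gold for doc_id in ranking[:k])
    )
    return hits / len(ranked_ids)

# Hypothetical rankings for three queries (not data from the paper).
rankings = [["d3", "d1", "d7"], ["d2", "d9", "d4"], ["d5", "d6", "d8"]]
gold = [{"d1"}, {"d4"}, {"d0"}]
print(recall_at_k(rankings, gold, k=2))
```

Comparing two retrievers, such as BERT Sparse and BM25, amounts to computing this on each model's rankings for the same query set.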

Assistant Chatbot for Database Design Course (데이터베이스 설계 교과목을 위한 조교 챗봇)

  • Kim, Eun-Gyung; Jeong, Tae-Hun
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.11 / pp.1615-1622 / 2022
  • Flipped learning, a learner-centered teaching method, has recently been widely adopted to overcome the limitations of instructor-centered, lecture-style teaching. Despite its many advantages, however, students cannot get answers in real time to questions that arise during pre-class learning. To solve this problem, we developed DBbot, an assistant chatbot for a database design course run with flipped learning. DBbot consists of a chatbot app for learners and a chatbot management app for instructors. Questions that instructors can anticipate, such as those about class operation and content-related questions that recur every semester, are answered using Google's Dialogflow. Questions that instructors cannot anticipate, such as those about team projects, are answered using a question/answer DB and BM25, a similarity-scoring algorithm.
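One plausible reading of the BM25 path is that DBbot ranks the stored questions against the incoming question and returns the answer paired with the best match. A minimal sketch of that lookup follows. It assumes whitespace tokenization (a Korean deployment would more likely use a morphological analyzer), the common defaults k1 = 1.2 and b = 0.75, and a hypothetical rule that hands the question to the instructor when no stored question shares a term with it. The class name and Q/A entries are illustrative, not from the paper.

```python
import math
from collections import Counter

class BM25Index:
    """Minimal Okapi BM25 ranker over tokenized documents (hypothetical helper)."""

    def __init__(self, docs, k1=1.2, b=0.75):
        self.docs = [Counter(d) for d in docs]
        self.lengths = [len(d) for d in docs]
        self.avgdl = sum(self.lengths) / len(docs)
        self.k1, self.b = k1, b
        n = len(docs)
        df = Counter(t for d in docs for t in set(d))
        # Non-negative IDF variant: log(1 + (N - n_t + 0.5) / (n_t + 0.5)).
        self.idf = {t: math.log(1 + (n - c + 0.5) / (c + 0.5)) for t, c in df.items()}

    def score(self, query, i):
        """BM25 score of document i for a tokenized query."""
        tf, dl = self.docs[i], self.lengths[i]
        s = 0.0
        for t in query:
            f = tf.get(t, 0)
            if f:
                denom = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                s += self.idf[t] * f * (self.k1 + 1) / denom
        return s

    def best(self, query):
        """Return (index, score) of the highest-scoring document."""
        scores = [self.score(query, i) for i in range(len(self.docs))]
        i = max(range(len(scores)), key=scores.__getitem__)
        return i, scores[i]

# Hypothetical question/answer DB entries.
qa_db = [
    ("when is the team project proposal due", "Proposals are due in week 6."),
    ("how do i draw an er diagram", "Start by listing entities and their relationships."),
    ("can team members be changed", "Ask the instructor before week 4."),
]
index = BM25Index([q.split() for q, _ in qa_db])
i, s = index.best("team project due date".split())
answer = qa_db[i][1] if s > 0 else None  # no lexical overlap: hand off to the instructor
print(answer)
```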

A BM25 based Passage Retrieval System for Developing an Efficient Question and Answering System (효율적인 질의응답시스템 개발을 위한 BM25기반의 단락 검색 시스템)

  • Lim, Heui Seok; Lee, Yong Shin; Rim, Hae Chang
    • The Journal of Korean Association of Computer Education / v.6 no.4 / pp.23-30 / 2003
  • This paper proposes a passage retrieval system based on Okapi BM25 for developing an efficient QA system and evaluates its performance. The TREC Q&A track test collection, about one million documents, was indexed, and 100 TREC Q&A track queries were used as test queries. The experimental results show that the proposed passage retrieval system reaches 100% recall after searching only 1,700 sentences, whereas a conventional document retrieval system must search about 120,000 sentences, roughly 70 times as many.
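The efficiency gain described here comes from changing the retrieval unit from whole documents to short passages, so BM25 ranks and returns only small spans of text. The abstract does not say how a passage is defined; the sketch below assumes passages are overlapping windows of `size` sentences advanced by `stride` sentences, with a naive regex sentence splitter. The function name and parameter values are hypothetical. Each passage keeps its document id and starting sentence index so a hit can be traced back, and the passages would then be indexed and ranked with BM25 just as documents are.

```python
import re

def split_passages(doc_id, text, size=3, stride=2):
    """Split a document into overlapping passages of `size` sentences.

    Consecutive passages start `stride` sentences apart; requiring
    stride <= size guarantees every sentence lands in some passage.
    """
    if not 1 <= stride <= size:
        raise ValueError("require 1 <= stride <= size")
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    passages = []
    start = 0
    while start < len(sentences):
        passages.append((doc_id, start, " ".join(sentences[start:start + size])))
        if start + size >= len(sentences):
            break  # this window already reaches the end of the document
        start += stride
    return passages

doc = "A one. B two. C three. D four. E five."
for passage in split_passages("doc1", doc):
    print(passage)
```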

Implementation of Battery Management System for Li-ion Battery Considering Self-energy Balancing (셀프에너지 밸런싱을 고려한 리튬이온전지의 Battery Management System 구현)

  • Kim, Ji-Myung; Lee, Hu-Dong; Tae, Dong-Hyun; Ferreira, Marito; Park, Ji-Hyun; Rho, Dae-Seok
    • Journal of the Korea Academia-Industrial cooperation Society / v.21 no.3 / pp.585-593 / 2020
  • To date, 29 energy storage system (ESS) fire accidents have occurred; 22 of them involved ESSs interconnected with renewable energy sources and occurred during the rest period after the lithium-ion batteries had been fully charged, regardless of season. These ESS fires were attributed to thermal runaway caused by the overcharging of a few cells through self-energy balancing: when cells connected in parallel have different states of charge (SOC), current flows unintentionally from the high-SOC cells to the low-SOC cells. Therefore, this paper proposes a novel battery management system (BMS) configuration and operating algorithm to prevent self-energy balancing in ESSs, and presents a hybrid SOC estimation algorithm. Tests of self-energy balancing between aged and normal cells, using the proposed algorithm and BMS, confirmed that this phenomenon can occur. In addition, the proposed BMS configuration is useful and practical for improving the safety of lithium-ion batteries, because the BMS can reliably disconnect the parallel connection of cells if the self-energy balancing current becomes excessively high.
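Two quantities in this abstract can be sketched with a simple model: the SOC the BMS tracks, and the self-energy balancing current between parallel cells. The abstract does not describe its hybrid SOC estimator; one common hybrid pairs coulomb counting with open-circuit-voltage (OCV) correction, and only the coulomb-counting half is shown here. The balancing current uses a first-order model in which each cell is an OCV source in series with its internal resistance, so I = (OCV_a - OCV_b)/(R_a + R_b). The cell values, the 3 A disconnect limit, and the function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Cell:
    ocv: float          # open-circuit voltage, V (held fixed here; really a function of SOC)
    r_internal: float   # internal resistance, ohm
    soc: float          # state of charge, 0.0 to 1.0
    capacity_ah: float  # rated capacity, Ah

def coulomb_count(cell: Cell, current_a: float, dt_s: float) -> float:
    """Integrate current over dt_s seconds into SOC (positive current = charging)."""
    cell.soc += current_a * dt_s / 3600.0 / cell.capacity_ah
    cell.soc = min(max(cell.soc, 0.0), 1.0)  # clamp to the physical range
    return cell.soc

def balancing_current(a: Cell, b: Cell) -> float:
    """Current from cell a into cell b while both rest in parallel (first-order model)."""
    return (a.ocv - b.ocv) / (a.r_internal + b.r_internal)

def should_disconnect(a: Cell, b: Cell, limit_a: float) -> bool:
    """Hypothetical BMS rule: open the parallel link if the balancing current exceeds the limit."""
    return abs(balancing_current(a, b)) > limit_a

# Hypothetical high-SOC normal cell and low-SOC aged cell.
high = Cell(ocv=4.15, r_internal=0.02, soc=0.95, capacity_ah=50.0)
low = Cell(ocv=3.95, r_internal=0.03, soc=0.60, capacity_ah=50.0)
i = balancing_current(high, low)
print(i, should_disconnect(high, low, limit_a=3.0))
coulomb_count(low, i, dt_s=1800)    # low cell is charged by the balancing current
coulomb_count(high, -i, dt_s=1800)  # high cell discharges by the same amount
print(high.soc, low.soc)
```

Holding the OCVs fixed over half an hour overstates the transfer, since the current shrinks as the SOCs converge; a real BMS would update OCV from SOC at each step.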

Implementation of a Sensor Node with Convolutional Channel Coding Capability (컨벌루션 채널코딩 기능의 센서노드 구현)

  • Jin, Young Suk; Moon, Byung Hyun
    • Journal of Korea Society of Industrial Information Systems / v.19 no.1 / pp.13-18 / 2014
  • Sensor nodes are used to monitor and collect environmental data over a wireless sensor network. Wireless sensor networks built from various sensor nodes are drawing attention as a key technology in ubiquitous computing. Sensor nodes have very small memory capacity and limited power, so an energy-efficient strategy is essential. Because sensor nodes operate in the ISM frequency bands, interference from other devices in those bands degrades communication integrity. In this paper, a convolutional code is proposed instead of ARQ for error control in the sensor network. The proposed convolutional code was implemented and its BER performance measured: at fixed transmit powers of -19.2 dBm and -25 dBm, the BER was measured over various communication distances, and the packet loss rate and retransmission rate were calculated from the measured BER. The proposed method reduced the retransmission rate by about 9-12% at -19.2 dBm and 12-19% at -25 dBm.
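The abstract does not give the code's parameters, so the sketch below uses the textbook rate-1/2, constraint-length-3 convolutional code with generators 7 and 5 (octal) as a stand-in, not necessarily the authors' code. It also shows a common way to turn a measured BER into a packet loss rate, PLR = 1 - (1 - BER)^L for an L-bit packet, which assumes independent bit errors; with coding, the BER in that formula would be the post-decoding bit error rate. Decoding (typically Viterbi) is omitted, and the 1000-bit packet size is hypothetical.

```python
def conv_encode(bits, g1=0b111, g2=0b101):
    """Rate-1/2 convolutional encoder, constraint length 3 (generators 7 and 5 octal).

    Two zero tail bits are appended so the encoder ends in the all-zero state.
    """
    state = 0  # the two previous input bits, most recent in the higher position
    out = []
    for bit in list(bits) + [0, 0]:
        reg = (bit << 2) | state                  # [current, previous, one before that]
        out.append(bin(reg & g1).count("1") % 2)  # parity of the taps selected by g1
        out.append(bin(reg & g2).count("1") % 2)  # parity of the taps selected by g2
        state = reg >> 1                          # shift: drop the oldest bit
    return out

def packet_loss_rate(ber, packet_bits):
    """Probability that a packet contains at least one bit error, assuming independent errors."""
    return 1.0 - (1.0 - ber) ** packet_bits

print(conv_encode([1, 0, 1, 1]))  # [1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1]
print(packet_loss_rate(1e-3, 1000))
```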

Usefulness of Deep Learning Image Reconstruction in Pediatric Chest CT (소아 흉부 CT 검사 시 딥러닝 영상 재구성의 유용성)

  • Kim, Do-Hun; Lee, Hyo-Yeong
    • Journal of the Korean Society of Radiology / v.17 no.3 / pp.297-303 / 2023
  • Pediatric computed tomography (CT) examinations often fail or must be repeated because young patients have difficulty cooperating. Deep Learning Image Reconstruction (DLIR) methods offer the potential to obtain diagnostically valuable images while reducing the retest rate in CT examinations of pediatric patients, who are highly sensitive to radiation. In this study, we investigated whether DLIR can reduce artifacts caused by respiration or motion and yield clinically useful images in pediatric chest CT examinations. A retrospective analysis was conducted on chest CT examination data of 43 children under the age of 7 from P Hospital in Gyeongsangnam-do. Images reconstructed using Filtered Back Projection (FBP), Adaptive Statistical Iterative Reconstruction (ASIR-50), and the deep learning algorithm TrueFidelity-Middle (TF-M) were compared. Regions of interest (ROIs) were drawn on the right ascending aorta (AA) and back muscle (BM) in contrast-enhanced chest images, and noise (standard deviation, SD) was measured in Hounsfield units (HU) in each image. Statistical analysis was performed using SPSS (ver. 22.0), comparing the mean values of the three measurements with one-way analysis of variance (ANOVA). The SD values for AA were FBP=25.65±3.75, ASIR-50=19.08±3.93, and TF-M=17.05±4.45 (F=66.72, p=0.00), while the SD values for BM were FBP=26.64±3.81, ASIR-50=19.19±3.37, and TF-M=19.87±4.25 (F=49.54, p=0.00). Post-hoc tests revealed significant differences among the three groups. DLIR using TF-M demonstrated significantly lower noise values than conventional reconstruction methods. Therefore, the deep learning algorithm TrueFidelity-Middle (TF-M) is expected to be clinically valuable in pediatric chest CT examinations by reducing the image-quality degradation caused by respiration or motion.
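The study compares three reconstructions with a one-way ANOVA run in SPSS. The F statistic it reports is the between-group mean square divided by the within-group mean square, and the following sketch computes it directly. The three SD samples are hypothetical stand-ins, since the per-patient measurements are not given. In practice, `scipy.stats.f_oneway` returns both the F statistic and its p-value.

```python
from statistics import mean

def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the given groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical noise SD values (HU) for three reconstructions; not the study's data.
fbp = [25.0, 27.0, 26.0]
asir = [19.0, 20.0, 18.0]
tf_m = [17.0, 16.0, 18.0]
print(one_way_anova_f(fbp, asir, tf_m))
```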

Korean Baseball League Q&A System Using BERT MRC (BERT MRC를 활용한 한국 프로야구 Q&A 시스템)

  • Seo, JungWoo; Kim, Changmin; Kim, HyoJin; Lee, Hyunah
    • Annual Conference on Human and Language Technology / 2020.10a / pp.459-461 / 2020
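  • Professional baseball articles published every day mix many kinds of information, such as game results, various records, and player injuries, so finding the information a user wants is very tedious. In this paper, we propose a Q&A system for the baseball domain that uses document retrieval and machine reading comprehension. Articles are morphologically analyzed, and articles relevant to the user's query are selected by document weights obtained with the BM25 algorithm; answers are then extracted by BERT-based machine reading comprehension trained on KorQuAD 1.0 and a professional baseball question-answering dataset that we built ourselves. Adding the baseball-specific dataset to training improved both F1 score and EM by around 15%.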
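The reported gains are in EM and F1, the standard extractive-QA metrics. A sketch of SQuAD-style scoring follows: answers are lowercased and stripped of ASCII punctuation, EM checks for an exact match, and F1 is computed over overlapping whitespace tokens. This is an assumption for illustration; the official KorQuAD 1.0 evaluation script applies its own normalization for Korean and may compute F1 differently, so its numbers would not necessarily match this code. The example answers are hypothetical.

```python
import string
from collections import Counter

def normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    return " ".join(text.split())

def exact_match(prediction, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))

def f1_score(prediction, gold):
    """Harmonic mean of token precision and recall over normalized whitespace tokens."""
    pred_tokens, gold_tokens = normalize(prediction).split(), normalize(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("Two-run homer!", "two-run homer"))
print(f1_score("a two-run homer in the 9th", "two-run homer"))
```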
