• Title/Summary/Keyword: BM25


A Research on Enhancement of Text Categorization Performance by using Okapi BM25 Word Weight Method (Okapi BM25 단어 가중치법 적용을 통한 문서 범주화의 성능 향상)

  • Lee, Yong-Hun;Lee, Sang-Bum
    • Journal of the Korea Academia-Industrial cooperation Society / v.11 no.12 / pp.5089-5096 / 2010
  • Text categorization, which classifies documents according to given criteria, is an important feature of information retrieval systems. Categorization is generally performed by extracting important index terms from the target documents and assigning weights to them, so the performance and accuracy of text categorization depend heavily on the weighting algorithm. This paper introduces an enhanced text categorization method based on an improved word weighting technique. Okapi BM25 has proven effective in several information retrieval engines. We applied Okapi BM25 to categorization and obtained good performance. It is compared with other word weighting methods: TF-IDF, TF-ICF, and TF-ISF. The experiments use the Reuters-21578 collection with SVM and KNN classifiers. The modified Okapi BM25 shows the best performance.
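As a rough illustration of the kind of word weight this abstract refers to, the sketch below computes a commonly used form of the Okapi BM25 term weight. The abstract does not say how the authors modified BM25, so this is not their method; the function name, the smoothed IDF variant, and the defaults `k1=1.2` and `b=0.75` are assumptions chosen for illustration.

```python
import math


def bm25_weight(tf, df, n_docs, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Weight of one term in one document under a common BM25 form.

    tf          -- occurrences of the term in the document
    df          -- number of documents containing the term
    n_docs      -- total number of documents in the collection
    doc_len     -- length of this document in tokens
    avg_doc_len -- average document length in the collection
    k1, b       -- assumed defaults, not values taken from the paper
    """
    # Smoothed IDF (the "+ 1.0" keeps the value positive for frequent terms).
    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
    # Term-frequency saturation with document-length normalization.
    denom = tf + k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / denom
```

Unlike plain TF-IDF, the weight saturates as `tf` grows and is damped for documents longer than average, which is one reason it can be used as a drop-in feature weight for classifiers such as SVM or KNN.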

The Burst Effect Analysis of 2.5 Gb/s TDM-PON Systems Using a SOA Link Extender (반도체광증폭기로 전송거리 확장된 2.5 Gb/s TDM-PON에서 버스트 효과에 의한 신호왜곡 분석)

  • Choi, Bo-Hun;Lee, Sang Soo
    • Korean Journal of Optics and Photonics / v.23 no.1 / pp.6-11 / 2012
  • A bidirectional TDM-PON link supporting 2.5 Gb/s upstream signals from 256 ONUs was considered for an extended transmission distance of 50 km. The power budget of the link was 58 dB for the upstream signal, and an SOA with a 25 dB gain was applied as a link extender. The receiver sensitivity for the upstream signal was -25 dBm at an SOA input power of -30 dBm. When the input power was -10 dBm, the pulse overshoot caused by the gain transient of the SOA reached a maximum of 45%, and the resulting signal degradation gave a power penalty of 1.55 dB at a BER of $10^{-12}$. However, the penalties diminished rapidly and became negligible as the input power went below -15 dBm. This input power dynamic range of up to -15 dBm means that gain control methods are not strictly necessary for next-generation TDM-PON systems.

Measurement of Electromagnetic Wave for the Selection of Certification Test Space at GSM Band (GSM 대역용 휴대전화 인증 시험 공간 확보를 위한 전파 환경 측정)

  • Park, Chul-Keun;Min, Kyeong-Sik
    • The Journal of Korean Institute of Electromagnetic Engineering and Science / v.18 no.9 / pp.1030-1038 / 2007
  • This paper presents measurements of electromagnetic wave strength in the GSM-900/GSM-1800 bands used in Europe. Based on measurements in Gijang-gun, Busan, Giryong village and Sosan field were selected as candidate sites. At both candidate sites, the vertical polarization is about 12 dBm higher than the horizontal polarization, and the measured level is 25 dBm lower than in urban areas. The maximum measured strength of the vertical polarization in the cellular/GSM-900 bands is -65 dBm at Giryong village and -69 dBm at Sosan field. The maximum measured values in the PCS/GSM-1800 bands are -90.5 dBm at Giryong village and -85 dBm at Sosan field. We confirm that the received signal strength at both candidate sites is very weak, at -65 dBm or lower, and that signals in the GSM frequency bands do not affect conventional systems; the sites are therefore considered suitable for GSM mobile field tests.

A Model for Minimum Price Search of Processed Food Items on Online Platforms Based on Quantity and Weight (온라인 가공식품의 수량과 중량에 따른 최저가격 검색 모델)

  • Tae-Min Choi;Heui-Seok Lim
    • Proceedings of the Korea Information Processing Society Conference / 2023.11a / pp.458-460 / 2023
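  • In the specific domain of processed foods, it is difficult to search for the lowest price using only BM25, which is widely used in existing search engines. To improve search accuracy beyond BM25, this paper proposes a search system in which KoELECTRA, publicly available on HuggingFace, is fine-tuned for named entity recognition (NER) and binary classification and then combined with BM25. Its effectiveness was verified through a performance evaluation against the existing BM25.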

Characteristics of Photon Beam through a Handmade Build-Up Modifier as a Substitute of a Bolus (Bolus를 대체하기 위해 자체 제작된 선량상승영역 변환기를 투과한 광자선의 특성)

  • Kim, Sung Joon;Lee, Seoung Jun;Moon, Su Ho;Seol, Ki Ho;Lee, Jeong Eun
    • Progress in Medical Physics / v.25 no.4 / pp.225-232 / 2014
  • We evaluated the effect of scatter on the build-up region based on the measured percent depth dose (PDD) of high-energy photon beams that penetrated a handmade build-up modifier (BM) used as a substitute for a bolus. BM scatter factors ($S_{BM}$) were calculated from the PDDs of photon beams that penetrated the BM. The calculated $S_{BM}$ values were normalized to 1 at a square field side (SFS) of 30 mm without a BM. For the largest SFS (200 mm), the $S_{BM}$ values for a 6-MV beam were 1.331, 1.519, 1.598, 1.641, and 1.657 for the corresponding BM thickness values. For a 10-MV beam, the $S_{BM}$ values were 1.384, 1.662, 1.825, 1.913, and 2.001 for the corresponding BM thickness values. The BM yielded 76% of the bolus efficiency. We expect the BM to be a useful device for deep-set body parts to which it is difficult to apply a bolus.

Design and Fabrication of a HBT Power Amplifier for Quasi Millimeter-wave Broadband Wireless Local Loop Applications (준밀리미터파 BWLL용 HBT 전력증폭기 설계 및 제작)

  • 김창우;채규성
    • The Journal of Korean Institute of Communications and Information Sciences / v.27 no.3C / pp.234-240 / 2002
  • A power amplifier with AlGaAs/InGaAs/GaAs HBTs has been developed for customer premises equipment of the quasi millimeter-wave broadband wireless local loop (BWLL) system. Parameters of the linear and nonlinear equivalent circuits for a common-base HBT have been extracted by a fitting method. The amplifier has been designed through linear and nonlinear circuit simulations and fabricated on a ceramic substrate as a hybrid IC. The amplifier has produced a 25.5-dBm output power with 35% power-added efficiency (PAE) at 24.4 GHz and achieved a 7.5-dB linear power gain at 24.8 GHz. In the 24.25-24.75 GHz band, the amplifier has exhibited a saturated output power greater than 22 dBm and a PAE higher than 25%.
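For readers less familiar with the units in this abstract, the following sketch converts dBm to milliwatts and evaluates the standard definition of power-added efficiency, PAE = (P_out - P_in) / P_DC. The function names are hypothetical, and the abstract does not report the input or DC power, so any such values passed to `power_added_efficiency` would be placeholders rather than data from the paper.

```python
def dbm_to_mw(p_dbm):
    """Convert a power level in dBm to milliwatts (0 dBm = 1 mW)."""
    return 10.0 ** (p_dbm / 10.0)


def power_added_efficiency(p_out_mw, p_in_mw, p_dc_mw):
    """Standard PAE definition: RF power added by the amplifier
    divided by the DC power it consumes (all in the same unit)."""
    return (p_out_mw - p_in_mw) / p_dc_mw


# The reported 25.5 dBm output corresponds to roughly 355 mW.
output_mw = dbm_to_mw(25.5)
```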

Analysis of Optimum Impedance for X-Band GaN HEMT using Load-Pull (로드-풀을 이용한 X-Band GaN HEMT의 최적 임피던스 분석)

  • Kim, Min-Soo;Rhee, Young-Chul
    • The Journal of the Korea institute of electronic communication sciences / v.6 no.5 / pp.621-627 / 2011
  • In this paper, we analyze the performance of on-wafer GaN HEMTs in the X-band using load-pull measurements and determine the optimum impedance point from the results. We propose a method for obtaining optimum device performance by analyzing the optimum impedance of a solid-state device in the on-wafer condition, before packaging. The measured devices have a gate length of 0.25 µm and gate widths of 400 µm and 800 µm. The 400 µm device achieved $P_{sat}$ = 33.16 dBm, PAE = 67.36%, and Gain = 15.16 dBm; the 800 µm device achieved $P_{sat}$ = 35.91 dBm, PAE = 69.23%, and Gain = 14.87 dBm.

A BM25 based Passage Retrieval System for Developing an Efficient Question and Answering System (효율적인 질의응답시스템 개발을 위한 BM25기반의 단락 검색 시스템)

  • Lim, Heui Seok;Lee, Yong Shin;Rim, Hae Chang
    • The Journal of Korean Association of Computer Education / v.6 no.4 / pp.23-30 / 2003
  • This paper proposes a passage retrieval system based on Okapi BM25 for developing an efficient QA system and evaluates its performance. The TREC Q&A track test collection, which consists of about one million documents, was indexed, and one hundred TREC Q&A track queries were used as test queries. The experimental results show that the proposed passage retrieval system reaches a 100% recall rate by searching only 1,700 sentences, whereas a conventional document retrieval system has to search about 120 thousand sentences, roughly 70 times more.
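The "search depth needed for 100% recall" figure quoted in this abstract can be made concrete with a small sketch. The helper below is hypothetical and is not the authors' evaluation code: given a ranked list of sentence (or passage) identifiers and the set of identifiers known to contain the answer, it returns how far down the ranking one must read before every relevant item has been seen.

```python
def depth_for_full_recall(ranked_ids, relevant_ids):
    """Smallest k such that the top-k of ranked_ids contains every
    relevant id. Returns 0 if there is nothing to find and None if
    some relevant id never appears in the ranking."""
    remaining = set(relevant_ids)
    if not remaining:
        return 0
    for k, item in enumerate(ranked_ids, start=1):
        remaining.discard(item)
        if not remaining:
            return k
    return None


# Illustrative identifiers only; not data from the paper.
depth = depth_for_full_recall(["s3", "s1", "s7", "s2"], {"s1", "s7"})
```

A smaller depth means fewer sentences have to be handed to the downstream answer-extraction step, which is the efficiency argument the abstract makes.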

BERT Sparse: Keyword-based Document Retrieval using BERT in Real time (BERT Sparse: BERT를 활용한 키워드 기반 실시간 문서 검색)

  • Kim, Youngmin;Lim, Seungyoung;Yu, Inguk;Park, Soyoon
    • Annual Conference on Human and Language Technology / 2020.10a / pp.3-8 / 2020
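  • Document retrieval is one of the important, long-studied areas of natural language processing. BM25, one of the existing keyword-based retrieval algorithms, has clear performance limits, while semantic retrieval algorithms based on deep learning lose information when documents are compressed into vectors. We therefore propose a new document retrieval model, BERT Sparse. BERT Sparse matches documents using the keywords contained in the query, but encodes documents with BERT so that the context and meaning of the query are also reflected, with the aim of overcoming the limitations of existing keyword-based retrieval algorithms. The retrieval speed of BERT Sparse is similar to that of keyword-based models such as BM25, fast enough for real-time service, and its Recall@5 is 93.87%, 19% higher than that of the BM25 algorithm. Finally, combining BERT Sparse with an MRC model yielded an F1 score of 81.87% in an open-domain QA setting.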
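The Recall@5 metric cited here can be sketched as follows. This assumes the usual open-domain QA reading, in which each query has one gold document and a query counts as a hit when that document appears among the top k results; the abstract does not spell out its exact definition, and the function name and sample identifiers are hypothetical.

```python
def recall_at_k(ranked_lists, gold_ids, k=5):
    """Fraction of queries whose gold document id appears in the
    top-k entries of that query's ranked result list."""
    if len(ranked_lists) != len(gold_ids):
        raise ValueError("need exactly one gold id per ranked list")
    if not ranked_lists:
        return 0.0
    hits = sum(
        1 for ranked, gold in zip(ranked_lists, gold_ids) if gold in ranked[:k]
    )
    return hits / len(ranked_lists)


# Two illustrative queries: the first is a hit, the second is a miss.
score = recall_at_k([["d1", "d2"], ["d3", "d4"]], ["d2", "d9"], k=5)
```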

Spectral Efficiency of Symmetric Balance Incomplete Block Design Codes (Symmetric Balance Incomplete Block Design Code의 Spectral Efficiency)

  • Jhee, Yoon Kyoo
    • Journal of the Institute of Electronics and Information Engineers / v.50 no.1 / pp.117-123 / 2013
  • By calculating the spectral efficiency of symmetric balance incomplete block design (BIBD) codes satisfying BER = $10^{-9}$, it is found that an ideal BIBD code design with m=2 and various values of q is effective when the effective power is high ($P_{sr}=-10$ dBm), whereas a BIBD code design with q > 2 and various values of m can be effective when the effective power is low ($P_{sr}=-25$ dBm).