Embedded-type Search Function with Feedback for Smartphone Applications

An Embedded Search Engine with Feedback Support for Smartphone Applications

  • Kang, Moonjoong (School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology)
  • Hwang, Mintae (Dept. of Information and Communication Engineering, Changwon National University)
  • Received : 2016.12.12
  • Accepted : 2016.12.28
  • Published : 2017.05.31

Abstract

This paper presents a search function that can be embedded in Android-based applications. It uses BM25 to down-weight insignificant but very frequent words such as postpositions, Pivoted Length Normalization to correct the ranking bias caused by differences in item length, and Rocchio's method to support implicit feedback: in the Vector Space Model, the query vector is pulled toward the vectors of the items inferred to be relevant to the query. Indexing is divided into two methods: a simple index that supports offline operation and a complex index for online operation. A query inference function predicts the user's upcoming input by collating the current input with the indexed data, which also lets the search function handle and correct user errors such as typos. The implementation could therefore be easily adopted in smartphone applications to improve their search functions.
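The ranking and feedback steps named in the abstract can be sketched as follows. This is a minimal illustration built on the textbook forms of BM25 and Rocchio's method, not the authors' implementation: the function names, the parameter values (`k1`, `b`, `alpha`, `beta`), the smoothed IDF variant, and the use of term-weight dictionaries as vectors are all assumptions made for the example. Rocchio's optional negative-feedback term is left out, and treating the items a user opens as "relevant" is only one possible reading of implicit feedback.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query with BM25."""
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        # Pivoted length normalization: items longer than average are
        # penalized, shorter ones are boosted; b controls the strength.
        norm = 1 - b + b * len(d) / avg_len
        score = 0.0
        for term in set(query):
            if term not in tf:
                continue
            # Terms found in most documents get an IDF close to zero.
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * norm)
        scores.append(score)
    return scores

def rocchio(query_vec, relevant_vecs, alpha=1.0, beta=0.75):
    """Move the query vector toward the centroid of the relevant item vectors."""
    updated = {t: alpha * w for t, w in query_vec.items()}
    for vec in relevant_vecs:
        for t, w in vec.items():
            updated[t] = updated.get(t, 0.0) + beta * w / len(relevant_vecs)
    return updated
```

With three short items, a query for a term that two of them contain ranks the shorter matching item first, and a word present in every item contributes almost nothing to the score, which is the suppression effect the abstract attributes to BM25.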

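This paper studies a search engine that can be embedded in various Android-based applications. To this end, it uses BM25, which suppresses words that are meaningless but frequent, such as postpositions, according to their frequency; Pivoted Length Normalization, which adjusts importance according to item length to resolve the ranking problem caused by variation in item length; and Rocchio's Method, which corrects the query vector in the vector space model by pulling it toward the group of vectors of the items judged to be relevant, so that implicit feedback is supported. Indexing is divided into two stages, a simple index for offline operation and a precise index for online operation, to guarantee operability. The feedback-supporting search engine studied in this paper uses query inference to compare the user's input with the indexed data, predict what is being entered, and respond to user mistakes such as typos, so it is expected to be easily embedded in smartphone applications to improve their search functions.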
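The abstract does not say how query inference is implemented, so the following is only a hypothetical sketch of the idea: compare the partial input with the equally long prefix of each indexed term and accept terms within a small edit distance. The names `edit_distance` and `infer_query` and the parameters `max_typos` and `limit` are invented for this illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]

def infer_query(partial, indexed_terms, max_typos=1, limit=5):
    """Suggest indexed terms that the partial input may be heading toward."""
    candidates = []
    for term in indexed_terms:
        # Compare the input with the equally long prefix of the indexed term,
        # so unfinished input can still match a longer term.
        distance = edit_distance(partial, term[:len(partial)])
        if distance <= max_typos:
            candidates.append((distance, term))
    # Exact prefix matches (distance 0) come first, then alphabetical order.
    return [term for _, term in sorted(candidates)[:limit]]
```

For example, with the indexed terms `search`, `season`, `feedback`, and `index`, the input `sea` is completed to both `search` and `season`, while the mistyped `seerch` still resolves to `search` because it is one substitution away.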
