Performance Analysis of a Korean Word Autocomplete System and New Evaluation Metrics

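Korean title: 한국어 단어 자동완성 시스템의 성능 분석 및 새로운 평가 방법 (Performance Analysis of a Korean Word Autocomplete System and a New Evaluation Method)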

  • Lee, Songwook (Department of Computer Science and Information Engineering, Korea National University of Transportation)
  • Received : 2015.04.09
  • Accepted : 2015.06.30
  • Published : 2015.07.31


The goal of this paper is to analyze the performance of a word autocomplete system for mobile devices such as smartphones and tablet PCs, where text entry is cumbersome. The system completes a partially typed string into a full word, reducing the time and effort a user needs to enter text on these devices. We collect a large volume of data from Twitter and build unigram and bigram dictionaries from word frequencies. Using these dictionaries, we analyze the performance of the word autocomplete system and devise two new evaluation metrics, the keystroke profit rate and the recovery rate, which describe the characteristics of the word autocomplete problem better than previous measures such as the mean reciprocal rank or recall.

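This study evaluates the performance of a word autocomplete system that helps users obtain the desired word as quickly and accurately as possible with a minimum of keystrokes on mobile devices, such as smartphones and tablet PCs, where text entry is not easy. We collected a large volume of data from Twitter and built a unigram dictionary and a bigram dictionary according to the usage frequency of the collected data. We evaluated the word autocomplete system using these dictionaries, and we proposed the keystroke profit rate and the recovery rate as new evaluation methods that reflect the characteristics of word autocompletion better than existing evaluation methods.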
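The dictionary-based completion described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`build_dictionaries`, `complete`, `mean_reciprocal_rank`), the whitespace tokenization, the bigram-first ranking with unigram fallback, and the toy English corpus are all assumptions made for illustration. Korean input would additionally need keystroke-level (jamo) prefix handling, which is not shown. Only the mean reciprocal rank, one of the previous measures named in the abstract, is computed; the keystroke profit rate and recovery rate are not defined in this abstract, so they are left out.

```python
from collections import Counter, defaultdict


def build_dictionaries(sentences):
    """Count word frequencies (unigram) and next-word frequencies (bigram)."""
    unigram = Counter()
    bigram = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()  # assumption: whitespace tokenization
        unigram.update(words)
        for prev, cur in zip(words, words[1:]):
            bigram[prev][cur] += 1
    return unigram, bigram


def _rank(counter, prefix):
    """Words starting with prefix, most frequent first (ties broken alphabetically)."""
    matches = [w for w in counter if w.startswith(prefix)]
    return sorted(matches, key=lambda w: (-counter[w], w))


def complete(prefix, unigram, bigram, prev=None, k=3):
    """Return up to k candidates for prefix.

    Words seen after `prev` are ranked first; remaining slots are filled
    from the unigram dictionary (an assumed fallback scheme).
    """
    ranked = []
    if prev is not None and prev in bigram:
        ranked = _rank(bigram[prev], prefix)
    for word in _rank(unigram, prefix):
        if word not in ranked:
            ranked.append(word)
    return ranked[:k]


def mean_reciprocal_rank(ranked_lists, targets):
    """Average of 1/rank of the target word; a miss contributes 0."""
    total = 0.0
    for ranked, target in zip(ranked_lists, targets):
        if target in ranked:
            total += 1.0 / (ranked.index(target) + 1)
    return total / len(targets)


# Toy corpus (hypothetical data, standing in for the collected tweets).
corpus = [
    "today weather good",
    "today weather cloudy",
    "today wealth check",
    "tomorrow wealth check",
    "tomorrow wealth change",
]
unigram, bigram = build_dictionaries(corpus)

without_context = complete("wea", unigram, bigram)
with_context = complete("wea", unigram, bigram, prev="today")
mrr = mean_reciprocal_rank([without_context, with_context], ["weather", "weather"])
```

In this toy corpus, "wealth" is the more frequent word overall, so the unigram dictionary alone ranks it first for the prefix "wea". Once the previous word "today" is known, the bigram dictionary moves "weather" to the top. That is the kind of difference a frequency-based bigram dictionary is meant to capture.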


