
A Phoneme-based Approximate String Searching System for Restricted Korean Character Input Environments  

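Yoon, Tai-Jin (School of Computer Science and Engineering, Pusan National University)
Cho, Hwan-Gue (School of Computer Science and Engineering, Pusan National University)
Chung, Woo-Keun (School of Computer Science and Engineering, Pusan National University)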
Abstract
Mobile devices have advanced remarkably, and research on mobile input methods is becoming increasingly important. Many input methods exist, such as numeric keypads, QWERTY keypads, touch screens, and speech recognizers, but none is as convenient as a typical desktop keyboard, so input strings often contain many typing errors. Such errors rarely hinder communication between people, but they are a critical problem when searching a database such as a dictionary or an address book, because the query fails to return the correct results. The problem is especially pronounced for Hangeul: because each Hangeul character is formed by combining consonants and vowels, there are more than 10,000 distinct characters, and errors occur more frequently than in English. The suffix tree is the data structure most widely used to handle errors in queries, but it cannot cope with the full variety of errors. In this paper, we propose a fast approximate Korean word searching system that tolerates a variety of typing errors. The system includes several algorithms that adapt general approximate string searching to Hangeul. We also present a profanity filter built on the proposed system, which filters more than 90% of coined profanities.
Keywords
Hangeul string; approximate string matching; global alignment;
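The phoneme-level idea in the abstract can be illustrated with a minimal sketch. This is not the authors' system: it only assumes that each precomposed Hangeul syllable (Unicode U+AC00 to U+D7A3, 19 × 21 × 28 = 11,172 code points) is split into its consonant and vowel phonemes, and that two words are then compared with a plain unit-cost edit distance, which is the simplest form of global alignment. The function names `to_phonemes`, `edit_distance`, and `phoneme_distance` are hypothetical and chosen for this illustration.

```python
# Illustrative sketch only; names and the unit-cost scoring are assumptions,
# not the method described in the paper.

HANGUL_BASE = 0xAC00          # first precomposed syllable, "가"
LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"              # 19 initial consonants
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"          # 21 vowels
TAILS = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ",
         "ㄻ", "ㄼ", "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ",
         "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]          # 28 finals ("" = none)
NUM_SYLLABLES = len(LEADS) * len(VOWELS) * len(TAILS)   # 11,172


def to_phonemes(text):
    """Split every precomposed Hangeul syllable into its phonemes."""
    out = []
    for ch in text:
        idx = ord(ch) - HANGUL_BASE
        if 0 <= idx < NUM_SYLLABLES:
            lead, rest = divmod(idx, len(VOWELS) * len(TAILS))
            vowel, tail = divmod(rest, len(TAILS))
            out.append(LEADS[lead])
            out.append(VOWELS[vowel])
            if tail:                      # index 0 means "no final consonant"
                out.append(TAILS[tail])
        else:
            out.append(ch)                # non-Hangeul characters pass through
    return out


def edit_distance(a, b):
    """Unit-cost Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # match or substitute
        prev = cur
    return prev[-1]


def phoneme_distance(s, t):
    """Edit distance measured on phonemes instead of whole syllables."""
    return edit_distance(to_phonemes(s), to_phonemes(t))


if __name__ == "__main__":
    # "먹어" typed as "머거": the final consonant slid into the next syllable.
    print(edit_distance("먹어", "머거"))     # syllable level
    print(phoneme_distance("먹어", "머거"))  # phoneme level
```

In this sketch, "먹어" and "머거" differ in both syllables when compared character by character, but in only one phoneme once decomposed, which is the kind of typing error a syllable-level comparison tends to overstate. A practical system would presumably add an index over the dictionary and error-specific costs; neither is shown here.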