http://dx.doi.org/10.13088/jiis.2012.18.1.091

Personal Information Detection by Using Naïve Bayes Methodology

Kim, Nam-Won (College of Business Administration, Seoul National University)
Park, Jin-Soo (Graduate School of Business, Seoul National University)
Publication Information
Journal of Intelligence and Information Systems / v.18, no.1, 2012, pp. 91-107
Abstract
As the Internet becomes more popular, many people use it to communicate. With the growing number of personal homepages, blogs, and social network services, people often expose their personal information online. Although the usefulness of these services cannot be denied, we should be concerned about their negative aspects, such as personal information leakage. Because it is impossible to manually review every record that every user has ever posted, an automatic method for detecting personal information is strongly needed. This study proposes a method that detects, or classifies, online documents containing personal information by analyzing the features common to documents related to personal information and learning those features with the Naïve Bayes algorithm. To select a document classification algorithm, the Naïve Bayes classifier was compared with a Vector Space classifier. Naïve Bayes achieved higher precision, recall, F-measure, and accuracy than the Vector Space classifier. However, the performance of the Naïve Bayes classifier is still insufficient for real-world use. Lewis, a researcher of learning algorithms, argues that improving the quality of category features is important when learning algorithms are applied to a specific domain, and he proposes adding features that depend on the related documents incrementally, in a step-wise manner. In a second experiment, the algorithm learns these additional dependent features, thereby reducing feature noise. As a result, the second experiment performs better on the evaluation measures than the first.
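A minimal sketch of the kind of classifier the abstract describes: a multinomial Naïve Bayes model with add-one (Laplace) smoothing that labels a document as `personal` or `other` by picking the class with the largest log P(c) + Σ log P(t | c). The abstract does not specify the event model, smoothing, tokenization, or feature set, so all of these are assumptions; the class names and the toy training sentences are hypothetical and are not the study's data.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase whitespace tokenization. Korean text would
    # normally need a morphological analyzer instead.
    return text.lower().split()


class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: list[str], labels: list[str]) -> "MultinomialNB":
        self.classes = sorted(set(labels))
        self.log_prior = {}
        self.word_counts = {}
        self.total_words = {}
        self.vocab = set()
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            # log P(c): fraction of training documents in class c
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(t for d in class_docs for t in tokenize(d))
            self.word_counts[c] = counts
            self.total_words[c] = sum(counts.values())
            self.vocab.update(counts)
        return self

    def log_posterior(self, doc: str, c: str) -> float:
        # Unnormalized log P(c | doc) = log P(c) + sum_t log P(t | c)
        v = len(self.vocab)
        score = self.log_prior[c]
        for t in tokenize(doc):
            if t not in self.vocab:
                continue  # words never seen in training are ignored
            score += math.log((self.word_counts[c][t] + 1) / (self.total_words[c] + v))
        return score

    def predict(self, doc: str) -> str:
        return max(self.classes, key=lambda c: self.log_posterior(doc, c))


# Hypothetical toy training set
train_docs = [
    "my phone number is 010 1234 5678 call me",
    "my home address and birthday are posted here",
    "great weather today for a walk in the park",
    "new movie review the plot was boring",
]
train_labels = ["personal", "personal", "other", "other"]

clf = MultinomialNB().fit(train_docs, train_labels)
print(clf.predict("call my phone number tonight"))
```

Working in log space avoids floating-point underflow when many small per-word probabilities are multiplied, and the add-one smoothing keeps a single unseen class-word pair from zeroing out a whole class.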
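The abstract does not describe its Vector Space classifier in detail. One common variant, assumed here, is a centroid (Rocchio-style) classifier: each document becomes a unit-length TF-IDF vector, each class is represented by the sum of its documents' vectors, and a new document is assigned to the class whose centroid has the highest cosine similarity. The IDF formula, the class names, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase whitespace tokenization
    return text.lower().split()


def cosine(a: dict, b: dict) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class CentroidClassifier:
    """Vector-space classifier: TF-IDF vectors, one centroid per class."""

    def fit(self, docs: list[str], labels: list[str]) -> "CentroidClassifier":
        n = len(docs)
        # Document frequency of each term, then idf = log(N / df)
        df = Counter(t for d in docs for t in set(tokenize(d)))
        self.idf = {t: math.log(n / k) for t, k in df.items()}
        self.centroids = {}
        for doc, label in zip(docs, labels):
            centroid = self.centroids.setdefault(label, Counter())
            for t, w in self.vectorize(doc).items():
                centroid[t] += w  # sum of unit vectors; direction is what cosine uses
        return self

    def vectorize(self, doc: str) -> dict:
        tf = Counter(t for t in tokenize(doc) if t in self.idf)
        vec = {t: c * self.idf[t] for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def predict(self, doc: str) -> str:
        vec = self.vectorize(doc)
        return max(sorted(self.centroids), key=lambda c: cosine(vec, self.centroids[c]))


# Hypothetical toy training set
train_docs = [
    "my phone number is 010 1234 5678 call me",
    "my home address and birthday are posted here",
    "great weather today for a walk in the park",
    "new movie review the plot was boring",
]
train_labels = ["personal", "personal", "other", "other"]

vs = CentroidClassifier().fit(train_docs, train_labels)
print(vs.predict("call my phone number tonight"))
```

Unlike Naïve Bayes, this model has no probabilistic interpretation; it only compares directions of term-weight vectors, which is one plausible reason the two approaches can differ on the same features.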
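The four measures named in the abstract can be computed from lists of true and predicted labels, treating `personal` as the positive class. This sketch assumes the balanced F-measure (F1, the harmonic mean of precision and recall), since the abstract does not state which weighting was used; the example labels are made up.

```python
def evaluate(y_true: list[str], y_pred: list[str], positive: str = "personal") -> dict:
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Assumption: balanced F-measure (beta = 1)
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "accuracy": accuracy}


# Made-up example: 3 personal and 2 other documents
m = evaluate(["personal", "personal", "personal", "other", "other"],
             ["personal", "personal", "other", "personal", "other"])
print(m)
```

Here two of the three predicted `personal` documents are correct (precision 2/3), two of the three actual `personal` documents are found (recall 2/3), and three of the five labels match (accuracy 0.6).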
Keywords
Naïve Bayes; Document Classification; Personal Information; Privacy; Security; Social Network Service
References
1 Manning, C. D., P. Raghavan, and H. Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.
2 Mason, R. O., "Four Ethical Issues of the Information Age", MIS Quarterly, Vol.10, No.1 (1986), 5-12.
3 Meeder, B., J. Tam, P. G. Kelley, and L. F. Cranor, "RT@IWantPrivacy : Widespread Violation of Privacy Settings in the Twitter Social Network", Web 2.0 Privacy and Security Workshop, IEEE Symposium on Security and Privacy, 2010.
4 Mitchell, T. M., Machine Learning, McGraw Hill, 2010.
5 Parent, W. A., "Privacy : A Brief Survey of the Conceptual Landscape", Santa Clara Computer and High Tech. L. J., Vol.11(1995).
6 Peng, H., F. Long, and C. Ding, "Feature selection based on mutual information : criteria of max-dependency, max-relevance, and min-redundancy", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.27, No.8(2005), 1226-1238.
7 Schauer, F., "Internet Privacy and The Public-Private Distinction", Jurimetrics, Vol.38, No.4 (1998), 555-564.
8 Smith, R. E., Ben Franklin's Web Site : Privacy and Curiosity from Plymouth Rock to the Internet, Sheridan Books, 2000.
9 Solove, D. J., "A taxonomy of privacy", University of Pennsylvania Law Review, Vol.154, No.3(2006), 477-560.
10 Steinbach, M., G. Karypis, and V. Kumar, "A Comparison of Document Clustering Techniques", KDD Workshop on Text Mining, 2000.
11 Tong, S. T., B. Van Der Heide, L. Langwell, and J. B. Walther, "Too Much of a Good Thing? The Relationship Between Number of Friends and Interpersonal Impressions on Facebook", Journal of Computer-Mediated Communication, Vol.13, No.3(2008), 531-549.
12 Wacks, R., Privacy, New York : Oxford University Press, 1993.
13 Warren, S. D. and L. D. Brandeis, "The Right to Privacy", Harvard Law Review, Vol.4, No.5 (1890), 193-220.
14 Weible, R. J., Privacy and data : An empirical study of the influence of types of data and situational context upon privacy perceptions, D.B.A. : Mississippi State University, 1993.
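15 Wolak, J., D. Finkelhor, K. J. Mitchell, and M. L. Ybarra, "Online 'Predators' and Their Victims", Psychology of Violence, Vol.1, No.1(2010), 13-35.
16 이강신, 이기혁, 박진식, 최일훈, Fundamentals and Applications of Personal Information Protection, Seoul : 미디어그룹 인포더, 2010.
17 권건보, Personal Information Protection and the Right to Control One's Own Information, Seoul : 경인문화사, 2005.
18 Korea Communications Commission, How Much of My Information Is Exposed on Twitter?, Korea Communications Commission, 2011.
19 윤상오, "A Study on Personal Information Protection Policy for Implementing E-Government : From the Perspective of Building Trust in Government", Journal of the Korean Association for Regional Information Society, Vol.12, No.2(2009), 1-29.
20 이창범, 조정현, "A Study on Establishing the APT (Asia-Pacific Telecommunity) Guidelines for Personal Information and Privacy Protection", Personal Information Dispute Mediation Committee, 2003.
21 조동기, 김성우, "Everyday Internet Use and Personal Information Protection", KISDI Issue Report, Vol.11(2003), 10-11.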
23 황인호, "A Study on Regulation in the Personal Information Protection System", Public Law Journal, Vol.30, No.4(2002), 232-232.
24 Bayes, T., "An essay towards solving a problem in the doctrine of chances", Philosophical Transactions, Vol.53(1763), 370-418.
25 Boyd, D. M. and N. B. Ellison, "Social Network Sites : Definition, History, and Scholarship", Journal of Computer-Mediated Communication, Vol.13, No.1(2008), 210-230.
26 Domingos, P. and M. Pazzani, "Beyond Independence : Conditions for the Optimality of the Simple Bayesian Classifier", Proceedings of the 13th International Conference on Machine Learning, (1996), 105-112.
27 Clarke, R., Beyond the OECD Guidelines : Privacy Protection for the 21st Century, (2000), Roger Clarke's Web-Site : http://www.rogerclarke.com/DV/PP21C.html.
28 Cooley, T. C., Laws of Torts, New York : Praeger, 1888.
29 Davies, S., Big Brother : Britain's web of surveillance and the new technological order, London : Pan, 1996.
30 Gross, R. and A. Acquisti, "Information revelation and privacy in online social networks", WPES '05 Proceedings of the 2005 ACM workshop on Privacy in the electronic society, 2005.
31 Information Commissioner's Office, Notification Handbook-A Complete Guide to Notification. Information Commissioner, 2001.
32 Jagatic, T., N. Johnson, M. Jakobsson, and F. Menczer, "Social phishing", Communications of the ACM, Vol.50, No.10(2007), 94-100.
33 Kobsa, A., "Personalized Hypermedia and International Privacy", Communications of the ACM, Vol.45, No.5(2002), 64-67.
34 Lampe, C. and N. B. Ellison, "Changes in use and perception of Facebook", Proceedings of the 2008 ACM conference on Computer supported cooperative work CSCW '08, (2008), 721-730.
35 Lee, D. L., H. Chuang, and K. Seamons, "Document Ranking and the Vector-Space Model", IEEE Software, Vol.14, No.2(1997), 67-75.
36 Lewis, D. D., Representation and Learning in Information Retrieval, Doctoral Dissertation : The Graduate School of the University of Massachusetts, 1992.
37 Livingstone, S., "Taking risky opportunities in youthful content creation : teenagers' use of social networking sites for intimacy, privacy and self-expression", New Media & Society, Vol.10, No.3(2008), 393-411.
38 LoPucki, L. M., "Human Identification Theory and the Identity Theft Problem", Tex. L. Rev., Vol.80(2001), 89-135.