http://dx.doi.org/10.7472/jksii.2014.15.6.107

A Design and Development of Big Data Indexing and Search System using Lucene  

Kim, DongMin (School of Computer Science, Kookmin University)
Choi, JinWoo (School of Computer Science, Kookmin University)
Woo, ChongWoo (School of Computer Science, Kookmin University)
Publication Information
Journal of Internet Computing and Services / v.15, no.6, 2014, pp. 107-115
Abstract
Recently, growing use of the internet has generated large and diverse types of data, driven by increased use of social media, convergence among industries, and the spread of various smart devices. Such data is difficult to manage and analyze with previous data processing techniques because its volume is huge and its form varies and evolves rapidly. A new approach is therefore needed. Among the many approaches being studied on this issue, we describe the design and development of an indexing engine for a big data platform. Our goal is to build a system that can effectively manage data sets too large for previous data processing techniques and that can reduce data analysis time. We used large SNMP log data in an experiment and tried to reduce data analysis time through a fast indexing and searching approach. We also expect that visualizing the analyzed data will help in analyzing user data.
Keywords
Big data; Big data Platform; Data Indexing; Data Searching;