Search | Korea Science

Design and Implementation of MongoDB-based Unstructured Log Processing System over Cloud Computing Environment (클라우드 환경에서 MongoDB 기반의 비정형 로그 처리 시스템 설계 및 구현)

Kim, Myoungjin;Han, Seungho;Cui, Yun;Lee, Hanku
- Journal of Internet Computing and Services
- /
- v.14 no.6
- /
- pp.71-84
- /
- 2013
Log data, which record the multitude of information created when operating computer systems, are utilized in many processes, from carrying out computer system inspection and process optimization to providing customized user optimization. In this paper, we propose a MongoDB-based unstructured log processing system in a cloud environment for processing the massive amount of log data of banks. Most of the log data generated during banking operations come from handling a client's business. Therefore, in order to gather, store, categorize, and analyze the log data generated while processing the client's business, a separate log data processing system needs to be established. However, the realization of flexible storage expansion functions for processing a massive amount of unstructured log data and executing a considerable number of functions to categorize and analyze the stored unstructured log data is difficult in existing computer environments. Thus, in this study, we use cloud computing technology to realize a cloud-based log data processing system for processing unstructured log data that are difficult to process using the existing computing infrastructure's analysis tools and management system. The proposed system uses the IaaS (Infrastructure as a Service) cloud environment to provide a flexible expansion of computing resources and includes the ability to flexibly expand resources such as storage space and memory under conditions such as extended storage or rapid increase in log data. Moreover, to overcome the processing limits of the existing analysis tool when a real-time analysis of the aggregated unstructured log data is required, the proposed system includes a Hadoop-based analysis module for quick and reliable parallel-distributed processing of the massive amount of log data. Furthermore, because the HDFS (Hadoop Distributed File System) stores data by generating copies of the block units of the aggregated log data, the proposed system offers automatic restore functions for the system to continually operate after it recovers from a malfunction. Finally, by establishing a distributed database using the NoSQL-based Mongo DB, the proposed system provides methods of effectively processing unstructured log data. Relational databases such as the MySQL databases have complex schemas that are inappropriate for processing unstructured log data. Further, strict schemas like those of relational databases cannot expand nodes in the case wherein the stored data are distributed to various nodes when the amount of data rapidly increases. NoSQL does not provide the complex computations that relational databases may provide but can easily expand the database through node dispersion when the amount of data increases rapidly; it is a non-relational database with an appropriate structure for processing unstructured data. The data models of the NoSQL are usually classified as Key-Value, column-oriented, and document-oriented types. Of these, the representative document-oriented data model, MongoDB, which has a free schema structure, is used in the proposed system. MongoDB is introduced to the proposed system because it makes it easy to process unstructured log data through a flexible schema structure, facilitates flexible node expansion when the amount of data is rapidly increasing, and provides an Auto-Sharding function that automatically expands storage. The proposed system is composed of a log collector module, a log graph generator module, a MongoDB module, a Hadoop-based analysis module, and a MySQL module. When the log data generated over the entire client business process of each bank are sent to the cloud server, the log collector module collects and classifies data according to the type of log data and distributes it to the MongoDB module and the MySQL module. The log graph generator module generates the results of the log analysis of the MongoDB module, Hadoop-based analysis module, and the MySQL module per analysis time and type of the aggregated log data, and provides them to the user through a web interface. Log data that require a real-time log data analysis are stored in the MySQL module and provided real-time by the log graph generator module. The aggregated log data per unit time are stored in the MongoDB module and plotted in a graph according to the user's various analysis conditions. The aggregated log data in the MongoDB module are parallel-distributed and processed by the Hadoop-based analysis module. A comparative evaluation is carried out against a log data processing system that uses only MySQL for inserting log data and estimating query performance; this evaluation proves the proposed system's superiority. Moreover, an optimal chunk size is confirmed through the log data insert performance evaluation of MongoDB for various chunk sizes.
https://doi.org/10.7472/jksii.2013.14.6.71 인용 PDF KSCI

An Empirical Study on the Determinants of Supply Chain Management Systems Success from Vendor's Perspective (참여자관점에서 공급사슬관리 시스템의 성공에 영향을 미치는 요인에 관한 실증연구)

Kang, Sung-Bae;Moon, Tae-Soo;Chung, Yoon
- Asia pacific journal of information systems
- /
- v.20 no.3
- /
- pp.139-166
- /
- 2010
The supply chain management (SCM) systems have emerged as strong managerial tools for manufacturing firms in enhancing competitive strength. Despite of large investments in the SCM systems, many companies are not fully realizing the promised benefits from the systems. A review of literature on adoption, implementation and success factor of IOS (inter-organization systems), EDI (electronic data interchange) systems, shows that this issue has been examined from multiple theoretic perspectives. And many researchers have attempted to identify the factors which influence the success of system implementation. However, the existing studies have two drawbacks in revealing the determinants of systems implementation success. First, previous researches raise questions as to the appropriateness of research subjects selected. Most SCM systems are operating in the form of private industrial networks, where the participants of the systems consist of two distinct groups: focus companies and vendors. The focus companies are the primary actors in developing and operating the systems, while vendors are passive participants which are connected to the system in order to supply raw materials and parts to the focus companies. Under the circumstance, there are three ways in selecting the research subjects; focus companies only, vendors only, or two parties grouped together. It is hard to find researches that use the focus companies exclusively as the subjects probably due to the insufficient sample size for statistic analysis. Most researches have been conducted using the data collected from both groups. We argue that the SCM success factors cannot be correctly indentified in this case. The focus companies and the vendors are in different positions in many areas regarding the system implementation: firm size, managerial resources, bargaining power, organizational maturity, and etc. There are no obvious reasons to believe that the success factors of the two groups are identical. Grouping the two groups also raises questions on measuring the system success. The benefits from utilizing the systems may not be commonly distributed to the two groups. One group's benefits might be realized at the expenses of the other group considering the situation where vendors participating in SCM systems are under continuous pressures from the focus companies with respect to prices, quality, and delivery time. Therefore, by combining the system outcomes of both groups we cannot measure the system benefits obtained by each group correctly. Second, the measures of system success adopted in the previous researches have shortcoming in measuring the SCM success. User satisfaction, system utilization, and user attitudes toward the systems are most commonly used success measures in the existing studies. These measures have been developed as proxy variables in the studies of decision support systems (DSS) where the contribution of the systems to the organization performance is very difficult to measure. Unlike the DSS, the SCM systems have more specific goals, such as cost saving, inventory reduction, quality improvement, rapid time, and higher customer service. We maintain that more specific measures can be developed instead of proxy variables in order to measure the system benefits correctly. The purpose of this study is to find the determinants of SCM systems success in the perspective of vendor companies. In developing the research model, we have focused on selecting the success factors appropriate for the vendors through reviewing past researches and on developing more accurate success measures. The variables can be classified into following: technological, organizational, and environmental factors on the basis of TOE (Technology-Organization-Environment) framework. The model consists of three independent variables (competition intensity, top management support, and information system maturity), one mediating variable (collaboration), one moderating variable (government support), and a dependent variable (system success). The systems success measures have been developed to reflect the operational benefits of the SCM systems; improvement in planning and analysis capabilities, faster throughput, cost reduction, task integration, and improved product and customer service. The model has been validated using the survey data collected from 122 vendors participating in the SCM systems in Korea. To test for mediation, one should estimate the hierarchical regression analysis on the collaboration. And moderating effect analysis should estimate the moderated multiple regression, examines the effect of the government support. The result shows that information system maturity and top management support are the most important determinants of SCM system success. Supply chain technologies that standardize data formats and enhance information sharing may be adopted by supply chain leader organization because of the influence of focal company in the private industrial networks in order to streamline transactions and improve inter-organization communication. Specially, the need to develop and sustain an information system maturity will provide the focus and purpose to successfully overcome information system obstacles and resistance to innovation diffusion within the supply chain network organization. The support of top management will help focus efforts toward the realization of inter-organizational benefits and lend credibility to functional managers responsible for its implementation. The active involvement, vision, and direction of high level executives provide the impetus needed to sustain the implementation of SCM. The quality of collaboration relationships also is positively related to outcome variable. Collaboration variable is found to have a mediation effect between on influencing factors and implementation success. Higher levels of inter-organizational collaboration behaviors such as shared planning and flexibility in coordinating activities were found to be strongly linked to the vendors trust in the supply chain network. Government support moderates the effect of the IS maturity, competitive intensity, top management support on collaboration and implementation success of SCM. In general, the vendor companies face substantially greater risks in SCM implementation than the larger companies do because of severe constraints on financial and human resources and limited education on SCM systems. Besides resources, Vendors generally lack computer experience and do not have sufficient internal SCM expertise. For these reasons, government supports may establish requirements for firms doing business with the government or provide incentives to adopt, implementation SCM or practices. Government support provides significant improvements in implementation success of SCM when IS maturity, competitive intensity, top management support and collaboration are low. The environmental characteristic of competition intensity has no direct effect on vendor perspective of SCM system success. But, vendors facing above average competition intensity will have a greater need for changing technology. This suggests that companies trying to implement SCM systems should set up compatible supply chain networks and a high-quality collaboration relationship for implementation and performance.
PDF KSCI

Efficient Topic Modeling by Mapping Global and Local Topics (전역 토픽의 지역 매핑을 통한 효율적 토픽 모델링 방안)

Choi, Hochang;Kim, Namgyu
- Journal of Intelligence and Information Systems
- /
- v.23 no.3
- /
- pp.69-94
- /
- 2017
Recently, increase of demand for big data analysis has been driving the vigorous development of related technologies and tools. In addition, development of IT and increased penetration rate of smart devices are producing a large amount of data. According to this phenomenon, data analysis technology is rapidly becoming popular. Also, attempts to acquire insights through data analysis have been continuously increasing. It means that the big data analysis will be more important in various industries for the foreseeable future. Big data analysis is generally performed by a small number of experts and delivered to each demander of analysis. However, increase of interest about big data analysis arouses activation of computer programming education and development of many programs for data analysis. Accordingly, the entry barriers of big data analysis are gradually lowering and data analysis technology being spread out. As the result, big data analysis is expected to be performed by demanders of analysis themselves. Along with this, interest about various unstructured data is continually increasing. Especially, a lot of attention is focused on using text data. Emergence of new platforms and techniques using the web bring about mass production of text data and active attempt to analyze text data. Furthermore, result of text analysis has been utilized in various fields. Text mining is a concept that embraces various theories and techniques for text analysis. Many text mining techniques are utilized in this field for various research purposes, topic modeling is one of the most widely used and studied. Topic modeling is a technique that extracts the major issues from a lot of documents, identifies the documents that correspond to each issue and provides identified documents as a cluster. It is evaluated as a very useful technique in that reflect the semantic elements of the document. Traditional topic modeling is based on the distribution of key terms across the entire document. Thus, it is essential to analyze the entire document at once to identify topic of each document. This condition causes a long time in analysis process when topic modeling is applied to a lot of documents. In addition, it has a scalability problem that is an exponential increase in the processing time with the increase of analysis objects. This problem is particularly noticeable when the documents are distributed across multiple systems or regions. To overcome these problems, divide and conquer approach can be applied to topic modeling. It means dividing a large number of documents into sub-units and deriving topics through repetition of topic modeling to each unit. This method can be used for topic modeling on a large number of documents with limited system resources, and can improve processing speed of topic modeling. It also can significantly reduce analysis time and cost through ability to analyze documents in each location or place without combining analysis object documents. However, despite many advantages, this method has two major problems. First, the relationship between local topics derived from each unit and global topics derived from entire document is unclear. It means that in each document, local topics can be identified, but global topics cannot be identified. Second, a method for measuring the accuracy of the proposed methodology should be established. That is to say, assuming that global topic is ideal answer, the difference in a local topic on a global topic needs to be measured. By those difficulties, the study in this method is not performed sufficiently, compare with other studies dealing with topic modeling. In this paper, we propose a topic modeling approach to solve the above two problems. First of all, we divide the entire document cluster(Global set) into sub-clusters(Local set), and generate the reduced entire document cluster(RGS, Reduced global set) that consist of delegated documents extracted from each local set. We try to solve the first problem by mapping RGS topics and local topics. Along with this, we verify the accuracy of the proposed methodology by detecting documents, whether to be discerned as the same topic at result of global and local set. Using 24,000 news articles, we conduct experiments to evaluate practical applicability of the proposed methodology. In addition, through additional experiment, we confirmed that the proposed methodology can provide similar results to the entire topic modeling. We also proposed a reasonable method for comparing the result of both methods.
https://doi.org/10.13088/jiis.2017.23.3.069 인용 PDF KSCI

Search Result 1,693, Processing Time 0.026 seconds

Design and Implementation of MongoDB-based Unstructured Log Processing System over Cloud Computing Environment (클라우드 환경에서 MongoDB 기반의 비정형 로그 처리 시스템 설계 및 구현)

An Empirical Study on the Determinants of Supply Chain Management Systems Success from Vendor's Perspective (참여자관점에서 공급사슬관리 시스템의 성공에 영향을 미치는 요인에 관한 실증연구)

Efficient Topic Modeling by Mapping Global and Local Topics (전역 토픽의 지역 매핑을 통한 효율적 토픽 모델링 방안)

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)