• Title/Summary/Keyword: Big Data Analytic


An Efficient Method for Design and Implementation of Tweet Analysis System (효율적인 트윗 분석 시스템 설계 및 구현 방법)

  • Choi, Minseok
    • Journal of Digital Convergence / v.13 no.2 / pp.43-50 / 2015
  • As the popularity of social network services (SNS) has risen, the volume of data they produce has grown rapidly. SNS data reflects personal propensities and interests and propagates quickly, so there is strong demand for analyzing it and applying the results in various fields. New technologies and services for processing and analyzing big data in real time have been introduced, but they are hard to adopt quickly and at low cost. This paper proposes an efficient method for building a tweet analysis system without introducing new big data technologies or service platforms. The proposed method was verified by building a prototype monitoring system that collects and analyzes tweets using a MySQL database and PHP scripts.
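
A minimal sketch of the kind of lightweight pipeline the paper describes. The authors used PHP and MySQL; this Python analogue substitutes sqlite3 for the database and a hypothetical `fetch`-style input, so it illustrates the general idea rather than the paper's implementation.

```python
# Lightweight tweet monitoring without a big data platform: collect, store, aggregate.
# sqlite3 stands in for the paper's MySQL database; the sample tweet records are
# placeholders for whatever collector (e.g. a Twitter search API wrapper) is used.
import sqlite3
from collections import Counter

def init_db(path="tweets.db"):
    con = sqlite3.connect(path)
    con.execute("""CREATE TABLE IF NOT EXISTS tweets (
                       id TEXT PRIMARY KEY,
                       created_at TEXT,
                       user TEXT,
                       text TEXT)""")
    return con

def store_tweets(con, tweets):
    # tweets: iterable of dicts with id / created_at / user / text keys
    con.executemany(
        "INSERT OR IGNORE INTO tweets VALUES (:id, :created_at, :user, :text)", tweets)
    con.commit()

def keyword_counts(con, keywords):
    # Simple analysis step: how often each monitored keyword appears in stored tweets.
    counts = Counter()
    for (text,) in con.execute("SELECT text FROM tweets"):
        for kw in keywords:
            if kw.lower() in text.lower():
                counts[kw] += 1
    return counts

if __name__ == "__main__":
    con = init_db()
    store_tweets(con, [{"id": "1", "created_at": "2015-01-01", "user": "a",
                        "text": "big data is everywhere"}])
    print(keyword_counts(con, ["big data", "analytics"]))
```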

Utilization of SNS Review Data for a Comparison between Low Cost Carrier and Full Service Carrier (SNS 리뷰데이터의 활용 : 저가항공사와 대형항공사를 중심으로)

  • Woo, Mina
    • Journal of Information Technology Services / v.17 no.3 / pp.1-16 / 2018
  • A number of studies have examined the determinants of customer satisfaction for low-cost and full-service carriers in the airline industry, and most measured service quality using SERVQUAL based on a survey method. This study offers a new perspective by employing a big data analytic approach to SNS data, which reflects customers' immediate responses as well as trends in real time. The study chose eight factors from TripAdvisor's customer review site as determinants of customer satisfaction and compared the differences between low-cost and full-service airlines. The factors analyzed were seat comfort, customer service, cleanliness, food and beverage, legroom, entertainment, value for money, and check-in and boarding. Additionally, ratings from domestic and foreign customers were compared. The findings show that customer service and value for money are significant factors in satisfaction with low-cost airlines, while all variables except legroom and entertainment are significant for full-service airlines. The results show that SNS-based data and big data analysis are important for improving decision-making effectiveness and increasing customer satisfaction in the airline industry.
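
As a rough illustration of a factor-level comparison like the one described above (the paper's actual modeling is not reproduced here), the sketch below compares mean ratings per service factor between the two carrier types. The file `reviews.csv` and its column names are hypothetical.

```python
# Compare mean ratings per service factor between low-cost (LCC) and full-service (FSC)
# carriers; reviews.csv is a hypothetical file with one row per review.
import pandas as pd
from scipy import stats

FACTORS = ["seat_comfort", "customer_service", "cleanliness", "food_beverage",
           "legroom", "entertainment", "value_for_money", "checkin_boarding"]

reviews = pd.read_csv("reviews.csv")          # columns: carrier_type + FACTORS
lcc = reviews[reviews["carrier_type"] == "LCC"]
fsc = reviews[reviews["carrier_type"] == "FSC"]

for factor in FACTORS:
    # Welch's t-test on each factor's ratings across the two carrier groups.
    t, p = stats.ttest_ind(lcc[factor].dropna(), fsc[factor].dropna(), equal_var=False)
    print(f"{factor:18s} LCC={lcc[factor].mean():.2f} FSC={fsc[factor].mean():.2f} p={p:.3f}")
```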

Clustering of Smart Meter Big Data Based on KNIME Analytic Platform (KNIME 분석 플랫폼 기반 스마트 미터 빅 데이터 클러스터링)

  • Kim, Yong-Gil;Moon, Kyung-Il
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.20 no.2 / pp.13-20 / 2020
  • One of the major issues surrounding big data is the availability of massive time-based or telemetry data. The appearance of low-cost capture and storage devices now makes it possible to obtain very detailed time series data for further analysis. These time series can be used to gain more knowledge about the underlying system or to predict future events with higher accuracy. In particular, it is important to define custom-tailored contract offers for the many households and businesses with smart meter records and to predict their future electricity usage, protecting electricity companies from power shortages or surpluses. Creating customized contract offers is only worthwhile if a small number of groups with common electricity usage behavior can be identified. This study suggests a big data transformation and clustering technique for understanding electricity usage patterns, using open smart meter data and KNIME, an open source data analytics platform that provides a user-friendly graphical workbench for the entire analysis process. While KNIME's big data components are not open source, they are available for trial if required. After importing, cleaning, and transforming the smart meter big data, each meter's data can be interpreted in terms of electricity usage behavior through a dynamic time warping method.
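
The paper builds its workflow in KNIME; as a rough Python analogue of the dynamic time warping step, the sketch below computes pairwise DTW distances between hypothetical daily load profiles and feeds them to hierarchical clustering. It illustrates the general technique, not the authors' workflow or data.

```python
# DTW-based clustering of smart meter load profiles (Python analogue of the KNIME workflow).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def dtw_distance(a, b):
    # Classic O(len(a)*len(b)) dynamic-programming DTW with absolute-difference local cost.
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# profiles: one row per meter, e.g. 48 half-hourly readings for a representative day.
rng = np.random.default_rng(0)
profiles = rng.random((20, 48))               # placeholder for real smart meter data

n = len(profiles)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        dist[i, j] = dist[j, i] = dtw_distance(profiles[i], profiles[j])

labels = fcluster(linkage(squareform(dist), method="average"), t=4, criterion="maxclust")
print(labels)      # cluster id per meter: groups with similar usage behaviour
```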

A Study on Policy Priorities for Implementing Big Data Analytics in the Social Security Sector : Adopting AHP Methodology (AHP분석을 활용한 사회보장부문 빅 데이터 활용가능 영역 탐색 연구)

  • Ham, Young-Jin;Ahn, Chang-Won;Kim, Ki-Ho;Park, Gyu-Beom;Kim, Kyoung-June;Lee, Dae-Young;Park, Sun-Mi
    • Journal of Digital Convergence / v.12 no.8 / pp.49-60 / 2014
  • The primary purpose of this paper is to identify the important issues in the social security sector and then, using the AHP methodology, to analyze which big data methodologies and projects could be implemented to address them. To this end, the paper first identified eight big data projects by reviewing issues across the social security sector, including administrative work and social policies. The pairwise comparison results show that policy validity is a more important criterion than effectiveness and practicability. Among the candidate big data projects, the project for preventing improper benefit receipt emerged as the most important in terms of validity, effectiveness, and practicability, and the project for outreach to, and reduction of, blind spots in the welfare sector was also weighted as significant. The results, in particular the eight candidate big data projects, will be useful to anyone interested in applying big data and its methodologies to the social welfare sector.
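
A minimal sketch of the AHP priority computation such a study relies on: the principal-eigenvector method on a reciprocal pairwise comparison matrix, plus Saaty's consistency ratio. The 3x3 matrix over the three criteria (validity, effectiveness, practicability) is made up for illustration, not the study's data.

```python
# AHP priority weights via the principal eigenvector of a pairwise comparison matrix.
import numpy as np

# Illustrative reciprocal matrix over the criteria (validity, effectiveness,
# practicability); the values are NOT the study's judgments.
A = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 2.0],
              [1/5, 1/2, 1.0]])

eigvals, eigvecs = np.linalg.eig(A)
k = np.argmax(eigvals.real)                  # principal eigenvalue
w = np.abs(eigvecs[:, k].real)
weights = w / w.sum()                        # normalized priority weights

# Consistency ratio (Saaty): CI / RI, with random index RI = 0.58 for n = 3.
n = A.shape[0]
ci = (eigvals[k].real - n) / (n - 1)
cr = ci / 0.58
print("weights:", np.round(weights, 3), "CR:", round(cr, 3))
```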

Dynamic Cluster Management of Hadoop Distributed Filesystem (하둡 분산 파일시스템의 동적 클러스터 관리 기법)

  • Ryu, Wooseok
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2016.10a / pp.435-437 / 2016
  • The Hadoop Distributed File System (HDFS) is a file system for distributed processing of big data that replicates data across distributed data nodes. An HDFS cluster scales well, up to thousands of nodes, but it assumes a dedicated cluster with numerous nodes for big data processing; the various operational worker systems used in offices are rarely considered part of the cluster. This paper discusses this problem and proposes a dynamic cluster management technique that increases the storage capacity and analytic performance of a Hadoop cluster. The proposed technique can add legacy systems to the cluster and remove them dynamically depending on their availability.
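
The paper's technique is its own; as background, the sketch below only illustrates the standard HDFS hooks that dynamic node add/remove management typically builds on, namely the exclude file referenced by `dfs.hosts.exclude` and the `hdfs dfsadmin -refreshNodes` command. The file path and helper names are hypothetical.

```python
# Illustration of standard HDFS decommissioning hooks that dynamic cluster
# management can build on: rewrite the exclude file, then ask the NameNode to
# re-read it. Paths and helper names are hypothetical.
import subprocess

EXCLUDE_FILE = "/etc/hadoop/conf/dfs.exclude"   # file named by dfs.hosts.exclude

def set_excluded_nodes(hostnames):
    # One hostname per line; listed DataNodes will be decommissioned.
    with open(EXCLUDE_FILE, "w") as f:
        f.write("\n".join(hostnames) + "\n")

def refresh_namenode():
    # Standard admin command; the NameNode re-reads the include/exclude files.
    subprocess.run(["hdfs", "dfsadmin", "-refreshNodes"], check=True)

def release_office_machines(busy_hosts):
    # When office worker machines become busy, take them out of the cluster;
    # pass an empty list later to bring them back in.
    set_excluded_nodes(busy_hosts)
    refresh_namenode()

if __name__ == "__main__":
    release_office_machines(["office-pc-01", "office-pc-02"])
```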

Analysis of big data using Rhipe (Rhipe를 활용한 빅데이터 처리 및 분석)

  • Ko, Youngjun;Kim, Jinseog
    • Journal of the Korean Data and Information Science Society / v.24 no.5 / pp.975-987 / 2013
  • The Hadoop system was developed by the Apache Foundation based on Google's GFS and MapReduce technologies. Many modern systems for managing and processing big data have been developed on top of Hadoop because it was designed for scalability and distributed computing. The R software is considered a well-suited analytic tool in Hadoop-based systems because R interoperates with other languages and has many libraries for complex analyses. We introduce Rhipe, an R package that simplifies MapReduce programming under the Hadoop system, and implement a MapReduce program for multiple regression using Rhipe. In addition, we compare the computing speed of our program with other packages (ff and bigmemory) for processing large data. The simulation results show that our program becomes faster than ff and bigmemory as the size of the data increases.
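
Rhipe itself is R-only; the sketch below shows, in Python, the MapReduce pattern such a regression typically follows: each mapper emits partial X'X and X'y sums for its data block, and the reducer combines them and solves the normal equations. It is a sketch of the general technique, not the authors' Rhipe code.

```python
# MapReduce-style multiple regression: mappers emit partial X'X and X'y sums,
# the reducer adds them and solves the normal equations (X'X) b = X'y.
import numpy as np

def map_block(X_block, y_block):
    # Per-block sufficient statistics; an intercept column is prepended here.
    Xb = np.column_stack([np.ones(len(X_block)), X_block])
    return Xb.T @ Xb, Xb.T @ y_block

def reduce_blocks(partials):
    XtX = sum(p[0] for p in partials)
    Xty = sum(p[1] for p in partials)
    return np.linalg.solve(XtX, Xty)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((10_000, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + 3.0 + rng.normal(scale=0.1, size=10_000)
    # Split the data into "blocks" as a stand-in for HDFS splits.
    blocks = [(X[i:i + 1000], y[i:i + 1000]) for i in range(0, len(X), 1000)]
    beta = reduce_blocks([map_block(xb, yb) for xb, yb in blocks])
    print("estimated coefficients:", np.round(beta, 3))   # ~[3.0, 1.0, -2.0, 0.5]
```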

A Survey on the Performance Comparison of Map Reduce Technologies and the Architectural Improvement of Spark

  • Raghavendra, GS;Manasa, Bezwada;Vasavi, M.
    • International Journal of Computer Science & Network Security / v.22 no.5 / pp.121-126 / 2022
  • Hadoop and Apache Spark are Apache Software Foundation open source projects, and both are premier big data analytic tools. Hadoop has led the big data industry for five years. Spark's processing speed can differ significantly, up to 100 times faster, but the amount of data handled varies: Hadoop MapReduce can process data sets far larger than Spark can. This article compares the performance of Spark and MapReduce and discusses the advantages and disadvantages of both technologies.
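
Spark's speed advantage comes largely from keeping intermediate results in memory rather than writing them back to HDFS between stages, as a classic Hadoop MapReduce job does. A minimal PySpark word-count sketch is shown below purely to contrast the programming model; it assumes a local Spark installation and an `input.txt` path that you would replace.

```python
# Minimal PySpark job: the whole map -> reduce pipeline runs inside one application,
# with intermediate RDDs held in memory instead of materialized to HDFS between stages.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wordcount-sketch").getOrCreate()
sc = spark.sparkContext

lines = sc.textFile("input.txt")              # any local or HDFS path
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))

for word, n in counts.take(10):
    print(word, n)

spark.stop()
```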

DEA-AR/AHP Model Design for Efficiency Evaluation of Metropolitan Rapid Transit (지하철 효율성 평가를 위한 DEA-AR/AHP 모형 설계)

  • Sim, Gwang-Sic;Kim, Jae-Yun
    • Journal of the Korean Operations Research and Management Science Society / v.34 no.3 / pp.105-124 / 2009
  • Data Envelopment Analysis (DEA) is a methodology for computing the relative efficiency of each decision-making unit (DMU) by comparing it with other DMUs that have a similar input and output structure. In this paper, we compare the efficiency of Korean rail transit corporations using DEA. To do this, we design a DEA-AR/AHP model and evaluate efficiency by comparing the subway operating agencies of six big cities. The analysis reveals that Seoul Metro and Seoul city railroad construction turn out to be the most efficient groups. The results of this research can provide helpful information for the effective management of domestic subway operating agencies.
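
The paper's DEA-AR/AHP model adds assurance-region weight restrictions derived from AHP; as background, the sketch below solves only the plain input-oriented CCR multiplier model for each DMU with scipy.optimize.linprog, on made-up data rather than the subway agencies' figures.

```python
# Plain input-oriented CCR (multiplier form) for one DMU via linear programming:
#   max  u'y_o   s.t.  v'x_o = 1,  u'y_j - v'x_j <= 0 for all j,  u, v >= 0.
# The DEA-AR/AHP model additionally restricts the weights u, v; that part is omitted.
import numpy as np
from scipy.optimize import linprog

X = np.array([[5.0, 8.0], [4.0, 6.0], [6.0, 9.0]])   # inputs:  rows = DMUs (illustrative)
Y = np.array([[9.0], [7.0], [8.0]])                  # outputs: rows = DMUs (illustrative)

def ccr_efficiency(o):
    n, m = X.shape
    s = Y.shape[1]
    # Decision variables: [u (s output weights), v (m input weights)]
    c = np.concatenate([-Y[o], np.zeros(m)])         # maximize u'y_o (minimize its negative)
    A_eq = [np.concatenate([np.zeros(s), X[o]])]     # v'x_o = 1
    b_eq = [1.0]
    A_ub = np.hstack([Y, -X])                        # u'y_j - v'x_j <= 0 for every DMU j
    b_ub = np.zeros(n)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return -res.fun                                  # efficiency score in (0, 1]

for o in range(len(X)):
    print(f"DMU {o}: efficiency = {ccr_efficiency(o):.3f}")
```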

Finding Pluto: An Analytics-Based Approach to Safety Data Ecosystems

  • Barker, Thomas T.
    • Safety and Health at Work / v.12 no.1 / pp.1-9 / 2021
  • This review article addresses the role of safety professionals in diffusion strategies for predictive analytics for safety performance. It explores the models, definitions, roles, and relationships of safety professionals in knowledge application, access, management, and leadership in safety analytics, and it addresses the challenges safety professionals face when integrating safety analytics in organizational settings across four operations areas: application, technology, management, and strategy. A review of existing conventional safety data sources (safety data, internal data, external data, and context data) is briefly summarized as a baseline. For each of these data sources, the article points out how emerging analytic data sources (such as Industry 4.0 and the Internet of Things) broaden and challenge the scope of work and operational roles throughout an organization. In doing so, the article defines four perspectives on the integration of predictive analytics into organizational safety practice: the programmatic, technological, sociocultural, and knowledge-organization perspectives. The article posits a four-level organizational knowledge-skills-abilities matrix for analytics integration, indicating the key organizational capacities needed in each area. The work shows the benefits of organizational alignment, clear stakeholder categorization, and the ability to predict future safety performance.

Some Considerations on the Problems of PSA(Pulse Sequence Analysis) as a Partial Discharge Analysis Method (부분방전 해석 방법으로 PSA(Pulse Sequence Analysis)의 문제점에 대한 고찰)

  • Kim, Jeong-Tae;Lee, Ho-Keun
    • Proceedings of the Korean Institute of Electrical and Electronic Material Engineers Conference / 2004.11a / pp.327-330 / 2004
  • Because of its effectiveness for partial discharge (PD) pattern recognition, pulse sequence analysis (PSA) has been considered as a new analytic method to replace conventional phase-resolved partial discharge analysis (PRPDA). However, because PSA analyzes the correlation between sequential pulses, it can misidentify patterns when data are missing due to poor sensitivity, which makes practitioners hesitant to apply it on-site. This paper therefore investigates the problems of PSA in the cases of missing data and added noise. PD data obtained from various defects, including data with added noise, were analyzed. The results show that both cases can cause fatal errors in recognizing PD patterns. For missing data, the error depends on the kind of defect and the degree of degradation. It was also observed that the error due to added noise was larger than that due to missing data.
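
A small numerical illustration of the failure mode described above: PSA works on differences between consecutive pulses, so randomly dropping pulses merges adjacent intervals and distorts the difference distribution. The simulated pulse sequence below is made up for illustration and is not the paper's measurement data.

```python
# Illustration of PSA's sensitivity to missed pulses: sequence analysis uses
# differences between consecutive pulses, so dropping pulses merges intervals
# and shifts the delta distribution. The simulated data are purely illustrative.
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical instantaneous voltage at each detected PD pulse.
u_at_pulse = np.cumsum(rng.choice([+1.0, -1.0], size=2000) * rng.uniform(0.5, 1.5, 2000))

def psa_deltas(u, miss_rate=0.0):
    keep = rng.random(len(u)) >= miss_rate       # randomly drop a fraction of pulses
    return np.diff(u[keep])                      # delta-U between consecutive kept pulses

for miss_rate in (0.0, 0.1, 0.3):
    d = psa_deltas(u_at_pulse, miss_rate)
    print(f"miss {miss_rate:.0%}: mean|dU|={np.abs(d).mean():.2f}  std(dU)={d.std():.2f}")
```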