• Title/Abstract/Keywords: Big Data Cluster

208 search results, processing time 0.021 seconds

Correspondence Strategy for Big Data's New Customer Value and Creation of Business

  • 고준철;이해욱;정지윤;강경식
    • 대한안전경영과학회지 / Vol. 14, No. 4 / pp.229-238 / 2012
  • Within the last 10 years, internet use has become a daily activity, and humankind has had to face the Data Deluge, a dramatic increase in digital data (Economist 2012). Because the amount of digital data has grown exponentially, large-scale data has become a major issue, and hence the term 'big data' appeared. There is no official agreement on a quantitative, detailed definition of 'big data', but its meaning is expanding to cover its value and efficacy. Big data includes not only standardized internal information such as customer information, but also complex external, unstructured, social, and real-time data. Big data technology is a broad concept that covers a wide range of technologies, including data acquisition, storage/management, analysis, and application. Technologies associated with 'big data' include Bigtable, Cassandra, Hadoop, MapReduce, HBase, and NoSQL, and among the analysis techniques, Text Mining, Opinion Mining, Social Network Analysis, and Cluster Analysis are gaining attention. The features that 'big data' needs to have concern large amounts of individual (high-resolution) elements and a variety of high-frequency data. Big data has three defining features, volume, variety, and velocity, which are called the '3V'. Complexity is added as a fourth feature, and the more of these four features a dataset satisfies, the better it fits the term 'big data'. In this study, we look at the reasons why companies need to adopt 'big data', ways of applying it, and advanced domestic and foreign application cases. To respond effectively to the 'big data' revolution, a paradigm shift is needed in data production, distribution, and consumption, and insight is urgently needed to develop and prepare future business by considering the unpredictable technology market, the industry environment, and the flow of social demand.

k-NN Join Based on LSH in Big Data Environment

  • Ji, Jiaqi;Chung, Yeongjee
    • Journal of information and communication convergence engineering / Vol. 16, No. 2 / pp.99-105 / 2018
  • k-Nearest neighbor join (k-NN Join) is a computationally intensive algorithm that is designed to find k-nearest neighbors from a dataset S for every object in another dataset R. Most related studies on k-NN Join are based on single-computer operations. As the data dimensions and data volume increase, running the k-NN Join algorithm on a single computer cannot generate results quickly. To solve this scalability problem, we introduce the locality-sensitive hashing (LSH) k-NN Join algorithm implemented in Spark, an approach for high-dimensional big data. LSH is used to map similar data onto the same bucket, which can reduce the data search scope. In order to achieve parallel implementation of the algorithm on multiple computers, the Spark framework is used to accelerate the computation of distances between objects in a cluster. Results show that our proposed approach is fast and accurate for high-dimensional and big data.
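The bucketing idea in the abstract above can be sketched on a single machine. This is a minimal illustration, not the authors' Spark implementation: the sign-of-random-projection hash, the Euclidean ranking inside a bucket, and the names `make_hyperplanes`, `lsh_key`, and `lsh_knn_join` are all assumptions, since the abstract does not name the hash family or the distance measure.

```python
import math
import random
from collections import defaultdict


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplane normals (assumed hash family)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_key(point, hyperplanes):
    """Hash a point to a bit tuple: one bit per hyperplane side."""
    return tuple(
        1 if sum(p * h for p, h in zip(point, plane)) >= 0 else 0
        for plane in hyperplanes
    )


def lsh_knn_join(R, S, k, n_bits=4, seed=0):
    """For each r in R, return indices of up to k nearest points of S
    taken only from the bucket that r hashes to (approximate result)."""
    planes = make_hyperplanes(len(S[0]), n_bits, seed)
    buckets = defaultdict(list)
    for j, s in enumerate(S):
        buckets[lsh_key(s, planes)].append(j)

    result = {}
    for i, r in enumerate(R):
        candidates = buckets.get(lsh_key(r, planes), [])
        ranked = sorted(candidates, key=lambda j: math.dist(r, S[j]))
        result[i] = ranked[:k]
    return result
```

Because only one bucket is searched per query, a true neighbor that lands in a different bucket is missed; using several hash tables is a common way to trade extra work for recall.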

Optimization Driven MapReduce Framework for Indexing and Retrieval of Big Data

  • Abdalla, Hemn Barzan;Ahmed, Awder Mohammed;Al Sibahee, Mustafa A.
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 14, No. 5 / pp.1886-1908 / 2020
  • With technical advances, the amount of big data is increasing day by day, to the point that traditional software tools struggle to handle it. Additionally, the presence of imbalanced data in big data is a major concern for researchers. To ensure effective management of big data and to deal with imbalanced data, this paper proposes a new indexing algorithm for retrieving big data in the MapReduce framework. In the mappers, data clustering is done with the Sparse Fuzzy-c-means (Sparse FCM) algorithm. The reducer combines the clusters generated by the mappers and again performs data clustering with the Sparse FCM algorithm. Two-level query matching is performed to determine the requested data: the first level determines the cluster, and the second level accesses the requested data within it. The data are ranked using the proposed Monarch chaotic whale optimization algorithm (M-CWOA), which is designed by combining Monarch butterfly optimization (MBO) [22] and the chaotic whale optimization algorithm (CWOA) [21]. Here, the Parametric Enabled-Similarity Measure (PESM) is adapted for matching the similarities between two datasets. The proposed M-CWOA outperformed other methods, with a maximal precision of 0.9237, recall of 0.9371, and F1-score of 0.9223.
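The two-level query matching described above can be illustrated roughly as follows. This is a sketch under stated assumptions: plain Euclidean distance stands in for PESM, the clusters and centroids are given as inputs rather than produced by Sparse FCM, and the function names `nearest` and `two_level_match` are hypothetical.

```python
import math


def nearest(query, vectors):
    """Return the index of the vector closest to query (Euclidean distance)."""
    return min(range(len(vectors)), key=lambda i: math.dist(query, vectors[i]))


def two_level_match(query, centroids, clusters):
    """Level 1 picks the closest centroid; level 2 searches only that cluster.

    centroids[c] summarizes clusters[c], a list of record vectors.
    Returns (cluster index, best-matching record).
    """
    c = nearest(query, centroids)
    members = clusters[c]
    return c, members[nearest(query, members)]
```

For example, with centroids `[(0, 0), (10, 10)]` and clusters `[[(0, 1), (1, 0)], [(9, 9), (11, 10)]]`, the query `(9.5, 9.0)` is routed to the second cluster and only its two records are compared.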

Design of customized product recommendation model on correlation analysis when using electronic commerce

  • ;박기용;최상현
    • 한국융합학회논문지 / Vol. 13, No. 3 / pp.203-216 / 2022
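  • In this study, against the backdrop of COVID-19 and a business environment in which purchasing patterns are shifting toward the online market, we conducted cluster analysis and association analysis based on the purchase and product information of an online delivery company. We segmented the data into customer clusters, product clusters, and, through cross-combination, combined clusters, attempting an academically new approach to cluster analysis, and performed association analysis on the results of each cluster analysis. The association analysis showed that the combined clusters yielded relatively more association rules with a lower duplication rate, indicating very high efficiency. We therefore judge the combined cluster to be the most suitable model for recommending products that match customer needs. The combined cluster model is expected to save consumers time and provide them with useful information while bringing positive effects for the company, such as increased sales. As future work, comparing and analyzing multiple online delivery companies with diverse characteristics is expected to yield clearer and more meaningful results.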
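To make the association analysis step concrete, here is a minimal sketch that mines pairwise rules from the baskets of one cluster, using the standard support and confidence definitions. The thresholds, the restriction to item pairs, and the function name `pair_rules` are illustrative assumptions; the abstract does not say which algorithm or thresholds the authors used.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.5, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) for item pairs.

    support    = share of baskets containing both items
    confidence = baskets with both items / baskets with the antecedent
    """
    n = len(baskets)
    item_count = Counter()
    pair_count = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        item_count.update(items)
        pair_count.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_count.items():
        support = count / n
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_count[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return rules
```

Running the function once per customer cluster, product cluster, and combined cluster, and then comparing the number of rules and their overlap, would mirror the comparison the abstract reports.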

Designing Mutual Cooperation Security Model for IP Spoofing Attacks about Medical Cluster Basis Big Data Environment

  • 안창호;백현철;서영건;정원창;박재흥
    • 융합보안논문지 / Vol. 16, No. 7 / pp.21-29 / 2016
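  • Our society today is exposed to an environment in which diverse information is exchanged in real time over networks. In particular, the government's medical policy is hastening the introduction of telemedicine to improve the quality of public medical services. The introduction of telemedicine also calls for building big-data-based medical information for customized patient care regardless of region. This paper proposes the construction of regional medical clusters based on big data, together with a defense and security cooperation model that can detect and appropriately respond to attacks that undermine service availability. To this end, we propose a network configuration in which regional public medical centers, which are evenly distributed nationwide and use the same hospital information system, serve as the headquarters of regional virtual medical clusters. In addition, we design a mutual cooperation security model that can respond in real time to IP spoofing attacks on the medical cluster and the resulting DDoS attacks, so that the limitations of a single system and a single security policy can also be overcome.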

An Analytic solution for the Hadoop Configuration Combinatorial Puzzle based on General Factorial Design

  • Priya, R. Sathia;Prakash, A. John;Uthariaraj, V. Rhymend
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 16, No. 11 / pp.3619-3637 / 2022
  • Big data analytics offers endless opportunities for operational enhancement by extracting valuable insights from complex, voluminous data. Hadoop is a comprehensive technological suite that offers solutions for the large-scale storage and computing needs of big data. The performance of Hadoop is closely tied to its configuration settings, which depend on the cluster capacity and the application profile. Since Hadoop has over 190 configuration parameters, tuning them to gain optimal application performance is a daunting challenge. Our approach is to extract a subset of impactful parameters, from which a performance-enhancing sub-optimal configuration is then narrowed down. This paper presents a statistical model to analyze the significance of the effect of Hadoop parameters on a variety of performance metrics. Our model decomposes the total observed performance variation and ascribes it to the main parameters, their interaction effects, and noise factors. The method clearly segregates impactful parameters from the rest. The configuration setting determined by our methodology reduced the job completion time by 22%, resource utilization in terms of memory and CPU by 15% and 12% respectively, the number of killed Maps by 50%, and disk spillage by 23%. The proposed technique can be leveraged to ease the configuration tuning task of any Hadoop cluster, regardless of differences in the underlying infrastructure and the application running on it.
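The decomposition of performance variation into main effects, interaction, and noise can be illustrated with the textbook two-factor case. The sketch below assumes a balanced design with replicates and only two factors; the paper's general factorial model covers more parameters, and the function name `factorial_anova` and any sample measurements passed to it are hypothetical.

```python
def factorial_anova(y):
    """Decompose variation for a balanced two-factor design.

    y[i][j] is the list of replicate measurements at level i of factor A
    and level j of factor B. Returns sums of squares for A, B, the A x B
    interaction, the error (noise), and the total.
    """
    a, b, r = len(y), len(y[0]), len(y[0][0])
    grand = sum(v for row in y for cell in row for v in cell) / (a * b * r)
    cell_mean = [[sum(cell) / r for cell in row] for row in y]
    a_mean = [sum(row) / b for row in cell_mean]
    b_mean = [sum(cell_mean[i][j] for i in range(a)) / a for j in range(b)]

    ss_a = b * r * sum((m - grand) ** 2 for m in a_mean)
    ss_b = a * r * sum((m - grand) ** 2 for m in b_mean)
    ss_ab = r * sum(
        (cell_mean[i][j] - a_mean[i] - b_mean[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    ss_e = sum(
        (v - cell_mean[i][j]) ** 2
        for i in range(a) for j in range(b) for v in y[i][j]
    )
    ss_t = sum((v - grand) ** 2 for row in y for cell in row for v in cell)
    return {"A": ss_a, "B": ss_b, "AB": ss_ab, "error": ss_e, "total": ss_t}
```

Dividing each component by the total gives the share of variation attributable to that factor, which is one way to separate impactful parameters from the rest.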

Design and Implementation of Incremental Learning Technology for Big Data Mining

  • Min, Byung-Won;Oh, Yong-Sun
    • International Journal of Contents / Vol. 15, No. 3 / pp.32-38 / 2019
  • Traditional mining techniques usually struggle to treat or manage Big Data generated from various digital media and/or sensors. In addition, when new data accumulate continuously and large volumes of text keep growing, problems such as insufficient memory and a heavy learning burden arise, because the total data, including data already collected and analyzed, are inefficiently re-analyzed. In this paper, we propose a general-purpose classifier and its structure to solve these problems. We depart from current feature-reduction methods and introduce a new scheme that adopts only the changed elements when new features are partially accumulated in this free-style learning environment. The incremental learning module, built up gradually, learns only the changed parts of the data without re-processing the data already accumulated, whereas traditional methods re-learn the total data every time data are added or changed. Additionally, users can freely merge new data with previous data through the resource management procedure whenever re-learning is needed. At the end of this paper, we confirm through an analysis that this method performs well for data processing in a Big Data environment because of its learning efficiency. Also, comparing this algorithm with NB and SVM, we achieve an accuracy of approximately 95% in all three models. We expect that our method will be a viable substitute offering high performance and accuracy relative to large computing systems for Big Data analysis using a PC cluster environment.
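The idea of learning only from newly arrived data, without re-reading earlier batches, can be sketched with a count-based classifier. The example below uses a multinomial Naive Bayes with add-one smoothing as a stand-in; it is not the classifier proposed in the paper, and the class name `IncrementalNB` and its methods are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class IncrementalNB:
    """Multinomial Naive Bayes whose counts are updated batch by batch."""

    def __init__(self):
        self.class_docs = Counter()              # documents seen per class
        self.word_counts = defaultdict(Counter)  # word counts per class
        self.vocab = set()

    def partial_fit(self, docs, labels):
        """Add only the new documents; earlier batches are never re-read."""
        for words, label in zip(docs, labels):
            self.class_docs[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)

    def predict(self, words):
        """Return the class with the highest smoothed log-probability."""
        total_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for w in words:
                score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Each call to `partial_fit` costs time proportional to the new batch only, which is the property the abstract contrasts with methods that re-learn the total data.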

Predicting Learning Achievement Using Big Data Cluster Analysis - Focusing on Longitudinal Study

  • 고수정
    • 디지털콘텐츠학회 논문지 / Vol. 19, No. 9 / pp.1769-1778 / 2018
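  • As the value of using big data grows, many studies applying big data analysis techniques are under way not only in business but also in education. This paper proposes a method for longitudinally predicting learning achievement using big data cluster analysis. The proposed method uses the K-means algorithm to classify students into groups with similar learning habits, based on the learning habit types of first-year middle school students in the Korean Children and Youth Panel Survey (KCYPS) data, and extracts the features of each group. Next, using the extracted group features, first-year middle school students in the test set are assigned by cosine similarity to the group with similar learning habits, after which neighbors are selected and learning achievement is predicted. The proposed method demonstrated that learning habits in middle school closely affect even satisfaction with university and major, so that it can predict not only high school learning achievement but also satisfaction with university and major.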
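The two steps in the abstract above, grouping students with K-means and then assigning a test student to a group by cosine similarity, can be sketched as follows. The seeding with the first k points, the fixed iteration count, and the function names are assumptions for illustration; the feature vectors stand for hypothetical learning-habit scores, not KCYPS data.

```python
import math


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            groups[idx].append(p)
        for c, members in enumerate(groups):
            if members:  # keep the old centroid if a group goes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def assign_group(student, centroids):
    """Pick the group whose feature vector is most similar to the student."""
    return max(range(len(centroids)), key=lambda c: cosine(student, centroids[c]))
```

After a test student is assigned to a group, neighbors would be drawn from that group's members to make the prediction; that last step is omitted here because the abstract does not specify how neighbors are weighted.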

Performance Optimization of Big Data Center Processing System - Big Data Analysis Algorithm Based on Location Awareness

  • Zhao, Wen-Xuan;Min, Byung-Won
    • International Journal of Contents / Vol. 17, No. 3 / pp.74-83 / 2021
  • A location-aware algorithm is proposed in this study to optimize the performance of distributed big data processing systems that suffer from low data reliability and low application performance. Compared with previous algorithms, the location-aware data block placement algorithm uses data block placement and node data recovery strategies to improve data application performance and reliability. Simulation and actual cluster tests showed that the location-aware placement algorithm proposed in this study could greatly improve data reliability and shorten the application processing time of I/O interfaces in real time.

Analysis on Types of Golf Tourism After COVID-19 by using Big Data

  • Hyun Seok Kim;Munyeong Yun;Gi-Hwan Ryu
    • International Journal of Advanced Culture Technology / Vol. 12, No. 1 / pp.270-275 / 2024
  • Introduction. The purpose of this study is to analyze the types of golf tourism, inbound or outbound, by using big data, and to see how the industry is shifting and what changes have occurred in the golf industry during and after COVID-19. Method. Using Textom, a big data analysis tool, "golf tourism" and "Covid-19" were selected as keywords, search frequency information from Naver and Daum was collected for one year, from 1 January 2023 to 31 December 2023, and data preprocessing was conducted on this basis. For the suitability of the study and more accurate data, data not related to "golf tourism" were removed through a refining process, and similar keywords were grouped into the same keyword for analysis. As a result of the word refining process, the top 36 keywords with the highest relevance and search frequency were selected and applied to this study. These 36 keywords were subjected to TF-IDF analysis, visualization analysis using the Ucinet6 and NetDraw programs, network analysis between keywords, and cluster analysis between keywords through Concor analysis. Results. The big data analysis showed that the option of overseas golf tourism is affecting inbound golf travel. "Golf", "Tourism", "Vietnam", and "Thailand" showed high frequencies, which shows that overseas golf tours are a returning trend.
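The TF-IDF step mentioned in the method can be sketched in a few lines. The weighting below (raw term count times log of N over document frequency) is one common variant and is an assumption; the abstract does not state which formula Textom applies, and the function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf is the raw term count in the document; idf is log(N / df),
    where df is the number of documents containing the term.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: count * math.log(n / df[term]) for term, count in Counter(doc).items()}
        for doc in docs
    ]
```

Under this variant a term that appears in every document gets weight zero, so a keyword such as "golf" that occurs everywhere ranks below rarer, more distinctive terms.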