• Title/Summary/Keyword: Big Data Cluster (빅데이터 클러스터)

Search Result 93, Processing Time 0.025 seconds

Basic Prototype Design and Verification of Hadoop Cluster based on Private Cloud Infrastructure for SMB (중소기업을 위한 프라이빗 클라우드 인프라 기반 하둡 클러스터의 기본 프로토타입 설계 및 실증)

  • Cha, Byung-Rae;Kim, Hyeong-Gyun;Kim, Dae-Gue;Kim, Jong-Won;Kim, Yong-Il
    • Journal of Advanced Navigation Technology / v.17 no.2 / pp.225-233 / 2013
  • Cloud computing and big data have recently become buzzwords in the field of IT. In this paper, as part of an effort to support small and medium-sized businesses (SMBs) in this environment, we designed basic prototypes (ver. 0.1, 0.2, and 0.5) of a Hadoop cluster based on private cloud infrastructure and implemented part of them. We then verified the performance of the basic prototypes using the ASA Dataset.

Forecasting the Growth of Smartphone Market in Mongolia Using Bass Diffusion Model (Bass Diffusion 모델을 활용한 스마트폰 시장의 성장 규모 예측: 몽골 사례)

  • Anar Bataa;KwangSup Shin
    • The Journal of Bigdata / v.7 no.1 / pp.193-212 / 2022
  • The Bass diffusion model is one of the most successful models in marketing research and in management science generally. Since its publication in 1969, it has guided marketing research on diffusion. This paper illustrates the use of the Bass diffusion model in the context of mobile cellular subscription diffusion. We fit the Bass diffusion model to three large developed markets (South Korea, Japan, and China) and to the emerging markets of Vietnam, Thailand, Kazakhstan, and Mongolia. We estimate the model parameters using the nonlinear least squares method. The diffusion of mobile cellular subscriptions follows an S-curve in every case. After obtaining the m, p, and q parameters, we use k-means cluster analysis to group the countries into three groups. By clustering countries, we suggest that diffusion rates and patterns are similar, so that countries with emerging markets can follow in the footsteps of countries with developed markets. The purpose was to predict the timing and magnitude of market maturity and to determine whether the data follow the typical diffusion curve of innovations from the Bass model.
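For readers unfamiliar with the model, the following is a minimal sketch of the standard closed-form Bass cumulative adoption curve, N(t) = m(1 - e^{-(p+q)t}) / (1 + (q/p)e^{-(p+q)t}), where m is the market potential, p the coefficient of innovation, and q the coefficient of imitation. The parameter values and function names are hypothetical and are not taken from the paper; fitting m, p, and q to observed subscription data would additionally require a nonlinear least squares routine, which is not shown here.

```python
import math


def bass_cumulative(t, m, p, q):
    """Cumulative adopters N(t) = m * F(t) under the Bass diffusion model."""
    e = math.exp(-(p + q) * t)
    return m * (1.0 - e) / (1.0 + (q / p) * e)


def bass_peak_time(p, q):
    """Time at which the adoption rate peaks (meaningful when q > p)."""
    return math.log(q / p) / (p + q)


# Hypothetical parameter values chosen only for illustration.
m, p, q = 1000.0, 0.03, 0.38

# Cumulative adoption for periods 0..30; the values trace an S-curve.
curve = [bass_cumulative(t, m, p, q) for t in range(0, 31)]
t_star = bass_peak_time(p, q)
```

At the peak time the cumulative share of adopters equals 1/2 - p/(2q), which gives a quick sanity check on any fitted parameters.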

A study on Korean tourism trends using social big data -Focusing on sentiment analysis- (소셜 빅데이터를 활용한 한국관광 트렌드에 관한연구 -감성분석을 중심으로-)

  • Youn-hee Choi;Kyoung-mi Yoo
    • The Journal of the Convergence on Culture Technology / v.10 no.3 / pp.97-109 / 2024
  • In the field of domestic tourism, analyzing the trends of tourism consumers, both international and domestic tourists, is essential not only for the Korean tourism market but also for local and governmental tourism policy makers. We explore keywords and sentiment on social media in order to establish a marketing strategy plan and revitalize the domestic tourism industry through communication and information from tourism consumers. This study utilized TEXTOM 6.0 to analyze recent trends in Korean tourism. Data was collected from September 31, 2022, to August 31, 2023, using 'Korean tourism' and 'domestic tourism' as keywords, targeting blogs, cafes, and news provided by Naver, Daum, and Google. Through text mining, 100 keywords and their TF-IDF values were extracted in order of frequency, and then CONCOR analysis and sentiment analysis were conducted. For Korean tourism keywords, words related to tourist destinations, travel companions and behaviors, tourism motivations and experiences, accommodation types, tourist information, and emotional connections ranked high. The results of the CONCOR analysis were categorized into five clusters: tourist destinations, tourist information, tourist activities/experiences, tourism motivation/content, and inbound-related. Finally, the sentiment analysis showed a high level of positive documents and vocabulary. This study analyzes the rapidly changing trends of Korean tourism through text mining and is expected to provide meaningful data to promote domestic tourism not only for Koreans but also for foreigners visiting Korea.

Introduction to Digital Twin Convergence Medical Innovation Project (디지털 트윈 융합 의료혁신 선도 사업 소개)

  • Kwang-Man Ko;Jee-Hyun Koo;Byung-Suk Seo;Sun-Young Son
    • Proceedings of the Korea Information Processing Society Conference / 2024.05a / pp.895-897 / 2024
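  • This paper introduces the "Digital Twin Convergence Medical Innovation" project, which begins in April 2024 with funding from the Ministry of Science and ICT. Centered on Gangwon-do, which operates an advanced medical device cluster, the project aims to build a foundation for using digital twins to drive innovation in domestic digital medical device development. To this end, it will pursue three sub-tasks: building an integrated digital twin infrastructure (digital twin models and a digital twin linkage platform), building a simulation verification infrastructure, and commercializing medical device digital twins.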

Rapid Management Mechanism Against Harmful Materials of Agri-Food Based on Big Data Analysis (빅 데이터 분석 기반 농 식품 위해인자 신속관리 방법)

  • Park, Hyeon;Kang, Sung-soo;Jeong, Hoon;Kim, Se-Han
    • The Journal of Korean Institute of Communications and Information Sciences / v.40 no.6 / pp.1166-1174 / 2015
  • Attempts have been made to prevent the spread of harmful materials in agri-food by tracking product records with bar codes, partially tracking information on agri-food storage and delivery vehicles, or controlling temperature by intuition. However, these attempts suffered from insufficient information, information distortion, and the separate information networks of individual distribution companies. As a result, it is difficult to prevent the spread of harmful materials over the agri-food life cycle with these approaches. To solve these problems, we propose a mechanism that uses big data processing to perform context awareness, prediction, and tracking of harmful materials in agri-food.

Spark based Scalable RDFS Ontology Reasoning over Big Triples with Confidence Values (신뢰값 기반 대용량 트리플 처리를 위한 스파크 환경에서의 RDFS 온톨로지 추론)

  • Park, Hyun-Kyu;Lee, Wan-Gon;Jagvaral, Batselem;Park, Young-Tack
    • Journal of KIISE / v.43 no.1 / pp.87-95 / 2016
  • Recently, due to the development of the Internet and electronic devices, there has been an enormous increase in the amount of available knowledge and information. As this growth has proceeded, studies on large-scale ontological reasoning have been actively carried out. In general, a machine learning program or knowledge engineer measures and provides a degree of confidence for each triple in a large ontology. Yet the collected ontology data contains inherent uncertainty, and reasoning over such data can cause vagueness in the reasoning results. To address the uncertainty issue, we propose an RDFS reasoning approach that utilizes confidence values indicating degrees of uncertainty in the collected data. Unlike conventional reasoning approaches that do not take data uncertainty into account, our approach uses the in-memory cluster computing framework Spark and applies uncertainty estimation methods to compute confidence values for the data inferred through RDFS-based reasoning. As a result, the computed confidence values represent the uncertainty in the inferred data. To evaluate our approach, ontology reasoning was carried out over the LUBM standard benchmark data set with arbitrary confidence values added to the ontology triples. Experimental results indicated that the proposed system is capable of running over the largest data set, LUBM3000, in 1179 seconds, inferring 350K triples.
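To make the idea of confidence-carrying RDFS inference concrete, here is a small single-machine sketch of one RDFS rule, rdfs9 (if x has type C and C is a subclass of D, then x has type D). The abstract does not state how the paper combines confidence values, so the rule used below (product of the premise confidences, keeping the maximum when a triple is derivable in several ways) is purely an assumption for illustration. The function name and sample triples are hypothetical, and the sketch uses plain Python rather than Spark.

```python
def rdfs9_with_confidence(triples):
    """Apply rule rdfs9 until no new or better-scored triple appears.

    triples: dict mapping (subject, predicate, object) -> confidence in [0, 1].
    Assumption (not from the paper): an inferred triple's confidence is the
    product of its premises' confidences; the maximum is kept on conflicts.
    """
    result = dict(triples)
    changed = True
    while changed:
        changed = False
        # Snapshot the current premises before adding inferred triples.
        sub = [(s, o, c) for (s, p, o), c in result.items()
               if p == "rdfs:subClassOf"]
        typ = [(s, o, c) for (s, p, o), c in result.items()
               if p == "rdf:type"]
        for x, cls, c_type in typ:
            for child, parent, c_sub in sub:
                if cls != child:
                    continue
                key = (x, "rdf:type", parent)
                conf = c_type * c_sub
                if conf > result.get(key, 0.0):
                    result[key] = conf
                    changed = True
    return result


# Hypothetical input triples with confidence values.
triples = {
    ("alice", "rdf:type", "GradStudent"): 0.9,
    ("GradStudent", "rdfs:subClassOf", "Student"): 0.8,
    ("Student", "rdfs:subClassOf", "Person"): 1.0,
}
inferred = rdfs9_with_confidence(triples)
```

In a cluster setting, the nested loop would typically be expressed as a join between the type triples and the subclass triples, repeated until a fixed point is reached.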

The study on the diagnosis and measurement of post-information society by ANP (ANP를 활용한 후기정보사회의 수준진단과 측정에 관한 연구)

  • Song, Young-Jo;Kwak, Jeong-Ho
    • Informatization Policy / v.23 no.2 / pp.73-97 / 2016
  • Social change driven by ICT such as big data, IoT, cloud, and mobile is progressing rapidly. We need to move beyond the old-fashioned frame that measured the level of the information society through PC adoption, Internet speed, Internet subscribers, and the like, and a new type of framework for diagnosing the information society is needed. This study establishes a framework for diagnosing and measuring the post-information society. The framework and indicators were chosen in accordance with the theory of the coevolution of technology and society and with information society indicators presented by authoritative international organizations. Empirical results using the indicators and framework developed in this study were as follows. First, the three sectors, six clusters (items), and 25 nodes (indicators) that make up the information society were all shown to be strongly connected. Second, a network analysis (ANP) measuring the importance of each sector of the information society yielded information society development (50.34%), technology-based expansion (25.03%), and ICT effect (24.63%). Third, the relative importance of the clusters and nodes was calculated as (1) social development potential (26.04%), (2) competitiveness (15.9%), (3) ICT literacy (15.5%), (4) (social) capital (24.3%), (5) ICT acceptance (9.54%), and (6) quality of life (8.7%). Consequently, when measuring the post-information society, we should take into account effects on the economy and quality of life, beyond an ICT infrastructure-centric view. By applying these weights, we should compare countries, diagnose the level of Korea, and provide policy implications for preparing for the post-information society.

A Study on Research Paper Classification Using Keyword Clustering (키워드 군집화를 이용한 연구 논문 분류에 관한 연구)

  • Lee, Yun-Soo;Pheaktra, They;Lee, JongHyuk;Gil, Joon-Min
    • KIPS Transactions on Software and Data Engineering / v.7 no.12 / pp.477-484 / 2018
  • Due to the advancement of computer and information technologies, numerous papers have been published. As new research fields continue to be created, users have a lot of trouble finding and categorizing papers of interest. To alleviate this difficulty, this paper presents a method for grouping similar papers by clustering them. The presented method extracts primary keywords from the abstract of each paper by using TF-IDF. Applying the K-means clustering algorithm to the extracted TF-IDF values, our method clusters together papers that have similar content. To demonstrate the practicality of the proposed method, we use papers from the FGCS journal as real data. Based on these data, we derive the number of clusters using the Elbow scheme and show clustering performance using the Silhouette scheme.
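The pipeline described above (TF-IDF weighting followed by K-means) can be sketched in a few lines of plain Python. This is only an illustrative sketch under simplifying assumptions: whitespace tokenization, the basic tf * log(N/df) weighting, fixed initial centroids, and four made-up toy documents. None of these details come from the paper, and the Elbow and Silhouette steps are not shown.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocab, vectors) using tf * log(N / df) weights."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n = len(docs)
    # Document frequency: number of documents containing each word.
    df = Counter(w for toks in tokenized for w in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append(
            [tf[w] / len(toks) * math.log(n / df[w]) for w in vocab]
        )
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, init_indices, iters=20):
    """Plain Lloyd's K-means; centroids start at the given rows."""
    centroids = [list(vectors[i]) for i in init_indices]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid for every vector.
        labels = [
            min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy "abstracts" used only to exercise the sketch.
docs = [
    "cloud cluster scheduling",
    "cloud cluster storage",
    "tourism sentiment analysis",
    "tourism sentiment survey",
]
vocab, vectors = tfidf(docs)
labels = kmeans(vectors, init_indices=[0, 2])
```

With these toy inputs the two cloud documents share one label and the two tourism documents share the other. On real abstracts, stop-word removal and a proper tokenizer would be needed before the weighting step.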

Design and Implementation of a Benchmarking System Based on ArangoDB (ArangoDB기반 벤치마킹 시스템 설계 및 구현)

  • Choi, Do-Jin;Baek, Yeon-Hee;Lee, So-Min;Kim, Yun-A;Kim, Nam-Young;Choi, Jae-Young;Lee, Hyeon-Byeong;Lim, Jong-Tae;Bok, Kyoung-Soo;Song, Seok-Il;Yoo, Jae-Soo
    • The Journal of the Korea Contents Association / v.21 no.9 / pp.198-208 / 2021
  • ArangoDB is a NoSQL database system that has been widely used in many applications for storing large amounts of data. To apply a new NoSQL database system such as ArangoDB to real work environments, we need a benchmarking system that can evaluate its performance. In this paper, we design and implement an ArangoDB-based benchmarking system that measures kernel-level performance as well as application-level performance. We partially modify YCSB to measure the performance of a NoSQL database system in a cluster environment. We also define three real-world workload types by analyzing existing materials. We demonstrate the feasibility of the proposed system by benchmarking the three workload types. We derive the workloads available in ArangoDB and show that performance at the kernel layer as well as the application layer can be visualized through benchmarking of the three workload types. It is expected that benchmarking with this system will make applicability and risk reviews possible in environments that need to transfer data from an existing database engine to ArangoDB.

Linguistic Features Discrimination for Social Issue Risk Classification (사회적 이슈 리스크 유형 분류를 위한 어휘 자질 선별)

  • Oh, Hyo-Jung;Yun, Bo-Hyun;Kim, Chan-Young
    • KIPS Transactions on Software and Data Engineering / v.5 no.11 / pp.541-548 / 2016
  • Social media is already essential as a source of information for listening to users' various opinions and for monitoring. We define social 'risks' as issues that negatively influence public opinion in social media. This paper aims to discriminate among various linguistic features and reveal their effects in order to build an automatic classification model of social risks. In particular, we adopt a word embedding technique to represent linguistic clues in risk sentences. As a preliminary experiment to analyze the characteristics of individual features, we revise errors in the automatic linguistic analysis. As a result, the most important feature is NE (Named Entity) information, and the best condition combines basic linguistic features, word embedding, and word clusters within core predicates. Experimental results under real conditions on social big data, including linguistic analysis errors, show precision of 92.08% and 85.84% for the frequent risk categories set and the full test set, respectively.