• Title/Abstract/Keywords: Big Data Cluster

208 search results (processing time: 0.026 s)

Big Data Astronomy: Large-scale Graph Analyses of Five Different Multiverses

  • Hong, Sungryong
    • Bulletin of the Korean Astronomical Society / Vol. 43, No. 2 / pp. 36.3-37 / 2018
  • By utilizing large-scale graph analytic tools in the modern Big Data platform Apache Spark, we investigate the topological structures of five different multiverses produced by cosmological n-body simulations with various cosmological initial conditions: (1) one standard universe, (2) two different dark energy states, and (3) two different dark matter densities. For the Big Data calculations, we use a custom-built stand-alone Spark cluster at KIAS and Dataproc Compute Engine on Google Cloud Platform, with sample sizes ranging from 7 million to 200 million. Among many graph statistics, we find that three simple graph measurements, denoted by (1) $n_k$, (2) $\tau_\Delta$, and (3) $n_{S\ge5}$, can efficiently distinguish different topologies in discrete point distributions. We denote this set of three graph diagnostics by kT5+. These kT5+ statistics provide a quick look at various orders of the n-point correlation functions in a computationally cheap way: (1) $n = 2$ by $n_k$, (2) $n = 3$ by $\tau_\Delta$, and (3) $n \ge 5$ by $n_{S\ge5}$.

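To make the three diagnostics concrete, here is a minimal standard-library sketch on a tiny 2-D point set. The abstract does not define the statistics, so the readings below are assumptions for illustration only: points are linked when closer than a hypothetical linking length `b`; `n_k` is read as the mean degree, `tau_Delta` as the global transitivity (3 × triangles / connected triples), and `n_{S>=5}` as the number of connected components with at least five members. A brute-force O(N²) neighbour search stands in for the distributed Spark graph tools the study used.

```python
import math
from itertools import combinations


def build_graph(points, b):
    """Link every pair of points closer than the (hypothetical) linking length b."""
    adj = {i: set() for i in range(len(points))}
    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) < b:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def mean_degree(adj):
    # Assumed reading of n_k: average number of neighbours per node (a pair statistic).
    return sum(len(nb) for nb in adj.values()) / len(adj)


def transitivity(adj):
    # Assumed reading of tau_Delta: 3 * triangles / connected triples (a triplet statistic).
    triangles = sum(1 for i, j, k in combinations(adj, 3)
                    if j in adj[i] and k in adj[i] and k in adj[j])
    triples = sum(len(nb) * (len(nb) - 1) // 2 for nb in adj.values())
    return 3 * triangles / triples if triples else 0.0


def count_large_components(adj, min_size=5):
    # Assumed reading of n_{S>=5}: connected components with at least min_size nodes.
    seen, count = set(), 0
    for start in adj:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            node = stack.pop()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        if size >= min_size:
            count += 1
    return count


# Hypothetical points: a tight group of five, an isolated pair, and a lone point.
points = [(0, 0), (0.5, 0), (0, 0.5), (0.5, 0.5), (0.25, 0.25),
          (10, 0), (10.5, 0), (20, 20)]
adj = build_graph(points, b=1.0)
print(mean_degree(adj), transitivity(adj), count_large_components(adj))  # 2.75 1.0 1
```

Here the group of five forms a complete graph, so every connected triple closes into a triangle and the transitivity is 1.0.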

Image Machine Learning System using Apache Spark and OpenCV on Distributed Cluster

  • 김하윤;김원집;이협건;김영운
    • Proceedings of the Korea Information Processing Society Conference / KIPS 2023 Spring Conference / pp. 33-34 / 2023
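  • The growing big data market and the exponential increase in the volume of big data make data processing difficult in conventional computing environments. In particular, image processing slows down markedly as the amount of data grows. This paper therefore proposes a large-scale image machine learning system for a distributed cluster computing environment using Apache Spark and OpenCV. The proposed system builds a distributed cluster with Apache Spark and performs its tasks with OpenCV's image processing algorithms and Spark MLlib's machine learning algorithms. Through this system, the paper presents a way to speed up large-scale image data processing and machine learning tasks.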
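The system's core idea, splitting a large image set across workers, extracting features per partition, and collecting the results for a learning step, follows Spark's partition-then-map pattern. The sketch below mimics only that pattern with the standard library and is not the paper's implementation: `concurrent.futures.ThreadPoolExecutor` stands in for Spark executors, images are hypothetical nested lists of 8-bit grayscale values, and a coarse normalised intensity histogram stands in for an OpenCV feature extractor such as `cv2.calcHist`. No Spark, MLlib, or OpenCV API is used.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, num_partitions):
    """Round-robin split into num_partitions chunks (a stand-in for RDD partitions)."""
    return [items[i::num_partitions] for i in range(num_partitions)]


def histogram_feature(image, bins=4):
    """Normalised grey-level histogram of an 8-bit image given as a list of pixel rows."""
    counts = [0] * bins
    total = 0
    for row in image:
        for pixel in row:
            counts[min(pixel * bins // 256, bins - 1)] += 1
            total += 1
    return [c / total for c in counts]


def extract_partition(chunk):
    # Runs on one hypothetical worker: features for every (image_id, image) in its chunk.
    return [(image_id, histogram_feature(img)) for image_id, img in chunk]


def extract_features(images, num_partitions=2):
    chunks = partition(list(images.items()), num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        results = pool.map(extract_partition, chunks)
    # "collect" step: merge per-partition results back into one dict on the driver.
    return {image_id: feat for chunk in results for image_id, feat in chunk}


# Hypothetical 2x2 images.
images = {"dark": [[0, 10], [20, 30]], "bright": [[250, 255], [240, 200]]}
features = extract_features(images)
print(features["dark"])  # [1.0, 0.0, 0.0, 0.0]
```

In a real pipeline, the feature vectors produced this way would be the input to a distributed classifier; here they are simply returned as a dictionary.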

Improved TI-FCM Clustering Algorithm in Big Data

  • 이광규
    • Journal of IKEEE / Vol. 23, No. 2 / pp. 419-424 / 2019
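  • The FCM algorithm finds an optimal solution through iterative optimization. Its running time varies with the initial cluster centers, the location of noise, and the location and number of dense regions. However, because the method updates the centers gradually, the initial cluster centers can be skewed to one side, and the large variance in clustering results lowers the reliability of the cluster representatives. This paper therefore proposes TI-FCM (Triangular Inequality-Fuzzy C-Means), a clustering algorithm that uses the triangle inequality to keep clusters as far apart as possible when determining the cluster-center densities. The proposed method converges to the actual clusters more effectively than FCM even on large-scale big data, and experiments showed that its running time is shorter than that of conventional FCM.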
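The abstract describes TI-FCM only at a high level, so the sketch below shows standard Fuzzy C-Means (fuzzifier `m`, alternating membership and centre updates) with one clearly labelled substitution for the triangle-inequality step: initial centres are picked by farthest-point seeding so that they start as far apart as possible. That seeding rule, the data, and all parameter values are hypothetical; the paper's actual TI-FCM rule is not reproduced.

```python
import math


def farthest_point_init(points, c):
    """Hypothetical stand-in for TI-FCM seeding: choose mutually distant initial centres."""
    centres = [points[0]]
    while len(centres) < c:
        # Pick the point whose distance to its nearest existing centre is largest.
        nxt = max(points, key=lambda p: min(math.dist(p, q) for q in centres))
        centres.append(nxt)
    return [list(ct) for ct in centres]


def fcm(points, c, m=2.0, tol=1e-6, max_iter=100):
    centres = farthest_point_init(points, c)
    dim = len(points[0])
    u = []
    for _ in range(max_iter):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)).
        u = []
        for p in points:
            d = [max(math.dist(p, ct), 1e-12) for ct in centres]  # avoid division by zero
            u.append([1.0 / sum((d[j] / d[k]) ** (2 / (m - 1)) for k in range(c))
                      for j in range(c)])
        # Centre update: mean of the points weighted by u_ij^m.
        new_centres = []
        for j in range(c):
            w = [row[j] ** m for row in u]
            total = sum(w)
            new_centres.append([sum(wi * p[a] for wi, p in zip(w, points)) / total
                                for a in range(dim)])
        shift = max(math.dist(a, b) for a, b in zip(centres, new_centres))
        centres = new_centres
        if shift < tol:
            break
    return centres, u


# Hypothetical 1-D data with two well-separated groups.
points = [(0.0,), (0.2,), (0.4,), (9.6,), (9.8,), (10.0,)]
centres, u = fcm(points, c=2)
print([round(ct[0], 2) for ct in centres])  # [0.2, 9.8]
```

Seeding from mutually distant points avoids the case the abstract warns about, where all initial centres start on one side of the data.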

A Study on the Data Collection Methods based Hadoop Distributed Environment

  • 진고환
    • Journal of the Korea Convergence Society / Vol. 7, No. 5 / pp. 1-6 / 2016
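  • Much recent research aims to advance big data utilization and analysis technology, and a growing number of government agencies and companies are adopting Hadoop as a processing platform for big data analysis. As interest in big data processing and analysis rises, data collection technology has become a major issue as well, yet research on collection lags far behind research on analysis techniques. This paper therefore proposes a system that builds the big data analysis platform Hadoop as a cluster, collects structured data from relational databases through Apache Sqoop, and collects unstructured data, such as data files and log files from sensors and web applications, in a stream-based manner through Apache Flume. Data collected through this combination can serve as basic material for big data analysis.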
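Sqoop imports a relational table in parallel by splitting the range of a column (its `--split-by` option) into one interval per mapper (`--num-mappers`). The sketch below imitates that idea with the standard library's `sqlite3` as a hypothetical source database: it reads the MIN and MAX of the split column, divides the range into equal intervals, and has each simulated mapper export its rows as comma-delimited lines, similar in spirit to the part files Sqoop writes to HDFS. The table and column names are invented, and the boundary arithmetic is a simplification, not Sqoop's own splitter.

```python
import sqlite3


def split_ranges(lo, hi, num_splits):
    """Divide the integer key range [lo, hi] into num_splits half-open intervals."""
    step = (hi - lo + 1) / num_splits
    bounds = [lo + round(i * step) for i in range(num_splits)] + [hi + 1]
    return [(bounds[i], bounds[i + 1]) for i in range(num_splits)]


def parallel_import(conn, table, split_col, num_mappers):
    # Identifiers are trusted constants in this sketch; values are bound as parameters.
    lo, hi = conn.execute(f"SELECT MIN({split_col}), MAX({split_col}) FROM {table}").fetchone()
    parts = []
    for start, end in split_ranges(lo, hi, num_mappers):
        # Each simulated mapper exports only the rows inside its own key interval.
        rows = conn.execute(
            f"SELECT * FROM {table} WHERE {split_col} >= ? AND {split_col} < ? "
            f"ORDER BY {split_col}",
            (start, end),
        ).fetchall()
        parts.append([",".join(str(v) for v in row) for row in rows])
    return parts


# Hypothetical source table with ten rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensor_log (id INTEGER PRIMARY KEY, value REAL)")
conn.executemany("INSERT INTO sensor_log VALUES (?, ?)", [(i, i * 0.5) for i in range(1, 11)])
parts = parallel_import(conn, "sensor_log", "id", 3)
print([len(p) for p in parts])  # [3, 4, 3]
```

Splitting on an evenly distributed key such as a primary key keeps the mapper workloads balanced; a skewed split column would leave some mappers with most of the rows.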

Study on the Application of Big Data Mining to Activate Physical Distribution Cooperation: Focusing AHP Technique

  • 박영현;이재호;김경우
    • Korea Trade Review / Vol. 46, No. 5 / pp. 65-81 / 2021
  • Technological development in the era of the Fourth Industrial Revolution is changing the paradigm of many industries. Technologies such as big data, cloud computing, artificial intelligence, virtual reality, and the Internet of Things create synergy with existing industries, driving radical development and value creation. Among these, the logistics sector has long relied on quantitative data and has continuously accumulated and managed it, so it lends itself well to big data analysis and stands to benefit substantially from it. Modern technology has advanced together with data mining, which discovers hidden patterns and new correlations in such big data and thereby derives meaningful results. Data mining therefore occupies an important place in big data analysis, and this study analyzes the data mining techniques that can contribute to logistics and to joint (cooperative) logistics. Using the AHP technique, it derives priorities among the types of data mining that are efficient for joint logistics, with R and RStudio as the analysis tools. The AHP criteria were association analysis, cluster analysis, decision trees, artificial neural networks, web mining, and opinion mining. The alternatives were joint transport and delivery, joint logistics centers, joint logistics information systems, and joint logistics partnerships.
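AHP derives a priority weight for each criterion from a reciprocal pairwise-comparison matrix and checks whether the judgements are consistent. The sketch below uses the geometric-mean (row) approximation of the principal eigenvector and Saaty's consistency ratio CR = CI / RI, with CI = (λ_max − n) / (n − 1). The 3 × 3 matrices are hypothetical and do not reproduce the paper's judgements; the study itself used R and RStudio, not Python.

```python
import math

# Saaty's random consistency index RI, keyed by matrix size n.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24,
                7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def ahp_weights(matrix):
    """Priority weights via the normalised geometric mean of each row."""
    n = len(matrix)
    geo = [math.prod(row) ** (1 / n) for row in matrix]
    total = sum(geo)
    return [g / total for g in geo]


def consistency_ratio(matrix, weights):
    n = len(matrix)
    if n <= 2:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    # Estimate lambda_max as the mean of (A w)_i / w_i.
    aw = [sum(a * w for a, w in zip(row, weights)) for row in matrix]
    lambda_max = sum(x / w for x, w in zip(aw, weights)) / n
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


# Hypothetical judgements: A is twice as important as B and four times as important as C.
A = [[1, 2, 4],
     [1 / 2, 1, 2],
     [1 / 4, 1 / 2, 1]]
w = ahp_weights(A)
print([round(x, 3) for x in w])  # [0.571, 0.286, 0.143]
print(consistency_ratio(A, w) < 0.1)  # True: judgements acceptably consistent
```

A CR below 0.1 is the usual acceptance threshold; above it, the pairwise judgements are normally revisited before the weights are used.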

Comparing Cilk and MPI on a heterogeneous cluster system

  • 이규호;김준성
    • Journal of the Institute of Electronics Engineers of Korea CI / Vol. 44, No. 4 (Serial No. 316) / pp. 21-27 / 2007
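  • Rapid technological progress and mass production in recent years have made it easy to build cluster systems from personal computers and simple network equipment, but the shortening replacement cycle of personal computers, combined with the freedom to reconfigure such clusters, has made them increasingly heterogeneous. To use a parallel processing system built on a heterogeneous cluster efficiently, task management must account for the performance of each node. This study quantitatively examined MPI and Cilk on a heterogeneous cluster, measuring speedup for performance and program code complexity for usability. In the experiments, Cilk performed better with small data, whereas MPI performed better with large data or regular data-exchange patterns; in terms of code complexity, Cilk offered a more concise programming style.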
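Why per-node work management matters on a heterogeneous cluster can be shown with a back-of-the-envelope model. The sketch compares an equal static split of the work, a common starting point for MPI-style decomposition, with an idealised speed-proportional split, roughly what dynamic load balancing such as work stealing aims for. Node speeds and work units are hypothetical, communication cost is ignored, and neither model reproduces the paper's measurements. Speedup is T_serial / T_parallel, with T_serial taken on a reference node of speed 1.0.

```python
def static_makespan(total_work, speeds):
    """Equal split: every node receives total_work / p units regardless of its speed."""
    share = total_work / len(speeds)
    return max(share / s for s in speeds)  # the slowest node finishes last


def balanced_makespan(total_work, speeds):
    """Idealised balancing: work proportional to speed, so all nodes finish together."""
    return total_work / sum(speeds)


def speedup(t_serial, t_parallel):
    return t_serial / t_parallel


speeds = [1.0, 1.0, 2.0, 2.0]  # hypothetical relative node speeds
work = 120.0                   # hypothetical work units
t_serial = work / 1.0          # run time on one reference node of speed 1.0
print(speedup(t_serial, static_makespan(work, speeds)))    # 4.0
print(speedup(t_serial, balanced_makespan(work, speeds)))  # 6.0
```

With equal shares, the two faster nodes sit idle for half the run, so the static split is capped by the slowest node's pace.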

Application of Urban Computing to Explore Living Environment Characteristics in Seoul : Integration of S-Dot Sensor and Urban Data

  • Daehwan Kim;Woomin Nam;Keon Chul Park
    • Journal of Internet Computing and Services / Vol. 24, No. 4 / pp. 65-76 / 2023
  • This paper identifies the patterns of living environment elements (PM2.5, PM10, noise) throughout Seoul, and the urban characteristics that affect them, using big data from Seoul's S-Dot sensors, which have recently drawn considerable attention. In other words, it proposes a big-data-based urban computing research methodology and research direction for confirming the relationship between urban characteristics and the living environments that directly affect citizens. The temporal range is 2020 to 2021, the period for which S-Dot time series data are available, and the spatial range covers all of Seoul on a 500 m × 500 m grid. First, to analyze specific living environment patterns, simple trends are identified through EDA, and cluster analysis is conducted on those trends. Then, to derive the specific urban planning factors of each cluster, basic statistical analyses such as ANOVA, OLS, and MNL were conducted to confirm more specific characteristics. The study identifies cluster patterns of the environmental elements (PM2.5, PM10, noise) and the urban factors that affect them, and finds areas with relatively high or low long-term living environment values compared with other regions. The results are expected to serve as a reference for urban planning and management measures in areas with vulnerable living environments, and as an exploratory study that can guide the urban computing field, especially work involving environmental data.
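The 500 m × 500 m grid is a spatial binning step: each sensor reading is assigned to the cell that contains it, and readings are averaged per cell before trends and clusters are computed. A minimal sketch, assuming sensor positions are already in a projected coordinate system measured in metres; the coordinates and PM2.5 values below are hypothetical.

```python
from collections import defaultdict

CELL_SIZE_M = 500  # grid resolution used in the study


def cell_of(x_m, y_m, size=CELL_SIZE_M):
    """Return the (column, row) index of the grid cell containing a projected point."""
    return int(x_m // size), int(y_m // size)


def mean_by_cell(readings, size=CELL_SIZE_M):
    """Average a measured value over all readings that fall into the same grid cell."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x_m, y_m, value in readings:
        key = cell_of(x_m, y_m, size)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


readings = [
    (120.0, 80.0, 20.0),   # (x in m, y in m, PM2.5 in µg/m³), hypothetical
    (480.0, 499.0, 30.0),
    (510.0, 100.0, 50.0),
]
print(mean_by_cell(readings))  # {(0, 0): 25.0, (1, 0): 50.0}
```

Floor division keeps cell boundaries consistent: a point at x = 499.9 m falls in column 0 and one at x = 500 m in column 1.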

Quantitative Analysis of the Size and the Structural Factors of the Feet for Elementary School Girls' Shoe Design

  • 전은경
    • Korean Journal of Human Ecology / Vol. 15, No. 4 / pp. 651-658 / 2006
  • This study analyzed the foot sizes and structural factors required in the design and manufacture of shoes for school girls. A total of 371 elementary school girls in the Kyungin and Youngnam areas participated in the measurements. Twenty-five foot items and six main body items were measured directly or indirectly using digital photography. The results are as follows. First, for most measured items, foot size ranged very widely, from toddler to adult sizes, showing that school girls' feet change considerably as they grow. Second, factor analysis of the 25 foot items extracted five factors: 'size of the foot', 'volume of the foot', 'height and inclination of the foot', 'shape of the foot', and 'inside and outside inclination of the foot'. Third, cluster analysis classified the girls into three clusters. Cluster 1 consisted of 10- to 11-year-old girls with large feet; girls in the fourth to sixth grades belonged to this group. Cluster 2 consisted of girls with small but high-volume feet. Cluster 3 had medium-sized, slim feet; most 6- to 7-year-old girls belonged to this group. These results imply that continued research is needed on children's shoe production that reflects how elementary school girls' feet change with growth. The quantitative foot-size data from this study can serve as basic information for developing and producing children's shoe designs.

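The abstract does not state which clustering algorithm produced the three groups. As a generic illustration of that step only, the sketch below z-score standardises each measurement, so that lengths and girths on different scales contribute comparably, and then runs plain k-means with naive seeding. The measurements, the choice of k-means, and k = 3 are all assumptions, not the study's data or method.

```python
import math
import statistics


def standardise(rows):
    """Z-score each column so measurements on different scales weigh equally."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def kmeans(rows, k, iters=100):
    centres = [list(r) for r in rows[:k]]  # naive seeding: the first k rows
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assign each row to its nearest centre.
        labels = [min(range(k), key=lambda j: math.dist(r, centres[j])) for r in rows]
        new = []
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            # Keep the old centre if a cluster becomes empty.
            new.append([statistics.mean(c) for c in zip(*members)] if members else centres[j])
        if new == centres:
            break
        centres = new
    return labels


# Hypothetical (foot length mm, foot girth mm) pairs from three loose groups.
rows = [[180, 190], [200, 225], [235, 230],
        [182, 192], [202, 222], [233, 228]]
print(kmeans(standardise(rows), k=3))  # [0, 1, 2, 0, 1, 2]
```

Without standardisation, whichever measurement has the larger spread in millimetres would dominate the distance and hence the clusters.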

A Study on Extraction of Useful Information from Big dataset of Multi-attributes - Focus on Single Household in Seoul -

  • 최정민;김건우
    • Journal of the Korean Housing Association / Vol. 25, No. 4 / pp. 59-72 / 2014
  • This study proposes a data-mining method for analyzing variable multi-attribute big data that is considered well suited to the social sciences: a correspondence analysis of variables selected by AIC model selection. The method was applied to the Seoul Survey from 2005 to 2010 to extract interesting rules or patterns in the characteristics of single-person households. The findings are as follows. First, the proposed method can be applied efficiently to a big dataset with a very large number of categorical multi-attribute variables. Second, the Seoul Survey analysis showed that the more dissatisfied single-person households are with their residential environment, the more likely they are to move. Third, single-person households fall into three types based on their demographic characteristics, and the three types differ in how they perceive home and in whom they turn to for counselling. Fourth, using a spatially aggregated dataset, the study extracted eight significant variables that are highly correlated with the share of single-person households in Seoul's 25 districts, and it examined the relationship between the spatial distribution of single-person households and their demographic statistics based on six groups obtained by cluster analysis.
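The abstract selects variables with AIC before the correspondence analysis but does not give the models compared. One common way to use AIC with categorical survey variables, assumed here for illustration, is to compare an independence model against a saturated (dependence) model of each candidate variable's contingency table with the target variable, and to keep the variable when the dependence model has the lower AIC = −2 ln L + 2k. The tables below are invented.

```python
import math


def log_lik(table, prob):
    """Multinomial log-likelihood of the observed counts under cell probabilities prob(i, j)."""
    return sum(n * math.log(prob(i, j))
               for i, row in enumerate(table) for j, n in enumerate(row) if n > 0)


def aic_dependence_gain(table):
    """AIC(independence) - AIC(dependence); positive values favour keeping the variable."""
    total = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    r, c = len(rows), len(cols)
    ll_indep = log_lik(table, lambda i, j: rows[i] * cols[j] / total ** 2)
    ll_sat = log_lik(table, lambda i, j: table[i][j] / total)
    aic_indep = -2 * ll_indep + 2 * ((r - 1) + (c - 1))  # free parameters: margins only
    aic_sat = -2 * ll_sat + 2 * (r * c - 1)               # free parameters: every cell
    return aic_indep - aic_sat


# Hypothetical 2x2 tables, e.g. residential satisfaction (rows) x moved or stayed (columns).
associated = [[30, 10], [10, 30]]
independent = [[20, 20], [20, 20]]
print(round(aic_dependence_gain(associated), 2))   # 18.93
print(round(aic_dependence_gain(independent), 2))  # -2.0
```

When the table is exactly independent, both models fit equally well and the dependence model only pays its extra parameter, hence the gain of −2.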

Perception and Trend Differences between Korea, China, and the US on Vegan Fashion -Using Big Data Analytics-

  • 정지운;윤소정
    • Journal of the Korean Society of Clothing and Textiles / Vol. 47, No. 5 / pp. 804-821 / 2023
  • This study examines current trends and perceptions of veganism and vegan fashion in Korea, China, and the United States. Using the big data tools Textom and Ucinet, we performed keyword extraction with frequency analysis, cluster analysis between keywords, and CONCOR analysis, and obtained the following results. First, the three nations' perceptions of veganism and vegan fashion differ significantly, although Korea and the United States share a broadly similar understanding of vegan fashion. Second, industrial structures, such as products and businesses, shaped how Korea perceives veganism. Third, owing to its ongoing sociopolitical tensions, the United States views veganism as a form of ethical consumption tied to activism. China, in contrast, views veganism as a healthy diet rather than a lifestyle and associates it with Buddhist vegetarianism, a perception rooted in its religious history and culinary culture. The study is significant in using big data to extract keywords related to vegan fashion in Korea, China, and the United States, and it deepens our understanding of vegan fashion by comparing perceptions across nations.
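CONCOR (CONvergence of iterated CORrelations), which the study ran in Ucinet, repeatedly replaces a matrix by the Pearson correlations between its columns until every entry is close to +1 or −1; the sign pattern then splits the nodes (here, keywords) into two blocks, and the split can be repeated within each block. The sketch below performs one split on a hypothetical keyword co-occurrence matrix. It illustrates the idea and is not Ucinet's implementation, whose details (for example, diagonal handling) may differ.

```python
import math


def pearson(x, y):
    """Pearson correlation; 0.0 when either vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def column_correlations(matrix):
    cols = list(zip(*matrix))
    n = len(cols)
    corr = [[1.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            corr[a][b] = corr[b][a] = pearson(cols[a], cols[b])
    return corr


def concor_split(matrix, labels, tol=1e-6, max_iter=100):
    corr = column_correlations(matrix)
    for _ in range(max_iter):
        # Stop once every entry is within tol of +1 or -1.
        if all(abs(abs(v) - 1.0) < tol for row in corr for v in row):
            break
        corr = column_correlations(corr)
    # Nodes positively correlated with the first node form one block; the rest the other.
    first = {labels[j] for j in range(len(labels)) if corr[0][j] > 0}
    return first, set(labels) - first


# Hypothetical symmetric co-occurrence counts for six keywords.
labels = ["a", "b", "c", "d", "e", "f"]
matrix = [[6, 4, 5, 1, 0, 1],
          [4, 6, 5, 0, 1, 1],
          [5, 5, 6, 1, 1, 0],
          [1, 0, 1, 6, 5, 4],
          [0, 1, 1, 5, 6, 5],
          [1, 1, 0, 4, 5, 6]]
print(concor_split(matrix, labels))  # ({'a', 'b', 'c'}, {'d', 'e', 'f'}) in some set order
```

Applying `concor_split` again to each resulting block's submatrix would give the deeper partitions that CONCOR diagrams usually show.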