• Title/Summary/Keyword: Large-scale Data

Real-time and Parallel Semantic Translation Technique for Large-Scale Streaming Sensor Data in an IoT Environment (사물인터넷 환경에서 대용량 스트리밍 센서데이터의 실시간·병렬 시맨틱 변환 기법)

  • Kwon, SoonHyun;Park, Dongwan;Bang, Hyochan;Park, Youngtack
    • Journal of KIISE / v.42 no.1 / pp.54-67 / 2015
  • Nowadays, studies on the fusion of Semantic Web technologies are being carried out to promote the interoperability and value of sensor data in an IoT environment. To accomplish this, the semantic translation of sensor data is essential for convergence with service domain knowledge. The existing semantic translation technique, however, translates static metadata into semantic data (RDF) and cannot properly handle the real-time, large-scale nature of data in an IoT environment. Therefore, in this paper, we propose a technique for translating large-scale streaming sensor data generated in an IoT environment into semantic data, using real-time and parallel processing. In this technique, we define rules for semantic translation and store them in the semantic repository. The sensor data is translated in real time with parallel processing, using these pre-defined rules and an ontology-based semantic model. To improve performance, we use Apache Storm, a real-time big data analysis framework, for parallel processing. For demonstration purposes, the proposed technique was performance-tested on AWS observation data from the Meteorological Administration, which is large-scale streaming sensor data.
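
The rule-driven translation described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `EX` namespace, the `RULES` table, and the field names are hypothetical, and a standard-library thread pool stands in for Apache Storm. Each raw reading is mapped to RDF triples written as N-Triples lines.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical namespace and rule table; neither comes from the paper.
EX = "http://example.org/sensor#"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"
RULES = {
    "temp": (EX + "hasTemperature", XSD_DOUBLE),
    "humidity": (EX + "hasHumidity", XSD_DOUBLE),
}

def translate(reading):
    """Turn one raw reading (a dict) into a list of N-Triples lines."""
    subject = f"<{EX}obs/{reading['station']}/{reading['time']}>"
    triples = []
    for field, (predicate, datatype) in RULES.items():
        if field in reading:
            triples.append(
                f'{subject} <{predicate}> "{reading[field]}"^^<{datatype}> .'
            )
    return triples

def translate_stream(readings, workers=4):
    """Apply the rules to many readings concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [t for triples in pool.map(translate, readings) for t in triples]
```

Keeping the rules in a table rather than in code mirrors the idea of storing translation rules in a repository, so new sensor fields can be supported without changing `translate`.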

Design and Implementation of Cloud-based Data Management System for Large-scale USN (대규모 USN을 위한 클라우드기반 데이터 관리 시스템 설계 및 구현)

  • Kim, Kyong-Og;Jeong, Kyong-Jin;Park, Kyoung-Wook;Kim, Jong-Chan;Jang, Moon-Suk
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2010.10a / pp.352-354 / 2010
  • Recently, the growing deployment of large-scale sensor networks has created a need for efficient management of large-scale sensor data. In previous studies, sensor data was managed by a distributed database system built on a single server or a grid server, which has disadvantages such as low scalability and a high cost of building and managing the system. In this paper, we propose a cloud-based sensor data management system with low cost, high scalability, and efficiency. Because processed results are provided through a REST-based web service, the proposed system can work with applications on a variety of platforms.

MarSel : The LD-based Marker Selection System for the Large-scale Datasets (MarSel : Large-scale Dataset에 대한 LD기반의 Marker 선택 시스템)

  • 김상준;여상수;김성권
    • Proceedings of the Korean Information Science Society Conference / 2004.10b / pp.253-255 / 2004
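  • Variation among humans is known to arise from SNPs (single nucleotide polymorphisms) occurring in the human genome. In association studies between SNPs in the genome and variation, analyzing the entire DNA sequence, estimated at about 3 billion bases, would require a great deal of cost and time. To reduce this cost and time, research is currently under way on finding a small number of representative SNPs (tagSNPs). We use the LD coefficient |D'| for block partitioning to give the blocks biological meaning, and then take the approach of finding a computationally optimal solution. In addition, previous studies could not process large-scale data, so analysis was attempted only on data from part of a chromosome. Broader analysis requires processing at the chromosome level. We implemented MarSel, a system that can process chromosome-level SNP data at once.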

Testing Web Feeding Model for Star Formation in Galaxy Clusters in the COSMOS Field

  • Ko, Eunhee;Im, Myungshin;Lee, Seong-Kook;Hyun, Minhee
    • The Bulletin of The Korean Astronomical Society / v.46 no.1 / pp.52.3-53 / 2021
  • It is yet to be understood what controls the star formation activity in high-redshift galaxy clusters. One recently proposed mechanism is that the star formation activity in galaxy clusters is fed by gas and galaxies in the large-scale structures surrounding them, which we call the "web feeding model". Using galaxies in the COSMOS2015 catalog, with mass completeness at log(M/M⊙) ≥ 9.54 and reliable photometric redshift data (σΔz/(1+z) ≲ 0.01), we study the star formation activities of galaxy clusters and their surrounding environment to test the web feeding model. We first identify overdense regions with number density exceeding the 4σ level in the photometric redshift data as galaxy clusters, and we find that they are well matched with clusters identified in the X-ray extended source catalog. Furthermore, we identify large-scale structures of galaxies, and will present the correlation or anti-correlation between the quiescent galaxy fraction, an indicator of star-forming activity, and the prevalence of large-scale structures of galaxies.

An experimental study on parallel implementation of an iterative method for large scale, sparse linear system (반복기법을 이용한 대규모, 소선형시스템의 병렬처리에 관한 연구)

  • 김상원;장수영
    • Proceedings of the Korean Operations and Management Science Society Conference / 1991.10a / pp.6-22 / 1991
  • This thesis presents a parallel implementation of an iterative method for large-scale, sparse linear systems and gives the results of computational experiments performed on both single-transputer and multi-transputer parallel computers. To solve the linear system, we use the conjugate gradient method and develop a data storage technique and a data communication scheme. In addition to an explanation of the conjugate gradient method, the results of the computational experiments are summarized.
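
For readers unfamiliar with the method named in this abstract, here is a minimal serial sketch of the conjugate gradient iteration on a sparse matrix. The abstract does not describe the thesis's storage technique or its transputer communication scheme, so the row-wise list of `(column, value)` pairs used here is only an assumed storage layout, and no parallelism is shown.

```python
def matvec(rows, x):
    """Multiply a sparse matrix by x; rows[i] is a list of (column, value) pairs."""
    return [sum(v * x[j] for j, v in row) for row in rows]

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))

def conjugate_gradient(rows, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite sparse matrix A."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x, with the initial guess x = 0
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:   # stop once the residual norm is small enough
            break
        Ap = matvec(rows, p)
        alpha = rs_old / dot(p, Ap)                    # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        # New direction: residual plus a multiple of the previous direction.
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

The only operation that touches the matrix is `matvec`, which is why the method suits sparse systems and why a parallel version mainly needs to distribute the rows and exchange vector entries.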

Data augmentation technique based on image binarization for constructing large-scale datasets (대형 이미지 데이터셋 구축을 위한 이미지 이진화 기반 데이터 증강 기법)

  • Lee JuHyeok;Kim Mi Hui
    • Journal of IKEEE / v.27 no.1 / pp.59-64 / 2023
  • Deep learning can solve various computer vision problems, but it requires a large dataset. This paper proposes a data augmentation technique based on image binarization for constructing large-scale datasets. New images are generated by extracting features using image binarization and randomly placing the remaining pixels. The generated images showed quality similar to that of the original images and demonstrated excellent performance in deep learning models.

The Relationship between Food and Labor Expense, Profit Margin, and Customer Satisfaction within University Union Foodservice Operations in Korea

  • Won, Sun-Im;Lee, Jin-Mee
    • Food Quality and Culture / v.1 no.1 / pp.58-61 / 2007
  • The purpose of this study was to develop an effective cost control model for university foodservice operations by analyzing student satisfaction, as well as foodservice income statements for operational characteristics. The specific objectives were to examine the satisfaction of students for various foodservice quality dimensions, to determine the financial activities performed in foodservice operations by operational type, to examine their income statement data, and lastly, to compare the student satisfaction for foodservice quality with the financial data of the income statements. A total of 545 students from one university answered a satisfaction survey. The one-year income statements of three union foodservices (self-operated, small-scale contracted, and large-scale contracted) at the same university were analyzed. The results showed that the self-operated union foodservice had lower student satisfaction scores and higher food and labor cost ratios. The small-scale contract management foodservice data indicated the highest student satisfaction scores and the lowest food and labor cost ratios. The large-scale contract management foodservice data showed medium scores when comparing the three union foodservice operations. Overall, by comparing the satisfaction scores and operational profits, the small-scale union foodservices showed the highest satisfaction scores and profit.

Proposal For Improving Data Processing Performance Using Python (파이썬 활용한 데이터 처리 성능 향상방법 제안)

  • Kim, Hyo-Kwan;Hwang, Won-Yong
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.13 no.4 / pp.306-311 / 2020
  • This paper deals with how to improve the performance of Python, using its various libraries, when developing a model with big data. Python uses the Pandas library to process spreadsheet-format data such as Excel files, and it processes data in memory. This causes no performance issue with small-scale data, but performance issues occur with large-scale data. Therefore, this paper introduces a method for distributing execution tasks across a single cluster or multiple clusters by using the Dask library, which can be used together with Pandas. The experiment compares, on hardware of the same specification, the speed of processing a simple exponential model using only Pandas with the speed of processing it using Dask together with Pandas. This paper presents a method for developing a model by distributing large-scale data across CPU cores to improve performance, while keeping Python's advantage of easy access to various libraries.

Development of the Design Methodology for Large-scale Data Warehouse based on MongoDB

  • Lee, Junho;Joo, Kyungsoo
    • Journal of the Korea Society of Computer and Information / v.23 no.3 / pp.49-54 / 2018
  • A data warehouse is a system that collectively manages and integrates a company's data and provides the basis for decision making on management strategy. Nowadays, analysis data volumes are reaching a critical size that challenges traditional data warehousing approaches. Currently implemented solutions are mainly based on relational databases, which are no longer suited to these data volumes. NoSQL solutions allow us to consider new approaches to data warehousing, especially from the multidimensional data management point of view. In this paper, we extend the data warehouse design methodology based on relational databases using a star schema, and develop a consistent design methodology, from information requirement analysis to data warehouse construction, for large-scale data warehouses based on MongoDB, a NoSQL database.
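
The abstract does not say how the star schema is mapped onto MongoDB. One common option, shown here purely as an assumption, is to embed the dimension rows referenced by each fact row into a single document. The sketch uses plain dictionaries rather than a database driver, and the `keys` and `measures` field names are hypothetical.

```python
def to_document(fact, dimensions):
    """Embed the dimension rows referenced by one fact row into one document.

    fact:       {"keys": {dimension_name: key}, "measures": {name: value}}
    dimensions: {dimension_name: {key: dimension_row}}
    """
    doc = {"measures": dict(fact["measures"])}
    for dim_name, key in fact["keys"].items():
        # Copy the row so later edits to the document leave the dimension table intact.
        doc[dim_name] = dict(dimensions[dim_name][key])
    return doc

# Hypothetical sample data for illustration only.
dimensions = {
    "date": {20180301: {"year": 2018, "month": 3}},
    "product": {7: {"name": "widget", "category": "tools"}},
}
fact = {"keys": {"date": 20180301, "product": 7}, "measures": {"sales": 120}}
document = to_document(fact, dimensions)
```

Embedding trades storage for read speed: a query over `document` needs no join, but a change to a dimension row has to be propagated to every document that embeds it.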

Management Scheme According to Characteristics of PM-10 Occurred from Large Scale Development Site (대규모 단지조성 미세먼지 관리 방안)

  • Kwon, Woo-Taeg;Lee, Woo-Sik;Hong, Sang-Pyo
    • Journal of Environmental Impact Assessment / v.22 no.1 / pp.79-87 / 2013
  • The purpose of this study is to establish a PM-10 management manual for developing large-scale sites by assessing the status of PM-10 reduction at ongoing large-scale development sites. After analyzing the meteorological conditions and air quality characteristics of the Sihwa MTV development site, ISCST3 (Industrial Source Complex Short Term Model 3) was run to predict PM-10 generation. The outcomes of the ISCST3 modelling were used to verify the site survey data. The air pollution modelling showed that the diffusion rate of PM-10 decreases as the wind speed decreases, and that the emission rate of PM-10 increases linearly with the concentration of PM-10. The reduction target for PM-10 can be derived quantitatively from the difference between the forecasted emission rate and the permissible emission limit of PM-10. The PM-10 characteristics deduced from ISCST3 and the site survey can be applied in practice to produce an environmentally acceptable air quality manual for large-scale development sites.
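
The difference-based reduction target mentioned in this abstract amounts to simple arithmetic, sketched below. The function name, the clamping at zero, and the sample numbers are assumptions for illustration; both inputs are taken to be in the same emission-rate unit, which the abstract does not specify.

```python
def pm10_reduction_target(forecast_rate, permissible_limit):
    """Return (absolute reduction, fraction of forecast) needed to meet the limit.

    The result is clamped at zero when the forecast is already within the
    limit; that clamping is an assumption, not something the abstract states.
    """
    excess = max(0.0, forecast_rate - permissible_limit)
    fraction = excess / forecast_rate if forecast_rate > 0 else 0.0
    return excess, fraction

# Hypothetical numbers: a forecast of 10.0 against a limit of 6.0.
target, share = pm10_reduction_target(10.0, 6.0)
```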