• Title/Abstract/Keywords: data pre-processing

813 results (processing time: 0.023 s)

RDP: A storage-tier-aware Robust Data Placement strategy for Hadoop in a Cloud-based Heterogeneous Environment

  • Muhammad Faseeh Qureshi, Nawab;Shin, Dong Ryeol
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 10, No. 9
    • /
    • pp.4063-4086
    • /
    • 2016
  • Cloud computing is a robust technology that helps resolve many parallel and distributed computing issues in the modern Big Data environment. Hadoop is an ecosystem that processes large data sets in a distributed computing environment. HDFS is Hadoop's filesystem, which distributes data blocks to the cluster nodes. Data block placement has become a bottleneck for overall performance in a Hadoop cluster. The current placement policy assumes that all Datanodes have equal computing capacity to process data blocks, meaning the same storage media and the same processing performance on every node. As a result, Hadoop cluster performance suffers from unbalanced workloads, inefficient use of storage tiers, network traffic congestion, and HDFS integrity issues. This paper proposes a storage-tier-aware Robust Data Placement (RDP) scheme that systematically resolves unbalanced workloads, reduces network congestion to an optimal state, makes effective use of storage tiers, and minimizes HDFS integrity issues. The experimental results show that the proposed approach reduced the unbalanced workload issue to 72%. Moreover, the approach resolved the storage-tier compatibility problem to 81% by predicting storage for block jobs, and improved overall data block placement by 78% through pre-calculated computing capacity allocations and the execution of map files on the respective Namenode and Datanodes.
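
The abstract describes placing blocks according to pre-calculated computing capacity and storage tier but does not give the RDP algorithm itself. The sketch below is a hypothetical illustration, not the authors' method: it assumes each Datanode carries a single numeric capacity score and one storage-tier label (`DataNode`, `allocate_blocks`, and `place` are illustrative names) and splits blocks in proportion to capacity instead of assuming equal capacity.

```python
from dataclasses import dataclass


@dataclass
class DataNode:
    name: str
    capacity: float  # hypothetical pre-calculated computing-capacity score
    tier: str        # hypothetical storage-tier label, e.g. "SSD" or "HDD"


def allocate_blocks(nodes, num_blocks):
    """Split num_blocks across nodes in proportion to capacity (largest-remainder rounding)."""
    total = sum(n.capacity for n in nodes)
    if total <= 0:
        raise ValueError("total capacity must be positive")
    quotas = [num_blocks * n.capacity / total for n in nodes]
    counts = [int(q) for q in quotas]
    leftover = num_blocks - sum(counts)
    # Hand the remaining blocks to the nodes with the largest fractional remainders.
    order = sorted(range(len(nodes)), key=lambda i: quotas[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return {n.name: c for n, c in zip(nodes, counts)}


def place(nodes, num_blocks, preferred_tier=None):
    """Prefer nodes on the requested storage tier; fall back to all nodes if none match."""
    candidates = [n for n in nodes if n.tier == preferred_tier] if preferred_tier else []
    return allocate_blocks(candidates or nodes, num_blocks)
```

With one node three times as capable as two others, ten blocks are split 6/2/2 rather than evenly; the largest-remainder step keeps the total exact when the quotas are fractional.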

Surface Classification and Its Threshold Value Selection for the Recognition of 3-D Objects

  • 조동욱;백승재;김동원
    • The Journal of the Acoustical Society of Korea
    • /
    • Vol. 19, No. 3
    • /
    • pp.20-25
    • /
    • 2000
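  • This paper proposes a surface classification and threshold selection method for 3-D object recognition. 3-D image processing broadly consists of range image acquisition, feature extraction, and matching. Within the overall 3-D image processing system, this paper proposes a method for extracting shape features when a range image is given as input. To this end, we first propose a method that classifies surfaces according to the distribution of the signs of depth changes in the range image. Building on the existing method that classifies surfaces using mean and Gaussian curvature, we also propose a method for selecting thresholds on real range images, which was that method's main weakness. Finally, we demonstrate the usefulness of the proposed method experimentally.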
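
The existing method the abstract builds on labels each pixel by the signs of the mean curvature H and Gaussian curvature K, and the practical difficulty is the threshold below which a curvature counts as zero. A minimal sketch of that HK sign classification, assuming a range image stored as a list of rows with unit grid spacing, central finite differences, and fixed thresholds `eps_h`/`eps_k` (hypothetical values; this is not the threshold-selection rule proposed in the paper), using the commonly cited eight-type HK sign table:

```python
# (sign of K, sign of H) -> surface label; sign 0 means |value| <= threshold.
# Sign convention: a bump toward +z such as z = -(x^2 + y^2) gives H < 0, K > 0 -> "peak".
SURFACE_TYPES = {
    (1, -1): "peak", (1, 0): "none", (1, 1): "pit",
    (0, -1): "ridge", (0, 0): "flat", (0, 1): "valley",
    (-1, -1): "saddle ridge", (-1, 0): "minimal", (-1, 1): "saddle valley",
}


def thresholded_sign(v, eps):
    """Return -1, 0 or 1, treating |v| <= eps as zero."""
    if v > eps:
        return 1
    if v < -eps:
        return -1
    return 0


def curvatures(z, i, j):
    """Mean (H) and Gaussian (K) curvature of the graph z(x, y) at interior pixel (row i, column j).

    Central differences with unit grid spacing; x runs along columns, y along rows.
    """
    fx = (z[i][j + 1] - z[i][j - 1]) / 2.0
    fy = (z[i + 1][j] - z[i - 1][j]) / 2.0
    fxx = z[i][j + 1] - 2.0 * z[i][j] + z[i][j - 1]
    fyy = z[i + 1][j] - 2.0 * z[i][j] + z[i - 1][j]
    fxy = (z[i + 1][j + 1] - z[i + 1][j - 1] - z[i - 1][j + 1] + z[i - 1][j - 1]) / 4.0
    g = 1.0 + fx * fx + fy * fy
    K = (fxx * fyy - fxy * fxy) / (g * g)
    H = ((1.0 + fy * fy) * fxx - 2.0 * fx * fy * fxy + (1.0 + fx * fx) * fyy) / (2.0 * g ** 1.5)
    return H, K


def classify(z, i, j, eps_h=1e-3, eps_k=1e-3):
    """Label pixel (i, j) with one of the eight HK surface types (or "none")."""
    H, K = curvatures(z, i, j)
    return SURFACE_TYPES[(thresholded_sign(K, eps_k), thresholded_sign(H, eps_h))]
```

On real, noisy range images the finite-difference curvatures rarely equal zero exactly, which is why the choice of `eps_h` and `eps_k` decides how many pixels end up as "flat", "ridge", or "valley".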

Research Trends in KOMPSAT Series

  • 이광재;오관영;채태병;이원진
    • Korean Journal of Remote Sensing
    • /
    • Vol. 35, No. 6_4
    • /
    • pp.1313-1318
    • /
    • 2019
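  • The Korea Aerospace Research Institute (KARI) develops and operates three Korea Multi-Purpose Satellites (KOMPSAT-3, KOMPSAT-3A, and KOMPSAT-5). The main purpose of developing these satellites is to utilize the data they acquire, so continuous effort is needed to improve the accuracy of data processing and to broaden the fields of application. This special issue introduces pre-processing and application techniques based on the optical and radar sensors of the KOMPSAT series. Since follow-on KOMPSAT missions and small satellites are under development, systematic research, development, and investment in these areas will be needed.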

Sub-Pixel Analysis of Hyperspectral Image Using Linear Spectral Mixing Model and Convex Geometry Concept

  • Kim, Dae-Sung;Kim, Yong-Il;Lim, Young-Jae
    • Korean Journal of Geomatics
    • /
    • Vol. 4, No. 1
    • /
    • pp.1-8
    • /
    • 2004
  • In medium-resolution remote sensing, the Ground Sampled Distance (GSD) that the detector senses and samples is generally larger than the actual size of the objects (or materials) of interest, so several objects are embedded in a single pixel. Because such objects cannot be detected by conventional spatial image processing techniques, the analysis has to be carried out at the sub-pixel level using spectral properties. In this paper, we explain the sub-pixel analysis algorithm known as the Linear Spectral Mixing (LSM) model and test it on Hyperion data. To find the endmembers used as prior knowledge for the LSM model, we applied the concept of convex geometry to the two-dimensional scatter plot. Atmospheric correction and Minimum Noise Fraction techniques are presented for pre-processing the Hyperion data. Because the LSM model is the simplest approach to sub-pixel analysis, the results of our experiment are not good. Nevertheless, we aim to show that sub-pixel analysis reveals much more information than image classification.
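
Under the LSM model, each pixel spectrum is approximated as a weighted sum of endmember spectra, and the weights are read as the sub-pixel fractions (abundances) of each material. A minimal sketch, assuming the endmember spectra are already known (the paper finds them with convex geometry on a 2-D scatter plot, which is not reproduced here) and imposing only the sum-to-one constraint, solved by least squares; non-negativity is not enforced, and `unmix` and `gauss_solve` are illustrative names:

```python
def gauss_solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def unmix(pixel, endmembers):
    """Sum-to-one constrained least-squares abundances for one pixel.

    pixel: reflectance per band; endmembers: spectra with the same number of bands.
    Non-negativity is NOT enforced, so abundances may leave [0, 1] on real data.
    """
    m = len(endmembers)
    bands = range(len(pixel))
    last = endmembers[-1]
    # Substitute a_m = 1 - (a_1 + ... + a_{m-1}):  pixel - e_m = sum_i a_i (e_i - e_m).
    D = [[e[k] - last[k] for e in endmembers[:-1]] for k in bands]
    y = [pixel[k] - last[k] for k in bands]
    # Normal equations (D^T D) a = D^T y.
    DtD = [[sum(D[k][p] * D[k][q] for k in bands) for q in range(m - 1)] for p in range(m - 1)]
    Dty = [sum(D[k][p] * y[k] for k in bands) for p in range(m - 1)]
    a = gauss_solve(DtD, Dty)
    return a + [1.0 - sum(a)]
```

For a noise-free pixel mixed as 0.5, 0.3, and 0.2 of three linearly independent endmembers, `unmix` recovers those fractions up to rounding; on real Hyperion pixels the residual reflects noise and any materials missing from the endmember set.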

Scalable Service Placement in the Fog Computing Environment for the IoT-Based Smart City

  • Choi, Jonghwa;Ahn, Sanghyun
    • Journal of Information Processing Systems
    • /
    • Vol. 15, No. 2
    • /
    • pp.440-448
    • /
    • 2019
  • The Internet of Things (IoT) is one of the main enablers of the situation awareness needed to build smart cities. IoT devices, especially those used for monitoring, have stringent timing requirements that cloud computing may not meet. This deficiency can be overcome by fog computing, in which fog nodes are placed close to IoT devices. Because fog nodes have lower capabilities than cloud data centers, they may not be able to host all the services required by IoT devices. In this article, we therefore focus on the fog service placement problem and present recent research trends on it. Most of the literature on fog service placement deals with finding an appropriate fog node that satisfies requirements such as delay for one or more service requests. Instead of on-demand placement for individual service requests, this article aims to place fog services effectively according to pre-obtained service demands, which may have been collected during the previous time interval. The concept of a logical fog network is newly introduced to make fog service placement scalable in a large-scale smart city. The logical fog network is formed as a tree topology rooted at the cloud data center. Based on the logical fog network, a service placement approach is proposed so that services can be placed on fog nodes in a resource-efficient way.
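
The abstract says services are placed on a tree-shaped logical fog network rooted at the cloud, based on pre-obtained demands, but does not spell out the placement rule. The following is a hypothetical greedy sketch, not the authors' approach: each demand is hosted by the first node with enough remaining capacity on the path from its originating fog node up to the cloud root, larger demands are placed first, and the cloud is assumed to have unlimited capacity. `FogNode`, `place_service`, and `place_all` are illustrative names.

```python
import math


class FogNode:
    def __init__(self, name, capacity, parent=None):
        self.name = name
        self.capacity = capacity  # remaining resource units (hypothetical unit)
        self.parent = parent      # None for the cloud root
        self.services = []


def place_service(origin, service, demand):
    """Walk from the requesting fog node toward the root; host the service at the first node that fits."""
    node = origin
    while node is not None:
        if node.capacity >= demand:
            node.capacity -= demand
            node.services.append(service)
            return node
        node = node.parent
    raise RuntimeError(f"no node can host {service!r}")


def place_all(demands):
    """demands: list of (origin_node, service_name, resource_demand) from the previous interval."""
    placement = {}
    # Place large demands first so they get the nodes closest to the devices.
    for origin, service, demand in sorted(demands, key=lambda d: d[2], reverse=True):
        placement[service] = place_service(origin, service, demand).name
    return placement
```

Walking toward the root keeps each service as close to its IoT devices as capacity allows, and only demands that no fog node can absorb fall back to the cloud (`math.inf` capacity).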

A Transformer-Based Emotion Classification Model Using Transfer Learning and SHAP Analysis

  • 임수빈;이병천;전인수;문지훈
    • Korea Information Processing Society (KIPS): Conference Proceedings
    • /
    • KIPS 2023 Spring Conference
    • /
    • pp.706-708
    • /
    • 2023
  • In this study, we apply transfer learning to three pre-trained transformer models to classify five emotions and find that the KLUE (Korean Language Understanding Evaluation)-BERT (Bidirectional Encoder Representations from Transformers) model performs best among them. Its F1 scores indicate superior learning and generalization on the experimental data. To examine why it performs well, we apply the SHAP (Shapley Additive Explanations) method to the KLUE-BERT model. The results are presented as a text plot visualization, which shows the impact of individual tokens on emotion classification and provides visual evidence supporting the predictions of the KLUE-BERT model.
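
SHAP attributes a prediction to its input features using Shapley values. For a handful of tokens, the Shapley value can be computed exactly by enumerating every subset of the other tokens, as in the sketch below. The toy `value` function (a keyword score with a negation interaction) is a hypothetical stand-in for the KLUE-BERT classifier; practical explainers for transformer models typically rely on approximations, because exact enumeration grows exponentially with the number of tokens.

```python
from itertools import combinations
from math import factorial


def shapley_values(n, value):
    """Exact Shapley value of each of n players (here, token positions).

    value: callable mapping a frozenset of kept positions to a model score.
    Cost grows as 2**n, so this is only feasible for a handful of tokens.
    """
    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Weight |S|! (n - |S| - 1)! / n! for every coalition S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def toy_value(tokens):
    """Hypothetical stand-in for a classifier's 'joy' score on the kept tokens."""
    def value(kept):
        words = {tokens[k] for k in kept}
        score = 2.0 if "happy" in words else 0.0
        if "happy" in words and "not" in words:
            score -= 3.0  # negation interaction
        return score
    return value
```

For the tokens `["I", "am", "not", "happy"]`, the attributions come out, up to floating-point rounding, as 0 for "I" and "am", -1.5 for "not", and 0.5 for "happy": the interaction penalty is split evenly between the two interacting tokens, and the values sum to the full-text score minus the empty-text score (-1.0).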

ALGORITHM OF REVISED-OTFTOOL

  • Chung Eun-Jung;Kim Hyor-Young;Rhee Myung-Hyun
    • Journal of Astronomy and Space Sciences
    • /
    • Vol. 23, No. 3
    • /
    • pp.269-288
    • /
    • 2006
  • We revised OTFTOOL, which was developed at the Five College Radio Astronomy Observatory (FCRAO) for On-The-Fly (OTF) observations. Besides improving the data resampling function of the conventional OTFTOOL, we added a new SELF referencing mode and a data pre-reduction function. Since OTF observation data are highly redundant, we can select and use only good-quality samples and exclude bad ones. Bad samples are identified based on the floating level, rms level, antenna trajectory, elevation, $T_{sys}$, and number of samples, and spikes are also removed. The referencing method can be chosen between CLASSICAL mode, in which the references are taken from the OFF observations, and ELLIPSOIDAL mode, in which the references are taken from the inner source-free region (named SELF referencing). The baseline is subtracted using source-free channel windows and a baseline order chosen by the user. After these procedures, the raw OTF data are converted into a FITS datacube. The revised OTFTOOL maximizes the advantages of OTF observation by sorting out bad samples at the earliest stage, and the new self-referencing method, ELLIPSOIDAL mode, is very effective for data reduction. Moreover, since the datacube can be inspected directly without moving it into other data reduction programs, it is convenient for checking whether the data resampling works well. We expect that the revised OTFTOOL can be applied to OTF observation facilities such as SRAO, NRAO, and FCRAO.
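
The baseline step described above fits a polynomial of a user-chosen order to the source-free channel windows and subtracts it from the whole spectrum. A minimal least-squares sketch, assuming windows are given as half-open `(start, stop)` channel ranges and that channel indices are rescaled to [-1, 1] for numerical stability; `subtract_baseline` and `gauss_solve` are illustrative names, not OTFTOOL functions:

```python
def gauss_solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def subtract_baseline(spectrum, windows, order):
    """Fit a polynomial of the given order to source-free channels and subtract it everywhere.

    windows: list of half-open (start, stop) channel ranges assumed to be free of emission.
    """
    n = len(spectrum)

    def scaled(i):
        # Map channel index to [-1, 1] to keep the normal equations well conditioned.
        return 2.0 * i / (n - 1) - 1.0 if n > 1 else 0.0

    channels = [i for start, stop in windows for i in range(start, stop)]
    if len(set(channels)) < order + 1:
        raise ValueError("not enough source-free channels for this baseline order")
    xs = [scaled(i) for i in channels]
    ys = [spectrum[i] for i in channels]
    k = order + 1
    # Least-squares normal equations for the polynomial coefficients.
    A = [[sum(x ** (p + q) for x in xs) for q in range(k)] for p in range(k)]
    b = [sum(y * x ** p for x, y in zip(xs, ys)) for p in range(k)]
    coeffs = gauss_solve(A, b)
    baseline = [sum(c * scaled(i) ** p for p, c in enumerate(coeffs)) for i in range(n)]
    return [s - bl for s, bl in zip(spectrum, baseline)]
```

For a spectrum made of a linear baseline plus a line in channels 8 to 11, a first-order fit over windows on either side leaves the windows near zero and the line at its original amplitude; higher orders follow curved baselines but risk absorbing part of the emission if the windows are poorly chosen.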

A Study on Cross-sectioning Methods for Measured Point Data

  • 우혁제;강의철;이관행
    • Korean Society for Precision Engineering (KSPE): Conference Proceedings
    • /
    • Proceedings of the KSPE 2000 Autumn Conference
    • /
    • pp.272-276
    • /
    • 2000
  • Reverse engineering refers to the process of creating a physical part from surface data of an existing part acquired with a scanning device. In recent years, as non-contact scanning devices have become more popular, huge amounts of point data can be obtained at high speed. Point data handling has therefore become more important, since the scan data need to be refined for the efficiency of subsequent tasks such as mesh generation and surface fitting. Among point handling functions, cross-sectioning is still frequently used to extract the necessary data from the point cloud. Commercial reverse engineering software supports cross-sectioning, but only with constant spacing and direction. In this paper, we propose adaptive cross-sectioning algorithms that allow the spacing and direction of cross-sections to change according to the curvature difference of the point cloud data.
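
The proposed algorithms vary both the spacing and the direction of cross-sections with curvature, and the abstract does not give the exact rule. A minimal one-dimensional sketch of the spacing part only, assuming a hypothetical rule in which the step shrinks as the local curvature magnitude grows, clamped between `min_spacing` and `base_spacing`; `adaptive_sections` and the `curvature_at` callback are illustrative names:

```python
def adaptive_sections(start, end, curvature_at, base_spacing, min_spacing, gain=1.0):
    """Return cross-section positions in [start, end) whose spacing shrinks where curvature is high.

    curvature_at: callable giving a local curvature estimate at a position (hypothetical input).
    Spacing rule (illustrative assumption): base_spacing / (1 + gain * |curvature|),
    clamped to [min_spacing, base_spacing].
    """
    if min_spacing <= 0 or base_spacing < min_spacing:
        raise ValueError("require 0 < min_spacing <= base_spacing")
    positions = []
    pos = start
    while pos < end:
        positions.append(pos)
        step = base_spacing / (1.0 + gain * abs(curvature_at(pos)))
        pos += max(min_spacing, min(base_spacing, step))
    return positions
```

With zero curvature this reduces to the constant-spacing sectioning of commercial tools (positions 0, 2, 4, 6, 8 for a spacing of 2 over [0, 10)), while a high-curvature region receives proportionally more sections; the positive `min_spacing` guarantees the loop terminates.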

Effectiveness of Normalization Pre-Processing of Big Data to the Machine Learning Performance

  • 조준모
    • The Journal of the Korea Institute of Electronic Communication Sciences
    • /
    • Vol. 14, No. 3
    • /
    • pp.547-552
    • /
    • 2019
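  • Recently, the rapid growth in the volume of big data has become a major issue in the big data field. Such big data is also used as input to machine learning, and normalization pre-processing is needed to improve learning performance. This performance depends heavily on the ranges of the big data columns and on the normalization method. In this paper, we apply various normalization pre-processing methods to support vector machine (SVM) learning while adjusting the ranges of the big data columns, in order to identify the more effective normalization method. To this end, machine learning was performed and analyzed in Python in a Jupyter Notebook environment.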
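
The abstract does not list which normalization methods were compared; min-max scaling and z-score standardization are two common choices and are assumed here for illustration. They matter for SVMs because distance-based kernels such as the RBF kernel are sensitive to feature scale, so a column with a wide range can dominate the others. A minimal per-column sketch (function names are illustrative):

```python
from statistics import mean, pstdev


def min_max(column, lo=0.0, hi=1.0):
    """Rescale values linearly to [lo, hi]; a constant column maps to lo."""
    cmin, cmax = min(column), max(column)
    if cmax == cmin:
        return [lo for _ in column]
    return [lo + (v - cmin) * (hi - lo) / (cmax - cmin) for v in column]


def z_score(column):
    """Standardize to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]


def normalize_columns(rows, method):
    """Apply a per-column normalization to a table given as a list of rows."""
    columns = [list(c) for c in zip(*rows)]
    normalized = [method(c) for c in columns]
    return [list(r) for r in zip(*normalized)]
```

In a real experiment the minimum, maximum, mean, and standard deviation should be computed on the training split only and then reused for the test split, so that no information from the test data leaks into pre-processing.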

A Study on Development of the Meteorological Data Preprocessing Program for Air Pollution Modeling

  • 임익현;배성환
    • The Journal of the Korea Institute of Electronic Communication Sciences
    • /
    • Vol. 10, No. 1
    • /
    • pp.47-54
    • /
    • 2015
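  • As industrialization and urbanization have sharply increased fuel consumption and worsened air pollution in major cities, many studies have been conducted on the use and development of atmospheric dispersion models for air quality management. To broaden the applicability of the atmospheric dispersion models provided by the U.S. EPA in Korea, this study developed a "domestic meteorological data pre-processing program" that converts Korean meteorological data into the U.S. data format and generates meteorological input data for the models, and evaluated its usefulness through a case study. The evaluation showed that the program accurately converted Korean meteorological data into the U.S. format and allowed predictions to run without errors during air quality modeling, indicating that it is highly useful as a pre-processing tool for Korean meteorological data.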