• Title/Summary/Keyword: Large Scale Data


Outlier Detection in Time Series Monitoring Datasets using Rule Based and Correlation Analysis Method (규칙기반 및 상관분석 방법을 이용한 시계열 계측 데이터의 이상치 판정)

  • Jeon, Jesung;Koo, Jakap;Park, Changmok
    • Journal of the Korean GEO-environmental Society
    • /
    • v.16 no.5
    • /
    • pp.43-53
    • /
    • 2015
  • In this study, methods for detecting outliers in various monitoring data that fall into the big-data category were developed, and outlier detection was conducted for both artificial data and real field monitoring data. Rule-based methods that apply the rate of change and the probability of error to monitoring data are effective in detecting large short faults and constant faults that show no change within a certain period. However, because they rely on a single independent dataset, they are prone to misjudging normal data with large variation as outliers. Rule-based methods for detecting noise faults are of limited use with real monitoring data because of the difficulty of choosing a proper data window size and of finding a threshold for outlier judgment. Correlation analysis between two different datasets was very effective in detecting localized outliers and abnormal variation in both short- and long-term monitoring datasets, provided that a reasonable range of training data could be selected.
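The two detection ideas in this abstract, a rate-of-change rule on a single series and a correlation cross-check between two series, can be sketched as follows (a minimal illustration with assumed thresholds, not the paper's implementation):

```python
# Illustrative sketch: flag outliers by rate of change within one series,
# then cross-check suspect windows against a normally correlated series.
import numpy as np

def rate_of_change_outliers(x, max_step):
    """Rule-based check: flag points whose jump from the previous sample
    exceeds max_step (catches short faults and spikes)."""
    dx = np.abs(np.diff(x, prepend=x[0]))
    return dx > max_step

def correlation_check(x, y, window=20, min_r=0.5):
    """Flag windows where two normally correlated sensors decorrelate."""
    flags = np.zeros(len(x), dtype=bool)
    for i in range(len(x) - window + 1):
        r = np.corrcoef(x[i:i + window], y[i:i + window])[0, 1]
        if r < min_r:
            flags[i:i + window] = True
    return flags

t = np.linspace(0, 10, 200)
x = np.sin(t)
y = np.sin(t) + 0.01          # second, correlated sensor
x[100] += 5.0                 # inject a short fault
rule_flags = rate_of_change_outliers(x, max_step=1.0)
print(rule_flags[100])
```

The rule-based check fires on the injected spike; the correlation check is what catches the subtler case the abstract mentions, where a single-series rule would mislabel a genuine large variation.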

A Hierarchical Data Dissemination Protocol in Large-Scale Wireless Sensor Networks (대규모 무선 센서 네트워크에서 계층적 데이터 전달 프로토콜)

  • Chu, Seong-Eun;Kang, Dae-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.12 no.8
    • /
    • pp.1505-1510
    • /
    • 2008
  • In large-scale wireless sensor networks, the deployed nodes cannot be replaced or recharged after initial deployment, and dead nodes may lead to the partition of the whole network. When performing data dissemination under battery power constraints, energy efficiency is a key design factor of a routing protocol. As a solution for efficient data dissemination, in this paper we propose Hierarchical Data Dissemination (HDD), a protocol that provides scalable and efficient data delivery to multiple sources and mobile sinks. HDD exploits the facts that sink nodes are central gathering points and that source-centric data forwarding paths are constructed, and it maintains these paths with two-tier communications. The performance of HDD is compared with TTDD in terms of energy consumption, data delivery time, and data success ratio. Extensive simulation results show that HDD outperforms TTDD by a factor of 1.5 to 3 in energy consumption.
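The energy argument behind two-tier dissemination can be illustrated with a toy distance model (an assumed simplification, not the paper's HDD protocol): when only the top tier must track a mobile sink, the total radio distance is lower than when every source re-reaches the sink directly.

```python
# Toy two-tier dissemination model (assumed, for illustration only):
# sources relay through a fixed gathering point, and only that point
# tracks the mobile sink's changing position.
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

sources = [(0, 0), (0, 10), (10, 0), (10, 10)]
gathering_point = (5, 5)                          # tier-1 node near sources
sink_positions = [(50, 50), (60, 40), (55, 60)]   # mobile sink over time

# Flat scheme: every source reaches the sink at each of its positions.
flat = sum(dist(s, k) for s in sources for k in sink_positions)

# Two-tier scheme: sources send the short hop to the gathering point;
# only the gathering point covers the long, changing distance to the sink.
tiered = (sum(dist(s, gathering_point) for s in sources) * len(sink_positions)
          + sum(dist(gathering_point, k) for k in sink_positions))
print(flat > tiered)
```

This omits everything that makes HDD a real protocol (grid maintenance, path repair, per-hop energy), but it captures why hierarchical forwarding to mobile sinks can cut energy by a constant factor.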

The Application of Generalized Additive Model in the Effectiveness of Scale in Funding Policy on SMEs Overall Performance (일반화 가법 모형을 이용한 정책금융 수혜규모가 중소기업 경영성과에 미치는 효과성 연구)

  • Ha, SeungYin;Jang, Myoung Gyun;Lee, GunHee
    • The Journal of Small Business Innovation
    • /
    • v.20 no.2
    • /
    • pp.35-50
    • /
    • 2017
  • The aim of this study is to analyze the effect of a firm's financial status quo and the scale of financial support on SMEs' overall performance. We gathered financial guarantee data from 1998 to 2013, provided by the Korea Credit Guarantee Fund (KODIT), to analyze the effectiveness of financial policy. To classify both financial status quo and scale of financial support, we used the following variables: interest coverage ratio (ICR) and the ratio of newly guaranteed amount. To measure overall performance, we employed profitability, growth, and activity indices. To minimize the effect of repeated financial support (redundant benefits), firms were selected based on the following criteria: no financial support in the three years before the policy was implemented and no new financial support over the following two years. Results suggest that firms with a higher ICR and a large newly guaranteed amount perform better on the profitability index. Among firms with a lower ICR, those receiving large-scale financial support outperformed those receiving small-scale support. On the growth index, firms with large-scale financial support, irrespective of ICR, tended to outperform those with small-scale support. On the activity index, however, large-scale support led to higher performance only in the short term. Our analysis thus offers an objective perspective on the effectiveness of financial policy through credit guarantees on the overall performance of SMEs, and implies that a well-balanced SME support policy may lead to better outcomes.
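A generalized additive model of the kind named in the title fits the response as a sum of smooth one-dimensional functions of each predictor. A minimal pure-NumPy backfitting sketch (illustrative toy data and variable names, not the study's dataset or software) looks like this:

```python
# Minimal backfitting sketch of an additive model y ~ f1(x1) + f2(x2):
# each f_j is a centered polynomial smoother refit against the partial
# residuals of the others. Data below is synthetic, for illustration only.
import numpy as np

def fit_additive(X, y, degree=3, iters=20):
    n, p = X.shape
    alpha = y.mean()
    coefs = [np.zeros(degree + 1) for _ in range(p)]
    for _ in range(iters):
        for j in range(p):
            partial = y - alpha - sum(
                np.polyval(coefs[k], X[:, k]) for k in range(p) if k != j)
            coefs[j] = np.polyfit(X[:, j], partial, degree)
            coefs[j][-1] -= np.polyval(coefs[j], X[:, j]).mean()  # center f_j
    return alpha, coefs

def predict(Xnew, alpha, coefs):
    return alpha + sum(np.polyval(coefs[j], Xnew[:, j])
                       for j in range(Xnew.shape[1]))

rng = np.random.default_rng(1)
icr = rng.uniform(0, 3, 500)          # toy stand-in for interest coverage ratio
guarantee = rng.uniform(0, 1, 500)    # toy stand-in for guaranteed-amount ratio
perf = np.sin(icr) + guarantee**2 + rng.normal(0, 0.05, 500)
alpha, coefs = fit_additive(np.column_stack([icr, guarantee]), perf)
pred = predict(np.array([[1.5, 0.5]]), alpha, coefs)[0]
```

The additive structure is what lets a GAM expose nonlinear effects of ICR and support scale separately, which is why the study can report different patterns per performance index.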


DNA Pooling as a Tool for Case-Control Association Studies of Complex Traits

  • Ahn, Chul;King, Terri M.;Lee, Kyusang;Kang, Seung-Ho
    • Genomics & Informatics
    • /
    • v.3 no.1
    • /
    • pp.1-7
    • /
    • 2005
  • Case-control studies are widely used for disease gene mapping using individual genotyping data. However, analyses of large samples are often impractical due to the expense of individual genotyping. The use of DNA pooling can significantly reduce the number of genotyping reactions required; hence reducing the cost of large-scale case-control association studies. Here, we discuss the design and analysis of DNA pooling genetic association studies.
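The core analysis such a design enables is a comparison of allele frequencies measured once per pool rather than once per individual. A small sketch of the standard two-proportion test this implies (an illustration of the general approach, not the paper's specific method; pool measurement error is ignored here):

```python
# Pooled-DNA case-control allele test (illustrative): estimate allele
# frequencies from a case pool and a control pool, then compare them
# with a normal-approximation two-proportion test.
from math import sqrt
from statistics import NormalDist

def pooled_allele_test(p_case, p_ctrl, n_case, n_ctrl):
    """p_*: allele frequency measured in each pool;
    n_*: number of chromosomes contributing to each pool."""
    p = (p_case * n_case + p_ctrl * n_ctrl) / (n_case + n_ctrl)
    se = sqrt(p * (1 - p) * (1 / n_case + 1 / n_ctrl))
    z = (p_case - p_ctrl) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p-value

print(pooled_allele_test(0.35, 0.25, 400, 400))  # smaller => stronger signal
```

Note the cost saving the abstract describes: the two frequency measurements replace 400 individual genotyping reactions per group.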

On the Spectral Eddy Viscosity in Isotropic Turbulence

  • Park Noma;Yoo Jung Yu;Choi Haecheon
    • Korean Society of Computational Fluids Engineering: Conference Proceedings
    • /
    • 2003.10a
    • /
    • pp.105-106
    • /
    • 2003
  • The spectral eddy viscosity model is investigated through large eddy simulation of decaying and forced isotropic turbulence. It is shown that the widely accepted 'plateau-and-cusp' model overpredicts the resolved kinetic energy due to amplification of energy at intermediate wavenumbers, whereas the simple plateau model reproduces the correct energy spectrum. This result casts doubt on a priori tests based on filtered DNS or experimental data. An alternative method for validating subgrid-scale models is discussed.
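The plateau-and-cusp form under discussion is commonly written (following Chollet and Lesieur; the constants vary between references and are left symbolic here as an assumption) as

```latex
\nu_e(k \mid k_c) \;=\; \nu_e^{+}\!\left(\frac{k}{k_c}\right)\sqrt{\frac{E(k_c)}{k_c}},
\qquad
\nu_e^{+}(x) \;=\; C_1\left[\,1 + C_2\, e^{-C_3/x}\,\right],
```

where $E(k_c)$ is the energy spectrum at the cutoff wavenumber $k_c$. The pure plateau model corresponds to dropping the exponential cusp term ($C_2 = 0$), which is the variant the abstract reports as reproducing the correct spectrum.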


Runtime Prediction Based on Workload-Aware Clustering (병렬 프로그램 로그 군집화 기반 작업 실행 시간 예측모형 연구)

  • Kim, Eunhye;Park, Ju-Won
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.38 no.3
    • /
    • pp.56-63
    • /
    • 2015
  • Several fields of science demand support for large-scale workflows requiring thousands of CPU cores or more. To support such large-scale scientific workflows, high-capacity parallel systems such as supercomputers are widely used. To increase the utilization of these systems, most schedulers use a backfilling policy: small jobs are moved ahead to fill holes in the schedule as long as they do not delay large jobs. Since backfilling requires a runtime estimate, most parallel systems use the user's estimated runtime, which is known to be extremely inaccurate because users overestimate their jobs' runtimes. In this paper we therefore propose a novel system for runtime prediction based on workload-aware clustering, with the goal of improving prediction performance. The proposed method for runtime prediction of parallel applications consists of three main phases. First, feature selection based on factor analysis is performed to identify important input features. Second, a clustering analysis of history data is performed with a self-organizing map, followed by hierarchical clustering of the weight vectors to find the cluster boundaries. Finally, prediction models are constructed using support vector regression on the clustered workload data. Multiple prediction models, one per clustered data pattern, reduce the error rate compared with a single model for the whole data. In the experiments, we use workload logs from parallel systems (iPSC, LANL-CM5, SDSC-Par95, SDSC-Par96, and CTC-SP2) to evaluate the effectiveness of our approach. Compared with other techniques, experimental results show that the proposed method improves accuracy by up to 69.08%.
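The cluster-then-regress pipeline described above can be sketched compactly. In this minimal version (synthetic data; KMeans stands in for the paper's SOM plus hierarchical clustering, and the factor-analysis feature selection is omitted), a separate SVR model is trained per workload cluster and new jobs are routed to their cluster's model:

```python
# Per-cluster runtime prediction sketch (illustrative, not the paper's code).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVR

rng = np.random.default_rng(0)
# Synthetic "workload log": features = [requested cores, user estimate, queue load]
X = rng.uniform(0, 1, size=(300, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.05, 300)  # true runtime

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
models = {c: SVR(kernel="rbf", C=10.0).fit(X[km.labels_ == c],
                                           y[km.labels_ == c])
          for c in range(3)}

def predict_runtime(x):
    c = int(km.predict(x.reshape(1, -1))[0])      # route to cluster's model
    return float(models[c].predict(x.reshape(1, -1))[0])

print(predict_runtime(np.array([0.5, 0.5, 0.5])))
```

Routing each job to a model trained only on similar history is what lets the per-cluster models beat a single global model, as the abstract reports.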

Implementation of an open API-based virtual network provisioning automation platform for large-scale data transfer (대용량 데이터 전송을 위한 오픈 API 기반 가상 네트워크 프로비저닝 자동화 플랫폼 구현)

  • Kim, Yong-hwan;Park, Seongjin;Kim, Dongkyun
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.26 no.9
    • /
    • pp.1320-1329
    • /
    • 2022
  • Currently, advanced national research network groups are continuously conducting R&D to provide SDN/NFV-based network automation and intelligence technology to R&E users. In addition, the requirement to provide large-scale data transmission over high-performance networking facilities, compared to general network environments, is gradually increasing on advanced national research networks. Accordingly, in this paper we propose an open API-based virtual network provisioning automation platform for large-scale data transfer, researched and developed to respond to the networking requirements of the national research network, and present its implementation results. The platform comprises the KREONET-S VDN system, which provides SDN-based network virtualization, the Kubernetes system, which provides container-oriented server virtualization, and Globus Online, a high-performance data transmission service. This paper presents the environment configurations, the implementation results for interworking between these heterogeneous systems, and the resulting automated virtual network provisioning.
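Open API-driven provisioning of this kind typically means assembling a JSON request and POSTing it to the controller's REST endpoint. The sketch below is hypothetical throughout: the URL, field names, and pairing with a transfer service are illustrative assumptions, not the actual KREONET-S VDN API.

```python
# Hypothetical open-API provisioning sketch (endpoint and request fields
# are assumptions for illustration, not the real KREONET-S VDN interface).
import json

VDN_API = "https://vdn.example.org/api/v1/networks"   # placeholder URL

def build_provision_request(name, bandwidth_mbps, endpoints):
    """Assemble a virtual-network provisioning request body."""
    return {
        "name": name,
        "bandwidth_mbps": bandwidth_mbps,
        "endpoints": endpoints,        # e.g. data-transfer-node addresses
        "transfer_service": "globus",  # pair the slice with a transfer service
    }

payload = build_provision_request("dtn-slice-01", 10000,
                                  ["10.0.0.10", "10.0.0.20"])
body = json.dumps(payload)
print(body)
# A deployment would then POST this, e.g.:
# requests.post(VDN_API, data=body,
#               headers={"Content-Type": "application/json"})
```

Driving all three subsystems (network slice, containers, transfer service) from such API calls is what makes the end-to-end provisioning automatable rather than a manual, per-transfer configuration task.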

Formation of a large-scale quasi-circular flare ribbon enclosing three-ribbon through two-step eruptive flares

  • Lim, Eun-Kyung;Yurchyshyn, Vasyl;Kumar, Pankaj;Cho, Kyuhyoun;Kim, Sujin;Cho, Kyung-Suk
    • The Bulletin of The Korean Astronomical Society
    • /
    • v.41 no.2
    • /
    • pp.42.1-42.1
    • /
    • 2016
  • The formation process and the dynamical properties of a large-scale quasi-circular flare ribbon were investigated using SDO AIA and HMI data along with data from RHESSI and SOT. Within a one-hour interval, two subsequent M-class flares were detected from NOAA 12371, which had a ${\beta}{\gamma}{\delta}$ configuration with one bipolar sunspot group in the east and one unipolar spot in the west embedded in a decayed magnetic field. The earlier M2.0 flare was associated with a coronal loop eruption, and a two-ribbon structure formed within the bipolar sunspot group. The later M2.6 flare, on the other hand, was associated with a halo CME, and a quasi-circular ribbon developed encircling the full active region. The observed quasi-circular ribbon was strikingly large, spanning 650" in the north-south and 500" in the east-west direction. It showed the well-known sequential brightening in the clockwise direction during the decay phase of the M2.6 flare, at an estimated speed of 160.7 km/s. The quasi-circular ribbon also showed radial expansion, especially in its southern part. Interestingly, at the time of the later M2.6 flare, a third flare ribbon parallel to the earlier two-ribbon structure developed near the unipolar sunspot and then showed a typical separation, paired with the easternmost ribbon of the earlier two ribbons. The potential field reconstruction based on the PFSS model showed a fan-shaped magnetic configuration, including fan-like field lines stemming from the unipolar spot and fanning out toward the background decayed field. This large-scale fan-like field overarched the full active region, and the footpoints of the fan-like field lines were co-spatial with the observed quasi-circular ribbon. From the NLFF magnetic field reconstruction, we confirmed the existence of a twisted flux rope structure in the bipolar spot group before the first M2.0 flare. Hard X-ray emission signatures were detected at the site of the twisted flux rope during the pre-flare phase of the M2.0 flare. Based on the analysis of both the two-ribbon structure and the quasi-circular ribbon, we suggest that tether-cutting reconnection between the sheared arcade overarching the twisted flux rope embedded in a fan-like magnetic field may have triggered the first M2.0 flare; the second M2.6 flare was then induced by fan-spine reconnection, due to the interaction between the expanding field and a nearby quasi-null, forming the observed large-scale quasi-circular flare ribbon.


A Novel Reference Model for Cloud Manufacturing CPS Platform Based on oneM2M Standard (제조 클라우드 CPS를 위한 oneM2M 기반의 플랫폼 참조 모델)

  • Yun, Seongjin;Kim, Hanjin;Shin, Hyeonyeop;Chin, Hoe Seung;Kim, Won-Tae
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.8 no.2
    • /
    • pp.41-56
    • /
    • 2019
  • Cloud manufacturing is a new manufacturing paradigm in which multiple connected factories work like a single factory. A cloud manufacturing system is a kind of large-scale CPS that produces products through the collaboration of distributed manufacturing facilities based on technologies such as cloud computing, IoT, and virtualization. It utilizes diverse, distributed facilities through a centralized information system, which allows the flexible composition of user-centric, service-oriented large-scale systems. However, a cloud manufacturing system is composed of a large number of highly heterogeneous subsystems, which makes interconnection, data exchange, information processing, and system verification difficult. In this paper, we derive user requirements for the various aspects of a cloud manufacturing system, such as the functional, human, trustworthiness, timing, data, and composition aspects, based on the CPS Framework, an analysis methodology for CPS. Next, by analyzing the user requirements, we define system requirements including scalability, composability, interactivity, dependability, timing, interoperability, and intelligence. We map the defined CPS system requirements to the requirements of oneM2M, the IoT platform standard, and verify support for the system requirements at the IoT platform level through Mobius, an implementation of the oneM2M standard. Finally, based on the verification results, we propose a oneM2M-based large-scale cloud manufacturing platform that meets the cloud manufacturing requirements and supports the overall features of cloud manufacturing CPS with dependability.

Application of a large-scale ensemble climate simulation database for estimating the extreme rainfall (극한강우량 산정을 위한 대규모 기후 앙상블 모의자료의 적용)

  • Kim, Youngkyu;Son, Minwoo
    • Journal of Korea Water Resources Association
    • /
    • v.55 no.3
    • /
    • pp.177-189
    • /
    • 2022
  • The purpose of this study is to apply the d4PDF (Database for Policy Decision Making for Future Climate Change), constructed from a large-scale ensemble climate simulation, to estimate probable rainfall of low frequency and high intensity. In addition, this study analyzes the uncertainty introduced by frequency analysis by comparing the probable rainfall estimated using the d4PDF with that estimated from observed data by frequency analysis at the Geunsam, Imsil, Jeonju, and Jangsu stations. The d4PDF consists of 50 ensembles, each providing 60 years of climate and weather data such as rainfall and temperature, so 3,000 annual maximum daily rainfall values could be collected for each station. Exploiting this, the study does not apply frequency analysis to estimate the probable rainfall; instead, probable rainfall with return periods of 10 to 1,000 years is estimated by ranking the 3,000 values by magnitude in a non-parametric approach. The probable rainfall estimated using the d4PDF was then compared with that estimated from observed rainfall using the Gumbel or GEV distribution, and the deviation between the two was computed. This deviation increased as the difference between the return period and the observation period increased. Meanwhile, the d4PDF reasonably estimated probable rainfall of low frequency and high intensity while minimizing the uncertainty that arises when frequency analysis is applied to observed data with a short record.
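The non-parametric step described above, reading the T-year rainfall directly off the sorted ensemble sample instead of fitting a distribution, can be sketched as follows (synthetic Gumbel-like data stands in for the d4PDF annual maxima; the Weibull plotting position T = (N+1)/m is an assumed convention):

```python
# Non-parametric probable rainfall from a large ensemble sample:
# with N annual maxima, the T-year value is the m-th largest where
# m ~ (N+1)/T (Weibull plotting position), no distribution fitting needed.
import numpy as np

def empirical_quantile(annual_maxima, return_period):
    x = np.sort(annual_maxima)[::-1]             # descending order
    n = len(x)
    m = int(round((n + 1) / return_period))      # rank for this return period
    m = min(max(m, 1), n)
    return x[m - 1]

rng = np.random.default_rng(42)
# Toy stand-in for 3,000 d4PDF annual maximum daily rainfalls (mm)
sample = rng.gumbel(loc=100, scale=30, size=3000)
print(empirical_quantile(sample, 100))   # ~100-year daily rainfall
```

With 3,000 samples even the 1,000-year value is interpolated from observed ranks rather than extrapolated from a fitted tail, which is exactly the advantage over frequency analysis on a short observed record that the abstract highlights.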