• Title/Summary/Keyword: Large Scale data

Streaming Decision Tree for Continuity Data with Changed Pattern (패턴의 변화를 가지는 연속성 데이터를 위한 스트리밍 의사결정나무)

  • Yoon, Tae-Bok;Sim, Hak-Joon;Lee, Jee-Hyong;Choi, Young-Mee
    • Journal of the Korean Institute of Intelligent Systems / v.20 no.1 / pp.94-100 / 2010
  • Data mining is mainly used for pattern extraction and information discovery from collected data. However, previous methods have difficulty reflecting patterns that change over time. In this paper, we introduce the Streaming Decision Tree (SDT), which analyzes continuous, large-scale data with changing patterns. SDT defines continuous data as blocks and extracts rules using a decision tree learning method. The extracted rules are combined considering time of occurrence, frequency, and contradiction. In experiments, we applied SDT to time-series data and confirmed reasonable results.

Satellite monitoring of large-scale air pollution in East Asia

  • Chung, Y.S.;Park, K.H.;Kim, H.S.;Kim, Y.S.
    • Proceedings of the KSRS Conference / 2003.11a / pp.786-789 / 2003
  • This study emphasizes the detection of sandstorms and industrial pollutants. Data obtained from the meteorological satellites NOAA and GMS have been used for detailed analysis. MODIS and Landsat images are also used with a view to applications of the future KOMPSAT-2. Satellite observations have been verified against air pollution data obtained by ground-level monitors. It was found that satellite measurements agree well with the concentrations and variations of air pollutants measured on the ground, and that satellite techniques are a very useful tool for monitoring large-scale air pollution in East Asia. The quantitative analysis of satellite image data on air pollution is the goal of future studies.


Optimal Provider Mobility in Large-Scale Named-Data Networking

  • Do, Truong-Xuan;Kim, Younghan
    • KSII Transactions on Internet and Information Systems (TIIS) / v.9 no.10 / pp.4054-4071 / 2015
  • Named-Data Networking (NDN) is one of the promising approaches for the Future Internet to cope with the explosion and current usage pattern of Internet traffic. Content provider mobility in NDN allows users to receive real-time traffic while the content providers are on the move. However, the current solutions for managing these mobile content providers suffer from several issues, such as long handover latency, high cost, and non-optimal routing paths. In this paper, we survey the main approaches to provider mobility in NDN and propose an optimal scheme to support mobile content providers in a large-scale NDN domain. Our scheme predicts the movement of the provider and uses state information in the NDN forwarding plane to set up an optimal new routing path for mobile providers. Numerical analysis shows that our approach provides NDN users with a better service access delay and a lower total handover cost than the current solutions.

Big Data Astronomy: Large-scale Graph Analyses of Five Different Multiverses

  • Hong, Sungryong
    • The Bulletin of The Korean Astronomical Society / v.43 no.2 / pp.36.3-37 / 2018
  • Using large-scale graph analytic tools on the modern Big Data platform Apache Spark, we investigate the topological structures of five different multiverses produced by cosmological n-body simulations with various cosmological initial conditions: (1) one standard universe, (2) two different dark energy states, and (3) two different dark matter densities. For the Big Data calculations, we use a custom build of a stand-alone Spark cluster at KIAS and Dataproc Compute Engine in Google Cloud Platform, with sample sizes ranging from 7 million to 200 million. Among many graph statistics, we find that three simple graph measurements, denoted by (1) $n_k$, (2) $\tau_\Delta$, and (3) $n_{S\ge5}$, can efficiently discern different topologies in discrete point distributions. We denote this set of three graph diagnostics by kT5+. These kT5+ statistics provide a quick look at various orders of n-point correlation functions in a computationally cheap way: (1) $n = 2$ by $n_k$, (2) $n = 3$ by $\tau_\Delta$, and (3) $n \ge 5$ by $n_{S\ge5}$.


A review and comparison of convolution neural network models under a unified framework

  • Park, Jimin;Jung, Yoonsuh
    • Communications for Statistical Applications and Methods / v.29 no.2 / pp.161-176 / 2022
  • There has been active research in image classification using deep learning convolutional neural network (CNN) models. The ImageNet large-scale visual recognition challenge (ILSVRC) (2010-2017) was one of the most important competitions that boosted the development of efficient deep learning algorithms. This paper introduces and compares six monumental models that achieved high prediction accuracy in ILSVRC. First, we review the models to illustrate their unique structures and characteristics. We then compare the models under a unified framework; for this reason, additional devices that are not crucial to the structure are excluded. Four popular data sets with different characteristics are then considered to measure the prediction accuracy. By investigating the characteristics of the data sets and the models being compared, we provide some insight into the architectural features of the models.

DEVELOPMENT PROCESS OF INFORMATION FLOW RETRIEVAL SYSTEM FOR LARGE-SCALE CONSTRUCTION PROJECTS

  • Jinho Shin;Hyun-soo Lee;Moonseo Park;Jung-ho Yu;Jungseok Kim
    • International conference on construction engineering and project management / 2011.02a / pp.556-560 / 2011
  • Players in construction projects carry out each work process by gathering, modifying, and communicating information. As complex projects with long lifecycles have increased, it has become more important to understand this mechanism for successful project performance. Hence, most project information management systems and knowledge management systems are equipped with an information retrieval system. There are two kinds of logic for inferring the meaning of a retrieval target: inductive reasoning and deductive reasoning. The former is based on metadata explaining the target, and the latter is based on the relations between data. To infer the information flow, it is necessary to define the correlation between players and work processes. However, most established information retrieval systems are based on index search and focus on the data itself rather than on the correlation between data. Thus, this research aims to study the development process of an information flow retrieval system for large-scale construction projects.


Prediction of small-scale leak flow rate in LOCA situations using bidirectional GRU

  • Hye Seon Jo;Sang Hyun Lee;Man Gyun Na
    • Nuclear Engineering and Technology / v.56 no.9 / pp.3594-3601 / 2024
  • It is difficult to detect a small-scale leak in a nuclear power plant (NPP) quickly and take appropriate action, and delaying these procedures can have adverse effects on NPPs. In this paper, we propose leak flow rate prediction using the bidirectional gated recurrent unit (Bi-GRU) method to detect leakage quickly and accurately in small-scale leakage situations, because large-scale leak rates are already known to be predicted accurately. The data were acquired by simulating small loss-of-coolant accidents (LOCA), or small-scale leakage situations, using the modular accident analysis program (MAAP) code. To improve prediction performance, data were collected by distinguishing the break sizes in more detail. The prediction accuracy was further improved by performing both LOCA diagnosis and leak flow rate prediction in small LOCA situations. The prediction model developed using the Bi-GRU showed superior prediction performance compared with other artificial intelligence methods. Accordingly, the accurate and effective prediction model for small-scale leakage situations proposed herein is expected to support operators in decision-making and taking actions.

A Data Generator for Database Benchmarks and its Performance Evaluation (데이터베이스 벤치마크를 위한 데이터 생성기와 성능 평가)

  • Ok, Eun-Taek;Jeong, Hoe-Jin;Lee, Sang-Ho
    • The KIPS Transactions:PartD / v.10D no.6 / pp.907-916 / 2003
  • Database benchmarks require efficient generation of large-scale data. This paper presents the system architecture, control flows, and characteristics of the data generator we have developed. The data generator features generation of large-scale data, column-by-column data generation, a number of data distributions and verification, and real data generation. An extensive comparison with other data generators in terms of function is also presented. Finally, empirical performance experiments comparing RAID systems with a non-RAID system have been conducted to alleviate the I/O bottleneck. The test results can serve as guidelines to help configure system architecture.
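
The idea of column-by-column generation with selectable data distributions can be illustrated with a minimal Python sketch. This is not the generator described in the paper; the function name `generate_table`, the column names, and the chosen distributions are all hypothetical and serve only to show one way each column could be produced independently from its own distribution.

```python
import random

def generate_table(num_rows, column_specs, seed=0):
    """Generate a table one column at a time (hypothetical helper).

    column_specs maps a column name to a sampler function that takes a
    random.Random instance and returns one value for that column.
    """
    rng = random.Random(seed)  # fixed seed makes the output reproducible
    columns = {}
    for name, sampler in column_specs.items():
        # Each column is filled completely before the next one starts.
        columns[name] = [sampler(rng) for _ in range(num_rows)]
    return columns

# Assumed example columns, each drawn from a different distribution.
specs = {
    "customer_no": lambda rng: rng.randint(1, 1000),   # uniform integers
    "score": lambda rng: rng.gauss(50.0, 10.0),        # normal
    "wait_time": lambda rng: rng.expovariate(0.5),     # exponential
}

table = generate_table(5, specs, seed=42)
```

Generating by column keeps each distribution's logic isolated, which makes it straightforward to verify a column's values against its intended distribution afterwards.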

Improvement Index and Characteristic for the Safety Management Level of Domestic Construction Companies (국내 건설회사의 안전관리수준 향상지수 및 특성 분석)

  • Son, Chang-Baek;Lee, Dong-Eun;Choi, Seung-Mo
    • Journal of the Korean Society of Safety / v.22 no.4 / pp.51-56 / 2007
  • To present basic data for the balanced improvement of the safety management level of domestic construction companies, this study offers an improvement index and the characteristics of the safety management level by comparing the safety level in 2006 with that in 2001. The companies studied are classified into 51 large-scale companies and 61 small and medium-scale ones. The safety management level of both head offices and construction sites improved for all companies regardless of scale. In particular, the improvement index of small and medium-scale companies is higher than that of large ones, and that of head offices is higher than that of construction sites.

A Range Query Method using Index in Large-scale Database Systems (대규모 데이터베이스 시스템에서 인덱스를 이용한 범위 질의 방법)

  • Kim, Chi-Yeon
    • The Journal of the Korea institute of electronic communication sciences / v.7 no.5 / pp.1095-1101 / 2012
  • As the amount of data increases explosively, large-scale database systems have emerged to store, retrieve, and manipulate it. There are several issues in this environment, such as consistency, availability, and fault tolerance. In this paper, we address an efficient range-query method for large-scale database systems in which data management services are separated from transaction management services. A previous study proposed using partitions to preserve the independence of the two modules and to resolve the phantom problem, but that method was efficient only when the range query is specified by a key. We therefore present a new method that improves efficiency when the range query is specified by a key attribute as well as by other attributes. The presented method can guarantee the independence of the separated modules and reduce the overhead of range queries by using a partial index.
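
As a rough illustration of answering a range query on a non-key attribute through an index, the following Python sketch builds a sorted index over only the rows that satisfy a predicate and answers ranges by binary search. It is not the method proposed in the paper: the class `PartialIndex`, the sample rows, and the interpretation of "partial index" as an index over a subset of rows are assumptions made for the example.

```python
import bisect

class PartialIndex:
    """Sorted index over one attribute, covering only rows that pass a predicate.

    Hypothetical illustration; not the paper's data structure.
    """

    def __init__(self, rows, attr, predicate):
        # Keep (value, row position) pairs for qualifying rows, sorted by value.
        entries = sorted(
            (row[attr], pos) for pos, row in enumerate(rows) if predicate(row)
        )
        self._values = [value for value, _ in entries]
        self._positions = [pos for _, pos in entries]

    def range_query(self, low, high):
        """Return positions of indexed rows with low <= value <= high."""
        start = bisect.bisect_left(self._values, low)
        end = bisect.bisect_right(self._values, high)
        return self._positions[start:end]

# Assumed sample data: "id" is the key, "price" is a non-key attribute.
rows = [
    {"id": 1, "price": 30},
    {"id": 2, "price": 10},
    {"id": 3, "price": 20},
    {"id": 4, "price": None},  # excluded from the index by the predicate
]

index = PartialIndex(rows, "price", lambda row: row["price"] is not None)
matches = [rows[pos]["id"] for pos in index.range_query(15, 30)]
```

Because the index is sorted, a range lookup costs two binary searches plus the size of the result, instead of a scan over every row.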