• Title/Summary/Keyword: large data

Search Results: 14,238

BIM Geometry Cache Structure for Data Streaming with Large Volume (대용량 BIM 형상 데이터 스트리밍을 위한 캐쉬 구조)

  • Kang, Tae-Wook
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.18 no.9
    • /
    • pp.1-8
    • /
    • 2017
  • The purpose of this study is to propose a cache structure for processing large-volume building information modeling (BIM) geometry data where it is difficult to allocate sufficient physical memory. As BIM orders in the public sector have increased, it has become more common to visualize and compute large-volume BIM geometry data. Design and review collaboration can require a long time to download large-volume BIM data over the network, and if the BIM data exceed the free physical memory, visualization and geometry computation become impossible. To utilize large amounts of BIM data on insufficient physical memory or a low-bandwidth network, it is advantageous to cache only the data needed at rendering and calculation time. This study proposes a cache structure for efficiently rendering and calculating large-volume BIM geometry data where it is difficult to allocate enough physical memory.
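
As a rough illustration of the caching idea in this abstract, the sketch below implements a memory-budgeted LRU cache for geometry objects. It is a minimal sketch, not the paper's structure: the `GeometryCache` class, the byte budget, and the `fetch_mesh` streaming callback are all assumed names.

```python
# Minimal sketch: keep only recently used meshes within a physical-memory budget,
# streaming a mesh over the network on a cache miss. Names are illustrative.
from collections import OrderedDict

class GeometryCache:
    def __init__(self, budget_bytes, fetch_mesh):
        self.budget = budget_bytes      # physical-memory budget for cached meshes
        self.fetch = fetch_mesh         # callback that downloads one mesh: id -> (mesh, size)
        self.used = 0
        self.items = OrderedDict()      # object_id -> (mesh, size), in LRU order

    def get(self, object_id):
        if object_id in self.items:
            self.items.move_to_end(object_id)           # mark as recently used
            return self.items[object_id][0]
        mesh, size = self.fetch(object_id)              # cache miss: stream from server
        while self.used + size > self.budget and self.items:
            _, (_, old_size) = self.items.popitem(last=False)  # evict least recently used
            self.used -= old_size
        self.items[object_id] = (mesh, size)
        self.used += size
        return mesh
```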

Training Data Sets Construction from Large Data Set for PCB Character Recognition

  • NDAYISHIMIYE, Fabrice;Gang, Sumyung;Lee, Joon Jae
    • Journal of Multimedia Information System
    • /
    • v.6 no.4
    • /
    • pp.225-234
    • /
    • 2019
  • Deep learning has become increasingly popular in both academia and industry. Various domains, including pattern recognition and computer vision, have witnessed the great power of deep neural networks. However, current studies on deep learning mainly focus on quality data sets with balanced class labels, while training on bad and imbalanced data sets poses great challenges for classification tasks. In this paper we propose a data-analysis-based reduction method for selecting good and diverse data samples from a large data set for a deep learning model. Data sampling techniques can decrease the large size of raw data by retrieving its useful knowledge through representatives, so instead of dealing with the full raw data we can sample it without losing important information. We group PCB characters into classes and train ResNet56 v2 and SENet models in order to improve the classification performance of an optical character recognition (OCR) character classifier.
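
The abstract does not give code for its reduction step; the sketch below shows one common way to pick a small, diverse subset of a large training set, using k-means centroids as representatives. The function name and the use of scikit-learn's `KMeans` are assumptions, not the paper's method.

```python
# Minimal sketch: cluster feature vectors and keep the sample nearest each
# centroid, yielding a smaller but diverse training subset.
import numpy as np
from sklearn.cluster import KMeans

def select_representatives(features, n_keep):
    """features: (N, d) array; returns indices of ~n_keep diverse samples."""
    km = KMeans(n_clusters=n_keep, n_init=10).fit(features)
    reps = []
    for c in km.cluster_centers_:
        reps.append(int(np.argmin(np.linalg.norm(features - c, axis=1))))
    return sorted(set(reps))   # duplicates possible, hence the set
```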

Screening Vital Few Variables and Development of Logistic Regression Model on a Large Data Set (대용량 자료에서 핵심적인 소수의 변수들의 선별과 로지스틱 회귀 모형의 전개)

  • Lim, Yong-B.;Cho, J.;Um, Kyung-A;Lee, Sun-Ah
    • Journal of Korean Society for Quality Management
    • /
    • v.34 no.2
    • /
    • pp.129-135
    • /
    • 2006
  • With advances in computer technology, it is possible to keep all the information needed for monitoring equipment under control, together with huge amounts of real-time manufacturing data, in a database. Thus, statistical analysis of large data sets, with hundreds of thousands of observations and hundreds of independent variables, some of whose values are missing in many observations, is needed even though it is a formidable computational task. A tree-structured approach to classification is capable of screening important independent variables and their interactions. In a Six Sigma project handling a large amount of manufacturing data, one of the goals is to screen the vital few variables among the trivial many. In this paper we review and summarize the CART, C4.5 and CHAID algorithms and propose a simple method of screening the vital few variables by selecting the common variables screened by all three algorithms. We also discuss how to develop a logistic regression model on a large data set, illustrated with a large finance data set collected by a credit bureau for the purpose of predicting company bankruptcy.
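
A minimal sketch of the screening idea follows. Since scikit-learn ships CART-style trees but not C4.5 or CHAID, two differently configured trees stand in for the paper's three algorithms; the variables all trees agree on are then fed to a logistic regression.

```python
# Minimal sketch: intersect the top-k important variables from several tree
# learners, then fit a logistic regression on that "vital few" subset.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

def screen_and_fit(X, y, top_k=10):
    trees = [DecisionTreeClassifier(criterion="gini", max_depth=5),
             DecisionTreeClassifier(criterion="entropy", max_depth=5)]
    screened = []
    for t in trees:
        t.fit(X, y)
        screened.append(set(np.argsort(t.feature_importances_)[-top_k:]))
    vital_few = sorted(set.intersection(*screened))   # variables every tree selected
    model = LogisticRegression(max_iter=1000).fit(X[:, vital_few], y)
    return vital_few, model
```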

ANALYSIS AND INTERCOMPARISON OF VARIOUS GLOBAL EVAPORATION PRODUCTS

  • Watabe, Tsuyoshi;Kubota, Masahisa (School of Marine Science and Technology, Tokai University)
    • Proceedings of the KSRS Conference
    • /
    • 2008.10a
    • /
    • pp.285-288
    • /
    • 2008
  • We analyzed evaporation data in the Japanese Ocean Flux Data Sets with Use of Remote Sensing Observations (J-OFURO) Ver.2. Evaporation is huge in the Gulf Stream, the Kuroshio Extension, the ocean deserts, and the southern Indian Ocean. The temporal variation of evaporation is overwhelmingly large in the Kuroshio Extension region, where its standard deviation exceeds 120 mm. Harmonic analysis shows that this large variation is closely related to the annual cycle. In addition, the first EOF mode shows a long-term variation with maximum amplitude between 1992 and 1994, a remarkable decrease after 1994, and large amplitude in the equatorial region and northeast of Australia. The second and third modes are strongly influenced by El Nino. Moreover, we compared the J-OFURO2 evaporation product with other products, using six data sets for comparison (HOAPS3 and GSSTF2 from satellite data; NRA1, NRA2, ERA40 and JRA25 from reanalysis data). Compared with J-OFURO2, most products underestimate evaporation in most regions, in particular in the northern North Pacific, the mid-latitudes of the eastern South Pacific, and the high latitudes of the South Pacific, whereas JRA25 and NRA2 show large overestimation in the equatorial regions. The RMS difference between NRA2 and J-OFURO2 in the Kuroshio Extension was significantly large, exceeding 120 mm.
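
For readers unfamiliar with the EOF analysis mentioned above, the sketch below computes EOF modes of an anomaly field with an SVD. It is a generic illustration, not the authors' processing chain; the `(time, space)` layout of `data` is an assumption.

```python
# Minimal sketch of an EOF (empirical orthogonal function) decomposition:
# SVD of the time-anomaly field gives spatial modes and their time series.
import numpy as np

def eof_modes(data, n_modes=3):
    """data: (time, space) array, e.g., monthly evaporation per grid point."""
    anomaly = data - data.mean(axis=0)            # remove the time mean at each point
    u, s, vt = np.linalg.svd(anomaly, full_matrices=False)
    pcs = u[:, :n_modes] * s[:n_modes]            # principal-component time series
    eofs = vt[:n_modes]                           # spatial patterns (EOF modes)
    var_frac = s[:n_modes]**2 / np.sum(s**2)      # explained variance fractions
    return eofs, pcs, var_frac
```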

Data Partitioning on MapReduce by Leveraging Data Utility (맵리듀스에서 데이터의 유용성을 이용한 데이터 분할 기법)

  • Kim, Jong Wook
    • Journal of Korea Multimedia Society
    • /
    • v.16 no.5
    • /
    • pp.657-666
    • /
    • 2013
  • Today, many aspects of our lives are characterized by a rapid influx of large amounts of data from various application domains. The applications that produce these massive data span a large spectrum, from social media to business intelligence and biology. This massive influx necessitates large-scale parallelism to support a broad class of analysis tasks efficiently, and there have recently been extensive studies on using the MapReduce framework for that purpose. While this technique has produced impressive results in diverse applications, the same cannot be said for multimedia applications, where most users are interested in a small number of results with the highest or lowest scores. Thus, in this paper, we develop a data partitioning algorithm that can efficiently process large data sets whose items differ in utility. Experimental results show that the proposed technique provides significant execution-time gains over the existing solution.
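
The paper's partitioning algorithm is not reproduced in the abstract; the sketch below only illustrates the general idea of utility-aware partitioning, spreading high-utility records evenly across partitions instead of hashing blindly. The `utility` callback and round-robin policy are assumptions.

```python
# Minimal sketch: rank records by a utility score, then deal them out
# round-robin so no partition gets all the high-utility work.
def partition(records, utility, n_partitions):
    """records: iterable of (key, value); utility: (key, value) -> float."""
    ranked = sorted(records, key=lambda r: utility(*r), reverse=True)
    parts = [[] for _ in range(n_partitions)]
    for i, rec in enumerate(ranked):
        parts[i % n_partitions].append(rec)   # round-robin over ranked records
    return parts
```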

Application of Data Processing Technology on Large Clusters to Distribution Automation System (대용량 데이터 처리기술을 배전자동화 시스템에 적용)

  • Lee, Sung-Woo;Ha, Bok-Nam;Seo, In-Yong;Jang, Moon-Jong
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.60 no.2
    • /
    • pp.245-251
    • /
    • 2011
  • The quantity of data in a DMS (distribution management system) or SCADA (supervisory control and data acquisition) system is enormous, as illustrated by the term "flooding of data". This data arrives in real time as status and event data from the on-site apparatus. Moreover, if GIS (geographic information system), AMR (automatic meter reading), and other systems are integrated, the quantity of data to be processed in real time grows dramatically. Such growth, whether from added systems or additional on-site facilities, cannot be handled by the single-thread data processing technology currently in use. However, if multi-thread technology based on an LF-POOL (leader/follower pool) is applied, large quantities of data can be processed in a short time and the load on the server can be minimized. In this study, the actual implementation and functions of the LF-POOL technology are examined.
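
As a rough sketch of the leader/follower pattern the abstract refers to, the Python below keeps exactly one thread (the leader) waiting on the event queue; once it dequeues an event it releases leadership to a follower and processes the event itself. Class and parameter names are illustrative, not from the paper's DMS implementation.

```python
# Minimal sketch of a leader/follower thread pool: one leader blocks on the
# event source; after dequeuing, it hands leadership off and does the work.
import threading, queue

class LFPool:
    def __init__(self, n_threads, handler):
        self.events = queue.Queue()
        self.leader_lock = threading.Lock()    # exactly one leader at a time
        self.handler = handler
        for _ in range(n_threads):
            threading.Thread(target=self._run, daemon=True).start()

    def submit(self, event):
        self.events.put(event)

    def _run(self):
        while True:
            with self.leader_lock:             # become the leader
                event = self.events.get()      # leader alone waits for the next event
            self.handler(event)                # lock released: a follower is promoted
                                               # while the ex-leader processes the event
```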

Trends in Compute Express Link (CXL) Technology (CXL 인터커넥트 기술 연구개발 동향)

  • S.Y. Kim;H.Y. Ahn;Y.M. Park;W.J. Han
    • Electronics and Telecommunications Trends
    • /
    • v.38 no.5
    • /
    • pp.23-33
    • /
    • 2023
  • With the widespread demand from data-intensive tasks such as machine learning and large-scale databases, the amount of data processed in modern computing systems is increasing exponentially. Such data-intensive tasks require large amounts of memory to rapidly process and analyze massive data. However, existing computing system architectures face challenges when building large-scale memory owing to various structural issues such as CPU specifications. Moreover, large-scale memory may cause problems including memory overprovisioning. The Compute Express Link (CXL) allows computing nodes to use large amounts of memory while mitigating related problems. Hence, CXL is attracting great attention in industry and academia. We describe the overarching concepts underlying CXL and explore recent research trends in this technology.

A Real-Time Rendering Algorithm of Large-Scale Point Clouds or Polygon Meshes Using GLSL (대규모 점군 및 폴리곤 모델의 GLSL 기반 실시간 렌더링 알고리즘)

  • Park, Sangkun
    • Korean Journal of Computational Design and Engineering
    • /
    • v.19 no.3
    • /
    • pp.294-304
    • /
    • 2014
  • This paper presents a real-time rendering algorithm for large-scale geometric data using GLSL (OpenGL Shading Language). It details the VAO (vertex array object) and VBO (vertex buffer object) used to upload large-scale point clouds and polygon meshes to graphics video memory, and describes the shader program, composed of a vertex shader and a fragment shader, that manipulates the large-scale data rendered by the GPU. In addition, we explain the overall rendering procedure that creates and runs the shader program with the VAO and VBO. Finally, rendering performance is measured on application examples, demonstrating that the proposed algorithm enables real-time rendering of amounts of geometric data that were almost impossible to handle with previous techniques.
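
A minimal sketch of the upload-and-render path described above, assuming PyOpenGL and an already-created GL context (e.g., via GLFW): a point cloud is uploaded once into a VBO bound to a VAO, and a pass-through shader program draws it each frame. The shader sources and function names are reduced illustrations, not the paper's shaders.

```python
# Minimal sketch: one-time VBO/VAO upload of a point cloud, then per-frame
# drawing through a compiled GLSL program. Requires an active GL context.
import numpy as np
from OpenGL.GL import *
from OpenGL.GL.shaders import compileProgram, compileShader

VS = "#version 330 core\nlayout(location=0) in vec3 pos;\nvoid main(){ gl_Position = vec4(pos, 1.0); }"
FS = "#version 330 core\nout vec4 color;\nvoid main(){ color = vec4(1.0); }"

def upload_points(points):
    """points: float32 (N, 3) array; returns (vao, program, count)."""
    program = compileProgram(compileShader(VS, GL_VERTEX_SHADER),
                             compileShader(FS, GL_FRAGMENT_SHADER))
    vao = glGenVertexArrays(1)
    glBindVertexArray(vao)
    vbo = glGenBuffers(1)
    glBindBuffer(GL_ARRAY_BUFFER, vbo)
    glBufferData(GL_ARRAY_BUFFER, points.nbytes, points, GL_STATIC_DRAW)  # single upload to video memory
    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, None)
    glEnableVertexAttribArray(0)
    return vao, program, len(points)

def draw(vao, program, count):
    glUseProgram(program)                 # run the vertex/fragment shaders on the GPU
    glBindVertexArray(vao)
    glDrawArrays(GL_POINTS, 0, count)     # render the whole cloud in one call
```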

Removing Large-scale Variations in Regularly and Irregularly Spaced Data

  • Cho, Jungyeon
    • The Bulletin of The Korean Astronomical Society
    • /
    • v.44 no.1
    • /
    • pp.43.2-43.2
    • /
    • 2019
  • In many astrophysical systems, smooth large-scale variations coexist with small-scale fluctuations. For example, a large-scale velocity or density gradient can exist in molecular clouds that have small-scale fluctuations driven by turbulence. In redshifted 21cm observations, we likewise have two types of signals: the Galactic foreground emission, which changes smoothly, and the redshifted 21cm signals, which fluctuate rapidly in frequency space. In many cases, the large-scale variations make it difficult to extract information on the small-scale fluctuations. We propose a simple technique to remove smooth large-scale variations. Our technique relies on multi-point structure functions and can obtain the magnitudes of small-scale fluctuations. It can also be used to design filters that remove large-scale variations and retrieve small-scale data. We discuss how to apply our technique to irregularly spaced data, such as rotation measure observations toward extragalactic radio point sources.
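
For regularly spaced 1-D data, the filtering idea can be sketched with a symmetric three-point difference, which cancels any linear large-scale gradient exactly and leaves the small-scale fluctuations that a structure function then measures. The specific weights below are an illustrative choice, not the authors' filter.

```python
# Minimal sketch: a three-point filter f(x-l) - 2 f(x) + f(x+l) is zero for any
# linear f, so it removes a smooth gradient; its mean square at each lag is a
# multi-point structure function of the small-scale fluctuations.
import numpy as np

def remove_linear_trend(f, lag):
    """Filtered increments at scale `lag`; exact zero for linear large-scale trends."""
    return f[:-2*lag] - 2.0*f[lag:-lag] + f[2*lag:]

def structure_function(f, lag):
    """Mean squared filtered increment: the small-scale power at scale `lag`."""
    d = remove_linear_trend(f, lag)
    return np.mean(d**2)
```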

RDFS Rule based Parallel Reasoning Scheme for Large-Scale Streaming Sensor Data (대용량 스트리밍 센서데이터 환경에서 RDFS 규칙기반 병렬추론 기법)

  • Kwon, SoonHyun;Park, Youngtack
    • Journal of KIISE
    • /
    • v.41 no.9
    • /
    • pp.686-698
    • /
    • 2014
  • Recently, large-scale streaming sensor data have emerged due to the explosive spread of smartphones, the diffusion of IoT and cloud computing technology, and the generalization of IoT devices. Research on combining such data with semantic web technology is also being actively pursued, driven by growing requirements to create new value from data through sharing and mash-up in large-scale environments. However, inference for creating new knowledge faces serious issues with large-scale streaming data. For this reason, we propose an RDFS rule-based parallel reasoning scheme for serving large-scale streaming sensor data with semantic web technology. In the proposed scheme, we run the jobs of the Rete network algorithm, an existing rule inference algorithm, in parallel, sharing data through HBase, a Hadoop database, as common storage. We implement the system and evaluate its performance using AWS data from the weather center as large-scale streaming sensor data.
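
The sketch below illustrates one RDFS entailment rule (rdfs9: if C is a subclass of D and x has type C, then x has type D) fired in parallel over chunks of triples. It is a toy stand-in: a `multiprocessing` pool replaces the paper's parallel Rete jobs, and an in-memory schema set replaces the HBase store.

```python
# Minimal sketch: apply the rdfs9 rule to chunks of a triple list in parallel
# and union the derived triples. Schema and type strings are illustrative.
from multiprocessing import Pool

SUBCLASS = {("Sensor", "Device"), ("Device", "Thing")}   # illustrative schema triples

def rdfs9(chunk):
    derived = set()
    closure = dict(SUBCLASS)                 # child -> parent (single step, not transitive)
    for s, p, o in chunk:
        if p == "rdf:type" and o in closure:
            derived.add((s, "rdf:type", closure[o]))
    return derived

def infer(triples, n_workers=4):
    """triples: list of (subject, predicate, object) string tuples."""
    chunks = [triples[i::n_workers] for i in range(n_workers)]
    with Pool(n_workers) as pool:
        results = pool.map(rdfs9, chunks)    # each worker fires the rule on its chunk
    return set().union(*results)
```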