• Title/Summary/Keyword: data pre-processing


Review on Pre-processing of Earthquake Data from KEPRI Seismic Monitoring System (전력연구원 지진관측자료의 사전자료처리 기법 및 효과적인 활용에 관한 고찰)

  • 연관희;박동희;최원학;장천중
    • Journal of the Earthquake Engineering Society of Korea / v.6 no.2 / pp.39-50 / 2002
  • Several pre-processing techniques for earthquake data from Korean earthquake monitoring institutes, including the Korea Electric Power Research Institute, are thoroughly reviewed. Among these, techniques for removing the instrumental response, removing the non-causal ringing distortion introduced by FIR filters, checking the calibration status of seismic stations, and minimizing windowing effects are introduced and applied to real data. It is also recommended that analysts evaluate the S/N ratio in the frequency domain and consider the possibility of using saturated earthquake records.
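The frequency-domain S/N check recommended above can be sketched in a few lines. The taper, FFT length, and the threshold in the usage comment are illustrative assumptions, not the paper's actual procedure:

```python
import numpy as np

def spectral_snr(signal, noise, fs, nfft=1024):
    """Estimate a frequency-domain S/N ratio from a signal window and a
    pre-event noise window of equal length (a common convention)."""
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    # Amplitude spectra of the tapered windows
    sig_amp = np.abs(np.fft.rfft(signal * np.hanning(len(signal)), nfft))
    noi_amp = np.abs(np.fft.rfft(noise * np.hanning(len(noise)), nfft))
    snr = sig_amp / np.maximum(noi_amp, 1e-12)  # avoid division by zero
    return freqs, snr

# Usage: keep only the band where the S/N ratio exceeds, say, 3
# freqs, snr = spectral_snr(s_window, n_window, fs=100.0)
# usable_band = freqs[snr > 3.0]
```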

Radar Signal Processor Design Using FPGA (FPGA를 이용한 레이더 신호처리 설계)

  • Ha, Changhun;Kwon, Bojun;Lee, Mangyu
    • Journal of the Korea Institute of Military Science and Technology / v.20 no.4 / pp.482-490 / 2017
  • The radar signal processing chain divides into pre-processing steps such as frequency down-conversion, down-sampling, and pulse compression, and post-processing steps such as Doppler filtering, target information extraction, detection, and tracking. The former is generally implemented on an FPGA because the procedure is relatively simple, even though large volumes of ADC data must be handled very quickly. The latter, by contrast, is usually parallelized across multiple DSPs because of its complexity, need for flexibility, and real-time constraints. This paper presents an FPGA-based radar signal processor design that covers not only the pre-processing but also post-processing stages such as Doppler filtering, boresight error estimation, NCI (non-coherent integration), and CFAR (constant false alarm rate) detection.
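To make the CFAR stage concrete, here is a minimal cell-averaging CFAR detector in Python. The training/guard window sizes and false alarm rate are arbitrary assumptions, and a real radar implements this in FPGA or DSP fabric, not Python:

```python
import numpy as np

def ca_cfar(power, num_train=16, num_guard=4, pfa=1e-4):
    """Cell-averaging CFAR over a 1-D power profile (illustrative only;
    window sizes and Pfa are assumptions)."""
    n = len(power)
    # Threshold multiplier for CA-CFAR with num_train reference cells
    alpha = num_train * (pfa ** (-1.0 / num_train) - 1.0)
    detections = np.zeros(n, dtype=bool)
    half = num_train // 2 + num_guard
    for i in range(half, n - half):
        lead = power[i - half : i - num_guard]          # cells before CUT
        lag = power[i + num_guard + 1 : i + half + 1]   # cells after CUT
        noise_est = np.mean(np.concatenate([lead, lag]))
        detections[i] = power[i] > alpha * noise_est
    return detections
```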

Detection of Subsurface Defects in Metal Materials Using Infrared Thermography; Image Processing and Finite Element Modeling

  • Ranjit, Shrestha;Kim, Won Tae
    • Journal of the Korean Society for Nondestructive Testing / v.34 no.2 / pp.128-134 / 2014
  • Infrared thermography is an emerging approach to non-contact, non-intrusive, and non-destructive inspection of various solid materials, such as metals, composites, and semiconductors, of industrial and research interest. In this study, data processing was applied to infrared thermography measurements to detect defects in metals widely used in industrial fields. When analyzing experimental data from infrared thermographic testing, raw images are often not directly usable. Thus, various data analysis methods were applied at the pre-processing and processing levels for quantitative defect detection and characterization; these increased the capability of infrared non-destructive testing, since subtle defect signatures became apparent. A 3D finite element simulation was performed to verify and analyze the data obtained from both the experiment and the image processing techniques.
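As one concrete example of pre-processing thermographic frames, a common generic step is cold-frame subtraction with per-frame normalization. This is a sketch of that general idea, not the specific pipeline used in the paper:

```python
import numpy as np

def cold_frame_subtract(frames):
    """Subtract the first ("cold") frame from a thermogram sequence and
    rescale each frame to [0, 1]. A generic pre-processing step, not the
    paper's pipeline. `frames` is a (time, height, width) array."""
    corrected = frames - frames[0]
    lo = corrected.min(axis=(1, 2), keepdims=True)
    hi = corrected.max(axis=(1, 2), keepdims=True)
    return (corrected - lo) / np.maximum(hi - lo, 1e-12)
```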

A Study on the Use of Stopword Corpus for Cleansing Unstructured Text Data (비정형 텍스트 데이터 정제를 위한 불용어 코퍼스의 활용에 관한 연구)

  • Lee, Won-Jo
    • The Journal of the Convergence on Culture Technology / v.8 no.6 / pp.891-897 / 2022
  • In big data analysis, raw text data mostly exists in various unstructured forms, so it becomes analyzable structured data only after heuristic pre-processing and machine post-processing cleansing. In this study, unnecessary elements in the collected raw data are therefore removed by pre-processing so that the wordcloud technique of the R language, one of the text-data analysis techniques, can be applied, and stopwords are removed in the post-processing step. A case study of wordcloud analysis is then conducted, in which word occurrence frequencies are computed and high-frequency words are presented as key issues. To improve on the problems of the existing stopword-handling approach in R's wordcloud technique, the "nested stopword source code" method, this study proposes the use of a "general stopword corpus" and a "user-defined stopword corpus" and conducts a case analysis. The advantages and disadvantages of the proposed "unstructured data cleansing process model" are comparatively verified, and the practical applicability of wordcloud visualization analysis using the proposed external-corpus cleansing technique is presented.
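The two proposed corpora can be illustrated with a short sketch. The paper works in R; this Python version with toy stopword lists only shows the shape of the cleansing step:

```python
from collections import Counter
import re

# Illustrative stand-ins for the two corpora the paper proposes; the
# real corpora are external files, not these toy lists.
GENERAL_STOPWORDS = {"the", "a", "of", "and", "to", "in"}
USER_STOPWORDS = {"data", "study"}  # domain-specific additions

def word_frequencies(raw_text):
    """Tokenize, drop stopwords from both corpora, and count the word
    frequencies that a word cloud visualizes."""
    tokens = re.findall(r"[a-z]+", raw_text.lower())
    stop = GENERAL_STOPWORDS | USER_STOPWORDS
    return Counter(t for t in tokens if t not in stop)

# freq = word_frequencies(open("scraped.txt").read())
# freq.most_common(50) feeds directly into a word-cloud renderer
```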

Development of a Pre-Processing Program for Flow Analysis Based on the Object-Oriented Programming Concept (OOP 개념에 기초한 유동해석용 전처리 프로그램 개발)

  • Myong, Hyon-Kook;Ahn, Jong-Ki
    • Transactions of the Korean Society of Mechanical Engineers B / v.32 no.1 / pp.70-77 / 2008
  • A pre-processing program based on the OOP (object-oriented programming) concept has been developed. The program handles the input of a 2D or 3D flow problem to a CFD program by means of a user-friendly interface, and the subsequent transformation of this input into a form suitable for the solver (PowerCFD), which uses an unstructured cell-centered method. The GUI (graphical user interface) is built on MFC (Microsoft Foundation Classes). The program is organized into modules as classes based on the VTK (Visualization Toolkit) library, and these classes function through inheritance and cooperation, an important and valuable concept of object-oriented programming. The major functions of the program are introduced and demonstrated, including mesh generation, boundary settings, solver settings, and generation of grid connectivity and geometric data.
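To illustrate the inheritance-and-cooperation organization described above, here is a minimal, hypothetical class sketch. The class and method names are invented for illustration and do not reflect the actual VTK/MFC-based implementation:

```python
from abc import ABC, abstractmethod

class PreProcessorModule(ABC):
    """Common interface shared by all pre-processing modules."""
    @abstractmethod
    def build(self, model): ...

class MeshGenerator(PreProcessorModule):
    def build(self, model):
        model["cells"] = self.generate_cells(model["geometry"])
    def generate_cells(self, geometry):
        return []  # placeholder for unstructured cell generation

class BoundarySettings(PreProcessorModule):
    def build(self, model):
        model["boundaries"] = {"inlet": "velocity", "outlet": "pressure"}

def run_preprocessor(modules, model):
    # Modules cooperate through the shared interface (polymorphism):
    # each contributes its part of the solver-ready input.
    for m in modules:
        m.build(model)
    return model
```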

Design of Client-Server Model For Effective Processing and Utilization of Bigdata (빅데이터의 효과적인 처리 및 활용을 위한 클라이언트-서버 모델 설계)

  • Park, Dae Seo;Kim, Hwa Jong
    • Journal of Intelligence and Information Systems / v.22 no.4 / pp.109-122 / 2016
  • Recently, big data analysis has grown into a field of interest to individuals and non-experts as well as companies and professionals, and it is used for marketing and social problem solving by analyzing data that is either openly published or collected directly. In Korea, various companies and individuals are attempting big data analysis, but limitations on data disclosure and difficulties in collection make even the initial stage of analysis hard. System improvements for big data activation and disclosure services are being carried out in Korea and abroad, chiefly services for opening public data such as the Korean Government 3.0 portal (data.go.kr). Alongside these government efforts, services that share data held by corporations or individuals are running, but useful data is hard to find because so little is shared. Moreover, big traffic problems arise because an entire dataset must be downloaded and examined just to grasp its attributes and basic properties. A new system for big data processing and utilization is therefore needed. First, big data pre-analysis technology is needed to solve the sharing problem. Pre-analysis, a concept proposed in this paper, means providing users with results generated by analyzing the data in advance: it improves usability by letting a data user grasp the properties and characteristics of a big dataset at search time, and by sharing only the summary or sample data generated through pre-analysis it avoids the security problems of disclosing the original data, enabling sharing between data provider and data user. Second, appropriate preprocessing results must be generated quickly, according to the disclosure level of the raw data and the network status, and delivered to users through distributed big data processing with Spark. Third, to solve the big traffic problem, the system monitors network traffic in real time; when preprocessing the data a user requests, it reduces the result to a size the current network can carry before transmission, so that no big traffic occurs. This paper presents various data sizes according to disclosure level through pre-analysis; compared with the conventional approach of sharing only raw data, this method is expected to generate far less traffic. The paper describes how to solve the problems that arise when big data is released and used, and how to facilitate sharing and analysis. The client-server model uses Spark for fast analysis and processing of user requests, with a Server Agent and a Client Agent deployed on the server and client sides respectively. The Server Agent, required on the data provider side, pre-analyzes big data to generate a Data Descriptor containing information on Sample Data, Summary Data, and Raw Data; it also performs fast, efficient preprocessing through distributed big data processing and continuously monitors network traffic. The Client Agent, placed on the data user side, searches big data through the Data Descriptor produced by the pre-analysis, quickly locates data of interest, and requests it from the server for download according to its disclosure level. The model separates the Server Agent and the Client Agent at the point where the provider publishes data for users. In particular, the paper focuses on big data sharing, distributed big data processing, and the big traffic problem, constructs the detailed modules of the client-server model, and presents the design of each module. In a system built on the proposed model, a user who acquires data analyzes it in a desired direction or preprocesses new data; by publishing the newly processed data through the Server Agent, the data user takes on the role of data provider. A data provider can likewise obtain useful statistical information from the Data Descriptor of the data it discloses and, as a data user, perform new analysis with the sample data. In this way, raw data is processed and the processed big data is reused, naturally forming a shared environment in which the roles of provider and user are not distinguished: everyone can be both. The client-server model thus solves the big data sharing problem, provides a free and secure environment for big data disclosure, and offers an ideal shared service that makes big data easy to find.
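The pre-analysis step can be sketched with Spark's Python API. The dataset path, the fields included in the descriptor, and the 1% sample rate are assumptions; the paper's Data Descriptor covers Sample Data, Summary Data, and Raw Data information:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pre-analysis").getOrCreate()
df = spark.read.csv("shared_dataset.csv", header=True, inferSchema=True)

# Build a descriptor the Server Agent could publish instead of raw data
descriptor = {
    "schema": df.dtypes,                 # column names and types
    "row_count": df.count(),
    "summary": df.describe().collect(),  # count/mean/stddev/min/max
    "sample": df.sample(fraction=0.01).limit(100).collect(),
}
# The Client Agent searches descriptors like this one and only then
# requests the data itself, at the permitted disclosure level.
```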

A Study on Unstructured text data Post-processing Methodology using Stopword Thesaurus (불용어 시소러스를 이용한 비정형 텍스트 데이터 후처리 방법론에 관한 연구)

  • Won-Jo Lee
    • The Journal of the Convergence on Culture Technology / v.9 no.6 / pp.935-940 / 2023
  • Most text data collected through web scraping for artificial intelligence and big data analysis is large and unstructured, so a refinement process is required before analysis. The data becomes analyzable, structured data through a heuristic pre-processing refinement step and a machine post-processing refinement step. In this study, the machine post-processing step uses a Korean dictionary and a stopword dictionary to extract the vocabulary for the frequency analysis underlying word cloud analysis. In this process, a "user-defined stopword thesaurus" is used to efficiently remove the stopwords that the dictionaries miss. We propose a methodology for applying this thesaurus and examine the pros and cons of the proposed refinement method through a case analysis with R's word cloud technique, complementing the problems of the existing "stopword dictionary" approach. Comparative verification is presented, along with the effectiveness of the methodology in practical applications.
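A stopword thesaurus differs from a flat stopword list in that each entry groups the variant forms of a term so all of them are removed together. This toy Python sketch (the paper's implementation targets Korean text in R) shows the idea:

```python
# Toy entries for illustration only, not the paper's Korean thesaurus.
STOPWORD_THESAURUS = {
    "etc": {"etc", "etcetera", "et-cetera"},
    "ok": {"ok", "okay", "o.k."},
}

def remove_stopwords(tokens):
    # Flatten every variant form in the thesaurus into one removal set
    variants = set().union(*STOPWORD_THESAURUS.values())
    return [t for t in tokens if t.lower() not in variants]

# remove_stopwords(["results", "okay", "etc", "improve"])
# -> ["results", "improve"]
```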

A Survey on Deep Learning-based Pre-Trained Language Models (딥러닝 기반 사전학습 언어모델에 대한 이해와 현황)

  • Sangun Park
    • The Journal of Bigdata / v.7 no.2 / pp.11-29 / 2022
  • Pre-trained language models are the most important and widely used tools in natural language processing tasks. Because they have been pre-trained on a large corpus, high performance can be expected even when fine-tuning with a small amount of data. Since the elements necessary for deployment, such as a pre-trained tokenizer and a deep learning model with pre-trained weights, are distributed together, the cost and time of natural language processing have been greatly reduced. Transformer variants are the most representative pre-trained language models providing these advantages, and they are also actively used in other fields such as computer vision and audio. To make it easier for researchers to understand pre-trained language models and apply them to natural language processing tasks, this paper defines the language model and the pre-trained language model, and reviews the development of pre-trained language models, in particular the representative Transformer variants.
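The distribution model the abstract describes, a pre-trained tokenizer plus pre-trained weights that are then fine-tuned on a small dataset, looks roughly like this with the Hugging Face transformers library; the model name and the binary classification task are arbitrary choices for illustration:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Tokenizer and pre-trained weights are distributed together
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# One fine-tuning step on a single labeled example
inputs = tokenizer("Pre-training cuts fine-tuning cost.",
                   return_tensors="pt")
labels = torch.tensor([1])
loss = model(**inputs, labels=labels).loss
loss.backward()  # gradients flow into the pre-trained weights
```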

Study on Performance Improvement of Video in the H.264 Codec (H.264 코덱에서 동영상 성능개선 연구)

  • Bong, Jeong-Sik;Jeon, Joon-Hyeon
    • Proceedings of the KIEE Conference / 2005.10b / pp.532-535 / 2005
  • Many image processing techniques have been studied for effective image compression; among them, 2D image filtering is widely used. 2D filtering can be implemented by performing 1D linear filtering separately in the horizontal and vertical directions, and compression efficiency depends on the filtering method used. Circular convolution is commonly used for 2D image filtering, but it ignores the correlation structure at the image boundary, so filtering there is ineffective. To solve this problem, a new symmetric-mirroring convolution technique is proposed that satisfies the "alias-free" and "error-free" requirements in the reconstructed image. The method outperforms previous compression approaches because it filters the boundary region with highly correlated (mirrored) data. In this paper, pre-processing filtering in the H.264 codec is adopted to analyze the efficiency of the proposed filtering technique, and a simulator developed in Matlab is used to examine its performance.
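The difference between circular convolution and the symmetric-mirroring scheme comes down to how the signal is extended at its boundary. Here is a minimal 1-D sketch; the paper's separable approach would apply such filtering along rows and then columns:

```python
import numpy as np

def filter_1d(x, h, mode):
    """Filter a 1-D signal with explicit boundary handling: 'wrap'
    reproduces circular convolution; 'symmetric' mirrors the signal at
    the edges, as in the symmetric-mirroring scheme."""
    pad = len(h) // 2
    xp = np.pad(x, pad, mode=mode)
    return np.convolve(xp, h, mode="valid")

x = np.array([10., 12., 11., 40., 42., 41.])  # ends are uncorrelated
h = np.ones(3) / 3.0                           # simple averaging filter
y_circ = filter_1d(x, h, "wrap")       # boundary mixes the unrelated ends
y_symm = filter_1d(x, h, "symmetric")  # boundary uses mirrored neighbors
```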


A Study on Data Pre-filtering Methods for Fault Diagnosis (시스템 결함원인분석을 위한 데이터 로그 전처리 기법 연구)

  • Lee, Yang-Ji;Kim, Duck-Young;Hwang, Min-Soon;Cheong, Young-Soo
    • Korean Journal of Computational Design and Engineering / v.17 no.2 / pp.97-110 / 2012
  • High-performance sensors and modern data logging technology with real-time telemetry make very precise system fault diagnosis possible. Fault detection, isolation, and identification are the typical steps a fault diagnosis system takes to analyze the root cause of failures. This systematic failure analysis provides not only useful clues for rectifying the abnormal behavior of a system, but also key information for redesigning the current system for retrofit. The main barriers to effective failure analysis are that (i) the gathered data (event) logs are generally very large, and (ii) they usually contain noise and redundant data that make precise analysis difficult. This paper therefore applies suitable pre-processing techniques for data reduction and feature extraction, and then converts the reduced data log into a new event-sequence format. Finally, the event sequence information is decoded to investigate the correlation between specific event patterns and various system faults. The efficiency of the developed pre-filtering procedure is examined with a terminal-box data log from a marine diesel engine.
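The reduction step can be illustrated by collapsing consecutive duplicate events in a raw log into an event sequence. The event names below are hypothetical, and this is only a sketch of the reduction idea, not the paper's full feature-extraction procedure:

```python
from itertools import groupby

# A raw log with redundant consecutive entries (hypothetical events)
raw_log = ["TEMP_HIGH", "TEMP_HIGH", "TEMP_HIGH",
           "PRESSURE_DROP", "TEMP_HIGH", "TEMP_HIGH"]

# Collapse runs of identical events into (event, count) pairs
reduced = [(event, sum(1 for _ in run)) for event, run in groupby(raw_log)]
# -> [('TEMP_HIGH', 3), ('PRESSURE_DROP', 1), ('TEMP_HIGH', 2)]

sequence = [event for event, _ in reduced]
# The sequence ['TEMP_HIGH', 'PRESSURE_DROP', 'TEMP_HIGH'] is what gets
# correlated against known fault patterns.
```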