• Title/Abstract/Keyword: Preprocessing method

Ontology based Preprocessing Scheme for Mining Data Streams from Sensor Networks

  • Jung, Jason J.
    • Journal of Intelligence and Information Systems / Vol. 15, No. 3 / pp. 67-80 / 2009
  • Sensors and sensor networks make it possible to collect environmental information from a given sensor space. To discover more useful information and knowledge, we want to apply data mining methods to the sensor data streams from such spaces. In this paper, we present a novel data preprocessing scheme that improves the performance of data mining algorithms. In particular, ontologies are applied to represent the meaning of the sensor data. To evaluate the proposed method, we collected sensor streams for about 30 days and used them in simulations comparing it with other approaches.
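A minimal sketch of what ontology-based annotation of a sensor stream can look like, assuming a hypothetical three-concept temperature ontology. `Concept`, `ONTOLOGY`, `annotate`, and `preprocess` are illustrative names and the value ranges are invented; this only shows the general idea of replacing raw readings with ontology concepts before mining, not the scheme proposed in the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Concept:
    name: str
    parent: Optional[str]
    low: float   # inclusive lower bound of the raw reading
    high: float  # exclusive upper bound of the raw reading


# Hypothetical mini-ontology for a temperature sensor (not from the paper).
ONTOLOGY = [
    Concept("Cold", "Temperature", -50.0, 10.0),
    Concept("Comfortable", "Temperature", 10.0, 26.0),
    Concept("Hot", "Temperature", 26.0, 60.0),
]


def annotate(reading: float) -> List[str]:
    """Return the concept path for one reading, most specific concept first."""
    for c in ONTOLOGY:
        if c.low <= reading < c.high:
            return [c.name] + ([c.parent] if c.parent else [])
    return ["Unknown"]


def preprocess(stream: List[float]) -> List[str]:
    """Replace raw values with concept labels before they reach the miner."""
    return [annotate(x)[0] for x in stream]
```

Mapping readings to symbolic concepts shrinks the value space a mining algorithm has to search, and the `parent` field leaves room to generalize further up the hierarchy.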

A Preprocessing Algorithm for Efficient Lossless Compression of Gray Scale Images

  • Kim, Sun-Ja; Hwang, Doh-Yeun; Yoo, Gi-Hyoung; You, Kang-Soo; Kwak, Hoon-Sung
    • Institute of Control, Robotics and Systems (ICROS): Conference Proceedings / ICROS 2005 ICCAS / pp. 2485-2489 / 2005
  • This paper introduces a new preprocessing scheme that replaces the original data of gray-scale images with specially ordered data so that lossless compression performance can be improved. As a preprocessing technique for maximizing the performance of an entropy encoder, the proposed method converts the input image data into a more compressible form. Before encoding the input image stream, the preprocessor counts the co-occurrence frequencies of neighboring pixel pairs. It then replaces each pair of adjacent gray values with an ordinal number based on those co-occurrence frequencies. Because the reordered image has enhanced statistical properties, an entropy encoder can be expected to achieve a higher compression ratio on it. We show that the lossless compression ratio increased by up to 37.85% when comparing preprocessed and non-preprocessed image data compressed with entropy encoders such as Huffman and arithmetic coders.
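One possible reading of the co-occurrence reordering step, sketched under stated assumptions: only horizontal neighbours are used, the first pixel of each row is stored raw, and every later pixel is replaced by its frequency rank among the successors of its left neighbour. All function names are hypothetical and the paper's exact ordering rule may differ. The rank tables have to reach the decoder as side information for the transform to stay lossless.

```python
import math
from collections import Counter, defaultdict


def cooccurrence(rows):
    """Count horizontal (left, right) gray-value pairs over the whole image."""
    counts = defaultdict(Counter)
    for row in rows:
        for left, right in zip(row, row[1:]):
            counts[left][right] += 1
    return counts


def rank_tables(counts):
    """For each left value, map each right value to its rank (0 = most frequent)."""
    tables = {}
    for left, ctr in counts.items():
        ordered = sorted(ctr, key=lambda v: (-ctr[v], v))  # ties broken by value
        tables[left] = {v: i for i, v in enumerate(ordered)}
    return tables


def reorder(rows, tables):
    """Keep each row's first pixel; replace every later pixel by its rank."""
    out = []
    for row in rows:
        new_row = [row[0]]
        for left, right in zip(row, row[1:]):
            new_row.append(tables[left][right])
        out.append(new_row)
    return out


def restore(rows, tables):
    """Invert reorder() using the already-decoded left neighbour."""
    inverse = {left: {r: v for v, r in t.items()} for left, t in tables.items()}
    out = []
    for row in rows:
        new_row = [row[0]]
        for rank in row[1:]:
            new_row.append(inverse[new_row[-1]][rank])
        out.append(new_row)
    return out


def entropy(values):
    """Zeroth-order entropy in bits per symbol."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    img = [[10, 20, 10, 20, 10, 20], [10, 20, 10, 20, 10, 20]]
    tables = rank_tables(cooccurrence(img))
    ordered = reorder(img, tables)
    print(ordered)  # [[10, 0, 0, 0, 0, 0], [10, 0, 0, 0, 0, 0]]
    flat_img = [v for r in img for v in r]
    flat_ord = [v for r in ordered for v in r]
    print(entropy(flat_img), entropy(flat_ord))  # 1.0 vs. roughly 0.65 bits/symbol
```

Frequent transitions collapse onto small ranks, which skews the symbol histogram and lowers the entropy an entropy encoder such as Huffman or arithmetic coding has to spend.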

A Study on the Development of Surface Defect Inspection Preprocessing Algorithm for Cold Mill Strip

  • Kim, Jong-Woong; Kim, Kyoung-Min; Moon, Yun-Shik; Park, Gwi-Tae; Lee, Jong-Hak; Jung, Jin-Yang
    • Proceedings of the KIEE Conference / KIEE 1996 Summer Conference Proceedings B / pp. 1240-1242 / 1996
  • In a steel mill, an effective surface defect inspection algorithm is necessary. For this purpose, this paper proposes a preprocessing algorithm for surface defect inspection of cold mill strip. It consists of five steps: edge detection, binarization, noise deletion, combining of fragmented defects, and selection of the largest defect. Binarization is especially critical, because the performance of the preprocessing depends largely on the binarized image. We therefore developed an adaptive multilevel thresholding method in which the threshold value varies according to the mean gray-level value of each test image. To investigate the performance of the proposed algorithm, we classified the detected defects using a neural network. The test set consists of 20 defect images captured at Sick Co., Germany. The algorithm was shown to perform well in cold mill strip surface inspection.
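A sketch of three of the five steps (binarization, noise deletion, and selection of the largest defect), assuming a hypothetical two-level threshold placed `delta` gray levels on either side of the image mean and 4-connected components. The abstract only says the threshold follows each image's mean gray level, so the exact rule, `delta`, and `min_size` are assumptions; edge detection and fragment merging are omitted.

```python
from collections import deque


def binarize(img, delta):
    """Hypothetical multilevel threshold: flag pixels far below or above the mean."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    low, high = mean - delta, mean + delta
    return [[1 if p < low or p > high else 0 for p in row] for row in img]


def components(mask):
    """4-connected components of a binary mask, found by breadth-first search."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue, comp = deque([(y, x)]), []
                while queue:
                    cy, cx = queue.popleft()
                    comp.append((cy, cx))
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                comps.append(comp)
    return comps


def largest_defect(img, delta=30, min_size=2):
    """Binarize, drop components smaller than min_size (noise), keep the largest."""
    mask = binarize(img, delta)
    comps = [c for c in components(mask) if len(c) >= min_size]
    return max(comps, key=len) if comps else []
```

Tying the threshold to the image mean lets the same `delta` work on strips that are imaged brighter or darker overall.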

Text mining-based Data Preprocessing and Accident Type Analysis for Construction Accident Analysis

  • Yoon, Young Geun; Lee, Jae Yun; Oh, Tae Keun
    • Journal of the Korean Society of Safety / Vol. 37, No. 2 / pp. 18-27 / 2022
  • Construction accidents are difficult to prevent because several different types of activities occur simultaneously. The current method of accident analysis only indicates the number of occurrences for one or two variables, and safety measures that focus solely on individual variables have not reduced accidents. Even when accident data are analyzed to establish appropriate safety measures, it is difficult to derive significant results because of the large number of data variables and elements and the qualitative nature of the records. In this study, to simplify the analysis and approach this complex problem logically, data preprocessing techniques such as latent class cluster analysis (LCCA) and predictor importance were used to identify the most influential variables. Finally, the correlations were analyzed using an alluvial flow diagram consisting of seven variables and fourteen elements based on the accident data. The alluvial diagram analysis using the reduced variables and elements made it possible to classify accident trends into four categories. The findings of this study demonstrate that complex and diverse construction accident data can yield relevant analysis results that help prevent accidents.
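An alluvial diagram draws one band per combination of category values across its variable axes, with band width proportional to the number of records on that path. A minimal sketch of that counting step, assuming hypothetical field names and invented example records (they are not the study's data, and LCCA itself is not shown):

```python
from collections import Counter

# Invented example records; field names and values are hypothetical.
RECORDS = [
    {"work_type": "Formwork", "cause": "Fall", "injury": "Fracture"},
    {"work_type": "Formwork", "cause": "Fall", "injury": "Fracture"},
    {"work_type": "Steel frame", "cause": "Struck-by", "injury": "Contusion"},
    {"work_type": "Formwork", "cause": "Struck-by", "injury": "Fracture"},
]


def alluvial_flows(records, axes):
    """Count records per full path of category values across the given axes."""
    return Counter(tuple(r[a] for a in axes) for r in records)


def links(records, left, right):
    """Band widths between two adjacent axes of the diagram."""
    return Counter((r[left], r[right]) for r in records)
```

Reducing the number of axes and categories beforehand, as the study does, keeps the number of distinct paths small enough for the diagram to stay readable.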

Development of the Modified Preprocessing Method for Pipe Wall Thinning Data in Nuclear Power Plants

  • Seong-Bin Mun; Sang-Hoon Lee; Young-Jin Oh; Sung-Ryul Kim
    • Transactions of the Korean Society of Pressure Vessels and Piping / Vol. 19, No. 2 / pp. 146-154 / 2023
  • In nuclear power plants, ultrasonic testing is used during periodic inspections to measure pipe wall thickness and prevent pipe rupture due to pipe wall thinning. However, ultrasonic thickness measurements contain a significant amount of error because of on-site conditions at the plant. If the maximum pipe wall thinning rate is determined from measured wall thicknesses containing significant error, the resulting thinning-rate data have significant uncertainty and are systematically overestimated. This study proposes preprocessing pipe wall thinning measurement data with a support vector machine regression algorithm. Support vector machine regression smooths the measurement data, which reduces the uncertainty and systematic overestimation of the estimated pipe wall thinning rates.

Study on non-destructive sorting technique for lettuce (Lactuca sativa L.) seed using Fourier transform near-infrared spectrometer

  • Ahn, Chi-Kook; Cho, Byoung-Kwan; Kang, Jum-Soon; Lee, Kang-Jin
    • Korean Journal of Agricultural Science / Vol. 39, No. 1 / pp. 111-116 / 2012
  • Nondestructive evaluation of seed viability is a technology in high demand in the seed production industry. Conventional seed sorting methods, such as the tetrazolium test and the standard germination test, are destructive, time-consuming, and labor-intensive. Near-infrared spectroscopy has shown good potential for nondestructive quality measurement of food and agricultural products. In this study, FT-NIR spectroscopy was used to classify normal and artificially aged lettuce seeds. Spectra in the 1100-2500 nm range were scanned from lettuce seeds and analyzed using principal component analysis (PCA). To separate viable from nonviable seeds, a calibration model was developed with the partial least squares (PLS) method. The PLS calibration model achieved 98% classification accuracy with Savitzky-Golay first-derivative preprocessing. The prediction accuracy for the test data set was 93% with MSC (multiplicative scatter correction) preprocessing. The results show that FT-NIR has good potential for discriminating nonviable lettuce seeds from viable ones.
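The two preprocessing methods named above can be sketched in plain Python. Multiplicative scatter correction regresses each spectrum on the mean spectrum and removes the fitted offset and slope; a 5-point, second-order Savitzky-Golay filter estimates the first derivative with the convolution weights (-2, -1, 0, 1, 2)/10. The window length and polynomial order are assumptions, since the abstract does not state them, and the two edge points on each side are simply dropped.

```python
from statistics import fmean


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum."""
    n = len(spectra[0])
    ref = [fmean(s[j] for s in spectra) for j in range(n)]
    ref_mean = fmean(ref)
    sxx = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = fmean(s)
        # Least-squares fit s ~ a + b * ref, then undo offset a and slope b.
        b = sum((r - ref_mean) * (x - s_mean) for r, x in zip(ref, s)) / sxx
        a = s_mean - b * ref_mean
        corrected.append([(x - a) / b for x in s])
    return corrected


def sg_first_derivative(y):
    """5-point quadratic Savitzky-Golay first derivative (per sample step)."""
    coeffs = (-2, -1, 0, 1, 2)
    return [
        sum(c * y[i + k - 2] for k, c in enumerate(coeffs)) / 10
        for i in range(2, len(y) - 2)
    ]
```

Both transforms suppress baseline shifts and multiplicative scatter, leaving the absorption features that a PLS model can use to separate viable from nonviable seeds.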

An Improvement of Lossless Image Compression for Mobile Game

  • Kim Se-Woong; Jo Byung-Ho
    • The KIPS Transactions: Part B / Vol. 13B, No. 3 / pp. 231-238 / 2006
  • This paper proposes a lossless compression method for images, which make up a considerable part of the total volume of a mobile game. To increase the compression ratio, the image is reorganized in a preprocessing stage and then compressed with the Deflate algorithm defined in RFC 1951. In the preprocessing stage, we determined the dictionary size from the image information, reflecting the characteristics of dictionary-based coding, and restructured the image using pixel packing and DPCM prediction, which yielded a better compression ratio than compressing the image in the general manner. Tests applying the proposed method to various mobile games showed that it improved the compression ratio by 9.7% compared with the existing mobile image format.
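A minimal sketch of the DPCM-then-Deflate part of the pipeline, using Python's `zlib` with `wbits=-15` to produce a raw RFC 1951 Deflate stream without the zlib header. Left-neighbour prediction is an assumption about the DPCM predictor, and the paper's pixel packing and dictionary-size selection steps are not reproduced; the helper names are hypothetical.

```python
import zlib


def dpcm_rows(rows):
    """Left-neighbour DPCM: first pixel as-is, then (p[i] - p[i-1]) mod 256."""
    out = bytearray()
    for row in rows:
        prev = 0
        for p in row:
            out.append((p - prev) % 256)
            prev = p
    return bytes(out)


def undo_dpcm(data, width):
    """Rebuild rows of `width` pixels from the DPCM residuals."""
    rows = []
    for start in range(0, len(data), width):
        prev, row = 0, []
        for r in data[start:start + width]:
            prev = (prev + r) % 256
            row.append(prev)
        rows.append(row)
    return rows


def deflate(data):
    """Raw Deflate stream (RFC 1951), no zlib header or checksum."""
    comp = zlib.compressobj(level=9, method=zlib.DEFLATED, wbits=-15)
    return comp.compress(data) + comp.flush()


def inflate(data):
    return zlib.decompress(data, wbits=-15)
```

On smooth game graphics the residuals cluster around a few small values, which gives Deflate's LZ77 and Huffman stages longer matches and shorter codes.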

Data Preprocessing Method for Lightweight Automotive Intrusion Detection System

  • Sangmin Park; Hyungchul Im; Seongsoo Lee
    • Journal of IKEEE / Vol. 27, No. 4 / pp. 531-536 / 2023
  • This paper proposes a sliding window method with frame feature insertion for immediate attack detection on in-vehicle networks. The method guarantees real-time attack detection by labeling each window according to the attack status of the current frame. Experiments show that the proposed method improves detection performance by giving more weight to the current frame in the CNN computation. The proposed model is based on a lightweight LeNet-5 architecture and achieves 100% detection of DoS attacks. Additionally, a complexity comparison with conventional models shows that the proposed model is more suitable for resource-constrained devices such as ECUs.
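A sketch of window construction in which each window is labelled with the attack status of its last (current) frame. As a hypothetical stand-in for "frame feature insertion", the current frame's feature vector is appended `repeat_current` extra times so that a downstream CNN weights it more; the paper's actual insertion scheme and CAN feature encoding may differ, and `make_windows` is an illustrative name.

```python
from typing import List, Sequence, Tuple

Window = Tuple[List[List[int]], int]


def make_windows(
    frames: Sequence[Sequence[int]],
    labels: Sequence[int],
    size: int,
    repeat_current: int = 1,
) -> List[Window]:
    """Sliding windows over frames, labelled by the newest frame's attack status."""
    windows: List[Window] = []
    for end in range(size - 1, len(frames)):
        window = [list(f) for f in frames[end - size + 1 : end + 1]]
        # Hypothetical "insertion": repeat the current frame's features.
        window += [list(frames[end]) for _ in range(repeat_current)]
        windows.append((window, labels[end]))
    return windows
```

Because the label comes from the newest frame only, a window can be classified as soon as the frame arrives, without waiting for later traffic.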

An Efficient Character Image Enhancement and Region Segmentation Using Watershed Transformation

  • Choi, Young-Kyoo; Rhee, Sang-Burm
    • The KIPS Transactions: Part B / Vol. 9B, No. 4 / pp. 481-490 / 2002
  • Off-line handwritten character recognition suffers from incomplete preprocessing because it lacks dynamic information and must cope with varied handwriting, extensive overlap between consonants and vowels, and many erroneous stroke images. Consequently, off-line handwritten character recognition requires the study of various preprocessing methods such as binarization and thinning. This paper considers the running time of the watershed algorithm and the quality of the resulting image as preprocessing for off-line handwritten Korean character recognition. It proposes an effective application of the watershed algorithm for segmenting character and background regions in gray-level character images, together with a segmentation function for binarization based on the extracted watershed image. It also proposes a thinning method that effectively extracts the skeleton through conditional test masks, taking running time and skeleton quality into account, and compares existing methods with the proposed methods in terms of running time and quality. The average execution time was 2.16 seconds for the previous method and 1.72 seconds for the proposed method. We show that the proposed method removes noise from overlapping strokes more effectively than the previous method.
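The paper's own conditional test masks are not given in the abstract. For comparison, here is the classic Zhang-Suen thinning algorithm, a different and widely used mask-based method that deletes boundary pixels in two alternating subiterations until only a skeleton remains. It assumes a binary image (1 = foreground) surrounded by a frame of zeros, since border pixels are never examined.

```python
def zhang_suen(img):
    """Zhang-Suen thinning of a binary image (1 = foreground); returns a copy."""
    img = [row[:] for row in img]
    h, w = len(img), len(img[0])

    def neighbours(y, x):
        # P2..P9, clockwise starting from the pixel directly above.
        return [img[y - 1][x], img[y - 1][x + 1], img[y][x + 1], img[y + 1][x + 1],
                img[y + 1][x], img[y + 1][x - 1], img[y][x - 1], img[y - 1][x - 1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for y in range(1, h - 1):
                for x in range(1, w - 1):
                    if img[y][x] != 1:
                        continue
                    p = neighbours(y, x)
                    b = sum(p)  # number of foreground neighbours
                    a = sum(1 for i in range(8) if p[i] == 0 and p[(i + 1) % 8] == 1)
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if step == 0:
                        cond = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        cond = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_delete.append((y, x))
            # Delete in parallel so decisions use the image from before this subiteration.
            for y, x in to_delete:
                img[y][x] = 0
            if to_delete:
                changed = True
    return img
```

The two subiterations peel the south-east and north-west boundaries alternately, which keeps the skeleton roughly centred; known weaknesses such as 2x2 blocks vanishing entirely are part of what mask-based refinements try to address.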

Prediction of Distillation Column Temperature Using Machine Learning and Data Preprocessing

  • Lee, Yechan; Choi, Yeongryeol; Cho, Hyungtae; Kim, Junghwan
    • Korean Chemical Engineering Research / Vol. 59, No. 2 / pp. 191-199 / 2021
  • A distillation column, a main facility in chemical processes, separates the desired product from a mixture by exploiting differences in boiling points. Because the distillation process consumes much energy, its operation needs to be optimized and predicted. The target process of this study is difficult to operate efficiently because the feed composition varies with the supplier. To deal with this problem, a data-driven model can be developed to predict operating conditions. However, data preprocessing is essential to improve the model's predictive performance, because the raw data contain outliers and noise. In this study, after optimizing predictive models based on long short-term memory (LSTM) and random forest (RF), we used a low-pass filter and a one-class support vector machine for data preprocessing and compared predictive performance according to the preprocessing method and range. The performance of the predictive models and the effect of preprocessing were compared using R² and RMSE. For LSTM, R² increased from 0.791 to 0.977 (by 23.5%) and RMSE decreased from 0.132 to 0.029 (by 78.0%). For RF, R² increased from 0.767 to 0.938 (by 22.3%) and RMSE decreased from 0.140 to 0.050 (by 64.3%).
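The metrics and relative changes reported above can be checked with a short sketch. The smoothing function is a generic first-order low-pass filter (exponential smoothing), assumed here because the abstract does not specify the filter design; the one-class SVM outlier-removal step is not shown, and the helper names are illustrative.

```python
import math


def low_pass(x, alpha):
    """First-order IIR low-pass filter (exponential smoothing), 0 < alpha <= 1."""
    y = [x[0]]
    for v in x[1:]:
        y.append(alpha * v + (1 - alpha) * y[-1])
    return y


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def pct_change(before, after):
    """Relative change in percent, as used for the reported improvements."""
    return 100 * (after - before) / before
```

For example, `pct_change(0.791, 0.977)` is about 23.5 and `pct_change(0.132, 0.029)` is about -78.0, matching the reported LSTM figures.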