• Title/Summary/Keyword: Data preprocessing technique

Evaluation of Firmness and Sweetness Index of Tomatoes using Hyperspectral Imaging

  • Rahman, Anisur;Faqeerzada, Mohammad Akbar;Joshi, Rahul;Cho, Byoung-Kwan
    • Proceedings of the Korean Society for Agricultural Machinery Conference / 2017.04a / pp.44-44 / 2017
  • The objective of this study was to evaluate the firmness and sweetness index (SI) of tomatoes (Lycopersicum esculentum) using hyperspectral imaging (HSI) in the range of 1000-1400 nm. The mean spectra of 95 mature tomato samples were extracted from the hyperspectral images, and the reference firmness and SI of the same samples were measured and calibrated against the corresponding spectral data by partial least squares (PLS) regression with different preprocessing methods. The PLS regression model based on Savitzky-Golay (S-G) second-derivative preprocessed spectra performed better for both firmness and SI than models built with the other preprocessing methods, with correlation coefficients of prediction (rpred) of 0.82 and 0.74 and standard errors of prediction (SEP) of 0.86 N and 0.63, respectively. The feature wavelengths were then identified using a model-based variable selection method, variable importance in projection (VIP), derived from the PLS regression analyses, and chemical images were finally generated by applying the respective regression coefficients to the spectral image pixel by pixel. The resulting chemical images provided detailed information on the firmness and SI of tomatoes. This research therefore demonstrated that the HSI technique has potential for rapid and non-destructive evaluation of the firmness and sweetness index of tomatoes.
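
The Savitzky-Golay second-derivative step named in the abstract can be illustrated with a minimal sketch. The 5-point window and quadratic fit are assumptions, since the abstract does not report the filter settings, and `sg_second_derivative` and the `demo` spectrum are hypothetical names and data used only for illustration. The weights (2, -1, -2, -1, 2)/7 are the standard Savitzky-Golay coefficients for that window and order.

```python
# Savitzky-Golay second-derivative weights for a 5-point window and a
# quadratic fit (assumed settings; the abstract does not report them).
SG_D2_WEIGHTS = (2 / 7, -1 / 7, -2 / 7, -1 / 7, 2 / 7)


def sg_second_derivative(spectrum, step=1.0):
    """Return the second derivative at every interior point of `spectrum`.

    `spectrum` is a sequence of values sampled at a uniform wavelength
    spacing `step`. The two points at each edge are dropped because the
    5-point window does not fit there.
    """
    half = len(SG_D2_WEIGHTS) // 2
    if len(spectrum) < len(SG_D2_WEIGHTS):
        raise ValueError("spectrum is shorter than the filter window")
    out = []
    for i in range(half, len(spectrum) - half):
        window = spectrum[i - half:i + half + 1]
        acc = sum(w * v for w, v in zip(SG_D2_WEIGHTS, window))
        out.append(acc / step ** 2)
    return out


# Hypothetical spectrum: the parabola y = x^2 at unit spacing, whose second
# derivative is 2 everywhere, so the filter should recover a constant.
demo = [float(x * x) for x in range(8)]
demo_d2 = sg_second_derivative(demo)
```

In a workflow like the one the abstract describes, each mean spectrum would pass through such a filter before PLS regression. The regression itself is not shown here.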

Fishery R&D Big Data Platform and Metadata Management Strategy (수산과학 빅데이터 플랫폼 구축과 메타 데이터 관리방안)

  • Kim, Jae-Sung;Choi, Youngjin;Han, Myeong-Soo;Hwang, Jae-Dong;Cho, Wan-Sup
    • The Journal of Bigdata / v.4 no.2 / pp.93-103 / 2019
  • In this paper, we introduce a big data platform and a metadata management technique for fishery science R&D information. The platform collects and integrates various types of fisheries science R&D information, and we suggest how to build it in the form of a data lake. In addition to the existing data collected and accumulated in the field of fisheries science, we propose building a big data platform that supports diverse analyses by collecting unstructured big data such as satellite image data, research reports, and research data. Collecting and managing metadata during data extraction, preprocessing, and storage then makes systematic management of fisheries science big data possible. By establishing metadata in a standard form alongside the construction of the big data platform, this work is meaningful in that it suggests a systematic and continuous big data management method throughout the data lifecycle, including data collection, storage, utilization, and distribution.

Malicious Code Detection using the Effective Preprocessing Method Based on Native API (Native API 의 효과적인 전처리 방법을 이용한 악성 코드 탐지 방법에 관한 연구)

  • Bae, Seong-Jae;Cho, Jae-Ik;Shon, Tae-Shik;Moon, Jong-Sub
    • Journal of the Korea Institute of Information Security & Cryptology / v.22 no.4 / pp.785-796 / 2012
  • In this paper, we propose an effective behavior-based detection technique that uses the frequency of system calls to detect malicious code when the number of training samples is smaller than the number of system-call properties. We collect the Native API calls, which are Windows kernel data generated by running program code, and adopt the normalized frequency of the Native APIs as the basic properties. The basic properties are then transformed into new properties by GLDA (Generalized Linear Discriminant Analysis), an effective method for discriminating between malicious and normal code even when the number of training samples is smaller than the number of properties. To detect malicious code, kNN (k-Nearest Neighbor) classification, one of the Bayesian classification techniques, was used. We compared the proposed detection method with other methods on the collected Native APIs to verify its efficiency. The results show that the proposed method has a lower false positive rate than the other methods at the threshold value where the detection rate is 100%.
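
Two of the steps in this abstract, normalized call frequencies and kNN classification, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the GLDA transformation is omitted entirely, and the vocabulary, traces, labels, and the value of `k` are hypothetical.

```python
from collections import Counter
import math


def normalized_frequencies(calls, vocabulary):
    """Map a trace of API call names to relative frequencies over `vocabulary`."""
    counts = Counter(calls)
    total = sum(counts[name] for name in vocabulary)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[name] / total for name in vocabulary]


def knn_predict(train, query, k=3):
    """Majority vote among the k training vectors closest to `query`.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical vocabulary and traces, chosen only to make the example concrete.
VOCAB = ["NtCreateFile", "NtReadFile", "NtOpenProcess", "NtWriteVirtualMemory"]
traces = [
    (["NtCreateFile", "NtReadFile", "NtReadFile", "NtCreateFile"], "normal"),
    (["NtReadFile", "NtReadFile", "NtCreateFile"], "normal"),
    (["NtOpenProcess", "NtWriteVirtualMemory", "NtWriteVirtualMemory"], "malicious"),
    (["NtOpenProcess", "NtWriteVirtualMemory", "NtCreateFile"], "malicious"),
]
train = [(normalized_frequencies(calls, VOCAB), label) for calls, label in traces]

query_trace = ["NtOpenProcess", "NtWriteVirtualMemory", "NtWriteVirtualMemory", "NtOpenProcess"]
prediction = knn_predict(train, normalized_frequencies(query_trace, VOCAB), k=3)
```

Normalizing by the trace total keeps long and short traces comparable, which matters because kNN is sensitive to the scale of each feature.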

Data Preprocessing Technique and Service Operation Architecture for Demand Forecasting of Electric Vehicle Charging Station (전기자동차 충전소 수요 예측 데이터 전처리 기법 및 서비스 운영 아키텍처)

  • Joongi Hong;Suntae Kim;Jeongah Kim
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.23 no.2 / pp.131-138 / 2023
  • Globally, the eco-friendly industry is growing in response to the climate crisis. Electric vehicles are attracting attention as an eco-friendly industry because they are expected to reduce carbon emissions by 30-70% or more compared to internal combustion engine vehicles. As electric vehicles become more popular, charging stations have become an important factor in the decision to purchase one. Recent research uses artificial intelligence to identify local demand for charging stations and select locations that can maximize economic impact. In this study, to help improve the performance of electric vehicle charging station demand prediction models, nationwide data usable by an artificial intelligence model was defined and a preprocessing technique was proposed. In addition, a preprocessor, an artificial intelligence model, and a service web were implemented for real charging station demand prediction, and the value of the data as a location selection factor was verified.

Machine Learning-Based Malicious URL Detection Technique (머신러닝 기반 악성 URL 탐지 기법)

  • Han, Chae-rim;Yun, Su-hyun;Han, Myeong-jin;Lee, Il-Gu
    • Journal of the Korea Institute of Information Security & Cryptology / v.32 no.3 / pp.555-564 / 2022
  • Recently, cyberattacks have been using hacking techniques that employ intelligent and advanced malicious code against non-face-to-face environments such as telecommuting, telemedicine, and automated industrial facilities, and the damage is increasing. Traditional information protection systems, such as anti-virus software, detect known malicious URLs based on signature patterns, so they cannot detect unknown malicious URLs. In addition, conventional static analysis-based malicious URL detection is vulnerable to dynamic loading and cryptographic attacks. This study proposes a technique for efficiently detecting malicious URLs by dynamically learning malicious URL data. In the proposed technique, malicious code is classified using machine learning-based feature selection algorithms, and accuracy is improved by removing obfuscation elements after preprocessing with Weighted Euclidean Distance (WED). According to the experimental results, the proposed technique achieves an accuracy of 89.17%, an improvement of 2.82% over the conventional method.
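
For reference, a weighted Euclidean distance between two feature vectors can be written as below. The abstract does not say how the weights are chosen or which URL features are compared, so the feature vectors and weights here are hypothetical and the function shows only the generic formula, the square root of the sum of w_i (x_i - y_i)^2.

```python
import math


def weighted_euclidean_distance(x, y, weights):
    """Return sqrt(sum_i w_i * (x_i - y_i)^2) for equal-length sequences."""
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


# Hypothetical two-feature vectors, for example a scaled URL length and a
# scaled count of special characters.
url_a = [0.0, 0.0]
url_b = [3.0, 4.0]

unweighted = weighted_euclidean_distance(url_a, url_b, [1.0, 1.0])
first_feature_only = weighted_euclidean_distance(url_a, url_b, [4.0, 0.0])
```

With all weights set to 1 the function reduces to the ordinary Euclidean distance. A zero weight removes a feature from the comparison altogether.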

CoReHA: conductivity reconstructor using harmonic algorithms for magnetic resonance electrical impedance tomography (MREIT)

  • Jeon, Ki-Wan;Lee, Chang-Ock;Kim, Hyung-Joong;Woo, Eung-Je;Seo, Jin-Keun
    • Journal of Biomedical Engineering Research / v.30 no.4 / pp.279-287 / 2009
  • Magnetic resonance electrical impedance tomography (MREIT) is a new medical imaging modality that provides cross-sectional images of the conductivity distribution inside an electrically conducting object. MREIT has progressed rapidly in its theory, algorithms, and experimental techniques and has now reached the stage of in vivo animal and human experiments. Conductivity image reconstruction in MREIT requires several steps of carefully implemented numerical computation. To facilitate MREIT research, there is a pressing need for an MREIT software package with an efficient user interface. In this paper, we present an example of such a package, called CoReHA, which stands for conductivity reconstructor using harmonic algorithms. It offers various computational tools, including preprocessing of MREIT data, identification of boundary geometry, electrode modeling, meshing, and implementation of the finite element method. Conductivity image reconstruction methods based on the harmonic $B_z$ algorithm are used to produce cross-sectional conductivity images. After summarizing the basics of MREIT theory and experimental methods, we describe the technical details of each data processing task for conductivity image reconstruction. We pay attention to pitfalls and cautions in their numerical implementation. The presented software will be useful to researchers in the field of MREIT for simulation as well as experimental studies.

A Content-Based Image Classification using Neural Network (신경망을 이용한 내용기반 영상 분류)

  • 이재원;김상균
    • Journal of Korea Multimedia Society / v.5 no.5 / pp.505-514 / 2002
  • In this paper, we propose a method of content-based image classification using a neural network. The images to be classified are object images that can be divided into foreground and background. To handle the object images efficiently, the object region is extracted with a region segmentation technique in the preprocessing step. The features used for classification are texture and shape features extracted from the wavelet-transformed image. The neural network classifier is constructed with the extracted features and the back-propagation learning algorithm. Among the various texture features, the diagonal moment was the most effective. A test with 300 training data and 300 test data composed of 10 images from each of 30 classes shows correct classification rates of 72.3% and 67%, respectively.

Color Component Analysis For Image Retrieval (이미지 검색을 위한 색상 성분 분석)

  • Choi, Young-Kwan;Choi, Chul;Park, Jang-Chun
    • The KIPS Transactions:PartB / v.11B no.4 / pp.403-410 / 2004
  • Recently, image analysis has been actively studied as a preprocessing stage for medical image analysis and image retrieval. This paper proposes a way of using color components for image retrieval. Retrieval is based on color components, and color is analyzed using the CLCM (Color Level Co-occurrence Matrix) and statistical techniques. The CLCM proposed in this paper projects color components onto 3D space through a geometric rotation transform and then interprets the distribution formed by their spatial relationship. The CLCM is a 2D histogram built in a color model that is itself created through a geometric rotation transform of a color model, and a statistical technique is used to analyze it. Like the CLCM, the GLCM (Gray Level Co-occurrence Matrix) [1] and Invariant Moment [2,3] use 2D distribution charts and rely on basic statistical techniques to interpret 2D data. However, even though the GLCM and Invariant Moment are optimized for their respective domains, they cannot fully interpret irregular data distributed over the spatial coordinates. That is, because the GLCM and Invariant Moment use only basic statistical techniques, the reliability of the extracted features is low. To interpret the spatial relationship and weight of the data, this study uses Principal Component Analysis [4,5], a technique from multivariate statistics. To increase the accuracy of the data, it proposes projecting color components onto 3D space, rotating them, and then extracting features of the data from all angles.
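
The gray-level co-occurrence matrix that the abstract uses as its point of comparison can be sketched in a few lines. This shows the standard GLCM only, not the proposed CLCM, whose rotation transform the abstract does not specify. The function names, the 3x3 image, the pixel offset, and the choice of contrast as the statistic are assumptions made for illustration.

```python
def cooccurrence_matrix(image, levels, dx=1, dy=0):
    """Count how often level i has level j at offset (dx, dy) from it.

    `image` is a list of rows of integer levels in range(levels).
    """
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix


def contrast(matrix):
    """Contrast statistic: sum of (i - j)^2 weighted by pair probability."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        return 0.0
    return sum(
        (i - j) ** 2 * count / total
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
    )


# Hypothetical 3x3 image with three gray levels; each pixel is paired with
# its right-hand neighbor (dx=1, dy=0).
demo_image = [
    [0, 0, 1],
    [1, 2, 2],
    [0, 1, 2],
]
demo_glcm = cooccurrence_matrix(demo_image, levels=3)
demo_contrast = contrast(demo_glcm)
```

Statistics such as contrast summarize the 2D distribution in a single number, which is the kind of basic statistical technique the abstract says limits the reliability of the extracted features.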

Applying Image Processing Algorithm to Raw LiDAR Data for Extracting Ground Information (LiDAR 원시자료에서의 지면정보 추출을 위한 영상처리기법 적용 연구)

  • Choi, Yun-Woong;Sohn, Duk-Jae;Cho, Gi-Sung
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.27 no.5 / pp.575-583 / 2009
  • Various algorithms and methods for preprocessing LiDAR data are being developed and proposed. These methods fall into two groups: one uses a regular form, such as a DSM or an image converted from the raw LiDAR data, and the other uses the raw LiDAR data directly. Image processing is a representative method for regular grid data. It is easy to combine with numerical analysis techniques and has the advantage of modeling and noise elimination through smoothing, but it loses information during data conversion. This study applies the image processing method directly to the irregular raw LiDAR data to extract ground information with minimal information loss, and evaluates the accuracy of the extracted ground information.

A Machine Learning Approach for Mechanical Motor Fault Diagnosis (기계적 모터 고장진단을 위한 머신러닝 기법)

  • Jung, Hoon;Kim, Ju-Won
    • Journal of Korean Society of Industrial and Systems Engineering / v.40 no.1 / pp.57-64 / 2017
  • To reduce damage to major railroad components, which can interrupt railroad services, cause safety accidents, and generate unnecessary maintenance costs, the development of rolling stock maintenance technology is shifting, led by advanced countries, from preventive maintenance based on inspection periods to predictive maintenance. Furthermore, to enhance reliability as systems speed up while also reducing maintenance costs, demand for fault diagnosis and prognostic health management technology is increasing. The objective of this paper is to propose a highly reliable learning model, using various machine learning algorithms, that can be applied to critical rolling stock components. This paper presents a model for fault diagnosis of railway rolling stock components and performs mechanical failure diagnosis of motor components by applying machine learning techniques, together with a data preprocessing plan for component fault diagnosis, in order to ensure efficient maintenance support. The paper first defines a failure diagnosis model for rolling stock components. The function-based algorithms ANFIS and SMO were used as machine learning techniques for generating the failure diagnosis model. Two tree-based algorithms, RandomForest and CART, were also employed. To evaluate the performance of the algorithms for diagnosing failures in motors as a critical railroad component, an experiment was carried out on 2 data sets with different classes (6 classes and 3 class levels). According to the experimental results, the random forest algorithm, a tree-based machine learning technique, showed the best performance.