• Title/Summary/Keyword: outlier discrimination

Building the Outlier Candidate Discrimination Training Data based on Inventory for Automatic Classification of Transferred Records (이관 기록물 분류 자동화를 위한 목록 기반 이상치 판별 학습데이터 구축)

  • Jeong, Ji-Hye;Lee, Gemma;Wang, Hosung;Oh, Hyo-Jung
    • Journal of Korean Society of Archives and Records Management / v.22 no.1 / pp.43-59 / 2022
  • Electronic public records are classified at the time of production and assigned a preservation period; after a certain period, they are transferred to an archive and preserved. This study seeks a way to improve efficiency and maintain consistent standards in classifying transferred records. To this end, the current record classification workflow of the National Archives of Korea was analyzed and its problems were identified. To minimize the manual work of record classification while incorporating the required improvements, a process for identifying outlier candidates from a list of the classification information of transferred records was proposed and systematized. The proposed outlier discrimination process was then applied to actual records transferred to the National Archives of Korea, and the results were standardized and built into a training data format that can be used for machine learning in the future.

Robust Optical Flow Detection Using 2D Histogram with Variable Resolution (가변 분해능을 가진 2차원 히스토그램을 이용한 강건한 광류검출)

  • CHON Jaechoon
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.23 no.1 / pp.49-57 / 2005
  • The proposed algorithm aims at robust optical flow detection that remains applicable when the outlier rate exceeds 80%. When the outlier rate of the optical flows exceeds 30%, conventional algorithms have great difficulty discriminating inliers from outliers. The proposed algorithm overcomes this difficulty with a three-step grouping procedure: 1) constructing a 2D histogram whose two axes are the lengths and the directions of the optical flows; 2) sorting the bins of the 2D histogram in descending order of the number of optical flows they contain and removing bins with fewer optical flows than a threshold; 3) increasing the resolution of the 2D histogram if the number of optical flows in a specific bin is over 20%, and decreasing the resolution if it is less than 10%. This processing is repeated until the number of optical flows falls into the range of 10%-20% in all the bins. The proposed algorithm works well on different kinds of images containing many wrong optical flows. Experimental results are included.
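
The grouping idea described in this abstract can be sketched in Python roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the function name `select_dominant_flows`, the starting resolution of 8 bins per axis, the one-bin-at-a-time resolution change, and the choice to test only the most populated bin against the 10%-20% band (keeping that single bin instead of a sorted, thresholded set of bins) are all assumptions made for the example.

```python
import numpy as np

def select_dominant_flows(flows, lo=0.10, hi=0.20, bins=8, max_iter=50):
    """Return a boolean mask of the flows in the most populated histogram bin.

    flows: array of shape (N, 2) holding (dx, dy) for each optical flow vector.
    lo, hi: target share band for the peak bin (10%-20% in the abstract).
    bins, max_iter: hypothetical defaults chosen for this sketch.
    """
    flows = np.asarray(flows, dtype=float)
    lengths = np.hypot(flows[:, 0], flows[:, 1])
    angles = np.arctan2(flows[:, 1], flows[:, 0])  # direction in [-pi, pi]
    span = [[0.0, max(lengths.max(), 1e-12)], [-np.pi, np.pi]]

    for _ in range(max_iter):
        # Step 1: 2D histogram over (length, direction).
        hist, len_edges, ang_edges = np.histogram2d(
            lengths, angles, bins=bins, range=span)
        share = hist.max() / len(flows)
        # Step 3: refine when the peak bin is too crowded, coarsen when too sparse.
        if share > hi:
            bins += 1
        elif share < lo and bins > 1:
            bins -= 1
        else:
            break

    # Simplified step 2: keep only the top-ranked bin of the last histogram.
    nb = hist.shape[0]
    i_peak, j_peak = np.unravel_index(np.argmax(hist), hist.shape)
    i = np.clip(np.searchsorted(len_edges, lengths, side="right") - 1, 0, nb - 1)
    j = np.clip(np.searchsorted(ang_edges, angles, side="right") - 1, 0, nb - 1)
    return (i == i_peak) & (j == j_peak)
```

Calling `select_dominant_flows(flows)` on an `(N, 2)` array returns a mask that can be used as `flows[mask]` to keep the candidate inliers. Because the resolution search can oscillate between two bin counts, the loop is capped by `max_iter` rather than guaranteed to converge.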

Robust Optical Flow Detection Using 2D histogram with Variable Resolution (가변 분해능을 가진 2차원 히스토그램을 이용한 강건한 광류인식)

  • CHON Jaechoon;KIM Hyongsuk
    • Journal of the Institute of Electronics Engineers of Korea SP / v.42 no.3 s.303 / pp.51-64 / 2005
  • The proposed algorithm aims at robust optical flow detection that remains applicable when the outlier rate exceeds 80%. When the outlier rate of the optical flows exceeds 30%, conventional algorithms have great difficulty discriminating inliers from outliers. The proposed algorithm overcomes this difficulty with a three-step grouping procedure: 1) constructing a 2D histogram whose two axes are the lengths and the directions of the optical flows; 2) sorting the bins of the 2D histogram in descending order of the number of optical flows they contain and removing bins with fewer optical flows than a threshold; 3) increasing the resolution of the 2D histogram if the number of optical flows in a specific bin is over 20%, and decreasing the resolution if it is less than 10%. This processing is repeated until the number of optical flows falls into the range of 10%-20% in all the bins. The proposed algorithm works well on different kinds of images containing many wrong optical flows. Experimental results are included.

A study on the difference and calibration of empirical influence function and sample influence function (경험적 영향함수와 표본영향함수의 차이 및 보정에 관한 연구)

  • Kang, Hyunseok;Kim, Honggie
    • The Korean Journal of Applied Statistics / v.33 no.5 / pp.527-540 / 2020
  • In data analysis, studying outliers, which deviate from the main tendency, is as important as studying data that follow the general tendency. In this study we discuss the influence function for outlier discrimination. We derive the sample influence functions of the sample mean, sample variance, and sample standard deviation, which were not directly derived in previous research. The results enable us to examine mathematically the relationship between the empirical influence function and the sample influence function, and to consider a method for approximating the sample influence function by the empirical influence function. The validity of the relationship between the approximated sample influence function and the empirical influence function is verified by simulation on data randomly sampled from a normal distribution. The simulation confirmed both the relationship between the two influence functions and the method of approximating the sample influence function through the empirical influence function. This research is significant in proposing a method that reduces errors in the approximation of the empirical influence function, and in proposing an effective and practical method that builds on previous research approximating the sample influence function directly through the empirical influence function, by applying a constant correction.
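
To make the two quantities concrete, the sketch below computes both for the sample mean and the sample variance. It assumes common textbook definitions, which may differ in notation or scaling from the paper's: the empirical influence function is the influence function evaluated at the empirical distribution (x - x̄ for the mean, (x - x̄)² - σ̂² for the divisor-n variance), and the sample influence function of observation i is (n - 1)(T - T_(i)), where T_(i) is the statistic recomputed without observation i. The function names are illustrative only.

```python
import numpy as np

def empirical_influence(x):
    """Empirical influence of each observation on the mean and the variance.

    Assumed definitions: x - mean for the mean, and
    (x - mean)^2 - var for the divisor-n variance.
    """
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return d, d**2 - x.var()  # np.var uses divisor n by default

def sample_influence(x, stat):
    """Leave-one-out sample influence: (n - 1) * (T(all) - T(without i))."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    full = stat(x)
    leave_one_out = np.array([stat(np.delete(x, i)) for i in range(n)])
    return (n - 1) * (full - leave_one_out)

# Small demonstration on normally distributed data.
rng = np.random.default_rng(0)
x = rng.normal(size=30)
eif_mean, eif_var = empirical_influence(x)
sif_mean = sample_influence(x, np.mean)
sif_var = sample_influence(x, np.var)
gap = sif_var - eif_var  # difference between the two functions for the variance
```

Under these assumed definitions the two functions coincide for the mean, while for the divisor-n variance they differ by (x_i - x̄)²/(n - 1), a gap that shrinks as n grows.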

Design of Heuristic Decision Tree (HDT) Using Human Knowledge (인간 지식을 이용한 경험적 의사결정트리의 설계)

  • Yoon, Tae-Tok;Lee, Jee-Hyong
    • Journal of the Korean Institute of Intelligent Systems / v.19 no.4 / pp.525-531 / 2009
  • Data mining is the process of extracting hidden patterns from collected data. Because collected data serve as the basic information for prediction and recommendation, a process for discriminating incorrect data is needed to improve the quality of the analysis results. Existing methods for discriminating unexpected data mainly rely on statistics or on simple distances between data points. However, because these methods do not consider the environment and characteristics of the data, even meaningful data may be excluded from the analysis. This study proposes a method that assigns weight values to human heuristic knowledge by comparing it with the collected data, and uses these values to create a decision tree. Data discrimination by the proposed method is more credible because human knowledge is reflected in the created tree. The validity of the proposed method is verified through an experiment.

Comparative Analysis of Anomaly Detection Models using AE and Suggestion of Criteria for Determining Outliers

  • Kang, Gun-Ha;Sohn, Jung-Mo;Sim, Gun-Wu
    • Journal of the Korea Society of Computer and Information / v.26 no.8 / pp.23-30 / 2021
  • In this study, we present a comparative analysis of major autoencoder (AE)-based anomaly detection methods for quality determination in the manufacturing process, together with a new anomaly discrimination criterion. At manufacturing sites, anomalous instances are few and their types vary greatly. These properties degrade the performance of AI-based anomaly detection models trained on datasets of both normal and anomalous cases, and obtaining additional data for performance improvement incurs considerable time and cost. To solve this problem, studies are underway on AE-based models such as AE and VAE, which perform anomaly detection using only normal data. In this work, based on Convolutional AE, VAE, and Dilated VAE models, statistics on residual images, MSE, and information entropy were selected as outlier discrimination criteria to compare and analyze the performance of each model. In particular, the range value applied to the Convolutional AE model showed the best performance, with AUC PRC 0.9570, F1 score 0.8812, AUC ROC 0.9548, and accuracy 87.60%. This is an accuracy improvement of about 20 percentage points over MSE, which has frequently been used as the standard for determining outliers, and confirms that model performance can be improved depending on the criterion used for determining outliers.
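
The kinds of criteria compared in this abstract can be illustrated with a small Python sketch that scores an image from its residual against a reconstruction. Several points are assumptions rather than details from the paper: "range" is read here as the maximum minus the minimum of the residual image, pixel values are assumed to lie in [0, 1], the entropy is taken over a 32-bin histogram of residual values, and the quantile-based threshold and all function names are hypothetical. No autoencoder is trained here; `x_hat` stands in for whatever reconstruction a model would produce.

```python
import numpy as np

def residual_scores(x, x_hat, bins=32):
    """Score one image by statistics of its residual image x - x_hat.

    Assumes pixel values in [0, 1], so residuals fall inside [-1, 1].
    """
    r = np.asarray(x, dtype=float) - np.asarray(x_hat, dtype=float)
    counts, _ = np.histogram(r, bins=bins, range=(-1.0, 1.0))
    p = counts[counts > 0] / counts.sum()
    return {
        "mse": float(np.mean(r**2)),                     # average squared error
        "range": float(r.max() - r.min()),               # spread of the residual
        "entropy": float(np.sum(p * np.log2(1.0 / p))),  # Shannon entropy in bits
    }

def threshold_from_normal(scores, q=0.99):
    """Hypothetical rule: take a high quantile of scores seen on normal data."""
    return float(np.quantile(scores, q))

def is_outlier(score, threshold):
    """Flag a sample whose score exceeds the chosen threshold."""
    return score > threshold
```

A single bright defect pixel barely moves the MSE of an image, because the error is averaged over all pixels, but it changes the residual range by the full defect amplitude. That difference in sensitivity is one plausible reason a range-style criterion could separate localized defects better than MSE.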