• Title/Summary/Keyword: outlier detection method

Search Result 130, Processing Time 0.023 seconds

Outlier Detection Method for Mobile Banking with User Input Pattern and E-finance Transaction Pattern (사용자 입력 패턴 및 전자 금융 거래 패턴을 이용한 모바일 뱅킹 이상치 탐지 방법)

  • Min, Hee Yeon;Park, Jin Hyung;Lee, Dong Hoon;Kim, In Seok
    • Journal of Internet Computing and Services
    • /
    • v.15 no.1
    • /
    • pp.157-170
    • /
    • 2014
  • As the increase of transaction using mobile banking continues, threat to the mobile financial security is also increasing. Mobile banking service performs the financial transaction using the dedicate application which is made by financial corporation. It provides the same services as the internet banking service. Personal information such as credit card number, which is stored in the mobile banking application can be used to the additional attack caused by a malicious attack or the loss of the mobile devices. Therefore, in this paper, to cope with the mobile financial accident caused by personal information exposure, we suggest outlier detection method which can judge whether the transaction is conducted by the appropriate user or not. This detection method utilizes the user's input patterns and transaction patterns when a user uses the banking service on the mobile devices. User's input and transaction pattern data involves the information which can be used to discern a certain user. Thus, if these data are utilized appropriately, they can be the information to distinguish abnormal transaction from the transaction done by the appropriate user. In this paper, we collect the data of user's input patterns on a smart phone for the experiment. And we use the experiment data which domestic financial corporation uses to detect outlier as the data of transaction pattern. We verify that our proposal can detect the abnormal transaction efficiently, as a result of detection experiment based on the collected input and transaction pattern data.

Developing data quality management algorithm for Hypertension Patients accompanied with Diabetes Mellitus By Data Mining (데이터 마이닝을 이용한 고혈압환자의 당뇨질환 동반에 관한 데이터 질 관리 알고리즘 개발)

  • Hwang, Kyu-Yeon;Lee, Eun-Sook;Kim, Go-Won;Hong, Sung-Ok;Park, Jong-Son;Kwak, Mi-Sook;Lee, Ye-Jin;Im, Chae-Hyuk;Park, Tae-Hyun;Park, Jong-Ho;Kang, Sung-Hong
    • Journal of Digital Convergence
    • /
    • v.14 no.7
    • /
    • pp.309-319
    • /
    • 2016
  • There is a need to develop a data quality management algorithm in order to improve the quality of health care data. In this study, we developed a data quality control algorithms associated diseases related to diabetes in patients with hypertension. To make a data quality algorithm, we extracted hypertension patients from 2011 and 2012 discharge damage survey data. As the result of developing Data quality management algorithm, significant factors in hypertension patients with diabetes are gender, age, Glomerular disorders in diabetes mellitus, Diabetic retinopathy, Diabetic polyneuropathy, Closed [percutaneous] [needle] biopsy of kidney. Depending on the decision tree results, we defined Outlier which was probability values associated with a patient having diabetes corporal with hypertension or more than 80%, or not more than 20%, and found six groups with extreme values for diabetes accompanying hypertension patients. Thus there is a need to check the actual data contained in the Outlier(extreme value) groups to improve the quality of the data.

An Outlier Detection Using Autoencoder for Ocean Observation Data (해양 이상 자료 탐지를 위한 오토인코더 활용 기법 최적화 연구)

  • Kim, Hyeon-Jae;Kim, Dong-Hoon;Lim, Chaewook;Shin, Yongtak;Lee, Sang-Chul;Choi, Youngjin;Woo, Seung-Buhm
    • Journal of Korean Society of Coastal and Ocean Engineers
    • /
    • v.33 no.6
    • /
    • pp.265-274
    • /
    • 2021
  • Outlier detection research in ocean data has traditionally been performed using statistical and distance-based machine learning algorithms. Recently, AI-based methods have received a lot of attention and so-called supervised learning methods that require classification information for data are mainly used. This supervised learning method requires a lot of time and costs because classification information (label) must be manually designated for all data required for learning. In this study, an autoencoder based on unsupervised learning was applied as an outlier detection to overcome this problem. For the experiment, two experiments were designed: one is univariate learning, in which only SST data was used among the observation data of Deokjeok Island and the other is multivariate learning, in which SST, air temperature, wind direction, wind speed, air pressure, and humidity were used. Period of data is 25 years from 1996 to 2020, and a pre-processing considering the characteristics of ocean data was applied to the data. An outlier detection of actual SST data was tried with a learned univariate and multivariate autoencoder. We tried to detect outliers in real SST data using trained univariate and multivariate autoencoders. To compare model performance, various outlier detection methods were applied to synthetic data with artificially inserted errors. As a result of quantitatively evaluating the performance of these methods, the multivariate/univariate accuracy was about 96%/91%, respectively, indicating that the multivariate autoencoder had better outlier detection performance. Outlier detection using an unsupervised learning-based autoencoder is expected to be used in various ways in that it can reduce subjective classification errors and cost and time required for data labeling.

Robust Estimation and Outlier Detection

  • Myung Geun Kim
    • Communications for Statistical Applications and Methods
    • /
    • v.1 no.1
    • /
    • pp.33-40
    • /
    • 1994
  • The conditional expectation of a random variable in a multivariate normal random vector is a multiple linear regression on its predecessors. Using this fact, the least median of squares estimation method developed in a multiple linear regression is adapted to a multivariate data to identify influential observations. The resulting method clearly detect outliers and it avoids the masking effect.

  • PDF

Development of the Financial Account Pre-screening System for Corporate Credit Evaluation (분식 적발을 위한 재무이상치 분석시스템 개발)

  • Roh, Tae-Hyup
    • The Journal of Information Systems
    • /
    • v.18 no.4
    • /
    • pp.41-57
    • /
    • 2009
  • Although financial information is a great influence upon determining of the group which use them, detection of management fraud and earning manipulation is a difficult task using normal audit procedures and corporate credit evaluation processes, due to the shortage of knowledge concerning the characteristics of management fraud, and the limitation of time and cost. These limitations suggest the need of systemic process for !he effective risk of earning manipulation for credit evaluators, external auditors, financial analysts, and regulators. Moot researches on management fraud have examined how various characteristics of the company's management features affect the occurrence of corporate fraud. This study examines financial characteristics of companies engaged in fraudulent financial reporting and suggests a model and system for detecting GAAP violations to improve reliability of accounting information and transparency of their management. Since the detection of management fraud has limited proven theory, this study used the detecting method of outlier(upper, and lower bound) financial ratio, as a real-field application. The strength of outlier detecting method is its use of easiness and understandability. In the suggested model, 14 variables of the 7 useful variable categories among the 76 financial ratio variables are examined through the distribution analysis as possible indicators of fraudulent financial statements accounts. The developed model from these variables show a 80.82% of hit ratio for the holdout sample. This model was developed as a financial outlier detecting system for a financial institution. External auditors, financial analysts, regulators, and other users of financial statements might use this model to pre-screen potential earnings manipulators in the credit evaluation system. Especially, this model will be helpful for the loan evaluators of financial institutes to decide more objective and effective credit ratings and to improve the quality of financial statements.

Damaged cable detection with statistical analysis, clustering, and deep learning models

  • Son, Hyesook;Yoon, Chanyoung;Kim, Yejin;Jang, Yun;Tran, Linh Viet;Kim, Seung-Eock;Kim, Dong Joo;Park, Jongwoong
    • Smart Structures and Systems
    • /
    • v.29 no.1
    • /
    • pp.17-28
    • /
    • 2022
  • The cable component of cable-stayed bridges is gradually impacted by weather conditions, vehicle loads, and material corrosion. The stayed cable is a critical load-carrying part that closely affects the operational stability of a cable-stayed bridge. Damaged cables might lead to the bridge collapse due to their tension capacity reduction. Thus, it is necessary to develop structural health monitoring (SHM) techniques that accurately identify damaged cables. In this work, a combinational identification method of three efficient techniques, including statistical analysis, clustering, and neural network models, is proposed to detect the damaged cable in a cable-stayed bridge. The measured dataset from the bridge was initially preprocessed to remove the outlier channels. Then, the theory and application of each technique for damage detection were introduced. In general, the statistical approach extracts the parameters representing the damage within time series, and the clustering approach identifies the outliers from the data signals as damaged members, while the deep learning approach uses the nonlinear data dependencies in SHM for the training model. The performance of these approaches in classifying the damaged cable was assessed, and the combinational identification method was obtained using the voting ensemble. Finally, the combination method was compared with an existing outlier detection algorithm, support vector machines (SVM). The results demonstrate that the proposed method is robust and provides higher accuracy for the damaged cable detection in the cable-stayed bridge.

A Novel Network Anomaly Detection Method based on Data Balancing and Recursive Feature Addition

  • Liu, Xinqian;Ren, Jiadong;He, Haitao;Wang, Qian;Sun, Shengting
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.14 no.7
    • /
    • pp.3093-3115
    • /
    • 2020
  • Network anomaly detection system plays an essential role in detecting network anomaly and ensuring network security. Anomaly detection system based machine learning has become an increasingly popular solution. However, due to the unbalance and high-dimension characteristics of network traffic, the existing methods unable to achieve the excellent performance of high accuracy and low false alarm rate. To address this problem, a new network anomaly detection method based on data balancing and recursive feature addition is proposed. Firstly, data balancing algorithm based on improved KNN outlier detection is designed to select part respective data on each category. Combination optimization about parameters of improved KNN outlier detection is implemented by genetic algorithm. Next, recursive feature addition algorithm based on correlation analysis is proposed to select effective features, in which a cross contingency test is utilized to analyze correlation and obtain a features subset with a strong correlation. Then, random forests model is as the classification model to detection anomaly. Finally, the proposed algorithm is evaluated on benchmark datasets KDD Cup 1999 and UNSW_NB15. The result illustrates the proposed strategies enhance accuracy and recall, and decrease the false alarm rate. Compared with other algorithms, this algorithm still achieves significant effects, especially recall in the small category.

An outlier weight adjustment using generalized ratio-cum-product method for two phase sampling (이중추출법에서 일반화 ratio-cum-product 방법을 이용한 이상점 가중치 보정법)

  • Oh, Jung-Taek;Shin, Key-Il
    • The Korean Journal of Applied Statistics
    • /
    • v.29 no.7
    • /
    • pp.1185-1199
    • /
    • 2016
  • Two phase sampling (double sampling) is often used when there is inadequate population information for proper stratification. Many recent papers have been devoted to the estimation method to improve the precision of the estimator using first phase information. In this study we suggested outlier weight adjustment methods to improve estimation precision based on the weight of the generalized ratio-cum-product estimator. Small simulation studies are conducted to compare the suggested methods and the usual method. Real data analysis is also performed.

Loop-Filtering for Reducing Comer outlier (모서리 잡음 제거를 위한 Loop 필터링 기법)

  • 홍윤표;전병우
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.41 no.5
    • /
    • pp.217-223
    • /
    • 2004
  • In block-based lossy video compression, severe quantization causes discontinuities along block boundaries so that annoying blocking artifacts are visible in decoded video images. These blocking artifacts significantly decrease the subjective image quality. In order to reduce the blocking artifacts in decoded images, many algorithms have been proposed. However studies on so called comer outlier, have been very limited. Corner outliers make image edges look disconnected from those of neighboring blocks at cross block boundary. In order to solve this problem we propose a corner outlier detection and compensation algorithm as loop-filtering in spatial domain. Experiment results show that the proposed method provides much improved subjective image quality.

Study on Lifelog Anomaly Detection using VAE-based Machine Learning Model (VAE(Variational AutoEncoder) 기반 머신러닝 모델을 활용한 체중 라이프로그 이상탐지에 관한 연구)

  • Kim, Jiyong;Park, Minseo
    • The Journal of the Convergence on Culture Technology
    • /
    • v.8 no.4
    • /
    • pp.91-98
    • /
    • 2022
  • Lifelog data continuously collected through a wearable device may contain many outliers, so in order to improve data quality, it is necessary to find and remove outliers. In general, since the number of outliers is less than the number of normal data, a class imbalance problem occurs. To solve this imbalance problem, we propose a method that applies Variational AutoEncoder to outliers. After preprocessing the outlier data with proposed method, it is verified through a number of machine learning models(classification). As a result of verification using body weight data, it was confirmed that the performance was improved in all classification models. Based on the experimental results, when analyzing lifelog body weight data, we propose to apply the LightGBM model with the best performance after preprocessing the data using the outlier processing method proposed in this study.