Search | Korea Science

Big Data Smoothing and Outlier Removal for Patent Big Data Analysis

Choi, JunHyeog;Jun, Sunghae
- Journal of the Korea Society of Computer and Information, v.21 no.8, pp.77-84, 2016
General statistical analysis usually relies on a normality assumption. If this assumption is not satisfied, we cannot expect good results from statistical data analysis. Most statistical methods for handling outliers and noise also require this assumption. However, the assumption often does not hold in big data because of its large volume and heterogeneity. We therefore propose a methodology based on box plots and data smoothing for controlling outliers and noise in big data analysis. The proposed methodology does not depend on the normality assumption. We select patent documents as the target big data domain because patent big data analysis is an important issue in the management of technology. We analyze patent documents using big data learning methods for technology analysis. Patent data collected from patent databases around the world are preprocessed and analyzed by text mining and statistics. However, most research on patent big data analysis has not considered the outlier and noise problem. This problem decreases the accuracy of prediction and increases the variance of parameter estimates. In this paper, we check for the existence of outliers and noise in patent big data. To determine whether outliers are present, we use box plots and smoothing visualization. We use patent documents related to three-dimensional printing technology to illustrate how the proposed methodology can be used to detect noise in the retrieved patent big data.
https://doi.org/10.9708/jksci.2016.21.8.077
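The abstract above names box plots and data smoothing as its two tools. The following is a minimal sketch of both ideas, not the authors' implementation: the function names, the 1.5 × IQR fence multiplier (Tukey's usual convention), the moving-average smoother, and the sample counts are all assumptions made for illustration.

```python
from statistics import quantiles


def boxplot_outliers(values, k=1.5):
    """Return the values that fall outside the box-plot fences.

    The fences are Q1 - k*IQR and Q3 + k*IQR; no normality is assumed.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


def moving_average(values, window=3):
    """Centered moving average; the window shrinks at both ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical yearly keyword counts from a patent collection.
counts = [12, 15, 14, 13, 90, 16, 15, 14]
print(boxplot_outliers(counts))
print(moving_average(counts))
```

Plotting the smoothed series next to the raw one is one way to approximate the "smoothing visualization" the abstract mentions; the abstract does not say which smoother the paper uses.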

Identification of Incorrect Data Labels Using Conditional Outlier Detection

Hong, Charmgil
- Journal of Korea Multimedia Society, v.23 no.8, pp.915-926, 2020
Outlier detection methods help identify unusual instances in data that may correspond to erroneous, exceptional, or surprising events or behaviors. This work studies conditional outlier detection, a special instance of the outlier detection problem, in the context of identifying incorrect data labels. Unlike conventional (unconditional) outlier detection methods, which seek abnormalities across all data attributes, conditional outlier detection assumes that data are given as pairs of input (condition) and output (response or label). Accordingly, the goal of conditional outlier detection is to identify incorrect or unusual output assignments given their input as the condition. As a solution, this paper proposes the ratio-based outlier scoring (ROS) approach and a variant of it. The proposed solutions adopt conventional outlier scores and apply them to identify conditional outliers in data. Experiments on synthetic and real-world image datasets demonstrate the benefits of the proposed approaches.
https://doi.org/10.9717/kmms.2020.23.8.915

On the Efficiency of Outlier Cleaners in Spatial Data Analysis (공간통계분석에서 이상점 수정방법의 효율성비교)

이진희;신기일
- The Korean Journal of Applied Statistics, v.17 no.2, pp.327-336, 2004
Many researchers have used the robust variogram to reduce the effect of outliers in spatial data analysis. It has recently been shown that estimating the variogram after replacing outliers is more efficient. In this paper, we suggest a new data cleaner for geostatistical data analysis and compare the efficiency of outlier cleaners.
https://doi.org/10.5351/KJAS.2004.17.2.327

A Multiple Imputation for Reducing Outlier Effect (이상점 영향력 축소를 통한 무응답 대체법)

Kim, Man-Gyeom;Shin, Key-Il
- The Korean Journal of Applied Statistics, v.27 no.7, pp.1229-1241, 2014
Most sampling surveys contain both outliers and missing values due to non-response. In such cases, the effect of outliers can prevent imputation from meeting a given precision. To overcome this, outlier treatment should be conducted before imputation. In this paper, to reduce the effect of outliers, we study outlier imputation methods and outlier weight adjustment methods. For outlier detection, the method suggested by She and Owen (2011) is used. A small simulation study is conducted, and the Monthly Labor Statistic and Briquette Consumption Survey data are used for real data analysis.
https://doi.org/10.5351/KJAS.2014.27.7.1229

Outlier Detection in Time Series Monitoring Datasets using Rule Based and Correlation Analysis Method (규칙기반 및 상관분석 방법을 이용한 시계열 계측 데이터의 이상치 판정)

Jeon, Jesung;Koo, Jakap;Park, Changmok
- Journal of the Korean GEO-environmental Society, v.16 no.5, pp.43-53, 2015
In this study, outlier detection methods were developed for various monitoring data that fall into the big data category, and outlier detection was conducted on both artificial data and real field monitoring data. Rule-based methods that apply the rate of change and the probability of error to monitoring data are effective at detecting large-scale short faults and constant faults that show no change within a certain period. However, because they use a single independent dataset, they can misjudge normal data with large-scale variation as outliers. Rule-based methods for detecting noise faults are of limited use on real monitoring data because of the difficulty of choosing a proper data window size and finding a threshold for outlier judgment. Correlation analysis between two different datasets was very effective at detecting localized outliers and abnormal variation in short- and long-term monitoring datasets, provided a reasonable range of training data could be selected.
https://doi.org/10.14481/jkges.2015.16.5.43
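The rule-based checks and the correlation check described in the abstract above can be sketched roughly as follows. This is an illustrative reading, not the paper's algorithm: the function names, the thresholds `max_step` and `min_run`, and the sample readings are assumptions.

```python
def spike_faults(series, max_step):
    """Indices whose change from the previous reading exceeds max_step."""
    return [i for i in range(1, len(series))
            if abs(series[i] - series[i - 1]) > max_step]


def constant_faults(series, min_run):
    """(start, end) index pairs of runs of identical readings of length >= min_run."""
    runs, start = [], 0
    for i in range(1, len(series) + 1):
        if i == len(series) or series[i] != series[start]:
            if i - start >= min_run:
                runs.append((start, i - 1))
            start = i
    return runs


def pearson(x, y):
    """Pearson correlation of two equal-length series (neither may be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


# Hypothetical sensor readings: one spike, then a stuck sensor.
readings = [1.0, 1.1, 5.0, 1.2, 1.2, 1.2, 1.2, 1.3]
print(spike_faults(readings, max_step=1.0))
print(constant_faults(readings, min_run=4))
print(pearson([1, 2, 3], [2, 4, 6]))
```

Note that the rate-of-change rule flags both the jump into the spike and the return from it, and it would equally flag a genuine large movement. That is the kind of misjudgment the abstract attributes to single-dataset rules, and it is where a correlation check against a second sensor can help.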

Development of Integrated Outlier Analysis System for Construction Monitoring Data (건설 계측 데이터에 대한 통합 이상치 분석 시스템 개발)

Jeon, Jesung
- Journal of the Korean GEO-environmental Society, v.21 no.5, pp.5-11, 2020
Detecting and eliminating outliers in field monitoring data is essential for effectively finding unusual movement and for short- and long-range forecasting of the stability and future behavior of various structures. In this study, an integrated outlier analysis system for assessing long-term time series data was developed. Outlier analysis is conducted in two steps: a primary analysis of a single dataset and a secondary analysis of multiple datasets using a synthesis value. Combined with a real-time safety management platform, the integrated outlier analysis system provides basic information for evaluating the stability and predicting the movement of structures. Field application results showed increased correlation between each single dataset and the synthesis value computed from similar sensors showing a constant trend. Monitoring data that show different trends can be used to analyze outliers through a correlation-weighted value.
https://doi.org/10.14481/jkges.2020.21.5.5

TIME-VARIANT OUTLIER DETECTION METHOD ON GEOSENSOR NETWORKS

Kim, Dong-Phil;I, Gyeong-Min;Lee, Dong-Gyu;Ryu, Keun-Ho
- Proceedings of the KSRS Conference, 2008.10a, pp.410-413, 2008
Outlier detection has been widely studied in geosensor networks. Recently, machine learning and data mining have been applied to outlier detection to build models that distinguish outliers based on an anchored criterion. However, it is difficult for existing methods to detect outliers in incoming time-variant data, because outlier detection needs to monitor incoming data and classify irregular attacks. To solve this problem, we propose a time-variant outlier detection method that uses a 2-dimensional grid based on an unanchored criterion. In the paper, the method is applied to geosensor data to classify outliers efficiently. The proposed method can be used in applications such as network intrusion detection, stock market analysis, and error detection in bank account data.

Outlier detection in dental research (치의학 연구에서 이상치의 처리)

Kim, Ki-Yeol
- The Journal of the Korean dental association, v.55 no.9, pp.604-616, 2017
In clinical dental research, errors occur despite careful study design and conduct. Data cleaning procedures aim to identify and correct these errors, or at least to minimize their influence on the study. Outliers are one such type of error. Outlier detection is a first step in the data analysis process and can seriously affect results in dental research. This paper therefore introduces methods for detecting outliers and examines their influence on statistical data analysis.

Multiple Imputation Reducing Outlier Effect using Weight Adjustment Methods (가중치 보정을 이용한 다중대체법)

Kim, Jin-Young;Shin, Key-Il
- The Korean Journal of Applied Statistics, v.26 no.4, pp.635-647, 2013
Imputation is a commonly used method for handling missing survey data. The performance of an imputation method is influenced by various factors, especially outliers. Removing outliers from a data set is a simple and effective way to reduce their effect. In this paper, to improve the precision of multiple imputation, we study an imputation method that reduces the effect of outliers using various weight adjustment methods, including outlier removal. The regression method of PROC MI in SAS is used for multiple imputation, and the final adjusted weight is used as a weight variable to obtain the imputed values. Simulation studies compare the performance of the weight adjustment methods, and Monthly Labor Statistic data are used for real data analysis.
https://doi.org/10.5351/KJAS.2013.26.4.635

Outlier prediction in sensor network data using periodic pattern (주기 패턴을 이용한 센서 네트워크 데이터의 이상치 예측)

Kim, Hyung-Il
- Journal of Sensor Science and Technology, v.15 no.6, pp.433-441, 2006
Because of the low power and low rate of sensor networks, outliers occur frequently in their time series data. In this paper, we propose a periodic pattern analysis for sensor network time series data and use it to predict outliers in that data. A periodic pattern is the minimum period of time over which the trend of the data values appears continuously and repeatedly. To analyze the periodic pattern, quantization and smoothing are applied to the time series data, and the fluctuation between adjacent values in the smoothed data is measured to reduce it to simplified data. The periodic pattern is then extracted from the simplified data, and the time series data are restructured according to the periods to produce periodic pattern data. In the experiment, machine learning is applied to the periodic pattern data to predict outliers. A characteristic of this analysis is that periods are determined not by the magnitude of the data values but by their fluctuation. The periodic pattern analysis is therefore robust to outliers. In addition, restructuring the time series data into periodic patterns makes it possible to express time attribute values as values within a period. Thus, the time attribute can be used even with general machine learning algorithms that cannot learn from time series data directly.
https://doi.org/10.5369/JSST.2006.15.6.433
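One possible reading of the fluctuation-based period analysis in the abstract above is sketched below. It is an assumption-laden illustration rather than the paper's procedure: the function names, the quantization step, and the rule for choosing the period are hypothetical, and the smoothing stage is omitted for brevity.

```python
def quantize(values, step):
    """Map each reading to the index of its nearest quantization level."""
    return [round(v / step) for v in values]


def fluctuation(values):
    """Reduce adjacent pairs to +1 (rise), -1 (fall) or 0 (flat)."""
    return [(b > a) - (b < a) for a, b in zip(values, values[1:])]


def minimal_period(symbols):
    """Smallest p with symbols[i] == symbols[i + p] for every valid i.

    Only periods that repeat at least twice are considered; returns None
    when no such period exists.
    """
    n = len(symbols)
    for p in range(1, n // 2 + 1):
        if all(symbols[i] == symbols[i + p] for i in range(n - p)):
            return p
    return None


# Hypothetical sensor readings that rise and fall every four samples.
readings = [0.1, 1.0, 2.1, 0.9, 0.0, 1.1, 1.9, 1.0, 0.1, 0.9, 2.0, 1.1, 0.0]
levels = quantize(readings, step=1.0)
print(minimal_period(fluctuation(levels)))
```

Because the period is derived from the direction of change rather than from the magnitudes, a single reading with an extreme value but the usual direction of change leaves the symbol sequence unchanged, which is in the spirit of the robustness the abstract claims.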

Search Result 234, Processing Time 0.021 seconds
