• Title/Summary/Keyword: Outlier model

Improvements in Speaker Adaptation Using Weighted Training (가중 훈련을 이용한 화자 적응 시스템의 향상)

  • 장규철;우수영;진민호;박용규;유창동
    • The Journal of the Acoustical Society of Korea / v.22 no.3 / pp.188-193 / 2003
  • Model-based adaptation methods reported in the literature so far incorporate the adaptation data indiscriminately, regardless of its distribution in the testing environment, when reducing the mismatch between the training and testing environments. When the amount of data is small and the parameter tying is extensive, adaptation based on outlier data can be detrimental to the performance of the recognizer. The distribution of the adaptation data therefore plays a critical role in adaptation performance. To maximize the improvement in recognition rate in the testing environment using only a small amount of adaptation data, supervised weighted training is applied to the structural maximum a posteriori (SMAP) algorithm. We evaluate the performance of the proposed weighted SMAP (WSMAP) and SMAP on the TIDIGITS corpus. The proposed WSMAP is found to perform better for a small amount of data. The general idea of incorporating the distribution of the adaptation data is applicable to other adaptation algorithms.

An Application of Support Vector Machines to Customer Loyalty Classification of Korean Retailing Company Using R Language

  • Nguyen, Phu-Thien;Lee, Young-Chan
    • The Journal of Information Systems / v.26 no.4 / pp.17-37 / 2017
  • Purpose: Customer loyalty is the most important factor in customer relationship management (CRM), especially in the retailing industry, where customers have many options for where to spend their money. Classifying loyal customers from customer data can help retailing companies build more efficient marketing strategies and gain competitive advantages. This study aims to construct classification models that distinguish the loyal customers of a Korean retailing company using data mining techniques in the R language. Design/methodology/approach: To classify retailing customers, we used a combination of support vector machines (SVMs) and other machine learning (ML) classification algorithms, supported by recursive feature elimination (RFE). In particular, we first cleaned the dataset to remove outliers and impute missing values. We then used an RFE framework to select the most significant predictors. Finally, we constructed models with the classification algorithms, tuned their parameters, and compared their performance. Findings: The results reveal that ML classification techniques can work well with CRM data in the Korean retailing industry. Moreover, customer loyalty is affected not only by unique factors such as the net promoter score but also by purchase habits such as a preference for expensive goods or visits to multiple branches. We also show that, on the retailing customer dataset, the model constructed with the SVM algorithm performs better than the others. We expect that the models in this study can be used by other retailing companies to classify their customers so that they can focus on serving this potential VIP group. We also hope that the results of these ML algorithms in the R language will help other researchers select appropriate ML algorithms.

Selective Histogram Matching of Multi-temporal High Resolution Satellite Images Considering Shadow Effects in Urban Area (도심지역의 그림자 영향을 고려한 다시기 고해상도 위성영상의 선택적 히스토그램 매칭)

  • Yeom, Jun-Ho;Kim, Yong-Il
    • Journal of Korean Society for Geospatial Information Science / v.20 no.2 / pp.47-54 / 2012
  • Additional high-resolution satellite images from other periods or sites are essential for efficient city modeling and analysis. However, the same ground objects show radiometric inconsistency across different satellite images, which degrades the quality of image processing and analysis. Moreover, in urban areas, buildings, trees, bridges, and other artificial objects cause shadow effects, which lower the performance of relative radiometric normalization. In this study, we therefore exclude shadow areas and propose a selective histogram matching method for image-based applications that requires no supplementary digital elevation model or geometric information about the sun and sensor. We first extract shadow objects using adjacency information from a building-edge buffer together with spatial and spectral attributes derived from image segmentation. Outlier objects such as asphalt roads are then removed. Finally, selective histogram matching is performed on the shadow-masked multi-temporal QuickBird-2 images.
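
The idea of matching histograms while ignoring masked pixels can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the flat pixel lists, and the choice to estimate the lookup table from unmasked pixels and then apply it to every source pixel are all assumptions made for illustration.

```python
from bisect import bisect_left

def cdf(values, levels):
    """Cumulative distribution of integer gray levels in [0, levels)."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total, running, out = len(values), 0, []
    for count in hist:
        running += count
        out.append(running / total)
    return out

def selective_histogram_match(source, reference, src_mask, ref_mask, levels=256):
    """Match source to reference using only pixels whose mask flag is False.

    Assumption: a mask value of True marks a pixel (e.g. shadow) that is
    excluded when the histograms are estimated.
    """
    src_cdf = cdf([v for v, m in zip(source, src_mask) if not m], levels)
    ref_cdf = cdf([v for v, m in zip(reference, ref_mask) if not m], levels)
    # For each source level, pick the first reference level whose CDF reaches it.
    lut = [min(bisect_left(ref_cdf, p), levels - 1) for p in src_cdf]
    return [lut[v] for v in source]

# Toy 8-level example: the last two pixels of each image are flagged as shadow.
source = [1, 1, 2, 2, 3, 3, 0, 0]
reference = [3, 3, 4, 4, 5, 5, 0, 0]
shadow = [False] * 6 + [True] * 2
matched = selective_histogram_match(source, reference, shadow, shadow, levels=8)
```

In this toy case the unmasked source levels 1, 2, 3 are mapped onto the reference levels 3, 4, 5, and the shadow pixels do not influence the mapping.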

Robust Parameter Estimation using Fuzzy RANSAC (퍼지 RANSAC을 이용한 강건한 인수 예측)

  • Lee Joong-Jae;Jang Hyo-Jong;Kim Gye-Young;Choi Hyung-il
    • Journal of KIISE:Software and Applications / v.33 no.2 / pp.252-266 / 2006
  • Many problems in computer vision are based on mathematical models, and their optimal solutions can be found by estimating the parameters of each model. However, when an input data set contains outliers that are relatively larger than normal noise, the estimates become incorrect. RANSAC is a representative robust algorithm used to resolve this problem. One major problem with RANSAC is that it needs prior knowledge of the distribution of the data (i.e., the percentage of outliers). To solve this problem, we propose the FRANSAC algorithm, which improves the rejection rate of outliers and the accuracy of solutions. This is performed by categorizing all data into a good sample set, a bad sample set, and a vague sample set using fuzzy classification at each iteration, and sampling only from the good sample set. In the experimental results, we show the performance of the proposed algorithm when it is applied to linear regression and to the calculation of a homography.
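
For orientation, plain RANSAC for a line fit can be sketched as follows. This is the baseline algorithm only, not the proposed FRANSAC: it has no fuzzy good/bad/vague classification. The helper `required_iterations` uses the standard iteration-count formula and shows where the assumed outlier percentage enters; all names, the threshold, and the synthetic data are illustrative assumptions.

```python
import math
import random

def required_iterations(outlier_ratio, sample_size=2, confidence=0.99):
    """Standard RANSAC iteration count for an assumed outlier ratio in (0, 1)."""
    inlier_ratio = 1.0 - outlier_ratio
    return math.ceil(math.log(1.0 - confidence)
                     / math.log(1.0 - inlier_ratio ** sample_size))

def ransac_line(points, n_iters=200, threshold=0.5, seed=0):
    """Keep the two-point line y = slope*x + intercept with the most inliers."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(n_iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical pair: slope undefined, draw again
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        inliers = [(x, y) for x, y in points
                   if abs(y - (slope * x + intercept)) <= threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (slope, intercept), inliers
    return best_model, best_inliers

# Synthetic data: ten points on y = 2x + 1 plus three gross outliers.
points = [(float(x), 2.0 * x + 1.0) for x in range(10)]
points += [(2.0, 30.0), (5.0, -20.0), (8.0, 50.0)]

suggested = required_iterations(outlier_ratio=3 / 13)
model, inliers = ransac_line(points)
```

The dependence of `required_iterations` on `outlier_ratio` is the prior knowledge the abstract refers to: if the ratio is underestimated, too few samples are drawn.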

Automatic generation of reliable DEM using DTED level 2 data from high resolution satellite images (고해상도 위성영상과 기존 수치표고모델을 이용하여 신뢰성이 향상된 수치표고모델의 자동 생성)

  • Lee, Tae-Yoon;Jung, Jae-Hoon;Kim, Tae-Jung
    • Spatial Information Research / v.16 no.2 / pp.193-206 / 2008
  • When stereo images are used for Digital Elevation Model (DEM) generation, a DEM is generally made by matching the left image against the right image of the stereo pair. In stereo matching, tie-points are used as initial match candidate points, and their number and distribution influence the matching result. A DEM made from the matching result has errors such as holes and peaks, which are usually interpolated from neighboring pixel values. In this paper, we propose a DEM generation method that combines automatic tie-point extraction using an existing DEM, an image pyramid, and interpolation of the new DEM using the existing DEM to obtain a more reliable DEM. For testing, we used IKONOS, QuickBird, and SPOT5 stereo images and DTED level 2 data. The test results show that the proposed method automatically produces reliable DEMs. For DEM validation, we compared the heights of the DEM produced by the proposed method with the heights of the existing DTED level 2 data. In this comparison, the RMSE was below 15 m.

Outlier-Object Detection Using an Image Pair Based on Regression Analysis: Noise Variance Estimation and Performance Analysis (영상 쌍에서 회귀분석에 기초한 이상 물체 검출: 잡음분산의 추정과 성능 분석)

  • Kim, Dong-Sik
    • Journal of the Institute of Electronics Engineers of Korea SP / v.45 no.5 / pp.25-34 / 2008
  • By comparing two images of the same scene captured at different times, we can detect a set of outliers, such as occluding objects caused by moving vehicles. To reduce the influence of the different intensity properties of the images, an intensity compensation scheme based on a polynomial regression model is employed. To detect outliers accurately while alleviating the influence of the outliers themselves on the fit, a simple technique that reruns the regression is employed. In this paper, an algorithm that iteratively reruns the regression is theoretically analyzed by observing the convergence property of the noise variance estimates. A correction constant for the noise variance estimate is proposed. The correction makes the detection algorithm robust to the choice of thresholds for selecting outliers. Numerical analyses using both synthetic and real images are also presented to show the robust performance of the detection algorithm.
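
The rerun-the-regression idea can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the paper's algorithm: it uses a degree-1 fit instead of a general polynomial, the threshold factor `k` is arbitrary, and the noise variance is estimated from the current inliers without the correction constant the paper proposes. All names and the synthetic data are hypothetical.

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def detect_outliers(xs, ys, k=2.5, max_iters=20):
    """Refit on the current inliers until the outlier set stops changing."""
    inlier = [True] * len(xs)
    for _ in range(max_iters):
        fx = [x for x, keep in zip(xs, inlier) if keep]
        fy = [y for y, keep in zip(ys, inlier) if keep]
        a, b = fit_linear(fx, fy)
        resid = [y - (a * x + b) for x, y in zip(xs, ys)]
        # Uncorrected noise variance estimate from inlier residuals only.
        var = sum(r * r for r, keep in zip(resid, inlier) if keep) / (len(fx) - 2)
        sigma = var ** 0.5
        new_inlier = [abs(r) <= k * sigma for r in resid]
        if new_inlier == inlier:
            break
        inlier = new_inlier
    return (a, b), [i for i, keep in enumerate(inlier) if not keep]

# Paired intensities: second image = 1.2 * first + 10, small alternating
# noise, and two pixels covered by an occluding object.
xs = [float(i) for i in range(20)]
ys = [1.2 * x + 10.0 + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(xs)]
ys[5] += 40.0
ys[14] += 40.0
(a, b), outliers = detect_outliers(xs, ys)
```

In the first pass the two occluded pixels inflate the variance estimate, and after they are excluded the estimate drops sharply. That sensitivity of the estimate to the current inlier set is what makes the choice of threshold delicate.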

An Outlier Data Analysis using Support Vector Regression (Support Vector Regression을 이용한 이상치 데이터분석)

  • Jun, Sung-Hae
    • Journal of the Korean Institute of Intelligent Systems / v.18 no.6 / pp.876-880 / 2008
  • Outliers are observations that are much larger or smaller than most observations in a given data set. They can arise from various sources, and the result of an analysis may depend on them. In general, data analysis is performed after removing outliers. However, in data mining applications such as fraud detection and intrusion detection, outliers are kept in the training data because they carry crucial information. Simple and multiple regression models need outliers to be eliminated from the training data, using standardized and studentized residuals, in order to construct a good model. In this paper, we use support vector regression (SVR), which is based on statistical learning theory, to analyze regression data that contain outliers. We verify the improved performance of our approach through experiments on synthetic data sets.
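
One reason SVR is often considered less sensitive to outliers than least squares is its epsilon-insensitive loss, which grows linearly rather than quadratically with the residual. The short sketch below only compares the two loss functions; it is not a full SVR and not the paper's experiment, and the value of `epsilon` is an arbitrary assumption.

```python
def epsilon_insensitive(residual, epsilon=0.5):
    """SVR loss: zero inside the epsilon tube, linear outside it."""
    return max(0.0, abs(residual) - epsilon)

def squared(residual):
    """Least-squares loss: quadratic in the residual."""
    return residual ** 2

# Residuals of three well-fitted points and one gross outlier.
residuals = [0.1, -0.3, 0.4, 10.0]
eps_losses = [epsilon_insensitive(r) for r in residuals]
sq_losses = [squared(r) for r in residuals]
```

For the outlier, the epsilon-insensitive loss is 9.5 while the squared loss is 100.0, so a single extreme point pulls a least-squares fit far harder.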

A Maximum Power Demand Prediction Method by Average Filter Combination (평균필터 조합을 통한 최대수요전력 예측기법)

  • Yu, Chan-Jik;Kim, Jae-Sung;Roh, Kyung-Woo;Cho, Wan-Sup
    • The Journal of Bigdata / v.5 no.1 / pp.227-239 / 2020
  • This paper introduces a method for predicting the maximum power demand despite communication errors at industrial sites. Due to the recent policy of de-nuclearization in Korea, a rise in the price of electricity is inevitable, and managing electricity consumption and maximum load for power demand management is becoming an important issue. Accordingly, it is important to predict and manage peak power. However, problems such as loss and corruption of measured power data occur at industrial sites due to noise generated by various facilities and sensors, and it is difficult to predict the exact value when measured effective power data are lost. This study presents a model for predicting and correcting anomalies and missing values when measured effective power data are lost. The models used in this study are expected to be useful in predicting peak power demand in the event of communication errors at industrial sites.
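
As a rough illustration of filling lost readings with an average filter, the sketch below replaces each missing sample with the mean of the most recent available values. It is a single trailing-mean filter chosen for simplicity, not the filter combination the paper proposes; the function name, the window size, and the sample readings are assumptions.

```python
def fill_missing(readings, window=3):
    """Replace None with the mean of up to `window` preceding values.

    Previously filled values count as history. A missing value with no
    history at all is left as None.
    """
    filled = []
    for value in readings:
        if value is None:
            recent = [v for v in filled if v is not None][-window:]
            if recent:
                value = sum(recent) / len(recent)
        filled.append(value)
    return filled

# Power readings (kW) with samples lost to communication errors.
readings = [100.0, 104.0, None, 108.0, None, None, 120.0]
filled = fill_missing(readings)
peak = max(v for v in filled if v is not None)
```

With gaps filled in this way, a peak value can still be computed over the whole interval, although the filled samples are only estimates.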

Design of Anomaly Detection System Based on Big Data in Internet of Things (빅데이터 기반의 IoT 이상 장애 탐지 시스템 설계)

  • Na, Sung Il;Kim, Hyoung Joong
    • Journal of Digital Contents Society / v.19 no.2 / pp.377-383 / 2018
  • The Internet of Things (IoT) is producing a wide variety of data as smart environments emerge. Collected IoT data are used as important evidence for judging a system's status. It is therefore important to monitor the anomaly state of sensors in real time and to detect anomalous data. However, because data structures and protocols vary, the IoT data must be converted into a normalized data structure before anomaly detection. Doing so can be expected to improve both the quality of the analysis data and the quality of service. In this paper, we propose an anomaly detection system based on big data from collected sensor data. The proposed system is applied to ensure anomaly detection and maintain data quality. In addition, we applied a support vector machine model to anomaly detection on time-series data. As a result, machine learning using preprocessed data was able to accurately detect and predict anomalies.

Development of Travel Time Estimation Algorithm for National Highway by using Self-Organizing Neural Networks (자기조직형 신경망 이론을 이용한 국도 통행시간 추정 알고리즘)

  • Do, Myungsik;Bae, Hyunesook
    • KSCE Journal of Civil and Environmental Engineering Research / v.28 no.3D / pp.307-315 / 2008
  • The aim of this study is to develop a travel time estimation model using a Self-Organized Neural network (SON) algorithm. We employed travel time data from GPS-equipped vehicles and number-plate matching, collected on National Road No. 3 (between Jangji-IC and Gonjiam-IC), a pilot section of the National Highway Traffic Management System. We found that the accuracy of travel time estimates is related to the location of detectors, the length of the road section, and land-use properties. In this paper, we develop travel time estimation using SON to remedy the defects of existing neural network methods, which cannot perform additional learning or efficient structure modification. Furthermore, we found that travel time estimation is more accurate with optimally located detectors than with the existing detector locations. We expect the results of this study to be useful for the location allocation of detectors on highways.