• Title/Summary/Keyword: Multivariate Outliers

Search Result 39

The Effects of Physical Environment in Coffee Shops on Customer Brand Loyalty: With a Focus on the Comparison between Mediating Effects of Customer Satisfaction and Emotional Responses (커피전문점의 물리적 환경이 브랜드 충성도에 미치는 영향: 고객만족과 감정 반응의 매개 효과 비교를 중심으로)

  • Kim, Su-Jin;Lee, Hyung-Ryong
    • Journal of the East Asian Society of Dietary Life / v.21 no.4 / pp.609-624 / 2011
  • The purpose of this study was to examine the physical environmental factors in coffee shops that determine customer brand loyalty, and to investigate the mediating effects of customer satisfaction and emotional responses on the causal relationship between those factors and brand loyalty. A sample of 400 coffee shop customers in Seoul and Gyeonggi was surveyed in March 2011 through a self-administered questionnaire. Of the 400 responses, 351 were used for validity and reliability analysis. Twelve outliers were then removed, leaving 339 subjects for deriving the results. Multiple linear regression and stepwise regression were conducted after construct validity and reliability had been assessed. The results can be summarized as follows: (1) The physical environment of coffee shops consists of five dimensions: facility aesthetics, cleanliness, ambiance, layout, and internet environment. (2) Facility aesthetics, ambiance, and internet environment influenced brand loyalty. (3) The effects of cleanliness and layout on brand loyalty were not significant in the multivariate analysis. However, the relationship between cleanliness and brand loyalty was mediated by emotional responses, and the relationship between layout and brand loyalty was mediated by customer satisfaction. (4) The mediating effects of customer satisfaction were greater than those of emotional responses.

Prediction of High Level Ozone Concentration in Seoul by Using Multivariate Statistical Analyses (다변량 통계분석을 이용한 서울시 고농도 오존의 예측에 관한 연구)

  • 허정숙;김동술
    • Journal of Korean Society for Atmospheric Environment / v.9 no.3 / pp.207-215 / 1993
  • To statistically predict $O_3$ levels in Seoul, this study used TMS (telemetered air monitoring system) data from the Department of Environment, collected at 20 sites in 1989 and 1990. The data at each site comprised 6 major criteria pollutants ($SO_2$, TSP, CO, $NO_2$, THC, and $O_3$) and 2 meteorological parameters, wind speed and wind direction. To select proper variables and to determine each pollutant's behavior, univariate statistical analyses were carried out first, and then various applied statistical techniques such as cluster analysis, regression analysis, and an expert system were examined. For the initial study of high-level $O_3$ prediction, the raw data set at each site was separated into 2 groups at the 60 ppb $O_3$ level. A hierarchical cluster analysis was applied to divide these groups into smaller classes. Each class at each site has its own pattern. Next, multiple regression was applied repeatedly to each class to determine an $O_3$ prediction submodel and to identify outliers in each class based on a certain level of standardized residual. Thus, a prediction submodel for each homogeneous class could be obtained. The study was extended to model $O_3$ prediction on both an on-time basis and a 1-hr-ahead basis. Finally, an expert system was used to build a unified classification rule based on examples of the homogeneous classes for all sites. Thus, a concept for a high-level $O_3$ prediction model was developed for use in an $O_3$ alert system.

The Use of Local Outlier Factor (LOF) for Improving Performance of Independent Component Analysis (ICA) based Statistical Process Control (SPC) (LOF를 이용한 ICA 기반 통계적 공정관리의 성능 개선 방법론)

  • Lee, Jae-Shin;Kang, Bok-Young;Kang, Suk-Ho
    • Journal of the Korean Operations Research and Management Science Society / v.36 no.1 / pp.39-55 / 2011
  • Process monitoring has been emphasized for complex systems such as chemical processing industries in order to enhance efficiency, manage quality, and improve safety. Recently, ICA (Independent Component Analysis) based MSPC (Multivariate Statistical Process Control) has been widely used in process monitoring. Moreover, DICA (Dynamic ICA) has been introduced to account for system dynamics. However, the existing approaches are limited in that their performance depends strongly on the statistical distributions of the control variables. To overcome this limitation, we propose a novel approach to process monitoring that integrates DICA and LOF (Local Outlier Factor), with the aim of improving the fault detection rate. LOF detects local outliers by using the density of the surrounding space, so its performance does not depend on the data distribution. Therefore, the proposed method can not only account for system dynamics but also deliver robust performance regardless of the statistical distributions of the control variables. Comparison experiments conducted on a widely used benchmark dataset, the Tennessee Eastman process (TE process), showed improved performance over existing approaches.
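
The density-based scoring that this abstract attributes to LOF can be illustrated with a minimal sketch in plain Python. It is not the authors' DICA and LOF pipeline: the function names (`k_nearest`, `local_outlier_factor`), the toy points, and `k=3` are assumptions chosen for illustration, and ties at the k-th distance are cut off at exactly k neighbours rather than all being included as in the full definition.

```python
import math

def k_nearest(points, i, k):
    """Return (distance, index) pairs for the k nearest neighbours of points[i]."""
    dists = sorted(
        (math.dist(points[i], points[j]), j)
        for j in range(len(points)) if j != i
    )
    return dists[:k]

def local_outlier_factor(points, k=3):
    """Compute a LOF score per point; scores well above 1 suggest local outliers.

    Simplifying assumption: the data contain no groups of more than k
    identical points (otherwise a reachability sum could be zero).
    """
    n = len(points)
    neighbours = [k_nearest(points, i, k) for i in range(n)]
    # k-distance: distance from each point to its k-th nearest neighbour
    k_dist = [nb[-1][0] for nb in neighbours]

    # Local reachability density: inverse of the mean reachability distance,
    # where reach(i, j) = max(k_dist[j], dist(i, j))
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean density of the neighbours divided by the point's own density
    return [
        sum(lrd[j] for _, j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
        for i in range(n)
    ]

# Toy data (hypothetical): a tight cluster plus one distant point
toy = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
for point, score in zip(toy, local_outlier_factor(toy, k=3)):
    print(point, round(score, 2))
```

Because the score is a ratio of local densities, it makes no assumption about the overall distribution of the data, which is the property the abstract relies on.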

AGE ESTIMATION TECHNIQUE OF INDUSTRIALIZED TIMBER PLANTATION USING VARIOUS REMOTE SENSING DATA

  • Kim, Jong-Hong;Heo, Joon;Park, Ji-Sang
    • Proceedings of the KSRS Conference / v.1 / pp.94-97 / 2006
  • Stand age information for timber in industrialized plantation forests is generally collected by field surveying, which is labor-intensive, time-consuming, and very costly. It is also inconsistent from an analysis perspective. As an alternative, this research presents a practical solution for estimating the timber age of loblolly pine plantations using Landsat Thematic Mapper (TM) images, the Shuttle Radar Topography Mission (SRTM), and the National Elevation Dataset (NED). A multivariate regression model was developed based upon satellite image-based information (i.e., normalized difference vegetation index (NDVI), tasseled cap (TC) transformation, and derived tree heights). A studentized residual technique was applied to remove potential outliers. After that, a refined age estimation model with a coefficient of determination (R-square) of 84.6% was obtained. Finally, the feasibility of the estimated model was tested by comparing estimated and measured stand ages of timber plantations using test datasets of plantation stands (2,032 stands). The results show that the proposed method can estimate loblolly pine stand age within an error of 2 to 3 years in a way that is effective and consistent in terms of time and cost.
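
As a rough illustration of outlier screening by studentized residuals, the sketch below computes internally studentized residuals for a regression with a single predictor. It is not the paper's model, which is multivariate: the one-predictor setup, the function name `studentized_residuals`, the toy numbers, and the cutoff of 2 are all assumptions made for brevity.

```python
import math

def studentized_residuals(x, y):
    """Internally studentized residuals for the least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar

    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual variance with n - 2 degrees of freedom (two fitted parameters)
    s2 = sum(e ** 2 for e in residuals) / (n - 2)

    result = []
    for xi, e in zip(x, residuals):
        # Leverage of observation i in simple linear regression
        leverage = 1 / n + (xi - x_bar) ** 2 / sxx
        result.append(e / math.sqrt(s2 * (1 - leverage)))
    return result

# Toy data (hypothetical): roughly y = 2x + 1, with one corrupted value at x = 8
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9, 15.0, 30.0, 19.1, 20.9]
r = studentized_residuals(x, y)
flagged = [i for i, ri in enumerate(r) if abs(ri) > 2]  # cutoff is an assumption
print(flagged)
```

Dividing each residual by its own estimated standard deviation, which depends on leverage, puts observations at the edges of the predictor range on the same scale as those in the middle. After flagged points are removed, the model would normally be refit, as the abstract describes.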

The Air Quality Analysis in Underground Shopping Centers Using Pattern Recognition (Pattern Recognition을 이용한 지하상가에서의 대기오염물질의 농도 분석에 관한 연구)

  • 김동술;김형석
    • Journal of Korean Society for Atmospheric Environment / v.6 no.1 / pp.1-10 / 1990
  • The purpose of the study was to analyze air quality in underground shopping centers using pattern recognition methods. To this end, the concentrations of air pollutants such as CO, $NO_2$, $NO_x$, $SO_2$, and particulate matter were measured at 11 different shopping centers in the Seoul metropolitan area, and a total of 47 samples were obtained at random based on the size of the shopping centers. To introduce a new concept of the "average concentration" for indoor air quality analyses, various multivariate statistical analyses were studied. A cluster analysis was applied to separate the samples into pseudo-patterns, and a disjoint principal component analysis was used to generate homogeneous patterns after removing outliers from the pseudo-patterns. Six homogeneous patterns were obtained: the first was a group of clean sites; the second, a group of sites with high dust concentration; the third, a group of sites with high dust and $NO_x$ concentrations; the fourth, a group of sites with low dust and $SO_2$ concentrations and high CO concentration; the fifth, a group of sites with high $NO_2$ and $SO_2$ concentrations; and the last, a group of miscellaneous sites. Thus, the average concentration could be estimated for each pattern.

HIERARCHICAL CLUSTER ANALYSIS by arboART NEURAL NETWORKS and its APPLICATION to KANSEI EVALUATION DATA ANALYSIS

  • Ishihara, Shigekazu;Ishihara, Keiko;Nagamachi, Mitsuo
    • Proceedings of the Korean Society for Emotion and Sensibility Conference / 2002.05a / pp.195-200 / 2002
  • The ART (Adaptive Resonance Theory [1]) neural network and its variations perform non-hierarchical clustering by unsupervised learning. We propose a scheme, "arboART", for hierarchical clustering using several ART1.5-SSS networks. It classifies multidimensional vectors as a cluster tree and finds the features of clusters. The basic idea of arboART is to use the prototypes formed in one ART network as input to another ART network that has a looser distance criterion (Ishihara, et al., [2,3]). By passing the prototype vectors formed by each ART network on to the next, many small categories are combined into larger and more generalized categories. A dendrogram can be drawn from the classification records of samples and categories. We have confirmed its ability using standard test data commonly used in the pattern recognition community. The clustering results are better than those of traditional computing methods in terms of separation of outliers and smaller error (diameter) of clusters, and the method causes no chaining. This methodology is applied to the analysis of Kansei evaluation experiment data.

An Outlier Detection Using Autoencoder for Ocean Observation Data (해양 이상 자료 탐지를 위한 오토인코더 활용 기법 최적화 연구)

  • Kim, Hyeon-Jae;Kim, Dong-Hoon;Lim, Chaewook;Shin, Yongtak;Lee, Sang-Chul;Choi, Youngjin;Woo, Seung-Buhm
    • Journal of Korean Society of Coastal and Ocean Engineers / v.33 no.6 / pp.265-274 / 2021
  • Outlier detection in ocean data has traditionally been performed using statistical and distance-based machine learning algorithms. Recently, AI-based methods have received much attention, and supervised learning methods, which require classification information for the data, are mainly used. Supervised learning requires considerable time and cost because classification information (labels) must be manually assigned to all data used for training. In this study, an autoencoder based on unsupervised learning was applied to outlier detection to overcome this problem. Two experiments were designed: univariate learning, in which only SST data from the Deokjeok Island observations were used, and multivariate learning, in which SST, air temperature, wind direction, wind speed, air pressure, and humidity were used. The data cover 25 years, from 1996 to 2020, and pre-processing that considers the characteristics of ocean data was applied. We tried to detect outliers in real SST data using the trained univariate and multivariate autoencoders. To compare model performance, various outlier detection methods were applied to synthetic data with artificially inserted errors. A quantitative evaluation of these methods gave multivariate and univariate accuracies of about 96% and 91%, respectively, indicating that the multivariate autoencoder had better outlier detection performance. Outlier detection using an unsupervised learning-based autoencoder is expected to be widely applicable because it can reduce subjective classification errors and the cost and time required for data labeling.

Characterization of Korean Archaeological Artifacts by Neutron Activation Analysis (I). Multivariate Classification of Korean Ancient Coins. (중성자 방사화분석에 의한 한국산 고고학적 유물의 특성화 연구 (I). 다변량 해석법에 의한 고전 (古錢) 의 분류 연구)

  • Chul Lee;Oh Cheun Kwun;Hyung Tae Kang;Ihn Chong Lee;Nak Bae Kim
    • Journal of the Korean Chemical Society / v.31 no.6 / pp.555-566 / 1987
  • Fifty ancient Korean coins originating in the Yi Dynasty were analyzed for 9 elements (Sn, Fe, As, Ag, Co, Sb, Ir, Ru, and Ni) by instrumental neutron activation analysis and for 3 elements (Cu, Pb, and Zn) by atomic absorption spectrometry. Bronze coins from the early days of the dynasty contain Cu, Pb, and Sn as major constituents, approximately in the ratio 90 : 4 : 3, whereas those from the latter days contain them in the ratio 7 : 2 : 0. Brass coins, which first appeared in the 17th century, contain Cu, Zn, and Pb as major constituents, approximately in the ratio 7 : 1 : 1. The multivariate data were analyzed for relations among elemental contents through the variance-covariance matrix. The data were further analyzed by a principal component mapping method. As a result, a training set of 8 classes was chosen, based on the spread of sample points in an eigenvector plot and on archaeological data such as age and the office of minting. The training set and test set of samples were finally analyzed for assignment to certain classes or as outliers through the statistical isolinear multiple component analysis (SIMCA).

A Multivariate Analysis of Korean Professional Players Salary (한국 프로스포츠 선수들의 연봉에 대한 다변량적 분석)

  • Song, Jong-Woo
    • The Korean Journal of Applied Statistics / v.21 no.3 / pp.441-453 / 2008
  • We analyzed the salaries of Korean professional basketball and baseball players under the assumption that salary depends on a player's personal records and contribution to the team in the previous year. We made extensive use of data visualization tools to check the relationships among the variables, to find outliers, and to perform model diagnostics. We used multiple linear regression and regression trees to fit the models and used cross-validation to find an optimal model. We checked the relationships between variables carefully and chose a subset of variables for the stepwise regression instead of using all variables. We found that points per game, number of assists, number of free throw successes, and career are important variables for basketball players. For baseball pitchers, career, number of strikeouts per 9 innings, ERA, and number of home runs are important variables. For baseball hitters, career, number of hits, and FA are important variables.