K-means Clustering using a Grid-based Representatives

  • Park, Hee-Chang;Lee, Sun-Myung
    • Proceedings of the Korean Data and Information Science Society Conference / 2003.10a / pp.229-238 / 2003
  • K-means clustering has been widely used in many applications, such as pattern analysis, data analysis, and market research. It can identify dense and sparse regions among data attributes or object attributes. However, the k-means algorithm can take many hours to produce k clusters, because it is primitive and exploratory. In this paper we propose a new k-means clustering method that uses grid-based representative values (arithmetic mean and trimmed mean) of the sample. It is faster than traditional clustering methods while maintaining accuracy.

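The grid-based idea in the abstract above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the fixed `cell_size`, the use of cell counts as weights, and the naive initialisation are all hypothetical choices, and only the arithmetic-mean representative is shown (the trimmed mean is omitted).

```python
from collections import defaultdict


def grid_representatives(points, cell_size):
    """Summarise the points in each grid cell by their arithmetic mean.

    Returns (representatives, counts); counts[i] is the number of original
    points summarised by representatives[i].
    """
    cells = defaultdict(list)
    for p in points:
        # The cell index is the floor of each coordinate divided by cell_size.
        key = tuple(int(c // cell_size) for c in p)
        cells[key].append(p)
    reps, counts = [], []
    for members in cells.values():
        dim = len(members[0])
        reps.append(tuple(sum(m[d] for m in members) / len(members)
                          for d in range(dim)))
        counts.append(len(members))
    return reps, counts


def kmeans(points, k, weights=None, iters=20):
    """Plain Lloyd iterations; weights are an assumption of this sketch."""
    weights = weights or [1.0] * len(points)
    centers = list(points[:k])  # naive deterministic initialisation
    for _ in range(iters):
        sums = [[0.0] * len(points[0]) for _ in range(k)]
        totals = [0.0] * k
        for p, w in zip(points, weights):
            # Assign the point to its nearest center (squared Euclidean).
            j = min(range(k),
                    key=lambda i: sum((a - b) ** 2
                                      for a, b in zip(p, centers[i])))
            totals[j] += w
            for d, a in enumerate(p):
                sums[j][d] += w * a
        # Keep the old center if a cluster received no points.
        centers = [tuple(s / totals[j] for s in sums[j]) if totals[j]
                   else centers[j] for j in range(k)]
    return centers


data = [(0.1, 0.1), (0.2, 0.3), (0.3, 0.2),
        (5.1, 5.2), (5.3, 5.1), (5.2, 5.3)]
reps, counts = grid_representatives(data, cell_size=1.0)
centers = kmeans(reps, k=2, weights=counts)
```

Clustering the representatives instead of the raw points reduces the number of distance computations per iteration from the sample size to the number of occupied cells, which is where a speed-up of this kind would come from.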

Validation Comparison of Credit Rating Models Using Box-Cox Transformation

  • Hong, Chong-Sun;Choi, Jeong-Min
    • Journal of the Korean Data and Information Science Society / v.19 no.3 / pp.789-800 / 2008
  • Current credit evaluation models based on financial data make use of smoothing estimated default ratios which are transformed from each financial variable. In this work, we discuss some problems of the credit evaluation models developed by financial experts and propose improved credit evaluation models based on the stepwise variable selection method and the Box-Cox transformation of data whose distribution is strongly skewed to the right. After comparing goodness-of-fit tests of these models, we explain the validation of credit evaluation models that use statistical methods such as stepwise variable selection and the Box-Cox transformation.

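For readers unfamiliar with the Box-Cox transformation mentioned in the abstract above, a minimal sketch follows. The sample values and the choice of lambda are made up for illustration; the abstract does not say how the authors chose lambda, and `sample_skewness` is a hypothetical helper using the simple moment-based definition.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform of a single positive value x with parameter lam."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return math.log(x)  # limiting case as lam tends to 0
    return (x ** lam - 1.0) / lam


def sample_skewness(values):
    """Moment-based skewness m3 / m2**1.5 (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


raw = [1.0, 2.0, 3.0, 4.0, 100.0]           # illustrative right-skewed sample
transformed = [box_cox(v, 0.0) for v in raw]  # lam = 0 is the log transform
print(sample_skewness(raw), sample_skewness(transformed))
```

On this toy sample the transformed values are less right-skewed than the raw ones, which is the usual motivation for applying the transformation before model fitting.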

Clustering Algorithm by Grid-based Sampling

  • Park, Hee-Chang;Ryu, Jee-Hyun;Lee, Sung-Yong
    • Journal of the Korean Data and Information Science Society / v.14 no.3 / pp.535-543 / 2003
  • Cluster analysis has been widely used in many applications, such as pattern analysis or recognition, data analysis, image processing, and online or offline market research. Clustering can identify dense and sparse regions among data attributes or object attributes. However, it can take many hours to obtain the desired clusters, because clustering is primitive and exploratory and is applied to large amounts of data. In this paper we propose a new clustering method that uses a grid-based sample. It is faster than traditional clustering methods while maintaining accuracy.


Method for Evaluating Optimal Air Monitoring Sites for SO2 in Ulsan (울산광역시 아황산가스(SO2)의 최적관측소 평가방법)

  • Lim, Junghyun;Yoon, Sanghoo
    • Journal of Environmental Science International / v.26 no.9 / pp.1073-1080 / 2017
  • Manufacturing and technology industries produce large amounts of air pollutants. Ulsan Metropolitan City, South Korea, is well-known for its large industrial complexes; in particular, the concentration of $SO_2$ here is the highest in the country. We assessed $SO_2$ monitoring sites based on conditional and joint entropy, because this is a common method for determining an optimal air monitoring network. Monthly $SO_2$ concentrations from 12 air monitoring sites were collected, and the distribution of spatial locations was determined by kriging. Mean absolute error, Root Mean Squared Error (RMSE), bias and correlation coefficients were employed to evaluate the considered algorithms. An optimal air monitoring network for Ulsan was suggested based on the improvement of RMSE.
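The evaluation measures named in the abstract above (mean absolute error, RMSE and bias) can be written down in a few lines. This is a generic sketch, not the authors' code; the sign convention for bias (predicted minus observed) and the numbers are assumptions made for illustration.

```python
import math


def error_metrics(observed, predicted):
    """Return (mae, rmse, bias) for paired observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - o for o, p in zip(observed, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    bias = sum(errors) / n  # positive means over-prediction on average
    return mae, rmse, bias


# Hypothetical values at three sites: measurements vs. interpolated values.
mae, rmse, bias = error_metrics([1.0, 2.0, 3.0], [2.0, 2.0, 5.0])
```

Comparing RMSE across candidate networks, as the abstract describes, then amounts to calling such a function once per candidate and preferring the configuration with the lower value.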

Model-based inverse regression for mixture data

  • Choi, Changhwan;Park, Chongsun
    • Communications for Statistical Applications and Methods / v.24 no.1 / pp.97-113 / 2017
  • This paper proposes a method for sufficient dimension reduction (SDR) of mixture data. We consider mixture data containing more than one component, where the components have distinct central subspaces. We apply model-based sliced inverse regression (MSIR) to the mixture data in a simple and intuitive manner. We employ mixture probabilistic principal component analysis (MPPCA) to estimate each central subspace and cluster the data points. The results from simulation studies and a real data set show that our method captures the appropriate central subspaces satisfactorily and is robust to the number of slices chosen. Discussions of root selection, estimation accuracy, and classification with initial value issues of MPPCA, together with related simulation results, are also provided.

Comparison of clustering with yeast microarray gene expression data (효모 마이크로어레이 유전자발현 데이터에 대한 군집화 비교)

  • Lee, Kyung-A;Kim, Jae-Hee
    • Journal of the Korean Data and Information Science Society / v.22 no.4 / pp.741-753 / 2011
  • We perform cluster analyses of yeast cell cycle microarray expression data. We compare model-based clustering, K-means, PAM, SOM and the hierarchical Ward method on the yeast data. As validity measures for the clustering results, connectivity, the Dunn index and silhouette values are computed and compared.
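Of the validity measures listed in the abstract above, the silhouette value is the simplest to illustrate. The sketch below uses the standard definition s = (b - a) / max(a, b) with Euclidean distance; the toy points and labels are made up, and assigning 0 to singleton clusters is a common convention assumed here rather than something stated in the abstract.

```python
import math


def silhouette_values(points, labels):
    """Silhouette value of every point, using Euclidean distance."""
    values = []
    for i, (p, li) in enumerate(zip(points, labels)):
        # Collect distances from point i to all other points, by cluster.
        dists = {}
        for j, (q, lj) in enumerate(zip(points, labels)):
            if i != j:
                dists.setdefault(lj, []).append(math.dist(p, q))
        own = dists.pop(li, [])
        if not own or not dists:
            values.append(0.0)  # singleton cluster or only one cluster
            continue
        a = sum(own) / len(own)                             # within-cluster
        b = min(sum(d) / len(d) for d in dists.values())    # nearest other
        values.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return values


pts = [(0.0,), (1.0,), (10.0,), (11.0,)]
good = silhouette_values(pts, [0, 0, 1, 1])  # well-separated labelling
bad = silhouette_values(pts, [0, 1, 0, 1])   # labelling that mixes groups
```

Averaging the values over all points gives a single score per clustering, which makes it possible to compare methods such as K-means and PAM on the same data.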

Computer Vision-based Method to Detect Fire Using Color Variation in Temporal Domain

  • Hwang, Ung;Jeong, Jechang;Kim, Jiyeon;Cho, JunSang;Kim, SungHwan
    • Quantitative Bio-Science / v.37 no.2 / pp.81-89 / 2018
  • High false detection rates commonly interfere with immediate vision-based fire monitoring systems. To circumvent this challenge, we propose a fire detection algorithm that accommodates RGB color variations in the temporal domain, aiming to reduce false detection rates. Despite interfering images (e.g., background noise and sudden intervention), the proposed method proves robust in capturing distinguishing features of fire in the temporal domain. In numerical studies, we carried out extensive real-data experiments on fire detection using 24 video sequences, which indicate that the proposed algorithm is an effective decision rule for fire detection (e.g., false detection rate <10%).

INERTIAL PROXIMAL AND CONTRACTION METHODS FOR SOLVING MONOTONE VARIATIONAL INCLUSION AND FIXED POINT PROBLEMS

  • Jacob Ashiwere Abuchu;Godwin Chidi Ugwunnadi;Ojen Kumar Narain
    • Nonlinear Functional Analysis and Applications / v.28 no.1 / pp.175-203 / 2023
  • In this paper, we study an iterative algorithm based on inertial proximal and contraction methods, embellished with a relaxation technique, for finding a common solution of monotone variational inclusion problems and fixed point problems of pseudocontractive mappings in real Hilbert spaces. We establish a strong convergence result for the proposed iterative method based on prediction stepsize conditions and under some standard assumptions on the algorithm parameters. Finally, some special cases of the general problem are given as applications. Our results improve and generalize some well-known and related results in the literature.

Comparison of semiparametric methods to estimate VaR and ES (조건부 Value-at-Risk와 Expected Shortfall 추정을 위한 준모수적 방법들의 비교 연구)

  • Kim, Minjo;Lee, Sangyeol
    • The Korean Journal of Applied Statistics / v.29 no.1 / pp.171-180 / 2016
  • The Basel Committee suggests using Value-at-Risk (VaR) and expected shortfall (ES) as measures of market risk. Various estimation methods for VaR and ES have been studied in the literature. This paper compares semi-parametric methods, such as the conditional autoregressive value at risk (CAViaR) and conditional autoregressive expectile (CARE) methods, and a Gaussian quasi-maximum likelihood estimator (QMLE)-based method through back-testing. We use unconditional coverage (UC) and conditional coverage (CC) tests for VaR, and a bootstrap test for ES, to check adequacy. A real data analysis is conducted for the S&P 500 index and Hyundai Motor Co. stock price index data sets.
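As an illustration of the unconditional coverage back-test mentioned in the abstract above, the sketch below computes a likelihood-ratio statistic in the Kupiec proportion-of-failures form. That the paper uses exactly this form is an assumption; the function name and the violation counts are hypothetical.

```python
import math


def kupiec_uc_statistic(violations, n_obs, p):
    """Likelihood-ratio statistic for unconditional coverage.

    violations: number of days on which the loss exceeded the VaR forecast
    n_obs:      number of back-tested days
    p:          nominal violation probability, e.g. 0.01 for 99% VaR
    Under the null of correct coverage the statistic is asymptotically
    chi-square distributed with 1 degree of freedom.
    """
    if not 0.0 < p < 1.0 or not 0 <= violations <= n_obs or n_obs == 0:
        raise ValueError("invalid arguments")
    x, n = violations, n_obs

    def loglik(q):
        # Binomial log-likelihood, treating 0 * log(0) as 0.
        ll = 0.0
        if x > 0:
            ll += x * math.log(q)
        if n - x > 0:
            ll += (n - x) * math.log(1.0 - q)
        return ll

    return -2.0 * (loglik(p) - loglik(x / n))


CHI2_1DF_95 = 3.841  # 95% critical value of chi-square with 1 d.f.
stat = kupiec_uc_statistic(violations=25, n_obs=1000, p=0.01)
reject = stat > CHI2_1DF_95  # too many violations for a 99% VaR
```

A model whose observed violation rate equals the nominal rate gives a statistic of zero, so larger values indicate a worse match between forecast and realised coverage.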

Hydrologic Response Estimation Using Mallows' $C_L$ Statistics (Mallows의 $C_L$ 통계량을 이용한 수문응답 추정)

  • Seong, Gi-Won;Sim, Myeong-Pil
    • Journal of Korea Water Resources Association / v.32 no.4 / pp.437-445 / 1999
  • The present paper describes the problem of hydrologic response estimation using a non-parametric ridge regression method. The method adopted in this work is based on minimizing the $C_L$ statistic, which is an estimate of the mean square prediction error. For this method, the effects of using both the identity matrix and the Laplacian matrix were considered. In addition, we evaluated methods for estimating the error variance of the impulse response. In analyses of synthetic and real data, good estimates were obtained when the Laplacian matrix was used as the weighting matrix and the bias-corrected estimate was used for the error variance. The method and procedure presented in this paper can play a robust and effective role in separating the hydrologic response.
