• Title/Summary/Keyword: Statistical similarity

Similarity of Sampling Sites by Water Quality (수질 관측지점 유사성 측정방법 연구)

  • Kwon, Se-Hyug;Lee, Yo-Sang
    • Communications for Statistical Applications and Methods
    • /
    • v.17 no.1
    • /
    • pp.39-45
    • /
    • 2010
  • As the value placed on the environment increases, water quality has become a matter of national and public interest. Water quality has been widely studied, but research has focused on geographical and river characteristics such as inflow, outflow, water quantity, and flow speed. In this paper, two approaches to measuring the similarity of sampling sites from water quality data are discussed and compared using two years of empirical data from Yongdam Dam. The existing method calculates similarities from principal component scores. The proposed approach uses the correlation matrix of water-quality variables and multidimensional scaling (MDS) to measure similarity. It is shown to be better in the sense that its clustering matches the geographical clustering, since it can take the time-series pattern of water quality into account.
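The correlation-plus-MDS idea in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so everything below is an assumption: one time series per site, the dissimilarity `sqrt(2 * (1 - r))` derived from the correlation `r`, classical (Torgerson) MDS, and synthetic data. The function names `classical_mds` and `site_coordinates` are hypothetical.

```python
import numpy as np

def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS: embed points so that Euclidean
    distances between rows approximate the entries of `dist`."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared distances to obtain a Gram matrix.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)

def site_coordinates(series, n_components=2):
    """series: array of shape (n_sites, n_times), one series per site.
    Sites whose series move together end up close in the embedding."""
    corr = np.corrcoef(series)
    # Assumed dissimilarity: 0 when r = 1, 2 when r = -1.
    dist = np.sqrt(np.clip(2.0 * (1.0 - corr), 0.0, None))
    return classical_mds(dist, n_components)

# Synthetic stand-in data (not from the paper): two pairs of sites
# that follow two different seasonal patterns, plus noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 4.0 * np.pi, 24)
pattern_a, pattern_b = np.sin(t), np.cos(t)
series = np.vstack([
    pattern_a + 0.1 * rng.standard_normal(t.size),
    pattern_a + 0.1 * rng.standard_normal(t.size),
    pattern_b + 0.1 * rng.standard_normal(t.size),
    pattern_b + 0.1 * rng.standard_normal(t.size),
])
coords = site_coordinates(series)
print(coords)
```

Sites 0 and 1 share a temporal pattern and land near each other in the embedding, while sites 2 and 3 form a separate group. A clustering of `coords` could then be compared with the geographical grouping of the sites.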

Antecedents of consumers' decision postponement on purchasing fast fashion brands (패스트 패션 브랜드에 대한 소비자 의사결정 연기의 선행변수)

  • Park, Hye-Jung
    • The Research Journal of the Costume Culture
    • /
    • v.22 no.5
    • /
    • pp.743-759
    • /
    • 2014
  • The purpose of this study is to identify the antecedents of consumers' decision postponement on purchasing fast fashion brands. Ongoing search behavior, overchoice confusion, and similarity confusion were considered as antecedents. It was hypothesized that ongoing search behavior influences decision postponement both directly and indirectly through overchoice confusion and similarity confusion. Data were gathered by surveying university students in Seoul, using convenience sampling. A total of 305 questionnaires were used in the statistical analysis, which comprised exploratory factor analysis using SPSS and confirmatory factor analysis and path analysis using AMOS. Factor analysis showed that ongoing search behavior, overchoice confusion, similarity confusion, and decision postponement were each unidimensional. Tests of the hypothesized paths showed that ongoing search behavior influences decision postponement indirectly through overchoice confusion. In addition, similarity confusion influences decision postponement. The results suggest some confusion reduction strategies for marketers of fast fashion brands. Suggestions for future study are also discussed.

Similarity Measurement Using Open-Ball Scheme for 2D Patterns in Comparison with Moment Invariant Method (Open-Ball Scheme을 이용한 2D 패턴의 상대적 닮음 정도 측정의 Moment Invariant Method와의 비교)

  • Kim, Seong-Su
    • The Transactions of the Korean Institute of Electrical Engineers A
    • /
    • v.48 no.1
    • /
    • pp.76-81
    • /
    • 1999
  • The degree of relative similarity between 2D patterns is obtained using the Open-Ball Scheme. The Open-Ball Scheme transforms the geometrical information of 3D objects or 2D patterns into features for measuring relative similarity in object (pattern) recognition, with invariance to scale, rotation, and translation. The feature of an object is used to obtain the relative similarity, which is mapped into the real interval [0, 1]. For decades, the Moment Invariant Method has been used as one of the excellent methods for pattern classification and object recognition. The Open-Ball Scheme uses the geometrical structure of patterns, while the Moment Invariant Method uses their statistical characteristics. The Open-Ball Scheme is compared with the Moment Invariant Method in terms of how each interprets two-dimensional pattern classification; in particular, the paradigms are compared by how closely they match human intuitive understanding. Finally, the effectiveness of the proposed Open-Ball Scheme is illustrated through simulations.

The Similarity Plot for Comparing Clustering Methods (군집분석 방법들을 비교하기 위한 상사그림)

  • Jang, Dae-Heung
    • The Korean Journal of Applied Statistics
    • /
    • v.26 no.2
    • /
    • pp.361-373
    • /
    • 2013
  • There is a wide variety of clustering algorithms, so we need a measure of similarity between two clustering methods. Such a measure can compare how well different clustering algorithms perform on a set of data. The more clustering algorithms are compared, the more values of the pairwise similarity measure there are. Thus, we need a simple tool that presents the many values of a similarity measure in order to compare many clustering methods. We suggest some graphical tools for comparing many clustering methods.
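To make the "many values" point concrete, here is a minimal sketch that computes one similarity value per pair of clustering methods. The abstract does not name its similarity measure; the Rand index is used here purely as an assumed stand-in, and the method names and label vectors are invented for illustration.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two clusterings agree
    (both put the pair together, or both keep it apart)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have the same length")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two points")
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

def pairwise_similarity(results):
    """results: dict mapping a method name to its label vector.
    Returns one Rand index per unordered pair of methods."""
    return {
        (m1, m2): rand_index(results[m1], results[m2])
        for m1, m2 in combinations(list(results), 2)
    }

# Hypothetical outputs of three clustering methods on six points.
results = {
    "method_a": [0, 0, 1, 1, 2, 2],
    "method_b": [1, 1, 0, 0, 2, 2],  # same partition, relabelled
    "method_c": [0, 0, 0, 1, 1, 1],
}
similarities = pairwise_similarity(results)
for pair, value in similarities.items():
    print(pair, round(value, 3))
```

With m methods there are m(m-1)/2 such values, which is why a plot becomes more practical than a table as m grows.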

A Sampling-based Algorithm for Top-${\kappa}$ Similarity Joins (Top-${\kappa}$ 유사도 조인을 위한 샘플링 기반 알고리즘)

  • Park, Jong Soo
    • Journal of KIISE:Databases
    • /
    • v.41 no.4
    • /
    • pp.256-261
    • /
    • 2014
  • The top-${\kappa}$ set similarity join problem is to find the top-${\kappa}$ pairs of records, ranked by similarity, between two sets of input records. We propose an efficient algorithm that returns the top-${\kappa}$ similarity join pairs using a sampling technique. From a sample of the input records, we construct a histogram of set similarity joins, and then compute from the histogram an estimated similarity threshold for the top-${\kappa}$ join pairs, within the error bound of a 95% confidence level based on statistical inference. Finally, the estimated threshold is applied to the traditional similarity join algorithm, which uses a min-heap structure to obtain the top-${\kappa}$ similarity joins. The experimental results show that the proposed algorithm performs well on large real datasets.
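The two stages described in this abstract, a sampled threshold followed by a min-heap join, can be sketched roughly as follows. This is not the paper's algorithm: the Jaccard measure, the brute-force nested loop, the random data, and the helper `estimate_threshold` (which replaces the histogram and 95% confidence bound with a crude rank-based guess) are all assumptions made for illustration.

```python
import heapq
import random

def jaccard(a, b):
    """Jaccard similarity of two sets (assumed similarity measure)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def top_k_join(left, right, k, threshold=0.0):
    """Return up to k (similarity, i, j) triples with
    similarity >= threshold, best first."""
    heap = []  # min-heap: heap[0] is the weakest pair kept so far
    for i, a in enumerate(left):
        for j, b in enumerate(right):
            sim = jaccard(a, b)
            if sim < threshold:
                continue  # pruned by the estimated threshold
            if len(heap) < k:
                heapq.heappush(heap, (sim, i, j))
            elif (sim, i, j) > heap[0]:
                heapq.heapreplace(heap, (sim, i, j))
    return sorted(heap, reverse=True)

def estimate_threshold(left, right, k, sample_size, rng):
    """Hypothetical helper: guess the k-th largest similarity
    from randomly sampled record pairs."""
    total_pairs = len(left) * len(right)
    sims = sorted(
        (jaccard(rng.choice(left), rng.choice(right))
         for _ in range(sample_size)),
        reverse=True,
    )
    # Expected number of sampled pairs that fall in the true top-k,
    # plus one rank of slack as a crude stand-in for a confidence bound.
    rank = min(sample_size - 1, int(k * sample_size / total_pairs) + 1)
    return sims[rank]

rng = random.Random(42)
left = [frozenset(rng.sample(range(30), 8)) for _ in range(40)]
right = [frozenset(rng.sample(range(30), 8)) for _ in range(40)]
k = 5

threshold = estimate_threshold(left, right, k, sample_size=200, rng=rng)
pruned = top_k_join(left, right, k, threshold)
print(threshold, pruned)
```

If the estimated threshold is too high, `pruned` holds fewer than k pairs, but those it holds are still the best ones. A proper error bound, as the abstract describes, is what controls that risk.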

Stochastic Self-similarity Analysis and Visualization of Earthquakes on the Korean Peninsula (한반도에서 발생한 지진의 통계적 자기 유사성 분석 및 시각화)

  • JaeMin Hwang;Jiyoung Lim;Hae-Duck J. Jeong
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.12 no.11
    • /
    • pp.493-504
    • /
    • 2023
  • The Republic of Korea is located far from plate boundaries, and the intraplate earthquakes that occur in such areas are generally smaller and less frequent than interplate earthquakes. Nevertheless, as a result of investigating and analyzing earthquakes that occurred on the Korean Peninsula between the year 2 and 1904, together with recently observed earthquakes on the Korean Peninsula, it was found that of a magnitude of 9. In this paper, the Korean Peninsula Historical Earthquake Record (year 2 to 1904) published by the National Meteorological Research Institute is used to analyze the relationship between earthquakes on the Korean Peninsula and statistical self-similarity. This study is the first to investigate the relationship between earthquake data from the Korean Peninsula and statistical self-similarity. Measuring the degree of self-similarity of earthquakes on the Korean Peninsula with three quantitative estimation methods showed that the self-similarity parameter H (0.5 < H < 1) was above 0.8 on average, indicating a high degree of self-similarity. Graph visualization makes it easy to see in which regions earthquakes occur most often, and the results are expected to be useful in developing a prediction system that can predict damage from future earthquakes and minimize damage to property and people, as well as in earthquake data analysis and modeling research. Based on the findings of this study, the self-similar process is expected to help in understanding the patterns and statistical characteristics of seismic activity, in grouping and classifying similar seismic events, and in seismic activity prediction, seismic risk assessment, and seismic engineering.

Decision Tree Based Context Clustering with Cross Likelihood Ratio for HMM-based TTS (HMM 기반의 TTS를 위한 상호유사도 비율을 이용한 결정트리 기반의 문맥 군집화)

  • Jung, Chi-Sang;Kang, Hong-Goo
    • The Journal of the Acoustical Society of Korea
    • /
    • v.32 no.2
    • /
    • pp.174-180
    • /
    • 2013
  • This paper proposes a decision tree based context clustering algorithm for HMM-based speech synthesis systems using the cross likelihood ratio with a hierarchical prior (CLRHP). Conventional algorithms tie the context-dependent HMM states that have similar statistical characteristics, but they do not consider the statistical similarity of split child nodes, so a statistical difference between the final leaf nodes is not guaranteed. The proposed CLRHP algorithm improves the reliability of model parameters by adopting a criterion that minimizes the statistical similarity of split child nodes. Experimental results verify the superiority of the proposed approach over conventional ones.

Statistical Characteristics of Self-similar Data Traffic (자기유사성을 갖는 데이터 트래픽의 통계적인 특성)

  • Koo Hye-Ryun;Hong Keong-Ho;Lim Seog-Ku
    • Proceedings of the Korea Contents Association Conference
    • /
    • 2005.05a
    • /
    • pp.410-415
    • /
    • 2005
  • Recent measurements of local-area and wide-area traffic have shown that network traffic exhibits self-similarity over a wide range of time scales. Self-similarity is expressed as long-range dependence, which contradicts the Poisson model and its relatively short-range dependence. Therefore, the design and dimensioning of next-generation communication networks first require a traffic model that reflects burstiness and self-similarity. Self-similarity can be characterized by the Hurst parameter. In this paper, the change in the Hurst parameter when many different data traffic streams are aggregated and arrive at a communication network under various environments is analyzed and compared with simulation results.
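As a rough illustration of how a Hurst parameter H can be estimated from a traffic trace, the sketch below uses the aggregated variance method. The abstract does not say which estimator the authors used, so this choice is an assumption, as are the function name `hurst_aggregated_variance`, the block sizes, and the synthetic Poisson trace. The method relies on the variance of block means scaling as m^(2H-2) for block size m.

```python
import numpy as np

def hurst_aggregated_variance(series, block_sizes):
    """Estimate H from the slope of log Var(block means) vs log m.
    For a self-similar process the slope is 2H - 2."""
    series = np.asarray(series, dtype=float)
    log_m, log_var = [], []
    for m in block_sizes:
        n_blocks = series.size // m
        if n_blocks < 2:
            continue  # too few blocks to estimate a variance
        blocks = series[: n_blocks * m].reshape(n_blocks, m)
        log_m.append(np.log(m))
        log_var.append(np.log(blocks.mean(axis=1).var(ddof=1)))
    if len(log_m) < 2:
        raise ValueError("need at least two usable block sizes")
    slope, _intercept = np.polyfit(log_m, log_var, 1)
    return 1.0 + slope / 2.0

# Synthetic baseline (not from the paper): independent Poisson counts
# per time slot, i.e. traffic with no long-range dependence.
rng = np.random.default_rng(1)
poisson_counts = rng.poisson(lam=5.0, size=2 ** 14)
block_sizes = [2 ** i for i in range(1, 9)]
h_poisson = hurst_aggregated_variance(poisson_counts, block_sizes)
print(round(h_poisson, 2))
```

For the independent Poisson baseline the estimate should sit near 0.5, while a measured self-similar trace would be expected to give a value between 0.5 and 1. That gap is the contrast with the Poisson model that the abstract describes.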

Web Page Similarity based on Size and Frequency of Tokens (토큰 크기 및 출현 빈도에 기반한 웹 페이지 유사도)

  • Lee, Eun-Joo;Jung, Woo-Sung
    • Journal of Information Technology Services
    • /
    • v.11 no.4
    • /
    • pp.263-275
    • /
    • 2012
  • Web applications are becoming hard to maintain because of the high complexity and duplication of web pages. However, most research on code clones focuses on code hunks, and its target is limited to a specific language. Thus, we propose GSIM, a language-independent statistical approach to detecting similar pages based on the scarcity and frequency of customized tokens. The tokens, which are obtained from pages split by a set of given separators, are defined as atomic elements for calculating the similarity between two pages. In this paper, the domain definition for web applications and the algorithms for collecting tokens, building matrices, and calculating similarity are given. We also conducted experiments on open source code for evaluation with our GSIM tool. The results show the applicability of the proposed method and the effects of parameters such as threshold, toughness, and token length on quality and performance.

A Document Ranking Method by Document Clustering Using Bayesian SOM and Bootstrap (베이지안 SOM과 붓스트랩을 이용한 문서 군집화에 의한 문서 순위조정)

  • Choe, Jun-Hyeok;Jeon, Seong-Hae;Lee, Jeong-Hyeon
    • The Transactions of the Korea Information Processing Society
    • /
    • v.7 no.7
    • /
    • pp.2108-2115
    • /
    • 2000
  • Although conventional Boolean retrieval systems based on the vector space model return results quickly, they cannot exactly reflect the user's retrieval intent, including semantic information. Consequently, the retrieval results are very different from what users expect, which forces users to waste much time finding the expected documents among those retrieved. In this paper, we design a Bayesian SOM (Self-Organizing feature Map) that combines Bayesian statistical methods with the Kohonen network, a kind of unsupervised learning, and use it to classify documents in real time according to their semantic similarity to the user query. When there are fewer than 30 documents to cluster, making statistical characteristics difficult to observe, the number of documents must be increased to at least 50. Also, to give a high rank to the documents that are semantically most similar to the user query among the generalized clusters, we compute the similarity using the Kohonen centroid of each document class and adjust the secondary rank according to that similarity.
