• Title/Summary/Keyword: 2-Step Clustering

Efficient Time-Series Similarity Measurement and Ranking Based on Anomaly Detection (이상탐지 기반의 효율적인 시계열 유사도 측정 및 순위화)

  • Ji-Hyun Choi;Hyun Ahn
    • Journal of Internet Computing and Services / v.25 no.2 / pp.39-47 / 2024
  • Time series analysis is widely employed by many organizations to solve business problems, as it extracts various information and insights from chronologically ordered data. Among its applications, measuring time series similarity is the step that identifies time series with similar patterns, and it is essential to applications such as time series search and clustering. In this study, we propose an efficient method for measuring time series similarity that focuses on anomalies rather than the entire series. We validate the proposed method by measuring and analyzing the rank correlation between the similarity measure computed on the subsets extracted by anomaly detection and the similarity measure computed on the whole time series. Experimental results, especially with stock time series data and an anomaly proportion of 10%, demonstrate a Spearman's rank correlation coefficient of up to 0.9. In conclusion, the proposed method can significantly reduce the computational cost of measuring time series similarity while providing reliable time series search and clustering results.
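The validation idea in this abstract can be sketched as follows: rank candidate series by distance to a query twice, once over all points and once over the anomalous points only, then compare the two rankings with Spearman's coefficient. The anomaly detector (largest deviation from the mean), the Euclidean distance, the synthetic data, and all function names are assumptions made for illustration; the abstract does not say which detector or similarity measure the authors used.

```python
import math
import random

def anomaly_indices(series, proportion=0.1):
    """Hypothetical detector: the `proportion` of points farthest from the mean."""
    mean = sum(series) / len(series)
    k = max(1, int(len(series) * proportion))
    by_deviation = sorted(range(len(series)),
                          key=lambda i: abs(series[i] - mean), reverse=True)
    return sorted(by_deviation[:k])

def euclidean(a, b, indices=None):
    """Euclidean distance over all points, or only over `indices` if given."""
    idx = range(len(a)) if indices is None else indices
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in idx))

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average = (pos + end) / 2 + 1
        for j in range(pos, end + 1):
            result[order[j]] = average
        pos = end + 1
    return result

def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks.
    Undefined (division by zero) if either input is constant."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Synthetic demo: candidates are the query plus noise of increasing strength.
random.seed(0)
query = [random.gauss(0, 1) for _ in range(200)]
candidates = [[q + random.gauss(0, s) for q in query] for s in (0.1, 0.5, 1.0, 2.0, 4.0)]
subset = anomaly_indices(query, proportion=0.1)
full_dist = [euclidean(query, c) for c in candidates]
subset_dist = [euclidean(query, c, subset) for c in candidates]
print("Spearman rho:", spearman(full_dist, subset_dist))
```

The subset distance touches only 10% of the points here, which is where the cost saving described in the abstract would come from.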

Determinants of Consumer Preference by type of Accommodation: Two Step Cluster Analysis (이단계 군집분석에 의한 농촌관광 편의시설 유형별 소비자 선호 결정요인)

  • Park, Duk-Byeong;Yoon, Yoo-Shik;Lee, Min-Soo
    • Journal of Global Scholars of Marketing Science / v.17 no.3 / pp.1-19 / 2007
  • 1. Purpose: Rural tourism is undertaken by individuals with different characteristics, needs and wants. It is important to have information on the characteristics and preferences of the consumers of the different types of existing rural accommodation. The study aims to identify the determinants of consumer preference by type of accommodation. 2. Methodology 2.1 Sample: Data were collected from 1,000 people by telephone survey with three-stage stratified random sampling in seven metropolitan areas in Korea. Respondents were chosen at a fixed sampling interval from a telephone book published in 2006. Calls were made between 4:00 and 10:30 p.m. so that the systematic sampling took respondents' life cycles into account. 2.2 Two-step cluster analysis: Our study uses a two-step cluster method to classify the accommodations into a reduced number of groups, so that each group constitutes a type. This method has been suggested as appropriate for clustering large data sets with mixed attributes. It is based on a distance measure that enables data with both continuous and categorical attributes to be clustered. The measure is derived from a probabilistic model in which the distance between two clusters is equivalent to the decrease in the log-likelihood function that results from merging them. 2.3 Multinomial Logit analysis: The estimation of a Multinomial Logit model determines the characteristics of the tourists most likely to opt for each type of accommodation. The Multinomial Logit model is an appropriate framework for exploring and explaining a choice process in which the choice set consists of more than two alternatives. Because its parameters are easy and quick to estimate, the model has been used in many empirical studies of choice in tourism. 3. Findings: The auto-clustering algorithm indicated that a five-cluster solution was the best model, because it minimized the BIC value and the change in BIC between adjacent numbers of clusters. 
The accommodation establishments can be classified into five types: Traditional House, Typical Farmhouse, Farmstay House for Group Tour, Log Cabin for Family, and Log Cabin for Individuals. Group 1 (Traditional House) mainly comprises large establishments, i.e. houses of the original construction style with Ondoll-style rooms, meals provided, and one shower room for family tourists. Group 2 (Typical Farmhouse) encompasses establishments with Ondoll rooms, each with its own bathroom, that provide meals. In other words, it includes the tourist accommodations known as "rural houses." Group 3 (Farmstay House for Group) comprises establishments with Ondoll rooms that do not provide meals, offer self-cooking facilities, and have large rooms for more than five persons. Group 4 (Log Cabin for Family) mainly comprises the popular establishments, i.e. Western-style log houses with Ondoll-style rooms and one shower room for family tourists. While the accommodations in this group are not defined by type of construction, the group does include all the original Korean-style construction. Finally, Group 5 (Log Cabin for Individuals) includes Western-style wooden houses with bedrooms, each with its own bathroom. First, a Multinomial Logit model is estimated that includes all the explanatory variables considered and takes accommodation group 2 as the base alternative. The results give the variables and the estimated parameter values of the model for the probability of choosing each of the five types of accommodation available in rural tourism villages in Korea, according to the socio-economic and trip-related characteristics of the individuals. An initial observation of the analysis reveals that none of the variables income, number of journeys, distance, and residential house style is explanatory in the choice of rural accommodation. The age and travel-companion variables are significant for accommodation establishments of group 1. 
The education and rural residential experience variables are significant for accommodation establishments of groups 4 and 5. The expenditure and marital status variables are significant for group 4. The gender and occupation variables are significant for group 3. The loyalty variable is significant for groups 3 and 4. The study indicates that significant differences exist among the individuals who choose each type of accommodation at a destination. From this investigation it is evident that a rural destination can attract several profiles of tourists, depending on the types of accommodation that exist there. In addition, the tourist profiles may be used as the basis for investment policy and promotion for each type of accommodation, making use in each case of the variables that indicate a greater likelihood of influencing the tourist's choice of accommodation.
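As an illustration of the Multinomial Logit setup described in this abstract, the sketch below computes choice probabilities for five accommodation groups with group 2 as the base alternative, whose utility is fixed at zero. The covariates and all coefficient values are hypothetical placeholders, not estimates from the study, and the function name is an assumption.

```python
import math

def multinomial_logit_probs(x, coefs, base):
    """Choice probabilities P(j) = exp(x . beta_j) / sum_k exp(x . beta_k).

    `coefs` maps each non-base alternative to its coefficient vector;
    the base alternative is normalized to utility 0.
    """
    utilities = {base: 0.0}
    for alternative, beta in coefs.items():
        utilities[alternative] = sum(b * v for b, v in zip(beta, x))
    # Subtract the maximum utility before exponentiating for numerical stability.
    top = max(utilities.values())
    exp_u = {alt: math.exp(u - top) for alt, u in utilities.items()}
    total = sum(exp_u.values())
    return {alt: e / total for alt, e in exp_u.items()}

# Hypothetical individual: [intercept, age in decades, travels with family (0/1)].
x = [1.0, 4.5, 1.0]
# Hypothetical coefficients for groups 1, 3, 4 and 5 relative to base group 2.
coefs = {
    "group1": [-1.0, 0.30, 0.20],
    "group3": [0.2, -0.10, -0.50],
    "group4": [-0.3, 0.05, 0.80],
    "group5": [0.1, -0.20, -0.40],
}
probs = multinomial_logit_probs(x, coefs, base="group2")
for alternative in sorted(probs):
    print(alternative, round(probs[alternative], 3))
```

A positive coefficient raises the odds of choosing that group relative to the base group, which is how significance statements such as "age is significant for group 1" should be read.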

Construction of web-based Database for Haliotis SNP (웹기반 전복류 (Haliotis) SNP 데이터베이스 구축)

  • Jeong, Ji-Eun;Lee, Jae-Bong;Kang, Se-Won;Baek, Moon-Ki;Han, Yeon-Soo;Choi, Tae-Jin;Kang, Jung-Ha;Lee, Yong-Seok
    • The Korean Journal of Malacology / v.26 no.2 / pp.185-188 / 2010
  • A web-based SNP database for the genus Haliotis was constructed on an Intel Server Platform ZSS130 with dual Xeon 3.2 GHz CPUs and a Linux-based (CentOS) operating system. Haliotis-related sequences (2,830 nucleotide sequences, 9,102 EST sequences) were downloaded through the NCBI taxonomy browser. To eliminate vector sequences, we carried out a vector masking step using the cross_match software with a vector sequence database. In addition, poly-A tails were removed using the trimest program from the EMBOSS package. The processed sequences were clustered and assembled with the TGICL package (TIGR tools), which includes the CAP3 software. A web-based interface (Haliotis SNP Database, http://www.haliotis.or.kr) was developed to enable optimal use of the clustered assemblies. The Clustering Res. menu shows the contig sequences from the clustering, the alignment results, and the sequences in each cluster. The BLAST menu allows any sequence to be compared with the Haliotis-related sequences. The search menu has its own search engine, so all of the information in the database can be searched by gene name, accession number and/or species name. Taken together, the web-based SNP database for Haliotis will be valuable for developing Haliotis SNPs in the future.
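The poly-A removal step in this pipeline can be illustrated with a deliberately simple sketch. It is not the EMBOSS algorithm: it only strips an exact trailing run of A's above a minimum length, whereas real trimming tools also tolerate mismatches. The function name, the `min_len` threshold, and the example sequences are assumptions.

```python
def trim_poly_a(seq, min_len=4):
    """Remove a trailing run of A's if the run is at least `min_len` long.

    Note: a genuine 3'-terminal A that directly precedes the tail is
    indistinguishable from the tail and is removed as well.
    """
    stripped = seq.rstrip("Aa")
    tail_length = len(seq) - len(stripped)
    return stripped if tail_length >= min_len else seq

# Hypothetical EST reads.
reads = ["ATGCGTACGTTAAAAAAAA", "ATGCGTACGTTAA", "GGCTTAGC"]
for read in reads:
    print(read, "->", trim_poly_a(read))
```

Trimming before clustering matters because shared poly-A tails would otherwise create spurious overlaps between unrelated ESTs.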

Genetic Diversity and Population Structure of Korean Soybean Landrace [Glycine max(L.) Merr.]

  • Cho, Gyu-Taek;Lee, Jeong-Ran;Moon, Jung-Kyung;Yoon, Mun-Sup;Baek, Hyung-Jin;Kang, Jung-Hoon;Kim, Tae-San;Paek, Nam-Chon
    • Journal of Crop Science and Biotechnology / v.11 no.2 / pp.83-90 / 2008
  • Two hundred and sixty Korean soybean landrace accessions were analyzed for polymorphism at 92 simple sequence repeat (SSR) loci. The 995 identified alleles served as raw data for estimating genetic diversity and population structure. The number of alleles at a locus ranged from three to 27 with a mean of 10.4 alleles per locus. $F_{ST}$ values estimated by analysis of molecular variance (AMOVA) using the SSR data set were 0.018, 0.027, and 0.016 for usage, collection site and maturity groups, respectively, indicating little genetic differentiation. The model-based clustering analysis placed the accessions into three clusters (K = 3) with an $F_{ST}$ of 0.0503, indicating moderate genetic differentiation. Duncan's Multiple Range Test at K = 3 on the basis of 18 quantitative traits revealed that one cluster was differentiated from the other two mainly by seed-related traits, and that the other two clusters were differentiated from each other by biochemical traits. The genetic structure of Korean soybean landraces was thus differentiated by model-based clustering and supported in part by their phenotypic traits. This preliminary study could be a first step towards more efficient germplasm management and utilization of soybean landraces, and could be helpful in association studies between genotypic and phenotypic traits in Korean soybean landraces.

Effect of Annealing of Nafion Recast Membranes Containing Ionic Liquids

  • Park, Jin-Soo;Shin, Mun-Sik;Sekhon, S.S.;Choi, Young-Woo;Yang, Tae-Hyun
    • Journal of the Korean Electrochemical Society / v.14 no.1 / pp.9-15 / 2011
  • Composite membranes comprising sulfonated polymers as the matrix and ionic liquids as the ion-conducting medium in place of water are studied to investigate the effect of annealing the sulfonated polymers. The polymeric membranes are prepared from recast Nafion containing the ionic liquid 1-ethyl-3-methylimidazolium tetrafluoroborate ($EMIBF_4$). The composite membranes are characterized by thermogravimetric analysis, ion conductivity and small-angle X-ray scattering. The composite membranes annealed at $190^{\circ}C$ for 2 h after the fixed drying step showed better ionic conductivity, but no significant increase in thermal stability. The mean Bragg distance between the ionic clusters, which is reflected in the position of the ionomer peak (small-angle scattering maximum), is larger in the annealed composite membranes containing $EMIBF_4$ than in the non-annealed ones. This may be explained by the different ion-clustering ability of the hydrophilic parts (i.e., sulfonic acid groups) in the non-annealed and annealed polymer matrices. In addition, the ionic conductivity is higher for the annealed composite membranes containing $EMIBF_4$. It can be concluded that annealing composite membranes containing ionic liquids increases their ion-clustering ability and thereby enhances ionic conductivity, making them suitable for potential use in proton exchange membrane fuel cells (PEMFCs) at medium temperatures ($150-200^{\circ}C$) in the absence of external humidification.
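The link between the ionomer peak position and the mean Bragg distance mentioned in this abstract is the standard relation d = 2π / q_max, where q_max is the scattering vector at the peak. The sketch below applies it to two hypothetical peak positions; the numeric values and the function name are illustrative assumptions, not data from the paper.

```python
import math

def bragg_distance(q_max):
    """Mean Bragg spacing d = 2*pi / q_max.

    The unit of d is the inverse of the unit of q_max
    (q in 1/nm gives d in nm).
    """
    if q_max <= 0:
        raise ValueError("q_max must be positive")
    return 2 * math.pi / q_max

# Hypothetical ionomer peak positions in 1/nm.
q_non_annealed = 2.0
q_annealed = 1.6
print("non-annealed d (nm):", round(bragg_distance(q_non_annealed), 2))
print("annealed d (nm):", round(bragg_distance(q_annealed), 2))
```

A peak that shifts to smaller q therefore corresponds to a larger spacing between ionic clusters, which is the direction of change the abstract reports for the annealed membranes.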

Ram Accelerator Optimization Using the Response Surface Method (반응면 기법을 이용한 램 가속기 최적설계에 관한 연구)

  • Jeon Yong-Hee;Jeon Kwon-Su;Lee Jae-Woo;Byun Yung-Hwan
    • 한국전산유체공학회:학술대회논문집 (Korean Society of Computational Fluids Engineering: Conference Proceedings) / 2000.05a / pp.159-165 / 2000
  • In this paper, a numerical study is carried out to improve the performance of the superdetonative ram accelerator and to optimize the design of the system. The objective function for optimizing the premixture composition is the ram tube length required to accelerate a projectile from the initial velocity $V_o$ to the target velocity $V_e$. The premixture is composed of $H_2,\;O_2,\;N_2$, and the mole numbers of these species are selected as design variables. RSM (Response Surface Methodology), which is widely used for complex optimization problems, is selected as the optimization technique. In particular, to handle the non-linearity of the response while preserving the accuracy and efficiency of the solution, a design space stretching technique is applied. A separate sub-optimization routine is introduced to determine the stretching position and clustering parameters that yield the optimum regression model. A two-step optimization technique is applied to obtain the optimal system. With the stretching technique, system optimization can be performed with a small number of experimental points, and a precise regression model can be constructed for a highly non-linear domain. The error relative to the analysis result is only $0.01\%$, which demonstrates that the present method can be applied to more practical design optimization problems with many design variables.
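The response-surface idea in this abstract can be sketched in one dimension: fit a second-order regression model to a few sampled design points, then minimize the fitted surface instead of the expensive analysis. The single design variable, the made-up response values, and the function names are illustrative assumptions; the paper's design space stretching and its three-variable premixture problem are not reproduced here.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = b0 + b1*x + b2*x**2 via the normal equations."""
    power_sums = [sum(x ** p for x in xs) for p in range(5)]
    moments = [sum(y * x ** p for x, y in zip(xs, ys)) for p in range(3)]
    # Augmented 3x4 system [X^T X | X^T y].
    a = [[power_sums[i + j] for j in range(3)] + [moments[i]] for i in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(col + 1, 3):
            factor = a[row][col] / a[col][col]
            for c in range(col, 4):
                a[row][c] -= factor * a[col][c]
    # Back substitution.
    b = [0.0, 0.0, 0.0]
    for i in reversed(range(3)):
        known = sum(a[i][j] * b[j] for j in range(i + 1, 3))
        b[i] = (a[i][3] - known) / a[i][i]
    return b

def surface_minimum(b):
    """Location of the minimum of the fitted quadratic (requires b2 > 0)."""
    if b[2] <= 0:
        raise ValueError("fitted surface has no interior minimum")
    return -b[1] / (2 * b[2])

# Hypothetical samples: one design variable vs. a made-up tube length.
design_points = [1.0, 2.0, 3.0, 4.0, 5.0]
tube_lengths = [(x - 3.0) ** 2 + 5.0 for x in design_points]
coefficients = fit_quadratic(design_points, tube_lengths)
print("coefficients:", [round(c, 6) for c in coefficients])
print("optimum design variable:", round(surface_minimum(coefficients), 6))
```

When the true response is strongly non-linear, a single quadratic fits poorly, which is the motivation the abstract gives for stretching the design space before fitting.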

Shallow Junction Device Formation and the Design of Boron Diffusion Simulator (박막 소자 개발과 보론 확산 시뮬레이터 설계)

  • Han, Myoung Seok;Park, Sung Jong;Kim, Jae Young
    • 대한공업교육학회지 (Journal of Korean Institute of Industrial Educators) / v.33 no.1 / pp.249-264 / 2008
  • In this dissertation, shallow $p^+-n$ junctions were formed by ion implantation and dual-step annealing processes, and a new simulator was designed to model boron diffusion in silicon. The simulator predicts the boron distribution after ion implantation and annealing. The dopant was implanted into crystalline substrates using $BF_2$ ions. The annealing was performed with an RTA (Rapid Thermal Annealing) process and an FA (Furnace Annealing) process. The model used in the simulator takes into account non-equilibrium diffusion, reactions of point defects and defect-dopant pairs including their charge states, and dopant inactivation through a boron clustering reaction. The FA+RTA annealing sequence exhibited better junction characteristics than the RTA+FA thermal cycle in terms of sheet resistance, and the simulator reproduced the experimental data successfully. Therefore, the proposed diffusion simulator and the FA+RTA annealing method could be applied to shallow junction formation for thermal budget processes.

System identification of a super high-rise building via a stochastic subspace approach

  • Faravelli, Lucia;Ubertini, Filippo;Fuggini, Clemente
    • Smart Structures and Systems / v.7 no.2 / pp.133-152 / 2011
  • System identification is a fundamental step towards the application of structural health monitoring and damage detection techniques. In this respect, the development of evolved identification strategies is a priority for obtaining reliable and repeatable baseline modal parameters of an undamaged structure to be adopted as references for future structural health assessments. The paper presents the identification of the modal parameters of the Guangzhou New Television Tower, China, using a data-driven stochastic subspace identification (SSI-data) approach complemented with an appropriate automatic mode selection strategy which proved to be successful in previous literature studies. This well-known approach is based on a clustering technique which is adopted to discriminate structural modes from spurious noise ones. The method is applied to the acceleration measurements made available within task I of the ANCRiSST benchmark problem, which cover 24 hours of continuous monitoring of the structural response under ambient excitation. These records are then subdivided into a convenient number of data sets, and the variability of the modal parameter estimates with ambient temperature and mean wind velocity is pointed out. Both 10-minute and 1-hour records are considered for this purpose. A comparison with finite element model predictions is finally carried out, using the structural matrices provided within the benchmark, in order to check that all the structural modes contained in the considered frequency interval are effectively identified via SSI-data.

Experimental Evaluation of Distance-based and Probability-based Clustering

  • Kwon, Na Yeon;Kim, Jang Il;Dollein, Richard;Seo, Weon Joon;Jung, Yong Gyu
    • International journal of advanced smart convergence / v.2 no.1 / pp.36-41 / 2013
  • Decision-making here means extracting information that can be acted on in the future; it refers to the process of discovering a new model induced from the data. In other words, much as a miner strips away rock to find a vein, it uncovers the relationships among patterns hidden in the data. This information is found by applying modeling techniques and sophisticated statistical analysis to the data in order to identify relationships among useful patterns. The process is called data mining, and it is a key technology for marketing databases. Accordingly, cluster analysis, which can extract information from large data sets without a clear criterion, is currently an active research area. The EM and K-means methods in particular are widely used. This study experimentally evaluates how the results of distance-based and probability-based analysis vary with the size of the data.
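The difference between the distance-based (K-means) and probability-based (EM) views compared in this abstract shows up in the assignment step. The sketch below contrasts a hard nearest-center assignment with the E-step posterior of a one-dimensional Gaussian mixture. The parameter values and function names are illustrative assumptions, and the update steps of both algorithms are omitted.

```python
import math

def hard_assign(x, centers):
    """K-means style assignment: index of the nearest center."""
    return min(range(len(centers)), key=lambda k: abs(x - centers[k]))

def soft_assign(x, means, sigmas, weights):
    """EM E-step for a 1-D Gaussian mixture: posterior probability of each component."""
    densities = [
        w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
        for m, s, w in zip(means, sigmas, weights)
    ]
    total = sum(densities)
    return [d / total for d in densities]

# Hypothetical two-cluster setting: a tight cluster at 0 and a wide one at 10.
point = 4.0
centers = [0.0, 10.0]
print("nearest center:", hard_assign(point, centers))
posterior = soft_assign(point, means=centers, sigmas=[1.0, 5.0], weights=[0.5, 0.5])
print("posterior:", [round(p, 4) for p in posterior])
```

In this setting the point is closer to the first center, yet the wide second component has the higher posterior, because the probability-based view accounts for cluster spread and the distance-based view does not.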

A Biclustering Method for Time Series Analysis

  • Lee, Jeong-Hwa;Lee, Young-Rok;Jun, Chi-Hyuck
    • Industrial Engineering and Management Systems / v.9 no.2 / pp.131-140 / 2010
  • Biclustering is a method of finding meaningful subsets of objects and attributes simultaneously, which may not be detected by traditional clustering methods. It is popularly used for the analysis of microarray data representing the expression levels of genes by conditions. Usually, biclustering algorithms do not consider a sequential relation between attributes. For time series data, however, bicluster solutions should keep the time sequence. This paper proposes a new biclustering algorithm for time series data by modifying the plaid model. The proposed algorithm introduces a parameter controlling an interval between two selected time points. Also, the pruning step preventing an over-fitting problem is modified so as to eliminate only starting or ending points. Results from artificial data sets show that the proposed method is more suitable for the extraction of biclusters from time series data sets. Moreover, by using the proposed method, we find some interesting observations from real-world time-course microarray data sets and apartment price data sets in metropolitan areas.