• Title/Summary/Keyword: K-means cluster analysis (K-평균 군집분석)


Stratification Method Using κ-Spatial Medians Clustering (κ-공간중위 군집방법을 활용한 층화방법)

  • Son, Soon-Chul;Jhun, Myoung-Shic
    • The Korean Journal of Applied Statistics / v.22 no.4 / pp.677-686 / 2009
  • Stratification of a population is widely used to improve the efficiency of estimation in sample surveys. However, it causes several problems when some variables contain outliers. To overcome these problems, Park and Yun (2008) proposed a rather subjective method that finds outliers before applying $\kappa$-means clustering for stratification. In this study, we propose the $\kappa$-spatial medians clustering method, which is more robust than the $\kappa$-means clustering method and does not require finding outliers in advance. We investigate the characteristics of the proposed method through the case study used in Park and Yun (2008) and confirm its efficiency.
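
The robustness argument in the abstract above can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes the spatial median is approximated with Weiszfeld's iteration and that clustering alternates nearest-center assignment with spatial-median updates. The function names, the toy data, and the starting centers are all hypothetical.

```python
import math

def spatial_median(points, iters=100, eps=1e-9):
    """Approximate the spatial (geometric) median with Weiszfeld's iteration."""
    dim = len(points[0])
    # Start from the coordinate-wise mean.
    m = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    for _ in range(iters):
        num = [0.0] * dim
        den = 0.0
        for p in points:
            # Weight each point by the inverse of its distance to the estimate;
            # the max() guards against division by zero.
            w = 1.0 / max(math.dist(p, m), eps)
            den += w
            for d in range(dim):
                num[d] += w * p[d]
        new_m = [x / den for x in num]
        if math.dist(new_m, m) < eps:
            return new_m
        m = new_m
    return m

def k_spatial_medians(points, centers, rounds=20):
    """Alternate nearest-center assignment and spatial-median updates."""
    centers = [list(c) for c in centers]
    clusters = []
    for _ in range(rounds):
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[j].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [spatial_median(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical data: two tight groups plus one outlier at (60, 60).
data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0),
        (10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (11.0, 11.0),
        (60.0, 60.0)]
centers, clusters = k_spatial_medians(data, [(0.0, 0.0), (10.0, 10.0)])
print(centers)
```

In this toy run the outlier joins the second group, yet that group's center stays inside the square spanned by its four regular points, whereas the group mean would be pulled to (20.4, 20.4).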

Evaluation of horticultural traits and genetic relationship in melon germplasm (멜론 유전자원의 원예형질 특성 및 유연관계 분석)

  • Jung, Jaemin;Choi, Sunghwan;Oh, Juyeol;Kim, Nahui;Kim, Daeun;Son, Beunggu;Park, Younghoon
    • Journal of Plant Biotechnology / v.42 no.4 / pp.401-408 / 2015
  • Horticultural traits and genetic relationships were evaluated for 83 melon (Cucumis melo L.) cultivars. A total of 36 characteristics of the seedling, leaf, stem, flower, fruit, and seed were surveyed, followed by multivariate analysis of variance (MANOVA). Principal component analysis (PCA) showed that 8 principal components, including fruit weight, fruit length, fruit diameter, cotyledon length, seed diameter, and seed length, accounted for 76.3% of the total variance. Cluster analysis of the 83 melon cultivars using the average linkage method resulted in 5 clusters at a coefficient of 0.7. Cluster I consisted of cultivars with high values for fruit-related traits, Cluster II of cultivars with high soluble solid content, and Cluster V of cultivars with a high ripening rate. The 83 cultivars were genotyped using 15 expressed sequence tag-simple sequence repeat (EST-SSR) markers from the International Cucurbit Genomics Initiative (ICuGI) database. Analysis of genetic relatedness by UPGMA resulted in 6 clusters. A Mantel test indicated that the correlation between morphological and genetic distance was very low (r = -0.11).

Classification of Terrestrial LiDAR Data Using Factor and Cluster Analysis (요인 및 군집분석을 이용한 지상 라이다 자료의 분류)

  • Choi, Seung-Pil;Cho, Ji-Hyun;Kim, Yeol;Kim, Jun-Seong
    • Journal of Korean Society for Geospatial Information Science / v.19 no.4 / pp.139-144 / 2011
  • This study proposed a method for classifying LiDAR data that uses both the color information (R, G, B) and the reflection intensity information (I) obtained from terrestrial LiDAR and analyzes the association between these data with statistical classification methods. To this end, the factors that maximize variance were first calculated from the variables R, G, B, and I, and the factor matrix between the principal factors and each variable was computed. Although the factor matrix summarizes the data by reducing them, it does not show clearly which variables are strongly associated with which factors; therefore, the Varimax method, an orthogonal rotation, was used to obtain the rotated factor matrix, and the factor scores were then calculated. Finally, a cluster analysis was performed on the factor scores using the K-means method, a non-hierarchical clustering method, and the classification accuracy of the terrestrial LiDAR data was evaluated.
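
The final clustering step described above can be sketched as plain K-means (Lloyd's algorithm) on factor scores. This is an illustrative sketch only: the factor extraction and Varimax rotation are not reproduced, and the two-factor scores, the starting centers, and the function name are hypothetical.

```python
import math

def k_means(points, centers, rounds=50):
    """Lloyd's algorithm: assign each point to its nearest center, then recompute means."""
    centers = [list(c) for c in centers]
    labels = []
    for _ in range(rounds):
        new_labels = [min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave the center unchanged if its cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical factor scores (factor 1, factor 2) for six LiDAR points.
scores = [(-1.2, 0.3), (-1.0, 0.1), (-0.9, 0.4),
          (1.1, -0.2), (0.9, -0.4), (1.3, 0.0)]
labels, centers = k_means(scores, centers=[scores[0], scores[3]])
print(labels, centers)
```

Because K-means is sensitive to the starting centers, a real analysis would normally repeat the run from several initializations and keep the solution with the smallest within-cluster sum of squares.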

A Study on the Impacts of Truck Platooning on Freeway Traffic-Flow and the Effect of Dedicated Lane (고속도로 화물차의 군집주행이 교통류에 미치는 영향 및 전용차로 효과 연구)

  • KIM, Joohye;Lee, YoungIhn
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.19 no.5 / pp.52-69 / 2020
  • Considering the need for an infrastructure-level review, this study analyzed the impact of truck platooning on freeway traffic flow and the effect of dedicated lanes under domestic road and traffic conditions. The results show that the higher the traffic volume and truck ratio, the higher the ratio of platoons and the larger the platoons that form, which in turn produces a greater increase in the average speed of the network. Routes with heavy traffic and heavy cargo traffic are therefore good candidates for truck platooning. However, the analysis also showed that an increase in the average speed of the entire network is hard to expect when queues form due to entries and exits, and that overall network throughput could decrease. Traffic operation strategies for the access roads, such as securing capacity at the connections, are therefore needed to maximize the effect of truck platooning. As for dedicated lanes, a positive effect appeared only when one lane was operated entirely by automated trucks under 100% MPR, which yielded improvements in all aspects, such as higher average speed, higher throughput, and reduced conflict rates.

Plant Community Structure of Pinus densiflora S. et Z. Forest in the Geumjeongsan (Mt.), Busan Metropolitan City (부산광역시 금정산 소나무림 식생구조 연구)

  • Lee, Kyoung-Jae;Kwak, Jeong-In;Kwak, Nam-Hyun;Jang, Jong-Soo
    • Korean Journal of Environment and Ecology / v.27 no.4 / pp.462-472 / 2013
  • This study was carried out to provide basic data for preserving the Pinus densiflora forest as a cultural landscape forest by analyzing the characteristics of the plant community of the P. densiflora forest on Geumjeongsan (mountain) in Busan. To analyze the plant community, we set up 10 study plots inside and 8 plots outside Geumjeongsansung (mountain fortress, hereinafter 'Sansung'), a total of 18 plots (unit area: $400\,\mathrm{m}^2$). TWINSPAN analysis divided these 18 study plots into 6 communities: the Quercus serrata-P. densiflora community, P. densiflora community, P. densiflora-Q. serrata community, P. thunbergii-P. densiflora community, P. densiflora-P. thunbergii-Q. acutissima community, and P. densiflora-Platycarya strobilacea community. The importance percentage (I.P.) of each area and the DBH class distribution of the main species showed that the P. densiflora community would succeed to a Q. serrata community or a C. tschonoskii community. Tree age analysis found that communities inside the Sansung were 32 to 37 years old and those outside the Sansung were 44 to 57 years old. Shannon's species diversity index ranged from 0.4826 to 1.2499. Regarding correlation between species, P. densiflora was negatively correlated with Styrax japonica. Based on these results, we expect ecological succession from the P. densiflora community to a Q. serrata community inside the Sansung. Outside the Sansung, succession from the P. densiflora-P. thunbergii community to a C. tschonoskii-Q. serrata community is expected. To manage the P. densiflora forest as a cultural landscape forest, Quercus spp. in the understory and shrub layers and deciduous broad-leaved trees should be managed. Crown management of deciduous broad-leaved trees competing with P. densiflora is also required.
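
For readers unfamiliar with the Shannon species diversity index reported above, the usual definition is H' = -Σ p_i log p_i, where p_i is the proportion of species i. The sketch below assumes the natural logarithm; the abstract does not state which base the authors used, and the counts are hypothetical rather than taken from the study plots.

```python
import math

def shannon_index(counts, base=math.e):
    """Shannon diversity H' = -sum(p_i * log(p_i)) over species with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total, base) for c in counts if c > 0)

# Hypothetical stem counts for four species in one plot.
plot_counts = [30, 10, 5, 5]
print(round(shannon_index(plot_counts), 4))
```

A plot dominated by a single species gives a value near 0, while evenly shared plots give larger values, so the log base must be known before comparing indices across studies.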

Correlation Analysis between Climatic Factors and Radial Growth and Growth Prediction for Pinus densiflora and Larix kaempferi in South Korea (소나무와 일본잎갈나무의 연륜생장과 기후 요소와의 상관관계 분석 및 생장예측)

  • Chung, Junmo;Kim, Hyunseop;Kim, Meesook;Chun, Yongwoo
    • Journal of Korean Society of Forest Science / v.106 no.1 / pp.77-86 / 2017
  • This study was conducted to analyze the relationship between climatic factors and the radial growth of Pinus densiflora and Larix kaempferi in South Korea. To determine the climate-growth relationship, cluster analysis was applied to group the surveyed regions by climatic similarity, and a dendroclimatological model was developed to predict radial growth for each climate group under the RCP 4.5 and RCP 8.5 greenhouse gas scenarios. The cluster analysis yielded four climatic clusters (Clusters 1 to 4) from 10 regions for P. densiflora and L. kaempferi. The dendroclimatological model was developed from climatic variables and the standardized residual chronology for each climatic cluster of P. densiflora and L. kaempferi. Four climatic variables were used in the models for P. densiflora ($R^2$ values between 0.38 and 0.58). Two to five climatic variables were used in the models for L. kaempferi ($R^2$ values between 0.31 and 0.43). Growth simulations under the two RCP climate-change scenarios (RCP 4.5 and RCP 8.5) were used for growth prediction. The radial growth of Cluster 4 of P. densiflora, the mountainous region at high elevation, tends to increase, while that of Clusters 2 and 3 of P. densiflora, the regions with the highest average temperature, tends to decrease. The radial growth of Cluster 1 of L. kaempferi, the region with the lowest minimum temperature, tends to increase, while that of Cluster 2, the region with the highest average temperature, tends to decrease. The radial growth of Cluster 3 of L. kaempferi, the region on the east coast, and Cluster 4, the region at high elevation, tends to hold steady. The results of this study are expected to provide valuable information for predicting changes in the radial growth of Pinus densiflora and Larix kaempferi caused by climate change.

Statistical methods for testing tumor heterogeneity (종양 이질성을 검정을 위한 통계적 방법론 연구)

  • Lee, Dong Neuck;Lim, Changwon
    • The Korean Journal of Applied Statistics / v.32 no.3 / pp.331-348 / 2019
  • Understanding tumor heterogeneity caused by differences in the growth pattern and rate of change of metastatic tumors is important for understanding the sensitivity of tumor cells to drugs and for finding appropriate therapies. When the groups among N samples are distinct, differences in population means can often be tested using a t-test or ANOVA. However, these statistical methods cannot be used when the groups are not distinguished, as in the data covered in this paper. Statistical methods have been studied to test heterogeneity between samples, and the minimum combination t-test method is one of them. In this paper, we propose a maximum combination t-test method that takes into account combinations that bisect the data at different ratios. We also propose a method based on the idea that examining the heterogeneity of a sample is equivalent to testing whether the optimal number of clusters in a cluster analysis is one. Based on a simulation study, we verified that the proposed methods, the maximum combination t-test method and the gap statistic, have better type I error and power than the previously proposed method, and we applied them to real data.

Analysis of Types and Characteristics of Clothing Lifestyle of the New Forty Generation

  • Bok, Mi-Jung;Hong, Eun-Sil
    • Journal of the Korea Society of Computer and Information / v.25 no.9 / pp.151-158 / 2020
  • The purpose of this study was to categorize the clothing lifestyles of 394 male office workers in their 30s and 50s and to analyze the characteristics of each type. The data were analyzed with PASW 18.0 using frequency analysis, k-means cluster analysis, one-way ANOVA, and crosstabs analysis. First, the clothing lifestyles were divided into 4 types: fashion leader (22.3%), price sensitive (12.2%), fashion indifference (27.9%), and normcore fashion (37.6%). Second, the clothing lifestyle types differed significantly by age, marital status, job, and monthly average household income among the socio-economic variables. Third, the clothing lifestyle types differed significantly in monthly average appearance care cost, number of suits owned, monthly average clothing purchase cost, and average purchase cost of one suit.

Multivariate Analysis of Variation of Growth and Quality Characteristics in Colored Rice Germplasm (유색미 도입 유전자원의 생육 및 품질특성 변이 다변량 분석)

  • Park, Jong-Hyun;Lee, Ji-Yoon;Chun, Jae-Buhm;You, Oh-Jong;Son, Eun-Ho
    • Korean Journal of Crop Science / v.63 no.3 / pp.175-185 / 2018
  • The aim of this study was to evaluate the variation of growth and quality characteristics in 178 colored rice accessions and to develop useful basic data for rice breeding by classifying these germplasm characteristics via principal component (PC) analysis. Among the 178 colored rice accessions, the coefficient of variation was highest for panicle length (PL) and protein content, followed by length-width ratio (LWR), 1000-grain weight (TGW), culm length (CL), and amylose content, and lowest for the number of panicles per hill (NP), which is a yield component. The PC analysis gave the following eigenvalues and contributions for each PC: PC1, 2.06 and 29.49%; PC2, 1.31 and 18.75%; PC3, 1.21 and 17.36%; PC4, 1.01 and 14.38%. The eigenvalues of the four PCs were over 1.0, and their cumulative contribution was 79.98%, which satisfies the necessary condition for evaluating the 178 colored rice accessions. Cluster analysis showed that cluster I was the largest, with 79 accessions, while clusters II, III, IV, V, VI, and VII comprised 46, 19, 13, 4, 8, and 9 accessions, respectively. Dark brown accessions were dispersed in clusters I and II, and many accessions with purple seed coat color were found in clusters V, VI, and VII. In particular, cluster V contained only accessions with black and purple seed coat colors. Accessions in cluster VII had relatively small average CL, PL, and LWR; cluster V had the smallest average TGW, and cluster IV had the lowest NP but the highest TGW. Finally, considering yield potential, growth characteristics, heading stage, and color in the breeding of colored rice, we reached the following conclusions: cluster VII is suitable for breeding colored rice; cross breeding among clusters I, II, and VII has a high yield potential; and a superior color can be produced by cross breeding plants from clusters V and VI.
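
The PC selection reasoning in the abstract above can be checked with a few lines of arithmetic. The sketch uses only the eigenvalues and contributions quoted there; reading "eigenvalues over 1.0" as the common eigenvalue-greater-than-one retention rule is an assumption, and the variable names are illustrative.

```python
# Eigenvalues and contributions (%) of PC1 to PC4 as quoted in the abstract.
eigenvalues = [2.06, 1.31, 1.21, 1.01]
contributions = [29.49, 18.75, 17.36, 14.38]

# Retain components whose eigenvalue exceeds 1.0.
retained = [ev for ev in eigenvalues if ev > 1.0]

# Running total of the explained variance.
cumulative = []
running = 0.0
for c in contributions:
    running += c
    cumulative.append(round(running, 2))

print(len(retained), cumulative)
```

The running total ends at 79.98, which matches the cumulative contribution reported in the abstract.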

Analysis of COVID-19 Context-awareness based on Clustering Algorithm (클러스터링 알고리즘기반의 COVID-19 상황인식 분석)

  • Lee, Kangwhan
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.5 / pp.755-762 / 2022
  • This paper proposes a clustering algorithm that enables more efficient learning-based prediction of COVID-19 disease within clusters using context-aware attribute information. Typically, clustering of COVID-19 diseases classifies the interrelationships within disease cluster information during the clustering process. The clustered data become a degrading factor if new or newly processed information is treated as a contaminating factor in the comparative interrelationship information. In this paper, we address this problem and develop a clustering algorithm that can extract disease correlation information using the K-means algorithm. Using accumulated information and interrelationship clustering, the proposed algorithm analyzes the feasible disease correlation clusters and their center points according to the attributes of the disease clusters. Simulation results show that the proposed algorithm improves the adaptability and prediction accuracy of the classification management system when multiple COVID-19 disease attributes are learned as a group.