• Title/Summary/Keyword: K-평균 군집분석 (K-means cluster analysis)


Multi-variate Statistical Analysis for Evaluation of Water Quality Properties in Korean Rural Watershed (농촌유역의 수질평가를 위한 다변량분석 기법의 이용)

  • Kim, Jin-Ho;Choi, Chul-Mann;Kim, Won-Il;Lee, Jong-Sik;Jung, Goo-Bok;Han, Kuk-Heon;Ryu, Jong-Soo;Lee, Jung-Taek;Kwun, Soon-Kuk
    • Korean Journal of Environmental Agriculture / v.26 no.1 / pp.17-24 / 2007
  • This study was carried out to classify streams in rural watersheds by their water quality characteristics. Water quality data from 319 streams in rural watersheds in Korea were selected, and multivariate analysis was applied. The analysis was set up as 5 cases, and factor analysis and cluster analysis were then performed. The first factor (organic matter and nutrients) accounted for more than 40% of the total variation in water quality in the rural watersheds. Cluster analysis was then carried out on the factors extracted by factor analysis. Case 1 and Case 2 were classified into 4 communities, Case 3 into 5 communities, and Cases 4 and 5 into 3 communities. Among the 5 cases, Case 4, which uses 7 water quality items, was selected as a desirable case for classifying the streams of rural watersheds. Many kinds of statistical analysis can be used to classify the streams of rural watersheds. Our results provide a good example of how to evaluate water quality properties in Korean rural watersheds.

Creation and clustering of proximity data for text data analysis (텍스트 데이터 분석을 위한 근접성 데이터의 생성과 군집화)

  • Jung, Min-Ji;Shin, Sang Min;Choi, Yong-Seok
    • The Korean Journal of Applied Statistics / v.32 no.3 / pp.451-462 / 2019
  • A document-term frequency matrix is a type of data used in text mining. This matrix is often based on various documents provided by the objects to be analyzed. When analyzing objects using this matrix, researchers generally select as keywords only the terms that are common to the documents belonging to one object, and then use those keywords to analyze the object. However, this method loses the unique information of individual documents and also removes potential keywords that occur frequently in a specific document. In this study, we define data that can overcome this problem as proximity data. We introduce twelve methods that generate proximity data and cluster the objects using two clustering methods, multidimensional scaling and k-means cluster analysis. Finally, we choose the method best suited to clustering the objects.

Groundwater-use Estimation Method Based on Field Monitoring Data in South Korea (실측 자료에 기반한 우리나라 지하수의 용도별 이용량 추정 방법)

  • Kim, Ji-Wook;Jun, Hyung-Pil;Lee, Chan-Jin;Kim, Nam-Ju;Kim, Gyoo-Bum
    • The Journal of Engineering Geology / v.23 no.4 / pp.467-476 / 2013
  • With increasing interest in environmental issues and the quality of surface water becoming inadequate for water supply, the Korean government has launched a groundwater development policy to satisfy the demand for clean water. To drive this policy effectively, it is essential to guarantee the accuracy of estimates of sustainable groundwater yield and groundwater use. In this study, groundwater use was monitored over several years at various locations in Korea (32 cities/counties in 5 provinces) to obtain accurate groundwater use data. Statistical analysis of the results was performed as a basis for rationally estimating groundwater use. For groundwater used for living purposes, we classified the cities/counties into three regional types (urban, rural, and urban-rural complex) and divided the groundwater facilities into five types (domestic use, apartment housing, small-scale water supply, schools, and businesses) according to use. For agricultural use, we defined three regional types based on rainfall intensity (average rainfall, below-average rainfall, and above-average rainfall) and divided the facilities into six types (rice farming, dry-field farming, floriculture, livestock-cows, livestock-pigs, and livestock-chickens). Finally, we developed groundwater-use estimation equations for each region and use type, using cluster analysis and regression model analysis of the monitoring data. The results will enhance the reliability of national groundwater statistics.

An Empirical Comparison and Verification Study on the Containerports Clustering Measurement Using K-Means and Hierarchical Clustering(Average Linkage Method Using Cross-Efficiency Metrics, and Ward Method) and Mixed Models (K-Means 군집모형과 계층적 군집(교차효율성 메트릭스에 의한 평균연결법, Ward법)모형 및 혼합모형을 이용한 컨테이너항만의 클러스터링 측정에 대한 실증적 비교 및 검증에 관한 연구)

  • Park, Ro-Kyung
    • Journal of Korea Port Economic Association / v.34 no.3 / pp.17-52 / 2018
  • The purpose of this paper is to measure the clustering change and analyze empirical results. Additionally, by using k-means, hierarchical, and mixed models on Asian container ports over the period 2006-2015, the study aims to form a cluster comprising Busan, Incheon, and Gwangyang ports. The models consider the number of cranes, depth, berth length, and total area as inputs and container twenty-foot equivalent units (TEU) as output. The main empirical results are as follows. First, the ranking order according to the increasing ratio during the 10-year analysis shows that the values for average linkage (AL), mixed Ward, rule of thumb (RT) & elbow, Ward, and mixed AL are 42.04% up, 35.01% up, 30.47% up, and 23.65% up, respectively. Second, according to the RT and elbow models, the three Korean ports can be clustered with Asian ports in the following manner: Busan Port (Hong Kong, Guangzhou, Qingdao, and Singapore), Incheon Port (Tokyo, Nagoya, Osaka, Manila, and Bangkok), and Gwangyang Port (Guangzhou, Ningbo, Qingdao, and Kaohsiung). Third, the optimal numbers of clusters are as follows: AL (6), Mixed Ward (5), RT & elbow (4), Ward (5), and Mixed AL (6). Fourth, the empirical clustering results match those of the questionnaire as follows: Busan Port (80%), Incheon Port (17%), and Gwangyang Port (50%). The policy implication is that related parties of Korean seaports should introduce port improvement plans such as benchmarking the clustered seaports.

WAVE System Performance for Platooning Vehicle Service Requirements Under Highway Environments (고속도로 환경에서 군집주행 서비스 요구사항에 대한 WAVE 통신시스템 성능 분석)

  • Song, Yoo-seung;Choi, Hyun Kyun
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.16 no.1 / pp.147-156 / 2017
  • This paper analyzes the performance limits of the WAVE system against platooning service requirements drawn from de facto standards. Packet error rate and mean delay, which are key parameters in wireless communication systems, must meet these requirements to keep platooning vehicles safe. The test scenarios consider the following vehicle groups: platooning vehicles, vehicles within one-hop distance, and vehicles within two-hop distance (called hidden node vehicles). The models of packet error rate and delay account for the topology of these vehicle groups, vehicle speed, and communication range. Numerical results are obtained in terms of packet size, packet arrival rate, and data transmission rate. Finally, this paper suggests the robust range of packet error rate and delay for the WAVE system to provide the platooning vehicle service.

Differences in the Community Structures of Macrobenthic Polychaetes from Farming Grounds and Natural Habitats in Gamak Bay (가막만 양식장과 자연 서식지에서의 대형저서다모류 군집구조 차이)

  • Jang, So Yun;Shin, Hyun Chool
    • Journal of the Korean Society for Marine Environment & Energy / v.19 no.4 / pp.297-309 / 2016
  • This study investigated the differences in sedimentary environments and benthic polychaete communities between farming grounds and natural habitats (non-farming grounds) in Gamak Bay. Sampling stations in natural habitats were evenly distributed across the entire bay, and mussel, oyster, and ark-shell farms were selected as farming grounds. The dominant sedimentary facies was mud at most sampling stations in both farming grounds and natural habitats. However, organic content was higher in the farming grounds than in the natural habitats of the bay. The species number and mean density of the polychaete community were greater in the natural habitats than in the farming grounds. Lumbrineris longifolia, known as a potential indicator species of organic enrichment, was the most dominant species in both farming grounds and natural habitats. However, the next most dominant species differed between the two benthic habitats. In the community analysis using cluster analysis and nMDS, the natural habitats were divided into several station groups, but most stations in the farming grounds were clustered into one group. Pearson's correlation analysis and PCA showed strong relationships between sedimentary environmental factors and the benthic polychaete community in natural habitats, but weak or no relationships in farming grounds. This means that the benthic polychaete community in the farming grounds was under unusual conditions, such as a high input of organic matter. Thus, it is necessary to improve the benthic environmental quality of the farming grounds, as well as of the north-western inner part of Gamak Bay, through long-term monitoring efforts.

A Study on Classifications and Characteristics of Declined Rural Area in Chungcheong Region (충청권 농촌지역 쇠퇴 특성 및 유형에 관한 연구)

  • Jo, Jin-Hee;Park, Hyung-Keun;Mo, Hye-Ran;Lee, Han-Soo
    • KSCE Journal of Civil and Environmental Engineering Research / v.35 no.1 / pp.203-215 / 2015
  • The study aims to identify the degree and types of spatial recession in Si/Gun and Eup/Myun units within the Chungcheong region of South Korea, to contribute to efforts to diagnose rural recession and potential. To this end, we analyzed 27 Sis and Guns to identify the degree of recession and the potential of rural areas in the Chungcheong region. We also carried out a diagnosis and K-means clustering on 274 Eups and Myuns, which are smaller administrative units, to determine the types and characteristics of rural recession. In the analysis of the Sis and Guns, a relatively high degree of rural recession was found in Cheongyang, Seocheon, and Taean in Chungcheongnam-do, and in Danyang and Goisan, as well as in Boeun, Okcheon, and Youngdong, which are collectively called the 'Southern 3 Areas in Chungcheongbuk-do' and are conventionally known for their high degree of rural recession. According to the clustering analysis carried out on the 166 Eups and Myuns, there were five distinct clusters: areas with housing deterioration (29), areas with poor economic foundation (16), areas with poor accessibility to central areas (42), areas with poor residential environment (51), and areas with aged population (28). The findings of the present study are likely to serve as a basis for the design and enforcement of forthcoming rural area revitalization policies. It is also recommended that a more comprehensive diagnosis be made from a community-level perspective and that policy suggestions and strategies tailored to rural communities be discussed further.

Estimation of urban drinking water consumption patterns based on smart water grid monitoring data by k-means clustering in Vietnam (k-means 군집화 기법을 이용한 베트남 스마트워터그리드 계측 데이터 기반 도시 물 사용 패턴 추정)

  • Koo, Kang Min;Han, Kuk Heon;Lee, Gyumin;Jun, Kyung Soo;Yum, Kyung Taek
    • Proceedings of the Korea Water Resources Association Conference / 2021.06a / pp.419-419 / 2021
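  • The paradigm of water resources management is shifting from supply-oriented management to demand management. Available water resources are limited, while water demand is growing because of rapid population growth and urbanization, so the efficiency of demand management is becoming more important. Existing water supply systems are aging and their operating efficiency is gradually declining. Because consumers' water use is read manually every month or every two months, real-time management is impossible, which leads to an imbalance between supply and demand. As an alternative that addresses these problems, the Smart Water Grid (SWG), which combines IT with traditional water management technology, monitors consumers' water use in real time using two-way communication devices. A good understanding of water use characteristics enables more accurate water demand forecasting. In particular, identifying consumers' hourly, weekday, weekend, and weekly water use characteristics helps in forecasting future water demand. Operational efficiency can then be improved by establishing water supply and distribution plans according to the forecast demand. Among water demand forecasting methods, k-means cluster analysis is a machine learning method that uses hourly water use to assign the data to several subsets of mutually similar items, which makes it possible to identify similarities in water use. The SWG research group built an SWG pilot plant in Hai Duong province, Vietnam, in 2019 and has installed and operated 27 smart water meters (SWMs). In this study, to analyze consumers' water use characteristics, we collected hourly water use data received from the 27 SWMs from November 14, 2019 to December 3, 2020. We then used k-means clustering to analyze hourly, weekday, weekend, and weekly water use characteristics, applying the elbow method to determine the optimal number of clusters. The results show that average water demand patterns can be estimated according to each consumer's water use characteristics, which is expected to help future water demand forecasting.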
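
The clustering step described in this abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It runs k-means (Lloyd's algorithm) on 24-value hourly profiles and reads an elbow from the within-cluster sum of squares. The synthetic profiles, the farthest-first seeding, and the second-difference rule for locating the elbow are all assumptions made here for illustration; the abstract does not state how these were handled.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start from the
    first point, then repeatedly add the point farthest from the chosen seeds."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm. Returns (centroids, labels, inertia)."""
    centroids = farthest_first_init(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each profile goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    inertia = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, inertia


def elbow_k(ks, inertias):
    """Pick the k with the sharpest bend (largest second difference).
    This is one simple heuristic; the elbow is often judged by eye instead."""
    bends = {ks[i]: inertias[i - 1] - 2 * inertias[i] + inertias[i + 1]
             for i in range(1, len(ks) - 1)}
    return max(bends, key=bends.get)


def synthetic_profiles(meters_per_pattern=9, seed=0):
    """Hypothetical hourly profiles: morning-peak, evening-peak, and flat users."""
    rng = random.Random(seed)
    morning = [5.0 if 6 <= h <= 8 else 1.0 for h in range(24)]
    evening = [5.0 if 18 <= h <= 20 else 1.0 for h in range(24)]
    flat = [2.0] * 24
    return [[v + rng.uniform(-0.05, 0.05) for v in pattern]
            for pattern in (morning, evening, flat)
            for _ in range(meters_per_pattern)]


if __name__ == "__main__":
    profiles = synthetic_profiles()
    ks = list(range(1, 7))
    inertias = [kmeans(profiles, k)[2] for k in ks]
    for k, inertia in zip(ks, inertias):
        print(k, round(inertia, 2))
    print("elbow at k =", elbow_k(ks, inertias))
```

With real meter data, each row would be one meter's average hourly profile (or one meter-day), and the profiles would usually be normalized first so that clusters reflect the shape of use rather than its volume.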


Design of an Arm Gesture Recognition System using Kinect Sensor (키넥트 센서를 이용한 팔 제스처 인식 시스템의 설계)

  • Heo, Se-Kyeong;Shin, Ye-Seul;Kim, Hye-Suk;Kim, In-Cheol
    • Proceedings of the Korea Information Processing Society Conference / 2013.05a / pp.250-253 / 2013
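  • Research on gesture recognition using camera images has recently been very active. Learning algorithms commonly used for camera-based gesture recognition include probabilistic graphical models such as HMMs and CRFs. Training these models on multi-dimensional, continuous real-valued data is computationally expensive. This paper proposes a method that converts arm joint position data into one-dimensional time series data through k-means clustering and then trains an HMM for each gesture. K-means clustering is applied to the arm joint position data obtained from a Kinect sensor to generate one-dimensional time series data, which are then used for HMM training and recognition. To analyze the performance of the proposed method, we carried out various experiments, including a comparison with a method based on AP+DTW, another time series learning algorithm.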
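
The conversion from multi-dimensional joint positions to a one-dimensional symbol sequence can be sketched as vector quantization: k-means learns a codebook of representative poses, and each frame is replaced by the index of its nearest codebook entry. The sketch below is an illustration under assumptions, not the paper's code. The 3-D frames, the value k = 3, and the farthest-first seeding are hypothetical, and the HMM training that would consume the symbols is not shown.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, vector):
    """Index of the codebook entry closest to the given vector."""
    return min(range(len(codebook)), key=lambda j: squared_distance(vector, codebook[j]))


def learn_codebook(frames, k, max_iter=100):
    """Small k-means (Lloyd's algorithm) over all frames.
    Seeding is farthest-first from the first frame, an assumption made here
    so that the sketch is deterministic."""
    codebook = [list(frames[0])]
    while len(codebook) < k:
        far = max(frames, key=lambda f: min(squared_distance(f, c) for c in codebook))
        codebook.append(list(far))
    labels = []
    for _ in range(max_iter):
        new_labels = [nearest(codebook, f) for f in frames]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [f for f, lab in zip(frames, labels) if lab == j]
            if members:
                codebook[j] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def quantize(frames, codebook):
    """Map each multi-dimensional frame to a single integer symbol."""
    return [nearest(codebook, f) for f in frames]


# Hypothetical 3-D hand positions: three poses, two frames each.
frames = [
    (0.0, 0.0, 0.0), (0.1, 0.0, 0.0),
    (1.0, 0.1, 0.0), (1.1, 0.0, 0.0),
    (3.0, 0.0, 0.1), (3.1, 0.0, 0.0),
]
codebook = learn_codebook(frames, k=3)
symbols = quantize(frames, codebook)
print(symbols)  # one integer per frame; frames in the same pose share a symbol
```

A discrete HMM per gesture could then be trained on such symbol sequences, which is the step the abstract describes after the k-means conversion.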

Calculation of the Peak-hour Ratio for Road Traffic Volumes using a Hybrid Clustering Technique (혼합군집분석 기법을 이용한 도로 교통량의 첨두율 산정)

  • Kim, Hyung-Joo;Chang, Justin S.
    • Journal of Korean Society of Transportation / v.30 no.1 / pp.19-30 / 2012
  • The majority of daily travel demand concentrates in particular time periods, which causes difficulties in travel demand analysis and the corresponding benefit estimation. Thus, it is necessary to consider time-specific traffic characteristics to yield more reliable results. Traditionally, naïve, heuristic, and statistical approaches have been applied to address the peak-hour ratio. In this study, a hybrid clustering model, which is one of the statistical methods, is applied to calculate the peak-hour ratio and its duration. The 2009 national 24-hour traffic data provided by the Korea Institute of Construction Technology are used. The analysis is conducted separately for passenger cars and trucks. To verify the usefulness of the methodology, toll collection system data from the Korea Express Corporation are collected. The results show lower errors during off-peak hours and at night, and increasing error ratios as travel distance increases. Since the proposed method can reduce the arbitrariness of analysts and can accommodate statistical significance tests, the model could be considered a more robust and stable methodology. It is hoped that the results of this paper will contribute to enhancing the reliability of travel demand analysis.