• Title/Summary/Keyword: cluster data (군집 자료)

A Study on the Response Plan by Station Area Cluster through Time Series Analysis of Urban Rail Riders Before and After COVID-19 (COVID-19 전후 도시철도 승차인원 시계열 군집분석을 통한 역세권 군집별 대응방안 고찰)

  • Li, Cheng Xi;Jung, Hun Young
    • KSCE Journal of Civil and Environmental Engineering Research / v.43 no.3 / pp.363-370 / 2023
  • Because of the spread of COVID-19, the use of public transportation such as urban railways has changed significantly since the beginning of 2020. In this study, therefore, daily time-series ridership data for each urban railway station were collected for the three years before and after the spread of COVID-19; the similarity between time series was evaluated with the DTW (Dynamic Time Warping) distance to derive a center for each cluster; and the effect of external events such as COVID-19 on changes in the number of users was diagnosed with a time-series impact detection function. In addition, the usage characteristics of each cluster of urban railway stations were analyzed, and changes in passenger volume caused by external shocks were identified. The aim was to review measures for maintaining and recovering ridership in the event of a COVID-19 resurgence.
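
The DTW step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes an absolute-difference local cost, builds a pairwise distance matrix that a clustering routine could consume, and uses the medoid (the member with the smallest total DTW distance to the rest) as one simple notion of a cluster "center". The abstract does not say how the centers were actually derived, so `medoid` is a hypothetical stand-in.

```python
import math
from typing import List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(n*m) dynamic time warping distance with |x - y| local cost."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] repeats against b
                                 cost[i][j - 1],      # b[j-1] repeats against a
                                 cost[i - 1][j - 1])  # one-to-one match
    return cost[n][m]


def pairwise_dtw(series: List[Sequence[float]]) -> List[List[float]]:
    """Symmetric DTW distance matrix, usable as input to e.g. hierarchical clustering."""
    k = len(series)
    dist = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            dist[i][j] = dist[j][i] = dtw_distance(series[i], series[j])
    return dist


def medoid(dist: List[List[float]], members: List[int]) -> int:
    """Hypothetical cluster 'center': member with the smallest summed DTW distance."""
    return min(members, key=lambda i: sum(dist[i][j] for j in members))
```

Because DTW lets one series stretch against another, `dtw_distance([0, 1, 2, 1, 0], [0, 0, 1, 2, 1, 0])` is 0 even though the two ridership curves are shifted and have different lengths, which is why it suits daily series whose peaks drift over time.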

Analyzing data-related policy programs in Korea using text mining and network cluster analysis (텍스트 마이닝과 네트워크 군집 분석을 활용한 한국의 데이터 관련 정책사업 분석)

  • Sungjun Choi;Kiyoon Shin;Yoonhwan Oh
    • Journal of Korea Society of Industrial Information Systems / v.28 no.6 / pp.63-81 / 2023
  • This study classifies and categorizes similar policy programs through network cluster analysis, using textual information on data-related policy programs in Korea. To this end, descriptions of South Korea's 2022 data-related budgetary programs were collected, and keywords were extracted from the program contents. The similarity between programs was then computed using TF-IDF, and a policy program network was constructed from these similarities. Next, the structural characteristics of the network were analyzed, and similar policy programs were clustered and categorized through network clustering. Analysis of a total of 97 programs identified 7 major clusters, indicating that programs with similar themes or objectives were grouped by application area or by the services that utilize data. The findings illuminate the current status of data-related policy programs in Korea, offer policy implications for a strategic approach to planning future national data strategies and programs, and contribute to the establishment of evidence-based policies.
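
The pipeline of keywords, TF-IDF similarity, network, and clusters can be sketched as below. This is a simplified illustration under stated assumptions: TF is the term count divided by document length, IDF is `log(N / df)`, programs are linked when cosine similarity reaches a hypothetical `threshold`, and clusters are the connected components of that network. The abstract does not name the network clustering algorithm; a community-detection method such as Louvain would be a more typical choice than plain connected components.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of keyword lists, one per program. Returns {term: weight} dicts."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        length = len(doc) or 1
        vectors.append({t: (c / length) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster_programs(docs, threshold=0.2):
    """Link programs whose TF-IDF cosine similarity >= threshold (hypothetical cut-off)
    and return the connected components of the resulting network."""
    vecs = tfidf_vectors(docs)
    n = len(docs)
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(vecs[i], vecs[j]) >= threshold:
                adj[i].add(j)
                adj[j].add(i)
    seen, clusters = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            component.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        clusters.append(sorted(component))
    return clusters
```

For example, two health-data programs and two ocean-satellite programs with the keyword lists used in the test below fall into two separate clusters, because the pairs share no keywords across groups.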

Selection of Optimal Variables for Clustering of Seoul using Genetic Algorithm (유전자 알고리즘을 이용한 서울시 군집화 최적 변수 선정)

  • Kim, Hyung Jin;Jung, Jae Hoon;Lee, Jung Bin;Kim, Sang Min;Heo, Joon
    • Journal of Korean Society for Geospatial Information Science / v.22 no.4 / pp.175-181 / 2014
  • The Korean government proposed a new initiative, 'Government 3.0', under which the administration opens its datasets to the public before they are requested. The city of Seoul is a front-runner in disclosing government data. If the attributes that govern a given segmentation are known, they can be applied to real-world problems in marketing, business strategy, and administrative decision-making. For the city of Seoul, however, selecting optimal variables from an open dataset with up to several thousand attributes would require an enormous amount of computation time, because it is a combinatorial optimization problem that maximizes a dissimilarity measure between clusters. In this study, we acquired a dataset of 718 attributes from Statistics Korea and selected the variables that best differentiate Gangnam from other districts, using a genetic algorithm and Dunn's index. We also used the Microsoft Azure cloud computing system to reduce processing time. As a result, 28 optimal variables were selected, and validation with Ward's minimum variance method and the K-means algorithm showed that these 28 variables effectively separate Gangnam from the other districts.
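
A genetic algorithm with Dunn's index as the fitness function can be sketched as follows. Everything beyond "GA + Dunn's index" is an assumption made for illustration: each individual is a 0/1 mask over attributes, distances are Euclidean, the labelling (e.g. Gangnam vs. other districts) is fixed, and truncation selection, one-point crossover, bit-flip mutation, and the population settings are hypothetical choices, not the authors' configuration.

```python
import math
import random


def dunn_index(points, labels):
    """Dunn index = min inter-cluster distance / max intra-cluster diameter (Euclidean).
    Requires at least two clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    groups = list(clusters.values())
    diam = max((math.dist(a, b) for g in groups for a in g for b in g), default=0.0)
    sep = min(math.dist(a, b)
              for i, g in enumerate(groups) for h in groups[i + 1:]
              for a in g for b in h)
    return sep / diam if diam > 0 else math.inf


def select_features(data, labels, n_gen=40, pop_size=30, p_mut=0.05, seed=0):
    """Hypothetical GA: evolve attribute masks that maximize the Dunn index."""
    rng = random.Random(seed)
    n_feat = len(data[0])

    def fitness(mask):
        cols = [j for j, bit in enumerate(mask) if bit]
        if not cols:
            return -1.0  # an empty subset cannot separate anything
        return dunn_index([[row[j] for j in cols] for row in data], labels)

    pop = [[rng.randint(0, 1) for _ in range(n_feat)] for _ in range(pop_size)]
    for _ in range(n_gen):
        elite = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]  # truncation selection
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_feat) if n_feat > 1 else 0
            child = a[:cut] + b[cut:]                                     # one-point crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)
```

Keeping the elite half unchanged each generation means the best mask found so far is never lost, which matters when each fitness evaluation is as expensive as it would be with hundreds of attributes.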

Effect of Climate Change on the Tree-Ring Growth of Pinus koraiensis in Korea (기후변화가 잣나무의 연륜생장에 미치는 영향 분석)

  • Lim, Jong Hwan;Chun, Jung Hwa;Park, Ko Eun;Shin, Man Yong
    • Journal of Korean Society of Forest Science / v.105 no.3 / pp.351-359 / 2016
  • This study analyzed the effect of climate change on the tree-ring growth of Pinus koraiensis in Korea. Annual tree-ring growth data of P. koraiensis collected by the 5th National Forest Inventory were first organized to analyze the species' yearly growth patterns. Cluster analysis of the tree-ring growth data based on the similarity of climatic conditions identified five clusters. For each cluster, yearly growing degree days and the standardized precipitation index were calculated from daily mean temperature and precipitation data for 1951 to 2010. From this information, a yearly temperature effect index (TEI) and precipitation effect index (PEI) were estimated for each cluster to analyze the effect of climatic conditions on the growth of the species. Tree-ring growth estimation equations were then developed for each cluster, using the product of yearly TEI and PEI as the independent variable. These equations were applied to the RCP 4.5 and RCP 8.5 climate change scenarios to predict changes in the tree-ring growth of P. koraiensis by cluster from 2011 to 2100. The results are expected to provide valuable information for estimating the local growth characteristics of P. koraiensis and for predicting changes in tree-ring growth patterns caused by climate change.
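
Two of the computational steps can be sketched briefly: growing degree days from daily mean temperature, and fitting a growth equation on the product TEI × PEI. The abstract does not define TEI or PEI or give the equation form, so this sketch assumes a base temperature of 5 °C (a common but not universal choice) and a simple linear form `growth = a + b * (TEI * PEI)`; both are illustrative assumptions.

```python
def growing_degree_days(daily_mean_temps, base=5.0):
    """Sum of daily mean-temperature excess over a base temperature (base is assumed)."""
    return sum(max(0.0, t - base) for t in daily_mean_temps)


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def predict_growth(coefs, tei, pei):
    """Hypothetical growth equation driven by the product of TEI and PEI."""
    a, b = coefs
    return a + b * tei * pei
```

In use, one would fit `fit_line([tei * pei for tei, pei in zip(TEI, PEI)], ring_widths)` per cluster on historical years, then call `predict_growth` with TEI and PEI derived from each RCP scenario's projected climate.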

Community Structure and Species Composition of Pinus densiflora for. erecta Forest in Mt. Cheonchuk (천축산 일대 금강소나무림의 군집구조 및 종조성)

  • Byeon, Jun Gi;Park, Byeong Joo;Joo, Sung Hyun;Cheon, KwangIl
    • Korean Journal of Plant Resources / v.33 no.1 / pp.1-14 / 2020
  • This study analyzed the community structure and species composition of the Pinus densiflora for. erecta stand on Mt. Cheonchuk (653 m). Field surveys were carried out from June to September 2013. Seventy-four plots (20×20 m) were set up, with five herb-layer plots (3×3 m) in each, and diameter at breast height (DBH), height, environmental factors, and annual growth were measured. The vascular flora comprised 248 taxa: 66 families, 165 genera, 211 species, 2 subspecies, 29 varieties, and 6 forms. Cluster analysis divided the P. densiflora for. erecta forest into three communities: Quercus mongolica (P-1), Quercus variabilis (P-2), and Quercus aliena-Stephanandra incisa (P-3). Organic layer, annual growth, cation exchange capacity (CEC), total nitrogen, organic matter, and pH were significant environmental factors for each community. DCA showed that P-1 and P-2 were distributed over a wide range of environmental factors, whereas P-3 was relatively restricted. The distribution of the herb layer was affected by sand, CEC, silt, and total nitrogen. An MRPP test of the herb-layer communities showed a significant result (A=0.003, P<0.008). Species diversity was highest in P-3, and NMS analysis indicated that it was influenced by CEC, total nitrogen, and annual growth.
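
The MRPP statistic reported above (A) can be sketched as follows, assuming group weights of n_i / N and a Euclidean distance by default; vegetation studies often use Bray-Curtis instead, so `dist` is a swappable parameter. With these weights, the expected δ under random relabelling equals the mean of all pairwise distances, which gives A = 1 - δ_observed / δ_expected; the p-value is estimated by permutation. Every group must contain at least two samples for the weights and expectation to hold.

```python
import math
import random
from itertools import combinations


def mrpp_delta(points, groups, dist=math.dist):
    """Weighted mean within-group distance with weights n_i / N."""
    members = {}
    for idx, g in enumerate(groups):
        members.setdefault(g, []).append(idx)
    n = len(points)
    delta = 0.0
    for idxs in members.values():
        pairs = list(combinations(idxs, 2))
        mean_d = sum(dist(points[i], points[j]) for i, j in pairs) / len(pairs)
        delta += len(idxs) / n * mean_d
    return delta


def mrpp(points, groups, n_perm=999, seed=0, dist=math.dist):
    """Return (A, p): chance-corrected within-group agreement and permutation p-value."""
    rng = random.Random(seed)
    observed = mrpp_delta(points, groups, dist)
    all_pairs = list(combinations(range(len(points)), 2))
    expected = sum(dist(points[i], points[j]) for i, j in all_pairs) / len(all_pairs)
    a_stat = 1.0 - observed / expected
    labels = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if mrpp_delta(points, labels, dist) <= observed:  # as tight or tighter by chance
            hits += 1
    return a_stat, (hits + 1) / (n_perm + 1)
```

A is 0 when groups are no more homogeneous than chance and 1 when all within-group distances are zero; small values such as the reported A=0.003 indicate weak but, with a low p, statistically detectable grouping.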

Analysis of Change Transitions in Regional Types in Emergency Department Patient Flows of in Jeonlado (2014-2018) (전라지역 응급실 환자의 유출입 분석 및 지역유형 변화 추이)

  • Lee, Jae-Hyeon;Lee, Sung-Min;Kim, Seongjung;Oh, Mi-Ra
    • Journal of Convergence for Information Technology / v.10 no.12 / pp.126-131 / 2020
  • This study analyzed the inflow and outflow patterns of emergency department patients to identify changes in the regional types of cities, counties, and districts in Jeonlado, Korea. Data for Jeonlado from 2014 to 2018 were extracted from the National Emergency Department Information System. The extracted data include the addresses of patients and of emergency medical institutions, which were used to calculate the relevance index (RI) and commitment index (CI). Regions were then classified into regional types by applying cluster analysis to the calculated indices. The Kruskal-Wallis test, a non-parametric method, was used to examine differences in RI and CI between years within each regional type. Cluster analysis using the relevance and commitment indices revealed three regional types: regions in cluster 1 were classified as the outflow type, those in cluster 2 as the inflow type, and those in cluster 3 as the self-sufficient type. RI and CI were calculated for each cluster, or regional type. There were no significant differences between years in cluster 2 (inflow type) or cluster 3 (self-sufficient type). In cluster 1 (outflow type), CI did not differ significantly between years, but RI differed significantly between 2014 and 2017 and between 2014 and 2018. Given the increased concentration of emergency medical care, it is difficult to conclude that the emergency medical environment has improved.
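
RI and CI can be computed from a resident-region × treatment-region flow matrix. The abstract does not define them, so this sketch assumes the definitions commonly used in health-service-area studies: RI is the share of a region's residents who are treated inside that region, and CI is the share of patients treated in a region who also live there. In that reading, a low RI points toward an outflow region, a low CI toward an inflow region, and high values of both toward self-sufficiency.

```python
import math


def relevance_commitment(flows):
    """flows[r][h]: patients resident in region r treated at facilities in region h.
    Returns (RI, CI) lists, one value per region (NaN if the denominator is zero)."""
    k = len(flows)
    ri, ci = [], []
    for r in range(k):
        residents = sum(flows[r])                         # row total: where r's residents go
        treated_here = sum(flows[o][r] for o in range(k))  # column total: who comes to r
        ri.append(flows[r][r] / residents if residents else math.nan)
        ci.append(flows[r][r] / treated_here if treated_here else math.nan)
    return ri, ci
```

For a two-region matrix `[[80, 20], [10, 90]]`, region 0 keeps 80% of its residents (RI = 0.8), and about 89% of the patients it treats are local (CI = 80/90).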

Actual Vegetation and Structure of Plant Community of Forest Ecosystem in Taejongdae, Busan City, Korea (부산광역시 태종대 산림생태계의 현존식생 및 식물군집구조)

  • Kim, Jong-Yup
    • Korean Journal of Environment and Ecology / v.26 no.3 / pp.426-436 / 2012
  • This study investigated the actual vegetation, plant community structure, and ecological successional sere of the coastal forest ecosystem of Taejongdae, Busan City, Korea, in the warm temperate climate zone, to provide basic data for forest management planning. The actual vegetation was divided into 35 types, and the survey area was 1,750,461 m². The proportion of vegetation types dominated by Pinus thunbergii was 80.7%, compared with only 5.0% for Quercus spp. and 0.4% for Carpinus tschonoskii. Eighteen plots (20 m × 20 m) were set up, and DCA, an ordination technique, divided the plant communities into four groups: community I (P. thunbergii community), community II (P. thunbergii-Quercus serrata community), community III (Q. serrata-P. thunbergii community), and community IV (Carpinus tschonoskii-P. thunbergii community). Stand ages were 38 to 59 years in community I, 35 to 71 years in community II, 37 to 53 years in community III, and 50 to 72 years in community IV, so the age of the study site was estimated at about 38 to 72 years. The study site appears to be at an early stage of ecological succession for the warm temperate climate zone. In the canopy layer, the dominant species is expected to change from P. thunbergii to Q. serrata or Carpinus tschonoskii, whereas Eurya japonica is expected to dominate the understory layer, and E. japonica and Trachelospermum asiaticum var. intermedium the shrub layer, for some time. Shannon's diversity index (per 400 m²) ranged from 0.8640 to 1.3986 in community I, 0.1731 to 1.1885 in community II, 0.8250 to 1.0042 in community III, and 0.3436 to 0.6986 in community IV.
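
Shannon's diversity index for one 400 m² plot is H' = -Σ p_i log p_i, where p_i is species i's share of the plot's total abundance. A minimal sketch follows; it assumes the natural logarithm (some studies use log10, which rescales every value) and leaves open which abundance measure (individual counts, cover, or importance values) was used, since the abstract does not say.

```python
import math


def shannon_diversity(abundances, base=math.e):
    """H' = -sum(p_i * log(p_i)) over species with non-zero abundance in one plot."""
    total = sum(abundances)
    h = 0.0
    for a in abundances:
        if a > 0:
            p = a / total
            h -= p * math.log(p, base)
    return h
```

H' is 0 for a single-species plot and grows with both species richness and evenness, so a plot heavily dominated by one pine species, as in much of the study site, scores low even if a few other species are present.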

Application of K-means Clustering Model to XRD Experimental Data in the Korea Plateau (한국대지 XRD 실험자료 대상 k-평균 군집화 모델 적용성 분석)

  • Ju Young Park;Sun Young Park;Jiyoung Choi;Sungil Kim;Yuri Kim;Bo Yeon Yi;Kyungbook Lee
    • Economic and Environmental Geology / v.57 no.5 / pp.529-537 / 2024
  • Mineral composition, which is used to identify sedimentary environments, can be obtained through X-ray diffraction (XRD) analysis. Because analyzing a large number of samples is time-consuming, a machine learning-based mineral composition analysis model was developed. This model demonstrated reasonable reliability for samples with usual compositions but performed poorly for unusual samples. Consequently, a clustering model was recently developed to identify unusual samples so that they can be handled by experts. The purpose of this study is to examine the applicability of this clustering model, developed in a previous study using XRD data from the Ulleung Basin, to samples from a different region. The research data consist of XRD intensity profiles and the corresponding mineral composition analyses for a total of 54 sediment samples from the Korea Plateau, located northwest of the Ulleung Basin. Because each Korea Plateau intensity profile comprises 7,420 values (3.005-64.996°) rather than the 3,100 values (3.01-64.99°) of the Ulleung Basin samples, linear interpolation was used to align the input features. A min-max scaler was then applied to each sample's intensity profile to preserve the trend and peak ratios of the intensity. When the clustering model was applied to the 54 preprocessed intensity profiles, 35 and 19 samples were classified into the expert and machine learning groups, respectively. In the machine learning group, there were no false positives among the 19 samples. This means that the clustering model can increase the reliability of mineral compositions estimated by the machine learning model, because no unusual sample was assigned to the machine learning group. Of the 35 samples in the expert group, 31 were false negatives (FN); that is, they were assigned to the expert group even though the machine learning model could analyze them properly. However, when these FN samples were analyzed with the machine learning-based composition analysis model, a high mean absolute error of 2.94% was observed. Therefore, it is reasonable that these samples were assigned to the expert group.
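
The preprocessing described above (linear interpolation onto the Ulleung Basin angle grid, then per-sample min-max scaling) can be sketched with the standard library as below. The final step, assigning a profile to the nearest centroid, is the standard k-means prediction rule; how the previously trained model's centroids map to "expert" versus "machine learning" groups is an assumption here, since the abstract does not describe the model's internals.

```python
from bisect import bisect_left


def linspace(start, stop, num):
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def interp(x_new, x, y):
    """Piecewise-linear interpolation of (x, y) at x_new; x must be increasing.
    Points outside [x[0], x[-1]] are clamped to the end values."""
    out = []
    for xv in x_new:
        if xv <= x[0]:
            out.append(y[0])
        elif xv >= x[-1]:
            out.append(y[-1])
        else:
            j = bisect_left(x, xv)
            x0, x1 = x[j - 1], x[j]
            y0, y1 = y[j - 1], y[j]
            out.append(y0 + (y1 - y0) * (xv - x0) / (x1 - x0))
    return out


def min_max(values):
    """Scale one profile to [0, 1], keeping its shape and relative peak heights."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def preprocess(angles, intensities, target_angles):
    return min_max(interp(target_angles, angles, intensities))


def nearest_centroid(profile, centroids):
    """k-means assignment step: index of the closest centroid (squared Euclidean)."""
    return min(range(len(centroids)),
               key=lambda c: sum((p - q) ** 2 for p, q in zip(profile, centroids[c])))
```

Here `target = linspace(3.01, 64.99, 3100)` would reproduce the Ulleung Basin grid (a 0.02° step) and `linspace(3.005, 64.996, 7420)` the Korea Plateau grid. Because the source range fully covers the target range, the clamping branches are never needed for these data.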

Assessment of Spatiotemporal Water Quality Variation Using Multivariate Statistical Techniques: A Case Study of the Imjin River Basin, Korea (다변량 통계기법을 이용한 시·공간적 수질변화의 평가: 임진강유역에 관한 연구)

  • Cho, Yong-Chul;Lee, Su-Woong;Ryu, In-Gu;Yu, Soon-Ju
    • Journal of Korean Society of Environmental Engineers / v.39 no.11 / pp.641-649 / 2017
  • In this study, the water quality of the Imjin River basin, whose pollutant characteristics are changing, was assessed through statistical analysis, correlation analysis, principal component and factor analysis, and cluster analysis. Among all analyzed sites, the Sincheon 3 site showed the highest average concentrations (BOD 13.4 mg/L, COD 19.9 mg/L, T-N 11.145 mg/L, T-P 0.336 mg/L, TOC 14.2 mg/L), indicating that, within the entire drainage basin, the Sincheon basin requires intensive water quality management. Correlation analysis of the comprehensive water quality data showed statistically significant correlations among COD, TOC, BOD, and T-N, as well as high correlations between organic matter and nutrients. Principal component analysis extracted two main components explaining 81.221% of the variance in the full monitoring-station data set, and three main components explaining 96.241% in the seasonal data. Factor analysis of both the full and the seasonal data sets identified BOD, COD, T-N, T-P, and TOC as the common factors influencing water quality. Spatial and temporal cluster analyses yielded four and three groups, respectively, reflecting seasonal characteristics and land use. By analyzing water quality factors in the Imjin River basin over an 8-year period with consideration of spatial and temporal characteristics, this study provides fundamental analytical data to help understand future changes in water quality in the Imjin River basin.
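
Percentages such as 81.221% come from the eigenvalues of a PCA: each component's share is its eigenvalue divided by the sum of all eigenvalues, and the reported figure is the cumulative share of the retained components. A minimal NumPy sketch, assuming the variables (BOD, COD, T-N, and so on, in different units) are standardized so that PCA runs on the correlation matrix, and assuming the Kaiser rule (eigenvalue > 1) for deciding how many components to keep; the paper's actual retention rule is not stated.

```python
import numpy as np


def pca_on_correlation(X):
    """PCA of standardized variables via the correlation matrix.
    X: (n_samples, n_variables). Returns eigenvalues (descending) and explained-variance ratios."""
    X = np.asarray(X, dtype=float)
    corr = np.corrcoef(X, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh returns ascending order
    ratios = eigvals / eigvals.sum()
    return eigvals, ratios


def kaiser_count(eigvals):
    """Number of components with eigenvalue > 1 (an assumed retention rule)."""
    return int(np.sum(eigvals > 1.0))
```

`np.cumsum(ratios)[k - 1]` then gives the cumulative share explained by the first `k` components, which is the kind of figure the abstract reports.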

Development of Naïve-Bayes classification and multiple linear regression model to predict agricultural reservoir storage rate based on weather forecast data (기상예보자료 기반의 농업용저수지 저수율 전망을 위한 나이브 베이즈 분류 및 다중선형 회귀모형 개발)

  • Kim, Jin Uk;Jung, Chung Gil;Lee, Ji Wan;Kim, Seong Joon
    • Journal of Korea Water Resources Association / v.51 no.10 / pp.839-852 / 2018
  • The purpose of this study is to predict monthly agricultural reservoir storage by developing a weather-data-based multiple linear regression model (MLRM) using precipitation, maximum temperature, minimum temperature, average temperature, and average wind speed. Using Naïve Bayes classification, a total of 1,559 reservoirs nationwide were classified into 30 clusters based on geomorphological specifications (effective storage volume, irrigation area, watershed area, latitude, longitude, and drought frequency). For each cluster, a monthly MLRM was derived using 13 years (2002-2014) of meteorological data from the KMA (Korea Meteorological Administration) and reservoir storage rate data from the KRC (Korea Rural Community). The MLRM for the reservoir storage rate showed a coefficient of determination (R²) of 0.76, a Nash-Sutcliffe efficiency (NSE) of 0.73, and a root mean square error (RMSE) of 8.33%. The MLRM was evaluated for two years (2015-2016) using three-month weather forecast data from the KMA's GloSea5 (GS5). For the Reservoir Drought Index (RDI), which is based on the current-year and normal-year reservoir storage rates, the average ROC (Receiver Operating Characteristic) hit rate of the MLRM was 0.80 with observed data and 0.73 with GS5 data. Using the results of this study, future reservoir storage rates can be predicted and used to support decisions on a stable future agricultural water supply.
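
The three goodness-of-fit measures quoted above can be computed as follows. NSE and RMSE have standard definitions; for R², this sketch assumes the squared Pearson correlation between observed and simulated values, which is common in hydrology but is only one of several definitions. Because the storage rate is itself a percentage, the RMSE comes out in percentage points.

```python
import math


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations (1 is perfect)."""
    mean_o = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - num / den


def rmse(obs, sim):
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def r_squared(obs, sim):
    """Squared Pearson correlation between observed and simulated values (assumed definition)."""
    n = len(obs)
    mo, ms = sum(obs) / n, sum(sim) / n
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    vo = sum((o - mo) ** 2 for o in obs)
    vs = sum((s - ms) ** 2 for s in sim)
    return cov ** 2 / (vo * vs)
```

For observed storage rates `[50, 60, 70, 80]` and simulated `[52, 58, 71, 79]`, NSE is 0.98 and RMSE is about 1.58 percentage points. NSE, unlike R², also penalizes systematic bias, which is why the two are usually reported together.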