• Title/Summary/Keyword: statistical clustering method

Association Rules Analysis of Safe Accidents Caused by Falling Objects (낙하물에 기인한 안전사고의 연관규칙 분석)

  • Son, Ki-Young;Ryu, Han-Guk
    • Journal of the Korea Institute of Building Construction / v.19 no.4 / pp.341-350 / 2019
  • The construction industry is one of the most dangerous industries. Because construction accidents arise from factors that recur across accidents, the existing descriptive analyses and statistical tests are limited in their ability to analyze all types of occupational accidents. In this study, we classified safety accidents caused by falling objects, one of the accident types occurring at construction sites, into fatal and nonfatal accidents and derived the factors behind each. We then derived association rules among these factors using association rule analysis, a machine learning technique. Considering the association rules for fatal and nonfatal accidents proposed in this study, it should be possible to prevent accidents by identifying countermeasures against safety accidents caused by falling objects.
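
As a rough illustration of what association rule analysis measures, the sketch below computes the three standard rule metrics (support, confidence, lift) for one rule. The accident records and factor labels are hypothetical and are not taken from the paper; the paper's actual factors and mining algorithm are not stated in the abstract.

```python
# Hypothetical accident records; each is a set of factor labels (not from the paper).
records = [
    {"no_helmet", "crane_work", "fatal"},
    {"no_helmet", "scaffold", "fatal"},
    {"helmet", "scaffold", "nonfatal"},
    {"helmet", "crane_work", "nonfatal"},
    {"no_helmet", "crane_work", "fatal"},
]

def support(itemset, data):
    """Fraction of records that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= record for record in data) / len(data)

def rule_metrics(antecedent, consequent, data):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    Assumes the antecedent and the consequent each occur at least once.
    """
    s_both = support(set(antecedent) | set(consequent), data)
    s_ante = support(antecedent, data)
    s_cons = support(consequent, data)
    confidence = s_both / s_ante   # P(consequent | antecedent)
    lift = confidence / s_cons     # > 1 means a positive association
    return s_both, confidence, lift

sup, conf, lift = rule_metrics({"no_helmet"}, {"fatal"}, records)
print(sup, conf, lift)
```

A lift above 1 indicates that the antecedent and consequent co-occur more often than they would if independent, which is typically how candidate rules are ranked.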

Comparing MCMC algorithms for the horseshoe prior (Horseshoe 사전분포에 대한 MCMC 알고리듬 비교 연구)

  • Miru Ma;Mingi Kang;Kyoungjae Lee
    • The Korean Journal of Applied Statistics / v.37 no.1 / pp.103-118 / 2024
  • The horseshoe prior is one of the most popular priors in sparse regression models, where only a small fraction of coefficients are nonzero. The parameter space of the horseshoe prior is much smaller than that of the spike and slab prior, so it enables us to explore the parameter space efficiently even in high dimensions. However, the horseshoe prior has a high computational cost for each iteration of the Gibbs sampler. To overcome this issue, various MCMC algorithms for the horseshoe prior have been proposed to reduce the computational burden. In particular, Johndrow et al. (2020) recently proposed an approximate algorithm that can significantly improve the mixing and speed of the MCMC algorithm. In this paper, we compare (1) the traditional MCMC algorithm, (2) the approximate MCMC algorithm proposed by Johndrow et al. (2020) and (3) its variant in terms of computing time, estimation and variable selection performance. For variable selection, we adopt the sequential clustering-based method suggested by Li and Pati (2017). The practical performance of the MCMC methods is demonstrated via numerical studies.

Quantitative Comparison of Cinnamomi Cortex and Various Cinnamon Barks using HPLC Analysis (육계 및 기원종별 계피의 지표성분 함량 비교)

  • Han-Young Kim;Jung-Hoon Kim
    • The Korea Journal of Herbology / v.39 no.3 / pp.23-35 / 2024
  • Objective : In this study, we quantitatively compared the content of 10 marker compounds in cinnamon barks from different species and identified chemical differences between genuine Cinnamomum cassia and other Cinnamomum species (Non C. cassia). Methods : Cinnamon bark samples were extracted by ultrasonication in 100% methanol for 30 minutes. The samples were analysed using high-performance liquid chromatography with statistical analysis. Results : The analytical method developed in this study met all validation criteria and was applied to the quantification of the 10 marker compounds in cinnamon bark samples. The main chemical features distinguishing C. cassia were low contents of epicatechin and eugenol and high contents of benzaldehyde, cinnamaldehyde and cinnamic acid compared with the Non C. cassia samples. Among all the compounds, cinnamaldehyde had the highest content in both the C. cassia and Non C. cassia samples. Principal component analysis showed that the C. cassia and Non C. cassia samples were clearly differentiated by benzaldehyde, cinnamaldehyde, cinnamic acid, eugenol, and epicatechin, which drove the clustering of the two groups. Conclusion : C. cassia and Non C. cassia samples were chemically discriminated using quantitative HPLC analysis. On this basis, it is possible to control the quality of herbal medicines containing Cinnamomi Cortex. The accuracy of discrimination between C. cassia and Non C. cassia species needs further improvement in order to evaluate cinnamon bark quality.

Development of the Approximate Cost Estimating Model Using Statistical Inference for PSC Box Girder Bridge Constructed by the Incremental Launching Method (통계적 기법을 활용한 ILM압출공법 교량 상부공사 개략공사비 산정모델 개발 연구)

  • Kim, Sang-Bum;Cho, Ji-Hoon
    • KSCE Journal of Civil and Environmental Engineering Research / v.33 no.2 / pp.781-790 / 2013
  • This research focuses on developing conceptual cost estimation models for I.L.M box girder bridges. Conceptual cost estimation for public construction projects currently depends on governmental average unit price references, which many experts regard as inaccurate and unreliable. There has therefore been strong demand for better conceptual cost estimating methods. This research proposes three different conceptual cost estimating methods for a P.S.C. girder bridge built with the I.L.M method. Model (I) seeks the proper breakdown of standard works that account for more than 95 percent of the total cost and calculates the quantities of materials for those standard works from the standard section and volume of the I.L.M box girder bridge. Model (II) uses a correlation analysis (coefficient of 0.6 or more) between the breakdown of standard works and input data that would be available in the preliminary design phase. Model (III) obtains the conceptual estimate through multiple-regression analysis between the breakdown of standard works and all of the input data related to them. To validate the clustering of coverage in the preliminary design phase, the variation of I.L.M cost coverage from the multiple-regression analysis [Model (III)] was investigated; it fell between -3.76% and 11.79%, compared with the range of -5% to +15% that AACE (Association for the Advancement of Cost Engineering) gives for the design phase. The models proposed in this research are expected to improve greatly if reliable cost data for P.S.C. girder bridges can be continually collected with reasonable accuracy.
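
The correlation screening described for Model (II) can be pictured with a minimal sketch: keep only the input variables whose Pearson correlation with a cost item reaches 0.6. The variable names and numbers below are hypothetical, and taking the absolute value of the coefficient is an assumption; the abstract only states the 0.6 threshold.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical preliminary-design inputs for five bridges (not from the paper).
inputs = {
    "span_length_m": [1, 2, 3, 4, 5],
    "unrelated_index": [3, 1, 4, 1, 5],
}
cost_item = [2.1, 3.9, 6.2, 8.1, 9.8]  # hypothetical cost of one standard work

THRESHOLD = 0.6  # threshold quoted in the abstract
selected = [
    name for name, values in inputs.items()
    if abs(pearson(values, cost_item)) >= THRESHOLD
]
print(selected)
```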

Cluster exploration of water pipe leak and complaints surveillance using a spatio-temporal statistical analysis (스캔통계량 분석을 통한 상수도 누수 및 수질 민원 발생 클러스터 탐색)

  • Juwon Lee;Eunju Kim;Sookhyun Nam;Tae-Mun Hwang
    • Journal of Korean Society of Water and Wastewater / v.37 no.5 / pp.261-269 / 2023
  • Amid recent public concern over deteriorating water supply pipes, which cause leaks and degraded water quality, maintenance efforts to improve water source quality and ensure a stable water supply have become considerably more important. In this study, a scan statistic was applied to water quality complaints and water leakage accidents from 2015 to 2021 to present a reasonable method for identifying areas that require improvement in water management. SaTScan, a spatio-temporal statistical analysis program, and ArcGIS were used for spatial information analysis, and clusters with high relative risk (RR) were determined using the maximum log-likelihood ratio, relative risk, and Monte Carlo hypothesis test for I city, the target area. Specifically, in the case of water quality complaints, the analysis results were compared by distinguishing cases occurring before and after the onset of "red water." The period between 2015 and 2019 revealed that, preceding the occurrence of red water, the leak cluster at location L2 posed a significantly higher risk (RR: 2.45) than other regions. As for water quality complaints, cluster C2 exhibited a notably elevated RR (RR: 2.21) and appeared concentrated in areas D and S. On the other hand, water quality complaints after the red water incidents were predominantly concentrated in area S. The analysis found that the locations of complaint clusters were similar to those of red water incidents. Of these, cluster C7 exhibited a substantial RR of 4.58, signifying more than a twofold increase compared to pre-incident levels. A kernel density map analysis was performed using GIS to identify priority areas for waterworks management based on the central location of clusters and complaint cluster RR data.
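
To make the two quantities named above concrete, the following sketch scores candidate zones with a Kulldorff-style Poisson log-likelihood ratio and reports the relative risk of the top zone. The zone names and counts are hypothetical, the Poisson model is an assumption (the abstract does not say which SaTScan model was used), and the Monte Carlo significance test is omitted.

```python
import math

def poisson_llr(c, e, total):
    """Poisson log-likelihood ratio for one candidate zone.

    c: observed cases inside the zone, e: expected cases inside, total: all cases.
    """
    if c <= e:
        return 0.0  # only zones with an excess of cases count as high-risk clusters
    inside = c * math.log(c / e)
    outside = (total - c) * math.log((total - c) / (total - e)) if total > c else 0.0
    return inside + outside

def relative_risk(c, e, total):
    """Risk inside the zone divided by the risk outside it."""
    return (c / e) / ((total - c) / (total - e))

# Hypothetical candidate zones: name -> (observed cases, expected cases).
zones = {"A": (30, 20.0), "B": (60, 30.0), "C": (10, 15.0)}
total_cases = 200

scores = {name: poisson_llr(c, e, total_cases) for name, (c, e) in zones.items()}
best = max(scores, key=scores.get)  # zone with the maximum log-likelihood ratio
print(best, scores[best], relative_risk(*zones[best], total_cases))
```

In a full scan-statistic analysis the maximum ratio would be compared against the maxima obtained from many randomly simulated datasets to obtain a p-value.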

Selecting Climate Change Scenarios Reflecting Uncertainties (불확실성을 고려한 기후변화 시나리오의 선정)

  • Lee, Jae-Kyoung;Kim, Young-Oh
    • Atmosphere / v.22 no.2 / pp.149-161 / 2012
  • Past research indicates that, of all the uncertainties arising in climate change research, the uncertainty caused by the choice of climate change scenario is the largest. Projections of future water resources therefore differ significantly depending on which climate change scenario is adopted. In principle, it is highly recommended to use all the GCM scenarios offered by the IPCC. However, this can be impractical when a decision has to be made at an action officer's level. As an alternative, it is necessary to select several scenarios that cover as many of the possible cases as they can. Objective standards for selecting climate change scenarios have not been properly established, and scenarios have been selected either at random or at the researcher's discretion. This research suggests a new scenario selection process that uses only a few principal scenarios, yet has the effect of using all possible scenarios while maintaining some of the uncertainties. Cluster analysis and the selection of a representative scenario in each cluster efficiently reduced the number of climate change scenarios. For the cluster analysis, the K-means clustering method, which takes advantage of the statistical features of the scenarios, was employed; for the selection of a representative scenario in each cluster, the selection method was analyzed and reviewed, and the PDF method was used to select both the best scenarios, those with the closest simulation accuracy, and the principal scenarios suggested by this research.
In selecting the best scenarios, it was shown that a GCM scenario that demonstrated a high level of simulation accuracy in the past does not necessarily demonstrate a similarly high level of simulation accuracy in the future, and various GCM scenarios were selected as the principal scenarios. Secondly, the "Maximum entropy", which can quantify the uncertainties of a climate change scenario, was used to quantify and compare the uncertainties associated with all the scenarios, the best scenarios and the principal scenarios. The comparison showed that the principal scenarios maintain, and can better explain, the uncertainties of all the scenarios than the best scenarios do. The scenario selection process thus showed that the principal scenarios have the effect of using all the scenarios and retain the uncertainties associated with climate change to the maximum extent possible, while reducing the number of scenarios. Lastly, the climate change scenarios most suitable for the climate of the Korean peninsula were suggested. Through the scenario selection process, principal climate change scenarios that are suitable for the Korean peninsula and maintain most of the uncertainties were suggested from among all the scenarios in the 4th IPCC report. It is therefore assessed that using the scenarios most suitable for projecting future water resources on the Korean peninsula can provide water resources management projections that maintain more than 70-80% of the uncertainties of all the scenarios.
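
The idea of reducing many scenarios to one representative per cluster can be sketched as follows. The scenario names and feature values are hypothetical, and the representative is chosen here as the scenario nearest each cluster centroid, which is a simple stand-in and not the PDF method the paper uses.

```python
import math

def kmeans(points, centroids, iters=50):
    """Plain Lloyd's k-means with caller-supplied starting centroids."""
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its members (keep it if empty).
        new = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if new == centroids:
            break
        centroids = new
    return centroids, clusters

# Hypothetical GCM scenarios: (temperature change in deg C, precipitation change in %).
scenarios = {
    "gcm_a": (1.0, 2.0), "gcm_b": (1.2, 3.0), "gcm_c": (1.1, 2.5),
    "gcm_d": (3.0, -4.0), "gcm_e": (3.2, -5.0), "gcm_f": (2.8, -4.5),
}
names = list(scenarios)
points = [scenarios[n] for n in names]

# Start from two of the scenarios so the run is deterministic.
centroids, clusters = kmeans(points, [points[0], points[3]])

# Representative of each cluster: the scenario closest to the centroid.
representatives = [
    min(names, key=lambda n: math.dist(scenarios[n], c)) for c in centroids
]
print(representatives)
```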

A Big Data Based Random Motif Frequency Method for Analyzing Human Proteins (인간 단백질 분석을 위한 빅 데이타 기반 RMF 방법)

  • Kim, Eun-Mi;Jeong, Jong-Cheol;Lee, Bae-Ho
    • The Journal of the Korea institute of electronic communication sciences / v.13 no.6 / pp.1397-1404 / 2018
  • Due to the technical difficulties and high cost of obtaining 3-dimensional structure data, sequence-based approaches in proteins have not been widely acknowledged. A motif can be defined as any segment in a protein or gene sequence. Because of this simplicity, motifs have been actively and widely used in various areas. However, the motif itself has not been studied comprehensively. The value of this study, which analyzes human proteins using an artificial intelligence method, falls into three areas: (1) To the best of our knowledge, this research is the first comprehensive motif analysis, covering motifs in all human proteins in the Protein Data Bank (PDB) associated with the Enzyme Commission (EC) number database and the Structural Classification of Proteins (SCOP). (2) We analyze motifs in depth in three different categories: pattern, statistical, and functional analysis of clusters. (3) Lastly and most importantly, we propose the random motif frequency (RMF) matric, which can efficiently distinguish the characteristics of proteins by identifying interface residues from non-interface residues and clustering protein functions based on big data while varying the size of the random motif.

A study on the estimation of AADT by short-term traffic volume survey (단기조사 교통량을 이용한 AADT 추정연구)

  • 이승재;백남철;권희정
    • Journal of Korean Society of Transportation / v.20 no.6 / pp.59-68 / 2002
  • AADT (Annual Average Daily Traffic) can be obtained from short-term traffic counts rather than from traffic data collected over 365 days. The process used to estimate AADT from short-term count data is very important, so there have been many studies on AADT estimation. In this paper, we tried to improve the AADT estimation process, building on earlier AADT estimation research. First, we identified the factor that shows differences among groups. To do so, we examined hourly variables (divided into total hours, weekday hours, Saturday hours, Sunday hours, weekday and Sunday hours, and weekday and Saturday hours), changing the number of groups each time. In the end, we selected the hourly variables of Sunday and weekday as the factor showing differences among groups. Second, we classified 200 locations into 10 groups through cluster analysis using only monthly variables. The rule for deciding the number of groups is to maximize the deviation among the hourly variables of each group. Third, we classified the 200 locations used in the second step into the 10 groups by applying statistical techniques such as discriminant analysis and a neural network. This step tests the rate at which each location is assigned to its correct group rather than a wrong one. In conclusion, the result of this study's method was closer to the real AADT value than that of the former method, and this study contributes significantly to improving the method of AADT estimation.
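
Once a short-term count site has been assigned to a group, a common way to expand its count to an annual figure is to multiply by that group's seasonal factor. The sketch below shows this generic factor approach with hypothetical numbers; it is not the paper's exact procedure, and it approximates the group AADT as the plain mean of the twelve monthly averages (ignoring differing month lengths).

```python
def monthly_factors(monthly_adt):
    """Expansion factor per month: group AADT / that month's average daily traffic."""
    aadt = sum(monthly_adt) / len(monthly_adt)
    return [aadt / m for m in monthly_adt]

# Hypothetical monthly average daily traffic (Jan..Dec) for one group of
# permanent count stations; values are illustrative only.
group_monthly_adt = [9000, 9500, 10000, 10500, 11000, 11000,
                     12000, 13000, 11000, 10500, 9500, 9000]
factors = monthly_factors(group_monthly_adt)

# Hypothetical short-term count: 6,500 vehicles/day observed in August (index 7)
# at a site assigned to this group.
short_count, month_index = 6500, 7
estimated_aadt = short_count * factors[month_index]
print(estimated_aadt)
```

The quality of the estimate depends on assigning the site to the right group, which is why the classification step matters.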

Discrimination of the drinking water taste by potentiometric electronic tongue and multivariate analysis (전자혀 및 다변량 분석법을 활용한 먹는물의 구별 방법)

  • Eunju Kim;Tae-Mun Hwang;Jae-Wuk Koo;Jaeyong Song;Hongkyeong Park;Sookhyun Nam
    • Journal of Korean Society of Water and Wastewater / v.37 no.6 / pp.425-435 / 2023
  • Organoleptic parameters such as color, odor, and flavor influence consumer perception of drinking water quality. This study aims to evaluate the taste of selected bottled and tap water samples using an electronic tongue (E-tongue) instead of a sensory test. The mineral components of bottled and tap water are related to the overall preference for water taste. Unlike the sensory test, the potentiometric E-tongue method presented in this study distinguishes taste by measuring the mineral components in water, and the data obtained can be statistically analyzed. Eleven bottled water products from various brands and one tap water from I city in Korea were evaluated. The E-tongue data were statistically analyzed using multivariate statistical tools such as hierarchical clustering analysis (HCA), principal component analysis (PCA), and partial least squares discriminant analysis (PLS-DA). The results show that the E-tongue method can clearly discriminate the taste of drinking waters that differ in water quality, based on the ion-related water quality parameters. The water quality parameters that affect taste discrimination were found to be total dissolved solids (TDS), sodium (Na⁺), calcium (Ca²⁺), magnesium (Mg²⁺), sulfate (SO₄²⁻), chloride (Cl⁻), potassium (K⁺) and pH. The distance calculation of HCA was used to quantify the differences between the 12 types of drinking water. The proposed E-tongue method is a practical tool for quantitatively evaluating the differences between samples in water quality items related to the ionic components. It can be helpful in quality control of drinking water.
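
How the distance calculation in hierarchical clustering quantifies differences between samples can be seen in a small agglomerative sketch. The sample coordinates are hypothetical standardized values, and single linkage with Euclidean distance is an assumption; the abstract does not state which linkage or distance the study used.

```python
import math

def single_linkage(points):
    """Agglomerative clustering with single linkage.

    Returns the merges in order as (members_a, members_b, distance),
    where members are indices into points.
    """
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges

# Hypothetical standardized measurements for four water samples (two features each).
samples = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 5.2)]
for members_a, members_b, dist in single_linkage(samples):
    print(members_a, members_b, dist)
```

The merge distances are what a dendrogram plots on its height axis: small values join similar samples, and the final large value separates the two dissimilar groups.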

A study on solar radiation prediction using medium-range weather forecasts (중기예보를 이용한 태양광 일사량 예측 연구)

  • Sujin Park;Hyojeoung Kim;Sahm Kim
    • The Korean Journal of Applied Statistics / v.36 no.1 / pp.49-62 / 2023
  • Solar energy, whose share is rapidly increasing, continues to attract development and investment. As new and renewable energy policies such as the Green New Deal take effect and the installation of home solar panels increases, the supply of solar energy in Korea is gradually expanding, and research on accurate demand prediction for power generation is actively underway. Solar radiation prediction is important because solar radiation is the factor that most influences power generation demand prediction. This study differs from previous work chiefly in that it attempts to predict solar radiation using medium-term forecast weather data, which earlier studies did not use. In this paper, we combined multiple linear regression, KNN, random forest, and SVR models with the K-means clustering technique to predict hourly solar radiation by calculating the probability density function for each cluster. Before using the medium-term forecast data, mean absolute error (MAE) and root mean squared error (RMSE) were used as indicators to compare model prediction results. The data, covering March 1, 2017 to February 28, 2022, were converted into daily data to match the medium-term forecast data format. Comparing the predictive performance of the models, the best-performing method predicted daily solar radiation with random forest, grouped dates with similar climate factors, and calculated the probability density function of solar radiation for each cluster. When the prediction results were checked after fitting the model to the medium-term forecast data using this methodology, the prediction error was found to increase by date. This seems to be due to prediction error in the medium-term forecast weather data.
In future studies, among the weather factors that can be used in the mid-term forecast data, studies that add exogenous variables such as precipitation or apply time series clustering techniques should be conducted.
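
The two error indicators named in this abstract have standard definitions, sketched below. The observed and predicted values are hypothetical and only illustrate the arithmetic.

```python
import math

def mae(actual, predicted):
    """Mean absolute error: average size of the prediction errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: penalizes large errors more heavily than MAE."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )

# Hypothetical daily solar radiation values (observed vs. predicted).
observed = [10.0, 12.0, 8.0, 15.0]
predicted = [11.0, 10.0, 8.0, 18.0]
print(mae(observed, predicted), rmse(observed, predicted))
```

RMSE is never smaller than MAE on the same data, so a large gap between the two suggests that a few days carry most of the error.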