• Title/Summary/Keyword: Group Model Clustering


Clustering Analysis of Science and Engineering College Students' understanding on Probability and Statistics (Robust PCA를 활용한 이공계 대학생의 확률 및 통계 개념 이해도 분석)

  • Yoo, Yongseok
    • Journal of Convergence for Information Technology
    • /
    • v.12 no.3
    • /
    • pp.252-258
    • /
    • 2022
  • In this study, we propose a method for analyzing students' understanding of probability and statistics in small lectures at universities. A computer-based test for probability and statistics was performed on 95 science and engineering college students. After dividing the students' responses into 7 clusters using the Robust PCA and the Gaussian mixture model, the achievement of each subject was analyzed for each cluster. High-ranking clusters generally showed high achievement on most topics except for statistical estimation, and low-achieving clusters showed strengths and weaknesses on different topics. Compared to the widely used PCA-based dimension reduction followed by clustering analysis, the proposed method showed each group's characteristics more clearly. The characteristics of each cluster can be used to develop an individualized learning strategy.

Comparative analysis of model performance for predicting the customer of cafeteria using unstructured data

  • Seungsik Kim;Nami Gu;Jeongin Moon;Keunwook Kim;Yeongeun Hwang;Kyeongjun Lee
    • Communications for Statistical Applications and Methods
    • /
    • v.30 no.5
    • /
    • pp.485-499
    • /
    • 2023
  • This study aimed to predict the number of meals served in a group cafeteria using machine learning methodology. Features of the menu were created through the Word2Vec methodology and clustering, and a stacking ensemble model was constructed using Random Forest, Gradient Boosting, and CatBoost as sub-models. Results showed that CatBoost had the best performance with the ensemble model showing an 8% improvement in performance. The study also found that the date variable had the greatest influence on the number of diners in a cafeteria, followed by menu characteristics and other variables. The implications of the study include the potential for machine learning methodology to improve predictive performance and reduce food waste, as well as the removal of subjective elements in menu classification. Limitations of the research include limited data cases and a weak model structure when new menus or foreign words are not included in the learning data. Future studies should aim to address these limitations.

A Study on Anomaly Detection based on User's Command Analysis (사용자 명령어 분석을 통한 비정상 행위 판정에 관한 연구)

  • 윤정혁;오상현;이원석
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.10 no.4
    • /
    • pp.59-71
    • /
    • 2000
  • With advances in computer and communication technology, a wide variety of information is conveniently available to users, but intrusions and crimes using computers have also increased rapidly. As a result, many studies are needed to detect the activities of intruders effectively. In this paper, a new association algorithm for the anomaly detection model is proposed for generating a user's normal patterns. In this algorithm, more recently observed behavior has more influence on the data mining process. In addition, by clustering the generated normal patterns for each user or group of similar users, it is possible to identify the usual frequency of program or command usage for each user or group of users. The performance of the proposed anomaly detection system has been tested on various system parameters in order to identify their practical ranges for maximizing its detection rate.
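
    The abstract does not specify the association algorithm itself. As a rough illustration of the idea that recent behavior should count for more, the following is a minimal sketch that assumes an exponential decay per session; the function name, the `decay` parameter, and the sample sessions are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def decayed_command_frequencies(sessions, decay=0.9):
    """Return recency-weighted command frequencies.

    `sessions` is a list of command lists, ordered oldest first.
    Before each new session is counted, all earlier weights are
    multiplied by `decay`, so older sessions contribute less.
    (Assumed weighting scheme, not the paper's algorithm.)
    """
    weights = defaultdict(float)
    total = 0.0
    for session in sessions:
        # Age everything observed so far.
        for cmd in weights:
            weights[cmd] *= decay
        total *= decay
        # Count each distinct command once per session.
        for cmd in set(session):
            weights[cmd] += 1.0
        total += 1.0
    if total == 0.0:
        return {}
    return {cmd: w / total for cmd, w in weights.items()}


if __name__ == "__main__":
    history = [["ls", "cd"], ["ls", "vi"], ["gcc", "vi"]]
    for cmd, freq in sorted(decayed_command_frequencies(history).items()):
        print(cmd, round(freq, 3))
```

    A profile built this way could then be compared with a new session's commands, with unusually low-frequency commands treated as candidates for anomalous behavior.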

Analysis of the Genetic Diversity and Population Structure of Amaranth Accessions from South America Using 14 SSR Markers

  • Oo, Win Htet;Park, Yong-Jin
    • KOREAN JOURNAL OF CROP SCIENCE
    • /
    • v.58 no.4
    • /
    • pp.336-346
    • /
    • 2013
  • Amaranth (Amaranthus sp. L.) is an important group of plants that includes grain, vegetable, and ornamental types. The centers of diversity for amaranths are Central and South America, India, and South East Asia, with secondary centers of diversity in West and East Africa. The present study was performed to determine the genetic diversity and population structure of 75 amaranth accessions, 65 from South America and 10 from South Asia as controls, using 14 SSR markers. Ninety-nine alleles were detected, an average of seven alleles per SSR locus. The average major allele frequency and polymorphic information content (PIC) were 0.42 and 0.39, respectively. Model-based structure analysis revealed the presence of two subpopulations and three admixtures, which was consistent with clustering based on genetic distance: 72 of the 75 accessions (96%) were classified into two clusters, and only three accessions (4%) were admixtures. Cluster 1 had a higher allele number and PIC values than Cluster 2. The results of this study provide effective information for future germplasm conservation and improvement programs in Amaranthus.
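
    The two summary statistics reported above can be computed per locus from observed allele calls. The sketch below assumes the commonly used Botstein et al. (1980) definition of PIC, PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2; the abstract does not say which definition or software was used, and the allele labels are made up for illustration.

```python
from collections import Counter


def allele_frequencies(alleles):
    """Map each allele label to its relative frequency at one locus."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def major_allele_frequency(alleles):
    """Frequency of the most common allele at the locus."""
    return max(allele_frequencies(alleles).values())


def pic(alleles):
    """Polymorphic information content (assumed Botstein et al. form)."""
    p = list(allele_frequencies(alleles).values())
    sum_sq = sum(x * x for x in p)
    cross = sum(
        2.0 * p[i] ** 2 * p[j] ** 2
        for i in range(len(p))
        for j in range(i + 1, len(p))
    )
    return 1.0 - sum_sq - cross


if __name__ == "__main__":
    locus = ["150", "150", "154", "154", "158", "150"]  # hypothetical SSR calls
    print(major_allele_frequency(locus), pic(locus))
```

    Averaging these per-locus values over all 14 markers would give figures comparable to the reported 0.42 and 0.39.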

Object Classification based on Weakly Supervised E2LSH and Saliency map Weighting

  • Zhao, Yongwei;Li, Bicheng;Liu, Xin;Ke, Shengcai
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.10 no.1
    • /
    • pp.364-380
    • /
    • 2016
  • The most popular approach to object classification is based on the bag of visual-words model, which has several fundamental problems that restrict its performance, such as low time efficiency, the synonymy and polysemy of visual words, and the lack of spatial information between visual words. In view of this, an object classification method based on weakly supervised E2LSH and saliency map weighting is proposed. First, E2LSH (Exact Euclidean Locality Sensitive Hashing) is employed to generate a group of weakly randomized visual dictionaries by clustering SIFT features of the training dataset, and the selection of hash functions is supervised, inspired by random forest ideas, to reduce the randomness of E2LSH. Second, the graph-based visual saliency (GBVS) algorithm is applied to detect the saliency maps of different images and weight the visual words according to the saliency prior. Finally, a saliency map weighted visual language model is used to accomplish object classification. Experimental results on the Pascal 2007 and Caltech-256 datasets indicate that the distinguishability of objects is effectively improved and that our method is superior to the state-of-the-art object classification methods.
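
    To make the hashing step concrete, the following is a minimal sketch of plain E2LSH bucketing using the standard p-stable hash h(v) = floor((a·v + b) / w), where each bucket plays the role of one visual word. It does not include the paper's weak supervision of hash-function selection or the saliency weighting; the parameter values, function names, and toy descriptors are assumptions for illustration.

```python
import math
import random


def make_hash(dim, w, rng):
    """One p-stable hash: a ~ N(0, 1)^dim, b ~ U[0, w)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, w)

    def h(v):
        dot = sum(ai * vi for ai, vi in zip(a, v))
        return math.floor((dot + b) / w)

    return h


def make_bucket_fn(dim, k, w, seed=0):
    """Concatenate k hashes into one bucket key (a tuple of ints)."""
    rng = random.Random(seed)
    hashes = [make_hash(dim, w, rng) for _ in range(k)]
    return lambda v: tuple(h(v) for h in hashes)


def build_dictionary(descriptors, bucket_fn):
    """Group descriptor indices by bucket; each bucket acts as a visual word."""
    buckets = {}
    for idx, v in enumerate(descriptors):
        buckets.setdefault(bucket_fn(v), []).append(idx)
    return buckets


if __name__ == "__main__":
    # Toy 4-D vectors standing in for 128-D SIFT descriptors.
    descs = [[0.1, 0.2, 0.3, 0.4], [0.1, 0.2, 0.3, 0.4], [5.0, -3.0, 2.0, 9.0]]
    words = build_dictionary(descs, make_bucket_fn(dim=4, k=3, w=4.0, seed=1))
    print(words)
```

    Building several dictionaries with different seeds gives the "group" of randomized dictionaries the abstract refers to; nearby descriptors tend to share a bucket, but this is probabilistic rather than guaranteed.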

A Study on Evaluating the Efficiency of the Photonics Industry in Gwangju Using a DEA Model (DEA 모형을 활용한 광주 광산업체 효율성 평가에 관한 연구)

  • Cho, Geon;Jung, Kyung-Ho
    • Journal of Korean Society for Quality Management
    • /
    • v.39 no.2
    • /
    • pp.244-255
    • /
    • 2011
  • In this study, we evaluate the efficiency of the photonics industry using a data envelopment analysis (DEA) model. We first develop a four-stage procedure for selecting proper input and output variables, which consists of selecting the first candidate variables from a literature survey, selecting the second candidate variables through experts' discussion, measuring the partial efficiency of the selected variables based on Tofallis' profiling, and clustering some variables through the rank correlation analysis of partial efficiency proposed by Min and Kim (1998). With this procedure, we select 4 input variables (capital, number of employees, R&D cost, operating cost) and 2 output variables (sales, growth of sales) and then use the CCR and BCC models to measure the efficiencies of 26 photonics companies in Gwangju. Moreover, we perform a reference group analysis to identify what causes inefficiencies and to provide the desirable values of the input and output variables at which inefficient photonics companies become efficient. Finally, we classify the 26 photonics companies into three groups (optical communications, optical applications, and optical sources) and perform the Kruskal-Wallis test to check whether the efficiencies of the three groups differ.
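
    The final group comparison can be sketched as follows. This computes the Kruskal-Wallis H statistic with average ranks for ties but without the tie-correction factor, and it stops short of the p-value (which would come from a chi-square distribution with one fewer degrees of freedom than the number of groups). The efficiency scores are hypothetical, not the paper's data.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H = 12 / (N (N + 1)) * sum(R_j^2 / n_j) - 3 (N + 1), no tie correction."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    acc = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        acc += rank_sum * rank_sum / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * acc - 3.0 * (n + 1)


if __name__ == "__main__":
    # Hypothetical DEA efficiency scores for the three company groups.
    communications = [0.62, 0.71, 0.80, 1.00]
    applications = [0.55, 0.60, 0.93]
    sources = [0.48, 0.77, 0.85, 0.90]
    print(kruskal_wallis_h([communications, applications, sources]))
```

    A rank-based test is a natural fit here because DEA efficiency scores are bounded and pile up at 1.0 for efficient units, so a normality assumption would be doubtful.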

Anatomical Brain Connectivity Map of Korean Children (한국 아동 집단의 구조 뇌연결지도)

  • Um, Min-Hee;Park, Bum-Hee;Park, Hae-Jeong
    • Investigative Magnetic Resonance Imaging
    • /
    • v.15 no.2
    • /
    • pp.110-122
    • /
    • 2011
  • Purpose: The purpose of this study is to establish a method for generating human brain anatomical connectivity from Korean children and evaluating the network's topological properties using small-world network analysis. Materials and Methods: Using diffusion tensor images (DTI) and parcellation maps of structural MRIs acquired from twelve healthy Korean children, we generated a brain structural connectivity matrix for each individual. We applied a one-sample t-test to the connectivity maps to derive a representative anatomical connectivity for the group. By spatially normalizing the white matter bundles of the participants into a standard template space, we obtained the anatomical brain network model. Network properties including clustering coefficient, characteristic path length, and global/local efficiency were also calculated. Results: We found that the structural connectivity of the Korean children group preserves small-world properties. The anatomical connectivity map obtained in this study showed that the children group had higher intra-hemispheric connectivity than inter-hemispheric connectivity. We also observed that the neural connectivity of the group is high between the brain stem and motor-sensory areas. Conclusion: We suggested a method to examine the anatomical brain network of a group of Korean children. The proposed method can be used to evaluate the efficiency of anatomical brain networks in people with disease.
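
    Two of the network properties named above can be sketched for a binarized, undirected connectivity graph as follows. The adjacency-dictionary representation and the toy graph are assumptions for illustration; the paper's own thresholding and weighting choices are not given in the abstract.

```python
from collections import deque
from itertools import combinations


def clustering_coefficient(adj):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges that exist among this node's neighbors.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def characteristic_path_length(adj):
    """Mean shortest-path length over all connected ordered node pairs.

    Assumes the graph has at least one edge.
    """
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:  # breadth-first search from src
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


if __name__ == "__main__":
    # Toy graph: nodes stand in for parcellated brain regions.
    graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(clustering_coefficient(graph), characteristic_path_length(graph))
```

    A small-world network is usually characterized by a clustering coefficient well above that of a comparable random graph together with a similarly short characteristic path length.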

Development of ensemble machine learning model considering the characteristics of input variables and the interpretation of model performance using explainable artificial intelligence (수질자료의 특성을 고려한 앙상블 머신러닝 모형 구축 및 설명가능한 인공지능을 이용한 모형결과 해석에 대한 연구)

  • Park, Jungsu
    • Journal of Korean Society of Water and Wastewater
    • /
    • v.36 no.4
    • /
    • pp.239-248
    • /
    • 2022
  • The prediction of algal bloom is an important field of study in algal bloom management, and chlorophyll-a concentration(Chl-a) is commonly used to represent the status of algal bloom. In recent years, advanced machine learning algorithms have been increasingly used for the prediction of algal bloom. In this study, XGBoost(XGB), an ensemble machine learning algorithm, was used to develop a model to predict Chl-a in a reservoir. Daily observations of water quality data and climate data were used for the training and testing of the model. In the first step of the study, the input variables were clustered into two groups(low and high value groups) based on the observed value of water temperature(TEMP), total organic carbon concentration(TOC), total nitrogen concentration(TN) and total phosphorus concentration(TP). For each of the four water quality items, two XGB models were developed using only the data in each clustered group(Model 1). The results were compared to the prediction of an XGB model developed using the entire data before clustering(Model 2). The model performance was evaluated using three indices including root mean squared error-observation standard deviation ratio(RSR). The model performance was improved using Model 1 for TEMP, TN and TP, as the RSR of each model was 0.503, 0.477 and 0.493, respectively, while the RSR of Model 2 was 0.521. On the other hand, Model 2 showed better performance than Model 1 for TOC, where the RSR was 0.532. Explainable artificial intelligence(XAI) is an ongoing field of research in machine learning. Shapley value analysis, a novel XAI algorithm, was also used for the quantitative interpretation of the XGB model performance developed in this study.
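
    The RSR index used above is the root mean squared error divided by the standard deviation of the observations, so 0 is a perfect fit and 1 is no better than always predicting the observed mean. A minimal sketch, with made-up Chl-a values:

```python
import math


def rsr(observed, predicted):
    """RMSE-observation standard deviation ratio (lower is better)."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    squared_dev = sum((o - mean_obs) ** 2 for o in observed)
    # The 1/n factors in RMSE and the standard deviation cancel.
    return math.sqrt(squared_error) / math.sqrt(squared_dev)


if __name__ == "__main__":
    chl_a_observed = [12.0, 18.5, 25.1, 30.4, 22.3]   # hypothetical values
    chl_a_predicted = [13.2, 17.0, 27.5, 28.0, 21.1]  # hypothetical values
    print(round(rsr(chl_a_observed, chl_a_predicted), 3))
```

    Because RSR is normalized by the spread of the observations, values such as the reported 0.477 to 0.532 can be compared across models trained on differently scaled subsets.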

A Method of Patch Merging for Atlas Construction in 3DoF+ Video Coding

  • Im, Sung-Gyune;Kim, Hyun-Ho;Lee, Gwangsoon;Kim, Jae-Gon
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2019.11a
    • /
    • pp.259-260
    • /
    • 2019
  • The MPEG-I Visual group is actively working on enhancing immersive experiences with up to six degrees of freedom (6DoF). In the virtual space of 3DoF+, which is defined as an extension of 360 video with limited changes of the view position in a sitting position, looking at the scene from another viewpoint (another position in space) requires rendering additional viewpoints using multiple videos taken at different locations at the same time. The MPEG-I Visual group is studying methods for efficient coding and transmission of 3DoF+ video and recently released the Test Model for Immersive Media (TMIV). This paper presents an enhanced clustering method that can pack patches into an atlas efficiently in TMIV. The experimental results show that the proposed method achieves a significant BD-rate reduction in terms of various end-to-end evaluation methods.


Design of Multi-FPNN Model Using Clustering and Genetic Algorithms and Its Application to Nonlinear Process Systems (HCM 클러스터링과 유전자 알고리즘을 이용한 다중 FPNN 모델 설계와 비선형 공정으로의 응용)

  • 박호성;오성권;안태천
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.10 no.4
    • /
    • pp.343-350
    • /
    • 2000
  • In this paper, we propose a Multi-FPNN (Fuzzy Polynomial Neural Networks) model based on FNN and PNN (Polynomial Neural Networks) for optimal system identification. Here, the FNN structure is designed using a fuzzy input space divided by each separated input variable, and both are utilized in order to get better output performance. Each node of the PNN structure, which is based on the GMDH (Group Method of Data Handling) method, uses two types of high-order polynomials, linear and quadratic, and the input of that node uses three kinds of multi-variable inputs. Genetic Algorithms (GAs) are used to identify both the structure and the preprocessing of parameters of the Multi-FPNN model. The HCM clustering method, which is carried out for data preprocessing of the process system, is utilized to determine the structure of the Multi-FPNN according to the divisions of the input-output space. An aggregate performance index with a weighting factor is used to achieve a sound balance between the approximation and generalization abilities of the model. Through the selection and adjustment of the weighting factor of this aggregate objective function, it is possible to design an optimal Multi-FPNN model effectively. The study is illustrated with the aid of two representative numerical examples, and the aggregate performance index related to the approximation and generalization abilities of the model is evaluated and discussed.

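
    Two ingredients of the abstract above can be sketched in a few lines: HCM (hard c-means) clustering for partitioning the data, and an aggregate performance index that trades off training error against testing error. The weighted-sum form theta * PI + (1 - theta) * E_PI is an assumption, since the abstract only says that a weighting factor is used; the function names, the initialization from the first c points, and the sample data are likewise hypothetical.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def hcm(points, c, iterations=100):
    """Hard c-means: alternate nearest-center assignment and mean update.

    Centers are initialized from the first c points (a simplification).
    """
    centers = [list(p) for p in points[:c]]
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(c), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        for j in range(c):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return centers, labels


def aggregate_index(pi, e_pi, theta):
    """Assumed weighted sum of training error (pi) and testing error (e_pi)."""
    return theta * pi + (1.0 - theta) * e_pi


if __name__ == "__main__":
    data = [[0.0, 0.1], [5.0, 5.2], [0.2, 0.0], [5.1, 4.9]]
    centers, labels = hcm(data, c=2)
    print(centers, labels)
    print(aggregate_index(pi=0.02, e_pi=0.08, theta=0.5))
```

    Under this assumed form, a theta near 1 favors approximation on the training data, while a theta near 0 favors generalization to the test data.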