• Title/Summary/Keyword: Industrial Clustering

Comparative analysis of model performance for predicting the customer of cafeteria using unstructured data

  • Seungsik Kim;Nami Gu;Jeongin Moon;Keunwook Kim;Yeongeun Hwang;Kyeongjun Lee
    • Communications for Statistical Applications and Methods / v.30 no.5 / pp.485-499 / 2023
  • This study aimed to predict the number of meals served in a group cafeteria using machine learning methodology. Features of the menu were created through the Word2Vec methodology and clustering, and a stacking ensemble model was constructed using Random Forest, Gradient Boosting, and CatBoost as sub-models. Results showed that CatBoost had the best performance with the ensemble model showing an 8% improvement in performance. The study also found that the date variable had the greatest influence on the number of diners in a cafeteria, followed by menu characteristics and other variables. The implications of the study include the potential for machine learning methodology to improve predictive performance and reduce food waste, as well as the removal of subjective elements in menu classification. Limitations of the research include limited data cases and a weak model structure when new menus or foreign words are not included in the learning data. Future studies should aim to address these limitations.

Analysis of the genetic diversity and population structure of Lindera obtusiloba (Lauraceae), a dioecious tree in Korea

  • Ho Bang Kim;Hye-Young Lee;Mi Sun Lee;Yi Lee;Youngtae Choi;Sung-Yeol Kim;Jaeyong Choi
    • Journal of Plant Biotechnology / v.50 / pp.207-214 / 2023
  • Lindera obtusiloba (Lauraceae) is a dioecious tree that is widely distributed in the low-altitude montane forests of East Asia, including Korea. Despite its various pharmacological properties and ornamental value, the genetic diversity and population structure of this species in Korea have not been explored. In this study, we selected 6 nuclear and 6 chloroplast microsatellite markers with polymorphism or clean cross-amplification and used these markers to perform genetic diversity and population structure analyses of L. obtusiloba samples collected from 20 geographical regions. Using these 12 markers, we identified a total of 44 alleles, ranging from 1 to 8 per locus, and the average observed and expected heterozygosity values were 0.11 and 0.44, respectively. The average polymorphism information content was 0.39. Genetic relationship and population structure analyses revealed that the natural L. obtusiloba population in Korea is composed of 2 clusters, possibly due to two different plastid genotypes. The same clustering patterns have also been observed in Lindera species in mainland China and Japan.

Analysis of genetic diversity and population structure of rice cultivars from Africa, Asia, Europe, South America, and Oceania using SSR markers

  • Cheng, Yi;Cho, Young-Il;Chung, Jong-Wook;Ma, Kyung-Ho;Park, Yong-Jin
    • Korean Journal of Crop Science / v.54 no.4 / pp.441-451 / 2009
  • In this study, 29 simple sequence repeat (SSR) markers were used to analyze the genetic diversity and population structure of 125 rice accessions from 40 different origins in Africa, Asia, Europe, South America, and Oceania. A total of 333 alleles were detected, with an average of 11.5 per locus. The mean values of major allele frequency, expected heterozygosity, and polymorphism information content (PIC) for each SSR locus were 0.39, 0.73, and 0.70, respectively. The highest mean PIC was 0.71 for Asia, followed by 0.66 for Africa, 0.59 for South America, 0.53 for Europe, and 0.47 for Oceania. Model-based structure analysis revealed the presence of five subpopulations, which was broadly consistent with clustering based on genetic distance. Some accessions were clearly assigned to a single population in which >70% of their inferred ancestry was derived from one of the model-based populations. In addition, 12 accessions (9.6%) were categorized as having admixed ancestry. The results could be used to understand the genetic structure of rice cultivars from these regions and to support effective breeding programs to broaden the genetic basis of rice varieties.
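
The abstract above reports expected heterozygosity and PIC per locus without stating how they were computed. As a minimal sketch, assuming the commonly used definitions (expected heterozygosity as one minus the sum of squared allele frequencies, and PIC with the additional pairwise correction term), the two statistics could be computed as follows. The function names and the example frequencies are hypothetical and are not taken from the study.

```python
from itertools import combinations


def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2) for the allele frequencies at one locus."""
    return 1.0 - sum(p * p for p in freqs)


def polymorphism_information_content(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    homozygous = sum(p * p for p in freqs)
    pairwise = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygous - pairwise


# Hypothetical allele frequencies at a single locus (must sum to 1).
example_freqs = [0.5, 0.3, 0.2]
he_value = expected_heterozygosity(example_freqs)
pic_value = polymorphism_information_content(example_freqs)
print(f"He = {he_value:.4f}, PIC = {pic_value:.4f}")
```

Under these definitions PIC never exceeds expected heterozygosity for a locus, which agrees with the reported means (0.70 versus 0.73).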

Automatic Detection of Foreign Body through Template Matching in Industrial CT Volume Data (산업용 CT 볼륨데이터에서 템플릿 매칭을 통한 이물질 자동 검출)

  • Ji, Hye-Rim;Hong, Helen
    • Journal of Korea Multimedia Society / v.16 no.12 / pp.1376-1384 / 2013
  • In this paper, we propose an automatic method for detecting foreign bodies through template matching in industrial CT volume data. Our method is composed of three main steps. First, in the down-sampled data, the product region is separated from the background after noise reduction, and initial foreign-body candidates are extracted using the mean and standard deviation of the product region. Foreign-body candidates are then extracted using K-means clustering. Second, foreign bodies whose intensity differs from that of the product region are detected using template matching, which is performed by evaluating SSD or joint entropy according to the size of the detected foreign-body candidates. Third, to improve the detection rate of foreign bodies in the original volume data, the final foreign bodies are detected using a percolation method. For the performance evaluation of our method, industrial CT volume data and simulation data are used; visual inspection and accuracy assessment are performed, and processing time is measured. For the accuracy assessment, a density-based detection method is used as the comparative method and Dice's coefficient is measured.
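
The accuracy measure named in the abstract above, Dice's coefficient, can be illustrated with a short sketch. It assumes that a detection result and a reference segmentation are each represented as a set of voxel coordinates; that representation, the function name, and the toy voxels are assumptions made for illustration, not details from the paper.

```python
def dice_coefficient(detected, reference):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two sets of voxel coordinates."""
    a, b = set(detected), set(reference)
    if not a and not b:
        # Two empty segmentations are treated as a perfect match.
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))


# Hypothetical (z, y, x) voxels flagged as foreign body by each source.
detected_voxels = {(0, 0, 0), (0, 0, 1), (0, 1, 1)}
reference_voxels = {(0, 0, 1), (0, 1, 1), (1, 1, 1)}
score = dice_coefficient(detected_voxels, reference_voxels)
print(f"Dice = {score:.3f}")
```

A value of 1 means the two voxel sets coincide exactly, and 0 means they share no voxels.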

A Study on Defect Prediction through Real-time Monitoring of Die-Casting Process Equipment (주조공정 설비에 대한 실시간 모니터링을 통한 불량예측에 대한 연구)

  • Chulsoon Park;Heungseob Kim
    • Journal of Korean Society of Industrial and Systems Engineering / v.45 no.4 / pp.157-166 / 2022
  • In a die-casting process, defects that are difficult to confirm by visual inspection, such as shrinkage bubbles, may occur when the vacuum state is not maintained. Because these casting defects are discovered only during post-processing operations such as heat treatment or finishing work, they cannot be addressed at casting time, which can lead to a large number of defective parts. In this study, we propose an approach that predicts the occurrence of casting defects by defect type using machine learning, based on casting parameter data collected in real time from equipment in the die-casting process. Die-casting parameter data can be collected through the casting equipment controller. To perform classification analysis for predicting defects by defect type, the casting parameters must first be labeled. First, the defective data set is separated by performing a primary clustering based on the total defect rate obtained during post-processing. Second, a secondary cluster analysis is performed on the separated defective data set using the defect rate by type, and labeling by defect type is carried out using the cluster analysis result. Finally, a classification learning model is created from the entire labeled data set, and a real-time monitoring system for defect prediction is implemented using LabView and Python. When a defect is predicted, the operator is notified through the monitoring screen and an alarm so that the problem can be handled.

Effects of Acidification on the Changes of Microbial Diversity in Aquatic Microcosms

  • Young-Beom Ahn;Hong-Bum Cho;Byung Re Min;Yong-Keel Choi
    • Animal cells and systems / v.3 no.2 / pp.153-159 / 1999
  • In an artificial pH-gradient batch culture system, the effects of acidification on the species composition of a heterotrophic bacterial community were analyzed. Total bacterial numbers were not affected by acidification, whereas the population of heterotrophic bacteria decreased as pH became lower. The heterotrophic bacteria isolated from the entire pH gradient comprised 12 genera and 22 species. Among them, 64% were gram-negative and 36% were gram-positive bacteria. As pH decreased, the proportion of gram-negative bacteria increased while that of gram-positive bacteria decreased. The diversity of genera decreased from 13 to 5 as pH decreased from 7 to 3. The G+C content of the 202 isolated strains varied from 22.8 to 77.0%, and increased among species of the same genus as pH decreased. As a result of clustering analysis, the diversity index of species ranged from 1.13 to 2.37, with lower indices at lower pH. To compare diversity across samples of different sizes, a rarefaction method was used to analyze the expected number of species according to pH. The statistical significance of species diversity was verified by the fact that the number decreased at lower pH.
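
The rarefaction step mentioned in the abstract above is not specified further. As a minimal sketch, assuming the standard hypergeometric (Hurlbert) form, the expected number of species in a random subsample of n individuals is the sum over species of 1 - C(N - N_i, n) / C(N, n), where N_i is the count of species i and N is the total. The function name and the counts below are hypothetical.

```python
from math import comb


def rarefied_richness(counts, n):
    """Expected number of species in a random subsample of n individuals."""
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("subsample size must be between 0 and the total count")
    denom = comb(total, n)
    # For each species, 1 minus the probability that the subsample misses it.
    return sum(1.0 - comb(total - c, n) / denom for c in counts if c > 0)


# Hypothetical isolate counts per species in one sample.
sample_counts = [5, 3, 2]
for size in (1, 5, 10):
    print(size, round(rarefied_richness(sample_counts, size), 3))
```

Evaluating every sample at the same subsample size n puts samples of different sizes on a common footing before their species numbers are compared.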

Visual Exploration based Approach for Extracting the Interesting Association Rules (유용한 연관 규칙 추출을 위한 시각적 탐색 기반 접근법)

  • Kim, Jun-Woo;Kang, Hyun-Kyung
    • Journal of the Korea Society of Computer and Information / v.18 no.9 / pp.177-187 / 2013
  • Association rule mining is a popular data mining technique with a wide range of application domains, and aims to extract the cause-and-effect relations between the discrete items included in transaction data. However, analysts sometimes have trouble interpreting and using the plethora of association rules extracted from a large amount of data. To address this problem, this paper proposes a novel approach called HTM for extracting the interesting association rules from given transaction data. The HTM approach consists of three main steps: hierarchical clustering, table-view, and mosaic plot. Each step provides the analysts with an appropriate visual representation. For illustration, we applied our approach to mass health examination data, and the result of this experiment shows that the HTM approach helps analysts find interesting association rules more effectively.

Genetic characterization of microsporidians infecting Indian non-mulberry silkworms (Antheraea assamensis and Samia cynthia ricini) by using PCR based ISSR and RAPD markers assay

  • Hassan, Wazid;Nath, B. Surendra
    • International Journal of Industrial Entomology and Biomaterials / v.30 no.1 / pp.6-16 / 2015
  • This study established the genetic characterisation of 10 microsporidian isolates infecting non-mulberry silkworms (Antheraea assamensis and Samia cynthia ricini) collected from biogeographical forest locations in the State of Assam, India, using PCR-based marker assays: inter simple sequence repeat (ISSR) and random amplified polymorphic DNA (RAPD). A Nosema type species (NIK-1s_mys) was used as control for comparison. Mature microsporidian spores were oval to elongated, measuring 3.80 to 4.90 μm in length and 2.60 to 3.05 μm in width. Fourteen ISSR primers generated reproducible profiles and yielded 178 fragments, of which 175 were polymorphic (98%), while 16 RAPD primers generated reproducible profiles with 198 amplified fragments displaying 95% polymorphism. Genetic distance coefficients were estimated using the Dice coefficient method, and clustering with the unweighted pair group method using arithmetic average (UPGMA) was done to unravel the genetic diversity of microsporidians infecting Indian muga and eri silkworms. The similarity coefficients varied from 0.385 to 0.941 in ISSR and 0.083 to 0.938 in RAPD data. UPGMA analysis generated dendrograms with two microsporidian groups, which appear to be different from each other. A 2-dimensional distribution based on the Euclidean distance matrix method also revealed considerable variability among the identified microsporidians. Clustering of these microsporidian isolates was in accordance with their host and biogeographic origin. Both techniques represent a useful and efficient tool for taxonomical grouping as well as for phylogenetic classification of different microsporidians in general and genotyping of these pathogens in particular.

Inappropriate Survey Design Analysis of the Korean National Health and Nutrition Examination Survey May Produce Biased Results

  • Kim, Yangho;Park, Sunmin;Kim, Nam-Soo;Lee, Byung-Kook
    • Journal of Preventive Medicine and Public Health / v.46 no.2 / pp.96-104 / 2013
  • Objectives: The inherent nature of the Korean National Health and Nutrition Examination Survey (KNHANES) design requires special analysis by incorporating sample weights, stratification, and clustering not used in ordinary statistical procedures. Methods: This study investigated the proportion of research papers that have used an appropriate statistical methodology out of the research papers analyzing the KNHANES cited in the PubMed online system from 2007 to 2012. We also compared differences in mean and regression estimates between the ordinary statistical data analyses without sampling weight and design-based data analyses using the KNHANES 2008 to 2010. Results: Of the 247 research articles cited in PubMed, only 19.8% of all articles used survey design analysis, compared with 80.2% of articles that used ordinary statistical analysis, treating KNHANES data as if it were collected using a simple random sampling method. Means and standard errors differed between the ordinary statistical data analyses and design-based analyses, and the standard errors in the design-based analyses tended to be larger than those in the ordinary statistical data analyses. Conclusions: Ignoring complex survey design can result in biased estimates and overstated significance levels. Sample weights, stratification, and clustering of the design must be incorporated into analyses to ensure the development of appropriate estimates and standard errors of these estimates.

Anomaly Detection Analysis using Repository based on Inverted Index (역방향 인덱스 기반의 저장소를 이용한 이상 탐지 분석)

  • Park, Jumi;Cho, Weduke;Kim, Kangseok
    • Journal of KIISE / v.45 no.3 / pp.294-302 / 2018
  • With the emergence of the new service industry due to the development of information and communication technology, cyber space risks such as personal information infringement and industrial confidentiality leakage have diversified, and the security problem has emerged as a critical issue. In this paper, we propose a behavior-based anomaly detection method that is suitable for real-time and large-volume data analysis technology. We show that the proposed detection method is superior to existing signature security countermeasures that are based on large-capacity user log data according to in-company personal information abuse and internal information leakage. As the proposed behavior-based anomaly detection method requires a technique for processing large amounts of data, a real-time search engine is used, called Elasticsearch, which is based on an inverted index. In addition, statistical based frequency analysis and preprocessing were performed for data analysis, and the DBSCAN algorithm, which is a density based clustering method, was applied to classify abnormal data with an example for easy analysis through visualization. Unlike the existing anomaly detection system, the proposed behavior-based anomaly detection technique is promising as it enables anomaly detection analysis without the need to set the threshold value separately, and was proposed from a statistical perspective.