• Title/Summary/Keyword: Comparison mining


A Comparison of Related Variables According to Children's Stress Types Using the Data Mining Method (데이터마이닝 기법을 활용한 아동의 스트레스 유형별 관련변수 비교)

  • Lee, Hye-Joo;Jung, Eui-Hyun
    • Korean Journal of Child Studies / v.33 no.2 / pp.111-127 / 2012
  • This study compared a number of related variables according to children's stress types using the data mining method. The sample was taken from the Korean Youth Panel Survey (KYPS) data (2,688 sixth-grade elementary students). The results of the decision tree model revealed that: (1) Parental expectations in terms of study, life satisfaction, self-esteem, parental attachment, aggression, the spousal relationship, other cognition (one's own misdeeds), and study-related worries were all related to parent stress. (2) Life satisfaction, study-related worries, admitting one's own misdeeds, gender, other cognition (one's own misdeeds), aggression, the spousal relationship, and a sense of alienation in the school were all related to appearance stress. (3) Study-related worries, parental expectations in terms of study, aggression, life satisfaction, self-esteem, parental attachment, satisfying parental expectations, parental attachment, and teacher attachment were all related to academic stress. (4) A sense of alienation in the school, mixing with peers in the school, aggression, self-esteem, other cognition (one's own misdeeds), study-related worries, parental abuse, and life satisfaction were all significantly related to friend stress. These results suggest that children's diverse conditions should be considered according to the stress type if we are to understand and cope with these stress types more efficiently.

An Efficient Algorithm For Mining Association Rules In Main Memory Systems (대용량 주기억장치 시스템에서 효율적인 연관 규칙 탐사 알고리즘)

  • Lee, Jae-Mun
    • The KIPS Transactions:PartD / v.9D no.4 / pp.579-586 / 2002
  • This paper proposes an efficient algorithm for mining association rules in large main memory systems. To do this, the paper first extends conventional algorithms such as DHP and Partition to make them compatible with large main memory systems, and then proposes an algorithm that improves the Partition algorithm by applying hash table and bitmap techniques. The proposed algorithm is compared with the extended DHP in an experimental environment, and the results show up to a 65% performance improvement over the extended DHP.
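The bitmap idea mentioned in this abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it only assumes that each item is stored as a bitmap over transaction IDs, so the support of an itemset is the number of set bits in the AND of its items' bitmaps. The function names (`build_bitmaps`, `support`, `frequent_pairs`) and the sample data are hypothetical.

```python
from itertools import combinations


def build_bitmaps(transactions):
    """Map each item to an int whose bit i is set when transaction i contains it."""
    bitmaps = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def support(itemset, bitmaps, n_transactions):
    """Count transactions containing every item by AND-ing the item bitmaps."""
    mask = (1 << n_transactions) - 1  # start with all transactions selected
    for item in itemset:
        mask &= bitmaps.get(item, 0)
    return bin(mask).count("1")


def frequent_pairs(transactions, min_support):
    """Return {pair: support} for item pairs meeting the minimum support count."""
    bitmaps = build_bitmaps(transactions)
    n = len(transactions)
    # Only items that are frequent on their own can form frequent pairs.
    frequent_items = sorted(
        item for item in bitmaps if support([item], bitmaps, n) >= min_support
    )
    result = {}
    for pair in combinations(frequent_items, 2):
        count = support(pair, bitmaps, n)
        if count >= min_support:
            result[pair] = count
    return result


# Hypothetical toy data, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
pairs = frequent_pairs(transactions, min_support=2)
```

Keeping the bitmaps in memory means each support count is a few bitwise operations instead of a rescan of the transaction data, which is the kind of saving a large main memory makes possible.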

OryzaGP: rice gene and protein dataset for named-entity recognition

  • Larmande, Pierre;Do, Huy;Wang, Yue
    • Genomics & Informatics / v.17 no.2 / pp.17.1-17.3 / 2019
  • Text mining has become an important research method in biology, with its original purpose being to extract biological entities, such as genes, proteins and phenotypic traits, to extend knowledge from scientific papers. However, few thorough studies on text mining and application development for plant molecular biology data have been performed, especially for rice, resulting in a lack of datasets available to solve named-entity recognition tasks for this species. Since few benchmarks are available for rice, we faced various difficulties in exploiting advanced machine learning methods for accurate analysis of the rice literature. To evaluate several approaches to automatically extract information from gene/protein entities, we built a new dataset for rice as a benchmark. This dataset is composed of a set of titles and abstracts, extracted from scientific papers focusing on the rice species, and is downloaded from PubMed. During the 5th Biomedical Linked Annotation Hackathon, a portion of the dataset was uploaded to PubAnnotation for sharing. Our ultimate goal is to offer a shared task of rice gene/protein name recognition through the BioNLP Open Shared Tasks framework using the dataset, to facilitate an open comparison and evaluation of different approaches to the task.

Obesity Level Prediction Based on Data Mining Techniques

  • Alqahtani, Asma;Albuainin, Fatima;Alrayes, Rana;Al muhanna, Noura;Alyahyan, Eyman;Aldahasi, Ezaz
    • International Journal of Computer Science & Network Security / v.21 no.3 / pp.103-111 / 2021
  • Obesity affects individuals of all genders and ages worldwide; consequently, several studies have worked to identify the factors causing it. This study develops an effective method to trace obesity levels based on supervised data mining techniques such as Random Forest and Multi-Layer Perceptron (MLP), so as to tackle this universal epidemic. Notably, the dataset was from countries like Mexico, Peru, and Colombia in the 14-61 year age group, with varying eating habits and physical conditions. The data includes 2111 instances and 17 attributes labelled using NObesity, which facilitates categorization of data using Overweight Levels I and II, Insufficient Weight, Normal Weight, as well as Obesity Type I to III. This study found that the Random Forest algorithm achieved higher accuracy than the MLP algorithm, with an overall classification rate of 96.7%.

Comparison between Planned and Actual Data of Block Assembly Process using Process Mining in Shipyards (조선 산업에서 프로세스 마이닝을 이용한 블록 조립 프로세스의 계획 및 실적 비교 분석)

  • Lee, Dongha;Park, Jae Hun;Bae, Hyerim
    • The Journal of Society for e-Business Studies / v.18 no.4 / pp.145-167 / 2013
  • This paper proposes a method to compare planned processes with actual processes of block assembly operations in the shipbuilding industry. Process models can be discovered using process mining techniques for both planned and actual log data. This paper focuses on the comparison between planned and actual processes. The analysis procedure consists of five steps: 1) data pre-processing, 2) definition of analysis level, 3) clustering of assembly blocks, 4) discovery of a process model per cluster, and 5) comparison between planned and actual processes per cluster. In step 5, the processes are compared from several perspectives, such as process model, task, process instance and fitness. For each perspective, we also defined comparison factors. In particular, in the fitness perspective, cross fitness is proposed and analyzed as the degree of fitness between the process model discovered from one data set and the other data set (for example, the fitness of the planned model to actual data, and the fitness of the actual model to planned data). The effectiveness of the proposed methods was verified in a case study using planned data from the block assembly planning system (BAPS) and actual data generated from the block assembly monitoring system (BAMS) of a top-ranked shipbuilding company in Korea.

K-means Clustering for Environmental Indicator Survey Data

  • Park, Hee-Chang;Cho, Kwang-Hyun
    • Korean Data and Information Science Society: Conference Proceedings / 2005.04a / pp.185-192 / 2005
  • There are many data mining techniques, such as association rules, decision trees, neural network analysis, clustering, genetic algorithms, Bayesian networks, memory-based reasoning, etc. We analyze 2003 Gyeongnam social indicator survey data using the k-means clustering technique for environmental information. Clustering is the process of grouping data into clusters so that objects within a cluster are highly similar to one another. In this paper, we used k-means clustering from among several clustering techniques. K-means clustering is classified as a partitional clustering method. The k-means clustering outputs can be applied to environmental preservation and environmental improvement.
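For readers unfamiliar with k-means, the partitional procedure named in this abstract can be sketched as follows. This is a generic textbook version (Lloyd's iteration), not the authors' analysis; the function name `kmeans`, the explicit `initial_centroids` argument, and the toy points are assumptions made for illustration. The result depends on the initial centroids, so they are passed in rather than chosen automatically.

```python
import math


def kmeans(points, initial_centroids, n_iter=100):
    """Plain k-means: alternate nearest-centroid assignment and mean update."""
    centroids = [tuple(c) for c in initial_centroids]
    k = len(centroids)
    assignments = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # centroids stopped moving, so the partition is stable
        centroids = new_centroids
    return centroids, assignments


# Hypothetical two-dimensional toy data, for illustration only.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2), (8.0, 8.0), (8.2, 7.8), (7.8, 8.2)]
centroids, assignments = kmeans(points, initial_centroids=[(1.0, 1.0), (8.0, 8.0)])
```

Each point ends up in exactly one cluster, which is what makes the method partitional.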


A Comparison of Corrosion Performance of Zirconium Grain Refined MEZ and AZ91 Alloys

  • Song, Guangling;StJohn, David
    • Corrosion Science and Technology / v.2 no.1 / pp.30-35 / 2003
  • In this study, sand cast AZ91E and zirconium grain refined MEZ are representative of two typical groups of magnesium alloys: those containing aluminium and those containing no aluminium but with zirconium as a grain refiner. The corrosion performance of these two alloys was evaluated and compared in 5 wt% NaCl solution through measurements of weight loss and polarisation curves and examination of microstructure. Corrosion damage of AZ91E was deeper and more localised than that of MEZ, while MEZ had a lower rate of cathodic hydrogen evolution and a higher rate of anodic dissolution than AZ91E. These differences in behaviour can be related to the differences in microstructure and chemical composition between the two alloys.

Comparison and Analysis of P2P Botnet Detection Schemes

  • Cho, Kyungsan;Ye, Wujian
    • Journal of the Korea Society of Computer and Information / v.22 no.3 / pp.69-79 / 2017
  • In this paper, we propose a four-phase life cycle of P2P botnets with corresponding detection methods, together with a future direction for more effective P2P botnet detection. Our proposals are based on an intensive analysis that compares existing P2P botnet detection schemes from different points of view, such as the life cycle of P2P botnets, machine learning methods for data mining based detection, composition of data sets, and performance metrics. Our proposed life cycle model, composed of linear sequence stages, suggests utilizing features in the vulnerable phase rather than the entire life cycle. In addition, we suggest a hybrid detection scheme that combines a data mining based method with our proposed life cycle, and present an improved composition of experimental data sets by analysing the limitations of previous works.

Comparison of Binary Discretization Algorithms for Data Mining

  • Na, Jong-Hwa;Kim, Jeong-Mi;Cho, Wan-Sup
    • Journal of the Korean Data and Information Science Society / v.16 no.4 / pp.769-780 / 2005
  • Recently, discretization algorithms for continuous data have been actively studied, but few articles compare the efficiency of these algorithms. In this paper we introduce the principles of some binary discretization algorithms, including C4.5, CART and QUEST, and investigate their efficiency through a numerical study. For various underlying distributions, we compare these algorithms in terms of misclassification rate and MSE. Real data examples are also included.
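As a rough illustration of what binary discretization means, the sketch below picks a single cut point for a continuous attribute by maximizing information gain, in the spirit of the entropy-based splitting used by C4.5-style trees. It is a simplification and not the procedure evaluated in the paper: it uses plain information gain and midpoints between adjacent values as candidate thresholds. The names `entropy` and `best_binary_split` and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_binary_split(values, labels):
    """Return (threshold, gain) for the cut with the highest information gain."""
    pairs = sorted(zip(values, labels), key=lambda pair: pair[0])
    n = len(pairs)
    base = entropy([label for _, label in pairs])
    best_gain, best_threshold = 0.0, None
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no cut point exists between identical values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        gain = base - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        if gain > best_gain:
            best_gain, best_threshold = gain, (low + high) / 2
    return best_threshold, best_gain


# Hypothetical toy attribute with two well-separated classes.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["a", "a", "a", "b", "b", "b"]
threshold, gain = best_binary_split(values, labels)
```

The attribute is then recoded as "at or below the threshold" versus "above the threshold". CART and QUEST choose the cut with different criteria, which is the kind of difference the comparison in the paper addresses.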


Numerical investigation of segmental tunnel linings-comparison between the hyperstatic reaction method and a 3D numerical model

  • Do, Ngoc Anh;Dias, Daniel;Oreste, Pierpaolo
    • Geomechanics and Engineering / v.14 no.3 / pp.293-299 / 2018
  • This paper aims to assess the applicability of a numerical approach to the Hyperstatic Reaction Method (HRM) for the analysis of segmental tunnel linings. For this purpose, a simplified three-dimensional (3D) numerical model has been developed using the $FLAC^{3D}$ finite difference software, which allows a rigorous analysis of the effect of lining segmentation on the overall behaviour of the lining. Comparisons between the results obtained with the HRM and those determined by means of the simplified 3D numerical model show that the proposed HRM can be used to investigate the behaviour of a segmental tunnel lining.