• Title/Summary/Keyword: subgroup discovery

Subgroup Discovery Method with Internal Disjunctive Expression

  • Kim, Seyoung;Ryu, Kwang Ryel
    • Journal of the Korea Society of Computer and Information / v.22 no.1 / pp.23-32 / 2017
  • Subgroup discovery algorithms extract useful knowledge from data. Subgroup discovery is a rule learning method that finds subgroups of the data carrying specific information and expresses them as rules. A subgroup is meaningful when it covers a large share of the data and its distribution differs significantly from that of the data as a whole. Previously, subgroups were expressed only as conjunctions of literals, which limits the range of rules that can be learned. In this paper, we propose a method that increases the expressiveness of rules through an internal disjunctive representation of attribute values. We also analyze the characteristics of existing subgroup discovery algorithms and propose an improved algorithm that compensates for their weaknesses while retaining their strengths. Experiments are conducted on traffic accident data provided by Busan Metropolitan City. The results show that the proposed method outperforms existing methods, and that the rule set it learns contains more interesting and more general rules.
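
As an illustration of what internal disjunction adds, the sketch below represents a rule as a mapping from each attribute to a set of accepted values and scores it with weighted relative accuracy (WRAcc), a common subgroup quality measure that trades off coverage against unusualness. The abstract does not say which quality measure or search strategy the paper uses, so WRAcc, the `covers` and `wracc` helpers, and the toy records are all assumptions made for this example.

```python
from typing import Dict, List, Set

Record = Dict[str, str]
# A rule maps each attribute to the SET of values it accepts.
# One value per attribute gives a plain conjunction of literals; several
# values give an internal disjunction, e.g. weather in {rain, snow}.
Rule = Dict[str, Set[str]]


def covers(rule: Rule, record: Record) -> bool:
    """True if the record satisfies every attribute condition of the rule."""
    return all(record.get(attr) in allowed for attr, allowed in rule.items())


def wracc(rule: Rule, data: List[Record], target: str, value: str) -> float:
    """Weighted relative accuracy: coverage * (subgroup precision - base rate)."""
    covered = [r for r in data if covers(rule, r)]
    if not data or not covered:
        return 0.0
    base_rate = sum(r[target] == value for r in data) / len(data)
    precision = sum(r[target] == value for r in covered) / len(covered)
    return (len(covered) / len(data)) * (precision - base_rate)


# Hypothetical toy records, not taken from the Busan data set.
DATA = [
    {"weather": "rain", "severe": "yes"},
    {"weather": "rain", "severe": "yes"},
    {"weather": "snow", "severe": "yes"},
    {"weather": "snow", "severe": "yes"},
    {"weather": "rain", "severe": "no"},
    {"weather": "clear", "severe": "no"},
    {"weather": "clear", "severe": "no"},
    {"weather": "clear", "severe": "no"},
]

snow_only = {"weather": {"snow"}}             # single literal
rain_or_snow = {"weather": {"rain", "snow"}}  # internal disjunction

print(wracc(snow_only, DATA, "severe", "yes"))
print(wracc(rain_or_snow, DATA, "severe", "yes"))
```

On this toy data the disjunctive rule scores higher than either single-value rule because it covers more records while keeping precision well above the base rate, which is the kind of more general rule that a purely conjunctive language cannot express.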

Comparison of Association Rule Learning and Subgroup Discovery for Mining Traffic Accident Data (교통사고 데이터의 마이닝을 위한 연관규칙 학습기법과 서브그룹 발견기법의 비교)

  • Kim, Jeongmin;Ryu, Kwang Ryel
    • Journal of Intelligence and Information Systems / v.21 no.4 / pp.1-16 / 2015
  • Traffic accidents have been one of the major causes of death worldwide for the last several decades. According to World Health Organization statistics, approximately 1.24 million deaths occurred on the world's roads in 2010. To reduce future traffic accidents, multipronged approaches have been adopted, including traffic regulations, injury-reducing technologies, and driver training programs. Records of traffic accidents are generated and maintained for this purpose. To make these records meaningful and effective, it is necessary to analyze the relationships between traffic accidents and related factors such as vehicle design, road design, weather, and driver behavior. Insights derived from such analyses can be used in accident prevention. Traffic accident data mining is the activity of finding useful knowledge about these relationships that is not well known and that may interest the user. Many studies on mining accident data have been reported over the past two decades. Most of them focus on predicting the risk of an accident from accident-related factors, using supervised learning methods such as decision trees, logistic regression, k-nearest neighbors, and neural networks. However, the prediction models derived by these algorithms are too complex for humans to understand, because the main purpose of the algorithms is prediction, not explanation of the data. Some studies use unsupervised clustering algorithms to divide the data into several groups, but the derived groups are still not easy for humans to understand, so additional analysis is needed. Rule-based learning methods are appropriate when we want to derive a comprehensible form of knowledge about the target domain. They derive a set of if-then rules that represent the relationship between the target feature and other features. Rules are fairly easy for humans to understand, so they can provide insight and comprehensible results. Association rule learning and subgroup discovery are representative rule-based learning methods for descriptive tasks. They have been used in a wide range of areas, including transaction analysis, accident data analysis, detection of statistically significant patient risk groups, and discovery of key persons in social communities. We use both the association rule learning method and the subgroup discovery method to discover useful patterns in a traffic accident dataset with many features, including driver profile, accident location, accident type, vehicle information, and regulation violations. The association rule learning method, an unsupervised learning method, searches the data for frequent itemsets and translates them into rules. In contrast, the subgroup discovery method is a supervised learning method that discovers rules for user-specified concepts that satisfy a certain degree of generality and unusualness. Depending on which aspect of the data we focus on, we may combine multiple relevant features of interest into a synthetic target feature and give it to the rule learning algorithms. After a set of rules is derived, postprocessing steps make the rule set more compact and easier to understand by removing uninteresting or redundant rules. We conducted a set of experiments mining our traffic accident data in both unsupervised and supervised modes to compare these rule-based learning algorithms. The experiments reveal that association rule learning, in its pure unsupervised mode, can discover some hidden relationships among the features. In the supervised setting with a combinatorial target feature, however, the subgroup discovery method finds good rules much more easily than the association rule learning method, which requires considerable effort to tune its parameters.
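
The parameters that association rule learning typically needs tuned are thresholds on measures such as support and confidence. The sketch below computes the standard support, confidence, and lift of a single rule over a list of item sets. It is a minimal illustration only: the abstract does not name the measures or thresholds used in the paper, and the item names and `ACCIDENTS` records are hypothetical.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support(itemset: FrozenSet[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent: FrozenSet[str], consequent: FrozenSet[str],
               transactions: List[Transaction]) -> float:
    """Share of transactions matching the antecedent that also match the consequent."""
    covered = [t for t in transactions if antecedent <= t]
    if not covered:
        return 0.0
    return sum(consequent <= t for t in covered) / len(covered)


def lift(antecedent: FrozenSet[str], consequent: FrozenSet[str],
         transactions: List[Transaction]) -> float:
    """Confidence divided by the consequent's base rate; above 1 means positive association."""
    base = support(consequent, transactions)
    if base == 0.0:
        return 0.0
    return confidence(antecedent, consequent, transactions) / base


# Hypothetical accident records encoded as item sets (not the paper's data).
ACCIDENTS = [
    frozenset({"night", "rain", "severe"}),
    frozenset({"night", "speeding", "severe"}),
    frozenset({"day", "rain"}),
    frozenset({"night", "rain", "severe"}),
    frozenset({"day", "speeding"}),
]

rule_if = frozenset({"night"})
rule_then = frozenset({"severe"})

print(support(rule_if | rule_then, ACCIDENTS))
print(confidence(rule_if, rule_then, ACCIDENTS))
print(lift(rule_if, rule_then, ACCIDENTS))
```

In an unsupervised run every frequent item set can yield rules with any item on the right-hand side, whereas a supervised subgroup search would fix the right-hand side (here `severe`) in advance and rank candidate left-hand sides directly.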

Deriving Local Association Rules by User Segmentation (사용자 구분에 의한 지역적 연관규칙의 유도)

  • Park, Se-Il;Lee, Soo-Wun
    • Journal of KIISE:Software and Applications / v.29 no.1_2 / pp.53-64 / 2002
  • Association rule discovery is a method that detects associative relationships between items or attributes in transactions. It is one of the most widely studied problems in data mining because it offers useful insight into the types of dependencies that exist in a data set. However, most studies on association rule discovery share a drawback: they cannot discover association rules that hold within groups of users who have common characteristics. To solve this problem, we segment the set of users into user subgroups by means of feature selection and user segmentation, so that local association rules can be discovered within each user subgroup. To evaluate whether the local association rules are more appropriate than the global association rules for each user subgroup, the derived local rules are compared with the global rules on several evaluation measures.
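
The difference between a global rule and a local rule can be sketched by computing one rule's confidence over all transactions and then separately within each user segment. The example assumes the segments are already given; the feature selection and segmentation steps described in the abstract are not reproduced, and the segment labels, items, and helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

Items = FrozenSet[str]


def rule_confidence(transactions: List[Items], antecedent: Items,
                    consequent: Items) -> float:
    """Share of transactions matching the antecedent that also match the consequent."""
    covered = [t for t in transactions if antecedent <= t]
    if not covered:
        return 0.0
    return sum(consequent <= t for t in covered) / len(covered)


def local_rule_confidence(records: List[Tuple[str, Items]], antecedent: Items,
                          consequent: Items) -> Dict[str, float]:
    """Confidence of the same rule evaluated separately inside each user segment."""
    by_segment: Dict[str, List[Items]] = defaultdict(list)
    for segment, items in records:
        by_segment[segment].append(items)
    return {seg: rule_confidence(ts, antecedent, consequent)
            for seg, ts in by_segment.items()}


# Hypothetical (segment, transaction) pairs; segment labels are assumed given.
RECORDS = [
    ("student", frozenset({"coffee", "snack"})),
    ("student", frozenset({"coffee", "snack"})),
    ("student", frozenset({"coffee"})),
    ("retiree", frozenset({"coffee", "newspaper"})),
    ("retiree", frozenset({"coffee", "newspaper"})),
    ("retiree", frozenset({"coffee"})),
]

antecedent = frozenset({"coffee"})
consequent = frozenset({"snack"})

global_conf = rule_confidence([items for _, items in RECORDS], antecedent, consequent)
local_conf = local_rule_confidence(RECORDS, antecedent, consequent)
print(global_conf)
print(local_conf)
```

Here the rule looks weak globally but holds for two thirds of the student transactions and for none of the retiree ones, which is the situation where a rule mined per segment is more informative than the global one.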

Clustering DNA Microarray Data by Stochastic Algorithm

  • Shon, Ho-Sun;Kim, Sun-Shin;Wang, Ling;Ryu, Keun-Ho
    • Proceedings of the KSRS Conference / 2007.10a / pp.438-441 / 2007
  • Recent advances in molecular biology and engineering technology have made it possible, with DNA microarrays, to observe thousands of genes and their state of variation in tissue samples from living bodies. With DNA microarrays, it is possible to construct groups of genes that have similar expression patterns and to grasp the progress and variation of genes. This paper applies cluster analysis, which aims to discover biological subgroups or classes from gene expression information. Because the goal is to predict a new, unknown class, open leukaemia data are used for the experiment, and the MCL (Markov CLustering) algorithm is applied as the analysis method. The MCL algorithm is based on probability and graph flow theory. It simulates random walks on a graph, using Markov matrices to determine the transition probabilities among the nodes of the graph. In detail, the MCL algorithm is applied after distances are computed with the Euclidean distance; next, the inflation and diagonal factors, which are tuning parameters, are tuned; finally, a threshold given by the average of each column is computed to distinguish one class from another. Our method improves accuracy by using this threshold, namely the average of each column. Our experimental results show about 70% accuracy on average against the previously known classes. For comparative evaluation, the proposed method was also compared with the SOM (Self-Organizing Map) clustering algorithm, a neural network method, and with hierarchical clustering. The method shows better results than hierarchical clustering. Further study should examine whether similar results are obtained when the inflation parameter found in our experiment is applied to other gene expression data. We are also working on a systematic method to improve accuracy by regulating the factors mentioned above.
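
The core MCL loop described above can be sketched in plain Python: add self-loops, column-normalize the matrix into transition probabilities, then alternate expansion (matrix squaring, which simulates longer random walks) with inflation (element-wise powering followed by renormalization, which strengthens strong flows). This is a generic sketch of the standard algorithm only. It takes a ready-made adjacency matrix as input, and it does not reproduce the paper's Euclidean-distance preprocessing, diagonal factor tuning, or column-average threshold; the default inflation value, function names, and toy graph are assumptions.

```python
from typing import List

Matrix = List[List[float]]


def normalize_columns(m: Matrix) -> Matrix:
    """Scale each column to sum to 1 so it holds transition probabilities."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def expand(m: Matrix) -> Matrix:
    """Expansion step: square the matrix (two-step random walks)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m: Matrix, power: float) -> Matrix:
    """Inflation step: raise entries to `power`, then renormalize columns."""
    return normalize_columns([[v ** power for v in row] for row in m])


def mcl(adjacency: Matrix, inflation: float = 2.0,
        max_iter: int = 50, tol: float = 1e-9) -> List[List[int]]:
    """Return clusters as sorted lists of node indices."""
    n = len(adjacency)
    # Self-loops are added before normalization, as is usual for MCL.
    m = normalize_columns(
        [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)])
    for _ in range(max_iter):
        nxt = inflate(expand(m), inflation)
        change = max(abs(nxt[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = nxt
        if change < tol:
            break
    # Rows that keep non-zero mass belong to attractors; the columns they
    # reach form one cluster.
    clusters = []
    for row in m:
        members = frozenset(j for j, v in enumerate(row) if v > 1e-6)
        if members and members not in clusters:
            clusters.append(members)
    return sorted(sorted(c) for c in clusters)


# Hypothetical graph: a path 0-1-2 and a separate pair 3-4.
GRAPH = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]

print(mcl(GRAPH))
```

A larger inflation value generally yields finer clusters, which is why the abstract treats it as a parameter to tune and proposes checking it on other gene expression data.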

Overexpression of CXCL2 inhibits cell proliferation and promotes apoptosis in hepatocellular carcinoma

  • Ding, Jun;Xu, Kangdi;Zhang, Jie;Lin, Bingyi;Wang, Yubo;Yin, Shengyong;Xie, Haiyang;Zhou, Lin;Zheng, Shusen
    • BMB Reports / v.51 no.12 / pp.630-635 / 2018
  • C-X-C motif chemokine ligand 2 (CXCL2) is a small secreted protein that exhibits a structure similar to the proangiogenic subgroup of the CXC chemokine family. Accumulating evidence suggests that chemokines play a pivotal role in cancer progression and carcinogenesis. We examined the messenger RNA (mRNA) expression levels of 7 types of $ELR^+$ CXCLs in 264 clinical samples. We found that CXCL2 expression was stably down-regulated in 94% of hepatocellular carcinoma (HCC) specimens compared with paired adjacent normal liver tissues and some HCC cell lines. Moreover, CXCL2 overexpression profoundly attenuated HCC cell proliferation and growth and induced apoptosis in vitro. In animal studies, we found that overexpressing CXCL2 by lentivirus also clearly reduced the size and weight of subcutaneous tumours in nude mice. Furthermore, we demonstrated that CXCL2 induced HCC cell apoptosis via both nuclear and mitochondrial apoptosis pathways. Our results indicate that CXCL2 negatively regulates the cell cycle in HCC cells via the ERK1/2 signalling pathway. These results provide new insights into HCC and may ultimately lead to the discovery of innovative therapeutic approaches for HCC.