Search | Korea Science

The Effect of Meta-Features of Multiclass Datasets on the Performance of Classification Algorithms (다중 클래스 데이터셋의 메타특징이 판별 알고리즘의 성능에 미치는 영향 연구)

Kim, Jeonghun; Kim, Min Yong; Kwon, Ohbyung
- Journal of Intelligence and Information Systems / v.26 no.1 / pp.23-45 / 2020
Big data is being created in a wide variety of fields such as medical care, manufacturing, logistics, sales, and social networking services (SNS), and the characteristics of the resulting datasets are correspondingly diverse. To stay competitive, companies need to improve their decision-making capacity using classification algorithms. However, most practitioners do not have sufficient knowledge of which classification algorithm is appropriate for a specific problem area. In other words, determining which classification algorithm suits the characteristics of a given dataset has been a task requiring expertise and effort. This is because the relationship between the characteristics of datasets (called meta-features) and the performance of classification algorithms has not been fully understood. Moreover, there has been little research on meta-features that reflect the characteristics of multi-class data. Therefore, the purpose of this study is to empirically analyze whether meta-features of multi-class datasets have a significant effect on the performance of classification algorithms. In this study, the meta-features of multi-class datasets were grouped into two factors (data structure and data complexity), and seven representative meta-features were selected. Among these, we included the Herfindahl-Hirschman Index (HHI), originally a measure of market concentration, to replace the IR (Imbalanced Ratio). We also developed a new index, the Reverse ReLU Silhouette Score, and added it to the meta-feature set. From the UCI Machine Learning Repository, six representative datasets (Balance Scale, PageBlocks, Car Evaluation, User Knowledge-Modeling, Wine Quality (red), Contraceptive Method Choice) were selected. Each dataset was classified using the classification algorithms selected in the study (KNN, Logistic Regression, Naïve Bayes, Random Forest, and SVM). For each dataset, we applied 10-fold cross-validation. Oversampling of 10% to 100% was applied to each fold, and the meta-features of the dataset were measured. The meta-features selected are HHI, Number of Classes, Number of Features, Entropy, Reverse ReLU Silhouette Score, Nonlinearity of Linear Classifier, and Hub Score. F1-score was selected as the dependent variable. The results showed that six meta-features, including the Reverse ReLU Silhouette Score and HHI proposed in this study, have a significant effect on classification performance. (1) The HHI meta-feature proposed in this study was significant for classification performance. (2) The number of variables has a significant effect on classification performance; unlike the number of classes, its effect is positive. (3) The number of classes has a negative effect on classification performance. (4) Entropy has a significant effect on classification performance. (5) The Reverse ReLU Silhouette Score also significantly affects classification performance at a significance level of 0.01. (6) The nonlinearity of linear classifiers has a significant negative effect on classification performance. In addition, the results of the analysis by classification algorithm were consistent. In the regression analysis by classification algorithm, the number of variables did not have a significant effect for Naïve Bayes, unlike for the other classification algorithms. This study makes two theoretical contributions: (1) two new meta-features (HHI, Reverse ReLU Silhouette Score) were shown to be significant; (2) the effects of data characteristics on classification performance were investigated using meta-features. The practical contributions are as follows. (1) The results can be used in developing a system that recommends classification algorithms according to the characteristics of datasets. (2) Because data characteristics differ, many data scientists search for the optimal algorithm for a situation by repeatedly testing and adjusting algorithm parameters. In this process, excessive resources are wasted in terms of hardware, cost, time, and manpower. This study is expected to be useful for machine learning and data mining researchers, practitioners, and developers of machine learning-based systems. The paper consists of an introduction, related research, research model, experiment, and conclusion and discussion.
https://doi.org/10.13088/jiis.2020.26.1.023
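
The abstract above uses the Herfindahl-Hirschman Index (HHI) and Entropy as meta-features but does not give their formulas. The following is a minimal sketch, assuming that HHI is taken as the sum of squared class shares (its standard market-concentration definition applied to class labels) and that Entropy means the Shannon entropy of the class distribution. The function names and the toy label lists are hypothetical, and the authors' exact normalization may differ.

```python
from collections import Counter
from math import log2


def class_shares(labels):
    """Return the fraction of samples belonging to each class."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = sum(counts.values())
    return [n / total for n in counts.values()]


def hhi(labels):
    """Sum of squared class shares (assumed form of the HHI meta-feature).

    Ranges from 1/k for k perfectly balanced classes up to 1.0
    when every sample belongs to a single class.
    """
    return sum(p * p for p in class_shares(labels))


def class_entropy(labels):
    """Shannon entropy (in bits) of the class distribution (assumed form)."""
    return -sum(p * log2(p) for p in class_shares(labels))


# Hypothetical toy label sets: four balanced classes vs. one dominant class.
balanced = ["a", "b", "c", "d"] * 25
skewed = ["a"] * 85 + ["b"] * 5 + ["c"] * 5 + ["d"] * 5

print(hhi(balanced), class_entropy(balanced))
print(hhi(skewed), class_entropy(skewed))
```

Under these assumptions, a more imbalanced dataset yields a higher HHI and a lower entropy, which is why a concentration index can stand in for an imbalance ratio when there are more than two classes.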

Effects of Secondary Task on Driving Performance -Control of Vehicle and Analysis of Motion signal- (동시과제가 운전 수행 능력에 미치는 영향 -차량 통제 및 동작신호 해석을 중심으로-)

Mun, Kyung-Ryoul; Choi, Jin-Seung; Kang, Dong-Won; Bang, Yun-Hwan; Kim, Han-Soo; Lee, Su-Jung; Yang, Jae-Woong; Kim, Ji-Hye; Choi, Mi-Hyun; Ji, Doo-Hwan; Min, Byung-Chan; Chung, Soon-Cheol; Taek, Gye-Rae
- Science of Emotion and Sensibility / v.13 no.4 / pp.613-620 / 2010
The purpose of this study was to quantitatively evaluate the effects of secondary tasks during simulated driving, using variables indicating control of the vehicle and smoothness of motion. Fifteen healthy adults with 1 to 2 years of driving experience participated. Nine markers were attached to the subjects' upper limbs (shoulder, elbow, wrist) and lower limbs (knee, ankle, toe), and all subjects were instructed to keep a distance of 30 m from a lead vehicle traveling at 80 km/h. Sending a text message (STM) and searching the navigation system (SN) were selected as the secondary tasks. The experiment consisted of driving alone for 1 min and driving with a secondary task for 1 min, defined as the driving and cognition blocks, respectively. To indicate the effects of the secondary tasks, the coefficients of variation of the distance between vehicles and of lane keeping (APCV and MLCV, respectively) and the jerk-cost function (JC) were analyzed. APCV increased by 222.1% in the SN block. MLCV increased by 318.2% in STM and 308.4% in SN. JC increased at the drivers' elbow, knee, ankle, and toe; in particular, the total mean JC of the lower limbs increased by 218.2% in STM and 294.7% in SN. In conclusion, performing secondary tasks while driving decreased the smoothness of motion, as shown by increased JC, and disturbed control of the vehicle, as shown by increased APCV and MLCV.
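
The two kinds of measure named in this abstract can be sketched as follows. The coefficient of variation is the standard deviation divided by the mean. The abstract does not define the jerk-cost function, so the sketch assumes the common form of half the time integral of squared jerk (the third time derivative of position), approximated with finite differences. The function names, the sample values, and the choice of population standard deviation are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Standard deviation relative to the mean, in percent."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; coefficient of variation is undefined")
    return pstdev(values) / abs(m) * 100.0


def jerk_cost(positions, dt):
    """Assumed jerk cost: 0.5 * integral of squared jerk over time.

    Jerk is approximated by the third-order forward difference of
    equally spaced position samples taken every dt seconds.
    """
    if len(positions) < 4:
        raise ValueError("need at least four position samples")
    jerks = [
        (positions[i + 3] - 3 * positions[i + 2]
         + 3 * positions[i + 1] - positions[i]) / dt ** 3
        for i in range(len(positions) - 3)
    ]
    return 0.5 * sum(j * j for j in jerks) * dt


# Hypothetical following distances (m) sampled during one block.
distances = [28.0, 30.0, 32.0]
print(coefficient_of_variation(distances))

# A constant-acceleration trajectory has zero jerk, hence zero cost.
print(jerk_cost([0.0, 1.0, 4.0, 9.0, 16.0], dt=1.0))
```

Under this assumed definition, a larger coefficient of variation means less stable distance or lane keeping, and a larger jerk cost means less smooth limb motion, matching the direction of the effects reported in the abstract.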

Minisatellite 5 of SLC6A18 (SLC6A18-MS5): Relationship to Hypertension and Evolutional Level (SLC6A18 유전자의 minisatellites 5 (SLC6A18-MS5)의 고혈압과의 관련성 및 진화적 의미)

Heo, Chang-Hwan; Lee, Sang-Yeop; Seol, So-Young; Kwon, Jeong-Ah; Jeong, Yun-Hee; Chung, Chung-Nam; SunWoo, Yang-Il
- Journal of Life Science / v.18 no.12 / pp.1733-1738 / 2008
SLC6A18, a member of a neurotransmitter transporter family, has been reported to have a possible relationship to hypertension, and it contains eight blocks of minisatellites. In this study, the SLC6A18-MS5 sequence, which showed the highest heterozygosity among seven minisatellites, was analyzed using the Transfac software, and putative binding sites for the transcription factors Pax4 and HNF4 were discovered. HNF4 is involved in the diabetes pathway, which suggested a possible relationship to hypertension. Thus, we investigated the putative functional significance of allelic variation in this minisatellite with respect to susceptibility to hypertension. To address this possibility, we analyzed genomic DNA from the blood of 301 hypertension-free controls and 184 cases with hypertension. No statistically significant association was identified between the allelic distribution of SLC6A18-MS5 and the occurrence of hypertension. We then examined the meiotic segregation of SLC6A18-MS5 and found that it was transmitted following Mendelian inheritance. Therefore, this locus could be a useful marker for paternity mapping and DNA fingerprinting. Moreover, we undertook a comprehensive analysis of the genomic sequence to address the evolutionary events of these variable repeats. The SLC6A18 minisatellite regions are conserved only in humans and primates. This result suggests that analysis of intronic minisatellites is a powerful evolutionary marker for non-coding regions in primates and can provide great insight into the molecular evolution of repeated regions in primates.
https://doi.org/10.5352/JLS.2008.18.12.1733

Immunohistochemical Detection of p53, erbB-2 and CEA Oncoprotein in Lung Cancer; Clinical Correlations (폐암 환자에서 면역조직화학 염색을 통한 p53, erbB-2, CEA 종양단백 발현과 임상적 의의)

Jeong, Seong-Su; Kang, Dong-Won; Lee, Gyu-Seung; Ko, Dong-Seok; Suh, Jae-Chul; Kim, Geun-Hwa; Shin, Kyoung-Sang; Kim, Ju-Ock; Song, Gyu-Sang; Kim, Sun-Young
- Tuberculosis and Respiratory Diseases / v.45 no.4 / pp.766-775 / 1998
Background: The prognosis of patients with lung cancer is still poor. Lung cancer exhibits a variable clinical outcome, even among patients at the same stage. Numerous reports suggest that oncogene expression might play a role in explaining the variability of response and survival, but many of these reports are still under debate. We therefore studied the clinical relevance of oncogene expression in Korean lung cancer patients by immunohistochemistry of p53, erbB-2, and CEA expression. Method: From March 1992 to March 1997, 120 patients with lung cancer were reviewed. p53, erbB-2, and CEA expression was detected on paraffin-embedded tumor blocks using monoclonal antibodies. Survival and response were correlated with the expression of the p53, erbB-2, and CEA oncoproteins. Results: Overall, the expression rates of p53, erbB-2, and CEA were 33.7%, 59.3%, and 32.6%, respectively. Expression rates were not correlated with cell type or stage. No correlation was found with response to chemotherapy. The expression of p53, erbB-2, or CEA was not correlated with 2-year survival. When p53, erbB-2, and CEA were considered together, patients expressing two or more of them also did not show a poorer response to chemotherapy. Conclusion: We conclude that p53, erbB-2, and CEA expression is of limited clinical use in predicting response to chemotherapy or survival.

