Search | Korea Science

The Effect of Meta-Features of Multiclass Datasets on the Performance of Classification Algorithms (다중 클래스 데이터셋의 메타특징이 판별 알고리즘의 성능에 미치는 영향 연구)

Kim, Jeonghun;Kim, Min Yong;Kwon, Ohbyung
- Journal of Intelligence and Information Systems / v.26 no.1 / pp.23-45 / 2020
Big data is being created in a wide variety of fields such as medical care, manufacturing, logistics, sales, and SNS, and dataset characteristics are correspondingly diverse. To secure competitiveness, companies need to improve their decision-making capacity using classification algorithms. However, most of them do not have sufficient knowledge of which classification algorithm is appropriate for a specific problem area. In other words, determining which classification algorithm suits the characteristics of a dataset has been a task that requires expertise and effort. This is because the relationship between the characteristics of datasets (called meta-features) and the performance of classification algorithms has not been fully understood. Moreover, there has been little research on meta-features that reflect the characteristics of multi-class data. Therefore, the purpose of this study is to empirically analyze whether meta-features of multi-class datasets have a significant effect on the performance of classification algorithms. In this study, meta-features of multi-class datasets were grouped into two factors (data structure and data complexity), and seven representative meta-features were selected. Among those, we included the Herfindahl-Hirschman Index (HHI), originally a market concentration index, to replace the IR (Imbalanced Ratio). We also developed a new index called the Reverse ReLU Silhouette Score and added it to the meta-feature set. From the UCI Machine Learning Repository, six representative datasets (Balance Scale, PageBlocks, Car Evaluation, User Knowledge-Modeling, Wine Quality (red), Contraceptive Method Choice) were selected. The classes of each dataset were predicted using the classification algorithms selected in the study (KNN, Logistic Regression, Naïve Bayes, Random Forest, and SVM). For each dataset, we applied 10-fold cross-validation.
Oversampling of 10% to 100% was applied for each fold, and the meta-features of the dataset were measured. The meta-features selected are HHI, Number of Classes, Number of Features, Entropy, Reverse ReLU Silhouette Score, Nonlinearity of Linear Classifier, and Hub Score. F1-score was selected as the dependent variable. The results showed that six meta-features, including the Reverse ReLU Silhouette Score and HHI proposed in this study, have a significant effect on classification performance. (1) The HHI meta-feature proposed in this study was significant for classification performance. (2) The number of variables has a significant effect on classification performance and, unlike the number of classes, the effect is positive. (3) The number of classes has a negative effect on classification performance. (4) Entropy has a significant effect on classification performance. (5) The Reverse ReLU Silhouette Score also significantly affects classification performance at a significance level of 0.01. (6) The nonlinearity of linear classifiers has a significant negative effect on classification performance. In addition, the results of the analysis by classification algorithm were consistent. In the regression analysis by classification algorithm, the number of variables does not have a significant effect for the Naïve Bayes algorithm, unlike the other classification algorithms. This study makes two theoretical contributions: (1) two new meta-features (HHI and Reverse ReLU Silhouette Score) were shown to be significant; (2) the effects of data characteristics on classification performance were investigated using meta-features. The practical contributions are as follows. (1) The results can be utilized in developing a classification algorithm recommendation system based on the characteristics of datasets.
(2) Because the characteristics of data differ, many data scientists search for the optimal algorithm for a given situation by repeatedly testing and adjusting algorithm parameters. This process wastes excessive resources in terms of hardware, cost, time, and manpower. This study is expected to be useful for machine learning and data mining researchers, practitioners, and developers of machine learning-based systems. The paper consists of an introduction, related research, research model, experiment, and conclusion and discussion.
https://doi.org/10.13088/jiis.2020.26.1.023
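As a rough illustration of two of the meta-features named in the abstract above, the sketch below computes HHI and Shannon entropy from a list of class labels. It assumes the textbook definitions (HHI as the sum of squared class shares, entropy in bits); the abstract does not give the paper's exact formulas or normalization, and the function names and sample labels are hypothetical.

```python
from collections import Counter
from math import log2


def class_shares(labels):
    """Return the fraction of samples belonging to each class."""
    counts = Counter(labels)
    total = len(labels)
    return [n / total for n in counts.values()]


def hhi(labels):
    """Herfindahl-Hirschman Index: sum of squared class shares (assumed textbook form)."""
    return sum(p * p for p in class_shares(labels))


def entropy(labels):
    """Shannon entropy of the class distribution, in bits."""
    return -sum(p * log2(p) for p in class_shares(labels))


# Hypothetical label sets, not taken from the paper's datasets.
balanced = ["a", "b", "c", "d"] * 25
skewed = ["a"] * 85 + ["b"] * 5 + ["c"] * 5 + ["d"] * 5

print(hhi(balanced), entropy(balanced))
print(hhi(skewed), entropy(skewed))
```

With four classes, HHI under this definition ranges from 0.25 for a perfectly balanced set toward 1 as a single class dominates, so a higher value signals stronger class imbalance, while entropy moves in the opposite direction.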

Applications of Fuzzy Theory on The Location Decision of Logistics Facilities (퍼지이론을 이용한 물류단지 입지 및 규모결정에 관한 연구)

이승재;정창무;이헌주
- Journal of Korean Society of Transportation / v.18 no.1 / pp.75-85 / 2000
Existing optimization models use crisp data in the objective function or constraints to derive the optimal solution. In addition, subjective considerations are eliminated because complex and uncertain circumstances are treated as probabilistic ambiguity. In other words, the optimal solutions of existing models could be regarded as solutions that completely satisfy the objective function, obtained by applying industrial engineering methods to minimize the risk of decision-making. As a result, decision-makers facing location problems could not respond appropriately to variation in demand and other variables, and could not be offered a wide range of choices because of insufficient information. This study therefore develops a model for the location and size decision problem of logistics facilities using fuzzy theory, with the aim of making the most reasonable decision from a subjective point of view under ambiguous circumstances, building on existing decision-making problems that must satisfy constraints to optimize an objective function under strictly given conditions. The procedure is as follows: a general mixed integer programming (MIP) model was first established, based on the results of existing studies, to decide location and size simultaneously, and a fuzzy mixed integer programming (FMIP) model was then developed using fuzzy theory. The general linear programming software LINDO 6.01 was used to simulate and evaluate the developed model with examples and to judge the appropriateness and adaptability of the FMIP model in the real world.

A study on the Logical Reclassification of Parcel Service Tariffs (택배요금기준의 합리적 재설정에 관한 연구)

Cho, Yoon-Sung;Lee, Tae-Hwee
- Journal of Distribution Science / v.10 no.5 / pp.45-55 / 2012
In Korea, the parcel delivery service was launched officially in 1992, and the market has grown to 13.2 billion units, or 3.5 trillion won, as of 2011. The service companies accept small packages under 30 kg and deliver them on the next day in most domestic areas. This service plays an important role in business and personal activities. The parcel service companies have themselves designed the tariff for the delivery service based on two criteria: weight and the sum of three side lengths. Further, the tariff is graded in three or four rate steps based on size (small, medium, large, and extra-large). However, the basic freight rate is generally decided according to the cargo's weight or measurement size, and an extra rate is added according to some factors (handling, stowability, liability, and so on). The parcel service tariff adopted by the companies is illogically designed, and this study was carried out to assess the need for redesigning the tariff structure. The cargo volume cannot be logically reflected by the sum of three side lengths. For example, two parcels that both measure 160 cm in the sum of three side lengths may have different volumes, one measuring 0.152 cbm (53.33 cm × 53.33 cm × 53.34 cm) and the other 0.05 cbm (100 cm × 50 cm × 10 cm). A small package of 120 cm (sum of three side lengths) may have a volume of as much as 0.064 cbm (40 cm × 40 cm × 40 cm). Sample comparison showed that 17% of medium-size parcels (based on the sum of three side lengths) are small-volume packages, 24% of large-size parcels are small- or medium-volume packages, and 40% of extra-large-size parcels are large-volume or smaller packages. Therefore, if parcel service companies rate their services for volume cargo based on the three side lengths standard, users may have to pay higher than normal rates, particularly because a large percentage of parcels are volume cargo. According to this study, the average weight per 1 cbm is less than 300 kg.
Therefore, users face an increasing risk of paying higher than logical freight charges. Generally, transportation companies are called "public interest enterprises," and parcel service companies operate as postal services. Public interest enterprises must provide the delivery service to all customers without discrimination at a reasonable service level and logical service charges. Therefore, parcel service tariffs must be designed and adopted logically. In this study, freight theories and prior research findings were used to consider the importance of freight rates, and distortion of parcel service rates based on the three side lengths system was verified through regression analysis of a parcel sample and sample comparison. In conclusion, volume sizes based on three side lengths have a higher correlation to the rate level than does the sum of three side lengths. Further, compared to the sum of three side lengths, volume size has a higher correlation to cargo weight, which is the most basic factor determining transportation cost. Therefore, the existing parcel service tariff should be changed to weight- and volume-based rates, and the tariff must be graded in steps of 8 to 10 higher rate structures for a logical freight schedule based on service cost.
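The arithmetic behind the examples in the abstract above can be checked with a short sketch. It takes the three parcels quoted there and computes each one's side-length sum and its volume in cubic meters (cbm). The helper names and dictionary keys are hypothetical; the dimensions are the ones stated in the abstract.

```python
def side_sum_cm(dims):
    """Sum of the three side lengths in centimeters."""
    return sum(dims)


def volume_cbm(dims):
    """Volume in cubic meters (cbm) from side lengths in centimeters."""
    length, width, height = dims
    return length * width * height / 1_000_000  # 1 cbm = 1,000,000 cm^3


# Dimensions quoted in the abstract (cm).
parcels = {
    "near-cube, 160 cm sum": (53.33, 53.33, 53.34),
    "flat, 160 cm sum": (100, 50, 10),
    "cube, 120 cm sum": (40, 40, 40),
}

for name, dims in parcels.items():
    print(f"{name}: sum={side_sum_cm(dims):.0f} cm, volume={volume_cbm(dims):.3f} cbm")
```

The two 160 cm parcels differ in volume by roughly a factor of three (about 0.152 cbm versus 0.05 cbm), which is the distortion the abstract attributes to rating parcels by the sum of three side lengths.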

Difference in the practice of COVID-19 prevention according to the reliability of COVID-19 response among high school students in Korea (일부 고등학생들의 학교와 학원 코로나19 대응방역 신뢰도에 따른 코로나19 예방행동 실천의 차이)

Lee, Hocheol;Yoon, Hyejin;Kim, Ji Eon;Nam, Eun Woo
- Journal of agricultural medicine and community health / v.46 no.3 / pp.131-143 / 2021
Objectives: This study aimed 1) to investigate how much high school students trust the COVID-19 responses of schools and private academies and 2) to identify the differences in COVID-19 prevention practice. Methods: This cross-sectional survey collected data from 200 high school respondents, using an anonymous online questionnaire designed by the Yonsei Global Health Center, from July 2 to 17, 2020. Chi-square tests were conducted to analyze the differences in preventive practices and practice rates between schools and private academies. Binary logistic regression analysis was conducted to identify the factors affecting trust in the COVID-19 response. Results: These high school students trusted the schools' COVID-19 response more than that of the private academies. In addition, students who studied only at school engaged in more COVID-19 prevention practices than students who studied at both school and an academy. Practice rates for avoiding public transportation (p=.028) and sitting in one row while having a meal (p=.011) differed significantly depending on the schools' COVID-19 response. A significant difference in covering the mouth when coughing and sneezing (p=.041) was also found in the practice rates depending on the private academies' COVID-19 response. Conclusion: Schools were trusted more than private academies because schools have health teachers. Because schools are supervised by the Ministry of Education, the Ministry of Education and local governments need to cooperate to manage and monitor the COVID-19 response in the academies. In addition, it is necessary to arrange temporary rotating health teachers who will provide COVID-19 prevention education at the academies.
https://doi.org/10.5393/JAMCH.2021.46.3.131
