Search | Korea Science

The Effect of Meta-Features of Multiclass Datasets on the Performance of Classification Algorithms (다중 클래스 데이터셋의 메타특징이 판별 알고리즘의 성능에 미치는 영향 연구)

Kim, Jeonghun;Kim, Min Yong;Kwon, Ohbyung
- Journal of Intelligence and Information Systems / v.26 no.1 / pp.23-45 / 2020
Big data is being generated in a wide variety of fields such as medical care, manufacturing, logistics, sales, and social networking services (SNS), and the characteristics of the resulting datasets are equally diverse. To stay competitive, companies need to improve their decision-making capacity using classification algorithms. However, most companies do not have sufficient knowledge of which classification algorithm is appropriate for a specific problem area. In other words, determining which classification algorithm suits the characteristics of a given dataset has been a task that requires expertise and effort. This is because the relationship between the characteristics of datasets (called meta-features) and the performance of classification algorithms is not fully understood. Moreover, there has been little research on meta-features that reflect the characteristics of multi-class data. Therefore, the purpose of this study is to empirically analyze whether meta-features of multi-class datasets have a significant effect on the performance of classification algorithms. In this study, meta-features of multi-class datasets were grouped into two factors (data structure and data complexity), and seven representative meta-features were selected. Among these, we included the Herfindahl-Hirschman Index (HHI), originally a measure of market concentration, to replace the imbalance ratio (IR). We also developed a new index called the Reverse ReLU Silhouette Score and added it to the meta-feature set. From the UCI Machine Learning Repository, six representative datasets (Balance Scale, PageBlocks, Car Evaluation, User Knowledge-Modeling, Wine Quality (red), Contraceptive Method Choice) were selected. Each dataset was classified using the classification algorithms selected in the study (KNN, Logistic Regression, Naïve Bayes, Random Forest, and SVM). For each dataset, we applied 10-fold cross validation; oversampling from 10% to 100% was applied to each fold, and the meta-features of the dataset were measured. The selected meta-features are HHI, Number of Classes, Number of Features, Entropy, Reverse ReLU Silhouette Score, Nonlinearity of Linear Classifier, and Hub Score. F1-score was selected as the dependent variable. The results showed that six meta-features, including the Reverse ReLU Silhouette Score and HHI proposed in this study, have a significant effect on classification performance. (1) The HHI meta-feature proposed in this study has a significant effect on classification performance. (2) The number of variables has a significant effect on classification performance and, unlike the number of classes, its effect is positive. (3) The number of classes has a negative effect on classification performance. (4) Entropy has a significant effect on classification performance. (5) The Reverse ReLU Silhouette Score also significantly affects classification performance at a significance level of 0.01. (6) The nonlinearity of linear classifiers has a significant negative effect on classification performance. In addition, the results of the analysis by classification algorithm were also consistent. In the regression analysis by classification algorithm, the number of variables did not have a significant effect for Naïve Bayes, unlike for the other classification algorithms. This study makes two theoretical contributions: (1) two new meta-features (HHI and the Reverse ReLU Silhouette Score) were shown to be significant, and (2) the effects of data characteristics on classification performance were investigated using meta-features. There are also two practical points: (1) the results can be used to develop a system that recommends classification algorithms according to the characteristics of a dataset, and (2) because data characteristics differ, many data scientists search for the best algorithm for a situation by repeatedly adjusting algorithm parameters, a process that wastes hardware, cost, time, and manpower. This study is expected to be useful for machine learning and data mining researchers, practitioners, and developers of machine learning-based systems. The paper consists of an introduction, related research, research model, experiment, and conclusion and discussion.
https://doi.org/10.13088/jiis.2020.26.1.023
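
To make the class-concentration idea in the abstract above concrete, here is a minimal sketch that computes HHI as the sum of squared class shares (the standard market-concentration definition) alongside Shannon entropy of the class distribution. The abstract does not state how the authors scale or normalize either measure, so the function names, the base-2 logarithm, and the sample labels are all assumptions for illustration.

```python
from collections import Counter
from math import log2


def class_shares(labels):
    """Return the fraction of samples belonging to each class."""
    counts = Counter(labels)
    total = len(labels)
    return [count / total for count in counts.values()]


def hhi(labels):
    """Herfindahl-Hirschman Index: sum of squared class shares (1/k when balanced, 1 when one class)."""
    return sum(share * share for share in class_shares(labels))


def class_entropy(labels):
    """Shannon entropy (base 2, an assumption) of the class distribution."""
    return -sum(share * log2(share) for share in class_shares(labels) if share > 0)


# Hypothetical label vectors: one balanced, one dominated by a single class.
balanced = ["a", "b", "c", "d"] * 25
skewed = ["a"] * 85 + ["b"] * 5 + ["c"] * 5 + ["d"] * 5

print(hhi(balanced), class_entropy(balanced))
print(hhi(skewed), class_entropy(skewed))
```

A higher HHI means the samples are concentrated in fewer classes, so the skewed label vector scores higher than the balanced one while its entropy is lower.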

Optimal supervised LSA method using selective feature dimension reduction (선택적 자질 차원 축소를 이용한 최적의 지도적 LSA 방법)

Kim, Jung-Ho;Kim, Myung-Kyu;Cha, Myung-Hoon;In, Joo-Ho;Chae, Soo-Hoan
- Science of Emotion and Sensibility / v.13 no.1 / pp.47-60 / 2010
Most research on classification has used kNN (k-Nearest Neighbor) and SVM (Support Vector Machine), which are known as learning-based models, and the Bayesian classifier and NNA (Neural Network Algorithm), which are known as statistics-based methods. However, these approaches face space and time limitations when classifying the very large number of web pages on today's internet. Moreover, most classification studies use a uni-gram feature representation, which does not capture the real meaning of words well. Korean web page classification has the additional problem that Korean words often have multiple meanings (polysemy). For these reasons, LSA (Latent Semantic Analysis) has been proposed for classification in such environments (large data sets and polysemous words). LSA uses SVD (Singular Value Decomposition), which decomposes the original term-document matrix into three matrices and reduces their dimensions. This makes it possible to create a new low-dimensional semantic space for representing vectors, which makes classification efficient and allows the latent meaning of words or documents (or web pages) to be analyzed. Although LSA is good at classification, it has a drawback: when SVD reduces the dimensions of the matrix and creates the new semantic space, it considers which dimensions represent the vectors well, not which dimensions discriminate between them well. This is why LSA does not improve classification performance as much as expected. In this paper, we propose a new LSA that selects the optimal dimensions to both discriminate and represent vectors well, minimizing this drawback and improving performance. The proposed method shows better and more stable performance than other LSA methods in a low-dimensional space. In addition, we obtain further improvement in classification by creating and selecting features, removing stopwords, and statistically weighting specific values.
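
The SVD step described in the abstract above can be sketched as follows. This shows only plain, unsupervised LSA (a rank-k truncated SVD of a term-document matrix); it is not the supervised dimension selection the paper proposes, whose details the abstract does not give. The matrix values and the helper name `lsa_project` are hypothetical.

```python
import numpy as np

# Hypothetical 5-term x 4-document count matrix (rows: terms, columns: documents).
A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
    [1, 0, 0, 1],
], dtype=float)


def lsa_project(term_doc, k):
    """Return (document vectors, term basis, singular values) for a rank-k truncated SVD."""
    # term_doc = U @ diag(s) @ Vt, with singular values sorted in descending order.
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Keep the k largest singular values; each row of the result is one document
    # expressed in the k-dimensional latent semantic space.
    doc_vectors = (np.diag(s[:k]) @ Vt[:k, :]).T
    return doc_vectors, U[:, :k], s[:k]


doc_vecs, U_k, s_k = lsa_project(A, k=2)
print(doc_vecs.shape)  # one 2-dimensional vector per document
```

Plain LSA always keeps the dimensions with the largest singular values, which is the "represent well" criterion the abstract contrasts with choosing dimensions that discriminate well between classes.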

Thin Layer Drying and Quality Characteristics of Ainsliaea acerifolia Sch. Bip. Using Far Infrared Radiation (원적외선을 이용한 단풍취의 박층 건조 및 품질 특성)

Ning, Xiao Feng;Li, He;Kang, Tae Hwan;Lee, Jun Soo;Lee, Jeong Hyun;Ha, Chung Su
- Journal of the Korean Society of Food Science and Nutrition / v.43 no.6 / pp.884-892 / 2014
The purpose of this study was to investigate the drying characteristics and drying models of Ainsliaea acerifolia Sch. Bip. using far-infrared thin layer drying. Far-infrared thin layer drying tests on Ainsliaea acerifolia Sch. Bip. were conducted at two air velocities (0.6 and 0.8 m/sec) and three drying temperatures (40, 45, and $50^{\circ}C$). The drying models were evaluated using the coefficient of determination and root mean square error. Drying characteristics were analyzed based on factors such as drying rate, leaf color changes, antioxidant activity, and contents of polyphenolics and flavonoids. The results revealed that increases in drying temperature and air velocity reduced drying time. The Thompson model was considered suitable for thin layer drying of Ainsliaea acerifolia Sch. Bip. using far-infrared radiation. Greenness and yellowness values decreased and lightness values increased after far-infrared thin layer drying, and the color difference (${\Delta}E$) values at $40^{\circ}C$ were higher than those at $45^{\circ}C$ and $50^{\circ}C$. The antioxidant properties of Ainsliaea acerifolia Sch. Bip. decreased under all far-infrared thin layer drying conditions, and the highest polyphenolic content (37.9 mg/g), flavonoid content (22.7 mg/g), DPPH radical scavenging activity (32.5), and ABTS radical scavenging activity (31.1) were observed at a drying temperature of $40^{\circ}C$ with an air velocity of 0.8 m/sec.
https://doi.org/10.3746/jkfn.2014.43.6.884
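
The two goodness-of-fit measures named in the abstract above, coefficient of determination and root mean square error, can be computed as in the sketch below. The moisture-ratio values are invented purely for illustration and are not from the paper, and the plain 1/n form of RMSE is an assumption (some drying studies divide by n minus the number of model parameters instead).

```python
from math import sqrt


def rmse(observed, predicted):
    """Root mean square error between observed and model-predicted values."""
    n = len(observed)
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical moisture ratios at successive drying times (not data from the study).
observed_mr = [1.00, 0.72, 0.50, 0.33, 0.20]
predicted_mr = [0.98, 0.74, 0.51, 0.31, 0.21]

print(rmse(observed_mr, predicted_mr), r_squared(observed_mr, predicted_mr))
```

When candidate drying models are compared this way, the preferred model is usually the one with the highest coefficient of determination and the lowest RMSE.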

Investigating the Influence of Perceived Usefulness and Self-Efficacy on Online WOM Adoption Based on Cognitive Dissonance Theory: Stick to Your Own Preference VS. Follow What Others Said (온라인 구전정보 수용자의 지각된 정보유용성과 자기효능감이 구전정보 수용의도에 미치는 영향에 관한 연구: 의견고수와 구전수용의 비교)

Lee, Jung Hyun;Park, Joo Seok;Kim, Hyun Mo;Park, Jae Hong
- Asia Pacific Journal of Information Systems / v.23 no.3 / pp.131-154 / 2013
New internet technologies have created a revolutionary platform that allows consumers to make decisions about product price and quality quickly and to share information about themselves through online reviews. By expressing their feelings toward products or services on virtual opinion platforms, users extend their influence into cyberspace as electronic word-of-mouth (eWOM). Existing research indicates that eWOM has a strong influence on the consumer decision process. For both academic researchers and practitioners, investigating this phenomenon of information sharing on websites is essential given the increasing number of consumers using them as sources for purchase decisions. It is worthwhile to examine the extent to which opinion seekers are willing to accept and adopt online reviews and which factors encourage adoption. Discerning the most motivating aspects of information adoption in particular could help electronic marketers better promote their brand and presence on the internet. The objective of this study is to investigate how online WOM influences a person's purchase decision by discovering which factors encourage information adoption. Focusing on self-efficacy in particular, this research investigates how self-efficacy affects perceived information usefulness and the adoption of online information. Although people are exposed to the same review or comment about a product or service, some accept the review while others do not. We note that accepting online reviews depends mainly on the person's preferences or personal characteristics. This study empirically examines this issue using cognitive dissonance theory. Specifically, in the context of the movie industry, we address several questions: Does positive WOM always generate a positive effect? What if the movie is not in the person's favorite genre? What if the person is very self-assertive and does not readily accept others' opinions? In these cases of cognitive dissonance, does WOM always produce the same result? While many studies have focused on a single direction of WOM, in which positive (or negative) reviews or comments generate positive (or negative) results and more (or less) profit, this study investigates not only the directional properties of WOM but also how people change their opinion of a product or service from positive to negative, or from negative to positive, through online WOM. An experiment was conducted with a sample of 168 users who have experience with the online movie review site 'Naver Movie'. Users were required to complete a survey regarding reviews and comments taken from real movie pages. The data reflected users' perceptions of online WOM information, which determined their adoption level. The analysis results provide empirical support for the proposed theoretical perspective. When a user cannot agree with the opinion expressed in online WOM information, in other words, when cognitive dissonance occurs between the online WOM information and the user's preference, perceived self-efficacy significantly decreases the user's perception of usefulness. This perception of usefulness in turn plays an important role in determining users' intention to adopt online WOM information. Most research describing the effect of online WOM has concentrated on characteristics of the WOM itself, such as the quality or vividness of the information, the credibility of the source, and the direction of the WOM, but our results suggest that users' personal characteristics (e.g., self-efficacy) play a decisive role in the acceptance of online WOM information. Higher self-efficacy means a lower likelihood of accepting information that represents a counter opinion, because of cognitive dissonance, whereas people with lower self-efficacy are willing to accept online WOM information as true and refer to it in their purchase decisions. This study suggests a model for understanding the role of the direction of online WOM information. Our results also highlight the importance of online review supervision and personalized information services by confirming that switching an opinion from negative to positive through online WOM information is more difficult than switching from positive to negative. This implication can help marketers manage online reviews of their products or services.
https://doi.org/10.14329/apjis.2013.23.3.131

Design and Implementation of Web Based Instruction Based on Constructivism for Self-Directed Learning Ablity (구성주의 이론에 기반한 자기주도적 웹 기반 교육의 설계와 구현)

Kim Gi-Nam;Kim Eui-Jeong;Kim Chang-Suk
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2006.05a / pp.855-858 / 2006
The development of information technology makes it possible to change the paradigm in all kinds of areas, including education. Students can choose learning goals and objects themselves and acquire not an accumulation of knowledge but a method of learning. Moreover, teachers become advisers, and students play the key role in learning. That is, the subject of learning is the student. Constructivism emphasizes a student-oriented educational environment, which corresponds to the characteristics of hypermedia. In addition, the Internet allows us to make a practical plan for constructivism. The web-based Internet provides a proper environment for putting constructivism into practice and causes the education system to change. Web Based Instruction motivates students to learn more, and they can obtain plenty of information regardless of place or time. They can also consult more up-to-date information for their learning, use hypermedia such as images, audio, video, and text, and communicate effectively with their instructor through a bulletin board, e-mail, chat, and so on. Schools and instructors have been making efforts to develop a new model of teaching to cope with this changing environment. In this thesis, 'Design and Implementation of Web Based Instruction Based on Constructivism', we provide online, learner-oriented, indexed video lessons that give learners the chance for self-directed learning. In addition, learners do not have to cover all the contents of a lesson but can choose the contents they want from an indexed list for the lesson, and they can search for the contents they want with a 'Keyword Search' on the main page, which can improve learners' achievement.

Construction and Application of Intelligent Decision Support System through Defense Ontology - Application example of Air Force Logistics Situation Management System (국방 온톨로지를 통한 지능형 의사결정지원시스템 구축 및 활용 - 공군 군수상황관리체계 적용 사례)

Jo, Wongi;Kim, Hak-Jin
- Journal of Intelligence and Information Systems / v.25 no.2 / pp.77-97 / 2019
The large amount of data that emerges from the hyper-connected environment of the Fourth Industrial Revolution is a major factor that distinguishes it from the existing production environment. This environment is two-sided in that it produces data while using it, and the data produced in this way creates further value. Because of the massive scale of data, future information systems need to process more data, in terms of quantity, than existing information systems. In terms of quality, they also need the ability to process that large amount of data accurately. In a small-scale information system, a person can understand the system accurately and obtain the necessary information, but in complex systems that are difficult to understand accurately, it becomes increasingly difficult to acquire the desired information. In other words, more accurate processing of large amounts of data has become a basic condition for future information systems. This problem, which concerns the efficient performance of the information system, can be solved by building a semantic web that enables various kinds of information processing by expressing the collected data as an ontology that can be understood not only by people but also by computers. For example, as in most other organizations, IT has been introduced in the military, and most work is now done through information systems. As existing systems contain increasingly large amounts of data, efforts are needed to make them easier to use through better utilization of that data. An ontology-based system forms a large semantic network of data through connections with other systems, has a wide range of databases that can be utilized, and has the advantage of searching more precisely and quickly through relationships between predefined concepts. In this paper, we propose a defense ontology as a method for effective data management and decision support. To judge its applicability and effectiveness in an actual system, we reconstructed the existing air force logistics situation management system as an ontology-based system. The existing system was built to strengthen commanders' and practitioners' management and control of the logistics situation by providing real-time information on the maintenance and distribution situation, since the complicated logistics information system with its large amount of data had become difficult to use. However, it works by taking pre-specified information from the existing logistics system and displaying it as a web page, so it is difficult to check anything other than the few items specified in advance, extending it with additional functions is time-consuming, and it is organized by category without a search function. It therefore has the same disadvantage as the underlying system: it can be used easily only by those who already know it well. The ontology-based logistics situation management system is designed to provide intuitive visualization of the complex information in the existing logistics information system through the ontology. To construct the logistics situation management system through the ontology, useful functions such as performance-based logistics support contract management and a component dictionary were further identified and included in the ontology. To confirm whether the constructed ontology can be used for decision support, it is necessary to implement meaningful analysis functions such as calculation of the aircraft utilization rate and queries on performance-based logistics contracts. In particular, in contrast to past ontology studies that built static ontology databases, this study represents time series data, whose values change over time (such as the state of each aircraft by date), in the ontology. Through the constructed ontology, it is confirmed that the utilization rate can be calculated based on various criteria, in addition to the utilization rate that could already be computed. In addition, the data related to performance-based logistics contracts, introduced as a new maintenance method for aircraft and other munitions, can be queried in various ways, and the performance indexes used in performance-based logistics contracts are easy to calculate through reasoning and functions. We also propose a new performance index that complements the limitations of the currently applied performance indicators and calculate it through the ontology, confirming that the constructed ontology is usable. Finally, it is possible to calculate the failure rate or reliability of each component, including MTBF data of the selected fault-tolerant item, based on actual part consumption records. The reliability of the mission and the reliability of the system are also calculated. To confirm the usability of the constructed ontology-based logistics situation management system, we evaluated it with the Technology Acceptance Model (TAM), a representative model for measuring the acceptability of technology, and found the proposed system to be more useful and convenient than the existing system.
https://doi.org/10.13088/jiis.2019.25.2.077
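
The abstract above mentions deriving failure rate and reliability from MTBF data but does not say which reliability model the authors use. The sketch below assumes the common constant-failure-rate (exponential) model and a series system in which every component must survive the mission; both are assumptions, as are the function names and the sample MTBF values.

```python
from math import exp


def failure_rate(mtbf_hours):
    """Failure rate (failures per hour) under a constant-failure-rate assumption."""
    return 1.0 / mtbf_hours


def reliability(mtbf_hours, mission_hours):
    """Probability of surviving the mission: R(t) = exp(-t / MTBF)."""
    return exp(-mission_hours * failure_rate(mtbf_hours))


def series_reliability(component_mtbfs, mission_hours):
    """Reliability of a series system: product of the component reliabilities."""
    result = 1.0
    for mtbf in component_mtbfs:
        result *= reliability(mtbf, mission_hours)
    return result


# Hypothetical components with MTBFs of 500 and 1000 hours on a 10-hour mission.
print(series_reliability([500.0, 1000.0], mission_hours=10.0))
```

Under these assumptions the component failure rates simply add, so the two-component example is equivalent to a single item with a failure rate of 0.003 per hour.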

Search Result 266, Processing Time 0.021 seconds
