Search | Korea Science

Resolving data imbalance through differentiated anomaly data processing based on verification data (검증데이터 기반의 차별화된 이상데이터 처리를 통한 데이터 불균형 해소 방법)

Hwang, Chulhyun
- Journal of Intelligence and Information Systems / v.28 no.4 / pp.179-190 / 2022
Data imbalance refers to a phenomenon in which the number of samples in one category is far larger or smaller than in another category. It has been identified as a major factor that degrades performance in machine learning with classification algorithms. To solve the data imbalance problem, various oversampling methods for amplifying minority class data have been proposed. Among them, SMOTE is the most representative method. To maximize the amplification effect for minority class data, various methods have emerged that remove noise included in the data (SMOTE-IPF) or strengthen only the samples near class borderlines (Borderline SMOTE). This paper proposes a method that improves classification performance by changing how anomaly data are processed in the traditional SMOTE method for amplifying minority class data. In experiments, the proposed method consistently showed relatively high classification performance compared with the existing methods.
https://doi.org/10.13088/jiis.2022.28.4.179
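
The interpolation idea behind classic SMOTE, which the abstract above takes as its baseline, can be sketched roughly as follows. This is a minimal illustration only, not the paper's proposed anomaly-handling method; the function name `smote_sketch`, its parameters, and the sample points are all hypothetical.

```python
import math
import random

def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (hypothetical helper)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Find the k nearest neighbours of the base point among the other samples.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

# Made-up 2-D minority samples, for illustration only.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_new=3)
```

Because every synthetic point lies on a segment between two existing minority samples, noisy or anomalous samples can pull new points into the wrong region, which is the weakness that variants such as SMOTE-IPF and Borderline SMOTE try to address.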

FUZZY LOGIC KNOWLEDGE SYSTEMS AND ARTIFICIAL NEURAL NETWORKS IN MEDICINE AND BIOLOGY

Sanchez, Elie
- Journal of the Korean Institute of Intelligent Systems / v.1 no.1 / pp.9-25 / 1991
This tutorial paper has been written for biologists, physicians, or beginners in fuzzy set theory and its applications. The field is introduced in the framework of medical diagnosis problems. The paper describes, and illustrates with practical examples, a general methodology of special interest in the processing of borderline cases, which allows a graded assignment of diagnoses to patients. A pattern of medical knowledge consists of a tableau with linguistic entries or of fuzzy propositions. Relationships between symptoms and diagnoses are interpreted as labels of fuzzy sets. It is shown how possibility measures (soft matching) can be used and combined to derive diagnoses after measurements on collected data. The concepts and methods are illustrated in a biomedical application on inflammatory protein variations. In the case of poor diagnostic classifications, appropriate weightings are introduced, acting on the characterizations of proteins, in order to decrease their relative influence. As a consequence, when pattern matching is achieved, the final ranking of inflammatory syndromes assigned to a given patient might change to better fit the actual classification. Defuzzification of results (i.e. diagnostic groups assigned to patients) is performed as a non-fuzzy set partition derived from a "separating power", and not by the center of gravity method commonly employed in fuzzy control. A model of a fuzzy connectionist expert system is then introduced, in which an artificial neural network is designed to build the knowledge base of an expert system from training examples (this model can also be used for the specification of rules in fuzzy logic control). Two types of weights are associated with the connections: primary linguistic weights, interpreted as labels of fuzzy sets, and secondary numerical weights. Cell activation is computed through MIN-MAX fuzzy equations of the weights. Learning consists in finding the (numerical) weights and the network topology. This feed-forward network is described and illustrated in the same biomedical domain as in the first part.
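
The "soft matching" mentioned in the abstract above can be illustrated with the standard possibility measure, the supremum over the domain of the minimum of two membership grades. The sketch below uses that textbook definition on a discretised scale; it is not necessarily the exact formulation in the paper, and the function name, labels, and membership grades are made up for illustration.

```python
def possibility(pattern, observation):
    """Possibility measure: max over the domain of min(pattern(x), observation(x))."""
    return max(min(pattern[x], observation[x]) for x in pattern)

# Hypothetical membership grades over a discretised protein-level scale.
elevated = {"low": 0.0, "normal": 0.3, "high": 1.0}   # fuzzy pattern "elevated"
measured = {"low": 0.1, "normal": 0.8, "high": 0.4}   # fuzzy observation for one patient

score = possibility(elevated, measured)  # degree to which the observation matches the pattern
```

A score near 1 means the observation is fully compatible with the pattern, while a score near 0 means the two barely overlap; intermediate values are what allow a graded assignment of diagnoses.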

Determining a BMDL of Blood Lead Based on ADHD Scores Using a Semi-Parametric Regression

Kim, Ah-Hyoun;Ha, Min-A;Kim, Byung-Soo
- The Korean Journal of Applied Statistics / v.25 no.3 / pp.389-401 / 2012
This paper derives a benchmark dose (BMD) and its 95% lower confidence limit (BMDL) using a semi-parametric regression model for small lead-based changes in attention-deficit hyperactivity disorder (ADHD) scores in the first wave of the Children's Health and Environment Research (CHEER) survey data, which have been regularly collected in South Korea since 2005. Ha et al. (2009) showed that the appearance of ADHD symptoms had a borderline trend of increasing with the blood lead concentration. Budtz-Jørgensen (EFSA, 2010a) derived the BMDL of lead corresponding to a benchmark response of 1 full intelligence quotient (IQ) point using the raw data in Lanphear et al. (2005, EHP). The European Food Safety Authority (EFSA, 2010b) determined the BMDL of 1.2 μg/dL as a reference point for the characterization of lead when assessing the risk of the intellectual deficit measured by IQ scores. Kim et al. (2011) indicated that an even lower BMDL could be obtained based on the ADHD score; however, the BMDLs depended heavily upon the model assumptions. We show in this paper that a semi-parametric approach resolves the model dependence of BMDLs.
https://doi.org/10.5351/KJAS.2012.25.3.389

Successful Lifelong Learning Strategies for Slow Learners: Applying Grit and Growth Mindset (느린 학습자를 위한 성공적인 평생학습 전략: 그릿 및 성장 마인드셋의 적용)

Eun Mi Shin;Ok Geun Choi;Gyu Dal Lee;Duk Han Kwon;Chang Seek Lee
- Industry Promotion Research / v.8 no.4 / pp.163-176 / 2023
Through a literature review, this study examined the concept of slow learners and their lifelong learning characteristics, and sought ways to achieve successful lifelong learning by utilizing grit and growth mindset, two non-cognitive characteristics. Slow learners were found to experience difficulties in the cognitive, academic, linguistic, social and emotional, and behavioral domains. For slow learners to succeed in lifelong learning, it was necessary to set long-term rather than short-term goals and to maintain effort and consistency of interest in pursuit of those goals. In addition, it was confirmed that achieving long-term goals requires the belief that change can be achieved through effort and learning. In other words, the need for learning that draws on grit and growth mindset was confirmed. Based on these findings from previous research, the study presents a lifelong learning strategy for slow learners that applies grit and growth mindset, which are non-cognitive characteristics, rather than cognitive characteristics such as intelligence.
https://doi.org/10.21186/IPR.2023.8.4.163

Conditional Generative Adversarial Network based Collaborative Filtering Recommendation System (Conditional Generative Adversarial Network(CGAN) 기반 협업 필터링 추천 시스템)

Kang, Soyi;Shin, Kyung-shik
- Journal of Intelligence and Information Systems / v.27 no.3 / pp.157-173 / 2021
With the development of information technology, the amount of available information increases daily. However, having access to so much information makes it difficult for users to easily find the information they seek. Users want a visualized system that reduces information retrieval and learning time, saving them from personally reading and judging all available information. As a result, recommendation systems are an increasingly important technology that is essential to business. Collaborative filtering is used in various fields with excellent performance because recommendations are made based on similar user interests and preferences. However, limitations do exist. Sparsity, which occurs when user-item preference information is insufficient, is the main limitation of collaborative filtering. Values in the user-item matrix may be distorted by product popularity, or there may be new users who have not yet rated any items. The lack of historical data to identify consumer preferences is referred to as data sparsity, and various methods have been studied to address these problems. However, most attempts to solve the sparsity problem are not optimal because they can only be applied when additional data such as users' personal information, social networks, or characteristics of items are included. Another problem is that real-world score data are mostly biased toward high scores, resulting in severe imbalances. One cause of this imbalanced distribution is purchasing bias, in which only users who rate a product highly purchase it, so those with low ratings are less likely to purchase products and thus do not leave negative product reviews. Due to these characteristics, unlike most users' actual preferences, reviews by users who purchase products are more likely to be positive. Therefore, the actual rating data are over-learned in the many high-incidence classes because of this bias, distorting the market. Applying collaborative filtering to such imbalanced data leads to poor recommendation performance due to excessive learning of the biased classes. Traditional oversampling techniques that address this problem are likely to cause overfitting because they repeat the same data, which acts as noise in learning and reduces recommendation performance. In addition, most existing pre-processing methods for data imbalance problems are designed and used for binary classes. Binary class imbalance techniques are difficult to apply to multi-class problems because they cannot model multi-class issues such as objects at cross-class boundaries or objects overlapping multiple classes. To solve this problem, research has been conducted on converting multi-class problems into binary class problems. However, simplifying multi-class problems can cause potential classification errors when the results are combined with those of classifiers learned from other sub-problems, resulting in the loss of important information about relationships beyond the selected items. Therefore, it is necessary to develop more effective methods to address multi-class imbalance problems. We propose a collaborative filtering model that uses CGAN to generate realistic virtual data to populate the empty user-item matrix. The conditional vector y identifies the distributions of the minority classes, and data reflecting their characteristics are generated. Collaborative filtering then maximizes the performance of the recommendation system via hyperparameter tuning. This process should improve the accuracy of the model by addressing the sparsity problem of collaborative filtering implementations while mitigating data imbalances arising from real data. Our model has superior recommendation performance over existing oversampling techniques and over the original sparse real-world data. SMOTE, Borderline SMOTE, SVM-SMOTE, ADASYN, and GAN were used as comparison models, and our model achieved the highest prediction accuracy on the RMSE and MAE metrics. Through this study, oversampling based on deep learning will be able to further refine the performance of recommendation systems using actual data and be used to build business recommendation systems.
https://doi.org/10.13088/jiis.2021.27.3.157
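
For reference, the RMSE and MAE metrics named in the abstract above can be computed as in the following minimal sketch. The rating values are made up for illustration and are not taken from the paper.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: penalises large rating errors more heavily."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error: the average size of the rating error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical true ratings and model predictions on a 1-5 scale.
actual = [5, 4, 3, 1]
predicted = [4, 4, 5, 1]

rmse_value = rmse(actual, predicted)
mae_value = mae(actual, predicted)
```

Lower values are better for both metrics. On rating data skewed toward high scores, both are dominated by the majority classes, so a low overall error does not by itself show good performance on the rare low ratings.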
