• Title/Summary/Keyword: Classification of Difficulty

Data Processing of AutoML-based Classification Models for Improving Performance in Unbalanced Classes (불균형 클래스에서 AutoML 기반 분류 모델의 성능 향상을 위한 데이터 처리)

  • Lee, Dong-Joon;Kang, Ji-Soo;Chung, Kyungyong
    • Journal of Convergence for Information Technology
    • /
    • v.11 no.6
    • /
    • pp.49-54
    • /
    • 2021
  • With the recent development of smart healthcare technology, interest in everyday diseases is increasing. However, healthcare data are imbalanced between positive and negative cases. This imbalance arises because data are difficult to collect: people without a given disease far outnumber patients who have it. Data imbalances need to be adjusted because they affect learning performance during disease prediction and analysis. Therefore, in this paper, we replace missing values through multiple imputation in models that detect whether a disease is present, and resolve the data imbalance through over-sampling. Using AutoML on the preprocessed data, we generate several models and select the top three to build an ensemble model.
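
The over-sampling step described in this abstract can be illustrated with a minimal sketch. The abstract does not say which over-sampling technique was used, so plain random over-sampling is an assumption here, and the function name `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class
    has as many rows as the largest class (assumed technique)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Pool of original rows belonging to this class
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

# Toy data: four negative cases and one positive case
rows = [[0.1], [0.2], [0.3], [0.4], [0.9]]
labels = [0, 0, 0, 0, 1]
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

In practice, over-sampling would be applied only to the training split so that duplicated rows do not leak into the evaluation data.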

Weakly-supervised Semantic Segmentation using Exclusive Multi-Classifier Deep Learning Model (독점 멀티 분류기의 심층 학습 모델을 사용한 약지도 시맨틱 분할)

  • Choi, Hyeon-Joon;Kang, Dong-Joong
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.19 no.6
    • /
    • pp.227-233
    • /
    • 2019
  • With the recent development of deep learning techniques, neural networks are achieving success in the computer vision field. Convolutional neural networks have shown outstanding performance not only in simple image classification tasks but also in highly difficult tasks such as object segmentation and detection. However, many such deep learning models are based on supervised learning, which requires annotations more detailed than image-level labels. In particular, an image semantic segmentation model requires pixel-level annotations for training, which are very costly to produce. To solve this problem, this paper proposes a weakly-supervised semantic segmentation method that requires only image-level labels to train the network. Existing weakly-supervised learning methods are limited in that they detect only a specific area of an object. In this paper, by contrast, we use a multi-classifier deep learning architecture so that our model recognizes more distinct parts of objects. The proposed method is evaluated on the VOC 2012 validation dataset.

Design of Efficient Storage Exploiting Structural Similarity in Microarray Data (마이크로어레이 데이터의 구조적 유사성을 이용한 효율적인 저장 구조의 설계)

  • Yun, Jong-Han;Shin, Dong-Kyu;Shin, Dong-Il
    • The KIPS Transactions:PartD
    • /
    • v.16D no.5
    • /
    • pp.643-650
    • /
    • 2009
  • As one of the typical techniques for acquiring bio-information, the microarray has contributed greatly to the development of bioinformatics. Although it is established as a core technology in bioinformatics, its data are difficult to share and store because experimental data are large and complex in type. In this paper, we propose a new method that exploits the fact that microarray data in MAGE-ML, a standard format for exchanging data, frequently contain structurally similar patterns. This method constructs a compact database by simplifying the MAGE-ML schema. It uses inlining techniques together with newly proposed classification techniques based on the structural similarity of elements. With this method, the database structure becomes simpler, the number of table joins is reduced, and performance is enhanced.

A Two-Stage Learning Method of CNN and K-means RGB Cluster for Sentiment Classification of Images (이미지 감성분류를 위한 CNN과 K-means RGB Cluster 이-단계 학습 방안)

  • Kim, Jeongtae;Park, Eunbi;Han, Kiwoong;Lee, Junghyun;Lee, Hong Joo
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.3
    • /
    • pp.139-156
    • /
    • 2021
  • The biggest reason for using a deep learning model in image classification is that it can consider the relationships among regions by extracting each region's features from the overall information of the image. However, the CNN model may not be suitable for emotional image data that lack regional features. To address the difficulty of classifying emotion images, researchers propose CNN-based architectures suited to emotion images every year. Studies on the relationship between color and human emotion have also been conducted, and they found that different colors induce different emotions. Among studies using deep learning, some have applied color information to image sentiment classification. Using the image's color information in addition to the image itself improves emotion classification accuracy compared with training the classification model on the image alone. This study proposes two ways to increase accuracy by adjusting the result value after the model classifies an image's emotion. Both methods improve accuracy by modifying the result value based on statistics of the colors in the picture. The most frequent two-color combinations were first found across all training data; then, at test time, the most frequent two-color combination was found for each test image. The result values were corrected according to the color combination distribution. The method weights the result value obtained after the model classifies an image's emotion, using expressions based on the log function and the exponential function. Emotion6, classified into six emotions, and Artphoto, classified into eight categories, were used for the image data. Densenet169, Mnasnet, Resnet101, Resnet152, and Vgg19 architectures were used for the CNN model, and performance was compared before and after applying the two-stage learning to the CNN model. 
Inspired by color psychology, which deals with the relationship between colors and emotions, we studied how to improve accuracy by modifying the result values based on color when creating a model that classifies an image's sentiment. Sixteen colors, each carrying its own meaning, were used: red, orange, yellow, green, blue, indigo, purple, turquoise, pink, magenta, brown, gray, silver, gold, white, and black. Using scikit-learn's clustering, the seven dominant colors in each image are identified. Then, the RGB coordinate values of the colors from the image are compared with the RGB coordinate values of the 16 colors listed above. That is, each extracted color is converted to the closest of the 16 colors. If combinations of three or more colors are selected, too many combinations occur and the distribution becomes scattered, so each combination has less influence on the result value. Therefore, to solve this problem, two-color combinations were found and used to weight the model's output. Before training, the most frequent color combinations were found for all training data images. The distribution of color combinations for each class was stored in a Python dictionary to be used during testing. During the test, the most frequent two-color combination is found for each test image. After that, we checked how that color combination was distributed in the training data and corrected the result. We devised several equations to weight the result value from the model based on the extracted color as described above. The data set was randomly divided 80:20, and the model was verified using 20% of the data as a test set. After splitting the remaining 80% of the data into five divisions to perform 5-fold cross-validation, the model was trained five times using different validation datasets. Finally, the performance was checked using the test dataset that was previously separated. 
Adam was used as the optimizer, and the learning rate was set to 0.01. Training ran for up to 20 epochs, and if the validation loss did not decrease for five epochs, training was stopped. Early stopping was set to load the model with the best validation loss. Classification accuracy was better when the information extracted from color properties was used together with the CNN than when only the CNN architecture was used.
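
The color-matching step in this abstract (convert each dominant color to the closest named color, then record two-color combinations per class in a dictionary) might look roughly like the sketch below. It is an illustration under assumptions: the RGB anchor values in `PALETTE` are hypothetical (the abstract does not list them, and only five of the sixteen colors are shown), the distance is assumed to be Euclidean in RGB space, and the cluster centers are assumed to come from a separate clustering step ordered by dominance.

```python
from collections import Counter

# Hypothetical RGB anchors for a few of the 16 named colors;
# the abstract does not give the actual coordinate values.
PALETTE = {
    "red": (255, 0, 0),
    "green": (0, 128, 0),
    "blue": (0, 0, 255),
    "white": (255, 255, 255),
    "black": (0, 0, 0),
}

def nearest_color(rgb):
    """Return the palette name with the smallest squared Euclidean distance to rgb."""
    return min(
        PALETTE,
        key=lambda name: sum((a - b) ** 2 for a, b in zip(rgb, PALETTE[name])),
    )

def top_two_color_combo(cluster_centers):
    """Map cluster centers (assumed ordered by dominance) to palette names
    and return the first two distinct names as a sorted tuple."""
    names = []
    for center in cluster_centers:
        name = nearest_color(center)
        if name not in names:
            names.append(name)
    return tuple(sorted(names[:2]))

def combo_distribution(images_by_class):
    """Count two-color combinations per class, stored as a dict of Counters."""
    return {
        cls: Counter(top_two_color_combo(centers) for centers in images)
        for cls, images in images_by_class.items()
    }

# Toy training data: each image is a list of dominant-color cluster centers
train = {
    "joy": [[(250, 10, 10), (240, 240, 240)], [(245, 5, 5), (10, 10, 10)]],
    "sadness": [[(5, 5, 240), (20, 20, 20)]],
}
dist = combo_distribution(train)
print(dist)
```

At test time, the same `top_two_color_combo` call on a test image would give the key used to look up each class's count in `dist`. How that count is turned into a weight (the log- and exponential-based expressions) is not specified in the abstract and is left out here.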

Corporate Bond Rating Using Various Multiclass Support Vector Machines (다양한 다분류 SVM을 적용한 기업채권평가)

  • Ahn, Hyun-Chul;Kim, Kyoung-Jae
    • Asia pacific journal of information systems
    • /
    • v.19 no.2
    • /
    • pp.157-178
    • /
    • 2009
  • Corporate credit rating is a very important factor in the market for corporate debt. Information concerning corporate operations is often disseminated to market participants through the changes in credit ratings that are published by professional rating agencies, such as Standard and Poor's (S&P) and Moody's Investors Service. Since these agencies generally require a large fee for the service, and the periodically provided ratings sometimes do not reflect the default risk of the company at the time, it may be advantageous for bond-market participants to be able to classify credit ratings before the agencies actually publish them. As a result, it is very important for companies (especially financial companies) to develop a proper model of credit rating. From a technical perspective, credit rating constitutes a typical multiclass classification problem because rating agencies generally have ten or more categories of ratings. For example, S&P's ratings range from AAA for the highest-quality bonds to D for the lowest-quality bonds. The professional rating agencies emphasize the importance of analysts' subjective judgments in the determination of credit ratings. However, in practice, a mathematical model that uses the financial variables of companies plays an important role in determining credit ratings, since it is convenient to apply and cost efficient. These financial variables include the ratios that represent a company's leverage status, liquidity status, and profitability status. Several statistical and artificial intelligence (AI) techniques have been applied as tools for predicting credit ratings. Among them, artificial neural networks are most prevalent in the area of finance because of their broad applicability to many business problems and their preeminent ability to adapt. 
However, artificial neural networks also have many defects, including the difficulty of determining the values of the control parameters and the number of processing elements in each layer, as well as the risk of over-fitting. Of late, because of their robustness and high accuracy, support vector machines (SVMs) have become popular as a solution for problems that require accurate prediction. An SVM's solution may be globally optimal because SVMs seek to minimize structural risk. On the other hand, artificial neural network models may tend to find locally optimal solutions because they seek to minimize empirical risk. In addition, no parameters need to be tuned in SVMs, barring the upper bound for non-separable cases in linear SVMs. However, since SVMs were originally devised for binary classification, they are not intrinsically geared for multiclass classification as in credit ratings. Thus, researchers have tried to extend the original SVM to multiclass classification. Hitherto, a variety of techniques to extend standard SVMs to multiclass SVMs (MSVMs) have been proposed in the literature. However, prior studies that apply MSVMs to credit rating have tested only a few types of MSVM. In this study, we examined six different MSVM techniques: (1) One-Against-One, (2) One-Against-All, (3) DAGSVM, (4) ECOC, (5) Method of Weston and Watkins, and (6) Method of Crammer and Singer. In addition, we examined the prediction accuracy of some modified versions of conventional MSVM techniques. To find the most appropriate MSVM technique for corporate bond rating, we applied all the MSVM techniques to a real-world case of credit rating in Korea. The best application is in corporate bond rating, which is the most frequently studied area of credit rating for specific debt issues or other financial obligations. For our study, the research data were collected from National Information and Credit Evaluation, Inc., a major bond-rating company in Korea. 
The data set comprises the bond ratings for the year 2002 and various financial variables for 1,295 companies from the manufacturing industry in Korea. We compared the results of these techniques with one another, and with those of traditional methods for credit ratings, such as multiple discriminant analysis (MDA), multinomial logistic regression (MLOGIT), and artificial neural networks (ANNs). As a result, we found that DAGSVM with an ordered list was the best approach for the prediction of bond rating. In addition, we found that the modified version of the ECOC approach can yield higher prediction accuracy for the cases showing clear patterns.
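
As a rough illustration of how a DAGSVM with an ordered class list reaches a prediction, the sketch below walks the decision DAG: it repeatedly compares the first and last remaining classes and discards the loser. The pairwise decider here is a hypothetical stand-in (nearest of two made-up score centers) rather than a trained binary SVM, and the rating labels and `CENTERS` values are illustrative assumptions, not data from the study.

```python
def dag_predict(x, ordered_classes, pairwise_decide):
    """Evaluate a decision DAG over an ordered class list.

    At each step the first and last remaining classes are compared by a
    binary classifier, and the losing class is eliminated.
    """
    remaining = list(ordered_classes)
    while len(remaining) > 1:
        first, last = remaining[0], remaining[-1]
        winner = pairwise_decide(x, first, last)
        if winner == first:
            remaining.pop()       # eliminate the last class
        else:
            remaining.pop(0)      # eliminate the first class
    return remaining[0]

# Hypothetical stand-in for trained pairwise SVMs: each class has a made-up
# score center, and the pairwise decision picks the nearer center.
CENTERS = {"AAA": 0.9, "AA": 0.7, "A": 0.5, "BBB": 0.3}

def toy_decide(x, a, b):
    return a if abs(x - CENTERS[a]) <= abs(x - CENTERS[b]) else b

ORDER = ["AAA", "AA", "A", "BBB"]  # ordered list, best rating first
print(dag_predict(0.65, ORDER, toy_decide))
```

With k classes, this evaluation calls only k - 1 binary classifiers per sample, whereas one-against-one voting consults all k(k - 1)/2 of them.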

A Study on the Effect of Network Centralities on Recommendation Performance (네트워크 중심성 척도가 추천 성능에 미치는 영향에 대한 연구)

  • Lee, Dongwon
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.1
    • /
    • pp.23-46
    • /
    • 2021
  • Collaborative filtering, which is often used in personalized recommendation, is recognized as a very useful technique for finding similar customers and recommending products to them based on their purchase history. However, the traditional collaborative filtering technique has difficulty calculating similarity for new customers or products because it calculates similarities based on direct connections and common features among customers. For this reason, hybrid techniques were designed that also use content-based filtering. Meanwhile, efforts have been made to solve these problems by applying the structural characteristics of social networks. This approach calculates the similarity between two customers indirectly, through the similar customers placed between them. That is, a customer network is created from purchasing data, and the similarity between two customers is calculated from the features of the network that indirectly connects them. Such similarity can be used as a measure to predict whether the target customer accepts recommendations. The centrality metrics of networks can be utilized to calculate these similarities. Different centrality metrics have important implications in that they may have different effects on recommendation performance. Furthermore, in this study, the effect of these centrality metrics on recommendation performance may vary depending on the recommender algorithm. In addition, recommendation techniques using network analysis can be expected to increase recommendation performance not only for new customers or products but also for all customers or products. By treating a customer's purchase of an item as a link generated between the customer and the item on the network, the prediction of user acceptance of a recommendation is solved as a prediction of whether a new link will be created between them. 
Because classification models fit the purpose of solving the binary problem of whether a link is formed or not, decision tree, k-nearest neighbors (KNN), logistic regression, artificial neural network, and support vector machine (SVM) models were selected for this research. The performance evaluation used order data collected from an online shopping mall over four years and two months. Of these, the first three years and eight months of records were organized into the social network used in the experiment. The next four months' records were used to train and evaluate the recommender models. Experiments with the centrality metrics applied to each model show that the recommendation acceptance rates of the centrality metrics differ for each algorithm at a meaningful level. In this work, we analyzed only four commonly used centrality metrics: degree centrality, betweenness centrality, closeness centrality, and eigenvector centrality. Eigenvector centrality records the lowest performance in all models except support vector machines. Closeness centrality and betweenness centrality show similar performance across all models. Degree centrality ranks in the middle across all models, while betweenness centrality always ranks higher than degree centrality. Finally, closeness centrality is characterized by distinct differences in performance according to the model. It ranks first in logistic regression, artificial neural network, and decision tree with numerically high performance. However, it records only very low rankings, with low performance levels, in support vector machine and k-nearest neighbors. As the experimental results reveal, in a classification model, network centrality metrics over a subnetwork that connects two nodes can effectively predict the connectivity between those nodes in a social network. Furthermore, each metric performs differently depending on the type of classification model. 
This result implies that choosing appropriate metrics for each algorithm can lead to higher recommendation performance. In general, betweenness centrality can guarantee a high level of performance in any model. Introducing closeness centrality could be considered to obtain higher performance for certain models.
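
Two of the four centrality metrics named in this abstract, degree and closeness centrality, can be computed on a small undirected graph as in the sketch below. The customer network is a made-up toy example, the standard textbook definitions are assumed (the abstract does not state which variants were used), and the closeness function assumes a connected graph.

```python
from collections import deque

def degree_centrality(graph):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(graph)
    return {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}

def closeness_centrality(graph):
    """(Number of other reachable nodes) / (sum of shortest-path lengths),
    with path lengths found by breadth-first search."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in graph[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result

# Hypothetical customer network as an adjacency mapping
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
print(degree)
print(closeness)
```

Values like these, computed over the subnetwork connecting a customer and an item, are the kind of features a classifier could take as input for the link-prediction task the abstract describes.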

Recognition of Efficiency and Effectiveness of the Experiences with Hand Acupuncture (수지침 경험자들의 수지침에 대한 효율성과 효과성 인식정도)

  • Lee, Yeon-Joo;Park, Kyung-Min
    • Research in Community and Public Health Nursing
    • /
    • v.12 no.1
    • /
    • pp.278-287
    • /
    • 2001
  • The purpose of this study is to provide basic information on the application of hand acupuncture as a complementary and alternative therapy by examining how its efficiency and effectiveness are recognized. Questionnaire answers from 290 respondents were used for this research; they were collected from June 5 through 13, 1999 from adults aged twenty and over who were participating in a hand acupuncture training program in Seoul and had direct experience with hand acupuncture therapy, whether they had been treated and/or had treated others. To secure the reliability of the measurement tool, Cronbach's α was calculated, and factor analysis was performed as a validity analysis of the question classification. Demographic characteristics of people experienced with hand acupuncture and factors related to hand acupuncture experiences were calculated as counts and percentages. The degree of recognition of efficiency and effectiveness of hand acupuncture was expressed as mean and standard deviation, while the degree of recognition of efficiency and effectiveness by general characteristics was analyzed with one-way ANOVA. 1. According to the socio-demographic analysis, the respondents could be classified first by age (40-49: 32.5%, 30-39: 24.9%, 50-59: 21.9%, 60-69: 14.7%, 20-29: 6.0%), second by gender (male: 36.6%, female: 63.4%), third by occupation (housewife: 43.8%, self-employed: 15.5%, company employee: 14.8%), fourth by education (high school graduate: 41.9%, college graduate: 37.9%), and last by monthly income (1 to 2 million: 51.4%, 2 to 3 million: 20.3%). 2. As for the general aspects related to hand acupuncture, 80.0% of the respondents answered almost zero for the monthly average number of hospital visits and 15.5% responded 1 to 2 visits; 6.2% of the respondents complained of a digestive system disorder, 19.0% of circulatory disease, and 10.7% of a nervous system disorder. 
By utilizing hand acupuncture, 84% of the respondents had experience in treating the following diseases: digestive system 47.3%, circulatory system 9.3%, nervous system 8.3%; 54.1% treat 1 to 2 patients and 10.3% treat 3 to 4 patients a day with hand acupuncture. Regarding the drawbacks of giving medical treatment with hand acupuncture, 23.8% feel an economic burden, 16.6% find it difficult to learn, and 16.2% cite a weak theoretical background. 3. In the recognition of efficiency, the mean score was 4.29 for possibility of general application, 4.19 for simplicity of treatment, 4.36 for economic merits, 4.17 for possibility of establishment as a supplementary and alternative medicine, and 4.09 for medical effectiveness. 4. In the demographic analysis of the efficiency and effectiveness of hand acupuncture therapy, the recognition of efficiency by occupation and the recognition of effectiveness by monthly income were the most noteworthy. Government employees, self-employed people, company employees, and then housewives, in that order, perceived hand acupuncture as highly efficient, and those who recognized hand acupuncture as most effective were people earning 1 to 2 million won a month. 5. The efficiency (p = .003) and effectiveness (p = .049) of hand acupuncture therapy by number of hospital visits were statistically significant, and the effectiveness of hand acupuncture therapy by presence of disease was statistically significant (p = .033).

Nutritional Status of Chronic Obstructive Pulmonary Disease Patients according to the Severity of Disease (만성 폐쇄성 폐질환 환자에서 병기에 따른 영양상태 평가)

  • Park, Young-Mi;Yoon, Ho-Il;Sohn, Cheong-Min;Choue, Ryo-Won
    • Journal of Nutrition and Health
    • /
    • v.41 no.4
    • /
    • pp.307-316
    • /
    • 2008
  • The purpose of the study was to investigate the nutritional status of chronic obstructive pulmonary disease (COPD) patients and to identify differences according to the stage of disease. From March to October 2006, 41 stable male patients with mild to severe COPD were recruited from Seoul National University Hospital. The patients' body weight and fat free mass were assessed by bioelectrical impedance analysis. The nutritional status of the patients was also assessed by 3-day recall, index of nutritional quality (INQ), dietary diversity score (DDS), dietary variety score (DVS), food group index pattern, and dietary quality index (DQI). The 41 patients were classified into three groups, stage I, stage II, and stage III, according to the Global Initiative for Chronic Obstructive Lung Disease (GOLD) classification standard. The mean age of the patients in each stage was 67.2-66.9 years, showing no significant difference. The $FEV_1$/FVC ratios were $57.5 \pm 7.3$, $46.9 \pm 7.6$, and $38.2 \pm 6.8\%$, respectively, showing significant differences according to the stage of disease. The fat free mass of stage II ($48.2 \pm 4.7$ kg) and stage III ($47.3 \pm 4.5$ kg) patients was significantly lower than that of stage I ($53.1 \pm 6.9$ kg) patients. There were significant correlations of fat free mass with $FEV_1$, and of BMI (body mass index) with the $FEV_1$/FVC ratio (p < 0.05). COPD patients showed the diet-related clinical symptoms of anorexia, dyspnea, dyspepsia, and chewing difficulty. Daily intakes of calories, K, vitamin $B_2$, and folate were very low ($83.8 \pm 20.7\%$, $58.9 \pm 14.4\%$, $70.7 \pm 19.6\%$, and $74.4 \pm 10.2\%$, respectively); however, they did not differ significantly according to the stage of disease. Daily intake of calcium was significantly lower in the stage III patients (p < 0.05). The mean dietary variety score was significantly lower in the stage III patients (p < 0.001). 
The dietary quality index of the patients did not differ among the stages of disease, and the scores indicated poor diet quality. In summary, we found that body fat free mass, regularity of exercise, frequency of having snacks, and dietary variety score were significantly associated with the severity of chronic obstructive pulmonary disease.

A Study on the Stress and Fatigue Symptoms of High School Students according to the Life Styles (일부 고등학생들의 일상생활특성에 따른 스트레스와 피로자각증상의 평가)

  • Lee, Ju-Young;Song, In-Soon;Jeong, Yong-Jun;Cho, Young-Chae
    • Journal of the Korean Society of School Health
    • /
    • v.16 no.1
    • /
    • pp.9-21
    • /
    • 2003
  • The present study was designed to evaluate the factors that influence stress and subjective fatigue symptoms, based on school life environments and daily lifestyles, among high school students. Self-administered questionnaires were delivered to 2,381 high school students of both sexes in Taejon Metropolitan City from Mar. 1st to Jun. 30th, 2000. The analysis revealed the following findings: 1. By magnitude of stress, normal subjects accounted for 3.1%, the group with potential stress for 64.7%, and the group at high risk for stress for 32.2%. Stress levels were higher in female than in male students, and in third graders than in first and second graders. In the classification of typical fatigue symptoms, category III (bodily projection of fatigue) was the most frequent, followed by category II (difficulty in concentration) and category I (dullness and sleepiness), which showed that the predominant pattern of fatigue arose from the body parts. 2. With regard to school life characteristics, stress scores were higher in the groups with lower grades, worse relationships with friends, and lower satisfaction with school life. The scores for subjective fatigue symptoms were higher in male students, in the low graders, in those with better relationships with friends, and in the satisfied group than in their respective counterparts. 3. Concerning home life characteristics, higher stress scores were associated with students who reported poor economic conditions, lower parental interest, lack of satisfaction with home life, and poor subjective health status. 
On the other hand, the scores for subjective fatigue symptoms were higher in the student groups with good economic conditions, higher parental interest, satisfaction with home life, and good subjective health status. 4. Concerning daily lifestyles, stress scores were higher in students who had inappropriate sleep hours, skipped breakfast, ate snacks between meals daily, lacked exercise, smoked daily, had normal obesity indices, and had lower health habit indices. Conversely, the scores for subjective fatigue symptoms were higher in the groups who ate breakfast daily, ate no snacks between meals, exercised daily, and did not smoke than in their counterparts. 5. The factors influencing stress included satisfaction with school life, relationships with friends, satisfaction with home life, exercise, school grades, parental interest, school year, sex, health habit scores, degree of obesity, and economic conditions at home. Those influencing the degree of fatigue included stress, snacks between meals, smoking, relationships with friends, and satisfaction with home life.

A Study on The Clinical Characteristics and Treatment in Burning Mouth Syndrome (구강 작엽감 증후군 (BMS)의 임상적 특징 및 치료에 관한 연구)

  • Mi-Jung Yeom;Chong-Youl Kim
    • Journal of Oral Medicine and Pain
    • /
    • v.20 no.1
    • /
    • pp.39-52
    • /
    • 1995
  • Burning mouth syndrome (BMS) is characterized by a burning sensation in the oral cavity without clinical signs. There have been no established theories about its diagnosis and treatment. The purpose of this article is to examine the clinical features of Korean BMS patients and to present a treatment protocol that can be helpful in clinical applications. The subjects were 52 patients who had visited the Department of Oral Diagnosis at Yonsei University Dental Hospital and were diagnosed with BMS. We conducted questionnaires, a precise oral exam, a laboratory exam, grouping of the patients, individual treatment for each group, and classification of responses to the treatment. The following results were obtained: 1. Chief complaints were throbbing (71.2%); pricking, stinging, or tingling (30.8%); and burning (25.0%). The tongue was the most frequently affected site (82.7%), followed by full mouth, gingiva, palate, buccal mucosa, lips, throat, labial mucosa, and floor of mouth. 2. The average age of onset was 48.1 years and the male to female ratio was 1 to 3. The average duration of symptoms was 11.69 months for males and 23.07 months for females. 3. Continuous pain, reported by 32.7% of patients, was the most common pattern. Aggravating factors were peppery food, salty food, hot food, fatigue, tension conversation, sour food, cold food, and toothpaste. Reducing factors were cold food, diet, going to sleep, and smoking. 4. Associated symptoms were dry mouth, other life problems, altered taste perception, bad taste, throat pain, tingling, and difficulty in swallowing. 5. Most patients reported no event associated with the onset of symptoms; among those who did, the order of prevalence was as follows: dental treatment, stress, denture wearing, and an attack of a systemic disease. 92.3% of patients reported no psychological withering and 7.7% reported it. 6. Eight males and four females had jobs. 7. 
There was no family history in 100% of patients when asked about family history. 8. 96.2% of patients reported no oral habits. In the oral exam, 13.5% of patients had dryness of the oral mucosa. No significant relation to dental prostheses was observed, but the incidence of stress-related diseases appeared high in BMS patients with the clinical characteristics above. The group with low serum iron was 63.5%, and in this group the stage of potential iron deficiency just before progression to anemia appeared frequently. The group showing a positive response in the fungal study for Candida albicans was 38.5%. Since a high treatment response can be expected from prescribing iron-containing drugs and antifungal drugs in these patients, testing serum iron and Candida albicans allows the condition of BMS patients to be diagnosed from more varied aspects. Furthermore, it is expected that a treatment protocol can be established.
