• Title/Summary/Keyword: Multi-class Classification

Development of machine learning model for reefer container failure determination and cause analysis with unbalanced data (불균형 데이터를 갖는 냉동 컨테이너 고장 판별 및 원인 분석을 위한 기계학습 모형 개발)

  • Lee, Huiwon;Park, Sungho;Lee, Seunghyun;Lee, Seungjae;Lee, Kangbae
    • Journal of the Korea Convergence Society
    • /
    • v.13 no.1
    • /
    • pp.23-30
    • /
    • 2022
  • Reefer container failures cause heavy financial losses, yet current reefer container alarm systems are inefficient. Previous studies have used simulated refrigeration-system data, but studies using actual operating data from reefer containers are lacking. This study therefore classified the causes of failure using actual reefer container operating data. The actual data were imbalanced, and the imbalance was addressed by comparing logistic regression with ENN-SMOTE and class weighting against the 2-stage algorithm developed in this study. The 2-stage algorithm uses XGBoost, LGBoost, and DNN to separate faulty from normal operation in the first stage and to classify the cause of the fault in the second stage. Within the 2-stage algorithm, the model using LGBoost performed best, with 99.16% accuracy. This study proposes a final model based on the two-stage algorithm to handle data imbalance, and the approach is expected to be applicable to other industries.
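
The two-stage structure described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `two_stage_predict`, the threshold rules, the feature layout, and the cause labels are hypothetical stand-ins for trained models such as gradient-boosted trees or a DNN.

```python
from typing import Callable, Sequence

Features = Sequence[float]


def two_stage_predict(
    x: Features,
    is_fault: Callable[[Features], bool],
    fault_cause: Callable[[Features], str],
    normal_label: str = "normal",
) -> str:
    """Stage 1 decides fault vs. normal; stage 2 runs only on detected faults."""
    if not is_fault(x):
        return normal_label
    return fault_cause(x)


# Hypothetical stand-in models (simple threshold rules, not trained classifiers).
def toy_is_fault(x: Features) -> bool:
    return x[0] > 5.0  # assumed feature: temperature deviation


def toy_fault_cause(x: Features) -> str:
    return "compressor" if x[1] > 0.5 else "sensor"  # assumed cause labels


readings = [(1.0, 0.9), (7.0, 0.9), (8.0, 0.1)]
predictions = [two_stage_predict(r, toy_is_fault, toy_fault_cause) for r in readings]
print(predictions)
```

Because the second stage only ever sees samples flagged as faults, the large "normal" class no longer dominates the cause classifier, which is one plausible reason such a split helps with imbalanced data.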

Deep Learning based Scrapbox Accumulated Status Measuring

  • Seo, Ye-In;Jeong, Eui-Han;Kim, Dong-Ju
    • Journal of the Korea Society of Computer and Information
    • /
    • v.25 no.3
    • /
    • pp.27-32
    • /
    • 2020
  • In this paper, we propose an algorithm for measuring the accumulated status of scrap boxes, in which metal scrap accumulates. Measuring the accumulated status is defined as a multi-class classification problem, and the deep learning method classifies the accumulated status using only an image of the scrap box. Training used transfer learning, with NASNet-A as the deep learning model. To improve accuracy, we combined a Random Forest classifier with the trained NASNet-A and refined the model through post-processing. Testing on 4,195 samples collected in the field showed 55% accuracy when only NASNet-A was applied, and the proposed method, NASNet with Random Forest, improved the accuracy to 88%.

A Study on Automatic Classification System of Red Blood Cell for Pathological Diagnosis in Blood Digital Image (혈액영상에서 병리진단을 위한 적혈구 세포의 자동분류에 관한 연구)

  • 김경수;김동현
    • Journal of the Korea Society of Computer and Information
    • /
    • v.4 no.1
    • /
    • pp.47-53
    • /
    • 1999
  • In the medical field, computers have been used for the automatic processing of hospital data, the automation of diagnostic devices, and the processing of medical digital images. In this paper, we classify red blood cells into 16 classes, including normal cells, to automate blood analysis for disease diagnosis. First, using the UNL Fourier and invariant moment algorithms, we extract features of red blood cells from blood cell images, and then construct a multi-layer backpropagation neural network to recognize them. We show, through the analysis of blood samples from 10 patients, that the system can support a blood analyzer.

Internet Application Traffic Classification using a Hierarchical Multi-class SVM (계층적 다중 클래스 SVM을 이용한 인터넷 애플리케이션 트래픽 분류)

  • Yu, Jae-Hak;Kim, Sung-Yun;Lee, Han-Sung;Kim, Myung-Sup;Park, Dai-Hee
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2008.06a
    • /
    • pp.174-178
    • /
    • 2008
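  • Faster and more accurate classification of Internet application traffic, including P2P, has recently become an important issue in academia. In this paper, as a new alternative that overcomes the structural limitations of the traditional classification methods based on port numbers and payload information, we built a multi-class SVM that hierarchically combines a binary SVM classifier with one-class SVMs, and used it to classify Internet application traffic. The proposed system consists of a first layer that quickly separates P2P traffic from non-P2P traffic with a binary SVM; a second layer that classifies P2P traffic into file sharing, messenger, and TV based on three one-class SVMs; and a third layer that performs fine-grained classification into a total of 16 application traffic types. By collecting flow-based traffic information and classifying Internet application traffic at a coarse or fine level, the proposed system ensured efficient management of system resources, support for a stable network environment, smooth use of bandwidth, and appropriate QoS. In addition, even when new application traffic is added, only that traffic needs to be learned, without retraining the entire system, which contributes to incremental updating and scalability. The performance of the proposed system was verified experimentally, with satisfactory recall and precision values.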
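
The layered design in this abstract (a binary decision first, then one model per traffic group, with new groups added without retraining the rest) can be sketched as below. This is an illustrative sketch only: `OneClassModel` is a hypothetical centroid-and-radius stand-in for a one-class SVM, and the class names, feature values, and the `is_p2p` rule are assumptions rather than details from the paper.

```python
import math


class OneClassModel:
    """Hypothetical stand-in for a one-class SVM: a centroid plus a radius."""

    def __init__(self, samples):
        dim = len(samples[0])
        self.center = [sum(s[i] for s in samples) / len(samples) for i in range(dim)]
        self.radius = max(math.dist(s, self.center) for s in samples)

    def score(self, x):
        # Positive inside the learned region, negative outside it.
        return self.radius - math.dist(x, self.center)


class HierarchicalClassifier:
    def __init__(self, is_p2p):
        self.is_p2p = is_p2p  # layer 1: binary P2P vs. non-P2P decision
        self.groups = {}      # layer 2: one independent model per P2P group

    def add_group(self, name, samples):
        # Only the new group's model is trained; existing models are untouched.
        self.groups[name] = OneClassModel(samples)

    def predict(self, x):
        if not self.is_p2p(x):
            return "non-P2P"
        return max(self.groups, key=lambda g: self.groups[g].score(x))


# Assumed two-dimensional flow features, chosen purely for illustration.
clf = HierarchicalClassifier(is_p2p=lambda x: x[0] > 0.5)
clf.add_group("file_sharing", [(0.9, 0.1), (0.8, 0.2)])
clf.add_group("messenger", [(0.6, 0.9), (0.7, 0.8)])
old_model = clf.groups["file_sharing"]
clf.add_group("tv", [(0.9, 0.9), (1.0, 1.0)])  # incremental extension

labels = [clf.predict(x) for x in [(0.1, 0.5), (0.85, 0.15), (0.95, 0.95)]]
print(labels)
```

Keeping one independent model per group is what makes the incremental update cheap: adding "tv" leaves the previously trained `file_sharing` model object unchanged.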

Hydrosphere Change Monitoring of the Daecheong-Dam Basin using Multi-temporal Landsat Images (시계열 Landsat영상을 이용한 대청댐 유역의 수계변화 모니터링)

  • Um, Dae-Yong;Park, Joon-Kyu;Lee, Jin-Duk
    • Proceedings of the Korea Contents Association Conference
    • /
    • 2007.11a
    • /
    • pp.932-936
    • /
    • 2007
  • This study used Landsat satellite images to analyze, qualitatively, the change in the hydrosphere of the Daecheong dam basin from the construction of the dam to recent years. The change was detected by applying supervised classification to Landsat satellite images from four periods: 1981, 1987, 1993, and 2002. To this end, classes such as hydrosphere and vegetation were designated, and overlay analysis was performed after extracting only the hydrosphere class. In this way, the study efficiently monitored changes in the hydrosphere of the Daecheong dam basin.

Subset selection in multiple linear regression: An improved Tabu search

  • Bae, Jaegug;Kim, Jung-Tae;Kim, Jae-Hwan
    • Journal of Advanced Marine Engineering and Technology
    • /
    • v.40 no.2
    • /
    • pp.138-145
    • /
    • 2016
  • This paper proposes an improved tabu search method for subset selection in multiple linear regression models. Variable selection is a vital combinatorial optimization problem in multivariate statistics. Selecting the optimal subset of variables is necessary to construct a reliable multiple linear regression model. Its applications range widely, from machine learning, time-series prediction, and multi-class classification to noise detection. Since the problem is NP-complete in nature, finding the optimal solution becomes more difficult as the number of variables increases. Two typical metaheuristic methods have been developed to tackle the problem: the tabu search algorithm and a hybrid genetic and simulated annealing algorithm. However, both methods have shortcomings. The tabu search method requires a large amount of computing time, and the hybrid algorithm produces a less accurate solution. To overcome these shortcomings, we propose an improved tabu search algorithm that reduces the number of neighborhood moves and adopts an effective move search strategy. To evaluate the performance of the proposed method, comparative studies are performed on small data sets from the literature and on large simulated data sets. Computational results show that the proposed method outperforms the two metaheuristic methods in terms of computing time and solution quality.
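
For orientation, a plain tabu search over fixed-size variable subsets might look like the sketch below. It is a baseline illustration, not the improved algorithm the paper proposes: the swap neighborhood, the tabu rule (a variable just removed may not re-enter for `tenure` iterations), the aspiration rule, and the synthetic data are all assumptions made here. The sketch assumes `n_vars - k > tenure` so that a non-tabu move always exists.

```python
import random
from collections import deque


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def rss(X, y, subset):
    """Residual sum of squares of a least-squares fit (with intercept) on `subset`."""
    cols = sorted(subset)
    rows = [[1.0] + [row[j] for j in cols] for row in X]
    p = len(cols) + 1
    AtA = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    Aty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    beta = solve(AtA, Aty)
    return sum((yi - sum(bj * rj for bj, rj in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def tabu_search(X, y, k, iters=20, tenure=2, seed=0):
    rng = random.Random(seed)
    n_vars = len(X[0])
    current = set(rng.sample(range(n_vars), k))
    best, best_cost = set(current), rss(X, y, current)
    tabu = deque(maxlen=tenure)  # recently removed variables
    for _ in range(iters):
        candidates = []
        for out in current:
            for inn in set(range(n_vars)) - current:
                cand = (current - {out}) | {inn}
                cost = rss(X, y, cand)
                # Aspiration: a tabu move is allowed if it beats the best so far.
                if inn not in tabu or cost < best_cost:
                    candidates.append((cost, out, cand))
        if not candidates:
            break
        cost, out, current = min(candidates, key=lambda c: c[0])
        tabu.append(out)
        if cost < best_cost:
            best, best_cost = set(current), cost
    return best, best_cost


# Synthetic data (assumed): y depends only on variables 0 and 3.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(6)] for _ in range(40)]
y = [2 * r[0] - 3 * r[3] + 0.1 * data_rng.gauss(0, 1) for r in X]
best, best_cost = tabu_search(X, y, k=2)
print(sorted(best))
```

Every iteration evaluates all `k * (n_vars - k)` swaps, which illustrates the computing-time cost that the abstract attributes to the basic tabu search.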

Fault Diagnosis of Power Transformer Using Support Vector Machine (써포트 벡터머신을 이용한 전력용 변압기 고장진단)

  • Lim, Jae-Yoon;Lee, Dae-Jong;Lee, Jong-Pil;Ji, Pyeong-Shik
    • Journal of the Korean Institute of Illuminating and Electrical Installation Engineers
    • /
    • v.23 no.2
    • /
    • pp.62-69
    • /
    • 2009
  • For the fault diagnosis of power transformers, we develop a diagnosis algorithm based on a support vector machine. The proposed fault diagnosis system consists of data acquisition, fault/normal diagnosis, and fault identification. In the data acquisition part, concentrated gases are extracted from the transformer for gas analysis. In the fault/normal diagnosis part, a KEPCO-based decision rule is applied to separate the normal state from fault states. In the identification part, the fault type is determined by a multi-class SVM. In simulations carried out to verify its effectiveness, the proposed method showed better classification results than conventional methods.

Multi-class Classification of Histopathology Images using Fine-Tuning Techniques of Transfer Learning

  • Ikromjanov, Kobiljon;Bhattacharjee, Subrata;Hwang, Yeong-Byn;Kim, Hee-Cheol;Choi, Heung-Kook
    • Journal of Korea Multimedia Society
    • /
    • v.24 no.7
    • /
    • pp.849-859
    • /
    • 2021
  • Prostate cancer (PCa) is a fatal disease that occurs in men. In general, PCa cells are found in the prostate gland. Early diagnosis is key to preventing the cancer from spreading to other parts of the body. Here, deep learning-based systems can detect and distinguish histological patterns in microscopy images. The histological grades used for the analysis were benign, grade 3, grade 4, and grade 5. In this study, we use transfer learning and fine-tuning methods, as well as different model architectures, to develop and compare models. We implemented MobileNet, ResNet50, and DenseNet121 models and used three different layer-freezing strategies for fine-tuning, obtaining various pre-trained weights to improve accuracy. Finally, transfer learning using MobileNet with half of the layers frozen showed the best results among the nine models, and 90% accuracy was obtained on the test data set.

Corporate Credit Rating based on Bankruptcy Probability Using AdaBoost Algorithm-based Support Vector Machine (AdaBoost 알고리즘기반 SVM을 이용한 부실 확률분포 기반의 기업신용평가)

  • Shin, Taek-Soo;Hong, Tae-Ho
    • Journal of Intelligence and Information Systems
    • /
    • v.17 no.3
    • /
    • pp.25-41
    • /
    • 2011
  • Recently, support vector machines (SVMs) have been recognized as competitive tools, compared with other data mining techniques, for solving pattern recognition and classification decision problems. Furthermore, many studies have found them more powerful than traditional artificial neural networks (ANNs) (Amendolia et al., 2003; Huang et al., 2004; Huang et al., 2005; Tay and Cao, 2001; Min and Lee, 2005; Shin et al., 2005; Kim, 2003). Classification decisions made by any classifier, whether binary or multi-class, are highly cost-sensitive in financial classification problems such as credit rating: if credit ratings are misclassified, investors or financial decision makers may suffer severe economic losses. Therefore, it is necessary to convert the outputs of the classifier into well-calibrated posterior probabilities so that multi-class credit ratings can be based on bankruptcy probabilities. However, SVMs do not natively provide such probabilities, so a method for creating them is required (Platt, 1999; Drish, 2001). This paper applied AdaBoost algorithm-based SVMs to bankruptcy prediction, treated as a binary classification problem, for IT companies in Korea, and then performed multi-class credit rating of the companies by forming a normal distribution of posterior bankruptcy probabilities from the loss functions extracted from the SVMs. The proposed approach also showed that misclassification problems can be minimized by adjusting the credit grade interval ranges, on the condition that each credit grade for credit loan borrowers has its own credit risk, i.e., bankruptcy probability.
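
One commonly cited way to turn raw SVM outputs into probabilities is the sigmoid fit of Platt (1999), which the abstract references. The sketch below illustrates that general idea only; it is not the paper's own normal-distribution procedure. It is simplified relative to Platt's original method (plain gradient descent, no regularized targets), and the scores and labels are made-up values.

```python
import math


def fit_platt(scores, labels, lr=0.5, epochs=5000):
    """Fit P(y=1 | s) = 1 / (1 + exp(A*s + B)) by gradient descent on log loss."""
    A, B = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(A * s + B))
            # With z = A*s + B, the log-loss gradient with respect to z is (y - p).
            grad_a += (y - p) * s
            grad_b += (y - p)
        A -= lr * grad_a / n
        B -= lr * grad_b / n
    return A, B


def platt_probability(score, A, B):
    return 1.0 / (1.0 + math.exp(A * score + B))


# Hypothetical SVM decision values; label 1 marks the positive class.
scores = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

A, B = fit_platt(scores, labels)
p_low = platt_probability(-2.0, A, B)
p_high = platt_probability(2.0, A, B)
print(round(p_low, 3), round(p_high, 3))
```

The fitted `A` comes out negative, so larger decision values map to higher probabilities. Thresholds on such probabilities could then define rating intervals, which is the kind of interval adjustment the abstract describes.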

The Efficiency Rating Prediction for Cultural Tourism Festival Based on DEA (DEA를 적용한 문화관광축제의 효율성 등급 예측모형)

  • Kim, Eun-Mi;Hong, Tae-Ho
    • The Journal of Information Systems
    • /
    • v.29 no.3
    • /
    • pp.145-157
    • /
    • 2020
  • Purpose: This study proposed an approach for predicting the efficiency rating of cultural tourism festivals using DEA and machine learning techniques. The best cultural tourism festivals are selected through peer reviews by tourism experts. However, only 10% of the festivals held in a year can be evaluated, and only in terms of effectiveness, without considering the efficiency of the festivals. Design/methodology/approach: Efficiency scores were derived from the results of DEA for the prediction of efficiency ratings. This study utilized BCC models to reflect the size effect of festivals and classified the festivals into four ratings according to their efficiency scores. Multi-class classification methods were considered to build the prediction model for the four ratings. We utilized neural networks and SVMs with OAO (one-against-one), OAR (one-against-rest), and C&S (Crammer & Singer) on Korean festival data from 2013 to 2018. Findings: When we analyzed the DEA results for the characteristics of the four ratings, the total number of visitors was larger in the low-efficiency rating than in the high-efficiency ratings, although the total expenditure of visitors was highest in the most efficient rating. The SVM with the OAO model showed the best accuracy, whereas the SVM with the OAR model was not trained well because of the imbalanced distribution between the efficient rating and the other ratings. Our approach could predict the efficiency of festivals that were not included in the review process for cultural tourism festivals, without rebuilding the DEA models each time. This enables the festivals to be managed efficiently with the proposed machine learning models.
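
The difference between the OAO and OAR decompositions mentioned in this abstract can be sketched as follows. The binary learner here is a hypothetical nearest-centroid rule standing in for an SVM, and the four rating labels and points are invented for illustration; only the way binary models are combined is the point of the example.

```python
from collections import Counter
from itertools import combinations


def centroid(points):
    return [sum(p[i] for p in points) / len(points) for i in range(len(points[0]))]


def dist2(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def train_binary(pos, neg):
    """Hypothetical stand-in for a binary SVM: score > 0 means the positive class."""
    cp, cn = centroid(pos), centroid(neg)
    return lambda x: dist2(x, cn) - dist2(x, cp)


def train_oao(data):
    # One model per pair of classes: k * (k - 1) / 2 models.
    return {(a, b): train_binary(data[a], data[b])
            for a, b in combinations(sorted(data), 2)}


def predict_oao(models, x):
    votes = Counter(a if f(x) > 0 else b for (a, b), f in models.items())
    return votes.most_common(1)[0][0]


def train_oar(data):
    # One model per class against all remaining classes: k models.
    models = {}
    for a in data:
        rest = [p for b, pts in data.items() if b != a for p in pts]
        models[a] = train_binary(data[a], rest)
    return models


def predict_oar(models, x):
    return max(models, key=lambda a: models[a](x))


# Invented two-dimensional data for four efficiency ratings.
ratings = {
    "A": [(0, 0), (0, 1)],
    "B": [(5, 0), (5, 1)],
    "C": [(0, 5), (1, 5)],
    "D": [(5, 5), (5, 6)],
}
oao, oar = train_oao(ratings), train_oar(ratings)
query = (4.8, 0.4)
print(len(oao), len(oar), predict_oao(oao, query), predict_oar(oar, query))
```

In OAR each binary problem pits one class against the pooled remainder, so the positive class is always the minority. That is consistent with the abstract's remark that the OAR model suffered from the imbalanced distribution, whereas OAO compares classes pairwise.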