• Title/Summary/Keyword: 분류적중률 (classification hit rate)

A Hybrid SVM Classifier for Imbalanced Data Sets (불균형 데이터 집합의 분류를 위한 하이브리드 SVM 모델)

  • Lee, Jae Sik; Kwon, Jong Gu
    • Journal of Intelligence and Information Systems / v.19 no.2 / pp.125-140 / 2013
  • A data set in which the records of one class far outnumber those of the other class is called an 'imbalanced data set'. Most classification techniques perform poorly on imbalanced data sets. When evaluating a classification technique, we need to measure not only 'accuracy' but also 'sensitivity' and 'specificity'. In a customer churn prediction problem, 'retention' records form the majority class and 'churn' records form the minority class. Here, sensitivity measures the proportion of actual retentions that are correctly identified as retentions, and specificity measures the proportion of actual churns that are correctly identified as churns. The poor performance of classification techniques on imbalanced data sets is due to low specificity. Much previous research on imbalanced data sets has employed 'oversampling', in which members of the minority class are sampled more often than those of the majority class to produce a relatively balanced data set. A classification model built on such an oversampled, balanced data set can improve specificity, but at the cost of lower sensitivity. In this research, we developed a hybrid of a support vector machine (SVM), an artificial neural network (ANN), and a decision tree that improves specificity while maintaining sensitivity; we call it the 'hybrid SVM model'. The hybrid SVM model is built and used for prediction as follows. A balanced data set is prepared by oversampling the original imbalanced data set. The SVM_I and ANN_I models are constructed from the imbalanced data set, and the SVM_B model is constructed from the balanced data set. The SVM_I model is superior in sensitivity, and the SVM_B model is superior in specificity. For a record on which the SVM_I and SVM_B models make the same prediction, that prediction becomes the final solution.
    If they make different predictions, the final solution is determined by discrimination rules obtained from the ANN and the decision tree. Using the records on which the SVM_I and SVM_B models disagree, a decision tree model is constructed with the ANN_I output value as input and the actual retention or churn as target. This yields two discrimination rules: 'IF ANN_I output value < 0.285, THEN Final Solution = Retention' and 'IF ANN_I output value ≥ 0.285, THEN Final Solution = Churn.' The threshold 0.285 was optimized for the data used in this research. What this research presents is the structure, or framework, of the hybrid SVM model rather than a specific threshold such as 0.285, so the threshold in these rules can be changed to suit the data. To evaluate the hybrid SVM model, we used the 'churn data set' from the UCI Machine Learning Repository, which consists of 85% retention customers and 15% churn customers. The accuracy of the hybrid SVM model is 91.08%, better than that of either the SVM_I or the SVM_B model. More notable are its sensitivity of 95.02% and specificity of 69.24%, compared with a sensitivity of 94.65% for the SVM_I model and a specificity of 67.00% for the SVM_B model. The hybrid SVM model therefore improves on the specificity of the SVM_B model while maintaining the sensitivity of the SVM_I model.
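
The combination step described above can be sketched in Python. This is a minimal illustration, not the authors' implementation: it assumes the per-record predictions of the already-trained SVM_I and SVM_B models and the ANN_I output scores are available as lists, encodes retention as 0 and churn as 1 (an assumption), and uses the data-specific threshold 0.285 only as a default. The metric function follows the abstract's convention that sensitivity is measured on retentions and specificity on churns.

```python
from typing import List, Sequence, Tuple

RETENTION, CHURN = 0, 1  # label encoding assumed for this sketch


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity: share of actual retentions predicted as retention.
    Specificity: share of actual churns predicted as churn."""
    ret_preds = [p for t, p in zip(y_true, y_pred) if t == RETENTION]
    churn_preds = [p for t, p in zip(y_true, y_pred) if t == CHURN]
    sensitivity = sum(p == RETENTION for p in ret_preds) / len(ret_preds)
    specificity = sum(p == CHURN for p in churn_preds) / len(churn_preds)
    return sensitivity, specificity


def hybrid_predict(svm_i: Sequence[int], svm_b: Sequence[int],
                   ann_i_score: Sequence[float], threshold: float = 0.285) -> List[int]:
    """Keep the label when SVM_I and SVM_B agree; otherwise apply the
    decision-tree rule on the ANN_I output (churn if score >= threshold)."""
    final = []
    for a, b, score in zip(svm_i, svm_b, ann_i_score):
        if a == b:
            final.append(a)  # both SVMs agree: accept their prediction
        else:
            final.append(CHURN if score >= threshold else RETENTION)
    return final
```

For example, `hybrid_predict([0, 0, 1, 0], [0, 1, 1, 1], [0.1, 0.4, 0.9, 0.2])` keeps the agreed labels of the first and third records and resolves the other two by the threshold, giving `[0, 1, 1, 0]`.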

A Study on VoiceXML Application of User-Controlled Form Dialog System (사용자 주도 폼 다이얼로그 시스템의 VoiceXML 어플리케이션에 관한 연구)

  • Kwon, Hyeong-Joon; Roh, Yong-Wan; Lee, Hyon-Gu; Hong, Hwang-Seok
    • The KIPS Transactions:PartB / v.14B no.3 s.113 / pp.183-190 / 2007
  • VoiceXML is a new XML-based markup language designed for navigating web resources by voice. VoiceXML applications are classified by dialog structure into mutual-controlled and machine-controlled form dialogs. Because the scenario in such dialog structures is decided by the application developer, they cannot provide services that let users navigate web resources freely. In this paper, we propose a VoiceXML application structure based on a user-controlled form dialog system, which determines the service scenario according to the user's intention. The proposed application automatically detects recognition candidates in the information requested by the user and uses them as voice anchors; the system then connects each voice anchor to a new voice node. As an example of the proposed system, we implemented a news service with an IT term dictionary, confirmed the detection and registration of voice anchors, and measured both the hit rate of successively offering information according to the user's intention and the response speed. The experimental results confirm the possibility of navigating web resources more freely than with existing VoiceXML form dialog systems.
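
The anchor detection and linking steps can be pictured with a small Python sketch. How candidates are actually detected is not specified, so this is only an assumption: it matches whole-word entries of an IT term dictionary against the requested text (case-insensitively) and gives each match a hypothetical voice-node identifier. Generating the VoiceXML grammars and links for these anchors is not shown.

```python
import re
from typing import Dict, List


def detect_voice_anchors(text: str, term_dictionary: List[str]) -> List[str]:
    """Return dictionary terms found in text as whole words, in order of first occurrence."""
    found = []
    for term in term_dictionary:
        match = re.search(r"\b" + re.escape(term) + r"\b", text, flags=re.IGNORECASE)
        if match:
            found.append((match.start(), term))
    return [term for _, term in sorted(found)]


def link_voice_nodes(anchors: List[str], prefix: str = "node") -> Dict[str, str]:
    """Assign each anchor a new (hypothetical) voice-node identifier."""
    return {anchor: f"{prefix}_{i}" for i, anchor in enumerate(anchors)}
```

The `\b` word boundaries keep a short term from matching inside a longer word; terms that begin or end with punctuation (such as "C++") would need a different boundary rule.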

An empirical study on the impact of intellectual property rights on the management performance of companies: focusing on patent rights (지식재산권이 기업의 경영성과에 미치는 영향에 대한 실증연구: 특허권을 중심으로)

  • Yang, Changyong; Hong, Jung-Wan; You, Yen-Yoo
    • Journal of the Korea Convergence Society / v.12 no.6 / pp.173-181 / 2021
  • Previous research analyzed the past management performance and patent rights of large companies; in this study, we instead used empirical analysis to examine the variability in future sales of small and medium-sized enterprises (SMEs) that hold patents. We examined how the quantitative and qualitative value of patent rights affects a company's management performance, using 'the number of patents' as the quantitative value, 'the average score of patents' as the qualitative value, and the average sales growth rate as the measure of management performance. A discriminant analysis in the statistical program SPSS showed that both independent variables were significant in distinguishing companies whose average sales growth was more than twice that of general SMEs from those whose growth was less than twice that rate. The results of this study therefore give stakeholders an analysis framework for anticipating how the sales of patent-holding SMEs will change, for use in guarantee or loan screening.
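
A two-group discriminant analysis with the study's two predictors can be sketched as a Fisher linear discriminant. This pure-Python version illustrates the idea rather than reproducing SPSS output: it assumes equal group priors and places the cutoff midway between the projected group means. Each row is assumed to be (number of patents, average patent score), with group 1 standing for companies whose sales growth exceeds twice the SME average; the data in the example are invented.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _mean(rows: Sequence[Point]) -> List[float]:
    n = len(rows)
    return [sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n]


def fisher_lda_2d(group0: Sequence[Point], group1: Sequence[Point]) -> Tuple[List[float], float]:
    """Return (w, c); a company x is assigned to group 1 when w . x > c."""
    m0, m1 = _mean(group0), _mean(group1)
    # Pooled within-group scatter matrix (2 x 2).
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows, m in ((group0, m0), (group1, m1)):
        for r in rows:
            d = (r[0] - m[0], r[1] - m[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    diff = (m1[0] - m0[0], m1[1] - m0[1])
    # Discriminant direction w = S^-1 (m1 - m0).
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Cutoff at the projection of the midpoint between group means (equal priors).
    c = w[0] * (m0[0] + m1[0]) / 2 + w[1] * (m0[1] + m1[1]) / 2
    return w, c


def classify(w: List[float], c: float, x: Point) -> int:
    return 1 if w[0] * x[0] + w[1] * x[1] > c else 0
```

With the toy groups `[(1, 1), (2, 1), (1, 2), (2, 2)]` and `[(5, 5), (6, 5), (5, 6), (6, 6)]`, the discriminant weights both variables equally (`w = [2.0, 2.0]`, `c = 14.0`), so a firm's score rises with both patent count and patent quality.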

Machine Learning Process for the Prediction of the IT Asset Fault Recovery (IT자산 장애처리의 사전 예측을 위한 기계학습 프로세스)

  • Moon, Young-Joon; Rhew, Sung-Yul; Choi, Il-Woo
    • KIPS Transactions on Software and Data Engineering / v.2 no.4 / pp.281-290 / 2013
  • IT assets are core resources that support an organization's management objectives, so resolving IT asset faults quickly is very important. This study proposes a fault recovery prediction technique that uses existing fault data to address IT asset faults. The technique works as follows. First, the existing fault recovery data are pre-processed and classified by fault recovery type; second, rules are established for mapping keywords between the classified fault recovery types and the reported data; and third, a machine learning process is presented that predicts the fault recovery method based on the established rules. To verify the effectiveness of the proposed machine learning process, 33,000 computer fault records collected from Company A over six months were tested. The hit rate of fault recovery prediction was approximately 72%, and it increased to 81% through continuous machine learning.
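
The three steps above can be caricatured with a small Python sketch. It is hypothetical throughout: reports are split into lowercase words, each word votes for the recovery types it co-occurred with in past resolved faults, and every newly resolved fault is fed back through `learn`, which stands in for the continuous machine learning that raised the hit rate. The class, method, and label names are illustrative, not from the study.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


class KeywordRecoveryPredictor:
    """Hypothetical keyword -> fault-recovery-type rule table that is
    updated each time a resolved fault is fed back."""

    def __init__(self) -> None:
        # keyword -> how often each recovery type followed a report containing it
        self.rules: Dict[str, Counter] = defaultdict(Counter)

    def learn(self, report: str, recovery_type: str) -> None:
        for keyword in set(report.lower().split()):
            self.rules[keyword][recovery_type] += 1

    def predict(self, report: str) -> Optional[str]:
        votes: Counter = Counter()
        for keyword in set(report.lower().split()):
            votes.update(self.rules.get(keyword, Counter()))
        return votes.most_common(1)[0][0] if votes else None


def hit_rate(model: KeywordRecoveryPredictor, cases: List[Tuple[str, str]]) -> float:
    """Share of (report, actual recovery type) cases predicted correctly."""
    hits = sum(model.predict(report) == label for report, label in cases)
    return hits / len(cases)
```

Re-measuring `hit_rate` on held-out tickets after each batch of `learn` calls is one way to mimic, in miniature, tracking the hit rate as learning continues.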

Development for City Bus Driver's Accident Occurrence Prediction Model Based on Digital Tachometer Records (디지털 운행기록에 근거한 시내버스 운전자의 사고발생 예측모형 개발)

  • Kim, Jung-yeul; Kum, Ki-jung
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.15 no.1 / pp.1-15 / 2016
  • This study aims to develop a model that identifies, from their actual driving records, city bus drivers who are likely to cause an accident. For this purpose, significant variables related to traffic accidents are derived from the actual driving records of drivers who have caused an accident and those who have not, classification models are developed using discriminant analysis and logistic regression analysis, and the accuracy of the models is compared. The developed models are then applied to the driving records of other drivers to verify their accuracy. The simultaneous occurrence of deceleration ($X_{deceleration}$) and acceleration to the right ($Y_{right}$) was identified as the optimal variable for classifying drivers who had caused an accident. The prediction model based on discriminant analysis classified drivers who had caused an accident at a rate of up to 62.8%, and the model based on logistic regression analysis did so at a rate of up to 76.7%. In the verification of predictive power, the models showed an accuracy of 84.1%.
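
The two modelling steps in this abstract, deriving a driving-behaviour variable and fitting a logistic regression on it, can be sketched in Python. Everything here is a hypothetical illustration: the acceleration thresholds (in g), the per-driver joint-event-rate feature, and the toy data are assumptions, and the model is fitted by plain batch gradient descent on the log-loss rather than by a statistics package.

```python
import math
from typing import Sequence, Tuple


def joint_event_rate(samples: Sequence[Tuple[float, float]],
                     decel_g: float = -0.2, right_g: float = 0.2) -> float:
    """Share of samples where deceleration (x <= decel_g) and rightward lateral
    acceleration (y >= right_g) occur together. Thresholds are hypothetical."""
    hits = sum(1 for x, y in samples if x <= decel_g and y >= right_g)
    return hits / len(samples)


def predict_proba(b0: float, b1: float, x: float) -> float:
    """Logistic model: P(accident) = 1 / (1 + exp(-(b0 + b1 * x)))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def fit_logistic_1d(xs: Sequence[float], ys: Sequence[int],
                    lr: float = 0.5, epochs: int = 5000) -> Tuple[float, float]:
    """Fit b0, b1 by batch gradient descent on the mean log-loss."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = predict_proba(b0, b1, x) - y
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1
```

Each driver's tachograph samples are reduced to one `joint_event_rate` value, and those values (with accident / no-accident labels) are what `fit_logistic_1d` is trained on.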

Development of Naïve-Bayes classification and multiple linear regression model to predict agricultural reservoir storage rate based on weather forecast data (기상예보자료 기반의 농업용저수지 저수율 전망을 위한 나이브 베이즈 분류 및 다중선형 회귀모형 개발)

  • Kim, Jin Uk; Jung, Chung Gil; Lee, Ji Wan; Kim, Seong Joon
    • Journal of Korea Water Resources Association / v.51 no.10 / pp.839-852 / 2018
  • The purpose of this study is to predict monthly agricultural reservoir storage by developing a weather-data-based Multiple Linear Regression Model (MLRM) using precipitation, maximum temperature, minimum temperature, average temperature, and average wind speed. Using Naïve-Bayes classification, a total of 1,559 reservoirs nationwide were classified into 30 clusters based on their geomorphological specifications (effective storage volume, irrigation area, watershed area, latitude, longitude, and drought frequency). For each cluster, a monthly MLRM was derived using 13 years (2002~2014) of meteorological data from the KMA (Korea Meteorological Administration) and reservoir storage rate data from the KRC (Korea Rural Community Corporation). The MLRM for reservoir storage rate showed a coefficient of determination ($R^2$) of 0.76, a Nash-Sutcliffe efficiency (NSE) of 0.73, and a root mean square error (RMSE) of 8.33%. The MLRM was evaluated for 2 years (2015~2016) using 3-month weather forecast data from the KMA's GloSea5 (GS5). For the Reservoir Drought Index (RDI), expressed in terms of the current-year and normal-year reservoir storage rates, the average ROC (Receiver Operating Characteristic) hit rate of the MLRM was 0.80 with observed data and 0.73 with GS5 data. The results of this study can be used to predict future reservoir storage rates and to support decision-making for a stable future agricultural water supply.
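
The three goodness-of-fit statistics reported above can be computed as follows. This Python sketch assumes $R^2$ is taken as the squared Pearson correlation between observed and simulated storage rates, a common convention in hydrology, although the abstract does not say which definition was used; RMSE is in the units of the data (percent storage rate here).

```python
import math
from typing import Sequence


def nse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 - sum((o - s)^2) / sum((o - mean(o))^2)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def rmse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Root mean square error between observed and simulated values."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def r_squared(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Coefficient of determination as the squared Pearson correlation."""
    n = len(obs)
    mo, ms = sum(obs) / n, sum(sim) / n
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov * cov / (var_o * var_s)
```

A simulation that is perfectly correlated but biased by one unit, such as `obs = [1, 2, 3, 4]` and `sim = [2, 3, 4, 5]`, scores an $R^2$ of 1.0 but an NSE of only 0.2, which is why NSE is usually reported alongside $R^2$.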

GAN System Using Noise for Image Generation (이미지 생성을 위해 노이즈를 이용한 GAN 시스템)

  • Bae, Sangjung; Kim, Mingyu; Jung, Hoekyung
    • Journal of the Korea Institute of Information and Communication Engineering / v.24 no.6 / pp.700-705 / 2020
  • Generative adversarial networks (GANs) generate images by pitting two neural networks against each other. An image is generated by transforming randomly generated noise. Depending on the noise, this method may not generate a good image, and it is difficult to generate a proper image when the image has few pixels. In addition, data for classification are accumulating ever faster and in ever larger volumes, which makes labeling them difficult. To solve these problems, this paper proposes a technique that generates noise from random noise using real data. Because the proposed system generates images based on existing images, we confirmed that it can generate more natural images, and when these images are used for training, it achieves a higher hit rate than the existing method using a generative adversarial network.
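
The abstract does not specify how the noise is derived from real data. One possible reading, shown here purely as a hypothetical Python sketch, is to mix a real image (scaled to [-1, 1]) with Gaussian noise before it is fed to the generator, where `alpha` controls how much of the real image survives. The function name and the mixing scheme are assumptions, and the generator and discriminator themselves are not shown.

```python
import random
from typing import List, Optional, Sequence


def data_informed_noise(real_image: Sequence[Sequence[float]], alpha: float = 0.5,
                        seed: Optional[int] = None) -> List[float]:
    """Hypothetical: blend a flattened real image (8-bit pixels scaled to [-1, 1])
    with N(0, 1) noise; alpha = 1 keeps the image, alpha = 0 gives pure noise."""
    rng = random.Random(seed)
    flat = [p / 127.5 - 1.0 for row in real_image for p in row]
    return [alpha * v + (1.0 - alpha) * rng.gauss(0.0, 1.0) for v in flat]
```

Passing a fixed `seed` makes the generated input reproducible, which is convenient when comparing this input against plain random noise.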

Use of Unmanned Aerial Vehicle for Forecasting Pine Wood Nematode in Boundary Area: A Case Study of Sejong Metropolitan Autonomous City (무인항공기를 이용한 소나무재선충병 선단지 예찰 기법: 세종특별자치시를 중심으로)

  • Kim, Myeong-Jun; Bang, Hong-Seok; Lee, Joon-Woo
    • Journal of Korean Society of Forest Science / v.106 no.1 / pp.100-109 / 2017
  • This study was conducted as a preliminary survey and to support management for Pine Wood Nematode (PWN) suppression. We took aerial photographs of 6 areas totaling 2,284 ha over a two-week period starting on 15 February 2016 and produced 6 high-resolution ortho-images with a ground sample distance (GSD) of 12 cm. From the ortho-images, we initially classified 423 trees as suspected of PWN infection. However, accuracy was low because of the seasonal characteristics of the aerial photography and the variation among forest stands. We therefore narrowed the 423 trees down to 231 based on the initial classification, snapshot photos, and flight information; produced thematic maps; conducted a field survey using GNSS; and detected 23 PWN-infected trees, as confirmed by ground sampling and laboratory analysis. The infected trees consisted of 14 broad-leaf trees, 5 pine trees (including 2 Pinus rigida), and 4 other conifers, showing that PWN infection occurred regardless of tree species. The whole process, from the start of aerial photography with the UAV (Unmanned Aerial Vehicle) to the completion of PWN-infected tree detection over more than 2,200 ha, took 6 days with 2.3 persons, indicating relatively high efficiency.
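
The 12 cm GSD quoted above follows from the standard nadir-image relation between sensor width, focal length, flight altitude, and image width. The Python sketch below applies that relation in both directions; the camera parameters used in the check (13.2 mm sensor width, 8.8 mm focal length, 5,472-pixel image width) are assumed for illustration and are not taken from the study.

```python
def ground_sample_distance_m(sensor_width_mm: float, focal_length_mm: float,
                             altitude_m: float, image_width_px: int) -> float:
    """GSD in metres per pixel = sensor width * altitude / (focal length * image width)."""
    return (sensor_width_mm * altitude_m) / (focal_length_mm * image_width_px)


def altitude_for_gsd_m(target_gsd_m: float, sensor_width_mm: float,
                       focal_length_mm: float, image_width_px: int) -> float:
    """Flight altitude in metres that yields the target GSD for a nadir image."""
    return target_gsd_m * focal_length_mm * image_width_px / sensor_width_mm
```

At a 12 cm GSD each pixel covers 0.0144 m², so the 2,284 ha surveyed correspond to roughly 1.6 billion pixels, which gives a sense of the image volume behind the initial screening.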

Assessment of Slope Failures Potential in Forest Roads using a Logistic Regression Model (로지스틱 회귀분석을 이용한 임도붕괴 위험도 평가)

  • Baek, Seung-An; Cho, Koo-Hyun; Hwang, Jin-Sung; Jung, Do-Hyun; Park, Jin-Woo; Choi, Byoungkoo; Cha, Du-Song
    • Journal of Korean Society of Forest Science / v.105 no.4 / pp.429-434 / 2016
  • Slope failures on forest roads often result in social and economic losses as well as environmental damage. Using GIS and logistic regression analysis, this study assessed the susceptibility of forest roads to slope failure in Hongcheon-gun, Gangwon-do, where many slope failures occurred after heavy rainfall in 2013. Among soil texture types, sandy soil (6.616) had the highest susceptibility to slope failure, while the medium tree-diameter class (-3.282) had the lowest. An error matrix was constructed for both slope-failure and non-slope-failure areas, and the resulting model showed a classification accuracy of 74.6%. Non-slope-failure areas along the forest roads were mostly classified in the range above 0.7, higher than the classification criterion (0.5) used by the logistic regression model. Considering forest environment and site factors related to forest road failures is expected to improve the accuracy of predicting slope-failure susceptibility.
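
The classification step can be sketched in Python: a fitted logistic model turns each site's factors into a failure probability, the 0.5 cutoff mentioned above assigns a class, and an error matrix cross-tabulates the result against observed failures. The coefficients and data are placeholders, not the study's fitted model, and treating a probability exactly at the cutoff as a failure is an assumption.

```python
import math
from typing import Dict, Sequence


def failure_probability(intercept: float, coefs: Sequence[float],
                        features: Sequence[float]) -> float:
    """Logistic model: P(failure) = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = intercept + sum(b * x for b, x in zip(coefs, features))
    return 1.0 / (1.0 + math.exp(-z))


def error_matrix(actual: Sequence[int], probs: Sequence[float],
                 cutoff: float = 0.5) -> Dict[str, int]:
    """Cross-tabulate observed vs predicted classes (1 = slope failure)."""
    m = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for y, p in zip(actual, probs):
        pred = 1 if p >= cutoff else 0
        if y == 1:
            m["TP" if pred == 1 else "FN"] += 1
        else:
            m["FP" if pred == 1 else "TN"] += 1
    return m


def accuracy(m: Dict[str, int]) -> float:
    """Share of correctly classified sites in the error matrix."""
    return (m["TP"] + m["TN"]) / sum(m.values())
```

Raising or lowering `cutoff` trades missed failures (FN) against false alarms (FP), which is relevant given the remark above that many non-failure areas scored above 0.7.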

Application of Particle Swarm Optimization(PSO) for Prediction of Water Quality in Agricultural Reservoirs of Korea (농업용 저수지의 수질 예측 모델을 위한 PSO(Particle Swarm Optimization) 알고리즘의 적용)

  • Kwon, Yong-Su; Bae, Mi-Jung; Hwang, Soon-Jin; Park, Young-Seuk
    • Korean Journal of Ecology and Environment / v.41 no.spc / pp.11-20 / 2008
  • In this study, we applied a Particle Swarm Optimization (PSO) algorithm to predict changes in chlorophyll-a in relation to environmental factors in agricultural reservoirs at the national scale in Korea. Data were obtained from the reservoir water quality monitoring networks operated by the Ministry of Agriculture and Forestry and the Ministry of Environment of Korea. From the monitoring network database, 290 reservoirs were selected, with chlorophyll-a and 13 environmental factors (COD, TN, TP, altitude, bank height, etc.) measured in 2002. Based on Carlson's trophic state index (TSI), the reservoirs were divided into five groups, and most agricultural reservoirs were eutrophic ($TSI_{CHL}$: 64.1%, $TSI_{TP}$: 75.5%). Discriminating the groups by environmental variables showed that COD, DO, and TP were important factors in determining trophic state. MLP-PSO (a multilayer perceptron (MLP) optimized with PSO) was applied to predict chlorophyll-a from the environmental factors and showed high predictability (r = 0.83, p < 0.001). In addition, a sensitivity analysis of the MLP-PSO model showed that COD had the strongest positive effect on chlorophyll-a concentration, followed by TP, TN, and DO, whereas altitude and bank height had negative effects.
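
The PSO step can be illustrated with a generic global-best PSO in Python. This is a textbook variant chosen for illustration (inertia weight `w`, cognitive and social coefficients `c1` and `c2`, positions clamped to the search bounds), not necessarily the configuration the authors used. In an MLP-PSO setting, `f` would be the network's training error as a function of its flattened weight vector; here a simple sphere function stands in for it.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(f: Callable[[Sequence[float]], float], dim: int,
                 bounds: Tuple[float, float], n_particles: int = 20, iters: int = 200,
                 w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
                 seed: int = 0) -> Tuple[List[float], float]:
    """Global-best particle swarm optimization; returns (best position, best value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]              # each particle's best position so far
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm's best position so far
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

Unlike backpropagation, PSO needs only evaluations of `f`, not its gradient, which is what makes it usable for tuning MLP weights directly.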