• Title/Summary/Keyword: 분류 회귀 나무

Search Result 72, Processing Time 0.027 seconds

A Hybrid Under-sampling Approach for Better Bankruptcy Prediction (부도예측 개선을 위한 하이브리드 언더샘플링 접근법)

  • Kim, Taehoon;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.2
    • /
    • pp.173-190
    • /
    • 2015
  • The purpose of this study is to improve bankruptcy prediction models by using a novel hybrid under-sampling approach. Most prior studies have tried to enhance the accuracy of bankruptcy prediction models by improving the classification methods involved. In contrast, we focus on appropriate data preprocessing as a means of enhancing accuracy. In particular, we aim to develop an effective sampling approach for bankruptcy prediction, since most prediction models suffer from class imbalance problems. The approach proposed in this study is a hybrid under-sampling method that combines the k-Reverse Nearest Neighbor (k-RNN) and one-class support vector machine (OCSVM) approaches. k-RNN can effectively eliminate outliers, while OCSVM contributes to the selection of informative training samples from majority class data. To validate our proposed approach, we have applied it to data from H Bank's non-external auditing companies in Korea, and compared the performances of the classifiers with the proposed under-sampling and random sampling data. The empirical results show that the proposed under-sampling approach generally improves the accuracy of classifiers, such as logistic regression, discriminant analysis, decision tree, and support vector machines. They also show that the proposed under-sampling approach reduces the risk of false negative errors, which lead to higher misclassification costs.

An Exploratory Study on the Effect of LCZ Type on Particulate Matter (LCZ 유형이 미세먼지에 미치는 영향에 관한 탐색적 연구)

  • Yeonju Kim;Hansol Mun;Juchul Jung
    • Journal of Environmental Impact Assessment
    • /
    • v.32 no.5
    • /
    • pp.338-352
    • /
    • 2023
  • As of 2019, Korea's fine dust is the most severe among 38 OECD countries, and in the same year, 「the Framework on Disaster and Safety Management」 was revised to define fine dust as a social disaster. Currently, the government is working to achieve its emission reduction goals by preparing a comprehensive fine dust management plan (2022-2023) consisting of a total of five areas, 42 tasks, and 177 detailed tasks. However, it is necessary to come up with measures in consideration of the various spatial characteristics of the city, not just as a source of emission. Therefore, in this study, the shape of the city was classified using the LCZ (Local Climate Zone) classification system into 17 types by building type and land cover type in Busan, and the average annual PM10 and PM2.5 concentration were mapped using the IDW technique. In addition, Fragstats and Moving Window were used to quantify the LCZ classification system. Finally, correlation analysis and regression analysis were conducted to analyze the relationship between the LCZ classification system and PM10 and PM2.5. As a result, it was confirmed that the type of low height of the building and the type of green space with trees had a positive effect on the concentration of PM10 and PM2.5. Therefore, this study is expected to be used as basic data to establish fine dust reduction policies based on efficient spatial planning.

Convergence analysis of determinants affecting on geographic variations in the prevalence of arthritis in Korean women using data mining (데이터마이닝을 이용한 여성 관절염 유병률 소지역 간 변이의 융복합 요인분석)

  • Kim, Yoo-Mi;Kang, Sung-Hong
    • Journal of Digital Convergence
    • /
    • v.13 no.5
    • /
    • pp.277-288
    • /
    • 2015
  • This study aims to identify determinants affecting on geographic variations in the prevalence of arthritis in Korean women using data mining. Data from Korean Community Health Survey 2012 with 249 small districts were analyzed. Socio-demographic, health behavior and status, and morbidity status measures were analyzed using conventional regression model and convergence analysis method such as decision tree for convergence analysis. Rate of workers in agriculture, forestry, and fishing, salaried workers, persons higher than high school graduates, non-treatment of needing care, non-treatment of care because of economic reason, obesity, heavy drunkers, complaining persons of chewing difficulty, persons with experiencing depression, persons with perceiving stress, and persons with diagnosing hypertension and angina pectoris were variation determinants of prevalence of arthritis in 249 small districts and these districts were classified 10 area groups by decision tree model. Our finding suggest that the approach based characteristics by small area groups rather than national wide or individual level would be effective to reduce in variations of prevalence of arthritis.

Group Classification on Management Behavior of Diabetic Mellitus (당뇨 환자의 관리행태에 대한 군집 분류)

  • Kang, Sung-Hong;Choi, Soon-Ho
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.12 no.2
    • /
    • pp.765-774
    • /
    • 2011
  • The purpose of this study is to provide informative statistics which can be used for effective Diabetes Management Programs. We collected and analyzed the data of 666 diabetic people who had participated in Korean National Health and Nutrition Examination Survey in 2007 and 2008. Group classification on management behavior of Diabetic Mellitus is based on the K-means clustering method. The Decision Tree method and Multiple Regression Analysis were used to study factors of the management behavior of Diabetic Mellitus. Diabetic people were largely classified into three categories: Health Behavior Program Group, Focused Management Program Group, and Complication Test Program Group. First, Health Behavior Program Group means that even though drug therapy and complication test are being well performed, people should still need to improve their health behavior such as exercising regularly and avoid drinking and smoking. Second, Focused Management Program Group means that they show an uncooperative attitude about treatment and complication test and also take a passive action to improve their health behavior. Third, Complication Test Program Group means that they take a positive attitude about treatment and improving their health behavior but they pay no attention to complication test to detect acute and chronic disease early. The main factor for group classification was to prove whether they have hyperlipidemia or not. This varied widely with an individual's gender, income, age, occupation, and self rated health. To improve the rate of diabetic management, specialized diabetic management programs should be applied depending on each group's character.

Analysis of Utilization Characteristics, Health Behaviors and Health Management Level of Participants in Private Health Examination in a General Hospital (일개 종합병원의 민간 건강검진 수검자의 검진이용 특성, 건강행태 및 건강관리 수준 분석)

  • Kim, Yoo-Mi;Park, Jong-Ho;Kim, Won-Joong
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.14 no.1
    • /
    • pp.301-311
    • /
    • 2013
  • This study aims to analyze characteristics, health behaviors and health management level related to private health examination recipients in one general hospital. To achieve this, we analyzed 150,501 cases of private health examination data for 11 years from 2001 to 2011 for 20,696 participants in 2011 in a Dae-Jeon general hospital health examination center. The cluster analysis for classify private health examination group is used z-score standardization of K-means clustering method. The logistic regression analysis, decision tree and neural network analysis are used to periodic/non-periodic private health examination classification model. 1,000 people were selected as a customer management business group that has high probability to be non-periodic private health examination patients in new private health examination. According to results of this study, private health examination group was categorized by new, periodic and non-periodic group. New participants in private health examination were more 30~39 years old person than other age groups and more patients suspected of having renal disease. Periodic participants in private health examination were more male participants and more patients suspected of having hyperlipidemia. Non-periodic participants in private health examination were more smoking and sitting person and more patients suspected of having anemia and diabetes mellitus. As a result of decision tree, variables related to non-periodic participants in private health examination were sex, age, residence, exercise, anemia, hyperlipidemia, diabetes mellitus, obesity and liver disease. In particular, 71.4% of non-periodic participants were female, non-anemic, non-exercise, and suspicious obesity person. To operation of customized customer management business for private health examination will contribute to efficiency in health examination center.

Analysis of Feature Importance of Ship's Berthing Velocity Using Classification Algorithms of Machine Learning (머신러닝 분류 알고리즘을 활용한 선박 접안속도 영향요소의 중요도 분석)

  • Lee, Hyeong-Tak;Lee, Sang-Won;Cho, Jang-Won;Cho, Ik-Soon
    • Journal of the Korean Society of Marine Environment & Safety
    • /
    • v.26 no.2
    • /
    • pp.139-148
    • /
    • 2020
  • The most important factor affecting the berthing energy generated when a ship berths is the berthing velocity. Thus, an accident may occur if the berthing velocity is extremely high. Several ship features influence the determination of the berthing velocity. However, previous studies have mostly focused on the size of the vessel. Therefore, the aim of this study is to analyze various features that influence berthing velocity and determine their respective importance. The data used in the analysis was based on the berthing velocity of a ship on a jetty in Korea. Using the collected data, machine learning classification algorithms were compared and analyzed, such as decision tree, random forest, logistic regression, and perceptron. As an algorithm evaluation method, indexes according to the confusion matrix were used. Consequently, perceptron demonstrated the best performance, and the feature importance was in the following order: DWT, jetty number, and state. Hence, when berthing a ship, the berthing velocity should be determined in consideration of various features, such as the size of the ship, position of the jetty, and loading condition of the cargo.

Exploring Feature Selection Methods for Effective Emotion Mining (효과적 이모션마이닝을 위한 속성선택 방법에 관한 연구)

  • Eo, Kyun Sun;Lee, Kun Chang
    • Journal of Digital Convergence
    • /
    • v.17 no.3
    • /
    • pp.107-117
    • /
    • 2019
  • In the era of SNS, many people relies on it to express their emotions about various kinds of products and services. Therefore, for the companies eagerly seeking to investigate how their products and services are perceived in the market, emotion mining tasks using dataset from SNSs become important much more than ever. Basically, emotion mining is a branch of sentiment analysis which is based on BOW (bag-of-words) and TF-IDF. However, there are few studies on the emotion mining which adopt feature selection (FS) methods to look for optimal set of features ensuring better results. In this sense, this study aims to propose FS methods to conduct emotion mining tasks more effectively with better outcomes. This study uses Twitter and SemEval2007 dataset for the sake of emotion mining experiments. We applied three FS methods such as CFS (Correlation based FS), IG (Information Gain), and ReliefF. Emotion mining results were obtained from applying the selected features to nine classifiers. When applying DT (decision tree) to Tweet dataset, accuracy increases with CFS, IG, and ReliefF methods. When applying LR (logistic regression) to SemEval2007 dataset, accuracy increases with ReliefF method.

Context-adaptive Smoothing for Speech Synthesis (음성 합성기를 위한 문맥 적응 스무딩 필터의 구현)

  • 이기승;김정수;이재원
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.3
    • /
    • pp.285-292
    • /
    • 2002
  • One of the problems that should be solved in Text-To-Speech (TTS) is discontinuities at unit-joining points. To cope with this problem, a smoothing method using a low-pass filter is employed in this paper, In the proposed soothing method, a filter coefficient that controls the amount of smoothing is determined according to contort information to be synthesized. This method efficiently reduces both discontinuities at unit-joining points and artifacts caused by undesired smoothing. The amount of smoothing is determined with discontinuities around unit-joins points in the current synthesized speech and discontinuities predicted from context. The discontinuity predictor is implemented by CART that has context feature variables. To evaluate the performance of the proposed method, a corpus-based concatenative TTS was used as a baseline system. More than 6075 of listeners realized that the quality of the synthesized speech through the proposed smoothing is superior to that of non-smoothing synthesized speech in both naturalness and intelligibility.

Context-adaptive Phoneme Segmentation for a TTS Database (문자-음성 합성기의 데이터 베이스를 위한 문맥 적응 음소 분할)

  • 이기승;김정수
    • The Journal of the Acoustical Society of Korea
    • /
    • v.22 no.2
    • /
    • pp.135-144
    • /
    • 2003
  • A method for the automatic segmentation of speech signals is described. The method is dedicated to the construction of a large database for a Text-To-Speech (TTS) synthesis system. The main issue of the work involves the refinement of an initial estimation of phone boundaries which are provided by an alignment, based on a Hidden Market Model(HMM). Multi-layer perceptron (MLP) was used as a phone boundary detector. To increase the performance of segmentation, a technique which individually trains an MLP according to phonetic transition is proposed. The optimum partitioning of the entire phonetic transition space is constructed from the standpoint of minimizing the overall deviation from hand labelling positions. With single speaker stimuli, the experimental results showed that more than 95% of all phone boundaries have a boundary deviation from the reference position smaller than 20 ms, and the refinement of the boundaries reduces the root mean square error by about 25%.

Ensemble Machine Learning Model Based YouTube Spam Comment Detection (앙상블 머신러닝 모델 기반 유튜브 스팸 댓글 탐지)

  • Jeong, Min Chul;Lee, Jihyeon;Oh, Hayoung
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.24 no.5
    • /
    • pp.576-583
    • /
    • 2020
  • This paper proposes a technique to determine the spam comments on YouTube, which have recently seen tremendous growth. On YouTube, the spammers appeared to promote their channels or videos in popular videos or leave comments unrelated to the video, as it is possible to monetize through advertising. YouTube is running and operating its own spam blocking system, but still has failed to block them properly and efficiently. Therefore, we examined related studies on YouTube spam comment screening and conducted classification experiments with six different machine learning techniques (Decision tree, Logistic regression, Bernoulli Naive Bayes, Random Forest, Support vector machine with linear kernel, Support vector machine with Gaussian kernel) and ensemble model combining these techniques in the comment data from popular music videos - Psy, Katy Perry, LMFAO, Eminem and Shakira.