• Title/Summary/Keyword: Function Classification System


Measuring Quality of Life in Cerebral Palsy Children According to the Severity Using the Visual Analogue Scale, Time Trade-Off, and EQ-5D-Y Proxy (뇌성마비 환아 중증도별 시각화척도, 시간교환법, EQ-5D-Y Proxy를 이용한 삶의 질 측정)

  • Lee, Go-Eun;Kim, Nam Kwen;Yun, Young Ju;Wang, Hye Min;Kim, Jeong Hun;Lee, Dong Hyo
    • Journal of Oriental Neuropsychiatry
    • /
    • v.28 no.2
    • /
    • pp.49-59
    • /
    • 2017
  • Objectives: To measure the quality of life of patients according to hypothetical cerebral palsy severity levels by using the Korean version of the EQ-5D-Y proxy, Visual Analogue Scale (VAS), and Time Trade-Off method (TTO). Methods: The study was conducted among parents of children and adolescents aged 4 to 15 years in Seoul. We analyzed differences in the utility values across the five levels of cerebral palsy severity in the Gross Motor Function Classification System (GMFCS), as well as test-retest reliability. Results: 1. There were significant differences in VAS, TTO, and EQ-5D-Y proxy values according to cerebral palsy severity (p<.001). 2. VAS differed significantly according to whether the respondent had visited a medical institution, whether the respondent had a disease, whether the child had visited a medical institution, the child's age, and the child's sex. TTO values differed significantly according to whether the respondent had visited a medical institution, the respondent's sex, and the child's age. The EQ-5D-Y proxy differed significantly according to the child's age. 3. Intraclass correlation coefficient values were above 0.6 for both VAS and TTO at all severity levels, whereas for the EQ-5D-Y proxy they were below 0.6 at all levels. Conclusions: The quality of life assessment using the EQ-5D-Y proxy differed significantly across cerebral palsy severity levels. However, because of its low test-retest reliability, large-scale studies using the EQ-5D-Y proxy are needed.
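As an aside on the reliability criterion above: test-retest reliability by intraclass correlation coefficient (ICC) can be computed as in the hedged sketch below. The column names, toy data, and choice of ICC variant are assumptions; the abstract does not state the software or ICC type the authors used.

```python
# Illustrative test-retest reliability check: ICC across two administrations
# of the same instrument. Data, column names, and ICC variant are assumed,
# not taken from the paper.
import pandas as pd
import pingouin as pg

# Long format: one row per (respondent, administration) pair.
df = pd.DataFrame({
    "respondent": [1, 1, 2, 2, 3, 3],
    "time":       ["t1", "t2"] * 3,
    "vas":        [0.70, 0.72, 0.55, 0.60, 0.81, 0.79],
})

icc = pg.intraclass_corr(data=df, targets="respondent",
                         raters="time", ratings="vas")
print(icc[["Type", "ICC"]])   # values >= 0.6 were read as acceptable here
```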

Deep learning based crack detection from tunnel cement concrete lining (딥러닝 기반 터널 콘크리트 라이닝 균열 탐지)

  • Bae, Soohyeon;Ham, Sangwoo;Lee, Impyeong;Lee, Gyu-Phil;Kim, Donggyou
    • Journal of Korean Tunnelling and Underground Space Association
    • /
    • v.24 no.6
    • /
    • pp.583-598
    • /
    • 2022
  • Human-based tunnel inspections depend on the subjective judgment of the inspector, which makes continuous history management difficult. Much deep learning-based automatic crack detection research has appeared recently. However, the large public crack datasets used in most studies differ significantly from crack images found in tunnels, and building sophisticated crack labels under the current tunnel evaluation process requires additional work. Therefore, we present a method to improve crack detection performance by combining existing datasets as inputs to a deep learning model. We evaluate and compare the performance of deep learning models trained on combinations of existing tunnel datasets, high-quality tunnel datasets, and public crack datasets. As a result, DeepLabv3+ with a cross-entropy loss function performed best when trained on the public datasets together with patchwise-classified and oversampled tunnel datasets. In the future, we expect this work to contribute to establishing a plan for efficiently utilizing data from the tunnel image acquisition system for deep learning model training.
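To make the best-performing setup concrete, here is a minimal training sketch. Note that torchvision does not ship DeepLabv3+, so its DeepLabv3 implementation is substituted; the dataset paths, batch shapes, and learning rate are illustrative, not the paper's configuration.

```python
# Minimal sketch: a DeepLab segmentation model trained with cross-entropy
# for binary (background/crack) segmentation. Hyperparameters are assumed.
import torch
import torch.nn as nn
from torchvision.models.segmentation import deeplabv3_resnet50

NUM_CLASSES = 2  # background / crack

model = deeplabv3_resnet50(weights=None, num_classes=NUM_CLASSES)
criterion = nn.CrossEntropyLoss()  # the loss the abstract reports working best
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(images, masks):
    """images: (B, 3, H, W) float tensor; masks: (B, H, W) long tensor of class ids."""
    model.train()
    optimizer.zero_grad()
    logits = model(images)["out"]      # (B, NUM_CLASSES, H, W)
    loss = criterion(logits, masks)
    loss.backward()
    optimizer.step()
    return loss.item()
```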

Design and Implementation of OpenCV-based Inventory Management System to build Small and Medium Enterprise Smart Factory (중소기업 스마트공장 구축을 위한 OpenCV 기반 재고관리 시스템의 설계 및 구현)

  • Jang, Su-Hwan;Jeong, Jopil
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.19 no.1
    • /
    • pp.161-170
    • /
    • 2019
  • Small and medium-sized factories that mass-produce many product types carry a wide variety of items in large quantities, wasting manpower and expense on inventory management. In addition, they have no way to check inventory status in real time, and they suffer economic losses from excess inventory and stock shortages. There are many ways to build a real-time data collection environment, but most are unaffordable for small and medium-sized companies. Smart factories for small and medium enterprises therefore face a difficult reality, and appropriate countermeasures are hard to find. In this paper, we implemented and evaluated an extension of the existing inventory management method: character extraction from labels carrying barcodes and QR codes, which are widely adopted in current product management. Technically, the system preprocesses images with OpenCV for automatic recognition and classification of stock labels, applies the OCR (Optical Character Recognition) function of the Google Vision API to the label text, and recognizes barcodes through ZBar. We propose a method to manage inventory by real-time image recognition through a Raspberry Pi without using expensive equipment.
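A minimal sketch of the recognition path described above, assuming the pyzbar binding for ZBar; the Google Vision OCR call for label text is omitted, and the preprocessing parameters are illustrative rather than the paper's settings.

```python
# Sketch of the label-reading path: OpenCV preprocessing followed by ZBar
# barcode/QR decoding via pyzbar. Threshold parameters are illustrative.
import cv2
from pyzbar.pyzbar import decode

def read_barcodes(path):
    image = cv2.imread(path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Light denoising and adaptive thresholding help ZBar under uneven lighting.
    gray = cv2.GaussianBlur(gray, (3, 3), 0)
    binary = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY, 31, 10)
    results = []
    for symbol in decode(binary):          # handles both barcodes and QR codes
        results.append((symbol.type, symbol.data.decode("utf-8")))
    return results

# e.g. read_barcodes("label.jpg") -> [("QRCODE", "SKU-12345"), ...]
```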

The Molecular Insight into the Vascular Endothelial Growth Factor in Cancer: Angiogenesis and Metastasis (암의 혈관내피 성장인자에 대한 분자적 통찰: 혈관신생과 전이)

  • Han Na Lee;Chae Eun Seo;Mi Suk Jeong;Se Bok Jang
    • Journal of Life Science
    • /
    • v.34 no.2
    • /
    • pp.128-137
    • /
    • 2024
  • This review discusses the pivotal role of vascular endothelial growth factors (VEGFs) in angiogenesis and lymphangiogenesis, vital processes influencing vascular permeability, endothelial cell recruitment, and the maintenance of tumor-associated blood and lymphatic vessels. VEGF exerts its effects through the tyrosine-kinase receptors VEGFR-1, VEGFR-2, and VEGFR-3. This VEGF-VEGFR system is central not only to cancer but also to diseases arising from abnormal blood vessel and lymphatic vessel formation. In the context of cancer, VEGF and its receptors are essential for the development of tumor-associated vessels, making them attractive targets for therapeutic intervention. Various approaches, such as anti-VEGF antibodies, receptor antagonists, and VEGF receptor function inhibitors, are being explored to interfere with tumor growth. However, the clinical efficacy of anti-angiogenic agents remains uncertain and necessitates further refinement. The article also highlights the physiological roles of VEGFs, emphasizing their involvement in endothelial cell function, survival, and vascular permeability. The identification of five distinct VEGFs in humans (VEGF-A, VEGF-B, VEGF-C, VEGF-D, and PLGF) is discussed, along with the classification of VEGFRs as typical receptor tyrosine kinases with distinct signaling systems. The family includes VEGFR-1 and VEGFR-2, crucial in tumor biology and angiogenesis, and VEGFR-3, specifically involved in lymphangiogenesis. Overall, this review provides a comprehensive overview of VEGF and VEGFR, detailing their roles in various diseases, including cancer, and is expected to further facilitate their utilization as therapeutic targets.

Optimal Selection of Classifier Ensemble Using Genetic Algorithms (유전자 알고리즘을 이용한 분류자 앙상블의 최적 선택)

  • Kim, Myung-Jong
    • Journal of Intelligence and Information Systems
    • /
    • v.16 no.4
    • /
    • pp.99-112
    • /
    • 2010
  • Ensemble learning is a method for improving the performance of classification and prediction algorithms. It finds a highly accurate classifier on the training set by constructing and combining an ensemble of weak classifiers, each of which needs only to be moderately accurate on the training set. Ensemble learning has received considerable attention in machine learning and artificial intelligence because of its remarkable performance improvements and flexible integration with traditional learning algorithms such as decision trees (DT), neural networks (NN), and SVM. In that research, DT ensemble studies have consistently demonstrated impressive improvements in the generalization behavior of DT, while NN and SVM ensemble studies have not shown performance as remarkable as DT ensembles. Recently, several works have reported that ensemble performance can degrade when the classifiers of an ensemble are highly correlated with one another, producing a multicollinearity problem that leads to performance degradation; they have also proposed differentiated learning strategies to cope with this degradation. Hansen and Salamon (1990) insisted that it is necessary and sufficient for the performance enhancement of an ensemble that the ensemble contain diverse classifiers. Breiman (1996) showed that ensemble learning can increase the performance of unstable learning algorithms but does not yield remarkable improvement on stable learning algorithms. Unstable learning algorithms such as decision tree learners are sensitive to changes in the training data, so small changes in the training data can yield large changes in the generated classifiers; ensembles of unstable learners can therefore guarantee some diversity among the classifiers. By contrast, stable learning algorithms such as NN and SVM generate similar classifiers despite small changes in the training data, so the correlation among the resulting classifiers is very high. This high correlation produces the multicollinearity problem, which degrades ensemble performance. Kim's work (2009) compared the performance of traditional prediction algorithms such as NN, DT, and SVM for bankruptcy prediction on Korean firms. It reports that the stable learning algorithms NN and SVM have higher predictability than the unstable DT, while, with respect to ensemble learning, the DT ensemble shows more improved performance than the NN and SVM ensembles. Further analysis with the variance inflation factor (VIF) empirically demonstrates that the performance degradation of the ensemble is due to the multicollinearity problem, and it proposes that ensemble optimization is needed to cope with it. This paper proposes a hybrid system for coverage optimization of NN ensembles (CO-NN) in order to improve the performance of the NN ensemble. Coverage optimization is a technique of choosing a sub-ensemble from an original ensemble to guarantee the diversity of classifiers during the coverage optimization process. CO-NN uses a GA, which has been widely applied to various optimization problems, to deal with the coverage optimization problem. The GA chromosomes for coverage optimization are encoded as binary strings, each bit of which indicates an individual classifier. The fitness function is defined as maximization of error reduction, and a constraint on the variance inflation factor (VIF), one of the generally used measures of multicollinearity, is added to ensure classifier diversity by removing high correlation among the classifiers. We use Microsoft Excel and the GA software package Evolver. Experiments on company failure prediction have shown that CO-NN stably enhances the performance of NN ensembles by choosing classifiers with the ensemble's correlations in mind. Classifiers with potential multicollinearity problems are removed by the coverage optimization process, and CO-NN thereby showed higher performance than a single NN classifier and the NN ensemble at the 1% significance level, and than the DT ensemble at the 5% significance level. However, further research issues remain. First, a decision optimization process to find the optimal combination function should be considered. Second, various learning strategies for dealing with data noise should be introduced in future research.
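To make the coverage-optimization mechanism concrete, here is a minimal GA sketch under stated assumptions: the member classifiers are already trained, `preds` holds their 0/1 predictions on a validation set, and a generic VIF cutoff of 10 stands in for the paper's unstated constraint. The paper itself used Microsoft Excel with Evolver, so this is an illustration of the mechanism, not a reproduction.

```python
# Sketch of CO-NN-style coverage optimization: a binary-string GA chooses a
# sub-ensemble of trained classifiers. Fitness is majority-vote accuracy on
# a validation set; subsets whose members are too collinear (max VIF above
# an assumed cutoff of 10) are rejected to enforce diversity.
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

def max_vif(pred_matrix):
    """pred_matrix: (n_samples, n_members) array of member predictions."""
    return max(variance_inflation_factor(pred_matrix, i)
               for i in range(pred_matrix.shape[1]))

def fitness(mask, preds, y, vif_limit=10.0):
    chosen = np.flatnonzero(mask)
    if len(chosen) < 2:
        return 0.0                                   # need at least two members
    sub = preds[:, chosen]                           # 0/1 predictions of members
    if max_vif(sub) > vif_limit:
        return 0.0                                   # diversity constraint
    vote = (sub.mean(axis=1) >= 0.5).astype(int)     # majority vote
    return float((vote == y).mean())

def ga_select(preds, y, pop=30, gens=50, p_mut=0.05, seed=0):
    rng = np.random.default_rng(seed)
    n = preds.shape[1]
    population = rng.integers(0, 2, size=(pop, n))   # each bit = one classifier
    for _ in range(gens):
        scores = np.array([fitness(ind, preds, y) for ind in population])
        parents = population[np.argsort(scores)[-(pop // 2):]]  # keep best half
        children = []
        while len(children) < pop:
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n)                     # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child[rng.random(n) < p_mut] ^= 1            # bit-flip mutation
            children.append(child)
        population = np.array(children)
    best = max(population, key=lambda ind: fitness(ind, preds, y))
    return np.flatnonzero(best)          # indices of classifiers to keep
```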

A Study on the Application of IUCN Global Ecosystem Typology Using Land Cover Map in Korea (토지피복지도를 활용한 IUCN 생태계유형분류 국내 적용)

  • Hee-Jung Sohn;Su-Yeon Won;Jeong-Eun Jeon;Eun-Hee Park;Do-Hee Kim;Sang-Hak Han;Young-Keun Song
    • Korean Journal of Environment and Ecology
    • /
    • v.37 no.3
    • /
    • pp.209-220
    • /
    • 2023
  • Over the past few centuries, widespread changes to natural ecosystems caused by human activities have severely threatened biodiversity worldwide. Understanding changes in ecosystems is essential to identifying and managing threats to biodiversity. In line with this need, the IUCN Council established the IUCN Global Ecosystem Typology (GET) in 2019, taking into account the functions and types of ecosystems. The IUCN provides maps of 10 ecosystem groups and 108 ecological functional groups (EFGs) on a global scale. According to the IUCN GET type classification, Korea's ecosystems fall into 8 realms (level 1), 18 biomes (level 2), and 41 groups (level 3). The GET maps provided by the IUCN have low resolution and often do not match actual land status because they were produced at a global scale. This study aimed to increase the accuracy of the Korean IUCN GET type classification by using land cover maps to produce maps that reflect actual conditions. To this end, we ① reviewed the Korean GET data system provided by IUCN GET, ② compared and analyzed it against current conditions in Korea, evaluating the limitations and usability of the GET in the process, and then ③ reclassified Korea's GET types to reflect current conditions, using national data as much as possible. This study classified Korean GETs into 25 types by using land cover maps and existing national data (Terrestrial: 9, Freshwater: 9, Marine-terrestrial: 5, Terrestrial-freshwater: 1, and Marine-freshwater-terrestrial: 1). Compared to the existing map, "F3.2 Constructed lacustrine wetlands", "F3.3 Rice paddies", "F3.4 Freshwater aquafarms", and "T7.3 Plantations" showed the largest area reductions in the modified Korean GET. "T2.2 Temperate forests" showed the largest area increase, and the "MFT1.3 Coastal saltmarshes and reedbeds" and "F2.2 Small permanent freshwater lakes" types also increased in area after modification. Through this process, the existing map, in which the sum of all EFGs accounted for 8.33 times the national area, was modified using the land cover map so that the total became 1.22 times the national area. This study confirmed that the existing EFGs, which showed small differences between types and low accuracy, were improved and corrected. The study is significant in that it produced a GET map of Korea that meets the GET standard using data reflecting field conditions.
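The core operation the study performs, reassigning land cover classes to GET ecological functional groups, can be illustrated with a raster lookup. The land cover codes and GET labels in this sketch are hypothetical placeholders, not the study's actual crosswalk.

```python
# Illustrative reclassification step: map national land cover codes to IUCN
# GET ecological functional groups via a lookup table. All code values and
# GET labels below are placeholders for the study's real crosswalk.
import numpy as np
import rasterio

CROSSWALK = {   # land cover code -> GET EFG code (hypothetical values)
    110: "T7.4",   # built-up          -> urban and industrial ecosystems
    210: "F3.3",   # rice paddy        -> rice paddies
    310: "T2.2",   # broadleaf forest  -> temperate forests
    620: "F2.2",   # inland water      -> small permanent freshwater lakes
}

def reclassify(src_path, dst_path, nodata=0):
    with rasterio.open(src_path) as src:
        codes = src.read(1)
        profile = src.profile
    # Assign a small integer raster value to each distinct EFG label.
    get_ids = {efg: i + 1 for i, efg in enumerate(sorted(set(CROSSWALK.values())))}
    out = np.full(codes.shape, nodata, dtype="uint8")
    for lc_code, efg in CROSSWALK.items():
        out[codes == lc_code] = get_ids[efg]
    profile.update(dtype="uint8", count=1, nodata=nodata)
    with rasterio.open(dst_path, "w", **profile) as dst:
        dst.write(out, 1)
    return get_ids   # legend: GET EFG label -> raster value
```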

A Time Series Graph based Convolutional Neural Network Model for Effective Input Variable Pattern Learning : Application to the Prediction of Stock Market (효과적인 입력변수 패턴 학습을 위한 시계열 그래프 기반 합성곱 신경망 모형: 주식시장 예측에의 응용)

  • Lee, Mo-Se;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.1
    • /
    • pp.167-181
    • /
    • 2018
  • Over the past decade, deep learning has been in the spotlight among machine learning algorithms. In particular, CNNs (Convolutional Neural Networks), known as an effective solution for recognizing and classifying images and voices, have been popularly applied to classification and prediction problems. In this study, we investigate how to apply CNNs to business problem solving. Specifically, this study proposes to apply a CNN to stock market prediction, one of the most challenging tasks in machine learning research. As mentioned, CNNs are strong at interpreting images, so the model proposed in this study adopts a CNN as a binary classifier that predicts stock market direction (upward or downward) using time series graphs as its inputs. That is, our proposal is to build a machine learning algorithm that mimics the experts called 'technical analysts', who examine graphs of past price movements to predict future price movements. Our proposed model, named CNN-FG (Convolutional Neural Network using Fluctuation Graph), consists of five steps. In the first step, it divides the dataset into intervals of 5 days. It then creates time series graphs for the divided dataset in step 2. Each graph is drawn as a 40×40-pixel image, with the graph of each independent variable drawn in a different color. In step 3, the model converts the images into matrices: each image becomes a combination of three matrices expressing the color value on the R (red), G (green), and B (blue) scales. In the next step, it splits the dataset of graph images into training and validation datasets; we used 80% of the total dataset for training and the remaining 20% for validation. Finally, CNN classifiers are trained on the images of the training dataset. Regarding the parameters of CNN-FG, we adopted two convolution filter banks (5×5×6 and 5×5×9) in the convolution layers. In the pooling layer, a 2×2 max pooling filter was used. The numbers of nodes in the two hidden layers were set to 900 and 32, respectively, and the number of nodes in the output layer was set to 2 (one for the prediction of an upward trend, the other for a downward trend). The activation functions for the convolution layers and the hidden layers were set to ReLU (Rectified Linear Unit), and the output layer used the softmax function. To validate CNN-FG, we applied it to the prediction of KOSPI200 over 2,026 trading days in eight years (2009 to 2016). To match the proportions of the two groups of the dependent variable (i.e., tomorrow's stock market movement), we selected 1,950 samples by random sampling. Finally, we built the training dataset from 80% of the total dataset (1,560 samples) and the validation dataset from the remaining 20% (390 samples). The independent variables of the experimental dataset included twelve technical indicators popularly used in previous studies, including Stochastic %K, Stochastic %D, Momentum, ROC (rate of change), LW %R (Larry Williams' %R), A/D oscillator (accumulation/distribution oscillator), OSCP (price oscillator), and CCI (commodity channel index). To confirm the superiority of CNN-FG, we compared its prediction accuracy with that of other classification models. Experimental results showed that CNN-FG outperforms LOGIT (logistic regression), ANN (artificial neural network), and SVM (support vector machine) with statistical significance. These empirical results imply that converting time series business data into graphs and building CNN-based classification models on those graphs can be effective from the perspective of prediction accuracy. Thus, this paper sheds light on how to apply deep learning techniques to the domain of business problem solving.
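Reading off the hyperparameters above, a PyTorch reconstruction of CNN-FG might look like the following sketch. The abstract does not say where the 2×2 pooling sits, so placing one pooling stage after each convolution is an assumption.

```python
# Sketch of the CNN-FG classifier as described: 40x40 RGB graph images, two
# convolution stages (6 and 9 filters of size 5x5), 2x2 max pooling, hidden
# layers of 900 and 32 nodes, and a 2-way output. Pooling placement assumed.
import torch
import torch.nn as nn

class CNNFG(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 6, kernel_size=5), nn.ReLU(),   # 40x40 -> 36x36
            nn.MaxPool2d(2),                             # -> 18x18
            nn.Conv2d(6, 9, kernel_size=5), nn.ReLU(),   # -> 14x14
            nn.MaxPool2d(2),                             # -> 7x7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(9 * 7 * 7, 900), nn.ReLU(),        # hidden layer 1
            nn.Linear(900, 32), nn.ReLU(),               # hidden layer 2
            nn.Linear(32, 2),   # softmax is applied by CrossEntropyLoss in training
        )

    def forward(self, x):        # x: (B, 3, 40, 40)
        return self.classifier(self.features(x))
```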

Stock-Index Invest Model Using News Big Data Opinion Mining (뉴스와 주가 : 빅데이터 감성분석을 통한 지능형 투자의사결정모형)

  • Kim, Yoo-Sin;Kim, Nam-Gyu;Jeong, Seung-Ryul
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.2
    • /
    • pp.143-156
    • /
    • 2012
  • People readily believe that news and stock indices are closely related: they think that securing news before anyone else can help them forecast stock prices and enjoy great profit, or capture an investment opportunity. However, it is no easy feat to determine to what extent the two are related, to make investment decisions based on news, or to verify that such investment information is valid. If the significance of news and its impact on the stock market can be analyzed, it becomes possible to extract information that assists investment decisions. The reality, however, is that the world is inundated with a massive wave of news in real time, and news is not patterned text. This study suggests a stock-index investment model based on 'news big data' opinion mining that systematically collects, categorizes, and analyzes news and creates investment information. To verify the validity of the model, the relationship between the results of news opinion mining and the stock index was empirically analyzed using statistics. The mining steps that convert news into information for investment decision making are as follows. First, news is indexed after being supplied by a provider that collects it in real time; not only the contents but also information such as the medium, time, and news type are collected, classified, and reworked into variables from which investment decisions can be inferred. Next, words whose polarity can be judged are derived by separating the news text into morphemes, and each word is tagged with positive/negative polarity by comparison against a sentiment dictionary. Third, the positive/negative polarity of each news item is judged using the indexed classification information and a scoring rule, and final investment decision information is derived according to daily scoring criteria. For this study, the KOSPI index and its fluctuation range were collected for the 63 days the stock market was open during the 3 months from July to September 2011 on the Korea Exchange, and news data were collected by parsing 766 articles from economic news medium M carried in the stock information > news > main news section of the portal site Naver.com. During those 3 months the price index rose on 33 days and fell on 30 days, and the news comprised 197 articles published before the market opened, 385 during the session, and 184 after the market closed. Mining the collected news and comparing it with stock prices showed that the positive/negative opinion of news content had a significant relation with the stock price, and that changes in the price index could be better explained when news opinion was derived as a positive/negative ratio rather than judged as a simple positive or negative opinion. To check whether news affected stock price fluctuations, or at least preceded them, a comparison of price changes with only the news published before the market opened was also verified to be statistically significant. In addition, because the news included various types of information, such as social, economic, and overseas news, corporate earnings, industry conditions, market outlook, and market conditions, the influence on the stock market was expected to differ by news type; comparing each type with stock price fluctuations showed that market conditions, outlook, and overseas news were the most useful for explaining fluctuations. By contrast, news about individual companies was not statistically significant, and its opinion mining value tended to move opposite to the stock price; the reason is thought to be promotional and planned news released to keep stock prices from falling. Finally, multiple regression analysis and logistic regression analysis were carried out to derive an investment decision function from the relation between news polarity and the stock price. The regression equation using the market conditions, outlook, and overseas news variables from before the market opening was statistically significant, and the classification accuracy of the logistic regression was 70.0% for rises in the stock price, 78.8% for falls, and 74.6% on average. This study first analyzed the relation between news and stock prices by quantifying the sentiment of atypical news content using opinion mining, one of the big data analysis techniques, and furthermore proposed and verified a smart investment decision-making model that systematically carries out opinion mining and derives and supports investment information. This shows that news can be used as a variable to predict the price index of stocks for investment, and the model is expected to serve as a real investment support system if it is implemented and verified in the future.
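The morpheme-and-dictionary step is the core of the scoring pipeline. Below is a minimal, hypothetical sketch of it in Python; the Okt tokenizer from KoNLPy and the toy four-word lexicon are stand-ins, since the abstract does not name the paper's morpheme analyzer or sentiment dictionary.

```python
# Illustrative dictionary-based scoring: tokenize each article into
# morphemes, tag polarity against a sentiment lexicon, and aggregate a
# daily positive ratio. The lexicon and tokenizer choice are assumptions.
from collections import defaultdict
from konlpy.tag import Okt

LEXICON = {"상승": +1, "호재": +1, "급락": -1, "악재": -1}  # toy sentiment dictionary

okt = Okt()

def daily_polarity(articles):
    """articles: iterable of (date, text); returns date -> positive ratio."""
    pos = defaultdict(int)
    neg = defaultdict(int)
    for date, text in articles:
        for morpheme in okt.morphs(text):
            score = LEXICON.get(morpheme, 0)
            if score > 0:
                pos[date] += 1
            elif score < 0:
                neg[date] += 1
    # Positive/negative ratio per day, as the abstract suggests, rather than
    # a single binary positive-or-negative label per article.
    return {d: pos[d] / (pos[d] + neg[d])
            for d in set(pos) | set(neg) if pos[d] + neg[d] > 0}
```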

A Study on Status of Utilization and The Related Factors of Primary Medical Care in a Rural Area (일부 농촌지역의 일차의료이용실태와 그 관련요인에 관한 연구)

  • Wie, Cha-Hyung
    • Journal of agricultural medicine and community health
    • /
    • v.20 no.2
    • /
    • pp.157-168
    • /
    • 1995
  • This study was carried out by analyzing the annual reports (1973-1993) on the health status of Su Dong-Myun and specific survey data from 332 households (Su Dong-Myun 209, Byul Nae-Myun 123) located in Nam Yang Ju-Si, Kyung Gi-Do, collected from July 20 to July 31, 1995, to find more effective means of primary medical care in a rural area. The results were as follows: 1. The population of Su Dong-Myun was 5,419 in 1973, 4,591 (the lowest) in 1987, and 5,707 in 1995. In the population composition, the "0-14" age group showed a markedly decreasing tendency, from 43.1% in 1975 to 19.1% in 1995, while the "65 and over" group showed a markedly increasing tendency, from 5.3% in 1975 to 9.8% in 1995. 2. The annual utilization rate per 1,000 inhabitants in Su Dong-Myun increased markedly from 1973 to 1977: 343 in 1973, 540 in 1975, and 900 in 1977. Since 1979, however, the rate decreased rapidly: 846 in 1979, 519 in 1985, and 190 in 1991 and 1993. 3. The morbid household rate per year was 53.6% of respondents, and the rate per 15 days was 48.2%. In the disease classification of morbid households per year, arthralgia & neuralgia showed the highest rate (33.9%), followed by gastro-intestinal disorders (19.3%), cough (11.9%), hypertension (7.8%), and accidents (3.2%). 4. Among facilities used for primary medical care, medical facilities showed the highest rate (58.1% of respondents), followed by pharmacies and drug shops (33.1%) and traditional methods (4.0%). Among medical facilities, general private clinics showed the highest rate (34.3%), followed by specialty private clinics (22.3%), hospitals (19.0%), health (sub)centers (16.3%), nurse practitioners (3.3%), and oriental hospitals and clinics (2.7%). 5. The rate of experience using the health subcenter was 51.8% of respondents: 55.0% in Su Dong-Myun and 46.3% in Byul Nae-Myun. The utilization frequency of the health subcenter showed the following order: 1-2 times/6 months (31.6%), 1-2 times/year (22.1%), 1-2 times/month (19.2%), and 1-2 times/3 months (15.6%). 6. Among purposes for visiting the health subcenter, medical care showed the highest rate (59.8% of respondents), with health control (23.3%) next. Within medical care, primary care by a general physician accounted for the majority (51.1%); within health control, immunization also showed a high rate (18.0%). 7. The reasons for utilizing the health subcenter showed the following order: distance to medical facilities (33.0% of respondents), medical cost (28.1%), simple consultation process (10.8%), effectiveness of cure (7.6%), primary medical care function (7.0%), and attitude of the physician (6.5%). 8. Among factors affecting utilization of primary medical facilities, medical needs showed the highest rate (29.5% of respondents), followed by medical cost (15.4%), distance to medical facilities (14.2%), traffic vehicles (14.2%), and farm work (6.9%). 9. Regarding the priority between daily farm work and primary medical care, only 46.4% of respondents answered that primary health care is more important than daily farm work; 22.6% answered daily farm work, and 12.3% considered the two equal. 10. In the criteria for choosing a medical facility, medical knowledge and technical quality showed the highest rate (56.3%), followed by distance or time to the facility (10.9%), sincerity and kindness of the physician (9.4%), medical cost (8.7%), and traffic vehicles (6.5%). 11. Regarding advice for improving health subcenter functions, 36.1% of respondents answered that reinforcement of medical personnel and equipment was required, followed by improved medical technology (25.5%), a good attitude of the physician (14.9%), an improved medical system (13.3%), and reinforced drug supply (6.7%). 12. The study of factors affecting utilization of primary medical facilities was very difficult to systematize, owing to the biases of the protocol planner, surveyor, and respondents, and to the variety and overlap of the subject matter.


Bankruptcy prediction using an improved bagging ensemble (개선된 배깅 앙상블을 활용한 기업부도예측)

  • Min, Sung-Hwan
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.4
    • /
    • pp.121-139
    • /
    • 2014
  • Predicting corporate failure has been an important topic in accounting and finance. The costs associated with bankruptcy are high, so the accuracy of bankruptcy prediction is critically important for financial institutions. Many researchers have dealt with bankruptcy prediction over the past three decades. The current research attempts to use ensemble models to improve the performance of bankruptcy prediction. Ensemble classification combines individually trained classifiers in order to obtain more accurate predictions than individual models, and ensemble techniques have proven very useful for improving the generalization ability of a classifier. Bagging is the most commonly used method for constructing ensemble classifiers: different training data subsets are randomly drawn with replacement from the original training dataset, and base classifiers are trained on the different bootstrap samples. Instance selection selects critical instances while removing irrelevant and harmful instances from the original set. Instance selection and bagging are both well known in data mining; however, few studies have dealt with their integration. This study proposes an improved bagging ensemble based on instance selection using a genetic algorithm (GA) to improve the performance of SVM. A GA is an efficient optimization procedure based on the theory of natural selection and evolution. It uses the idea of survival of the fittest, progressively accepting better solutions to the problem, and it searches by maintaining a population of solutions from which better solutions are created, rather than making incremental changes to a single solution. The initial solution population is generated randomly and evolves into the next generation through genetic operators such as selection, crossover, and mutation; solutions coded as strings are evaluated by the fitness function. The proposed model consists of two phases: GA-based instance selection and instance-based bagging. In the first phase, the GA selects the optimal instance subset, which is used as the input data of the bagging model. In this study, the chromosome is encoded as a binary string representing the instance subset. In this phase, the population size was set to 100 and the maximum number of generations to 150; the crossover rate and mutation rate were set to 0.7 and 0.1, respectively. We used the prediction accuracy of the model as the fitness function of the GA: an SVM model is trained on the training set using the selected instance subset, and its prediction accuracy over the test set is used as the fitness value in order to avoid overfitting. In the second phase, we used the optimal instance subset selected in the first phase as the input data of the bagging model, with SVM as the base classifier and majority voting as the combining method. This study applies the proposed model to the bankruptcy prediction problem using a real dataset from Korean companies. The research data contain 1,832 externally non-audited firms, of which 916 filed for bankruptcy and 916 did not. Financial ratios categorized as stability, profitability, growth, activity, and cash flow were investigated through a literature review and basic statistical methods, and we selected 8 financial ratios as the final input variables. We separated the whole dataset into three subsets: training, test, and validation. We compared the proposed model with several comparative models, including a simple individual SVM model, a simple bagging model, and an instance selection-based SVM model. McNemar tests were used to examine whether the proposed model significantly outperforms the other models. The experimental results show that the proposed model outperforms the others.
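A sketch of the two-phase procedure, using scikit-learn for the SVM and bagging parts. The GA parameters (population 100, 150 generations, crossover 0.7, mutation 0.1) follow the abstract; the GA operators themselves are simplified illustrations rather than the paper's exact implementation.

```python
# Phase 1: GA instance selection (binary chromosome over training instances,
# test-set accuracy of an SVM as fitness). Phase 2: bagging of SVM base
# classifiers on the selected instances, combined by voting.
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(42)

def fitness(mask, X_tr, y_tr, X_te, y_te):
    idx = np.flatnonzero(mask)
    if len(idx) < 10 or len(np.unique(y_tr[idx])) < 2:
        return 0.0                       # degenerate subset: reject
    clf = SVC().fit(X_tr[idx], y_tr[idx])
    return clf.score(X_te, y_te)         # test accuracy, to limit overfitting

def ga_instance_selection(X_tr, y_tr, X_te, y_te,
                          pop=100, gens=150, pc=0.7, pm=0.1):
    n = len(X_tr)
    population = rng.integers(0, 2, size=(pop, n))  # bit i = keep instance i
    for _ in range(gens):
        scores = np.array([fitness(m, X_tr, y_tr, X_te, y_te)
                           for m in population])
        parents = population[np.argsort(scores)[-(pop // 2):]]  # best half
        children = []
        while len(children) < pop:
            a, b = parents[rng.integers(len(parents), size=2)]
            child = a.copy()
            if rng.random() < pc:                    # one-point crossover
                cut = rng.integers(1, n)
                child = np.concatenate([a[:cut], b[cut:]])
            if rng.random() < pm:                    # flip one random bit
                j = rng.integers(n)
                child[j] ^= 1
            children.append(child)
        population = np.array(children)
    best = max(population, key=lambda m: fitness(m, X_tr, y_tr, X_te, y_te))
    return np.flatnonzero(best)

# Phase 2 (usage): bagging of SVMs on the selected instances.
# selected = ga_instance_selection(X_train, y_train, X_test, y_test)
# ensemble = BaggingClassifier(SVC(), n_estimators=30).fit(
#     X_train[selected], y_train[selected])
```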