• Title/Summary/Keyword: method validation

How to improve the accuracy of recommendation systems: Combining ratings and review texts sentiment scores (평점과 리뷰 텍스트 감성분석을 결합한 추천시스템 향상 방안 연구)

  • Hyun, Jiyeon;Ryu, Sangyi;Lee, Sang-Yong Tom
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.219-239 / 2019
  • As customized services for individuals become more important, research on personalized recommendation systems continues to grow. Collaborative filtering is one of the most popular approaches in academia and industry. However, its recommendations have mostly been based on quantitative information such as users' ratings, which limits accuracy. To address this problem, many studies have tried to improve recommendation performance by using information beyond the quantitative ratings; a good example is sentiment analysis of customer review text. Nevertheless, existing research has not directly combined the results of sentiment analysis with quantitative rating scores in the recommendation system. This study therefore aims to reflect the sentiments expressed in reviews in the rating scores. In other words, we propose a new algorithm that converts a user's own review into quantitative information and feeds it directly into the recommendation system. To do this, we needed to quantify users' reviews, which are originally qualitative information. In this study, the sentiment score was calculated with a text-mining sentiment analysis technique. The data consisted of movie reviews. Based on these data, a domain-specific sentiment dictionary was constructed for movie reviews, using regression analysis as the construction method. Positive and negative dictionaries were each constructed using Lasso regression, Ridge regression, and ElasticNet. The accuracy of each constructed dictionary was verified with a confusion matrix: 70% for the Lasso-based dictionary, 79% for the Ridge-based dictionary, and 83% for the ElasticNet-based dictionary ($\alpha = 0.3$).
Therefore, in this study, the sentiment score of each review was calculated using the ElasticNet-based dictionary and combined with the rating to create a new rating. We show that collaborative filtering that reflects the sentiment scores of user reviews is superior to the traditional method that considers only the existing rating. To show this, the proposed algorithm was applied to memory-based user-based collaborative filtering (UBCF) and item-based collaborative filtering (IBCF), and to the model-based matrix factorization methods SVD and SVD++. For each algorithm, the mean absolute error (MAE) and the root mean square error (RMSE) were calculated to compare a recommendation system that uses ratings combined with sentiment scores against one that considers ratings only. Measured by MAE, the improvement was 0.059 for UBCF, 0.0862 for IBCF, 0.1012 for SVD, and 0.188 for SVD++. Measured by RMSE, the corresponding figures were 0.0431 for UBCF, 0.0882 for IBCF, 0.1103 for SVD, and 0.1756 for SVD++. These results indicate that collaborative filtering on ratings that reflect the sentiment score of the user review predicts more accurately than conventional collaborative filtering, which considers only the quantitative score. We then ran a paired t-test to check that the proposed model was the better approach and concluded that it is. To overcome the limitation of previous research, which judged users' sentiment only by the quantitative rating score, this study quantified the review text so that a user's opinion enters the recommendation system in a more refined form, improving accuracy.
The findings of this study have managerial implications for recommendation system developers, who are expected to need to consider both quantitative and qualitative information. The way the combined system is constructed in this paper might be used directly by developers.
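The two evaluation measures named in the abstract above, MAE and RMSE, can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names and the sample ratings are hypothetical, and the abstract does not say how ratings and sentiment scores were combined, so that step is left out.

```python
import math


def mae(actual, predicted):
    """Mean absolute error between two equal-length rating lists."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error between two equal-length rating lists."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


# Hypothetical ratings, for illustration only (not data from the paper).
actual_ratings = [4, 3, 5, 2]
predicted_ratings = [3.5, 3, 4, 3]

print(mae(actual_ratings, predicted_ratings))   # absolute errors: 0.5, 0, 1, 1
print(rmse(actual_ratings, predicted_ratings))  # squared errors: 0.25, 0, 1, 1
```

Both measures are lower-is-better, so an "improvement" in the abstract's sense is a reduction in the value. RMSE penalizes large individual errors more heavily than MAE does.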

Clickstream Big Data Mining for Demographics based Digital Marketing (인구통계특성 기반 디지털 마케팅을 위한 클릭스트림 빅데이터 마이닝)

  • Park, Jiae;Cho, Yoonho
    • Journal of Intelligence and Information Systems / v.22 no.3 / pp.143-163 / 2016
  • The demographics of Internet users are the most basic and important source for target marketing and personalized advertising on digital marketing channels, which include email, mobile, and social media. However, it has gradually become difficult to collect the demographics of Internet users because their activities are anonymous in many cases. Although a marketing department can obtain demographics through online or offline surveys, these approaches are expensive, slow, and likely to include false statements. Clickstream data is the record an Internet user leaves behind while visiting websites. As the user clicks anywhere on a webpage, the activity is logged in semi-structured website log files. Such data shows which pages users visited, how long they stayed, how often and when they usually visited, which sites they prefer, what keywords they used to find the site, whether they purchased anything, and so forth. For this reason, some researchers have tried to infer the demographics of Internet users from their clickstream data. They derived various independent variables likely to be correlated with demographics, including search keywords; frequency and intensity by time, day, and month; variety of websites visited; and text information from the web pages visited. The demographic attributes predicted also vary from paper to paper and cover gender, age, job, location, income, education, marital status, and presence of children. A variety of data mining methods, such as LSA, SVM, decision trees, neural networks, logistic regression, and k-nearest neighbors, have been used to build prediction models. However, this body of research has not yet identified which data mining method is appropriate for predicting each demographic variable. Moreover, the independent variables studied so far need to be reviewed, combined as needed, and evaluated in order to build the best prediction model.
The objective of this study is to choose the clickstream attributes most likely to be correlated with demographics from the results of previous research, and then to identify which data mining method is best suited to predicting each demographic attribute. Among the demographic attributes, this paper focuses on predicting gender, age, marital status, residence, and job. From the results of previous research, 64 clickstream attributes are applied to predict these demographic attributes. The overall process of predictive model building is composed of four steps. In the first step, we create user profiles that include 64 clickstream attributes and 5 demographic attributes. The second step reduces the dimensionality of the clickstream variables to address the curse of dimensionality and overfitting, using three approaches based on decision trees, PCA, and cluster analysis. In the third step, we build alternative predictive models for each demographic variable using SVM, neural networks, and logistic regression. The last step evaluates the alternative models by accuracy and selects the best one. For the experiments, we used clickstream data representing 5 demographics and 16,962,705 online activities for 5,000 Internet users. IBM SPSS Modeler 17.0 was used for the prediction process, and 5-fold cross-validation was conducted to enhance the reliability of the experiments. The experimental results confirm that a specific data mining method is well suited to each demographic variable. For example, age prediction performs best with decision-tree-based dimension reduction and a neural network, whereas gender and marital status are predicted most accurately by SVM without dimension reduction.
We conclude that the online behavior of Internet users, captured through clickstream data analysis, can be used effectively to predict their demographics and can thereby be applied to digital marketing.
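The 5-fold cross-validation mentioned in the abstract above can be sketched as an index split. The study itself used IBM SPSS Modeler, so the code below is only an assumed, dependency-free illustration of the general technique; the function names and the seed are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle range(n_samples) and deal it into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_test_splits(n_samples, k=5, seed=0):
    """Yield (train, test) index lists; each fold serves as the test set once."""
    folds = k_fold_indices(n_samples, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 5,000 users, as in the abstract: each round trains on 4,000 and tests on 1,000.
for train, test in train_test_splits(5000, k=5):
    print(len(train), len(test))
```

Averaging a model's accuracy over the five test folds gives a more stable estimate than a single train/test split, which is presumably why the authors describe it as enhancing reliability.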

Effect of the Suicide Prevention Program to the Impulsive Psychology of the Elementary School Student (자살예방 프로그램이 초등학교 충동심리에 미치는 영향)

  • Kang, Soo Jin;Kang, Ho Jung;Cho, Won Cheol;Lee, Tae Shik
    • Journal of Korean Society of Disaster and Security / v.6 no.1 / pp.65-72 / 2013
  • In this study, an early suicide prevention program was applied to elementary school students, and its effects before and after the program were compared. Changes in psychological state, such as emotional status and the temptation to commit suicide, were examined, and the program's potential as a suicide prevention program was assessed. Adolescence is a very unstable period of growth, marked by cognitive immaturity and emotional impulsiveness. It is an emotionally unstable and unpredictable period in which suicide may be chosen as an extreme way to escape reality or as an impulsive response to a small conflict or dispute. Many sources of student stress, such as the recent shift to nuclear families, parents' expectations of their children, educational problems, socio-environmental factors, and individual psychological factors, have recently driven students to the extreme act of suicide. This study identified the range of stress experienced in elementary school and the thoughts about and degree of temptation toward suicide. Through program elements such as meditation training, breathing training, and experiential practice of anger control and emotional expression, students worked on self-mastery, establishing a positive self-identity, and understanding self-control, self-esteem, and the preciousness of life; on this basis the effect on suicide prevention was analyzed. The study targeted 51 students in two 6th-grade classes at an elementary school in Goyang-si, and the program ran for 30 minutes every morning, focusing on experiential activities based on the principles and methods of brain science.
Data were collected on 20 occasions before the start of morning class using the Suicide Probability Scale (hereafter SPS-A), a suicide risk prediction scale designed to predict suicide probability effectively, which surveys 7 areas: positive outlook, closeness within the family, impulsivity, interpersonal hostility, hopelessness, hopelessness syndrome, and suicide accident. The Wilcoxon signed-rank test in SPSS was used for analysis and validation. Although the program ran for only a short period, the comparison of averages showed positive effects in all 7 areas. The t-test result, however, gave a different outcome: changes were indicated in 3 items (No. 7, No. 14, and No. 19) of the 31 SPS-A items, with no change in the remaining items. Students in class A also showed more change than those in class B, and for class A students psychological changes were confirmed in the areas of hopelessness syndrome and suicide accident after the program. This study thus confirmed that results can differ depending on the students' tendencies and on the program staff (the teacher in charge and the lecturer running the program). With consistent systematization and wider use, the suicide prevention program presented in this article can help with learning and suicide prevention through emotion and impulse control based on emotional stress relief, recovery of a positive self-identity, and stabilization of brain waves. If such a short-term program is not allowed to lapse but is continued from childhood into adolescence, creating an environment for healthy mental and physical growth, it could be an effective program for preventing suicide as a social problem.

Investigating Dynamic Mutation Process of Issues Using Unstructured Text Analysis (비정형 텍스트 분석을 활용한 이슈의 동적 변이과정 고찰)

  • Lim, Myungsu;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.1-18 / 2016
  • Owing to the extensive use of Web media and the development of the IT industry, a large amount of data has been generated, shared, and stored. Nowadays, various types of unstructured data such as image, sound, video, and text are distributed through Web media. Therefore, many attempts have been made in recent years to discover new value through an analysis of these unstructured data. Among these types of unstructured data, text is recognized as the most representative method for users to express and share their opinions on the Web. In this sense, demand for obtaining new insights through text analysis is steadily increasing. Accordingly, text mining is increasingly being used for different purposes in various fields. In particular, issue tracking is being widely studied not only in the academic world but also in industries because it can be used to extract various issues from text such as news and social network services (SNS) and to analyze the trends of these issues. Conventionally, issue tracking is used to identify major issues sustained over a long period of time through topic modeling and to analyze the detailed distribution of documents involved in each issue. However, because conventional issue tracking assumes that the content composing each issue does not change throughout the entire tracking period, it cannot represent the dynamic mutation process of detailed issues that can be created, merged, divided, and deleted between these periods. Moreover, because only keywords that appear consistently throughout the entire period can be derived as issue keywords, concrete issue keywords such as "nuclear test" and "separated families" may be concealed by more general issue keywords such as "North Korea" in an analysis over a long period of time. This implies that many meaningful but short-lived issues cannot be discovered by conventional issue tracking.
Note that detailed keywords are preferable to general keywords because the former can be clues for providing actionable strategies. To overcome these limitations, we performed an independent analysis on the documents of each detailed period. We generated an issue flow diagram based on the similarity of each issue between two consecutive periods. The issue transition pattern among categories was analyzed by using the category information of each document. In this study, we then applied the proposed methodology to a real case of 53,739 news articles. We derived an issue flow diagram from the articles. We then proposed the following useful application scenarios for the issue flow diagram presented in the experiment section. First, we can identify an issue that actively appears during a certain period and promptly disappears in the next period. Second, the preceding and following issues of a particular issue can be easily discovered from the issue flow diagram. This implies that our methodology can be used to discover the association between inter-period issues. Finally, an interesting pattern of one-way and two-way transitions was discovered by analyzing the transition patterns of issues through category analysis. Thus, we discovered that a pair of mutually similar categories induces two-way transitions. In contrast, one-way transitions can be recognized as an indicator that issues in a certain category tend to be influenced by other issues in another category. For practical application of the proposed methodology, high-quality word and stop word dictionaries need to be constructed. In addition, not only the number of documents but also additional meta-information such as the read counts, written time, and comments of documents should be analyzed. A rigorous performance evaluation or validation of the proposed methodology should be performed in future works.

An Operations Study on a Home Health Nursing Demonstration Program for the Patients Discharged with Chronic Residual Health Care Problems (추후관리가 필요한 만성질환 퇴원환자 가정간호 시범사업 운영 연구)

  • 홍여신;이은옥;이소우;김매자;홍경자;서문자;이영자;박정호;송미순
    • Journal of Korean Academy of Nursing / v.20 no.2 / pp.227-248 / 1990
  • The study was conceived in relation to a concern over the growing gap between the needs of chronic patients and the availability of care from the current health care system in Korea. Patients with agonizing chronic pain, discomfort, despair and disability are left with helplessly unprepared families and with little help from the acute-care-oriented health care system after discharge from hospital. There is a great need for the development of an alternative means of quality care that is economically feasible and culturally adaptable to our society. Thus, the study was designed to demonstrate the effectiveness of home health care as an alternative to bridge the existing gap between the patients' needs and the current practice of health care. The study specifically aims to test the effects of home care on health expenditure, readmission, job retention, compliance with the health care regime, general condition, complications, and self-care knowledge and practices. The study was guided by the operations research method advocated by the Primary Health Care Operations Research Institute (PRICOR), which consists of three stages of research: problem analysis, solution development, and solution validation. The first step in the operations research was field preparation to develop the necessary consensus and cooperation. This was done through the formation of a consulting body at the hospital and a steering committee among the researchers. For the problem analysis stage, the Annual Report of Seoul National University Hospital and the patient records for the last 5 years were reviewed, and selective patient interviews were conducted to find out the magnitude of chronic health problems and the areas of unmet health care needs, in order to decide finally on the kinds of health problems to study. On the basis of the problem analysis, the solution development stage was devoted to developing a home care program as a solution alternative.
Assessment tools, teaching guidelines and care protocols were developed and tested for their validity. The final stage was the stage of experimentation and evaluation. Patients with liver diseases, hemiplegic and diabetic conditions were selected as study samples. Discharge evaluation, follow-up home care, measurement and evaluation were carried out according to the protocols of care and the measurement plan for each patient for a period of 6 months after discharge. The study was carried out from Jan. 1987 to Dec. 1989. The following are the results of the study, presented according to the hypotheses set forth for the study: 1. Total expenditures for the period of study were not reduced for the experimental group. However, since the cost per hospital visit is about 4 times the cost per home visit, the cost-saving effect of home care will become a reality as home care replaces part of the hospital visits. 2. The effect on the rate of readmission and on job retention was found to be statistically nonsignificant, though the number of readmissions was lower among the experimental group receiving home care. 3. The effect on compliance with the health care regime was found to be statistically significant at the 5% level for hepatopathic and diabetic patients. 4. Education on diet, rest and exercise, and medication through home care had a statistically significant effect on improved liver function test scores, prevention of complications and self-care knowledge in hepatopathic patients. 5. In hemiplegic patients, home care had a significant effect on increased grasping power. However, there was no significant difference between the experimental and control groups in the level of compliance, prevention of complications or self-care practices. 6. In diabetic patients, there was no difference between the experimental and control groups in scores of laboratory tests, appearance of complications, self-care knowledge or self-care practices.
The above findings indicate that a home care program instituted for a period as short as 6 months could not fully demonstrate its effectiveness at a statistically significant level by quantitative analysis. However, what was shown in part in this analysis, and in the continuous consultation sought by those who had been in the experimental group, is that home health care has great potential for retarding or preventing pathological progress, facilitating a rehabilitative and productive life, and improving quality of life by adding comfort, confidence and strength to patients and their families. For further studies of this kind with chronic patients, it is recommended that a sample of newly diagnosed patients be followed up for a longer period of time with more frequent observations to give a more clear-cut picture of the effectiveness of home care.

Role of enzyme immunoassay for the Detection of Helicobacter pylori Stool Antigen in Confirming Eradication After Quadruple Therapy in Children (소아에서 4제요법 후 enzyme immunoassay에 의한 Helicobacter pylori 대변 항원 검출법의 유용성에 대한 연구)

  • Yang, Hye Ran;Seo, Jeong Kee
    • Pediatric Gastroenterology, Hepatology & Nutrition / v.7 no.2 / pp.153-162 / 2004
  • Purpose: The Helicobacter pylori stool antigen (HpSA) enzyme immunoassay is a non-invasive test for the diagnosis and monitoring of H. pylori infection. However, there are few validation studies of the HpSA test after eradication in children. The aim of this study was to assess the diagnostic accuracy of the HpSA enzyme immunoassay for the detection of H. pylori to confirm eradication in children. Methods: From January 2001 to October 2003, 164 tests were performed in 146 children aged 1 to 17.5 years (mean $9.3 \pm 4.3$ years). H. pylori infection was confirmed by endoscopy-based tests (rapid urease test, histology, and culture). All H. pylori infected children were treated with a quadruple regimen (omeprazole, amoxicillin, metronidazole and bismuth subcitrate for 7 days). Stool specimens were collected from all patients for the HpSA enzyme immunoassay (Premier Platinum HpSA). The results of HpSA tests were interpreted as positive for $\mathrm{OD} \geq 0.160$, unresolved for $0.140 \leq \mathrm{OD} < 0.160$, and negative for $\mathrm{OD} < 0.140$ at 450 nm on a spectrophotometer. Results: 1) One hundred thirty-one HpSA tests were performed before treatment. The HpSA enzyme immunoassay showed three false positive cases and one false negative case. The sensitivity, specificity, positive predictive value, and negative predictive value of the HpSA enzyme immunoassay before treatment were 96.4%, 97.1%, 90%, and 99%, respectively. 2) Thirty-three HpSA enzyme immunoassays were performed at least 4 weeks after eradication therapy. The HpSA enzyme immunoassay showed two false positive cases and one false negative case. The sensitivity, specificity, positive predictive value, and negative predictive value after treatment were 88.9%, 91.7%, 80%, and 95.7%, respectively. Conclusion: The diagnostic accuracy of the HpSA enzyme immunoassay after eradication therapy was as high as that of the HpSA test before eradication therapy.
The HpSA enzyme immunoassay was found to be a useful non-invasive method to confirm H. pylori eradication in children.
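The four accuracy measures reported above follow from a 2x2 confusion table, and the OD cut-offs define a three-way interpretation. The sketch below is illustrative only. The false positive and false negative counts and the test totals come from the abstract, but the true positive and true negative counts (27 and 100 before treatment, 8 and 22 after) are an assumption: they are inferred here as integers that reproduce the reported percentages and totals, ignoring any unresolved results, and are not stated in the source. The function names are hypothetical.

```python
def classify_od(od):
    """Interpret an optical density reading at 450 nm using the stated cut-offs."""
    if od >= 0.160:
        return "positive"
    if od >= 0.140:
        return "unresolved"
    return "negative"


def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV and NPV as fractions."""
    return {
        "sensitivity": tp / (tp + fn),  # infected children correctly detected
        "specificity": tn / (tn + fp),  # uninfected children correctly cleared
        "ppv": tp / (tp + fp),          # positive results that are true
        "npv": tn / (tn + fn),          # negative results that are true
    }


# fp and fn are from the abstract; tp and tn are inferred (assumption).
before = diagnostic_metrics(tp=27, fp=3, fn=1, tn=100)  # 131 tests in total
after = diagnostic_metrics(tp=8, fp=2, fn=1, tn=22)     # 33 tests in total

for name, value in before.items():
    print(f"before {name}: {value * 100:.1f}%")
for name, value in after.items():
    print(f"after {name}: {value * 100:.1f}%")
```

With only 33 post-treatment tests, one additional false result would shift each percentage by several points, which is worth keeping in mind when comparing the before and after figures.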

Accuracy evaluation of microwave water surface current meter for measurement angles in middle flow condition (전자파표면유속계의 측정 각도에 따른 평수기 유속 측정 정확도 분석)

  • Son, Geunsoo;Kim, Dongsu;Kim, Kyungdong;Kim, Jongmin
    • Journal of Korea Water Resources Association / v.53 no.1 / pp.15-27 / 2020
  • Streamflow discharge, a fundamental riverine quantity, plays a crucial role in water resources management and therefore requires accurate in-situ measurement. Recent advances in instrumentation for streamflow discharge measurement have complemented or replaced classical devices and methods. Among the various candidate methods, microwave surface current meters are increasingly applied not only to flood but also to normal-flow discharge measurement, allowing practitioners to measure flow velocity remotely and safely without direct contact with the water. With minimal field preparation, this method has eased flood discharge measurement in difficult in-situ conditions such as extreme floods; because it is an active method that emits 24.125 GHz microwaves, it does not rely on natural light. In South Korea, a rectangular instrument named the Microwave Water Surface Current Meter (MWSCM) was developed and commercially released around 2010, and domestic agencies in charge of streamflow observation have taken an interest in it as a potential substitute. Despite the attention this new device has received for efficient flow measurement, there have been few notable efforts to evaluate its performance systematically and comprehensively under various measurement and riverine conditions, which has held back its prompt and widespread use in practice. This study evaluated the MWSCM in terms of the instrument's monitoring configuration, particularly tilt and yaw angle. When the instrument is pointed at a measurement spot in a given cross-section, the observation campaign inevitably raises accuracy issues related to the instrument's tilt and yaw angles, which can be a major source of error for this type of instrument.
Focusing on instrument configuration, the instrument was tested in a controlled outdoor river channel at the KICT River Experiment Center under a fixed flow condition: steady flow supply with a flow speed of around 1 m/s, a channel width of 6 m, and a shallow flow depth of less than 1 m. Detailed velocity measurements with a SonTek micro-ADV were used for validation. The results showed that tilt angles of less than 15 degrees produced much higher deviation, and that the coefficient of variation increased in proportion to the yaw angle. Yaw angle also affected accuracy in terms of the measurement area.

A study on the efficient application of the replicating portfolio according to the tax imposition within K-OTC market for activating financial transactions of small-medium and venture business (중소 벤처 기업의 금융거래 활성화를 위하여 K-OTC 시장에서 조세부과에 따른 복제포트폴리오의 효율적 활용에 대한 연구)

  • Yoo, Joon-soo
    • Journal of Venture Innovation / v.1 no.1 / pp.83-98 / 2018
  • This paper takes a theoretical approach to the differences between a transaction tax and a capital gains tax when financial instruments are traded and taxed in the K-OTC market, a newly emerging off-board market. When a synthetic bond is formed to hedge risk, it can be difficult to reduce risk to the level investors want, depending on how the financial instruments composing the portfolio are taxed; this paper therefore also looks for effective taxation methods that make such hedging practicable. To review the taxation balance of synthetic bonds thoroughly, the paper analyzed the effects of the transaction tax and the capital gains tax imposed on synthetic bonds as the final stock price and the strike price change in the K-OTC market, and analyzed the differences in after-tax profit depending on whether an income tax deduction applies. The analysis of the tax gap under the two taxes as the final stock price changes showed that the tax gap under the transaction tax is much lower than under the capital gains tax, so imposing a transaction tax is more likely to allow some level of risk hedging with a replicating portfolio, given taxation policies and financial markets. In addition, regarding whether the income tax deduction is permitted, the effects of the transaction tax and the capital gains tax were shown to vary with the strike price. In particular, if the strike price is lower than the stock price, the transaction tax is less affected by the income tax deduction than the capital gains tax is, while both are equally affected if the strike price is higher than the stock price.
Further study should validate this in the K-OTC market with actual financial instruments and should also seek a more systematic hedging method by using a ratio analysis approach to the calculation of the option transaction tax.

Monitoring Ochratoxin A in Coffee and Fruit Products in Korea (커피 및 과실류 가공품의 오크라톡신 A 모니터링)

  • Park, Ji-Eun;Heo, Seok;Lee, Mi-Seon;Kim, Eun-Jung;Park, Jong-Seok;Oh, Jae-Ho;Jang, Young-Mi;Kim, Mee-Hye
    • Korean Journal of Food Science and Technology / v.42 no.3 / pp.263-268 / 2010
  • This research was conducted to evaluate the occurrence of ochratoxin A (OTA) in coffee and fruit products in Korea. A total of 388 coffee and fruit product samples were collected from retail or outlet markets; 177 samples were coffee and 211 were fruits or their products. Analytical methods including those of AOAC and the Comité Européen de Normalisation (CEN) were selected and modified through method validation to detect and quantify OTA in the samples. All samples were analyzed by liquid chromatography with fluorescence detection. OTA was detected in 3.9% of the 177 coffee samples and in none of the 211 fruit product samples. The levels of OTA were 0.7-4.6 μg/kg in green coffee, 0.3-4.8 μg/kg in roasted coffee, 1.4 μg/kg in mixed coffee, and 0.4-0.6 μg/kg in instant coffee. OTA was not detected in liquid coffee, dried fruits, or grape juice. The OTA levels of all samples in which it was detected were below the European Union limits of 5.0 μg/kg in coffee, 10.0 μg/kg in raisins and 2.0 μg/kg in grape juice. Therefore, the risk from OTA in coffee and fruit products in Korea appears to be relatively low, with levels within safe limits.

Bankruptcy prediction using an improved bagging ensemble (개선된 배깅 앙상블을 활용한 기업부도예측)

  • Min, Sung-Hwan
    • Journal of Intelligence and Information Systems / v.20 no.4 / pp.121-139 / 2014
  • Predicting corporate failure has been an important topic in accounting and finance. The costs associated with bankruptcy are high, so the accuracy of bankruptcy prediction is of great importance to financial institutions. Many researchers have addressed bankruptcy prediction over the past three decades. The current research attempts to use ensemble models to improve the performance of bankruptcy prediction. Ensemble classification combines individually trained classifiers in order to obtain more accurate predictions than individual models. Ensemble techniques have been shown to be very useful for improving the generalization ability of a classifier. Bagging is one of the most commonly used methods for constructing ensemble classifiers. In bagging, different training data subsets are randomly drawn with replacement from the original training dataset, and base classifiers are trained on the different bootstrap samples. Instance selection keeps critical instances while removing irrelevant and harmful instances from the original set. Instance selection and bagging are both well known in data mining. However, few studies have dealt with the integration of instance selection and bagging. This study proposes an improved bagging ensemble based on instance selection using genetic algorithms (GA) to improve the performance of SVM. GA is an efficient optimization procedure based on the theory of natural selection and evolution. GA uses the idea of survival of the fittest by progressively accepting better solutions to the problem. GA searches by maintaining a population of solutions from which better solutions are created, rather than making incremental changes to a single solution. The initial solution population is generated randomly and evolves into the next generation through genetic operators such as selection, crossover and mutation. The solutions, coded as strings, are evaluated by the fitness function.
The proposed model consists of two phases: GA-based instance selection and instance-based bagging. In the first phase, GA is used to select the optimal instance subset that is used as input data for the bagging model. In this study, the chromosome is encoded as a binary string representing the instance subset. In this phase, the population size was set to 100 and the maximum number of generations to 150. We set the crossover rate and mutation rate to 0.7 and 0.1, respectively. We used the prediction accuracy of the model as the fitness function of GA. The SVM model is trained on the training data set using the selected instance subset. The prediction accuracy of the SVM model over the test data set is used as the fitness value in order to avoid overfitting. In the second phase, we used the optimal instance subset selected in the first phase as input data for the bagging model. We used the SVM model as the base classifier for the bagging ensemble. Majority voting was used as the combining method in this study. This study applies the proposed model to the bankruptcy prediction problem using a real data set from Korean companies. The research data contain 1,832 externally non-audited firms, comprising 916 bankruptcy cases and 916 non-bankruptcy cases. Financial ratios categorized as stability, profitability, growth, activity and cash flow were investigated through a literature review and basic statistical methods, and we selected 8 financial ratios as the final input variables. We separated the whole data set into three subsets: training, test and validation. We compared the proposed model with several comparative models, including the simple individual SVM model, the simple bagging model and the instance-selection-based SVM model. McNemar tests were used to examine whether the proposed model significantly outperforms the other models. The experimental results show that the proposed model outperforms the other models.
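The bagging phase described above (bootstrap samples, one base classifier per sample, majority voting) can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: the base learner here is a nearest-centroid classifier swapped in for SVM to keep the example dependency-free, the GA instance-selection phase is omitted, and the two-ratio toy data and all function names are hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]


def train_centroid_classifier(sample):
    """Stand-in base learner (the paper uses SVM): nearest class centroid."""
    sums, counts = {}, Counter()
    for features, label in sample:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] += 1
    centroids = {
        label: [s / counts[label] for s in acc] for label, acc in sums.items()
    }

    def predict(features):
        # Pick the label whose centroid is closest in squared distance.
        return min(
            centroids,
            key=lambda label: sum(
                (f - c) ** 2 for f, c in zip(features, centroids[label])
            ),
        )

    return predict


def train_bagging(data, n_estimators=11, seed=0):
    """Train one base classifier per bootstrap sample."""
    rng = random.Random(seed)
    return [
        train_centroid_classifier(bootstrap_sample(data, rng))
        for _ in range(n_estimators)
    ]


def predict_majority(ensemble, features):
    """Combine base predictions by majority vote."""
    votes = Counter(model(features) for model in ensemble)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (two financial ratios), label 1 = bankrupt, 0 = not.
toy_data = [
    ((0.10, 0.20), 1), ((0.20, 0.10), 1), ((0.15, 0.25), 1),
    ((0.90, 0.80), 0), ((0.80, 0.90), 0), ((0.85, 0.75), 0),
]
ensemble = train_bagging(toy_data)
print(predict_majority(ensemble, (0.12, 0.18)))
print(predict_majority(ensemble, (0.88, 0.82)))
```

An odd number of estimators avoids tied votes in the two-class case. In the paper's setup, the data passed to the bagging step would be the instance subset chosen by the GA rather than the full training set.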