• Title/Summary/Keyword: Information classification


Bankruptcy prediction using an improved bagging ensemble (개선된 배깅 앙상블을 활용한 기업부도예측)

  • Min, Sung-Hwan
    • Journal of Intelligence and Information Systems, v.20 no.4, pp.121-139, 2014
  • Predicting corporate failure has been an important topic in accounting and finance. The costs associated with bankruptcy are high, so the accuracy of bankruptcy prediction is very important for financial institutions. Many researchers have studied bankruptcy prediction over the past three decades. This research uses ensemble models to improve the performance of bankruptcy prediction. Ensemble classification combines individually trained classifiers to obtain more accurate predictions than any individual model, and ensemble techniques have been shown to be very useful for improving the generalization ability of a classifier. Bagging is the most commonly used method for constructing ensemble classifiers. In bagging, different training subsets are drawn randomly with replacement from the original training dataset, and base classifiers are trained on these bootstrap samples. Instance selection keeps the critical instances of the original set while removing irrelevant and harmful ones. Instance selection and bagging are both well known in data mining, but few studies have integrated the two. This study proposes an improved bagging ensemble based on instance selection using genetic algorithms (GA) to improve the performance of SVM. GA is an efficient optimization procedure based on the theory of natural selection and evolution. It applies the idea of survival of the fittest by progressively accepting better solutions to the problem. Rather than making incremental changes to a single solution, GA maintains a population of solutions from which better solutions are created. The initial population is generated randomly and evolves into the next generation through genetic operators such as selection, crossover, and mutation, and the solutions, encoded as strings, are evaluated by a fitness function. 
The proposed model consists of two phases: GA-based instance selection and instance-based bagging. In the first phase, GA selects the optimal instance subset, which is then used as the input data of the bagging model. The chromosome is encoded as a binary string representing the instance subset. The population size was set to 100 and the maximum number of generations to 150, and the crossover and mutation rates were set to 0.7 and 0.1, respectively. The prediction accuracy of the model was used as the GA fitness function: an SVM model is trained on the training data using the selected instance subset, and its prediction accuracy on the test data is used as the fitness value in order to avoid overfitting. In the second phase, the optimal instance subset selected in the first phase is used as the input data of the bagging model, with SVM as the base classifier and majority voting as the combining method. The proposed model was applied to bankruptcy prediction using a real data set of Korean companies. The data contain 1,832 externally non-audited firms, of which 916 filed for bankruptcy and 916 did not. Financial ratios categorized as stability, profitability, growth, activity, and cash flow were investigated through a literature review and basic statistical methods, and 8 financial ratios were selected as the final input variables. The whole data set was separated into three subsets: training, test, and validation. The proposed model was compared with several models, including a simple individual SVM model, a simple bagging model, and an instance-selection-based SVM model, and McNemar tests were used to examine whether the proposed model significantly outperforms them. The experimental results show that the proposed model outperforms the other models.
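The two phases above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: to stay dependency-free it swaps the SVM base learner for a hypothetical nearest-centroid classifier (`train_centroid`), uses binary tournament selection, one-point crossover and elitism, and applies the 0.1 mutation rate per bit; all of these operator choices are assumptions. The defaults mirror the reported population size, generation count, and crossover and mutation rates.

```python
import random
from collections import Counter


def train_centroid(X, y):
    """Stand-in base learner (the paper uses SVM): one mean vector per class."""
    sums, counts = {}, Counter(y)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict_centroid(model, x):
    """Predict the class whose centroid is nearest in squared Euclidean distance."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(model[c], x)))


def fitness(mask, X_tr, y_tr, X_te, y_te):
    """Phase 1 fitness: test-set accuracy of a model trained on the selected instances."""
    chosen = [i for i, bit in enumerate(mask) if bit]
    if len({y_tr[i] for i in chosen}) < 2:  # a classifier needs both classes
        return 0.0
    model = train_centroid([X_tr[i] for i in chosen], [y_tr[i] for i in chosen])
    return sum(predict_centroid(model, x) == t for x, t in zip(X_te, y_te)) / len(y_te)


def ga_instance_selection(X_tr, y_tr, X_te, y_te, pop_size=100, generations=150,
                          p_cross=0.7, p_mut=0.1, seed=0):
    """Binary-string GA: bit i = 1 keeps training instance i."""
    rng = random.Random(seed)
    n = len(X_tr)
    cache = {}

    def score(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(mask, X_tr, y_tr, X_te, y_te)
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        children = [max(pop, key=score)[:]]  # elitism: keep the best mask unchanged
        while len(children) < pop_size:
            p1 = max(rng.sample(pop, 2), key=score)  # binary tournament selection
            p2 = max(rng.sample(pop, 2), key=score)
            if rng.random() < p_cross and n > 1:
                cut = rng.randrange(1, n)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation applied per bit (an assumption).
            child = [1 - b if rng.random() < p_mut else b for b in child]
            children.append(child)
        pop = children
    return max(pop, key=score)


def train_bagging(X, y, n_estimators=11, seed=1):
    """Phase 2: train base learners on bootstrap samples drawn with replacement."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        models.append(train_centroid([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_bagging(models, x):
    """Combine the base learners by majority voting."""
    return Counter(predict_centroid(m, x) for m in models).most_common(1)[0][0]


# Toy data: two clusters; the last training point is deliberately mislabeled.
X_tr = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6), (5.5, 5.5)]
y_tr = [0, 0, 0, 0, 1, 1, 1, 1, 0]
X_te = [(0.5, 0.5), (0.2, 0.8), (5.5, 6), (6, 5.5)]
y_te = [0, 0, 1, 1]

mask = ga_instance_selection(X_tr, y_tr, X_te, y_te, pop_size=20, generations=20)
X_sel = [x for x, bit in zip(X_tr, mask) if bit]
y_sel = [t for t, bit in zip(y_tr, mask) if bit]
models = train_bagging(X_sel, y_sel)
```

The toy run uses a smaller population and fewer generations than the paper so that it finishes quickly. Note that the fitness is measured on a held-out test set rather than the training data, which is how the abstract describes guarding against overfitting.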

Analysis of Urban Heat Island Intensity Among Administrative Districts Using GIS and MODIS Imagery (GIS 및 MODIS 영상을 활용한 행정구역별 도시열섬강도 분석)

  • SEO, Kyeong-Ho;PARK, Kyung-Hun
    • Journal of the Korean Association of Geographic Information Studies, v.20 no.2, pp.1-16, 2017
  • This study analyzed the urban heat island (UHI) intensity of South Korea using Moderate Resolution Imaging Spectroradiometer (MODIS) satellite imagery. For this purpose, the metropolitan area was spatially divided into urban and non-urban land according to land cover classification. Analysis of the land surface temperature (LST) of South Korea in the summer of 2009, calculated from MODIS imagery, showed a nationwide maximum of $36.0^{\circ}C$, a minimum of $16.2^{\circ}C$, a mean of $24.3^{\circ}C$, and a standard deviation of $2.4^{\circ}C$. To analyze UHI by city and county, UHI intensity was defined as the difference in average temperature between urban and non-urban land and was calculated by two methods, RST1 and RST2. Areas of high UHI intensity calculated with RST1 were scattered, whereas those calculated with RST2 were concentrated around major cities. To identify an effective method for analyzing UHI by city and county, the correlations of UHI intensity with the urbanization ratio, the number of tropical nights, and the number of heat-wave days were analyzed. UHI intensity derived through RST1 showed almost no correlation, whereas that derived through RST2 showed significant correlation, so RST2 is considered the more suitable method for measuring the UHI of urban land in cities and counties across the country. In cities and counties with an urbanization ratio below 20%, UHI intensity rose very steeply as the urbanization ratio increased, whereas this rate gradually declined once the urbanization ratio exceeded 20%. Each $1^{\circ}C$ increase in RST2 UHI intensity was predicted to increase the number of tropical nights by approximately five and the number of heat-wave days by approximately 0.5. 
These results can be used as a reference when predicting the effects of increased urbanization on UHI intensity.
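The UHI intensity definition in this abstract (mean LST of urban land minus mean LST of non-urban land within a district) can be written as a short function. This is a sketch under assumptions: the pixel values are made up, the abstract does not specify how RST1 and RST2 differ, so only the generic difference is shown, and the two slope constants are just the approximate per-degree increases the abstract reports.

```python
from statistics import mean

# Approximate effects per 1 degree C of RST2 UHI intensity, as reported in the abstract.
TROPICAL_NIGHTS_PER_DEGREE = 5.0
HEAT_WAVE_DAYS_PER_DEGREE = 0.5


def uhi_intensity(lst_values, is_urban):
    """Mean LST of urban pixels minus mean LST of non-urban pixels (degrees C)."""
    urban = [t for t, u in zip(lst_values, is_urban) if u]
    non_urban = [t for t, u in zip(lst_values, is_urban) if not u]
    if not urban or not non_urban:
        raise ValueError("district needs both urban and non-urban pixels")
    return mean(urban) - mean(non_urban)


def predicted_increase(delta_uhi):
    """Extra tropical nights and heat-wave days for a given rise in UHI intensity."""
    return (TROPICAL_NIGHTS_PER_DEGREE * delta_uhi, HEAT_WAVE_DAYS_PER_DEGREE * delta_uhi)


# Hypothetical LST pixels (degrees C) for one district and its urban mask.
lst = [29.0, 28.0, 27.5, 24.0, 23.5, 24.5]
urban_mask = [True, True, True, False, False, False]
intensity = uhi_intensity(lst, urban_mask)  # about 4.17 degrees C
```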

An Comparison Analysis of Science Writing Tasks in the Chemistry Domain of Middle School Science Textbooks Developed under the 2007 & the 2009 Revised National Curriculums (RNC) (2007 개정·2009 개정 중학교 과학 교과서 화학영역에 사용된 과학 글쓰기 문항의 비교 분석)

  • Lee, Gyu Hui;Hong, Hun-Gi
    • Journal of the Korean Chemical Society, v.58 no.6, pp.600-611, 2014
  • In this study, we sampled the science writing tasks in the chemistry domain of two sets of 18 middle school science textbooks, developed under the 2007 Revised National Curriculum (RNC) and the 2009 RNC respectively, and investigated their frequency of use. We also categorized the sampled tasks by cognitive process and type of writing, and compared these results with an analysis of the global issues presented in the tasks. In the textbooks developed under the 2007 RNC, a total of 183 science writing tasks were identified, an average of 10.17 tasks per textbook and 1.32 tasks per 10 pages. In the textbooks for the 2009 RNC, 168 tasks were identified, an average of 9.33 tasks per textbook and 1.23 tasks per 10 pages. The average frequency of tasks therefore decreased both per textbook and per 10 pages. In both curricula, the number of science writing tasks varied considerably across units and publishers, and the tasks were mainly placed in the final, wrap-up stage. By cognitive process, the highest and lowest frequencies were observed in the 'understand' and 'remember' categories, respectively. By type of writing, information-delivery writing was used most, and within it the 'understand' cognitive process showed the highest frequency. The number of science writing tasks including global issues increased from 21 (11.48%) under the 2007 RNC to 33 (19.64%) under the 2009 RNC. 
Furthermore, science writing tasks related to environmental protection showed the highest frequency in both curricula, and the global-issue topics used under the 2009 RNC were much more diverse.
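The per-textbook averages and global-issue shares reported above follow directly from the task counts. A quick arithmetic check, assuming 18 textbooks per curriculum as stated (the per-10-pages figures cannot be reproduced because the page counts are not given):

```python
def per_textbook(total_tasks, n_textbooks=18):
    """Average number of science writing tasks per textbook, rounded to 2 decimals."""
    return round(total_tasks / n_textbooks, 2)


def share_pct(part, total):
    """Percentage of tasks that include global issues, rounded to 2 decimals."""
    return round(100 * part / total, 2)


# 2007 RNC: 183 tasks, 21 with global issues; 2009 RNC: 168 tasks, 33 with global issues.
rnc_2007 = (per_textbook(183), share_pct(21, 183))  # (10.17, 11.48)
rnc_2009 = (per_textbook(168), share_pct(33, 168))  # (9.33, 19.64)
```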

A Study on the Service Quality Improvement by Kano Model & Weighted Potential Customer Satisfaction Index (Kano 모델 및 가중 PCSI를 통한 서비스품질 개선에 관한 연구)

  • Kim, Sang-Cheol
    • Journal of Distribution Science, v.8 no.4, pp.17-23, 2010
  • The banking industry is expanding rapidly. To maintain their competitive advantage, banks concentrate their resources on providing distinctive services by raising service quality. This study examines how three kinds of service quality (process, output, and service environment) affect customer satisfaction. To this end, a weighted potential customer satisfaction improvement index (WPCSI) was developed from the Kano model and the existing potential customer satisfaction improvement index (PCSI). Service quality attributes were classified with Kano's model and customer satisfaction indices were calculated, and WPCSI was then used to complement the limitations of PCSI. PCSI is useful for assessing the level to which service quality should be improved, but it overlooks how important each quality characteristic is to customers. Because customers attach different importance to the services provided, the scope for improvement should be interpreted and classified accordingly. In other words, comparing only the current level of satisfaction with the attainable level does not give a company meaningful information about improvement priorities, and a company that provides uniform service quality without considering what customers regard as important can go wrong. This study therefore proposes a new customer satisfaction index, WPCSI, which weights quality characteristics by the importance customers assign to them, in order to identify practical improvement priorities and provide more useful information than previous indices. 
WPCSI thus takes into account the importance customers attach to each quality factor. The results show that 'employees' working ability', 'providing the desired level of service', and 'staff handling tasks quickly enough' had significant effects on customer satisfaction. On the other hand, 'employees' aggressiveness in describing products' and 'the service environment as a whole being attractive enough' showed no significant difference in satisfaction. However, 'employees' aggressiveness in describing products' was found to be a reverse quality attribute: overly aggressive product promotion can widen customer dissatisfaction, so this quality factor calls for caution. Overall, WPCSI identifies the critical factors affecting customer satisfaction more effectively than PCSI, and the effects and advantages of using WPCSI are then discussed. Along with these contributions, the study has limitations. First, only customers who visited the bank in person were surveyed, so respondents who use Internet or mobile banking were excluded from the analysis. Second, the questionnaire should be improved to help respondents understand the measures. In addition, because the survey sample was concentrated mainly in Seoul, sampling bias may be a problem.
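The Kano classification step can be illustrated with the standard Kano evaluation table, which maps each respondent's answers to the functional and dysfunctional questions onto a category, followed by the widely used Better/Worse satisfaction coefficients. The abstract does not give the PCSI or WPCSI formulas, so `weighted_priority` below is a hypothetical importance weighting for illustration only, not the paper's index.

```python
from collections import Counter

# Standard Kano evaluation table (rows: functional answer, columns: dysfunctional answer).
# A = attractive, O = one-dimensional, M = must-be, I = indifferent,
# R = reverse, Q = questionable.
ANSWERS = ["like", "must-be", "neutral", "live-with", "dislike"]
KANO_TABLE = [
    ["Q", "A", "A", "A", "O"],  # functional answer: like
    ["R", "I", "I", "I", "M"],  # functional answer: must-be
    ["R", "I", "I", "I", "M"],  # functional answer: neutral
    ["R", "I", "I", "I", "M"],  # functional answer: live-with
    ["R", "R", "R", "R", "Q"],  # functional answer: dislike
]


def kano_category(functional, dysfunctional):
    """Classify one respondent's answer pair for one quality attribute."""
    return KANO_TABLE[ANSWERS.index(functional)][ANSWERS.index(dysfunctional)]


def satisfaction_coefficients(responses):
    """Better/Worse coefficients over A, O, M, I answers (R and Q are excluded)."""
    counts = Counter(kano_category(f, d) for f, d in responses)
    a, o, m, i = counts["A"], counts["O"], counts["M"], counts["I"]
    total = a + o + m + i
    if total == 0:
        raise ValueError("no usable (A/O/M/I) responses")
    better = (a + o) / total   # potential gain in satisfaction, 0..1
    worse = -(o + m) / total   # potential loss if the attribute is poor, -1..0
    return better, worse


def weighted_priority(better, worse, importance):
    """HYPOTHETICAL weighting: scale total satisfaction leverage (better + |worse|)
    by customer-rated importance. This is not the paper's WPCSI formula."""
    return importance * (better - worse)


# Hypothetical answers for one attribute, e.g. 'employees' working ability'.
responses = [("like", "dislike"), ("like", "dislike"),
             ("neutral", "dislike"), ("like", "neutral")]
better, worse = satisfaction_coefficients(responses)  # (0.75, -0.75)
```

A 'reverse' (R) result, such as the one reported for aggressive product description, arises when respondents prefer the attribute to be absent.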


Study of Geological Log Database for Public Wells, Jeju Island (제주도 공공 관정 지질주상도 DB 구축 소개)

  • Pak, Song-Hyon;Koh, Giwon;Park, Junbeom;Moon, Dukchul;Yoon, Woo Seok
    • Economic and Environmental Geology, v.48 no.6, pp.509-523, 2015
  • This study introduces a newly implemented geological well-log database for Jeju public water wells, built for a research project on an integrated hydrogeology database of Jeju Island. A detailed analysis of the existing 1,200 geological logs of public wells developed on Jeju Island since 1970 revealed six major issues to be addressed in building the Jeju geological log DB: (1) lack of uniformity in rock-name classification, (2) poor definitions of pyroclastic deposits and of sand and gravel layers, (3) lack of borehole aquifer information, (4) missing information on well-screen installation for many wells, and (5) differences among loggers in geological descriptions. To overcome these issues, a new Jeju geological log DB with standardized input and output formats was implemented by re-establishing the names of Jeju volcanic and sedimentary rocks and by using a commercial geological logging program with a database-based input structure. The new database structure lets users store large amounts of geology, well-drilling, and test data in a standardized input structure, and borehole groundwater and aquifer-test data can be added without modifying the existing database structure. The new geological log DB can therefore serve as a standardized DB for the many existing public wells on Jeju Island and for wells developed there in the future. It will also be a basis for the ongoing project 'Developing GIS-based integrated interpretation system for Jeju Island hydrogeology'.

Optimal Selection of Classifier Ensemble Using Genetic Algorithms (유전자 알고리즘을 이용한 분류자 앙상블의 최적 선택)

  • Kim, Myung-Jong
    • Journal of Intelligence and Information Systems, v.16 no.4, pp.99-112, 2010
  • Ensemble learning is a method for improving the performance of classification and prediction algorithms. It finds a highly accurate classifier on the training set by constructing and combining an ensemble of weak classifiers, each of which needs to be only moderately accurate on the training set. Ensemble learning has received considerable attention in machine learning and artificial intelligence because of its remarkable performance improvements and its flexible integration with traditional learning algorithms such as decision trees (DT), neural networks (NN), and SVM. In this research, DT ensemble studies have consistently demonstrated impressive improvements in the generalization behavior of DT, whereas NN and SVM ensembles have not improved as markedly. Recently, several works have reported that ensemble performance can degrade when the member classifiers are highly correlated with one another, which causes a multicollinearity problem, and they have proposed differentiated learning strategies to cope with this degradation. Hansen and Salamon (1990) argued that it is necessary and sufficient for the performance enhancement of an ensemble that it contain diverse classifiers. Breiman (1996) showed that ensemble learning can improve the performance of unstable learning algorithms but yields no remarkable improvement for stable ones. Unstable learning algorithms such as decision tree learners are sensitive to changes in the training data, so small changes in the data can yield large changes in the generated classifiers. Ensembles of unstable learners can therefore guarantee some diversity among the classifiers. 
By contrast, stable learning algorithms such as NN and SVM generate similar classifiers despite small changes in the training data, so the resulting classifiers are highly correlated. This high correlation causes a multicollinearity problem, which degrades ensemble performance. Kim's work (2009) compared the performance of traditional prediction algorithms such as NN, DT, and SVM for bankruptcy prediction on Korean firms. It reported that the stable NN and SVM are more accurate than the unstable DT, whereas with ensemble learning the DT ensemble improved more than the NN and SVM ensembles. Further variance inflation factor (VIF) analysis empirically showed that the performance degradation of the ensemble is due to multicollinearity, and the work proposed that ensemble optimization is needed to cope with this problem. This paper proposes a hybrid system for coverage optimization of NN ensembles (CO-NN) to improve the performance of NN ensembles. Coverage optimization chooses a sub-ensemble from an original ensemble so as to guarantee classifier diversity. CO-NN uses GA, which has been widely applied to optimization problems, to solve the coverage optimization problem. The GA chromosome is encoded as a binary string in which each bit indicates whether an individual classifier is included. The fitness function maximizes error reduction subject to a constraint on the VIF, a widely used measure of multicollinearity, which ensures classifier diversity by removing highly correlated classifiers. Microsoft Excel and the GA software package Evolver were used. 
Experiments on company failure prediction showed that CO-NN effectively and stably improves NN ensemble performance by selecting classifiers with the correlations of the ensemble taken into account. Classifiers with potential multicollinearity are removed by the coverage optimization process, so CO-NN outperformed a single NN classifier and the NN ensemble at the 1% significance level, and the DT ensemble at the 5% significance level. However, further research issues remain. First, a decision optimization process for finding the optimal combination function should be considered. Second, learning strategies that deal with data noise should be introduced in future research.
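The coverage optimization idea, choosing a sub-ensemble that maximizes accuracy while keeping every member's VIF under a limit, can be sketched as follows. This is an illustration under stated assumptions, not CO-NN itself: it replaces the GA (Evolver) with exhaustive search over subsets, which is only feasible for small ensembles, uses majority voting on thresholded scores as the combination rule, and assumes a VIF limit of 10 (a common rule of thumb). Each VIF is read from the diagonal of the inverse correlation matrix of the classifiers' outputs, which equals $1/(1-R_j^2)$.

```python
from itertools import combinations
from statistics import mean, pstdev


def correlation(x, y):
    """Pearson correlation of two classifiers' output vectors."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) * pstdev(x) * pstdev(y))


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular correlation matrix (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vifs(outputs):
    """VIF of each classifier = diagonal of the inverse correlation matrix."""
    k = len(outputs)
    corr = [[correlation(outputs[i], outputs[j]) for j in range(k)] for i in range(k)]
    inv = invert(corr)
    return [inv[j][j] for j in range(k)]


def vote_accuracy(outputs, labels, members):
    """Accuracy of majority voting over scores thresholded at 0.5."""
    correct = 0
    for n, y in enumerate(labels):
        votes = sum(1 for m in members if outputs[m][n] >= 0.5)
        correct += (1 if 2 * votes > len(members) else 0) == y
    return correct / len(labels)


def select_subensemble(outputs, labels, max_vif=10.0, min_size=3):
    """Exhaustive stand-in for the GA: best accuracy subject to max VIF <= max_vif."""
    best, best_acc = None, -1.0
    for size in range(min_size, len(outputs) + 1):
        for members in combinations(range(len(outputs)), size):
            try:
                if max(vifs([outputs[m] for m in members])) > max_vif:
                    continue
            except (ValueError, ZeroDivisionError):
                continue  # perfectly collinear or constant outputs
            acc = vote_accuracy(outputs, labels, members)
            if acc > best_acc:
                best, best_acc = members, acc
    return best, best_acc


# Hypothetical scores of three classifiers on six firms (1 = failure).
labels = [1, 0, 1, 0, 1, 0]
outputs = [
    [0.9, 0.1, 0.8, 0.2, 0.4, 0.3],
    [0.8, 0.3, 0.3, 0.1, 0.9, 0.2],
    [0.4, 0.2, 0.9, 0.3, 0.7, 0.1],
]
members, acc = select_subensemble(outputs, labels)
```

Each classifier errs on a different firm, so the vote corrects every individual mistake while the pairwise correlations stay moderate; two near-duplicate classifiers would instead push the VIF well above the limit and be rejected.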

Research Trend and Futuristic Guideline of Platform-Based Business in Korea (플랫폼 기반 비즈니스에 대한 국내 연구동향 및 미래를 위한 가이드라인)

  • Namn, Su Hyeon
    • Management & Information Systems Review, v.39 no.1, pp.93-114, 2020
  • Platform business is considered an alternative strategy to the traditional linear, pipeline-based business. Moreover, in the era of the fourth industrial revolution, the efficiency-driven pipeline business model needs to shift to a platform business model. Apple, Google, Amazon, and Uber are well-known platform success stories, but smaller corporations find it difficult to identify a transformation strategy. The essence of platform business is leveraging network effects in management, so platform-based management can be rephrased as network management across business functions. Research on platform business is popular and covers diverse facets, but few scholars have examined the overall research trend of the domain. The main purpose of this paper is to identify the research trend on platform business in Korea. To do so, we first propose an analytical model of platform architecture whose components are consumers, suppliers, artifacts, and the IT platform system. We conjecture that mapping research on platforms to the components of the model will reveal the hidden structure of platform research. We propose three hypotheses regarding the characteristics of the research and one proposition on the transition path from a pipeline to a platform business model. The mapping is based on research articles retrieved from the Korea Citation Index by searching the author-provided keywords for the word "platform". The filtered articles are summarized by attributes such as the major platform component considered, platform type, main research purpose, and research method, and the hypotheses are tested on these data in an exploratory way. The contributions of our research are as follows. First, based on the findings, scholars can identify areas where research has matured and areas where future research is actively needed. 
Second, the proposition can give business practitioners a guideline for shifting their strategy from pipeline-oriented to platform-oriented. This research should be considered exploratory rather than inferential, since subjective judgments are involved in collecting, classifying, and interpreting the research articles.

The effects of the gender and situations on purchase intention (사회심리적 성(gender)과 상황이 구매의도에 미치는 영향)

  • Suh, Mun-Shik;Kim, Dae-Yong;Rho, Tae-Seok
    • Management & Information Systems Review, v.31 no.4, pp.167-195, 2012
  • This study focuses on the socio-psychological gender of consumers, as distinct from their biological gender. Socio-psychological gender is a consumer's self-image related to gender roles. The goals of this study are, first, to examine whether socio-psychological gender has a greater effect on purchase intention than biological gender; second, to examine what kind of shopping value the group with femininity pursues; and finally, to examine differences between men and women in purchase intention across situations. The results are summarized as follows. First, a consumer can have the characteristics of both biological and socio-psychological gender. Socio-psychological gender is classified into masculine, feminine, androgynous, and undifferentiated types, and this study found a relatively large androgynous group, which has both masculinity and femininity. Purchase intention was higher for products that correspond to the consumer's socio-psychological gender than for those that correspond to biological gender. Second, the group with femininity pursued pleasurable shopping more than the group with masculinity, whereas the group with masculinity aimed more strongly at practical shopping. Finally, women's purchase intention was less sensitive to male-dominated situations, whereas men's purchase intention decreased in female-dominated situations.
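The four-way classification of socio-psychological gender is commonly obtained with a median split on masculinity and femininity scale scores. The abstract does not state the scoring procedure, so the median split and the treatment of scores equal to the median as "low" are assumptions in this sketch.

```python
from statistics import median


def classify_gender_roles(scores):
    """Median-split classification (assumed method).

    scores: list of (masculinity, femininity) scale scores, one pair per consumer.
    Scores strictly above the sample median count as "high".
    """
    m_med = median(m for m, _ in scores)
    f_med = median(f for _, f in scores)
    labels = []
    for m, f in scores:
        high_m, high_f = m > m_med, f > f_med
        if high_m and high_f:
            labels.append("androgynous")
        elif high_m:
            labels.append("masculine")
        elif high_f:
            labels.append("feminine")
        else:
            labels.append("undifferentiated")
    return labels


# Hypothetical scale scores for four consumers.
types = classify_gender_roles([(6, 6), (6, 2), (2, 6), (2, 2)])
# ['androgynous', 'masculine', 'feminine', 'undifferentiated']
```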


The Social Influence of the Landscape Architecture Engineer Examination on the Establishment of Authenticity in Landscaping History Department (조경기사 '조경사' 과목이 조경역사학(造景歷史學) 분야의 진정성 확립에 미친 사회적 영향)

  • Lee, Chang-Hun;Shin, Hyun-Sil;Kim, Kyu-Seob;Lee, Won-Ho
    • Journal of the Korean Institute of Traditional Landscape Architecture, v.36 no.3, pp.128-136, 2018
  • This study centers on data from objections filed against "History of Landscape Architecture" questions in the written examination of the national landscape architecture engineer qualification test. Its purpose is to examine the types of social problems that arise in the process of correcting erroneous historical facts, and to find alternatives for developing the field of landscape and cultural history that can assist in verifying the historical facts in landscape architecture examination questions. The main results are as follows. First, analysis of the subject content showed that establishing the concepts of landscape style and form and confirming historical facts are important tasks for the development of the landscape history field. Social consensus among experts appears necessary to make up for the lack of reference data on landscape architectural theory. Second, analysis of the problematic content yielded five types of issues: undefined style and form (52.94%), unproven historical facts (25.13%), unclear era classification (11.77%), lack of specificity (6.95%), and unclear scope of events (3.21%). Third, comparison and analysis of the question items showed that the main causes of the problems were not only a lack of information for learning the theory but also differences in content between books. The issues raised in landscape history questions were related to social phenomena and were classified into cultural and political factors. Fourth, resolving the problematic issues in the landscape history subject of this national technical qualification test has shown positive results. 
The information established in resolving the disputed content can be used directly in the landscaping field, and identifying the types and characteristics of the issues improves the accuracy of the verification process.

The Risk Assessment of the Fire Occurrence According to Urban Facilities in Jinju-si (진주시 도시시설물별 화재발생 위험도 평가)

  • Bae, Gyu Han;Won, Tae Hong;Yoo, Hwan Hee
    • Journal of Korean Society for Geospatial Information Science, v.24 no.1, pp.43-50, 2016
  • Urbanization in Korea has increased significantly, and with a growing urban population, various facilities have rapidly concentrated in urban areas. Accordingly, damage from a variety of disasters has occurred. In particular, among social disasters, fires have caused the most severe damage in urban areas, along with traffic accidents. In 2015, 44,432 fires occurred in Korea, killing 253 people and causing property damage of 4,50 billion won. Despite efforts to reduce such damage, the danger of fire remains high. This study therefore collected data on fires that occurred from 2007 to 2014 from the Jinju Fire Department and the National Fire Data System (NFDS), and calculated fire risk by analyzing the clustering of fire cases and facilities in Jinju-si, based on the facility DB provided by the Ministry of Government Administration and Home Affairs. The risk of fire occurrence was classified into four grades according to the standards of the US Society of Fire Protection Engineers (SFPE). Business, entertainment, and automobile facilities were classified in the highest grade, A; detached houses, apartment houses, education facilities, sales facilities, accommodation, set of facilities, medical facilities, industrial facilities, and life service facilities were classified as grade U; and other facilities were classified as grade EU. Hazardous production facilities were classified as grade BEU, the lowest grade. In addition, when loss of life was used as the criterion, hazardous production facilities posed the highest risk, whereas when property damage was used, set of facilities and industrial facilities showed the highest risk. 
By classifying the fires that occurred in the city according to facility type and calculating risk grades based on their frequency, casualties, and property damage, this study is expected to be useful for establishing fire reduction measures for facilities distributed in urban space.
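The frequency-based grading described above can be sketched as a fire rate per facility-year mapped onto the four grades A, U, EU, and BEU. The abstract does not give the SFPE cut-off values used, so `GRADE_THRESHOLDS` below is a hypothetical placeholder set, and measuring risk as fires per facility per year is likewise an assumption.

```python
def annual_fire_frequency(fires, facilities, years):
    """Fires per facility per year for one facility type (assumed risk metric)."""
    return fires / (facilities * years)


# HYPOTHETICAL lower bounds (fires per facility-year), highest grade first.
GRADE_THRESHOLDS = [(1e-2, "A"), (1e-4, "U"), (1e-6, "EU")]


def risk_grade(freq, thresholds=GRADE_THRESHOLDS):
    """Return the first grade whose lower bound the frequency reaches, else BEU."""
    for lower_bound, grade in thresholds:
        if freq >= lower_bound:
            return grade
    return "BEU"


# Hypothetical counts: 80 fires among 500 facilities over the 8 years 2007-2014.
freq = annual_fire_frequency(80, 500, 8)  # 0.02 fires per facility-year
grade = risk_grade(freq)                  # "A" under the placeholder thresholds
```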