• Title/Summary/Keyword: Logistic Business

Search Result 436, Processing Time 0.024 seconds

A Study on the Revitalization of Tourism Industry through Big Data Analysis (한국관광 실태조사 빅 데이터 분석을 통한 관광산업 활성화 방안 연구)

  • Lee, Jungmi;Liu, Meina;Lim, Gyoo Gun
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.2
    • /
    • pp.149-169
    • /
    • 2018
  • Korea is currently accumulating a large amount of data in public institutions based on the public data open policy and the "Government 3.0". Especially, a lot of data is accumulated in the tourism field. However, the academic discussions utilizing the tourism data are still limited. Moreover, the openness of the data of restaurants, hotels, and online tourism information, and how to use SNS Big Data in tourism are still limited. Therefore, utilization through tourism big data analysis is still low. In this paper, we tried to analyze influencing factors on foreign tourists' satisfaction in Korea through numerical data using data mining technique and R programming technique. In this study, we tried to find ways to revitalize the tourism industry by analyzing about 36,000 big data of the "Survey on the actual situation of foreign tourists from 2013 to 2015" surveyed by the Korea Culture & Tourism Research Institute. To do this, we analyzed the factors that have high influence on the 'Satisfaction', 'Revisit intention', and 'Recommendation' variables of foreign tourists. Furthermore, we analyzed the practical influences of the variables that are mentioned above. As a procedure of this study, we first integrated survey data of foreign tourists conducted by Korea Culture & Tourism Research Institute, which is stored in the tourist information system from 2013 to 2015, and eliminate unnecessary variables that are inconsistent with the research purpose among the integrated data. Some variables were modified to improve the accuracy of the analysis. And we analyzed the factors affecting the dependent variables by using data-mining methods: decision tree(C5.0, CART, CHAID, QUEST), artificial neural network, and logistic regression analysis of SPSS IBM Modeler 16.0. The seven variables that have the greatest effect on each dependent variable were derived. As a result of data analysis, it was found that seven major variables influencing 'overall satisfaction' were sightseeing spot attraction, food satisfaction, accommodation satisfaction, traffic satisfaction, guide service satisfaction, number of visiting places, and country. Variables that had a great influence appeared food satisfaction and sightseeing spot attraction. The seven variables that had the greatest influence on 'revisit intention' were the country, travel motivation, activity, food satisfaction, best activity, guide service satisfaction and sightseeing spot attraction. The most influential variables were food satisfaction and travel motivation for Korean style. Lastly, the seven variables that have the greatest influence on the 'recommendation intention' were the country, sightseeing spot attraction, number of visiting places, food satisfaction, activity, tour guide service satisfaction and cost. And then the variables that had the greatest influence were the country, sightseeing spot attraction, and food satisfaction. In addition, in order to grasp the influence of each independent variables more deeply, we used R programming to identify the influence of independent variables. As a result, it was found that the food satisfaction and sightseeing spot attraction were higher than other variables in overall satisfaction and had a greater effect than other influential variables. Revisit intention had a higher ${\beta}$ value in the travel motive as the purpose of Korean Wave than other variables. It will be necessary to have a policy that will lead to a substantial revisit of tourists by enhancing tourist attractions for the purpose of Korean Wave. Lastly, the recommendation had the same result of satisfaction as the sightseeing spot attraction and food satisfaction have higher ${\beta}$ value than other variables. From this analysis, we found that 'food satisfaction' and 'sightseeing spot attraction' variables were the common factors to influence three dependent variables that are mentioned above('Overall satisfaction', 'Revisit intention' and 'Recommendation'), and that those factors affected the satisfaction of travel in Korea significantly. The purpose of this study is to examine how to activate foreign tourists in Korea through big data analysis. It is expected to be used as basic data for analyzing tourism data and establishing effective tourism policy. It is expected to be used as a material to establish an activation plan that can contribute to tourism development in Korea in the future.

A Time Series Graph based Convolutional Neural Network Model for Effective Input Variable Pattern Learning : Application to the Prediction of Stock Market (효과적인 입력변수 패턴 학습을 위한 시계열 그래프 기반 합성곱 신경망 모형: 주식시장 예측에의 응용)

  • Lee, Mo-Se;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.1
    • /
    • pp.167-181
    • /
    • 2018
  • Over the past decade, deep learning has been in spotlight among various machine learning algorithms. In particular, CNN(Convolutional Neural Network), which is known as the effective solution for recognizing and classifying images or voices, has been popularly applied to classification and prediction problems. In this study, we investigate the way to apply CNN in business problem solving. Specifically, this study propose to apply CNN to stock market prediction, one of the most challenging tasks in the machine learning research. As mentioned, CNN has strength in interpreting images. Thus, the model proposed in this study adopts CNN as the binary classifier that predicts stock market direction (upward or downward) by using time series graphs as its inputs. That is, our proposal is to build a machine learning algorithm that mimics an experts called 'technical analysts' who examine the graph of past price movement, and predict future financial price movements. Our proposed model named 'CNN-FG(Convolutional Neural Network using Fluctuation Graph)' consists of five steps. In the first step, it divides the dataset into the intervals of 5 days. And then, it creates time series graphs for the divided dataset in step 2. The size of the image in which the graph is drawn is $40(pixels){\times}40(pixels)$, and the graph of each independent variable was drawn using different colors. In step 3, the model converts the images into the matrices. Each image is converted into the combination of three matrices in order to express the value of the color using R(red), G(green), and B(blue) scale. In the next step, it splits the dataset of the graph images into training and validation datasets. We used 80% of the total dataset as the training dataset, and the remaining 20% as the validation dataset. And then, CNN classifiers are trained using the images of training dataset in the final step. Regarding the parameters of CNN-FG, we adopted two convolution filters ($5{\times}5{\times}6$ and $5{\times}5{\times}9$) in the convolution layer. In the pooling layer, $2{\times}2$ max pooling filter was used. The numbers of the nodes in two hidden layers were set to, respectively, 900 and 32, and the number of the nodes in the output layer was set to 2(one is for the prediction of upward trend, and the other one is for downward trend). Activation functions for the convolution layer and the hidden layer were set to ReLU(Rectified Linear Unit), and one for the output layer set to Softmax function. To validate our model - CNN-FG, we applied it to the prediction of KOSPI200 for 2,026 days in eight years (from 2009 to 2016). To match the proportions of the two groups in the independent variable (i.e. tomorrow's stock market movement), we selected 1,950 samples by applying random sampling. Finally, we built the training dataset using 80% of the total dataset (1,560 samples), and the validation dataset using 20% (390 samples). The dependent variables of the experimental dataset included twelve technical indicators popularly been used in the previous studies. They include Stochastic %K, Stochastic %D, Momentum, ROC(rate of change), LW %R(Larry William's %R), A/D oscillator(accumulation/distribution oscillator), OSCP(price oscillator), CCI(commodity channel index), and so on. To confirm the superiority of CNN-FG, we compared its prediction accuracy with the ones of other classification models. Experimental results showed that CNN-FG outperforms LOGIT(logistic regression), ANN(artificial neural network), and SVM(support vector machine) with the statistical significance. These empirical results imply that converting time series business data into graphs and building CNN-based classification models using these graphs can be effective from the perspective of prediction accuracy. Thus, this paper sheds a light on how to apply deep learning techniques to the domain of business problem solving.

Suggestion of Urban Regeneration Type Recommendation System Based on Local Characteristics Using Text Mining (텍스트 마이닝을 활용한 지역 특성 기반 도시재생 유형 추천 시스템 제안)

  • Kim, Ikjun;Lee, Junho;Kim, Hyomin;Kang, Juyoung
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.3
    • /
    • pp.149-169
    • /
    • 2020
  • "The Urban Renewal New Deal project", one of the government's major national projects, is about developing underdeveloped areas by investing 50 trillion won in 100 locations on the first year and 500 over the next four years. This project is drawing keen attention from the media and local governments. However, the project model which fails to reflect the original characteristics of the area as it divides project area into five categories: "Our Neighborhood Restoration, Housing Maintenance Support Type, General Neighborhood Type, Central Urban Type, and Economic Base Type," According to keywords for successful urban regeneration in Korea, "resident participation," "regional specialization," "ministerial cooperation" and "public-private cooperation", when local governments propose urban regeneration projects to the government, they can see that it is most important to accurately understand the characteristics of the city and push ahead with the projects in a way that suits the characteristics of the city with the help of local residents and private companies. In addition, considering the gentrification problem, which is one of the side effects of urban regeneration projects, it is important to select and implement urban regeneration types suitable for the characteristics of the area. In order to supplement the limitations of the 'Urban Regeneration New Deal Project' methodology, this study aims to propose a system that recommends urban regeneration types suitable for urban regeneration sites by utilizing various machine learning algorithms, referring to the urban regeneration types of the '2025 Seoul Metropolitan Government Urban Regeneration Strategy Plan' promoted based on regional characteristics. There are four types of urban regeneration in Seoul: "Low-use Low-Level Development, Abandonment, Deteriorated Housing, and Specialization of Historical and Cultural Resources" (Shon and Park, 2017). In order to identify regional characteristics, approximately 100,000 text data were collected for 22 regions where the project was carried out for a total of four types of urban regeneration. Using the collected data, we drew key keywords for each region according to the type of urban regeneration and conducted topic modeling to explore whether there were differences between types. As a result, it was confirmed that a number of topics related to real estate and economy appeared in old residential areas, and in the case of declining and underdeveloped areas, topics reflecting the characteristics of areas where industrial activities were active in the past appeared. In the case of the historical and cultural resource area, since it is an area that contains traces of the past, many keywords related to the government appeared. Therefore, it was possible to confirm political topics and cultural topics resulting from various events. Finally, in the case of low-use and under-developed areas, many topics on real estate and accessibility are emerging, so accessibility is good. It mainly had the characteristics of a region where development is planned or is likely to be developed. Furthermore, a model was implemented that proposes urban regeneration types tailored to regional characteristics for regions other than Seoul. Machine learning technology was used to implement the model, and training data and test data were randomly extracted at an 8:2 ratio and used. In order to compare the performance between various models, the input variables are set in two ways: Count Vector and TF-IDF Vector, and as Classifier, there are 5 types of SVM (Support Vector Machine), Decision Tree, Random Forest, Logistic Regression, and Gradient Boosting. By applying it, performance comparison for a total of 10 models was conducted. The model with the highest performance was the Gradient Boosting method using TF-IDF Vector input data, and the accuracy was 97%. Therefore, the recommendation system proposed in this study is expected to recommend urban regeneration types based on the regional characteristics of new business sites in the process of carrying out urban regeneration projects."

Comparison of Deep Learning Frameworks: About Theano, Tensorflow, and Cognitive Toolkit (딥러닝 프레임워크의 비교: 티아노, 텐서플로, CNTK를 중심으로)

  • Chung, Yeojin;Ahn, SungMahn;Yang, Jiheon;Lee, Jaejoon
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.2
    • /
    • pp.1-17
    • /
    • 2017
  • The deep learning framework is software designed to help develop deep learning models. Some of its important functions include "automatic differentiation" and "utilization of GPU". The list of popular deep learning framework includes Caffe (BVLC) and Theano (University of Montreal). And recently, Microsoft's deep learning framework, Microsoft Cognitive Toolkit, was released as open-source license, following Google's Tensorflow a year earlier. The early deep learning frameworks have been developed mainly for research at universities. Beginning with the inception of Tensorflow, however, it seems that companies such as Microsoft and Facebook have started to join the competition of framework development. Given the trend, Google and other companies are expected to continue investing in the deep learning framework to bring forward the initiative in the artificial intelligence business. From this point of view, we think it is a good time to compare some of deep learning frameworks. So we compare three deep learning frameworks which can be used as a Python library. Those are Google's Tensorflow, Microsoft's CNTK, and Theano which is sort of a predecessor of the preceding two. The most common and important function of deep learning frameworks is the ability to perform automatic differentiation. Basically all the mathematical expressions of deep learning models can be represented as computational graphs, which consist of nodes and edges. Partial derivatives on each edge of a computational graph can then be obtained. With the partial derivatives, we can let software compute differentiation of any node with respect to any variable by utilizing chain rule of Calculus. First of all, the convenience of coding is in the order of CNTK, Tensorflow, and Theano. The criterion is simply based on the lengths of the codes and the learning curve and the ease of coding are not the main concern. According to the criteria, Theano was the most difficult to implement with, and CNTK and Tensorflow were somewhat easier. With Tensorflow, we need to define weight variables and biases explicitly. The reason that CNTK and Tensorflow are easier to implement with is that those frameworks provide us with more abstraction than Theano. We, however, need to mention that low-level coding is not always bad. It gives us flexibility of coding. With the low-level coding such as in Theano, we can implement and test any new deep learning models or any new search methods that we can think of. The assessment of the execution speed of each framework is that there is not meaningful difference. According to the experiment, execution speeds of Theano and Tensorflow are very similar, although the experiment was limited to a CNN model. In the case of CNTK, the experimental environment was not maintained as the same. The code written in CNTK has to be run in PC environment without GPU where codes execute as much as 50 times slower than with GPU. But we concluded that the difference of execution speed was within the range of variation caused by the different hardware setup. In this study, we compared three types of deep learning framework: Theano, Tensorflow, and CNTK. According to Wikipedia, there are 12 available deep learning frameworks. And 15 different attributes differentiate each framework. Some of the important attributes would include interface language (Python, C ++, Java, etc.) and the availability of libraries on various deep learning models such as CNN, RNN, DBN, and etc. And if a user implements a large scale deep learning model, it will also be important to support multiple GPU or multiple servers. Also, if you are learning the deep learning model, it would also be important if there are enough examples and references.

Drug Abuse Status and Its Determinants of Male High School Students in Taegu (대구시(大邱市) 일부(一部) 남자고등학생(男子高等學生)의 약물남용(藥物濫用) 실태(實態)와 관련요인(關聯要因))

  • Nam, Jung-Rak;Kam, Sin;Park, Jae-Yong;Han, Chang-Hyun;Ha, Young-Ae
    • Journal of Preventive Medicine and Public Health
    • /
    • v.29 no.3 s.54
    • /
    • pp.451-469
    • /
    • 1996
  • To identify the drug abuse status and its determinant factors in high school boys in Taegu, the study was performed from April to May, 1995. Study population were selected by cluster sampling method and total 5,665 students replied to the self-administered questionnaire survey (2,207 in academic high school, 3,458 in business high school). The major findings were as follows; The proportion of drinking, smoking experience was 55.0%, 45.8%, respectively, and the proportion of current drinker, current smoker was 27.2%, 27.5%. The drinking, smoking experience rate of second grade students was higher than first grade and it was higher in business high school boys. The proportion of a stimulant, a hallucinogen, hemp leaf cigarets experience was 3.2%, 1.6%, 0.1%, respectively. Drug abuse had significant association with home environment(lower economic status, frequent move, death of father or mother, apart from family), parents environment(parents' indifference, parents' drinking and smoking, etc.), school life(lower school grades, intimate friend's drug abuse, etc.), generous attitude to drug abuse, higher level of stress. Students who replied that the law prohibited immature person(students) from drinking and smoking showed lower drug abuse rate. In multiple logistic regression analysis, second grade students, business high school students, parents' indifference, lower school grades, intimate friend's drug abuse, no recognition of the fact that the law prohibits high school students from drinking and smoking, generous attitude to drug abuse, higher level of stress were significantly related with alcohol abuse and smoking. Other drugs abuse were related with above factors. On consideration of above findings, to prevent students from drug abuse, we have to try together in house, school, and society.

  • PDF

Analysis to Determine the Employment Status of Married Women's on the Social Factors Associated (기혼여성의 고용지위 결정요인에 관련한 사회변인 분석)

  • Hwang, Hee-Sook;Kim, Youn-Jae;Park, Jung-Woo
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship
    • /
    • v.7 no.3
    • /
    • pp.181-190
    • /
    • 2012
  • After industrialization, the labor force participation rates of women, especially married women is drastically increasing. So, this study was designed to analyze the determinants of married women's employment status considered. For this, the determinants of married women's employment status were divided into individual-related, children-related, household-related and job-related variables to establish the research models. Based on this, the following results were drawn from a multinominal logistic regression analysis of the determinants of married women's employment status. First, an analysis of individual-related variable showed that married women had the employment status of labor wages with residence in the center of the city and high academic background. Second, an analysis of children-related variable showed that they had the employment status of labor wages with many their children and no their children under the age of six. Third, an analysis of household-related variable showed that they had the self-employment status of labor wages with nuclear family and few income earners of family members. Finally, an analysis of job-related variable showed that they had the employment status of labor wages when they got a job before they got married, their husband didn't get a job, and their husband worked in a professional field. As for findings stated above, as there was a difference in the determinants of married women's employment status, the ways for improvement in the married women's employment status would be suggested as follows. First, married women with young children have the low employment status, basically, requiring problem-solving ways for this because the housekeeping and child-rearing burden caused by marriage and childbirth are factors that continue to obstruct a job. For this, the flexible working hours system, which housekeeping and child-rearing can harmonize with economic activities like developed countries, needs to be activated. But the activation of such flexible working system will produce actual results under institutional protection, such as a related-protection law. Second, the Leave of Child Care System is debated as one of the most representatively systems that housekeeping can harmonize with economic activities. Now, although the System is legislated, the use is very poor.

  • PDF

Stock-Index Invest Model Using News Big Data Opinion Mining (뉴스와 주가 : 빅데이터 감성분석을 통한 지능형 투자의사결정모형)

  • Kim, Yoo-Sin;Kim, Nam-Gyu;Jeong, Seung-Ryul
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.2
    • /
    • pp.143-156
    • /
    • 2012
  • People easily believe that news and stock index are closely related. They think that securing news before anyone else can help them forecast the stock prices and enjoy great profit, or perhaps capture the investment opportunity. However, it is no easy feat to determine to what extent the two are related, come up with the investment decision based on news, or find out such investment information is valid. If the significance of news and its impact on the stock market are analyzed, it will be possible to extract the information that can assist the investment decisions. The reality however is that the world is inundated with a massive wave of news in real time. And news is not patterned text. This study suggests the stock-index invest model based on "News Big Data" opinion mining that systematically collects, categorizes and analyzes the news and creates investment information. To verify the validity of the model, the relationship between the result of news opinion mining and stock-index was empirically analyzed by using statistics. Steps in the mining that converts news into information for investment decision making, are as follows. First, it is indexing information of news after getting a supply of news from news provider that collects news on real-time basis. Not only contents of news but also various information such as media, time, and news type and so on are collected and classified, and then are reworked as variable from which investment decision making can be inferred. Next step is to derive word that can judge polarity by separating text of news contents into morpheme, and to tag positive/negative polarity of each word by comparing this with sentimental dictionary. Third, positive/negative polarity of news is judged by using indexed classification information and scoring rule, and then final investment decision making information is derived according to daily scoring criteria. For this study, KOSPI index and its fluctuation range has been collected for 63 days that stock market was open during 3 months from July 2011 to September in Korea Exchange, and news data was collected by parsing 766 articles of economic news media M company on web page among article carried on stock information>news>main news of portal site Naver.com. In change of the price index of stocks during 3 months, it rose on 33 days and fell on 30 days, and news contents included 197 news articles before opening of stock market, 385 news articles during the session, 184 news articles after closing of market. Results of mining of collected news contents and of comparison with stock price showed that positive/negative opinion of news contents had significant relation with stock price, and change of the price index of stocks could be better explained in case of applying news opinion by deriving in positive/negative ratio instead of judging between simplified positive and negative opinion. And in order to check whether news had an effect on fluctuation of stock price, or at least went ahead of fluctuation of stock price, in the results that change of stock price was compared only with news happening before opening of stock market, it was verified to be statistically significant as well. In addition, because news contained various type and information such as social, economic, and overseas news, and corporate earnings, the present condition of type of industry, market outlook, the present condition of market and so on, it was expected that influence on stock market or significance of the relation would be different according to the type of news, and therefore each type of news was compared with fluctuation of stock price, and the results showed that market condition, outlook, and overseas news was the most useful to explain fluctuation of news. On the contrary, news about individual company was not statistically significant, but opinion mining value showed tendency opposite to stock price, and the reason can be thought to be the appearance of promotional and planned news for preventing stock price from falling. Finally, multiple regression analysis and logistic regression analysis was carried out in order to derive function of investment decision making on the basis of relation between positive/negative opinion of news and stock price, and the results showed that regression equation using variable of market conditions, outlook, and overseas news before opening of stock market was statistically significant, and classification accuracy of logistic regression accuracy results was shown to be 70.0% in rise of stock price, 78.8% in fall of stock price, and 74.6% on average. This study first analyzed relation between news and stock price through analyzing and quantifying sensitivity of atypical news contents by using opinion mining among big data analysis techniques, and furthermore, proposed and verified smart investment decision making model that could systematically carry out opinion mining and derive and support investment information. This shows that news can be used as variable to predict the price index of stocks for investment, and it is expected the model can be used as real investment support system if it is implemented as system and verified in the future.

The Comparison of Basic Science Research Capacity of OECD Countries

  • Lim, Yang-Taek;Song, Choong-Han
    • Journal of Technology Innovation
    • /
    • v.11 no.1
    • /
    • pp.147-176
    • /
    • 2003
  • This Paper Presents a new measurement technique to derive the level of BSRC (Basic Science and Research Capacity) index by use of the factor analysis which is extended with the assumption of the standard normal probability distribution of the selected explanatory variables. The new measurement method is used to forecast the gap of Korea's BSRC level compared with those of major OECD countries in terms of time lag and to make their international comparison during the time period of 1981∼1999, based on the assumption that the BSRC progress function of each country takes the form of the logistic curve. The US BSRC index is estimated to be 0.9878 in 1981, 0.9996 in 1990 and 0.99991 in 1999, taking the 1st place. The US BSRC level has been consistently the top among the 16 selected variables, followed by Japan, Germany, France and the United Kingdom, in order. Korea's BSRC is estimated to be 0.2293 in 1981, taking the lowest place among the 16 OECD countries. However, Korea's BSRC indices are estimated to have been increased to 0.3216 (in 1990) and 0.44652 (in 1999) respectively, taking 10th place. Meanwhile, Korea's BSRC level in 1999 (0.44652) is estimated to reach those of the US and Japan in 2233 and 2101, respectively. This means that Korea falls 234 years behind USA and 102 years behind Japan, respectively. Korea is also estimated to lag 34 years behind Germany, 16 years behind France and the UK, 15 years behind Sweden, 11 years behind Canada, 7 years behind Finland, and 5 years behind the Netherlands. For the period of 1981∼1999, the BSRC development speed of the US is estimated to be 0.29700. Its rank is the top among the selected OECD countries, followed by Japan (0.12800), Korea (0.04443), and Germany (0.04029). the US BSRC development speed (0.2970) is estimated to be 2.3 times higher than that of Japan (0.1280), and 6.7 times higher than that of Korea. German BSRC development speed (0.04029) is estimated to be fastest in Europe, but it is 7.4 times slower than that of the US. The estimated BSRC development speeds of Belgium, Finland, Italy, Denmark and the UK stand between 0.01 and 0.02, which are very slow. Particularly, the BSRC development speed of Spain is estimated to be minus 0.0065, staying at the almost same level of BSRC over time (1981 ∼ 1999). Since Korea shows BSRC development speed much slower than those of the US and Japan but relative]y faster than those of other countries, the gaps in BSRC level between Korea and the other countries may get considerably narrower or even Korea will surpass possibly several countries in BSRC level, as time goes by. Korea's BSRC level had taken 10th place till 1993. However, it is estimated to be 6th place in 2010 by catching up the UK, Sweden, Finland and Holland, and 4th place in 2020 by catching up France and Canada. The empirical results are consistent with OECD (2001a)'s computation that Korea had the highest R&D expenditures growth during 1991∼1999 among all OECD countries ; and the value-added of ICT industries in total business sectors value added is 12% in Korea, but only 8% in Japan. And OECD (2001b) observed that Korea, together with the US, Sweden, and Finland, are already the four most knowledge-based countries. Hence, the rank of the knowledge-based country was measured by investment in knowledge which is defined as public and private spending on higher education, expenditures on R&D and investment in software.

  • PDF

Investigating Dynamic Mutation Process of Issues Using Unstructured Text Analysis (부도예측을 위한 KNN 앙상블 모형의 동시 최적화)

  • Min, Sung-Hwan
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.1
    • /
    • pp.139-157
    • /
    • 2016
  • Bankruptcy involves considerable costs, so it can have significant effects on a country's economy. Thus, bankruptcy prediction is an important issue. Over the past several decades, many researchers have addressed topics associated with bankruptcy prediction. Early research on bankruptcy prediction employed conventional statistical methods such as univariate analysis, discriminant analysis, multiple regression, and logistic regression. Later on, many studies began utilizing artificial intelligence techniques such as inductive learning, neural networks, and case-based reasoning. Currently, ensemble models are being utilized to enhance the accuracy of bankruptcy prediction. Ensemble classification involves combining multiple classifiers to obtain more accurate predictions than those obtained using individual models. Ensemble learning techniques are known to be very useful for improving the generalization ability of the classifier. Base classifiers in the ensemble must be as accurate and diverse as possible in order to enhance the generalization ability of an ensemble model. Commonly used methods for constructing ensemble classifiers include bagging, boosting, and random subspace. The random subspace method selects a random feature subset for each classifier from the original feature space to diversify the base classifiers of an ensemble. Each ensemble member is trained by a randomly chosen feature subspace from the original feature set, and predictions from each ensemble member are combined by an aggregation method. The k-nearest neighbors (KNN) classifier is robust with respect to variations in the dataset but is very sensitive to changes in the feature space. For this reason, KNN is a good classifier for the random subspace method. The KNN random subspace ensemble model has been shown to be very effective for improving an individual KNN model. The k parameter of KNN base classifiers and selected feature subsets for base classifiers play an important role in determining the performance of the KNN ensemble model. However, few studies have focused on optimizing the k parameter and feature subsets of base classifiers in the ensemble. This study proposed a new ensemble method that improves upon the performance KNN ensemble model by optimizing both k parameters and feature subsets of base classifiers. A genetic algorithm was used to optimize the KNN ensemble model and improve the prediction accuracy of the ensemble model. The proposed model was applied to a bankruptcy prediction problem by using a real dataset from Korean companies. The research data included 1800 externally non-audited firms that filed for bankruptcy (900 cases) or non-bankruptcy (900 cases). Initially, the dataset consisted of 134 financial ratios. Prior to the experiments, 75 financial ratios were selected based on an independent sample t-test of each financial ratio as an input variable and bankruptcy or non-bankruptcy as an output variable. Of these, 24 financial ratios were selected by using a logistic regression backward feature selection method. The complete dataset was separated into two parts: training and validation. The training dataset was further divided into two portions: one for the training model and the other to avoid overfitting. The prediction accuracy against this dataset was used to determine the fitness value in order to avoid overfitting. The validation dataset was used to evaluate the effectiveness of the final model. A 10-fold cross-validation was implemented to compare the performances of the proposed model and other models. To evaluate the effectiveness of the proposed model, the classification accuracy of the proposed model was compared with that of other models. The Q-statistic values and average classification accuracies of base classifiers were investigated. The experimental results showed that the proposed model outperformed other models, such as the single model and random subspace ensemble model.

Financial Characteristics Affecting the Accounting Choices of Capitalized Interest Costs (기업의 재무적 특성이 금융비용 자본화의 회계선택에 미치는 영향)

  • Park, Hee-Woo;Shin, Hyun-Geol
    • 한국산학경영학회:학술대회논문집
    • /
    • 2004.11a
    • /
    • pp.55-72
    • /
    • 2004
  • Before 2003 the companies In Korea should capitalize the interest expenses that are attributable to the acquisition, construction or production of a qualifying assets. However, according to the revised standard which should be applied from 2003, the companies can either capitalize the interest expenses or recognize as an expense when they are incurred. Therefore almost all the companies confronted with the decision making of accounting choices on the interest capitalization. This paper empirically examines which financial characteristics of the companies affect the accounting choice by using logistic regression model and reviews the sufficiency of the foot notes disclosures regarding the capitalized interest. The variables of the financial characteristics are change of debt-equity ratio, borrowing ratio, qualifying assets ratio, firm sire and income smoothing. The results of this study are summarized as follows. First, among the financial characteristics, only qualifying asset ratio has the significant difference between capitalized companies and expensing companies. Second, the results of logistic regression indicate that qualifying asset ratio and firm size have the significant influence on the accounting choices. Therefore, I cannot find the evidence supporting that the companies use the accounting choice to manage the financial ratios.

  • PDF