• Title/Summary/Keyword: Predictive

Bankruptcy Prediction Modeling Using Qualitative Information Based on Big Data Analytics (빅데이터 기반의 정성 정보를 활용한 부도 예측 모형 구축)

  • Jo, Nam-ok;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems, v.22 no.2, pp.33-56, 2016
  • Many researchers have focused on developing bankruptcy prediction models with improved performance, using statistical methods such as multiple discriminant analysis (MDA) and logit analysis, or artificial intelligence techniques such as artificial neural networks (ANN), decision trees, and support vector machines (SVM). Most bankruptcy prediction models in academic studies have used financial ratios as their main input variables. The bankruptcy of a firm is associated with both its financial state and the external economic situation. However, the inclusion of qualitative information, such as the economic atmosphere, has not been actively discussed, even though relying only on financial ratios has drawbacks. Accounting information, such as financial ratios, is based on past data and is usually determined one year before bankruptcy. Thus, a time lag exists between the closing of financial statements and the point of credit evaluation. In addition, financial ratios do not capture environmental factors, such as external economic conditions. Using only financial ratios may therefore be insufficient for constructing a bankruptcy prediction model, because they essentially reflect past internal accounting information and neglect recent information. Qualitative information should thus be added to the conventional bankruptcy prediction model to supplement accounting information. Because an analytic mechanism for obtaining and processing qualitative information from various sources has been lacking, previous studies have made only limited use of qualitative information. Recently, however, big data analytics, such as text mining techniques, has drawn much attention in academia and industry as the amount of unstructured text data available on the web has grown. A few previous studies have sought to adopt big data analytics in business prediction modeling. 
Nevertheless, the use of qualitative information on the web for business prediction modeling is still at an early stage and restricted to a few applications, such as stock prediction and movie revenue prediction. Thus, it is necessary to apply big data analytics techniques, such as text mining, to various business prediction problems, including credit risk evaluation. Because unstructured text data is complex to manage and process, analytic methods are required for handling qualitative information represented in that form. This study proposes a bankruptcy prediction model for Korean small- and medium-sized construction firms using both quantitative information, such as financial ratios, and qualitative information acquired from economic news articles. The performance of the proposed method depends on how well qualitative information is transformed into quantitative information suitable for incorporation into the bankruptcy prediction model. We employ big data analytics techniques, especially text mining, as a mechanism for processing qualitative information. A sentiment index is computed at the industry level from a large amount of text data to quantify the external economic atmosphere represented in the media. The proposed method involves keyword-based sentiment analysis using a domain-specific sentiment lexicon to extract sentiment from economic news articles. The generated sentiment lexicon is designed to represent sentiment for the construction business by considering the relationship between an occurring term and the actual economic condition of the industry rather than the inherent semantics of the term. The experimental results showed that incorporating qualitative information based on big data analytics into the traditional bankruptcy prediction model based on accounting information is effective for enhancing predictive performance. 
The sentiment variable extracted from economic news articles had an impact on corporate bankruptcy. In particular, a negative sentiment variable improved the accuracy of corporate bankruptcy prediction because the corporate bankruptcy of construction firms is sensitive to poor economic conditions. The bankruptcy prediction model using qualitative information based on big data analytics contributes to the field, in that it reflects not only relatively recent information but also environmental factors, such as external economic conditions.
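The keyword-based sentiment step described in this abstract can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration only: the lexicon entries, the tokenization, the function names, and the choice to express sentiment as positive and negative shares are not taken from the study.

```python
from collections import Counter

# Hypothetical domain lexicon: term -> polarity (+1 positive, -1 negative).
# The entries are invented; the study builds its own construction-industry lexicon.
LEXICON = {"recovery": 1, "growth": 1, "orders": 1,
           "default": -1, "slump": -1, "unsold": -1}

def sentiment_counts(articles):
    """Count positive and negative lexicon hits across a list of article strings."""
    counts = Counter()
    for text in articles:
        for token in text.lower().split():
            polarity = LEXICON.get(token.strip(".,;:!?\"'()"))
            if polarity == 1:
                counts["pos"] += 1
            elif polarity == -1:
                counts["neg"] += 1
    return counts

def sentiment_features(articles):
    """Return positive and negative hit shares (assumed feature form)."""
    counts = sentiment_counts(articles)
    total = counts["pos"] + counts["neg"]
    if total == 0:
        return {"pos_share": 0.0, "neg_share": 0.0}
    return {"pos_share": counts["pos"] / total,
            "neg_share": counts["neg"] / total}

# Invented example headlines.
articles = ["Housing slump deepens as unsold units rise.",
            "Builders report growth in overseas orders."]
features = sentiment_features(articles)
```

Features of this kind could be appended to the financial ratios of each firm before fitting a classifier; how the study actually combines them is not stated in the abstract.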

The clinical utility of K-CBCL 6-18 in diagnosing ADHD -focused on children with psychological disorders in child welfare institution- (ADHD 진단에서 K-CBCL 6-18의 임상적 유용성 -아동복지시설 심리장애 아동에의 적용-)

  • Kim, Sang A;Ha, Eun Hye
    • Journal of the Korean Society of Child Welfare, no.56, pp.253-281, 2016
  • The purpose of this study was to verify the clinical utility of the Korean Child Behavior Checklist 6-18 (K-CBCL 6-18) in diagnosing ADHD among children with psychological disorders in child welfare institutions. The participants were 509 elementary school children (309 boys and 200 girls) who lived in child welfare institutions. They were assessed using the Korean ADHD Rating Scale (K-ARS) and the K-CBCL 6-18. Only the five K-CBCL 6-18 scales related to attention were used for analysis: syndrome total, externalizing total, aggressive behavior, attention problems, and DSM-oriented ADHD. The results were as follows. First, the K-ARS had significantly positive correlations with all five K-CBCL 6-18 scales. Second, in a t-test comparing the ADHD and non-ADHD groups, which were divided using the K-ARS, the mean scores of the ADHD group were significantly higher than those of the non-ADHD group on all five K-CBCL 6-18 scales. The hit rate of all five scales was 60 to 70 percent. The syndrome total and externalizing total scales had high sensitivity, whereas the aggressive behavior, attention problems, and DSM-oriented ADHD scales had high specificity. In addition, all scales had high positive predictive values. Third, in a t-test comparing the ADHD group and the emotional disorder group, there were significant differences in the mean scores of the attention problems and DSM-oriented ADHD scales. These two scales had similar hit rates, high specificity, and low sensitivity. In particular, the DSM-oriented ADHD scale showed higher specificity than the attention problems scale. The results of this study suggest that the five attention-related scales of the K-CBCL 6-18 are useful in diagnosing ADHD in child welfare institutions.
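The sensitivity, specificity, positive predictive value, and hit rate reported above all derive from a 2x2 table of screening result against reference classification. The sketch below shows the standard formulas; the function name and the counts are invented for illustration, and treating "hit rate" as overall classification accuracy is an assumption.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Screening metrics from a 2x2 table (reference group vs. scale cut-off)."""
    return {
        "sensitivity": tp / (tp + fn),   # true cases correctly flagged
        "specificity": tn / (tn + fp),   # non-cases correctly cleared
        "ppv": tp / (tp + fp),           # flagged cases that are true cases
        "hit_rate": (tp + tn) / (tp + fp + fn + tn),  # assumed: overall accuracy
    }

# Invented counts, not taken from the study.
m = diagnostic_metrics(tp=40, fp=10, fn=20, tn=30)
```

With these invented counts, a scale can show a high positive predictive value while its sensitivity stays modest, which is the kind of trade-off the abstract describes across scales.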

Distribution and Potential Suitable Habitats of an Endemic Plant, Sophora koreensis in Korea (MaxEnt 분석을 통한 한반도 특산식물 개느삼 서식 가능지역 분석)

  • An, Jong-Bin;Sung, Chan Yong;Moon, Ae-Ra;Kim, Sodam;Jung, Ji-Young;Son, Sungwon;Shin, Hyun-Tak;Park, Wan-Geun
    • Korean Journal of Environment and Ecology, v.35 no.2, pp.154-163, 2021
  • This study presents the current habitat distribution and the predicted habitat distribution of Sophora koreensis, a Korean endemic plant listed as EN (Endangered) on the IUCN Red List. The habitat distribution survey of Sophora koreensis confirmed 19 habitats in Gangwon Province, including 13 habitats in Yanggu-gun, 3 habitats in Inje-gun, 2 habitats in Chuncheon-si, and 1 habitat in Hongcheon-gun. The northernmost habitat of Sophora koreensis in Korea was in Imdang-ri, Yanggu-gun; the easternmost habitat in Hangye-ri, Inje-gun; the westernmost habitat in Jinae-ri, Chuncheon-si; and the southernmost habitat in Sungdong-ri, Hongcheon-gun. The altitude of the Sophora koreensis habitats ranged from 169 to 711 m, with an average altitude of 375 m. The area of the habitats was 8,000-734,000 m², with an average area of 202,789 m². Most habitats were managed forests that had been thinned or pruned. The MaxEnt analysis of the potential habitat of Sophora koreensis yielded an AUC value of 0.9762. The predicted habitat distribution covered Yanggu-gun, Inje-gun, Hwacheon-gun, and Chuncheon-si in Gangwon Province. The variables that influenced the prediction of the habitat distribution were annual precipitation, soil carbon content, and maximum monthly temperature. This study confirmed that habitats of Sophora koreensis were mostly found in ridge areas with high light intensity. These results can be used as basic data for designating protected areas for Sophora koreensis habitats.
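The AUC reported above can be read as the probability that a randomly chosen presence site receives a higher suitability score than a randomly chosen background site. The following is a minimal rank-based sketch of that definition; the function name and the scores are invented, and this is not the MaxEnt program's own code.

```python
def auc(presence_scores, background_scores):
    """Share of (presence, background) pairs where the presence site scores higher.

    Ties count as half a win.
    """
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))

# Invented suitability scores for illustration.
presence = [0.9, 0.8, 0.4]
background = [0.5, 0.3, 0.2, 0.1]
value = auc(presence, background)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a value near 0.98 indicates that known habitats are almost always scored above background locations.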

Comparative analysis of activation functions of artificial neural network for prediction of optimal groundwater level in the middle mountainous area of Pyoseon watershed in Jeju Island (제주도 표선유역 중산간지역의 최적 지하수위 예측을 위한 인공신경망의 활성화함수 비교분석)

  • Shin, Mun-Ju;Kim, Jin-Woo;Moon, Duk-Chul;Lee, Jeong-Han;Kang, Kyung Goo
    • Journal of Korea Water Resources Association, v.54 no.spc1, pp.1143-1154, 2021
  • The selection of the activation function has a great influence on the groundwater level prediction performance of an artificial neural network (ANN) model. In this study, five activation functions were applied to an ANN model for two groundwater level observation wells in the middle mountainous area of the Pyoseon watershed in Jeju Island. The groundwater level predictions were compared and analyzed, and the optimal activation function was derived. In addition, the results of an LSTM model, a widely used recurrent neural network model, were compared with the results of the ANN models for each activation function. As a result, the ELU and Leaky ReLU functions were derived as the optimal activation functions for the observation well with relatively large fluctuations in groundwater level and for the observation well with relatively small fluctuations, respectively. In contrast, the sigmoid function had the lowest predictive performance among the five activation functions for the training period and produced inappropriate results when predicting peak and lowest groundwater levels. The ANN-ELU and ANN-Leaky ReLU models showed groundwater level prediction performance comparable to that of the LSTM model and thus have sufficient potential for application. The methods and results of this study can be useful in other studies.
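For reference, the three activation functions named in this abstract have the standard definitions sketched below. The `alpha` defaults are common conventions and are assumptions here; the abstract does not state the values used, nor does it name the other two functions that were compared.

```python
import math

def sigmoid(x):
    """Squashes any input into (0, 1); saturates for large |x|."""
    return 1.0 / (1.0 + math.exp(-x))

def leaky_relu(x, alpha=0.01):
    """Identity for positive inputs, small slope alpha for negative inputs."""
    return x if x > 0 else alpha * x

def elu(x, alpha=1.0):
    """Identity for positive inputs, smooth exponential curve toward -alpha otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

# Evaluate each function at a few sample inputs.
samples = [-2.0, 0.0, 3.0]
table = {f.__name__: [f(x) for x in samples] for f in (sigmoid, leaky_relu, elu)}
```

Because the sigmoid output is bounded and flattens at both ends, it tends to compress extreme values, which may be one reason for its weaker peak and low-level predictions; that explanation is a general observation, not a claim made in the abstract.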

Factors Associated with Personal and Social Performance Status in Patients with Bipolar Disorder (양극성 장애 환자의 개인적·사회적 기능 상태에 대한 관련 요인)

  • Kim, Min-Jung;Lee, Jeon-Ho;Youn, HyunChul;Jeong, Hyun-Ghang;Kim, Seung-Hyun
    • Sleep Medicine and Psychophysiology, v.26 no.1, pp.33-43, 2019
  • Objectives: Bipolar disorder is characterized by repetitive relapses that result in psychosocial dysfunction. The functioning of bipolar disorder patients is related to the severity of symptoms, quality of sleep, drug compliance, and social support. The purpose of this study was to investigate the association between sociodemographic and clinical factors and functional status in bipolar disorder patients. Methods: A total of 52 bipolar disorder patients participated in the study. The following scales were used: the Korean version of the Personal and Social Performance scale (K-PSP), the Korean version of the Hamilton Rating Scale for Depression (K-HDRS), the Korean version of the Young Mania Rating Scale (K-YMRS), the Korean version of the Pittsburgh Sleep Quality Index (PSQI-K), the Korean version of the Drug Attitude Inventory (K-DAI), the Mood Disorders Insight Scale (MDIS), and the Multidimensional Scale of Perceived Social Support (MSPSS). Results: The K-PSP score showed a negative relationship with the K-HDRS score (r = -0.387, p = 0.005), but not with the K-YMRS score (r = -0.205, p = 0.145). The K-PSP score showed a negative relationship with the global PSQI-K score (r = -0.378, p = 0.005) and overall sleep quality (r = -0.353, p = 0.010). The K-PSP score was positively associated with the K-DAI score (r = 0.409, p = 0.003) and the MSPSS score (r = 0.334, p = 0.015). The predictive factors for K-PSP were overall sleep quality and social support from family. Conclusion: Our study showed that depressive symptoms were related to overall function in bipolar disorder. It also suggested that improving sleep quality is important in maintaining functional status. Appropriate social support and a positive perception of the drug may lead to a higher level of functioning. This study is meaningful in that the functional status of bipolar disorder patients was analyzed in a multivariate manner in relation to various psychosocial variables.

Evaluation of the Congenital Hypothyroidism for Newborn Screening Program in Korea: A 14-year Retrospective Cohort Study (한국인 선천성 갑상선기능저하증에 대한 신생아선별검사의 14년간의 후향적 연구; 발생빈도와 유효성)

  • Yoon, Hye-Ran;Ahn, Sunhyun;Lee, Hyangja
    • Journal of The Korean Society of Inherited Metabolic disease, v.19 no.1, pp.1-11, 2019
  • Purpose: Congenital hypothyroidism (CH) is the most common congenital endocrine disorder. The purpose of the present study was to determine the incidence of CH in South Korea during the period from January 1991 to March 2004. Methods: Central data from each city branch of SCL (Seoul Clinical Reference Laboratories) in Yongin, South Korea, were gathered and collectively analyzed. Newborn screening (NBS) for CH was based on measuring the levels of neonatal thyroid stimulating hormone (TSH) and free T4 (cut-offs of 20 mIU/L and less than 0.8 ng/dL, respectively). Results: During the study period, 671,805 live births were screened for CH based on TSH and free T4 ELISA assays. A total of 159 newborns out of 671,805 were deemed positive for CH, corresponding to an incidence of 1 in 4,225. When a cut-off of 20 mIU/L was used in TSH assays, the associated sensitivity, specificity, and positive predictive value (PPV) were 100.0%, 99.7%, and 10.8%, respectively. When a cut-off of 0.8 ng/dL was used in free T4 assays, the associated sensitivity, specificity, and PPV were 100.0%, 98.5%, and 3.9%, respectively. Conclusion: The incidence of CH in South Korea prior to 2004, as evidenced by the results of NBS, was comparable to that reported in other countries.

Prediction of a hit drama with a pattern analysis on early viewing ratings (초기 시청시간 패턴 분석을 통한 대흥행 드라마 예측)

  • Nam, Kihwan;Seong, Nohyoon
    • Journal of Intelligence and Information Systems, v.24 no.4, pp.33-49, 2018
  • The success of a TV drama has a very large impact on TV ratings and on the effectiveness of channel promotion. Its cultural and business impact has also been demonstrated through the Korean Wave. Therefore, early prediction of the blockbuster success of a TV drama is very important from the strategic perspective of the media industry. Previous studies have tried to predict the audience ratings and success of dramas using various methods. However, most of these studies made simple predictions based on intuitive factors such as the lead actor and time slot, which limits their predictive power. In this study, we propose a model for predicting the popularity of a drama by analyzing viewers' viewing patterns based on various theories. This is not only a theoretical contribution but also a practical one, since the model can be used by actual broadcasting companies. We collected data on 280 TV mini-series dramas broadcast over terrestrial channels during the 10 years from 2003 to 2012. From these data, we selected the most highly ranked and the least highly ranked 45 TV dramas and analyzed their viewing patterns in 11 steps. The assumptions and conditions for modeling are based on existing studies, on the opinions of actual broadcasters, and on data mining techniques. We then developed a prediction model by measuring the viewing-time distance (difference) using the Euclidean and correlation methods, which we term similarity (the sum of distances) in our study. Through this similarity measure, we predicted the success of dramas from viewers' initial viewing-time pattern distribution over episodes 1 to 5. To confirm whether the model is sensitive to the measurement method, various distance measures were applied and the robustness of the model was checked. Once the model was established, we improved its predictive power using a grid search. 
Furthermore, we classified viewers who watched more than 70% of the total airtime of a newly broadcast TV drama as "passionate viewers". We then compared the percentage of passionate viewers between the most highly ranked and the least highly ranked dramas to determine the likelihood that a TV mini-series becomes a blockbuster. We find that the initial viewing-time pattern is the key factor in predicting blockbuster dramas. With the initial viewing-time pattern analysis, our model correctly classified blockbuster dramas with 75.47% accuracy. This paper shows a high prediction rate while suggesting an audience rating method different from existing ones. Currently, broadcasters rely heavily on a few famous actors (the so-called star system) and face more severe competition than ever because of rising production costs, a long-term recession, and aggressive investment involving comprehensive programming channels and large corporations. Everyone is in a financially difficult situation. The basic revenue model of these broadcasters is advertising, and advertising is placed using audience ratings as the basic index. The drama market is uncertain because demand is difficult to forecast given the nature of the product, yet dramas contribute heavily to the financial success of a broadcaster's content. Therefore, to minimize the risk of failure, analyzing the distribution of initial viewing time can provide practical help to the companies involved in establishing a response strategy (organization, marketing, story changes, etc.). In this paper, we also found that audience behavior is crucial to the success of a program. We define TV viewing as a measure of how enthusiastically viewers watch TV. 
We can predict the success of a program by calculating the loyalty of its passionate viewers. This way of calculating loyalty can also be applied to various other platforms. It can also be used for marketing programs such as highlights, script previews, making-of videos, characters, games, and other marketing projects.
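The distance-based comparison of viewing-time patterns described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' model: the five-bin patterns (the study uses 11 steps), the reference profiles, and the nearest-profile decision rule are all invented, and correlation distance is taken here as one minus the Pearson correlation.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two viewing-time distributions."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def correlation_distance(a, b):
    """1 - Pearson correlation; 0 means identical shape. Undefined for constant vectors."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return 1.0 - cov / (sa * sb)

def classify(pattern, hit_profile, flop_profile, distance=euclidean):
    """Assign the label of the closer reference profile (assumed decision rule)."""
    if distance(pattern, hit_profile) < distance(pattern, flop_profile):
        return "hit"
    return "flop"

# Invented shares of viewers by fraction of airtime watched (low to high).
hit_profile = [0.10, 0.10, 0.20, 0.20, 0.40]
flop_profile = [0.40, 0.20, 0.20, 0.10, 0.10]
new_drama = [0.15, 0.10, 0.20, 0.25, 0.30]

label_euclid = classify(new_drama, hit_profile, flop_profile)
label_corr = classify(new_drama, hit_profile, flop_profile, correlation_distance)
```

Running the same classification under two distance measures, as here, mirrors the robustness check mentioned in the abstract: if the label changes with the measure, the prediction is sensitive to that choice.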

Distress and Associated Factors in Patients with Breast Cancer Surgery : A Cross-Sectional Study (유방암 수술환자의 디스트레스 및 연관인자 : 단면연구)

  • Lee, Sang-Shin;Rim, Hyo-Deog;Woo, Jungmin
    • Korean Journal of Psychosomatic Medicine, v.26 no.2, pp.77-85, 2018
  • Objectives : This study aimed to investigate the level of distress using the distress thermometer (DT) and the factors associated with distress in postoperative breast cancer (BC) patients. Methods : The DT and WHOQOL-BREF (World Health Organization Quality of Life Scale Abbreviated Version), along with sociodemographic variables, were assessed within one week postoperatively in patients undergoing surgery as their first treatment for BC. The distress group consisted of participants with a DT score $\geq 4$. The prevalence of distress and its associated factors were examined by descriptive, univariable, and logistic regression analyses. Results : Three hundred seven women were recruited, and 264 subjects were included in the final analysis. A total of 173 (65.5%) were classified into the distress group. Patients in the distress group were significantly younger (p=0.045), were more likely to live without a spouse (p=0.032), and had worse quality of life (QOL) as measured by overall QOL (p=0.009), general health (p=0.005), the physical health domain (p<0.001), and the psychological health domain (p=0.002). The logistic regression analysis showed that patients aged 40-49 years were more likely to experience distress than those aged $\geq 60$ years (odds ratio [OR]=2.992, 95% confidence interval [CI] 1.241-7.215). Moreover, the WHOQOL-BREF physical health domain was a predictive factor of distress (OR=0.777, 95% CI 0.692-0.873). Conclusions : A substantial proportion of patients experience significant distress after BC surgery. Distress management from the early BC treatment stage, especially for middle-aged patients and in the domain of physical QOL (e.g., pain, insomnia, fatigue), might reduce chronic distress.

Psychological Characteristics of Living Liver Transplantation Donors using MMPI-2 Profiles (MMPI-2를 이용한 생체 간 공여자들의 심리적 특성에 대한 연구)

  • Lee, Jin Hyeok;Choi, Tae Young;Yoon, Seoyoung
    • Korean Journal of Psychosomatic Medicine, v.27 no.1, pp.42-49, 2019
  • Objectives : Living donor liver transplantation (LDLT) is a life-saving therapy for patients with terminal liver disease. Many studies have focused on recipients rather than donors. The aim of this study was to assess the emotional status and personality characteristics of LDLT donors. Methods : We evaluated 218 subjects (126 male, 92 female) who visited Daegu Catholic University Medical Center from August 2012 to July 2018. Their preoperative psychological evaluations were reviewed retrospectively. We investigated epidemiological data and the Minnesota Multiphasic Personality Inventory-2 questionnaire. Subanalyses were performed according to whether subjects actually underwent surgery, their relationship with the recipient, and their gender. Results : The mean age of subjects was $32.19 \pm 10.91$ years. Of the subjects, 187 underwent LDLT surgery (actual donors) and 31 did not (potential donors). Donor-recipient relationships included husband-wife, parent-child, and sibling relationships, among others. Subjects differed significantly from the control group on validity scales L, F, and K and on all clinical scales. Potential donors differed significantly from actual donors on the F(b), F(p), K, S, Pa, AGGR, PSYC, DISC, and NEGE scales. The F, D, and NEGE scales were found to be predictive of actual donation. Subanalyses by donor-recipient relationship and gender also showed significant differences on certain scales. Conclusions : Under-reporting of psychological problems should be considered when evaluating living liver donors. Information about the donor's overall psychosocial background, mental status, and donation process should also be acquired.

The Effect of Data Size on the k-NN Predictability: Application to Samsung Electronics Stock Market Prediction (데이터 크기에 따른 k-NN의 예측력 연구: 삼성전자주가를 사례로)

  • Chun, Se-Hak
    • Journal of Intelligence and Information Systems, v.25 no.3, pp.239-251, 2019
  • Statistical methods such as moving averages, Kalman filtering, exponential smoothing, regression analysis, and ARIMA (autoregressive integrated moving average) have been used for stock market predictions. However, these statistical methods have not produced superior performance. In recent years, machine learning techniques, including artificial neural networks, SVM, and genetic algorithms, have been widely used in stock market predictions. In particular, a case-based reasoning method known as k-nearest neighbor (k-NN) is also widely used for stock price prediction. Case-based reasoning retrieves several similar cases from previous cases when a new problem occurs and combines the class labels of the similar cases to create a classification for the new problem. However, case-based reasoning has some problems. First, it searches for a fixed number of neighbors in the observation space and always selects the same number of neighbors rather than the best similar neighbors for the target case. As a result, it may have to take more cases into account even when fewer cases are applicable, depending on the subject. Second, case-based reasoning may select neighbors that are far away from the target case. Thus, case-based reasoning does not guarantee an optimal pseudo-neighborhood for various target cases, and predictability can be degraded by deviation from the desired similar neighbors. This paper examines how the size of the learning data affects stock price predictability with k-nearest neighbor and compares the predictability of k-nearest neighbor with that of the random walk model according to the size of the learning data and the number of neighbors. In this study, Samsung Electronics stock prices were predicted using two learning datasets of different sizes. To predict the next day's closing price, we used four variables: opening value, daily high, daily low, and daily close. 
In the first experiment, data from January 1, 2000 to December 31, 2017 were used for the learning process. In the second experiment, data from January 1, 2015 to December 31, 2017 were used for the learning process. The test data cover January 1, 2018 to August 31, 2018 for both experiments. We compared the performance of k-NN with the random walk model using the two learning datasets. The mean absolute percentage error (MAPE) was 1.3497 for the random walk model and 1.3570 for k-NN when the learning data was small. When the learning data was large, the MAPE was 1.3497 for the random walk model and 1.2928 for k-NN. These results show that prediction power is higher when more learning data are used than when less learning data are used. This paper also shows that k-NN generally produces better predictive power than the random walk model for larger learning datasets but not when the learning dataset is relatively small. Future studies need to consider macroeconomic variables related to stock price forecasting in addition to opening price, low price, high price, and closing price. To produce better results, it is also recommended that k-nearest neighbor find its nearest neighbors using a second-step filtering method that considers fundamental economic variables, together with a sufficient amount of learning data.
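The comparison between k-NN and the random walk model by MAPE can be sketched as below. This is a minimal illustration with invented prices: the function names are hypothetical, and averaging the next-day closes of the k nearest days is an assumed way of combining neighbors, since the abstract does not specify how neighbors are combined for price prediction.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def random_walk_forecast(closes):
    """Forecast each next close as the current close; aligns with closes[1:]."""
    return closes[:-1]

def knn_forecast(train_X, train_y, query, k):
    """Average the next-day closes of the k training days nearest to query.

    Each row of train_X is (open, high, low, close); distance is squared Euclidean.
    """
    ranked = sorted(
        range(len(train_X)),
        key=lambda i: sum((u - v) ** 2 for u, v in zip(train_X[i], query)),
    )
    nearest = ranked[:k]
    return sum(train_y[i] for i in nearest) / k

# Invented data for illustration only.
train_X = [(10, 12, 9, 11), (11, 13, 10, 12), (20, 22, 19, 21)]
train_y = [12, 13, 22]  # next-day closing prices for each training day
prediction = knn_forecast(train_X, train_y, query=(10.5, 12.5, 9.5, 11.5), k=2)

closes = [100.0, 102.0, 101.0]
rw_error = mape(closes[1:], random_walk_forecast(closes))
```

Because k is fixed, the third training day would still be pulled in with k=3 even though it is far from the query, which illustrates the fixed-neighbor limitation raised in the abstract.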