• Title/Summary/Keyword: test accuracy

Search Result 4,913

Deep Learning-based Fracture Mode Determination in Composite Laminates (복합 적층판의 딥러닝 기반 파괴 모드 결정)

  • Muhammad Muzammil Azad;Atta Ur Rehman Shah;M.N. Prabhakar;Heung Soo Kim
    • Journal of the Computational Structural Engineering Institute of Korea / v.37 no.4 / pp.225-232 / 2024
  • This study focuses on determining the fracture mode in composite laminates using deep learning. With the increasing use of laminated composites in numerous engineering applications, ensuring their integrity and performance is of paramount importance. However, owing to the complex nature of these materials, identifying fracture modes is often a tedious and time-consuming task that requires critical domain knowledge. To alleviate these issues, this study uses modern artificial intelligence technology to automate the fractographic analysis of laminated composites. To accomplish this goal, scanning electron microscopy (SEM) images of fractured tensile test specimens are obtained from laminated composites to showcase various fracture modes. These SEM images are then categorized by fracture mode, including fiber breakage, fiber pull-out, mixed-mode fracture, matrix brittle fracture, and matrix ductile fracture. Next, the collective data for all classes are divided into training, test, and validation datasets. Two state-of-the-art pre-trained deep learning models, namely DenseNet and GoogLeNet, are trained to learn the discriminative features of each fracture mode. The DenseNet model shows training and testing accuracies of 94.01% and 75.49%, respectively, whereas those of the GoogLeNet model are 84.55% and 54.48%, respectively. The trained models are then validated on unseen validation datasets. This validation demonstrates that the DenseNet model, owing to its deeper architecture, can extract high-quality features, resulting in 84.44% validation accuracy, which is 36.84% higher than that of the GoogLeNet model. Hence, these results affirm that the DenseNet model is effective in performing fractographic analyses of laminated composites by predicting fracture modes with high accuracy.
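The abstract does not give the split ratios. As a minimal sketch, assuming a per-class (stratified) 70/15/15 split and hypothetical file names, the train/test/validation division and the accuracy metric it reports could be computed as follows; the DenseNet and GoogLeNet models themselves are not reproduced.

```python
import random
from collections import defaultdict


def stratified_split(samples, ratios=(0.70, 0.15, 0.15), seed=0):
    """Split (item, label) pairs into train/test/validation lists per class.

    Splitting each class separately keeps every fracture mode represented
    in all three subsets. The 70/15/15 ratios are an assumption.
    """
    by_class = defaultdict(list)
    for item, label in samples:
        by_class[label].append((item, label))
    rng = random.Random(seed)  # fixed seed gives a reproducible split
    train, test, val = [], [], []
    for label in sorted(by_class):
        group = by_class[label][:]
        rng.shuffle(group)
        n_train = round(len(group) * ratios[0])
        n_test = round(len(group) * ratios[1])
        train += group[:n_train]
        test += group[n_train:n_train + n_test]
        val += group[n_train + n_test:]  # remainder goes to validation
    return train, test, val


def accuracy(y_true, y_pred):
    """Fraction of predicted labels that match the reference labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty label lists of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical file names: 20 SEM images per fracture mode.
MODES = ["fiber breakage", "fiber pull-out", "mixed-mode fracture",
         "matrix brittle fracture", "matrix ductile fracture"]
samples = [(f"sem_{i:03d}_{m}.png", m) for m in MODES for i in range(20)]
train, test, val = stratified_split(samples)
print(len(train), len(test), len(val))  # 70 15 15
```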

Accuracy of bite registration according to the buccal bite scan range of intra-oral scanner (구강 스캐너의 협측 교합 스캔 부위에 따른 교합 인기의 정확도)

  • Tae-sung Kwon;Dae-hyun Kim;Min-su Kim;Dong-jun Song;Joo-Hun Song
    • Journal of Dental Rehabilitation and Applied Science / v.40 no.3 / pp.125-134 / 2024
  • Purpose: The aim of this study was to determine which scan range provides the most accurate bite registration when a buccal bite scan is performed after scanning the upper and lower arches with an intraoral scanner. Materials and Methods: Occlusal contact points were recorded with articulating paper for 30 adults, and the results of buccal bite scans of various ranges were compared against them. Buccal bite scans of 5 ranges (1st premolar to 2nd premolar, 1st premolar to 1st molar, 1st premolar to 2nd molar, 2nd premolar to 1st molar, and canine to contralateral canine of the maxillary teeth) were performed. Each buccal bite scan file was then edited and aligned in a CAD program, leaving only the buccal area of the teeth, to identify the occlusal area in the scan file. The degree of agreement between the occlusal contact points obtained from the articulating paper and the occlusal area obtained from the scan file was compared, and statistical analysis was performed using the homoscedastic t-test (α = 0.05). Results: The alignment success and alignment failure rates for each group were 77.23% and 40.85% for canine to contralateral canine, 68.23% and 28.89% for bilateral first premolar to second premolar, 63.76% and 29.97% for bilateral first premolar to first molar, 61.31% and 32.04% for bilateral first premolar to second molar, and 67.55% and 27.46% for second premolar to first molar. The anterior scan of both canines showed higher alignment success and failure rates than all of the maxillary posterior scan ranges. For the alignment success rate, no statistically significant difference was found among the posterior scan ranges, but the differences between the posterior scans and the canine-to-canine scan were statistically significant, except for the second premolar to first molar scan.
There was no statistically significant difference in the alignment failure rate among the posterior scan ranges, whereas the differences between the posterior scans and the canine-to-canine scan were statistically significant. Conclusion: In a buccal bite scan, scanning the anterior teeth captures more occlusal area than scanning the posterior teeth, and when the posterior teeth are scanned, the scan range makes no significant difference in bite registration.
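The homoscedastic t-test named in the Methods pools the two groups' variances. A minimal sketch with hypothetical agreement rates (not the study's data) is below; in practice, `scipy.stats.ttest_ind(a, b, equal_var=True)` returns the same statistic together with a two-sided p-value.

```python
import math
from statistics import mean, variance


def pooled_t_test(a, b):
    """Two-sample Student's t statistic with pooled (equal) variance.

    Returns (t, degrees_of_freedom); the p-value then comes from a t table
    or a statistics library.
    """
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    # Pool the two sample variances, weighting each by its degrees of freedom.
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / se, n1 + n2 - 2


# Hypothetical per-subject agreement rates (%) for two scan ranges.
canine_to_canine = [78.0, 75.5, 80.1, 74.9, 77.6]
premolar_to_molar = [66.2, 69.0, 67.8, 65.9, 68.8]
t, df = pooled_t_test(canine_to_canine, premolar_to_molar)
print(f"t = {t:.2f}, df = {df}")
```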

Development of the Accident Prediction Model for Enlisted Men through an Integrated Approach to Datamining and Textmining (데이터 마이닝과 텍스트 마이닝의 통합적 접근을 통한 병사 사고예측 모델 개발)

  • Yoon, Seungjin;Kim, Suhwan;Shin, Kyungshik
    • Journal of Intelligence and Information Systems / v.21 no.3 / pp.1-17 / 2015
  • In this paper, we report our observations on a prediction model for the military based on enlisted men's internal data (cumulative records) and external data (SNS data). This work is significant for the military's efforts to supervise enlisted men. Despite their efforts, many commanders have failed to prevent accidents involving their subordinates. One of the important duties of officers is to take care of their subordinates and prevent unexpected accidents. However, accidents are hard to prevent, so a proper method must be found. Our motivation for presenting this paper is to make it possible to predict accidents using enlisted men's internal and external data. The biggest issue facing the military is the occurrence of accidents involving enlisted men related to maladjustment and the relaxation of military discipline. The core method of preventing accidents by soldiers is to identify problems and manage them quickly. Commanders predict accidents by interviewing their soldiers and observing their surroundings. This requires considerable time and effort, and the results differ significantly depending on the capabilities of the commanders. In this paper, we seek to predict accidents with objective data that can easily be obtained. Recently, the records of enlisted men, as well as SNS communication between commanders and soldiers, have made it possible to predict and prevent accidents. This paper applies data mining to internal and external (SNS) data to identify soldiers' interests and predict accidents. We propose a combination of topic analysis and the decision tree method. The study is conducted in two steps. First, topic analysis is conducted on the SNS data of enlisted men. Second, the decision tree method is used to analyze the internal data together with the results of the first analysis. The dependent variable for these analyses is the presence of any accident.
To analyze their SNS data, we require tools such as text mining and topic analysis. We used SAS Enterprise Miner 12.1, which provides a text miner module. Our approach for finding their interests is composed of three main phases: collection, topic analysis, and conversion of the topic analysis results into scores for use as independent variables. In the first phase, we collect enlisted men's SNS data through commanders' IDs. After the unstructured SNS data are gathered, the topic analysis phase extracts issues from them. For simplicity, 5 topics (vacation, friends, stress, training, and sports) are extracted from 20,000 articles. In the third phase, we quantify these 5 topics as personal scores. These scores are then added to the independent variables, which consist of 15 internal data items. We then build two decision trees. The first tree uses internal data only. The second tree uses external data (SNS) as well as internal data. We then compare the misclassification rates reported by SAS Enterprise Miner. The first model's misclassification rate is 12.1%, whereas the second model's is 7.8%; that is, the second model predicts accidents with an accuracy of approximately 92%. The gap between the two models is 4.3 percentage points. Finally, we use the McNemar test to check whether the difference between the two models is meaningful; the difference is statistically significant (p-value: 0.0003). This study has two limitations. First, the results cannot be generalized, mainly because the experiment is limited to data from a small number of enlisted men. Second, various independent variables in the decision tree model are used as categorical rather than continuous variables, which causes a loss of information. Despite extensive efforts to provide prediction models for the military, commanders' predictions are accurate only when they have sufficient data about their subordinates.
Our proposed methodology can support decision-making in the military. This study is expected to contribute to the prevention of accidents in the military through scientific analysis and proper management of enlisted men.
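The McNemar test compares two classifiers on the same cases using only the cases on which they disagree. A minimal sketch, assuming the chi-square version with Edwards' continuity correction (the abstract does not state which variant was used) and hypothetical predictions:

```python
import math


def mcnemar(y_true, pred_a, pred_b, continuity=True):
    """McNemar's chi-square test for two classifiers scored on the same cases.

    Only discordant cases count: b = A right and B wrong, c = A wrong and B right.
    Returns (statistic, p_value) using the chi-square distribution with 1 df.
    """
    b = c = 0
    for truth, pa, pb in zip(y_true, pred_a, pred_b):
        a_ok, b_ok = pa == truth, pb == truth
        if a_ok and not b_ok:
            b += 1
        elif b_ok and not a_ok:
            c += 1
    if b + c == 0:
        return 0.0, 1.0  # the classifiers never disagree
    diff = abs(b - c) - (1 if continuity else 0)  # Edwards' continuity correction
    stat = max(diff, 0) ** 2 / (b + c)
    # Chi-square(1) upper tail: P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


# Hypothetical predictions on 20 cases whose true label is 1 (accident).
y_true = [1] * 20
internal_only = [0] * 10 + [1] * 10        # wrong on the first 10 cases
with_sns = [1] * 10 + [0] * 2 + [1] * 8    # wrong on cases 11 and 12 only
stat, p = mcnemar(y_true, internal_only, with_sns)
print(f"chi2 = {stat:.3f}, p = {p:.4f}")
```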

Stock Price Prediction by Utilizing Category Neutral Terms: Text Mining Approach (카테고리 중립 단어 활용을 통한 주가 예측 방안: 텍스트 마이닝 활용)

  • Lee, Minsik;Lee, Hong Joo
    • Journal of Intelligence and Information Systems / v.23 no.2 / pp.123-138 / 2017
  • Since the stock market is driven by the expectations of traders, studies have been conducted to predict stock price movements by analyzing various sources of text data. To predict stock price movements, research has examined not only the relationship between text data and stock price fluctuations but also trading based on news articles and social media responses. Studies that predict stock price movements have also applied classification algorithms to a term-document matrix, in the same way as other text mining approaches. Because documents contain many words, it is better to select the words that contribute most when building a term-document matrix. Based on word frequency, words with too little frequency or importance are removed. Words can also be selected by measuring how much each word contributes to correctly classifying a document. The basic idea of constructing a term-document matrix has been to collect all the documents to be analyzed and to select and use the words that influence the classification. In this study, we analyze the documents for each individual stock and select, as neutral words, the words that are irrelevant to all categories. We extract the words around the selected neutral words and use them to generate the term-document matrix. The approach starts from the idea that stock movement is less related to the presence of the neutral words themselves, and that the words surrounding a neutral word are more likely to affect stock price movements. The generated term-document matrix is then fed to an algorithm that classifies stock price fluctuations. In this study, we first removed stop words and selected neutral words for each stock. We then excluded, from the selected words, those that also appear in news articles about other stocks.
Through an online news portal, we collected four months of news articles on the top 10 stocks by market capitalization. We used the first 3 months of news articles as training data and applied the remaining month of news articles to the model to predict the next day's stock price movements. We used SVM, Boosting, and Random Forest to build models and predict stock price movements. The stock market was open for a total of 80 days during the four months (2016/02/01 ~ 2016/05/31); the first 60 days were used as the training set and the remaining 20 days as the test set. The proposed word-based algorithm showed better classification performance than the sparsity-based word selection method. This study predicted stock price movements by collecting and analyzing news articles on the top 10 stocks by market capitalization. We used a term-document-matrix-based classification model to estimate stock price fluctuations and compared the performance of the existing sparsity-based word extraction method with that of the suggested method of removing words from the term-document matrix. The suggested method differs from the existing word extraction method in that it uses not only the news articles for the corresponding stock but also news about other stocks to determine which words to extract. In other words, it removed not only the words that appeared in both rising and falling cases but also the words that commonly appeared in news about other stocks. When prediction accuracy was compared, the suggested method showed higher accuracy. The limitations of this study are that the stock price prediction was framed as classifying rises and falls, and that the experiment was conducted only on the top 10 stocks, which do not represent the entire stock market. In addition, it is difficult to show investment performance, because stock price fluctuation and rate of return may differ.
Therefore, further research is needed using more stocks and predicting returns through trading simulation.
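One possible reading of the neutral-word procedure, sketched under stated assumptions: a word counts as neutral for a stock if it appears in news on both rising and falling days but not in other stocks' news, and the matrix columns are the words within a fixed window around neutral-word occurrences. The window size, the tokenized headlines, and all function names are hypothetical.

```python
from collections import Counter


def vocabulary(docs, stop_words=frozenset()):
    return {w for doc in docs for w in doc} - stop_words


def neutral_words(up_docs, down_docs, other_stock_docs, stop_words=frozenset()):
    """Words seen on both rising and falling days for this stock,
    minus words that also occur in other stocks' news (assumed reading)."""
    shared = vocabulary(up_docs, stop_words) & vocabulary(down_docs, stop_words)
    return shared - vocabulary(other_stock_docs)


def context_terms(doc, neutral, window=2):
    """Words within `window` tokens of each neutral-word occurrence, neutral
    words excluded. A word near two neutral occurrences is counted twice."""
    terms = []
    for i, word in enumerate(doc):
        if word in neutral:
            lo, hi = max(0, i - window), min(len(doc), i + window + 1)
            terms.extend(w for w in doc[lo:hi] if w not in neutral)
    return terms


def term_document_matrix(docs, neutral, window=2):
    """Rows = documents, columns = sorted context terms, cells = counts."""
    counts = [Counter(context_terms(doc, neutral, window)) for doc in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[t] for t in vocab] for c in counts]


# Hypothetical tokenized headlines.
up = [["shares", "rose", "after", "earnings", "report"]]
down = [["shares", "fell", "after", "weak", "report"]]
other = [["report", "on", "another", "firm"]]
neutral = neutral_words(up, down, other)
vocab, matrix = term_document_matrix(up + down, neutral, window=1)
print(sorted(neutral), vocab, matrix)
```

The resulting matrix rows would then be the inputs to a classifier such as SVM or Random Forest, trained on the earlier trading days and tested on the later ones.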

The Evaluation of Attenuation Difference and SUV According to Arm Position in Whole Body PET/CT (전신 PET/CT 검사에서 팔의 위치에 따른 감약 정도와 SUV 변화 평가)

  • Kwak, In-Suk;Lee, Hyuk;Choi, Sung-Wook;Suk, Jae-Dong
    • The Korean Journal of Nuclear Medicine Technology / v.14 no.2 / pp.21-25 / 2010
  • Purpose: Accurate PET imaging inevitably requires transmission scanning for attenuation correction. Attenuation is affected by acquisition conditions and patient position, so the quantitative accuracy of emission scan images may be reduced. The present study measures how attenuation varies with the position of the patient's arms in whole-body PET/CT and compares the resulting SUV changes. Materials and Methods: A NEMA 1994 PET phantom was filled with $^{18}F$-FDG, with the concentration ratio of the insert cylinder to the background water set to 4:1. Phantom images were acquired by a 4-min emission scan after a CT transmission scan. To simulate a patient with the arms lowered, images were also acquired with two Teflon inserts fixed to both sides of the phantom. The acquired images were reconstructed using an iterative reconstruction method (iterations: 2, subsets: 28) with CT-based attenuation correction, and a VOI was drawn on each image plane to measure the CT number and SUV and to compare the axial uniformity (A.U = standard deviation / average SUV) of the PET images. Results: In the phantom test, comparing the Teflon inserts fixed versus removed, the CT number of the cylinder increased from -5.76 HU to 0 HU, while the SUV decreased from 24.64 to 24.29 and the A.U from 0.064 to 0.052. The CT number of the background water increased from -6.14 HU to -0.43 HU, whereas the SUV decreased from 6.3 to 5.6 and the A.U also decreased from 0.12 to 0.10.
In addition, in the patient images, the CT number increased from 53.09 HU to 58.31 HU and the SUV decreased from 24.96 to 21.81 when the patient's arms were positioned over the head rather than lowered. Conclusion: When the arms-up protocol was applied, the SUV of the phantom and patient images decreased by 1.4% and 9.2%, respectively. The present study concludes that, in whole-body PET/CT scanning, the position of the patient's arms is not very significant. In particular, scanning with the arms raised above the head makes patient motion more likely because of the long scanning time, increases $^{18}F$-FDG uptake in brown fat in the shoulder region, and causes shoulder pain and discomfort to the patient. Taking all of these factors into account, it is reasonable to perform PET/CT scanning with the subject's arms lowered.
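The two quantities used in the Results can be computed directly. A minimal sketch, assuming per-plane average SUVs as input and the population standard deviation (the abstract does not say which); the percent-change example uses the phantom cylinder SUVs from the abstract (24.64 to 24.29), which reproduces the reported 1.4% decrease.

```python
from statistics import mean, pstdev


def axial_uniformity(plane_suvs):
    """A.U = standard deviation / average SUV across image planes.

    Uses the population standard deviation (an assumption).
    """
    return pstdev(plane_suvs) / mean(plane_suvs)


def percent_change(before, after):
    """Signed percentage change from `before` to `after`."""
    return (after - before) / before * 100


# Cylinder SUV with Teflon inserts fixed (arms down) vs removed (arms up).
print(f"{percent_change(24.64, 24.29):.1f}%")  # -1.4%
```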

Business Application of Convolutional Neural Networks for Apparel Classification Using Runway Image (합성곱 신경망의 비지니스 응용: 런웨이 이미지를 사용한 의류 분류를 중심으로)

  • Seo, Yian;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems / v.24 no.3 / pp.1-19 / 2018
  • A large amount of data is now available for research and business sectors to extract knowledge from. These data can take the form of unstructured data such as audio, text, and images, and can be analyzed with deep learning. Deep learning is now widely used for various estimation, classification, and prediction problems. In particular, the fashion business adopts deep learning techniques for apparel recognition, apparel search and retrieval engines, and automatic product recommendation. The core model of these applications is image classification using Convolutional Neural Networks (CNN). A CNN is made up of neurons that learn parameters such as weights as inputs pass through the network to the outputs. A CNN has a layer structure that is well suited for image classification: convolutional layers generate feature maps, pooling layers reduce the dimensionality of the feature maps, and fully-connected layers classify the extracted features. However, most classification models have been trained using online product images, which are taken under controlled conditions, such as the apparel item itself or a professional model wearing it. Such images may not train an effective model when one wants to classify street fashion or walking images, which are taken in uncontrolled conditions and involve people's movement and unexpected poses. Therefore, we propose to train the model with a runway apparel image dataset that captures mobility. This allows the classification model to be trained with far more variable data and improves its adaptation to diverse query images. To achieve both convergence and generalization of the model, we apply Transfer Learning to our training network. As Transfer Learning in CNN consists of pre-training and fine-tuning stages, we divide the training step into two.
First, we pre-train our architecture with a large-scale dataset, the ImageNet dataset, which consists of 1.2 million images in 1000 categories including animals, plants, activities, materials, instrumentation, scenes, and foods. We use GoogLeNet as our main architecture, as it achieved high accuracy with efficiency in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). Second, we fine-tune the network with our own runway image dataset. Because we could not find any previously published public runway image dataset, we collected one from Google Image Search, obtaining 2426 images of 32 major fashion brands including Anna Molinari, Balenciaga, Balmain, Brioni, Burberry, Celine, Chanel, Chloe, Christian Dior, Cividini, Dolce and Gabbana, Emilio Pucci, Ermenegildo, Fendi, Giuliana Teso, Gucci, Issey Miyake, Kenzo, Leonard, Louis Vuitton, Marc Jacobs, Marni, Max Mara, Missoni, Moschino, Ralph Lauren, Roberto Cavalli, Sonia Rykiel, Stella McCartney, Valentino, Versace, and Yves Saint Laurent. We performed 10-fold experiments to account for the random generation of training data, and our proposed model achieved an accuracy of 67.2% on the final test. Our research offers several advantages over previous related studies: to the best of our knowledge, no previous study has trained a network for apparel image classification on a runway image dataset. We suggest training the model with images capturing all possible postures, denoted as mobility, by using our own runway apparel image dataset. Moreover, by applying Transfer Learning and using the checkpoint and parameters provided by TensorFlow Slim, we saved training time: training the classifier took 6 minutes per experiment. This model can be used in many business applications where the query image can be a runway image, product image, or street fashion image.
To be specific, runway query images can be used in a mobile application service during fashion week to facilitate brand search; street style query images can be classified during fashion editorial tasks to label the brand or style; and website query images can be processed by an e-commerce multi-complex service that provides item information or recommends similar items.
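The convolution, pooling, and fully-connected stages described in this abstract can be illustrated on a toy image. This is a minimal pure-Python sketch with a hand-picked edge kernel and made-up class weights; it is not GoogLeNet, and in the study such weights would be learned, initialized from ImageNet pre-training and then fine-tuned on the runway images.

```python
def conv2d(image, kernel):
    """'Valid' 2-D convolution with stride 1 (implemented as cross-correlation,
    as most CNN libraries do)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(fmap):
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling; edge rows/columns that don't fill a window are dropped."""
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]


def dense(features, weights, biases):
    """Fully-connected layer: one score per class."""
    return [sum(w * x for w, x in zip(row, features)) + bias
            for row, bias in zip(weights, biases)]


image = [[0, 0, 1, 1, 1] for _ in range(5)]    # toy 5x5 image with a vertical edge
edge_kernel = [[-1, 1], [-1, 1]]               # hand-picked; a CNN would learn this
pooled = max_pool(relu(conv2d(image, edge_kernel)))
features = [v for row in pooled for v in row]  # flatten
scores = dense(features, weights=[[1, 0, 1, 0], [0, 1, 0, 1]], biases=[0, 0])
print(pooled, scores)
```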

Effect of the Changing the Lower Limits of Normal and the Interpretative Strategies for Lung Function Tests (폐기능검사 해석에 정상하한치 변화와 새 해석흐름도가 미치는 영향)

  • Ra, Seung Won;Oh, Ji Seon;Hong, Sang-Bum;Shim, Tae Sun;Lim, Chae Man;Koh, Youn Suck;Lee, Sang Do;Kim, Woo Sung;Kim, Dong-Soon;Kim, Won Dong;Oh, Yeon-Mok
    • Tuberculosis and Respiratory Diseases / v.61 no.2 / pp.129-136 / 2006
  • Background: To interpret lung function tests, it is necessary to determine the lower limits of normal (LLN) and to reach a consensus on the interpretative algorithm. '0.7 as the LLN for $FEV_1$/FVC' was suggested by the international COPD guideline (GOLD) for defining obstructive disease. A consensus on a new interpretative algorithm was recently reached by the ATS/ERS in 2005. We evaluated the accuracy of '0.7 as the LLN for $FEV_1$/FVC' in diagnosing obstructive diseases, and we also determined the effect of the new algorithm on diagnosing ventilatory defects. Methods: We obtained the age, gender, height, weight, $FEV_1$, FVC, and $FEV_1$/FVC of 7362 subjects who underwent spirometry in 2005 at the Asan Medical Center, Korea. For diagnosing obstructive diseases, the accuracy of '0.7 as the LLN for $FEV_1$/FVC' was evaluated against the LLN defined as the $5^{th}$ percentile. By applying the new algorithm, we determined how many more subjects should undergo lung volume testing. Evaluation of 1611 patients who underwent lung volume testing as well as spirometry during the period showed how many more subjects were diagnosed with obstructive diseases according to the new algorithm. Results: 1) The sensitivity of '0.7 as the LLN for $FEV_1$/FVC' for diagnosing obstructive diseases increased with age, but the specificity decreased with age; the positive predictive value decreased, but the negative predictive value increased. 2) By applying the new algorithm, 34.5% (2540/7362) more subjects should undergo lung volume testing. 3) By applying the new algorithm, 13% (205/1611) more subjects were diagnosed with obstructive diseases; these subjects corresponded to 30% (205/681) of the subjects who had been diagnosed with restrictive diseases by the old interpretative algorithm. Conclusion: The sensitivity and specificity of '0.7 as the LLN for $FEV_1$/FVC' for diagnosing obstructive diseases change with age.
Applying the new interpretative algorithm showed that more subjects should undergo lung volume testing and that there was a higher probability of being diagnosed with obstructive diseases.
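Sensitivity, specificity, and predictive values of the fixed 0.7 cutoff against the LLN reference follow from a 2x2 table. A minimal sketch with hypothetical $FEV_1$/FVC ratios and per-subject LLN values (real LLNs come from a reference equation using age, sex, and height, which is not reproduced here):

```python
def diagnostic_metrics(test_positive, reference_positive):
    """Sensitivity, specificity, PPV and NPV of a binary test against a reference standard."""
    pairs = list(zip(test_positive, reference_positive))
    tp = sum(1 for t, r in pairs if t and r)
    fp = sum(1 for t, r in pairs if t and not r)
    fn = sum(1 for t, r in pairs if not t and r)
    tn = sum(1 for t, r in pairs if not t and not r)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {"sensitivity": ratio(tp, tp + fn), "specificity": ratio(tn, tn + fp),
            "ppv": ratio(tp, tp + fp), "npv": ratio(tn, tn + fn)}


# Hypothetical subjects: measured FEV1/FVC and each subject's LLN.
ratios = [0.65, 0.68, 0.72, 0.75, 0.80, 0.62]
llns = [0.66, 0.70, 0.70, 0.74, 0.72, 0.60]
fixed_cutoff = [r < 0.70 for r in ratios]              # fixed 0.7 criterion
below_lln = [r < lln for r, lln in zip(ratios, llns)]  # 5th-percentile LLN criterion
print(diagnostic_metrics(fixed_cutoff, below_lln))
```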

Development of an Offline Based Internal Organ Motion Verification System during Treatment Using Sequential Cine EPID Images (연속촬영 전자조사 문 영상을 이용한 오프라인 기반 치료 중 내부 장기 움직임 확인 시스템의 개발)

  • Ju, Sang-Gyu;Hong, Chae-Seon;Huh, Woong;Kim, Min-Kyu;Han, Young-Yih;Shin, Eun-Hyuk;Shin, Jung-Suk;Kim, Jing-Sung;Park, Hee-Chul;Ahn, Sung-Hwan;Lim, Do-Hoon;Choi, Doo-Ho
    • Progress in Medical Physics / v.23 no.2 / pp.91-98 / 2012
  • Verification of internal organ motion during treatment, and feedback from it, is essential for accurate dose delivery to a moving target. We developed an offline internal organ motion verification system (IMVS) using cine EPID images and evaluated its accuracy and availability through a phantom study. To verify organ motion using live cine EPID images, the self-developed analysis software employs a pattern matching algorithm based on an internal surrogate, such as the diaphragm, that is easily distinguishable and represents organ motion in the treatment field. For the system performance test, we developed a linear motion phantom consisting of a human-body-shaped phantom with a fake tumor in the lung, a linear motion cart, and control software. The phantom was operated with a motion of 2 cm at 4 sec per cycle, and cine EPID images were obtained at 3.3 and 6.6 frames per sec (2 MU/frame) with $1,024{\times}768$ pixels on a linear accelerator (10 MV X-rays). Organ motion of the target was tracked using the self-developed analysis software. To evaluate accuracy, the results were compared with the planned motion of the phantom and with data from a video-image-based tracking system using an external surrogate (RPM, Varian, USA). For quantitative analysis, we analyzed the correlation between the data sets in terms of the average cycle (peak to peak), amplitude, and pattern (RMS, root mean square) of motion. The average cycles of motion from IMVS and RPM were $3.98{\pm}0.11$ (IMVS 3.3 fps), $4.005{\pm}0.001$ (IMVS 6.6 fps), and $3.95{\pm}0.02$ (RPM) sec, respectively, and agreed well with the actual value (4 sec/cycle). The average amplitudes of motion tracked by our system were $1.85{\pm}0.02$ cm (3.3 fps) and $1.94{\pm}0.02$ cm (6.6 fps), differing slightly from the actual value (2 cm) by 0.15 cm (7.5% error) and 0.06 cm (3% error), respectively, owing to the time resolution of image acquisition.
In the analysis of the motion pattern, the RMS value from the cine EPID images at 3.3 fps (0.1044) was larger than that at 6.6 fps (0.0480). In this preliminary phantom study, the organ motion verification system using sequential cine EPID images with an internal surrogate represented the motion well, within 3% error. The system can be implemented for clinical purposes, including verification of organ motion during treatment, comparison with 4D treatment planning data, and feedback for accurate dose delivery to the moving target.
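The abstract does not specify the pattern matching algorithm. As a minimal sketch, assuming sum-of-squared-differences template matching on 1-D intensity profiles (a simplification of 2-D EPID frames) and a hypothetical pixel scale, tracking a surrogate edge and scoring amplitude error could look like this; the last line checks the abstract's 7.5% error for the 3.3 fps case.

```python
def match_position(profile, template):
    """Offset at which `template` best matches `profile` (minimum sum of squared differences)."""
    best_offset, best_ssd = 0, float("inf")
    for off in range(len(profile) - len(template) + 1):
        ssd = sum((profile[off + k] - t) ** 2 for k, t in enumerate(template))
        if ssd < best_ssd:
            best_offset, best_ssd = off, ssd
    return best_offset


def track(frames, template):
    """Surrogate position (pixel offset) in every frame."""
    return [match_position(frame, template) for frame in frames]


def peak_to_peak(positions, cm_per_pixel):
    """Motion amplitude in cm; `cm_per_pixel` is a hypothetical scale factor."""
    return (max(positions) - min(positions)) * cm_per_pixel


def percent_error(measured, actual):
    return abs(measured - actual) / actual * 100


def profile_with_edge(edge, length=20):
    """Toy 1-D profile: a diaphragm-like dark-to-bright edge at index `edge`."""
    return [0.0] * edge + [1.0] * (length - edge)


template = [0.0, 0.0, 1.0, 1.0]  # matches where the edge sits 2 pixels into the window
frames = [profile_with_edge(e) for e in (5, 7, 9, 7, 5)]
positions = track(frames, template)
print(positions)                           # [3, 5, 7, 5, 3]
print(f"{percent_error(1.85, 2.0):.1f}%")  # 7.5%
```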

Effects of Anti-thyroglobulin Antibody on the Measurement of Thyroglobulin : Differences Between Immunoradiometric Assay Kits Available (면역방사계수법을 이용한 Thyroglobulin 측정시 항 Thyroglobulin 항체의 존재가 미치는 영향: Thyroglobulin 측정 키트에 따른 차이)

  • Ahn, Byeong-Cheol;Seo, Ji-Hyeong;Bae, Jin-Ho;Jeong, Shin-Young;Yoo, Jeong-Soo;Jung, Jin-Hyang;Park, Ho-Yong;Kim, Jung-Guk;Ha, Sung-Woo;Sohn, Jin-Ho;Lee, In-Kyu;Lee, Jae-Tae;Kim, Bo-Wan
    • The Korean Journal of Nuclear Medicine / v.39 no.4 / pp.252-256 / 2005
  • Purpose: Thyroglobulin (Tg) is a valuable and sensitive marker for the diagnosis and follow-up of several thyroid disorders, especially in the follow-up of patients with differentiated thyroid cancer (DTC). Clinical decisions often rely entirely on the serum Tg concentration. However, the Tg assay is one of the most challenging laboratory measurements to perform accurately, owing to antithyroglobulin antibody (Anti-Tg). In this study, we compared the degree of Anti-Tg interference in Tg measurement between available Tg measuring kits. Materials and Methods: Tg levels in a standard Tg solution were measured with two different commercially available kits (kits A and B) using the immunoradiometric assay technique, in the absence or presence of three different concentrations of Anti-Tg. Tg in patients' sera was also measured with the same kits. Patient serum samples were prepared by mixing a serum containing a high Tg level with a serum containing a high Anti-Tg concentration. Results: In the measurements of the standard Tg solution, the presence of Anti-Tg resulted in falsely low Tg levels with both kits A and B. Tg underestimation was more prominent with kit A than with kit B. The underestimation by kit B was statistically significant but trivial, and therefore clinically insignificant. Adding Anti-Tg to patient serum resulted in falsely low Tg levels only with kit A. Conclusion: Tg levels can be underestimated in the presence of Anti-Tg, and the Anti-Tg effect on Tg measurement varies with the assay kit used. Therefore, accuracy testing must be performed for each Tg assay kit.
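The kit comparison amounts to checking how much Tg is recovered when Anti-Tg is present. A minimal sketch, assuming simple volume weighting for the serum mixtures and a recovery (measured / expected) metric, neither of which the abstract specifies; all values are hypothetical.

```python
def expected_mixture(tg_serum_a, tg_serum_b, fraction_a):
    """Expected Tg of a mix of two sera, assuming no interference (simple volume weighting)."""
    return fraction_a * tg_serum_a + (1 - fraction_a) * tg_serum_b


def recovery_percent(measured, expected):
    """Measured / expected x 100; values well below 100% point to negative interference."""
    return measured / expected * 100


# Hypothetical values (ng/mL): high-Tg serum mixed 1:1 with a Tg-free, Anti-Tg-rich serum.
expected = expected_mixture(120.0, 0.0, 0.5)
measured = 42.0  # hypothetical kit reading
print(f"recovery = {recovery_percent(measured, expected):.1f}%")
```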

Development of Predictive Models for Rights Issues Using Financial Analysis Indices and Decision Tree Technique (경영분석지표와 의사결정나무기법을 이용한 유상증자 예측모형 개발)

  • Kim, Myeong-Kyun;Cho, Yoonho
    • Journal of Intelligence and Information Systems / v.18 no.4 / pp.59-77 / 2012
  • This study focuses on predicting which firms will increase capital by issuing new stocks in the near future. Many stakeholders, including banks, credit rating agencies, and investors, perform a variety of analyses of firms' growth, profitability, stability, activity, productivity, etc., and regularly report firms' financial analysis indices. In this paper, we develop predictive models for rights issues using these financial analysis indices and data mining techniques. This study approaches building the predictive models from the perspective of two different analyses. The first is the analysis period: we divide it into before and after the IMF financial crisis and examine whether the two periods differ. The second is the prediction time: to predict when firms increase capital by issuing new stocks, the prediction time is categorized as one, two, and three years later. Therefore, a total of six prediction models are developed and analyzed. We employ the decision tree technique to build the prediction models for rights issues. The decision tree is one of the most widely used prediction methods; it builds trees that label or categorize cases into a set of known classes. In contrast to neural networks, logistic regression, and SVM, decision tree techniques are well suited for high-dimensional applications and have strong explanatory capabilities. Well-known decision tree induction algorithms include CHAID, CART, QUEST, and C5.0. Among them, we use the C5.0 algorithm, the most recently developed of these, which yields better performance than the other algorithms. We obtained rights issue and financial analysis data from TS2000 of the Korea Listed Companies Association. Each financial analysis record consists of 89 variables, including 9 growth indices, 30 profitability indices, 23 stability indices, 6 activity indices, and 8 productivity indices.
For model building and testing, we used 10,925 financial analysis records of 658 listed firms in total. PASW Modeler 13 was used to build C5.0 decision trees for the six prediction models. A total of 84 variables from the financial analysis data were selected as the input variables of each model, and the rights issue status (issued or not issued) was defined as the output variable. To develop prediction models using the C5.0 node (Node Options: Output type = Rule set, Use boosting = false, Cross-validate = false, Mode = Simple, Favor = Generality), we used 60% of the data for model building and 40% for model testing. The experimental results show that the prediction accuracies for data after the IMF financial crisis (68.78% to 71.41%) are about 10 percentage points higher than those before the IMF financial crisis (59.04% to 60.43%). These results indicate that since the IMF financial crisis, the reliability of financial analysis indices has increased and firms' intention to issue rights has become more obvious. The results also show that stability-related indices have a major impact on rights issues in short-term prediction. On the other hand, long-term prediction of rights issues is affected by financial analysis indices on profitability, stability, activity, and productivity. All the prediction models include the industry code as one of the significant variables, meaning that companies in different industries show different patterns of rights issues. We conclude that it is desirable for stakeholders to consider stability-related indices for short-term prediction and a wider variety of financial analysis indices for long-term prediction. The current study has several limitations. First, the differences in accuracy need to be compared using other data mining techniques such as neural networks, logistic regression, and SVM.
Second, new prediction models need to be developed and evaluated that include variables identified as relevant to rights issues by research on capital structure theory.
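The abstract does not describe C5.0's internals, and PASW Modeler's C5.0 node is not reproduced here. As an assumption, the sketch below uses the information gain ratio split criterion of C4.5, C5.0's predecessor, to score one candidate split; the firm data are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature, labels):
    """Information gain of splitting `labels` on a discrete `feature`,
    divided by the split information (the entropy of the feature itself)."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(feature)
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical firms: a binned stability index vs. rights issue (1 = issued).
debt_ratio_bin = ["high", "high", "high", "low", "low", "low"]
issued = [1, 1, 0, 0, 0, 0]
print(round(gain_ratio(debt_ratio_bin, issued), 3))
```

A full tree learner would evaluate this score for every candidate input variable at each node and split on the best one, recursing until the leaves are pure or too small.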