• Title/Summary/Keyword: Predict

Research on the Relation between Musculoskeletal symptoms and Diagnosis using Moire Topography among Workers at an Automobile Manufacturing Plant (자동차회사 근로자를 대상으로 한 근골격계 자각증상과 moire 영상 진단과의 관계 연구)

  • Chun Eun-Joo;Lee Young-Gil;Jahng Doo-Sub;Lee Ki-Nam;Song Yung-Sun
    • Journal of Society of Preventive Korean Medicine / v.5 no.2 / pp.69-92 / 2001
  • The purpose of this study was to lay a foundation for more reliable standards for diagnosing musculoskeletal disorders. We investigated the degree and frequency of musculoskeletal symptoms and the care workers received for them, and then examined the relation between these symptoms and the diagnosis of musculoskeletal conditions using moire topography among workers at an automobile manufacturing plant. On this basis, we propose moire topography as a possible diagnostic tool for musculoskeletal disorders. Methods: A survey of 435 workers at an automobile manufacturing plant examined their general characteristics, complaints of musculoskeletal symptoms, and rates of work-related musculoskeletal disorders in the cervicobrachial and lumbar areas, reported as frequencies and percentages. For the moire topography diagnosis, we assessed the need for pain control in the cervicobrachial and lumbar areas. The 435 subjects were classified into five levels: A (no symptoms), B (need management), and C (need treatment), with B and C further divided into B1 (light symptoms)/B2 (heavy symptoms) and C1 (light symptoms)/C2 (heavy symptoms). The musculoskeletal areas were divided into two parts, the cervicobrachial area (neck, shoulder, arm and elbow, and wrist and hand) and the lumbar area, and the frequency and percentage for each area were reported. Finally, Pearson's chi-square test was used to examine the relation between moire topography diagnosis and general characteristics, and between moire topography diagnosis and work-related complaints of musculoskeletal symptoms in the cervicobrachial and lumbar areas. Results: All subjects were male (100%). By age, 12% were under 35 years, 56.3% were 36-40 years, 26.3% were 41-45 years, and 5.3% were above 46 years, so the 36-40 group was the largest. 
By housing, 69.7% lived in owned houses, 23.4% in rented houses, 1.6% in monthly-rented houses, and 5.3% in other housing. By education, 3.0% had completed middle school or less, 89.4% high school, and 7.6% junior college or higher, so high school graduates made up most of the group. By marital status, 95.2% were married, 4.1% unmarried, and 0.7% other. 81.8% drank alcohol and 18.2% did not; 53.6% smoked and 46.4% did not, with no large difference between the two. By weekly working hours, 26.9% worked below 50, 67.6% worked 50-60, and 5.5% worked above 60; by daily working hours, 21.6% worked below 9, 73.1% worked 10-12, and 5.3% worked above 13. By job tenure, 25.1% had below 10 years, 54.3% 11-15 years, 15.2% 16-20 years, and 5.5% above 21 years. By annual personal income, 11.0% earned below 30 million won, 84.8% 30-40 million won, and 4.1% above 40 million won; by sleeping hours, 26.7% slept below 6 hours, 69.9% 7-8 hours, and 3.4% above 9 hours. The complaint rate of musculoskeletal symptoms was 63.9% and the work-related musculoskeletal disorder rate was 54.9%, with the shoulder area the most frequent in both. Pain degree scores were $2.73 \pm 0.84$ for the shoulder, $2.66 \pm 0.86$ for the lumbar area, $2.59 \pm 0.86$ for the wrist and hand, $2.55 \pm 0.74$ for the neck, and $2.48 \pm 0.71$ for the arm and elbow. Regarding care for musculoskeletal symptoms, 34.4%-46.7% had taken medication or received treatment, 15.4%-28.7% had been absent or taken leave, and 6.3%-11.5% had transferred jobs; 39.6%-54% had experienced at least one of these. In the moire topography diagnosis, the need for pain control in the cervicobrachial area was: A (no symptoms) 20.7%, B1 (need management/light symptoms) 64.6%, B2 (need management/heavy symptoms) 11.5%, C1 (need treatment/light symptoms) 3.0%, and C2 (need treatment/heavy symptoms) 0.2%. 
In the lumbar area, A (no symptoms) was 8.7%, B1 (need management/light symptoms) 52.2%, B2 (need management/heavy symptoms) 30.3%, C1 (need treatment/light symptoms) 8.7%, and no subject was graded C2 (need treatment/heavy symptoms). Among general characteristics, age (P=0.013), education (P=0.000), and job tenure (P=0.012) were significantly related to the need for pain control. The need for pain control was not significantly related to complaints of musculoskeletal symptoms in either the cervicobrachial area (P=0.708) or the lumbar area (P=0.318). Conclusions: This study of workers at an automobile manufacturing plant found a high complaint rate of musculoskeletal symptoms in the cervicobrachial and lumbar areas (63.9%). However, Pearson's chi-square test showed no significant relation between musculoskeletal symptoms and moire topography diagnosis. Even so, the prevalence rates from moire topography diagnosis were higher than the complaint rates: complaint rates were 52.4% and 34.5%, whereas moire topography rates were 79.3% and 91.3%, for the cervicobrachial and lumbar areas, respectively. These results indicate that moire topography can detect mild musculoskeletal disorders that individuals cannot feel and that are not judged to be work-related musculoskeletal diseases. Because oriental medicine theory emphasizes prevention, it is significant that moire topography diagnosis may allow individuals to anticipate and manage their own physical condition before musculoskeletal disorders fully develop.
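The two moire prevalence figures in the Conclusions follow from the grade distributions: every grade other than A counts as a finding. The sketch below (plain Python; the function names are our own, not the authors') reproduces that arithmetic and shows how Pearson's chi-square statistic is formed from a contingency table. The 2 × 2 counts in it are hypothetical, not the study's data; a p-value would then come from the chi-square distribution with `dof` degrees of freedom, for example via `scipy.stats.chi2.sf(statistic, dof)`.

```python
# Moire grade distributions (percent of 435 workers) as reported in the abstract.
CERVICOBRACHIAL = {"A": 20.7, "B1": 64.6, "B2": 11.5, "C1": 3.0, "C2": 0.2}
LUMBAR = {"A": 8.7, "B1": 52.2, "B2": 30.3, "C1": 8.7, "C2": 0.0}


def moire_positive_rate(dist):
    """Share of workers graded anything other than A (no symptoms).

    Computed as 100 - A because the reported lumbar grades sum to 99.9%
    after rounding; summing B1..C2 directly would give 91.2% instead.
    """
    return round(100.0 - dist["A"], 1)


def pearson_chi_square(table):
    """Pearson's chi-square statistic and degrees of freedom for an r x c table.

    Rows and columns with a zero total (e.g. an empty C2 grade) must be
    dropped first, otherwise an expected count of zero divides by zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


if __name__ == "__main__":
    print(moire_positive_rate(CERVICOBRACHIAL))  # 79.3
    print(moire_positive_rate(LUMBAR))           # 91.3
    # Hypothetical counts (complaint yes/no by moire grade A / not A),
    # NOT the study's data, only to show the calculation.
    print(pearson_chi_square([[10, 20], [20, 10]]))
```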

Development of Predictive Models for Rights Issues Using Financial Analysis Indices and Decision Tree Technique (경영분석지표와 의사결정나무기법을 이용한 유상증자 예측모형 개발)

  • Kim, Myeong-Kyun;Cho, Yoonho
    • Journal of Intelligence and Information Systems / v.18 no.4 / pp.59-77 / 2012
  • This study focuses on predicting which firms will raise capital by issuing new stocks in the near future. Many stakeholders, including banks, credit rating agencies and investors, analyze firms' growth, profitability, stability, activity, productivity, etc., and regularly report firms' financial analysis indices. In this paper, we develop predictive models for rights issues using these financial analysis indices and data mining techniques. We approach model building from two perspectives. The first is the analysis period: we divide it into the periods before and after the IMF financial crisis and examine whether the two periods differ. The second is the prediction time: to predict when firms will raise capital by issuing new stocks, the prediction horizon is set to one, two, or three years. In total, six prediction models are developed and analyzed. We employ the decision tree technique to build the prediction models for rights issues. The decision tree is one of the most widely used prediction methods; it builds trees that label or categorize cases into a set of known classes. In contrast to neural networks, logistic regression and SVM, decision tree techniques are well suited to high-dimensional applications and have strong explanatory capabilities. Well-known decision tree induction algorithms include CHAID, CART, QUEST and C5.0. Among them, we use C5.0, the most recently developed of these, which yields better performance than the other algorithms. We obtained the rights issue and financial analysis data from TS2000 of the Korea Listed Companies Association. Each financial analysis record consists of 89 variables, including 9 growth indices, 30 profitability indices, 23 stability indices, 6 activity indices and 8 productivity indices. 
For model building and testing, we used 10,925 financial analysis records from a total of 658 listed firms. PASW Modeler 13 was used to build C5.0 decision trees for the six prediction models. A total of 84 variables from the financial analysis data were selected as the input variables of each model, and rights issue status (issued or not issued) was defined as the output variable. To develop the prediction models with the C5.0 node (Node Options: Output type = Rule set, Use boosting = false, Cross-validate = false, Mode = Simple, Favor = Generality), we used 60% of the data for model building and 40% for model testing. The experimental results show that the prediction accuracies on data after the IMF financial crisis (68.78% to 71.41%) are about 10 percentage points higher than those before the crisis (59.04% to 60.43%). These results indicate that since the IMF financial crisis, the reliability of financial analysis indices has increased and firms' intentions to issue rights have become more evident. The results also show that stability-related indices have a major impact on rights issues in short-term prediction, whereas long-term prediction of rights issues is affected by indices on profitability, stability, activity and productivity. All the prediction models include the industry code as one of the significant variables, which means that companies in different industries show different patterns of rights issues. We conclude that stakeholders should take into account stability-related indices for short-term prediction and a wider variety of financial analysis indices for long-term prediction. This study has several limitations. First, the differences in accuracy should be compared across other data mining techniques such as neural networks, logistic regression and SVM. 
Second, we need to develop and evaluate new prediction models that include variables identified as relevant to rights issues in research on capital structure theory.
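C5.0, like its predecessor C4.5, chooses splits by information gain ratio. The sketch below shows that criterion for a single numeric financial index; it is a simplified stand-alone illustration (no pruning, no rule-set output, no boosting), not PASW Modeler's C5.0 node, and the index values and labels are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_threshold_by_gain_ratio(values, labels):
    """Return (threshold, gain_ratio) for the best binary split of one numeric index.

    Candidate thresholds are midpoints between adjacent distinct sorted values.
    Returns None when every value is identical (no split possible).
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot place a threshold between equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        gain = base - (i / n) * entropy(left) - ((n - i) / n) * entropy(right)
        # Split information penalises splits by how they partition the cases.
        split_info = -(i / n) * log2(i / n) - ((n - i) / n) * log2((n - i) / n)
        ratio = gain / split_info
        if best is None or ratio > best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, ratio)
    return best


# Hypothetical example: a stability index for six firms and whether each issued rights (1) or not (0).
debt_ratio = [0.1, 0.2, 0.3, 0.8, 0.9, 1.0]
issued = [0, 0, 0, 1, 1, 1]
print(best_threshold_by_gain_ratio(debt_ratio, issued))
```

A full tree repeats this search over every input variable at each node and recurses on the two partitions; the 60/40 build/test split in the study would be applied before any of this.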

A Time Series Graph based Convolutional Neural Network Model for Effective Input Variable Pattern Learning : Application to the Prediction of Stock Market (효과적인 입력변수 패턴 학습을 위한 시계열 그래프 기반 합성곱 신경망 모형: 주식시장 예측에의 응용)

  • Lee, Mo-Se;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.24 no.1 / pp.167-181 / 2018
  • Over the past decade, deep learning has been in the spotlight among machine learning algorithms. In particular, the CNN (Convolutional Neural Network), known as an effective solution for recognizing and classifying images or voices, has been widely applied to classification and prediction problems. In this study, we investigate how to apply CNNs to business problem solving. Specifically, we propose applying a CNN to stock market prediction, one of the most challenging tasks in machine learning research. Because CNNs are strong at interpreting images, the proposed model adopts a CNN as a binary classifier that predicts stock market direction (upward or downward) using time series graphs as its inputs. In other words, we build a machine learning algorithm that mimics experts called 'technical analysts', who examine graphs of past price movements to predict future price movements. The proposed model, named 'CNN-FG (Convolutional Neural Network using Fluctuation Graph)', consists of five steps. In the first step, it divides the dataset into 5-day intervals. In step 2, it creates time series graphs for the divided dataset. Each graph is drawn on a 40 × 40 pixel image, with each independent variable drawn in a different color. In step 3, the model converts the images into matrices: each image becomes a combination of three matrices that express color values on the R (red), G (green), and B (blue) scales. In step 4, it splits the graph images into training and validation datasets, using 80% of the total dataset for training and the remaining 20% for validation. In the final step, CNN classifiers are trained on the images in the training dataset. 
Regarding the parameters of CNN-FG, we adopted two convolution filters ($5 \times 5 \times 6$ and $5 \times 5 \times 9$) in the convolution layer and a $2 \times 2$ max pooling filter in the pooling layer. The numbers of nodes in the two hidden layers were set to 900 and 32, respectively, and the output layer had 2 nodes (one for predicting an upward trend and the other for a downward trend). The activation function was ReLU (Rectified Linear Unit) for the convolution and hidden layers and Softmax for the output layer. To validate CNN-FG, we applied it to predicting the KOSPI200 over 2,026 days in eight years (2009 to 2016). To balance the two groups of the dependent variable (i.e., tomorrow's stock market movement), we selected 1,950 samples by random sampling. We then built the training dataset from 80% of these samples (1,560) and the validation dataset from 20% (390). The independent variables of the experimental dataset were twelve technical indicators commonly used in previous studies, including Stochastic %K, Stochastic %D, Momentum, ROC (rate of change), LW %R (Larry Williams' %R), A/D oscillator (accumulation/distribution oscillator), OSCP (price oscillator), and CCI (commodity channel index). To confirm the superiority of CNN-FG, we compared its prediction accuracy with that of other classification models. The experimental results showed that CNN-FG outperforms LOGIT (logistic regression), ANN (artificial neural network), and SVM (support vector machine) with statistical significance. These empirical results imply that converting time series business data into graphs and building CNN-based classification models on these graphs can be effective in terms of prediction accuracy. 
Thus, this paper sheds light on how to apply deep learning techniques to business problem solving.
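Steps 2 and 3 of CNN-FG turn a 5-day window into a 40 × 40 color image and then into three R, G, B matrices. The abstract does not say how the lines are drawn, so the sketch below assumes per-variable min-max scaling and linear interpolation between days; the colors and function names are hypothetical. A second helper tracks feature-map shapes through the two convolution and pooling stages, assuming stride 1, one convolution layer with 6 and then 9 filters of size $5 \times 5$, each followed by $2 \times 2$ max pooling.

```python
def window_to_rgb(series, colors, size=40):
    """Draw each variable's values over one window as a line graph on a size x size canvas.

    series: list of equal-length value lists (e.g. 5 trading days per variable, at least 2).
    colors: one (r, g, b) tuple per variable.
    Returns three size x size matrices (R, G, B); background pixels are 0.
    Each variable is min-max scaled on its own; row 0 is the top of the image.
    """
    channels = [[[0] * size for _ in range(size)] for _ in range(3)]
    for values, color in zip(series, colors):
        t_max = len(values) - 1
        lo, hi = min(values), max(values)
        for x in range(size):
            # Position of this pixel column on the time axis, linearly interpolated.
            t = x * t_max / (size - 1)
            i = min(int(t), t_max - 1)
            frac = t - i
            v = values[i] + frac * (values[i + 1] - values[i])
            scaled = (v - lo) / (hi - lo) if hi > lo else 0.5
            row = round((1 - scaled) * (size - 1))
            for c in range(3):
                channels[c][row][x] = color[c]
    return channels


def feature_map_shapes(size, n_filters, kernel=5, pool=2, padding="same"):
    """Track (height, width, channels) through conv -> max-pool stages (stride 1 assumed)."""
    shapes = []
    h = size
    for f in n_filters:
        if padding == "valid":
            h = h - kernel + 1
        shapes.append((h, h, f))  # after convolution
        h //= pool
        shapes.append((h, h, f))  # after pooling
    return shapes


if __name__ == "__main__":
    r, g, b = window_to_rgb([[1, 2, 3, 4, 5], [5, 4, 3, 2, 1]], [(255, 0, 0), (0, 0, 255)])
    print(feature_map_shapes(40, [6, 9]))
    print(feature_map_shapes(40, [6, 9], padding="valid"))
```

With "same" padding the final map is 10 × 10 × 9 = 900 values, which happens to equal the 900 nodes of the first hidden layer; with "valid" padding it would be 7 × 7 × 9 = 441. The abstract does not state which padding was used.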

Product Evaluation Criteria Extraction through Online Review Analysis: Using LDA and k-Nearest Neighbor Approach (온라인 리뷰 분석을 통한 상품 평가 기준 추출: LDA 및 k-최근접 이웃 접근법을 활용하여)

  • Lee, Ji Hyeon;Jung, Sang Hyung;Kim, Jun Ho;Min, Eun Joo;Yeo, Un Yeong;Kim, Jong Woo
    • Journal of Intelligence and Information Systems / v.26 no.1 / pp.97-117 / 2020
  • Product evaluation criteria are indicators describing the attributes or values of products, which enable users or manufacturers to measure and understand the products. When companies analyze their products or compare them with competitors', appropriate criteria must be selected for objective evaluation. The criteria should reflect the product features that consumers considered when they purchased, used and evaluated the products. However, current evaluation criteria do not reflect how consumer opinions differ from product to product. Previous studies have used online reviews from e-commerce sites, which reflect consumer opinions, to extract product features and topics and use them as evaluation criteria. However, these approaches still produce criteria irrelevant to the products because the extracted words, including improper ones, are not refined. To overcome this limitation, this research proposes the LDA-k-NN model, which extracts candidate criteria words from online reviews using LDA and refines them with k-nearest neighbor. The proposed approach starts with a preparation phase of six steps. First, it collects review data from e-commerce websites. Most e-commerce websites classify the items they sell into high-level, middle-level, and low-level categories. Review data for the preparation phase are gathered from each middle-level category and later merged to represent a single high-level category. Next, nouns, adjectives, adverbs, and verbs are extracted from the reviews using part-of-speech information from a morpheme analysis module. After preprocessing, LDA derives the words for each topic in the reviews, and only the nouns among the topic words are kept as potential criteria words. These words are then tagged according to whether they are plausible criteria for each middle-level category. Next, every tagged word is vectorized with a pre-trained word embedding model. Finally, a case-based k-nearest neighbor approach is used to classify each word by tag. 
After the preparation phase, the criteria extraction phase is conducted on low-level categories. This phase starts by crawling reviews in the corresponding low-level category. The same preprocessing as in the preparation phase is applied using the morpheme analysis module and LDA. Candidate criteria words are extracted by taking the nouns from the data and are vectorized with the pre-trained word embedding model. Finally, evaluation criteria are extracted by refining the candidate words using the k-nearest neighbor approach and the reference proportion of each word in the word set. To evaluate the proposed model, an experiment was conducted with reviews from '11st', one of the largest e-commerce companies in Korea, taken from 'Electronics/Digital', one of 11st's high-level categories. The proposed model was compared with three other models: the actual criteria of 11st, a model that extracts nouns with a morpheme analysis module and refines them by word frequency, and a model that extracts nouns from LDA topics and refines them by word frequency. The evaluation task was to predict the evaluation criteria of 10 low-level categories with the proposed model and the three models above. The criteria words extracted by each model were combined into a single word set, which was used for survey questionnaires. In the survey, respondents chose every item they considered an appropriate criterion for each category, and each model scored whenever a chosen word had been extracted by that model. The proposed model scored higher than the other models in 8 of the 10 low-level categories. Paired t-tests on the models' scores confirmed that the proposed model performed better in 26 of 30 tests. In addition, the proposed model was the best in terms of accuracy. 
This research proposes an evaluation criteria extraction method that combines topic extraction using LDA with refinement by the k-nearest neighbor approach. This method overcomes the limits of previous dictionary-based models and frequency-based refinement models. This study can contribute to improving review analysis for deriving business insights in the e-commerce market.
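The final preparation step classifies each embedded candidate word by its nearest tagged neighbours. The abstract gives neither the similarity measure nor k, so the sketch assumes cosine similarity, k = 3 and a simple majority vote; the tiny 3-dimensional vectors and words are hypothetical stand-ins for a real pre-trained embedding model, and the "reference proportion" refinement is not shown.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_tag(vector, tagged_words, k=3):
    """Majority tag among the k tagged word vectors most similar to `vector`.

    tagged_words: list of (word, vector, tag) triples from the preparation phase.
    """
    neighbours = sorted(tagged_words, key=lambda item: cosine(vector, item[1]), reverse=True)[:k]
    votes = Counter(tag for _, _, tag in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical tagged words with made-up embeddings.
TAGGED = [
    ("battery", [0.9, 0.1, 0.0], "criterion"),
    ("design", [0.8, 0.2, 0.1], "criterion"),
    ("noise", [1.0, 0.0, 0.0], "criterion"),
    ("delivery", [0.0, 1.0, 0.0], "not_criterion"),
    ("box", [0.1, 0.8, 0.1], "not_criterion"),
    ("yesterday", [0.0, 0.9, 0.2], "not_criterion"),
]

if __name__ == "__main__":
    print(knn_tag([0.95, 0.05, 0.0], TAGGED))
```

With an odd k and two tags a vote can never tie; with more tags, `Counter.most_common` would break ties by first occurrence, which a real implementation might want to handle explicitly.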

A Study on Risk Parity Asset Allocation Model with XGBoost (XGBoost를 활용한 리스크패리티 자산배분 모형에 관한 연구)

  • Kim, Younghoon;Choi, HeungSik;Kim, SunWoong
    • Journal of Intelligence and Information Systems / v.26 no.1 / pp.135-149 / 2020
  • Artificial intelligence is changing the world, and the financial market is no exception. Robo-advisors are being actively developed to make up for the weaknesses of traditional asset allocation methods and to take over tasks that are difficult for them. A robo-advisor makes automated investment decisions with artificial intelligence algorithms and is used with various asset allocation models such as the mean-variance model, the Black-Litterman model and the risk parity model. The risk parity model is a typical risk-based asset allocation model that focuses on the volatility of assets. It avoids investment risk structurally, so it is stable for managing large funds and has been widely used in finance. XGBoost is a parallel tree-boosting method: an optimized gradient boosting model designed to be highly efficient and flexible. It not only scales to billions of examples in memory-limited environments but also learns much faster than traditional boosting methods. It is frequently used in many fields of data analysis and has many advantages. In this study, we therefore propose a new asset allocation model that combines the risk parity model with the XGBoost machine learning model. This model uses XGBoost to predict the risk of assets and applies the predicted risk to covariance estimation. Because an optimized asset allocation model estimates investment proportions from historical data, there are estimation errors between the estimation period and the actual investment period, and these errors harm the performance of the optimized portfolio. This study aims to improve the stability and performance of the model by predicting the volatility of the next investment period and thereby reducing the estimation errors of the optimized asset allocation model. In this way, it narrows the gap between theory and practice and proposes a more advanced asset allocation model. 
For the empirical test of the proposed model, we used Korean stock market price data for 17 years, from 2003 to 2019. The data sets consist of the energy, finance, IT, industrial, material, telecommunication, utility, consumer, health care and staple sectors. We generated predictions with a moving window of 1,000 in-sample and 20 out-of-sample observations, producing a total of 154 rebalancing back-test results. We analyzed portfolio performance in terms of cumulative return, and the long test period provided a large amount of sample data. Compared with the traditional risk parity model, the proposed model improved both cumulative return and estimation errors. The total cumulative return was 45.748%, about 5% higher than that of the risk parity model, and estimation errors were reduced in 9 of the 10 industry sectors. Reducing estimation errors increases the stability of the model and makes it easier to apply in practical investment. The experiment thus showed that reducing the estimation errors of the optimized asset allocation model improves portfolio performance. Many financial and asset allocation models are of limited use in practical investment because of the most fundamental question of whether the past characteristics of assets will continue into the future in a changing financial market. This study, however, not only takes advantage of traditional asset allocation models but also supplements their limitations and increases stability by predicting asset risks with the latest algorithm. Various studies have proposed parametric estimation methods to reduce estimation errors in portfolio optimization; we suggest a new method that uses machine learning to reduce estimation errors in an optimized asset allocation model. 
So this study is meaningful in that it proposes an advanced artificial intelligence asset allocation model for the fast-developing financial markets.
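The abstract does not give its exact covariance construction or optimizer. The sketch below assumes the common equal-risk-contribution (ERC) form of risk parity: predicted volatilities, standing in for the XGBoost forecasts (not shown), are combined with a correlation matrix into a covariance matrix, and weights are found by cyclical coordinate descent so that each asset contributes the same share of portfolio variance. All inputs and function names are hypothetical.

```python
from math import sqrt


def covariance_from_vols(vols, corr):
    """Covariance matrix from (predicted) volatilities and a correlation matrix."""
    n = len(vols)
    return [[corr[i][j] * vols[i] * vols[j] for j in range(n)] for i in range(n)]


def risk_parity_weights(cov, iters=500, tol=1e-12):
    """Equal-risk-contribution weights via cyclical coordinate descent.

    Minimises 0.5 * w'Cw - b * sum(log w_i) with equal budgets b; at the optimum
    w_i * (Cw)_i = b for every i, so normalising keeps contributions equal.
    """
    n = len(cov)
    b = 1.0 / n
    w = [1.0 / sqrt(cov[i][i]) for i in range(n)]  # start from inverse volatility
    for _ in range(iters):
        max_change = 0.0
        for i in range(n):
            c = sum(cov[i][j] * w[j] for j in range(n) if j != i)
            # Positive root of cov_ii * w_i^2 + c * w_i - b = 0.
            new = (-c + sqrt(c * c + 4.0 * cov[i][i] * b)) / (2.0 * cov[i][i])
            max_change = max(max_change, abs(new - w[i]))
            w[i] = new
        if max_change < tol:
            break
    total = sum(w)
    return [x / total for x in w]


def risk_contributions(w, cov):
    """Each asset's contribution w_i * (Cw)_i to portfolio variance."""
    n = len(w)
    return [w[i] * sum(cov[i][j] * w[j] for j in range(n)) for i in range(n)]


if __name__ == "__main__":
    predicted_vols = [0.1, 0.2, 0.3]  # hypothetical next-period volatility forecasts
    corr = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
    cov = covariance_from_vols(predicted_vols, corr)
    weights = risk_parity_weights(cov)
    print(weights, risk_contributions(weights, cov))
```

With uncorrelated assets the result reduces to inverse-volatility weights, which makes it easy to see how a higher predicted volatility for the next period lowers an asset's allocation.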

Effects of Temperature Conditions on the Growth and Oviposition of Brown Planthopper, Nilaparvata lugens Stål (온도조건(溫度條件)이 벼멸구의 발육(發育) 및 산란(産卵)에 미치는 영향(影響)에 관한 연구(硏究))

  • Bae, Soon-Do;Song, Yoo-Han;Park, Yeong-Do
    • Korean journal of applied entomology / v.26 no.1 s.70 / pp.13-23 / 1987
  • This study was conducted to determine the effects of temperature conditions on the growth and oviposition of the brown planthopper (BPH), Nilaparvata lugens Stål. The results were intended to help predict the timing of BPH control from the population dynamics of the BPH in response to temperature fluctuations after the insects migrate into paddy fields. Developmental and ovipositional rates under constant and alternating temperature conditions were observed in a plant growth cabinet. Egg hatchability was highest at 25°C and decreased below or above this optimum. The egg period was shortest at 27.5°C and was prolonged at lower temperatures, but it was also retarded above 30°C. Adult emergence rates were highest at 27.5°C and decreased at lower temperatures, and no adults emerged at 32.5°C or 35°C. The nymphal developmental period was shortest at both 27.5°C and 30°C and lengthened at lower temperatures. Female longevity increased as temperature decreased, and male longevity was shortest at 27.5°C. The preoviposition period was shortest at 32.5°C and was prolonged at lower temperatures; at 17.5°C it was about 6.5 times longer than at 32.5°C. The number of eggs laid per female was greatest at 25°C and decreased below or above this optimum. Under the same total effective day-degrees, hatchability at alternating temperatures was about 10% higher than at constant temperatures, but the egg period at alternating temperatures was nearly identical to that at constant temperatures. Under the 22°C condition, the emergence rate was about 8% higher at alternating than at constant temperature; at 28°C, however, it was about 8% higher at constant than at alternating temperature. 
The nymphal period was about 4 to 6 days longer at alternating than at constant temperature. Under the same total effective day-degrees in the adult stage, both longevity and oviposition period were longer at alternating than at constant temperature, and the number of eggs laid per female was also higher at alternating temperature. Females reared at a constant 28°C lived longest regardless of the temperatures to which they were exposed after emergence. This result suggests that female longevity is strongly influenced by the temperature experienced during the immature stages. The preoviposition period was affected by the temperatures experienced during both the nymphal and adult stages, whereas the number of eggs laid was affected only by the temperature during the adult stage. Based on these results, the developmental threshold temperatures appear to be 14.12°C for eggs, 14.76°C for nymphs, 9.62°C for adults, and 15.95°C for the preoviposition period. The estimated total effective temperatures for completing each stage were 141.25 day-degrees for eggs, 167.83 day-degrees for nymphs, 349.64 day-degrees for adults, and 58.60 day-degrees for preoviposition.
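Thresholds and thermal constants like these are normally read through the linear degree-day (thermal summation) model: a stage is complete once the daily temperature excess over the threshold $T_0$ sums to the thermal constant $K$, so at a constant temperature $T$ the duration is $D = K / (T - T_0)$. The sketch assumes that model and the simple daily-mean method of counting degree-days; the abstract does not say how field temperatures should be accumulated, and the linear model does not capture the slowdown the study observed above about 30°C. The function names are hypothetical.

```python
# Developmental threshold T0 (deg C) and thermal constant K (day-degrees) per stage, from the abstract.
STAGES = {
    "egg": (14.12, 141.25),
    "nymph": (14.76, 167.83),
    "adult": (9.62, 349.64),
    "preoviposition": (15.95, 58.60),
}


def days_at_constant_temperature(temp, stage):
    """Linear thermal-summation estimate of stage duration: K / (T - T0)."""
    threshold, constant = STAGES[stage]
    if temp <= threshold:
        return None  # no development predicted at or below the threshold
    return constant / (temp - threshold)


def completion_day(daily_means, stage):
    """First day on which accumulated effective temperature reaches the thermal constant.

    daily_means: mean temperature for each successive day (deg C).
    Each day contributes max(0, T_mean - T0) day-degrees.
    """
    threshold, constant = STAGES[stage]
    total = 0.0
    for day, temp in enumerate(daily_means, start=1):
        total += max(0.0, temp - threshold)
        if total >= constant:
            return day
    return None  # stage not completed within the series


if __name__ == "__main__":
    print(days_at_constant_temperature(25.0, "egg"))
    print(completion_day([22.0, 24.0, 26.0, 25.0] * 5, "nymph"))
```

Fed with daily mean temperatures recorded after BPH migration, `completion_day` gives a rough forecast of when each stage will be reached, which is the kind of control-timing prediction the study aims to support.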

Development and application of prediction model of hyperlipidemia using SVM and meta-learning algorithm (SVM과 meta-learning algorithm을 이용한 고지혈증 유병 예측모형 개발과 활용)

  • Lee, Seulki;Shin, Taeksoo
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.111-124 / 2018
  • This study aims to develop a classification model for predicting the occurrence of hyperlipidemia, a chronic disease. Prior studies applying data mining techniques to disease prediction can be divided into model-design studies for predicting cardiovascular disease and studies comparing disease prediction results. In the international literature, studies predicting cardiovascular disease predominated among those using data mining techniques. Domestic studies were not very different, but they focused mainly on hypertension and diabetes. Because hyperlipidemia, like hypertension and diabetes, is a highly important chronic disease, this study selected hyperlipidemia as the disease to be analyzed. We developed a model for predicting hyperlipidemia using SVM and meta-learning algorithms, which are already known to have excellent predictive power. For this purpose, we used data from the Korea Health Panel 2012. The Korea Health Panel produces basic data on health expenditure, health status and health behavior, and has conducted an annual survey since 2008. In this study, 1,088 patients with hyperlipidemia were randomly selected from the 2012 hospitalization, outpatient, emergency, and chronic disease data of the Korea Health Panel, and 1,088 non-patients were also randomly selected, for a total of 2,176 subjects. Three methods were used to select input variables for predicting hyperlipidemia. First, a stepwise method was performed using logistic regression. Among the 17 variables, the categorical variables (except for length of smoking) were expressed as dummy variables, each treated as a separate variable relative to a reference group, and these variables were analyzed. 
Six variables (age, BMI, education level, marital status, smoking status, gender), excluding income level and smoking period, were selected at a significance level of 0.1. Second, the C4.5 decision tree algorithm was used; the significant input variables were age, smoking status, and education level. Finally, a genetic algorithm was used. For SVM, the input variables selected by the genetic algorithm were 6 variables (age, marital status, education level, economic activity, smoking period, and physical activity status), while for the artificial neural network they were 3 variables (age, marital status, and education level). Based on the selected variables, we compared SVM, meta-learning algorithms and other prediction models for hyperlipidemia patients, and compared their classification performance using TP rate and precision. The main results are as follows. First, the accuracy of SVM was 88.4% and that of the artificial neural network was 86.7%. Second, classification models using the input variables selected by the stepwise method were slightly more accurate than models using all variables. Third, the precision of the artificial neural network was higher than that of SVM when only the three input variables selected by the decision tree were used. With the input variables selected by the genetic algorithm, classification accuracy was 88.5% for SVM and 87.9% for the artificial neural network. Finally, stacking, the meta-learning algorithm proposed in this study, performed best when the predicted outputs of SVM and MLP were used as the input variables of an SVM meta classifier. The purpose of this study was to predict hyperlipidemia, one of the representative chronic diseases. 
To do this, we used SVM and meta-learning algorithms, which are known to have high accuracy. As a result, stacking as a meta-learner classified hyperlipidemia more accurately than the other meta-learning algorithms. However, the predictive performance of the proposed meta-learning algorithm was the same as that of SVM, the best single model (88.6%). This study has the following limitations. First, various variable selection methods were tried, but most variables used in the study were categorical dummy variables. With many categorical variables, results may differ if continuous variables are used, because models such as decision trees may suit categorical variables better than general models such as neural networks. Despite these limitations, this study is significant in predicting hyperlipidemia with hybrid models such as meta-learning algorithms, which have not been studied previously. The improvement in model accuracy from applying various variable selection techniques is also meaningful. In addition, the proposed model is expected to be effective for the prevention and management of hyperlipidemia.
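The stacking design described above feeds the outputs of base classifiers (SVM and MLP) into a meta classifier (SVM). The sketch below shows only that plumbing: out-of-fold base-learner scores become the meta learner's inputs, so the meta learner never sees a score from a model trained on the same row. To keep it standard-library only, hypothetical nearest-centroid and k-NN learners are swapped in for SVM and MLP, and the 5-fold split is our assumption, not the study's setup.

```python
from math import dist


def nearest_centroid_fit(X, y):
    """Hypothetical stand-in learner: score 1.0 if closer to the class-1 centroid."""
    def centroid(label):
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]
    c0, c1 = centroid(0), centroid(1)
    return lambda x: 1.0 if dist(x, c1) < dist(x, c0) else 0.0


def knn_fit(X, y, k=3):
    """Hypothetical stand-in learner: share of class 1 among the k nearest training rows."""
    def predict(x):
        nearest = sorted(range(len(X)), key=lambda i: dist(x, X[i]))[:k]
        return sum(y[i] for i in nearest) / k
    return predict


def out_of_fold_scores(base_fits, X, y, folds=5):
    """Meta-features: each row's base-learner scores from models that never saw that row."""
    meta = [[0.0] * len(base_fits) for _ in X]
    for fold in range(folds):
        train = [i for i in range(len(X)) if i % folds != fold]
        test = [i for i in range(len(X)) if i % folds == fold]
        X_train = [X[i] for i in train]
        y_train = [y[i] for i in train]
        for m, fit in enumerate(base_fits):
            predict = fit(X_train, y_train)
            for i in test:
                meta[i][m] = predict(X[i])
    return meta


def stacking_fit(base_fits, meta_fit, X, y, folds=5):
    """Train the meta learner on out-of-fold scores, then refit base learners on all data."""
    meta_model = meta_fit(out_of_fold_scores(base_fits, X, y, folds), y)
    base_models = [fit(X, y) for fit in base_fits]
    return lambda x: meta_model([model(x) for model in base_models])


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated groups (0 = non-patient, 1 = patient).
    X = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5], [6, 6], [0.5, 0.5], [5.5, 5.5]]
    y = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]
    model = stacking_fit([nearest_centroid_fit, knn_fit], nearest_centroid_fit, X, y)
    print(model([0.2, 0.3]), model([5.8, 5.2]))
```

Real SVM and MLP learners can be dropped in as long as each is wrapped as a function taking `(X, y)` and returning a scoring function.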

Customer Behavior Prediction of Binary Classification Model Using Unstructured Information and Convolution Neural Network: The Case of Online Storefront (비정형 정보와 CNN 기법을 활용한 이진 분류 모델의 고객 행태 예측: 전자상거래 사례를 중심으로)

  • Kim, Seungsoo;Kim, Jongwoo
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.2
    • /
    • pp.221-241
    • /
    • 2018
  • Deep learning has recently been attracting attention. The deep learning technique applied in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) and in AlphaGo is the Convolutional Neural Network (CNN). A CNN divides the input image into small regions, recognizes local features in each, and combines them to recognize the image as a whole. Deep learning technologies are expected to bring many changes to our lives, but so far their applications have been largely limited to image recognition and natural language processing. The use of deep learning techniques for business problems is still at an early research stage. If their performance is proven, they can be applied to traditional business problems such as marketing response prediction, fraudulent transaction detection, and bankruptcy prediction. It is therefore a meaningful experiment to assess whether deep learning can solve business problems, using the case of online shopping companies. These companies have big data, can identify customer behavior relatively easily, and offer high utility value. In particular, the competitive environment of online shopping companies is changing rapidly and becoming more intense, so analyzing customer behavior to maximize profit is becoming increasingly important for them. In this study, we propose a 'CNN model of Heterogeneous Information Integration' as a way to improve the prediction of customer behavior in online shopping enterprises.
The proposed model combines structured and unstructured information and learns from them with a CNN in a multi-layer perceptron structure. To optimize its performance, we evaluate each of its architectural components ('heterogeneous information integration', 'unstructured information vector conversion', and 'multi-layer perceptron design') and confirm the proposed model based on the results. The target variables for predicting customer behavior are defined as six binary classification problems: re-purchaser, churn, frequent shopper, frequent refund shopper, high amount shopper, and high discount shopper. To verify the usefulness of the proposed model, we conducted experiments using actual transaction, customer, and VOC (voice of the customer) data from a specific online shopping company in Korea. The data cover 47,947 customers who registered at least one VOC during the one month of January 2011. We use these customers' profiles, 19 months of transaction data from September 2010 to March 2012, and the VOCs posted during that month. The experiment is divided into two stages. In the first stage, we evaluate three architectural choices that affect the performance of the proposed model and select the optimal parameters. In the second stage, we evaluate the performance of the proposed model. Experimental results show that the proposed model, which combines structured and unstructured information, outperforms NBC (Naïve Bayes classification), SVM (support vector machine), and ANN (artificial neural network). This shows that unstructured information contributes to predicting customer behavior. It also shows that CNNs can be applied to business problems as well as to image recognition and natural language processing.
The experiments confirm that the CNN is more effective at understanding and interpreting the meaning of context in text VOC data. The empirical research, based on actual data from an e-commerce company, is also significant. It shows that very meaningful information for predicting customer behavior can be extracted from VOC text written directly by customers. Finally, the various experiments provide useful information on parameter selection and model performance for future research.
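The 'heterogeneous information integration' idea can be sketched as a forward pass in plain Python. The unstructured VOC text is convolved and max-pooled, the pooled features are concatenated with structured customer features, and the result is fed to a small MLP.

This is a hypothetical illustration. The class name `HeterogeneousCNN`, the layer sizes, the single filter width, concatenation as the integration step, and the random untrained weights are all assumptions, not the authors' architecture. A real model would be trained with a deep learning framework.

```python
import math
import random
from typing import Dict, List, Sequence

Vector = List[float]


def _sigmoid(z: float) -> float:
    z = max(-60.0, min(60.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))


def conv1d_maxpool(seq: List[Vector], filters: List[List[Vector]]) -> Vector:
    """Text branch: slide each (width x emb_dim) filter over word vectors, ReLU, max-pool.

    A text shorter than the filter width yields 0.0 for that filter.
    """
    pooled = []
    for f in filters:
        width = len(f)
        best = 0.0  # ReLU floor: max(0, max over all windows)
        for start in range(len(seq) - width + 1):
            window = seq[start:start + width]
            s = sum(w * x for wrow, xrow in zip(f, window) for w, x in zip(wrow, xrow))
            best = max(best, s)
        pooled.append(best)
    return pooled


class HeterogeneousCNN:
    """Hypothetical model: CNN text features + structured features -> MLP -> P(y=1)."""

    def __init__(self, vocab: Sequence[str], n_struct: int, emb_dim: int = 8,
                 n_filters: int = 4, width: int = 3, hidden: int = 6, seed: int = 0):
        rng = random.Random(seed)

        def g() -> float:
            return rng.gauss(0.0, 0.1)  # small random (untrained) weight

        self.n_struct = n_struct
        self.emb: Dict[str, Vector] = {w: [g() for _ in range(emb_dim)] for w in vocab}
        self.unk: Vector = [0.0] * emb_dim  # out-of-vocabulary tokens map to zeros
        self.filters = [[[g() for _ in range(emb_dim)] for _ in range(width)]
                        for _ in range(n_filters)]
        n_in = n_filters + n_struct  # integration by concatenation (assumption)
        self.w1 = [[g() for _ in range(n_in)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [g() for _ in range(hidden)]
        self.b2 = 0.0

    def predict_proba(self, tokens: Sequence[str], structured: Sequence[float]) -> float:
        if len(structured) != self.n_struct:
            raise ValueError("expected %d structured features" % self.n_struct)
        seq = [self.emb.get(t, self.unk) for t in tokens]
        features = conv1d_maxpool(seq, self.filters) + list(structured)
        h = [max(0.0, sum(w * x for w, x in zip(row, features)) + b)  # ReLU hidden layer
             for row, b in zip(self.w1, self.b1)]
        return _sigmoid(sum(w * x for w, x in zip(self.w2, h)) + self.b2)
```

For one of the six binary targets (for example, churn), you might call `HeterogeneousCNN(vocab, n_struct=3).predict_proba("late delivery refund".split(), [0.2, 1.0, 0.0])`. The output is only meaningful after the weights are trained.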

The Abolition Type and The Regional Characteristics of The Elementary Schools in Chungbuk Province (忠淸北道의 國民學校 廢校類型과 그 地域的 特性)

  • Chae, Son-Ha
    • Journal of the Korean Geographical Society
    • /
    • v.29 no.1
    • /
    • pp.84-104
    • /
    • 1994
  • Migration into cities has increased as Korea has industrialized rapidly since the 1960s, and the rural population has steadily decreased. As a result, a shortage of students forced many elementary schools to be abolished. The aim of this study is to identify the abolition types and their regional characteristics, taking Chungbuk province as the study region. Given the increasing abolition of elementary schools, I think it is very important to understand how elementary schools have been abolished so far and to predict how the geography of the study region will change. Data for this study are based on the Annual Establishment-Abolition Situation of the Schools published by the Chungbuk Office of Education in 1992, various statistical reports, and interviews with people involved. The results are as follows: 1. In Chungbuk, the number of elementary school students has decreased since 1969 and had fallen to less than half by 1990. Because the number of schools began to decrease ten years later than the number of students, the abolition of elementary schools effectively began in the 1980s. 2. 72 elementary schools were abolished between 1980 and 1992: 9.7% were principal schools and 90.3% were branch schools. The most schools (fifteen each) were abolished in Yongdong-county and Chechon-county, the fewest (one each) in Chechon-city and Okchon-county, and none in Chongju-city or Chungju-city. By type of abolition process, seven principal schools were abolished outright (the fewest). Twenty-eight principal schools were first reorganized as branch schools and then abolished. Thirty-seven branch schools were abolished (the most). 3.
When the abolitions are divided into a first period (1980-1986) and a second period (1987-1992), the principal and branch schools abolished in the first period account for 13.9% of all abolitions, and 60% of these were due to dam construction. The principal schools in submerged areas were abolished even though they had many students. In the second period, sixty-two branch schools were abolished, accounting for 86.1% of all abolitions: fifteen in Yongdong-county, thirteen in Chechon-county, seven in Tanyang-county, six in Chongwon-county, and five each in Chungwon-county and Koesan-county. Unlike in the first period, schools in this period were abolished because they had very few students. In this period sixty branch schools were abolished. The students of all but six of the abolished schools transferred to the principal schools. For 58 schools, the authorities have since helped students commute by bus or subsidized their commuting expenses. 4. The cities, counties, and myon are classified into five abolition types by the number of abolished schools. Type II, with the most abolished schools (forty-nine), accounts for 68.1% of all abolitions. Type I, with the fewest (three), accounts for 12.5%. Regarding the relation between abolition type and the numbers of schools and students, the number of schools in types I, II, III, and V (but not IV) increased until 1980 and has decreased through abolition since then. Meanwhile, the number of students has fallen compared with 1970, and the more schools were abolished, the smaller the decrease in students. The average number of students per school decreased in every abolition type, most sharply in type IV. 5. Regarding the relation between abolition type and regional characteristics, most abolished schools (71% of all abolitions) were located between 100 m and 300 m above sea level.
Regions without abolitions have high ratios of cultivated land, rice fields, and part-time farmers, whereas regions with many abolitions have a low ratio of cultivated land. As for manufacturing, Yongdong-county and Chechon-county, where abolitions are most concentrated, have a low share of manufacturing employment. In contrast, Chongju-city, with no abolitions, has a high share. Consequently, underdeveloped manufacturing drives population out-migration, and the resulting population decline leads to school abolition. Where transport access is difficult, the facilities sold after abolition are not being put to varied use. 7. Taking Yongdong-county, where the most schools were abolished, as an example, I examined the school districts and population characteristics of the abolished schools. Regions at higher elevations had more villages, households, and population than low-lying regions, yet their schools were abolished. This shows that elevation had a great effect on abolition. The regional analysis shows that many schools were abolished in the early 1980s because of artificial structures such as dams. Schools abolished in the late 1980s, by contrast, were closed about ten years after their student numbers began to decrease. More schools were abolished in regions where manufacturing had not developed, and the higher a school was above sea level, the sooner it was abolished. Beautiful natural scenery and accessibility also proved to be important factors in the practical reuse of abolished facilities.

Effect of the Suicide Prevention Program to the Impulsive Psychology of the Elementary School Student (자살예방 프로그램이 초등학교 충동심리에 미치는 영향)

  • Kang, Soo Jin;Kang, Ho Jung;Cho, Won Cheol;Lee, Tae Shik
    • Journal of Korean Society of Disaster and Security
    • /
    • v.6 no.1
    • /
    • pp.65-72
    • /
    • 2013
  • In this study, an early suicide prevention program was applied to elementary school students, and we compared results before and after the program. We verified changes in psychological status, such as emotional state and temptation toward suicide, and presented the program's potential for suicide prevention. Adolescence is a very unstable period of growth, in which students are cognitively immature and emotionally impulsive. In this emotionally unstable and unpredictable period, a student may choose suicide as an extreme way to escape reality, or as an impulsive solution to a minor conflict or dispute. Many sources of student stress have recently been leading students to the extreme act of suicide. These include the shift to nuclear families, parents' expectations of their children, educational problems, socio-environmental factors, and individual psychological factors. In this study, the suicide prevention program was used to identify the scope of stress experienced in elementary school and the students' thoughts about, and degree of temptation toward, suicide. The program used prevention activities such as meditation training, breathing training, and experiences of anger control and emotional expression. Through these, students were guided to overcome themselves, establish a positive self-identity, and understand self-control, self-esteem, and the preciousness of life. On this basis, the effect on suicide prevention was analyzed. The study targeted 51 students in two 6th-grade classes at an elementary school in Goyang-si. The program ran for 30 minutes every morning and focused on hands-on experiences and activities based on the principles and methods of brain science.
Data were collected 20 times, before the morning class began, using the Suicide Probability Scale (hereafter SPS-A), a suicide-risk prediction scale designed to predict suicide probability effectively. It covers 7 areas: Positive outlook, Closeness within the family, Impulsivity, Interpersonal hostility, Hopelessness, Hopelessness syndrome, and Suicide accident. For analysis and validation, the Wilcoxon signed-rank test was performed in SPSS. Although the program ran for only a short period, the comparison of means showed effective, positive results in all 7 areas. The t-test results, however, showed a different outcome. Only 3 of the 31 SPS-A items (Nos. 7, 14, and 19) changed, and the remaining items did not. Students in class A also changed more than those in class B. For class A students, psychological changes were verified after the program in two of the 7 areas, Hopelessness syndrome and Suicide accident. This study showed that results can differ depending on student tendencies and on who delivers the program (the homeroom teacher or the program lecturer). The suicide prevention program presented in this article can help with learning and suicide prevention if it is consistently systematized and activated. Its basis is emotion and impulse control built on emotional stress relief, recovery of a positive self-identity, and stabilization of brain waves. The short program should not be allowed to die out; it should continue from childhood into adolescence and create an environment for healthy spiritual and physical growth. If so, it could be an effective program for preventing suicide as a social problem.
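The pre/post comparison above relies on the Wilcoxon signed-rank test for paired samples. The minimal stdlib sketch below shows how the statistic is built:

1. Take the paired differences and drop zeros.
2. Rank the absolute differences, giving tied values their average rank.
3. Sum the ranks of the positive and of the negative differences.

The function name, the normal approximation with the standard tie correction, and the absence of a continuity correction are assumptions. Statistical packages may handle ties, continuity, or exact p-values differently, so this illustrates the test rather than reproducing the study's SPSS output. For very small samples an exact test is usually preferred.

```python
import math
from typing import Sequence, Tuple


def wilcoxon_signed_rank(pre: Sequence[float], post: Sequence[float]) -> Tuple[float, float, float]:
    """Return (W_plus, W_minus, two-sided p) using the normal approximation.

    Zero differences are dropped; tied |d| values get average ranks, and the
    variance includes the tie correction sum(t^3 - t) / 48.
    """
    if len(pre) != len(post):
        raise ValueError("pre and post must have the same length")
    d = [b - a for a, b in zip(pre, post) if b != a]  # paired differences, zeros dropped
    n = len(d)
    if n == 0:
        return 0.0, 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1                                    # extend the run of tied |d|
        avg = (i + j) / 2 + 1                         # mean of 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    w_minus = sum(r for r, x in zip(ranks, d) if x < 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    if var <= 0:
        return w_plus, w_minus, 1.0
    z = (w_plus - mean) / math.sqrt(var)
    return w_plus, w_minus, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
```

Applied per SPS-A item, `pre` and `post` would hold each student's score before and after the program. W_plus and W_minus always sum to n(n+1)/2, which is a quick sanity check on the ranking.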