• Title/Summary/Keyword: multiple regression analysis model (중회귀분석모형)


High-Risk Area for Human Infection with Avian Influenza Based on Novel Risk Assessment Matrix (위험 매트릭스(Risk Matrix)를 활용한 조류인플루엔자 인체감염증 위험지역 평가)

  • Sung-dae Park;Dae-sung Yoo
    • Korean Journal of Poultry Science
    • /
    • v.50 no.1
    • /
    • pp.41-50
    • /
    • 2023
  • Over the last decade, avian influenza (AI) has been considered an emerging disease that could become the next pandemic, particularly in countries such as South Korea that experience continuous animal outbreaks. In this situation, risk assessment is needed to prevent and prepare for human infection with AI. We therefore developed a risk assessment matrix for identifying areas at high risk of human infection with AI in South Korea, based on the notion that risk is the product of hazard and vulnerability. The matrix combined the number of highly pathogenic avian influenza (HPAI) outbreaks in poultry farms, taken as the hazard, with the number of poultry-associated production facilities, taken as the vulnerability. The average number of HPAI outbreaks in poultry farms across 229 municipalities, which forms the hazard axis of the matrix, was predicted using a negative binomial regression with nationwide outbreak data from 2003 to 2018. The two components of the matrix were each classified into five groups using the K-means clustering algorithm and then multiplied, producing the area-specific risk level of human infection. As a result, Naju-si, Jeongeup-si, and Namwon-si were categorized as high-risk areas for human infection with AI. These findings can contribute to designing policies on human infection that minimize socio-economic damage.
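
The matrix construction described in this abstract (two components, each clustered into five groups, then multiplied) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the deterministic centroid initialisation, and the ten made-up hazard and vulnerability values. It is not the authors' implementation, and it does not include the negative binomial regression step.

```python
def kmeans_1d(values, k=5, iters=100):
    """Cluster 1-D values into k groups; return a level 1..k per value
    (1 = group with the lowest centroid)."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumed deterministic initialisation: spread centroids over the sorted data.
    centroids = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            groups[nearest].append(v)
        # Keep the old centroid if a group happens to be empty.
        updated = [sum(g) / len(g) if g else centroids[j]
                   for j, g in enumerate(groups)]
        if updated == centroids:
            break
        centroids = updated
    # Rank clusters by centroid so that level 1 is lowest and level k is highest.
    by_centroid = sorted(range(k), key=lambda j: centroids[j])
    rank = {j: r + 1 for r, j in enumerate(by_centroid)}
    return [rank[min(range(k), key=lambda j: abs(v - centroids[j]))]
            for v in values]


def risk_levels(hazard, vulnerability, k=5):
    """Risk score per area = hazard level x vulnerability level."""
    h = kmeans_1d(hazard, k)
    v = kmeans_1d(vulnerability, k)
    return [a * b for a, b in zip(h, v)]


# Hypothetical values for ten areas (not data from the study).
hazard = [0.1, 0.2, 1.0, 1.1, 3.0, 3.2, 6.0, 6.1, 12.0, 12.5]
vulnerability = [2, 3, 10, 11, 30, 31, 60, 62, 120, 125]
scores = risk_levels(hazard, vulnerability)
```

With five levels on each axis the score ranges from 1 to 25; where the cut-off for "high risk" falls is a separate choice that the abstract does not specify.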

Assessment of the Contribution of Weather, Vegetation and Land Use Change for Agricultural Reservoir and Stream Watershed using the SLURP model (II) - Calibration, Validation and Application of the Model - (SLURP 모형을 이용한 기후, 식생, 토지이용변화가 농업용 저수지 유역과 하천유역에 미치는 기여도 평가(II) - 모형의 검·보정 및 적용 -)

  • Park, Geun-Ae;Ahn, So-Ra;Park, Min-Ji;Kim, Seong-Joon
    • KSCE Journal of Civil and Environmental Engineering Research
    • /
    • v.30 no.2B
    • /
    • pp.121-135
    • /
    • 2010
  • This study assesses the effect of potential future climate change on the inflow of agricultural reservoirs and its impact on downstream streamflow, under reservoir operation for paddy irrigation water supply, using the SLURP model. Before the future analysis, the SLURP model was calibrated using 6 years of daily streamflow records (1998-2003) and validated using 3 years of streamflow data (2004-2006) for a 366.5 $km^2$ watershed including two agricultural reservoirs (Geumgwang and Gosam) located in the Anseongcheon watershed. The calibration and validation results showed that the model was able to simulate the daily streamflow well, considering the reservoir operation for paddy irrigation and flood discharge, with a coefficient of determination and Nash-Sutcliffe efficiency ranging from 0.7 to 0.9 and 0.5 to 0.8, respectively. Then, the future potential climate change impact was assessed: the future weather data were downscaled by the Change Factor method through bias-correction, the future land uses were predicted by a modified CA-Markov technique, and the future vegetation cover information was predicted by linear regression between monthly NDVI from NOAA AVHRR images and monthly mean temperature. The future (2020s, 2050s and 2080s) reservoir inflow, the temporal changes of reservoir storage and its impact on downstream streamflow were analyzed for the A2 and B2 climate change scenarios relative to a base year (2005). At an annual temporal scale, the inflow and storage of the agricultural reservoirs were projected to decrease sharply in autumn under all possible combinations of conditions. The future streamflow, soil moisture and groundwater recharge decreased slightly, whereas evapotranspiration was projected to increase substantially for all possible combinations of the conditions. 
Finally, this study analysed the contribution of weather, vegetation and land use change to identify which factor has the biggest impact on the agricultural reservoirs and the stream watershed. As a result, weather change had the biggest impact on agricultural reservoir inflow, storage, streamflow, evapotranspiration, soil moisture and groundwater recharge.
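
The two goodness-of-fit measures named in this abstract, the coefficient of determination and the Nash-Sutcliffe efficiency, have standard definitions that can be sketched as follows. The function names and the sample flows are illustrative assumptions; the coefficient of determination is taken here as the squared Pearson correlation, which is one common convention and may not be the one the authors used.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - sum((obs - sim)^2) / sum((obs - mean_obs)^2)."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - residual / variance


def r_squared(observed, simulated):
    """Coefficient of determination as the squared Pearson correlation."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_s = sum(simulated) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    return cov * cov / (var_o * var_s)


# Hypothetical daily streamflow values (not data from the study).
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
perfect = nash_sutcliffe(obs, obs)            # a perfect simulation
mean_only = nash_sutcliffe(obs, [3.0] * 5)    # always predicting the observed mean
```

An efficiency of 1 means the simulation matches the observations exactly, and 0 means it does no better than the observed mean, which is why the two measures can disagree: a simulation that is biased but well correlated scores high on the squared correlation and low on the efficiency.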

Impacts of Climate Change and Follow-up Cropping Season Shift on Growing Period and Temperature in Different Rice Maturity Types (미래 기후변화 및 그에 따른 재배시기 조정이 벼 생태형별 생육기간과 생육온도에 미치는 영향)

  • Lee, Chung-Kuen;Kwak, Kang-Su;Kim, Jun-Hwan;Son, Ji-Young;Yang, Won-Ha
    • KOREAN JOURNAL OF CROP SCIENCE
    • /
    • v.56 no.3
    • /
    • pp.233-243
    • /
    • 2011
  • This experiment was conducted to investigate the effect of future climate change on growing period and temperature in different rice maturity types as global warming progresses. Odaebyeo, Hwaseongbyeo, and Ilpumbyeo were used as representative cultivars of the early, medium, and medium-late rice maturity types, respectively, and the A1B scenario was applied to weather data for future climate change at 57 sites in Korea. When the cropping season was not adjusted to climate change, the entire growing period shortened and the growing temperature rose as global warming progressed. When the cropping season was adjusted to climate change, the growing period and temperature after the heading date did not change, whereas the growing period before heading shortened and the growing temperature before heading rose more severely than with an unadjusted cropping season. These results suggest that adjusting the cropping season to climate change can alleviate rice yield reduction and quality deterioration to some degree by improving the temperature conditions during the grain-filling period, but that the adjustment has limits, such as a severely shortened growing period, indicating that new rice cultivation methods and varieties need to be actively developed for future climate change.

Development and application of prediction model of hyperlipidemia using SVM and meta-learning algorithm (SVM과 meta-learning algorithm을 이용한 고지혈증 유병 예측모형 개발과 활용)

  • Lee, Seulki;Shin, Taeksoo
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.2
    • /
    • pp.111-124
    • /
    • 2018
  • This study aims to develop a classification model for predicting the occurrence of hyperlipidemia, a chronic disease. Prior studies applying data mining techniques to disease prediction can be classified into studies that design models for predicting cardiovascular disease and studies that compare disease prediction results. In the foreign literature, studies predicting cardiovascular disease were predominant among those using data mining techniques. Domestic studies did not differ much from foreign ones, but focused mainly on hypertension and diabetes. Since hyperlipidemia is a chronic disease of similarly high importance to hypertension and diabetes, this study selected hyperlipidemia as the disease to be analyzed. We developed a model for predicting hyperlipidemia using SVM and meta-learning algorithms, which are known to have excellent predictive power. To this end, we used the 2012 data set from the Korea Health Panel. The Korea Health Panel produces basic data on the level of health expenditure, health status and health behavior, and has conducted an annual survey since 2008. In this study, 1,088 patients with hyperlipidemia were randomly selected from the hospitalization, outpatient, emergency, and chronic disease data of the 2012 Korea Health Panel, and 1,088 non-patients were also randomly extracted, giving a total of 2,176 subjects. Three methods were used to select input variables for predicting hyperlipidemia. First, a stepwise method was performed using logistic regression. Among the 17 variables, the categorical variables (except for length of smoking) were expressed as dummy variables, each treated as a separate variable relative to the reference group, and these variables were analyzed. 
Six variables (age, BMI, education level, marital status, smoking status, gender), excluding income level and smoking period, were selected at a significance level of 0.1. Second, C4.5, a decision tree algorithm, was used. The significant input variables were age, smoking status, and education level. Finally, genetic algorithms were used. For SVM, the input variables selected by genetic algorithms consisted of 6 variables: age, marital status, education level, economic activity, smoking period, and physical activity status. For the artificial neural network, the input variables selected by genetic algorithms consisted of 3 variables: age, marital status, and education level. Based on the selected variables, we compared SVM, meta-learning algorithms and other prediction models for hyperlipidemia patients, and compared the classification performances using TP rate and precision. The main results of the analysis are as follows. First, the accuracy of the SVM was 88.4% and the accuracy of the artificial neural network was 86.7%. Second, the accuracy of classification models using the input variables selected through the stepwise method was slightly higher than that of classification models using all variables. Third, the precision of the artificial neural network was higher than that of SVM when only the three input variables selected by decision trees were used. For classification models based on the input variables selected through the genetic algorithm, the classification accuracy of SVM was 88.5% and that of the artificial neural network was 87.9%. Finally, this study indicated that stacking, the meta-learning algorithm proposed in this study, performs best when it uses the predicted outputs of SVM and MLP as input variables of an SVM meta classifier. The purpose of this study was to predict hyperlipidemia, one of the representative chronic diseases. 
To do this, we used SVM and meta-learning algorithms, which are known to have high accuracy. As a result, the classification accuracy for hyperlipidemia with stacking as the meta learner was higher than with other meta-learning algorithms. However, the predictive performance of the meta-learning algorithm proposed in this study is the same as that of SVM with the best performance (88.6%) among the single models. The limitations of this study are as follows. First, various variable selection methods were tried, but most variables used in the study were categorical dummy variables. With a large number of categorical variables, models such as decision trees may suit the data better than general models such as neural networks, so the results may differ if continuous variables are used. Despite these limitations, this study is significant in predicting hyperlipidemia with hybrid models such as meta-learning algorithms, which have not been studied previously. The improvement in model accuracy obtained by applying various variable selection techniques is also meaningful. In addition, we expect that our proposed model will be effective for the prevention and management of hyperlipidemia.
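
The evaluation measures named in this abstract (TP rate, precision, and accuracy) follow directly from the confusion-matrix counts. The sketch below uses hypothetical labels and function names; it illustrates the definitions only and does not reproduce the study's models or figures.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def tp_rate(y_true, y_pred):
    """TP rate (recall): share of actual patients that the model flags."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def precision(y_true, y_pred):
    """Precision: share of flagged cases that are actual patients."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp)


def accuracy(y_true, y_pred):
    """Accuracy: share of all cases classified correctly."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical labels: 1 = hyperlipidemia patient, 0 = non-patient.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
```

Because the study sample is balanced (1,088 patients and 1,088 non-patients), accuracy is not dominated by one class, but TP rate and precision still separate missed patients from false alarms.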

Classifying Predominant Type and Examining Risk Factors for Recurrence of Child Maltreatment (아동학대사례의 잠재유형화와 유형별 재학대 위험요인)

  • Lee, Sang-Gyun;Lee, Bong Joo;Kim, Sewon;Kim, Hyun-Soo;Yoo, Joan P.;Jang, Hwa Jung;Chin, Meejung;Park, Ji-Myung
    • Korean Journal of Social Welfare Studies
    • /
    • v.48 no.3
    • /
    • pp.171-208
    • /
    • 2017
  • The purpose of this study is to classify the underlying and parsimonious types of child maltreatment and examine whether the effects of risk factors on child maltreatment recurrence differ by type of maltreatment. We utilized multiyear national administrative data from the National Child Maltreatment Information System collected by the Child Protection Agency in Korea. Of 26,921 child maltreatment victims reported and substantiated on or after January 1, 2012, 1,447 children who experienced recurrence of maltreatment by December 31, 2015 were selected as the maltreatment recurrence group, and 4,580 children who had not experienced maltreatment since first substantiation were assigned to the non-recurrence group. Latent class analysis (LCA) and latent transition analysis (LTA) were used to group children with similar maltreatment subtypes into discrete classes of child maltreatment recurrence. Logistic regression was employed to examine the association between the predominant types of child maltreatment and risk factors for recurrence. Results of LCA and LTA showed four latent classes representing the predominant type of child maltreatment: 'physical abuse predominant type', 'emotional abuse predominant type', 'sexual abuse predominant type', and 'neglect type'. Significant differences in the effect of risk factors among latent classes were found for the child's age and gender, perpetrator's gender, family poverty, biological parent as the perpetrator, domestic violence toward partner, perpetrator's alcohol problem, insufficient parenting skills, and out-of-home care service. Based on these findings, we suggest how the typology can guide decisions about whom to target in prevention and intervention programs and which risk factors to address. Practice and policy implications, as well as further research tasks, are discussed in light of the search for useful strategies to prevent recurrence of child maltreatment.

A Time Series Graph based Convolutional Neural Network Model for Effective Input Variable Pattern Learning : Application to the Prediction of Stock Market (효과적인 입력변수 패턴 학습을 위한 시계열 그래프 기반 합성곱 신경망 모형: 주식시장 예측에의 응용)

  • Lee, Mo-Se;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.1
    • /
    • pp.167-181
    • /
    • 2018
  • Over the past decade, deep learning has been in the spotlight among machine learning algorithms. In particular, CNN (Convolutional Neural Network), which is known as an effective solution for recognizing and classifying images or voices, has been widely applied to classification and prediction problems. In this study, we investigate how to apply CNN to business problem solving. Specifically, this study proposes to apply CNN to stock market prediction, one of the most challenging tasks in machine learning research. As mentioned, CNN is strong at interpreting images. Thus, the model proposed in this study adopts CNN as a binary classifier that predicts stock market direction (upward or downward) by using time series graphs as its inputs. That is, our proposal is to build a machine learning algorithm that mimics the experts called 'technical analysts', who examine graphs of past price movements and predict future price movements. Our proposed model, named 'CNN-FG (Convolutional Neural Network using Fluctuation Graph)', consists of five steps. In the first step, it divides the dataset into intervals of 5 days. In step 2, it creates time series graphs for the divided dataset. The size of the image in which the graph is drawn is $40(pixels){\times}40(pixels)$, and the graph of each independent variable is drawn in a different color. In step 3, the model converts the images into matrices. Each image is converted into a combination of three matrices in order to express the color value on the R (red), G (green), and B (blue) scale. In the next step, it splits the dataset of graph images into training and validation datasets. We used 80% of the total dataset as the training dataset, and the remaining 20% as the validation dataset. In the final step, CNN classifiers are trained using the images of the training dataset. 
Regarding the parameters of CNN-FG, we adopted two convolution filters ($5{\times}5{\times}6$ and $5{\times}5{\times}9$) in the convolution layer. In the pooling layer, a $2{\times}2$ max pooling filter was used. The numbers of nodes in the two hidden layers were set to 900 and 32, respectively, and the number of nodes in the output layer was set to 2 (one for the prediction of an upward trend, and the other for a downward trend). The activation functions for the convolution layer and the hidden layers were set to ReLU (Rectified Linear Unit), and the one for the output layer was set to the Softmax function. To validate our model, CNN-FG, we applied it to the prediction of KOSPI200 for 2,026 days over eight years (from 2009 to 2016). To match the proportions of the two groups in the dependent variable (i.e. tomorrow's stock market movement), we selected 1,950 samples by random sampling. Finally, we built the training dataset using 80% of the total dataset (1,560 samples), and the validation dataset using 20% (390 samples). The independent variables of the experimental dataset included twelve technical indicators widely used in previous studies. They include Stochastic %K, Stochastic %D, Momentum, ROC (rate of change), LW %R (Larry William's %R), A/D oscillator (accumulation/distribution oscillator), OSCP (price oscillator), CCI (commodity channel index), and so on. To confirm the superiority of CNN-FG, we compared its prediction accuracy with that of other classification models. Experimental results showed that CNN-FG outperforms LOGIT (logistic regression), ANN (artificial neural network), and SVM (support vector machine) with statistical significance. These empirical results imply that converting time series business data into graphs and building CNN-based classification models on these graphs can be effective from the perspective of prediction accuracy. 
Thus, this paper sheds light on how to apply deep learning techniques to the domain of business problem solving.
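
Steps 1 to 3 of the procedure in this abstract (cut the series into 5-day windows, draw each window as a 40 x 40 graph, and express the image as three colour matrices) can be sketched in plain Python. The rasterisation rule, the function names, the colours, and the sample indicator values are all assumptions made for illustration; the abstract does not describe how the authors drew their graphs.

```python
SIZE = 40  # image side in pixels, as stated in the abstract


def windows(series, length=5):
    """Step 1: split a series into consecutive non-overlapping windows."""
    return [series[i:i + length] for i in range(0, len(series) - length + 1, length)]


def draw_series(values, size=SIZE):
    """Step 2: rasterise one window into a size x size 0/1 matrix (row 0 = top)."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for a flat window
    grid = [[0] * size for _ in range(size)]
    for col in range(size):
        # Linearly interpolate the series value at this pixel column.
        pos = col * (len(values) - 1) / (size - 1)
        left = int(pos)
        right = min(left + 1, len(values) - 1)
        frac = pos - left
        value = values[left] * (1 - frac) + values[right] * frac
        row = (size - 1) - round((value - lo) / span * (size - 1))
        grid[row][col] = 1
    return grid


def to_rgb(series_with_colors, size=SIZE):
    """Step 3: overlay several coloured curves and return [R, G, B] matrices."""
    channels = [[[0] * size for _ in range(size)] for _ in range(3)]
    for values, color in series_with_colors:
        mask = draw_series(values, size)
        for r in range(size):
            for c in range(size):
                if mask[r][c]:
                    for ch in range(3):
                        channels[ch][r][c] = color[ch]
    return channels


# Hypothetical indicator values for ten trading days (not data from the study).
indicator = [1.0, 2.0, 3.0, 4.0, 5.0, 5.0, 4.0, 3.0, 2.0, 1.0]
first_window = windows(indicator)[0]
image = to_rgb([(first_window, (255, 0, 0))])
```

Each resulting image is a 3 x 40 x 40 array, which is the input shape a CNN classifier with the stated convolution and pooling filters would consume; the training itself is outside the scope of this sketch.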

Optimization of Multiclass Support Vector Machine using Genetic Algorithm: Application to the Prediction of Corporate Credit Rating (유전자 알고리즘을 이용한 다분류 SVM의 최적화: 기업신용등급 예측에의 응용)

  • Ahn, Hyunchul
    • Information Systems Review
    • /
    • v.16 no.3
    • /
    • pp.161-177
    • /
    • 2014
  • Corporate credit rating assessment consists of complicated processes in which various factors describing a company are taken into consideration. Such assessment is known to be very expensive since domain experts must be employed to assess the ratings. As a result, data-driven corporate credit rating prediction using statistical and artificial intelligence (AI) techniques has received considerable attention from researchers and practitioners. In particular, statistical methods such as multiple discriminant analysis (MDA) and multinomial logistic regression analysis (MLOGIT), and AI methods including case-based reasoning (CBR), artificial neural network (ANN), and multiclass support vector machine (MSVM) have been applied to corporate credit rating. Among them, MSVM has recently become popular because of its robustness and high prediction accuracy. In this study, we propose a novel optimized MSVM model and apply it to corporate credit rating prediction in order to enhance accuracy. Our model, named 'GAMSVM (Genetic Algorithm-optimized Multiclass Support Vector Machine),' is designed to simultaneously optimize the kernel parameters and the feature subset selection. Prior studies such as Lorena and de Carvalho (2008) and Chatterjee (2013) show that proper kernel parameters may improve the performance of MSVMs. Also, the results of studies such as Shieh and Yang (2008) and Chatterjee (2013) imply that appropriate feature selection may lead to higher prediction accuracy. Based on these prior studies, we propose to apply GAMSVM to corporate credit rating prediction. As a tool for optimizing the kernel parameters and the feature subset selection, we suggest the genetic algorithm (GA). GA is known as an efficient and effective search method that attempts to simulate biological evolution. By applying genetic operations such as selection, crossover, and mutation, it is designed to gradually improve the search results. 
In particular, the mutation operator keeps GA from falling into local optima, so it can find a globally optimal or near-optimal solution. GA has been widely applied to search for optimal parameters or feature subsets of AI techniques including MSVM. For these reasons, we also adopt GA as an optimization tool. To empirically validate the usefulness of GAMSVM, we applied it to a real-world case of credit rating in Korea. Our application is in bond rating, which is the most frequently studied area of credit rating for specific debt issues or other financial obligations. The experimental dataset was collected from a large credit rating company in South Korea. It contained 39 financial ratios of 1,295 companies in the manufacturing industry, together with their credit ratings. Using various statistical methods including one-way ANOVA and stepwise MDA, we selected 14 financial ratios as the candidate independent variables. The dependent variable, i.e. credit rating, was labeled as four classes: 1(A1); 2(A2); 3(A3); 4(B and C). For each class, 80 percent of the data was used for training and the remaining 20 percent for validation. To overcome the small sample size, we applied five-fold cross validation to our dataset. In order to examine the competitiveness of the proposed model, we also tested several comparative models including MDA, MLOGIT, CBR, ANN and MSVM. In the case of MSVM, we adopted the One-Against-One (OAO) and DAGSVM (Directed Acyclic Graph SVM) approaches because they are known to be the most accurate among the various MSVM approaches. GAMSVM was implemented using LIBSVM, an open-source software package, and Evolver 5.5, a commercial software package that enables GA. The other comparative models were tested using various statistical and AI packages such as SPSS for Windows, Neuroshell, and Microsoft Excel VBA (Visual Basic for Applications). Experimental results showed that the proposed model, GAMSVM, outperformed all the competing models. 
In addition, the model was found to use fewer independent variables while showing higher accuracy. In our experiments, five variables, namely X7 (total debt), X9 (sales per employee), X13 (years after founded), X15 (accumulated earning to total asset), and X39 (the index related to the cash flows from operating activity), were found to be the most important factors in predicting corporate credit ratings. However, the values of the finally selected kernel parameters were found to be almost the same across the data subsets. To examine whether the predictive performance of GAMSVM was significantly greater than that of the other models, we used the McNemar test. As a result, we found that GAMSVM was better than MDA, MLOGIT, CBR, and ANN at the 1% significance level, and better than OAO and DAGSVM at the 5% significance level.
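
The McNemar test mentioned at the end of this abstract compares two classifiers on the same validation cases by looking only at the cases where they disagree. The sketch below is an illustration under stated assumptions: it uses the chi-square form with a continuity correction (the abstract does not say which variant the author applied), and the function name and disagreement counts are hypothetical.

```python
import math


def mcnemar(correct_a, correct_b):
    """McNemar test on paired correctness flags for models A and B.

    Returns (chi-square statistic with continuity correction, p-value).
    """
    # b: A right and B wrong; c: A wrong and B right.
    b = sum(1 for a, other in zip(correct_a, correct_b) if a and not other)
    c = sum(1 for a, other in zip(correct_a, correct_b) if not a and other)
    if b + c == 0:
        return 0.0, 1.0  # the models never disagree
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical outcome: 70 cases both right, 15 only A right,
# 5 only B right, 10 both wrong (not results from the study).
correct_a = [True] * 70 + [True] * 15 + [False] * 5 + [False] * 10
correct_b = [True] * 70 + [False] * 15 + [True] * 5 + [False] * 10
stat, p_value = mcnemar(correct_a, correct_b)
```

Cases that both models classify correctly, or both misclassify, do not enter the statistic, so two models with similar overall accuracy can still differ significantly if their errors fall on different cases.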

Estimation of Soybean Growth Using Polarimetric Discrimination Ratio by Radar Scatterometer (레이더 산란계 편파 차이율을 이용한 콩 생육 추정)

  • Kim, Yi-Hyun;Hong, Suk-Young
    • Korean Journal of Soil Science and Fertilizer
    • /
    • v.44 no.5
    • /
    • pp.878-886
    • /
    • 2011
  • The soybean is one of the oldest cultivated crops in the world. Microwave remote sensing is an important tool because it can penetrate cloud regardless of weather and can acquire data by day or night. In particular, a ground-based polarimetric scatterometer has the advantage of monitoring crop conditions continuously with full polarization and at different frequencies. In this study, soybean growth parameters and soil moisture were estimated using the polarimetric discrimination ratio (PDR) measured by a radar scatterometer. A ground-based polarimetric scatterometer operating at multiple frequencies was used to continuously monitor soybean growth conditions and soil moisture change. It was set up to obtain data automatically every 10 minutes. The temporal trend of the PDR for all bands agreed with the soybean growth data such as fresh weight, Leaf Area Index (LAI), vegetation water content, and plant height; i.e., it increased until about DOY 271 and decreased afterward. Soil moisture was only weakly related to PDR in all bands over the whole growth stage. In contrast, PDR was relatively well correlated with soil moisture while LAI was below 2. We also analyzed the relationship between the PDR of each band and the growth data. L-band PDR was the most strongly correlated with fresh weight (r=0.96), LAI (r=0.91), vegetation water content (r=0.94) and soil moisture (r=0.86). In addition, C- and X-band PDR were moderately correlated with the growth data ($r{\geq}0.83$), with the exception of soil moisture. Based on the analysis of the relation between the PDR at the L, C, and X bands and the soybean growth parameters, we predicted the growth parameters and soil moisture using L-band PDR. Overall, good agreement was observed between the retrieved and observed growth data. Results from this study show that PDR appears effective for estimating soybean growth parameters and soil moisture.

A Study on the Undrained Deformation Characteristics of Remoulded Marine Clay (재성형(再成形)한 해성점토(海成粘土)의 비배수(非排水) 변형특성(變形特性)에 관(關)한 연구(硏究))

  • Yoon, Hyun Jung;Kang, Yea Mook;Cho, Seong Seup
    • Korean Journal of Agricultural Science
    • /
    • v.12 no.2
    • /
    • pp.309-323
    • /
    • 1985
  • This paper describes the observed behaviour, under undrained triaxial conditions, of marine clays remoulded at various levels of several factors, in order to find out the effects of those factors on the stress-strain characteristics. Conventional triaxial compression tests $({\sigma}_1>{\sigma}_2={\sigma}_3)$ were carried out on cylindrical specimens, 50mm in diameter and 100mm long, of Gun-san bay mud under controlled moisture content, density, axial strain rate and percentage passing the No. 200 sieve. The significant conclusions from this study are: 1. The compressive deviator stress at failure of pure marine clay increased as the moulding moisture content decreased. 2. The compressive deviator stress at failure increased as the moulding dry density increased. 3. The factorial experiment showed that the interaction between moisture content and density had a remarkably significant effect on the stress-strain characteristics of marine clay. 4. The effect of axial strain rate on stress-strain behaviour was insignificant in marine clay, but the secant moduli decreased slightly as the strain rate increased. 5. As the percentage passing the No. 200 sieve increased, the deviator stress increased regularly. 6. A multiple regression equation could be modeled for the prediction of stress or strain, and comparison with the experimental results showed it to be relatively accurate.
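
A multiple regression equation of the kind mentioned in conclusion 6 can be fitted by ordinary least squares. The sketch below solves the normal equations with Gaussian elimination in plain Python; the function names, the two predictors, and the data are hypothetical and chosen so that the true coefficients are known. It does not reproduce the paper's equation.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Fit y = b0 + b1*x1 + b2*x2 + ... ; returns [b0, b1, b2, ...]."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(design[0])
    xtx = [[sum(x[i] * x[j] for x in design) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * yi for x, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(coefs, row):
    """Evaluate the fitted equation for one set of predictor values."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))


# Hypothetical predictors (x1, x2) with responses generated from y = 2 + 3*x1 - x2.
rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8)]
y = [2 + 3 * x1 - x2 for x1, x2 in rows]
coefs = fit_ols(rows, y)
```

An interaction such as the moisture-density effect in conclusion 3 could be represented by appending the product of the two predictors as an extra column before fitting, which is an assumption about model form rather than something the abstract states.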


A Study on the Relationship between Patenting Activity Factors and Company Performance of Korean IT Industry (국내 IT기업의 특허활동요인이 경영성과에 미치는 영향 연구)

  • Kim, Chang Bong;Park, Jeong Ho
    • International Commerce and Information Review
    • /
    • v.18 no.3
    • /
    • pp.249-273
    • /
    • 2016
  • Companies now consider patent activity one of the critical factors for success in the global economy, whereas productivity was the key competitiveness factor in the past industrial economy. Since there are so many patent disputes globally in the IT industry, it is very important for companies to register and manage patents strategically. Therefore, this research analyzes the relationship between financial results and three patent activity factors (productivity, effectiveness, and high quality) by investigating patent and financial data of 217 Korean IT enterprises. After building a research model and hypotheses based on resource-based theory and analysing the data sets using a multiple regression model, this paper obtained the following results. First, the effectiveness and high quality of patents had a positive (+) effect on the growth of total assets of IT enterprises. Second, none of the three patent activity factors had a significant effect on the average rate of increase in sales. Third, only the high quality of patents had a positive (+) effect on the average rate of increase in net income. This research differs from prior work in that it categorizes patent activity factors into quantitative and qualitative factors, and suggests a practical strategic direction for the patent activities of IT companies that face serious patent disputes globally.
