• Title/Summary/Keyword: Predictive analysis


Development and application of prediction model of hyperlipidemia using SVM and meta-learning algorithm (SVM과 meta-learning algorithm을 이용한 고지혈증 유병 예측모형 개발과 활용)

  • Lee, Seulki;Shin, Taeksoo
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.2
    • /
    • pp.111-124
    • /
    • 2018
  • This study aims to develop a classification model for predicting the occurrence of hyperlipidemia, a chronic disease. Prior studies applying data mining techniques to disease prediction can be classified into studies designing models for predicting cardiovascular disease and studies comparing the results of disease prediction research. In the foreign literature, studies predicting cardiovascular disease predominate among those using data mining techniques. Domestic studies do not differ much from foreign ones, but they have mainly focused on hypertension and diabetes. Since hyperlipidemia is a chronic disease of similarly high importance to hypertension and diabetes, this study selected hyperlipidemia as the disease to be analyzed. We developed a model for predicting hyperlipidemia using SVM and meta-learning algorithms, which are known to have excellent predictive power. To this end, we used the 2012 data set from the Korea Health Panel. The Korea Health Panel produces basic data on health expenditure, health status, and health behavior, and has conducted an annual survey since 2008. In this study, 1,088 patients with hyperlipidemia were randomly selected from the hospitalization, outpatient, emergency, and chronic disease data of the 2012 Korea Health Panel, and 1,088 non-patients were also randomly extracted, for a total of 2,176 subjects. Three methods were used to select input variables for predicting hyperlipidemia. First, a stepwise method was performed using logistic regression. Among the 17 variables, the categorical variables (except for length of smoking) were expressed as dummy variables, each treated as a separate variable relative to the reference group, and these variables were analyzed.
Six variables (age, BMI, education level, marital status, smoking status, gender), excluding income level and smoking period, were selected at a significance level of 0.1. Second, C4.5, a decision tree algorithm, was used. The significant input variables were age, smoking status, and education level. Finally, genetic algorithms were used. For SVM, the genetic algorithm selected six input variables (age, marital status, education level, economic activity, smoking period, and physical activity status), and for the artificial neural network it selected three (age, marital status, and education level). Based on the selected variables, we compared SVM, meta-learning algorithms, and other prediction models for hyperlipidemia, evaluating classification performance by TP rate and precision. The main results of the analysis are as follows. First, the accuracy of the SVM was 88.4% and the accuracy of the artificial neural network was 86.7%. Second, the accuracy of classification models using the input variables selected through the stepwise method was slightly higher than that of classification models using all variables. Third, the precision of the artificial neural network was higher than that of SVM when only the three input variables selected by the decision tree were used. With the input variables selected through the genetic algorithm, the classification accuracy of SVM was 88.5% and that of the artificial neural network was 87.9%. Finally, stacking, the meta-learning algorithm proposed in this study, performed best when the predicted outputs of SVM and MLP were used as input variables of an SVM meta-classifier. The purpose of this study was to predict hyperlipidemia, one of the representative chronic diseases.
To do this, we used SVM and meta-learning algorithms, which are known to have high accuracy. As a result, the classification accuracy for hyperlipidemia of stacking as a meta-learner was higher than that of the other meta-learning algorithms. However, the predictive performance of the meta-learning algorithm proposed in this study is the same as that of SVM, which had the best performance (88.6%) among the single models. The limitations of this study are as follows. First, various variable selection methods were tried, but most variables used in the study were categorical dummy variables. With a large number of categorical variables, models suited to categorical variables, such as decision trees, may fit better than general models such as neural networks, so the results may differ if continuous variables are used. Despite these limitations, this study is significant in predicting hyperlipidemia with hybrid models such as meta-learning algorithms, which have not been studied previously. The improvement in model accuracy obtained by applying various variable selection techniques is also meaningful. In addition, we expect the proposed model to be effective for the prevention and management of hyperlipidemia.
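
The dummy-variable step described in this abstract (each categorical level becomes its own 0/1 variable relative to a reference group) can be sketched as follows. This is only an illustration under assumptions: the function name `dummy_code`, the marital-status values, and the choice of "married" as the reference group are hypothetical and are not taken from the study.

```python
def dummy_code(values, reference):
    """Return (levels, rows): one 0/1 indicator column per non-reference level.

    The reference level is encoded as all zeros, so every indicator is
    interpreted relative to the reference group.
    """
    levels = sorted(set(values) - {reference})
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


# Hypothetical marital-status column; "married" is the assumed reference group.
marital = ["married", "single", "divorced", "married", "single"]
levels, rows = dummy_code(marital, reference="married")

print(levels)
for value, row in zip(marital, rows):
    print(value, row)
```

A variable with k levels yields k - 1 indicator columns, which is why a handful of categorical survey items can expand into many separate input variables before selection.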

Clinical Analysis of Influenza in Children and Rapid Antigen Detection Test on First Half of the Year 2004 in Busan (2004 상반기 부산 지역에서 유행한 인플루엔자의 임상 역학적 분석 및 인플루엔자 진단에 있어서의 신속 항원 검사법)

  • Choi, So Young;Lee, Na Young;Kim, Sung Mi;Kim, Gil Heun;Jung, Jin Hwa;Choi, Im Jung;Cho, Kyung Soon
    • Pediatric Infection and Vaccine
    • /
    • v.11 no.2
    • /
    • pp.158-169
    • /
    • 2004
  • Purpose : Although influenza is one of the most important causes of acute respiratory tract infections in children, virus isolation is not widely performed and there are only a few clinical studies on influenza and its diagnostic methods. We evaluated the epidemiological and clinical features of influenza in children and the rapid antigen detection test (QuickVue influenza test) in the first half of 2004 in Busan. Methods : From January 2004 to June 2004, throat swabs and nasal secretions were obtained from children with suspected influenza infection, cultured for isolation of influenza virus, and tested with the rapid antigen detection test (QuickVue influenza test). The medical records of patients with influenza virus infection were reviewed retrospectively. Results : Influenza viruses were isolated in 79 (17.2%) of 621 patients examined. Influenza virus was isolated mainly from March to April 2004. The male-to-female ratio among patients with influenza virus infection was 1.2 : 1, with a median age of 4 years 6 months. The most common clinical diagnosis of influenza virus infection was bronchitis. There was no difference between influenza A and B infection in clinical diagnosis and symptoms. All patients recovered without severe complications. The rapid antigen detection test (QuickVue influenza test) had a sensitivity of 93.6%, a specificity of 80.2%, a positive predictive value of 40.8%, and a negative predictive value of 98.8%. Conclusion : The rapid antigen detection test makes early detection of influenza in children possible, which may reduce the use of antimicrobial agents and allow early use of antiviral agents.
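
For readers less familiar with the four test statistics reported in this abstract, the sketch below shows how they follow from a 2x2 table of test result against culture result. The counts are hypothetical: the abstract does not give the table, and these values were chosen only because they sum to 79 culture-positive and 542 culture-negative patients and land within about 0.1 percentage point of the reported figures.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Compute standard diagnostic accuracy measures from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # positives correctly detected
        "specificity": tn / (tn + fp),  # negatives correctly ruled out
        "ppv": tp / (tp + fp),          # probability of disease given a positive test
        "npv": tn / (tn + fn),          # probability of no disease given a negative test
    }


# Hypothetical counts (an assumption, not data from the paper).
metrics = diagnostic_metrics(tp=74, fn=5, fp=107, tn=435)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

The low positive predictive value alongside a high sensitivity is expected when the condition is uncommon in the tested group: even a modest false-positive rate among the many culture-negative patients outnumbers the true positives.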


Analysis of Vasopressin Receptor Type 2 (AVPR2) Gene in a Pedigree with Congenital Nephrogenic Diabetes Insipidus : Identification of a Family with R202C Mutation in AVPR2 Gene (신성요붕증 가계에서 바소프레신 V2 수용체(AVPR2) 유전자 분석 : AVPR2 유전자 R202C 돌연변이의 발견)

  • Park June-Dong;Kim Ho-Sung;Kim Hee-Joo;Lee Yoon-Kyung;Kwak Young-Ho;Ha Il-Soo;Cheong Hae-Il;Choi Yong;Park Hye-Won
    • Childhood Kidney Diseases
    • /
    • v.3 no.2
    • /
    • pp.209-216
    • /
    • 1999
  • Purpose : Nephrogenic diabetes insipidus (NDI) is a rare X-linked disorder associated with renal tubule resistance to arginine vasopressin (AVP). The hypothesis that the defect underlying NDI might be a dysfunctional renal AVPR2 has recently been proven by the identification of mutations in the AVPR2 gene in NDI patients. To investigate the association of mutations in the AVPR2 gene with NDI, we analyzed the AVPR2 gene located on the X chromosome. Methods : We analyzed the AVPR2 gene in a kindred with X-linked NDI. The proband and the proband's mother were analyzed by polymerase chain reaction-single strand conformational polymorphism (PCR-SSCP) and DNA sequencing of the AVPR2 gene. We also used restriction enzyme analysis of the genomic PCR product to evaluate the AVPR2 gene. Results : A C to T transition at codon 202, predictive of a substitution of arginine 202 by cysteine (R202C) in the third extracellular domain, was identified. This mutation causes the loss of a Hae III site within the gene. Conclusion : We found an R202C missense mutation in the AVPR2 gene causing X-linked NDI; direct mutational analysis is now available for carrier screening and early diagnosis.


Limitation of Prediction on Intravenous Immunoglobulin Responsiveness in Kawasaki Disease (가와사끼병에서 정맥용 면역글로불린 치료 반응 예측의 한계)

  • Kim, Seong-Koo;Han, Ji-Yoon;Rhim, Jung Woo;Oh, Jin Hee;Han, Ji-Whan;Lee, Kyung Yil;Kang, Jin-Han;Lee, Joon-Sung
    • Pediatric Infection and Vaccine
    • /
    • v.17 no.2
    • /
    • pp.169-176
    • /
    • 2010
  • Purpose : We aimed to evaluate predictive parameters, available before IVIG use, for non-response to intravenous immunoglobulin (IVIG) in patients with Kawasaki disease (KD), using two control groups. Methods : We evaluated 229 consecutive KD patients who were treated with 2 g/kg of IVIG at a single center. The 23 patients who had persistent fever >24 hours after IVIG infusion were the IVIG non-responders; the first control group comprised all 206 defervesced cases, and the second comprised 46 cases matched to non-responders for age and pre-treatment fever duration. Results : Demographic and clinical characteristics at presentation were similar in IVIG non-responders and responders. As for laboratory findings, by univariate analysis in both study designs the neutrophil differential, CRP, AST, ALT, and LDH were higher, and the lymphocyte differential, total protein, albumin, platelet count, and total cholesterol were significantly lower, in IVIG non-responders than in responders. In multivariate analysis, however, non-responders showed a significantly higher neutrophil differential (cutoff value >77%, sensitivity 68.4%, specificity 79.5%) and lower cholesterol (<124 mg/dL, sensitivity 79%, specificity 70.5%), whereas plasma albumin (<3.6 g/dL, sensitivity 73.7%, specificity 60%) was the sole laboratory parameter associated with non-response in the second study design. Conclusion : Severity of inflammation in KD was reflected by higher or lower laboratory values at presentation. Because the multivariate analysis of these indices may be influenced by confounding factors, including the numbers of patients of different ages and fever durations, other assessment modalities are needed for KD patients at the greatest risk of coronary artery lesions.

The Results of Curative Concurrent Chemoradiotherapy for Anal Carcinoma (항문암 환자에서 근치적 목적의 동시 항암화학 방사선치료의 결과)

  • Jeong, Jae-Uk;Yoon, Mee-Sun;Song, Ju-Young;Ahn, Sung-Ja;Chung, Woong-Ki;Nah, Byung-Sik;Nam, Taek-Keun
    • Radiation Oncology Journal
    • /
    • v.28 no.4
    • /
    • pp.205-210
    • /
    • 2010
  • Purpose: To evaluate the predictive factors for treatment response and the prognostic factors affecting survival outcomes after concurrent chemoradiotherapy (CCRT) in patients with anal squamous cell carcinoma. Materials and Methods: The medical records of forty-two patients with histologically confirmed anal squamous cell carcinoma who completed CCRT between 1993 and 2008 were reviewed retrospectively. Median age was 61.5 years (39~89 years), and median radiotherapy (RT) dose was 50.4 Gy (30.0~64.0 Gy). A total of 36 patients (85.7%) had stage T2 or lower disease. Fourteen patients (33.3%) showed regional nodal metastasis, 36 patients (85.7%) were treated with 5-fluorouracil (5-FU) plus mitomycin, and the remaining patients were treated with 5-FU plus cisplatin. Results: The median follow-up time was 62 months (2~202 months). The 5-year overall survival, locoregional relapse-free survival, disease-free survival, and colostomy-free survival rates were 86.0%, 71.7%, 71.7%, and 78.2%, respectively. Regarding overall survival, the Eastern Cooperative Oncology Group (ECOG) performance status and complete response were significant prognostic factors on univariate analysis. On multivariate analysis, only the ECOG performance status was significant. No significant factor was found for locoregional relapse-free survival or disease-free survival, and likewise no significant factor for treatment response was found on logistic regression analysis. Seven patients had local or regional recurrences and one patient had distant metastasis. The only toxicity evaluable in all patients was radiation dermatitis of the perianal skin, which was grade 3 in 4 patients (9.5%) and grade 2 in 22 patients (52.4%). Conclusion: This study revealed that patients with an ECOG performance score of 0-1 survived significantly longer than those with a poorer score. None of the factors tested was a significant predictor of treatment response.

Crime Incident Prediction Model based on Bayesian Probability (베이지안 확률 기반 범죄위험지역 예측 모델 개발)

  • HEO, Sun-Young;KIM, Ju-Young;MOON, Tae-Heon
    • Journal of the Korean Association of Geographic Information Studies
    • /
    • v.20 no.4
    • /
    • pp.89-101
    • /
    • 2017
  • Crime occurs differently depending not only on locations and building uses but also on the characteristics of the people who use a place and the spatial structure of its buildings and locations. Therefore, if spatial big data, which contain spatial and regional properties, can be utilized, proper crime prevention measures can be enacted. Recently, with the advent of big data and the intelligent information era, predictive policing has emerged as a new paradigm for police activities. Based on 7,420 actual crime incidents occurring over three years in a typical provincial city, "J city," this study identified the areas in which crimes occurred and predicted risky areas. Spatial regression analysis was performed using spatial big data on physical and environmental variables only. Based on the results, using the street width, average number of building floors, building coverage ratio, and the type of use of the first floor (Type II neighborhood living facility, commercial facility, pleasure use, or residential use), this study established a Crime Incident Prediction Model (CIPM) based on Bayesian probability theory. The model was found to be suitable for crime prediction: it was evaluated by overlap analysis with the actual crime areas and by the receiver operating characteristic curve (ROC curve), which showed an area under the curve (AUC) value of 0.8. It was also found that blocks where commercial and entertainment facilities are concentrated, blocks where the number of building floors is high, and blocks where commercial, entertainment, and residential facilities are mixed are high-risk areas. Unlike previous studies that explored the spatial distribution of crime and the factors influencing crime occurrence, this study provides a meaningful step toward the development of a crime prediction model.
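
As background for the AUC figure quoted in this abstract, the sketch below computes the area under the ROC curve through its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The scores and labels are made-up toy values, and the function `roc_auc` is a generic illustration, not the evaluation code used in the study.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example: predicted risk per block and whether a crime occurred (1) or not (0).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
auc = roc_auc(scores, labels)
print(f"AUC = {auc:.4f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a value near 0.8 means the model ranks a crime block above a non-crime block in roughly four of five such pairs.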

A Study on the Field Data Applicability of Seismic Data Processing using Open-source Software (Madagascar) (오픈-소스 자료처리 기술개발 소프트웨어(Madagascar)를 이용한 탄성파 현장자료 전산처리 적용성 연구)

  • Son, Woohyun;Kim, Byoung-yeop
    • Geophysics and Geophysical Exploration
    • /
    • v.21 no.3
    • /
    • pp.171-182
    • /
    • 2018
  • We processed seismic field data using open-source software (Madagascar) to verify whether it is applicable to field data, which have a low signal-to-noise ratio and high uncertainty in velocities. Madagascar, which is based on Python, is generally considered well suited to developing processing technologies owing to its multidimensional data analysis capabilities and reproducibility. However, this open-source software has not been widely used so far for field data processing because of its complicated interfaces and data structure system. To verify the effectiveness of Madagascar on field data, we applied it to a typical seismic data processing flow including data loading, geometry build-up, F-K filtering, predictive deconvolution, velocity analysis, normal moveout correction, stack, and migration. The field data for the test were acquired in the Gunsan Basin, Yellow Sea, using a streamer consisting of 480 channels and 4 air-gun arrays. The results at each processing step were compared with those from Landmark's ProMAX (SeisSpace R5000), a commercial processing package. Madagascar shows relatively high efficiency in data I/O and management as well as reproducibility. Additionally, it performs quick and exact calculations in some automated procedures such as stacking velocity analysis. There were no remarkable differences between the two packages in the results after applying the signal enhancement flows. For the deeper part of the subsurface image, however, the commercial software shows better results than the open-source software. This is because the commercial software has various flows for de-multiple and provides interactive processing environments for delicate processing work, compared to Madagascar.
Considering that many researchers around the world are developing various data processing algorithms for Madagascar, we can expect that open-source software such as Madagascar can be widely used for commercial-level processing, given its strengths of expandability, cost effectiveness, and reproducibility.

Dynamic forecasts of bankruptcy with Recurrent Neural Network model (RNN(Recurrent Neural Network)을 이용한 기업부도예측모형에서 회계정보의 동적 변화 연구)

  • Kwon, Hyukkun;Lee, Dongkyu;Shin, Minsoo
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.3
    • /
    • pp.139-153
    • /
    • 2017
  • Corporate bankruptcy can cause great losses not only to stakeholders but also to many related sectors in society. Through successive economic crises, bankruptcies have increased and bankruptcy prediction models have become more and more important. Corporate bankruptcy has therefore been regarded as one of the major research topics in business management, and many important studies are also in progress in industry. Previous studies used various methodologies, such as Multivariate Discriminant Analysis (MDA) and the Generalized Linear Model (GLM), to improve bankruptcy prediction accuracy and to resolve the overfitting problem. These methods are based on statistics. Recently, researchers have used machine learning methodologies such as Support Vector Machine (SVM) and Artificial Neural Network (ANN). Fuzzy theory and genetic algorithms have also been used. As a result, many bankruptcy models have been developed and performance has improved. In general, a company's financial and accounting information changes over time. The market situation also changes, so it is difficult to predict bankruptcy with information from only a single point in time. However, although traditional research has the problem of not taking the time effect into account, dynamic models have not been studied much. Ignoring the time effect yields biased results, so a static model may not be suitable for predicting bankruptcy. A dynamic model may therefore improve bankruptcy prediction. In this paper, we propose the RNN (Recurrent Neural Network), one of the deep learning methodologies. The RNN learns from time series data and is known to perform well. Prior to the experiment, we selected non-financial firms listed on the KOSPI, KOSDAQ, and KONEX markets from 2010 to 2016 for the estimation of the bankruptcy prediction model and the comparison of forecasting performance.
To avoid predicting bankruptcy from financial information that already reflects the deterioration of a company's financial condition, the financial information was collected with a lag of two years, and the default period was defined as January to December of the year. We defined bankruptcy as delisting due to sluggish earnings, and confirmed delistings on KIND, a corporate stock information website. We then selected variables from previous papers. The first set of variables consists of the Z-score variables, which have become traditional variables in predicting bankruptcy. The second set is a dynamic variable set. We ultimately selected 240 normal companies and 226 bankrupt companies for the first variable set, and 229 normal companies and 226 bankrupt companies for the second variable set. We created a model that reflects dynamic changes in time-series financial data and, by comparing it with existing bankruptcy prediction models, found that the suggested model could help to improve the accuracy of bankruptcy predictions. We used financial data from KIS Value (a financial database) and selected Multivariate Discriminant Analysis (MDA), the Generalized Linear Model known as logistic regression (GLM), Support Vector Machine (SVM), and Artificial Neural Network (ANN) models as benchmarks. The experimental results showed that the RNN performed better than the comparison models. The accuracy of the RNN was high for both sets of variables, and the Area Under the Curve (AUC) value was also high. In the hit-ratio table, the rate at which the RNN predicted poorly performing companies to be bankrupt was also higher than that of the comparison models. A limitation of this paper, however, is that an overfitting problem occurs during RNN learning.
We expect that the overfitting problem can be solved by selecting more training data and appropriate variables. From these results, it is expected that this research will contribute to the development of bankruptcy prediction by proposing a new dynamic model.
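
To make the "dynamic" aspect of this abstract concrete, the sketch below runs the forward pass of a plain Elman-style recurrent cell over a short sequence of yearly financial ratios and maps the final hidden state to a bankruptcy probability. Everything here is an assumption for illustration: the abstract does not describe the authors' architecture, and the weights, the two input ratios, and the function name `rnn_forward` are hypothetical, with no training involved.

```python
import math


def rnn_forward(sequence, w_xh, w_hh, b_h, w_hy, b_y):
    """Forward pass of a simple Elman RNN followed by a sigmoid output.

    At each step: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h).
    After the last step: p = sigmoid(w_hy . h_T + b_y).
    """
    hidden = [0.0] * len(b_h)
    for x in sequence:
        hidden = [
            math.tanh(
                sum(w_xh[j][i] * x[i] for i in range(len(x)))
                + sum(w_hh[j][k] * hidden[k] for k in range(len(hidden)))
                + b_h[j]
            )
            for j in range(len(b_h))
        ]
    logit = sum(w * h for w, h in zip(w_hy, hidden)) + b_y
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical untrained weights: 2 inputs, 2 hidden units.
w_xh = [[0.5, -0.3], [0.1, 0.8]]
w_hh = [[0.2, 0.0], [-0.4, 0.3]]
b_h = [0.0, 0.0]
w_hy = [1.0, -1.5]
b_y = 0.1

# Three consecutive years of two made-up ratios (e.g. profitability, leverage).
years = [[0.10, 0.50], [0.05, 0.60], [-0.02, 0.75]]

p = rnn_forward(years, w_xh, w_hh, b_h, w_hy, b_y)
p_reversed = rnn_forward(years[::-1], w_xh, w_hh, b_h, w_hy, b_y)
print(f"p(original order) = {p:.4f}, p(reversed order) = {p_reversed:.4f}")
```

Feeding the same three years in reverse order gives a different output, which is the property a static model lacks: the recurrent state carries information about how the ratios evolve, not just their values at one point in time.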

A study on design process for public space by users behavioral characteristics (이용자 행태 특성에 의한 공용공간의 디자인 프로세스 연구)

  • 김개천;김범중
    • Archives of design research
    • /
    • v.17 no.1
    • /
    • pp.89-98
    • /
    • 2004
  • Behavior-centered space design requires a systematic approach to behavior grounded in human psychology. It also requires recognizing that humans and the environment are complementary: humans and space should be understood as a general phenomenon that presupposes interaction. Designing behavior-oriented space means configuring and coordinating physical objects as well as understanding, analyzing, and reflecting psychological and behavioral phenomena. It involves analysis of the individual as well as an understanding of interaction between human groups. With respect to space recognition, what matters is analysis not of material movement but of energy circulation and variables. This means that understanding users' behavior and psychology is not aimed at a rational purpose for convenience alone. Rather, it seeks to understand the behavioral patterns and psychological phenomena between space and humans, going beyond decomposing humans and space into physical elements and beyond design based on standardized data. Understanding the essence of behavior can thereby lead to more human-oriented space design. A user-centered design process from another viewpoint might also be created, and general amenity among humans, space, and environment, that is, better environmental quality, might be produced. For this, awareness of human activity, that is, of the activity system, must come first, and design approaches should be built into a process based not on predictive ideas but on a semi-scientific system. On this basis, this study attempted to investigate a design orientation that recognizes space as another form of life, and to explore a process by which it is drawn into a design language on the basis of human behavior.
If the essence of spatial behavior and the activity system are analyzed through user observation, reflected in a space design program, and then developed into a formative language, a new design process concerning humans and the environment might be produced. In conclusion, reflecting users' behavior and psychology in design, in contrast to existing public space design based on physical data, can improve the quality of human life and ultimately contribute to the proposition of the 'humanization of space'.


Evaluation of Error Factors in Quantitative Analysis of Lymphoscintigraphy (Lymphoscintigraphy의 정량분석 시 오류 요인에 관한 평가)

  • Yeon, Joon-Ho;Kim, Soo-Yung;Choi, Sung-Ook;Seok, Jae-Dong
    • The Korean Journal of Nuclear Medicine Technology
    • /
    • v.15 no.2
    • /
    • pp.76-82
    • /
    • 2011
  • Purpose: Lymphoscintigraphy is used as a standard examination for lymphatic diagnosis and post-treatment evaluation, and is useful for planning therapy for lymphedema. In lymphoscintigraphy of lower-extremity lymphedema, results are affected if patients do not hold the same position for the examinations at 1 min, 1 hour, and 2 hours after injection. We therefore studied ways to improve reliability by minimizing quantitative analysis errors due to these influencing factors. Materials and Methods: Using the Infinia system (GE), we injected $^{99m}Tc$-phytate 37 MBq (1.0 mCi) in 4 syringes subcutaneously into the feet of 40 people from June to August 2010 at Samsung Medical Center. After acquiring images in the fixed and unfixed conditions, we confirmed the change in count values caused by attenuation in soft tissue and bone at different foot positions. We also measured counts five times, increasing the distance between a $^{99m}Tc$ point source and the detector by 2 cm each time, to check the count difference caused by distance changes from different foot positions. Finally, we compared 1 min and 6 min lymphoscintigraphy images taken in the same position to check how the amount of movement of $^{99m}Tc$-phytate in the lymphatic duct affects the quantitative analysis results. Results: The percentage difference in error values ranged from a minimum of 2.7% to a maximum of 25.8% when comparing fixed and unfixed foot positions at the examination 1 min after injection. Count values as the distance was increased in 2 cm steps were 173,661 (2 cm), 172,095 (4 cm), 170,996 (6 cm), 167,677 (8 cm), and 169,208 counts (10 cm) against a mean baseline value of 176,587 counts; the percentage differences were 1.27, 1.79, 2.04, 2.42, and 2.35%, none exceeding 2.5%. In addition, the amount of movement in the lymphatic duct within the 6 min between injection and scanning ranged from a minimum of 0.15% to a maximum of 2.3%.
Excluding the distance difference (2.42%) and the amount of movement in the lymphatic duct (2.3%), error values of over 20% can thus be attributed to attenuation in soft tissue and bone alone. Conclusion: In lymphoscintigraphy, which evaluates lymphatic flow in patients with lymphedema and analyzes the amount of uptake by the lymphatic system, it was shown that when the same patient held different foot positions at the examinations 1 min, 1 hour, and 2 hours after injection, the maximum error value was 25.8% due to attenuation in soft tissue and bone, and PASW (Predictive Analytics Software) showed that the fixed and unfixed foot positions differed from each other. The difference in distance between the detector and the feet and the change in count values caused by differences in examination start time after injection also partially influence the quantitative analysis results. Therefore, we will make an effort to fix the foot position and make full use of a fixing board in lymphoscintigraphy with quantitative analysis.
