• Title/Summary/Keyword: Rule based regression


Penalized quantile regression tree (벌점화 분위수 회귀나무모형에 대한 연구)

  • Kim, Jaeoh;Cho, HyungJun;Bang, Sungwan
    • The Korean Journal of Applied Statistics / v.29 no.7 / pp.1361-1371 / 2016
  • Quantile regression provides a variety of useful statistical information for examining how covariates influence the conditional quantile functions of a response variable. However, traditional quantile regression, which assumes a linear model, is not appropriate when the relationship between the response and the covariates is nonlinear. It is also necessary to conduct variable selection for high-dimensional data or strongly correlated covariates. In this paper, we propose a penalized quantile regression tree model. The split rule of the proposed method is based on residual analysis, which has negligible bias in selecting a split variable and a reasonable computational cost. A simulation study and a real data analysis are presented to demonstrate the satisfactory performance and usefulness of the proposed method.
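
For readers unfamiliar with quantile regression, the sketch below shows the standard check (pinball) loss whose minimizer is a sample quantile. It is illustrative only: the abstract does not state the paper's penalized objective or its residual-based split rule, and the function names `check_loss` and `quantile_by_check_loss` are assumptions made for this example.

```python
def check_loss(residual, tau):
    """Standard quantile check loss: r * (tau - 1[r < 0])."""
    indicator = 1.0 if residual < 0 else 0.0
    return residual * (tau - indicator)


def quantile_by_check_loss(values, tau):
    """Return the sample value that minimizes the total check loss.

    Brute-force search over the observed values; fine for a small
    illustration, not intended as an efficient estimator.
    """
    def total_loss(q):
        return sum(check_loss(v - q, tau) for v in values)

    return min(values, key=total_loss)


if __name__ == "__main__":
    data = [1, 2, 3, 4, 100]
    # With tau = 0.5 the minimizer is the median, which ignores the outlier.
    print(quantile_by_check_loss(data, 0.5))
```

Positive residuals are weighted by tau and negative residuals by 1 - tau, which is why changing tau moves the fitted value toward a different part of the conditional distribution.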

Age Prediction based on the Transcriptome of Human Dermal Fibroblasts through Interval Selection (피부섬유모세포 전사체 정보를 활용한 구간 선택 기반 연령 예측)

  • Seok, Ho-Sik
    • Journal of IKEEE / v.26 no.3 / pp.494-499 / 2022
  • It has been reported that genome-wide RNA-seq profiles have potential as biomarkers of aging, and a number of studies have achieved promising prediction performance based on gene expression profiles. We develop an age prediction method based on the transcriptome of human dermal fibroblasts that selects a proper age interval. The proposed method executes multiple rules sequentially; each rule uses a classifier and a regression model to determine whether a given test sample belongs to the rule's target age interval. If a test sample satisfies the selection condition of a rule, its age is predicted from the associated target age interval. Our method predicts age with a mean absolute error of 5.7 years, outperforming the previous best mean absolute error of 7.7 years achieved by an ensemble-based prediction method. We observe that it is possible to predict age from genome-wide RNA-seq profiles, but that prediction performance is not stable and varies with age.
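
The sequential rule execution described in this abstract can be pictured roughly as follows. This is a hypothetical sketch, not the authors' implementation: the `Rule` structure, the `accepts` and `predict` callables, the clipping of the prediction to the rule's interval, and the `fallback` predictor are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class Rule:
    # Target age interval (low, high) in years. Hypothetical structure.
    interval: Tuple[float, float]
    # Selection condition; stands in for the classifier/regression check.
    accepts: Callable[[Sequence[float]], bool]
    # Age estimate used once the rule has accepted the sample.
    predict: Callable[[Sequence[float]], float]


def predict_age(sample, rules, fallback):
    """Try the rules in order and use the first one that accepts the sample."""
    for rule in rules:
        if rule.accepts(sample):
            low, high = rule.interval
            # Assumption: keep the estimate inside the rule's interval.
            return min(max(rule.predict(sample), low), high)
    # Assumption: a default predictor handles samples no rule accepts.
    return fallback(sample)


if __name__ == "__main__":
    rules = [
        Rule((0.0, 30.0), lambda s: s[0] < 0.5, lambda s: 40.0),
        Rule((30.0, 90.0), lambda s: True, lambda s: 55.0),
    ]
    print(predict_age([0.2], rules, lambda s: 50.0))  # first rule accepts
    print(predict_age([0.8], rules, lambda s: 50.0))  # second rule accepts
```

Ordering matters in such a scheme: a sample is handled by the first rule whose condition it satisfies, so earlier rules take priority over later ones.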

Implementation and Analysis of the Agent based Object-Oriented Software Test Tool, TAS (에이전트 기반의 객체지향 소프트웨어 테스트 도구인 TAS의 구현 및 분석)

  • Choi, Jeon-Geun;Choi, Byoungju
    • Journal of KIISE:Software and Applications / v.28 no.10 / pp.732-742 / 2001
  • The concept of an agent has become important in computer science and has been applied to a number of application domains such as electronic commerce and information retrieval. However, no one has yet proposed applying it to software testing. The test agent system (TAS), which applies the concept of an agent to software testing, is a new test tool. It consists of the User Interface Agent, the Test Case Selection & Testing Agent, and the Regression Test Agent. Each of these agents, with its intelligent rules, carries out the tests autonomously by employing object-oriented test processes. This system has two advantages. First, since the tests are carried out autonomously, it minimizes tester interference; second, since redundancy-free and consistent effective test cases are intelligently selected, the testing time is reduced while the fault detection effectiveness improves. In this paper, by showing the testing process being carried out autonomously by the three agents that form the TAS, we show that the TAS minimizes tester interference. By carrying out four types of experiments (on the RE-Rule, on the CTS-Rule, on the overall TAS, and on the fault detection effectiveness of the RE-Rule), we also show the reduction in testing time and the improvement in fault detection effectiveness.

Development of Hysteretic Analysis Model for RC beam with Relocated Plastic Hinge from Column Face (소성힌지가 기둥면에서 이동된 RC보의 이력거동 해석모델)

  • Seo, Soo-Yeon;Yoon, Seung-Joe;Lee, Li-Hyung;Kwon, Young-Joon
    • Journal of the Korea Institute for Structural Maintenance and Inspection / v.6 no.3 / pp.167-175 / 2002
  • In this paper, an analytical model is proposed for analyzing the hysteretic behavior of RC beams with a relocated plastic hinge region under load reversals. The plastic hinge is modeled not as concentrated at a point but as distributed over a finite length of the beam. This is based on the assumption that the plastic hinge forms over a certain region in which the curvature varies. The tangential matrix is reformulated using stiffness coefficients that include variables such as the length and location of the plastic hinge region. To construct the hysteretic rule of the hinge, a modified Takeda rule is also proposed on the basis of a regression analysis of previous test results. Previous specimens are analyzed using the proposed model, and the results are compared with the test results. The comparison shows that the hysteretic behavior of beams with different plastic hinge locations can be predicted using the proposed analytical process.

Comparing Accuracy of Imputation Methods for Categorical Incomplete Data (범주형 자료의 결측치 추정방법 성능 비교)

  • 신형원;손소영
    • The Korean Journal of Applied Statistics / v.15 no.1 / pp.33-43 / 2002
  • Various estimation methods have been developed for imputing categorical missing data, including the category method, logistic regression, and association rules. In this study, we propose two fusion algorithms, one based on a neural network and one on a voting scheme, that combine the results of the individual imputation methods. A Monte Carlo simulation is used to compare the performance of these methods. The five factors used to simulate the missing data pattern are (1) input-output function, (2) data size, (3) noise of the input-output function, (4) proportion of missing data, and (5) pattern of missing data. The experimental results indicate the following: when the data size is small and the proportion of missing data is large, the modal category method, association rules, and neural network based fusion perform better than the other methods. However, when the data size is small and the correlation between the input and the missing output is strong, logistic regression and the neural network based fusion algorithm appear better than the others. When the data size is large with a low proportion of missing data, large noise, and a strong correlation between the input and the missing output, the neural network based fusion algorithm turns out to be the best choice.
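
A voting-based fusion of imputation results can be sketched as a plain plurality vote over the categories proposed by the individual methods. The abstract does not specify the voting scheme, so the function name `vote_fusion` and the tie-breaking rule (the earliest listed method wins) are assumptions for this example.

```python
from collections import Counter


def vote_fusion(predictions):
    """Combine categorical imputations from several methods by plurality vote.

    `predictions` holds one imputed category per method, in a fixed method
    order. Ties go to the earliest method in that order (an assumption).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    top = max(counts.values())
    for category in predictions:
        if counts[category] == top:
            return category


if __name__ == "__main__":
    # Hypothetical outputs of three imputation methods for one missing cell.
    print(vote_fusion(["A", "B", "A"]))
```

A neural network based fusion would instead learn how to weight the individual methods from data, which this sketch does not attempt.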

Orthotropic Theory for the Prediction of Mechanical Performance in Thermally Point-bonded Nonwovens

  • Kim, Han-Seong
    • Fibers and Polymers / v.5 no.2 / pp.139-144 / 2004
  • The orthotropic theory is applied to nonwoven fabrics that have a preferred orientation direction, which is the case when the structure is not isotropic. Polynomial regression analysis is employed to obtain more statistically meaningful information. A functional form based on the transformation rule is developed for the orthotropic approach. The predictions thus obtained are in excellent agreement with experimental data, and the resulting compliances exhibit meaningful relationships with the processing conditions. The compatibility of the compliances from tensile and shear analyses is explored prior to a practical application of the four compliances defining the in-plane strain-stress field.

A new classification method using penalized partial least squares (벌점 부분최소자승법을 이용한 분류방법)

  • Kim, Yun-Dae;Jun, Chi-Hyuck;Lee, Hye-Seon
    • Journal of the Korean Data and Information Science Society / v.22 no.5 / pp.931-940 / 2011
  • Classification generates a rule for assigning objects to several categories based on a learning sample. A good classification model should classify new objects with low misclassification error. Many types of classification methods have been developed, including logistic regression, discriminant analysis, and trees. This paper presents a new classification method using penalized partial least squares. Penalized partial least squares can make the model more robust and remedy the multicollinearity problem. This paper compares the proposed method with logistic regression and PCA-based discriminant analysis using real and artificial data. It is concluded that the new method has better power than the other methods.

Data Mining for Knowledge Management in a Health Insurance Domain

  • Chae, Young-Moon;Ho, Seung-Hee;Cho, Kyoung-Won;Lee, Dong-Ha;Ji, Sun-Ha
    • Journal of Intelligence and Information Systems / v.6 no.1 / pp.73-82 / 2000
  • This study examined the characteristics of knowledge discovery and data mining algorithms to demonstrate how they can be used to predict health outcomes and provide policy information for hypertension management using the Korea Medical Insurance Corporation database. Specifically, this study validated the predictive power of data mining algorithms by comparing the performance of logistic regression and two decision tree algorithms, CHAID (Chi-squared Automatic Interaction Detection) and C5.0 (a variant of C4.5), since logistic regression has assumed a major position in the healthcare field as a method for predicting or classifying health outcomes based on the specific characteristics of each individual case. This comparison was performed using a test set of 4,588 beneficiaries and the training set of 13,689 beneficiaries that was used to develop the models. Contrary to the previous study, the CHAID algorithm performed better than logistic regression in predicting hypertension, but C5.0 had the lowest predictive power. In addition, the CHAID algorithm and association rules also provided the segment characteristics for the risk factors that may be used in developing hypertension management programs. This showed that a data mining approach can be a useful analytic tool for predicting and classifying health outcomes data.

Country-Level Institutional Quality and Public Debt: Empirical Evidence from Pakistan

  • MEHMOOD, Waqas;MOHD-RASHID, Rasidah;AMAN-ULLAH, Attia;ZI ONG, Chui
    • The Journal of Asian Finance, Economics and Business / v.8 no.4 / pp.21-32 / 2021
  • This paper investigates the relationship between country-level institutional quality and public debt in the context of Pakistan. The hypotheses were assessed using country-level institutional quality data for Pakistan for the years 1996 to 2018. Data came from the World Databank, IMF, and Worldwide Governance Indicators databases. For the analysis, ordinary least squares, quantile regression, and robust regression were employed to assess the factors influencing public debt. The results indicate that voice and accountability, regulatory quality, and control of corruption have a positive and significant relationship with public debt, while political stability, government effectiveness, and the rule of law have a negative and significant effect on public debt. Based on the findings, weak country-level institutional quality poses a substantial market risk, as it signals the existence of an unfavorable economic condition that raises public debt. It was also revealed that improved country-level institutional quality can lead to greater financial market transparency and hence reduce public debt. In contrast to previous studies, the present study breaks new ground in enhancing public insight into the impact of country-level institutional quality on Pakistan's public debt.

Prediction of High Level Ozone Concentration in Seoul by Using Multivariate Statistical Analyses (다변량 통계분석을 이용한 서울시 고농도 오존의 예측에 관한 연구)

  • 허정숙;김동술
    • Journal of Korean Society for Atmospheric Environment / v.9 no.3 / pp.207-215 / 1993
  • To statistically predict $O_3$ levels in Seoul, the study used TMS (telemetered air monitoring system) data from the Department of Environment, collected at 20 sites in 1989 and 1990. The data at each site were characterized by 6 major criteria pollutants ($SO_2$, TSP, CO, $NO_2$, THC, and $O_3$) and 2 meteorological parameters, wind speed and wind direction. To select proper variables and to determine each pollutant's behavior, univariate statistical analyses were carried out extensively at the beginning, and then various applied statistical techniques such as cluster analysis, regression analysis, and expert systems were examined intensively. For the initial study of high-level $O_3$ prediction, the raw data set at each site was separated into 2 groups based on a 60 ppb $O_3$ level. A hierarchical cluster analysis was applied to divide the groups based on the 60 ppb $O_3$ level into small classes. Each class at each site has its own pattern. Next, multiple regression was repeatedly applied to each class to determine an $O_3$ prediction submodel and to identify outliers in each class based on a certain level of standardized residual. Thus, a prediction submodel for each homogeneous class could be obtained. The study was extended to model $O_3$ prediction on both an on-time basis and a 1-hr-ahead basis. Finally, an expert system was used to build a unified classification rule based on examples of the homogeneous classes for all sites. Thus, a concept for a high-level $O_3$ prediction model was developed for use in $O_3$ alert systems.