• Title/Summary/Keyword: principal components logistic regression

Logistic Regression Classification by Principal Component Selection

  • Kim, Kiho;Lee, Seokho
    • Communications for Statistical Applications and Methods / v.21 no.1 / pp.61-68 / 2014
  • We propose binary classification methods that modify logistic regression classification. Variable selection procedures are applied to the principal components rather than to the original variables. We describe the resulting classifiers and discuss their properties. The performance of our proposals is illustrated numerically and compared with that of existing classification methods using synthetic and real datasets.
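
The general idea in this abstract can be sketched in Python, assuming NumPy is available: principal component scores replace the original variables, a subset of components is selected, and a logistic model is fit on the selected scores. The selection rule used here (ranking components by absolute correlation with the response) and all function names are illustrative assumptions; the abstract does not say which variable selection procedures the authors use.

```python
import numpy as np

def principal_components(X):
    """Return PC scores and loadings of the column-standardized matrix X."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U * s, Vt.T  # scores (n x p), loadings (p x p)

def select_components(scores, y, k):
    """Hypothetical rule: keep the k components most correlated with y."""
    corr = [abs(np.corrcoef(scores[:, j], y)[0, 1])
            for j in range(scores.shape[1])]
    return np.argsort(corr)[::-1][:k]

def fit_logistic(F, y, n_iter=50, ridge=1e-8):
    """Fit a logistic model with intercept by Newton-Raphson iterations."""
    A = np.column_stack([np.ones(len(y)), F])
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        eta = np.clip(A @ beta, -30, 30)          # avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-eta))
        w = p * (1.0 - p)
        # Tiny ridge term keeps the Hessian invertible; it is an assumption
        # of this sketch, not part of the method described in the abstract.
        H = A.T @ (A * w[:, None]) + ridge * np.eye(A.shape[1])
        step = np.linalg.solve(H, A.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < 1e-8:
            break
    return beta

def predict_proba(F, beta):
    """Predicted probability of class 1 for component scores F."""
    A = np.column_stack([np.ones(F.shape[0]), F])
    return 1.0 / (1.0 + np.exp(-np.clip(A @ beta, -30, 30)))
```

Typical use would be `scores, _ = principal_components(X)`, then `keep = select_components(scores, y, k)`, then `beta = fit_logistic(scores[:, keep], y)`. Because the scores are mutually orthogonal, the fit does not suffer from the collinearity that the original variables may have.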

Principal Components Logistic Regression based on Robust Estimation (로버스트추정에 바탕을 둔 주성분로지스틱회귀)

  • Kim, Bu-Yong;Kahng, Myung-Wook;Jang, Hea-Won
    • The Korean Journal of Applied Statistics / v.22 no.3 / pp.531-539 / 2009
  • Logistic regression is widely used as a data mining technique for customer relationship management. The maximum likelihood estimator has highly inflated variance when multicollinearity exists among the regressors, and it is not robust against outliers. We therefore propose robust principal components logistic regression to deal with both the multicollinearity and outlier problems. A procedure based on the condition index is suggested for selecting principal components. When a condition index is larger than the cutoff value obtained from a model constructed on the basis of conjoint analysis, the corresponding principal component is removed from the logistic model. In addition, we employ a robust estimation algorithm that dampens the effect of outliers by applying appropriate weights and factors to the leverage points and vertical outliers identified by the V-mask type criterion. Monte Carlo simulation results indicate that the proposed procedure yields a higher rate of correct classification than the existing method.

Bayesian inference of the cumulative logistic principal component regression models

  • Kyung, Minjung
    • Communications for Statistical Applications and Methods / v.29 no.2 / pp.203-223 / 2022
  • We propose a Bayesian approach to the cumulative logistic regression model for ordinal responses, based on orthogonal principal components obtained via singular value decomposition to account for multicollinearity among predictors. The advantage of the suggested method is that it performs dimension reduction and parameter estimation simultaneously. To evaluate the performance of the proposed model, we conduct a simulation study with a high-dimensional and highly correlated explanatory matrix. We also fit the suggested method to real data on sprout- and scab-damaged kernels of wheat and compare it with the EM-based proportional-odds logistic regression model. We argue that, compared with EM-based methods, the proposed model works better for highly correlated high-dimensional data, providing both parameter estimates and good predictions.

Principal Components Regression in Logistic Model (로지스틱모형에서의 주성분회귀)

  • Kim, Bu-Yong;Kahng, Myung-Wook
    • The Korean Journal of Applied Statistics / v.21 no.4 / pp.571-580 / 2008
  • Logistic regression analysis is widely used in customer relationship management and credit risk management. It is well known that maximum likelihood estimation is not appropriate when multicollinearity exists among the regressors. We therefore propose logistic principal components regression to deal with the multicollinearity problem. In particular, a new method is suggested for selecting the proper principal components. The selection method is based on the condition index instead of the eigenvalue. When a condition index is larger than the upper limit of the cutoff value, the principal component corresponding to that index is removed from the estimation. When a condition index lies between the lower and upper limits, a hypothesis test is applied sequentially to decide whether to eliminate the principal component. The limits are obtained from a linear model constructed on the basis of conjoint analysis. The proposed method is evaluated by means of the variance of the estimates and the correct classification rate. The results indicate that the proposed method is superior to the existing method in terms of efficiency and goodness of fit.
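
The three-way screening rule described in this abstract can be sketched in Python, assuming NumPy is available. The sketch takes the condition index of component j to be sqrt(lambda_max / lambda_j), computed from the eigenvalues of the predictor correlation matrix; this is a common definition and an assumption here, since the abstract does not give the formula. The limits 10 and 30 in the usage note are placeholder values, not the limits the authors derive from their conjoint-analysis model, and the function names are hypothetical.

```python
import numpy as np

def condition_indices(X):
    """Condition indices sqrt(lambda_max / lambda_j), largest eigenvalue first."""
    R = np.corrcoef(X, rowvar=False)
    eigvals = np.linalg.eigvalsh(R)[::-1]      # descending order
    eigvals = np.clip(eigvals, 1e-12, None)    # guard against round-off near zero
    return np.sqrt(eigvals[0] / eigvals)

def screen_components(indices, lower, upper):
    """Label each component: 'drop' above upper, 'test' in between, 'keep' below lower."""
    labels = []
    for eta in indices:
        if eta > upper:
            labels.append("drop")   # removed outright
        elif eta >= lower:
            labels.append("test")   # candidate for a sequential hypothesis test
        else:
            labels.append("keep")
    return labels
```

For example, `screen_components(condition_indices(X), lower=10, upper=30)` would flag the components associated with near-collinear directions of `X`. The hypothesis test applied to the "test" group is not shown, because the abstract does not specify it.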

Supervised Learning-Based Collaborative Filtering Using Market Basket Data for the Cold-Start Problem

  • Hwang, Wook-Yeon;Jun, Chi-Hyuck
    • Industrial Engineering and Management Systems / v.13 no.4 / pp.421-431 / 2014
  • Market basket data in the form of a binary user-item matrix or a binary item-user matrix can be modelled as a binary classification problem. The binary logistic regression approach tackles this problem with principal components as predictor variables. If users or items are sparse in the training data, the binary classification problem can be considered a cold-start problem. The binary logistic regression approach may not function appropriately if the principal components are inefficient for the cold-start problem. Assuming that market basket data can also be treated as a special regression problem whose response is either 0 or 1, we propose three supervised learning approaches to tackle the cold-start problem: random forest regression, random forest classification, and elastic net. We compare their performance in a variety of experimental settings. The experimental results show that the proposed supervised learning approaches outperform the conventional approaches.

Analysis of cycle racing ranking using statistical prediction models (통계적 예측모형을 활용한 경륜 경기 순위 분석)

  • Park, Gahee;Park, Rira;Song, Jongwoo
    • The Korean Journal of Applied Statistics / v.30 no.1 / pp.25-39 / 2017
  • Over 5 million people participate in cycle racing betting, and its revenue exceeds 2 trillion won. This study predicts the ranking of cycle racing using various statistical analyses and identifies important variables that influence ranking. We propose competitive ranking prediction models using various classification and regression methods. Our model can predict rankings with low misclassification rates most of the time. We found that the ranking increases as the grade of a racer decreases and as overall scores increase. Conversely, the ranking decreases when the grade of a racer increases, when race number four is given, and when the ranking of the last race of a racer decreases. We also found that prediction accuracy can be improved by using data centered per race instead of raw data. However, the real profit on future data was not high when we applied our prediction model, because the model predicts only low-return events well.

Pharmacists' Perceptions of Barriers to Providing Appropriate Pharmaceutical Services in Community Pharmacies (지역약국 약료서비스 제공의 장애요인: 약사 대상 설문조사)

  • Sohn, Hyun Soon;Kim, Seong-Ok;Joo, Kyung-Mi;Park, Hyekyung;Han, Euna;Ahn, Hyung Tae;Choi, Sang-Eun
    • Korean Journal of Clinical Pharmacy / v.25 no.2 / pp.94-101 / 2015
  • Background: In order to achieve the goals of community pharmacy practice, its legal, labour-related, and economic barriers need to be identified. This study examined pharmacists' perceptions of constraints on providing optimal pharmacy services in order to identify underlying factors and analyse the associations between barriers and pharmaceutical services in community pharmacies. Methods: A survey targeting pharmacy owners was conducted from May to June 2012 using a structured questionnaire including nine pharmaceutical service items. According to the service provision level, we classified pharmacists as inactive (fewer than 5 items among the listed 9 service items) and active providers (5 or more items). Principal component analysis was used to group significant factors for barriers into four thematic components. Associations between the participants' demographics and pharmacy characteristics and the services provided were explored by logistic regression analyses. Results: Participants were 402 pharmacists. Over 60% provided disease management services for hypertension, diabetes, and hyperlipidaemia. Variables that affected pharmaceutical services included the lack of separate areas for patient counselling (OR: 2.12, 95% CI: 1.18-3.80), and clinical knowledge and information-related barriers (OR: 0.59, 95% CI: 0.36-0.97). Conclusion: Strategies for improving clinical knowledge and providing expeditious information are necessary in order to improve community pharmacy services.

Nutrient-derived Dietary Patterns and Risk of Colorectal Cancer: a Factor Analysis in Uruguay

  • Stefani, Eduardo De;Ronco, Alvaro L.;Boffetta, Paolo;Deneo-Pellegrini, Hugo;Correa, Pelayo;Acosta, Gisele;Mendilaharsu, Maria
    • Asian Pacific Journal of Cancer Prevention / v.13 no.1 / pp.231-235 / 2012
  • In order to explore the role of nutrients and bioactive related substances in colorectal cancer, we conducted a case-control study in Uruguay, which is the country with the highest production of beef in the world. Six hundred and eleven (611) cases afflicted with colorectal cancer and 1,362 controls drawn from the same hospitals in the same time period were analyzed through unconditional multiple logistic regression. This base population was submitted to a principal components factor analysis, and three factors were retained. They were labeled as the meat-based, plant-based, and carbohydrates patterns. They were rotated using the orthogonal varimax method. The highest risk was positively associated with the meat-based pattern (OR for the highest quartile versus the lowest one 1.63, 95 % CI 1.22-2.18, P value for trend = 0.001), whereas the plant-based pattern was strongly protective (OR 0.60, 95 % CI 0.45-0.81, P value for trend <0.0001). The carbohydrates pattern was only positively associated with colon cancer risk (OR 1.46, 95 % CI 1.02-2.09). The meat-based pattern was rich in saturated fat, animal protein, cholesterol, and phosphorus, nutrients originating in red meat. Since heterocyclic amines are formed in well-done red meat through the action of amino acids and creatine, the results suggest that this pattern could be an important etiologic agent for colorectal cancer.

Low Coverage and Disparities of Breast and Cervical Cancer Screening in Thai Women: Analysis of National Representative Household Surveys

  • Mukem, Suwanna;Meng, Qingyue;Sriplung, Hutcha;Tangcharoensathien, Viroj
    • Asian Pacific Journal of Cancer Prevention / v.16 no.18 / pp.8541-8551 / 2016
  • Background: The coverage of breast and cervical cancer screening has only slightly increased in the past decade in Thailand, and these cancers remain leading causes of death among women. This study identified socioeconomic and contextual factors contributing to the variation in screening uptake and coverage. Materials and Methods: Secondary data were used from two nationally representative household surveys conducted by the National Statistical Office, the Health and Welfare Survey (HWS) 2007 and the Reproductive Health Survey (RHS) 2009. The study samples comprised 26,951 women aged 30-59 in the 2009 RHS and 14,619 women aged 35 years and older in the 2007 HWS. Households of women were grouped into wealth quintiles by an asset index derived from principal components analysis. Descriptive and logistic regression analyses were performed. Results: Screening rates for cervical and breast cancers increased between 2007 and 2009. Education, health insurance coverage, and wealth were factors contributing to screening uptake. Less educated or non-educated women and poor women had lower uptake of screenings, as did young, unmarried, and non-Buddhist women. Coverage by the Civil Servant Medical Benefit Scheme increased the propensity to have both screenings, while the universal coverage scheme increased the probability of cervical screening among the poor. Lack of awareness and knowledge contributed to non-use of both screenings. Women, especially Muslim women with respect to cervical screening, were put off from screening because of embarrassment, fear of pain, and other reasons. Conclusions: Although cervical screening is covered by the benefit package of three main public health insurance schemes, free of charge to all eligible women, the low coverage of cervical screening should be addressed by increasing awareness and strengthening the supply side. As mammography was not cost-effective and not covered by any scheme, awareness and practice of breast self-examination and effective clinical breast examination are recommended. Removal of cultural barriers is essential.
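
The wealth-quintile step mentioned in this abstract can be sketched in Python, assuming NumPy is available. The sketch takes the asset index to be the first principal component score of standardized asset-ownership indicators and splits households into five equal-sized rank groups. The abstract does not describe the survey variables or the exact construction, so the input layout, the sign convention, and the function names are assumptions.

```python
import numpy as np

def asset_index(assets):
    """First principal component score of standardized asset indicators.

    `assets` is a households x items array (e.g. 0/1 ownership flags);
    every column is assumed to vary across households.
    """
    A = np.asarray(assets, dtype=float)
    Z = (A - A.mean(axis=0)) / A.std(axis=0, ddof=1)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    score = Z @ Vt[0]
    # The sign of a principal component is arbitrary; orient the index so
    # that owning more assets corresponds to a higher score.
    if np.corrcoef(score, A.sum(axis=1))[0, 1] < 0:
        score = -score
    return score

def wealth_quintiles(score):
    """Assign quintiles 1 (poorest) to 5 (richest) by rank of the index."""
    ranks = np.argsort(np.argsort(score))   # 0 .. n-1, ties broken by position
    return ranks * 5 // len(score) + 1
```

Ranking before cutting gives groups of equal size even when many households share the same score, which is common with binary ownership indicators.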