• Title/Summary/Keyword: logistic regression model (로지스틱 회귀모형)

Monitoring Seasonal Influenza Epidemics in Korea through Query Search (인터넷 검색어를 활용한 계절적 유행성 독감 발생 감지)

  • Kwon, Chi-Myung;Hwang, Sung-Won;Jung, Jae-Un
    • Journal of the Korea Society for Simulation / v.23 no.4 / pp.31-39 / 2014
  • Seasonal influenza epidemics cause 3 to 5 million cases of severe illness and 250,000 to 500,000 deaths worldwide each year. To better control severe influenza epidemics, many studies have proposed methods for near real-time surveillance of the spread of influenza. The Korea CDC publishes clinical data on influenza epidemics weekly, typically with a reporting lag of 1 to 2 weeks. To detect epidemics faster, approaches using unofficial data such as news reports, social media, and search queries have recently been suggested. Such data are cheap to collect and available in near real time. This research aims to develop regression models for early detection of seasonal influenza outbreaks in Korea, using keyword query information from the Naver (a representative Korean portal site) trend services for PC and mobile devices. Based on a literature review, we selected 20 keywords likely to correlate strongly with influenza-like illness (ILI) and proposed a logistic regression model and a multiple regression model to predict ILI outbreaks. In terms of model fit, the multiple regression model performs better than the logistic regression model. We also find that the mobile-based regression model estimates ILI percentages better than the PC-based model.

A study on entertainment TV show ratings and the number of episodes prediction (국내 예능 시청률과 회차 예측 및 영향요인 분석)

  • Kim, Milim;Lim, Soyeon;Jang, Chohee;Song, Jongwoo
    • The Korean Journal of Applied Statistics / v.30 no.6 / pp.809-825 / 2017
  • The number of TV entertainment shows is increasing. Competition among programs in the entertainment market is intensifying because cable channels air many entertainment shows, which creates a need for research on program ratings and the number of episodes. This study presents predictive models for entertainment TV show ratings and the number of episodes. We use various data mining techniques, including linear regression, logistic regression, LASSO, random forests, gradient boosting, and support vector machines. The analysis shows that, before the first broadcast, the average program rating is affected by broadcasting company, average rating of the previous season, starting year, and number of articles. After the first broadcast, the average program rating is influenced by the rating of the first broadcast, broadcasting company, and program type. We also found that the predicted average rating, starting year, program type, and broadcasting company are important variables in predicting the number of episodes.

A Bike Mode Share Estimation Model and Analysis of the Bike Demand Factor Effects (자전거 수단분담률 추정모형 구축 및 자전거 수요요인분석)

  • Lee, Gyu-Jin;Choe, Gi-Ju
    • Journal of Korean Society of Transportation / v.28 no.3 / pp.145-155 / 2010
  • As a green transportation mode, the bicycle and the revitalization of its use are attracting considerable public attention. To achieve effective outcomes, however, a concrete and detailed analysis of bike usage characteristics is needed first. A finding by MLTM (2009) supports this view: the bike mode share decreased even though 9,170 km of bicycle paths were improved (1995-2007). This study analyzed the bike mode share by trip type using 303,308 records from the 2006 Household Travel Survey of the Seoul Metropolitan Area. The highest mode shares were found for institute attendees and Officetel residents, at 3.75% and 3.13%, respectively. The study also built a bike mode share estimation model for Seoul using logistic regression and analyzed the factors related to bike demand and the size of their effects by calculating odds ratios from the logistic regression coefficients. In conclusion, policies oriented toward short trips, institute districts, parks, and Officetel residential areas should be effective in revitalizing bike usage.
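    Illustrative note (not from the paper): the abstract derives odds ratios from logistic regression coefficients. A minimal sketch of that relationship follows, assuming a standard binary logit model. The function names, the intercept, and the coefficient values are hypothetical and are not the authors' estimates.

```python
import math

def odds_ratio(beta: float) -> float:
    """Odds ratio implied by a logistic regression coefficient: exp(beta)."""
    return math.exp(beta)

def predicted_share(intercept: float, coefs: list[float], x: list[float]) -> float:
    """Predicted probability (here, a mode share) from a binary logit model."""
    z = intercept + sum(b * v for b, v in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficient for a 0/1 indicator such as "short trip".
beta_short_trip = 0.7
print(odds_ratio(beta_short_trip))  # multiplicative change in the odds of choosing a bike

# Hypothetical intercept; compare the predicted share with and without the indicator.
print(predicted_share(-3.5, [beta_short_trip], [0.0]))
print(predicted_share(-3.5, [beta_short_trip], [1.0]))
```

    An odds ratio above 1 means the factor raises the odds of choosing a bike, and a value below 1 means it lowers them.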

Reanalysis of 2002 Donation Frequency Data: Corrections and Supplements (2002년 기부횟수 자료의 재분석: 수정 및 보완)

  • Kim, Byung Soo;Lee, Juhyung;Kim, Inyoung;Park, Su-Bum;Park, Tae-Kyu
    • The Korean Journal of Applied Statistics / v.27 no.5 / pp.743-753 / 2014
  • Kim et al. (2006) and Kim et al. (2009) reported a set of explanatory variables affecting donation frequency when they analyzed nationwide survey data on donations collected in 2002 by Volunteer 21, a nonprofit organization in Korea. The primary purpose of this paper is to correct computational errors found in Kim et al. (2006) and Kim et al. (2009), to rectify major results in the Tables and Figures and to supplement Kim et al. (2009) by providing new results. We add two logistic regressions to the ZIP and a mixture of two Poisson regressions of Kim et al. (2009). Through these two logistic regressions we could detect a set of explanatory variables affecting donation activity (0 or 1) and another set of explanatory variables, in which the volunteer (0, 1) variable is common, discriminating the infrequent donor group from the frequent donor group.

Development of a Logistic Regression Model for Analyzing Site Characteristics of Tombs Surrounding Expressway in Aerial Photographs (항공사진에 나타난 고속국도 주변 묘지의 입지 분석을 위한 로지스틱 회귀모형의 개발)

  • Han, Hee;Seol, A-Ra;Chung, JooSang
    • Journal of the Korean Association of Geographic Information Studies / v.11 no.4 / pp.193-202 / 2008
  • The objectives of this study are to analyze the spatial site characteristics of existing tombs and the change in their spatial distribution over time. The spatial distribution of tombs located in Honam province along the Honam expressway was investigated by interpreting digital aerial photographs taken at two points in time, 1990 and 2000. According to the results, the tombs newly observed in the 2000 photographs were located closer to roads and villages than those found in the 1990 photographs. This finding indicates that accessibility has become a more important consideration in determining the location of tomb sites. Gentle slopes with southern aspects were also found to be favored as tomb sites. Based on the data sets of tomb locations and their topographic site characteristics, a probability function for the appearance of tombs in the study area was derived using logistic regression analysis. The logistic regression model classified tomb sites with an accuracy of 74.7%. All six input factors (elevation, slope, aspect, and distance from roads, towns, and streams) significantly affected the probability of tomb appearance.

A Study of Effect on the Smoking Status using Multilevel Logistic Model (다수준 로지스틱 모형을 이용한 흡연 여부에 미치는 영향 분석)

  • Lee, Ji Hye;Heo, Tae-Young
    • The Korean Journal of Applied Statistics / v.27 no.1 / pp.89-102 / 2014
  • In this study, we analyze factors affecting smoking status in the Seoul Metropolitan area using a multilevel logistic model with Community Health Survey data from the Korea Centers for Disease Control and Prevention. The intraclass correlation coefficient (ICC), profiling analysis, and two types of predicted values were used to determine the appropriate level for the multilevel analysis. Model performance was evaluated using sensitivity, specificity, the percentage of correctly classified observations (PCC), and the ROC curve. Using survey data, we showed the applicability of multilevel analysis, which allows for the possibility that different factors contribute to within-group and between-group variability.
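    Illustrative note (not from the paper): sensitivity, specificity, and PCC can all be computed from the confusion counts of a binary classifier. The sketch below assumes labels coded 0/1 with 1 as the positive class (for example, "smoker"). The function name and the toy labels are hypothetical.

```python
def classification_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Sensitivity, specificity, and PCC from 0/1 true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        # Share of actual positives that were predicted positive.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Share of actual negatives that were predicted negative.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        # Percentage of correctly classified observations, as a fraction.
        "pcc": (tp + tn) / len(pairs) if pairs else float("nan"),
    }

# Toy labels for illustration only.
metrics = classification_metrics([1, 1, 0, 0], [1, 0, 0, 0])
print(metrics)
```

    The ROC curve mentioned in the abstract is obtained by recomputing sensitivity and specificity while varying the probability cutoff used to turn predicted probabilities into 0/1 labels.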

Analysis of Personal Communication Service Churners Using Decision Trees (의사결정나무를 이용한 개인휴대통신 해지자 분석)

  • 최종후;서두성
    • Proceedings of the Korean Operations and Management Science Society Conference / 1998.10a / pp.377-380 / 1998
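  • In this paper, we analyze churners of personal communication services using decision tree analysis, which has recently been widely introduced as a data mining tool. We also use a logistic regression model to score subscribers by their likelihood of churning.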

A Study on the Number of Domestic Food Delivery Services (국내 배달음식 이용건수 분석 및 예측)

  • Kwon, Jaeyoung;Kim, Sinae;Park, Eungee;Song, Jongwoo
    • The Korean Journal of Applied Statistics / v.28 no.5 / pp.977-990 / 2015
  • Food delivery services are well developed in the Republic of Korea. The increase in one-person households and the success of mobile apps have recently influenced delivery services. We consider prediction models based on weather and dates to predict the number of food deliveries in 2014 using various data mining techniques. We use linear regression, random forests, gradient boosting, support vector machines, neural networks, and logistic regression to find the best prediction model. There are four categories of food delivery services, and we consider two methods. In the first method, we estimate the total number of deliveries and the posterior probability of each category. In the second method, we fit a separate model for each category and combine them to estimate the total number of deliveries. The neural network and linear regression models perform best under the first method, and the neural network performs best under the second method. The results show that the number of deliveries can be estimated accurately from date and weather information.

The Effect of Overdesign on Titan Rocket Engine Reliability and Development Cost (과설계가 타이탄 로켓엔진의 신뢰도 및 개발비용에 미치는 영향)

  • Kim, Kyungmee O.;Hwang, Junwoo
    • Journal of the Korean Society for Aeronautical & Space Sciences / v.43 no.4 / pp.334-340 / 2015
  • Engine derating is often considered for its reliability benefits because operating at lower power reduces the probability of failure. To be derated during operation, however, the engine must initially be overdesigned. Engine overdesign is cost-effective only if the reliability gained from derating is enough to offset the initial increase in development cost caused by the overdesign. The purpose of this paper is to provide an analytical model of the trade-off between engine overdesign and derating. We use a logistic regression model to describe reliability growth as a function of the number of hot firing tests at a fixed power level. Using the Transcost model together with the reliability growth model, we show that a 10% overdesign of the Titan rocket engine decreases its development cost by about 9% and 23%, depending on the reliability requirement. We also point out that the cost reduction depends on the type of fuel the rocket uses.

Comparison analysis of big data integration models (빅데이터 통합모형 비교분석)

  • Jung, Byung Ho;Lim, Dong Hoon
    • Journal of the Korean Data and Information Science Society / v.28 no.4 / pp.755-768 / 2017
  • As big data becomes the core of the fourth industrial revolution, big data processing and analysis capabilities are expected to influence companies' future competitiveness. Although RHadoop and RHIPE, which integrate R with the Hadoop environment, have each been discussed separately, few studies have compared them. In this paper, we built RHadoop and RHIPE big data platforms applicable to large-scale data and implemented machine learning algorithms, namely multiple regression and logistic regression, on the MapReduce framework. We studied the performance and scalability of these implementations for various sample sizes of real and simulated data. The experiments demonstrated that our RHadoop and RHIPE implementations scale well and process large data sets efficiently on commodity hardware. RHIPE was faster than RHadoop for almost all of the data sets.
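    Illustrative note (not from the paper): the abstract describes logistic regression implemented on the MapReduce framework. The paper's implementations use R with Hadoop. The sketch below is a plain-Python analogue under assumed design choices: batch gradient descent, where each "map" call computes a partial gradient on one data chunk and a "reduce" step sums the partial gradients. All names, the learning rate, and the toy data are hypothetical.

```python
import math
from functools import reduce

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def map_gradient(chunk, w):
    """Map step: partial gradient of the negative log-likelihood on one chunk."""
    grad = [0.0] * len(w)
    for x, y in chunk:
        err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
        for j, xj in enumerate(x):
            grad[j] += err * xj
    return grad

def reduce_gradients(a, b):
    """Reduce step: element-wise sum of two partial gradients."""
    return [ai + bi for ai, bi in zip(a, b)]

def fit(chunks, n_features, lr=0.1, n_iter=200):
    """Batch gradient descent; each iteration is one map-reduce pass over the chunks."""
    w = [0.0] * n_features
    n = sum(len(c) for c in chunks)
    for _ in range(n_iter):
        partials = [map_gradient(c, w) for c in chunks]  # map over chunks
        total = reduce(reduce_gradients, partials)       # reduce to one gradient
        w = [wi - lr * g / n for wi, g in zip(w, total)]
    return w

# Toy data split into two chunks; each row is ([1.0 for the intercept, feature], label).
chunks = [
    [([1.0, 0.0], 0), ([1.0, 1.0], 0)],
    [([1.0, 2.0], 1), ([1.0, 3.0], 1)],
]
weights = fit(chunks, n_features=2)
print(weights)
```

    Because the gradient is a sum over observations, the chunks can be processed independently and in any order, which is what makes the computation fit the MapReduce pattern.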