• Title/Summary/Keyword: 로지스틱회귀분석기법 (logistic regression analysis technique)

Purchase Prediction Model using the Support Vector Machine (Support Vector Machine을 이용한 고객구매예측모형)

  • Ahn, Hyun-Chul;Han, In-Goo;Kim, Kyoung-Jae
    • Journal of Intelligence and Information Systems / v.11 no.3 / pp.69-81 / 2005
  • As business competition intensifies, companies are focusing their resources on customer relationship management (CRM) to survive. One important issue in CRM is building a purchase prediction model, which classifies customers into purchasing and non-purchasing groups. Various techniques for building such models have been proposed, but they have been criticized because their performance is generally low or because they require much effort to build and maintain. In this study, we therefore propose the support vector machine (SVM) as a tool for building a purchase prediction model. SVM is known as a technique that not only produces accurate predictions but can also be trained on small samples. To validate its usefulness, we apply SVM and several comparative techniques to a real-world purchase prediction case. Experimental results show that SVM outperforms all the comparative models, including logistic regression and artificial neural networks.

A Study on the Drug Classification Using Machine Learning Techniques (머신러닝 기법을 이용한 약물 분류 방법 연구)

  • Anmol Kumar Singh;Ayush Kumar;Adya Singh;Akashika Anshum;Pradeep Kumar Mallick
    • Advanced Industrial SCIence / v.3 no.2 / pp.8-16 / 2024
  • This paper presents a drug classification system whose goal is to predict the appropriate drug for a patient from demographic and physiological traits. The dataset contains attributes such as Age, Sex, BP (blood pressure), Cholesterol level, and Na_to_K (sodium-to-potassium ratio), and the objective is to determine the type of drug administered. The models used are K-Nearest Neighbors (KNN), Logistic Regression, and Random Forest. GridSearchCV with 5-fold cross-validation was used to fine-tune the hyperparameters, and each model was trained and tested on the dataset. Performance with and without hyperparameter tuning was assessed using accuracy, confusion matrices, and classification reports. The accuracy of the models was 0.7, 0.875, and 0.975 without GridSearchCV and 0.75, 1.0, and 0.975 with GridSearchCV. With GridSearchCV, Logistic Regression is the most suitable of the three models for drug classification, followed by K-Nearest Neighbors. Na_to_K is also an essential feature in predicting the outcome.
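
The tuning step described in this abstract can be sketched without any library. The snippet below is a hand-rolled stand-in for what GridSearchCV does (score every candidate setting by 5-fold cross-validation and keep the best), applied to a tiny k-nearest-neighbours classifier. The toy data, the candidate values of k, and all function names are illustrative assumptions, not the paper's dataset or code.

```python
import random
from collections import Counter

def knn_predict(train_X, train_y, x, k):
    """Label x by majority vote among its k nearest training rows."""
    nearest = sorted(
        zip(train_X, train_y),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], x)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def cross_val_accuracy(X, y, k, n_folds=5):
    """Mean accuracy over n_folds contiguous folds (rows must be pre-shuffled)."""
    fold_size = len(X) // n_folds
    scores = []
    for f in range(n_folds):
        lo, hi = f * fold_size, (f + 1) * fold_size
        train_X, train_y = X[:lo] + X[hi:], y[:lo] + y[hi:]
        hits = sum(
            knn_predict(train_X, train_y, x, k) == target
            for x, target in zip(X[lo:hi], y[lo:hi])
        )
        scores.append(hits / fold_size)
    return sum(scores) / n_folds

def grid_search(X, y, candidate_ks):
    """Score every candidate k by cross-validation and return the best one."""
    results = {k: cross_val_accuracy(X, y, k) for k in candidate_ks}
    best_k = max(results, key=results.get)
    return best_k, results

# Toy data (made up): two well-separated Gaussian clusters, 50 rows each.
random.seed(0)
rows = [([random.gauss(0, 1), random.gauss(0, 1)], 0) for _ in range(50)]
rows += [([random.gauss(4, 1), random.gauss(4, 1)], 1) for _ in range(50)]
random.shuffle(rows)
X = [features for features, _ in rows]
y = [label for _, label in rows]

best_k, results = grid_search(X, y, candidate_ks=[1, 3, 5, 7])
print("best k:", best_k, "CV accuracy:", results[best_k])
```

Comparing a model's untuned score with `results[best_k]` mirrors the "with and without GridSearchCV" comparison reported above, although on data this easy the gap is likely to be small.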

Study on Development of Classification Model and Implementation for Diagnosis System of Sasang Constitution (사상체질 분류모형 개발 및 진단시스템의 구현에 관한 연구)

  • Beum, Soo-Gyun;Jeon, Mi-Ran;Oh, Am-Suk
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2008.08a / pp.155-159 / 2008
  • This study develops a new classification model of Sasang constitutional types to help improve diagnostic accuracy. The main data-mining classification methods, namely discriminant analysis, decision tree analysis, neural network analysis, logistic regression analysis, and clustering analysis, were applied to questionnaires for constitutional type classification. In this way, a model that scientifically classifies constitutional types in Sasang Constitutional Medicine, a branch of traditional Korean medicine, was developed, and the analysis models were systematically compared. The classification of Sasang constitutional types was based on the discriminant analysis and decision tree models, which are relatively accurate, easy to understand and explain, and easy to implement. A diagnosis system for Sasang constitution was also implemented using these two models.

A quantitative study on patterns of terrorist bombing incidents (계량분석을 통한 폭탄테러사건의 패턴분석)

  • Yun, Min-Woo
    • Korean Security Journal / no.36 / pp.317-347 / 2013
  • This study examines the characteristics of terrorist bombing incidents in Afghanistan, and the causal factors behind the incidents and the number of casualties per incident, through quantitative statistical analysis. It uses bombing data from the GTD (Global Terrorism Database) of the START program covering incidents that occurred from January 1, 2002 to December 31, 2011. Descriptive analysis, chi-square tests, and logistic regression analysis were used to identify the characteristics of bombing incidents and the causal factors affecting incident frequency and the number of casualties. The results show a clear pattern in terrorist bombing incidents. This suggests that terrorists plan bombing operations rationally and strategically, so that bombing incidents and the number of casualties per incident are conditioned or affected by time, season, Pashtun tribal entity, drug production level, and the characteristics of targets.

On sampling algorithms for imbalanced binary data: performance comparison and some caveats (불균형적인 이항 자료 분석을 위한 샘플링 알고리즘들: 성능비교 및 주의점)

  • Kim, HanYong;Lee, Woojoo
    • The Korean Journal of Applied Statistics / v.30 no.5 / pp.681-690 / 2017
  • Imbalanced binary classification problems arise in many settings, such as fraud detection in banking operations, spam mail detection, and defective product prediction. Several sampling methods, such as over-sampling, under-sampling, and SMOTE, have been developed to overcome the poor prediction performance of binary classifiers when one group is dominant. In this study, we investigate the prediction performance of logistic regression, Lasso, random forest, boosting, and support vector machines in combination with these sampling methods for imbalanced binary data. Four real data sets are analyzed to see whether there is a substantial improvement in prediction performance. We also emphasize some precautions to take when the sampling methods are implemented.
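
As a rough illustration of the three sampling ideas named in this abstract, the sketch below implements random under-sampling, random over-sampling, and a simplified SMOTE-style interpolation in plain Python. The function names, the 95/5 class split, and the toy Gaussian data are assumptions made for the example; the paper's own implementation and data sets are not shown in the abstract.

```python
import random

def random_under_sample(majority, minority, rng):
    """Drop majority rows at random until both classes have equal size."""
    return rng.sample(majority, len(minority)), list(minority)

def random_over_sample(majority, minority, rng):
    """Duplicate minority rows at random until both classes have equal size."""
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(majority), list(minority) + extra

def smote_like(minority, n_new, k, rng):
    """Create n_new synthetic rows by interpolating between a minority row
    and one of its k nearest minority neighbours (the core idea of SMOTE)."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: sum((a - b) ** 2 for a, b in zip(row, base)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic

# Toy imbalanced data (made up): 95 majority rows, 5 minority rows.
rng = random.Random(42)
majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(95)]
minority = [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(5)]

under_major, under_minor = random_under_sample(majority, minority, rng)
over_major, over_minor = random_over_sample(majority, minority, rng)
synthetic = smote_like(minority, n_new=90, k=3, rng=rng)
print(len(under_major), len(over_minor), len(minority) + len(synthetic))
```

Over-sampling only repeats existing minority rows, whereas the SMOTE-style step produces new points that lie on line segments between existing minority rows.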

Ensemble Machine Learning Model Based YouTube Spam Comment Detection (앙상블 머신러닝 모델 기반 유튜브 스팸 댓글 탐지)

  • Jeong, Min Chul;Lee, Jihyeon;Oh, Hayoung
    • Journal of the Korea Institute of Information and Communication Engineering / v.24 no.5 / pp.576-583 / 2020
  • This paper proposes a technique for identifying spam comments on YouTube, which have recently seen tremendous growth. Because it is possible to monetize through advertising, spammers promote their own channels or videos on popular videos or leave comments unrelated to the video. YouTube operates its own spam blocking system but still fails to block such comments properly and efficiently. We therefore reviewed related studies on YouTube spam comment screening and conducted classification experiments with six machine learning techniques (decision tree, logistic regression, Bernoulli naive Bayes, random forest, support vector machine with linear kernel, and support vector machine with Gaussian kernel) and an ensemble model combining them, using comment data from popular music videos by Psy, Katy Perry, LMFAO, Eminem, and Shakira.
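
The abstract does not say how the ensemble combines its base models, so the sketch below assumes the simplest option, hard majority voting. The three rule-based functions are hypothetical stand-ins for trained classifiers (they are not the six models from the paper) and exist only so that the voting logic can be run end to end.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the largest number of base classifiers."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical stand-ins for trained models; each maps a comment to a label.
def link_rule(comment):
    return "spam" if "http" in comment.lower() else "ham"

def promo_rule(comment):
    text = comment.lower()
    return "spam" if "subscribe" in text or "check out my" in text else "ham"

def shouting_rule(comment):
    letters = [ch for ch in comment if ch.isalpha()]
    upper = sum(ch.isupper() for ch in letters)
    return "spam" if letters and upper / len(letters) > 0.6 else "ham"

BASE_CLASSIFIERS = [link_rule, promo_rule, shouting_rule]

def ensemble_predict(comment):
    """Hard voting: every base classifier gets one vote."""
    return majority_vote([clf(comment) for clf in BASE_CLASSIFIERS])

print(ensemble_predict("Check out my channel http://example.com"))  # spam
print(ensemble_predict("Great song, love the chorus"))              # ham
```

Using an odd number of base classifiers avoids ties for a two-class problem; with real models, each rule would be replaced by a fitted classifier's prediction function.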

The Effects of Ecological Variables on Volunteering among Older Adults: The Applications of General Ecological Theory of Aging (노인자원봉사활동에 있어서 생태환경 변수의 효과: 노화의 일반생태학 이론을 적용하여)

  • Lee, Hyunkee
    • 한국노년학 / v.32 no.3 / pp.777-800 / 2012
  • This paper estimates the effects of environmental variables on volunteering among older persons and determines the relationships between the independent and dependent variables. It argues conceptually that the integrated theory of resources overemphasizes the roles of human, social, and cultural capital while overlooking the influence of ecological environments in explaining volunteering among older persons. It therefore applies the general ecological theory of aging, together with resource frameworks, to explain volunteering among older people and to estimate the effects of ecological environment variables on senior volunteerism. Using microdata from the 2009 National Social Survey by Statistics Korea, the paper selects 10,268 subjects aged 55 or older who are considered socially retired. Multiple OLS regression and binomial logistic regression are used to estimate the effects of ecological environments and resources on volunteering. The results show that all environmental and resource variables are related to volunteering at the p<.000 level. This means that environmental variables have independent effects on volunteerism, controlling for resource variables. These results suggest that both theories have empirical support in explaining volunteerism in Korea. Theoretical and policy implications for practice and future studies are discussed at the end of the paper.

Scheduling System using CSP for Effective Assignment of Repair Warrant Job (효율적인 A/S작업 배정을 위한 CSP기반의 스케줄링 시스템)

  • 심명수;조근식
    • Proceedings of the Korea Intelligent Information System Society Conference / 2000.11a / pp.247-256 / 2000
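  • Today's companies not only sell products but also invest heavily in after-sales service (A/S) for those products to protect their credibility and image. Providing high-quality after-sales service requires managing a large workforce rationally; resolving requested repair jobs quickly requires assigning them to personnel rationally; and completing the requested jobs on time while minimizing company cost raises the problem of assigning and scheduling jobs among personnel. To address this problem, this paper studies a scheduling system that rationally assigns after-sales service jobs for chemical instruments to personnel. For the scheduling model, we first analyzed the job schedules of "영진과학(주)", a company that sells and maintains HP chemical analysis instruments and systems, to extract the required domains and the constraints arising from its customer service and workforce management strategies. We then applied domain filtering, a technique for solving constraint satisfaction problems (CSP), to the scheduling problem. Domain filtering uses the constraints to filter out the unnecessary parts of each variable's domain, so that the domains of the variables involved in a constraint are reduced. In addition, we compared cost-oriented scheduling with customer-satisfaction-oriented scheduling and used the trade-off between them to find the optimal solution. Experiments showed that jobs could be assigned to personnel more efficiently, that more jobs could be handled within the allotted time, and that the cost of handling jobs was reduced. [...] was verified. [...] is explored in various ways through the relationships among support, confidence, lift, and conviction. The search for interesting parts of a concept hierarchy proposed in this study is expected to be applicable to CRM (Customer Relationship Management) in e-commerce, niche-market marketing, and similar areas. [...] effect was observed. When the sample firms were divided into training and test sets, the artificial neural network using both financial and non-financial indicators showed high prediction accuracy overall. Specifically, in the logistic regression analysis the financial-indicator model achieved 84.45% (training) and 85.10% (test) while the financial/non-financial model achieved 84.45% and 85.08%, almost identical prediction accuracy; in the artificial neural network analysis the financial-indicator model achieved 92.23% and 85.10% while the financial/non-financial model achieved 91.12% and 88.06%, an improved prediction accuracy. [...]ting LMS according to increasing the step-size parameter $\mu$ in the experimentally computed learning curve. We also find that the convergence speed of the proposed algorithm increases by a factor of (B+1), in proportion to B, where B is the number of recycled data buffers, without complexity [...]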
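
Domain filtering, as described in this abstract, removes values from a variable's domain once the constraints rule them out. A minimal sketch follows, assuming an AC-3 style arc-consistency loop (the abstract does not name the exact filtering algorithm). The job names, engineer names, time slots, and the single "no double booking" constraint are all made up for illustration.

```python
from collections import deque

def revise(domains, xi, xj, compatible):
    """Drop values of xi that no remaining value of xj is compatible with."""
    removed = False
    for vi in list(domains[xi]):
        if not any(compatible(vi, vj) for vj in domains[xj]):
            domains[xi].remove(vi)
            removed = True
    return removed

def filter_domains(domains, compatible):
    """Repeat revise() over variable pairs until no domain shrinks further."""
    queue = deque((a, b) for a in domains for b in domains if a != b)
    while queue:
        xi, xj = queue.popleft()
        if revise(domains, xi, xj, compatible):
            # xi shrank, so every other variable must be re-checked against it.
            queue.extend((xk, xi) for xk in domains if xk not in (xi, xj))
    return domains

# Each value is an (engineer, start_hour) slot; names and hours are made up.
domains = {
    "job_A": {("kim", 9)},
    "job_B": {("kim", 9), ("kim", 10)},
    "job_C": {("kim", 10), ("lee", 9)},
}

def no_double_booking(slot_1, slot_2):
    """Two jobs may not use the same engineer at the same hour."""
    return slot_1 != slot_2

filter_domains(domains, no_double_booking)
print(domains)
```

Here job_A can only take ("kim", 9), which removes that slot from job_B; job_B is then forced to ("kim", 10), which in turn removes that slot from job_C. In this small case filtering alone leaves one value per job; in general a search step is still needed after filtering.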

A Study on the Impact of Researcher's Commercialization Support on the Success of Commercialization after Technology Transfer: A Case of ETRI (기술이전 이후 연구자의 사업화 지원이 사업화 성공에 미치는 영향: ETRI의 사례)

  • Lee, Sangmin;Cho, Keuntae
    • Journal of Technology Innovation / v.25 no.2 / pp.35-55 / 2017
  • The objective of this study is to examine the impact of researchers' commercialization support on the success of commercialization after technology transfer from public research institutes. Logistic regression analysis was applied to 204 license agreements from ETRI to analyze the effect of researchers' commercialization support on commercialization success. The results empirically verify that there is a significant impact on the commercialization success of the transferred technology when the researcher participates in the commercialization process and transfers the latent knowledge gained during the invention process to the licensee company. The results suggest effective directions for government and public research institute policy on SME support.
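
To make the analysis in this abstract concrete, the sketch below fits a logistic regression with one binary predictor (researcher support: yes or no) and one binary outcome (commercialization success) by plain gradient ascent, then reads the coefficient as an odds ratio. The counts are invented for the example and are NOT the ETRI data; the single-predictor model, the learning rate, and the iteration count are also assumptions.

```python
import math

def fit_logistic(xs, ys, lr=1.0, n_iter=5000):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by gradient ascent on the mean log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += y - p          # gradient w.r.t. the intercept
            g1 += (y - p) * x    # gradient w.r.t. the slope
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Made-up counts, NOT the ETRI data:
# with researcher support: 30 successes, 10 failures; without: 15 and 25.
xs = [1] * 40 + [0] * 40
ys = [1] * 30 + [0] * 10 + [1] * 15 + [0] * 25

b0, b1 = fit_logistic(xs, ys)
odds_ratio = math.exp(b1)
print(f"coefficient = {b1:.3f}, odds ratio = {odds_ratio:.2f}")
```

With a single binary predictor, the fitted odds ratio should match the cross-product of the 2x2 table, (30 × 25) / (10 × 15) = 5, which gives a quick sanity check on the fit.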

Exploring the Sentiment Analysis of Electric Vehicles Social Media Data by Using Feature Selection Methods (속성선택방법을 이용한 전기자동차 소셜미디어 데이터의 감성분석 연구)

  • Costello, Francis Joseph;Lee, Kun Chang
    • Journal of Digital Convergence / v.18 no.2 / pp.249-259 / 2020
  • This study presents a recently obtained social media data set on electric vehicles (EVs) and performs sentiment analysis (SA) on it to gain insights. Two steps are used to analyze public sentiment on EVs. First, we use an SA tool to extract the sentiment of comments. Next, we label the data with the extracted sentiments and classify them. During classification we encountered the problem of high dimensionality and explored feature selection (FS) models to reduce the dimensionality of the data set. We found that three FS models (Chi Squared, Information Gain, and ReliefF) showed the most promising results when used alongside logistic regression and support vector machine classification algorithms. The contribution of this paper is a real-world example of social media text analytics that can be adopted in many other areas of research and business. Going forward, researchers can use the methodological approach in this paper to refine and improve their own text analytics use cases.
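
Of the three feature selection models mentioned in this abstract, chi-squared scoring is the easiest to show in a few lines. The sketch below scores each term by the chi-squared statistic of its 2x2 presence-by-class table and keeps the top-scoring terms. The four toy "comments", their labels, and the function names are made up; the paper's tooling and data are not described in the abstract.

```python
def chi2_score(docs, labels, term, positive):
    """Chi-squared statistic of the 2x2 table (term present?) x (class positive?)."""
    a = b = c = d = 0
    for words, label in zip(docs, labels):
        present, is_pos = term in words, label == positive
        if present and is_pos:
            a += 1
        elif present:
            b += 1
        elif is_pos:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def select_top_terms(docs, labels, positive, k):
    """Rank the vocabulary by chi-squared score (ties broken alphabetically)."""
    vocabulary = sorted(set().union(*docs))
    ranked = sorted(
        vocabulary, key=lambda t: (-chi2_score(docs, labels, t, positive), t)
    )
    return ranked[:k]

# Toy labelled comments (made up), each reduced to a set of words.
docs = [
    {"love", "my", "ev"},
    {"love", "the", "range"},
    {"hate", "the", "charging"},
    {"hate", "my", "ev"},
]
labels = ["pos", "pos", "neg", "neg"]

top_terms = select_top_terms(docs, labels, positive="pos", k=2)
print(top_terms)  # ['hate', 'love']
```

Words such as "my" and "ev" appear equally often in both classes and score 0, so dropping them shrinks the feature space without losing class information, which is the dimensionality reduction the abstract refers to.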