• Title/Abstract/Keywords: logistic classification

376 search results, processing time 0.029 seconds

Fibromyalgia diagnostic model derived from combination of American College of Rheumatology 1990 and 2011 criteria

  • Ghavidel-Parsa, Banafsheh;Bidari, Ali;Hajiabbasi, Asghar;Shenavar, Irandokht;Ghalehbaghi, Babak;Sanaei, Omid
    • The Korean Journal of Pain
    • /
    • Vol. 32, No. 2
    • /
    • pp.120-128
    • /
    • 2019
  • Background: We aimed to explore the items of the American College of Rheumatology (ACR) 1990 and 2011 fibromyalgia (FM) classification criteria and the components of the Fibromyalgia Impact Questionnaire (FIQ) to identify the features that best discriminate FM. Finally, we developed a combined FM diagnostic (C-FM) model using the key features of FM. Methods: The means and frequencies of tender points (TPs), ACR 2011 components, and FIQ items were calculated in the FM and non-FM (osteoarthritis [OA] and non-OA) patients. Then, two-step multiple logistic regression analysis was performed to order these variables according to their maximal statistical contribution in predicting group membership. Partial correlations assessed their unique contribution, and two-group discriminant analysis provided a classification table. Using receiver operating characteristic analyses, we determined the sensitivity and specificity of the final model. Results: A total of 172 patients with FM, 75 with OA, and 21 with periarthritis or regional pain syndromes were enrolled. Two-step multiple logistic regression analysis identified 8 key features of FM, which accounted for 64.8% of the variance associated with FM group membership: lateral epicondyle TP (36.9% of variance), neck pain (14.5%), fatigue (4.7%), insomnia (3%), upper back pain (2.2%), shoulder pain (1.5%), gluteal TP (1.2%), and FIQ fatigue (0.9%). The C-FM model demonstrated a 91.4% correct classification rate, with 91.9% sensitivity and 91.7% specificity. Conclusions: The C-FM model can accurately detect FM patients among other pain disorders. Re-inclusion of TPs along with retention of the main FM symptoms is a unique feature of this model.
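
The abstract above reports a correct classification rate, sensitivity, and specificity. As a minimal sketch of how these three figures are derived from a two-group classification table, the following uses hypothetical counts (not the study's data); the function name `classification_metrics` is an assumption made for illustration.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp/fn: patients with the condition classified correctly/incorrectly.
    tn/fp: patients without the condition classified correctly/incorrectly.
    """
    sensitivity = tp / (tp + fn)                 # true positive rate
    specificity = tn / (tn + fp)                 # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # correct classification rate
    return sensitivity, specificity, accuracy


# Hypothetical counts for illustration only; these are NOT taken from the paper.
sens, spec, acc = classification_metrics(tp=90, fn=10, tn=45, fp=5)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

Accuracy is a weighted average of sensitivity and specificity, with weights given by the share of each group in the sample.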

Self Introduction Essay Classification Using Doc2Vec for Efficient Job Matching

  • 김영수;문현실;김재경
    • 한국IT서비스학회지
    • /
    • Vol. 19, No. 1
    • /
    • pp.103-112
    • /
    • 2020
  • Job seekers make various efforts to find a good company, and companies attempt to recruit good people. Job searching through self-introduction essays is nowadays one of the most active processes. Companies spend time and money reviewing the numerous self-introduction essays of job seekers. Job seekers also worry about whether their self-introduction essays will be accepted by companies. This research builds a classification model and conducts experiments to classify self-introduction essays as pass or fail using deep learning and decision tree techniques. Real-world data were classified using stratified sampling to alleviate the data imbalance between passed and failed self-introduction essays. Documents were embedded using the Doc2Vec method, developed from the existing Word2Vec, and were classified using logistic regression analysis. The decision tree model was chosen as a benchmark model, and K-fold cross-validation was conducted for performance evaluation. Across several experiments, the area under the curve (AUC) of PV-DM was better than that of the other Doc2Vec models, i.e., PV-DBOW and Concatenate. Furthermore, PV-DM classifies passed essays as well as failed essays, while PV-DBOW cannot classify passed essays even though it classifies failed essays well. In addition, the classification performance of the logistic regression model with PV-DM embeddings is better than that of the decision tree-based classification model. The implication of the experimental results is that companies can reduce the cost of recruiting good job seekers. In addition, our suggested model can help job candidates pre-evaluate their self-introduction essays.
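
The abstract above compares models by area under the curve (AUC). As a rough sketch of what that number measures, the code below computes AUC as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one. The labels and scores are made up for illustration, and `auc_score` is a hypothetical helper, not part of the paper.

```python
def auc_score(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    Counts the fraction of (positive, negative) pairs in which the positive
    example receives the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = pass, 0 = fail) and classifier scores, for illustration.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.7, 0.6, 0.2, 0.4, 0.5]
auc = auc_score(labels, scores)
print(f"AUC = {auc:.4f}")
```

The pairwise form is quadratic in the number of examples, which is fine for a sketch; rank-based formulas give the same value more efficiently on large data sets.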

An Analysis of Nursing Needs for Hospitalized Cancer Patients; Using Data Mining Techniques

  • 박선아
    • 종양간호연구
    • /
    • Vol. 5, No. 1
    • /
    • pp.3-10
    • /
    • 2005
  • Background: Nurses now account for one third of all hospital human resources. Therefore, efficient management of nursing manpower is becoming more important. While it is very clear that nursing workload requirement analysis and patient severity classification should be done first for the efficient allocation of the nursing workforce, these processes have been conducted manually with ad hoc rules. Purposes: This study aimed to build a prediction model for patient classification according to nursing need. We tried to find an easier and faster method of classifying nursing patients that can help efficient management of nursing manpower. Methods: The nursing patient classification data of hospitalized cancer patients in one of the biggest cancer centers in Korea from 2003.1.1 to 2003.12.31 were assessed by trained nurses. This study developed a prediction model and analyzed nursing needs using data mining techniques. Patients were classified by three different data mining techniques (logistic regression, decision tree, and neural network), and the results were assessed. Results: The data set was created using 165,073 records of 2,228 patients in the patient classification database. The main explanatory variables for the three data mining techniques were as follows. 1) Logistic regression: age, month, and section. 2) Decision tree: section, month, age, and tumor. 3) Neural network: section, diagnosis, age, sex, metastasis, hospital days, and month. Among these three techniques, the neural network showed the best prediction power in ROC curve verification. The patient classification prediction model developed with the neural network based on nursing needs had a prediction accuracy of 84.06%. Conclusion: The patient classification prediction model was developed and tested in this study using real patient data. The result can be employed for more accurate calculation of required nursing staff and effective use of the labor force.

Fatigue Classification Model Based On Machine Learning Using Speech Signals

  • 이수화;권철홍
    • 문화기술의 융합
    • /
    • Vol. 8, No. 6
    • /
    • pp.741-747
    • /
    • 2022
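  • Fatigue degrades an individual's abilities and makes it difficult to carry out work, and as fatigue accumulates, concentration declines and the likelihood of safety accidents increases. Awareness of fatigue is subjective, but in real workplaces the level of fatigue needs to be measured quantitatively. Previous studies proposed measuring fatigue level by expert judgment, supplementing subjective assessments such as the Multidimensional Fatigue Scale with objective indicators such as biosignal analysis, but such methods make it difficult to evaluate fatigue in real time in daily life. This paper studies a fatigue classification model that determines a worker's fatigue level in real time using speech data recorded in the field. Machine learning models such as logistic classification, support vector machine, and random forest are trained on speech data collected in the field. In the performance evaluation, accuracy ranged from 0.677 to 0.758, which the authors describe as good performance, and logistic classification performed best among the models. The experimental results show that it is possible to classify fatigue using speech signals.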

A GA-based Binary Classification Method for Bankruptcy Prediction

  • 민재형;정철우
    • 한국경영과학회지
    • /
    • Vol. 33, No. 2
    • /
    • pp.1-16
    • /
    • 2008
  • The purpose of this paper is to propose a new binary classification method for predicting corporate failure based on a genetic algorithm, and to validate its prediction power through empirical analysis. Establishing virtual companies representing bankrupt and non-bankrupt companies respectively, the proposed method measures the similarity between the virtual companies and the subject for prediction, and classifies the subject as either bankrupt or non-bankrupt. The values of the classification variables of the virtual companies and the weights of the variables are determined by the proper model to maximize the hit ratio on the training data set using a genetic algorithm. In order to test the validity of the proposed method, we compare its prediction accuracy with that of other existing methods such as multi-discriminant analysis, logistic regression, decision tree, and artificial neural network, and it is shown that the binary classification method we propose in this paper can serve as a promising alternative to the existing methods for bankruptcy prediction.

Classification of COVID-19 Disease: A Machine Learning Perspective

  • Kinza Sardar
    • International Journal of Computer Science & Network Security
    • /
    • Vol. 24, No. 3
    • /
    • pp.107-112
    • /
    • 2024
  • The deadly virus known as COVID-19 has spread all over the world since it emerged in Wuhan, China, in 2019. COVID-19 has affected millions of people in a very short time. COVID-19 has so many symptoms that identifying an infected person is a difficult task. Moreover, it is challenging to determine whether an individual will test positive or negative for COVID-19. We are developing a framework that uses machine learning techniques. The proposed method uses the DecisionTree, KNearestNeighbors, GaussianNB, LogisticRegression, BernoulliNB, and RandomForest machine learning methods as classifiers for COVID-19 diagnosis; 5-fold and 10-fold cross-validation were applied throughout the classification process. The experimental results showed that the best accuracy was obtained with the Decision Tree classifier. Data preprocessing techniques were applied to improve classification performance. Recall, accuracy, precision, and F-score metrics were used to evaluate classification performance. In future work, we will try to improve on the 93 percent accuracy achieved so far by applying different techniques.

Value Weighted Regularized Logistic Regression Model

  • 이창환;정미나
    • 정보과학회 논문지
    • /
    • Vol. 43, No. 11
    • /
    • pp.1270-1274
    • /
    • 2016
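  • Logistic regression has long been used in statistics and other fields as a technique for prediction or for explaining the correlation between variables. In current logistic regression methods, each attribute has the same importance with respect to the target value. This study refines this weight calculation and presents a new learning method in which each value of an attribute has a different importance. Gradient descent was used to compute the attribute-value weights that maximize the algorithm's performance. The proposed method was tested on a variety of data sets, and the value-weighted logistic regression method showed better learning performance than conventional logistic regression.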
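
The abstract above fits its weights by gradient descent. The sketch below shows gradient descent for ordinary L2-regularized logistic regression only; it does not implement the paper's value-weighting scheme, whose details are not given here. The function names, hyperparameters, and toy data are all assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, l2=0.01, epochs=5000):
    """Batch gradient descent on mean log-loss plus (l2/2) * ||w||^2.

    The intercept b is not regularized. Returns (w, b).
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this example: p(y=1|x) - y.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(d):
            w[j] -= lr * (grad_w[j] / n + l2 * w[j])
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b):
    """Classify as 1 when the predicted probability is at least 0.5."""
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
            for xi in X]


# Toy one-feature data set, made up for illustration.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
w, b = fit_logistic(X, y)
print(predict(X, w, b))
```

The L2 penalty keeps the weights finite even though this toy data is linearly separable; without it, the weight would keep growing as training continued.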

Analysis on the Survivor's Pension Payment with Logistic Regression Model

  • 김미정;김진형
    • 응용통계연구
    • /
    • Vol. 21, No. 2
    • /
    • pp.183-200
    • /
    • 2008
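  • For the efficient operation of the National Pension, research is needed on pension management that prepares for social phenomena such as population aging and low birth rates. This study carried out a two-stage logistic analysis to propose a statistical model for predicting the occurrence of survivor's pensions and for classifying subjects according to the likelihood of occurrence. From the first-stage analysis, the characteristics of the main factors affecting the occurrence of survivor's pensions and the types of National Pension were identified for all subjects; a logistic regression model for the occurrence of survivor's pensions was then applied to these, a model for grading subjects rationally was proposed, and it was compared with a general logistic model. By comparing accuracy, sensitivity, specificity, and the distribution of posterior probabilities, evaluating the validity of the grades with the K-S statistic, and evaluating the model's predictive power with lift charts, the study showed that the statistical model allows subjects to be managed through rational grade classification. By applying the fitted statistical model, an efficient pension management plan can be proposed through prediction of whether a survivor's pension is received, classification by grade, and prediction of the survivor's pension amount by grade.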
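
The abstract above uses the K-S statistic to check how well predicted scores separate the two outcome groups. A minimal sketch of the two-sample Kolmogorov-Smirnov statistic follows, assuming binary labels and model scores; the data and the `ks_statistic` helper are hypothetical and not taken from the paper.

```python
def ks_statistic(labels, scores):
    """Maximum gap between the empirical score CDFs of the two classes."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example from each class")
    best = 0.0
    # The gap can only change at observed score values, so check each one.
    for t in sorted(set(scores)):
        cdf_pos = sum(1 for s in pos if s <= t) / len(pos)
        cdf_neg = sum(1 for s in neg if s <= t) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


# Made-up labels (1 = event occurred) and predicted probabilities.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
ks = ks_statistic(labels, scores)
print(f"K-S = {ks:.4f}")
```

A value near 1 means the score distributions of the two groups barely overlap, and a value near 0 means the scores do not distinguish them.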

Logistic Regression Ensemble Method for Extracting Significant Information from Social Texts

  • 김소현;김한준
    • 정보처리학회논문지:소프트웨어 및 데이터공학
    • /
    • Vol. 6, No. 5
    • /
    • pp.279-284
    • /
    • 2017
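  • In the era of big data, as text mining and opinion mining are increasingly used, extracting useful information from social network services is a very important research topic. This paper therefore proposes a logistic regression ensemble method for finding the main body text in blog HTML documents. First, structural features and text features are extracted from blog HTML tags. Next, logistic regression and ensemble techniques are applied to the tag features extracted from the blog HTML documents to build a model that classifies the tags containing the body text. One important finding of this study is that the main body text can be found using the tag depth feature. In experiments with Korean blog data on various topics, tag classification accuracy was 99% and the proportion of documents in which the body text was found was 80.5%.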

A Comparative Study of Predictive Factors for Passing the National Physical Therapy Examination using Logistic Regression Analysis and Decision Tree Analysis

  • Kim, So Hyun;Cho, Sung Hyoun
    • Physical Therapy Rehabilitation Science
    • /
    • Vol. 11, No. 3
    • /
    • pp.285-295
    • /
    • 2022
  • Objective: The purpose of this study is to use logistic regression and decision tree analysis to identify the factors that affect success or failure in the national physical therapy examination, and to build and compare predictive models. Design: Secondary data analysis study. Methods: We analyzed 76,727 subjects from the physical therapy national examination data provided by the Korea Health Personnel Licensing Examination Institute. The target variable was pass or fail, and the input variables were gender, age, graduation status, and examination area. Frequency analysis, chi-square test, binary logistic regression, and decision tree analysis were performed on the data. Results: In the logistic regression analysis, subjects in their 20s (odds ratio, OR=1, reference), expected to graduate (OR=13.616, p<0.001), and from the examination area of Jeju-do (OR=3.135, p<0.001) had a high probability of passing. In the decision tree, the predictive factors with the greatest influence on passing were, in order, graduation status (χ²=12366.843, p<0.001) and examination area (χ²=312.446, p<0.001). Logistic regression analysis showed a specificity of 39.6% and a sensitivity of 95.5%, while decision tree analysis showed a specificity of 45.8% and a sensitivity of 94.7%. In classification accuracy, logistic regression and decision tree analysis showed 87.6% and 88.0% prediction, respectively. Conclusions: Both logistic regression and decision tree analysis were adequate to explain the predictive model. Additionally, whether actual test takers passed the national physical therapy examination could be determined by applying the constructed prediction model and prediction rate.
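
The abstract above reports odds ratios (OR) relative to a reference group. As a small illustration of what an OR expresses, the sketch below computes an unadjusted OR from a 2x2 table of hypothetical pass/fail counts and shows its relation to a logistic regression coefficient (OR = exp(beta)). The paper's ORs come from a fitted multivariable model, so this is an analogy rather than a reproduction; the `odds_ratio` helper and its counts are assumptions.

```python
import math


def odds_ratio(pass_group, fail_group, pass_ref, fail_ref):
    """Unadjusted odds ratio of passing: group odds divided by reference odds."""
    return (pass_group / fail_group) / (pass_ref / fail_ref)


# Hypothetical counts for illustration only; NOT taken from the paper.
or_value = odds_ratio(pass_group=90, fail_group=10, pass_ref=60, fail_ref=40)

# In a logistic regression with a single binary predictor, the fitted
# coefficient equals the log of this odds ratio, so exp(beta) recovers it.
beta = math.log(or_value)
print(f"OR = {or_value:.3f}, beta = {beta:.3f}")
```

An OR of 1 marks the reference level, which is why the abstract lists the reference age group with OR=1.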