• Title/Abstract/Keywords: Decision-making Tree

196 search results (processing time: 0.026 seconds)

Evaluation of Classification Algorithm Performance of Sentiment Analysis Using Entropy Score

  • 박만희
    • 한국정보통신학회논문지
    • /
    • Vol. 22, No. 9
    • /
    • pp.1153-1158
    • /
    • 2018
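  • Online customer reviews and social media information influence customers' decision-making, which makes them a very important source of information for companies. Identifying customers' diverse needs and complaints through surveys incurs considerable cost and time constraints. Customer review data from online shopping malls provide ideal material for analyzing customers' sentiment toward products. In this study, customer review data were collected from the Amazon shopping mall for sentiment analysis of Samsung and Apple smartphones. Five classification algorithms used as representative sentiment analysis techniques in previous studies were applied: support vector machines, bagging, random forest, classification or regression tree, and maximum entropy. This study proposes an entropy score that can comprehensively evaluate the performance of classification algorithms. According to the evaluation of the five algorithms using the entropy score, the support vector machines algorithm had the highest entropy score.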

Union and Division using Technique in Fingerprint Recognition Identification System

  • Park, Byung-Jun;Park, Jong-Min;Lee, Jung-Oh
    • Journal of information and communication convergence engineering
    • /
    • Vol. 5, No. 2
    • /
    • pp.140-143
    • /
    • 2007
  • A fingerprint recognition system consists of off-line and on-line processing. Off-line processing registers the features retrieved from the digitalized fingerprint, which is obtained from the analog fingerprint through a fingerprint acquisition device. On-line processing decides whether a user is approved to access the system by matching the features retrieved from the input fingerprint against those in the database when the user attempts to use the system. In matching between on-line and off-line processing, the most important question is which features to use as the standard. Until now, "Delta" and "Core" have been used as this standard, but these features do not exist in every person's fingerprint. To handle users who lack those features, a matching method is still used that builds a spanning tree or a triangulation from the relations among the extracted features. However, these methods incur time overhead, and it is not certain that they produce a correct match. This paper introduces a new data structure, called Union and Division, for representing binary fingerprint images. A minutiae detection procedure using Union and Division takes, on average, 32% of the time taken by a minutiae detection procedure without it.

Predicting Factors on Performance Confidence of Cardiopulmonary Resuscitation in Community Members

  • 이수진
    • 디지털콘텐츠학회 논문지
    • /
    • Vol. 19, No. 9
    • /
    • pp.1699-1705
    • /
    • 2018
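  • This study is a descriptive survey study that performed a secondary analysis of Community Health Survey data to identify the characteristics of confidence in cardiopulmonary resuscitation (CPR) performance among community residents in Korea. The subjects were 357,176 respondents who were aware of CPR among all participants in the 2014 and 2016 Community Health Survey, and the collected data were analyzed with complex-sample frequency analysis and decision tree analysis using SPSS WIN 25.0. The results showed that confidence in CPR performance was higher among those who had received CPR training, had practiced on a manikin within the last two years, had received CPR training within the last two years, were male, and were aged 41.5 years or younger.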

A Method of Predicting Service Time Based on Voice of Customer Data

  • 김정훈;권오병
    • 한국IT서비스학회지
    • /
    • Vol. 15, No. 1
    • /
    • pp.197-210
    • /
    • 2016
  • With the advent of text analytics, VOC (Voice of Customer) data have become an important resource that provides managers and marketing practitioners with consumers' veiled opinions and requirements. In other words, making relevant use of VOC data can improve customer responsiveness and satisfaction, each of which eventually improves business performance. However, unstructured data such as customer complaints in VOC data have seldom been used in marketing practices such as predicting service time as an index of service quality, because VOC data containing unstructured data are too complicated in form and converting unstructured data into structured data is a difficult process. Hence, this study aims to propose a prediction model that improves the estimation accuracy of the level of customer satisfaction by combining unstructured features from text mining with structured data features in VOC. The relationship between the unstructured and structured data and service processing time is also examined through regression analysis. Text mining techniques, sentiment analysis, keyword extraction, classification algorithms, decision tree, and multiple regression are considered and compared. For the experiment, we used actual VOC data from a company.

A study on integration of semantic topic based Knowledge model

  • 전승수;이상진;배상태
    • 한국정보과학회:학술대회논문집
    • /
    • 한국정보과학회 2012년도 한국컴퓨터종합학술대회논문집 Vol.39 No.1(B)
    • /
    • pp.181-183
    • /
    • 2012
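  • Recently, methods for efficiently generating and analyzing semantic knowledge models using natural language and formal language processing, artificial intelligence algorithms, and similar techniques have been proposed. Such semantic knowledge models are used for efficient decision-making trees and for systematic analysis of problem-solving paths in specific situations. In particular, for the analysis of various complex systems and social networks, they serve as the basis for static indicator generation and regression analysis, trend analysis through behavioral models, and simulation models that support macro-level forecasting. This study presents a method for integrating topic models derived through text mining, together with a formal algorithm, for the integration of such semantic knowledge models. To this end, it first describes how to convert keyword maps derived through text mining into equivalent knowledge maps and integrate them into a semantic knowledge model. It also proposes a method for projecting meaningful topic maps from keyword maps and an algorithm for deriving semantically equivalent models. The integrated semantic knowledge model enables relational semantic analysis, such as structural rules among topics and degree, closeness, and betweenness centrality, and can serve as practical foundational research for the semantic analysis and use of large-scale unstructured documents.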

Study of Machine-Learning Classifier and Feature Set Selection for Intent Classification of Korean Tweets about Food Safety

  • Yeom, Ha-Neul;Hwang, Myunggwon;Hwang, Mi-Nyeong;Jung, Hanmin
    • Journal of Information Science Theory and Practice
    • /
    • Vol. 2, No. 3
    • /
    • pp.29-39
    • /
    • 2014
  • In recent years, several studies have proposed making use of the Twitter micro-blogging service to track various trends in online media and discussion. In this study, we specifically examine the use of Twitter to track discussions of food safety in the Korean language. Given the irregularity of keyword use in most tweets, we focus on optimistic machine-learning and feature set selection to classify collected tweets. We build the classifier model using Naive Bayes & Naive Bayes Multinomial, Support Vector Machine, and Decision Tree Algorithms, all of which show good performance. To select an optimum feature set, we construct a basic feature set as a standard for performance comparison, so that further test feature sets can be evaluated. Experiments show that precision and F-measure performance are best when using a Naive Bayes Multinomial classifier model with a test feature set defined by extracting Substantive, Predicate, Modifier, and Interjection parts of speech.

The Effect of Input Variables Clustering on the Characteristics of Ensemble Machine Learning Model for Water Quality Prediction

  • 박정수
    • 한국물환경학회지
    • /
    • Vol. 37, No. 5
    • /
    • pp.335-343
    • /
    • 2021
  • Water quality prediction is essential for the proper management of water supply systems. Increased suspended sediment concentration (SSC) has various effects on water supply systems, such as increased treatment cost, and consequently there have been various efforts to develop a model for predicting SSC. However, SSC is affected by both the natural and anthropogenic environment, making it challenging to predict. Recently, advanced machine learning models have increasingly been used for water quality prediction. This study developed an ensemble machine learning model to predict SSC using the XGBoost (XGB) algorithm. The observed discharge (Q) and SSC at two field monitoring stations were used to develop the model. The input variables were clustered into two groups with low and high ranges of Q using the k-means clustering algorithm. Each group of data was then used separately to optimize XGB (Model 1). The model performance was compared with that of the XGB model using the entire data (Model 2). The models were evaluated by the mean squared error-observation standard deviation ratio (RSR) and root mean squared error. The RSR values were 0.51 and 0.57 at the two monitoring stations for Model 2, respectively, while the model performance improved to RSR 0.46 and 0.55, respectively, for Model 1.
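
The RSR metric mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the commonly used definition of RSR as the root mean squared error divided by the standard deviation of the observations, and the function name `rsr` is hypothetical.

```python
import math


def rsr(observed, simulated):
    """RSR under the assumed definition: RMSE / standard deviation of observations.

    The 1/n factors cancel, so the ratio of the two square-rooted sums is used.
    Lower is better; 0 means a perfect fit, and 1 means the model is no better
    than always predicting the mean of the observations.
    """
    if len(observed) != len(simulated) or len(observed) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    squared_deviation = sum((o - mean_obs) ** 2 for o in observed)
    if squared_deviation == 0:
        raise ValueError("observations have zero variance")
    return math.sqrt(squared_error) / math.sqrt(squared_deviation)
```

Under this definition, a drop from 0.51 to 0.46 means the prediction error shrank relative to the natural spread of the observed SSC values.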

Estimation of various amounts of kaolinite on concrete alkali-silica reactions using different machine learning methods

  • Aflatoonian, Moein;Mirhosseini, Ramin Tabatabaei
    • Structural Engineering and Mechanics
    • /
    • Vol. 83, No. 1
    • /
    • pp.79-92
    • /
    • 2022
  • In this paper, the impact of a vernacular pozzolanic kaolinite mine on concrete alkali-silica reaction and strength has been evaluated. To make the samples, kaolinite powder at various levels was used in the quality specification test of aggregates based on the ASTM C1260 standard in order to investigate the effect of kaolinite particles on reducing the reaction of the mortar bars. Compressive strength, X-Ray Diffraction (XRD), and Scanning Electron Microscope (SEM) experiments were performed on concrete specimens. The results show that adding kaolinite powder to concrete causes a pozzolanic reaction and decreases the permeability of the concrete samples compared to the reference concrete specimen. Further, various machine learning methods were used to predict ASR-induced expansion for different amounts of kaolinite. In the modeling process, the optimal method is taken to be the one with the lowest mean square error (MSE) and, at the same time, the highest correlation coefficient (R). Therefore, to evaluate the efficiency of the proposed model, the results of the support vector machine (SVM) method were compared with the decision tree method, regression analysis, and a neural network algorithm. The comparison of forecasting tools showed that support vector machines outperformed the other methods. Therefore, the support vector machine method can be regarded as an effective approach to predicting ASR-induced expansion.
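
The selection criterion described in this abstract (lowest MSE together with highest correlation coefficient R) can be sketched as below. This is an illustrative sketch only: it assumes R is the Pearson correlation between measured and predicted values, the helper names are hypothetical, and the numbers in the example are made up for demonstration and are not results from the paper.

```python
import math


def mse(y_true, y_pred):
    """Mean square error between measured and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient (assumed meaning of R here)."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_t * var_p)


def rank_models(y_true, predictions):
    """Sort (name, MSE, R) rows by lowest MSE first, then highest R."""
    rows = [(name, mse(y_true, p), pearson_r(y_true, p))
            for name, p in predictions.items()]
    return sorted(rows, key=lambda row: (row[1], -row[2]))


# Made-up values purely for illustration (not data from the paper).
measured = [0.10, 0.20, 0.30, 0.40]
candidates = {
    "model_a": [0.11, 0.19, 0.31, 0.41],
    "model_b": [0.20, 0.20, 0.20, 0.50],
}
ranking = rank_models(measured, candidates)
```

Here `ranking[0]` is the row for the preferred candidate. When the two criteria disagree, this sketch lets MSE decide and uses R only as a tie-breaker, which is one possible reading of the abstract.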

Development of Prediction Model of Chloride Diffusion Coefficient using Machine Learning

  • 김현수
    • 한국공간구조학회논문집
    • /
    • Vol. 23, No. 3
    • /
    • pp.87-94
    • /
    • 2023
  • Chloride is one of the most common threats to reinforced concrete (RC) durability. The alkaline environment of concrete forms a passive layer on the surface of reinforcement bars that protects the bars from corrosion. However, when the chloride concentration at the reinforcement bar reaches a certain level, the passive protection layer deteriorates, causing corrosion and ultimately reducing the structure's safety and durability. Therefore, understanding and predicting chloride diffusion is important for evaluating the safety and durability of RC structures. In this study, the chloride diffusion coefficient is predicted by machine learning techniques. Various machine learning techniques, such as multiple linear regression, decision tree, random forest, support vector machine, artificial neural networks, extreme gradient boosting, and k-nearest neighbor, were used, and the accuracy of these models was compared. To evaluate the accuracy, root mean square error (RMSE), mean square error (MSE), mean absolute error (MAE), and coefficient of determination (R2) were used as prediction performance indices. The k-fold cross-validation procedure was used to estimate the performance of the machine learning models when making predictions on data not used during training. Grid search was applied to hyperparameter optimization. Numerical simulation showed that ensemble learning methods such as random forest and extreme gradient boosting successfully predicted the chloride diffusion coefficient, and that artificial neural networks also provided accurate results.
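
The combination of k-fold cross-validation and grid search named in this abstract can be sketched in plain Python. This is a toy sketch under stated assumptions, not the paper's setup: it uses a one-dimensional k-nearest-neighbor regressor as the model, contiguous unshuffled folds, and pooled RMSE as the score; all function names are hypothetical.

```python
import itertools
import math


def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, test_idx) for contiguous, unshuffled folds."""
    sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
             for i in range(n_folds)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def knn_predict(train_x, train_y, x, n_neighbors):
    """Predict the mean target of the n_neighbors closest 1-D training points."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: abs(train_x[i] - x))[:n_neighbors]
    return sum(train_y[i] for i in nearest) / len(nearest)


def cv_rmse(xs, ys, n_neighbors, n_folds):
    """RMSE pooled over all held-out predictions (each sample is tested once)."""
    squared_error = 0.0
    for train_idx, test_idx in k_fold_indices(len(xs), n_folds):
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        for i in test_idx:
            pred = knn_predict(train_x, train_y, xs[i], n_neighbors)
            squared_error += (pred - ys[i]) ** 2
    return math.sqrt(squared_error / len(xs))


def grid_search(xs, ys, param_grid, n_folds):
    """Try every combination in param_grid; return (best_params, best_rmse)."""
    names = sorted(param_grid)
    best = None
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = cv_rmse(xs, ys, n_folds=n_folds, **params)
        if best is None or score < best[1]:
            best = (params, score)
    return best
```

A call such as `grid_search(xs, ys, {"n_neighbors": [1, 3, 5]}, n_folds=5)` returns the setting with the lowest cross-validated RMSE. In practice the folds would usually be shuffled first, since contiguous folds on ordered data force the model to extrapolate.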

Ensemble Deep Learning Model using Random Forest for Patient Shock Detection

  • Minsu Jeong;Namhwa Lee;Byuk Sung Ko;Inwhee Joe
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 17, No. 4
    • /
    • pp.1080-1099
    • /
    • 2023
  • Digital healthcare combined with telemedicine services in the form of convergence with digital technology and AI is developing rapidly. Digital healthcare research is being conducted on many conditions, including shock. However, the causes of shock are diverse, and the treatment is very complicated, requiring a high level of medical knowledge. In this paper, we propose a shock detection method based on the correlation between shock and data extracted from hemodynamic monitoring equipment. From the various parameters reported by this equipment, four parameters closely related to patient shock were used as the input data for a machine learning model in order to detect shock. Using the four parameters as input data, that is, feature values, a random forest-based ensemble machine learning model was constructed. The value of the mean arterial pressure was used as the correct answer value, the so-called label value, to detect the patient's shock state. The performance was then compared with the decision tree and logistic regression models using a confusion matrix. The average accuracy of the random forest model was 92.80%, which shows superior performance compared to the other models. We look forward to our work playing a role in helping medical staff by making recommendations for the diagnosis and treatment of complex and difficult cases of shock.
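
The confusion-matrix comparison mentioned in this abstract rests on a simple tally, sketched below. This is a minimal illustration assuming binary labels (1 for shock, 0 for no shock); the function names are hypothetical, and how the paper derives labels from mean arterial pressure is not specified here.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, with 1 as the positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1          # shock correctly detected
        elif truth == 0 and pred == 1:
            fp += 1          # false alarm
        elif truth == 1 and pred == 0:
            fn += 1          # missed shock
        else:
            tn += 1          # correctly reported as no shock
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Share of all predictions that match the label."""
    return (tp + tn) / (tp + fp + fn + tn)
```

Accuracy alone can hide an imbalance between missed shocks and false alarms, which is why the four cells are worth inspecting separately when comparing models.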