• Title/Summary/Keyword: Decision Tree Technique


An Analytical Approach Using Topic Mining for Improving the Service Quality of Hotels (호텔 산업의 서비스 품질 향상을 위한 토픽 마이닝 기반 분석 방법)

  • Moon, Hyun Sil;Sung, David;Kim, Jae Kyeong
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.21-41 / 2019
  • Thanks to the rapid development of information technologies, the data available on the Internet have grown rapidly. In this era of big data, many studies have attempted to draw insights from data analysis and demonstrate its effects. In the tourism and hospitality industry, many firms and researchers have paid attention to online reviews on social media because of their large influence on customers. As tourism is an information-intensive industry, the effect of these information networks on social media platforms is more pronounced than in other types of media. However, there are limitations to the service quality improvements that can be made based on opinions on social media platforms. Users on these platforms express their opinions as text, images, and so on, so the raw review data are unstructured. Moreover, these data sets are too large for humans to extract new information and hidden knowledge from them unaided. To use them for business intelligence and analytics applications, appropriate big data techniques such as natural language processing and data mining are needed. This study suggests an analytical approach that directly yields insights from these reviews to improve the service quality of hotels. Our proposed approach consists of topic mining to extract the topics contained in the reviews and decision tree modeling to explain the relationship between topics and ratings. Topic mining refers to a method for finding groups of words that represent the documents in a collection. Among several topic mining methods, we adopted the Latent Dirichlet Allocation (LDA) algorithm, which is considered the most universal algorithm. However, LDA alone is not enough to find insights that can improve service quality because it cannot find the relationship between topics and ratings. To overcome this limitation, we also use the Classification and Regression Tree (CART) method, which is a kind of decision tree technique. 
Through the CART method, we can find which topics are related to positive or negative ratings of a hotel and visualize the results. Therefore, this study aims to present an analytical approach for improving hotel service quality from unstructured review data sets. Through experiments on four hotels in Hong Kong, we find the strengths and weaknesses of each hotel's services and suggest improvements to increase customer satisfaction. From positive reviews in particular, we find what these hotels should maintain. For example, compared with the other hotels, one hotel has a good location and room condition, as extracted from its positive reviews. In contrast, from negative reviews we find what hotels should change in their services. For example, one hotel should improve the soundproofing of its rooms. These results show that our approach is useful for finding insights into hotel service quality. That is, from an enormous volume of review data, our approach can provide practical suggestions for hotel managers to improve their service quality. In the past, studies on improving service quality relied on customer surveys or interviews. However, these methods are often costly and time consuming, and the results may be distorted by biased sampling or untrustworthy answers. The proposed approach obtains honest feedback directly from customers' online reviews and draws insights through a type of big data analysis, so it can be a useful tool for overcoming the limitations of surveys or interviews. Moreover, our approach can easily obtain service quality information for other hotels or services in the tourism industry because it needs only open online reviews and ratings as input data. Furthermore, the performance of our approach is expected to improve if other structured and unstructured data sources are added.
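
As a rough illustration of how a CART-style tree can link topics to ratings, the sketch below scores candidate binary splits by weighted Gini impurity, the usual CART criterion for classification. The topic names, the per-review topic proportions, and the `best_split` helper are hypothetical and not taken from the paper; a real pipeline would take the proportions from a fitted LDA model.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the best binary split."""
    best = (None, None, gini(labels))
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted({row[f] for row in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (f, t, score)
    return best

# Hypothetical topic proportions per review: [location, room_noise]
reviews = [[0.8, 0.1], [0.7, 0.2], [0.2, 0.9], [0.1, 0.7], [0.6, 0.1], [0.3, 0.8]]
ratings = ["pos", "pos", "neg", "neg", "pos", "neg"]
feature, threshold, impurity = best_split(reviews, ratings)
```

In this toy data the first topic separates positive from negative reviews perfectly, so the chosen split has zero impurity. A full CART implementation would apply the same search recursively and then prune the tree.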

A Comparison of the Land Cover Data Sets over Asian Region: USGS, IGBP, and UMd (아시아 지역 지면피복자료 비교 연구: USGS, IGBP, 그리고 UMd)

  • Kang, Jeon-Ho;Suh, Myoung-Seok;Kwak, Chong-Heum
    • Atmosphere / v.17 no.2 / pp.159-169 / 2007
  • Three land cover data sets (United States Geological Survey: USGS, International Geosphere Biosphere Programme: IGBP, and University of Maryland: UMd), derived from 1992-1993 Advanced Very High Resolution Radiometer (AVHRR) data, were compared over the Asian continent. Preprocessing steps, such as unifying the map projection and the land cover definitions, were applied before the comparison. Overall, the agreement among the three land cover data sets was relatively high (>45%) for land covers with a distinct phenology, such as urban, open shrubland, mixed forest, and bare ground. The ratios of triple agreement (TA), couple agreement (CA) and total disagreement (TD) among the three land cover data sets are 30.99%, 57.89% and 8.91%, respectively. The agreement ratio between USGS and IGBP (about 80%) is much greater than that between USGS and UMd or between IGBP and UMd (about 32%). The main reasons for the relatively low agreement among the three land cover data sets are differences in 1) the number of land cover categories, 2) the basic input data sets used for the classification, 3) the classification (or clustering) methodologies, and 4) the level of preprocessing. The numbers of categories for USGS, IGBP and UMd are 24, 17 and 14, respectively. USGS and IGBP used only the 12 monthly normalized difference vegetation index (NDVI) values, whereas UMd used the 12 monthly NDVI values and 29 other auxiliary data derived from the five AVHRR channels. USGS and IGBP used an unsupervised clustering method, whereas UMd used a supervised technique, a decision tree trained on ground truth data derived from high resolution Landsat data. The insufficient preprocessing in USGS and IGBP compared to UMd resulted in spatial discontinuity and misclassification.
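
The TA, CA and TD ratios can be computed pixel by pixel once the three maps share a projection and a common legend. The following is a minimal sketch under that assumption; the function name and the four-pixel example labels are hypothetical and not taken from the paper.

```python
def agreement_ratios(a, b, c):
    """Return (TA, CA, TD) percentages for three equally sized label sequences."""
    if not a or not (len(a) == len(b) == len(c)):
        raise ValueError("inputs must be non-empty and of equal length")
    triple = couple = disagree = 0
    for x, y, z in zip(a, b, c):
        distinct = len({x, y, z})
        if distinct == 1:
            triple += 1      # all three data sets agree
        elif distinct == 2:
            couple += 1      # exactly two data sets agree
        else:
            disagree += 1    # all three differ
    n = len(a)
    return tuple(100.0 * k / n for k in (triple, couple, disagree))

# Hypothetical labels for four pixels, already mapped to a common legend.
usgs = ["forest", "urban", "crop", "shrub"]
igbp = ["forest", "urban", "grass", "shrub"]
umd = ["forest", "crop", "bare", "shrub"]
ta, ca, td = agreement_ratios(usgs, igbp, umd)  # (50.0, 25.0, 25.0)
```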

Automated Scoring of Scientific Argumentation Using Expert Morpheme Classification Approaches (전문가의 형태소 분류를 활용한 과학 논증 자동 채점)

  • Lee, Manhyoung;Ryu, Suna
    • Journal of The Korean Association For Science Education / v.40 no.3 / pp.321-336 / 2020
  • We explore automated scoring models of scientific argumentation. We consider how a new analytical approach using machine learning techniques may enhance the understanding of spoken argumentation in the classroom. We sampled 2,605 utterances that occurred during high school students' science classes on molecular structure and classified the utterances into five argumentative elements. Next, we performed text preprocessing on the classified utterances. As machine learning techniques, we applied support vector machines, decision trees, random forests, and artificial neural networks. To enhance the identification of rebuttal elements, we used a heuristic feature-engineering method that applies experts' classification of the morphemes of scientific argumentation.

Analyzing vocational outcomes of people with hearing impairments : A data mining approach (청각장애인의 취업결정요인 분석 연구 -데이터마이닝 기법(Exhaustive CHAID)의 적용)

  • Shin, Hyun-Uk
    • Journal of Digital Convergence / v.13 no.11 / pp.449-459 / 2015
  • The purpose of this study was to examine demographic, human capital and service factors affecting the employment outcomes of people with hearing impairments. Data on a total of 422 individuals with hearing impairments (aged 20 to 65 years) were collected from the Panel Survey of Employment for the Disabled conducted by the Korea Employment Agency for the Disabled. The dependent variable is employment outcome. The predictor variables include a set of personal history, human capital and rehabilitation service variables. The chi-squared automatic interaction detector (CHAID) analysis revealed that national basic livelihood security status played a determining role in predicting the employment of people with hearing impairments. It was also found that three factors (national basic livelihood security status, need for help with activities of daily living, and licenses and employment services) produced a larger synergy effect when they complemented one another.
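
CHAID chooses splits according to chi-squared tests of independence between each predictor and the outcome. The sketch below computes only the Pearson chi-squared statistic for one contingency table; it omits the p-value, the Bonferroni adjustment and the category merging that a full (Exhaustive) CHAID performs. The counts are hypothetical and are not the study's data, and the function assumes every expected count is nonzero.

```python
def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the row and column variables.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts: rows = livelihood security recipient (yes, no),
# columns = employment outcome (employed, not employed).
counts = [[10, 30], [40, 20]]
stat = chi_squared(counts)
```

A larger statistic for the same table size indicates a stronger association, which is why a predictor like this one would be a candidate for the first split.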

Power Consumption Forecasting Scheme for Educational Institutions Based on Analysis of Similar Time Series Data (유사 시계열 데이터 분석에 기반을 둔 교육기관의 전력 사용량 예측 기법)

  • Moon, Jihoon;Park, Jinwoong;Han, Sanghoon;Hwang, Eenjun
    • Journal of KIISE / v.44 no.9 / pp.954-965 / 2017
  • A stable power supply is very important for the maintenance and operation of the power infrastructure. Accurate power consumption prediction is therefore needed. In particular, a university campus is an institution with one of the highest power consumptions and tends to have a wide variation of electrical load depending on time and environment. For this reason, a model that can accurately predict power consumption is required for the effective operation of the power system. The disadvantage of existing time series prediction techniques is that prediction performance degrades greatly, because the width of the prediction interval increases as the gap between the learning time and the prediction time increases. In this paper, we first classify power data with similar time series patterns considering the date, day of the week, holiday, and semester. Next, an ARIMA model is constructed for each classified data set, and a daily power consumption forecasting method for the university campus is proposed through time series cross-validation over the prediction times. We confirmed the validity of the proposed method by evaluating its prediction accuracy with performance indicators.
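
One common form of time series cross-validation is rolling-origin evaluation, where the training window grows and the next point is forecast at each step. The sketch below assumes that form, which the abstract does not spell out, and uses a simple mean forecast as a stand-in for the ARIMA models. The function names and the series values are hypothetical.

```python
def rolling_origin_splits(n, initial, horizon=1):
    """Yield (train_indices, test_indices) pairs with a growing training window."""
    for end in range(initial, n - horizon + 1):
        yield list(range(end)), list(range(end, end + horizon))

def mean_absolute_error(series, initial):
    """Evaluate a one-step mean forecast over all rolling-origin folds."""
    errors = []
    for train, test in rolling_origin_splits(len(series), initial):
        # Stand-in model: forecast the mean of the training window.
        forecast = sum(series[i] for i in train) / len(train)
        errors.append(abs(series[test[0]] - forecast))
    return sum(errors) / len(errors)

# Hypothetical daily consumption values for one group of similar days.
daily_kwh = [10, 12, 14, 16, 12]
splits = list(rolling_origin_splits(len(daily_kwh), initial=3))
mae = mean_absolute_error(daily_kwh, initial=3)
```

Because every test point lies after its training window, no future observation leaks into a forecast, which is the property that ordinary shuffled cross-validation lacks.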

An In-depth Survey Analysis Applying Data Mining Techniques (데이터마이닝을 이용한 설문조사의 심층 분석)

  • Kim, Wan-Seop;Lee, Soo-Won
    • Journal of Engineering Education Research / v.9 no.4 / pp.71-82 / 2006
  • To accomplish the educational objectives of a department, a system for CQI (Continuous Quality Improvement) is necessary. Improving the educational system through survey analysis is one of the most important factors in accomplishing the educational objectives. In general, survey analysis is carried out using the statistical distribution of a single attribute or correlation analysis between two attributes. However, these analysis schemes are limited in that they cannot find relations among multiple attributes. In this paper, an in-depth survey analysis method applying data mining techniques is presented. Data mining is a technique for extracting interesting knowledge from a large set of data. A survey of undergraduate students in the School of Computing of Soongsil University is analyzed in this paper using a data mining tool called Clementine. The results of the Clementine analysis show the relationships between 'grade' and other attributes hierarchically, and provide useful information that can be applied in student consulting and program improvement.

The Variation Factors of Severity-Adjusted Length of Stay in CABG (관상동맥우회술 시행환자의 중증도 보정 재원일수 변이에 관한 연구)

  • Kim, Sun-Ja;Kang, Sung-Hong;Kim, Won-Joong;Kim, Yoo-Mi
    • Journal of Korean Society for Quality Management / v.39 no.3 / pp.391-399 / 2011
  • Our study was carried out to analyze the variation factors of severity-adjusted length of stay (LOS) in coronary artery bypass graft (CABG). The subjects were 932 CABG inpatients in the Korean National Hospital Discharge In-depth Injury Survey from 2004 through 2008. The data were analyzed using the $\chi^2$ test, and the severity-adjusted model was developed using a data mining technique. The results of the study were as follows: patients who were male (71.1%), older than 61 years of age (61.6%), treated at hospitals with more than 500 beds (92.8%) and admitted via ambulatory care (70.0%) accounted for higher proportions than their counterparts. In-hospital mortality of CABG inpatients was 2.8%. In addition, 46.4% of the patients received their care outside their area of residence. Angina pectoris (45.6%) was the most frequent principal diagnosis, followed by chronic ischemic heart disease (36.9%) and acute myocardial infarction (12.0%). We developed a severity-adjusted LOS model using variables such as gender, age and comorbidity. Comparison of adjusted values in predicted LOS revealed that there were significant variations in LOS by location of hospital, bed size, and whether patients received care in their area of residence. The variations in LOS can be interpreted as an indirect indicator of quality variation in the medical process. It is suggested that the severity-adjusted LOS model developed in this study be used as a method for benchmarking among hospitals, and that a national standard clinical practice guideline be developed.

Service Restoration In Distribution Networks Using Cyclic Best-First Search (순환적 최적우선탐색을 이용한 배전계통의 정전복구)

  • Choi, Sang-Yule
    • Journal of the Korean Institute of Illuminating and Electrical Installation Engineers / v.18 no.5 / pp.162-168 / 2004
  • Service restoration is an emergency control performed in distribution control centers to restore out-of-service areas as soon as possible when a fault occurs in a distribution network. It therefore requires fast computation and high quality solutions for load balancing. In this paper, a load balance index and a heuristic guided best-first search are proposed for this problem. The proposed algorithm consists of two parts. One is to set up a decision tree to represent the various switching operations available. The other is to identify the most effective set of switches using the proposed search technique and a feeder load balance index. Test results on KEPCO's 108 bus distribution system show that the method is efficient and robust.
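
The general idea of a best-first search over a tree of switching decisions can be sketched as follows. This is a plain best-first search, not the paper's cyclic variant, and the load balance index used here (the variance of feeder loading ratios) is an assumption, since the abstract does not define the index. Network topology constraints are ignored, and all loads and capacities are hypothetical.

```python
import heapq

def balance_index(loads, capacities):
    """Assumed index: variance of feeder loading ratios (lower is more balanced)."""
    ratios = [load / cap for load, cap in zip(loads, capacities)]
    mean = sum(ratios) / len(ratios)
    return sum((r - mean) ** 2 for r in ratios) / len(ratios)

def restore(base_loads, capacities, sections):
    """Assign each out-of-service section to a feeder, expanding the best node first."""
    start = (balance_index(base_loads, capacities), 0, tuple(base_loads), ())
    frontier = [start]
    while frontier:
        index, depth, loads, plan = heapq.heappop(frontier)
        if depth == len(sections):
            return plan, index  # every section has been restored
        for f in range(len(loads)):
            new_load = loads[f] + sections[depth]
            if new_load > capacities[f]:
                continue  # this switching operation would overload feeder f
            child = loads[:f] + (new_load,) + loads[f + 1:]
            heapq.heappush(
                frontier,
                (balance_index(child, capacities), depth + 1, child, plan + (f,)),
            )
    return None, None  # no feasible restoration plan

# Hypothetical system: two backup feeders and two out-of-service sections.
plan, index = restore(base_loads=[50, 20], capacities=[100, 80], sections=[30, 10])
```

Each level of the tree corresponds to one section and each branch to the feeder that picks it up, so `plan` lists the chosen feeder per section. Because nodes are ordered only by the index of the partial assignment, the first complete plan found is not guaranteed to be optimal in general.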

Identification of major risk factors association with respiratory diseases by data mining (데이터마이닝 모형을 활용한 호흡기질환의 주요인 선별)

  • Lee, Jea-Young;Kim, Hyun-Ji
    • Journal of the Korean Data and Information Science Society / v.25 no.2 / pp.373-384 / 2014
  • Data mining aims to uncover patterns or correlations in large data sets with complicated structure and to predict diverse outcomes. This technique is used in the fields of finance, telecommunication, circulation, medicine and so on. In this paper, we selected risk factors of respiratory diseases in the field of medicine. The data we used came from the Gyeongsangbuk-do database of the Community Health Survey conducted in 2012 and were divided into a respiratory disease group and a healthy group. In order to select major risk factors, we applied data mining techniques such as neural network, logistic regression, Bayesian network, C5.0 and CART. We divided the total data into training and testing data, and applied the models built on the training data to the testing data. By comparison of prediction accuracy, CART was identified as the best model. Depression, smoking and stress were identified as the major risk factors of respiratory disease.

A Contrast Enhancement Method using the Contrast Measure in the Laplacian Pyramid for Digital Mammogram (디지털 맘모그램을 위한 라플라시안 피라미드에서 대비 척도를 이용한 대비 향상 방법)

  • Jeon, Geum-Sang;Lee, Won-Chang;Kim, Sang-Hee
    • Journal of the Institute of Convergence Signal Processing / v.15 no.2 / pp.24-29 / 2014
  • Digital mammography is the most common technique for the early detection of breast cancer. To diagnose breast cancer in its early stages and treat it efficiently, many image enhancement methods have been developed. This paper presents a multi-scale contrast enhancement method in the Laplacian pyramid for the digital mammogram. The proposed method decomposes the image into contrast measures using the Gaussian and Laplacian pyramids, and the pyramid coefficients of the decomposed multi-resolution image are defined as frequency-limited local contrast measures given by the ratio of high frequency components to low frequency components. The decomposed pyramid coefficients are modified by the contrast measure to enhance the contrast, and the final enhanced image is obtained by reconstructing the pyramid from the modified coefficients. The proposed method is compared with other existing methods and is shown to perform well quantitatively in terms of the contrast measure.