• Title/Summary/Keyword: Decision Tree analysis


Visualizing the Results of Opinion Mining from Social Media Contents: Case Study of a Noodle Company (소셜미디어 콘텐츠의 오피니언 마이닝결과 시각화: N라면 사례 분석 연구)

  • Kim, Yoosin;Kwon, Do Young;Jeong, Seung Ryul
    • Journal of Intelligence and Information Systems / v.20 no.4 / pp.89-105 / 2014
  • After the emergence of the Internet, social media built on highly interactive Web 2.0 applications have provided very user-friendly means for consumers and companies to communicate with each other. Users routinely publish content expressing their opinions and interests in social media such as blogs, forums, chat rooms, and discussion boards, and this content is released in real time on the Internet. For that reason, many researchers and marketers regard social media content as a source of information for business analytics, and many studies have reported results on mining business intelligence from social media content. In particular, opinion mining and sentiment analysis, techniques to extract, classify, understand, and assess the opinions implicit in text, are frequently applied to social media content analysis because they emphasize determining sentiment polarity and extracting authors' opinions. A number of frameworks, methods, techniques, and tools have been presented by these researchers. However, we found weaknesses in their methods, which are often technically complicated and not sufficiently user-friendly for supporting business decisions and planning. In this study, we attempted to formulate a more comprehensive and practical approach to conducting opinion mining with visual deliverables. First, we describe the entire cycle of practical opinion mining using social media content, from the initial data gathering stage to the final presentation session. Our proposed approach consists of four phases: collecting, qualifying, analyzing, and visualizing. In the first phase, analysts choose the target social media. Each target medium requires a different way for analysts to gain access: open APIs, search tools, DB-to-DB interfaces, purchased content, and so on. The second phase is pre-processing, which generates useful material for meaningful analysis. If garbage data are not removed, the results of social media analysis will not provide meaningful and useful business insights. To clean social media data, natural language processing techniques should be applied. The next step is the opinion mining phase, in which the cleansed social media content set is analyzed. The qualified data set includes not only user-generated content but also content identification information such as creation date, author name, user id, content id, hit counts, review or reply, favorites, etc. Depending on the purpose of the analysis, researchers or data analysts can select a suitable mining tool. Topic extraction and buzz analysis are usually related to market trend analysis, while sentiment analysis is used for reputation analysis. There are also various applications, such as stock prediction, product recommendation, sales forecasting, and so on. The last phase is visualization and presentation of the analysis results. The major focus of this phase is to explain the results and help users comprehend their meaning. Therefore, to the extent possible, deliverables from this phase should be simple, clear, and easy to understand, rather than complex and flashy. To illustrate our approach, we conducted a case study on a leading Korean instant noodle company. We targeted the leading company, NS Food, with a 66.5% market share; the firm has kept the No. 1 position in the Korean "Ramen" business for several decades. We collected a total of 11,869 pieces of content, including blogs, forum posts, and news articles. After collecting the social media content data, we generated instant-noodle-specific language resources for data manipulation and analysis using natural language processing. In addition, we classified the content into more detailed categories such as marketing features, environment, and reputation. In these phases, we used free, open-source software such as the tm, KoNLP, ggplot2, and plyr packages of the R project. As a result, we presented several useful visualization outputs, such as domain-specific lexicons, volume and sentiment graphs, topic word clouds, heat maps, and a valence tree map, providing vivid, full-colored examples produced with open-library packages of the R project. Business actors can detect at a glance which areas are weak, strong, positive, negative, quiet, or loud. The heat map shows the movement of sentiment or volume across a category-by-time matrix, with color density indicating intensity in each period. The valence tree map, one of the most comprehensive and holistic visualization models, should be very helpful for analysts and decision makers to quickly grasp the "big picture" of the business situation, since its hierarchical structure can present buzz volume and sentiment together for a given period. This case study offers real-world business insights from market sensing and demonstrates to practical-minded business users how they can use these types of results for timely decision making in response to ongoing changes in the market. We believe our approach can provide a practical and reliable guide to opinion mining with visualized results that are immediately useful, not just in the food industry but in other industries as well.
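As an illustration of the analyzing and visualizing phases, the sketch below aggregates scored posts into a category-by-month sentiment matrix and renders it as a heat map. It is a minimal Python/pandas analogue of the R (tm, KoNLP, ggplot2) workflow described in the abstract; the column names and toy sentiment scores are assumptions, not the study's actual lexicon or data.

```python
# Minimal sketch: category-by-month sentiment heat map (pandas + matplotlib).
# Column names and scores are illustrative; the paper's actual pipeline uses R.
import pandas as pd
import matplotlib.pyplot as plt

posts = pd.DataFrame({
    "month":     ["2014-01", "2014-01", "2014-02", "2014-02", "2014-03"],
    "category":  ["taste",   "price",   "taste",   "package", "price"],
    "sentiment": [1, -1, 1, -1, 1],   # +1 positive, -1 negative (toy lexicon output)
})

# Pivot into the category x time matrix used by the heat map.
matrix = posts.pivot_table(index="category", columns="month",
                           values="sentiment", aggfunc="sum", fill_value=0)

fig, ax = plt.subplots()
im = ax.imshow(matrix.values, cmap="RdYlGn", aspect="auto")
ax.set_xticks(range(len(matrix.columns)))
ax.set_xticklabels(matrix.columns)
ax.set_yticks(range(len(matrix.index)))
ax.set_yticklabels(matrix.index)
fig.colorbar(im, label="net sentiment")
plt.show()
```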

Development of Needs Extraction Algorithm Fitting for Individuals in Care Management for the Elderly in Home (재가노인 사례관리의 욕구사정 정확도 향상을 위한 욕구추출 알고리즘 개발 - 데이터 마이닝 분석기법을 활용하여 -)

  • Kim, Young-Sook;Jung, Kook-In;Park, So-Rah
    • Korean Journal of Social Welfare / v.60 no.1 / pp.187-209 / 2008
  • The authors developed assessment tools for 28 needs for integrated, needs-centered assessment, which is the core element of care management for the elderly at home. The authors also collected assessment data on 676 elderly persons living at home from 120 centers under the Korea Association of Senior Welfare Centers using these tools, and finally developed needs extraction algorithms through decision tree analysis in data mining to identify the elderly persons' actual needs and provide social welfare services suited to them. The needs extraction algorithms for the 28 needs of the elderly at home are summarized in the paper. For example, Need No. 8, "Having need of help in going out," was split in the decision tree model into 80.3% asking for help and 11.4% not asking for help, with Appeal No. 23 as the major splitting variable. The need rose to 87.9% when the elderly appealed for help in going out and had a caregiver, but fell to 47.4% when they had no caregiver. When the elderly asked for help in going out, had a caregiver, and needed complete help with cleaning, the need for help in going out reached 94.2%. However, even when they did not ask for help in going out, if they answered that they needed complete help with bathing (an ADL item), the need for help in going out rose sharply from 11.4% to 80.0%. On the other hand, when they needed only partial help or were self-supporting in bathing, the probability of being classified as asking for help in going out was as low as 7.7%. In this decision tree model, the minimum number of cases for a parent node and a child node was set to 50 and 25, respectively, with a maximum tree depth of 5 as the stopping rule. Under these settings, the model's effectiveness for the need "Having need of help in going out" was estimated at 182.13%. The algorithms presented in this study can serve as systematic and scientific foundational data for assessing the needs of the elderly at home.
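A minimal sketch of a decision tree configured with the stopping rule described above (at least 50 cases per parent node, 25 per child node, maximum depth 5). It uses scikit-learn rather than the study's original software, and the file and feature names are hypothetical placeholders for the 28 assessment items.

```python
# Sketch: decision tree with the stopping rule reported in the study
# (>=50 cases per parent node, >=25 per child node, max depth 5).
# File, feature, and target names are illustrative stand-ins.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

data = pd.read_csv("elderly_assessment.csv")         # hypothetical assessment file
X = data[["appeal_23", "has_caregiver", "adl_bathing", "cleaning_help"]]
y = data["need_08_going_out"]                        # 1 = needs help going out

tree = DecisionTreeClassifier(
    max_depth=5,            # stopping rule: maximum tree depth
    min_samples_split=50,   # minimum cases in a parent node
    min_samples_leaf=25,    # minimum cases in a child node
    random_state=0,
)
tree.fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))
```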

    Improving the Accuracy of Document Classification by Learning Heterogeneity (이질성 학습을 통한 문서 분류의 정확성 향상 기법)

    • Wong, William Xiu Shun;Hyun, Yoonjin;Kim, Namgyu
      • Journal of Intelligence and Information Systems / v.24 no.3 / pp.21-44 / 2018
    • In recent years, the rapid development of Internet technology and the popularization of smart devices have resulted in massive amounts of text data, produced and distributed through various media platforms such as the World Wide Web, Internet news feeds, microblogs, and social media. However, this enormous amount of easily obtained information lacks organization. This problem has raised the interest of many researchers and requires professionals capable of classifying relevant information; hence, text classification is introduced. Text classification is a challenging task in modern data analysis, in which a text document must be assigned to one or more predefined categories or classes. In the text classification field, various techniques are available, such as K-Nearest Neighbor, the Naïve Bayes algorithm, Support Vector Machine, Decision Tree, and Artificial Neural Network. However, when dealing with huge amounts of text data, model performance and accuracy become a challenge. Depending on the type of words used in the corpus and the type of features created for classification, the performance of a text classification model can vary. Most previous attempts have been based on proposing a new algorithm or modifying an existing one, and this line of research can be said to have reached its limits for further improvement. In this study, instead of proposing or modifying an algorithm, we focus on finding a way to modify the use of the data. It is widely known that classifier performance is influenced by the quality of the training data upon which the classifier is built. Real-world datasets usually contain noise, and this noisy data can affect the decisions made by classifiers built from such data. In this study, we consider that data from different domains, that is, heterogeneous data, may have noise characteristics that can be utilized in the classification process. Machine learning algorithms are usually applied under the assumption that the characteristics of the training data and the target data are the same or very similar. However, for unstructured data such as text, the features are determined by the vocabulary included in the documents; if the viewpoints of the training data and the target data differ, the features may appear different between the two. In this study, we attempt to improve classification accuracy by strengthening the robustness of the document classifier through artificially injecting noise into the process of constructing it. Data coming from various kinds of sources are likely to be formatted differently, which causes difficulties for traditional machine learning algorithms, because they were not developed to recognize different types of data representation at one time and to put them together in the same generalization. Therefore, in order to utilize heterogeneous data in the learning process of the document classifier, we apply semi-supervised learning. However, unlabeled data may degrade the performance of the document classifier. Therefore, we further propose a method called the Rule Selection-Based Ensemble Semi-Supervised Learning Algorithm (RSESLA) to select only the documents that contribute to the accuracy improvement of the classifier. RSESLA creates multiple views by manipulating the features using different types of classification models and different types of heterogeneous data. The most confident classification rules are selected and applied for the final decision making. In this paper, three different types of real-world data sources were used: news, Twitter, and blogs.
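A minimal sketch of the semi-supervised idea described above: a self-training loop that adds unlabeled documents to the training set only when the classifier is highly confident about them. This is a simplified stand-in for RSESLA's rule selection, not the paper's actual algorithm, and the toy documents are assumptions.

```python
# Sketch: confidence-thresholded self-training over TF-IDF features.
# A simplified analogue of the semi-supervised selection step; not RSESLA itself.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

labeled_docs = ["great noodle review", "terrible service complaint"]   # toy data
labels = np.array([1, 0])
unlabeled_docs = ["really great product", "awful and disappointing"]

vec = TfidfVectorizer()
X_lab = vec.fit_transform(labeled_docs).toarray()
X_unl = vec.transform(unlabeled_docs).toarray()

clf = LogisticRegression()
for _ in range(3):                                   # a few self-training rounds
    clf.fit(X_lab, labels)
    proba = clf.predict_proba(X_unl)
    confident = proba.max(axis=1) >= 0.8             # keep only confident pseudo-labels
    if not confident.any():
        break
    X_lab = np.vstack([X_lab, X_unl[confident]])
    labels = np.concatenate([labels, proba[confident].argmax(axis=1)])
    X_unl = X_unl[~confident]
    if X_unl.shape[0] == 0:
        break
```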

    The Analysis of Factors which Affect Business Survey Index Using Regression Trees (회귀나무를 이용한 기업경기실사지수의 영향요인 분석)

    • Chang, Young-Jae
      • The Korean Journal of Applied Statistics / v.23 no.1 / pp.63-71 / 2010
    • Business entrepreneurs reflect their views of domestic and foreign economic activity in the operation and growth of their businesses. Decisions, forecasts, and plans based on their economic sentiment affect business operations such as production, investment, and hiring, and consequently affect the condition of the national economy. The business survey index (BSI) is compiled to capture entrepreneurs' economic sentiment for the analysis of business conditions. The BSI has been used as an important variable in short-term forecasting models for business cycle analysis, especially during periods of extreme business fluctuations. The recent financial crisis brought about extreme business fluctuations similar to those caused by the currency crisis at the end of 1997 and renewed the importance of the BSI as a variable for economic forecasting. In this paper, the meaning of the BSI as an economic sentiment index is reviewed and a GUIDE regression tree is constructed to find the factors that affect the BSI. The results show that variables related to the stability of the financial market, such as the KOSPI (Korea Composite Stock Price Index) and the exchange rate, as well as the manufacturing operation ratio and consumer goods sales, are the main factors affecting business entrepreneurs' economic sentiment.
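A minimal sketch of a regression tree fitted to the BSI against macro variables like those named above. It uses scikit-learn's CART-style DecisionTreeRegressor as a stand-in for the GUIDE algorithm used in the paper, and the data file and column names are assumptions.

```python
# Sketch: regression tree for BSI on macro indicators.
# DecisionTreeRegressor (CART) stands in for GUIDE; file/column names are hypothetical.
import pandas as pd
from sklearn.tree import DecisionTreeRegressor, export_text

df = pd.read_csv("bsi_monthly.csv")   # hypothetical monthly series
X = df[["kospi", "exchange_rate", "operation_ratio", "consumer_goods_sales"]]
y = df["bsi"]

reg = DecisionTreeRegressor(max_depth=3, min_samples_leaf=12, random_state=0)
reg.fit(X, y)

# The variables the tree splits on first indicate their influence on the BSI.
print(export_text(reg, feature_names=list(X.columns)))
print(dict(zip(X.columns, reg.feature_importances_)))
```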

    Comparative analysis of Machine-Learning Based Models for Metal Surface Defect Detection (머신러닝 기반 금속외관 결함 검출 비교 분석)

    • Lee, Se-Hun;Kang, Seong-Hwan;Shin, Yo-Seob;Choi, Oh-Kyu;Kim, Sijong;Kang, Jae-Mo
      • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.6 / pp.834-841 / 2022
    • Recently, applying artificial intelligence technologies to various fields of production has drawn an upsurge of research interest with the spread of smart factories. A great deal of effort is being made to introduce artificial intelligence algorithms into the defect detection task. In particular, detection of defects on metal surfaces attracts a higher level of research interest than other materials (wood, plastics, fibers, etc.). In this paper, we compare and analyze the speed and performance of defect classification by combining machine learning techniques (Support Vector Machine, Softmax Regression, Decision Tree) with dimensionality reduction algorithms (Principal Component Analysis, AutoEncoders), as well as two convolutional neural networks (the proposed method and ResNet). To validate and compare the performance and speed of the algorithms, we adopt two datasets ((i) a public dataset and (ii) an actual dataset), and on the basis of the results, the most efficient algorithm is determined.
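A minimal sketch of one of the classical combinations compared above: PCA for dimensionality reduction followed by an SVM and a decision tree classifier on flattened surface images. The random data is a placeholder; the paper's actual datasets and CNN models are not reproduced here.

```python
# Sketch: PCA + SVM / decision tree for surface-defect classification.
# Random placeholder data; the paper's actual datasets and CNNs are not reproduced.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Placeholder: flattened grayscale patches (n_samples x n_pixels) and defect labels.
rng = np.random.default_rng(0)
X = rng.random((200, 32 * 32))
y = rng.integers(0, 2, size=200)

for name, clf in [("PCA+SVM",  make_pipeline(PCA(n_components=20), SVC())),
                  ("PCA+Tree", make_pipeline(PCA(n_components=20),
                                             DecisionTreeClassifier(max_depth=5)))]:
    score = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: accuracy {score:.3f}")
```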

    Impact of Diverse Document-evaluation Measure-based Searching Methods in Big Data Search Accuracy (빅데이터 검색 정확도에 미치는 다양한 측정 방법 기반 검색 기법의 효과)

    • Kim, Ji young;Han, DaHyeon;Kim, Jongkwon
      • Journal of KIISE / v.44 no.5 / pp.553-558 / 2017
    • With the rapid growth of Big Data, research on extracting meaningful information is being pursued by both academia and industry. In particular, the data characteristics derived from analysis and the researcher's intention are key factors for search algorithms to obtain accurate output. Therefore, properly reflecting both data characteristics and researcher intention is the final goal of data analysis research. Properly analyzed data can help users increase their loyalty to the service provided by a company and utilize information more effectively and efficiently. In this paper, we explore various document-evaluation methods to improve the accuracy of article search, one of the most frequent searches used in real life. We also analyze the experimental results and suggest appropriate ways to use the various methods.
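A minimal sketch of one common document-evaluation measure for search: TF-IDF weighting with cosine similarity between a query and candidate documents. The paper compares several measures; this example only illustrates the general ranking setup, and the toy documents are assumptions.

```python
# Sketch: ranking documents against a query with TF-IDF + cosine similarity.
# Toy documents; the paper compares several evaluation measures beyond this one.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["big data search accuracy study",
        "noodle company social media analysis",
        "document evaluation measures for article search"]
query = ["document search accuracy"]

vec = TfidfVectorizer()
doc_vectors = vec.fit_transform(docs)
query_vector = vec.transform(query)

scores = cosine_similarity(query_vector, doc_vectors).ravel()
for rank, idx in enumerate(scores.argsort()[::-1], start=1):
    print(rank, round(float(scores[idx]), 3), docs[idx])
```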

    Medical Services Specialization strategies of the Regional Public Hospital through Customer Segmentation (고객세분화를 통한 지방의료원의 의료서비스 전문화 전략)

    • Lee, Jin-Woo
      • Journal of the Korea Academia-Industrial cooperation Society / v.16 no.7 / pp.4641-4650 / 2015
    • This study aims to strengthen medical expertise and offer medical care specialization strategies that give the Regional Public Hospital a competitive edge through customer segmentation. The study covers 26,658 inpatients from January to December 2013. The methods of analysis are cluster analysis and decision tree analysis. In conclusion, female sex, age over 60, and diseases of the musculoskeletal system and connective tissue were commonly selected as identifiers of the target market of the Regional Public Hospital. Customers in this target market are loyal to specialized medical services, and keeping a continuous relationship with them through communication and monitoring of the results of provided medical services is important, because word of mouth is expected to propagate to other customer groups with an equivalent scale of consumption. Concentration of the Regional Public Hospital's scope of medical services, along with collaboration and mutual reliance on medical services under strategic alliances with other institutions and private hospitals, is also needed.
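A minimal sketch of the two-step segmentation described above: k-means clustering of patients followed by a decision tree that profiles the clusters in terms of sex, age, and diagnosis group. The data file and column names are assumptions used only for illustration.

```python
# Sketch: customer segmentation via k-means, then a decision tree to profile the segments.
# File and column names are hypothetical stand-ins for the hospital's inpatient data.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier, export_text

patients = pd.read_csv("inpatients_2013.csv")
features = pd.get_dummies(patients[["sex", "age", "diagnosis_group"]])

segments = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(features)

# The tree explains which characteristics identify each segment (target market).
profile = DecisionTreeClassifier(max_depth=3, random_state=0).fit(features, segments)
print(export_text(profile, feature_names=list(features.columns)))
```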

    ITS : Intelligent Tissue Mineral Analysis Medical Information System (ITS : 지능적 Tissue Mineral Analysis 의료 정보 시스템)

    • Cho, Young-Im
      • Journal of the Korean Institute of Intelligent Systems / v.15 no.2 / pp.257-263 / 2005
    • There are some problems with TMA (Tissue Mineral Analysis). There is no database in Korea that can independently and specifically analyze TMA results. Even though some medical databases related to TMA exist, some of them are low-level databases, so they cannot provide medical services to patients or doctors. Moreover, because TMA results are based on a database of American health and mineral standards, they may be misleading with respect to Asian, especially Korean, mineral standards. The purpose of this paper is to develop the first Intelligent TMA Information System (ITS), which addresses the problems mentioned above. ITS can analyze TMA data with a multiple-stage decision tree classifier. It is also built on a multiple fuzzy rule base and can therefore analyze complex data from the Korean database with fuzzy inference methods.
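A minimal sketch of the fuzzy-inference idea mentioned above: triangular membership functions over mineral levels and a simple rule that fires to the degree both antecedents hold. The mineral names, ranges, and rule are invented for illustration; they are not the system's actual rule base.

```python
# Sketch: triangular fuzzy membership plus one min-based fuzzy rule.
# Mineral names, ranges, and the rule itself are illustrative, not the ITS rule base.
def triangular(x, a, b, c):
    """Membership of x in a triangular fuzzy set with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def low_zinc(level):
    return triangular(level, 0.0, 10.0, 20.0)     # hypothetical "low zinc" set

def high_copper(level):
    return triangular(level, 2.0, 4.0, 6.0)       # hypothetical "high copper" set

def rule_flag_imbalance(zinc, copper):
    # Fuzzy rule: IF zinc is low AND copper is high THEN flag a mineral imbalance.
    return min(low_zinc(zinc), high_copper(copper))

print(rule_flag_imbalance(zinc=8.0, copper=4.5))  # degree to which the rule fires
```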

    Determinants of student course evaluation using hierarchical linear model (위계적 선형모형을 이용한 강의평가 결정요인 분석)

    • Cho, Jang Sik
      • Journal of the Korean Data and Information Science Society / v.24 no.6 / pp.1285-1296 / 2013
    • The fundamental concern of this paper is to analyze the determinants of student course evaluations using subject-level and student-level characteristic variables. We use a 2-level hierarchical linear model, since the data structure of the subject and student characteristic variables is multilevel. The four models we consider are as follows: (1) the null model, (2) the random coefficient model, (3) the means-as-outcomes model, and (4) the intercepts- and slopes-as-outcomes model. The results of the analysis are as follows. First, the null model showed that subject characteristics had a much larger effect on course evaluations than student characteristics. Second, the conditional model specifying subject- and student-level predictors revealed that class size, grade, tenure, mean GPA of the class, and native class at level 1, and sex, department category, admission method, and mean GPA of the student at level 2, had statistically significant effects on course evaluations. The explained variance was 13% at the subject level and 13% at the student level.
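A minimal sketch of a two-level hierarchical (mixed-effects) model for course-evaluation scores, with predictors as fixed effects and a random intercept for the subject. It uses statsmodels' MixedLM as a generic stand-in for the HLM specification; the data file and variable names are assumptions.

```python
# Sketch: two-level hierarchical linear model with a random intercept per subject.
# File and variable names are hypothetical; MixedLM stands in for the HLM software used.
import pandas as pd
import statsmodels.formula.api as smf

ratings = pd.read_csv("course_evaluations.csv")    # one row per student-by-subject rating

model = smf.mixedlm(
    "evaluation ~ class_size + student_gpa + sex", # fixed effects (subject/student predictors)
    data=ratings,
    groups=ratings["subject_id"],                  # grouping: ratings nested within subjects
)
result = model.fit()
print(result.summary())
```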

    A Study on the Development of Web-based Expert System for Urban Transit (웹 기반의 도시철도 전문가시스템 개발에 관한 연구)

    • Kim Hyunjun;Bae Chulho;Kim Sungbin;Lee Hoyong;Kim Moonhyun;Suh Myungwon
      • Transactions of the Korean Society of Automotive Engineers / v.13 no.5 / pp.163-170 / 2005
    • Urban transit is a complex system combining electrical and mechanical components, so it is necessary to construct a maintenance system that secures safety during high-speed operation and supports prompt maintenance. An expert system is a computer program that uses numerical or non-numerical domain-specific knowledge to solve problems. In this research, we develop an expert system that quickly diagnoses failure causes and displays countermeasures. For the development of the expert system, standardization of the failure-code classification system and creation of a BOM (Bill of Materials) were performed first. A knowledge base was then constructed through analysis of failure history and maintenance manuals. Also, to retrieve failure diagnosis and repair procedures linked with the knowledge base, we built an RBR (Rule-Based Reasoning) engine using a pattern-matching technique and a CBR (Case-Based Reasoning) engine using a similarity-search method. The system was developed as a web application to maximize accessibility.
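A minimal sketch of the two reasoning engines described above: a rule-based lookup that matches a reported failure code, and a case-based fallback that retrieves the most similar past case by overlap of observed symptoms. The failure codes, symptoms, and repairs are invented for illustration; they are not the system's actual knowledge base.

```python
# Sketch: rule-based reasoning (match on failure code) with a case-based
# fallback (similarity search over past cases). All codes, symptoms, and
# repairs here are illustrative, not the actual knowledge base.

RULES = {
    "DOOR-01": "Inspect door actuator and replace worn limit switch.",
    "BRAKE-07": "Check brake air pressure line for leaks.",
}

PAST_CASES = [
    ({"door stuck", "warning lamp"}, "Re-seat door controller connector."),
    ({"brake noise", "slow stop"}, "Replace brake pad and bleed the line."),
]

def diagnose(failure_code, symptoms):
    # RBR engine: match on the standardized failure code.
    if failure_code in RULES:
        return RULES[failure_code]
    # CBR engine: retrieve the most similar past case by symptom overlap (Jaccard).
    def similarity(case_symptoms):
        return len(symptoms & case_symptoms) / len(symptoms | case_symptoms)
    best_case = max(PAST_CASES, key=lambda case: similarity(case[0]))
    return best_case[1]

print(diagnose("DOOR-99", {"door stuck", "warning lamp"}))
```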

