• Title/Summary/Keyword: Text data

Search Result 2,953, Processing Time 0.028 seconds

A Method of Predicting Service Time Based on Voice of Customer Data (고객의 소리(VOC) 데이터를 활용한 서비스 처리 시간 예측방법)

  • Kim, Jeonghun;Kwon, Ohbyung
    • Journal of Information Technology Services
    • /
    • v.15 no.1
    • /
    • pp.197-210
    • /
    • 2016
  • With the advent of text analytics, VOC (Voice of Customer) data become an important resource which provides the managers and marketing practitioners with consumer's veiled opinion and requirements. In other words, making relevant use of VOC data potentially improves the customer responsiveness and satisfaction, each of which eventually improves business performance. However, unstructured data set such as customers' complaints in VOC data have seldom used in marketing practices such as predicting service time as an index of service quality. Because the VOC data which contains unstructured data is too complicated form. Also that needs convert unstructured data from structure data which difficult process. Hence, this study aims to propose a prediction model to improve the estimation accuracy of the level of customer satisfaction by combining unstructured from textmining with structured data features in VOC. Also the relationship between the unstructured, structured data and service processing time through the regression analysis. Text mining techniques, sentiment analysis, keyword extraction, classification algorithms, decision tree and multiple regression are considered and compared. For the experiment, we used actual VOC data in a company.

Self-Supervised Document Representation Method

  • Yun, Yeoil;Kim, Namgyu
    • Journal of the Korea Society of Computer and Information
    • /
    • v.25 no.5
    • /
    • pp.187-197
    • /
    • 2020
  • Recently, various methods of text embedding using deep learning algorithms have been proposed. Especially, the way of using pre-trained language model which uses tremendous amount of text data in training is mainly applied for embedding new text data. However, traditional pre-trained language model has some limitations that it is hard to understand unique context of new text data when the text has too many tokens. In this paper, we propose self-supervised learning-based fine tuning method for pre-trained language model to infer vectors of long-text. Also, we applied our method to news articles and classified them into categories and compared classification accuracy with traditional models. As a result, it was confirmed that the vector generated by the proposed model more accurately expresses the inherent characteristics of the document than the vectors generated by the traditional models.

An Exploratory Analysis of Online Discussion of Library and Information Science Professionals in India using Text Mining

  • Garg, Mohit;Kanjilal, Uma
    • Journal of Information Science Theory and Practice
    • /
    • v.10 no.3
    • /
    • pp.40-56
    • /
    • 2022
  • This paper aims to implement a topic modeling technique for extracting the topics of online discussions among library professionals in India. Topic modeling is the established text mining technique popularly used for modeling text data from Twitter, Facebook, Yelp, and other social media platforms. The present study modeled the online discussions of Library and Information Science (LIS) professionals posted on Lis Links. The text data of these posts was extracted using a program written in R using the package "rvest." The data was pre-processed to remove blank posts, posts having text in non-English fonts, punctuation, URLs, emails, etc. Topic modeling with the Latent Dirichlet Allocation algorithm was applied to the pre-processed corpus to identify each topic associated with the posts. The frequency analysis of the occurrence of words in the text corpus was calculated. The results found that the most frequent words included: library, information, university, librarian, book, professional, science, research, paper, question, answer, and management. This shows that the LIS professionals actively discussed exams, research, and library operations on the forum of Lis Links. The study categorized the online discussions on Lis Links into ten topics, i.e. "LIS Recruitment," "LIS Issues," "Other Discussion," "LIS Education," "LIS Research," "LIS Exams," "General Information related to Library," "LIS Admission," "Library and Professional Activities," and "Information Communication Technology (ICT)." It was found that the majority of the posts belonged to "LIS Exam," followed by "Other Discussions" and "General Information related to the Library."

Text Document Categorization using FP-Tree (FP-Tree를 이용한 문서 분류 방법)

  • Park, Yong-Ki;Kim, Hwang-Soo
    • Journal of KIISE:Software and Applications
    • /
    • v.34 no.11
    • /
    • pp.984-990
    • /
    • 2007
  • As the amount of electronic documents increases explosively, automatic text categorization methods are needed to identify those of interest. Most methods use machine learning techniques based on a word set. This paper introduces a new method, called FPTC (FP-Tree based Text Classifier). FP-Tree is a data structure used in data-mining. In this paper, a method of storing text sentence patterns in the FP-Tree structure and classifying text using the patterns is presented. In the experiments conducted, we use our algorithm with a #Mutual Information and Entropy# approach to improve performance. We also present an analysis of the algorithm via an ordinary differential categorization method.

A study on cultural characteristics of foreign tourists visiting Korea based on text mining of online review (온라인 리뷰의 텍스트 마이닝에 기반한 한국방문 외국인 관광객의 문화적 특성 연구)

  • Yao, Ziyan;Kim, Eunmi;Hong, Taeho
    • The Journal of Information Systems
    • /
    • v.29 no.4
    • /
    • pp.171-191
    • /
    • 2020
  • Purpose The study aims to compare the online review writing behavior of users in China and the United States through text mining on online reviews' text content. In particular, existing studies have verified that there are differences in online reviews between different cultures. Therefore, the purpose of this study is to compare the differences between reviews written by Chinese and American tourists by analyzing text contents of online reviews based on cultural theory. Design/methodology/approach This study collected and analyzed online review data for hotels, targeting Chinese and US tourists who visited Korea. Then, we analyzed review data through text mining like sentiment analysis and topic modeling analysis method based on previous research analysis. Findings The results showed that Chinese tourists gave higher ratings and relatively less negative ratings than American tourists. And American tourists have more negative sentiments and emotions in writing online reviews than Chinese tourists. Also, through the analysis results using topic modeling, it was confirmed that Chinese tourists mentioned more topics about the hotel location, room, and price, while American tourists mentioned more topics about hotel service. American tourists also mention more topics about hotels than Chinese tourists, indicating that American tourists tend to provide more information through online reviews.

Social Media Fake News in India

  • Al-Zaman, Md. Sayeed
    • Asian Journal for Public Opinion Research
    • /
    • v.9 no.1
    • /
    • pp.25-47
    • /
    • 2021
  • This study analyzes 419 fake news items published in India, a fake-news-prone country, to identify the major themes, content types, and sources of social media fake news. The results show that fake news shared on social media has six major themes: health, religion, politics, crime, entertainment, and miscellaneous; eight types of content: text, photo, audio, and video, text & photo, text & video, photo & video, and text & photo & video; and two main sources: online sources and the mainstream media. Health-related fake news is more common only during a health crisis, whereas fake news related to religion and politics seems more prevalent, emerging from online media. Text & photo and text & video have three-fourths of the total share of fake news, and most of them are from online media: online media is the main source of fake news on social media as well. On the other hand, mainstream media mostly produces political fake news. This study, presenting some novel findings that may help researchers to understand and policymakers to control fake news on social media, invites more academic investigations of religious and political fake news in India. Two important limitations of this study are related to the data source and data collection period, which may have an impact on the results.

Prediction of Physical Examination Demand Using Text Mining (텍스트 마이닝을 이용한 건강검진 수요 예측)

  • Park, Kyungbo;Kim, Mi Ryang
    • Journal of Information Technology Services
    • /
    • v.21 no.5
    • /
    • pp.95-106
    • /
    • 2022
  • Recently, physical examinations have become an important strategy to reduce costs for individuals and society. Pre-physical counseling is important for an effective physical examination. However, incomplete counseling is being conducted because the demand for physical examinations is not predicted. Therefore, in this study, the demand for physical examination was predicted using text mining and stepwise regression. As a result of the analysis, the most recent text data showed a high explanatory power of the demand for physical examination. Also, large amounts of data have high explanatory power. In addition, it was found that the high frequency of the text "health food" reduces the number of health examination customers. And the higher the frequency of the text of the word "food", the lower the number of physical examination customers. However, when the word "wild ginseng" was exposed a lot on Twitter, the number of physical examination customers visiting hospitals increased. In other words, customers consume efficiently by comparing the health examination price with the price of consumer goods. The proposed research framework can help predict demand in other industries.

A Study on the Effect of Using Sentiment Lexicon in Opinion Classification (오피니언 분류의 감성사전 활용효과에 대한 연구)

  • Kim, Seungwoo;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.1
    • /
    • pp.133-148
    • /
    • 2014
  • Recently, with the advent of various information channels, the number of has continued to grow. The main cause of this phenomenon can be found in the significant increase of unstructured data, as the use of smart devices enables users to create data in the form of text, audio, images, and video. In various types of unstructured data, the user's opinion and a variety of information is clearly expressed in text data such as news, reports, papers, and various articles. Thus, active attempts have been made to create new value by analyzing these texts. The representative techniques used in text analysis are text mining and opinion mining. These share certain important characteristics; for example, they not only use text documents as input data, but also use many natural language processing techniques such as filtering and parsing. Therefore, opinion mining is usually recognized as a sub-concept of text mining, or, in many cases, the two terms are used interchangeably in the literature. Suppose that the purpose of a certain classification analysis is to predict a positive or negative opinion contained in some documents. If we focus on the classification process, the analysis can be regarded as a traditional text mining case. However, if we observe that the target of the analysis is a positive or negative opinion, the analysis can be regarded as a typical example of opinion mining. In other words, two methods (i.e., text mining and opinion mining) are available for opinion classification. Thus, in order to distinguish between the two, a precise definition of each method is needed. In this paper, we found that it is very difficult to distinguish between the two methods clearly with respect to the purpose of analysis and the type of results. We conclude that the most definitive criterion to distinguish text mining from opinion mining is whether an analysis utilizes any kind of sentiment lexicon. We first established two prediction models, one based on opinion mining and the other on text mining. Next, we compared the main processes used by the two prediction models. Finally, we compared their prediction accuracy. We then analyzed 2,000 movie reviews. The results revealed that the prediction model based on opinion mining showed higher average prediction accuracy compared to the text mining model. Moreover, in the lift chart generated by the opinion mining based model, the prediction accuracy for the documents with strong certainty was higher than that for the documents with weak certainty. Most of all, opinion mining has a meaningful advantage in that it can reduce learning time dramatically, because a sentiment lexicon generated once can be reused in a similar application domain. Additionally, the classification results can be clearly explained by using a sentiment lexicon. This study has two limitations. First, the results of the experiments cannot be generalized, mainly because the experiment is limited to a small number of movie reviews. Additionally, various parameters in the parsing and filtering steps of the text mining may have affected the accuracy of the prediction models. However, this research contributes a performance and comparison of text mining analysis and opinion mining analysis for opinion classification. In future research, a more precise evaluation of the two methods should be made through intensive experiments.

Deep-Learning Approach for Text Detection Using Fully Convolutional Networks

  • Tung, Trieu Son;Lee, Gueesang
    • International Journal of Contents
    • /
    • v.14 no.1
    • /
    • pp.1-6
    • /
    • 2018
  • Text, as one of the most influential inventions of humanity, has played an important role in human life since ancient times. The rich and precise information embodied in text is very useful in a wide range of vision-based applications such as the text data extracted from images that can provide information for automatic annotation, indexing, language translation, and the assistance systems for impaired persons. Therefore, natural-scene text detection with active research topics regarding computer vision and document analysis is very important. Previous methods have poor performances due to numerous false-positive and true-negative regions. In this paper, a fully-convolutional-network (FCN)-based method that uses supervised architecture is used to localize textual regions. The model was trained directly using images wherein pixel values were used as inputs and binary ground truth was used as label. The method was evaluated using ICDAR-2013 dataset and proved to be comparable to other feature-based methods. It could expedite research on text detection using deep-learning based approach in the future.

On supporting full-text retrievals in XML query

  • Hong, Dong-Kweon
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.7 no.4
    • /
    • pp.274-278
    • /
    • 2007
  • As XML becomes the standard of digital data exchange format we need to manage a lot of XML data effectively. Unlike tables in relational model XML documents are not structural. That makes it difficult to store XML documents as tables in relational model. To solve these problems there have been significant researches in relational database systems. There are two kinds of approaches: 1) One way is to decompose XML documents so that elements of XML match fields of relational tables. 2) The other one stores a whole XML document as a field of relational table. In this paper we adopted the second approach to store XML documents because sometimes it is not easy for us to decompose XML documents and in some cases their element order in documents are very meaningful. We suggest an efficient table schema to store only inverted index as tables to retrieve required data from XML data fields of relational tables and shows SQL translations that correspond to XML full-text retrievals. The functionalities of XML retrieval are based on the W3C XQuery which includes full-text retrievals. In this paper we show the superiority of our method by comparing the performances in terms of a response time and a space to store inverted index. Experiments show our approach uses less space and shows faster response times.