• Title/Summary/Keyword: Unstructured data mining

Search Result 177, Processing Time 0.025 seconds

Unstructured Data Quantification Scheme Based on Text Mining for User Feedback Extraction (사용자 의견 추출을 위한 텍스트 마이닝 기반 비정형 데이터 정량화 방안)

  • Jo, Jung-Heum;Chung, Yong-Taek;Choi, Seong-Wook;Ok, Changsoo
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.41 no.4
    • /
    • pp.131-137
    • /
    • 2018
  • People write reviews of numerous products or services on the Internet, in their blogs or community bulletin boards. These unstructured data contain important emotions and opinions about the author's product or service, which can provide important information for future product design or marketing. However, this text-based information cannot be evaluated quantitatively, and thus they are difficult to apply to mathematical models or optimization problems for product design and improvement. Therefore, this study proposes a method to quantitatively extract user's opinion or preference about a specific product or service by utilizing a lot of text-based information existing on the Internet or online. The extracted unstructured text information is decomposed into basic unit words, and positive rate is evaluated by using existing emotional dictionaries and additional lists proposed in this study. This can be a way to effectively utilize unstructured text data, which is being generated and stored in vast quantities, in product or service design. Finally, to verify the effectiveness of the proposed method, a case study was conducted using movie review data retrieved from a portal website. By comparing the positive rates calculated by the proposed framework with user ratings for movies, a guideline on text mining based evaluation of unstructured data is provided.

A Method of Predicting Service Time Based on Voice of Customer Data (고객의 소리(VOC) 데이터를 활용한 서비스 처리 시간 예측방법)

  • Kim, Jeonghun;Kwon, Ohbyung
    • Journal of Information Technology Services
    • /
    • v.15 no.1
    • /
    • pp.197-210
    • /
    • 2016
  • With the advent of text analytics, VOC (Voice of Customer) data become an important resource which provides the managers and marketing practitioners with consumer's veiled opinion and requirements. In other words, making relevant use of VOC data potentially improves the customer responsiveness and satisfaction, each of which eventually improves business performance. However, unstructured data set such as customers' complaints in VOC data have seldom used in marketing practices such as predicting service time as an index of service quality. Because the VOC data which contains unstructured data is too complicated form. Also that needs convert unstructured data from structure data which difficult process. Hence, this study aims to propose a prediction model to improve the estimation accuracy of the level of customer satisfaction by combining unstructured from textmining with structured data features in VOC. Also the relationship between the unstructured, structured data and service processing time through the regression analysis. Text mining techniques, sentiment analysis, keyword extraction, classification algorithms, decision tree and multiple regression are considered and compared. For the experiment, we used actual VOC data in a company.

A Study on the Method for Extracting the Purpose-Specific Customized Information from Online Product Reviews based on Text Mining (텍스트 마이닝 기반의 온라인 상품 리뷰 추출을 통한 목적별 맞춤화 정보 도출 방법론 연구)

  • Kim, Joo Young;Kim, Dong soo
    • The Journal of Society for e-Business Studies
    • /
    • v.21 no.2
    • /
    • pp.151-161
    • /
    • 2016
  • In the era of the Web 2.0, characterized by the openness, sharing and participation, it is easy for internet users to produce and share the data. The amount of the unstructured data which occupies most of the digital world's data has increased exponentially. One of the kinds of the unstructured data called personal online product reviews is necessary for both the company that produces those products and the potential customers who are interested in those products. In order to extract useful information from lots of scattered review data, the process of collecting data, storing, preprocessing, analyzing, and drawing a conclusion is needed. Therefore we introduce the text-mining methodology for applying the natural language process technology to the text format data like product review in order to carry out extracting structured data by using R programming. Also, we introduce the data-mining to derive the purpose-specific customized information from the structured review information drawn by the text-mining.

Interplay of Text Mining and Data Mining for Classifying Web Contents (웹 컨텐츠의 분류를 위한 텍스트마이닝과 데이터마이닝의 통합 방법 연구)

  • 최윤정;박승수
    • Korean Journal of Cognitive Science
    • /
    • v.13 no.3
    • /
    • pp.33-46
    • /
    • 2002
  • Recently, unstructured random data such as website logs, texts and tables etc, have been flooding in the internet. Among these unstructured data there are potentially very useful data such as bulletin boards and e-mails that are used for customer services and the output from search engines. Various text mining tools have been introduced to deal with those data. But most of them lack accuracy compared to traditional data mining tools that deal with structured data. Hence, it has been sought to find a way to apply data mining techniques to these text data. In this paper, we propose a text mining system which can incooperate existing data mining methods. We use text mining as a preprocessing tool to generate formatted data to be used as input to the data mining system. The output of the data mining system is used as feedback data to the text mining to guide further categorization. This feedback cycle can enhance the performance of the text mining in terms of accuracy. We apply this method to categorize web sites containing adult contents as well as illegal contents. The result shows improvements in categorization performance for previously ambiguous data.

  • PDF

A study on unstructured text mining algorithm through R programming based on data dictionary (Data Dictionary 기반의 R Programming을 통한 비정형 Text Mining Algorithm 연구)

  • Lee, Jong Hwa;Lee, Hyun-Kyu
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.20 no.2
    • /
    • pp.113-124
    • /
    • 2015
  • Unlike structured data which are gathered and saved in a predefined structure, unstructured text data which are mostly written in natural language have larger applications recently due to the emergence of web 2.0. Text mining is one of the most important big data analysis techniques that extracts meaningful information in the text because it has not only increased in the amount of text data but also human being's emotion is expressed directly. In this study, we used R program, an open source software for statistical analysis, and studied algorithm implementation to conduct analyses (such as Frequency Analysis, Cluster Analysis, Word Cloud, Social Network Analysis). Especially, to focus on our research scope, we used keyword extract method based on a Data Dictionary. By applying in real cases, we could find that R is very useful as a statistical analysis software working on variety of OS and with other languages interface.

A Process Mining using Association Rule and Sequence Pattern (연관규칙과 순차패턴을 이용한 프로세스 마이닝)

  • Chung, So-Young;Kwon, Soo-Tae
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.31 no.2
    • /
    • pp.104-111
    • /
    • 2008
  • A process mining is considered to support the discovery of business process for unstructured process model, and a process mining algorithm by using the associated rule and sequence pattern of data mining is developed to extract information about processes from event-log, and to discover process of alternative, concurrent and hidden activities. Some numerical examples are presented to show the effectiveness and efficiency of the algorithm.

An Exploratory Study on the Prediction of Business Survey Index Using Data Mining (기업경기실사지수 예측에 대한 탐색적 연구: 데이터 마이닝을 이용하여)

  • Kyungbo Park;Mi Ryang Kim
    • Journal of Information Technology Services
    • /
    • v.22 no.4
    • /
    • pp.123-140
    • /
    • 2023
  • In recent times, the global economy has been subject to increasing volatility, which has made it considerably more difficult to accurately predict economic indicators compared to previous periods. In response to this challenge, the present study conducts an exploratory investigation that aims to predict the Business Survey Index (BSI) by leveraging data mining techniques on both structured and unstructured data sources. For the structured data, we have collected information regarding foreign, domestic, and industrial conditions, while the unstructured data consists of content extracted from newspaper articles. By employing an extensive set of 44 distinct data mining techniques, our research strives to enhance the BSI prediction accuracy and provide valuable insights. The results of our analysis demonstrate that the highest predictive power was attained when using data exclusively from the t-1 period. Interestingly, this suggests that previous timeframes play a vital role in forecasting the BSI effectively. The findings of this study hold significant implications for economic decision-makers, as they will not only facilitate better-informed decisions but also serve as a robust foundation for predicting a wide range of other economic indicators. By improving the prediction of crucial economic metrics, this study ultimately aims to contribute to the overall efficacy of economic policy-making and decision processes.

Construction Bid Data Analysis for Overseas Projects Based on Text Mining - Focusing on Overseas Construction Project's Bidder Inquiry (텍스트 마이닝을 통한 해외건설공사 입찰정보 분석 - 해외건설공사의 입찰자 질의(Bidder Inquiry) 정보를 대상으로 -)

  • Lee, JeeHee;Yi, June-Seong;Son, JeongWook
    • Korean Journal of Construction Engineering and Management
    • /
    • v.17 no.5
    • /
    • pp.89-96
    • /
    • 2016
  • Most data generated in construction projects is unstructured text data. Unstructured data analysis is very needed in order for effective analysis on large amounts of text-based documents, such as contracts, specifications, and RFI. This study analysed previously performed project's bid related documents (bidder inquiry) in overseas construction projects; as a results of the analysis frequent words in documents, association rules among the words, and various document topics were derived. This study suggests effective text analysis approach for massive documents with short time using text mining technique, and this approach is expected to extend the unstructured text data analysis in construction industry.

Analysis of the Unstructured Traffic Report from Traffic Broadcasting Network by Adapting the Text Mining Methodology (텍스트 마이닝을 적용한 한국교통방송제보 비정형데이터의 분석)

  • Roh, You Jin;Bae, Sang Hoon
    • The Journal of The Korea Institute of Intelligent Transport Systems
    • /
    • v.17 no.3
    • /
    • pp.87-97
    • /
    • 2018
  • The traffic accident reports that are generated by the Traffic Broadcasting Networks(TBN) are unstructured data. It, however, has the value as some sort of real-time traffic information generated by the viewpoint of the drives and/or pedestrians that were on the roads, the time and spots, not the offender or the victim who caused the traffic accidents. However, the traffic accident reports, which are big data, were not applied to traffic accident analysis and traffic related research commonly. This study adopting text-mining technique was able to provide a clue for utilizing it for the impacts of traffic accidents. Seven years of traffic reports were grasped by this analysis. By analyzing the reports, it was possible to identify the road names, accident spot names, time, and to identify factors that have the greatest influence on other drivers due to traffic accidents. Authors plan to combine unstructured accident data with traffic reports for further study.

The Impact of Transforming Unstructured Data into Structured Data on a Churn Prediction Model for Loan Customers

  • Jung, Hoon;Lee, Bong Gyou
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.14 no.12
    • /
    • pp.4706-4724
    • /
    • 2020
  • With various structured data, such as the company size, loan balance, and savings accounts, the voice of customer (VOC), which is text data containing contact history and counseling details was analyzed in this study. To analyze unstructured data, the term frequency-inverse document frequency (TF-IDF) analysis, semantic network analysis, sentiment analysis, and a convolutional neural network (CNN) were implemented. A performance comparison of the models revealed that the predictive model using the CNN provided the best performance with regard to predictive power, followed by the model using the TF-IDF, and then the model using semantic network analysis. In particular, a character-level CNN and a word-level CNN were developed separately, and the character-level CNN exhibited better performance, according to an analysis for the Korean language. Moreover, a systematic selection model for optimal text mining techniques was proposed, suggesting which analytical technique is appropriate for analyzing text data depending on the context. This study also provides evidence that the results of previous studies, indicating that individual customers leave when their loyalty and switching cost are low, are also applicable to corporate customers and suggests that VOC data indicating customers' needs are very effective for predicting their behavior.