• Title/Summary/Keyword: online documents


An Analytical Approach Using Topic Mining for Improving the Service Quality of Hotels (호텔 산업의 서비스 품질 향상을 위한 토픽 마이닝 기반 분석 방법)

  • Moon, Hyun Sil;Sung, David;Kim, Jae Kyeong
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.21-41 / 2019
  • Thanks to the rapid development of information technologies, the data available on the Internet have grown rapidly. In this era of big data, many studies have attempted to draw insights from data analysis and demonstrate its effects. In the tourism and hospitality industry, many firms and researchers have paid attention to online reviews on social media because of their large influence on customers. As tourism is an information-intensive industry, the effect of information networks on social media platforms is more pronounced than that of any other type of media. However, there are limits to the service quality improvements that can be made based on opinions posted on social media platforms. Users express their opinions as text, images, and so on, so the raw data sets from these reviews are unstructured. Moreover, these data sets are too large for humans to extract new information and hidden knowledge from them manually. To use them for business intelligence and analytics applications, appropriate big data techniques such as Natural Language Processing and data mining are needed. This study suggests an analytical approach that yields insights directly from these reviews to improve the service quality of hotels. The proposed approach consists of topic mining, which extracts the topics contained in the reviews, and decision tree modeling, which explains the relationship between topics and ratings. Topic mining refers to methods for finding groups of words that represent the documents in a collection. Among several topic mining methods, we adopted the Latent Dirichlet Allocation (LDA) algorithm, which is considered the most widely used. However, LDA alone is not enough to find insights that can improve service quality because it cannot identify the relationship between topics and ratings. To overcome this limitation, we also use the Classification and Regression Tree (CART) method, a kind of decision tree technique.
Through the CART method, we can find which topics are related to positive or negative ratings of a hotel and visualize the results. This study therefore aims to present an analytical approach for improving hotel service quality from unstructured review data sets. Through experiments on four hotels in Hong Kong, we identify the strengths and weaknesses of each hotel's services and suggest improvements to raise customer satisfaction. From positive reviews in particular, we find what these hotels should maintain in their service quality. For example, positive reviews of one hotel show that, compared with the other hotels, it has a good location and room condition. In contrast, negative reviews show what the hotels should change in their services. For example, one hotel should improve the soundproofing of its rooms. These results indicate that our approach is useful for finding insights into hotel service quality. That is, from an enormous volume of review data, our approach can provide practical suggestions for hotel managers to improve their service quality. In the past, studies on improving service quality relied on customer surveys or interviews. However, these methods are often costly and time-consuming, and the results may be distorted by biased sampling or untrustworthy answers. The proposed approach obtains honest feedback directly from customers' online reviews and draws insights through a type of big data analysis, so it can be a useful tool for overcoming the limitations of surveys and interviews. Moreover, our approach can easily obtain service quality information for other hotels or services in the tourism industry because it needs only open online reviews and ratings as input data. Furthermore, the performance of our approach is expected to improve if other structured and unstructured data sources are added.
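The decision tree step described in this abstract, relating topics to ratings, can be illustrated with a minimal sketch. It assumes each review has already been reduced to a set of topic labels (the LDA step is not shown) and a positive or negative rating. The topic names, sample reviews, and function names are hypothetical and are not taken from the paper. Only the first split of a CART-style tree is computed, using Gini impurity as the split criterion.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty or pure list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_topic_split(reviews):
    """Return (topic, weighted_gini) for the topic whose presence or absence
    best separates the ratings. `reviews` is a list of (topic_set, rating)."""
    topics = sorted({t for topic_set, _ in reviews for t in topic_set})
    n = len(reviews)
    best = None
    for topic in topics:
        with_topic = [rating for ts, rating in reviews if topic in ts]
        without_topic = [rating for ts, rating in reviews if topic not in ts]
        # Weighted impurity of the two child nodes produced by this split.
        score = (len(with_topic) / n) * gini(with_topic) \
            + (len(without_topic) / n) * gini(without_topic)
        if best is None or score < best[1]:
            best = (topic, score)
    return best


# Hypothetical reviews: (topics mentioned, rating class).
reviews = [
    ({"location", "room"}, "positive"),
    ({"location"}, "positive"),
    ({"noise", "room"}, "negative"),
    ({"noise"}, "negative"),
    ({"staff", "location"}, "positive"),
    ({"staff", "noise"}, "negative"),
]
topic, score = best_topic_split(reviews)
print(topic, score)
```

In this toy data, reviews that mention "location" are all positive and the rest are all negative, so that topic gives a pure split. A full CART tree would repeat the same search recursively inside each child node.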

Analyzing Product Reviews by Consumers using Natural Language Processing Techniques (자연어 처리 기법을 이용한 상품평 분석에 관한 연구)

  • Jeon, So-Eun;Lee, Young-Gu;Park, Kyeong-Cheol;Paik, Woo-Jin
    • 한국HCI학회:학술대회논문집 / 2009.02a / pp.660-663 / 2009
  • Consumers express how they evaluate what they have purchased by writing reviews, especially when they purchase products online. By analyzing the reviews of a product, it is possible to find out what consumers liked and disliked about it. It is also possible to identify the general consensus on what matters when purchasing a certain product type, such as a laptop, if many reviews of many instances of that product type are analyzed. However, analyzing the reviews manually takes a lot of time. Thus, we propose to use two natural language processing oriented computational techniques to analyze a large number of reviews: text classification and information extraction. We developed a review analysis system and conducted experiments on reviews of laptop computers posted on the Naver information portal.

A Design on Informal Big Data Topic Extraction System Based on Spark Framework (Spark 프레임워크 기반 비정형 빅데이터 토픽 추출 시스템 설계)

  • Park, Kiejin
    • KIPS Transactions on Software and Data Engineering / v.5 no.11 / pp.521-526 / 2016
  • Because online informal text data are massive in volume and unstructured in nature, there are limitations in applying traditional relational data model technologies to data storage and data analysis jobs. Moreover, analyzing social users' reactions in real time from dynamically generated, massive social data is hard to accomplish. In this paper, to easily capture the semantics of massive, informal online documents with an unsupervised learning mechanism, we design and implement an automatic topic extraction system according to the mass of the words that constitute a document. The input data set for the proposed system is first generated using an N-gram algorithm, which builds multi-word terms to capture the meaning of sentences precisely, and Hadoop and Spark (an in-memory distributed computing framework) are adopted to run the topic model. In the experiment phase, TB-level input data are preprocessed and the proposed topic extraction steps are applied. We conclude that the proposed system shows good performance in extracting meaningful topics in time, as the intermediate results come directly from main memory instead of being read from an HDD.
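The N-gram preprocessing mentioned in this abstract can be sketched in a few lines. This is a single-machine illustration only. Lowercasing and whitespace tokenization are assumptions, the function name is hypothetical, and the distributed Hadoop/Spark execution described in the paper is not shown.

```python
def word_ngrams(text, n):
    """Return the word-level n-grams in `text` as a list of tuples.

    Tokenization is a plain lowercase whitespace split (an assumption).
    """
    tokens = text.lower().split()
    # Slide a window of n tokens across the sequence.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sample = "topic extraction from online documents"
bigrams = word_ngrams(sample, 2)
print(bigrams)
```

Texts shorter than `n` tokens yield an empty list, so callers do not need a special case for very short documents.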

A Study on the Method for Extracting the Purpose-Specific Customized Information from Online Product Reviews based on Text Mining (텍스트 마이닝 기반의 온라인 상품 리뷰 추출을 통한 목적별 맞춤화 정보 도출 방법론 연구)

  • Kim, Joo Young;Kim, Dong soo
    • The Journal of Society for e-Business Studies / v.21 no.2 / pp.151-161 / 2016
  • In the era of Web 2.0, characterized by openness, sharing, and participation, it is easy for Internet users to produce and share data. The amount of unstructured data, which accounts for most of the digital world's data, has increased exponentially. Personal online product reviews, one kind of unstructured data, are necessary for both the companies that produce the products and the potential customers who are interested in them. To extract useful information from large amounts of scattered review data, a process of collecting, storing, preprocessing, and analyzing the data and drawing conclusions is needed. Therefore, we introduce a text mining methodology that applies natural language processing technology to text data such as product reviews in order to extract structured data using R programming. We also introduce data mining to derive purpose-specific customized information from the structured review information obtained by text mining.

Analysis of Cause on Difference of ICT Literacy Level according to Gender in Middle School (중학생의 성별에 따른 ICT 리터러시 수준 차이 원인 분석)

  • Ahn, Seonghun
    • Journal of The Korean Association of Information Education / v.21 no.1 / pp.1-11 / 2017
  • In this paper, I analyzed why girls' ICT literacy scores were higher than boys' scores in middle school after 2010. KERIS sampled 1% of all middle school students and measured their ICT literacy level. Based on that result, I analyzed the correlation, by gender, between ICT literacy scores and ICT usage habits. As a result, girls scored higher than boys in the areas of creating or editing documents, searching for information for homework and study, accessing online dictionaries for study, and enjoying SNS. Also, that difference between girls and boys had little correlation with ICT literacy scores. Therefore, I proposed an educational method for ICT literacy learning based on the differences in ICT usage habits between boys and girls.

Social Issue Risk Type Classification based on Social Bigdata (소셜 빅데이터 기반 사회적 이슈 리스크 유형 분류)

  • Oh, Hyo-Jung;An, Seung-Kwon;Kim, Yong
    • The Journal of the Korea Contents Association / v.16 no.8 / pp.1-9 / 2016
  • With the increased political and social use of social media, demand for online trend analysis and monitoring technologies based on social big data is also increasing rapidly. In this paper, we define 'risk' as issues, among major social issues, that are likely to turn into negative public opinion, and classify their types in detail. To define risk types, we conducted a complete survey of news documents and analyzed their characteristics by issue domain. We also performed a cross-media analysis to find out how public media and personalized social media differ. As a result, we defined 58 risk types across 6 domains and developed an automatic classification model based on a machine learning algorithm. Through empirical experiments, we demonstrate the feasibility of automatically detecting social issue risks in social media.

Cluster-Based Selection of Diverse Query Examples for Active Learning (능동적 학습을 위한 군집화 기반의 다양한 복수 문의 예제 선정 방법)

  • Kang, Jae-Ho;Ryu, Kwang-Ryel;Kwon, Hyuk-Chul
    • Journal of Intelligence and Information Systems / v.11 no.1 / pp.169-189 / 2005
  • In order to derive a better classifier with a limited number of training examples, active learning alternately repeats the querying stage for category labeling and the subsequent learning stage for rebuilding the classifier with the newly expanded training set. To relieve the user of the burden of labeling, especially in an online environment, it is important to minimize the number of querying steps as well as the total number of query examples. We can derive a good classifier in a small number of querying steps, using only a small number of examples, if we can select multiple diverse, representative, and ambiguous examples to present to the user at each querying step. In this paper, we propose a cluster-based batch query selection method that selects diverse, representative, and highly ambiguous examples for efficient active learning. Experiments with various text data sets have shown that our method derives a better classifier than other methods that take only ambiguity into account as the criterion for selecting multiple query examples.
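The general idea of cluster-based batch selection can be sketched as follows. This is not the authors' algorithm, only a simplified illustration under stated assumptions: the unlabeled pool is clustered with plain k-means (one cluster per query slot, for diversity), and within each cluster the example whose predicted probability is closest to 0.5 is chosen (for ambiguity). The feature vectors, probabilities, and function names are all hypothetical.

```python
def kmeans(points, k, iterations=10):
    """Plain k-means on tuples of floats; centroids start at the first k points."""
    centroids = [points[i] for i in range(k)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        for i, p in enumerate(points):
            assignment[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Move each centroid to the mean of its members (unchanged if empty).
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignment


def select_batch(points, probabilities, k):
    """Pick at most one query per cluster: the member whose predicted
    positive-class probability is closest to 0.5 (most ambiguous)."""
    assignment = kmeans(points, k)
    batch = []
    for c in range(k):
        members = [i for i, a in enumerate(assignment) if a == c]
        if members:
            batch.append(min(members, key=lambda i: abs(probabilities[i] - 0.5)))
    return batch


# Hypothetical pool: 2-D feature vectors and classifier probabilities.
points = [(0.0, 0.0), (5.0, 5.0), (0.2, 0.1), (5.1, 4.9), (0.1, 0.3), (4.8, 5.2)]
probabilities = [0.9, 0.1, 0.55, 0.45, 0.8, 0.3]
batch = select_batch(points, probabilities, k=2)
print(batch)
```

Selecting purely by ambiguity could return several near-duplicate examples from the same region. Drawing one query per cluster is one simple way to spread the batch across the pool.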

A Method of Identifying Ownership of Personal Information exposed in Social Network Service (소셜 네트워크 서비스에 노출된 개인정보의 소유자 식별 방법)

  • Kim, Seok-Hyun;Cho, Jin-Man;Jin, Seung-Hun;Choi, Dae-Seon
    • Journal of the Korea Institute of Information Security & Cryptology / v.23 no.6 / pp.1103-1110 / 2013
  • This paper proposes a method of identifying the ownership of personal information in social network services. In detail, the proposed method automatically decides whether location information mentioned on Twitter indicates the publisher's area of residence. Identifying the ownership of personal information is a necessary part of evaluating the risk of personal information exposed online. The proposed method uses a set of decision rules that consider 13 features, which are lexicographic and syntactic characteristics of the tweet sentences. In an experiment using real Twitter data, the proposed method shows better performance (F1-score: 0.876) than conventional document classification models, such as naive Bayes, that use n-grams as a feature set.
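For readers unfamiliar with the F1-score reported in this abstract, the following minimal sketch shows how it is computed from classification counts as the harmonic mean of precision and recall. The counts in the example are hypothetical and are not taken from the paper.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """F1 = 2 * precision * recall / (precision + recall); 0.0 if undefined."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if true_positives == 0 or predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted  # share of predictions that are correct
    recall = true_positives / actual        # share of true cases that are found
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 8 correct detections, 2 false alarms, 2 misses.
example = f1_score(8, 2, 2)
print(round(example, 3))
```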

Privacy Policy Analysis Techniques Using Deep Learning (딥러닝을 활용한 개인정보 처리방침 분석 기법 연구)

  • Jo, Yong-Hyun;Cha, Young-Kyun
    • Journal of the Korea Institute of Information Security & Cryptology / v.30 no.2 / pp.305-312 / 2020
  • The Privacy Act stipulates that the privacy policy, a document that serves as a privacy statement, must be disclosed in order to guarantee the rights of information subjects, and the Fair Trade Commission treats the privacy policy as terms and conditions and reviews it for unfairness under the Terms and Conditions Control Act. However, information subjects tend not to read privacy policies because they are complicated and difficult to understand. Simple and legible privacy policies will increase the probability of participation in online transactions, contributing to increased corporate sales and resolving the information asymmetry between operators and information subjects. In this study, complex privacy policies are analyzed using deep learning, and models are presented for producing simplified privacy policies that are highly readable for information subjects. To build the models, the privacy policies of 258 domestic companies were compiled into data sets and analyzed using deep learning technology.

A Study On Security Threat Analysis and Government Solution for Civil Service Online (대민서비스 온라인 보안위협 분석 및 대응방안 연구)

  • Choi, Do-Hyun;Jun, Mun-Seog;Park, Jung-Oh
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.14 no.5 / pp.1-10 / 2014
  • As the number of public institution websites and civil services based on electronic government has increased, there is a growing demand for security across electronic civil services, for example against the forgery and falsification of electronic documents. Existing studies proposed security threats and response methods for an electronic government service (G4C) from the perspective of the service provider. In this study, the scope of analysis was expanded to analyze the security technology used for each service type on 289 websites providing civil services and to present response methods for security threats. The aim of this paper is to discuss the core problems of civil services in electronic government that need to be resolved and practical responses to them.