• Title/Summary/Keyword: big data mining


The Effect of Text Consistency between the Review Title and Content on Review Helpfulness (온라인 리뷰의 제목과 내용의 일치성이 리뷰 유용성에 미치는 영향)

  • Li, Qinglong;Kim, Jaekyeong
    • Knowledge Management Research / v.23 no.3 / pp.193-212 / 2022
  • Many studies have proposed factors that affect review helpfulness, including quantitative factors (e.g., star ratings) and affective factors (e.g., sentiment scores). Online reviews contain both titles and content, but existing studies focus on the review content. Investigating the factors that affect review helpfulness from the content alone, without considering the review title, is a limitation. Moreover, previous studies investigated the effects of review content and title on review helpfulness independently, which may ignore the potential impact of the similarity between review titles and content. This study examined how text consistency between review titles and content affects review helpfulness, based on the mere exposure effect theory. We also considered the role of information clearness, review length, and source reliability. The results show that text consistency between the review title and the content negatively affects review helpfulness. Furthermore, we found that information clearness and source reliability weaken the negative effect of text consistency on review helpfulness.
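The abstract does not say how text consistency was measured. As a purely illustrative sketch, one common way to quantify the similarity between a title and its content is the cosine similarity of their term-frequency vectors. The function names, the tokenizer, and the choice of cosine similarity are all assumptions here, not the authors' method.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine_similarity(title, content):
    """Cosine similarity between the term-frequency vectors of two texts.

    Hypothetical stand-in for a title/content consistency score:
    close to 1.0 for identical wording, 0.0 when no words are shared.
    """
    a, b = Counter(tokenize(title)), Counter(tokenize(content))
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)
```

A title that repeats the wording of the review body scores near 1, while a title that shares no words with the body scores 0.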

Developing a deep learning-based recommendation model using online reviews for predicting consumer preferences: Evidence from the restaurant industry (딥러닝 기반 온라인 리뷰를 활용한 추천 모델 개발: 레스토랑 산업을 중심으로)

  • Dongeon Kim;Dongsoo Jang;Jinzhe Yan;Jiaen Li
    • Journal of Intelligence and Information Systems / v.29 no.4 / pp.31-49 / 2023
  • With the growth of the food-catering industry, consumer preferences and the number of dine-in restaurants are gradually increasing. Thus, personalized recommendation services are required to select a restaurant suitable for consumer preferences. Previous studies have used questionnaires and star-rating approaches, which do not effectively depict consumer preferences. Online reviews are the most essential sources of information in this regard. However, previous studies have aggregated online reviews into long documents, and traditional machine-learning methods have been applied to these to extract semantic representations; however, such approaches fail to consider the surrounding word or context. Therefore, this study proposes a novel review textual-based restaurant recommendation model (RT-RRM) that uses deep learning to effectively extract consumer preferences from online reviews. The proposed model concatenates consumer-restaurant interactions with the extracted high-level semantic representations and predicts consumer preferences accurately and effectively. Experiments on real-world datasets show that the proposed model exhibits excellent recommendation performance compared with several baseline models.

Achievements of Characterized Education for Healthcare Data Science Initiative (대학 특성화 사업 성과에 관한 연구-보건의료 데이터 사이언티스트 프로그램을 중심으로)

  • Park, HwaGyoo
    • Journal of Service Research and Studies / v.9 no.3 / pp.87-99 / 2019
  • Healthcare and data science are often linked through finances as the industry attempts to reduce its expenses with the help of large amounts of data. Data science and medicine are rapidly developing, and it is important that they advance together. Data science is a driving force in the transition of healthcare systems from treatment-oriented to preventive care in the healthcare 3.0 era. It enables customized precision medicine that current healthcare systems cannot facilitate, and it discovers more cost-effective treatments. Healthcare big data is now a reality for medical institutions, public health, medical academia, the pharmaceutical sector, and insurance agencies. With this motivation, the medical college of Soonchunhyang University has run a 'healthcare data science initiative (HDSI)' since 2014. Most domestic HDSI programs focus on short-term content such as mentoring and case sharing for data science, which makes it difficult to provide education tailored to the level of skills and job competency required in practice. The Soonchunhyang HDSI implemented specialized strategies for improving resilience and response to changes in current healthcare IT education, with an emphasis on the need for systematic activation of the practical HDSI. The HDSI has been carried out as part of an industry-academic link program in CK-1. Through quantitative and qualitative analysis, this paper discusses the HDSI process, performance, achievements, and implications.

Analysis of Behavior of Seoullo 7017 Visitors - With a Focus on Text Mining and Social Network Analysis - (서울로 7017 방문자들의 이용행태 분석 -텍스트 마이닝과 소셜 네트워크 분석을 중심으로-)

  • Woo, Kyung-Sook;Suh, Joo-Hwan
    • Journal of the Korean Institute of Landscape Architecture / v.48 no.6 / pp.16-24 / 2020
  • The purpose of this study is to analyze the usage behavior of Seoullo 7017, the first public garden in Korea, to understand its usage status by analyzing blogs, and to present improvement plans for Seoullo 7017. Text data containing 'Seoullo 7017' in the title and content of NAVER and DAUM blogs from June 2017 to May 2020, after Seoullo 7017 opened to citizens, were analyzed using the big data techniques of text mining and social network analysis. The research results are summarized as follows. First, the ratio of men and women searching for Seoullo 7017 online is similar, the regions that searched most were Seoul and Gyeonggi, in that order, and those in their 40s and 50s were the most interested. In other words, there is a lack of interest in regions other than Seoul and Gyeonggi and among those in their 10s, 20s, and 30s. The main behaviors at Seoullo 7017 are 'night view' and 'walking', and the influencing factors are elements related to culture and art. If various programs and festivals are opened and actively promoted, the main behaviors will become more varied. On the other hand, the main behavior that users of Seoullo 7017 want is 'sit', a static behavior, but the physical conditions are not sufficient for it to occur. Therefore, facilities that encourage sitting, such as shade and benches, must be improved to meet the needs of visitors. A notable change in behavior at Seoullo 7017 is that, with group activities restricted due to COVID-19, it is recognized as a public multi-use facility that is a good place to travel alone and to walk alone. Accordingly, in a situation like the COVID-19 pandemic, more diverse behaviors can be derived in facilities where people can take a walk, and the variety of attractions and the satisfaction of users can be increased.
Seoullo 7017, as Korea's first public pedestrian area, was created for urban regeneration and the efficient use of urban resources, going beyond the meaning of a public space, and is a place with various values such as history, nature, welfare, culture, and tourism. However, the usage behavior analysis showed that the variety of behaviors expected at Seoullo 7017 did not occur, and elements that hinder the major behaviors were identified. Based on these results, it is necessary to understand the usage behavior of Seoullo 7017 and to establish a plan for improving the spatial system and facilities, so that Seoullo 7017 can be an important place for urban residents and a driving force to revitalize the city.

Sentiment analysis on movie review through building modified sentiment dictionary by movie genre (영역별 맞춤형 감성사전 구축을 통한 영화리뷰 감성분석)

  • Lee, Sang Hoon;Cui, Jing;Kim, Jong Woo
    • Journal of Intelligence and Information Systems / v.22 no.2 / pp.97-113 / 2016
  • Due to the growth of internet data and the rapid development of internet technology, "big data" analysis is actively conducted to analyze enormous volumes of data for various purposes. In recent years in particular, a number of studies have applied text mining techniques to overcome the limitations of existing structured data analysis. Sentiment analysis, a part of text mining, is actively studied as a way to score opinions based on the distribution of word polarity in documents. Sentiment analysis usually uses a sentiment dictionary that contains the positivity and negativity of vocabulary. As part of this line of work, this study constructs a sentiment dictionary customized to a specific data domain. Using a common sentiment dictionary without considering the characteristics of the data domain cannot reflect contextual expressions used only in that domain, so a modified sentiment dictionary customized to the data domain can be expected to improve sentiment analysis efficiency. Therefore, this study suggests a way to construct a customized dictionary that reflects the characteristics of the data domain. Specifically, movie review data are divided by genre, genre-customized dictionaries are constructed, and the performance of the customized dictionaries in sentiment analysis is compared with that of a common sentiment dictionary. IMDb data are chosen as the subject of analysis, and movie reviews are categorized by genre. Six IMDb genres, 'action', 'animation', 'comedy', 'drama', 'horror', and 'sci-fi', are selected. The five highest-ranking and five lowest-ranking movies per genre are selected as the training data set, and two years of movie data from September 2012 to June 2014 are collected as the test data set.
Using the SO-PMI (Semantic Orientation from Point-wise Mutual Information) technique, we build a customized sentiment dictionary per genre and compare prediction accuracy on review ratings. The analysis shows that prediction using customized dictionaries improves prediction accuracy. The overall performance improvement is 2.82% and is statistically significant. In particular, the customized dictionary for 'sci-fi' leads to the highest accuracy improvement among the six genres. Even though this study shows the usefulness of customized dictionaries in sentiment analysis, further studies are required to generalize the results. In this study, we consider only adjectives as additional terms in the customized sentiment dictionary; other parts of speech, such as verbs and adverbs, could be considered to improve sentiment analysis performance. We also need to apply the customized sentiment dictionary to other domains, such as product reviews.
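To make the SO-PMI idea concrete, the sketch below scores a word by how much more strongly it co-occurs with positive seed words than with negative ones, using the general definition SO-PMI(w) = PMI(w, positive seeds) − PMI(w, negative seeds). It is a minimal illustration, not the authors' pipeline: document-level co-occurrence, the seed words, the `smoothing` constant, and the function name `so_pmi` are all assumptions, and the adjective filtering described in the abstract is not modeled.

```python
import math


def so_pmi(word, docs, pos_seeds, neg_seeds, smoothing=0.01):
    """Semantic orientation of `word` from document-level co-occurrence.

    docs      -- list of sets of tokens, one set per review (assumed input shape)
    pos_seeds -- set of known positive words (hypothetical seed list)
    neg_seeds -- set of known negative words (hypothetical seed list)
    smoothing -- small constant that avoids log(0) and division by zero

    PMI(w, pos) - PMI(w, neg) simplifies to
    log2( count(w & pos) * count(neg) / (count(w & neg) * count(pos)) ),
    because the p(w) terms cancel.
    """
    def count(predicate):
        return sum(1 for d in docs if predicate(d))

    pos_docs = count(lambda d: bool(d & pos_seeds))
    neg_docs = count(lambda d: bool(d & neg_seeds))
    word_with_pos = count(lambda d: word in d and bool(d & pos_seeds))
    word_with_neg = count(lambda d: word in d and bool(d & neg_seeds))

    numerator = (word_with_pos + smoothing) * (neg_docs + smoothing)
    denominator = (word_with_neg + smoothing) * (pos_docs + smoothing)
    return math.log2(numerator / denominator)
```

A positive score suggests the word leans positive in the given corpus and a negative score suggests it leans negative. Running the function on a separate corpus per genre would yield genre-specific scores, which is the intuition behind a genre-customized dictionary.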

Analysis of shopping website visit types and shopping pattern (쇼핑 웹사이트 탐색 유형과 방문 패턴 분석)

  • Choi, Kyungbin;Nam, Kihwan
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.85-107 / 2019
  • Online consumers browse products belonging to a particular product line or brand in order to purchase, or simply navigate widely and leave without making a purchase. Research on the behavior and purchases of online consumers has progressed steadily, and related services and applications based on consumer behavior data have been developed in practice. In recent years, customization strategies and recommendation systems have been adopted thanks to the development of big data technology, and attempts are being made to optimize users' shopping experience. However, even with such attempts, it is very unlikely that online consumers who visit a website will actually move on to the purchase stage. This is because online consumers do not visit websites only to purchase products; they use and browse websites differently according to their shopping motives and purposes. Therefore, to understand the behavior of online consumers, it is important to analyze various types of visits, not only visits that lead to purchases. In this study, we performed a clustering analysis of sessions based on the clickstream data of an e-commerce company in order to explain the diversity and complexity of online consumers' search behavior and to classify that behavior into types. For the analysis, we converted more than 8 million page-level data points into visit-level sessions, resulting in a total of over 500,000 website visit sessions. For each visit session, 12 characteristics such as page views, duration, search diversity, and page type concentration were extracted for the clustering analysis. Considering the size of the data set, we performed the analysis using the Mini-Batch K-means algorithm, which has advantages in learning speed and efficiency while maintaining clustering performance similar to that of the K-means algorithm.
The optimal number of clusters was found to be four, and differences in session-level characteristics and purchase rates were identified for each cluster. Online consumers visit a website several times, learn about a product, and then decide to purchase. To analyze the purchasing process across several visits, we constructed visit sequence data for each consumer based on the website navigation patterns derived from the clustering analysis. The visit sequence data consist of the series of visits made until one purchase, and the items in a sequence are the cluster labels derived above. We built separate sequence data for consumers who made purchases and for consumers who only explored products without purchasing during the same period. Sequential pattern mining was then applied to extract frequent patterns from each set of sequence data. The minimum support was set to 10%, and each frequent pattern consists of a sequence of cluster labels. While some patterns were derived from both sets of sequence data, other frequent patterns were derived from only one of them. Through a comparative analysis of the extracted frequent patterns, we found that consumers who made purchases showed a pattern of repeatedly visiting while searching for a specific product before deciding to purchase it. The implication of this study is that we analyze the search types of online consumers using large-scale clickstream data and analyze their patterns to explain the purchasing process from a data-driven point of view. Most studies on typologies of online consumers have focused on the characteristics of each type and on which factors are key in distinguishing the types.
In this study, we classified the behavior of online consumers into types and further analyzed the order in which these types follow one another to form a series of search patterns. In addition, online retailers will be able to try to improve their purchase conversion through marketing strategies and recommendations tailored to the various visit types, and to evaluate the effect of those strategies through changes in consumers' visit patterns.
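The sequential pattern mining step can be illustrated with a small sketch. Each consumer is represented as a sequence of cluster labels, and a pattern counts as frequent when the share of sequences containing it (its support) reaches a minimum threshold such as 10%. The abstract does not name the mining algorithm, so the level-wise search below, the helper names, and the example labels are assumptions for illustration only.

```python
def contains(sequence, pattern):
    """True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    # `label in remaining` advances the iterator, so order is preserved.
    return all(label in remaining for label in pattern)


def support(pattern, sequences):
    """Fraction of sequences that contain the pattern."""
    return sum(contains(s, pattern) for s in sequences) / len(sequences)


def frequent_patterns(sequences, min_support=0.10, max_len=3):
    """Level-wise search for frequent sequential patterns.

    Only frequent patterns are extended, because every prefix of a
    frequent pattern must itself be frequent.
    """
    labels = sorted({label for s in sequences for label in s})
    frequent = {}
    candidates = [(label,) for label in labels]
    for _ in range(max_len):
        kept = []
        for pattern in candidates:
            sup = support(pattern, sequences)
            if sup >= min_support:
                frequent[pattern] = sup
                kept.append(pattern)
        if not kept:
            break
        candidates = [p + (label,) for p in kept for label in labels]
    return frequent
```

For example, with hypothetical cluster labels "A" to "D", the sequences `[("A", "B", "D"), ("A", "D"), ("B", "C"), ("A", "C", "D")]` give the pattern `("A", "D")` a support of 0.75. Running the search separately on purchaser and non-purchaser sequences and comparing the resulting pattern sets mirrors the comparison described in the abstract.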

Pandemics Era, A Study on the Viewers' Responses of Medical Drama through Text Mining -Focused on 'Hospital Playlist'- (팬데믹 시대, 텍스트 마이닝을 통한 의학드라마의 시청자 반응 연구-<슬기로운 의사생활>을 중심으로-)

  • Ahn, Sunghun;Oh, SeJong;Jeong, Dalyoung
    • The Journal of the Convergence on Culture Technology / v.6 no.4 / pp.385-389 / 2020
  • The medical drama has developed into a story centered on 'people', raising viewers' sympathy. The story of the drama is the true life story of doctors, patients, and families. It is also a story that reminds viewers of 'a little special day of our ordinary people'. The songs played and sung by the five characters in the drama became a factor that stimulates nostalgia and increases immersion. The highest viewer rating was 14.1%, and 51,584 posts were registered on blogs alone. According to the big data analysis, the related words were 'Wise OST', 'Album Name', 'Artist Name', 'Two Hours in a row', 'Record', 'Remake', 'OST Revealed', 'Advertisement Revenue', 'Playlist', 'Aroha' and 'Cho Jung-seok'. The commercialization of medical dramas includes 'Sales of Drama OST Albums', 'Organizing Online Live Concerts (PPL in Advertising)', 'Publishing Piano Music', 'Picture of People-Oriented Photography', 'Making Music Video Editing Drama Highlight', 'YouTube Upload Profits', 'Mask' and 'Disinfectant'. It is predicted that touching stories about COVID-19 and charming humanity will unfold. As a limitation, future research will require analysis of various works by genre and attempts to analyze consumer values by industry.

Bibliometric Analysis on Studies of Korean Intangible Cultural Property Dance : Focusing on Events in the Seoul Area (한국무형문화재 춤 연구의 계량서지학적 분석 : 서울지역 종목을 중심으로)

  • Yoo, Ji-Young;Kim, Jee-Young;Baek, Hyun-Soon
    • Journal of Korea Entertainment Industry Association / v.13 no.4 / pp.139-147 / 2019
  • This study conducted a bibliometric analysis of studies on Korean intangible cultural heritage dance in the Seoul area, aiming to identify research tendencies. To this end, a list of studies on 24 Korean intangible cultural heritage dance events was collected and analyzed with the big data analysis solution TEXTOM, using text mining as the analysis method. The results showed, first, that most of the studies were conducted on the Bongsan Talchum, and studies on teaching and learning methods were especially active. On the other hand, there were few studies on Gut, confirming the need to vitalize research in that area. Second, in studies on Cheoyongmu events, the term 'contemporary Cheoyongmu' was used frequently. This can be considered a meaningful use of terminology with regard to intangible cultural heritage dance that has changed throughout history. Accordingly, research that can reveal the typicality of dance is called for in studies of other events as well. Third, there was a notable amount of research that compared and analyzed dance styles with regard to the Munmyoilmu. This was seen as the result of discussions in the Korean dance world regarding archetypal dance styles expanding into academic discussions. It was thus revealed that such discussions can lead to academic outcomes regardless of whether the matter is right or wrong.

Comparing Corporate and Public ESG Perceptions Using Text Mining and ChatGPT Analysis: Based on Sustainability Reports and Social Media (텍스트마이닝과 ChatGPT 분석을 활용한 기업과 대중의 ESG 인식 비교: 지속가능경영보고서와 소셜미디어를 기반으로)

  • Jae-Hoon Choi;Sung-Byung Yang;Sang-Hyeak Yoon
    • Journal of Intelligence and Information Systems / v.29 no.4 / pp.347-373 / 2023
  • As the significance of ESG (Environmental, Social, and Governance) management grows in driving sustainable growth, this study examines and compares ESG trends and interrelationships from both corporate and societal viewpoints. Employing a combination of Latent Dirichlet Allocation (LDA) topic modeling and Semantic Network Analysis (SNA), we analyzed sustainability reports alongside corresponding social media datasets. Additionally, an in-depth examination of social media content was conducted using Joint Sentiment Topic Modeling (JST), further enriched by SNA. Complementing the text mining analysis with the assistance of ChatGPT, this study identified 25 different ESG topics. It highlighted differences between companies aiming to avoid risks and build trust, and the general public's diverse concerns like investment options and working conditions. Key terms like 'greenwashing,' 'serious accidents,' and 'boycotts' show that many people doubt how companies handle ESG issues. The findings from this study set the foundation for a plan that serves key ESG groups, including businesses, government agencies, customers, and investors. This study also provides guidance for creating more trustworthy and effective ESG strategies, helping to direct the discussion on ESG effectiveness.

A Study of protective measures of the source program for the development of the Internet of Things (IoT): Protection of the program as well as plagiarism research (사물인터넷(IoT)발전을 위한 소스프로그램 보호방안 연구: 프로그램의 보호와 유사표절 연구)

  • Lee, Jong-Sik
    • Journal of the Korea Convergence Society / v.9 no.4 / pp.31-45 / 2018
  • The recent dramatic development of computer technology related to internet technology has intensified disputes over software for computers and smart devices. Research on software has flourished as fierce competition among nations for software development has become a political issue. In particular, industrial growth in ethernet-based big data and the IoT (Internet of Things) has promoted the building and development of open source programs based on Java, Xcode, and C. Under these circumstances, software piracy has become an issue despite the basic security policy protecting intellectual property rights in software, and it is thus of substantial importance to protect the originality rights of source program licenses. However, another issue with protecting developers' source technology is that it may hinder the advancement of industry and culture achieved by developing programs. This study discusses ways of enhancing the legal stability of IoT application program development and reinforcing the precision of program plagiarism inspection by analyzing source programs with a newly introduced text mining technique, and thus suggests an alternative way of protecting against the infringement of personal information caused by program duplication.