• Title/Summary/Keyword: Document classification

A study on the classification of research topics based on COVID-19 academic research using Topic modeling (토픽모델링을 활용한 COVID-19 학술 연구 기반 연구 주제 분류에 관한 연구)

  • Yoo, So-yeon;Lim, Gyoo-gun
    • Journal of Intelligence and Information Systems / v.28 no.1 / pp.155-174 / 2022
  • From January 2020 to October 2021, more than 500,000 academic studies related to COVID-19 (the disease caused by severe acute respiratory syndrome coronavirus 2) were published. This rapid increase in the number of COVID-19 papers places time and technical constraints on healthcare professionals and policy makers who need to find important research quickly. In this study, we therefore propose a method of extracting useful information from the text of an extensive body of literature using the LDA and Word2vec algorithms. Papers matching the keywords of interest were extracted from the COVID-19 papers, and their detailed topics were identified. The data came from the CORD-19 data set on Kaggle, a free academic resource that was prepared by major research groups and the White House in response to the COVID-19 pandemic and is updated weekly. The research method has two main parts. First, 41,062 articles were obtained by filtering and pre-processing the abstracts of 47,110 academic papers with full text. For this purpose, the number of COVID-19 publications per year was analyzed through exploratory data analysis in Python, and the top 10 journals with active research were identified. The LDA and Word2vec algorithms were used to derive COVID-19 research topics, and after related words were analyzed, their similarity was measured. Second, papers containing 'vaccine' and 'treatment' were extracted from the topics derived from all papers, yielding 4,555 papers related to 'vaccine' and 5,971 papers related to 'treatment'. For each set of collected papers, detailed topics were analyzed using the LDA and Word2vec algorithms, and clustering after PCA dimension reduction was applied so that groups of papers with similar themes could be visualized with the t-SNE algorithm. A noteworthy result of this study is that topics that did not appear among the topics derived from all COVID-19 papers were found in the topic modeling results for each research keyword. For example, topic modeling of the papers related to 'vaccine' extracted a new topic, Topic 05 'neutralizing antibodies'. A neutralizing antibody is an antibody that protects cells from infection when a virus enters the body, and it is said to play an important role in the production of therapeutic agents and in vaccine development. Likewise, extracting topics from the papers related to 'treatment' revealed a new topic, Topic 05 'cytokine'. A cytokine storm occurs when the body's immune cells, instead of defending against attack, attack normal cells. Hidden topics that could not be found across the entire set of papers were thus uncovered by classifying papers according to keywords and running topic modeling on each group. In this study, we proposed a method of extracting topics from a large body of literature using the LDA algorithm and of extracting similar words using Skip-gram, the Word2vec model that predicts the surrounding words from a central word. By combining the LDA and Word2vec models, we sought better performance by capturing both the relationship between documents and LDA topics and the relationships learned by Word2vec. In addition, we presented a method for intuitively classifying documents: after clustering through PCA dimension reduction, the t-SNE technique groups documents with similar themes into a structured organization. In a situation where the efforts of many researchers to overcome COVID-19 cannot keep up with the rapid publication of related academic papers, we hope this approach will save healthcare professionals and policy makers precious time and effort and help them gain new insights quickly. It is also expected to serve as basic data for researchers exploring new research directions.
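The Skip-gram idea mentioned in the abstract above can be illustrated with a minimal sketch. It only shows how (center word, context word) training pairs are formed; it does not train embeddings and is not the authors' implementation. The function name `skipgram_pairs`, the window size, and the sample tokens are assumptions made for illustration.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) pairs of the kind a Skip-gram model learns from."""
    pairs = []
    for i, center in enumerate(tokens):
        # Context = words within `window` positions on either side of the center.
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical tokenized fragment (not taken from CORD-19).
tokens = ["neutralizing", "antibodies", "protect", "cells"]
pairs = skipgram_pairs(tokens, window=1)
for center, context in pairs:
    print(center, "->", context)
```

Each pair asks the model to predict a neighboring word from the central word, which is why words that share contexts end up with similar vectors.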

  • A Literature Review and Classification of Recommender Systems on Academic Journals (추천시스템관련 학술논문 분석 및 분류)

    • Park, Deuk-Hee;Kim, Hyea-Kyeong;Choi, Il-Young;Kim, Jae-Kyeong
      • Journal of Intelligence and Information Systems / v.17 no.1 / pp.139-152 / 2011
    • Recommender systems have been an important research field since the first paper on collaborative filtering appeared in the mid-1990s. In general, recommender systems are defined as support systems that help users find information, products, or services (such as books, movies, music, digital products, web sites, and TV programs) by aggregating and analyzing suggestions from other users (that is, reviews from various authorities) and user attributes. Although academic research on recommender systems has increased significantly over the last ten years, more research is needed before it can be applied in real-world situations, because the field is still broad and less mature than other research fields. Accordingly, the existing articles on recommender systems need to be reviewed with a view to the next generation of recommender systems. Given the nature of this research, however, it is not easy to confine it to specific disciplines. We therefore reviewed all articles on recommender systems published from 2001 to 2010 in 37 journals, which were selected from the top 125 journals of the MIS Journal Rankings. The literature search was based on the descriptors "Recommender system", "Recommendation system", "Personalization system", "Collaborative filtering" and "Contents filtering". The full text of each article was reviewed to eliminate articles that were not actually related to recommender systems. Many items were excluded as unfit for our research, including conference papers, master's and doctoral dissertations, textbooks, unpublished working papers, non-English publications, and news. We classified articles by year of publication, journal, recommendation field, and data mining technique. The 187 articles were classified into eight recommendation fields (book, document, image, movie, music, shopping, TV program, and others) and eight data mining techniques (association rule, clustering, decision tree, k-nearest neighbor, link analysis, neural network, regression, and other heuristic methods). The results have several significant implications. First, based on previous publication rates, interest in recommender system research will grow significantly in the future. Second, 49 articles are related to movie recommendation, whereas image and TV program recommendation appear in only 6 articles. This imbalance results from the easy availability of the MovieLens data set, so data sets for other fields need to be prepared. Third, social network analysis has recently been used in various applications, but studies on recommender systems that use social network analysis are scarce. We expect that new recommendation approaches using social network analysis will be developed, and that evaluating recommender system research from this perspective will be an interesting area for further research. By examining the published literature, this study identifies trends in recommender system research and gives practitioners and researchers insight and future direction. We hope that it helps anyone interested in recommender systems gain insight for future research.

    A Study on the Distribution, Contents and Types of Stone Inscription of Wuyi-Gugok in China (중국 무이구곡 바위글씨(石刻)의 분포와 내용 및 유형에 관한 연구)

    • Rho, Jae-Hyun;Cheng, Zhao-Xia;Kim, Hong-Gyun
      • Journal of the Korean Institute of Traditional Landscape Architecture / v.38 no.1 / pp.115-131 / 2020
    • Through literature research and field investigation, this paper studies the distribution, morphology, and typology of the visually perceptible stone inscriptions in Wuyi-Gugok, China. The results are as follows. First, there are 350 stone inscriptions in total from the 1st Gok to the 9th Gok of Wuyi-Gugok. Second, the distribution analysis confirms 74 (21.2%) stone inscriptions in the 5th Gok, 67 (19.2%) in the 6th Gok, 65 (18.6%) in the 1st Gok, 60 (17.2%) in the 2nd Gok, and 53 (15.2%) in the 4th Gok. These five Goks contain 319 (91.1%) of the stone inscriptions and therefore have a rich cultural landscape. Third, according to the survey, the numbers of stone inscriptions at individual sites are 41 (22.6%) at Sugwangseok in the 1st Gok, 29 (8.3%) at Homagan of Cheonyubong in the 6th Gok, 23 (6.6%) at Jesiam in the 4th Gok, 22 (6.3%) at Nyeongam in the 2nd Gok, 21 (6%) at Hyangseongam in the 6th Gok, 19 (5.4%) at Unwa in the 5th Gok, 18 (5.1%) at Bokhoam in the 5th Gok, 17 (4.9%) at Eunbyeongbong in the 5th Gok, 14 (4%) at Daejangbong in the 4th Gok, and 12 (3.4%) each at Daewangbong in the 1st Gok and Geumgokam in the 4th Gok. Thus, a total of 228 (65.1%) stone inscriptions are concentrated at these 11 sites, which reflects the popularity and cultural value of these rocks. Fourth, the stone inscriptions of Wuyi-Gugok praise the landforms and the topographical and geological landscape of Mount Wuyi. They mainly describe the scenic name of each Gok related to Zhu Xi's Gugok culture and commemorate Zhu Xi's traces in this sacred land of Neo-Confucian culture. They also record the Confucian edification of Mencius' thought, Muigun (武夷君), and the myths and legends related to the site names of Mount Wuyi, which evoke the worldview of a celestial paradise where the gods live and the fairyland of the land of peach blossoms. In addition, the inscriptions show that the historical and cultural landscape is very diverse: it is full of colorful history, myths, and legends, including allusions to Confucian, Buddhist, and Taoist celebrities and to ancient things related to the traditional culture of China. Fifth, based on their content, the stone inscriptions of Wuyi-Gugok are classified as scenery name inscriptions, scene praise inscriptions, travel record inscriptions, event record inscriptions, philosophy inscriptions, emotional expression inscriptions, religion inscriptions, inscriptions for auspiciousness, slogan and ambition inscriptions, and official document notice inscriptions. Among these there are 102 (29.1%) scene praise inscriptions, 93 (26.6%) scenery name inscriptions, and 61 (17.4%) travel record inscriptions. The stone inscriptions of Wuyi-Gugok are thus characterized by a special emphasis on scenery names, landscape praise, and commemorative tours. Sixth, the analysis of intertextuality between the 「Figure of Wuyi-Gugok」 and the Wuyi-Gugok rock letters found that propagation between the media occurred mostly through quotation, and that intermediality was maintained through extension, repetition, extension, and compression.
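The two aggregate shares reported in this abstract can be re-derived from the counts it lists. The sketch below is only an arithmetic check under the assumption that both shares are taken over the stated total of 350 inscriptions; the variable names are illustrative.

```python
# Total number of stone inscriptions stated in the abstract.
TOTAL = 350

# Counts for the five Goks with the most inscriptions, as listed in the abstract.
gok_counts = {"5th": 74, "6th": 67, "1st": 65, "2nd": 60, "4th": 53}
top_five = sum(gok_counts.values())
top_five_share = round(100 * top_five / TOTAL, 1)

# Counts for the 11 individual sites, in the order listed in the abstract.
site_counts = [41, 29, 23, 22, 21, 19, 18, 17, 14, 12, 12]
top_sites = sum(site_counts)
top_sites_share = round(100 * top_sites / TOTAL, 1)

print(top_five, top_five_share)    # 319 91.1
print(top_sites, top_sites_share)  # 228 65.1
```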

    An Analytical Approach Using Topic Mining for Improving the Service Quality of Hotels (호텔 산업의 서비스 품질 향상을 위한 토픽 마이닝 기반 분석 방법)

    • Moon, Hyun Sil;Sung, David;Kim, Jae Kyeong
      • Journal of Intelligence and Information Systems / v.25 no.1 / pp.21-41 / 2019
    • Thanks to the rapid development of information technologies, the data available on the Internet have grown rapidly. In this era of big data, many studies have attempted to offer insights and demonstrate the effects of data analysis. In the tourism and hospitality industry, many firms and studies have paid attention to online reviews on social media because of their large influence on customers. As tourism is an information-intensive industry, the effect of these information networks on social media platforms is more pronounced than that of any other type of media. However, there are limits to the service quality improvements that can be made based on opinions on social media platforms. Users express their opinions as text, images, and so on, so the raw data sets from these reviews are unstructured. Moreover, these data sets are too big for people to extract new information and hidden knowledge from them unaided. To use them for business intelligence and analytics applications, proper big data techniques such as Natural Language Processing and data mining are needed. This study suggests an analytical approach that yields insights directly from these reviews to improve the service quality of hotels. Our proposed approach consists of topic mining, to extract the topics contained in the reviews, and decision tree modeling, to explain the relationship between topics and ratings. Topic mining refers to a method for finding groups of words in a collection of documents that represent each document. Among several topic mining methods, we adopted the Latent Dirichlet Allocation (LDA) algorithm, which is considered the most universal. However, LDA alone is not enough to find insights that can improve service quality, because it cannot find the relationship between topics and ratings. To overcome this limitation, we also use the Classification and Regression Tree (CART) method, which is a kind of decision tree technique. Through the CART method, we can find which topics are related to positive or negative ratings of a hotel and visualize the results. This study thus aims to present an analytical approach for improving hotel service quality from unstructured review data sets. Through experiments on four hotels in Hong Kong, we identify the strengths and weaknesses of each hotel's services and suggest improvements to aid customer satisfaction. From positive reviews in particular, we find what these hotels should maintain in their service quality. For example, positive reviews of one hotel show that, compared with the other hotels, it has a good location and room condition. In contrast, negative reviews show what the hotels should change in their services. For example, one hotel should improve the soundproofing of its rooms. These results indicate that our approach is useful for finding insights into the service quality of hotels. That is, from an enormous volume of review data, our approach can provide practical suggestions for hotel managers to improve their service quality. In the past, studies on improving service quality relied on customer surveys or interviews. However, these methods are often costly and time consuming, and the results may be distorted by biased sampling or untrustworthy answers. The proposed approach obtains honest feedback directly from customers' online reviews and draws insights through a type of big data analysis, so it is a more useful tool that overcomes the limitations of surveys and interviews. Moreover, our approach can easily obtain service quality information for other hotels or services in the tourism industry, because it needs only open online reviews and ratings as input data. Furthermore, its performance should improve if other structured and unstructured data sources are added.
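To make the link between topics and ratings concrete, the following minimal sketch computes the Gini impurity decrease for one candidate split, which is the criterion CART commonly uses for classification. It is not the authors' implementation: the function names, the "noise" topic, and the six sample reviews are hypothetical, and a real analysis would first obtain topic assignments from LDA.

```python
from typing import List, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of binary labels (1 = positive rating, 0 = negative)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def split_gain(reviews: List[Tuple[bool, int]]) -> float:
    """Impurity decrease from splitting reviews on whether a topic is present."""
    parent = [rating for _, rating in reviews]
    with_topic = [rating for has_topic, rating in reviews if has_topic]
    without_topic = [rating for has_topic, rating in reviews if not has_topic]
    n = len(parent)
    weighted = (len(with_topic) / n) * gini(with_topic) \
        + (len(without_topic) / n) * gini(without_topic)
    return gini(parent) - weighted


# Hypothetical data: (review mentions the topic "noise", rating is positive).
reviews = [(True, 0), (True, 0), (True, 1), (False, 1), (False, 1), (False, 1)]
gain = split_gain(reviews)
print(f"impurity decrease for 'noise' split: {gain:.4f}")
```

A tree learner would compute this gain for every topic and split on the one with the largest decrease, so topics near the root are those most strongly associated with positive or negative ratings.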