• Title/Summary/Keyword: document clustering

Search Result 223, Processing Time 0.024 seconds

A study on the classification of research topics based on COVID-19 academic research using Topic modeling (토픽모델링을 활용한 COVID-19 학술 연구 기반 연구 주제 분류에 관한 연구)

  • Yoo, So-yeon;Lim, Gyoo-gun
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.1
    • /
    • pp.155-174
    • /
    • 2022
  • From January 2020 to October 2021, more than 500,000 academic studies related to COVID-19 (Coronavirus-2, a fatal respiratory syndrome) have been published. The rapid increase in the number of papers related to COVID-19 is putting time and technical constraints on healthcare professionals and policy makers to quickly find important research. Therefore, in this study, we propose a method of extracting useful information from text data of extensive literature using LDA and Word2vec algorithm. Papers related to keywords to be searched were extracted from papers related to COVID-19, and detailed topics were identified. The data used the CORD-19 data set on Kaggle, a free academic resource prepared by major research groups and the White House to respond to the COVID-19 pandemic, updated weekly. The research methods are divided into two main categories. First, 41,062 articles were collected through data filtering and pre-processing of the abstracts of 47,110 academic papers including full text. For this purpose, the number of publications related to COVID-19 by year was analyzed through exploratory data analysis using a Python program, and the top 10 journals under active research were identified. LDA and Word2vec algorithm were used to derive research topics related to COVID-19, and after analyzing related words, similarity was measured. Second, papers containing 'vaccine' and 'treatment' were extracted from among the topics derived from all papers, and a total of 4,555 papers related to 'vaccine' and 5,971 papers related to 'treatment' were extracted. did For each collected paper, detailed topics were analyzed using LDA and Word2vec algorithms, and a clustering method through PCA dimension reduction was applied to visualize groups of papers with similar themes using the t-SNE algorithm. A noteworthy point from the results of this study is that the topics that were not derived from the topics derived for all papers being researched in relation to COVID-19 (

    ) were the topic modeling results for each research topic (
    ) was found to be derived from For example, as a result of topic modeling for papers related to 'vaccine', a new topic titled Topic 05 'neutralizing antibodies' was extracted. A neutralizing antibody is an antibody that protects cells from infection when a virus enters the body, and is said to play an important role in the production of therapeutic agents and vaccine development. In addition, as a result of extracting topics from papers related to 'treatment', a new topic called Topic 05 'cytokine' was discovered. A cytokine storm is when the immune cells of our body do not defend against attacks, but attack normal cells. Hidden topics that could not be found for the entire thesis were classified according to keywords, and topic modeling was performed to find detailed topics. In this study, we proposed a method of extracting topics from a large amount of literature using the LDA algorithm and extracting similar words using the Skip-gram method that predicts the similar words as the central word among the Word2vec models. The combination of the LDA model and the Word2vec model tried to show better performance by identifying the relationship between the document and the LDA subject and the relationship between the Word2vec document. In addition, as a clustering method through PCA dimension reduction, a method for intuitively classifying documents by using the t-SNE technique to classify documents with similar themes and forming groups into a structured organization of documents was presented. In a situation where the efforts of many researchers to overcome COVID-19 cannot keep up with the rapid publication of academic papers related to COVID-19, it will reduce the precious time and effort of healthcare professionals and policy makers, and rapidly gain new insights. We hope to help you get It is also expected to be used as basic data for researchers to explore new research directions.

  • A Literature Review and Classification of Recommender Systems on Academic Journals (추천시스템관련 학술논문 분석 및 분류)

    • Park, Deuk-Hee;Kim, Hyea-Kyeong;Choi, Il-Young;Kim, Jae-Kyeong
      • Journal of Intelligence and Information Systems
      • /
      • v.17 no.1
      • /
      • pp.139-152
      • /
      • 2011
    • Recommender systems have become an important research field since the emergence of the first paper on collaborative filtering in the mid-1990s. In general, recommender systems are defined as the supporting systems which help users to find information, products, or services (such as books, movies, music, digital products, web sites, and TV programs) by aggregating and analyzing suggestions from other users, which mean reviews from various authorities, and user attributes. However, as academic researches on recommender systems have increased significantly over the last ten years, more researches are required to be applicable in the real world situation. Because research field on recommender systems is still wide and less mature than other research fields. Accordingly, the existing articles on recommender systems need to be reviewed toward the next generation of recommender systems. However, it would be not easy to confine the recommender system researches to specific disciplines, considering the nature of the recommender system researches. So, we reviewed all articles on recommender systems from 37 journals which were published from 2001 to 2010. The 37 journals are selected from top 125 journals of the MIS Journal Rankings. Also, the literature search was based on the descriptors "Recommender system", "Recommendation system", "Personalization system", "Collaborative filtering" and "Contents filtering". The full text of each article was reviewed to eliminate the article that was not actually related to recommender systems. Many of articles were excluded because the articles such as Conference papers, master's and doctoral dissertations, textbook, unpublished working papers, non-English publication papers and news were unfit for our research. We classified articles by year of publication, journals, recommendation fields, and data mining techniques. The recommendation fields and data mining techniques of 187 articles are reviewed and classified into eight recommendation fields (book, document, image, movie, music, shopping, TV program, and others) and eight data mining techniques (association rule, clustering, decision tree, k-nearest neighbor, link analysis, neural network, regression, and other heuristic methods). The results represented in this paper have several significant implications. First, based on previous publication rates, the interest in the recommender system related research will grow significantly in the future. Second, 49 articles are related to movie recommendation whereas image and TV program recommendation are identified in only 6 articles. This result has been caused by the easy use of MovieLens data set. So, it is necessary to prepare data set of other fields. Third, recently social network analysis has been used in the various applications. However studies on recommender systems using social network analysis are deficient. Henceforth, we expect that new recommendation approaches using social network analysis will be developed in the recommender systems. So, it will be an interesting and further research area to evaluate the recommendation system researches using social method analysis. This result provides trend of recommender system researches by examining the published literature, and provides practitioners and researchers with insight and future direction on recommender systems. We hope that this research helps anyone who is interested in recommender systems research to gain insight for future research.

    Research Framework for International Franchising (국제프랜차이징 연구요소 및 연구방향)

    • Kim, Ju-Young;Lim, Young-Kyun;Shim, Jae-Duck
      • Journal of Global Scholars of Marketing Science
      • /
      • v.18 no.4
      • /
      • pp.61-118
      • /
      • 2008
    • The purpose of this research is to construct research framework for international franchising based on existing literature and to identify research components in the framework. Franchise can be defined as management styles that allow franchisee use various management assets of franchisor in order to make or sell product or service. It can be divided into product distribution franchise that is designed to sell products and business format franchise that is designed for running it as business whatever its form is. International franchising can be defined as a way of internationalization of franchisor to foreign country by providing its business format or package to franchisee of host country. International franchising is growing fast for last four decades but academic research on this is quite limited. Especially in Korea, research about international franchising is carried out on by case study format with single case or empirical study format with survey based on domestic franchise theory. Therefore, this paper tries to review existing literature on international franchising research, providing research framework, and then stimulating new research on this field. International franchising research components include motives and environmental factors for decision of expanding to international franchising, entrance modes and development plan for international franchising, contracts and management strategy of international franchising, and various performance measures from different perspectives. First, motives of international franchising are fee collection from franchisee. Also it provides easier way to expanding to foreign country. The other motives including increase total sales volume, occupying better strategic position, getting quality resources, and improving efficiency. Environmental factors that facilitating international franchising encompasses economic condition, trend, and legal or political factors in host and/or home countries. In addition, control power and risk management capability of franchisor plays critical role in successful franchising contract. Final decision to enter foreign country via franchising is determined by numerous factors like history, size, growth, competitiveness, management system, bonding capability, industry characteristics of franchisor. After deciding to enter into foreign country, franchisor needs to set entrance modes of international franchising. Within contractual mode, there are master franchising and area developing franchising, licensing, direct franchising, and joint venture. Theories about entrance mode selection contain concepts of efficiency, knowledge-based approach, competence-based approach, agent theory, and governance cost. The next step after entrance decision is operation strategy. Operation strategy starts with selecting a target city and a target country for franchising. In order to finding, screening targets, franchisor needs to collect information about candidates. Critical information includes brand patent, commercial laws, regulations, market conditions, country risk, and industry analysis. After selecting a target city in target country, franchisor needs to select franchisee, in other word, partner. The first important criteria for selecting partners are financial credibility and capability, possession of real estate. And cultural similarity and knowledge about franchisor and/or home country are also recognized as critical criteria. The most important element in operating strategy is legal document between franchisor and franchisee with home and host countries. Terms and conditions in legal documents give objective information about characteristics of franchising agreement for academic research. Legal documents have definitions of terminology, territory and exclusivity, agreement of term, initial fee, continuing fees, clearing currency, and rights about sub-franchising. Also, legal documents could have terms about softer elements like training program and operation manual. And harder elements like law competent court and terms of expiration. Next element in operating strategy is about product and service. Especially for business format franchising, product/service deliverable, benefit communicators, system identifiers (architectural features), and format facilitators are listed for product/service strategic elements. Another important decision on product/service is standardization vs. customization. The rationale behind standardization is cost reduction, efficiency, consistency, image congruence, brand awareness, and competitiveness on price. Also standardization enables large scale R&D and innovative change in management style. Another element in operating strategy is control management. The simple way to control franchise contract is relying on legal terms, contractual control system. There are other control systems, administrative control system and ethical control system. Contractual control system is a coercive source of power, but franchisor usually doesn't want to use legal power since it doesn't help to build up positive relationship. Instead, self-regulation is widely used. Administrative control system uses control mechanism from ordinary work relationship. Its main component is supporting activities to franchisee and communication method. For example, franchisor provides advertising, training, manual, and delivery, then franchisee follows franchisor's direction. Another component is building franchisor's brand power. The last research element is performance factor of international franchising. Performance elements can be divided into franchisor's performance and franchisee's performance. The conceptual performance measures of franchisor are simple but not easy to obtain objectively. They are profit, sale, cost, experience, and brand power. The performance measures of franchisee are mostly about benefits of host country. They contain small business development, promotion of employment, introduction of new business model, and level up technology status. There are indirect benefits, like increase of tax, refinement of corporate citizenship, regional economic clustering, and improvement of international balance. In addition to those, host country gets socio-cultural change other than economic effects. It includes demographic change, social trend, customer value change, social communication, and social globalization. Sometimes it is called as westernization or McDonaldization of society. In addition, the paper reviews on theories that have been frequently applied to international franchising research, such as agent theory, resource-based view, transaction cost theory, organizational learning theory, and international expansion theories. Resource based theory is used in strategic decision based on resources, like decision about entrance and cooperation depending on resources of franchisee and franchisor. Transaction cost theory can be applied in determination of mutual trust or satisfaction of franchising players. Agent theory tries to explain strategic decision for reducing problem caused by utilizing agent, for example research on control system in franchising agreements. Organizational Learning theory is relatively new in franchising research. It assumes organization tries to maximize performance and learning of organization. In addition, Internalization theory advocates strategic decision of direct investment for removing inefficiency of market transaction and is applied in research on terms of contract. And oligopolistic competition theory is used to explain various entry modes for international expansion. Competency theory support strategic decision of utilizing key competitive advantage. Furthermore, research methodologies including qualitative and quantitative methodologies are suggested for more rigorous international franchising research. Quantitative research needs more real data other than survey data which is usually respondent's judgment. In order to verify theory more rigorously, research based on real data is essential. However, real quantitative data is quite hard to get. The qualitative research other than single case study is also highly recommended. Since international franchising has limited number of applications, scientific research based on grounded theory and ethnography study can be used. Scientific case study is differentiated with single case study on its data collection method and analysis method. The key concept is triangulation in measurement, logical coding and comparison. Finally, it provides overall research direction for international franchising after summarizing research trend in Korea. International franchising research in Korea has two different types, one is for studying Korean franchisor going overseas and the other is for Korean franchisee of foreign franchisor. Among research on Korean franchisor, two common patterns are observed. First of all, they usually deal with success story of one franchisor. The other common pattern is that they focus on same industry and country. Therefore, international franchise research needs to extend their focus to broader subjects with scientific research methodology as well as development of new theory.

    • PDF