• Title/Summary/Keyword: Keywords Similarity

Search Result 89, Processing Time 0.026 seconds

Design and Implementation of Web Directory Engine Using Dynamic Category Hierarchy (동적분류에 의한 주제별 웹 검색엔진의 설계 및 구현)

  • Choi Bum-Ghi;Park Sun;Park Tae-Su;Song Jae-Won;Lee Ju-Hong
    • Journal of Internet Computing and Services
    • /
    • v.7 no.2
    • /
    • pp.71-80
    • /
    • 2006
  • In web search engines, there are two main methods: directory searching and keyword searching. Keyword searching shows high recall rate but tends to come up with too many search results to find which users want to see the pages. Directory searching has also a difficulty to find the pages that users want in case of selecting improper category without knowing the exact category, that is, it shows high precision rates but low recall rates. We designed and implemented a new web search engine to resolve the problems of directory search method. It regards a category as a fuzzy set which contains keywords and calculate the degree of inclusion between categories. The merit of this method is to enhance the recall rate of directory searching by expanding subcategories on the basis of similarity.

  • PDF

A Study on the Analysis of Intellectual Structure of Korean Veterinary Sciences (국내 수의과학 분야의 지적 구조 분석에 관한 연구)

  • Cho, Hyun-Yang
    • Journal of Information Management
    • /
    • v.43 no.2
    • /
    • pp.43-66
    • /
    • 2012
  • The purpose of this study is to see the intellectual structure in the field of veterinary sciences in Korea, using author profiling analysis(APA), a bibliometric approach. Three journals are selected on the basis of citation data, exchanging most citations with Korean Journal of Veterinary. And then, 50 authors who published most articles at selected journals during the given period of time were chosen. The analysis of similarity and dissimilarity among authors by comparing co-word appearance patterns from article title, abstracts, and keywords was made. Authors can be grouped 11 minor clusters under 4 major clusters, depending on their interests in the area of veterinary sciences in Korea. The subjects for each cluster at the veterinary sciences are decided by the matching the keyword, representing author's research interest. As a result, it is possible to figure out the current research trends and the researcher network in the field of veterinary sciences.

Analyzing data-related policy programs in Korea using text mining and network cluster analysis (텍스트 마이닝과 네트워크 군집 분석을 활용한 한국의 데이터 관련 정책사업 분석)

  • Sungjun Choi;Kiyoon Shin;Yoonhwan Oh
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.28 no.6
    • /
    • pp.63-81
    • /
    • 2023
  • This study endeavors to classify and categorize similar policy programs through network clustering analysis, using textual information from data-related policy programs in Korea. To achieve this, descriptions of data-related budgetary programs in South Korea in 2022 were collected, and keywords from the program contents were extracted. Subsequently, the similarity between each program was derived using TF-IDF, and policy program network was constructed accordingly. Following this, the structural characteristics of the network were analyzed, and similar policy programs were clustered and categorized through network clustering. Upon analyzing a total of 97 programs, 7 major clusters were identified, signifying that programs with analogous themes or objectives were categorized based on application area or services utilizing data. The findings of this research illuminate the current status of data-related policy programs in Korea, providing policy implications for a strategic approach to planning future national data strategies and programs, and contributing to the establishment of evidence-based policies.

A Study on the Design of Case-based Reasoning Office Knowledge Recommender System for Office Professionals (사례기반추론을 이용한 사무지식 추천시스템)

  • Kim, Myong-Ok;Na, Jung-Ah
    • Journal of Intelligence and Information Systems
    • /
    • v.17 no.3
    • /
    • pp.131-146
    • /
    • 2011
  • It is becoming more essential than ever for office professionals to become competent in information collection/gathering and problem solving in today's global business society. In particular, office professionals do not only assist simple chores but are also forced to make decisions as quickly and efficiently as possible in problematic situations that can end in either profit or loss to their company. Since office professionals rely heavily on their tacit knowledge to solve problems that arise in everyday business situations, it is truly helpful and efficient to refer to similar business cases from the past and share or reuse such previous business knowledge for better performance results. Case-based reasoning(CBR) is a problem-solving method which utilizes previous similar cases to solve problems. Through CBR, the closest case to the current business situation can be searched and retrieved from the case or knowledge base and can be referred to for a new solution. This reduces the time and resources needed and increase success probability. The main purpose of this study is to design a system called COKRS(Case-based reasoning Office Knowledge Recommender System) and develop a prototype for it. COKRS manages cases and their meta data, accepts key words from the user and searches the casebase for the most similar past case to the input keyword, and communicates with users to collect information about the quality of the case provided and continuously apply the information to update values on the similarity table. Core concepts like system architecture, definition of a case, meta database, similarity table have been introduced, and also an algorithm to retrieve all similar cases from past work history has also been proposed. In this research, a case is best defined as a work experience in office administration. However, defining a case in office administration was not an easy task in reality. We surveyed 10 office professionals in order to get an idea of how to define a case in office administration and found out that in most cases any type of office work is to be recorded digitally and/or non-digitally. Therefore, we have defined a record or document case as for COKRS. Similarity table was composed of items of the result of job analysis for office professionals conducted in a previous research. Values between items of the similarity table were initially set to those from researchers' experiences and literature review. The results of this study could also be utilized in other areas of business for knowledge sharing wherever it is necessary and beneficial to share and learn from past experiences. We expect this research to be a reference for researchers and developers who are in this area or interested in office knowledge recommendation system based on CBR. Focus group interview(FGI) was conducted with ten administrative assistants carefully selected from various areas of business. They were given a chance to try out COKRS in an actual work setting and make some suggestions for future improvement. FGI has identified the user-interface for saving and searching cases for keywords as the most positive aspect of COKRS, and has identified the most urgently needed improvement as transforming tacit knowledge and knowhow into recorded documents more efficiently. Also, the focus group has mentioned that it is essential to secure enough support, encouragement, and reward from the company and promote positive attitude and atmosphere for knowledge sharing for everybody's benefit in the company.

The Ontology Based, the Movie Contents Recommendation Scheme, Using Relations of Movie Metadata (온톨로지 기반 영화 메타데이터간 연관성을 활용한 영화 추천 기법)

  • Kim, Jaeyoung;Lee, Seok-Won
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.3
    • /
    • pp.25-44
    • /
    • 2013
  • Accessing movie contents has become easier and increased with the advent of smart TV, IPTV and web services that are able to be used to search and watch movies. In this situation, there are increasing search for preference movie contents of users. However, since the amount of provided movie contents is too large, the user needs more effort and time for searching the movie contents. Hence, there are a lot of researches for recommendations of personalized item through analysis and clustering of the user preferences and user profiles. In this study, we propose recommendation system which uses ontology based knowledge base. Our ontology can represent not only relations between metadata of movies but also relations between metadata and profile of user. The relation of each metadata can show similarity between movies. In order to build, the knowledge base our ontology model is considered two aspects which are the movie metadata model and the user model. On the part of build the movie metadata model based on ontology, we decide main metadata that are genre, actor/actress, keywords and synopsis. Those affect that users choose the interested movie. And there are demographic information of user and relation between user and movie metadata in user model. In our model, movie ontology model consists of seven concepts (Movie, Genre, Keywords, Synopsis Keywords, Character, and Person), eight attributes (title, rating, limit, description, character name, character description, person job, person name) and ten relations between concepts. For our knowledge base, we input individual data of 14,374 movies for each concept in contents ontology model. This movie metadata knowledge base is used to search the movie that is related to interesting metadata of user. And it can search the similar movie through relations between concepts. We also propose the architecture for movie recommendation. The proposed architecture consists of four components. The first component search candidate movies based the demographic information of the user. In this component, we decide the group of users according to demographic information to recommend the movie for each group and define the rule to decide the group of users. We generate the query that be used to search the candidate movie for recommendation in this component. The second component search candidate movies based user preference. When users choose the movie, users consider metadata such as genre, actor/actress, synopsis, keywords. Users input their preference and then in this component, system search the movie based on users preferences. The proposed system can search the similar movie through relation between concepts, unlike existing movie recommendation systems. Each metadata of recommended candidate movies have weight that will be used for deciding recommendation order. The third component the merges results of first component and second component. In this step, we calculate the weight of movies using the weight value of metadata for each movie. Then we sort movies order by the weight value. The fourth component analyzes result of third component, and then it decides level of the contribution of metadata. And we apply contribution weight to metadata. Finally, we use the result of this step as recommendation for users. We test the usability of the proposed scheme by using web application. We implement that web application for experimental process by using JSP, Java Script and prot$\acute{e}$g$\acute{e}$ API. In our experiment, we collect results of 20 men and woman, ranging in age from 20 to 29. And we use 7,418 movies with rating that is not fewer than 7.0. In order to experiment, we provide Top-5, Top-10 and Top-20 recommended movies to user, and then users choose interested movies. The result of experiment is that average number of to choose interested movie are 2.1 in Top-5, 3.35 in Top-10, 6.35 in Top-20. It is better than results that are yielded by for each metadata.

An Analysis Method of User Preference by using Web Usage Data in User Device (사용자 기기에서 이용한 웹 데이터 분석을 통한 사용자 취향 분석 방법)

  • Lee, Seung-Hwa;Choi, Hyoung-Kee;Lee, Eun-Seok
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.15 no.3
    • /
    • pp.189-199
    • /
    • 2009
  • The amount of information on the Web is explosively growing as the Internet gains in popularity. However, only a small portion of the information on the Web is truly relevant or useful to the user. Thus, offering suitable information according to user demand is an important subject in information retrieval. In e-commerce, the recommender system is essential to revitalize commercial transactions, raise user satisfaction and loyalty towards the information provider. The existing recommender systems are mostly based on user data collected at servers, so user data are dispersed over several servers. Therefore, web servers that lack sufficient user behavior data cannot easily infer user preferences. Also, if the user visits the server infrequently, it may be hard to reflect the dynamically changing user's interest. This paper proposes a novel personalization system analyzing the user preference based on web documents that are accessed by the user on a user device. The system also identifies non-content blocks appearing repeatedly in the dynamically generated web documents, and adds weight to the keywords extracted from the hyperlink sentence selected by the user. Therefore, the system establishes at an early stage recommendation strategies for the web server that has little user data. Also, user profiles are generated rapidly and more accurately by identifying the information blocks. In order to evaluate the proposed system, this study collected web data and purchase history from users who have current purchase activity. Then, we computed the similarity between purchase data and the user profile. We confirm the accuracy of the generated user profile since the web page containing the purchased item has higher correlation than other item pages.

Sell-sumer: The New Typology of Influencers and Sales Strategy in Social Media (셀슈머(Sell-sumer)로 진화한 인플루언서의 새로운 유형과 소셜미디어에서의 세일즈 전략)

  • Shin, Hajin;Kim, Sulim;Hong, Manny;Hwang, Bom Nym;Yang, Hee-Dong
    • Knowledge Management Research
    • /
    • v.22 no.4
    • /
    • pp.217-235
    • /
    • 2021
  • As 49% of the world's population uses social media platforms, communication and content sharing within social media are becoming more active than ever. In this environmental base, the one-person media market grew rapidly and formed public opinion, creating a new trend called sell-sumer. This study defined new types of influencers by product category by analyzing the subject concentration of the commercial/non-commercial keywords of influencers and the impact of the ratio of commercial postings on sales. It is hoped that influencers working within social media will be helpful to new sales strategies that are transformed into sell-sumers. The method of this study classifies influencers' commercial/non-commercial posts using Python, performs text mining using KoNLPy, and calculates similarity between FastText-based words. As a result, it has been confirmed that the higher the keyword theme concentration of the influencer's commercial posting, the higher the sales. In addition, it was confirmed through the cluster analysis that the influencer types for each product category were classified into four types and that there was a significant difference between groups according to sales. In other words, the implications of this study may suggest empirical solutions of social media sales strategies for influencers working on social media and marketers who want to use them as marketing tools.

A study on the classification of research topics based on COVID-19 academic research using Topic modeling (토픽모델링을 활용한 COVID-19 학술 연구 기반 연구 주제 분류에 관한 연구)

  • Yoo, So-yeon;Lim, Gyoo-gun
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.1
    • /
    • pp.155-174
    • /
    • 2022
  • From January 2020 to October 2021, more than 500,000 academic studies related to COVID-19 (Coronavirus-2, a fatal respiratory syndrome) have been published. The rapid increase in the number of papers related to COVID-19 is putting time and technical constraints on healthcare professionals and policy makers to quickly find important research. Therefore, in this study, we propose a method of extracting useful information from text data of extensive literature using LDA and Word2vec algorithm. Papers related to keywords to be searched were extracted from papers related to COVID-19, and detailed topics were identified. The data used the CORD-19 data set on Kaggle, a free academic resource prepared by major research groups and the White House to respond to the COVID-19 pandemic, updated weekly. The research methods are divided into two main categories. First, 41,062 articles were collected through data filtering and pre-processing of the abstracts of 47,110 academic papers including full text. For this purpose, the number of publications related to COVID-19 by year was analyzed through exploratory data analysis using a Python program, and the top 10 journals under active research were identified. LDA and Word2vec algorithm were used to derive research topics related to COVID-19, and after analyzing related words, similarity was measured. Second, papers containing 'vaccine' and 'treatment' were extracted from among the topics derived from all papers, and a total of 4,555 papers related to 'vaccine' and 5,971 papers related to 'treatment' were extracted. did For each collected paper, detailed topics were analyzed using LDA and Word2vec algorithms, and a clustering method through PCA dimension reduction was applied to visualize groups of papers with similar themes using the t-SNE algorithm. A noteworthy point from the results of this study is that the topics that were not derived from the topics derived for all papers being researched in relation to COVID-19 (

    ) were the topic modeling results for each research topic (
    ) was found to be derived from For example, as a result of topic modeling for papers related to 'vaccine', a new topic titled Topic 05 'neutralizing antibodies' was extracted. A neutralizing antibody is an antibody that protects cells from infection when a virus enters the body, and is said to play an important role in the production of therapeutic agents and vaccine development. In addition, as a result of extracting topics from papers related to 'treatment', a new topic called Topic 05 'cytokine' was discovered. A cytokine storm is when the immune cells of our body do not defend against attacks, but attack normal cells. Hidden topics that could not be found for the entire thesis were classified according to keywords, and topic modeling was performed to find detailed topics. In this study, we proposed a method of extracting topics from a large amount of literature using the LDA algorithm and extracting similar words using the Skip-gram method that predicts the similar words as the central word among the Word2vec models. The combination of the LDA model and the Word2vec model tried to show better performance by identifying the relationship between the document and the LDA subject and the relationship between the Word2vec document. In addition, as a clustering method through PCA dimension reduction, a method for intuitively classifying documents by using the t-SNE technique to classify documents with similar themes and forming groups into a structured organization of documents was presented. In a situation where the efforts of many researchers to overcome COVID-19 cannot keep up with the rapid publication of academic papers related to COVID-19, it will reduce the precious time and effort of healthcare professionals and policy makers, and rapidly gain new insights. We hope to help you get It is also expected to be used as basic data for researchers to explore new research directions.

  • Professional Baseball Viewing Culture Survey According to Corona 19 using Social Network Big Data (소셜네트워크 빅데이터를 활용한 코로나 19에 따른 프로야구 관람문화조사)

    • Kim, Gi-Tak
      • Journal of Korea Entertainment Industry Association
      • /
      • v.14 no.6
      • /
      • pp.139-150
      • /
      • 2020
    • The data processing of this study focuses on the textom and social media words about three areas: 'Corona 19 and professional baseball', 'Corona 19 and professional baseball', and 'Corona 19 and professional sports' The data was collected and refined in a web environment and then processed in batch, and the Ucinet6 program was used to visualize it. Specifically, the web environment was collected using Naver, Daum, and Google's channels, and was summarized into 30 words through expert meetings among the extracted words and used in the final study. 30 extracted words were visualized through a matrix, and a CONCOR analysis was performed to identify clusters of similarity and commonality of words. As a result of analysis, the clusters related to Corona 19 and Pro Baseball were composed of one central cluster and five peripheral clusters, and it was found that the contents related to the opening of professional baseball according to the corona 19 wave were mainly searched. The cluster related to Corona 19 and unrelated to professional baseball consisted of one central cluster and five peripheral clusters, and it was found that the keyword of the position of professional baseball related to the professional baseball game according to Corona 19 was mainly searched. Corona 19 and the cluster related to professional sports consisted of one central cluster and five peripheral clusters, and it was found that the keywords related to the start of professional sports according to the aftermath of Corona 19 were mainly searched.


    (34141) Korea Institute of Science and Technology Information, 245, Daehak-ro, Yuseong-gu, Daejeon
    Copyright (C) KISTI. All Rights Reserved.