• Title/Abstract/Keywords: Keywords Similarity

A New Approach to Automatic Keyword Generation Using Inverse Vector Space Model

  • 조원진;노상규;윤지영;박진수
    • Asia pacific journal of information systems
    • /
    • Vol. 21, No. 1
    • /
    • pp.103-122
    • /
    • 2011
  • Recently, numerous documents have been made available electronically. Internet search engines and digital libraries commonly return query results containing hundreds or even thousands of documents. In this situation, it is virtually impossible for users to examine complete documents to determine whether they might be useful. For this reason, some on-line documents are accompanied by a list of keywords specified by the authors to guide users and facilitate the filtering process. A set of keywords is thus often considered a condensed version of the whole document and plays an important role in document retrieval, Web page retrieval, document clustering, summarization, text mining, and so on. Since many academic journals ask authors to provide a list of five or six keywords on the first page of an article, keywords are most familiar in the context of journal articles. However, many other types of documents could also benefit from the use of keywords, including Web pages, email messages, news reports, magazine articles, and business papers. Although the potential benefit is large, the implementation itself is the obstacle: manually assigning keywords to all documents is a daunting, even impractical, task because it is extremely tedious and time-consuming and requires a certain level of domain knowledge. Therefore, it is highly desirable to automate the keyword generation process. There are two main approaches to achieving this aim: the keyword assignment approach and the keyword extraction approach. Both approaches use machine learning methods and require, for training purposes, a set of documents with keywords already attached. In the former approach, there is a given vocabulary, and the aim is to match its terms to the texts. In other words, the keyword assignment approach seeks to select the words from a controlled vocabulary that best describe a document.
Although this approach is domain dependent and is not easy to transfer and expand, it can generate implicit keywords that do not appear in a document. In the latter approach, by contrast, the aim is to extract keywords according to their relevance in the text, without a prior vocabulary. Here automatic keyword generation is treated as a classification task, and keywords are commonly extracted with supervised learning techniques. Keyword extraction algorithms thus classify candidate keywords in a document as positive or negative examples. Several systems, such as Extractor and Kea, were developed using the keyword extraction approach. The most indicative words in a document are selected as keywords for that document, so keyword extraction is limited to terms that appear in the document. Therefore, keyword extraction cannot generate implicit keywords that are not included in a document. According to the experimental results of Turney, about 64% to 90% of keywords assigned by the authors can be found in the full text of an article. Conversely, 10% to 36% of the keywords assigned by the authors do not appear in the article and cannot be generated by keyword extraction algorithms. Our preliminary experiment also shows that 37% of keywords assigned by the authors are not included in the full text. This is why we have decided to adopt the keyword assignment approach. In this paper, we propose a new approach to automatic keyword assignment, namely IVSM (Inverse Vector Space Model). The model is based on the vector space model, a conventional information retrieval model that represents documents and queries as vectors in a multidimensional space. IVSM generates an appropriate keyword set for a specific document by measuring the distance between the document and the keyword sets.
The keyword assignment process of IVSM is as follows: (1) calculating the vector length of each keyword set based on each keyword weight; (2) preprocessing and parsing a target document that does not have keywords; (3) calculating the vector length of the target document based on the term frequency; (4) measuring the cosine similarity between each keyword set and the target document; and (5) generating keywords that have high similarity scores. Two keyword generation systems were implemented applying IVSM: an IVSM system for a Web-based community service and a stand-alone IVSM system. The first is implemented in a community service for sharing knowledge and opinions on current trends such as fashion, movies, social problems, and health information. The stand-alone IVSM system is dedicated to generating keywords for academic papers and has been tested on a number of academic papers, including those published by the Korean Association of Shipping and Logistics, the Korea Research Academy of Distribution Information, the Korea Logistics Society, the Korea Logistics Research Association, and the Korea Port Economic Association. We measured the performance of IVSM by the number of matches between the IVSM-generated keywords and the author-assigned keywords. In our experiment, the precisions of IVSM applied to the Web-based community service and to academic journals were 0.75 and 0.71, respectively. Both systems perform much better than baseline systems that generate keywords based on simple probability. IVSM also shows performance comparable to Extractor, a representative keyword extraction system developed by Turney. As electronic documents increase, we expect that the IVSM proposed in this paper can be applied to many electronic documents in Web-based communities and digital libraries.
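The five steps listed in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the `keyword_sets` structure (a mapping from each candidate keyword to hypothetical term weights learned from training documents), and the simple regex tokenizer are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def vector_length(weights):
    """Euclidean length of a sparse term-weight vector (steps 1 and 3)."""
    return math.sqrt(sum(w * w for w in weights.values()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts (step 4)."""
    len_a, len_b = vector_length(a), vector_length(b)
    if len_a == 0 or len_b == 0:
        return 0.0
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    return dot / (len_a * len_b)


def assign_keywords(document, keyword_sets, top_n=2):
    """Return up to top_n keywords whose weight vectors best match the document."""
    # Step 2: preprocess and parse the target document (assumed tokenizer).
    doc_vector = Counter(re.findall(r"[a-z]+", document.lower()))
    # Step 4: score every candidate keyword against the document.
    scored = [
        (cosine_similarity(doc_vector, weights), keyword)
        for keyword, weights in keyword_sets.items()
    ]
    # Step 5: keep the highest-scoring keywords, dropping zero matches.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [keyword for score, keyword in scored[:top_n] if score > 0]


# Hypothetical term weights; real weights would come from training documents.
keyword_sets = {
    "information retrieval": {"query": 2.0, "search": 1.5, "documents": 1.0},
    "logistics": {"port": 2.0, "shipping": 1.5, "cargo": 1.0},
    "machine learning": {"training": 2.0, "classifier": 1.5, "documents": 0.5},
}
document = "Search engines return documents for each query."
print(assign_keywords(document, keyword_sets))
```

In this toy run the phrase "information retrieval" never occurs in the document, yet it is assigned because its weight vector overlaps the document's terms, which is the kind of implicit keyword the abstract says extraction methods cannot produce.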

A Study on the Knowledge Structure of Cancer Survivors based on Social Network Analysis

  • 권선영;배가령
    • 대한간호학회지
    • /
    • Vol. 46, No. 1
    • /
    • pp.50-58
    • /
    • 2016
  • Purpose: The purpose of this study was to identify the knowledge structure of cancer survivors. Methods: A total of 1,099 articles were collected, and 365 keywords in noun-phrase form were extracted from the articles and standardized for analysis. A co-occurrence matrix was generated via a cosine similarity measure, and network analysis and visualization with PFNet and NodeXL were then applied to visualize intellectual interchanges among keywords. Results: According to the content analysis and the cluster analysis of author keywords from cancer survivor articles, keywords such as 'quality of life', 'breast neoplasms', 'cancer survivors', 'neoplasms', and 'exercise' had high degree centrality. The 9 most important research topics concerning cancer survivors were 'cancer-related symptoms and nursing', 'cancer treatment-related issues', 'late effects', 'psychosocial issues', 'healthy living managements', 'social supports', 'palliative cares', 'research methodology', and 'research participants'. Conclusion: Through this study, the knowledge structure of cancer survivors was identified. The 9 topics identified in this study can provide useful research directions for the development of nursing in cancer survivor research areas. The network analysis used in this study will be useful for identifying the knowledge structure and for identifying general views and current cancer survivor research trends.
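As a rough illustration of the Methods step, the sketch below builds a cosine-normalized keyword co-occurrence matrix and counts degree centrality. The abstract does not give the exact formula, so the normalization used here (co-occurrence count divided by the square root of the product of the two keyword frequencies) is an assumption, as are the function names and the toy data. PFNet pruning and NodeXL visualization are not reproduced.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_cooccurrence(articles):
    """Map each co-occurring keyword pair (a, b) to c_ab / sqrt(c_a * c_b)."""
    frequency = Counter()  # number of articles containing each keyword
    pair_count = Counter()  # number of articles containing both keywords
    for keywords in articles:
        unique = sorted(set(keywords))
        frequency.update(unique)
        pair_count.update(combinations(unique, 2))
    return {
        pair: count / math.sqrt(frequency[pair[0]] * frequency[pair[1]])
        for pair, count in pair_count.items()
    }


def degree_centrality(similarity, threshold=0.0):
    """Count, per keyword, the links whose similarity exceeds the threshold."""
    degree = Counter()
    for (a, b), score in similarity.items():
        if score > threshold:
            degree[a] += 1
            degree[b] += 1
    return degree


# Toy author-keyword lists, one per article (illustrative only).
articles = [
    ["quality of life", "breast neoplasms", "exercise"],
    ["quality of life", "cancer survivors"],
    ["quality of life", "breast neoplasms"],
]
similarity = cosine_cooccurrence(articles)
degree = degree_centrality(similarity)
print(degree.most_common(1))
```

Raising `threshold` keeps only the stronger links, a crude stand-in for the link pruning that a pathfinder network performs more carefully.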

Font Recommendation Service Based on Emotion Keyword Attribute Value Estimation

  • 지영서;임순범
    • 한국멀티미디어학회논문지
    • /
    • Vol. 25, No. 8
    • /
    • pp.999-1006
    • /
    • 2022
  • Choosing an appropriate font is not only an aesthetic matter but also a factor that reinforces meaning. However, choosing a font that suits their needs and emotions is difficult and time-consuming for general users. In this study, therefore, the keywords and fonts to be used in the experiment were selected for emotion-based font recommendation, and keyword values for each font were calculated through an experiment that checked the correlation between keywords and fonts. Using the experimental results, a prototype of a keyword-based font recommendation system was designed and its feasibility was tested. In a usability evaluation, the prototype was rated more favorably than the existing font search system, but the number of fonts was limited and users had difficulty coming up with keywords suited to their desired situation. We therefore plan to expand the number of fonts and to conduct follow-up research on automatically recommending fonts suited to the user's situation without requiring keyword selection.

e-Cohesive Keyword based Arc Ranking Measure for Web Navigation

  • 이우기;이병수
    • 한국정보과학회논문지:데이타베이스
    • /
    • Vol. 36, No. 1
    • /
    • pp.22-29
    • /
    • 2009
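  • The Web has grown into the largest medium for delivering products and information to users, and it also exposes users to more information than they need. The Web presents large amounts of related information across multiple Web pages, yet current search engines only list single pages related to the query keywords. Fundamentally, such methods cannot provide pairs of pages that hold related information, or sets of related Web pages, in a structured form. The Web is now regarded not as a place where a single page holds all relevant information but as a series of related information pages connected to one another by hyperlinks. This paper therefore proposes the e-arc measure, a new link-weight-based search technique, shows that it is effective for associative search that finds the set of pages related to user-entered keywords within a Web site, and demonstrates through experiments that it is more effective than existing measures.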

Semantic-based Keyword Search System over Relational Database

  • 양영휴
    • 한국컴퓨터정보학회논문지
    • /
    • Vol. 18, No. 12
    • /
    • pp.91-101
    • /
    • 2013
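  • Keyword ambiguity has been a common issue in efficient keyword search. This ambiguity can strongly affect the reliability of search results, and it stems basically from the contextual ambiguity of the terms used in the query. Beyond the ambiguity of the query itself, the relationships among the keywords that appear in the results also matter for users to interpret the results properly, so they must be stated explicitly. This paper resolves the ambiguity of keyword search by applying a keyword mapping technique between the existing query terms and schema terms/instances. Because the mapping considers semantic similarity as well as syntactic similarity between query keywords and schema terms, mapping and precision improve by more than 50% compared with existing systems. To express the unclear relationships among terms appearing in search results more clearly, Semantic Web technology is applied so that more meaningful relationships among keywords can be found in the knowledge base.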

Hot Keyword Extraction of Sci-tech Periodicals Based on the Improved BERT Model

  • Liu, Bing;Lv, Zhijun;Zhu, Nan;Chang, Dongyu;Lu, Mengxin
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 16, No. 6
    • /
    • pp.1800-1817
    • /
    • 2022
  • With the development of the economy and the improvement of living standards, hot issues in a subject area have become a main research direction, but mining those hot issues currently faces problems such as large data volumes and complex algorithm structures. In response, this study proposes a method for extracting hot keywords from scientific journals based on an improved BERT model, which can also provide a reference for researchers. The method improves the overall similarity measure of the ensemble by introducing compound keyword word density, and it combines word segmentation, word sense set distance, and density clustering to construct an improved BERT (I-BERT) framework, on which a composite keyword heat analysis model is established. Taking the 14,420 articles published in 21 social science management periodicals collected by CNKI (China National Knowledge Infrastructure) in 2017-2019 as the experimental data, the superiority of the proposed method is verified by data on word spacing, class spacing, and the extraction accuracy and recall of hot keywords. The experiments show that the proposed method extracts hot keywords with higher accuracy than other methods, which helps ensure the timeliness and accuracy with which scientific journals capture hot topics in a discipline and, finally, uses information technology to identify popular keywords.

Engineering Information Search based on Ontology Mapping

  • 정민;서효원
    • 한국정밀공학회지
    • /
    • Vol. 23, No. 5
    • /
    • pp.30-36
    • /
    • 2006
  • Participants in a collaborative environment want to get the right information or documents that they intend to find. In general search systems, only documents that contain the keywords are retrieved. To search for different word expressions that have the same meaning, we perform mapping before searching. Our mapping-based search approach has two parts: ontology-based mapping logic and ontology libraries. The ontology-based mapping consists of three steps: character matching (CM), definition comparing (DC), and similarity checking (SC). First, character matching maps two terminologies that have identical character strings. Second, definition comparing compares the ontological definitions of two terminologies. Third, similarity checking pairs two terminologies that were not mapped by the two prior steps by evaluating the similarity of their ontological definitions. For the ontology libraries, a document ontology library (DOL), a keyword ontology library (KOL), and a mapping result library (MRL) are defined. With these three libraries and three mapping steps, an ontology-based search engine (OntSE) is built, and a use case scenario is discussed to show its applicability.
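The three mapping steps (CM, DC, SC) can be pictured with a small sketch. Everything concrete here is an assumption for illustration: ontological definitions are modeled as plain sets of descriptor words, DC is taken to mean exact equality of definitions, and SC uses Jaccard similarity with a hypothetical threshold of 0.5. The abstract does not say how OntSE actually represents or compares definitions.

```python
def jaccard(a, b):
    """Share of descriptors common to two definition sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def map_term(keyword, keyword_defs, document_defs, threshold=0.5):
    """Return (document_term, step) for the first step that maps, else None."""
    # Step 1, character matching (CM): identical character strings.
    if keyword in document_defs:
        return keyword, "CM"
    definition = keyword_defs.get(keyword, frozenset())
    if not definition:
        return None
    # Step 2, definition comparing (DC): assumed to mean identical definitions.
    for term in sorted(document_defs):
        if document_defs[term] == definition:
            return term, "DC"
    # Step 3, similarity checking (SC): best partial overlap above threshold.
    best_term, best_score = None, 0.0
    for term in sorted(document_defs):
        score = jaccard(definition, document_defs[term])
        if score > best_score:
            best_term, best_score = term, score
    if best_term is not None and best_score >= threshold:
        return best_term, "SC"
    return None


# Hypothetical stand-ins for the document and keyword ontology libraries.
document_defs = {
    "automobile": frozenset({"vehicle", "motor", "wheels"}),
    "gearbox": frozenset({"transmission", "gear", "housing"}),
}
keyword_defs = {
    "car": frozenset({"vehicle", "motor", "wheels"}),
    "gear case": frozenset({"gear", "housing", "enclosure"}),
    "paint": frozenset({"coating"}),
}
for keyword in ["automobile", "car", "gear case", "paint"]:
    print(keyword, "->", map_term(keyword, keyword_defs, document_defs))
```

Running the steps in this order means the cheap exact checks settle most pairs, and the similarity score is only computed for terms the first two steps leave unmapped, as the abstract describes.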

A Semi-Automatic Semantic Mark Tagging System for Building Dialogue Corpus

  • 박준혁;이성욱;임윤섭;최종석
    • 정보처리학회논문지:소프트웨어 및 데이터공학
    • /
    • Vol. 8, No. 5
    • /
    • pp.213-222
    • /
    • 2019
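  • In implementing an intelligent spoken dialogue interface, the semantic mark of a keyword is an important element for identifying user intent. A dialogue system determines the intent of a user utterance from its keywords and their semantic marks. A single keyword is ambiguous in that it can carry several semantic marks. Choosing, for such an ambiguous keyword, the semantic mark that matches the user's intent is similar to the word sense disambiguation problem. We first manually annotate about 23% of a transcribed dialogue corpus to build a semantic mark dictionary, a synonym dictionary, and a context vector dictionary for keywords, and then automatically annotate the keywords in the remaining 77% of the corpus. For an ambiguous keyword, the meaning is determined by computing context vector similarity against the context vector dictionary. When a keyword is unregistered, the synonym dictionary is used to find the most similar keyword, and that keyword's meaning is attached. We evaluated the proposed system on three high-frequency and three low-frequency ambiguous keywords selected from the corpus. In the experiments, accuracy was about 54.4% with the manually built corpus and about 50.0% with the semi-automatically expanded corpus.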
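The disambiguation step described in this abstract might look like the sketch below. It is only a guess at the mechanics: the dictionary layouts, the `tag_keyword` name, the use of raw context-word counts, and the one-to-one synonym lookup are all assumptions, and the example entries are invented.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(v * b.get(k, 0) for k, v in a.items()) / (norm_a * norm_b)


def tag_keyword(keyword, context_words, context_vectors, synonyms):
    """Pick the semantic mark whose context vector best matches the utterance."""
    if keyword not in context_vectors:
        # Unregistered keyword: fall back to its entry in the synonym dictionary.
        keyword = synonyms.get(keyword)
        if keyword not in context_vectors:
            return None
    senses = context_vectors[keyword]
    utterance = Counter(context_words)
    # Ties resolve to the alphabetically first mark because of sorted().
    return max(sorted(senses), key=lambda mark: cosine(utterance, senses[mark]))


# Hypothetical dictionaries standing in for the manually built 23% portion.
context_vectors = {
    "play": {
        "MUSIC": {"song": 2, "volume": 1},
        "GAME": {"chess": 2, "board": 1},
    }
}
synonyms = {"replay": "play"}

print(tag_keyword("play", ["song", "loud"], context_vectors, synonyms))
print(tag_keyword("replay", ["chess", "move"], context_vectors, synonyms))
```

Keywords tagged this way could then be added back to the dictionaries, which is one plausible reading of how the corpus is expanded semi-automatically.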

Study of Similarity Theory of River Models with Movable Beds and its Application

  • 서일원;정태성;김영한
    • 한국수자원학회논문집
    • /
    • Vol. 31, No. 5
    • /
    • pp.575-586
    • /
    • 1998
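  • This study establishes a movable-bed model theory suited to river model studies, based on the theory of Einstein and Chien (1954). Its applicability was examined by comparing the behavior of the total bed change under variations in the flow similarity ($\Delta F \Delta M$) and the sediment transport similarity ($\Delta F_s$). The results show that the smaller the value of $\Delta F \Delta M$ or $\Delta F_s$, the larger the total bed change. The model theory established in this study relaxes the constraints of each model theory and should be useful when the model theory cannot be satisfied ideally because of limits on the experimental site or the model sediment.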

Knowledge Structure of the Korean Journal of Occupational Health Nursing through Network Analysis

  • 권선영;박은정
    • 한국직업건강간호학회지
    • /
    • Vol. 24, No. 2
    • /
    • pp.76-85
    • /
    • 2015
  • Purpose: The purpose of this study was to identify the knowledge structure of the Korean Journal of Occupational Health Nursing from 1991 to 2014. Methods: A total of 400 articles published between 1991 and 2014 were collected, and 1,369 keywords in noun-phrase form were extracted from the articles and standardized for analysis. A co-occurrence matrix was generated via a cosine similarity measure, and the network was then analyzed and visualized using PFNet. NodeXL was also applied to visualize intellectual interchanges among keywords. Results: According to the content analysis and the cluster analysis of author keywords from the Korean Journal of Occupational Health Nursing articles, the 7 most important research topics of the journal were 'Workers & Work-related Health Problem', 'Recognition & Preventive Health Behaviors', 'Health Promotion & Quality of Life', 'Occupational Health Nursing & Management', 'Clinical Nursing Environment', 'Caregivers and Social Support', and 'Job Satisfaction, Stress & Performance'. Newly emerging topics in each 4-year period were observed as research trends. Conclusion: Through this study, the knowledge structure of the Korean Journal of Occupational Health Nursing was identified. The network analysis in this study will be useful for identifying the knowledge structure as well as for finding general views and current research trends. Furthermore, the results of this study could be used to seek research directions for the Korean Journal of Occupational Health Nursing.