• Title/Summary/Keyword: Keywords Extraction

Keyword Extraction in Korean Using Unsupervised Learning Method (비감독 학습 기법에 의한 한국어의 키워드 추출)

  • Shin, Seong-Yoon;Rhee, Yang-Won
    • Journal of the Korea Institute of Information and Communication Engineering / v.14 no.6 / pp.1403-1408 / 2010
  • Korean information retrieval uses nouns as index terms or keywords that represent a document, and noun extraction finds all nouns present in the document. In this paper, we propose a keyword extraction method that uses a pre-built dictionary. The method reduces execution time by eliminating unnecessary operations, and it can extract nouns even from large documents without significantly affecting accuracy. The paper proposes a noun extraction method that uses the appearance characteristics of nouns and a keyword extraction method that uses unsupervised learning techniques.

Keyword Extraction Using Unsupervised Learning Method (비감독 학습 기법에 의한 키워드 추출)

  • Shin, Seong-Yoon;Baek, Jeong-Uk;Rhee, Yang-Won
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2010.05a / pp.165-166 / 2010
  • Noun extraction finds all nouns present in a document; Korean information retrieval uses nouns as index terms or keywords that represent the document. In this paper, we propose a keyword extraction method that uses a pre-built dictionary. The method reduces execution time by eliminating unnecessary operations, and it can extract nouns even from large documents without significantly affecting accuracy. The paper proposes a noun extraction method that uses the appearance characteristics of nouns and a keyword extraction method that uses unsupervised learning techniques.

Hierarchical Automatic Classification of News Articles based on Association Rules (연관규칙을 이용한 뉴스기사의 계층적 자동분류기법)

  • Joo, Kil-Hong;Shin, Eun-Young;Lee, Joo-Il;Lee, Won-Suk
    • Journal of Korea Multimedia Society / v.14 no.6 / pp.730-741 / 2011
  • With the development of the internet and computer technology, the amount of information available through the internet is increasing rapidly, and it is managed in document form. Research into methods for managing large numbers of documents effectively is therefore necessary. Conventional document categorization methods use only the keywords of related documents for classification. This paper instead proposes a keyword extraction method based on association rules. The method extracts a set of related keywords associated with a document's category and identifies representative keywords using the classification rule proposed in the paper. In addition, the paper proposes a preprocessing method for efficient keyword creation and predicts the category of new documents. We design the classifier and measure its performance experimentally in order to improve the profile's classification performance. When predicting a category, substituting all the classification rules one by one is the major cause of reduced processing performance in a profile. Finally, the paper suggests an automatic categorization scheme that extends a simple category architecture so that it can be applied to a hierarchical category architecture.
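
The abstract above does not spell out its classification rule, so the following Python sketch only illustrates the general notion of an association rule of the form "keyword set implies category", measured by the standard support and confidence statistics. The function name `rule_stats`, the toy documents, and the category labels are hypothetical and are not taken from the paper.

```python
def rule_stats(documents, keywords, category):
    """Support and confidence of the rule: keywords -> category.

    documents: list of (set_of_keywords, category_label) pairs
    (a hypothetical input format chosen for this sketch).
    """
    keywords = set(keywords)
    # Labels of all documents that contain every keyword of the rule.
    matching = [label for kws, label in documents if keywords <= kws]
    # Documents that contain the keywords AND carry the target category.
    both = sum(1 for label in matching if label == category)
    support = both / len(documents) if documents else 0.0
    confidence = both / len(matching) if matching else 0.0
    return support, confidence


# Toy corpus: each news article is reduced to a keyword set and a label.
docs = [
    ({"stock", "market", "bank"}, "economy"),
    ({"stock", "market"}, "economy"),
    ({"market", "festival"}, "culture"),
    ({"stock", "game"}, "sports"),
]

strong_rule = rule_stats(docs, {"stock", "market"}, "economy")
weak_rule = rule_stats(docs, {"market"}, "economy")
```

In this toy corpus the two-keyword rule has higher confidence than the single-keyword rule, which is the kind of evidence a rule-based classifier could use to prefer sets of related keywords over isolated ones.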

Adaptive Web Search based on User Web Log (사용자 웹 로그를 이용한 적응형 웹 검색)

  • Yoon, Taebok;Lee, Jee-Hyong
    • Journal of the Korea Academia-Industrial cooperation Society / v.15 no.11 / pp.6856-6862 / 2014
  • Web usage mining is a method for extracting meaningful patterns from web users' log data. Most existing web usage mining approaches, however, do not consider users' diverse inclinations and instead create general models. Web users' keywords can have a variety of meanings depending on their tendencies and background knowledge. This study extracted and evaluated web-user patterns after collecting and analyzing web usage information on the users' keywords of interest. A web-user pattern can supply a web page network with varied inclination information based on the users' keywords of interest. In addition, the web-user pattern can be used to recommend the most appropriate web pages, and experiments confirmed that the suggested method is useful.

Performance Evaluation of the Extraction Method of Representative Keywords by Fuzzy Inference (퍼지추론 기반 대표 키워드 추출방법의 성능 평가)

  • Rho Sun-Ok;Kim Byeong Man;Oh Sang Yeop;Lee Hyun Ah
    • Journal of Korea Society of Industrial Information Systems / v.10 no.1 / pp.28-37 / 2005
  • In our previous work, we suggested a method that extracts representative keywords from a few positive documents and assigns weights to them. To show the usefulness of the method, in this paper we evaluate the performance of a well-known classification algorithm called GIS (Generalized Instance Set) when it is combined with our method. In the GIS algorithm, generalized instances are built from training documents by a generalization function, and then the K-NN algorithm is applied to them. Here, our method is used as the generalization function. For comparison, the Rocchio and Widrow-Hoff algorithms are also used as generalization functions. Experimental results show that our method is better than the others when only positive documents are considered, but not when negative documents are also considered.

Representative Keyword Extraction from Few Documents through Fuzzy Inference (퍼지추론을 이용한 소수 문서의 대표 키워드 추출)

  • 노순억;김병만;허남철
    • Journal of the Korean Institute of Intelligent Systems / v.11 no.9 / pp.837-843 / 2001
  • In this work, we propose a new method of extracting and weighting representative keywords (RKs) from a few documents that might interest a user. To extract RKs, we first extract candidate terms and then choose a number of terms, called initial representative keywords (IRKs), from them through fuzzy inference. Then, by expanding and reweighting the IRKs using term co-occurrence similarity, the final RKs are obtained. The performance of our approach is heavily influenced by the effectiveness of the IRK selection method, so we choose fuzzy inference because it is more effective in handling the uncertainty inherent in selecting representative keywords of documents. The problem addressed in this paper can be viewed as one of calculating the center of document vectors. So, to show the usefulness of our approach, we compare it with two well-known methods, Rocchio and Widrow-Hoff, on a number of document collections. The results show that our approach outperforms the other approaches.
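
To make the "center of document vectors" framing concrete, here is a minimal Python sketch of the centroid baseline: with only positive documents, a Rocchio-style prototype essentially reduces to the average of the document vectors. This is not the fuzzy inference method of the paper. The length-normalized term-frequency weighting, the function name `centroid_keywords`, and the toy documents are assumptions made for illustration.

```python
from collections import Counter


def centroid_keywords(documents, top_k=3):
    """Return the top_k terms of the mean term-frequency vector.

    documents: list of token lists, all assumed to be positive examples
    (documents the user is interested in).
    """
    if not documents:
        return []
    totals = Counter()
    for tokens in documents:
        counts = Counter(tokens)
        for term, count in counts.items():
            # Length-normalized term frequency within this document.
            totals[term] += count / len(tokens)
    # Average the per-document weights to obtain the centroid vector.
    centroid = {term: weight / len(documents) for term, weight in totals.items()}
    # Highest weight first; ties broken alphabetically for determinism.
    ranked = sorted(centroid.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


docs = [
    ["fuzzy", "inference", "keyword", "keyword"],
    ["keyword", "weight", "fuzzy"],
    ["keyword", "document"],
]
top_terms = [term for term, _ in centroid_keywords(docs)]
```

Terms that are frequent across several of the positive documents rise to the top of the centroid, which is the behavior a more selective method such as fuzzy inference would then be compared against.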

Hot Keyword Extraction of Sci-tech Periodicals Based on the Improved BERT Model

  • Liu, Bing;Lv, Zhijun;Zhu, Nan;Chang, Dongyu;Lu, Mengxin
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.6 / pp.1800-1817 / 2022
  • With the development of the economy and the improvement of living standards, hot issues in a subject area have become a main research direction, but mining these hot issues currently suffers from problems such as large data volumes and complex algorithm structures. In response, this study proposes a method for extracting hot keywords from scientific journals based on an improved BERT model, which can also serve as a reference for researchers. The method improves the overall similarity measure of the ensemble by introducing compound keyword word density, and it combines word segmentation, word sense set distance, and density clustering to construct an improved BERT (I-BERT) framework, on which a composite keyword heat analysis model is established. Using 14,420 articles published in 21 social science management periodicals collected by CNKI (China National Knowledge Infrastructure) in 2017-2019 as experimental data, the superiority of the proposed method is verified in terms of word spacing, class spacing, and the extraction accuracy and recall of hot keywords. The experiments show that the proposed method extracts hot keywords more accurately than other methods, which helps ensure the timeliness and accuracy with which scientific journals capture hot topics in a discipline and, ultimately, makes it possible to use information technology to keep track of popular keywords.

Proposal of keyword extraction method based on morphological analysis and PageRank in Twitter (트위터에서 형태소 분석과 PageRank 기반 화제단어 추출 방법 제안)

  • Lee, Won-Hyung;Cho, Sung-Il;Kim, Dong-Hoi
    • Journal of Digital Contents Society / v.19 no.1 / pp.157-163 / 2018
  • People who use social networking services (SNS) publish diverse ideas there every day, so the data posted on SNS contains many people's thoughts and opinions. In particular, the popular keywords served on Twitter are compiled by counting the words that appear frequently in user posts and ranking them. However, because this method simply counts repeated words, it is sensitive to unnecessary data. The proposed method determines the ranking based on the topic of each word using a relationship diagram between words, so that unnecessary data has less influence and the main words can be extracted stably. We compare the proposed scheme, which is based on morphological analysis and PageRank, with the existing scheme, which is based on the number of appearances, in terms of the descending keyword rank and the ratio of meaningless keywords among the top 20 keywords. As a result, meaningless keywords made up 55% of the top 20 keywords under the proposed scheme and 70% under the existing scheme, an improvement of about 15 percentage points.
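
As a rough illustration of the graph-based ranking idea in the abstract above, the following Python sketch builds a word co-occurrence graph from already-tokenized posts and ranks the words with a plain power-iteration PageRank. The tokenization step (the paper relies on Korean morphological analysis), the co-occurrence definition (two words appear in the same post), the damping factor of 0.85, and the function name `rank_keywords` are assumptions made for illustration, not details taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def rank_keywords(posts, damping=0.85, iterations=50):
    """Rank words by PageRank over an undirected co-occurrence graph.

    posts: list of token lists (assumed to be nouns already extracted
    by a morphological analyzer, which this sketch does not implement).
    """
    # Link every pair of distinct words that appear in the same post.
    neighbors = defaultdict(set)
    for tokens in posts:
        for a, b in combinations(sorted(set(tokens)), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)

    words = sorted(neighbors)
    if not words:
        return []

    # Start from a uniform score and apply the PageRank update repeatedly.
    score = {w: 1.0 / len(words) for w in words}
    for _ in range(iterations):
        new_score = {}
        for w in words:
            # Each neighbor shares its score equally among its own links.
            incoming = sum(score[n] / len(neighbors[n]) for n in neighbors[w])
            new_score[w] = (1 - damping) / len(words) + damping * incoming
        score = new_score

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(score.items(), key=lambda item: (-item[1], item[0]))


posts = [
    ["election", "debate", "candidate"],
    ["election", "poll", "candidate"],
    ["election", "vote"],
    ["lunch", "coffee"],
]
ranking = rank_keywords(posts)
```

In this toy input, `election` ranks first because it is connected to the most other words, while the unrelated pair `lunch` and `coffee` only reinforce each other. A pure frequency count would not distinguish a word repeated in isolated posts from one that is central to a topic.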

A Keyword Network Analysis on Research Trends in the Area of Health Insurance (건강보험 연구동향에 대한 키워드 네트워크 분석)

  • Lee, Su Jung;Lee, Sun-Hee
    • Health Policy and Management / v.31 no.3 / pp.335-343 / 2021
  • Background: The purpose of this study was to extract the major areas of interest in health insurance research in Korea and to infer policy agendas related to health insurance by analyzing research keywords. Methods: For this study, 2,590 articles were selected from among 7,459 academic papers related to health insurance published between January 1987 and December 2018, which were retrieved using the Research Information Sharing Service (RISS). Keyword extraction and keyword network analysis were performed using the KrKwic, KrTitle, and UCINET software. Results: First, the number of studies in the area of health insurance continued to increase across all government terms, and it was not until after the 2000s that the subjects of health insurance research diversified. Second, degree centrality showed that 'medical expenditure' and 'medical utilization' were consistently high-ranking keywords regardless of the government in power. Keywords related to aging and long-term care insurance ranked higher under the Lee Myung-bak, Park Geun-hye, and Moon Jae-in governments. Third, betweenness centrality showed the same high ranking for key topics such as medical expenditure and medical utilization, while the ranking of other key keywords differed depending on the interests and characteristics of each government's policy. Conclusion: We confirm that health insurance has been a main theme in Korean health care research. Research keywords extracted from articles also corresponded to the main health policies promoted during each government period. Efforts to systematically investigate policy megatrends are needed to plan adaptive future policies.
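
The study above used the KrKwic, KrTitle, and UCINET software; the Python sketch below is only a small stand-in that shows what normalized degree centrality means on a keyword co-occurrence network. The input format, the function name `degree_centrality`, and the toy keyword lists are hypothetical, and betweenness centrality is not covered here.

```python
from collections import defaultdict
from itertools import combinations


def degree_centrality(articles):
    """Normalized degree centrality of a keyword co-occurrence network.

    articles: list of keyword lists, one per article (hypothetical input).
    Two keywords are linked if they appear together in at least one article.
    """
    neighbors = defaultdict(set)
    for keywords in articles:
        unique = set(keywords)
        for kw in unique:
            # Register every keyword as a node, even if it has no links.
            neighbors.setdefault(kw, set())
        for a, b in combinations(sorted(unique), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)

    n = len(neighbors)
    if n < 2:
        return {kw: 0.0 for kw in neighbors}
    # Degree divided by the maximum possible degree (n - 1).
    return {kw: len(adj) / (n - 1) for kw, adj in neighbors.items()}


articles = [
    ["medical expenditure", "medical utilization", "aging"],
    ["medical expenditure", "long-term care insurance"],
    ["medical utilization", "medical expenditure"],
]
centrality = degree_centrality(articles)
```

Here `medical expenditure` is linked to every other keyword and therefore reaches the maximum centrality of 1.0, mirroring in miniature how a consistently co-occurring keyword ends up at the top of such a ranking.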

Comparative Study of Keyword Extraction Models in Biomedical Domain (생의학 분야 키워드 추출 모델에 대한 비교 연구)

  • Donghee Lee;Soonchan Kwon;Beakcheol Jang
    • Journal of Internet Computing and Services / v.24 no.4 / pp.77-84 / 2023
  • Given the growing volume of biomedical papers, the ability to efficiently extract keywords has become crucial for accessing and responding to important information in the literature. In this study, we conduct a comprehensive evaluation of different unsupervised learning-based models and BERT-based models for keyword extraction in the biomedical field. Our experimental findings reveal that the BioBERT model, trained on biomedical-specific data, achieves the highest performance. This study offers precise and dependable insights to guide forthcoming research in biomedical keyword extraction. By establishing a well-suited experimental framework and conducting thorough comparisons and analyses of diverse models, we have furnished essential information. Furthermore, we anticipate extending our contributions to other domains by providing comparative experiments and practical guidelines for effective keyword extraction.