• Title/Summary/Keyword: similar keywords (유사 키워드)

Search Result 311, Processing Time 0.03 seconds

A Study on Research Paper Classification Using Keyword Clustering (키워드 군집화를 이용한 연구 논문 분류에 관한 연구)

  • Lee, Yun-Soo;Pheaktra, They;Lee, JongHyuk;Gil, Joon-Min
    • KIPS Transactions on Software and Data Engineering / v.7 no.12 / pp.477-484 / 2018
  • Advances in computer and information technologies have led to the publication of numerous papers. As new research fields continue to emerge, users have considerable trouble finding and categorizing the papers that interest them. To alleviate this difficulty, this paper presents a method for grouping and clustering similar papers. The method extracts primary keywords from each paper's abstract using TF-IDF. It then applies the K-means clustering algorithm to the extracted TF-IDF values to group papers with similar content. To demonstrate the practicality of the proposed method, we use papers from the FGCS journal as real-world data. On these data, we derive the number of clusters using the Elbow method and evaluate clustering performance using the Silhouette method.
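
The pipeline this abstract describes (TF-IDF weights per abstract, then K-means over those weights) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy abstracts, whitespace tokenisation, the smoothed IDF formula, and the deterministic farthest-point initialisation are all choices made here, and the function names `tfidf_vectors` and `kmeans` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for whitespace-tokenised documents."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(tok for toks in tokenised for tok in set(toks))
    # Assumption: smoothed IDF (log(N/df) + 1) so shared terms keep some weight.
    idf = {tok: math.log(n_docs / df[tok]) + 1.0 for tok in vocab}
    vectors = []
    for toks in tokenised:
        tf = Counter(toks)
        vectors.append([tf[tok] / len(toks) * idf[tok] for tok in vocab])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain K-means; returns one cluster label per vector."""
    # Assumption: deterministic farthest-point initialisation instead of random seeds.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(farthest))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every vector.
        labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy abstracts (hypothetical data, not from the FGCS corpus).
docs = [
    "cloud resource scheduling",
    "cloud scheduling virtual machines",
    "image classification deep learning",
    "deep learning image retrieval",
]
vocab, vectors = tfidf_vectors(docs)
labels = kmeans(vectors, k=2)
for doc, vec, label in zip(docs, vectors, labels):
    top_keyword = max(zip(vec, vocab))[1]  # highest-weighted term in this document
    print(label, top_keyword, "|", doc)
```

The abstract chooses the number of clusters with the Elbow method and evaluates the result with Silhouette scores; both steps are omitted from this sketch, which fixes `k=2` by hand.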

Related Documents Classification System by Similarity between Documents (문서 유사도를 통한 관련 문서 분류 시스템 연구)

  • Jeong, Jisoo;Jee, Minkyu;Go, Myunghyun;Kim, Hakdong;Lim, Heonyeong;Lee, Yurim;Kim, Wonil
    • Journal of Broadcast Engineering / v.24 no.1 / pp.77-86 / 2019
  • This paper proposes using machine learning to analyze previously collected documents and classify them. Data is collected based on keywords associated with a specific domain, and non-conceptual elements such as special characters are removed. Each word in the collected documents is then tagged using a Korean morphological analyzer, marking nouns, verbs, and sentences. The documents are embedded using a Doc2Vec model, which converts documents into vectors. The similarity between documents is measured through the embedding model, and a document classifier is trained using a machine learning algorithm. Among the trained classification models compared, the support vector machine performed best, with an F1-score of 0.83.

A Study on the Method of Scholarly Paper Recommendation Using Multidimensional Metadata Space (다차원 메타데이터 공간을 활용한 학술 문헌 추천기법 연구)

  • Miah Kam;Jee Yeon Lee
    • Journal of the Korean Society for Information Management / v.40 no.1 / pp.121-148 / 2023
  • The purpose of this study is to propose a high-performing scholarly paper recommendation system based on metadata attribute similarity. The study suggests a recommendation method that combines techniques from two sub-fields of Library and Information Science: metadata use from Information Organization, and co-citation analysis, author bibliographic coupling, co-occurrence frequency, and cosine similarity from Bibliometrics. For the experiments, metadata for a total of 9,643 papers related to "inequality" and "divide" was collected and refined to derive relative coordinate values between author, keyword, and title attributes using cosine similarity. Experiments were then conducted to select the weight conditions and number of dimensions that yielded good performance. The results were presented to and evaluated by users, and on that basis the study discusses its research questions through analysis of reference node and recommendation combination characteristics, conjoint analysis, and comparative analysis. Overall, performance was excellent when author-related attributes were used alone or in combination with title-related attributes. If the proposed technique is adopted and a wide range of samples is secured, it could help improve recommendation performance not only for literature recommendation in information services but also in various other fields.
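
The abstract combines cosine similarity over several metadata attributes under experimentally chosen weights. A minimal sketch of that idea follows; it assumes each paper is already represented by one numeric vector per attribute, and the attribute vectors, the weight values, and the function name `combined_similarity` are hypothetical illustrations rather than the study's actual settings.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def combined_similarity(paper_a, paper_b, weights):
    """Weighted average of per-attribute cosine similarities.

    paper_a / paper_b map attribute name -> vector; weights maps
    attribute name -> non-negative weight (hypothetical values).
    """
    total = sum(weights.values())
    return sum(
        w * cosine(paper_a[attr], paper_b[attr]) for attr, w in weights.items()
    ) / total


# Hypothetical attribute vectors for two papers.
paper_a = {"author": [1, 0], "title": [1, 1]}
paper_b = {"author": [1, 0], "title": [1, 0]}
# Assumption: author similarity counts twice as much as title similarity.
weights = {"author": 2.0, "title": 1.0}
print(round(combined_similarity(paper_a, paper_b, weights), 4))
```

Weighting the author attribute more heavily here simply mirrors the abstract's finding that author-related attributes performed well; the actual weight conditions are reported in the paper, not in this listing.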

A Study on the Search Behavior of Digital Library Users: Focus on the Network Analysis of Search Log Data (디지털 도서관 이용자의 검색행태 연구 - 검색 로그 데이터의 네트워크 분석을 중심으로 -)

  • Lee, Soo-Sang;Wei, Cheng-Guang
    • Journal of Korean Library and Information Science Society / v.40 no.4 / pp.139-158 / 2009
  • This paper uses network analysis to examine a variety of attributes of searchers' search behavior as recorded in search access log data. The results are as follows. First, the network structure is represented according to the similarity of the queries that users entered. Second, particular searchers who occupy central positions in the network can be identified. Third, some queries are shared between the ego searcher and alter searchers. Fourth, the searchers can be divided into sub-groups through clustering analysis. The study presents a new algorithm, based on social network analysis, for recommending associated searchers and search queries, which can be put to practical use.

Design and Implementation of Information Retrieval System based on Semantic Information (의미정보기반 검색시스템의 설계 및 구현)

  • Park, Chang-Keun;Yang, Gi-Chul
    • Proceedings of the Korea Contents Association Conference / 2004.11a / pp.265-268 / 2004
  • The keyword matching technique used in most information retrieval systems is unfit for efficiently processing geometrically increasing volumes of information. This problem can be solved by using semantic information, and this paper introduces an efficient method of semantic processing. The technique uses conceptual graphs to represent semantic information and applies them to information retrieval. The implemented system can perform exact matching and partial matching. Partial matching comes in two types: syntactic partial matching and semantic partial matching. Semantic similarities are measured by the subclass relations in the ontology. The technique can be used not only in information retrieval but also in various other applications, such as implementing dynamic hyperlinks.

Method of Related Document Recommendation with Similarity and Weight of Keyword (키워드의 유사도와 가중치를 적용한 연관 문서 추천 방법)

  • Lim, Myung Jin;Kim, Jae Hyun;Shin, Ju Hyun
    • Journal of Korea Multimedia Society / v.22 no.11 / pp.1313-1323 / 2019
  • With the development of the Internet and the spread of smartphones, services designed around user convenience are increasing, so users can check news in real time, anytime and anywhere. However, online news is organized by media outlet and category and offers only a few related search terms, which makes it difficult to find news related to a given keyword. To solve this problem, we propose a method that recommends related documents more accurately by applying Doc2Vec similarity to the specific keywords of news articles and weighting the title and contents of the articles. We collect news articles from the Naver politics category by web crawling in a Java environment, preprocess them, extract topics using LDA modeling, and find similarities using Doc2Vec. To supplement Doc2Vec, we apply TF-IDF to obtain TC (Title Contents) weights for the title and contents of news articles. We then combine the Doc2Vec similarity and the TC weight to generate a TC weight-similarity, and evaluate the similarity between words using the PMI technique to confirm keyword association.
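
The last step of this abstract uses PMI (pointwise mutual information) to check how strongly two keywords are associated. The standard definition is PMI(x, y) = log2(p(x, y) / (p(x) p(y))). The sketch below estimates those probabilities from document-level co-occurrence, which is an assumption made here; the abstract does not say which co-occurrence window the authors used, and the function name `pmi_table` and the toy articles are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(documents):
    """Map each co-occurring word pair (a, b), a < b, to its PMI in bits.

    documents: list of token lists. Probabilities are estimated as the
    fraction of documents containing the word (or both words).
    """
    n_docs = len(documents)
    word_count = Counter()
    pair_count = Counter()
    for doc in documents:
        words = sorted(set(doc))          # count each word once per document
        word_count.update(words)
        pair_count.update(combinations(words, 2))
    table = {}
    for (a, b), joint in pair_count.items():
        p_ab = joint / n_docs
        p_a = word_count[a] / n_docs
        p_b = word_count[b] / n_docs
        table[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return table


# Toy "articles" as keyword lists (hypothetical data).
articles = [
    ["election", "president"],
    ["election", "president"],
    ["weather", "rain"],
    ["election", "rain"],
]
for pair, score in sorted(pmi_table(articles).items()):
    print(pair, round(score, 3))
```

A positive PMI means the two keywords appear together more often than independence would predict, and a negative value means less often. Pairs that never co-occur are simply absent from the table.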

Design and Implementation of an Efficient Search Engine Using Collaborative Filtering (협업 필터링을 이용한 효율적인 검색 엔진의 설계 및 구현)

  • Lee, Ki-Young;Seo, Il-Hee;Lim, Myung-Jae;Kim, Kyu-Ho;Kim, Jeong-Lae
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.12 no.3 / pp.23-28 / 2012
  • With the increasing demand for mobile devices, the mobile search market is growing rapidly. However, because screen size is limited on mobile devices, a variety of results should be visible at a glance. This matters because up to 43 percent of people tend to check only the first page of results. In this paper, the set of keywords a user searches for is used to identify the user's interests, and users are divided into groups through collaborative filtering. The experiment confirmed reduced search time and improved search quality.

Developing a Medical Image Retrieval System Based on MPEG-7 (MPEG-7 기반의 의료영상 검색시스템 개발)

  • Joo Kyung-Soo;Ko Young-Seung
    • Journal of Korea Multimedia Society / v.8 no.8 / pp.1032-1041 / 2005
  • Nowadays, PACS and other image sharing systems in hospitals use only high-level metadata to retrieve images. To retrieve an image, the user must therefore know exact information about the patient. In this paper, we developed an image retrieval system based on MPEG-7 to retrieve medical images more efficiently. The system offers keyword retrieval using high-level metadata based on DICOM and similarity retrieval using low-level metadata based on MPEG-7. We also integrated the high-level and low-level metadata to retrieve medical images more precisely.

A performance improvement methodology of web document clustering using FDC-TCT (FDC-TCT를 이용한 웹 문서 클러스터링 성능 개선 기법)

  • Ko, Suc-Bum;Youn, Sung-Dae
    • The KIPS Transactions:PartD / v.12D no.4 s.100 / pp.637-646 / 2005
  • Various problems arise when applying classification or clustering algorithms to documents that require post-processing or classification after being returned as web search results for a keyword. Two of these problems are severe. The first is the need for an expert's help to categorize the documents. The second is the long processing time that document classification takes. We therefore propose a new method of web document clustering that uses the Transitive Closure Tree (TCT) to dramatically decrease the number of document similarity calculations and speeds up processing without losing precision. We also compare the effectiveness of the proposed method with existing algorithms and present the experimental results.
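
The core idea in this abstract, using transitive closure to avoid redundant similarity calculations, can be illustrated with a union-find structure: once two documents are already known to belong to the same cluster, their pairwise similarity never needs to be computed. The sketch below is not the authors' FDC-TCT algorithm, whose details are not given in this listing. It swaps in plain union-find with Jaccard similarity, and the function names, threshold, and toy documents are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets of terms."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_transitive_closure(docs, similarity, threshold):
    """Cluster docs so that similar pairs, and anything linked to them, share a label.

    Returns (labels, comparisons), where comparisons counts how many
    similarity calls were actually made.
    """
    parent = list(range(len(docs)))

    def find(i):
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    comparisons = 0
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            root_i, root_j = find(i), find(j)
            if root_i == root_j:
                continue  # already linked transitively: skip the similarity call
            comparisons += 1
            if similarity(docs[i], docs[j]) >= threshold:
                parent[root_j] = root_i
    return [find(i) for i in range(len(docs))], comparisons


# Toy documents as term sets (hypothetical data).
docs = [
    {"web", "search", "engine"},
    {"web", "search", "ranking"},
    {"search", "engine", "ranking"},
    {"protein", "folding"},
]
labels, comparisons = cluster_by_transitive_closure(docs, jaccard, threshold=0.5)
total_pairs = len(docs) * (len(docs) - 1) // 2
print(labels, f"{comparisons} of {total_pairs} pairs compared")
```

The saving grows with cluster size, because every pair inside an already-merged cluster is skipped. Note that transitive linking can chain loosely related documents together, a trade-off this sketch does not address.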

XML-based Modeling for Semantic Retrieval of Syslog Data (Syslog 데이터의 의미론적 검색을 위한 XML 기반의 모델링)

  • Lee Seok-Joon;Shin Dong-Cheon;Park Sei-Kwon
    • The KIPS Transactions:PartD / v.13D no.2 s.105 / pp.147-156 / 2006
  • Event logging plays an increasingly important role in system and network management, and syslog is a de facto standard for logging system events. However, due to the semi-structured features of Common Log Format data, most studies on log analysis focus on frequent patterns. The Extensible Markup Language (XML) can provide a good representation scheme for structuring and searching the formatted data found in syslog messages. However, previous XML-formatted schemes and applications for system logging are not suitable for semantic approaches such as ranking-based search or similarity measurement over log data. In this paper, based on ranked keyword search techniques over XML documents, we propose an XML tree structure for syslog data through a new data modeling approach. Finally, we show the suitability of the proposed structure for semantic retrieval.
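
To make the idea of an XML tree for syslog data concrete, the sketch below parses one BSD-style syslog line into an XML element using Python's standard library. It is only an illustration under assumptions: the element and attribute names (`event`, `facility`, `severity`, `timestamp`, `host`, `tag`, `pid`, `message`) are hypothetical and are not the tree structure proposed in the paper. The only convention relied on is that the syslog priority value encodes facility × 8 + severity.

```python
import re
import xml.etree.ElementTree as ET

# Assumed input shape: "<PRI>Mmm dd hh:mm:ss host tag[pid]: message"
SYSLOG_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<tag>[^:\[\s]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)


def syslog_to_xml(line):
    """Convert one syslog line into an <event> element (hypothetical schema)."""
    match = SYSLOG_RE.match(line)
    if match is None:
        raise ValueError(f"not a recognised syslog line: {line!r}")
    pri = int(match.group("pri"))
    # The priority value packs facility and severity: PRI = facility * 8 + severity.
    event = ET.Element("event", facility=str(pri // 8), severity=str(pri % 8))
    for name in ("timestamp", "host", "tag", "pid"):
        value = match.group(name)
        if value is not None:  # pid is optional
            ET.SubElement(event, name).text = value
    ET.SubElement(event, "message").text = match.group("message")
    return event


line = "<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick on /dev/pts/8"
print(ET.tostring(syslog_to_xml(line), encoding="unicode"))
```

Once each message is a small tree like this, keyword search can be scoped to individual elements (for example, matching only `host` or only `message`), which is the kind of structure ranked XML keyword search can exploit.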