• Title/Summary/Keyword: Search Keyword Extraction


Design of Keyword Extraction System Using TFIDF (TFIDF를 이용한 키워드 추출 시스템 설계)

  • 이말례;배환국
    • Korean Journal of Cognitive Science / v.13 no.1 / pp.1-11 / 2002
  • In this paper, a test was performed to determine whether words in anchor text are appropriate as keywords. The test showed that some words with a high weighting factor were appropriate, while others did not even appear in the text and were therefore not appropriate as keywords. To resolve this problem, a new keyword extraction method is proposed. With the proposed method, inappropriate keywords can be removed and new keywords set, and the keywords can then be ranked using the TFIDF value as the weighting factor. The new method was verified to be more accurate than previous methods.

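
The ranking idea in the abstract above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' system: the function name `tfidf_rank`, the toy corpus, and the smoothed IDF formula are all hypothetical choices, and the abstract does not say which TFIDF variant the paper uses. Candidate words (for example, words taken from anchor text) that never occur in the document are dropped, and the rest are ranked by TF-IDF.

```python
import math
from collections import Counter


def tfidf_rank(candidates, doc_tokens, corpus):
    """Rank candidate keywords for one document by TF-IDF.

    candidates: candidate keywords (e.g. words from anchor text).
    doc_tokens: tokens of the target document.
    corpus: list of token lists, one per document.
    Candidates that do not appear in the document are discarded.
    """
    tf = Counter(doc_tokens)
    n_docs = len(corpus)
    scores = {}
    for word in set(candidates):
        if tf[word] == 0:
            continue  # the word never occurs in the text: not a valid keyword
        df = sum(1 for tokens in corpus if word in tokens)
        # Smoothed IDF (an assumption), which avoids division by zero.
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        scores[word] = (tf[word] / len(doc_tokens)) * idf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data, invented for illustration only.
corpus = [
    ["keyword", "extraction", "web", "keyword"],
    ["web", "search", "engine"],
    ["web", "page", "ranking"],
]
ranked = tfidf_rank(["keyword", "web", "anchor"], corpus[0], corpus)
print([word for word, _ in ranked])
```

In this toy run, "anchor" is removed because it does not occur in the document, and "keyword" outranks "web" because "web" appears in every document of the corpus.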

Keyword Extraction Using Modifying Relation to Improve Search Experience (수식 관계를 이용한 키워드 추출을 통한 검색 과정의 효율성 향상)

  • Moon, Uk-Seong;Lee, Sheen-Mok
    • Proceedings of the Korean Information Science Society Conference / 2007.10c / pp.228-232 / 2007
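  • In the information age, efficiently finding the information one needs within a vast amount of information is more important than ever. Many search engines strive to provide efficient search results, but because of problems with their interfaces, users have difficulty taking in the results efficiently, and finding the desired information requires a certain level of search skill. This paper modifies the interface of an existing search engine to provide visual relatedness information, guiding users to accurate answers regardless of their search skill. In the process, we identified a problem in existing keyword extraction algorithms and resolved it by using the modifying relations between words. We also present an algorithm that uses the modifying relations between words to efficiently generate relatedness between documents.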


Web Site Keyword Selection Method by Considering Semantic Similarity Based on Word2Vec (Word2Vec 기반의 의미적 유사도를 고려한 웹사이트 키워드 선택 기법)

  • Lee, Donghun;Kim, Kwanho
    • The Journal of Society for e-Business Studies / v.23 no.2 / pp.83-96 / 2018
  • Extracting keywords that represent documents is very important because they can be used for automated services such as document search, classification, and recommendation, as well as for quickly conveying document information. However, when keywords are extracted based on the frequency of words appearing in a website's documents, or with graph algorithms based on word co-occurrence, two problems arise: the web page structure may contain various words unrelated to the topic, and semantic keywords are difficult to extract because of the limited performance of the Korean tokenizer. In this paper, we propose a method that selects candidate keywords based on semantic similarity, addressing both the failure to extract semantic keywords and the poor accuracy of Korean tokenizer analysis. A final filtering process then removes inconsistent keywords to produce the final semantic keywords. Experiments on real web pages of small businesses show that the proposed method improves performance by 34.52% over the statistical-similarity-based keyword selection technique. This confirms that keyword extraction from documents improves when the semantic similarity between words is considered and inconsistent keywords are removed.
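
As a rough illustration of filtering candidate keywords by semantic similarity, the sketch below keeps only candidates whose average cosine similarity to the other candidates reaches a threshold. Everything here is an assumption made for illustration: the 2-dimensional vectors are invented stand-ins for embeddings that would normally come from a trained Word2Vec model, and `select_keywords` and the threshold of 0.5 are hypothetical, not the procedure or parameters reported in the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_keywords(candidates, vectors, threshold=0.5):
    """Keep candidates whose mean similarity to the other candidates
    is at least `threshold` (hypothetical filtering rule)."""
    kept = []
    for word in candidates:
        others = [c for c in candidates if c != word]
        if not others:
            kept.append(word)  # a single candidate has nothing to compare to
            continue
        mean_sim = sum(cosine(vectors[word], vectors[c]) for c in others) / len(others)
        if mean_sim >= threshold:
            kept.append(word)
    return kept


# Invented toy embeddings; real ones would come from a Word2Vec model.
vectors = {
    "bakery": (0.90, 0.10),
    "bread": (0.80, 0.20),
    "cake": (0.85, 0.15),
    "login": (0.05, 0.95),
}
selected = select_keywords(list(vectors), vectors)
print(selected)
```

With these toy vectors, a page-structure word such as "login" is far from the topical words and is filtered out, while the three mutually similar words are kept.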

Analysis of Major Changes in Press Articles Related to 'High School Credit System'

  • Kwon, Choong-Hoon
    • Journal of the Korea Society of Computer and Information / v.25 no.7 / pp.183-191 / 2020
  • The purpose of this study is to objectively analyze trends in media articles on the 'high school credit system' over three years (2017~2019), a topic that has become the biggest concern among Korean education policies, using BIGKinds, a news big data analysis service for media companies. The main research methods were the BIGKinds system's news search by specific search term, news trend analysis, keyword extraction and word cloud generation, and network analysis and network diagram presentation. The results are as follows. First, 3,649 articles related to the high school credit system appeared in major Korean media outlets over the three years from 2017 to 2019. The number of articles rose sharply at certain points, about four times, in line with the government's announcements of related policies, and showed an increasing trend. Second, the top 20 keywords from press articles related to the high school credit system over the three years were presented, and the keywords were confirmed to change from year to year. Third, the network of media articles related to the high school credit system was visualized and presented separately by person, institution, and keyword. The results confirmed that the high school credit system was adopted as the representative education policy of the Moon Jae-in government and is moving through the policy decision and policy implementation stages.

Development and Performance Analysis of a Cultural Heritage Search Application Utilizing Image Recognition (이미지 인식을 활용한 문화유산 검색 어플리케이션 개발)

  • Hyun-Ji Kim;Tae-Hyun Shin;Hyun-Bin Jeong;Da-Hyun Kim;Jai-Soon Baek;Yong-Han Yu;Sung-Jin Kim
    • Proceedings of the Korean Society of Computer Information Conference / 2024.01a / pp.181-183 / 2024
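  • This paper covers the development and performance analysis of a cultural heritage search application that uses image recognition, map-based search, and keyword search. We designed and implemented an application that combines these technologies and functions to provide users with customized cultural heritage information. We also conducted experiments and analysis to evaluate and improve the application's performance. The results confirmed that an application using image recognition and map-based search can improve the user experience by providing cultural heritage information quickly and accurately. This research is expected to make an important contribution to the development and performance improvement of cultural heritage search applications.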


Document Analysis based Main Requisite Extraction System (문서 분석 기반 주요 요소 추출 시스템)

  • Lee, Jongwon;Yeo, Ilyeon;Jung, Hoekyung
    • Journal of the Korea Institute of Information and Communication Engineering / v.23 no.4 / pp.401-406 / 2019
  • In this paper, we propose a system for analyzing papers and reports in XML format. The system extracts keywords from the paper or report and shows them to the user; the user then enters the keywords to search for within the document, and the system extracts the paragraphs containing them. The system checks the frequency of the keywords entered by the user, calculates weights, and removes paragraphs that contain only the keywords with the lowest weight. We also divide the refined paragraphs into 10 regions, calculate the importance of the paragraphs in each region, compare the importance of the regions, and inform the user of the main region, the one with the highest importance. With these features, the proposed system can provide the main paragraphs at a higher compression ratio than analyzing the papers or reports with an existing document analysis system. This reduces the time required to understand the document.
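
The paragraph-filtering and region-scoring steps described above might look roughly like the following sketch. It is not the published system: the weight definition (relative keyword frequency), the whitespace tokenizer, the importance score (sum of keyword weights in a region), and the names `refine_paragraphs` and `main_region` are all assumptions introduced here for illustration.

```python
import math
from collections import Counter


def refine_paragraphs(paragraphs, keywords):
    """Weight keywords by relative frequency and drop paragraphs that
    contain no keyword or only the lowest-weight keyword(s)."""
    keywords = [k.lower() for k in keywords]
    tokens = [t for p in paragraphs for t in p.lower().split()]
    freq = Counter(t for t in tokens if t in keywords)
    total = sum(freq.values())
    weights = {k: (freq[k] / total if total else 0.0) for k in keywords}
    lowest = min(weights.values()) if weights else 0.0
    low_keys = {k for k, w in weights.items() if w == lowest}
    if len(set(weights.values())) <= 1:
        low_keys = set()  # all weights equal: nothing counts as "lowest"
    kept = []
    for p in paragraphs:
        present = set(p.lower().split()) & set(keywords)
        if present and not present <= low_keys:
            kept.append(p)
    return kept, weights


def main_region(paragraphs, weights, n_regions=10):
    """Split paragraphs into at most n_regions consecutive regions and
    return (index, region) for the region with the highest importance."""
    if not paragraphs:
        return None
    size = max(1, math.ceil(len(paragraphs) / n_regions))
    regions = [paragraphs[i:i + size] for i in range(0, len(paragraphs), size)]

    def importance(region):
        return sum(weights.get(t, 0.0) for p in region for t in p.lower().split())

    best = max(range(len(regions)), key=lambda i: importance(regions[i]))
    return best, regions[best]


# Invented toy document.
paragraphs = [
    "xml parser reads the report",
    "the report has a summary",
    "xml schema and xml parser",
    "unrelated text here",
]
kept, weights = refine_paragraphs(paragraphs, ["xml", "parser", "summary"])
print(kept)
print(main_region(kept, weights))
```

In the toy document, the second paragraph is removed because it contains only the lowest-weight keyword ("summary"), and the fourth because it contains no keyword at all.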

A Study on Contents-based Retrieval using Wavelet (Wavelet을 이용한 내용기반 검색에 관한 연구)

  • 강진석;박재필;나인호;최연성;김장형
    • Journal of the Korea Institute of Information and Communication Engineering / v.4 no.5 / pp.1051-1066 / 2000
  • With recent advances in digital encoding technologies and computing power, large amounts of multimedia information such as images, graphics, audio, and video are widely used in multimedia systems over the Internet. As a result, diverse retrieval mechanisms are required for users to search for specific information stored in multimedia systems, and content-based retrieval is preferred over text-based keyword retrieval. In this paper, we propose a new content-based indexing and searching algorithm that aims at both high efficiency and high retrieval performance. To achieve these objectives, the proposed algorithm first classifies images through a pre-processing stage of edge extraction, range division, and multiple filtering, and then searches for target images using the spatial and textural characteristics of colors in an image, which are extracted in the preceding stage. In addition, we describe simulation results of search requests and retrieval outputs for several company trademark images using the proposed wavelet-based content-based retrieval algorithm.


A Proposal of Methods for Extracting Temporal Information of History-related Web Document based on Historical Objects Using Machine Learning Techniques (역사객체 기반의 기계학습 기법을 활용한 웹 문서의 시간정보 추출 방안 제안)

  • Lee, Jun;KWON, YongJin
    • Journal of Internet Computing and Services / v.16 no.4 / pp.39-50 / 2015
  • When retrieving information through a search engine, some users want documents that correspond to the situation in a specific time period. For example, a user who wants a document describing the situation before the 'Japanese invasions of Korea' era may use the keyword 'Japanese invasions of Korea' in the search query. The search engine then returns all documents about the 'Japanese invasions of Korea' without regard to time period, which forces the user to do additional work. In addition, in a large percentage of cases involving historical documents, the date a document was created differs from the time period its contents record. If the time period of a document's contents can be extracted, it can provide useful information for retrieval and various applications. In this paper, we therefore study the extraction of time periods for historical documents of the Joseon era, using historical literature of that era to deduce the time period corresponding to a document's content. We define historical objects based on historical literature collected from the web and confirm the possibility of extracting the time period of a web document with machine learning techniques. In addition to the machine learning techniques, we propose and apply similarity filtering based on comparison between historical objects. Finally, we evaluate the accuracy of the resulting temporal indexing and its improvement.

Web Image Retrieval using Prior Tags based on WordNet Semantic Information (워드넷 의미정보로 선별된 우선 태그와 이를 이용한 웹 이미지의 검색)

  • Kweon, Dae-Hyeon;Hong, Jun-Hyeok;Cho, Soo-Sun
    • Journal of Korea Multimedia Society / v.12 no.7 / pp.1032-1042 / 2009
  • This research concerns the early extraction and use of semantic information from tags in tagged Web image retrieval. Users generally attach tags to a Web image with little thought for their order, sometimes more than 100 of them. In this paper, we suggest a method that selects prior tags based on their importance when tagged images are uploaded and uses them in image retrieval. The idea is to recognize as important those tags that describe the image better, namely the tags that share more semantic information with the other tags of the same image. The method includes calculating relation scores between tags based on WordNet and a multilevel search of tagged images using those scores. For evaluation, we compared the suggested method with other retrieval methods that search images by simple matching of tags to a given keyword. The results show that our method is superior in precision and recall.


Review on Studies of Korean Medicine about Tinea Pedis (족부백선의 한의학 논문에 대한 고찰)

  • Park, Sun-Yeong;Seo, Hyung-Sik
    • The Journal of Korean Medicine Ophthalmology and Otolaryngology and Dermatology / v.29 no.3 / pp.42-49 / 2016
  • Objectives: The purpose of this study is to analyze research trends on tinea pedis in studies of Korean medicine. Methods: We searched for papers using NDSL, KISS, RISS and KTKP (Korean Traditional Knowledge Portal). The first search used the keyword "Tinea pedis" in NDSL, KISS, RISS and KTKP. No search period was specified. Results: A total of 122 studies were found in NDSL, KISS and RISS, of which 118 were excluded. Five papers were found in KTKP, of which four were excluded. Finally, five studies were selected and analyzed. Two of the five selected studies were experimental and three were clinical. Of the two experimental studies, one examined the antifungal efficacy of herbal medicines and found that an ethanol extract of a mixture of Sophorae Subprostratae Radix, Aconiti Radix and Hibisci Syriaci Cortex and a hot water extract of Phellodendri Cortex were effective. The other examined antifungal effect by extraction method for medicinal herbs and found that a vinegar extract was effective. The three clinical studies comprised one clinical study and two case studies. Functional soap containing herbal medicines and bee venom therapy were effective. Conclusions: We reviewed five studies: two experimental studies, one clinical study and two case studies. Their conclusions suggest that tinea pedis can be treated with a Korean medicine approach. We expect that further research will follow and that its results can be actively used in clinical treatment.