• Title/Summary/Keyword: Search Keyword Extraction

Search Result 42, Processing Time 0.042 seconds

An Efficient Web Search Method Based on a Style-based Keyword Extraction and a Keyword Mining Profile (스타일 기반 키워드 추출 및 키워드 마이닝 프로파일 기반 웹 검색 방법)

  • Joo, Kil-Hong;Lee, Jun-Hwl;Lee, Won-Suk
    • The KIPS Transactions:PartD
    • /
    • v.11D no.5
    • /
    • pp.1049-1062
    • /
    • 2004
  • With the popularization of a World Wide Web (WWW), the quantity of web information has been increased. Therefore, an efficient searching system is needed to offer the exact result of diverse Information to user. Due to this reason, it is important to extract and analysis of user requirements in the distributed information environment. The conventional searching method used the only keyword for the web searching. However, the searching method proposed in this paper adds the context information of keyword for the effective searching. In addition, this searching method extracts keywords by the new keyword extraction method proposed in this paper and it executes the web searching based on a keyword mining profile generated by the extracted keywords. Unlike the conventional searching method which searched for information by a representative word, this searching method proposed in this paper is much more efficient and exact. This is because this searching method proposed in this paper is searched by the example based query included content information as well as a representative word. Moreover, this searching method makes a domain keyword list in order to perform search quietly. The domain keyword is a representative word of a special domain. The performance of the proposed algorithm is analyzed by a series of experiments to identify its various characteristic.

An Effective Keyword Extraction Method Based on Web Page Structure Analysis for Video Retrieval in WWW (웹 페이지 구조 분석을 통한 효과적인 동영상 검색용 키워드 추출 방법)

  • Lee, Jong-Won;Choi, Gi-Seok;Jang, Ju-Yeon;Nang, Jong-Ho
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.35 no.3
    • /
    • pp.103-110
    • /
    • 2008
  • This paper proposes an effective keyword extraction method for the Web videos. The proposed method classifies the Web video pages in one of 4 types. As such, we analyzed the structure of the Web pages based on the number of videos and the layout of the Web pages. And then we applied the keyword extraction algorithm fit to each page type. The experiment with 1,087 Web pages that have total 2,462 videos showed that the recall of the proposed extraction method is 18% higher than ImagerRover[2]. So, the proposed method could be used to build a powerful video search system for WWW.

Text-mining Based Graph Model for Keyword Extraction from Patent Documents (특허 문서로부터 키워드 추출을 위한 위한 텍스트 마이닝 기반 그래프 모델)

  • Lee, Soon Geun;Leem, Young Moon;Um, Wan Sup
    • Journal of the Korea Safety Management & Science
    • /
    • v.17 no.4
    • /
    • pp.335-342
    • /
    • 2015
  • The increasing interests on patents have led many individuals and companies to apply for many patents in various areas. Applied patents are stored in the forms of electronic documents. The search and categorization for these documents are issues of major fields in data mining. Especially, the keyword extraction by which we retrieve the representative keywords is important. Most of techniques for it is based on vector space model. But this model is simply based on frequency of terms in documents, gives them weights based on their frequency and selects the keywords according to the order of weights. However, this model has the limit that it cannot reflect the relations between keywords. This paper proposes the advanced way to extract the more representative keywords by overcoming this limit. In this way, the proposed model firstly prepares the candidate set using the vector model, then makes the graph which represents the relation in the pair of candidate keywords in the set and selects the keywords based on this relationship graph.

XML Document Keyword Weight Analysis based Paragraph Extraction Model (XML 문서 키워드 가중치 분석 기반 문단 추출 모델)

  • Lee, Jongwon;Kang, Inshik;Jung, Hoekyung
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.21 no.11
    • /
    • pp.2133-2138
    • /
    • 2017
  • The analysis of existing XML documents and other documents was centered on words. It can be implemented using a morpheme analyzer, but it can classify many words in the document and cannot grasp the core contents of the document. In order for a user to efficiently understand a document, a paragraph containing a main word must be extracted and presented to the user. The proposed system retrieves keyword in the normalized XML document. Then, the user extracts the paragraphs containing the keyword inputted for searching and displays them to the user. In addition, the frequency and weight of the keyword used in the search are informed to the user, and the order of the extracted paragraphs and the redundancy elimination function are minimized so that the user can understand the document. The proposed system can minimize the time and effort required to understand the document by allowing the user to understand the document without reading the whole document.

A New Approach to Automatic Keyword Generation Using Inverse Vector Space Model (키워드 자동 생성에 대한 새로운 접근법: 역 벡터공간모델을 이용한 키워드 할당 방법)

  • Cho, Won-Chin;Rho, Sang-Kyu;Yun, Ji-Young Agnes;Park, Jin-Soo
    • Asia pacific journal of information systems
    • /
    • v.21 no.1
    • /
    • pp.103-122
    • /
    • 2011
  • Recently, numerous documents have been made available electronically. Internet search engines and digital libraries commonly return query results containing hundreds or even thousands of documents. In this situation, it is virtually impossible for users to examine complete documents to determine whether they might be useful for them. For this reason, some on-line documents are accompanied by a list of keywords specified by the authors in an effort to guide the users by facilitating the filtering process. In this way, a set of keywords is often considered a condensed version of the whole document and therefore plays an important role for document retrieval, Web page retrieval, document clustering, summarization, text mining, and so on. Since many academic journals ask the authors to provide a list of five or six keywords on the first page of an article, keywords are most familiar in the context of journal articles. However, many other types of documents could not benefit from the use of keywords, including Web pages, email messages, news reports, magazine articles, and business papers. Although the potential benefit is large, the implementation itself is the obstacle; manually assigning keywords to all documents is a daunting task, or even impractical in that it is extremely tedious and time-consuming requiring a certain level of domain knowledge. Therefore, it is highly desirable to automate the keyword generation process. There are mainly two approaches to achieving this aim: keyword assignment approach and keyword extraction approach. Both approaches use machine learning methods and require, for training purposes, a set of documents with keywords already attached. In the former approach, there is a given set of vocabulary, and the aim is to match them to the texts. In other words, the keywords assignment approach seeks to select the words from a controlled vocabulary that best describes a document. Although this approach is domain dependent and is not easy to transfer and expand, it can generate implicit keywords that do not appear in a document. On the other hand, in the latter approach, the aim is to extract keywords with respect to their relevance in the text without prior vocabulary. In this approach, automatic keyword generation is treated as a classification task, and keywords are commonly extracted based on supervised learning techniques. Thus, keyword extraction algorithms classify candidate keywords in a document into positive or negative examples. Several systems such as Extractor and Kea were developed using keyword extraction approach. Most indicative words in a document are selected as keywords for that document and as a result, keywords extraction is limited to terms that appear in the document. Therefore, keywords extraction cannot generate implicit keywords that are not included in a document. According to the experiment results of Turney, about 64% to 90% of keywords assigned by the authors can be found in the full text of an article. Inversely, it also means that 10% to 36% of the keywords assigned by the authors do not appear in the article, which cannot be generated through keyword extraction algorithms. Our preliminary experiment result also shows that 37% of keywords assigned by the authors are not included in the full text. This is the reason why we have decided to adopt the keyword assignment approach. In this paper, we propose a new approach for automatic keyword assignment namely IVSM(Inverse Vector Space Model). The model is based on a vector space model. which is a conventional information retrieval model that represents documents and queries by vectors in a multidimensional space. IVSM generates an appropriate keyword set for a specific document by measuring the distance between the document and the keyword sets. The keyword assignment process of IVSM is as follows: (1) calculating the vector length of each keyword set based on each keyword weight; (2) preprocessing and parsing a target document that does not have keywords; (3) calculating the vector length of the target document based on the term frequency; (4) measuring the cosine similarity between each keyword set and the target document; and (5) generating keywords that have high similarity scores. Two keyword generation systems were implemented applying IVSM: IVSM system for Web-based community service and stand-alone IVSM system. Firstly, the IVSM system is implemented in a community service for sharing knowledge and opinions on current trends such as fashion, movies, social problems, and health information. The stand-alone IVSM system is dedicated to generating keywords for academic papers, and, indeed, it has been tested through a number of academic papers including those published by the Korean Association of Shipping and Logistics, the Korea Research Academy of Distribution Information, the Korea Logistics Society, the Korea Logistics Research Association, and the Korea Port Economic Association. We measured the performance of IVSM by the number of matches between the IVSM-generated keywords and the author-assigned keywords. According to our experiment, the precisions of IVSM applied to Web-based community service and academic journals were 0.75 and 0.71, respectively. The performance of both systems is much better than that of baseline systems that generate keywords based on simple probability. Also, IVSM shows comparable performance to Extractor that is a representative system of keyword extraction approach developed by Turney. As electronic documents increase, we expect that IVSM proposed in this paper can be applied to many electronic documents in Web-based community and digital library.

Patent Analysis for Noise Barrier and Noise Reducing Device (방음벽 및 방음장치 특허 동향 분석)

  • Cho, Jun-Ho;Koh, Hyo-In;Kim, Heung-Sup
    • Proceedings of the KSR Conference
    • /
    • 2010.06a
    • /
    • pp.1975-1981
    • /
    • 2010
  • In this study, the patent trends for noise barrier and noise reducing device have been analyzed, for the development of adaptive noise barrier according to the transmission characteristics of railway noise. Using patent search engine, keyword searching for patents after 1980 in Korea was performed. The first 667 patents details were reviewed for the extraction core(ie, key) patents. From this review, finally 70 patents were built as DB. From this analysis of core patents, system requirements for development of noise reducing device were obtained.

  • PDF

Keyword Spotting on Hangul Document Images Using Character Feature Models (문자 별 특징 모델을 이용한 한글 문서 영상에서 키워드 검색)

  • Park, Sang-Cheol;Kim, Soo-Hyung;Choi, Deok-Jai
    • The KIPS Transactions:PartB
    • /
    • v.12B no.5 s.101
    • /
    • pp.521-526
    • /
    • 2005
  • In this Paper, we propose a keyword spotting system as an alternative to searching system for poor quality Korean document images and compare the Proposed system with an OCR-based document retrieval system. The system is composed of character segmentation, feature extraction for the query keyword, and word-to-word matching. In the character segmentation step, we propose an effective method to remove the connectivity between adjacent characters and a character segmentation method by making the variance of character widths minimum. In the query creation step, feature vector for the query is constructed by a combination of a character model by typeface. In the matching step, word-to-word matching is applied base on a character-to-character matching. We demonstrated that the proposed keyword spotting system is more efficient than the OCR-based one to search a keyword on the Korean document images, especially when the quality of documents is quite poor and point size is small.

A study on Similarity analysis of National R&D Programs using R&D Project's technical classification (R&D과제의 기술분류를 이용한 사업간 유사도 분석 기법에 관한 연구)

  • Kim, Ju-Ho;Kim, Young-Ja;Kim, Jong-Bae
    • Journal of Digital Contents Society
    • /
    • v.13 no.3
    • /
    • pp.317-324
    • /
    • 2012
  • Recently, coordination task of similarity between national R&D programs is emphasized on view from the R&D investment efficiency. But the previous similarity search method like text-based similarity search which using keyword of R&D projects has reached the limit due to deviation of document's quality. For the solve the limitations of text-based similarity search using the keyword extraction, in this study, utilization of R&D project's technical classification will be discussed as a new similarity search method when analyzed of similarity between national R&D programs. To this end, extracts the Science and Technology Standard Classification of R & D projects which are collected when national R&D Survey & analysis, and creates peculiar vector model of each R&D programs. Verify a reliability of this study by calculate the cosine-based and Euclidean distance-based similarity and compare with calculated the text-based similarity.

Keyword Weight based Paragraph Extraction Algorithm (문단 가중치 분석 기반 본문 영역 선정 알고리즘)

  • Lee, Jongwon;Yu, Seongjong;Kim, Doan;Jung, Hoekyung
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2018.05a
    • /
    • pp.462-463
    • /
    • 2018
  • Traditional document analysis systems used word-based analysis using a morphological analyzer or TF-IDF technique. These systems have the advantage of being able to derive key keywords by calculating the weights of the keywords. On the other hand, it is not appropriate to analyze the contents of documents due to the structural limitations. To solve this problem, the proposed algorithm calculates the weights of the documents in the document and divides the paragraphs into areas. And we calculate the importance of the divided regions and let the user know the area with the most important paragraphs in the document. So, it is expected that the user will be provided with a service suitable for analyzing documents rather than using existing document analysis systems.

  • PDF

Contents Analysis and Synthesis Scheme for Music Album Cover Art

  • Moon, Dae-Jin;Rho, Seung-Min;Hwang, Een-Jun
    • Journal of IKEEE
    • /
    • v.14 no.4
    • /
    • pp.305-311
    • /
    • 2010
  • Most recent web search engines perform effective keyword-based multimedia contents retrieval by investigating keywords associated with multimedia contents on the Web and comparing them with query keywords. On the other hand, most music and compilation albums provide professional artwork as cover art that will be displayed when the music is played. If the cover art is not available, then the music player just displays some dummy or random images, but this has been a source of dissatisfaction. In this paper, in order to automatically create cover art that is matched with music contents, we propose a music album cover art creation scheme based on music contents analysis and result synthesis. We first (i) analyze music contents and their lyrics and extract representative keywords, (ii) expand the keywords using WordNet and generate various queries, (iii) retrieve related images from the Web using those queries, and finally (iv) synthesize them according to the user preference for album cover art. To show the effectiveness of our scheme, we developed a prototype system and reported some results.