• Title/Summary/Keyword: Keyword Generation

A New Approach to Automatic Keyword Generation Using Inverse Vector Space Model (키워드 자동 생성에 대한 새로운 접근법: 역 벡터공간모델을 이용한 키워드 할당 방법)

  • Cho, Won-Chin;Rho, Sang-Kyu;Yun, Ji-Young Agnes;Park, Jin-Soo
    • Asia Pacific Journal of Information Systems / v.21 no.1 / pp.103-122 / 2011
  • Recently, numerous documents have been made available electronically. Internet search engines and digital libraries commonly return query results containing hundreds or even thousands of documents. In this situation, it is virtually impossible for users to examine complete documents to determine whether they might be useful. For this reason, some online documents are accompanied by a list of keywords specified by the authors to guide users by facilitating the filtering process. A set of keywords is thus often considered a condensed version of the whole document and plays an important role in document retrieval, Web page retrieval, document clustering, summarization, text mining, and so on. Since many academic journals ask authors to provide a list of five or six keywords on the first page of an article, keywords are most familiar in the context of journal articles. However, many other types of documents, including Web pages, email messages, news reports, magazine articles, and business papers, do not yet benefit from keywords. Although the potential benefit is large, implementation is the obstacle: manually assigning keywords to all documents is a daunting, even impractical, task because it is extremely tedious and time-consuming and requires a certain level of domain knowledge. Therefore, it is highly desirable to automate the keyword generation process. There are two main approaches to this: the keyword assignment approach and the keyword extraction approach. Both use machine learning methods and require, for training, a set of documents with keywords already attached. In the former approach, there is a given vocabulary, and the aim is to match its terms to the texts. In other words, the keyword assignment approach selects the words from a controlled vocabulary that best describe a document. 
Although this approach is domain-dependent and not easy to transfer or expand, it can generate implicit keywords that do not appear in a document. In the latter approach, on the other hand, the aim is to extract keywords according to their relevance in the text, without a prior vocabulary. Here, automatic keyword generation is treated as a classification task, and keywords are commonly extracted using supervised learning techniques: keyword extraction algorithms classify the candidate keywords in a document as positive or negative examples. Several systems, such as Extractor and Kea, were developed using the keyword extraction approach. Because the most indicative words in a document are selected as its keywords, keyword extraction is limited to terms that appear in the document and cannot generate implicit keywords that are not included in it. According to Turney's experimental results, about 64% to 90% of author-assigned keywords can be found in the full text of an article. Conversely, 10% to 36% of author-assigned keywords do not appear in the article and therefore cannot be generated by keyword extraction algorithms. Our preliminary experiment also shows that 37% of author-assigned keywords are not included in the full text. This is why we decided to adopt the keyword assignment approach. In this paper, we propose a new approach to automatic keyword assignment, namely IVSM (Inverse Vector Space Model). The model is based on the vector space model, a conventional information retrieval model that represents documents and queries as vectors in a multidimensional space. IVSM generates an appropriate keyword set for a specific document by measuring the distance between the document and the keyword sets. 
The keyword assignment process of IVSM is as follows: (1) calculate the vector length of each keyword set based on the weight of each keyword; (2) preprocess and parse a target document that has no keywords; (3) calculate the vector length of the target document based on term frequency; (4) measure the cosine similarity between each keyword set and the target document; and (5) generate the keywords with high similarity scores. Two keyword generation systems applying IVSM were implemented: an IVSM system for a Web-based community service and a stand-alone IVSM system. The former is implemented in a community service for sharing knowledge and opinions on current trends such as fashion, movies, social problems, and health information. The stand-alone IVSM system is dedicated to generating keywords for academic papers and has been tested on a number of papers, including those published by the Korean Association of Shipping and Logistics, the Korea Research Academy of Distribution Information, the Korea Logistics Society, the Korea Logistics Research Association, and the Korea Port Economic Association. We measured the performance of IVSM by the number of matches between IVSM-generated keywords and author-assigned keywords. In our experiment, the precision of IVSM applied to the Web-based community service and to academic journals was 0.75 and 0.71, respectively. Both systems perform much better than baseline systems that generate keywords based on simple probability. IVSM also shows performance comparable to Extractor, a representative keyword extraction system developed by Turney. As electronic documents increase, we expect that IVSM can be applied to many electronic documents in Web-based communities and digital libraries.
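The five IVSM steps can be illustrated with a minimal sketch. It assumes each candidate keyword set is stored as a sparse weighted term vector (a plain dict with made-up weights) and that preprocessing is reduced to lowercase whitespace tokenization; the abstract does not specify the actual weighting scheme or parser, so this is not the authors' implementation.

```python
import math
from collections import Counter


def vector_length(vec):
    """Euclidean length of a sparse vector stored as {term: weight}."""
    return math.sqrt(sum(w * w for w in vec.values()))


def parse_document(text):
    """Simplified preprocessing: lowercase, split on whitespace, count term frequencies."""
    return Counter(text.lower().split())


def cosine(a, b, len_a=None, len_b=None):
    """Cosine similarity of two sparse vectors; lengths may be passed in precomputed."""
    len_a = vector_length(a) if len_a is None else len_a
    len_b = vector_length(b) if len_b is None else len_b
    if len_a == 0 or len_b == 0:
        return 0.0
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    return dot / (len_a * len_b)


def assign_keywords(text, keyword_sets, top_n=1):
    """Return the top_n keyword sets ranked by cosine similarity to the document."""
    # Step 1: vector length of each keyword set from its keyword weights.
    set_lengths = {name: vector_length(w) for name, w in keyword_sets.items()}
    # Steps 2-3: parse the target document and compute its length from term frequencies.
    doc_vec = parse_document(text)
    doc_len = vector_length(doc_vec)
    # Step 4: cosine similarity between each keyword set and the document.
    scores = [(name, cosine(w, doc_vec, set_lengths[name], doc_len))
              for name, w in keyword_sets.items()]
    # Step 5: keep the keyword sets with the highest similarity.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Hypothetical keyword sets with made-up weights, for illustration only.
KEYWORD_SETS = {
    "port logistics": {"port": 2.0, "shipping": 1.5, "container": 1.0},
    "fashion trends": {"trend": 1.5, "clothing": 2.0, "style": 1.0},
}
```

Calling `assign_keywords("Container shipping volume at the port grew", KEYWORD_SETS)` ranks "port logistics" first. Precomputing the keyword-set lengths once (step 1) mirrors the listed process and avoids recomputing them for every new document.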

Keyword Selection for Visual Search based on Wikipedia (비주얼 검색을 위한 위키피디아 기반의 질의어 추출)

  • Kim, Jongwoo;Cho, Soosun
    • Journal of Korea Multimedia Society / v.21 no.8 / pp.960-968 / 2018
  • A mobile visual search service uses a query image to retrieve linked information by searching a pre-constructed DB. For this purpose, it would be more useful to search a web-based keyword search system instead of a pre-built DB. In this paper, we propose an algorithm that extracts a representative query to be used as a keyword in a web-based search system. To do this, we use image classification labels generated by a deep-learning-based CNN (Convolutional Neural Network), which performs remarkably well in image recognition. In the query extraction algorithm, words with dictionary meanings are extracted using Wikipedia, and hierarchical categories are constructed using WordNet. The performance of the proposed algorithm is evaluated by measuring the system response time.
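The abstract does not spell out how several CNN labels are merged into one query. The toy sketch below shows one plausible step, assuming hypothetical stand-ins for the Wikipedia lookup (a set of dictionary words) and for the WordNet hierarchy (a child-to-parent map): it takes the top-k CNN labels and returns the most specific dictionary word that all of them share as an ancestor, falling back to the top label.

```python
# Hypothetical stand-ins: in the paper these roles are played by Wikipedia and WordNet.
HYPERNYMS = {
    "tabby cat": "cat",
    "tiger cat": "cat",
    "persian cat": "cat",
    "cat": "feline",
    "feline": "carnivore",
    "golden retriever": "dog",
    "dog": "canine",
    "canine": "carnivore",
    "carnivore": "animal",
}
DICTIONARY_WORDS = {"cat", "feline", "dog", "canine", "carnivore", "animal"}


def hypernym_path(label):
    """Return [label, parent, grandparent, ...] following the toy hierarchy."""
    path = [label]
    while path[-1] in HYPERNYMS:
        path.append(HYPERNYMS[path[-1]])
    return path


def select_query(predictions, k=3):
    """Pick the most specific dictionary word shared by the top-k CNN labels.

    predictions: list of (label, probability) pairs from a classifier.
    """
    ranked = sorted(predictions, key=lambda p: p[1], reverse=True)[:k]
    top = [label for label, _ in ranked]
    paths = [hypernym_path(label) for label in top]
    common = set(paths[0]).intersection(*paths[1:])
    # Walk from specific to general and return the first shared dictionary word.
    for node in paths[0]:
        if node in common and node in DICTIONARY_WORDS:
            return node
    return top[0]
```

For example, the labels "tabby cat" and "golden retriever" share no closer dictionary ancestor than "carnivore", so that word becomes the query.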

Automatic Music-Story Video Generation Using Music Files and Photos in Automobile Multimedia System (자동차 멀티미디어 시스템에서의 사진과 음악을 이용한 음악스토리 비디오 자동생성 기술)

  • Kim, Hyoung-Gook
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.9 no.5 / pp.80-86 / 2010
  • This paper presents an automated music-story video generation technique as an entertainment feature of a vehicle's multimedia system. The system connects the user's mobile phone to the in-vehicle multimedia system and automatically creates stories to accompany music, using photos stored on the phone. Users can watch the generated music-story video while listening to music selected according to mood. The performance of the system is measured by the accuracy of music classification, photo classification, and text-keyword extraction, and by the results of a user MOS (mean opinion score) test.

Experimental Study of Keyword-Based Exploratory Testing (키워드 기반 탐색적 테스트의 실험적 연구)

  • Hwang, Jun Sun;Choi, Eun Man
    • Journal of Software Engineering Society / v.29 no.2 / pp.13-20 / 2020
  • Exploratory testing has been introduced as a desirable test method because of its fast development cycle, but it is not widely adopted because applying it requires documenting and analyzing the test scope. Keyword-based testing, on the other hand, has been introduced as a way to save resources and ease maintenance, but such tests are difficult to plan in advance because of the large number of variables, such as data, settings, interactions, sequence, and timing. However, if clear criteria and methods for creating keywords are presented and the exploratory testing process is applied, test cases can be created from keywords in keyword-based testing. In this paper, we propose a model that automates exploratory testing based on keywords. To verify its effectiveness, we compared general keyword-based testing (KBT) with keyword-based exploratory testing (KBET), and exploratory normal test cases (ETC) with KBET.
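Keyword-driven testing separates the test logic (a table of keywords and arguments) from the code that implements each keyword. The sketch below assumes a hypothetical shopping-cart system under test and a hand-written keyword library; its `explore` function shows one simple way, not the authors' KBET model, to combine keywords randomly during an exploratory session while recording every step, so the session doubles as its own documentation and can be replayed as a keyword test.

```python
import random


class Cart:
    """Hypothetical system under test: a minimal shopping cart."""

    def __init__(self):
        self.items = {}

    def add(self, name, qty):
        if qty <= 0:
            raise ValueError("quantity must be positive")
        self.items[name] = self.items.get(name, 0) + qty

    def remove(self, name):
        self.items.pop(name, None)

    def total_quantity(self):
        return sum(self.items.values())


def kw_add_item(cart, name, qty):
    cart.add(name, qty)


def kw_remove_item(cart, name):
    cart.remove(name)


def kw_check_total(cart, expected):
    actual = cart.total_quantity()
    if actual != expected:
        raise AssertionError(f"expected total {expected}, got {actual}")


# Keyword library: keyword names used in test tables -> implementing functions.
KEYWORDS = {
    "Add Item": kw_add_item,
    "Remove Item": kw_remove_item,
    "Check Total": kw_check_total,
}


def run_keyword_test(steps):
    """Execute a keyword table [(keyword, *args), ...] against a fresh cart."""
    cart = Cart()
    for keyword, *args in steps:
        KEYWORDS[keyword](cart, *args)
    return len(steps)


def explore(n_rounds, seed=0):
    """Exploratory session: randomly combine keywords and record each step."""
    rng = random.Random(seed)
    model = {}  # simple oracle tracking the expected cart contents
    steps = []
    for _ in range(n_rounds):
        if model and rng.random() < 0.3:
            name = rng.choice(sorted(model))
            steps.append(("Remove Item", name))
            del model[name]
        else:
            name, qty = rng.choice(["apple", "pen", "cup"]), rng.randint(1, 3)
            steps.append(("Add Item", name, qty))
            model[name] = model.get(name, 0) + qty
        steps.append(("Check Total", sum(model.values())))
    return steps
```

Because each exploratory step is stored as a keyword row, a failing session can be handed to `run_keyword_test` unchanged, which is the maintenance benefit keyword-based testing aims for.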

Network Analysis of Green Technology using Keyword of Green Field (녹색 분야 키워드 정보를 이용한 녹색기술 분야 네트워크 분석 (2006년 이후 녹색기술 관련 정보를 중심으로))

  • Jeong, Dae-Hyun;Kwon, Oh-Jin;Kwon, Young-Il
    • The Journal of the Korea Contents Association / v.12 no.11 / pp.511-518 / 2012
  • In this study, we observed trends in green technology and identified the areas of green technology likely to be actively studied in the future by building a knowledge map of the field and comparing domestic and overseas green technology information in time series. To this end, network analysis was conducted on the keywords of green technology information provided by the green technology information portal (www.gtnet.go.kr) operated by the Korea Institute of Science and Technology Information, and changes in research subjects were found by dividing the analysis results into periods. In the network analysis of the top 100 keywords among all English keywords, renewable-energy-related areas such as solar energy and biomass had high centrality. In the yearly trend of the main keywords, the centrality of solar cells, nanotechnology, smart grid, and fuel cells increased, showing that research and development on the generation and use of renewable energy is active.
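A keyword network of this kind can be sketched as a co-occurrence graph: keywords are nodes, and two keywords are linked when they appear in the same record. The abstract does not name the centrality measure, so the sketch assumes plain degree centrality (distinct neighbours divided by n - 1), and the dated records are hypothetical. Keywords that never co-occur with another keyword do not enter the graph.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(records):
    """Undirected weighted graph: each co-occurrence in one record adds 1 to an edge."""
    graph = defaultdict(lambda: defaultdict(int))
    for keywords in records:
        for a, b in combinations(sorted(set(keywords)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def degree_centrality(graph):
    """Distinct neighbours divided by (n - 1), the usual normalised degree centrality."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def centrality_by_period(dated_records, start, end):
    """Degree centrality using only records whose year falls in [start, end]."""
    records = [kws for year, kws in dated_records if start <= year <= end]
    return degree_centrality(build_cooccurrence(records))


# Hypothetical (year, keywords) records, for illustration only.
RECORDS = [
    (2007, ["solar energy", "biomass", "renewable energy"]),
    (2008, ["solar energy", "solar cell"]),
    (2011, ["solar cell", "smart grid", "fuel cell"]),
    (2011, ["solar cell", "nanotechnology"]),
]
```

Comparing `centrality_by_period(RECORDS, 2006, 2008)` with `centrality_by_period(RECORDS, 2009, 2012)` shows "solar cell" moving from the periphery to the centre, the kind of period-by-period shift the study reports. A graph library such as NetworkX offers the same measure alongside betweenness and closeness.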

A Test Case Generation Techniques Based on J2ME Platform (J2ME 플랫폼 기반의 테스트케이스 생성 기법)

  • Kim Sang-Il;Roh Myong-Ki;Rhew Sung-Yul
    • The KIPS Transactions:PartD / v.13D no.2 s.105 / pp.215-222 / 2006
  • Mobile software testing is increasingly important for improving software productivity and reliability. Applying it effectively requires test automation techniques based on the mobile platform; that is, a technique is needed to generate test cases for mobile platform APIs. Generating such test cases improves software productivity and reliability while reducing test duration and cost. In this paper, we identified the scope of test case generation by reviewing previous work on test automation, proposed a keyword-driven method as a test case generation technique for the J2ME platform, and confirmed that the proposed method can be applied to generating test cases based on the J2ME platform.
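One way to generate keyword-driven test cases for platform APIs is to describe each API call as a keyword with parameter domains and derive rows from those domains. The sketch below uses boundary-value analysis as the generation rule, which is an assumption (the abstract does not say how rows are derived), and the keyword names and ranges are hypothetical placeholders rather than actual J2ME signatures.

```python
from itertools import product

# Hypothetical keyword specs: keyword -> list of (parameter name, min, max) integer domains.
KEYWORD_SPECS = {
    "SetVolume": [("level", 0, 100)],
    "PlayTone": [("note", 0, 127), ("duration_ms", 1, 5000)],
}


def boundary_values(lo, hi):
    """Boundary-value picks: just outside, on, and just inside each bound."""
    return [lo - 1, lo, lo + 1, hi - 1, hi, hi + 1]


def expected_outcome(params, values):
    """A row should be accepted only if every value lies inside its domain."""
    inside = all(lo <= v <= hi for (_, lo, hi), v in zip(params, values))
    return "accept" if inside else "reject"


def generate_test_cases(specs):
    """Return keyword-driven rows: (keyword, {param: value}, expected outcome)."""
    cases = []
    for keyword, params in specs.items():
        domains = [boundary_values(lo, hi) for _, lo, hi in params]
        for values in product(*domains):
            args = {name: v for (name, _, _), v in zip(params, values)}
            cases.append((keyword, args, expected_outcome(params, values)))
    return cases
```

Each row can then be executed by a keyword interpreter on the device or emulator; keeping the specs as data means a new API only needs a new entry, not new test code.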

A Study on Ontology Instance Generation Using Keywords (키워드를 활용한 온톨로지 인스턴스 생성에 관한 연구)

  • Han, Kwang-Rok;Kang, Hyun-Min;Sohn, Surg-Won
    • Journal of the Korea Society of Computer and Information / v.15 no.5 / pp.1-11 / 2010
  • The success of the semantic web depends largely on semantic annotation, which systematizes knowledge for the construction and production of ontologies. The efficiency of semantic annotation is therefore very important for converting diverse knowledge expressions into ontology instances. In this paper, we present a rule-based system that generates ontology instances accurately and efficiently through semantic annotation of conventional web sites. In previous studies, manual work was needed to find relevant information, compare it with the ontology, and enter the information. We propose a new method that manages keyword data about the extracted information separately from rule information. This makes it practical to extract information efficiently from various web documents by adding only a small number of keywords and rules. The proposed method shows the possibility of generating ontology instances by reusing rules and keywords across various websites.
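The idea of keeping keywords and rules apart can be sketched as two small tables: a keyword table that maps surface labels found on pages to ontology properties, and a rule table that says how a value follows a label. Both tables, the property names, and the "label: value" line format are hypothetical assumptions for illustration, not the paper's actual rule language.

```python
import re

# Hypothetical keyword table: surface labels seen on web pages -> ontology property names.
KEYWORDS = {
    "tel": "hasPhone",
    "phone": "hasPhone",
    "address": "hasAddress",
    "location": "hasAddress",
}

# Hypothetical rules, kept separately from keywords: how a value follows a label.
RULES = [
    re.compile(r"^\s*(?P<label>[A-Za-z ]+?)\s*:\s*(?P<value>.+?)\s*$"),
]


def extract_instance(subject, page_text):
    """Build (subject, property, value) triples by applying the rules to every line."""
    triples = []
    for line in page_text.splitlines():
        for rule in RULES:
            match = rule.match(line)
            if not match:
                continue
            prop = KEYWORDS.get(match.group("label").strip().lower())
            if prop:
                triples.append((subject, prop, match.group("value")))
            break  # first matching rule wins for this line
    return triples
```

Supporting a site that writes "Contact:" instead of "Tel:" only requires adding `KEYWORDS["contact"] = "hasPhone"`; the rules stay untouched, which is the reuse the abstract describes.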

Unstructured Data Processing Using Keyword-Based Topic-Oriented Analysis (키워드 기반 주제중심 분석을 이용한 비정형데이터 처리)

  • Ko, Myung-Sook
    • KIPS Transactions on Software and Data Engineering / v.6 no.11 / pp.521-526 / 2017
  • Big data comes in diverse and vast formats and is generated very quickly, requiring new management and analysis methods rather than traditional data processing. Text mining techniques can extract useful information from unstructured, human-language text in online documents on social networks. Identifying trends in political, economic, and cultural messages left on social media helps reveal which topics people are interested in. In this study, text mining was performed on online news related to a given keyword using a topic-oriented analysis technique. We use Latent Dirichlet Allocation (LDA) to extract information from web documents, analyze which topics are of interest in relation to a given keyword, and determine which topics are related to which core values.
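LDA itself can be sketched with the standard collapsed Gibbs sampling update, in which each token's topic is resampled with probability proportional to (document-topic count + alpha) times (topic-word count + beta) / (topic size + V * beta). This is a generic, unoptimized illustration; the hyperparameters are assumed defaults, and the paper does not say which LDA implementation or settings it used. Production libraries such as scikit-learn's `LatentDirichletAllocation` use variational inference instead.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA. docs: list of token lists.

    Returns (doc_topic, topic_word): per-document topic probabilities and
    per-topic word probabilities.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]               # topic counts per document
    nkw = [defaultdict(int) for _ in range(n_topics)]  # word counts per topic
    nk = [0] * n_topics                                # total tokens per topic
    z = []                                             # topic of every token
    for d, doc in enumerate(docs):                     # random initial assignment
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment before resampling it.
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    doc_topic = [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
                 for d, doc in enumerate(docs)]
    topic_word = [{w: (nkw[t][w] + beta) / (nk[t] + V * beta) for w in vocab}
                  for t in range(n_topics)]
    return doc_topic, topic_word


def top_words(topic_word, n=3):
    """Most probable words of each topic."""
    return [sorted(dist, key=dist.get, reverse=True)[:n] for dist in topic_word]
```

In a keyword-centred analysis, `docs` would be the tokenized news articles retrieved for the keyword, and `top_words` gives a first look at what each topic is about.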

Fuel Cell Research Trend Analysis for Major Countries by Keyword-Network Analysis (키워드 네트워크 분석을 통한 주요국 연료전지 분야 연구동향 분석)

  • SON, BUMSUK;HWANG, HANSU;OH, SANGJIN
    • Journal of Hydrogen and New Energy / v.33 no.2 / pp.130-141 / 2022
  • Due to ongoing climate change, greenhouse gases are gradually accumulating in the atmosphere, and extreme weather events occurring around the world pose a serious threat to human sustainability. Countries around the world are working to shift their energy sources from traditional fossil fuels to renewable energy. Hydrogen is a clean energy source that is virtually inexhaustible on Earth and can be used in most areas that require energy, such as power generation, transportation, commerce, and households. A fuel cell, a device that produces electrical and thermal energy from hydrogen, is a key field for responding to climate change, and major countries are accelerating the development of core fuel cell technology. In this paper, research trends in China, the United States, Germany, Japan, and Korea, the countries with the most fuel-cell-related papers, are analyzed through keyword network analysis.

Automatic Background Keyword of Movie Extraction Method from Media Reviews (미디어 리뷰를 이용한 영화 배경 키워드 자동 추출 기법)

  • Kim, Hyung W.;Cho, Joonmyun;Yoo, Jeongju
    • Proceedings of the Korea Information Processing Society Conference / 2013.11a / pp.1149-1151 / 2013
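  • This study proposes a method for automatically extracting keywords that describe the (spatial/temporal) setting of movie content. The proposed method consists of collecting review texts for movies from the Web; preprocessing the collected reviews through morphological analysis and named-entity recognition; and finally selecting the words that correspond to the setting using statistical techniques. The accuracy of the automatically extracted setting information was measured through user evaluation, and the generated setting information is expected to be useful in various applications such as movie content search and recommendation.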
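The final statistical selection step can be sketched as a support filter over named entities. This assumes morphological analysis and named-entity recognition have already produced (token, tag) pairs, that setting candidates carry hypothetical `LOC` or `DATE` tags, and that "statistical technique" means the fraction of a movie's reviews that mention a candidate; the paper's actual statistic is not stated in the abstract.

```python
from collections import Counter

SETTING_TAGS = {"LOC", "DATE"}  # hypothetical NER labels for spatial/temporal settings


def setting_keywords(tagged_reviews, min_support=0.3, top_n=3):
    """Pick setting words for one movie from NER-tagged reviews.

    tagged_reviews: list of reviews, each a list of (token, tag) pairs.
    A candidate is kept if it appears in at least `min_support` of the reviews;
    results are (token, support) pairs, highest support first.
    """
    support = Counter()
    for review in tagged_reviews:
        # Count each candidate at most once per review (document frequency).
        support.update({tok for tok, tag in review if tag in SETTING_TAGS})
    n = len(tagged_reviews)
    if n == 0:
        return []
    ranked = sorted(support.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(tok, cnt / n) for tok, cnt in ranked if cnt / n >= min_support][:top_n]
```

Counting reviews rather than raw mentions keeps one enthusiastic reviewer who repeats a place name from dominating the result.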