• Title/Summary/Keyword: keyword

Search Result 2,066, Processing Time 0.028 seconds

A Study of High Speed Retrieval Algorithm of Long Component Keyword (복합키워드의 고속검색 알고리즘에 관한 연구)

  • Lee Jin-Kwan;Jung Kyu-cheol;Lee Tae-hun;Park Ki-hong
    • Journal of the Korea Institute of Information and Communication Engineering / v.8 no.8 / pp.1769-1776 / 2004
  • Effective keyword extraction is important in information search systems, and there are several ways to select proper keywords from among many candidates. Among them, the DER structure for the AC algorithm, used for single-keyword search, can also search for multiple keywords but has a time complexity problem. In this paper, we developed an algorithm, the "EDER structure", that expands the standalone search table of the DER structure search method to improve time complexity. We tested the algorithm on 500 text files and found that the EDER structure is more efficient than the DER structure for AC in both keyword posting results and search time: 0.2 seconds for EDER versus 0.6 seconds for the DER structure.

A Study on Natural Language Keyword Indexing for Web-based Information Retrieval (웹기반 정보검색을 위한 자연어 키워드 색인에 관한 연구)

  • 윤성희
    • Journal of the Korea Computer Industry Society / v.4 no.12 / pp.1103-1111 / 2003
  • Information retrieval systems whose indexing matches single keywords are simple and popular. With single-keyword matching, however, it is very hard to represent the exact meaning of documents, and the set of retrieved documents is very large, so such systems cannot satisfy their users. This paper proposes a phrase-based indexing system built on the phrase, a larger syntactic unit than a single keyword. Because Web documents include many syntactic errors, a high-quality natural language parser cannot be expected on the Web. Partial trees from fully bottom-up parsing, even when they do not form a full tree, are still useful for extracting phrases, and phrases are much more discriminative than single keywords as index terms. This helps the information retrieval system improve efficiency and reduce processing overhead.

A Study on Embedded DSP Implementation of Keyword-Spotting System using Call-Command (호출 명령어 방식 핵심어 검출 시스템의 임베디드 DSP 구현에 관한 연구)

  • Song, Ki-Chang;Kang, Chul-Ho
    • Journal of Korea Multimedia Society / v.13 no.9 / pp.1322-1328 / 2010
  • Recently, keyword spotting systems have been in the limelight as a UI (User Interface) technology for ubiquitous home network systems. Keyword spotting systems are vulnerable to non-stationary noises such as TV, radio, and dialogue. In particular, the speech recognition rate drops drastically in embedded DSP (Digital Signal Processor) environments, because their computational capability is relatively low for processing input speech in real time. In this paper, we propose a new keyword spotting system using the call-command method, which consists of a small number of recognition networks. We select call-commands such as 'narae' and 'home manager' and compose a small network, as a token consisting of silence with noise and the call-commands, to carry out real-time recognition continuously on input speech.

Keyword Extraction in Korean Using Unsupervised Learning Method (비감독 학습 기법에 의한 한국어의 키워드 추출)

  • Shin, Seong-Yoon;Rhee, Yang-Won
    • Journal of the Korea Institute of Information and Communication Engineering / v.14 no.6 / pp.1403-1408 / 2010
  • Korean information retrieval uses nouns as index terms or keywords that represent the document, and noun and keyword extraction aims to find all nouns present in the document. In this paper, we propose a method of keyword extraction using a pre-built dictionary. This method reduces the execution time by eliminating unnecessary operations. Nouns can be extracted even from large documents without significantly affecting accuracy. This paper proposes a noun extraction method using the appearance characteristics of nouns and a keyword extraction method using unsupervised learning techniques.

An Efficient Keyword Search Method on RDF Data (RDF 데이타에 대한 효율적인 검색 기법)

  • Kim, Jin-Ha;Song, In-Chul;Kim, Myoung-Ho
    • Journal of KIISE:Databases / v.35 no.6 / pp.495-504 / 2008
  • Recently, there has been much work on supporting keyword search not only for text documents, but also for structured data such as relational data, XML data, and RDF data. In this paper, we propose an efficient keyword search method for RDF data. The proposed method first groups related nodes and edges in RDF data graphs to reduce data sizes for efficient keyword search and to allow relevant information to be returned together in the query answers. The proposed method also utilizes the semantics in RDF data to measure the relevancy of nodes and edges with respect to keywords for search result ranking. Experimental results on real RDF data show that the proposed method reduces the RDF data size by about half and is up to 5 times faster than previous methods.

Automatic In-Text Keyword Tagging based on Information Retrieval

  • Kim, Jin-Suk;Jin, Du-Seok;Kim, Kwang-Young;Choe, Ho-Seop
    • Journal of Information Processing Systems / v.5 no.3 / pp.159-166 / 2009
  • As shown in Wikipedia, tagging or cross-linking through major keywords in a document collection improves not only the readability of documents but also responsive and adaptive navigation among related documents. In recent years, the Semantic Web has increased the importance of social tagging as a key feature of Web 2.0 and, as its crucial phenotype, the Tag Cloud has emerged to the public. In this paper we provide an efficient method of automated in-text keyword tagging based on a large-scale controlled term collection or keyword dictionary, where the computational complexity of O(mN), if a pattern matching algorithm is used, can be reduced to O(m log N) if an Information Retrieval technique is adopted, where m is the length of the target document and N is the total number of candidate terms to be tagged. The results show that automatic in-text tagging with keywords filtered by Information Retrieval is about 6 to 40 times faster than the fastest pattern matching algorithm.
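
The O(mN) versus O(m log N) contrast in this abstract can be illustrated with a minimal sketch. It assumes that m is counted in tokens and that the dictionary holds single-token terms in a sorted list; binary search stands in for the index lookup, since the abstract does not describe the paper's actual Information Retrieval technique. The function names `tag_naive` and `tag_indexed` are hypothetical.

```python
from bisect import bisect_left

def tag_naive(tokens, terms):
    """Compare every token against every term: O(m * N) comparisons."""
    return [i for i, tok in enumerate(tokens) if any(tok == t for t in terms)]

def tag_indexed(tokens, sorted_terms):
    """Binary-search each token in a sorted term list: O(m * log N) comparisons."""
    hits = []
    for i, tok in enumerate(tokens):
        pos = bisect_left(sorted_terms, tok)
        if pos < len(sorted_terms) and sorted_terms[pos] == tok:
            hits.append(i)
    return hits

# Toy dictionary (N = 3) and toy document (m = 7); both are illustrative only.
terms = sorted(["keyword", "retrieval", "tagging"])
tokens = "automatic keyword tagging based on information retrieval".split()
print(tag_indexed(tokens, terms))  # positions of tokens found in the dictionary
```

Both functions return the same token positions; only the number of comparisons per token differs, which is where the gap widens as N grows.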

Factors affecting the number of citations in papers published in the Journal of Korean Society of Dental Hygiene (한국치위생학회지 게재논문의 피인용수에 영향을 미친 요인)

  • Jeon, Se-Jeong
    • Journal of Korean Society of Dental Hygiene / v.21 no.5 / pp.639-644 / 2021
  • Objectives: The purpose of this study was to analyze the factors that affected the number of citations for articles published in the Journal of Korean Society of Dental Hygiene based on previous studies. Methods: Information on papers, including the number of citations, was collected using a web crawling technique. The effect of the number of author keywords, the number of Medical Subject Headings (MeSH) keywords, MeSH match rate, abstract word count, and keyword-abstract ratio on the number of citations was analyzed by multiple regression analysis. Results: The use of MeSH keywords did not have a significant effect on the number of citations. Among the other factors, only the keyword-abstract ratio was statistically significant. Conclusions: Select a topic of constant interest in the field, write the title in detail using colons or asterisks if necessary, and do not repeat the words used in the title in the keywords. Select specific keywords deeply related to the topic. In particular, choose words or phrases that are frequently used in the abstract. If the MeSH keyword selection contradicts the previous strategies, boldly give up the MeSH keyword.

Comparative Study of Keyword Extraction Models in Biomedical Domain (생의학 분야 키워드 추출 모델에 대한 비교 연구)

  • Donghee Lee;Soonchan Kwon;Beakcheol Jang
    • Journal of Internet Computing and Services / v.24 no.4 / pp.77-84 / 2023
  • Given the growing volume of biomedical papers, the ability to efficiently extract keywords has become crucial for accessing and responding to important information in the literature. In this study, we conduct a comprehensive evaluation of different unsupervised learning-based models and BERT-based models for keyword extraction in the biomedical field. Our experimental findings reveal that the BioBERT model, trained on biomedical-specific data, achieves the highest performance. This study offers precise and dependable insights to guide forthcoming research in biomedical keyword extraction. By establishing a well-suited experimental framework and conducting thorough comparisons and analyses of diverse models, we have furnished essential information. Furthermore, we anticipate extending our contributions to other domains by providing comparative experiments and practical guidelines for effective keyword extraction.

Keyword Network Analysis for Technology Forecasting (기술예측을 위한 특허 키워드 네트워크 분석)

  • Choi, Jin-Ho;Kim, Hee-Su;Im, Nam-Gyu
    • Journal of Intelligence and Information Systems / v.17 no.4 / pp.227-240 / 2011
  • New concepts and ideas often result from extensive recombination of existing concepts or ideas. Both researchers and developers build on existing concepts and ideas in published papers or registered patents to develop new theories and technologies that in turn serve as a basis for further development. As the importance of patents increases, so does that of patent analysis. Patent analysis is largely divided into network-based and keyword-based analyses. The former lacks the ability to analyze information technology in detail, while the latter is unable to identify the relationships between such technologies. To overcome the limitations of network-based and keyword-based analyses, this study blends the two methods and suggests a keyword-network-based analysis methodology. In this study, we collected significant technology information from each patent related to Light Emitting Diodes (LEDs) through text mining, built a keyword network, and then executed a community network analysis on the collected data. The results of the analysis are as follows. First, the patent keyword network showed very low density and an exceptionally high clustering coefficient. Technically, density is obtained by dividing the number of ties in a network by the number of all possible ties. The value ranges between 0 and 1, with higher values indicating denser networks and lower values indicating sparser networks. In real-world networks, the density varies with the size of the network; increasing the size of a network generally decreases its density. The clustering coefficient is a network-level measure that illustrates the tendency of nodes to cluster in densely interconnected modules. This measure reveals the small-world property, in which a network can be highly clustered even though it has a small average distance between nodes despite its large number of nodes.
Therefore, the low density of the patent keyword network means that its nodes are connected sporadically, and the high clustering coefficient shows that nodes in the network are closely connected to one another. Second, the cumulative degree distribution of the patent keyword network, like that of other knowledge networks such as citation or collaboration networks, followed a clear power-law distribution. A well-known mechanism behind this pattern is preferential attachment, whereby a node with more links is likely to attain further new links as the network evolves. Unlike normal distributions, the power-law distribution does not have a representative scale. This means that one cannot pick a representative or average value, because there is always a considerable probability of finding much larger values. Networks with power-law distributions are therefore often referred to as scale-free networks. The presence of a heavy-tailed scale-free distribution represents the fundamental signature of an emergent collective behavior of the actors who contribute to forming the network. In our context, the more frequently a patent keyword is used, the more often it is selected by researchers and associated with other keywords or concepts to constitute and convey new patents or technologies. The evidence of a power-law distribution suggests preferential attachment as the origin of heavy-tailed distributions in a wide range of growing patent keyword networks. Third, we found that among keywords that flowed into a particular field, the vast majority of keywords with new links join existing keywords in the associated community in forming the concept of a new patent. This finding held for both the short-term (4-year) and long-term (10-year) analyses.
Furthermore, the keyword combination information derived from the methodology suggested by our study enables one to forecast which concepts will combine to form a new patent dimension and to refer to those concepts when developing a new patent.
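
The two network measures defined in this abstract can be computed as in the following minimal sketch. It assumes an undirected keyword network stored as an adjacency dictionary of sets, and it uses the average of local clustering coefficients as the network-level value; the abstract does not say which variant the authors used. The function names and the toy graph are hypothetical.

```python
from itertools import combinations

def density(adj):
    """Number of ties divided by the number of all possible ties."""
    n = len(adj)
    ties = sum(len(nbrs) for nbrs in adj.values()) / 2  # each tie is stored twice
    possible = n * (n - 1) / 2
    return ties / possible if possible else 0.0

def average_clustering(adj):
    """Mean over nodes of (linked neighbor pairs) / (all neighbor pairs)."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbors contribute 0
        linked = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += linked / (k * (k - 1) / 2)
    return total / len(adj) if adj else 0.0

# Toy network: two triangles (a-b-c and d-e-f) joined by the single tie c-d.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
print(density(adj), average_clustering(adj))
```

In this toy graph only 7 of the 15 possible ties exist, yet most nodes sit in fully connected triangles, which is the same qualitative pattern of modest density alongside high clustering that the abstract reports.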

A Study on Multi-frequency Keyword Visualization based on Co-occurrence (다중빈도 키워드 가시화에 관한 연구)

  • Lee, HyunChang;Shin, SeongYoon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2018.05a / pp.103-104 / 2018
  • Recently, interest in data analysis has increased as big data has grown in importance. In particular, as social media data and academic research communities become more active, their analysis becomes more important. In this study, co-word analysis was conducted on altmetrics articles collected from 2012 to 2017. From the keywords, a co-occurrence network map is derived and the emphasized keywords are extracted.
