• Title/Summary/Keyword: keyword extraction

Tag Search System Using the Keyword Extraction and Similarity Evaluation (키워드 추출 및 유사도 평가를 통한 태그 검색 시스템)

  • Jung, Jaein;Yoo, Myungsik
    • The Journal of Korean Institute of Communications and Information Sciences / v.40 no.12 / pp.2485-2487 / 2015
  • Recently, hashtags have come into wide use on social networking services (SNS) such as Facebook and Twitter, as well as on personal blogs. However, tag search performs poorly because hashtags are used indiscriminately. To improve the accuracy of tag search, we propose a tag search system based on keyword extraction and similarity evaluation. Experimental results show that the proposed system achieves higher accuracy in tag search results.
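The abstract does not say how keywords are extracted or how similarity is measured. As a rough illustration only, the sketch below assumes term-frequency keywords and cosine similarity: a post carrying a tag is kept only if its keywords resemble those of the *other* posts carrying the same tag, which filters out indiscriminately tagged posts. The function names, stopword list, threshold, and sample posts are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; the paper's preprocessing is not described.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "on", "is", "to", "for", "with", "me", "my"}

def extract_keywords(text: str, top_k: int = 5) -> Counter:
    """Keep the top_k most frequent non-stopword tokens as a sparse keyword vector."""
    tokens = [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]
    return Counter(dict(Counter(tokens).most_common(top_k)))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def search_tag(tag: str, posts: list, threshold: float = 0.2) -> list:
    """Return (score, post id) for posts with `tag` whose content fits the tag's other posts."""
    vectors = {p["id"]: extract_keywords(p["text"]) for p in posts if tag in p["tags"]}
    total = Counter()
    for vec in vectors.values():
        total.update(vec)
    results = []
    for pid, vec in vectors.items():
        # Leave-one-out profile: keywords of every *other* post carrying the tag.
        score = cosine(vec, total - vec)
        if score >= threshold:
            results.append((round(score, 3), pid))
    return sorted(results, reverse=True)

SAMPLE_POSTS = [
    {"id": "p1", "tags": {"#coffee"}, "text": "Morning coffee with espresso and latte art"},
    {"id": "p2", "tags": {"#coffee"}, "text": "Best espresso beans for coffee brewing"},
    {"id": "p3", "tags": {"#coffee"}, "text": "Selling used car cheap, contact me"},
]

if __name__ == "__main__":
    print(search_tag("#coffee", SAMPLE_POSTS))
```

With the sample posts, the off-topic post `p3` shares no keywords with the other `#coffee` posts and is dropped. The leave-one-out profile keeps a spam post from inflating its own score.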

Text-mining Based Graph Model for Keyword Extraction from Patent Documents (특허 문서로부터 키워드 추출을 위한 텍스트 마이닝 기반 그래프 모델)

  • Lee, Soon Geun;Leem, Young Moon;Um, Wan Sup
    • Journal of the Korea Safety Management & Science / v.17 no.4 / pp.335-342 / 2015
  • Growing interest in patents has led many individuals and companies to file patent applications in various fields. Filed patents are stored as electronic documents, and searching and categorizing these documents are major issues in data mining. Keyword extraction, which retrieves a document's representative keywords, is especially important. Most keyword extraction techniques are based on the vector space model, which weights terms by their frequency in documents and selects keywords in order of weight. However, this model cannot reflect the relationships between keywords. This paper proposes a method that overcomes this limitation to extract more representative keywords. The proposed model first builds a candidate set using the vector space model, then constructs a graph representing the relationships between pairs of candidate keywords in the set, and finally selects keywords based on this relationship graph.
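The abstract names three steps (vector-space candidates, a relation graph over candidate pairs, graph-based selection) but not the relation measure or the selection rule. The sketch below assumes sentence-level co-occurrence as the relation and a hypothetical score of TF-IDF weight multiplied by (1 + co-occurrence strength). Function names and the sample documents are illustrative, not the authors'.

```python
import math
import re
from collections import Counter, defaultdict
from itertools import combinations

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def tfidf_candidates(docs, i, n_candidates=6):
    """Step 1 (vector space model): the n highest TF-IDF terms of document i."""
    df = Counter()
    for d in docs:
        df.update(set(tokenize(d)))
    tf = Counter(tokenize(docs[i]))
    weights = {t: c * math.log(len(docs) / df[t]) for t, c in tf.items()}
    ranked = sorted(weights.items(), key=lambda tw: (-tw[1], tw[0]))
    return dict(ranked[:n_candidates])

def candidate_graph(text, candidates):
    """Step 2: weighted graph counting how often two candidates share a sentence."""
    edges = defaultdict(float)
    for sentence in re.split(r"[.!?]", text):
        present = sorted(set(tokenize(sentence)) & set(candidates))
        for u, v in combinations(present, 2):
            edges[(u, v)] += 1.0
    return edges

def select_keywords(docs, i, k=3):
    """Step 3 (hypothetical scoring): TF-IDF weight boosted by relational strength."""
    candidates = tfidf_candidates(docs, i)
    strength = Counter()
    for (u, v), w in candidate_graph(docs[i], candidates).items():
        strength[u] += w
        strength[v] += w
    score = {t: w * (1 + strength[t]) for t, w in candidates.items()}
    return sorted(score, key=lambda t: (-score[t], t))[:k]

SAMPLE_DOCS = [
    "Battery cell electrode coating. Electrode coating improves battery cell life. Separator film.",
    "Engine piston ring. Piston ring reduces engine wear.",
    "Display panel pixel. Pixel panel brightness.",
]

if __name__ == "__main__":
    print(select_keywords(SAMPLE_DOCS, 0, k=6))
```

In the first sample document, `improves` and `film` have the same TF-IDF weight, but `improves` ranks higher because it co-occurs with other candidates. This is the kind of relational signal a pure frequency-based vector model misses.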

Automatic Keyword Extraction using Hierarchical Graph Model Based on Word Co-occurrences (단어 동시출현관계로 구축한 계층적 그래프 모델을 활용한 자동 키워드 추출 방법)

  • Song, KwangHo;Kim, Yoo-Sung
    • Journal of KIISE / v.44 no.5 / pp.522-536 / 2017
  • Keyword extraction can be used in text mining of large document collections to efficiently extract subject words or related words from a document. In this study, we propose a hierarchical graph model based on word co-occurrence relationships, intrinsic dependency relationships between words, and common sub-words within a single document. We also propose an enhanced TextRank algorithm that reflects the influence of outgoing edges as well as incoming edges. Using the proposed hierarchical graph model and enhanced TextRank algorithm, we then propose a novel keyword extraction scheme that extracts representative keywords from a single document. In the experiments, various evaluation methods were applied to documents on various subjects to verify the accuracy and adaptability of the proposed scheme. The results show that the proposed scheme outperforms previous schemes.
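For reference, the baseline that the paper extends can be sketched as standard TextRank on an undirected word co-occurrence graph. This is *not* the authors' hierarchical model or their enhanced outgoing-edge variant, which the abstract does not specify. The window size, damping factor `d = 0.85`, iteration count, and stopword list are conventional or hypothetical choices.

```python
import re
from collections import defaultdict

# Hypothetical stopword list for the example.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "is", "on", "to", "for", "from"}

def tokenize(text):
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def cooccurrence_graph(tokens, window=2):
    """Undirected graph linking words that appear within `window` tokens of each other."""
    neighbors = defaultdict(set)
    for i, u in enumerate(tokens):
        for v in tokens[i + 1 : i + window]:
            if u != v:
                neighbors[u].add(v)
                neighbors[v].add(u)
    return neighbors

def textrank(neighbors, d=0.85, iterations=50):
    """Standard TextRank: S(v) = (1 - d) + d * sum(S(u) / deg(u) for u adjacent to v)."""
    score = {v: 1.0 for v in neighbors}
    for _ in range(iterations):
        score = {
            v: (1 - d) + d * sum(score[u] / len(neighbors[u]) for u in neighbors[v])
            for v in neighbors
        }
    return score

def top_keywords(text, k=3, window=2):
    score = textrank(cooccurrence_graph(tokenize(text), window))
    return sorted(score, key=lambda v: (-score[v], v))[:k]

if __name__ == "__main__":
    print(top_keywords("graph based ranking graph model"))
```

In this undirected form, each node splits its score evenly across all of its edges. The paper's enhancement, which also weighs outgoing edges, presupposes a directed graph and would change exactly this update rule.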

XML Document Keyword Weight Analysis based Paragraph Extraction Model (XML 문서 키워드 가중치 분석 기반 문단 추출 모델)

  • Lee, Jongwon;Kang, Inshik;Jung, Hoekyung
    • Journal of the Korea Institute of Information and Communication Engineering / v.21 no.11 / pp.2133-2138 / 2017
  • Existing analysis of XML and other documents has been word-centered. Such analysis can be implemented with a morphological analyzer, but although it can classify the many words in a document, it cannot capture the document's core content. For a user to understand a document efficiently, the paragraphs containing its main words must be extracted and presented to the user. The proposed system retrieves keywords from a normalized XML document, then extracts the paragraphs containing the keyword the user entered as a search term and displays them. In addition, the system reports the frequency and weight of the search keyword, orders the extracted paragraphs, and eliminates redundant ones so that the user can understand the document more easily. By letting the user understand a document without reading all of it, the proposed system can minimize the time and effort required.
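A minimal sketch of the paragraph-extraction step, under several assumptions the abstract does not confirm. Paragraphs are taken to be `<p>` elements, the keyword is matched as a plain case-insensitive substring with no morphological analysis, a paragraph's weight is its share of all keyword occurrences, and redundancy removal means dropping exact duplicate paragraphs. The function name and sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET

def extract_paragraphs(xml_text, keyword, tag="p"):
    """Return (total keyword count, [(paragraph, frequency, weight), ...]) ordered by frequency."""
    root = ET.fromstring(xml_text)
    kw = keyword.lower()
    seen = set()
    hits = []
    for order, elem in enumerate(root.iter(tag)):
        # Normalize whitespace so formatting differences do not defeat deduplication.
        text = " ".join("".join(elem.itertext()).split())
        key = text.lower()
        if kw not in key or key in seen:
            continue  # skip paragraphs without the keyword and exact duplicates
        seen.add(key)
        hits.append((key.count(kw), order, text))
    total = sum(freq for freq, _, _ in hits)
    # Most keyword-dense paragraphs first; ties keep document order.
    hits.sort(key=lambda h: (-h[0], h[1]))
    return total, [(text, freq, freq / total) for freq, _, text in hits]

SAMPLE_XML = (
    "<doc><p>Keyword extraction finds keywords.</p><p>Unrelated text.</p>"
    "<p>Keyword extraction finds keywords.</p>"
    "<p>A keyword here, keyword there, keyword everywhere.</p></doc>"
)

if __name__ == "__main__":
    print(extract_paragraphs(SAMPLE_XML, "keyword"))
```

Substring matching counts `keywords` as an occurrence of `keyword`. A Korean implementation would instead match on morphological-analyzer output, as the abstract implies.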

Automatic Keyword Extraction System for Korean Documents Information Retrieval (국내(國內) 문헌정보(文獻情報) 검색(檢索)을 위한 키워드 자동추출(自動抽出) 시스템 개발(開發))

  • Yae, Yong-Hee
    • Journal of Information Management / v.23 no.1 / pp.39-62 / 1992
  • In this paper, about 60 auxiliary words and 320 stopwords are selected by analyzing sample data, and stopwords are classified into four types: left truncation, right truncation, auxiliary-word truncation, and normal. We then propose a keyword extraction system that efficiently truncates auxiliary words from words, converts Chinese-character words to Korean, and excludes stopwords. Keywords selected by this system agree with keywords selected manually by experts at a rate of 92.2%. In addition, compound words of 4 to 6 characters generate twice as many additional new words, and 58.8% of these are useful as keywords.
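The truncation-and-exclusion pipeline can be sketched as longest-match suffix stripping of Korean particles followed by a stopword filter. The particle and stopword lists below are small hypothetical stand-ins, not the paper's 60 auxiliary words and 320 stopwords. Chinese-to-Korean conversion and the four stopword types are not modeled.

```python
# Hypothetical particle list; longer particles are tried first (longest match).
PARTICLES = sorted(
    ["은", "는", "이", "가", "을", "를", "의", "에", "에서", "으로", "로", "와", "과"],
    key=len,
    reverse=True,
)
# Hypothetical stopword list.
STOPWORDS = {"연구", "방법", "이용", "위한", "관한"}

def strip_particle(word, min_stem=2):
    """Remove one trailing particle, keeping at least `min_stem` characters of the stem."""
    for particle in PARTICLES:
        if word.endswith(particle) and len(word) - len(particle) >= min_stem:
            return word[: -len(particle)]
    return word

def extract_keywords(text):
    """Truncate particles, drop stopwords, and keep first occurrences in order."""
    keywords = []
    for word in text.split():
        stem = strip_particle(word.strip(".,"))
        if stem and stem not in STOPWORDS and stem not in keywords:
            keywords.append(stem)
    return keywords

if __name__ == "__main__":
    print(extract_keywords("정보검색을 위한 키워드의 자동추출 방법에 관한 연구"))
```

The `min_stem` guard is a crude safeguard against stripping a noun's own final syllable (for example a noun that happens to end in 이). A dictionary lookup would be more reliable.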

Web Document Classification Based on Hangeul Morpheme and Keyword Analyses (한글 형태소 및 키워드 분석에 기반한 웹 문서 분류)

  • Park, Dan-Ho;Choi, Won-Sik;Kim, Hong-Jo;Lee, Seok-Lyong
    • The KIPS Transactions:PartD / v.19D no.4 / pp.263-270 / 2012
  • With the development of high-speed Internet and large-scale database technology, the number of web documents is increasing rapidly, making automatic classification of these documents increasingly important. In this study, we propose an effective method that extracts document features based on Hangeul morpheme and keyword analyses and automatically classifies unstructured documents by predicting their subjects. To extract document features, we first select terms using a morphological analyzer, form a keyword set based on term frequency and subject-discriminating power, and score each keyword by its discriminating power. We then build classification models using commercial software that implements decision trees, neural networks, and SVMs (support vector machines). Experimental results show that the proposed feature extraction method performs well in classifying web documents by subject, achieving an average precision of 0.90 and a recall of 0.84 with the decision tree.

Concept-based Compound Keyword Extraction (개념기반 복합키워드 추출방법)

  • Lee, Sangkon;Lee, Taehun
    • The Journal of Korean Association of Computer Education / v.6 no.2 / pp.23-31 / 2003
  • In general, people use a keyword or phrase as the name of a field or as a subject word for a document. This paper focuses on keyword extraction. First, we observe that authors suggest keywords that do not occur as content words in the text itself, and we present generation rules that combine compound keywords based on the concepts in lexical information. We also present a new importance measure to filter out useless keywords unrelated to a document's content. To verify the validity of the extraction results, we collected the titles and abstracts of research papers on natural language and/or speech processing, and obtained 96% precision at the top rank of the extraction results.

A Study on the Research Trends to Flipped Learning through Keyword Network Analysis (플립러닝 연구 동향에 대한 키워드 네트워크 분석 연구)

  • HEO, Gyun
    • Journal of Fisheries and Marine Sciences Education / v.28 no.3 / pp.872-880 / 2016
  • The purpose of this study is to identify research trends related to flipped learning through keyword network analysis. To this end, a final set of 100 papers (out of 205 retrieved, after removing duplicates) was selected from research databases such as RISS, DBPIA, and KISS. After keyword extraction, coding, and data cleaning, we built a 2-mode network with a final set of 202 keywords. To identify research trends, we used frequency analysis, analysis of social network structural properties based on co-keyword network modeling, and social network centrality analysis. The results were as follows: (a) Apart from flipped learning itself, the most common keywords were achievement, writing, blended learning, teaching and learning model, learner-centered education, cooperative learning, learning motivation, and self-regulated learning. (b) In keyword network type 2, the density was .088 and the geodesic distance was 3.150. (c) Teaching and learning model, blended learning, and satisfaction were centrally located and closely related to other keywords. Satisfaction, teaching and learning model, blended learning, motivation, writing, communication, and achievement played an intermediary role among the other keywords.
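The two structural measures reported in (b) can be illustrated on a toy co-keyword network. For an undirected network with n keywords and m links, density is D = 2m / (n(n - 1)), and the average geodesic distance is the mean shortest-path length over all connected keyword pairs. The sketch below assumes that two keywords are linked when they appear in the same paper (a 1-mode projection of the paper-keyword data), and it uses made-up sample data. It does not reproduce the study's figures or the software the authors used.

```python
from collections import deque
from itertools import combinations

def build_cokeyword_network(papers):
    """Project paper-keyword (2-mode) data onto an undirected keyword-keyword network."""
    adj = {}
    for keywords in papers:
        for k in keywords:
            adj.setdefault(k, set())
        for u, v in combinations(sorted(set(keywords)), 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj

def density(adj):
    """D = 2m / (n(n - 1)) for an undirected graph with n nodes and m edges."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0

def average_geodesic_distance(adj):
    """Mean BFS shortest-path length over all reachable (ordered) node pairs."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

SAMPLE_PAPERS = [
    ["flipped learning", "achievement"],
    ["flipped learning", "blended learning"],
    ["blended learning", "satisfaction"],
]

if __name__ == "__main__":
    net = build_cokeyword_network(SAMPLE_PAPERS)
    print(density(net), average_geodesic_distance(net))
```

The sample forms a four-keyword chain with 3 of 6 possible links, so its density is 0.5. A density of .088, as reported, means that fewer than one in ten possible keyword pairs are directly linked.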

A Study on the Research Trend in the Dyslexia and Learning Disability Through a Keyword Network Analysis (키워드 네트워크 분석을 통한 난독증과 학습장애 관련 연구 동향 분석)

  • Lee, Woo-Jin;Kim, Tae-Gang
    • Journal of Digital Convergence / v.17 no.1 / pp.91-98 / 2019
  • The present study investigated general research trends in dyslexia and learning disability and explored the centrality of related variables through keyword network analysis. The data consisted of ten years of articles collected from the Research Information Sharing Service (RISS), provided by the Korea Education and Research Information Service (KERIS). The analysis consisted of keyword cleansing, extraction of major keywords with the KrKwic program, and visualization of the central connections between keywords with the NodeXL program. The results were as follows. First, a total of 72 keywords were extracted through the keyword cleansing process; the major keywords among them included learning disability, dyslexia, and RTI. Second, betweenness centrality analysis of dyslexia and learning disabilities showed that learning disability is the key word addressed in Korean studies of dyslexia and learning disabilities. These results suggest a method for analyzing research trends quantitatively and qualitatively in relation to dyslexia and learning disorders.
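The betweenness centrality used in the second finding counts how often a keyword lies on shortest paths between other keyword pairs. The sketch below implements Brandes' algorithm for an unweighted, undirected keyword network as a stand-in for the tool-based analysis. It returns raw (unnormalized) values, and the normalization used by NodeXL or any other package may differ. The sample star network is hypothetical.

```python
from collections import deque

def betweenness_centrality(adj):
    """Brandes' algorithm for an unweighted, undirected graph given as adjacency sets."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Forward BFS: count shortest paths (sigma) and record predecessors.
        stack, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Backward pass: accumulate pair dependencies in reverse BFS order.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in bc.items()}

SAMPLE_NETWORK = {
    "learning disability": {"dyslexia", "RTI", "reading"},
    "dyslexia": {"learning disability"},
    "RTI": {"learning disability"},
    "reading": {"learning disability"},
}

if __name__ == "__main__":
    print(betweenness_centrality(SAMPLE_NETWORK))
```

In the sample star network, every shortest path between two peripheral keywords passes through `learning disability`, so it has a betweenness of 3 (one per peripheral pair) while the others have 0.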

Representative Keyword Extraction from Few Documents through Fuzzy Inference (퍼지 추론을 이용한 소수 문서의 대표 키워드 추출)

  • 노순억;김병만;허남철
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2001.12a / pp.117-120 / 2001
  • In this work, we propose a new method for extracting and weighting representative keywords (RKs) from a small number of documents that may interest a user. To extract RKs, we first extract candidate terms and then select a number of them, called initial representative keywords (IRKs), through fuzzy inference. The final RKs are then obtained by expanding and reweighting the IRKs using term co-occurrence similarity. Because the performance of our approach depends heavily on how effectively the IRKs are selected, we chose fuzzy inference, which handles the uncertainty inherent in selecting a document's representative keywords more effectively. The problem addressed in this paper can be viewed as that of calculating the center of a set of document vectors. To show the usefulness of our approach, we compare it with two well-known methods, Rocchio and Widrow-Hoff, on several document collections. The results show that our approach outperforms both.
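Because the task is framed as finding the center of a set of document vectors, the two baselines can be sketched briefly. The fuzzy-inference method itself is not described in enough detail to reproduce. `rocchio_profile` computes a Rocchio-style prototype, beta times the centroid of the relevant documents minus gamma times the centroid of the non-relevant ones. The initial query term is omitted because the setting here has no explicit query. `widrow_hoff_update` applies the standard least-mean-squares rule w ← w − 2η(w·x − y)x. Parameter defaults and sample vectors are hypothetical.

```python
from collections import Counter

def rocchio_profile(relevant, nonrelevant=(), beta=1.0, gamma=0.0, top_k=None):
    """Rocchio-style prototype: beta * centroid(relevant) - gamma * centroid(nonrelevant)."""
    profile = Counter()
    for doc in relevant:
        for term, w in doc.items():
            profile[term] += beta * w / len(relevant)
    for doc in nonrelevant:
        for term, w in doc.items():
            profile[term] -= gamma * w / len(nonrelevant)
    # Keep positively weighted terms, strongest first (ties broken alphabetically).
    ranked = sorted(((t, w) for t, w in profile.items() if w > 0), key=lambda tw: (-tw[1], tw[0]))
    return ranked[:top_k] if top_k else ranked

def widrow_hoff_update(weights, x, y, eta=0.1):
    """One LMS step: w <- w - 2 * eta * (w . x - y) * x, on sparse dict vectors."""
    prediction = sum(weights.get(t, 0.0) * v for t, v in x.items())
    for t, v in x.items():
        weights[t] = weights.get(t, 0.0) - 2 * eta * (prediction - y) * v
    return weights

if __name__ == "__main__":
    docs = [{"fuzzy": 2, "inference": 1}, {"fuzzy": 1, "keyword": 1}]
    print(rocchio_profile(docs))
    print(widrow_hoff_update({}, {"fuzzy": 1.0}, y=1.0, eta=0.25))
```

Rocchio computes the center in a single pass over the documents, whereas Widrow-Hoff approaches a weight vector incrementally, one labeled example at a time. That difference is why both serve as natural baselines for the "center of document vectors" formulation.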
