• Title/Abstract/Keywords: keyword extraction method

355 search results, processing time 0.028 seconds

Multimodal Media Content Classification using Keyword Weighting for Recommendation (추천을 위한 키워드 가중치를 이용한 멀티모달 미디어 콘텐츠 분류)

  • Kang, Ji-Soo;Baek, Ji-Won;Chung, Kyungyong
    • Journal of Convergence for Information Technology / Vol. 9, No. 5 / pp.1-6 / 2019
  • As the mobile market expands, a variety of platforms have emerged to provide multimodal media content. Because multimodal media content contains heterogeneous data, users need considerable time and effort to select preferred content. Therefore, in this paper we propose multimodal media content classification using keyword weighting for recommendation. The proposed method extracts the keywords that best represent each item by applying keyword weighting to the text data of multimodal media content. Based on the extracted data, genre classes with subclasses are generated and the appropriate multimodal media content is classified. In addition, the user's preferences are evaluated for personalized recommendation, and multimodal content is recommended based on the results of the user's content preference analysis. The performance evaluation verifies the superiority of the recommendation results in terms of accuracy and satisfaction. The recommendation accuracy is 74.62% and the satisfaction rate is 69.1%, because recommendations consider the user's favorite keywords as well as the genre.
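
The abstract does not say which weighting scheme is used, so the following is only a minimal sketch that assumes a plain TF-IDF weight. The function name `tfidf_keywords`, the `top_k` parameter, and the toy documents are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def tfidf_keywords(docs, top_k=3):
    """Return the top_k highest-weighted terms for each tokenized document.

    Assumption: TF-IDF stands in for the paper's unspecified keyword weighting.
    Every document is expected to be a non-empty list of tokens.
    """
    n_docs = len(docs)
    # Document frequency: the number of documents in which each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    results = []
    for doc in docs:
        tf = Counter(doc)
        weights = {
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Sort by descending weight, breaking ties alphabetically.
        ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
        results.append([term for term, _ in ranked[:top_k]])
    return results

# Hypothetical text descriptions of three media items.
docs = [
    "space battle hero space ship".split(),
    "love story hero romance love".split(),
    "space romance story".split(),
]
keywords = tfidf_keywords(docs, top_k=3)
```

Terms that appear in several items, such as "hero" here, receive a lower weight than terms that are specific to one item, which is the general idea behind selecting representative keywords.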

A Corpus Analysis of British-American Children's Adventure Novels: Treasure Island (영미 아동 모험 소설에 관한 코퍼스 분석 연구: 『보물섬』을 중심으로)

  • Choi, Eunsaem;Jung, Chae Kwan
    • The Journal of the Korea Contents Association / Vol. 21, No. 1 / pp.333-342 / 2021
  • In this study, we analyzed the vocabulary, lemmas, keywords, and n-grams in 『Treasure Island』 to identify certain linguistic features of this British-American children's adventure novel. The current study found that, contrary to the popular claim that frequently used words are important and essential to a story, the frequently used words in 『Treasure Island』 were mostly function words and proper nouns that were not directly related to its plot. We also ascertained that a keyword list produced statistically by a corpus program was not sufficient to surmise the story of 『Treasure Island』. However, we managed to extract 30 keywords through a first, quantitative keyword analysis followed by a second, qualitative keyword analysis. We also carried out a series of n-gram analyses and were able to discover lexical bundles that were preferred and frequently used by the author of 『Treasure Island』. We hope that the results of this study will help spread this knowledge in the field of British-American children's literature and further advance corpus stylistic theory.
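
As a rough illustration of the n-gram step, the sketch below counts contiguous word sequences in a token list. The helper name `ngram_counts` is hypothetical, and the sample sentence is made up for the example rather than quoted from the novel. The study itself used a corpus program, not this code.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count every contiguous run of n tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Made-up sample text (an assumption for illustration only).
text = "the sea cook and the sea chest and the sea"
tokens = text.lower().split()

bigrams = ngram_counts(tokens, 2)
most_common_bigram = bigrams.most_common(1)[0]
```

Frequently repeated n-grams found this way are candidates for the lexical bundles the abstract mentions. A real analysis would also need proper tokenization and a frequency threshold.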

Performance Evaluation of the Extraction Method of Representative Keywords by Fuzzy Inference (퍼지추론 기반 대표 키워드 추출방법의 성능 평가)

  • Rho Sun-Ok;Kim Byeong Man;Oh Sang Yeop;Lee Hyun Ah
    • Journal of Korea Society of Industrial Information Systems / Vol. 10, No. 1 / pp.28-37 / 2005
  • In our previous work, we suggested a method that extracts representative keywords from a few positive documents and assigns weights to them. To show the usefulness of the method, in this paper we evaluate the performance of a well-known classification algorithm called GIS (Generalized Instance Set) when it is combined with our method. In the GIS algorithm, generalized instances are built from learning documents by a generalization function, and then the K-NN algorithm is applied to them. Here, our method is used as the generalization function. For comparison, the Rocchio and Widrow-Hoff algorithms are also used as generalization functions. Experimental results show that our method is better than the others when only positive documents are considered, but not when negative documents are considered together.

Text Pattern Search Based on User Profile using Prefix Tree (전위 트리를 이용한 사용자 프로파일 기반의 문서 패턴 검색 기법)

  • Woo, Ho-Jin;Lee, Won-Suk
    • Proceedings of the Korea Information Processing Society Conference / Korea Information Processing Society 2005 Fall Conference and General Meeting / pp.533-536 / 2005
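  • As data grows exponentially, there is an increasing need to extract and provide information suited to each individual user. This paper presents a text pattern search method that can accurately extract information on the specific topic a user wants from a large document collection. To reflect user preferences accurately, we propose a method that builds the user's keyword mining profile on a prefix tree and uses it to find matching patterns in the document collection. The effectiveness of the search technique using the generated profile was verified through experiments.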
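
The abstract gives no implementation details, so the following is only a minimal sketch of one way a prefix tree could hold keyword patterns and match them against a document. The class name `PrefixTree`, its methods, and the sample patterns are hypothetical. Matching contiguous token runs is an assumption, not the authors' stated method.

```python
class PrefixTree:
    """A trie whose edges are keywords, so patterns share common prefixes."""

    def __init__(self):
        self.children = {}
        self.is_pattern_end = False

    def insert(self, keywords):
        """Store one keyword sequence as a pattern in the profile."""
        node = self
        for kw in keywords:
            node = node.children.setdefault(kw, PrefixTree())
        node.is_pattern_end = True

    def find_matches(self, tokens):
        """Return every stored pattern that occurs as a contiguous run in tokens."""
        matches = []
        for start in range(len(tokens)):
            node = self
            for end in range(start, len(tokens)):
                node = node.children.get(tokens[end])
                if node is None:
                    break  # No stored pattern continues with this token.
                if node.is_pattern_end:
                    matches.append(tuple(tokens[start:end + 1]))
        return matches

# Hypothetical user profile built from keyword patterns.
profile = PrefixTree()
profile.insert(["data", "mining"])
profile.insert(["data", "mining", "profile"])
profile.insert(["prefix", "tree"])

doc = "a prefix tree stores the data mining profile".split()
matches = profile.find_matches(doc)
```

Because patterns that begin with the same keywords share a path, a single walk from each start position checks all of them at once instead of testing each pattern separately.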

Music Recommendation System Based on User Preference Analysis Using Hidden Markov Model (은닉 마코프 모델을 이용한 사용자 선호도 분석 기반의 음악 추천 시스템)

  • Kim, Geon-Su;Lee, Dong-Hun;Yun, Tae-Bok;Lee, Ji-Hyeong
    • Proceedings of the Korean Institute of Intelligent Systems Conference / Korean Institute of Intelligent Systems 2008 Spring Conference Proceedings / pp.56-59 / 2008
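  • Most current music services present music to users by classifying it with keywords such as artist name or genre. However, as musical genres diversify and the types of music within each genre also diversify, keyword-based delivery alone is limited in providing the music users want. To overcome this limitation, a content-based music analysis method that analyzes music based on the properties of the music itself is needed. A method that analyzes the user's music preferences and provides matching music is also needed so that users can receive the music they want. In this paper, we propose a method that builds music models by extracting the sequence information and features of music and uses them to analyze the user's music preferences, and we propose a system that recommends music through this preference analysis in order to provide music that matches the user's preferences.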

Similar Patent Search Service System using Latent Dirichlet Allocation (잠재 의미 분석을 적용한 유사 특허 검색 서비스 시스템)

  • Lim, HyunKeun;Kim, Jaeyoon;Jung, Hoekyung
    • Journal of the Korea Institute of Information and Communication Engineering / Vol. 22, No. 8 / pp.1049-1054 / 2018
  • Keyword searching was used in the past as a method of finding similar patents, and automated classification by machine learning has been used recently. Keyword searching is a method of analyzing data that has been formalized through data refinement. While its accuracy is high for short text, it cannot analyze the meaning contained in the sentences of long text consisting of many words, such as a document. At the semantic analysis level, automatic classification is used to classify sentences composed of several words through unstructured data analysis. There have been attempts to find similar documents by combining the two methods. However, this has an algorithmic problem: the analysis methods differ in how they use unstructured data and structured data simultaneously. In this paper, we study a method that extracts the keywords implied in a document and uses LDA (Latent Dirichlet Allocation) to classify documents efficiently without human intervention and to find similar patents.

Web Document Classification Using User Intention Information (사용자 의도 정보를 사용한 웹문서 분류)

  • Jang, Yeong-Cheol
    • Proceedings of the Korea Society for Industrial Systems Conference / Korea Society of Industrial Information Systems 2008 Fall Joint International Conference / pp.292-297 / 2008
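  • Accurately categorizing web documents that contain complex semantics, and automating that process, requires standardized, intelligent, and automated document representation and classification technology that can accommodate human knowledge systems. To this end, we introduced a categorization and adjustment process that applies user intention information to keyword frequency, the relatedness of keywords within a document, the use of a thesaurus, and probabilistic techniques. In the web document classification process, we designed a framework so that intention information can be used both in knowledge-base document classification, which uses a thesaurus and similar resources, and in similarity-based document classification, which uses unsupervised learning without an a priori knowledge system; the differences between the two methods are then reconciled in a hybrid adjustment process. The HDCI (Hybrid Document Classification with Intention) model designed in this study consists of the web document classification process above and a user intention analysis process that controls and assists it. In the intention analysis process, the user intention supplied together with the keywords is used with domain knowledge to build an intention hierarchy tree, which acts as a constraint or guide during document classification and extracts a user intention profile or keywords that represent document characteristics. HDCI plays a controlling and guiding role in the bottom-up probabilistic approach based on inter-document similarity, and in the knowledge-base (thesaurus) approach it raises the accuracy of the relationships set among keywords, whose diversity is otherwise limited.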

Keyword Spotting on Hangul Document Images Using Character Feature Models (문자 별 특징 모델을 이용한 한글 문서 영상에서 키워드 검색)

  • Park, Sang-Cheol;Kim, Soo-Hyung;Choi, Deok-Jai
    • The KIPS Transactions:PartB / Vol. 12B, No. 5 / pp.521-526 / 2005
  • In this paper, we propose a keyword spotting system as an alternative search system for poor-quality Korean document images and compare the proposed system with an OCR-based document retrieval system. The system is composed of character segmentation, feature extraction for the query keyword, and word-to-word matching. In the character segmentation step, we propose an effective method to remove the connectivity between adjacent characters and a character segmentation method that minimizes the variance of character widths. In the query creation step, the feature vector for the query is constructed by combining per-typeface character models. In the matching step, word-to-word matching is applied based on character-to-character matching. We demonstrated that the proposed keyword spotting system is more efficient than the OCR-based one for searching for a keyword in Korean document images, especially when the quality of the documents is quite poor and the point size is small.

Investigating Trends of Gifted Counseling in Domestic through Semantic Network Analysis (네트워크분석 방법을 활용한 국내 영재상담 관련 연구동향 분석)

  • Lee, Sanggyun;Kim, Soonshik
    • Journal of the Korean Society of Earth Science Education / Vol. 11, No. 2 / pp.145-157 / 2018
  • The purpose of this study is to analyze domestic research trends related to gifted counseling using semantic network analysis. From papers on gifted education in Korea published in KCI (Korea Citation Index) rated journals, 83 papers were collected, and Semantic Network Analysis (SNA) was used for keyword frequency and centrality network analysis across the research articles using krkwic and Ucinet6.0. The results are as follows. First, the most frequent paper keywords centered on four keywords: perfectionism, career, counseling, and the science gifted. Second, analysis of annual trends from 2001 to June 2018 showed that the top keywords were gifted underachievers, perfectionism, gifted students of science, and science gifted students. The rising keywords were perfectionism, twice-exceptional students, and gifted parents, while the keywords gifted students and general students showed a decreasing tendency. Consequently, gifted counseling research should be conducted from various perspectives.
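
The study ran its analysis in krkwic and Ucinet6.0, so the code below is only a small sketch of the two quantities the abstract names: keyword frequency and a centrality measure on a keyword co-occurrence network. Degree centrality is an assumption, because the abstract does not say which centrality was computed. The function name `keyword_network` and the three keyword lists are hypothetical and are not the study's data.

```python
from collections import Counter
from itertools import combinations

def keyword_network(papers):
    """Return (frequency, degree centrality) for keywords across papers.

    Two keywords are linked when they appear on the same paper.
    """
    # Frequency: the number of papers that list each keyword.
    freq = Counter(kw for kws in papers for kw in set(kws))
    neighbors = {kw: set() for kw in freq}
    for kws in papers:
        for a, b in combinations(sorted(set(kws)), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(freq)
    # Degree centrality: share of the other keywords a keyword co-occurs with.
    centrality = {
        kw: (len(linked) / (n - 1) if n > 1 else 0.0)
        for kw, linked in neighbors.items()
    }
    return freq, centrality

# Made-up keyword lists for three papers (illustration only).
papers = [
    ["perfectionism", "counseling", "gifted"],
    ["career", "counseling"],
    ["perfectionism", "gifted"],
]
freq, centrality = keyword_network(papers)
```

In this toy input "counseling" co-occurs with every other keyword, so it has the highest degree centrality even though it is no more frequent than "perfectionism" or "gifted". That difference is why frequency and centrality are reported separately.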

The Use and Understanding of Keyword Searching in SELIS Online Public Access Catalogs (SELIS OPAC에 있어서 키워드탐색의 이용과 이해)

  • Koo Bon-Young
    • Journal of the Korean Society for Library and Information Science / Vol. 33, No. 2 / pp.119-139 / 1999
  • The purpose of this research is to analyze users' understanding of how keyword and boolean searches work in the SELIS (SEoul Women's University Library and Information System) OPAC. The results of analyzing how SELIS OPAC system processing is understood are as follows: 67 (22.48%) of 298 respondents understood keyword extraction and 231 (77.52%) did not; 115 (22.48%) of 297 understood boolean OR in keyword search and 182 (77.52%) did not; 98 (33.11%) of 296 understood boolean AND and 198 (66.89%) did not; and 109 (36.49%) of 285 understood the use of boolean and symbols and 181 (63.51%) did not, which is a generally low level. In the analysis of whether understanding of keyword search differed between users with and without keyword search experience, no difference was found at the 5% level of significance, whereas a difference was found for boolean search at the 5% level of significance.