• Title/Summary/Keyword: Keyword Ranking

Search Result 55, Processing Time 0.025 seconds

Proposal of keyword extraction method based on morphological analysis and PageRank in Tweeter (트위터에서 형태소 분석과 PageRank 기반 화제단어 추출 방법 제안)

  • Lee, Won-Hyung;Cho, Sung-Il;Kim, Dong-Hoi
    • Journal of Digital Contents Society
    • /
    • v.19 no.1
    • /
    • pp.157-163
    • /
    • 2018
  • People who use SNS publish their diverse ideas on SNS every day. The data posted on the SNS contains many people's thoughts and opinions. In particular, popular keywords served on Twitter compile the number of frequently appearing words in user posts and rank them. However, this method is sensitive to unnecessary data simply by listing duplicate words. The proposed method determines the ranking based on the topic of the word using the relationship diagram between words, so that the influence of unnecessary data is less and the main word can be stably extracted. For the performance comparison in terms of the descending keyword rank and the ratios of meaningless keywords among high rank 20 keywords, we make a comparison between the proposed scheme which is based on morphological analysis and PageRank, and the existing scheme which is based on the number of appearances. As a result, the proposed scheme and the existing scheme have included 55% and 70% of meaningless keywords among high rank 20 keywords, respectively, where the proposed scheme is improved about 15% compared with the existing scheme.

A Methodology for Extracting Shopping-Related Keywords by Analyzing Internet Navigation Patterns (인터넷 검색기록 분석을 통한 쇼핑의도 포함 키워드 자동 추출 기법)

  • Kim, Mingyu;Kim, Namgyu;Jung, Inhwan
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.2
    • /
    • pp.123-136
    • /
    • 2014
  • Recently, online shopping has further developed as the use of the Internet and a variety of smart mobile devices becomes more prevalent. The increase in the scale of such shopping has led to the creation of many Internet shopping malls. Consequently, there is a tendency for increasingly fierce competition among online retailers, and as a result, many Internet shopping malls are making significant attempts to attract online users to their sites. One such attempt is keyword marketing, whereby a retail site pays a fee to expose its link to potential customers when they insert a specific keyword on an Internet portal site. The price related to each keyword is generally estimated by the keyword's frequency of appearance. However, it is widely accepted that the price of keywords cannot be based solely on their frequency because many keywords may appear frequently but have little relationship to shopping. This implies that it is unreasonable for an online shopping mall to spend a great deal on some keywords simply because people frequently use them. Therefore, from the perspective of shopping malls, a specialized process is required to extract meaningful keywords. Further, the demand for automating this extraction process is increasing because of the drive to improve online sales performance. In this study, we propose a methodology that can automatically extract only shopping-related keywords from the entire set of search keywords used on portal sites. We define a shopping-related keyword as a keyword that is used directly before shopping behaviors. In other words, only search keywords that direct the search results page to shopping-related pages are extracted from among the entire set of search keywords. A comparison is then made between the extracted keywords' rankings and the rankings of the entire set of search keywords. Two types of data are used in our study's experiment: web browsing history from July 1, 2012 to June 30, 2013, and site information. The experimental dataset was from a web site ranking site, and the biggest portal site in Korea. The original sample dataset contains 150 million transaction logs. First, portal sites are selected, and search keywords in those sites are extracted. Search keywords can be easily extracted by simple parsing. The extracted keywords are ranked according to their frequency. The experiment uses approximately 3.9 million search results from Korea's largest search portal site. As a result, a total of 344,822 search keywords were extracted. Next, by using web browsing history and site information, the shopping-related keywords were taken from the entire set of search keywords. As a result, we obtained 4,709 shopping-related keywords. For performance evaluation, we compared the hit ratios of all the search keywords with the shopping-related keywords. To achieve this, we extracted 80,298 search keywords from several Internet shopping malls and then chose the top 1,000 keywords as a set of true shopping keywords. We measured precision, recall, and F-scores of the entire amount of keywords and the shopping-related keywords. The F-Score was formulated by calculating the harmonic mean of precision and recall. The precision, recall, and F-score of shopping-related keywords derived by the proposed methodology were revealed to be higher than those of the entire number of keywords. This study proposes a scheme that is able to obtain shopping-related keywords in a relatively simple manner. We could easily extract shopping-related keywords simply by examining transactions whose next visit is a shopping mall. The resultant shopping-related keyword set is expected to be a useful asset for many shopping malls that participate in keyword marketing. Moreover, the proposed methodology can be easily applied to the construction of special area-related keywords as well as shopping-related ones.

User Query Expansion Through Keyword Similarity Ranking Algorithm Us ins Cluster ing Methods (클러스터링 기법을 이용한 키워드 유사도 순위화 알고리즘에 따른 사용자 질의 확장)

  • 이상훈;김기태
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2003.04c
    • /
    • pp.479-481
    • /
    • 2003
  • 본 논문에서는 여러 가지 클러스터링 기법들을 사용하여 키워드 유사도롤 순위화하여 사용자의 질의를 확장하는 기법을 제안한다. 클러스터링 기법에는 연관(Association) 클러스터링, 메트릭(Metric) 클러스터링, 스칼라(Scalar) 클러스터링 기법을 사용하고, 이들간의 가중치를 적절히 조절하여 검색 시스템을 만든다. 사용자의 질의가 주어졌을 때, 질의 키워드와 연관된 키워드들을 순위화 하여 사용자에게 보여주고, 사용자의 추가입력을 받아서 질의를 확장한다. 사용자가 적당한 질의어로 판단하여 확장된 질의로 검색을 수행할 때까지 이 과정을 반복한다. 실험에서 사용한 문헌집합은 Korea Herald의 2003년 1월과 2월의 경제 관련 기사들을 수집하여 사용하였고, 실험을 거쳐서 질의를 확장한 결과 만족할 만한 결과가 도출되었다.

  • PDF

Cluster-based keyword Ranking Technique (클러스터 기반 키워드 랭킹 기법)

  • Yoo, Han-mook;Kim, Han-joon
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2016.10a
    • /
    • pp.529-532
    • /
    • 2016
  • 본 논문은 기존의 TextRank 알고리즘에 상호정보량 척도를 결합하여 군집 기반에서 키워드 추출하는 ClusterTextRank 기법을 제안한다. 제안 기법은 k-means 군집화 알고리즘을 이용하여 문서들을 여러 군집으로 나누고, 각 군집에 포함된 단어들을 최소신장트리 그래프로 표현한 후 이에 근거한 군집 정보량을 고려하여 키워드를 추출한다. 제안 기법의 성능을 평가하기 위해 여행 관련 블로그 데이터를 이용하였으며, 제안 기법이 기존 TextRank 알고리즘보다 키워드 추출의 정확도가 약 13% 가량 개선됨을 보인다.

Dynamic Management of Equi-Join Results for Multi-Keyword Searches (다중 키워드 검색에 적합한 동등조인 연산 결과의 동적 관리 기법)

  • Lim, Sung-Chae
    • The KIPS Transactions:PartA
    • /
    • v.17A no.5
    • /
    • pp.229-236
    • /
    • 2010
  • With an increasing number of documents in the Internet or enterprises, it becomes crucial to efficiently support users' queries on those documents. In that situation, the full-text search technique is accepted in general, because it can answer uncontrolled ad-hoc queries by automatically indexing all the keywords found in the documents. The size of index files made for full-text searches grows with the increasing number of indexed documents, and thus the disk cost may be too large to process multi-keyword queries against those enlarged index files. To solve the problem, we propose both of the index file structure and its management scheme suitable to the processing of multi-keyword queries against a large volume of index files. For this, we adopt the structure of inverted-files, which are widely used in the multi-keyword searches, as a basic index structure and modify it to a hierarchical structure for join operations and ranking operations performed during the query processing. In order to save disk costs based on that index structure, we dynamically store in the main memory the results of join operations between two keywords, if they are highly expected to be entered in users' queries. We also do performance comparisons using a cost model of the disk to show the performance advantage of the proposed scheme.

Service Provider Ranking Based on Visual Media Ontology (시각 미디어 온톨로지에 기반한 서비스 제공자 랭킹)

  • Min, Young-Kun;Lee, Bog-Ju
    • The KIPS Transactions:PartB
    • /
    • v.15B no.4
    • /
    • pp.315-322
    • /
    • 2008
  • It is important to retrieve effectively the visual media such as pictures and video in the internet, especially to the application areas such as electronic art museum, e-commerce, and internet shopping malls. It is also needed in these areas to have content-based or even semantic-based multimedia retrieval instead of simple keyword-based retrieval. In our earlier research, we proposed a semantic-based visual media retrieval framework for the effective retrieval of the visual media from the internet. It uses visual media metadata and ontology based on the web service to achieve the semantic-based retrieval. In this research, there are more than one visual media service providers and one central service broker. As a preliminary step to the visual media data retrieval, a method is proposed to retrieve the service providers effectively. The method uses the structure of the ontology tree to obtain the providers and their rankings. It also uses the size of sub nodes and child nodes in the tree. It measures the rankings of providers more effectively than previous method. The experimental results show the accuracy of the method while keeping compatible speed against the existing method.

Semantic search of web documents using ontology (온톨로지를 이용한 웹문서의 시맨틱 검색)

  • Oh, Sung-Kyun;Kim, Byung-Gon
    • Journal of Digital Contents Society
    • /
    • v.15 no.5
    • /
    • pp.603-612
    • /
    • 2014
  • To provide efficient and correct search results, ontology which use the structure of information, is considered as a main mechanism in the semantic web. Therefore, recent research in information retrieval and data construction have emphasized the use of ontologies as a data representation and search mechanism. In this paper, we propose a semantic search method using ontology to improve search ability in web environment. Ontology and knowledge base is used to represent semantic meaning of the data and provide related web documents and facts as results. Also, search result ranking mechanism is proposed. The mechanism use cardinality of the keyword in the contents and structural information of ontology. Experimental results with several query processing indicate that different coefficient value in the expression gives different results in sample ontology system and we propose appropriate values of the coefficient.

Performance Evaluation of Video Recommendation System with Rich Metadata (풍부한 메타데이터를 가진 동영상 추천 시스템의 성능 평가)

  • Min Hwa Cho;Da Yeon Kim;Hwa Rang Lee;Ha Neul Oh;Sun Young Lee;In Hwan Jung;Jae Moon Lee;Kitae Hwang
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.23 no.2
    • /
    • pp.29-35
    • /
    • 2023
  • This paper makes it possible to search videos based on sentence by improving the previous research which automatically generates rich metadata from videos and searches videos by key words. For search by sentence, morphemes are analyzed for each sentence, keywords are extracted, weights are assigned to each keyword, and some videos are recommended by applying a ranking algorithm developed in the previous research. In order to evaluate performance of video search in this paper, a sufficient amount of videos and sufficient number of user experiences are re required. However, in the current situation where these are insufficient, three indirect evaluation methods were used: evaluation of overall user satisfaction, comparison of recommendation scores and user satisfaction, and evaluation of user satisfaction by video categories. As a result of performance evaluation, it was shown that the rich metadata construction and video recommendation implementation in this paper give users high search satisfaction.

Searching Patents Effectively in terms of Keyword Distributions (키워드 분포를 고려한 효과적 특허검색기법)

  • Lee, Wookey;Song, Justin Jongsu;Kang, Michael Mingu
    • Journal of Information Technology and Architecture
    • /
    • v.9 no.3
    • /
    • pp.323-331
    • /
    • 2012
  • With the advancement of the area of knowledge and information, Intellectual Property, especially, patents have captured attention more and more emergent. The increasing need for efficient way of patent information search has been essential, but the prevailing patent search engines have included too many noises for the results due to the Boolean models. This has occasioned too much time for the professional experts to investigate the results manually. In this paper, we reveal the differences between the conventional document search and patent search and analyze the limitations of existing patent search. Furthermore, we propose a specialized in patent search, so that the relationship between the keywords within each document and their significance within each patent document search keyword can be identified. Which in turn, the keywords and the relationships have been appointed a ranking for this patent in the upper ranks and the noise in the data sub-ranked. Therefore this approach is proposed to significantly reduce noise ratio of the data from the search results. Finally, in, we demonstrate the superiority of the proposed methodology by comparing the Kipris dataset.

Course recommendation system using deep learning (딥러닝을 이용한 강좌 추천시스템)

  • Min-Ah Lim;Seung-Yeon Hwang;Dong-Jin Shin;Jae-Kon Oh;Jeong-Joon Kim
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.23 no.3
    • /
    • pp.193-198
    • /
    • 2023
  • We study a learner-customized lecture recommendation project using deep learning. Recommendation systems can be easily found on the web and apps, and examples using this feature include recommending feature videos by clicking users and advertising items in areas of interest to users on SNS. In this study, the sentence similarity Word2Vec was mainly used to filter twice, and the course was recommended through the Surprise library. With this system, it provides users with the desired classification of course data conveniently and conveniently. Surprise Library is a Python scikit-learn-based library that is conveniently used in recommendation systems. By analyzing the data, the system is implemented at a high speed, and deeper learning is used to implement more precise results through course steps. When a user enters a keyword of interest, similarity between the keyword and the course title is executed, and similarity with the extracted video data and voice text is executed, and the highest ranking video data is recommended through the Surprise Library.