• Title/Summary/Keyword: Keyword Data

Search Result 925, Processing Time 0.024 seconds

Efficient Keyword Extraction from Social Big Data Based on Cohesion Scoring

  • Kim, Hyeon Gyu
    • Journal of the Korea Society of Computer and Information
    • /
    • v.25 no.10
    • /
    • pp.87-94
    • /
    • 2020
  • Social reviews such as SNS feeds and blog articles have been widely used to extract keywords reflecting opinions and complaints from users' perspective, and often include proper nouns or new words reflecting recent trends. In general, these words are not included in a dictionary, so conventional morphological analyzers may not detect and extract those words from the reviews properly. In addition, due to their high processing time, it is inadequate to provide analysis results in a timely manner. This paper presents a method for efficient keyword extraction from social reviews based on the notion of cohesion scoring. Cohesion scores can be calculated based on word frequencies, so keyword extraction can be performed without a dictionary when using it. On the other hand, their accuracy can be degraded when input data with poor spacing is given. Regarding this, an algorithm is presented which improves the existing cohesion scoring mechanism using the structure of a word tree. Our experiment results show that it took only 0.008 seconds to extract keywords from 1,000 reviews in the proposed method while resulting in 15.5% error ratio which is better than the existing morphological analyzers.

'Hot Search Keyword' Rank-Change Prediction (인기 검색어의 순위 변화 예측)

  • Kim, Dohyeong;Kang, Byeong Ho;Lee, Sungyoung
    • Journal of KIISE
    • /
    • v.44 no.8
    • /
    • pp.782-790
    • /
    • 2017
  • The service, 'Hot Search Keywords', provides a list of the most hot search terms of different web services such as Naver or Daum. The service, bases the changes in rank of a specific search keyword on changes in its users' interest. This paper introduces a temporal modelling framework for predicting the rank change of hot search keywords using past rank data and machine learning. Past rank data shows that more than 70% of hot search keywords tend to disappear and reappear later. The authors processed missing rank value, using deletion, dummy variables, mean substitution, and expectation maximization. It is however crucial to calculate the optimal window size of the past rank data. We proposed an optimal window size selection approach based on the minimum amount of time a topic within the same or a differing context disappeared. The experiments were conducted with four different machine-learning techniques using the Naver, Daum, and Nate 'Hot Search Keywords' datasets, which were collected for 2 years.

Relevant Keyword Collection using Click-log (클릭로그를 이용한 연관키워드 수집)

  • Ahn, Kwang-Mo;Seo, Young-Hoon;Heo, Jeong;Lee, Chung-Hee;Jang, Myung-Gil
    • The KIPS Transactions:PartB
    • /
    • v.19B no.2
    • /
    • pp.149-154
    • /
    • 2012
  • The aim of this paper is to collect relevant keywords from clicklog data including user's keywords and URLs accessed using them. Our main hyphothesis is that two or more different keywords may be relevant if users access same URLs using them. Also, they should have higher relationship when the more same URLs are accessed using them. To validate our idea, we collect relevant keywords from clicklog data which is offered by a portal site. As a result, our experiment shows 89.32% precision when we define answer set to only semantically same words, and 99.03% when we define answer set to broader sense. Our approach has merits that it is independent on language and collects relevant words from real world data.

Hot Keyword Extraction of Sci-tech Periodicals Based on the Improved BERT Model

  • Liu, Bing;Lv, Zhijun;Zhu, Nan;Chang, Dongyu;Lu, Mengxin
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.6
    • /
    • pp.1800-1817
    • /
    • 2022
  • With the development of the economy and the improvement of living standards, the hot issues in the subject area have become the main research direction, and the mining of the hot issues in the subject currently has problems such as a large amount of data and a complex algorithm structure. Therefore, in response to this problem, this study proposes a method for extracting hot keywords in scientific journals based on the improved BERT model.It can also provide reference for researchers,and the research method improves the overall similarity measure of the ensemble,introducing compound keyword word density, combining word segmentation, word sense set distance, and density clustering to construct an improved BERT framework, establish a composite keyword heat analysis model based on I-BERT framework.Taking the 14420 articles published in 21 kinds of social science management periodicals collected by CNKI(China National Knowledge Infrastructure) in 2017-2019 as the experimental data, the superiority of the proposed method is verified by the data of word spacing, class spacing, extraction accuracy and recall of hot keywords. In the experimental process of this research, it can be found that the method proposed in this paper has a higher accuracy than other methods in extracting hot keywords, which can ensure the timeliness and accuracy of scientific journals in capturing hot topics in the discipline, and finally pass Use information technology to master popular key words.

A Study on Keyword Spotting System Using Pseudo N-gram Language Model (의사 N-gram 언어모델을 이용한 핵심어 검출 시스템에 관한 연구)

  • 이여송;김주곤;정현열
    • The Journal of the Acoustical Society of Korea
    • /
    • v.23 no.3
    • /
    • pp.242-247
    • /
    • 2004
  • Conventional keyword spotting systems use the connected word recognition network consisted by keyword models and filler models in keyword spotting. This is why the system can not construct the language models of word appearance effectively for detecting keywords in large vocabulary continuous speech recognition system with large text data. In this paper to solve this problem, we propose a keyword spotting system using pseudo N-gram language model for detecting key-words and investigate the performance of the system upon the changes of the frequencies of appearances of both keywords and filler models. As the results, when the Unigram probability of keywords and filler models were set to 0.2, 0.8, the experimental results showed that CA (Correctly Accept for In-Vocabulary) and CR (Correctly Reject for Out-Of-Vocabulary) were 91.1% and 91.7% respectively, which means that our proposed system can get 14% of improved average CA-CR performance than conventional methods in ERR (Error Reduction Rate).

Indexing and Storage Schemes for Keyword-based Query Processing over Semantic Web Data (시맨틱 웹 데이터의 키워드 질의 처리를 위한 인덱싱 및 저장 기법)

  • Kim, Youn-Hee;Shin, Hye-Yeon;Lim, Hae-Chull;Chong, Kyun-Rak
    • Journal of the Korea Society of Computer and Information
    • /
    • v.12 no.5
    • /
    • pp.93-102
    • /
    • 2007
  • Metadata and ontology can be used to retrieve related information through the inference mure accurately and simply on the Semantic Web. RDF and RDF Schema are general languages for representing metadata and ontology. An enormous number of keywords on the Semantic Web are very important to make practical applications of the Semantic Web because most users prefer to search with keywords. In this paper, we consider a resource as a unit of query results. And we classily queries with keyword conditions into three patterns and propose indexing techniques for keyword-search considering both metadata and ontology. Our index maintains resources that contain keywords indirectly using conceptual relationships between resources as well as resources that contain keywords directly. So, if user wants to search resources that contain a certain keyword, all resources are retrieved using our keyword index. We propose a structure of table for storing RDF Schema information that is labeled using some simple methods.

  • PDF

Investigation of Research Trends in the D(Data)·N(Network)·A(A.I) Field Using the Dynamic Topic Model (다이나믹 토픽 모델을 활용한 D(Data)·N(Network)·A(A.I) 중심의 연구동향 분석)

  • Wo, Chang Woo;Lee, Jong Yun
    • Journal of the Korea Convergence Society
    • /
    • v.11 no.9
    • /
    • pp.21-29
    • /
    • 2020
  • The Topic Modeling research, the methodology for deduction keyword within literature, has become active with the explosion of data from digital society transition. The research objective is to investigate research trends in D.N.A.(Data, Network, Artificial Intelligence) field using DTM(Dynamic Topic Model). DTM model was applied to the 1,519 of research projects with SW·A.I technology classifications among ICT(Information and Communication Technology) field projects between 6 years(2015~2020). As a result, technology keyword for D.N.A. field; Big data, Cloud, Artificial Intelligence, extended keyword; Unstructured, Edge Computing, Learning, Recognition was appeared every year, and accordingly that the above technology is being researched inclusively from other projects can be inferred. Finally, it is expected that the result from this paper become useful for future policy·R&D planning and corporation's technology·marketing strategy.

Privacy Preserving Keyword Search with Access Control based on DTLS (프라이버시를 보호하는 접근제어가 가능한 키워드 검색 기법)

  • Noh, Geon-Tae;Chun, Ji-Young;Jeong, Ik-Rae;Lee, Dong-Hoon
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.19 no.5
    • /
    • pp.35-44
    • /
    • 2009
  • To protect sensitive personal information, data will be stored in encrypted form. However in order to retrieve these encrypted data without decryption, there need efficient search methods to enable the retrieval of the encrypted data. Until now, a number of searchable encryption schemes have been proposed but these schemes are not suitable when dynamic users who have the permission to access the data share the encrypted data. Since, in previous searchable encryption schemes, only specific user who is the data owner in symmetric key settings or has the secret key corresponding to the public key for the encrypted data in asymmetric key settings can access to the encrypted data. To solve this problem, Stephen S. Yau et al. firstly proposed the controlled privacy preserving keyword search scheme which can control the search capabilities of users according to access policies of the data provider. However, this scheme has the problem that the privacy of the data retrievers can be breached. In this paper, we firstly analyze the weakness of Stephen S. Yau et al.'s scheme and propose privacy preserving keyword search with access control. Our proposed scheme preserves the privacy of data retrievers.

A Study on analyzing brand character of myth material, relevant keyword and relevance with big data of portal site and SNS (포털사이트, SNS의 빅데이터를 이용한 신화소재의 브랜드 캐릭터와 연관어, 연관도 분석)

  • Oh, Sejong;Doo, Illchul
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.11 no.1
    • /
    • pp.157-169
    • /
    • 2015
  • In digital marketing, means of public relations and marketing of enterprises are changing into marketing techniques of predictive analytics. A significant study can be carried out by an analysis of 'the patterns of customers' uses' using big data on major portal sites and SNSs and their correlation with related keywords. This study analyzes the origins of mythological characters in major brands such as Nike, Hermes, Versace, Canon and Starbucks. Also, it extracts related keywords and relevance using big data on portal sites and SNS and their correlation. Nike marketing that reminds people of 'the goddess of victory, Nike' formed a good combination of the brand with relevance. Most of them are based on Greek mythology and have rich materials for storytelling and artistic values in common. Hopefully, this case analysis of foreign brands would become a starting point of discovering the materials of the domestic mythological characters.

An Efficient Web Search Method Based on a Style-based Keyword Extraction and a Keyword Mining Profile (스타일 기반 키워드 추출 및 키워드 마이닝 프로파일 기반 웹 검색 방법)

  • Joo, Kil-Hong;Lee, Jun-Hwl;Lee, Won-Suk
    • The KIPS Transactions:PartD
    • /
    • v.11D no.5
    • /
    • pp.1049-1062
    • /
    • 2004
  • With the popularization of a World Wide Web (WWW), the quantity of web information has been increased. Therefore, an efficient searching system is needed to offer the exact result of diverse Information to user. Due to this reason, it is important to extract and analysis of user requirements in the distributed information environment. The conventional searching method used the only keyword for the web searching. However, the searching method proposed in this paper adds the context information of keyword for the effective searching. In addition, this searching method extracts keywords by the new keyword extraction method proposed in this paper and it executes the web searching based on a keyword mining profile generated by the extracted keywords. Unlike the conventional searching method which searched for information by a representative word, this searching method proposed in this paper is much more efficient and exact. This is because this searching method proposed in this paper is searched by the example based query included content information as well as a representative word. Moreover, this searching method makes a domain keyword list in order to perform search quietly. The domain keyword is a representative word of a special domain. The performance of the proposed algorithm is analyzed by a series of experiments to identify its various characteristic.