• Title/Summary/Keyword: word frequency (단어빈도)

Investigating an Automatic Method in Summarizing a Video Speech Using User-Assigned Tags (이용자 태그를 활용한 비디오 스피치 요약의 자동 생성 연구)

  • Kim, Hyun-Hee
    • Journal of the Korean Society for Library and Information Science / v.46 no.1 / pp.163-181 / 2012
  • We investigated how useful video tags were in summarizing video speech and how valuable positional information was for speech summarization. Furthermore, we examined the similarity among sentences selected for a speech summary to reduce its redundancy. Based on the results of this analysis, we then designed and evaluated a method for automatically summarizing speech transcripts using a modified Maximum Marginal Relevance model. This model not only reduced redundancy but also enabled the use of social tags, title words, and sentence positional information. Finally, we compared the proposed method to the Extractor system, in which key sentences of a video speech were chosen using the frequency and location information of speech content words. Results showed that the precision and recall rates of the proposed method were higher than those of the Extractor system, although there was no significant difference in the recall rates.
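
The Maximum Marginal Relevance idea named in this abstract can be illustrated with a minimal Python sketch. This is not the authors' modified model: the bag-of-words cosine similarity, the `lam` trade-off weight, and the function names are assumptions made for illustration, and the tag, title, and position signals are collapsed into a single hypothetical query string.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(sentences, query, k=2, lam=0.7):
    """Greedily pick k sentences, trading relevance to the query against redundancy."""
    vectors = [Counter(s.lower().split()) for s in sentences]
    query_vec = Counter(query.lower().split())
    selected = []                      # indices of sentences chosen so far
    candidates = list(range(len(sentences)))

    def score(i):
        relevance = cosine(vectors[i], query_vec)
        # Redundancy is the highest similarity to any already selected sentence.
        redundancy = max((cosine(vectors[i], vectors[j]) for j in selected), default=0.0)
        return lam * relevance - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]
```

With a lower `lam`, the selection penalizes near-duplicate sentences more strongly; with `lam=1.0` it reduces to ranking by relevance alone.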

Analysis Study on Trends of Library Development Plan by Using Big Data Analysis (빅데이터 분석 기법을 활용한 도서관발전종합계획 동향 분석 연구)

  • Kim, Dongseok;Noh, Younghee
    • Journal of the Korean BIBLIA Society for library and Information Science / v.29 no.2 / pp.85-108 / 2018
  • This study aimed to analyze media reports of the Comprehensive Library Advancement Plan using big data analysis in order to determine trends and implications by period. To do so, related data from 2009 to 2017 were collected from major domestic web portal sites. Words in the collected data were refined through a text mining process, and frequency, centrality, and structural equivalence analyses were performed. Results confirmed that, during the implementation of the first and second phases of the Comprehensive Library Advancement Plan, the focus of library policy changed from external growth to strengthening internal stability and advancing library operation, and that media coverage was limited to specific policies such as the expansion of library facilities. Findings from this study will serve as useful material for ascertaining how the national library policy represented by the Comprehensive Library Advancement Plan is perceived and understood.

Topic-Network based Topic Shift Detection on Twitter (트위터 데이터를 이용한 네트워크 기반 토픽 변화 추적 연구)

  • Jin, Seol A;Heo, Go Eun;Jeong, Yoo Kyung;Song, Min
    • Journal of the Korean Society for information Management / v.30 no.1 / pp.285-302 / 2013
  • This study identified topic shifts and patterns over time by analyzing an enormous amount of Twitter data, which is characterized by high accessibility and brevity. First, we extracted keywords for a certain product and used them to build a topic network, which allows intuitive understanding of the keywords associated with topics through nodes and edges derived from co-word analysis. We conducted temporal analysis of term co-occurrence as well as topic modeling to examine the results of the network analysis. In addition, comparing topic shifts on Twitter with the corresponding retrieval results from newspapers confirms that Twitter responds immediately to news media and spreads negative issues quickly. Our findings suggest that companies can use the proposed technique to identify the public's negative opinions as quickly as possible and to support timely decision making and effective responses to their customers.
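
The temporal co-occurrence analysis described in this abstract can be sketched as follows. The `(period, text)` input shape, the whitespace tokenization, and the function name are assumptions for illustration, not the authors' pipeline.

```python
from collections import Counter, defaultdict
from itertools import combinations


def coword_by_period(tweets, keywords):
    """Count keyword pairs that co-occur in the same tweet, grouped by period.

    tweets: iterable of (period, text) pairs, e.g. ("2012-01", "phone battery recall").
    """
    keyword_set = set(keywords)
    counts = defaultdict(Counter)
    for period, text in tweets:
        # Keep only the tracked keywords; sort so each pair has one canonical order.
        present = sorted(keyword_set & set(text.lower().split()))
        counts[period].update(combinations(present, 2))
    return counts
```

Each pair count can serve as an edge weight in a per-period topic network, so comparing the counters of adjacent periods shows which keyword associations appear or fade.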

The Blog Polarity Classification Technique using Opinion Mining (오피니언 마이닝을 활용한 블로그의 극성 분류 기법)

  • Lee, Jong-Hyuk;Lee, Won-Sang;Park, Jea-Won;Choi, Jae-Hyun
    • Journal of Digital Contents Society / v.15 no.4 / pp.559-568 / 2014
  • Previous polarity classification using sentiment analysis relies on sentence rules based on the rating points of product reviews. This approach is difficult to apply to blogs, which have no product ratings. In addition, product reviews can be fabricated by paid commenters and site managers, so it is hard to find reliable product and store reviews. Considering these problems, analyzing blogs that contain personal and frank opinions and classifying their polarity makes it possible to understand opinions about a product or store correctly. This paper proposes extracting high-frequency vocabulary from blogs in several domains and choosing topic words. We then apply a sentiment analysis technique and classify the polarity of blog content. To evaluate the performance of the sentiment analysis, we use Precision, Recall, and F-Score, which are measures from the information retrieval field. The evaluation shows that the proposed sentiment analysis classifies polarity better than previous techniques that use sentence rules based on product reviews.
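
The evaluation measures named in this abstract follow the standard definitions, shown here as a minimal sketch for a binary polarity task. The label names and the function name are assumptions, and the F-Score is taken to be the balanced F1 since the abstract does not specify a weighting.

```python
def precision_recall_f1(predicted, actual, positive="pos"):
    """Return (precision, recall, F1) for the given positive label."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == positive and a == positive)  # true positives
    fp = sum(1 for p, a in pairs if p == positive and a != positive)  # false positives
    fn = sum(1 for p, a in pairs if p != positive and a == positive)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Precision answers "of the blogs labeled positive, how many really were", while recall answers "of the truly positive blogs, how many were found"; F1 is their harmonic mean.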

E-commerce data based Sentiment Analysis Model Implementation using Natural Language Processing Model (자연어처리 모델을 이용한 이커머스 데이터 기반 감성 분석 모델 구축)

  • Choi, Jun-Young;Lim, Heui-Seok
    • Journal of the Korea Convergence Society / v.11 no.11 / pp.33-39 / 2020
  • In the field of Natural Language Processing, research on tasks such as translation, POS tagging, Q&A, and sentiment analysis is being carried out globally. Sentiment analysis with pretrained sentence embedding models shows high classification performance on English single-domain datasets. In this study, classification performance is compared on a Korean e-commerce online dataset with various domain attributes, and six neural network models are built: BOW (Bag of Words), LSTM[1], Attention, CNN[2], ELMo[3], and BERT (KoBERT)[4]. It is confirmed that pretrained sentence embedding models perform better than word embedding models. In addition, a practical neural network model composition is proposed after comparing classification performance on a dataset with 17 categories. Furthermore, compressing the sentence embedding model is mentioned as future work, considering inference time against model capacity in a real-time service.

Personalized Web Search using Query based User Profile (질의기반 사용자 프로파일을 이용하는 개인화 웹 검색)

  • Yoon, Sung Hee
    • Journal of the Korea Academia-Industrial cooperation Society / v.17 no.2 / pp.690-696 / 2016
  • Search engines that rely on morphological matching between the user query and web document content do not support individual interests. This research proposes a personalized web search scheme that returns results reflecting the user's query intent and personal preferences. The performance of personalized search depends on an effective user profiling strategy that accurately captures the user's personal interests. In this study, the user profiles are databases of topic words and customized weights based on recent user queries and the frequency of topic words in the click history. To determine the precise meaning of ambiguous queries and topic words, this strategy uses WordNet to calculate their semantic relatedness to words in the user profile. The experiments were conducted by installing query expansion and re-ranking modules on general web search systems. The results showed that this method achieves 92% precision and 82% recall in the top 10 search results, demonstrating enhanced performance.

Automatic Keyword Extraction System for Korean Documents Information Retrieval (국내(國內) 문헌정보(文獻情報) 검색(檢索)을 위한 키워드 자동추출(自動抽出) 시스템 개발(開發))

  • Yae, Yong-Hee
    • Journal of Information Management / v.23 no.1 / pp.39-62 / 1992
  • In this paper, about 60 auxiliary words and 320 stopwords are selected from an analysis of sample data, and stopwords are classified into four types: left truncation, right truncation, auxiliary-word truncation, and normal. A keyword extraction system is then proposed that performs efficient truncation of auxiliary words from words, conversion of Chinese words to Korean, and exclusion of stopwords. The keywords selected by this system show a 92.2% agreement ratio with keywords manually selected by an expert. In addition, compound words of 4 to 6 characters generate twice as many additional new words, 58.8% of which are useful as keywords.
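
The truncation and exclusion steps described in this abstract can be illustrated with a small sketch. The particle and stopword lists below are hypothetical samples, not the paper's roughly 60 auxiliary words and 320 stopwords, and only right-hand truncation is shown; Chinese-to-Korean conversion and compound-word handling are omitted.

```python
from collections import Counter

# Hypothetical sample lists for illustration only.
PARTICLES = ("에서", "으로", "은", "는", "이", "가", "을", "를", "의", "에")
STOPWORDS = {"연구", "본"}


def strip_particle(token):
    """Remove one trailing auxiliary word, trying longer ones first."""
    for particle in sorted(PARTICLES, key=len, reverse=True):
        if token.endswith(particle) and len(token) > len(particle):
            return token[: -len(particle)]
    return token


def extract_keywords(text, top_n=3):
    """Truncate auxiliary words, drop stopwords, and rank the rest by frequency."""
    stems = (strip_particle(t) for t in text.split())
    counts = Counter(s for s in stems if s not in STOPWORDS)
    return counts.most_common(top_n)
```

Suffix matching this naive will also clip words that merely end in a particle-like syllable, which is one reason a curated stopword and auxiliary-word list matters.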

An analysis of mobile communication environment by a socio-technical approach (사회기술적 접근방식을 통한 모바일 통신환경 분석)

  • Lee, Hyun Kyu
    • Journal of Korea Society of Industrial Information Systems / v.18 no.2 / pp.59-69 / 2013
  • By the end of 2012, there were 5.1 billion mobile communication subscribers in South Korea. The majority of subscriptions (97.3 percent) were for mobile phones, particularly smartphones. From 2010 to 2012, the number of mobile phone subscriptions increased 2,000 percent, growing from an initial subscriber base of 150 million. This paper explores the changing mobile communication landscape from a socio-technological perspective to understand the underlying drivers of change and their effects on the South Korean populace. We conducted a content analysis of 11,156 electronic newspaper articles that mentioned mobile communication in South Korea and were published between 2010 and 2012. A total of 5,119 keywords were extracted based on frequency statistics and further analyzed to determine the drivers of change. Based on this analysis, we conclude that South Korea's mobile communication environment is focused on rapid expansion of technology, with minor consideration given to the social aspects of this change. This has resulted in several negative consequences, and we call for new policies from government and industry to address this gap.

A Study on the Core Values of Presidents Based on the Content Analysis of the Presidential Speech Archives (대통령 연설기록 내용분석을 통한 역대 대통령의 중심가치 연구)

  • Park, JunHyeong;Yoo, Ho-Suon;Kim, Tae-Young;Han, Hui Jeong;Oh, Hyo-Jung
    • Journal of Korean Society of Archives and Records Management / v.17 no.2 / pp.57-78 / 2017
  • This study reveals the core values of presidents based on a content analysis of the Presidential Speech Archives and examines the policy direction of each government from a macro perspective. For this purpose, we collected the speech archives provided by the Presidential Archives and compared their central words. The Presidential Speech Archives are helpful for understanding the problem-solving capacity and consciousness of presidents. From these, we selected statements in the diplomatic and trade fields as the main study targets. The results show that the presidents have generally pursued peaceful resolution and cooperation on diplomatic issues. In addition, they placed a high priority on implementing economic policy in the diplomatic and trade fields.

A Study on Graph-based Topic Extraction from Microblogs (마이크로블로그를 통한 그래프 기반의 토픽 추출에 관한 연구)

  • Choi, Don-Jung;Lee, Sung-Woo;Kim, Jae-Kwang;Lee, Jee-Hyong
    • Journal of the Korean Institute of Intelligent Systems / v.21 no.5 / pp.564-568 / 2011
  • Microblogs have become a popular means of information delivery due to the spread of smartphones. They reflect the interests of users more quickly than other media. In particular, for subjects that attract many users, microblogs can supply rich information originating from various sources. Nevertheless, obtaining useful information from microblogs has been considered a hard problem because they contain too much noise. So far, various methods have been proposed to extract and track subjects from particular documents, yet these methods do not work effectively on microblogs, which consist of short phrases. In this paper, we propose a graph-based topic extraction and partitioning method to understand users' interests in a certain keyword. The proposed method consists of a process that generates a keyword graph using the co-occurrences of terms in the microblogs and a process that splits the graph using a network partitioning method. When applied to several keywords, our method shows good performance in finding a topic about the keyword and partitioning the topic into sub-topics.
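
The two processes described in this abstract, building a keyword graph from term co-occurrences and then splitting it, can be sketched as follows. The abstract does not name the partitioning method, so connected components are used here as a simple stand-in; the `seed` and `min_weight` parameters and the function names are assumptions for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_graph(posts, seed, min_weight=1):
    """Link two terms that co-occur in at least min_weight posts, ignoring the seed keyword."""
    weights = Counter()
    for post in posts:
        terms = sorted(set(post.lower().split()) - {seed})
        weights.update(combinations(terms, 2))
    graph = defaultdict(set)
    for (a, b), weight in weights.items():
        if weight >= min_weight:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def components(graph):
    """Split the graph into connected components, each read as one sub-topic."""
    seen, parts = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, part = [start], set()
        while stack:
            node = stack.pop()
            if node not in part:
                part.add(node)
                stack.extend(graph[node] - part)
        seen |= part
        parts.append(part)
    return parts
```

The seed keyword is dropped before linking because it appears in every post and would otherwise join all terms into a single component.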