• Title/Summary/Keyword: TF-IDF keyword extraction

Search Result 41

Twitter HashTag Recommendation Scheme based on Similar Tweet Analysis (유사 트윗 분석에 기반한 트위터 해시태그 추천기법)

  • Jeon, Mina;Jun, Sanghoon;Hwang, Eenjun
    • Proceedings of the Korea Information Processing Society Conference / 2013.11a / pp.962-963 / 2013
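  • A Twitter hashtag (#) is a user-defined tag used to classify specific keywords or content in tweets by topic and to make search more efficient. Because hashtags take many different forms depending on how each user defines them, they can actually reduce search efficiency, and users are often unsure which hashtags to add to the tweets they write. To address this problem, this paper proposes a scheme that recommends hashtags suited to a user's tweet. Keywords are extracted from the collected tweets and hashtags, and TF-IDF and cosine similarity are applied to compute tweet similarity, so that hashtags attached to similar tweets are recommended. To validate the proposed scheme, an experiment evaluated the accuracy of the recommendations.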
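
The abstract above combines TF-IDF weighting with cosine similarity to find similar tweets and reuse their hashtags. The following is a minimal sketch of that idea, not the authors' implementation: the function names, the pre-tokenized input, and the choice of raw `tf * log(N/df)` weighting are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict) per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def recommend_hashtags(new_tweet, tagged_tweets, k=1):
    """tagged_tweets: list of (tokens, hashtags) pairs (hypothetical input format).

    Returns the hashtags of the k tweets most similar to new_tweet.
    """
    docs = [tokens for tokens, _ in tagged_tweets] + [new_tweet]
    vecs = tfidf_vectors(docs)
    query = vecs[-1]
    ranked = sorted(range(len(tagged_tweets)),
                    key=lambda i: cosine(query, vecs[i]), reverse=True)
    tags = []
    for i in ranked[:k]:
        for tag in tagged_tweets[i][1]:
            if tag not in tags:
                tags.append(tag)
    return tags
```

A real system would need a tokenizer suited to tweets and a much larger collection; the sketch only shows how the two measures fit together.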

Automatic English MeSH keywords assignment to Korean medical documents - spacing variant effect (한국어 의학 문서에 대한 영문 MeSH 키워드의 자동 부여 - 띄어쓰기 변이 처리 효과를 중심으로)

  • Lee, Jae-Sung;Kim, Mi-Suk;Lee, Young-Sung
    • Annual Conference on Human and Language Technology / 2004.10d / pp.82-89 / 2004
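  • This paper introduces a system that automatically suggests English MeSH keywords from the abstracts of Korean medical papers and proposes a method for handling the spacing variant problem. A spacing variant is a phrase whose spacing differs from standard Korean orthography. To handle this, the thesaurus stores only the maximally spaced phrase rather than every possible spacing variant, and syllable-level substring search is used to find K-MeSH terms in documents. After K-MeSH terms were extracted from the abstracts of Korean medical papers with this method, the top 10 keywords ranked by a TF-IDF ranking function were compared with the English keywords chosen by the authors, and 58% matched. Compared with the existing method, the thesaurus size was reduced by about 42% and the recall of English MeSH keyword recommendation within the top 10 increased by about 7.8%, showing that the method is effective.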
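
One way to picture matching against a maximally spaced thesaurus entry is to treat whitespace between any two syllables as optional. The sketch below does this with a regular expression; it is an assumption about how such matching could work, not the paper's syllable-level search, and the thesaurus entry and function names are hypothetical.

```python
import re


def variant_pattern(max_spaced_term):
    """Regex matching the term with whitespace optional between any two syllables."""
    syllables = [ch for ch in max_spaced_term if not ch.isspace()]
    return re.compile(r"\s*".join(re.escape(ch) for ch in syllables))


def find_terms(text, thesaurus):
    """thesaurus: dict mapping a maximally spaced Korean term to an English keyword.

    Returns how many times each English keyword's term (in any spacing) occurs.
    """
    counts = {}
    for term, english in thesaurus.items():
        hits = len(variant_pattern(term).findall(text))
        if hits:
            counts[english] = counts.get(english, 0) + hits
    return counts


# Illustrative entry only: one stored phrase covers both spacings in the text.
thesaurus = {"심근 경색": "Myocardial Infarction"}
print(find_terms("심근경색 환자와 심근 경색 병력", thesaurus))
```

The resulting counts could then feed a TF-IDF ranking to pick the top keywords per document.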

Study of Feature Extraction Algorithm for Harmful word Filtering (유해어 필터링을 위한 자질어 추출 알고리즘에 관한 연구)

  • Jeong Jung-Hoon;Lee Won-Hee;Lee Shin-Won;An Don-Gun;Chung Sung-Jong
    • Proceedings of the Korean Information Science Society Conference / 2006.06b / pp.7-9 / 2006
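  • Harmful information refers to information containing obscene, violent, or similar content that is provided indiscriminately amid the flood of information. Mechanisms are needed to protect Internet users who require social protection, such as adolescents, from such information, and various methods are currently being proposed and studied. This study compares feature-word extraction algorithms for the harmful-word dictionary used in keyword filtering, one of the techniques for filtering harmful documents. In keyword filtering, feature words strongly affect filtering performance, so choosing a feature-word extraction algorithm that improves performance is very important. This paper therefore compares and analyzes a range of algorithms to find an accurate and efficient combination of feature-word extraction algorithms. The CHI/TF-IDF combination showed high performance, achieving 92% accuracy.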
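
The abstract does not say how CHI and TF-IDF are combined, so the sketch below covers only the chi-square (CHI) half: scoring each candidate term from a 2x2 table of document counts and keeping the highest-scoring terms. The function names and the input layout are assumptions for illustration.

```python
def chi_square(a, b, c, d):
    """Chi-square score of a term for a class from a 2x2 document-count table.

    a: harmful docs containing the term      b: harmless docs containing it
    c: harmful docs without the term         d: harmless docs without it
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def select_features(term_counts, n_harmful, n_harmless, top_k):
    """term_counts: term -> (harmful docs with term, harmless docs with term)."""
    scores = {
        t: chi_square(h, s, n_harmful - h, n_harmless - s)
        for t, (h, s) in term_counts.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

A term spread evenly across both classes scores 0, while a term concentrated in one class scores high, which is why the statistic is a common first filter before weighting the survivors with TF-IDF.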

An Exploratory Study of VR Technology using Patents and News Articles (특허와 뉴스 기사를 이용한 가상현실 기술에 관한 탐색적 연구)

  • Kim, Sungbum
    • Journal of Digital Convergence / v.16 no.11 / pp.185-199 / 2018
  • The purpose of this study is to derive the core technologies of VR using patent analysis and to explore the direction of social and public interest in VR using news analysis. In Study 1, we derived keywords from word frequencies in patent texts and compared them by company, year, and technical classification. Netminer, a network analysis program, was used to analyze the IPC codes of patents. In Study 2, we analyzed news articles using the T-LAB program. TF-IDF was used as the keyword selection method, and chi-square and association index algorithms were used to extract the words most relevant to VR. Through this study, we confirmed that VR is a fusion technology spanning optics, head-mounted displays (HMD), data analysis, and electrical and electronic technology, and found that optical technology is central among the technologies currently being developed. In addition, the news articles showed that society and the public are interested in the formation and growth of VR suppliers and markets, and that VR should be developed on the basis of user experience.

Social awareness of Arduino and artificial intelligence using big data analysis

  • Eun-Sang, Lee
    • Journal of the Korea Society of Computer and Information / v.28 no.1 / pp.189-199 / 2023
  • This study aimed to identify the development direction of Arduino-based boards relating to artificial intelligence, based on social awareness identified using big data analysis methods. For this purpose, big data were extracted through the Textom website, focusing on keywords that included 'Arduino + artificial intelligence' and 'Arduino + AI', and these data were refined and analyzed using the Textom website and the UCINET program. The big data analyses performed included frequency analysis, TF-IDF analysis, degree centrality analysis, N-gram analysis, and CONCOR analysis. The results confirmed that keywords relating to education and coding education, keywords relating to making and experience based on Arduino, and keywords relating to programs were the main keywords in Internet documents about Arduino and artificial intelligence, and that clusters formed around these keywords. The social awareness of Arduino and artificial intelligence was evaluated, and the direction of board development was identified on that basis. This study is meaningful in that it identified various factors of board development from the general public's social awareness, evaluated using a big data analysis method. It may serve as a point of reference for future researchers or developers wishing to understand user needs using big data analysis methods.
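
Two of the analyses named above, N-gram counting and degree centrality, are simple enough to sketch in a few lines. This is an illustration only, not what Textom or the network program computes internally: it assumes pre-tokenized documents and links two keywords whenever they co-occur in the same document.

```python
from collections import Counter
from itertools import combinations


def bigrams(tokens):
    """Count adjacent token pairs (N-grams with N = 2)."""
    return Counter(zip(tokens, tokens[1:]))


def degree_centrality(docs):
    """Link keywords that co-occur in a document; return degree / (n - 1) per keyword."""
    neighbors = {}
    for doc in docs:
        terms = sorted(set(doc))
        for t in terms:
            neighbors.setdefault(t, set())
        for u, v in combinations(terms, 2):
            neighbors[u].add(v)
            neighbors[v].add(u)
    n = len(neighbors)
    return {t: (len(nb) / (n - 1) if n > 1 else 0.0) for t, nb in neighbors.items()}


docs = [["arduino", "ai", "education"], ["arduino", "coding"]]
print(degree_centrality(docs))
```

In this toy input "arduino" co-occurs with every other keyword, so it receives the maximum centrality of 1.0.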

Associated Keyword Recommendation System for Keyword-based Blog Marketing (키워드 기반 블로그 마케팅을 위한 연관 키워드 추천 시스템)

  • Choi, Sung-Ja;Son, Min-Young;Kim, Young-Hak
    • KIISE Transactions on Computing Practices / v.22 no.5 / pp.246-251 / 2016
  • Recently, the influence of SNS and online media has been growing rapidly, with a consequent increase in interest in marketing that uses these tools. Blog marketing can increase the ripple effect and information delivery of marketing at low cost by gaining priority in the keyword search results of influential portal sites. However, because competition for the top rank in search results for specific keywords is tough, long-term and proactive efforts are needed. Therefore, we propose a new method that recommends groups of associated keywords likely to give a blog higher exposure. The proposed method first collects the blog documents returned by a search for the target keyword, then extracts and filters keywords with higher association, considering the frequency and location of each word. Next, each associated keyword is compared with the target keyword, and the associated keyword group likely to achieve higher exposure is recommended, considering information such as the degree of association, the monthly search volume of the associated keyword, the number of blogs included in the search results, and the average writing date of those blogs. The experimental results show that the proposed method recommends keyword groups with higher association.

Multimodal Media Content Classification using Keyword Weighting for Recommendation (추천을 위한 키워드 가중치를 이용한 멀티모달 미디어 콘텐츠 분류)

  • Kang, Ji-Soo;Baek, Ji-Won;Chung, Kyungyong
    • Journal of Convergence for Information Technology / v.9 no.5 / pp.1-6 / 2019
  • As the mobile market expands, a variety of platforms are available to provide multimodal media content. Multimodal media content contains heterogeneous data, so users need considerable time and effort to select the content they prefer. Therefore, in this paper we propose multimodal media content classification using keyword weighting for recommendation. The proposed method uses keyword weighting on the text data of multimodal media content to extract the keywords that best represent each item. Based on the extracted data, genre classes with subclasses are generated, and multimodal media content is classified into the appropriate class. In addition, the user's preferences are evaluated for personalized recommendation, and multimodal content is recommended based on the analysis of the user's content preferences. The performance evaluation verifies the quality of the recommendation results in terms of accuracy and satisfaction. The recommendation accuracy is 74.62% and the satisfaction rate is 69.1%, because recommendations consider the user's favorite keywords as well as the genre.

A study on Korean tourism trends using social big data -Focusing on sentiment analysis- (소셜 빅데이터를 활용한 한국관광 트렌드에 관한연구 -감성분석을 중심으로-)

  • Youn-hee Choi;Kyoung-mi Yoo
    • The Journal of the Convergence on Culture Technology / v.10 no.3 / pp.97-109 / 2024
  • In the field of domestic tourism, analyzing the trends of tourism consumers, both international and domestic tourists, is essential not only for the Korean tourism market but also for local and governmental tourism policy makers. We explore keywords and sentiment analysis on social media to establish a marketing strategy plan and revitalize the domestic tourism industry through communication and information from tourism consumers. This study used TEXTOM 6.0 to analyze recent trends in Korean tourism. Data were collected from September 31, 2022, to August 31, 2023, using 'Korean tourism' and 'domestic tourism' as keywords, targeting blogs, cafes, and news provided by Naver, Daum, and Google. Through text mining, the top 100 keywords by frequency and their TF-IDF values were extracted, and then CONCOR analysis and sentiment analysis were conducted. Among Korean tourism keywords, words related to tourist destinations, travel companions and behaviors, tourism motivations and experiences, accommodation types, tourist information, and emotional connections ranked high. The CONCOR analysis produced five clusters, related to tourist destinations, tourist information, tourist activities and experiences, tourism motivation and content, and inbound tourism. Finally, the sentiment analysis showed a high proportion of positive documents and vocabulary. This study analyzes the rapidly changing trends of Korean tourism through text mining and is expected to provide meaningful data for promoting domestic tourism, not only for Koreans but also for foreigners visiting Korea.

Text Mining and Association Rules Analysis to a Self-Introduction Letter of Freshman at Korea National College of Agricultural and Fisheries (1) (한국농수산대학 신입생 자기소개서의 텍스트 마이닝과 연관규칙 분석 (1))

  • Joo, J.S.;Lee, S.Y.;Kim, J.S.;Shin, Y.K.;Park, N.B.
    • Journal of Practical Agriculture & Fisheries Research / v.22 no.1 / pp.113-129 / 2020
  • In this study, we applied topic analysis and correlation analysis through text mining to extract meaningful information or rules from the self-introduction letters of freshmen at Korea National College of Agriculture and Fisheries in 2020. The items analyzed were those describing 'academic' and 'in-school activities' during high school. In the text mining results, the keywords of the 'academic' item were, in order, 'study', 'thought', 'effort', 'problem', and 'friend', and the keywords of the 'in-school activities' item were 'activity', 'thought', 'friend', 'club', and 'school'. In the correlation analysis, the keywords 'thinking', 'studying', 'effort', and 'time' played a central role in the 'academic' item, and the central keywords of 'in-school activities' were 'thought', 'activity', 'school', 'time', and 'friend'. The results of the frequency analysis and association analysis were visualized with word clouds and correlation graphs to make them easier to understand. In the next study, TF-IDF (Term Frequency-Inverse Document Frequency) analysis, which uses the frequency of keywords and the inverse of document frequency, will be performed as a method of extracting keywords from a large number of documents.
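
For reference, one common formulation of the TF-IDF weight mentioned above is given below. The abstract does not state which variant the authors plan to use, so this is an assumption: here $\mathrm{tf}(t,d)$ is the number of times term $t$ occurs in document $d$, $N$ is the number of documents, and $\mathrm{df}(t)$ is the number of documents containing $t$.

```latex
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \cdot \log \frac{N}{\mathrm{df}(t)}
```

A term that appears in every document has $\mathrm{df}(t) = N$ and therefore weight $0$, which is how the measure discounts words common to the whole collection.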

Hot Topic Prediction Scheme Considering User Influences in Social Networks (소셜 네트워크에서 사용자의 영향력을 고려한 핫 토픽 예측 기법)

  • Noh, Yeon-woo;Kim, Dae-yun;Han, Jieun;Yook, Misun;Lim, Jongtae;Bok, Kyoungsoo;Yoo, Jaesoo
    • The Journal of the Korea Contents Association / v.15 no.8 / pp.24-36 / 2015
  • Recently, interest in detecting hot topics has grown significantly, as it has become important to find and analyze meaningful information in the large amount of data flowing in from social network services. Because SNS content consists of many arbitrary posts that are not verified in advance, the reliability of the results declines when hot topics are predicted from those posts. To solve this problem, this paper proposes a highly reliable hot topic prediction scheme that considers user influence in social networks. The proposed scheme extracts a set of keywords for hot issues in real time through a modified TF-IDF algorithm based on Twitter. It improves the reliability of hot topic prediction by weighting tweets according to user influence. To show the superiority of the proposed scheme, we compare it with the existing scheme through performance evaluation. Our experimental results show that the proposed method improves precision and recall compared with the existing method.
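
The abstract does not define its modified TF-IDF, so the following is only a guess at the general shape: each tweet contributes to a term's frequency in proportion to its author's influence weight, and the result is scaled by a smoothed inverse document frequency. The function name, the `influence` mapping, the default weight of 1.0, and the smoothing are all hypothetical.

```python
import math
from collections import defaultdict


def hot_keywords(tweets, influence, top_k=3):
    """tweets: list of (user, tokens); influence: user -> weight (default 1.0).

    Returns the top_k terms by influence-weighted TF times smoothed IDF.
    """
    n = len(tweets)
    df = defaultdict(int)
    weighted_tf = defaultdict(float)
    for user, tokens in tweets:
        w = influence.get(user, 1.0)
        for t in tokens:
            weighted_tf[t] += w  # influential users count for more
        for t in set(tokens):
            df[t] += 1
    scores = {t: weighted_tf[t] * math.log(1 + n / df[t]) for t in weighted_tf}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

With equal weights the most repeated term wins; raising one user's weight lets terms from that user's tweets outrank more frequent terms from less influential accounts.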