• Title/Summary/Keyword: TF-IDF keyword extraction


Hot Topic Prediction Scheme Considering User Influences in Social Networks (소셜 네트워크에서 사용자의 영향력을 고려한 핫 토픽 예측 기법)

  • Noh, Yeon-woo;Kim, Dae-yun;Han, Jieun;Yook, Misun;Lim, Jongtae;Bok, Kyoungsoo;Yoo, Jaesoo
    • The Journal of the Korea Contents Association / v.15 no.8 / pp.24-36 / 2015
  • Interest in detecting hot topics has grown significantly as it becomes important to find and analyze meaningful information in the large volume of data flowing in from social network services (SNS). Because SNS posts are numerous, arbitrary, and unverified, the reliability of hot topics predicted from them declines. To address this problem, this paper proposes a highly reliable hot topic prediction scheme that considers user influence in social networks. The proposed scheme instantly extracts a set of hot-issue keywords from Twitter using a modified TF-IDF algorithm, and it improves the reliability of hot topic prediction by weighting tweets by user influence. To demonstrate the scheme's advantages, we compare it with an existing scheme through performance evaluation. The experimental results show that the proposed method improves precision and recall over the existing method.
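The abstract does not give the modified TF-IDF formula, so the following is only a minimal sketch of one way influence weighting could be combined with TF-IDF. The function names, the smoothed IDF, and the per-user influence scores in (0, 1] are all assumptions made for illustration, not the authors' method.

```python
import math
from collections import defaultdict


def influence_weighted_tfidf(tweets, influence):
    """Score terms by TF-IDF, scaling each tweet's contribution by its author's influence.

    tweets:    list of (user, tokens) pairs (hypothetical input format)
    influence: dict mapping user -> weight in (0, 1] (hypothetical scores)
    """
    n_docs = len(tweets)

    # Document frequency: number of tweets that contain each term.
    doc_freq = defaultdict(int)
    for _, tokens in tweets:
        for term in set(tokens):
            doc_freq[term] += 1

    scores = defaultdict(float)
    for user, tokens in tweets:
        if not tokens:
            continue
        weight = influence.get(user, 0.0)  # unknown users contribute nothing
        for term in set(tokens):
            tf = tokens.count(term) / len(tokens)
            # Smoothed IDF (an assumption; keeps every weight positive).
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
            scores[term] += weight * tf * idf
    return dict(scores)


def top_keywords(scores, k):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

With this weighting, a term used only by low-influence accounts ranks below a term with the same frequency used by high-influence accounts.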

Analyzing data-related policy programs in Korea using text mining and network cluster analysis (텍스트 마이닝과 네트워크 군집 분석을 활용한 한국의 데이터 관련 정책사업 분석)

  • Sungjun Choi;Kiyoon Shin;Yoonhwan Oh
    • Journal of Korea Society of Industrial Information Systems / v.28 no.6 / pp.63-81 / 2023
  • This study classifies and categorizes similar policy programs through network clustering analysis, using textual information from data-related policy programs in Korea. To this end, descriptions of data-related budgetary programs in South Korea in 2022 were collected, and keywords were extracted from the program contents. The similarity between each pair of programs was then derived using TF-IDF, and a policy program network was constructed accordingly. Next, the structural characteristics of the network were analyzed, and similar policy programs were clustered and categorized through network clustering. An analysis of 97 programs identified 7 major clusters, indicating that programs with analogous themes or objectives were grouped by application area or by the services that use the data. The findings illuminate the current status of data-related policy programs in Korea, providing policy implications for a strategic approach to planning future national data strategies and programs, and contributing to the establishment of evidence-based policies.
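The pipeline described here (TF-IDF vectors, pairwise similarity, a similarity network, then clusters) can be sketched roughly as follows. This is an illustration under stated assumptions: cosine similarity, the edge threshold, and connected components as the grouping step are stand-ins chosen for brevity, since the abstract does not name the similarity measure or the clustering algorithm.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (dict term -> weight) per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1.0)
            for t, c in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_edges(vectors, threshold):
    """Link every pair of documents whose similarity reaches the threshold."""
    edges = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            s = cosine(vectors[i], vectors[j])
            if s >= threshold:
                edges.append((i, j, s))
    return edges


def connected_components(n, edges):
    """Group nodes with union-find; a simple stand-in for network clustering."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j, _ in edges:
        parent[find(i)] = find(j)

    groups = {}
    for node in range(n):
        groups.setdefault(find(node), []).append(node)
    return sorted(groups.values())
```

A real analysis would likely replace `connected_components` with a community detection method, because a single weak edge is enough to merge two otherwise distinct groups here.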

A Study on the Analysis of Related Information through the Establishment of the National Core Technology Network: Focused on Display Technology (국가핵심기술 관계망 구축을 통한 연관정보 분석연구: 디스플레이 기술을 중심으로)

  • Pak, Se Hee;Yoon, Won Seok;Chang, Hang Bae
    • The Journal of Society for e-Business Studies / v.26 no.2 / pp.123-141 / 2021
  • As the economy's dependence on technology increases, National Core Technology is becoming more important. However, it is difficult to determine the scope of the technology to be protected, because the scope of related technologies is abstract and information disclosure is limited by the nature of National Core Technology. To solve this problem, we propose the most appropriate literature type and analysis method for identifying important technologies related to National Core Technology. We conducted a pilot test applying two text mining techniques, TF-IDF and LDA topic modeling, to four types of literature (news, papers, reports, patents) collected with National Core Technology keywords in the display industry. The results show that applying LDA topic modeling to patent data yields results that are highly relevant to National Core Technology. Important technologies related to the upstream and downstream industries of displays, including OLEDs and microLEDs, were identified, and the results were visualized as networks to clarify the scope of important technologies associated with National Core Technology. Through this study, we clarified the ambiguous scope of technology association and overcame the limited information disclosure that characterizes National Core Technology.

NFT(Non-Fungible Token) Patent Trend Analysis using Topic Modeling

  • Sin-Nyum Choi;Woong Kim
    • Journal of the Korea Society of Computer and Information / v.28 no.12 / pp.41-48 / 2023
  • In this paper, we analyze recent trends in the NFT (Non-Fungible Token) industry using topic modeling techniques, focusing on the universal application of NFTs across various industrial fields. Patent data was used to understand industry trends. We collected data on 371 domestic and 454 international NFT-related patents registered in the patent information search service KIPRIS from 2017, when the first NFT standard was introduced, to October 2023. In the preprocessing stage, stopwords and lemmas were removed, and only nouns were extracted. For the analysis, the top 50 words by frequency were listed, and their corresponding TF-IDF values were examined to derive the main keywords of the industry trends. Next, using the LDA algorithm, we identified four major latent topics within the patent data, both domestically and internationally. We analyzed these topics and presented our findings on NFT industry trends, supported by real-world industry cases. While previous reviews presented trends from an academic perspective using paper data, this study is significant in that it provides practical trend information based on data rooted in field practice. It is expected to be a useful reference for professionals in the NFT industry for understanding market conditions and generating new items.

Analysis of ICT Education Trends using Keyword Occurrence Frequency Analysis and CONCOR Technique (키워드 출현 빈도 분석과 CONCOR 기법을 이용한 ICT 교육 동향 분석)

  • Youngseok Lee
    • Journal of Industrial Convergence / v.21 no.1 / pp.187-192 / 2023
  • In this study, trends in ICT education were investigated by analyzing the frequency of appearance of keywords related to machine learning and by using the convergence of iterated correlations (CONCOR) technique. A total of 304 papers published in registered sites from 2018 to the present were found on Google Scholar using "ICT education" as the keyword, and 60 papers pertaining to ICT education were selected based on a systematic literature review. Keywords were then extracted from the title and summary of each paper. For word frequency and indicator data, 49 keywords with high appearance frequency were extracted by analyzing frequency, using the term frequency-inverse document frequency technique from natural language processing, and by analyzing words that frequently appear together. The degree of relationship was verified by analyzing the connection structure and degree centrality between words, and clusters of similar words were derived via CONCOR analysis. First, "education," "research," "result," "utilization," and "analysis" were identified as the main keywords. Second, an N-gram network graph with "education" as the keyword showed that "curriculum" and "utilization" had the highest correlation level. Third, a cluster analysis with "education" as the keyword formed five groups: "curriculum," "programming," "student," "improvement," and "information." These results indicate that the practical research needed for ICT education can be guided by analyzing and identifying ICT education trends.

Patent data analysis using clique analysis in a keyword network (키워드 네트워크의 클릭 분석을 이용한 특허 데이터 분석)

  • Kim, Hyon Hee;Kim, Donggeon;Jo, Jinnam
    • Journal of the Korean Data and Information Science Society / v.27 no.5 / pp.1273-1284 / 2016
  • In this paper, we analyzed patents on machine learning using keyword network analysis and clique analysis. To construct a keyword network, important keywords were extracted based on their TF-IDF weights and their associations, and network structure analysis and clique analysis were performed. The density and clustering coefficient of the patent keyword network are low, which shows that patent keywords on machine learning are weakly connected with each other. This is because the important patents on machine learning are mainly registered for application systems of machine learning rather than for machine learning techniques. Our clique analysis also showed that the keywords found by cliques in 2005 patents cover subjects such as newsmaker verification, product forecasting, virus detection, biomarkers, and workflow management, while those in 2015 patents cover subjects such as digital imaging, payment cards, calling systems, mammogram systems, and price prediction. Clique analysis can be used not only for identifying specialized subjects, but also for suggesting search keywords in patent search systems.
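As a rough illustration of the keyword network and clique analysis described in this abstract, the sketch below builds a co-occurrence graph from per-patent keyword lists, computes its density, and enumerates maximal cliques with the basic Bron-Kerbosch algorithm. It assumes the important keywords have already been selected (for example by TF-IDF weight); the function names and the `min_count` parameter are hypothetical.

```python
from itertools import combinations


def cooccurrence_graph(docs, min_count=1):
    """Connect two keywords when they appear together in at least min_count documents."""
    counts = {}
    for keywords in docs:
        for a, b in combinations(sorted(set(keywords)), 2):
            counts[(a, b)] = counts.get((a, b), 0) + 1

    graph = {}
    for (a, b), c in counts.items():
        if c >= min_count:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def density(graph):
    """Share of possible undirected edges that are present."""
    n = len(graph)
    if n < 2:
        return 0.0
    m = sum(len(neighbors) for neighbors in graph.values()) / 2
    return 2 * m / (n * (n - 1))


def maximal_cliques(graph):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion (no pivoting)."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates to add, x: already-processed vertices
        if not p and not x:
            cliques.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return sorted(cliques)
```

Each maximal clique is a set of keywords that all co-occur with one another, which is why cliques can point to a specialized subject within a sparse network.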

Design and Implementation of an Analysis module based on MapReduce for Large-scalable Social Data (대용량 소셜 데이터의 의미 분석을 위한 MapReduce 기반의 분석 모듈 설계 및 구현)

  • Lee, Hyeok-Ju;Kim, Myoung-Jin;Lee, Han-Ku;Yoon, Hyo-Gun
    • Proceedings of the Korean Information Science Society Conference / 2011.06b / pp.357-360 / 2011
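  • With the recent rapid development of Internet and communication technologies, particularly mobile technologies, SNS (Social Networking Service), the representative means of social communication, has emerged as an important issue. In providing SNS services, the key consideration should be how to deliver information in the areas users want and are interested in, using accurate and meaningful data. However, because of the recent explosive growth of social data, users are receiving social communication services that lack reliability because semantic analysis is not performed accurately. To address these problems in social data analysis, this paper proposes the architecture of a MapReduce-based analysis module that collects the data needed for social network services and analyzes the meaning of the large-scale SNS data collected in a cloud computing environment. The proposed module includes a collection function that gathers the social data needed for semantic analysis and an analysis function that performs semantic analysis on the collected data. The collection function gathers text data generated on SNS and uses MapReduce to split the generated files into appropriately sized pieces that are easy to analyze. Semantic analysis of the collected social data was implemented with an algorithm that applies an improved Weighted-MINMAX to the existing TF-IDF method. The improved algorithm evaluates the importance of words and supports a semantic information service composed of high-importance words. To evaluate system performance, we measured the data processing time per node and the accuracy of the extracted keywords.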
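To make the MapReduce framing concrete, here is a small single-process sketch of how plain TF-IDF can be expressed as map, shuffle, and reduce steps. It is only an illustration: the Weighted-MINMAX refinement is not described in the abstract and is omitted, the function names are hypothetical, and a real deployment would run these phases on a framework such as Hadoop rather than in memory.

```python
import math
from collections import defaultdict


def map_phase(doc_id, text):
    """Map: emit ((term, doc_id), 1) for every whitespace-separated token."""
    for token in text.lower().split():
        yield (token, doc_id), 1


def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce: sum the ones to get the count of each term in each document."""
    return {key: sum(values) for key, values in grouped.items()}


def tfidf_from_counts(counts, n_docs):
    """Second pass: turn per-document term counts into TF-IDF weights."""
    doc_len = defaultdict(int)   # total tokens per document
    doc_freq = defaultdict(int)  # number of documents containing each term
    for (term, doc_id), c in counts.items():
        doc_len[doc_id] += c
        doc_freq[term] += 1
    return {
        (term, doc_id): (c / doc_len[doc_id]) * math.log(n_docs / doc_freq[term])
        for (term, doc_id), c in counts.items()
    }


def run(docs):
    """docs: dict mapping doc_id -> raw text (hypothetical input format)."""
    pairs = [pair for doc_id, text in docs.items() for pair in map_phase(doc_id, text)]
    counts = reduce_counts(shuffle(pairs))
    return tfidf_from_counts(counts, len(docs))
```

Because this uses the unsmoothed IDF, a term that appears in every document receives a weight of zero.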

An Analysis on Media Trends in Public Agency for Social Service Applying Text Mining (텍스트 마이닝을 적용한 사회서비스원 언론보도기사 분석)

  • Park, Hae-Keung;Youn, Ki-Hyok
    • Journal of Internet of Things and Convergence / v.8 no.2 / pp.41-48 / 2022
  • This study empirically explores which issues, that is, which social perceptions, have formed around the public social service agency (hereafter SSA), using mass media coverage of the SSA. The study is meaningful in that it identifies the overall social perception and trend of the SSA through public opinion. To extract media trend data, the big data analysis system Textom was used to collect data from the representative portals Naver News and Daum News. The collected texts numbered 1,299 in 2020 and 1,410 in 2021, for a total of 2,709. The analysis showed, first, that the most frequent words in the texts were 'SSA', 'establishment', and 'operation'. Second, in the N-gram analysis, the word pairs directly related to the SSA were 'SSA and public', 'SSA and opening', 'SSA and launch', 'SSA and Department Director', 'SSA and Staff', and 'SSA and Caregiver', among others. Third, the TF-IDF analysis and word network analysis yielded results similar to the word frequency and N-gram results: 'establishment', 'operation', 'public', 'launch', 'provided', 'opened', 'Holding', and 'Care'. Based on these results, the study suggests strengthening the emergency care support group, commercializing it in detail, and stabilizing jobs.
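The N-gram step mentioned in this abstract amounts to counting adjacent word sequences. A minimal sketch, assuming the articles are already tokenized (the function name and sample tokens are made up for illustration; Textom's actual processing is not described in the abstract):

```python
from collections import Counter


def ngram_counts(docs, n=2):
    """Count every run of n adjacent tokens across all tokenized documents."""
    counts = Counter()
    for tokens in docs:
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts
```

Calling `ngram_counts(docs).most_common(10)` would then list the ten most frequent word pairs, comparable in spirit to pairs such as 'SSA and public' reported above.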

Exploring Issues Related to the Metaverse from the Educational Perspective Using Text Mining Techniques - Focusing on News Big Data (텍스트마이닝 기법을 활용한 교육관점에서의 메타버스 관련 이슈 탐색 - 뉴스 빅데이터를 중심으로)

  • Park, Ju-Yeon;Jeong, Do-Heon
    • Journal of Industrial Convergence / v.20 no.6 / pp.27-35 / 2022
  • The purpose of this study is to analyze the metaverse-related issues in the news big data from an educational perspective, explore their characteristics, and provide implications for the educational applicability of the metaverse and future education. To this end, 41,366 cases of metaverse-related data searched on portal sites were collected, and weight values of all extracted keywords were calculated and ranked using TF-IDF, a representative term weight model, and then word cloud visualization analysis was performed. In addition, major topics were analyzed using topic modeling(LDA), a sophisticated probability-based text mining technique. As a result of the study, topics such as platform industry, future talent, and extension in technology were derived as core issues of the metaverse from an educational perspective. In addition, as a result of performing secondary data analysis under three key themes of technology, job, and education, it was found that metaverse has issues related to education platform innovation, future job innovation, and future competency innovation in future education. This study is meaningful in that it analyzes a vast amount of news big data in stages to draw issues from an education perspective and provide implications for future education.

Development of Personalized Learning Course Recommendation Model for ITS (ITS를 위한 개인화 학습코스 추천 모델 개발)

  • Han, Ji-Won;Jo, Jae-Choon;Lim, Heui-Seok
    • Journal of the Korea Convergence Society / v.9 no.10 / pp.21-28 / 2018
  • To help users who have difficulty finding a learning course that matches their level of proficiency, we developed a personalized learning course recommendation model for an Intelligent Tutoring System (ITS). The model analyzes the learner profile and extracts keywords by calculating the weight of each word. The similarity between the vectors of extracted words is measured with the cosine similarity method. Finally, the three courses with the highest similarity are recommended to the learner. To analyze the effects of the recommendation model, we applied it at the Women's Ability Development Center. The mean, standard deviation, skewness, and kurtosis of the question items were calculated from a satisfaction survey. The results of the experiment showed high satisfaction levels in accuracy, novelty, self-reference, and usefulness, which demonstrated the effectiveness of the recommendation model. This study is meaningful in that it proposes a learner-centered recommendation system based on machine learning, a topic that has not been sufficiently researched either domestically or abroad.
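The recommendation step (weight the words, compare vectors by cosine similarity, return the top three courses) can be sketched as below. This is a simplified illustration, not the authors' model: relative term frequency is assumed as the word weight, and the course names and tokens in the usage note are invented.

```python
import math
from collections import Counter


def term_weights(tokens):
    """Weight each word by its relative frequency (an assumed weighting scheme)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(profile_tokens, courses, k=3):
    """Return the names of the k courses most similar to the learner profile."""
    profile = term_weights(profile_tokens)
    scored = [
        (cosine_similarity(profile, term_weights(tokens)), name)
        for name, tokens in courses.items()
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best first, ties by name
    return [name for _, name in scored[:k]]
```

For example, a profile of `["excel", "office", "beginner"]` compared against a small catalog would rank spreadsheet and office courses above unrelated ones, and `k=3` mirrors the three recommendations described in the abstract.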