• Title/Summary/Keyword: Frequency based Text Analysis

Topic Extraction and Classification Method Based on Comment Sets

  • Tan, Xiaodong
    • Journal of Information Processing Systems
    • /
    • v.16 no.2
    • /
    • pp.329-342
    • /
    • 2020
  • In recent years, emotional text classification has become one of the essential research topics in natural language processing. It has been widely used in the sentiment analysis of reviews of commodities such as hotels and of other commentary corpora. This paper proposes an improved W-LDA (weighted latent Dirichlet allocation) topic model to address the shortcomings of traditional LDA topic models. In the Gibbs sampling of topic words and the calculation of the expected word distribution in the W-LDA topic model, an average weighted value is adopted to keep topic-related words from being submerged by high-frequency words and to improve the distinction between topics. The method further integrates a support vector machine classification algorithm based on the extracted high-quality document-topic distribution and topic-word vectors. Finally, an efficient integrated method is constructed for the analysis and extraction of emotional words, topic distribution calculation, and sentiment classification. Tests on real teaching evaluation data and a public comment test set show that the proposed method has distinct advantages over two other typical algorithms in terms of topic differentiation, classification precision, and F1-measure.

Analysis of Keywords in national river occupancy permits by region using text mining and network theory (텍스트 마이닝과 네트워크 이론을 활용한 권역별 국가하천 점용허가 키워드 분석)

  • Seong Yun Jeong
    • Smart Media Journal
    • /
    • v.12 no.11
    • /
    • pp.185-197
    • /
    • 2023
  • This study used text mining and network theory to extract information useful for occupancy applications and for carrying out permit tasks from the permit contents of the permit register, which is otherwise used only for the simple purpose of recording occupancy permit information. Based on text mining, we applied normalization processes such as stopword removal and morpheme analysis, and then analyzed and compared vocabulary frequency and topic modeling results across five regions, including Seoul, Gyeonggi, Gyeongsang, Jeolla, Chungcheong, and Gangwon. By applying four centrality measures widely used in network theory (degree, closeness, betweenness, and eigenvector), we examined keywords that occupy a central position or act as intermediaries in the network. A comprehensive analysis of vocabulary frequency, topic modeling, and network centrality found that the keyword 'installation' was the most influential in all regions. This is believed to result from the Ministry of Environment's permit management office issuing many permits for constructing facilities or installing structures. In addition, keywords related to road facilities, flood control facilities, underground facilities, power/communication facilities, sports/park facilities, and the like were found to occupy a central position or act as intermediaries in topic modeling and in the networks. Most of the keywords followed a Zipf's law distribution, with low frequency of occurrence and a low distribution ratio.

Keyword Analysis of Two SCI Journals on Rock Engineering by using Text Mining (텍스트 마이닝을 이용한 암반공학분야 SCI논문의 주제어 분석)

  • Jung, Yong-Bok;Park, Eui-Seob
    • Tunnel and Underground Space
    • /
    • v.25 no.4
    • /
    • pp.303-319
    • /
    • 2015
  • Text mining is a branch of data mining used to find meaningful information in large amounts of text. In this study, we analyzed the titles and keywords of two SCI journals on rock engineering using text mining to find the major research areas, trends, and associations among research fields. The results were also visualized for intuitive understanding. The two journals showed similar research fields but different patterns in the associations among research fields. IJRMMS showed a simple network, that is, one big group based on the keyword 'rock' with a few small groups. RMRE, on the other hand, showed a complex network among various medium-sized groups. Trend analysis by clustering and linear regression of the keyword-year frequency matrix showed that most keywords increased in number over time, except for a few declining keywords.
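
The trend analysis described in the abstract above fits a line to each keyword's yearly frequency. As a rough illustration only (the paper's exact procedure is not given here), the sketch below computes an ordinary least-squares slope for one row of a keyword-year frequency matrix. The function name `trend_slope` and the sample counts are hypothetical.

```python
def trend_slope(years, counts):
    """Least-squares slope of keyword frequency against publication year.

    A positive slope suggests a rising keyword, a negative slope a declining one.
    Assumes at least two distinct years.
    """
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(counts) / n
    # Sum of squared deviations of the years, and the year/count co-deviation.
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, counts))
    return sxy / sxx


# Hypothetical yearly counts for two keywords (not data from the paper).
years = [2010, 2011, 2012, 2013]
rising = trend_slope(years, [2, 4, 6, 8])
falling = trend_slope(years, [9, 7, 4, 2])
print(rising, falling)
```

Sorting keywords by this slope would separate the rising from the declining ones; a real analysis would also check whether each slope is statistically significant.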

A study on Metaverse keyword Consumer perception survey after Covid-19 using big Data

  • LEE, JINHO;Byun, Kwang Min;Ryu, Gi Hwan
    • International Journal of Internet, Broadcasting and Communication
    • /
    • v.14 no.4
    • /
    • pp.52-57
    • /
    • 2022
  • In this study, keywords from representative online portal sites such as Naver, Google, and YouTube were collected with the text mining tool Textom to examine changes in the perception of the metaverse after COVID-19. Before COVID-19, social media platforms such as Kakao Talk, Facebook, and Twitter were mentioned, and among the four types of metaverse, consumer awareness was still concentrated in the field of life logging. After COVID-19, however, keywords such as Roblox, Fortnite, and Geppetto appeared, along with keywords such as Universe, Space, Meta, and World, so the metaverse was recognized as a virtual world. As a result, it was confirmed that consumer perception of the metaverse changed from life logging to the mirror world. In addition, keywords such as cryptocurrency, coin, and exchange appeared before COVID-19, and the word frequency ranking for blockchain, the underlying technology, was high, but after COVID-19 its word frequency ranking fell significantly.

A Study on the Perception of Metaverse Fashion Using Big Data Analysis

  • Hosun Lim
    • Fashion & Textile Research Journal
    • /
    • v.25 no.1
    • /
    • pp.72-81
    • /
    • 2023
  • As changes in social and economic paradigms accelerate and non-contact has become the new normal due to the COVID-19 pandemic, metaverse services that build societies through online activities and virtual reality are spreading rapidly. This study analyzes the perception and trend of metaverse fashion using big data. TEXTOM was used to extract metaverse and fashion-related words from Naver and Google and to analyze their frequency and importance. Additionally, structural equivalence analysis based on the derived main words was conducted to identify the perception and trend of metaverse fashion. The following results were obtained. First, term frequency (TF) analysis revealed that the most frequently appearing words were "metaverse," "fashion," "virtual," "brand," "platform," "digital," "world," "Zepeto," "company," and "game." In the TF-inverse document frequency (TF-IDF) analysis, "virtual" was the most important, followed by "brand," "platform," "Zepeto," "digital," "world," "industry," "game," "fashion show," and "industry." "Metaverse" and "fashion" were found to have a high TF but a low TF-IDF. Further, words such as "virtual," "brand," "platform," "Zepeto," and "digital" had a higher TF-IDF ranking than TF ranking, indicating that they had high importance in the text. Second, convergence of iterated correlations (CONCOR) analysis using UCINET revealed four clusters, classified as "virtual world," "metaverse distribution platform," "fashion contents technology investment," and "metaverse fashion week." Fashion brands are hosting virtual fashion shows and stores on metaverse platforms where the virtual and real worlds coexist, and investment in developing metaverse-related technologies is under way.
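
The abstract above reports that "metaverse" and "fashion" had a high TF but a low TF-IDF. The minimal sketch below shows how that can happen for a word that appears in every document. It assumes the textbook weighting tf × ln(N / df); the weighting TEXTOM actually applies may differ, and the toy documents and the function name `tf_and_tfidf` are hypothetical.

```python
import math
from collections import Counter


def tf_and_tfidf(docs):
    """Return corpus-level TF and TF-IDF scores for tokenized documents.

    Assumes the textbook weighting tf * ln(N / df), with tf summed over the corpus.
    """
    n_docs = len(docs)
    tf = Counter()  # total occurrences of each word in the corpus
    df = Counter()  # number of documents containing each word
    for tokens in docs:
        tf.update(tokens)
        df.update(set(tokens))
    tfidf = {word: tf[word] * math.log(n_docs / df[word]) for word in tf}
    return dict(tf), tfidf


# Toy documents (illustrative only, not the study's data).
docs = [
    ["metaverse", "fashion", "virtual", "brand"],
    ["metaverse", "fashion", "platform", "virtual"],
    ["metaverse", "fashion", "game"],
]
tf, tfidf = tf_and_tfidf(docs)
for word in sorted(tf, key=tf.get, reverse=True):
    print(f"{word:10s} TF={tf[word]}  TF-IDF={tfidf[word]:.3f}")
```

A word present in every document gets an inverse document frequency of ln(1) = 0, so its TF-IDF is zero however often it occurs. This is why the search terms themselves tend to rank low on TF-IDF.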

Analysis on the Trend of The Journal of Information Systems Using TLS Mining (TLS 마이닝을 이용한 '정보시스템연구' 동향 분석)

  • Yun, Ji Hye;Oh, Chang Gyu;Lee, Jong Hwa
    • The Journal of Information Systems
    • /
    • v.31 no.1
    • /
    • pp.289-304
    • /
    • 2022
  • Purpose: The development of the network and mobile industries has induced companies to invest in information systems, leading a new industrial revolution. The Journal of Information Systems (JIS), which developed the information systems field into a theoretical and practical discipline in the 1990s, holds a 30-year history of information systems. This study aims to identify the academic value and research trends of JIS by analyzing those trends. Design/methodology/approach: This study analyzes the trends of JIS by combining several methods, referred to as TLS mining analysis. TLS mining analysis consists of a series of analyses including the Term Frequency-Inverse Document Frequency (TF-IDF) weight model, Latent Dirichlet Allocation (LDA) topic modeling, and text mining with Semantic Network Analysis. First, keywords are extracted from the research data using the TF-IDF weight model, and then topic modeling is performed using the LDA algorithm to identify issue keywords. Findings: The study used the summary service for published research papers provided by the Korea Citation Index to analyze JIS. The 714 papers published from 2002 to 2021 were divided into two periods: 2002-2011 and 2012-2021. In the first period (2002-2011), research in the information systems field focused on e-business strategies, as most companies adopted online business models. In the second period (2012-2021), data-based information technology and new industrial revolution technologies such as artificial intelligence, SNS, and mobile were the main research issues in the information systems field. In addition, keywords for improving the JIS citation index were presented.

Analysis of domestic and foreign research trends of Tricholoma matsutake using text mining techniques

  • Choi, Ah Hyeon;Kang, Jun Won
    • Korean Journal of Agricultural Science
    • /
    • v.48 no.3
    • /
    • pp.505-514
    • /
    • 2021
  • Among non-timber forest products, Tricholoma matsutake is a high value-added item. Many countries, including Korea, China, and Japan, are conducting research and technology development to increase artificial cultivation and productivity. However, the production of T. matsutake is declining due to global warming, abnormal temperatures, and pine tree pest problems. Therefore, it is necessary to identify trends in domestic and foreign research on T. matsutake and to pursue preemptive research and development to preserve the genetic resources of T. matsutake and increase its productivity. Based on the correlation between keywords among the high-frequency keywords, it was observed that research in Korea has mainly addressed the microbial clusters of T. matsutake. The main focus in China has been pharmacological studies on the ingredients of T. matsutake. The main focus in Japan has been on preserving the genetic diversity and species of T. matsutake. Thus, future domestic studies of T. matsutake will require pharmacological studies on its ingredients and studies on its genetic diversity and species conservation. In addition, unlike in China and Japan, genetic keywords did not appear at high frequency in Korea. Therefore, Korea will have to proceed with research using modern molecular biology techniques.

A Study on the Network Text Analysis about Oral Health in Aging-Well

  • Seol-Hee Kim
    • Journal of dental hygiene science
    • /
    • v.23 no.4
    • /
    • pp.302-311
    • /
    • 2023
  • Background: Oral health is an important element of aging well, and it also affects overall health, mental health, and quality of life. In this study, we sought to identify factors influencing oral health and research trends for well-aging through text analysis of research on well-aging and oral health over the past 12 years. Methods: The research data consisted of English literature published in PubMed from 2012 to 2023. "Aging well" and "oral health" were used as search terms, and 115 papers were finally selected. Network text analysis included keyword frequency analysis, centrality analysis, and cohesion structure analysis using the Net-Miner 4.0 program. Results: Excluding general characteristics, the most frequent of the 520 keywords (MeSH terms) in the 115 articles were psychology, dental prosthesis, Alzheimer's disease, dental caries, cognition, cognitive dysfunction, and bacteria. Research keywords with high degree centrality were dental caries (0.864), quality of life (0.833), tooth loss (0.818), health status (0.727), and life expectancy (0.712). Community analysis identified four groups. Group 1 consisted of chewing and nutrition; Group 2 of oral diseases, systemic diseases, and management; Group 3 of oral health and mental health; and Group 4 of oral frailty symptoms and quality of life. Conclusion: In an aging society, oral dysfunction affects mental health and quality of life. Preventing oral diseases for well-aging can have a positive impact on mental health and quality of life. Therefore, efforts are needed to prevent oral frailty in a super-aging society by developing systematic oral care programs for each stage of the life cycle and providing education on them.
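
The degree centrality scores quoted in the abstract above (for example, 0.864 for dental caries) are values between 0 and 1. One common definition, assumed here, divides a keyword's number of distinct neighbours in the co-occurrence network by the number of other keywords; NetMiner's exact computation may differ. The edge list below is invented for illustration and is not the study's network.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Normalized degree centrality for an undirected keyword network.

    Each score is (number of distinct neighbours) / (number of nodes - 1).
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


# Hypothetical keyword co-occurrence edges (not taken from the paper).
edges = [
    ("dental caries", "tooth loss"),
    ("dental caries", "quality of life"),
    ("dental caries", "health status"),
    ("tooth loss", "quality of life"),
]
centrality = degree_centrality(edges)
for keyword, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{keyword:16s} {score:.3f}")
```

In this toy network "dental caries" co-occurs with every other keyword and so scores 1.0, while "health status" has a single neighbour and scores 1/3.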

A Study on the Feature Point Extraction Methodology based on XML for Searching Hidden Vault Anti-Forensics Apps (은닉형 Vault 안티포렌식 앱 탐색을 위한 XML 기반 특징점 추출 방법론 연구)

  • Kim, Dae-gyu;Kim, Chang-soo
    • Journal of Internet Computing and Services
    • /
    • v.23 no.2
    • /
    • pp.61-70
    • /
    • 2022
  • General smartphone users often use Vault apps to protect personal information such as photos and videos. However, criminals increasingly use Vault app functions for anti-forensic purposes, to hide illegal videos. These apps are among those registered on Google Play. This paper proposes a methodology for extracting feature points through XML-based keyword frequency analysis, applying text mining techniques, in order to explore Vault apps used by criminals. XML syntax was compared and analyzed using the strings.xml files included in the app for 15 hidden Vault anti-forensics apps and non-hidden Vault apps, respectively. In hidden Vault anti-forensics apps, more hiding-related words are found, at a higher frequency, in the first and second rounds of terminology processing. Unlike most conventional methods, which statically analyze APK files from an engineering point of view, this paper is meaningful in that it takes a humanities and sociological point of view to find features for classifying anti-forensics apps. In conclusion, text mining techniques applied through XML parsing can provide basic data for exploring hidden Vault anti-forensics apps.

Research on Satisfaction Evaluation Based on Tourist Big Data

  • Guo, Hanwen;Liu, Ziyang;Jiao, Zeyu
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.1
    • /
    • pp.231-244
    • /
    • 2022
  • With the improvement of people's living standards and the development of tourism, tourists have greater freedom in choosing destinations. As an indicator of satisfaction with scenic spots, tourist comments are therefore becoming increasingly prominent. This paper aims to compare and analyze the landscape image of the Five Great Mountains in China and to provide specific strategies for their development. Online reviews of the Five Great Mountains posted by tourists on Online Travel Agency (OTA) websites from 2015 to 2018 are collected as research samples. The text analysis method and the R language are used to analyze the content of the tourist reviews, and the high-frequency words are displayed visually in a word cloud. In addition, the entropy weight method is used to determine the index weights, and tourist satisfaction is evaluated to understand the weaknesses of those scenic spots. The results show, first, that tourist satisfaction with the Five Great Mountains is basically consistent with their popularity. Second, weight analysis shows that tourists pay special attention to the landscape features and environmental health of the scenic area, so relevant departments should focus on building up the landscape characteristics and improving the environmental health of the scenic area. At the same time, the accommodation and service management of the scenic spots cannot be ignored. Finally, based on the analysis results, suggestions are made on how to improve tourist satisfaction with the Five Great Mountains.
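
The abstract above uses the entropy weight method to determine index weights. Below is a minimal sketch of the standard formulation, assuming non-negative scores with one row per scenic spot and one column per indicator. It is not the paper's implementation, and the score matrix and the name `entropy_weights` are hypothetical.

```python
import math


def entropy_weights(matrix):
    """Entropy weight method: one weight per column (indicator).

    Assumes at least two rows, non-negative values, and a positive column sum.
    Indicators whose values vary more across rows receive larger weights.
    """
    m = len(matrix)
    n = len(matrix[0])
    divergences = []
    for j in range(n):
        column = [row[j] for row in matrix]
        total = sum(column)
        entropy = 0.0
        for x in column:
            if x > 0:  # treat 0 * ln(0) as 0
                p = x / total
                entropy -= p * math.log(p)
        entropy /= math.log(m)  # scale entropy to the range [0, 1]
        divergences.append(1.0 - entropy)
    spread = sum(divergences)
    if abs(spread) < 1e-12:
        # Every indicator is uniform: fall back to equal weights.
        return [1.0 / n] * n
    return [d / spread for d in divergences]


# Hypothetical satisfaction scores: rows are scenic spots, columns are indicators.
scores = [
    [4.0, 5.0],
    [4.0, 1.0],
    [4.0, 3.0],
]
weights = entropy_weights(scores)
print(weights)
```

The first indicator is identical for every spot, so it carries no discriminating information and its weight is effectively zero. All of the weight goes to the second indicator, whose scores differ across spots.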