• Title/Summary/Keyword: Text frequency analysis

100 Article Paper Text Minning Data Analysis and Visualization in Web Environment (웹 환경에서 100 논문에 대한 텍스트 마이닝, 데이터 분석과 시각화)

  • Li, Xiaomeng;Li, Jiapei;Lee, HyunChang;Shin, SeongYoon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2017.10a
    • /
    • pp.157-158
    • /
    • 2017
  • Python can be used to analyze big data drawn from articles and to perform text mining on it. Python is a programming language that is easy to use. This paper uses Python to build a web environment in which the analysis results are displayed directly in the browser. The data set consists of 100 articles from Altmetric, which tracks a range of sources. Such big data must be collected and analyzed with an effective method; once the results are available, the Python wordcloud package is used to produce an image that directly shows the most frequent words.

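
The word-frequency step that feeds a word cloud can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation; the stop-word list, the function name `word_frequencies`, and the sample texts are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "with"}


def word_frequencies(texts, top_n=10):
    """Count lowercase word tokens across all texts, skipping stop words."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(top_n)


# Toy input standing in for article text (not data from the paper).
sample_texts = [
    "Text mining of article data with Python",
    "Python makes text analysis of big data easy",
    "Visualization of text data in the browser",
]
top_words = word_frequencies(sample_texts, top_n=3)
print(top_words)
```

If the third-party wordcloud package is installed, a dictionary built from these counts could be passed to `WordCloud().generate_from_frequencies(...)` to render the image.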

A Structural Analysis of Acupuncture & Moxibustion Points in the NaeGyeong Chapter of DongUiBoGam Using Text Mining (텍스트마이닝을 이용한 동의보감의 질병인식방식과 내경편 침구법 경혈 특성 분석)

  • Lee, Taehyung;Jung, Won-Mo;Lee, In-Seon;Lee, Hyejung;Kim, Namil;Chae, Younbyoung
    • Korean Journal of Acupuncture
    • /
    • v.30 no.4
    • /
    • pp.230-242
    • /
    • 2013
  • Objectives: DongUiBoGam is a representative work of Korean medical literature. This research aims to grasp structurally how DongUiBoGam understands the human body and to review the acupuncture and moxibustion methods in its NaeGyeong chapter using text mining. Methods: The structure of DongUiBoGam was analyzed using the parts of the book that describe its contents, the major premises for understanding the human body, and the processes of treatment. We analyzed the characteristics of each acupoint in relation to the causes of diseases and symptoms in the NaeGyeong chapter using Term Frequency-Inverse Document Frequency (TF-IDF). Results: Structural analysis of DongUiBoGam yielded three categories of pattern identification (PI). All causes of diseases and symptoms were transformed according to the three PI categories. After analyzing the relationship between acupoints and causes of diseases and symptoms, 114 acupoints were visualized with the TF-IDF values of the three PI categories. Conclusions: The selection of acupoints in the NaeGyeong chapter of DongUiBoGam was linked to causes of diseases and symptoms based on the three PI categories. Visualizing the bipartite relationships between acupoints and causes of diseases and symptoms made the characteristics of each acupoint easy to understand.
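
To make the TF-IDF weighting concrete, here is a small sketch that treats each acupoint as a "document" whose terms are the symptoms mentioned with it. The abstract does not state which TF-IDF variant was used, so the formula below (relative term frequency times log(N/df)) is an assumption, and the point names and symptom lists are placeholders rather than data from DongUiBoGam.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return {doc: {term: score}} with tf = count / len(doc), idf = log(N / df)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for terms in docs.values():
        doc_freq.update(set(terms))  # each document counts a term once
    scores = {}
    for name, terms in docs.items():
        counts = Counter(terms)
        scores[name] = {
            term: (count / len(terms)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores


# Placeholder acupoint -> symptom mentions (hypothetical, for illustration only).
docs = {
    "point_A": ["headache", "dizziness", "headache"],
    "point_B": ["headache", "cough"],
    "point_C": ["cough", "fever", "fever"],
}
scores = tf_idf(docs)
print(scores["point_A"])
```

In this toy data "dizziness" scores higher than "headache" for `point_A` even though it is mentioned less often, because it appears with only one point; that is the distinguishing effect TF-IDF is meant to capture.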

Text mining analysis of terms and information on product names used in online sales of women's clothing (텍스트마이닝을 활용한 온라인 판매 여성 의류 상품명에 나타난 용어 및 정보분석)

  • Yeo Sun Kang
    • The Research Journal of the Costume Culture
    • /
    • v.31 no.1
    • /
    • pp.34-52
    • /
    • 2023
  • In this study, text mining was conducted on the product names of skirts, pants, shirts/blouses, and dresses to analyze the characteristics of keywords appearing in online shopping product names. Frequency analysis showed that, for each item, around 30 keywords appeared with a share of 0.5% or more and around 150 keywords appeared with a share of 0.1% or more. The cumulative distribution rate of the 150 terms was around 80%. Accordingly, information on the 150 key terms was analyzed, from which item, clothing composition, and material information were found to be the most important types of information (ranking in the top five for all items). In addition, fit and style information for skirts and pants and length information for skirts and dresses were also considered important. Keywords representing clothing composition information were banding, high waist, and split for skirts and pants, and V-neck, tie, long sleeves, and puff for shirts/blouses and dresses. From this information it was possible to identify the design characteristics currently preferred by consumers. However, there were also problems with terminology that hindered the connection between sellers and consumers. The most common problems were the use of various terms with the same meaning and the irregular use of Korean and English terms. Co-occurrence frequency analysis suggests that such usage is seldom intended to increase product exposure, so avoiding it is recommended.
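
The share thresholds and cumulative distribution rate described above can be illustrated with a short sketch. It assumes whitespace-tokenized product names and defines a keyword's share as its count divided by all keyword occurrences; both choices, the function names, and the sample product names are assumptions, not details from the study.

```python
from collections import Counter


def keyword_shares(product_names):
    """Return (keyword, share) pairs sorted by descending share of all tokens."""
    counts = Counter(
        word for name in product_names for word in name.lower().split()
    )
    total = sum(counts.values())
    return [(word, count / total) for word, count in counts.most_common()]


def cumulative_coverage(shares, top_k):
    """Fraction of all keyword occurrences covered by the top_k keywords."""
    return sum(share for _, share in shares[:top_k])


# Hypothetical product names (not data from the study).
names = [
    "high waist banding skirt",
    "banding wide pants",
    "v-neck puff blouse",
    "high waist pants",
]
shares = keyword_shares(names)
frequent = [word for word, share in shares if share >= 0.1]  # toy threshold
coverage_top3 = cumulative_coverage(shares, 3)
print(frequent, round(coverage_top3, 3))
```

With real data the threshold would be set to 0.005 or 0.001 to mirror the 0.5% and 0.1% cut-offs, and `cumulative_coverage(shares, 150)` would correspond to the reported cumulative rate.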

Keywords Analysis of Clothing Materials in Consumer Reviews Using Big Data Text Mining (빅데이터 텍스트 마이닝을 활용한 소비자 리뷰에서의 의류 소재 키워드 분석)

  • Gaeun Kang;Jiwon Park;Shinjung Yoo
    • Journal of the Korean Society of Clothing and Textiles
    • /
    • v.48 no.4
    • /
    • pp.729-743
    • /
    • 2024
  • This research explores consumer preferences for materials in different clothing product categories, using web-crawling and text mining techniques. Specifically, the study focuses on the material-related terms found in consumer reviews across three distinct product categories: functional clothing, formal shirts, and knit sweaters. Top-selling products within each category were identified on the Naver Shopping website based on the volume of reviews, and the four most-reviewed products were selected. Six hundred reviews per product were analyzed using the Textom big-data analysis software to determine the frequency of material-related mentions and word associations. The analysis utilized two comparative metrics: product category and usage duration. Our findings reveal notable variations in the material preferences mentioned by consumers across different product categories. The study suggests a need to re-evaluate existing standardized review criteria to better reflect consumer interests specific to each product category. Additionally, an increase in material-related terms in reviews over one month indicates the potential importance of extending the duration of product reviews to enhance the accuracy of information that reflects longer-term consumer experiences with material quality.

A Method for Short Text Classification using SNS Feature Information based on Markov Logic Networks (SNS 특징정보를 활용한 마르코프 논리 네트워크 기반의 단문 텍스트 분류 방법)

  • Lee, Eunji;Kim, Pankoo
    • Journal of Korea Multimedia Society
    • /
    • v.20 no.7
    • /
    • pp.1065-1072
    • /
    • 2017
  • As smart devices and social network services (SNSs) become increasingly pervasive, individuals produce large amounts of data in real time. Accordingly, studies on unstructured data analysis are actively being conducted to solve the resultant problem of information overload and to facilitate effective data processing. Many such studies are conducted for filtering inappropriate information. In this paper, a feature-weighting method considering SNS-message features is proposed for the classification of short text messages generated on SNSs, using Markov logic networks for category inference. The performance of the proposed method is verified through a comparison with existing frequency-based classification methods.

An Analysis on Learning Effects of Character Animation Based-Mobile Foreign Language Vocabulary Learning App (캐릭터 애니메이션 기반 모바일 외국어 어휘 학습 앱 효과 분석)

  • Kim, Insook;Choi, Minsuh;Ko, Hyeyoung
    • Journal of Korea Multimedia Society
    • /
    • v.21 no.12
    • /
    • pp.1526-1533
    • /
    • 2018
  • This study aims to provide implications for mobile foreign language vocabulary learning apps by analyzing the effects of a mobile vocabulary learning app based on character animation. For this purpose, we gave two groups of learners either a learning application designed with character animation and text or an application designed with text only, and analyzed the effects. We found that the application designed with character animation and text was beneficial for recognition frequency and duration of learning. Regarding learning outcomes, we found that it was beneficial not only for memory but also for learning interest and motivation. This study provides implications for the learning methods and design development of mobile foreign language vocabulary learning applications, which have been actively used recently.

Social media big data analysis of Z-generation fashion (Z세대 패션에 대한 소셜미디어의 빅데이터 분석)

  • Sung, Kwang-Sook
    • Journal of the Korea Fashion and Costume Design Association
    • /
    • v.22 no.3
    • /
    • pp.49-61
    • /
    • 2020
  • This study analyzed social media posts and performed a big data analysis of Z-generation fashion using the Textom text mining program and the UCINET big data analysis program. The results are as follows. First, keyword analysis of 67,646 Z-generation fashion social media posts from the last 5 years extracted 220,211 keywords. Among them, 67 major keywords were selected on the basis of a co-occurrence frequency greater than 250. Among the top keywords appearing more than 1,000 times, 'Z generation' (29,595 times) was the most influential, with an overwhelmingly large number of connected nodes, followed by 'millennials' (18,536 times), 'fashion' (17,836 times), 'generation' (13,055 times), 'brand' (8,325 times), and 'trend' (7,310 times). Second, the analysis of network degree centrality among the key keywords for the Z-generation showed that the number of nodes connected to 'Z-generation' (29,595 times) is overwhelmingly large, followed by 'millennial' (18,536 times), 'fashion' (17,836 times), 'generation' (13,055 times), 'brand' (8,325 times), and 'trend' (7,310 times). These texts are considered important factors in exploring the reaction of social media to the Z-generation. Third, CONCOR analysis rearranged and clustered texts with structural equivalence among the major keywords for Gen Z fashion. In addition, four clusters were derived by grouping through semantic network visualization. Group 1 comprises 54 texts, 'Diverse Characteristics of Z-Generation Fashion Consumers'; Group 2 comprises 7 texts, 'Z-Generation's teenagers Fashion Powers'; Group 3 comprises 8 texts, 'Z-Generation's Celebrity Fashions' Interest and Fashion'; and Group 4, a single text, was named 'Gucci', the most popular luxury fashion of the Z-generation.
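
The idea of building a keyword network from co-occurrence and ranking nodes by degree centrality can be sketched in plain Python. This is an illustration of the general technique, not a reproduction of the Textom or UCINET computations; the function names, the `min_count` threshold, and the sample posts are assumptions.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(posts):
    """Count how many posts contain each unordered keyword pair."""
    pairs = Counter()
    for keywords in posts:
        for a, b in combinations(sorted(set(keywords)), 2):
            pairs[(a, b)] += 1
    return pairs


def degree_centrality(pairs, min_count=1):
    """Normalized degree: neighbors / (n - 1), keeping edges with count >= min_count."""
    neighbors = {}
    for (a, b), count in pairs.items():
        neighbors.setdefault(a, set())
        neighbors.setdefault(b, set())
        if count >= min_count:
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


# Hypothetical keyword lists per post (not data from the study).
posts = [
    ["z generation", "fashion", "brand"],
    ["z generation", "millennials"],
    ["z generation", "fashion", "trend"],
    ["fashion", "brand"],
]
pairs = cooccurrence(posts)
centrality = degree_centrality(pairs)
print(sorted(centrality.items(), key=lambda kv: -kv[1]))
```

Raising `min_count` plays the same role as the co-occurrence cut-off mentioned in the abstract: weak links are dropped before centrality is computed.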

Trend Analysis of Fraudulent Claims by Long Term Care Institutions for the Elderly using Text Mining and BIGKinds (텍스트 마이닝과 빅카인즈를 활용한 노인장기요양기관 부당청구 동향 분석)

  • Youn, Ki-Hyok
    • Journal of Internet of Things and Convergence
    • /
    • v.8 no.2
    • /
    • pp.13-24
    • /
    • 2022
  • To explore the context of fraudulent claims by long-term care institutions for the elderly, which are increasing every year in Korea, and measures for preventing them, this study conducted a text mining analysis of media reports. The articles were collected from the news big data analysis system 'BIG KINDS' for about 15 years, from July 2008, when the Long-Term Care Insurance for the Elderly took effect, to February 28, 2022. For this period, a total of 2,627 articles were collected using keywords such as 'elderly care+fraudulent claims' and 'long-term care+fraudulent claims', of which 946 were selected after excluding duplicates. The text mining results were as follows. First, the top 10 keywords by frequency over the entire period (July 1, 2008 to February 28, 2022) were, in order, long-term care institution for the elderly, fraudulent claims, National Health Insurance Service, Long-Term Care Insurance for the Elderly, long-term care benefits(expenses), elderly care facilities, the Ministry of Health & Welfare, the elderly, report, and reward(payment). Second, the N-gram analysis ranked the pairs in the order of long-term care benefits(expenses) and fraudulent claims, fraudulent claims and long-term care institution for the elderly, falsehood and fraudulent claims, report and reward(payment), and long-term care institution for the elderly and report. Third, the TF-IDF results were similar to those of the frequency analysis, while the rankings of report, reward(payment), and increase moved up. Based on these results, this study presents future directions for preventing fraudulent claims by long-term care institutions for the elderly.
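
The N-gram step, which counts adjacent keyword sequences rather than single keywords, can be sketched as follows. The example assumes each article has already been tokenized into a list of keywords; the function name `ngram_counts` and the token lists are hypothetical and do not come from the BIG KINDS data.

```python
from collections import Counter


def ngram_counts(token_lists, n=2):
    """Count n-grams (tuples of n adjacent tokens) within each token list."""
    counts = Counter()
    for tokens in token_lists:
        # zip over n shifted views of the list yields each run of n adjacent tokens
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


# Hypothetical tokenized articles (for illustration only).
articles = [
    ["long-term", "care", "benefits", "fraudulent", "claims"],
    ["fraudulent", "claims", "report", "reward"],
    ["report", "reward", "fraudulent", "claims"],
]
bigrams = ngram_counts(articles, n=2)
print(bigrams.most_common(2))
```

N-grams are counted per article, so a sequence never spans the boundary between two articles.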

A Study on the Changes in Consumer Perceptions of the Relationship between Ethical Consumption and Consumption Value: Focusing on Analyzing Ethical Consumption and Consumption Value Keyword Changes Using Big Data (윤리적 소비와 소비가치의 관계에 대한 소비자 인식 변화: 소셜 빅데이터를 활용한 윤리적 소비와 소비가치의 키워드 변화 분석을 중심으로)

  • Shin, Eunjung;Koh, Ae-Ran
    • Human Ecology Research
    • /
    • v.59 no.2
    • /
    • pp.245-259
    • /
    • 2021
  • The purpose of this study was to analyze big data to identify the sub-dimensions of ethical consumption, as well as the consumption value associated with ethical consumption that changes over time. For this study, data were collected from Naver and Daum using the keyword 'ethical consumption' and frequency and matrix data were extracted through Textom, for the period January 1, 2016, to December 31, 2018. In addition, a two-way mode network analysis was conducted using the UCINET 6.0 program and visualized using the NetDraw function. The results of text mining show increasing keyword frequency year-on-year, indicating that interest in ethical consumption has grown. The sub-dimensions derived for 2014 and 2015 are fair trade, ethical consumption, eco-friendly products, and cooperatives and for 2016 are fair trade, ethical consumption, eco-friendly products and animal welfare. The results of deriving consumption value keywords were classified as emotional value, social value, functional value and conditional value. The influence of functional value was found to be growing over time. Through network analysis, the relationship between the sub-dimensions of ethical consumption and consumption values derived each year from 2014 to 2018 showed a significantly strong correlation between eco-friendly product consumption and emotional value, social value, functional value and conditional value.

A Case Study on Text Analysis Using Meal Kit Product Review Data (밀키트 제품 리뷰 데이터를 이용한 텍스트 분석 사례 연구)

  • Choi, Hyeseon;Yeon, Kyupil
    • The Journal of the Korea Contents Association
    • /
    • v.22 no.5
    • /
    • pp.1-15
    • /
    • 2022
  • In this study, text analysis was performed on meal kit product review data to identify factors affecting the evaluation of meal kit products. The data were collected by scraping 334,498 reviews of meal kit products from the Naver Shopping site. After preprocessing the text data, word clouds and sentiment analyses based on word frequency and normalized TF-IDF were produced. A logistic regression model was applied to predict the polarity of reviews of meal kit products. From the logistic regression models derived for each product category, the main factors that caused positive and negative emotions were identified. The results show that text analysis can be a useful tool that provides a basis for maximizing positive factors and removing negative risk factors for a specific category, menu, and ingredient when developing a meal kit product.
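
How a logistic regression model can surface words associated with positive or negative reviews is sketched below. This is a simplified stand-in, not the study's model: it uses binary word-presence features instead of normalized TF-IDF, plain batch gradient descent instead of a statistics library, and invented toy reviews. The function name and hyperparameters are assumptions.

```python
import math


def train_logistic(samples, labels, epochs=500, lr=0.5):
    """Fit per-word weights on binary bag-of-words features by gradient descent."""
    vocab = sorted({w for s in samples for w in s.split()})
    index = {w: i for i, w in enumerate(vocab)}
    rows = []
    for s in samples:
        x = [0.0] * len(vocab)
        for w in s.split():
            x[index[w]] = 1.0  # word present in this review
        rows.append(x)

    weights = [0.0] * len(vocab)
    bias = 0.0
    m = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(vocab)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted P(positive)
            err = p - y
            grad_b += err
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
        bias -= lr * grad_b / m
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
    return dict(zip(vocab, weights)), bias


# Toy reviews with polarity labels (1 = positive, 0 = negative); not real data.
reviews = ["fresh tasty easy", "tasty quick", "fresh quick easy",
           "bland late", "late soggy", "bland soggy"]
polarity = [1, 1, 1, 0, 0, 0]
word_weights, intercept = train_logistic(reviews, polarity)
print(sorted(word_weights.items(), key=lambda kv: kv[1]))
```

Words with large positive weights push a review toward the positive class and words with large negative weights push it toward the negative class, which is the sense in which such a model identifies positive and negative factors.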