• Title/Summary/Keyword: Text frequency analysis

Evaluation of Frequency Warping Based Features and Spectro-Temporal Features for Speaker Recognition (화자인식을 위한 주파수 워핑 기반 특징 및 주파수-시간 특징 평가)

  • Choi, Young Ho;Ban, Sung Min;Kim, Kyung-Wha;Kim, Hyung Soon
    • Phonetics and Speech Sciences / v.7 no.1 / pp.3-10 / 2015
  • In this paper, different frequency scales in cepstral feature extraction are evaluated for text-independent speaker recognition. To this end, mel-frequency cepstral coefficients (MFCCs), linear frequency cepstral coefficients (LFCCs), and bilinear warped frequency cepstral coefficients (BWFCCs) are applied to a speaker recognition experiment. In addition, the spectro-temporal features extracted by the cepstral-time matrix (CTM) are examined as an alternative to the delta and delta-delta features. Experiments on the NIST speaker recognition evaluation (SRE) 2004 task are carried out using the Gaussian mixture model-universal background model (GMM-UBM) method and the joint factor analysis (JFA) method, both based on the ALIZE 3.0 toolkit. Experimental results with both methods show that BWFCC with an appropriate warping factor yields better performance than MFCC and LFCC. It is also shown that the feature set including the spectro-temporal information based on the CTM outperforms the conventional feature set including the delta and delta-delta features.
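
The abstract does not state the warping function behind BWFCC. As a rough illustration only, the sketch below assumes the commonly used first-order all-pass (bilinear) mapping of normalized angular frequency; the function name `bilinear_warp` and the parameter `alpha` are hypothetical and not taken from the paper.

```python
import math

def bilinear_warp(omega: float, alpha: float) -> float:
    """Map a normalized angular frequency omega in [0, pi] to a warped frequency.

    Assumed form (first-order all-pass phase response), valid for |alpha| < 1:
        omega_w = omega + 2 * atan(alpha * sin(omega) / (1 - alpha * cos(omega)))
    alpha = 0 leaves the axis linear; alpha > 0 stretches the low frequencies.
    """
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))

# Compare a linear axis (alpha = 0) with a warped one (alpha = 0.4, arbitrary value).
for k in range(5):
    w = k * math.pi / 4
    print(f"{w:.3f} -> {bilinear_warp(w, 0.4):.3f}")
```

Under this assumption, varying `alpha` moves the scale continuously between a linear axis and a mel-like one, which is one way to read "appropriate warping factor" in the abstract.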

Creation and clustering of proximity data for text data analysis (텍스트 데이터 분석을 위한 근접성 데이터의 생성과 군집화)

  • Jung, Min-Ji;Shin, Sang Min;Choi, Yong-Seok
    • The Korean Journal of Applied Statistics / v.32 no.3 / pp.451-462 / 2019
  • A document-term frequency matrix is a type of data used in text mining. This matrix is often based on various documents provided by the objects to be analyzed. When analyzing objects using this matrix, researchers generally select as keywords only the terms that are common to the documents belonging to one object, and then use these keywords to analyze the object. However, this method loses the unique information of individual documents and also removes potential keywords that occur frequently in a specific document. In this study, we define data that can overcome this problem as proximity data. We introduce twelve methods that generate proximity data and cluster the objects using two methods, multidimensional scaling and k-means cluster analysis. Finally, we choose the method best suited to clustering the objects.
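
For readers unfamiliar with the data structure, here is a minimal sketch of building a document-term frequency matrix and one generic proximity measure from it. The abstract does not list its twelve proximity methods; cosine similarity is used here purely as an assumed stand-in, and the toy documents and helper names are hypothetical.

```python
import math
from collections import Counter

def document_term_matrix(docs):
    """Return (vocabulary, matrix), where matrix[i][j] counts term j in document i."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    matrix = []
    for toks in tokenized:
        counts = Counter(toks)
        matrix.append([counts[term] for term in vocab])
    return vocab, matrix

def cosine_similarity(u, v):
    """Cosine similarity of two count vectors (0.0 if either vector is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy documents, invented for illustration.
docs = ["text mining text", "mining data", "data data clustering"]
vocab, dtm = document_term_matrix(docs)
proximity = [[cosine_similarity(r, s) for s in dtm] for r in dtm]
```

A matrix such as `proximity` is the kind of input that multidimensional scaling or k-means clustering could then operate on.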

A Study of Secondary Mathematics Materials at a Gifted Education Center in Science Attached to a University Using Network Text Analysis (네트워크 텍스트 분석을 활용한 대학부설 과학영재교육원의 중등수학 강의교재 분석)

  • Kim, Sungyeun;Lee, Seonyoung;Shin, Jongho;Choi, Won
    • Communications of Mathematical Education / v.29 no.3 / pp.465-489 / 2015
  • The purpose of this study is to suggest implications for the development and revision of future teaching materials for mathematically gifted students by applying network text analysis to secondary mathematics materials. The subjects of the analysis were the learning goals of 110 teaching materials used at a gifted education center in science attached to a university from 2002 to 2014. Key words were selected by analyzing the frequency of the texts that appeared in the learning goals. A co-occurrence matrix of the key words was established, and basic network information, centrality, centralization, components, and k-cores were derived. For the analysis, the KrKwic, KrTitle, and NetMiner4.0 programs were used, respectively. The results of this study were as follows. First, across the whole network from 2002 to 2014, the pivot of the network was formed by core hubs including 'diversity', 'understanding', 'concept', 'method', 'application', 'connection', 'problem solving', 'basic', 'real life', and 'thinking ability'. In addition, the centralization analysis showed that knowledge aspects were well reflected in the teaching materials. Second, network text analysis was conducted for each of the three periods of the Master Plan for the promotion of gifted education. As a result, a network was built around 'understanding', and there were strong ties among 'question', 'answer', and 'problem solving' regardless of the period. In contrast, the centrality analysis showed that 'communication', 'discovery', and 'proof' appeared only in the first, second, and third periods of the Master Plan, respectively. Therefore, the results of this study suggest that affective aspects and activities involving high-level cognitive processes should be included, and that mannerism and ahistoricism in learning goals should be avoided, when developing and revising teaching materials.
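
The study used KrKwic, KrTitle, and NetMiner4.0; the sketch below is not a reproduction of those tools. It only illustrates, in plain Python and on invented keyword sets, what a keyword co-occurrence count and a normalized degree centrality look like. All function names and data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(keyword_sets):
    """Count how often each unordered keyword pair appears in the same learning goal."""
    pairs = Counter()
    for keywords in keyword_sets:
        for a, b in combinations(sorted(set(keywords)), 2):
            pairs[(a, b)] += 1
    return pairs

def degree_centrality(pairs):
    """Share of the other keywords that each keyword co-occurs with at least once."""
    neighbors = {}
    for a, b in pairs:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    if n < 2:
        return {k: 0.0 for k in neighbors}
    return {k: len(v) / (n - 1) for k, v in neighbors.items()}

# Invented learning-goal keyword sets, for illustration only.
goals = [
    ["understanding", "concept"],
    ["understanding", "problem solving"],
    ["concept", "understanding", "application"],
]
pairs = cooccurrence_counts(goals)
centrality = degree_centrality(pairs)
```

In this toy network 'understanding' co-occurs with every other keyword, so it would act as the hub, analogous to the core hubs reported in the abstract.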

Structural Analysis of Cooking Recipe Texts - Based on Kimchi Jjigae Recipe - (요리레시피의 텍스트 구조해석 - 김치찌개 레시피 중심으로 -)

  • Choi, Jiyu;Han, Gyusang
    • The Korean Journal of Community Living Science / v.28 no.2 / pp.191-201 / 2017
  • This study compared and analyzed the structures of cooking recipes in order to identify the overall cooking method and develop an efficient method for analyzing cooking recipes. We present procedural texts as flow graphs, which can be referred to as recipe trees, to represent cooking recipes and the database. A total of 110 kimchi jjigae recipes were identified and classified by 'portion', 'kinds of ingredients', and 'number of cooking deployment'. Recipes for two persons were the most common (43.6%), and recipes with 7-13 kinds of ingredients accounted for 50% of kimchi jjigae recipes. Kimchi presented the highest frequency at 78 cases, and pork also showed a high frequency at 30 cases. For cooking deployment, step 6 was the most frequent, followed by step 5 (17.3%), step 7 (17.3%), step 4 (11.8%), and step 3 (9.1%). In the analysis of the frequency of relationships between ingredients and actions in recipe expressions, Food (F) and Action by the chef (Ac) showed the highest rates in the cooking process, at 11.29 and 12.30, respectively. Among the dependency relation expressions in recipes, d-obj (direct object) was the most frequent at 13.56. The proposed method gives users more efficient and easier access to recipes suited to their cooking skills.

Keyword Analysis of Research on Consumption of Children and Adolescents Using Text Mining (텍스트마이닝을 활용한 아동, 청소년 대상 소비관련 연구 키워드 분석)

  • Jin, Hyun-Jeong
    • Journal of Korean Home Economics Education Association / v.33 no.4 / pp.1-13 / 2021
  • The purpose of this study is to identify trends and potential themes in 20 years of research on the consumption of children and adolescents by analyzing keywords. The keywords of 869 studies on the consumption of children and adolescents published in journals listed in the Korean Citation Index were analyzed using text mining techniques. The most frequent keywords were, in order, youth, youth consumers, consumer education, conspicuous consumption, consumption behavior, and character. Analyzing keyword frequency by five-year period confirmed that the frequency of consumer education was significantly higher between 2006 and 2010. Research on ethical consumption has been active since 2011, and during the most recent 5-year period research has been conducted on various topics without a single prominent keyword. Based on TF-IDF, keywords related to the environment and the Internet were the main keywords between 2001 and 2005. From 2006 to 2010, the TF-IDF values of media use, advertisement education, and Internet items were high. From 2011 to 2015, fair trade, green growth, green consumption, North Korean defector youths, and social media appeared as important themes, and from 2016 to 2020, text mining, sustainable development education, maker education, and the 2015 revised curriculum did. Topic modeling derived eight topics: consumer education, mass media/peer culture, rational consumption, Hallyu/cultural industry, consumer competency, economic education, teaching and learning method, and eco-friendly/ethical consumption. Network analysis showed that conspicuous consumption and consumer education are important topics in consumption research on children and adolescents.
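
The contrast between raw frequency and TF-IDF in this abstract can be illustrated with a small sketch. It assumes the textbook weighting tf × log(N / df), treats each five-year period as one "document", and uses invented keyword lists; the abstract does not say which TF-IDF variant or data layout the study used.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs maps a period label to its keyword list; returns label -> {keyword: score}.

    Assumed weighting: (count / total keywords in period) * log(N / df),
    where df is the number of periods that contain the keyword.
    """
    n_docs = len(docs)
    df = Counter()
    for keywords in docs.values():
        df.update(set(keywords))
    scores = {}
    for label, keywords in docs.items():
        tf = Counter(keywords)
        total = len(keywords)
        scores[label] = {k: (c / total) * math.log(n_docs / df[k])
                         for k, c in tf.items()}
    return scores

# Invented keyword lists per period, for illustration only (not the study's data).
periods = {
    "2001-2005": ["environment", "internet", "consumer education"],
    "2006-2010": ["consumer education", "media use", "consumer education"],
    "2011-2015": ["fair trade", "consumer education", "green consumption"],
}
scores = tf_idf(periods)
```

With this toy input, 'consumer education' is the most frequent keyword yet scores zero because it occurs in every period, while period-specific keywords such as 'environment' receive positive weight. This is why a TF-IDF ranking can surface different themes than a raw frequency ranking.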

Comparative Study of User Reactions in OTT Service Platforms Using Text Mining (텍스트 마이닝을 활용한 OTT 서비스 플랫폼별 사용자 반응 비교 연구)

  • Soonchan Kwon;Jieun Kim;Beakcheol Jang
    • Journal of Internet Computing and Services / v.25 no.3 / pp.43-54 / 2024
  • This study employs text mining techniques to compare user responses across various Over-The-Top (OTT) service platforms. The primary objective of the research is to understand user satisfaction with OTT service platforms and contribute to the formulation of more effective review strategies. The key questions addressed in this study involve identifying prominent topics and keywords in user reviews of different OTT services and comprehending platform-specific user reactions. TF-IDF is utilized to extract significant words from positive and negative reviews, while BERTopic, an advanced topic modeling technique, is employed for a more nuanced and comprehensive analysis of intricate user reviews. The results from TF-IDF analysis reveal that positive app reviews exhibit a high frequency of content-related words, whereas negative reviews display a high frequency of words associated with potential issues during app usage. Through the utilization of BERTopic, we were able to extract keywords related to content diversity, app performance components, payment, and compatibility, by associating them with content attributes. This enabled us to verify that the distinguishing attributes of the platforms vary among themselves. The findings of this study offer significant insights into user behavior and preferences, which OTT service providers can leverage to improve user experience and satisfaction. We also anticipate that researchers exploring deep learning models will find our study results valuable for conducting analyses on user review text data.

Response Experiences with a Semi-Quantitative Food Frequency Questionnaire : A Qualitative Study using Cognitive Interview (반정량 식품섭취빈도조사의 응답에 관한 인지면접연구)

  • Lee, Gyeong-Sil;Yi, Myung-Sun;Joung, Hyo-Jee;Paik, Hee-Young
    • Journal of Nutrition and Health / v.40 no.6 / pp.566-575 / 2007
  • The purpose of this research was to understand how individuals reflect on the frequency and quantity of foods that they consume. The participants, 5 males and 15 females aged 30 years or older, were first interviewed on the frequency of their food consumption. Then, based on these data, they were given a cognitive interview using the method of verbal probing. The individual cognitive interviews were recorded with consent and were conducted after approval by the Seoul National University Institutional Review Board. The recorded material was transcribed into text and evaluated using thematic analysis. Analyzing the stages of reflection revealed the major barriers that made the questionnaire difficult to answer: 1) difficulty remembering events over the course of a full year due to diversification in the types of food that people consume; 2) difficulty calculating the average for seasonal foods; 3) difficulty estimating the amount of consumption from the photos presented; 4) difficulty estimating the amount of consumption from the quantity presented; 5) difficulty processing foods that people think are healthy and foods that are unhealthy simultaneously; 6) difficulty having to consider foods in which the target food is an ingredient; 7) difficulty arising from having to increase frequency when the amount consumed is higher than the quantity presented; and 8) difficulty having to combine the frequency and quantity of each food item when numerous foods are clustered into one category. These findings show that the less participants were involved in cooking, the more diverse their eating habits were, and the more they tried to adhere to the rules for filling out the questionnaire, the more difficult it was for them to come up with an answer to the question being asked. It therefore seems necessary to construct a food frequency questionnaire that is attentive to these problems, which arise in the recall stages.

Topic Extraction and Classification Method Based on Comment Sets

  • Tan, Xiaodong
    • Journal of Information Processing Systems / v.16 no.2 / pp.329-342 / 2020
  • In recent years, emotional text classification has been one of the essential research topics in the field of natural language processing. It has been widely used in the sentiment analysis of reviews of commodities such as hotels and of other commentary corpora. This paper proposes an improved W-LDA (weighted latent Dirichlet allocation) topic model to address the shortcomings of traditional LDA topic models. In the Gibbs sampling of topic words and the calculation of the word distribution expectation in the W-LDA topic model, an average weighted value is adopted to prevent topic-related words from being submerged by high-frequency words and to improve the distinction between topics. The method further integrates a support vector machine classifier based on the extracted high-quality document-topic distributions and topic-word vectors. Finally, an efficient integrated method is constructed for the analysis and extraction of emotional words, topic distribution calculation, and sentiment classification. Tests on real teaching evaluation data and on the test set of a public comment set show that the proposed method has distinct advantages over two other typical algorithms in terms of topic differentiation, classification precision, and F1-measure.

Dynamic Text Categorizing Method using Text Mining and Association Rule

  • Kim, Young-Wook;Kim, Ki-Hyun;Lee, Hong-Chul
    • Journal of the Korea Society of Computer and Information / v.23 no.10 / pp.103-109 / 2018
  • In this paper, we propose a dynamic document classification method that breaks away from existing document classification methods, which rely on artificial, supplier-focused categorization rules, and instead changes its categorization rules according to users' needs or social trends. The core of this dynamic document classification method is that it creates classification criteria in real time by using topic modeling techniques without standardized category rules, which does not force users into unnecessary frames. In addition, it can search the details through relevance analysis, calculating relationships between words that are difficult to grasp from word frequency alone. Rather than for logical and systematic documents, the proposed method can be used more effectively for situation analysis and information retrieval on unstructured data that do not fit existing classification categories, such as VOC (Voice Of Customer), SNS, and customer reviews of Internet shopping malls, and it can react flexibly to users' needs. In addition, it involves no selection of classification rules by suppliers and requires no manual work when a misclassification occurs, which reduces unnecessary workload.

A Comparative Study of Dietary Related Zero-waste Patterns and Consumer Responses Before and After COVID-19 (코로나-19 이전과 이후 식생활 관련 제로웨이스트 운동 양상과 소비자 반응 비교)

  • Park, In-Hyoung;Park, You-min;Lee, Cheol;Sun, Jung-eun;Hu, Wendie;Chung, Jae-Eun
    • Human Ecology Research / v.60 no.1 / pp.21-38 / 2022
  • This study uses text mining to compare and contrast consumers' social media discourse on the dietary-related zero-waste movement before and after COVID-19. The results indicate that the amount of buzz on social networks about the zero-waste movement has been increasing since COVID-19. Additionally, the results of frequency analysis and topic modeling revealed that the subjects associated with the zero-waste movement became more diversified after COVID-19. Although the results of sentiment analysis and word cloud visualization confirmed that consumers' positive responses toward zero-waste have been increasing, they also revealed a need to educate and encourage those who are still not aware of the need for zero-waste. Finally, consumers mentioned only a small number of companies participating in the zero-waste movement on SNS, indicating that the level of active involvement by such companies is much lower than that of consumers. Theoretical and educational implications, as well as implications for government policy-making, are considered.