• Title/Summary/Keyword: word frequency analysis

Search Result 420, Processing Time 0.024 seconds

Analysis of Lexical Effect on Spoken Word Recognition Test (한국어 단음절 낱말 인식에 미치는 어휘적 특성의 영향)

  • Yoon, Mi-Sun;Yi, Bong-Won
    • MALSORI
    • /
    • no.54
    • /
    • pp.15-26
    • /
    • 2005
  • The aim of this paper was to analyze the lexical effects on spoken word recognition of Korean monosyllabic word. The lexical factors chosen in this paper was frequency, density and lexical familiarity of words. Result of the analysis was as follows; frequency was the significant factor to predict spoken word recognition score of monosyllabic word. The other factors were not significant. This result suggest that word frequency should be considered in speech perception test.

  • PDF

Analysis of Lexical Effect on Spoken Word Recognition Test (낱말 인식 검사에 대한 어휘적 특성의 영향 분석)

  • Yoon, Mi-Sun;Yi, Bong-Won
    • Proceedings of the KSPS conference
    • /
    • 2005.04a
    • /
    • pp.77-80
    • /
    • 2005
  • The aim of this paper was to analyze the lexical effects on spoken word recognition of Korean monosyllabic word. The lexical factors chosen in this paper was frequency, density and lexical familiarity of words. Result of the analysis was as follows; frequency was the significant factor to predict spoken word recognition score of monosyllabic word. The other factors were not significant. This result suggest that word frequency should be considered in speech perception test.

  • PDF

Comparison between Word Embedding Techniques in Traditional Korean Medicine for Data Analysis: Implementation of a Natural Language Processing Method (한의학 고문헌 데이터 분석을 위한 단어 임베딩 기법 비교: 자연어처리 방법을 적용하여)

  • Oh, Junho
    • Journal of Korean Medical classics
    • /
    • v.32 no.1
    • /
    • pp.61-74
    • /
    • 2019
  • Objectives : The purpose of this study is to help select an appropriate word embedding method when analyzing East Asian traditional medicine texts as data. Methods : Based on prescription data that imply traditional methods in traditional East Asian medicine, we have examined 4 count-based word embedding and 2 prediction-based word embedding methods. In order to intuitively compare these word embedding methods, we proposed a "prescription generating game" and compared its results with those from the application of the 6 methods. Results : When the adjacent vectors are extracted, the count-based word embedding method derives the main herbs that are frequently used in conjunction with each other. On the other hand, in the prediction-based word embedding method, the synonyms of the herbs were derived. Conclusions : Counting based word embedding methods seems to be more effective than prediction-based word embedding methods in analyzing the use of domesticated herbs. Among count-based word embedding methods, the TF-vector method tends to exaggerate the frequency effect, and hence the TF-IDF vector or co-word vector may be a more reasonable choice. Also, the t-score vector may be recommended in search for unusual information that could not be found in frequency. On the other hand, prediction-based embedding seems to be effective when deriving the bases of similar meanings in context.

Latent Semantic Analysis Approach for Document Summarization Based on Word Embeddings

  • Al-Sabahi, Kamal;Zuping, Zhang;Kang, Yang
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.13 no.1
    • /
    • pp.254-276
    • /
    • 2019
  • Since the amount of information on the internet is growing rapidly, it is not easy for a user to find relevant information for his/her query. To tackle this issue, the researchers are paying much attention to Document Summarization. The key point in any successful document summarizer is a good document representation. The traditional approaches based on word overlapping mostly fail to produce that kind of representation. Word embedding has shown good performance allowing words to match on a semantic level. Naively concatenating word embeddings makes common words dominant which in turn diminish the representation quality. In this paper, we employ word embeddings to improve the weighting schemes for calculating the Latent Semantic Analysis input matrix. Two embedding-based weighting schemes are proposed and then combined to calculate the values of this matrix. They are modified versions of the augment weight and the entropy frequency that combine the strength of traditional weighting schemes and word embedding. The proposed approach is evaluated on three English datasets, DUC 2002, DUC 2004 and Multilingual 2015 Single-document Summarization. Experimental results on the three datasets show that the proposed model achieved competitive performance compared to the state-of-the-art leading to a conclusion that it provides a better document representation and a better document summary as a result.

Phonological phrase boundary and word frequency that influence the phonological word recognition (음운구 경계와 단어빈도가 한국어 음운단어 재인에 미치는 영향)

  • Kim, Jeahong;Shin, Hasun;Kim, Yeseul;Yun, Gwangyeol;Kim, Daseul;Shin, Jiyoung;Nam, Kichun
    • Phonetics and Speech Sciences
    • /
    • v.11 no.2
    • /
    • pp.45-56
    • /
    • 2019
  • This study investigated the interaction between phonological phrase boundary and word frequency variable in Korean speech processing. A word monitoring task was performed to examine the interference caused by the frequency effect of target word depending on whether a phonological phrase is formed within the target word. Frequency of target word (high vs low) and phonological phrase boundary (within target word vs between target words) were applied as between and within subject condition respectively. Our results showed the significant main effect of the phonological phrase boundary and the significant interaction. In the post-hoc analysis, the high-frequency target words were detected significantly faster than the low-frequency target words only in the within phonological phrase boundary condition. Frequency effect in the between phonological phrase boundary condition did not appear. The results indicated that the phonological phrase boundary and word frequency variable played an important role in Korean speech processing. In particular, we discussed the possibility of processing the word frequency at the very early sensory information processing stage based on the interaction of two experimental factors.

A study on Korean language processing using TF-IDF (TF-IDF를 활용한 한글 자연어 처리 연구)

  • Lee, Jong-Hwa;Lee, MoonBong;Kim, Jong-Weon
    • The Journal of Information Systems
    • /
    • v.28 no.3
    • /
    • pp.105-121
    • /
    • 2019
  • Purpose One of the reasons for the expansion of information systems in the enterprise is the increased efficiency of data analysis. In particular, the rapidly increasing data types which are complex and unstructured such as video, voice, images, and conversations in and out of social networks. The purpose of this study is the customer needs analysis from customer voices, ie, text data, in the web environment.. Design/methodology/approach As previous study results, the word frequency of the sentence is extracted as a word that interprets the sentence has better affects than frequency analysis. In this study, we applied the TF-IDF method, which extracts important keywords in real sentences, not the TF method, which is a word extraction technique that expresses sentences with simple frequency only, in Korean language research. We visualized the two techniques by cluster analysis and describe the difference. Findings TF technique and TF-IDF technique are applied for Korean natural language processing, the research showed the value from frequency analysis technique to semantic analysis and it is expected to change the technique by Korean language processing researcher.

Research Trend Analysis of the Retail Industry: Focusing on the Department Store (유통업태 연구동향 분석: 백화점을 중심으로)

  • Hoe-Chang YANG
    • The Journal of Economics, Marketing and Management
    • /
    • v.11 no.5
    • /
    • pp.45-55
    • /
    • 2023
  • Purpose: As one of the continuous studies on the offline distribution industry, the purpose of this study is to find ways for offline stores to respond to the growth of online shopping by identifying research trends on department stores. Research design, data and methodology: To this end, this study conducted word frequency analysis, word co-occurrence frequency analysis, BERTopic, LDA, and dynamic topic modeling using Python 3.7 on a total of 551 English abstracts searched with the keyword 'department store' in scienceON as of October 10, 2022. Results: The results of word frequency analysis and co-occurrence frequency analysis revealed that research related to department stores frequently focuses on factors such as customers, consumers, products, satisfaction, services, and quality. BERTopic and LDA analyses identified five topics, including 'store image,' with 'shopping information' showing relatively high interest, while 'sales systems' were observed to have relatively lower interest. Conclusions: Based on the results of this study, it was concluded that research related to department stores has so far been conducted in a limited scope, and it is insufficient to provide clues for department stores to secure competitiveness against online platforms. Therefore, it is suggested that additional research be conducted on topics such as the true role of department stores in the retail industry, consumer reinterpretation, customer value and lifetime value, department stores as future retail spaces, ethical management, and transparent ESG management.

An Analysis of the Vowel Formants of the Young Males in the Buckeye Corpus (벅아이 코퍼스에서의 젊은 성인 남성의 모음 포먼트 분석)

  • Yoon, Kyu-Chul;Noh, Hye-Uk
    • Phonetics and Speech Sciences
    • /
    • v.4 no.2
    • /
    • pp.41-49
    • /
    • 2012
  • The purpose of this paper is to extract the vowel formants of the ten young male speakers from the Buckeye Corpus of Conversational Speech [1] and to analyze them in comparison to earlier works in terms of various phonetic factors that are expected to affect the realization of the formant distribution. The first two formant frequency values were automatically extracted with a Praat script along with such factors as the place of articulation, the content versus function word information, syllabic stress information, the location in a word, location in utterance, speech rate of three consecutive words, and the word frequency in the corpus. The results indicated that the formant patterns from the corpus were very different from those of earlier works although the overall pattern was similar and that the factors were strongly responsible for the realization of the two formants. The purpose of this paper is to extract the vowel formants of the ten young male speakers from the Buckeye Corpus of Conversational Speech [1] and to analyze them in comparison to earlier works in terms of various phonetic factors that are expected to affect the realization of the formant distribution. The first two formant frequency values were automatically extracted with a Praat script along with such factors as the place of articulation, the content versus function word information, the syllabic stress information, the location in a word, the location in an utterance, the speech rate of the three consecutive words, and the word frequency in the corpus. The result indicated that the formant patterns from the corpus were very different from those of earlier works although the overall pattern was similar and that the factors were strongly responsible for the realization of the two formants.

Exploring Depression Research Trends Using BERTopic and LDA

  • Woo-Ryeong, YANG;Hoe-Chang, YANG
    • The Korean Journal of Food & Health Convergence
    • /
    • v.9 no.1
    • /
    • pp.19-28
    • /
    • 2023
  • The purpose of this study is to explore which areas have been more interested in depression research in Korea through analysis of academic papers related to depression, and then to provide insights that can solve future depression problems. 1,032 papers searched with the keyword "depression" in scienceON were analyzed using Python 3.7 for word frequency analysis, word co-occurrence analysis, BERTopic, LDA, and OLS regression analysis. The results of word frequency and co-occurrence frequency analysis showed that related words were composed around words such as patient, disorder and symptom. As a result of topic modeling, a total of 13 topics including 'childhood depression' and 'eating anxiety' were derived. And it has been identified as a topic of interest that 'suicidal thoughts', 'treatment', 'occupational health', and 'health treatment program' were statistically significant topics, while 'child depression' and 'female treatment' were relatively less. As a result of the analysis of research trends, future research will not only study physiological and psychological factors but also social and environmental causes, as well as it was suggested that various collaborative studies of experts in academia were needed such as convergence and complex perspectives for depression relief and treatment.

An Analysis of the Vowel Formants of the Young Females in the Buckeye Corpus (벅아이 코퍼스에서의 젊은 성인 여성의 모음 포먼트 분석)

  • Yoon, Kyuchul
    • Phonetics and Speech Sciences
    • /
    • v.4 no.4
    • /
    • pp.45-52
    • /
    • 2012
  • The purpose of this paper is to measure the first two vowel formants of the ten young female speakers from the Buckeye Corpus of Conversational Speech [1] automatically and then to analyze various potential factors that may affect the formant distribution of the eight peripheral vowels of English. The factors that were analyzed included the place of articulation, the content versus function word information, the syllabic stress information, the location in a word, the location in an utterance, the speech rate of the three consecutive words, and the word frequency in the corpus. The results indicate that the overall formant patterns of the female speakers were similar to those of earlier works. The effects of the factors on the realization of the two formants were also similar to those from the male speakers with minor differences.