• Title/Abstract/Keywords: word frequency analysis

Search results: 423 (processing time: 0.027 s)

한국어 단음절 낱말 인식에 미치는 어휘적 특성의 영향 (Analysis of Lexical Effect on Spoken Word Recognition Test)

  • 윤미선;이봉원
    • 대한음성학회지:말소리 / No. 54 / pp.15-26 / 2005
  • The aim of this paper was to analyze lexical effects on the spoken word recognition of Korean monosyllabic words. The lexical factors chosen were frequency, density, and lexical familiarity. The analysis showed that frequency was a significant predictor of the spoken word recognition score for monosyllabic words; the other factors were not significant. This result suggests that word frequency should be considered in speech perception tests.

낱말 인식 검사에 대한 어휘적 특성의 영향 분석 (Analysis of Lexical Effect on Spoken Word Recognition Test)

  • 윤미선;이봉원
    • 대한음성학회:학술대회논문집 / 대한음성학회 2005년도 춘계 학술대회 발표논문집 / pp.77-80 / 2005
  • The aim of this paper was to analyze lexical effects on the spoken word recognition of Korean monosyllabic words. The lexical factors chosen were frequency, density, and lexical familiarity. The analysis showed that frequency was a significant predictor of the spoken word recognition score for monosyllabic words; the other factors were not significant. This result suggests that word frequency should be considered in speech perception tests.

한의학 고문헌 데이터 분석을 위한 단어 임베딩 기법 비교: 자연어처리 방법을 적용하여 (Comparison between Word Embedding Techniques in Traditional Korean Medicine for Data Analysis: Implementation of a Natural Language Processing Method)

  • 오준호
    • 대한한의학원전학회지 / Vol. 32, No. 1 / pp.61-74 / 2019
  • Objectives: The purpose of this study is to help select an appropriate word embedding method when analyzing East Asian traditional medicine texts as data. Methods: Based on prescription data that imply traditional methods in traditional East Asian medicine, we examined four count-based and two prediction-based word embedding methods. To compare these methods intuitively, we proposed a "prescription generating game" and compared its results with those from the six methods. Results: When adjacent vectors were extracted, the count-based methods derived the main herbs that are frequently used in conjunction with each other, whereas the prediction-based methods derived synonyms of the herbs. Conclusions: Count-based word embedding methods seem to be more effective than prediction-based methods in analyzing the use of domesticated herbs. Among count-based methods, the TF vector tends to exaggerate the frequency effect, so the TF-IDF vector or co-word vector may be a more reasonable choice. The t-score vector may be recommended when searching for unusual information that cannot be found through frequency. Prediction-based embedding, on the other hand, seems to be effective for deriving the bases of similar meanings in context.
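
To make the idea of a count-based "co-word vector" concrete, here is a minimal sketch, not the author's implementation. It assumes each prescription is simply a set of herb names; the toy data and the helper names `co_word_vectors`, `cosine`, and `nearest` are hypothetical. Each herb is represented by counts of the herbs it appears with, and "adjacent vectors" are found by cosine similarity.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Toy prescriptions (hypothetical data): each one is a set of herb names.
prescriptions = [
    {"ginseng", "licorice", "ginger"},
    {"ginseng", "licorice", "jujube"},
    {"ginseng", "licorice", "atractylodes"},
    {"ephedra", "cinnamon", "licorice"},
    {"ephedra", "cinnamon", "apricot_kernel"},
]

def co_word_vectors(docs):
    """Map each herb to a Counter of the herbs it co-occurs with."""
    vectors = {}
    for doc in docs:
        for a, b in combinations(sorted(doc), 2):
            vectors.setdefault(a, Counter())[b] += 1
            vectors.setdefault(b, Counter())[a] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def nearest(target, vectors, top_n=3):
    """Return the top_n herbs whose co-word vectors are closest to target's."""
    scores = [(other, cosine(vectors[target], vec))
              for other, vec in vectors.items() if other != target]
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))[:top_n]

vectors = co_word_vectors(prescriptions)
# The herb most often prescribed together with ginseng, with its count.
most_frequent_partner = vectors["ginseng"].most_common(1)[0]
# Herbs whose usage pattern is most similar to that of ephedra.
neighbours = nearest("ephedra", vectors)
```

Raw counts like these favor herbs that appear in many prescriptions, which is the frequency effect the abstract attributes to the TF vector; reweighting the counts is one way to dampen it.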

Latent Semantic Analysis Approach for Document Summarization Based on Word Embeddings

  • Al-Sabahi, Kamal;Zuping, Zhang;Kang, Yang
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 13, No. 1 / pp.254-276 / 2019
  • Since the amount of information on the internet is growing rapidly, it is not easy for users to find information relevant to their queries. To tackle this issue, researchers are paying much attention to document summarization. The key to any successful document summarizer is a good document representation. Traditional approaches based on word overlap mostly fail to produce that kind of representation. Word embedding has shown good performance by allowing words to match on a semantic level. However, naively concatenating word embeddings makes common words dominant, which in turn diminishes the representation quality. In this paper, we employ word embeddings to improve the weighting schemes for calculating the Latent Semantic Analysis input matrix. Two embedding-based weighting schemes are proposed and then combined to calculate the values of this matrix. They are modified versions of the augment weight and the entropy frequency that combine the strengths of traditional weighting schemes and word embedding. The proposed approach is evaluated on three English datasets: DUC 2002, DUC 2004, and Multilingual 2015 Single-document Summarization. Experimental results on the three datasets show that the proposed model achieves competitive performance compared to the state of the art, leading to the conclusion that it provides a better document representation and, as a result, a better document summary.
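
For orientation, the sketch below builds an LSA input matrix with the classical log-entropy weighting, one of the traditional schemes of the kind the abstract says it modifies. It is not the paper's embedding-based variant, which the abstract does not specify. The function name `log_entropy_matrix` and the toy documents are assumptions for illustration.

```python
from math import log

def log_entropy_matrix(docs):
    """Return (terms, matrix) where matrix[i][j] weights term i in document j.

    Local weight:  log(1 + tf_ij)
    Global weight: 1 + sum_j p_ij * log(p_ij) / log(n_docs), with p_ij = tf_ij / gf_i
    """
    n_docs = len(docs)
    terms = sorted({word for doc in docs for word in doc})
    matrix = []
    for term in terms:
        tf = [doc.count(term) for doc in docs]
        gf = sum(tf)  # total frequency of the term in the collection
        entropy = sum((f / gf) * log(f / gf) for f in tf if f > 0)
        global_w = 1.0 + entropy / log(n_docs) if n_docs > 1 else 1.0
        matrix.append([global_w * log(1 + f) for f in tf])
    return terms, matrix

# Toy, pre-tokenized documents (hypothetical data).
docs = [
    "the summary covers the storm".split(),
    "the storm damaged the coast".split(),
    "the council approved the budget".split(),
]
terms, matrix = log_entropy_matrix(docs)
weights = dict(zip(terms, matrix))
```

A word spread evenly over all documents, such as "the" here, gets a global weight near zero, while a word concentrated in one document keeps its full local weight. This is the common-word dominance problem that the weighting is meant to counter before the matrix is factorized.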

음운구 경계와 단어빈도가 한국어 음운단어 재인에 미치는 영향 (Phonological phrase boundary and word frequency that influence the phonological word recognition)

  • 김제홍;신하선;김예슬;윤광열;김다슬;신지영;남기춘
    • 말소리와 음성과학 / Vol. 11, No. 2 / pp.45-56 / 2019
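  • This study examined whether the phonological phrase boundary, a prosodic constituent, and word frequency, a lexical variable, interact during the processing of spoken Korean words. To this end, a word monitoring task was used to measure the interference that arises, depending on whether the target word straddles a phonological phrase boundary, when participants search for the target in sentences uttered as four phonological phrases. Target words were two-syllable high-frequency and low-frequency words, manipulated as a within-participant condition. The between-participant condition was whether the target spanned a phonological phrase boundary (target: 대표, phrasing: [이사회의] [반대] [표명이] [있었다]) or fell within a phonological phrase (target: 마차, phrasing: [세뱃돈은] [항상] [우리] [엄마 차지였다]). Of the two variables, the main effect of phonological phrase boundary was significant, and the interaction was also significant. Post hoc analysis showed that only in the within-boundary group were high-frequency targets detected significantly faster than low-frequency targets; no frequency effect appeared in the across-boundary group. Based on these results, the study discusses whether word frequency affects the early stage of phonological word recognition and the importance of the two variables in Korean speech processing.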

TF-IDF를 활용한 한글 자연어 처리 연구 (A study on Korean language processing using TF-IDF)

  • 이종화;이문봉;김종원
    • 한국정보시스템학회지:정보시스템연구 / Vol. 28, No. 3 / pp.105-121 / 2019
  • Purpose: One of the reasons for the expansion of information systems in the enterprise is the increased efficiency of data analysis, in particular for rapidly growing data types that are complex and unstructured, such as video, voice, images, and conversations in and out of social networks. The purpose of this study is to analyze customer needs from customer voices, i.e., text data, in the web environment. Design/methodology/approach: Previous studies report that extracting the words that best characterize a sentence is more effective than simple frequency analysis. In this study, we applied the TF-IDF method, which extracts important keywords from real sentences, to Korean language research, rather than the TF method, a word extraction technique that represents sentences by simple frequency only. We visualized the results of the two techniques by cluster analysis and describe the differences. Findings: The TF and TF-IDF techniques were applied to Korean natural language processing. The research showed the value of moving from frequency analysis to semantic analysis, and it is expected to change the techniques used by Korean language processing researchers.
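
The contrast between TF and TF-IDF can be sketched in a few lines. This is an illustrative example rather than the study's pipeline: it assumes the text is already tokenized (Korean text would normally need a morphological analyzer first), uses toy English reviews as stand-in data, and uses the plain `log(N / df)` form of IDF. The function name `tf_idf` is hypothetical.

```python
from collections import Counter
from math import log

# Toy, pre-tokenized customer reviews (hypothetical data).
reviews = [
    "delivery was fast and the product works".split(),
    "the product arrived broken and the refund was slow".split(),
    "the product is fine and the battery drains fast".split(),
]

def tf_idf(docs):
    """Return one {term: tf * idf} dict per document."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({t: (c / len(doc)) * log(n / df[t]) for t, c in tf.items()})
    return scores

# Plain TF picks the most frequent token in the second review.
tf_top = Counter(reviews[1]).most_common(1)[0][0]
# TF-IDF picks a token that is frequent here but rare elsewhere.
tfidf_scores = tf_idf(reviews)
tfidf_top = max(tfidf_scores[1], key=tfidf_scores[1].get)
```

In this toy data, raw frequency selects the function word "the", while TF-IDF assigns it a score of zero because it occurs in every review and promotes words specific to the complaint instead.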

유통업태 연구동향 분석: 백화점을 중심으로 (Research Trend Analysis of the Retail Industry: Focusing on the Department Store)

  • Hoe-Chang YANG
    • 융합경영연구 / Vol. 11, No. 5 / pp.45-55 / 2023
  • Purpose: As one of the continuous studies on the offline distribution industry, the purpose of this study is to find ways for offline stores to respond to the growth of online shopping by identifying research trends on department stores. Research design, data and methodology: To this end, this study conducted word frequency analysis, word co-occurrence frequency analysis, BERTopic, LDA, and dynamic topic modeling using Python 3.7 on a total of 551 English abstracts searched with the keyword 'department store' in scienceON as of October 10, 2022. Results: The results of word frequency analysis and co-occurrence frequency analysis revealed that research related to department stores frequently focuses on factors such as customers, consumers, products, satisfaction, services, and quality. BERTopic and LDA analyses identified five topics, including 'store image,' with 'shopping information' showing relatively high interest, while 'sales systems' were observed to have relatively lower interest. Conclusions: Based on the results of this study, it was concluded that research related to department stores has so far been conducted in a limited scope, and it is insufficient to provide clues for department stores to secure competitiveness against online platforms. Therefore, it is suggested that additional research be conducted on topics such as the true role of department stores in the retail industry, consumer reinterpretation, customer value and lifetime value, department stores as future retail spaces, ethical management, and transparent ESG management.
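
Word frequency analysis and word co-occurrence frequency analysis of abstracts can be sketched with the standard library alone. This is a minimal illustration, not the study's code: the three abstracts, the stopword list, and the `tokenize` helper are all hypothetical, and co-occurrence is assumed to mean "both words appear in the same abstract".

```python
from collections import Counter
from itertools import combinations

# Tiny illustrative stopword list (an assumption, not the study's list).
STOPWORDS = {"the", "of", "in", "and", "a", "on", "to"}

# Toy abstracts (hypothetical data).
abstracts = [
    "customer satisfaction and service quality in the department store",
    "service quality and customer loyalty in the department store",
    "product quality and customer satisfaction",
]

def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [word for word in text.lower().split() if word not in STOPWORDS]

word_freq = Counter()
pair_freq = Counter()
for abstract in abstracts:
    tokens = tokenize(abstract)
    word_freq.update(tokens)
    # Count each unordered word pair once per abstract.
    pair_freq.update(combinations(sorted(set(tokens)), 2))

top_words = word_freq.most_common(3)
top_pairs = pair_freq.most_common(3)
```

Sorting the tokens before pairing keeps each pair in a single canonical order, so `("customer", "quality")` and `("quality", "customer")` are not counted separately.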

벅아이 코퍼스에서의 젊은 성인 남성의 모음 포먼트 분석 (An Analysis of the Vowel Formants of the Young Males in the Buckeye Corpus)

  • 윤규철;노혜욱
    • 말소리와 음성과학 / Vol. 4, No. 2 / pp.41-49 / 2012
  • The purpose of this paper is to extract the vowel formants of the ten young male speakers from the Buckeye Corpus of Conversational Speech [1] and to analyze them in comparison to earlier works in terms of various phonetic factors that are expected to affect the realization of the formant distribution. The first two formant frequency values were automatically extracted with a Praat script along with such factors as the place of articulation, the content versus function word information, the syllabic stress information, the location in a word, the location in an utterance, the speech rate of the three consecutive words, and the word frequency in the corpus. The results indicated that the formant patterns from the corpus were very different from those of earlier works although the overall pattern was similar, and that the factors were strongly responsible for the realization of the two formants.

Exploring Depression Research Trends Using BERTopic and LDA

  • Woo-Ryeong, YANG;Hoe-Chang, YANG
    • 식품보건융합연구 / Vol. 9, No. 1 / pp.19-28 / 2023
  • The purpose of this study is to explore which areas have attracted more interest in depression research in Korea through an analysis of academic papers related to depression, and then to provide insights that can help solve future depression problems. A total of 1,032 papers searched with the keyword "depression" in scienceON were analyzed using Python 3.7 for word frequency analysis, word co-occurrence analysis, BERTopic, LDA, and OLS regression analysis. The results of word frequency and co-occurrence frequency analysis showed that related words clustered around words such as patient, disorder, and symptom. Topic modeling yielded a total of 13 topics, including 'childhood depression' and 'eating anxiety'. 'Suicidal thoughts', 'treatment', 'occupational health', and 'health treatment program' were identified as statistically significant topics of interest, while 'child depression' and 'female treatment' attracted relatively less interest. Based on the analysis of research trends, it was suggested that future research should study not only physiological and psychological factors but also social and environmental causes, and that collaborative studies among academic experts, including convergent and interdisciplinary perspectives, are needed for depression relief and treatment.
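
The abstract does not say how OLS regression was applied. One common use in research-trend studies, assumed here, is to regress a topic's yearly paper count on the publication year and read the slope as a rising or declining trend. The sketch below fits that line with the closed-form least-squares formulas; the topic names, the counts, and the `ols_fit` helper are hypothetical, and no significance test is included.

```python
def ols_fit(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical yearly paper counts per topic.
years = [2018, 2019, 2020, 2021, 2022]
topic_counts = {
    "rising_topic": [4, 6, 8, 10, 12],
    "declining_topic": [9, 8, 7, 6, 5],
}

# A positive slope suggests growing interest; a negative one, fading interest.
trends = {name: ols_fit(years, counts) for name, counts in topic_counts.items()}
```

Deciding whether a slope is statistically significant would additionally require its standard error and a t-test, which a statistics package would normally provide.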

벅아이 코퍼스에서의 젊은 성인 여성의 모음 포먼트 분석 (An Analysis of the Vowel Formants of the Young Females in the Buckeye Corpus)

  • 윤규철
    • 말소리와 음성과학 / Vol. 4, No. 4 / pp.45-52 / 2012
  • The purpose of this paper is to measure the first two vowel formants of the ten young female speakers from the Buckeye Corpus of Conversational Speech [1] automatically and then to analyze various potential factors that may affect the formant distribution of the eight peripheral vowels of English. The factors that were analyzed included the place of articulation, the content versus function word information, the syllabic stress information, the location in a word, the location in an utterance, the speech rate of the three consecutive words, and the word frequency in the corpus. The results indicate that the overall formant patterns of the female speakers were similar to those of earlier works. The effects of the factors on the realization of the two formants were also similar to those from the male speakers with minor differences.