• Title/Abstract/Keywords: Word Frequency

Syllable and Phoneme Frequencies in the Spontaneous Speech of 2-5 Year-Old Korean Children

  • 김민정;배소영;고도흥
    • 음성과학 / Vol. 8, No. 4 / pp. 99-107 / 2001
  • The purpose of this study was to investigate syllable and phoneme frequencies in the spontaneous speech of Korean children. Sixty-four normally developing children aged 2 to 5 took part (male:female = 1:1, 16 children in each age group). Fifty connected utterances were analyzed using KCLA (Korean Computerized Language Analysis) 2.0 and Excel. The findings were as follows: 1) /i/ was the most frequently used syllable, followed by /yo/, /k/, /s'/, /nen/, and so on. 2) The most frequently used Korean phonemes were the syllable-initial consonant /k/, the syllable-medial vowel /a/, and the syllable-final consonant /n/. 3) All seven syllable-final consonants (/p, t, k, m, n, ŋ, l/) were used more frequently in word-medial position than in word-final position. Three syllable-initial consonants (/k, I, s'/) were used more frequently in word-medial position than in word-initial position. These syllable and phoneme frequencies will provide valuable information for interpreting the severity of phonological disorders and for developing tools for Korean phonological assessment and intervention.
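
As a rough illustration of what counting syllable, onset, vowel, and coda frequencies involves, the sketch below splits precomposed Hangul syllables into their letters using the standard Unicode arithmetic. It is not the KCLA procedure used in the study: it works on spelling rather than transcribed pronunciation (for example, an initial ㅇ is a silent placeholder), and the function names `decompose` and `count_frequencies` are hypothetical.

```python
from collections import Counter

# Jamo tables in Unicode order for precomposed syllables (U+AC00..U+D7A3).
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"        # 19 syllable-initial letters
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"     # 21 vowels
FINALS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28, "" = no coda


def decompose(ch):
    """Return (initial, medial, final) for one Hangul syllable, else None."""
    code = ord(ch) - 0xAC00
    if not 0 <= code < 19 * 21 * 28:
        return None
    initial, rest = divmod(code, 21 * 28)
    medial, final = divmod(rest, 28)
    return INITIALS[initial], MEDIALS[medial], FINALS[final]


def count_frequencies(text):
    """Count syllables and their orthographic initial/medial/final letters."""
    syllables, initials, medials, finals = Counter(), Counter(), Counter(), Counter()
    for ch in text:
        parts = decompose(ch)
        if parts is None:
            continue  # skip spaces, punctuation, and non-Hangul characters
        syllables[ch] += 1
        initials[parts[0]] += 1
        medials[parts[1]] += 1
        if parts[2]:
            finals[parts[2]] += 1
    return syllables, initials, medials, finals
```

A phonemic analysis like the one in the abstract would first need a pronunciation-level transcription; this sketch only shows the counting step.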

Association Modeling on Keyword and Abstract Data in Korean Port Research

  • Yoon, Hee-Young;Kwak, Il-Youp
    • Journal of Korea Trade / Vol. 24, No. 5 / pp. 71-86 / 2020
  • Purpose - This study investigates research trends by searching for English keywords and abstracts in 1,511 Korean journal articles in the Korea Citation Index from the 2002-2019 period using the term "Port." The study aims to lay the foundation for a more balanced development of port research. Design/methodology - Using abstract and keyword data, we perform frequency analysis and word embedding (Word2vec). A t-SNE plot shows the main keywords extracted using the TextRank algorithm. To analyze which words were used in what context in our two nine-year subperiods (2002-2010 and 2010-2019), we use Scattertext and scaled F-scores. Findings - First, during the 18-year study period, port research has developed through the convergence of diverse academic fields, covering 102 subject areas and 219 journals. Second, our frequency analysis of 4,431 keywords in 1,511 papers shows that the words "Port" (60 times), "Port Competitiveness" (33 times), and "Port Authority" (29 times), among others, are attractive to most researchers. Third, a word embedding analysis identifies the words highly correlated with the top eight keywords and visually shows four different subject clusters in a t-SNE plot. Fourth, we use Scattertext to compare words used in the two research sub-periods. Originality/value - This study is the first to apply abstract and keyword analysis and various text mining techniques to Korean journal articles in port research and thus has important implications. Further in-depth studies should collect a greater variety of textual data and analyze and compare port studies from different countries.

Topic Analysis of Foreign Policy and Economic Cooperation: A Text Mining Approach

  • Jiaen Li;Youngjun Choi
    • Journal of Korea Trade / Vol. 26, No. 8 / pp. 37-57 / 2022
  • Purpose - International diplomacy is key to the cohesive economic growth of countries around the world. This study aims to identify the major topics discussed, and to make sense of the word pairs used in sentences, by Chinese senior leaders during their diplomatic visits. It also compares the key topics addressed during diplomatic visits to developed and developing countries. Design/methodology - We employed three methods: word frequency, co-word, and semantic network analysis. The text data were crawled from state and official visit news released by the Ministry of Foreign Affairs of the People's Republic of China regarding diplomatic visits undertaken from 2015 to 2019. Findings - The results show that economic and diplomatic relations featured most prominently during state and official visits. The discussion topics were classified according to the nine keywords that were most central to the structure and had the maximum influence in China. Moreover, the results showed that China's diplomatic issues and strategies differ between developed and developing countries. The topics mentioned in developing countries were more diverse. Originality/value - Our study proposes an effective approach to identifying key topics in Chinese diplomatic talks with other countries. Moreover, it shows that discussion topics differ for developed and developing countries. The findings can help researchers conduct empirical studies on diplomatic relationships and extend our method to other countries. Additionally, they can help key policymakers gain insights into negotiations and establish a good diplomatic relationship with China.
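
The word frequency and co-word steps named in the abstract can be pictured with a small sketch. The version below assumes whitespace tokenization and counts each unordered word pair once per sentence. The paper's actual preprocessing is not described here, so the function name `coword_counts` and the stopword handling are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def coword_counts(sentences, stopwords=frozenset()):
    """Return (word frequencies, sentence-level co-occurrence counts)."""
    word_freq = Counter()
    pair_freq = Counter()
    for sentence in sentences:
        tokens = [t for t in sentence.lower().split() if t not in stopwords]
        word_freq.update(tokens)
        # Sort the distinct tokens so each pair has one canonical order,
        # then count the pair once for this sentence.
        for pair in combinations(sorted(set(tokens)), 2):
            pair_freq[pair] += 1
    return word_freq, pair_freq
```

The pair counts can serve as edge weights in a co-word network, which is the usual input for centrality measures in semantic network analysis.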

A Study on Phone Call Big Data Analytics

  • 김정래;정찬기
    • 정보화연구 / Vol. 10, No. 3 / pp. 387-397 / 2013
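  • This study proposes a big data analytics approach for data generated by phone calls. The analysis model consists of the PVPF (Parallel Variable-length Phrase Finding) algorithm, which identifies vocabulary in natural language, and a word count algorithm, which measures how often keywords are used. The model first identifies vocabulary by extracting linked words with the PVPF algorithm, then measures the usage frequency of the identified vocabulary and words with the MapReduce word count algorithm. The results can be interpreted from various perspectives. To demonstrate the model's effectiveness, it was designed and implemented on HDFS (Hadoop Distributed File System) and applied experimentally to phone call data. The experiments yield meaningful results through keyword correlation analysis and analysis of changes in usage frequency.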
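
The MapReduce word count mentioned in the abstract follows a map, shuffle, reduce pattern. The sketch below imitates that pattern in a single Python process to show the data flow. It does not use the Hadoop API, it does not reproduce the PVPF phrase-finding step, and all function names are hypothetical.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (word, 1) pair for every whitespace-separated word in a record."""
    for word in record.split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts collected for one word."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle, and reduce over an iterable of text records."""
    mapped = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
```

In a real cluster the map and reduce functions run in parallel on separate nodes over blocks stored in HDFS; here they run sequentially so the logic is easy to follow.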

Exploring Depression Research Trends Using BERTopic and LDA

  • Woo-Ryeong, YANG;Hoe-Chang, YANG
    • 식품보건융합연구 / Vol. 9, No. 1 / pp. 19-28 / 2023
  • The purpose of this study is to explore which areas depression research in Korea has focused on, through an analysis of academic papers related to depression, and to provide insights that can help solve future depression problems. A total of 1,032 papers retrieved from ScienceON with the keyword "depression" were analyzed using Python 3.7 for word frequency analysis, word co-occurrence analysis, BERTopic, LDA, and OLS regression analysis. The word frequency and co-occurrence frequency analyses showed that related words clustered around words such as patient, disorder, and symptom. Topic modeling derived a total of 13 topics, including 'childhood depression' and 'eating anxiety'. 'Suicidal thoughts', 'treatment', 'occupational health', and 'health treatment program' were identified as statistically significant topics of interest, while 'child depression' and 'female treatment' drew relatively less interest. The analysis of research trends suggests that future research should study not only physiological and psychological factors but also social and environmental causes, and that collaborative studies by experts across academia, drawing on convergent and complex perspectives, are needed for depression relief and treatment.

Construction of an Efficient Pre-analyzed Dictionary for Korean Morphological Analysis

  • 곽수정;김보겸;이재성
    • 정보처리학회논문지:소프트웨어 및 데이터공학 / Vol. 2, No. 12 / pp. 881-888 / 2013
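  • A pre-analyzed dictionary is used to improve the speed and accuracy of a morphological analyzer and to reduce over-analysis. However, if the dictionary contains eojeols (space-delimited word units) whose stored morphological analyses are incomplete, that is, insufficiently analyzed eojeols, it can instead lower the analyzer's accuracy. This paper uses the Sejong morphologically analyzed corpus (written language, 2011) to measure how the dictionary's correct-answer rate changes with corpus size and eojeol frequency. An integrated system combining SMA, a statistical morphological analyzer, with the pre-analyzed dictionary confirmed that overall system performance improves when the dictionary's sufficient-analysis rate is 99.82% or higher. In addition, the integrated system performed best when the dictionary was built from eojeols occurring 32 or more times in a 1.6-million-eojeol corpus, and from eojeols occurring 64 or more times in a 6.3-million-eojeol corpus.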
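
The idea of building a pre-analyzed dictionary only from eojeols above a frequency threshold can be sketched as follows. This is a minimal illustration, not the paper's system: the corpus is assumed to be a list of (eojeol, analysis) pairs, the fallback stands in for a statistical analyzer such as SMA, and `build_dictionary` and `analyze` are hypothetical names.

```python
from collections import Counter, defaultdict


def build_dictionary(tagged_corpus, min_count):
    """Keep every analysis observed for eojeols seen at least min_count times.

    tagged_corpus must be a list (it is scanned twice) of
    (eojeol, analysis_string) pairs.
    """
    freq = Counter(eojeol for eojeol, _ in tagged_corpus)
    analyses = defaultdict(set)
    for eojeol, analysis in tagged_corpus:
        if freq[eojeol] >= min_count:
            analyses[eojeol].add(analysis)
    return dict(analyses)


def analyze(eojeol, dictionary, fallback):
    """Answer from the dictionary when possible, otherwise call the fallback analyzer."""
    if eojeol in dictionary:
        return sorted(dictionary[eojeol])
    return fallback(eojeol)
```

Raising `min_count` makes it more likely that a stored eojeol has all of its valid analyses recorded, at the cost of a smaller dictionary. This is the trade-off behind the 32 and 64 occurrence thresholds reported in the abstract.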

Pronunciation of the Korean diphthong /jo/: Phonetic realizations and acoustic properties

  • 이향원
    • 말소리와 음성과학 / Vol. 15, No. 1 / pp. 9-17 / 2023
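  • The purpose of this study is to determine how the pronunciation of the Korean diphthong /ㅛ/ (/jo/) varies across linguistic environments. In particular, the discussion focuses on the relationship between phonetic variation and distributional range. The study analyzed the prosodic position (monosyllable, eojeol-initial, eojeol-medial, eojeol-final) and lexical class (content word, function word) of /ㅛ/ in the speech of 10 female speakers from the Seoul Corpus. The frequency of /ㅛ/ in each environment showed that lexical class and phonetic realization differed by prosodic position. Acoustic analysis confirmed that phonetic reduction occurs frequently in /ㅛ/ in function words. Lexical class did not change the average phonetic quality of /ㅛ/, but differences were found in the distribution of individual tokens. These results show that the linguistic environment affects the phonetic distribution of vowels.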

The Effect of Awareness of DM in Beauty Art on Satisfaction and Intention of Word-of-Mouth - On Women in Their Twenties to Forties -

  • 박은준;김성남
    • 한국의류산업학회지 / Vol. 9, No. 4 / pp. 431-440 / 2007
  • This study of DM awareness, degree of satisfaction, and word-of-mouth intention found that word-of-mouth intention rises when the confidence factor is stressed and also when the perception factor is emphasized. The 790 questionnaires collected were analyzed by frequency, factor analysis, confidence degree, and regression analysis. The findings mean that the higher the degree of satisfaction arising from recognition of, attention to, and trust in DM advertisements, from relationship reinforcement through DM advertisements, and from the transmission of information, the higher the word-of-mouth intention. The beauty art business is characterized by visits being connected with sales. Awareness affects the degree of satisfaction, and it affects word-of-mouth intention. Therefore, the beauty art business requires systematic, analytic, and unique DM advertisements.

Chatbot Design Method Using Hybrid Word Vector Expression Model Based on Real Telemarketing Data

  • Zhang, Jie;Zhang, Jianing;Ma, Shuhao;Yang, Jie;Gui, Guan
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 14, No. 4 / pp. 1400-1418 / 2020
  • In commercial promotion, the chatbot is known as a significant application of natural language processing (NLP). Conventional design methods use the bag-of-words (BOW) model alone, based on the Google database and other online corpora. For one thing, in the bag-of-words model, the vectors are irrelevant to one another. Even though this method is friendly to discrete features, it does not help the machine understand continuous statements, because the connection between words is lost in the encoded word vector. For another, existing methods are tested on state-of-the-art online corpora but are hard to apply to real applications such as telemarketing data. In this paper, we propose an improved chatbot design method that uses a hybrid of the bag-of-words model and the skip-gram model, based on real telemarketing data. Specifically, we first collect real data in the telemarketing field and perform data cleaning and data classification on the constructed corpus. Second, the word representation adopts the hybrid bag-of-words and skip-gram model. The skip-gram model maps synonyms to nearby points in vector space. The correlation between words is expressed, so the amount of information contained in the word vector increases, making up for the shortcomings of using the bag-of-words model alone. Third, we use the term frequency-inverse document frequency (TF-IDF) weighting method to increase the weight of key words, then output the final word expression. Finally, the answer is produced using a hybrid of a retrieval model and a generative model. The retrieval model can accurately answer questions in the field. The generative model supplements it by answering open-domain questions, with the final reply completed by long short-term memory (LSTM) training and prediction. Experimental results show that the hybrid word vector expression model can improve the accuracy of the response and that the whole system can communicate with humans.
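
To make the TF-IDF weighting step concrete, the sketch below computes one common textbook variant: term frequency normalized by document length, multiplied by the unsmoothed log(N / df). The abstract does not say which variant the authors used, so both the formula choice and the name `tf_idf` are assumptions.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    documents is a list of token lists. Terms that occur in every
    document get weight 0 because log(N / N) = 0.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

Words that are frequent in one utterance but rare across the corpus receive the largest weights, which is how the method raises the weight of key words.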

Vocabulary Analysis of Listening and Reading Texts in 2020 EBS-linked Textbooks and CSAT

  • 강동호
    • 한국콘텐츠학회논문지 / Vol. 20, No. 10 / pp. 679-687 / 2020
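  • This study analyzes the vocabulary of EBS-linked textbooks and the College Scholastic Ability Test (CSAT) against the BNC word lists and the 2015 Ministry of Education basic vocabulary. The AntWordProfiler vocabulary analysis program was used to analyze lexical coverage and frequency. The results show that about 95% of the 2020 EBS CSAT English listening and reading linked textbooks can be understood with the BNC 3,000 and 4,000 words, respectively. However, understanding 98% of the words in the EBS listening and reading textbooks requires 4,000 and 8,000 words, respectively. By contrast, understanding 95% of the 2020 CSAT English listening and reading sections requires 2,000 and 4,000 words, respectively, while the 98% level additionally requires 4,000 and 7,000 words. Consequently, the EBS-linked textbooks demand a larger vocabulary than the CSAT English test.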
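
Lexical coverage of the kind reported in the abstract can be pictured as the cumulative share of a text's tokens that fall within successive frequency bands of a ranked word list. The sketch below is a simplification under stated assumptions: it matches lowercase word forms rather than word families, it is not AntWordProfiler's implementation, and `cumulative_coverage` and `bands_needed` are hypothetical names.

```python
def cumulative_coverage(tokens, ranked_words, band_size=1000):
    """Return cumulative token coverage after each frequency band.

    ranked_words is a word list ordered from most to least frequent;
    band k holds words ranked k * band_size .. (k + 1) * band_size - 1.
    """
    if not tokens:
        return []
    band_of = {word: index // band_size for index, word in enumerate(ranked_words)}
    n_bands = -(-len(ranked_words) // band_size)  # ceiling division
    hits = [0] * n_bands
    for token in tokens:
        band = band_of.get(token.lower())
        if band is not None:  # off-list tokens are never covered
            hits[band] += 1
    coverage, running = [], 0
    for count in hits:
        running += count
        coverage.append(running / len(tokens))
    return coverage


def bands_needed(coverage, target):
    """Return how many bands are needed to reach the target, or None if never reached."""
    for index, value in enumerate(coverage):
        if value >= target:
            return index + 1
    return None
```

With `band_size=1000` and a BNC-style ranked list, `bands_needed(coverage, 0.95)` would correspond to statements such as "3,000 words cover about 95% of the text".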