• Title/Summary/Keyword: Frequency of Words

A Study of Development for Korean Phonotactic Probability Calculator (한국어 음소결합확률 계산기 개발연구)

  • Lee, Chan-Jong;Lee, Hyun-Bok;Choi, Hun-Young
    • The Journal of the Acoustical Society of Korea / v.28 no.3 / pp.239-244 / 2009
  • This paper develops the Korean Phonotactic Probability Calculator (KPPC), which predicts phonotactic probability in Korean. KPPC calculates positional segment frequency, position-specific biphone frequency, and position-specific triphone frequency. It also calculates neighborhood density, the number of words that sound similar to a target word. The Phonotactic Calculator developed at the University of Kansas analyzes computer-readable phonemic transcriptions and calculates positional frequency and position-specific biphone frequency derived from 20,000 dictionary words. KPPC, by contrast, calculates positional frequency, positional biphone frequency, positional triphone frequency, and neighborhood density, and it accepts input in either the Korean alphabet or a computer-readable phonemic transcription. KPPC can therefore predict whether a word has high or low phonotactic probability and high or low neighborhood density.
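
The two kinds of measure named in this abstract can be illustrated with a minimal Python sketch. It is not the KPPC implementation: the toy lexicon, the function names, the unweighted relative frequencies, and the one-phoneme substitution/insertion/deletion definition of a neighbor are all assumptions made for illustration.

```python
from collections import Counter

def positional_segment_probs(lexicon):
    """Estimate P(segment | position) from a list of phoneme sequences."""
    counts = Counter()   # (position, segment) -> number of words with that segment there
    totals = Counter()   # position -> number of words long enough to have that position
    for word in lexicon:
        for pos, seg in enumerate(word):
            counts[(pos, seg)] += 1
            totals[pos] += 1
    return {key: n / totals[key[0]] for key, n in counts.items()}

def is_neighbor(a, b):
    """True if b differs from a by exactly one substitution, insertion, or deletion."""
    if a == b:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    short, long_ = (a, b) if len(a) < len(b) else (b, a)
    # Deleting one phoneme from the longer word must yield the shorter word.
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))

def neighborhood_density(target, lexicon):
    """Count the lexicon entries that are neighbors of the target."""
    return sum(is_neighbor(target, word) for word in lexicon)

# Hypothetical toy lexicon: each word is a tuple of phoneme symbols.
LEXICON = [("k", "a", "m"), ("k", "a", "n"), ("m", "a", "n"), ("k", "a"), ("p", "a", "m")]

probs = positional_segment_probs(LEXICON)
k_initial = probs[(0, "k")]                                # share of words starting with "k"
density = neighborhood_density(("k", "a", "m"), LEXICON)   # neighbors of "kam"
print(k_initial, density)
```

Biphone and triphone frequencies would follow the same pattern, with counts keyed on a position and a pair or triple of adjacent segments.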

RECENT RESEARCH AND DEVELOPING TREND OF ENGINEERING MANAGEMENT IN CHINA BASED ON TEXT MINING

  • Shaohua Jiang;Wenling Zhang;Zhaohong Qiu;Shaojun Wang
    • International conference on construction engineering and project management / 2009.05a / pp.814-820 / 2009
  • With the rapid development of China's economy, many large-scale, high-investment engineering projects have been constructed in China, some of them the biggest in the world. As engineering practice has developed, research on engineering management in China has made great progress, and a large number of research findings are embodied in research papers and represented by technical words. To understand the state of the art in engineering management research in China, three major parts of research papers (title, abstract, and keywords) from the last five years of three representative Chinese engineering management journals were chosen as research materials. Unlike Western languages, Chinese has no delimiters between words, so the maximum matching and frequency statistics (MMFS) method, a text segmentation technique for Chinese text mining, is presented to extract features consisting of technical words, phrases, and words from the research materials. Recent research and development trends in engineering management in China were identified by comparing and analyzing differences in the technical words found in the research materials of the last five years.
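
As an illustration of segmenting undelimited text and then counting word frequencies, here is a minimal Python sketch of forward maximum matching followed by a frequency count. It is not the authors' MMFS method: the dictionary, the sample string, the `max_len` window, and the single-character fallback are all assumptions.

```python
from collections import Counter

def forward_max_match(text, dictionary, max_len=4):
    """Greedy left-to-right segmentation that prefers the longest dictionary match."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens

# Hypothetical dictionary of technical terms:
# engineering, management, engineering management, project, risk.
DICTIONARY = {"工程", "管理", "工程管理", "项目", "风险"}

tokens = forward_max_match("工程管理项目风险管理风险", DICTIONARY)
freq = Counter(tokens)   # frequency statistics over the segmented tokens
print(tokens)
print(freq.most_common(3))
```

Greedy longest-match keeps "工程管理" as one term even though "工程" and "管理" are also in the dictionary, which is the behavior that makes multi-character technical terms countable.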

A Study on the Recognition Analysis of Participants in Urban Regeneration Project Using Text Network Analysis Technique (NetMiner): Focused on the Urban Regeneration Leading Area in Suncheon-City

  • Gim, Eo-Jin;Koo, Ja-Hoon
    • International Journal of Advanced Culture Technology / v.7 no.4 / pp.246-254 / 2019
  • This study analyzes current issues in urban regeneration projects through surveys and interviews of participants in the urban regeneration leading project of Suncheon City. Most of the comments concerned business fragmentation and points that should be improved in the future. The text network technique was applied to analyze the unstructured text data. In the analysis of word frequency and PageRank centrality, the words 'parking', 'need', 'lack', 'region' and 'resident' ranked highest, while in the analysis of mediation centrality of key words, 'culture', 'need', 'region', 'inflow' and 'lack' ranked highest. The network analysis surfaced the most central words, many of which occupied important positions in the sentences. Text network analysis has provided timely results in terms of sustainability after completion of the Suncheon City Regeneration Leading Project.

Vocabulary Coverage Improvement for Embedded Continuous Speech Recognition Using Part-of-Speech Tagged Corpus (품사 부착 말뭉치를 이용한 임베디드용 연속음성인식의 어휘 적용률 개선)

  • Lim, Min-Kyu;Kim, Kwang-Ho;Kim, Ji-Hwan
    • MALSORI / no.67 / pp.181-193 / 2008
  • In this paper, we propose a vocabulary coverage improvement method for embedded continuous speech recognition (CSR) using a part-of-speech (POS) tagged corpus. We investigate the 152 POS tags defined in the Lancaster-Oslo-Bergen (LOB) corpus and word-POS tag pairs, and we derive a new vocabulary through word addition. Words paired with some POS tags have to be included in vocabularies of any size, whereas the inclusion of words paired with other POS tags varies with the target vocabulary size. The 152 POS tags are therefore categorized according to whether word addition depends on the size of the vocabulary. Using expert knowledge, we first classify the POS tags and then apply different word addition strategies based on the POS tags paired with the words. The performance of the proposed method is measured in terms of coverage and compared with that of vocabularies of the same size (5,000 words) derived from frequency lists. The proposed method achieves 95.18% coverage on the test short message service (SMS) text corpus, while the conventional vocabularies cover only 93.19% and 91.82% of the words appearing in the same corpus.
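
The coverage figure used for comparison here can be sketched in a few lines of Python. This shows only the frequency-list baseline and the coverage ratio, under the assumption that coverage means the share of test tokens found in the vocabulary; the function names and toy corpora are hypothetical, and the POS-based word addition of the proposed method is not reproduced.

```python
from collections import Counter

def top_k_vocabulary(training_tokens, k):
    """Baseline vocabulary: the k most frequent word types in the training text."""
    return {word for word, _ in Counter(training_tokens).most_common(k)}

def coverage(vocabulary, test_tokens):
    """Fraction of test tokens (running words) that are in the vocabulary."""
    covered = sum(1 for token in test_tokens if token in vocabulary)
    return covered / len(test_tokens)

# Hypothetical toy corpora standing in for the training text and the SMS test text.
training = "see you soon see you later ok see".split()
test = "ok see you soon".split()

vocab = top_k_vocabulary(training, k=2)
rate = coverage(vocab, test)
print(sorted(vocab), rate)
```

Because coverage is counted over running words, adding one frequent out-of-vocabulary word can raise it more than adding several rare ones.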

Analysis of ICT Education Trends using Keyword Occurrence Frequency Analysis and CONCOR Technique (키워드 출현 빈도 분석과 CONCOR 기법을 이용한 ICT 교육 동향 분석)

  • Youngseok Lee
    • Journal of Industrial Convergence / v.21 no.1 / pp.187-192 / 2023
  • In this study, trends in ICT education were investigated by analyzing the frequency of appearance of keywords related to machine learning and by using the convergence of iterated correlations (CONCOR) technique. A total of 304 papers published in registered sites from 2018 to the present were retrieved from Google Scholar using "ICT education" as the keyword, and 60 papers pertaining to ICT education were selected based on a systematic literature review. Keywords were then extracted from the title and summary of each paper. For word frequency and indicator data, 49 high-frequency keywords were extracted by analyzing frequency with the term frequency-inverse document frequency (TF-IDF) technique from natural language processing, together with word co-occurrence frequency. The degree of relationship was verified by analyzing the connection structure and the degree centrality between words, and clusters of similar words were derived via CONCOR analysis. First, "education," "research," "result," "utilization," and "analysis" were identified as the main keywords. Second, an N-gram network graph with "education" as the keyword showed that "curriculum" and "utilization" had the highest correlation level. Third, a cluster analysis with "education" as the keyword produced five groups: "curriculum," "programming," "student," "improvement," and "information." These results indicate that the practical research needed for ICT education can be conducted by analyzing and identifying ICT education trends.
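
The TF-IDF weighting mentioned in this abstract can be sketched as follows. TF-IDF has several variants, and the abstract does not say which one was used; this sketch assumes term frequency as count divided by document length and inverse document frequency as the natural log of (number of documents / number of documents containing the term). The keyword lists are hypothetical.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per document (tf = count/len, idf = ln(N/df))."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))   # count each term once per document
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical keyword lists extracted from three paper titles and summaries.
DOCS = [
    ["ict", "education", "curriculum"],
    ["ict", "education", "programming"],
    ["ict", "student"],
]

scores = tf_idf(DOCS)
print(scores[0])
```

A term that occurs in every document, such as "ict" here, gets a weight of zero under this variant, which is why TF-IDF favors keywords that distinguish papers from one another.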

Error Correction and Praat Script Tools for the Buckeye Corpus of Conversational Speech (벅아이 코퍼스 오류 수정과 코퍼스 활용을 위한 프랏 스크립트 툴)

  • Yoon, Kyu-Chul
    • Phonetics and Speech Sciences / v.4 no.1 / pp.29-47 / 2012
  • The purpose of this paper is to show how to convert the label files of the Buckeye Corpus of Spontaneous Speech [1] into Praat format and to introduce some of the Praat scripts that will enable linguists to study various aspects of spoken American English present in the corpus. During the conversion process, several types of errors were identified and corrected either manually or automatically by the use of scripts. The Praat script tools that have been developed can help extract from the corpus massive amounts of phonetic measures such as the VOT of plosives, the formants of vowels, word frequency information and speech rates that span several consecutive words. The script tools can extract additional information concerning the phonetic environment of the target words or allophones.

The Perception-Based Study of a Weak Syllable in English Words Containing Weak-Strong Pattern by Korean Learners (I) (약강구조를 포함하는 영어단어에 대한 영어학습자의 약음절 지각과 반응시간(I))

  • Shin Ji-Young;Kim Kee-Ho;Kim Hee-Sung
    • MALSORI / no.57 / pp.31-42 / 2006
  • The purpose of this study is to observe how Korean learners perceive an English weak syllable in words containing a weak-strong (WS) syllable pattern. In an automated discrimination task run with E-Prime, stimuli with the same syllable pattern yielded a higher percentage of correct answers and faster reaction times than stimuli with different syllable patterns. Specifically, for stimuli with different syllable patterns, two factors were important in distinguishing a word with a weak syllable from one without: the frequency (familiarity) of the stressed word following the weak syllable, and whether the weak syllable had a coda. Although Koreans with high English proficiency had faster reaction times than those with low proficiency, all Korean learners had difficulty perceiving the weak syllable at the beginning of a word.

The text-to-speech system assessment based on word frequency and word regularity effects (단어빈도와 단어규칙성 효과에 기초한 합성음 평가)

  • Nam Kichun;Choi Wonil;Lee Donghoon;Koo Minmo;Kim Jongjin
    • Proceedings of the KSPS conference / 2002.11a / pp.105-108 / 2002
  • In the present study, the intelligibility of synthesized speech sounds was evaluated using psycholinguistic and fMRI techniques. To examine the difference in word recognition between natural and synthesized speech sounds, word regularity and word frequency were varied. The results of Experiment 1 and Experiment 2 showed that the intelligibility difference of the synthesized speech comes from word regularity. There was smaller activation of the auditory areas in the brain and slower recognition time for the regular words.

Acoustic Characteristics of Vowels in Korean Distant-Talking Speech (한국어 원거리 음성의 모음의 음향적 특성)

  • Lee Sook-hyang;Kim Sunhee
    • MALSORI / v.55 / pp.61-76 / 2005
  • This paper analyzes the acoustic characteristics of vowels produced in a distant-talking environment. The analysis was performed using a statistical method, and the influence of gender and speaker on the variation was also examined. The speech data consist of 500 distant-talking words and 500 normal words from 10 speakers (5 males and 5 females). The acoustic features selected for the analysis were duration, the formants (F1 and F2), fundamental frequency, and total energy. The results showed that duration, F0, F1, and total energy increased in distant-talking speech compared to normal speech; female speakers showed a greater increase in all features except total energy and fundamental frequency. In addition, speaker differences were observed.

A Social Network Analysis of Research Key Words Related Smoke Cessation in South Korea (연결망 분석을 활용한 우리나라 금연연구 동향분석)

  • An, Eun-Seong
    • Health Policy and Management / v.29 no.2 / pp.138-145 / 2019
  • Background: This study aims to identify the keyword network of smoking cessation research from 2009 to 2018 using social network analysis and to provide research data that can support the Korean government's policy making on smoking cessation. Methods: First, a frequency analysis of the keywords was performed. Then three classic centrality measures (degree centrality, betweenness centrality, and eigenvector centrality) were applied with R 3.5.1. The results were visualized as a word cloud and a keyword network. Results: In the network analysis, 'smoking' and 'smoking cessation' were the key words with high frequency, high degree centrality, and high betweenness centrality. Looking at keyword trends, many studies addressed the keywords 'secondhand smoke' and 'adolescent' from 2009 to 2013, and 'cigarette graphic warning' and 'electronic cigarette' from 2014 to 2018. Conclusion: Through keyword network analysis, this study contributes to understanding trends in smoking cessation research and to identifying directions for further study.
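
The first of the centrality measures listed in the Methods can be illustrated with a minimal Python sketch. The study itself used R 3.5.1; this sketch is only an illustration and assumes that two keywords are linked when they appear on the same paper. The keyword lists and function names are hypothetical, and betweenness and eigenvector centrality are not shown.

```python
from itertools import combinations

def cooccurrence_graph(keyword_lists):
    """Build an undirected graph linking keywords that share at least one paper."""
    neighbors = {}
    for keywords in keyword_lists:
        for keyword in keywords:
            neighbors.setdefault(keyword, set())
        for a, b in combinations(set(keywords), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors

def degree_centrality(neighbors):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(linked) / (n - 1) for node, linked in neighbors.items()}

# Hypothetical keyword lists, one per paper.
PAPERS = [
    ["smoking", "smoking cessation", "adolescent"],
    ["smoking", "electronic cigarette"],
    ["smoking cessation", "electronic cigarette"],
]

graph = cooccurrence_graph(PAPERS)
centrality = degree_centrality(graph)
print(centrality)
```

In this toy network 'smoking' and 'smoking cessation' are each linked to every other keyword, so both reach the maximum degree centrality of 1.0.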