• Title/Summary/Keyword: Frequency of Words

The Influence of Age of Acquisition in Hangul Word Recognition (한글단어재인에서 습득연령의 영향)

  • Lee, Hye-Won;Kim, Sun-Kyoung
    • Korean Journal of Cognitive Science / v.24 no.4 / pp.339-363 / 2013
  • The age of acquisition (AoA) effect is the phenomenon in which words acquired early in life are processed better than words acquired later in life. Age of acquisition and word frequency are critical factors in lexical processing. In this study, we examined AoA effects in Hangul word recognition. In Experiment 1, we examined AoA effects in word naming and lexical decision tasks. The results showed an interaction between task and age of acquisition: AoA effects appeared only in the lexical decision task. In Experiment 2, we examined the relationship between age of acquisition and word frequency in the lexical decision task. Both variables had significant effects: early-acquired words were processed better than later-acquired words, and high-frequency words were processed better than low-frequency words. However, there was no interaction between the two variables. In Experiment 3, we examined how phonological changes in Hangul words influence AoA effects. The AoA effects were similar regardless of whether phonological changes occurred. Our results are discussed in terms of several theoretical hypotheses.

Appearance Frequency of 'Eco-Friendly' Emotion and Sensibility Words and their Changes (친환경 감성 어휘의 종류별 사용빈도 및 변화 양상)

  • Na, Young-Joo
    • Science of Emotion and Sensibility / v.14 no.2 / pp.207-220 / 2011
  • The purpose of this study is to investigate eco-friendly emotion and sensibility words in two media, fashion magazines and internet newspapers, and to analyze their frequency of appearance and year-by-year changes from 1999 to 2010. The most frequently used words are 'nature, eco, cotton, natural fiber, health, fresh, clear, preservation, harmony, corn fiber, and Lohas'. The words are divided into 4 groups: 'Nature/Environment, Material/Fiber, Human, and Adjectives/Misc.'. The period in which each word first appeared was also analyzed: 'ecology, memory-shape material, organic, spa' were used before 2000; 'nature environment, eco-friendly, stretch material, wellbeing, substitute, recycling' in 2000-2001; 'smart material, eco material, green' in 2002-2003; 'coolbiz, Lohas, natural dye' in 2004-2005; 'herb medicine, sustainable, warmbiz' in 2006-2007; and 'greensumer, greenlife, solar energy, forest bath' in 2008-2009. Over time, eco-friendly emotion and sensibility words appeared relatively frequently in the early 2000s, then decreased, and have recently increased again to their highest frequency. 'Nature/Environment' words have appeared very often recently, while 'Human' sensibility words have not changed much or have decreased slightly. 'Adjectives/Misc.' words have increased slightly in recent years. 'Material/Fiber' words decreased in fashion magazines but increased in internet newspapers.

Chatbot Design Method Using Hybrid Word Vector Expression Model Based on Real Telemarketing Data

  • Zhang, Jie;Zhang, Jianing;Ma, Shuhao;Yang, Jie;Gui, Guan
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.4 / pp.1400-1418 / 2020
  • In commercial promotion, chatbots are a significant application of natural language processing (NLP). Conventional design methods use the bag-of-words (BOW) model alone, built on the Google database and other online corpora. This has two drawbacks. First, in the BOW model the word vectors are independent of one another; although this suits discrete features, it hinders a machine's understanding of continuous statements because the encoded word vectors lose the connections between words. Second, existing methods are tested on state-of-the-art online corpora but are hard to apply to real-world data such as telemarketing data. In this paper, we propose an improved chatbot design method that uses a hybrid of the bag-of-words and skip-gram models, based on real telemarketing data. Specifically, we first collect real data in the telemarketing field and perform data cleaning and data classification on the constructed corpus. Second, words are represented with a hybrid of the bag-of-words and skip-gram models. The skip-gram model maps synonyms to nearby points in the vector space, so correlations between words are expressed and the word vectors carry more information, compensating for the shortcomings of using the bag-of-words model alone. Third, we use term frequency-inverse document frequency (TF-IDF) weighting to increase the weight of key words and then output the final word representation. Finally, answers are produced by a hybrid of a retrieval model and a generative model. The retrieval model can accurately answer in-domain questions, while the generative model supplements it by answering open-domain questions, with the final reply produced by long short-term memory (LSTM) training and prediction. Experimental results show that the hybrid word vector expression model improves response accuracy and that the whole system can communicate with humans.
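
The TF-IDF weighting step mentioned above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes documents are already tokenized lists of words and uses the common textbook variant tf = count / document length and idf = log(N / df), since the abstract does not specify which variant was used. The function name `tf_idf` and the toy utterances are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / len(doc), idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    # Toy utterances, invented for illustration only.
    docs = [["price", "plan", "discount"], ["price", "plan"], ["refund", "plan"]]
    for w in tf_idf(docs):
        print({t: round(v, 3) for t, v in w.items()})
```

Because "plan" appears in every toy document, its IDF is log(1) = 0 and its weight vanishes, while "discount", which occurs in only one document, receives the largest weight in the first document. This is the sense in which TF-IDF raises the weight of key words relative to ubiquitous ones.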

Predictors of Preschoolers' Reading Skills : Analysis by Age Groups and Reading Tasks (유아의 단어읽기 능력 예측변수 : 연령 집단별, 단어 유형별 분석)

  • Choi, Na-Ya;Yi, Soon-Hyung
    • Journal of Families and Better Life / v.26 no.4 / pp.41-54 / 2008
  • The purpose of this study was to investigate predictors of preschoolers' ability to read words in terms of the sub-skills of alphabet knowledge, phonological awareness, and phonological processing. Fourteen literacy sub-tests and three types of reading tasks were administered to 289 kindergartners aged 4 to 6 in Busan. The main results are as follows. The sub-skills that predicted reading ability varied with children's age. Irrespective of age group, knowledge of consonant names and digit naming speed commonly explained the reading of real words. In contrast, syllable deletion and phoneme substitution skills and knowledge of alphabet composition principles were related only to 4-year-olds' reading skills. Digit memory was a predictor only of 5-year-olds' reading abilities, and knowledge of vowel sounds only of 6-year-olds' reading skills. The type of reading task also influenced reading ability. A few common variables, such as knowledge of consonant names and vowel sounds, digit naming speed, and phoneme substitution skill, explained all types of word reading. Syllable counting skills, however, had predictive value only for the reading of real words. Phoneme insertion skills and digit memory had predictive value for the reading of pseudowords and low-frequency letters. Likewise, knowledge of consonant sounds and vowel stroke-adding principles was significant only for the reading of low-frequency letters.

Keyword identifications on dimensions for service quality of Healthcare providers (헬스케어 서비스 리뷰를 활용한 서비스 품질 차원 별 중요 단어 파악 방안)

  • Lee, Hong Joo
    • Knowledge Management Research / v.19 no.4 / pp.171-185 / 2018
  • Studies of online reviews have analyzed ratings and topics as a whole. However, opinions also need to be analyzed along the various dimensions of service quality. This study classifies reviews of healthcare services into service quality dimensions and proposes a method to identify the words mainly referred to in each dimension. Service quality was based on the dimensions provided by SERVQUAL, and patient reviews were collected from NHS Choices. A sample of 2,000 sentences was classified into the SERVQUAL service quality dimensions, and a method of extracting important keywords from the sentences in each dimension was suggested. The RAKE algorithm is used to extract keywords from a single document, and an additional index is used to account for words used frequently across many documents. Since we need to identify keywords across many reviews, we considered frequency and discriminating power (IDF) together rather than identifying keywords based only on the RAKE score. For each SERVQUAL dimension, we identified the words patients mainly mentioned, and we also identified the words patients mainly referred to at each review rating.
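
Combining RAKE scoring with an IDF-style index can be sketched as follows. RAKE splits text into candidate phrases at stopwords and punctuation and scores each word as degree / frequency, where degree counts the words it co-occurs with inside candidate phrases. How the authors combined RAKE scores with IDF is not stated in the abstract; this sketch assumes that per-sentence RAKE scores are summed across documents (rewarding frequent use) and multiplied by a smoothed IDF, log(1 + N / df) (rewarding discrimination). The stopword list and all function names are hypothetical, and the tokenizer handles English only.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stopword list; real RAKE setups use a full list.
STOPWORDS = {"a", "an", "and", "the", "was", "were", "is", "very", "to",
             "of", "my", "i", "in", "for", "with"}


def candidate_phrases(text):
    """Split text into RAKE candidate phrases at punctuation and stopwords."""
    phrases = []
    for chunk in re.split(r"[.,;:!?()\n]", text.lower()):
        current = []
        for word in re.findall(r"[a-z']+", chunk):
            if word in STOPWORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake_scores(text):
    """RAKE phrase score = sum over its words of degree(w) / freq(w)."""
    phrases = candidate_phrases(text)
    freq, degree = Counter(), Counter()
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrence degree, counting the word itself
    return {p: sum(degree[w] / freq[w] for w in p) for p in set(phrases)}


def rake_with_idf(docs):
    """Sum RAKE scores over documents, then weight by a smoothed IDF (assumed combination)."""
    per_doc = [rake_scores(d) for d in docs]
    df = Counter(p for scores in per_doc for p in scores)
    combined = defaultdict(float)
    for scores in per_doc:
        for phrase, score in scores.items():
            combined[phrase] += score
    n = len(docs)
    return {p: s * math.log(1 + n / df[p]) for p, s in combined.items()}
```

With three short review sentences such as "The nurse was very kind and helpful.", "Long waiting time, kind nurse." and "Parking was difficult.", the phrase ("long", "waiting", "time") ranks highest, because RAKE's degree term favors words that occur inside longer candidate phrases.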

Detecting Spelling Errors by Comparison of Words within a Document (문서내 단어간 비교를 통한 철자오류 검출)

  • Kim, Dong-Joo
    • Journal of the Korea Society of Computer and Information / v.16 no.12 / pp.83-92 / 2011
  • Unlike conventional publications, documents prepared with word processors frequently contain typographical errors caused by the author's mistyping. In such online documents, the most common orthographic errors are spelling errors that result from striking a key adjacent to the intended one. Typical spelling checkers detect and correct these errors using a morphological analyzer: the speller's morphological analysis module checks the well-formedness of input words, and all words rejected by the analyzer are regarded as misspelled. However, if the analyzer accepts a mistyped word, the speller treats it as correctly spelled. In this paper, I propose a simple method capable of detecting and correcting errors that previous methods cannot detect. The proposed method relies on the observation that typographical errors are generally not repeated and therefore tend to have very low frequency. If words generated by deleting, exchanging, or transposing each phoneme of a low-frequency word appear in the list of high-frequency words, some of them are considered the correctly spelled forms. Some heuristic rules are also presented to reduce the number of candidates. The proposed method detects some semantic errors rather than syntactic ones, and is useful for scoring candidates.
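
The core candidate-generation idea can be sketched as below: words that are rare in the document are compared against frequent words that lie one phoneme edit away. This is a minimal sketch under several assumptions: Hangul words are split into phonemes with Unicode NFD normalization (which decomposes precomposed syllables into conjoining jamo), "exchange" is read as substituting one phoneme for another, the frequency thresholds `low` and `high` are hypothetical, and the paper's heuristic rules for pruning candidates are not reproduced.

```python
import unicodedata


def to_phonemes(word):
    """NFD splits precomposed Hangul syllables into conjoining jamo; ASCII is unchanged."""
    return unicodedata.normalize("NFD", word)


def one_edit_variants(seq, alphabet):
    """All strings one deletion, substitution, or adjacent transposition away from seq."""
    deletes = {seq[:i] + seq[i + 1:] for i in range(len(seq))}
    transposes = {seq[:i] + seq[i + 1] + seq[i] + seq[i + 2:] for i in range(len(seq) - 1)}
    substitutes = {seq[:i] + c + seq[i + 1:]
                   for i in range(len(seq)) for c in alphabet if c != seq[i]}
    return deletes | transposes | substitutes


def suspected_typos(counts, low=1, high=10):
    """Map each low-frequency word to high-frequency words reachable by one phoneme edit.

    counts: {word: frequency in the document}; low/high are hypothetical thresholds.
    """
    frequent = {to_phonemes(w): w for w, n in counts.items() if n >= high}
    # Substitution candidates are drawn from phonemes seen in frequent words.
    alphabet = set("".join(frequent))
    suspects = {}
    for word, n in counts.items():
        if n > low:
            continue
        hits = {frequent[v] for v in one_edit_variants(to_phonemes(word), alphabet)
                if v in frequent}
        if hits:
            suspects[word] = sorted(hits)
    return suspects
```

With counts such as {"the": 100, "teh": 1, "학교": 50, "핵교": 1}, the sketch flags "teh" (one transposition from "the") and "핵교" (the vowel ㅐ substituted for ㅏ in "학교").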

The Phonemic Characteristics of Disfluencies in Children and Adults Who Stutter (말더듬 아동과 성인에게서 나타난 비유창성의 음운특성)

  • Han, Jin-Soon;Lee, Eun-Ju;Sim, Hyun-Sub
    • Speech Sciences / v.12 no.3 / pp.59-77 / 2005
  • The aim of the present study is to investigate how phonemic characteristics influence the disfluencies of children and adults who stutter. The participants were 10 children (9 boys and 1 girl) and 10 male adults. The participants read aloud the Paradise-Fluency Assessment (Sim, Shin & Lee, 2004) passages; each production was divided into syllables and words, and the frequencies and ratios of disfluencies were analyzed according to specified phonemic features. In terms of disfluency frequency, the participants stuttered more on words beginning with a consonant than on words beginning with a vowel. However, when the ratio relative to the number of occurrences of each phoneme was considered, they showed more disfluencies on vowel-initial words than on consonant-initial words. The phonemic features associated with disfluencies showed different tendencies depending on whether relatively high frequency or relatively high ratio was considered. It was difficult to determine the exact relationships among the order of sound acquisition, phonemic complexity, and disfluencies. To study the exact influence of phonemic features on disfluencies, it is important to consider both the frequency of stuttering itself and the ratio of disfluencies adjusted for the number of opportunities for each specific sound to occur. To compare the results of different studies with similar purposes, it seems important to consider their tasks and methodologies in depth.
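
The distinction drawn above between disfluency frequency and opportunity-adjusted ratio can be made concrete with a small helper. The numbers below are invented purely to show how the two measures can rank categories differently; they are not data from the study, and the function name is hypothetical.

```python
def disfluency_measures(disfluent, opportunities):
    """Return {category: (raw count, ratio)}.

    disfluent: number of disfluent words per category (e.g. word-initial phoneme class).
    opportunities: number of words of that category in the passage.
    """
    return {
        cat: (disfluent.get(cat, 0), disfluent.get(cat, 0) / total)
        for cat, total in opportunities.items()
        if total > 0  # skip categories with no opportunities
    }


# Invented numbers, not data from the study.
print(disfluency_measures({"consonant": 40, "vowel": 15}, {"consonant": 400, "vowel": 100}))
# {'consonant': (40, 0.1), 'vowel': (15, 0.15)}
```

Here consonant-initial words have the larger raw count (40 vs. 15), but vowel-initial words have the higher ratio (0.15 vs. 0.10) because they offer fewer opportunities, the same kind of reversal the abstract reports.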

Automatic Construction of Korean Unknown Word Dictionary using Occurrence Frequency in Web Documents (웹문서에서의 출현빈도를 이용한 한국어 미등록어 사전 자동 구축)

  • Park, So-Young
    • Journal of the Korea Society of Computer and Information / v.13 no.3 / pp.27-33 / 2008
  • In this paper, we propose a method of automatically constructing a dictionary by extracting unknown words from given eojeols in order to improve the performance of a Korean morphological analyzer. The proposed method consists of a dictionary construction phase based on full-text analysis and a dictionary construction phase based on web document frequency. The first phase recognizes unknown words from strings that occur repeatedly in a given full text, while the second phase recognizes unknown words among strings that occur only once in the text, based on how frequently each string is retrieved from web documents. Experimental results show that, by utilizing web document frequency, the proposed method improves recall by 32.39% compared with a previous method.
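
The two-phase idea can be sketched as follows. This is a hypothetical simplification, not the author's algorithm: it assumes each eojeol that the analyzer fails on begins with the unknown word (followed by a particle or ending), treats a prefix shared by at least `min_repeats` distinct eojeols as a repeated string (phase 1), and otherwise keeps the longest prefix whose web document frequency reaches `min_web_hits` (phase 2). `web_freq` stands in for whatever web search count the paper used; it is a caller-supplied function, and all names and thresholds are illustrative.

```python
from collections import Counter


def prefixes(eojeol, min_len=2):
    """Leading substrings of at least min_len characters, longest first."""
    return [eojeol[:i] for i in range(len(eojeol), min_len - 1, -1)]


def build_unknown_dictionary(failed_eojeols, web_freq, min_repeats=2, min_web_hits=1000):
    """Two-phase sketch: repeated strings in the text first, web frequency second.

    failed_eojeols: eojeols the morphological analyzer could not analyze.
    web_freq: hypothetical callable returning a web document frequency for a string.
    """
    distinct = set(failed_eojeols)
    # How many distinct eojeols share each prefix.
    counts = Counter(p for e in distinct for p in prefixes(e))
    dictionary = set()
    for e in distinct:
        # Phase 1: longest prefix shared by several distinct eojeols in the text.
        word = next((p for p in prefixes(e) if counts[p] >= min_repeats), None)
        if word is None:
            # Phase 2: longest prefix that is frequent in web documents.
            word = next((p for p in prefixes(e) if web_freq(p) >= min_web_hits), None)
        if word is not None:
            dictionary.add(word)
    return dictionary
```

For instance, given the failed eojeols 카카오톡을, 카카오톡이, 카카오톡으로, and 넷플릭스를, phase 1 yields 카카오톡 (shared by three eojeols), and phase 2 yields 넷플릭스 if `web_freq` reports enough hits for 넷플릭스 but not for the full eojeol. Taking the longest shared prefix is a crude heuristic; it can absorb a particle when two eojeols happen to share one.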

Analysis of key words published with the Korea Society of Emergency Medical Services journal using text mining (텍스트마이닝을 이용한 한국응급구조학회지 중심단어 분석)

  • Kwon, Chan-Yang;Yang, Hyun-Mo
    • The Korean Journal of Emergency Medical Services / v.24 no.1 / pp.85-92 / 2020
  • Purpose: The purpose of this study was to analyze the English abstract key words in the Korea Society of Emergency Medical Services journal using text mining techniques, in order to determine how well these terms adhere to Medical Subject Headings (MeSH) and to identify key word trends. Methods: We analyzed 212 papers published from 2012 to 2019. Web scraping and frequency analysis of key words were conducted in R software using R's basic and text mining packages, and the Word Clouds package was used for visualization. Results: The average number of key words per study was 3.9. Word cloud visualization revealed that CPR was most prominent in the first half of the period, and emergency medical technician was most frequently used during the second half. A total of 542 (64.9%) key words exactly matched MeSH listed words, and 293 (35%) did not. Conclusion: Researchers should follow submission rules, and journals should update their respective submission rules. Frequently cited MeSH key words should be suggested for use.
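
The keyword statistics reported in this abstract (mean key words per paper, most frequent terms, and the share that exactly matches MeSH) can be sketched in Python. The authors used R, so this illustrates the computation rather than reproducing their code. It assumes keywords are given per paper and that "exact match" means string equality after trimming surrounding whitespace; the function name and the MeSH set passed in are hypothetical inputs.

```python
from collections import Counter


def keyword_stats(papers, mesh_headings, top_n=3):
    """Summarize author key words.

    papers: one list of key words per paper.
    mesh_headings: set of MeSH terms to match against (exact match after strip()).
    """
    keywords = [k.strip() for kws in papers for k in kws]
    matched = sum(1 for k in keywords if k in mesh_headings)
    return {
        "mean_per_paper": len(keywords) / len(papers),
        "most_common": Counter(keywords).most_common(top_n),
        "mesh_match_rate": matched / len(keywords),
    }
```

Applied to the reported totals, and taking 542 + 293 = 835 as the total number of key words, the match rate is 542 / 835 ≈ 0.649 and the mean is 835 / 212 ≈ 3.9 key words per paper, consistent with the figures in the abstract.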

Effects of Preschoolers' Visual Perception on Reading Words in Hangul : Application of the Test of Visual Perception for Reading (유아의 시지각 발달과 읽기 : 수.방향.형태항상성 지각이 한글 단어 읽기에 미치는 영향)

  • Choi, Na-Ya
    • Korean Journal of Child Studies / v.30 no.2 / pp.161-177 / 2009
  • In this study of the relationship between preschoolers' visual perception and their reading of Hangul words, the 287 participants showed significant developmental change in visual perception between three and five years of age. The researcher developed the computer-based screening Test of Visual Perception for Reading (TVPR). Factor analysis confirmed three TVPR factors: perception of number, direction, and form constancy. These factors correlated highly with four motor-reduced visual perception factors of the Korean Developmental Test of Visual Perception (Moon et al. 2003). All TVPR factors explained the reading of real words and pseudowords, and direction and form constancy perception predicted the reading of low-frequency letters. These findings confirm that preschoolers' visual perception skills contribute to reading words in Hangul.
