• Title/Summary/Keyword: high-frequency vocabulary


A Study on the Diachronic Evolution of Ancient Chinese Vocabulary Based on a Large-Scale Rough Annotated Corpus

  • Yuan, Yiguo;Li, Bin
    • Asia Pacific Journal of Corpus Research / v.2 no.2 / pp.31-41 / 2021
  • This paper presents a quantitative analysis of the diachronic evolution of ancient Chinese vocabulary based on a large-scale, roughly annotated corpus. The texts of Si Ku Quan Shu (a collection of ancient Chinese books) are automatically segmented to obtain ancient Chinese vocabulary with time information, which is then used to compute statistics on word frequency, standardized type/token ratio, and the proportions of monosyllabic and disyllabic words. The data analysis yields four findings. First, the high-frequency words of ancient Chinese are stable to a certain extent. Second, there is no obvious trend toward disyllabic words in ancient Chinese vocabulary. Third, the Northern and Southern Dynasties (420-589 AD) and the Yuan Dynasty (1271-1368 AD) are probably the two periods with the richest vocabulary in ancient Chinese. Finally, the high-frequency words unique to each dynasty are mainly official titles with real power. The study breaks away from the qualitative methods used in traditional research on the history of the Chinese language and instead uses quantitative methods to draw macroscopic conclusions from a large-scale corpus.
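
The two corpus statistics named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the chunk size, and the toy token list are assumptions made for illustration. The sketch takes the standardized type/token ratio to be the mean type/token ratio over consecutive fixed-size chunks (trailing partial chunks are dropped), and it assumes that one written character corresponds to one syllable, which is a simplification for classical Chinese.

```python
def standardized_ttr(tokens, chunk_size=1000):
    """Mean type/token ratio over consecutive full chunks of chunk_size tokens.

    The chunk size is an assumption; the abstract does not state one.
    A trailing chunk shorter than chunk_size is ignored.
    """
    ratios = []
    for start in range(0, len(tokens) - chunk_size + 1, chunk_size):
        chunk = tokens[start:start + chunk_size]
        ratios.append(len(set(chunk)) / chunk_size)
    if not ratios:
        raise ValueError("need at least chunk_size tokens")
    return sum(ratios) / len(ratios)


def monosyllabic_proportion(tokens):
    """Share of tokens written with exactly one character.

    Assumes one character equals one syllable (a simplification).
    """
    if not tokens:
        raise ValueError("tokens must not be empty")
    return sum(1 for token in tokens if len(token) == 1) / len(tokens)


# Hypothetical segmented text, invented for this example.
sample = ["之", "天", "之", "下", "君子", "天下", "之", "也"]
print(standardized_ttr(sample, chunk_size=4))
print(monosyllabic_proportion(sample))
```

Averaging over equal-sized chunks keeps the ratio comparable across periods whose corpora differ greatly in size, which a raw type/token ratio would not.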

Textbooks Analysis to Select Vocabulary for Mathematics Education: Focusing on 1st and 2nd Graders in the Elementary School (교과서 분석 기반 수학교육용 어휘 선정 연구: 초등학교 1~2학년을 중심으로)

  • Kwon, Misun
    • Communications of Mathematical Education / v.37 no.4 / pp.675-695 / 2023
  • Understanding vocabulary is essential to learning mathematics effectively. To present vocabulary for mathematics education, this study extracted high-frequency vocabulary from the 2009 revised and 2015 revised 1st and 2nd grade mathematics textbooks. The textbooks were analyzed by grade and semester, and vocabulary with a common frequency of 5 or more was extracted. For effective use in school settings, common vocabulary for each grade and intensive vocabulary for each semester were presented. As a result, 61 vocabulary words were selected for first grade education and 121 for second grade education. Analysis by vocabulary level showed that vocabulary of various levels, from grades 1 to 5, was used. Analysis by vocabulary type showed that the proportion of academic words increased similarly, while the proportion of technical words was highest in the first semester of the second year. Based on these results, the extracted vocabulary can serve as a resource for vocabulary instruction in each grade to help students learn mathematics.

Designing a large recording script for open-domain English speech synthesis

  • Kim, Sunhee;Kim, Hojeong;Lee, Yooseop;Kim, Boryoung;Won, Yongkook;Kim, Bongwan
    • Phonetics and Speech Sciences / v.13 no.3 / pp.65-70 / 2021
  • This paper proposes a method for designing a large recording script for open-domain English speech synthesis. For read-aloud style text, 12 domains and 294 sub-domains were designed using text from five different news media publications. For conversational style text, 4 domains and 36 sub-domains were designed using movie subtitles. The final script consists of 43,013 sentences (27,085 read-aloud style and 15,928 conversational style), comprising 549,683 tokens and 38,356 types. The completed script is analyzed using four criteria: word coverage (type coverage and token coverage), high-frequency vocabulary coverage, phonetic coverage (diphone coverage and triphone coverage), and readability. The type coverage of the script reaches 36.86% despite its low token coverage of 2.97%. The high-frequency vocabulary coverage of the script is 73.82%, and the diphone coverage and triphone coverage of the whole script are 86.70% and 38.92%, respectively. The average readability of the sentences is 9.03. The analysis shows that the proposed method is effective in producing a large recording script for English speech synthesis, with good coverage of unique words, high-frequency vocabulary, and phonetic units, and good readability.
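
The abstract does not define its coverage measures. One plausible reading, sketched below in Python, treats coverage as the share of a reference word set that occurs at least once in the script: with the types of a source corpus as the reference this gives a type coverage, and with a high-frequency word list as the reference it gives a high-frequency vocabulary coverage. The function name `coverage` and the toy data are assumptions for illustration, not the authors' code.

```python
def coverage(script_tokens, reference_words):
    """Fraction of distinct reference words that occur in the script.

    Assumed definition; the abstract does not spell out its formula.
    """
    reference = set(reference_words)
    if not reference:
        raise ValueError("reference_words must not be empty")
    return len(reference & set(script_tokens)) / len(reference)


# Hypothetical script and high-frequency word list, invented for this example.
script = "the cat sat on the mat".split()
high_frequency_words = ["the", "cat", "dog", "on"]
print(coverage(script, high_frequency_words))
```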

A study to analyze and improve vocabulary adequacy of field-reviewed textbooks for 1st and 2nd grade elementary school mathematics according to the 2022 revised curriculum (2022 개정 교육과정에 따른 초등학교 1~2학년 수학 교과서 현장검토본의 어휘 적정성 분석 및 개선 연구)

  • Lee, Dae Hyun;Kwon, Misun;Lee, Mi Jin;Sung, Chang-Geun
    • Education of Primary School Mathematics / v.27 no.1 / pp.75-90 / 2024
  • This study analyzed and improved the vocabulary presented in the field-review versions of the 1st and 2nd grade elementary school mathematics textbooks for the 2022 revised curriculum, using a 9th grade vocabulary system. The analysis shows that the frequency of vocabulary not appropriate for the students' level was 6.67% in the first semester of the first year, 12.17% in the second semester of the first year, 11.73% in the first semester of the second year, and 14.19% in the second semester of the second year. This shows that the frequency of vocabulary that may be difficult for students tends to increase across semesters. Based on the analysis, vocabulary that had a high difficulty level but was not essential to the textbook was deleted, and essential vocabulary or vocabulary that was difficult for students was presented with added pictures or revised in more detail. In addition, words that could be replaced by similar words of lower lexical difficulty were replaced. Research on vocabulary difficulty of this kind can identify how vocabulary is used in textbooks and can help develop high-quality textbooks by appropriately modifying vocabulary for effective mathematics learning.

Disease-Related Vocabulary and its translingual practice in Late 19th to Early 20th century (19세기 말 20세기 초 질병 어휘와 언어횡단적 실천)

  • Lee, Eunryoung
    • Journal of Sasang Constitutional Medicine / v.31 no.1 / pp.65-78 / 2019
  • Objectives: This study investigates how Korean disease-related vocabulary is established or changed when it is translated into French or English. Through this, we examine changes in the meaning of diseases and in the ecosystem of disease-related vocabulary in the transition period from the 19th to the 20th century. Methods: Korean disease-related vocabulary is extracted from a total of 148,000 Korean headwords included in our corpus of three bilingual dictionaries. The scope of analysis is limited to the groups of vocabulary that include the high-frequency words disease (病) and symptom (症). Results: The first type of change is the emergence of neologisms, in which case existing vocabulary and new words coexist. The second is the appearance of loan words written in Hangul. The third is a change in the interpretation of meaning while the word form is maintained. The fourth is the appearance of orthographic variants while the meaning of the existing vocabulary is maintained. Discussion: Disease-related vocabulary increased greatly between 1897 and 1931. The factors behind this increase were the emergence of coined words and compound words and the influx of foreign words. Korean and the Western languages created new lexical forms in order to introduce new, unknown concepts to Koreans. We could also confirm how English words expanded their semantic fields by modifying the way the meaning of Korean disease-related vocabulary was represented.

A Comparison of Korean EFL Learners' Oral and Written Productions

  • Lee, Eun-Ha
    • English Language & Literature Teaching / v.12 no.2 / pp.61-85 / 2006
  • The purpose of the present study is to compare Korean EFL learners' speech corpus (i.e., oral productions) with their composition corpus (i.e., written productions). Four college students participated in the study. The composition corpus was collected through a writing assignment, and the speech corpus was gathered by audio-taping their oral presentations. The results of the data analysis indicate that (i) in terms of error frequency, young adult low-intermediate Korean EFL learners made frequent errors in determiners (mostly indefinite articles), vocabulary (mostly semantic errors), and prepositions, and the frequency order did not differ much between the speech corpus and the composition corpus; and (ii) the oral and written productions did not differ much in content, style (i.e., colloquial vs. literary), vocabulary selection, or error types and frequency. Therefore, it is assumed that the oral presentation proficiency of EFL learners at this learning stage depends heavily on how much and how well they are able to write. In other words, EFL learners' writing and speaking skills are closely correlated. This implies that the teacher does not need to separate teaching how to speak from teaching how to write, and may use the same methods or strategies to help learners improve their English speaking and writing skills. Furthermore, it will be more effective to teach writing before speaking, since learners have more opportunities to write than to speak in EFL contexts.

A Diachronic Lexical Analysis of the North Korean English Textbooks (북한 영어 교과서 어휘의 통시적 분석)

  • Kim, Jiyoung;Lee, Je-Young;Kim, Jeong-ryeol
    • The Journal of the Korea Contents Association / v.17 no.4 / pp.331-341 / 2017
  • This paper analyzes the English vocabulary of North Korean textbooks diachronically using a purpose-built English textbook corpus. The North Korean English textbooks, obtained from the Information Center on North Korea of the Ministry of Unification, are divided into those from before and after the Kim Jong-Il era, with 1996, the year in which the curriculum was revised, as the dividing point. They are stored as text files, and their vocabulary is analyzed using WordSmith Tools 7.0. The vocabulary size of the revised textbooks increased after the curriculum reorganization, but the number of vocabulary types and the vocabulary diversity decreased. After the curriculum revision, many words related to the establishment of the Kim Jong-Il system appeared as keywords. Some vocabulary also reflected the economic and social life of North Korea. In addition, a comparison of the 100 high-frequency word list with the keywords suggests that the vocabulary of North Korean English textbooks is gradually shifting from content related to written language toward communicative content.

Analysis on the Use of Picture and Letter Used in the Books of English Vocabulary for Children (아동영문어휘책에 제시된 그림과 문자의 사용에 대한 분석)

  • Lee, Mi-Young
    • The Journal of the Korea Contents Association / v.14 no.1 / pp.150-157 / 2014
  • This study examines how far visual images are utilized in currently published English vocabulary books for children, by analyzing the relationship between picture and letter with children in mind as the users. Accordingly, it reviews the types of picture used in these books, the degree of utilization of pictures, the combination types of picture and letter, and the semantic consistency between picture and letter. The analysis shows that the degree of utilization of pictures is high in general, in the order of illustration, cartoon, and a mix of illustration and cartoon. Among the combination forms of picture and letter, the degree of utilization is highest for picture plus vocabulary, followed by letters without illustration and then pictorial symbol. In particular, higher semantic consistency between picture and letter is more effective for learning; however, semantic consistency is generally low. Among the five groups with higher semantic consistency, the pictorial symbol type is the most frequent combination type. In conclusion, the types of picture and letter presented in currently published English vocabulary books for children are similar across publishing companies, so design research that addresses the diverse levels of children is required.

Korean Broadcast News Transcription Using Morpheme-based Recognition Units

  • Kwon, Oh-Wook;Alex Waibel
    • The Journal of the Acoustical Society of Korea / v.21 no.1E / pp.3-11 / 2002
  • Broadcast news transcription is one of the hardest tasks in speech recognition because broadcast speech signals vary widely in speech quality, channel, and background conditions. We developed a Korean broadcast news speech recognizer. We used a morpheme-based dictionary and language model to reduce the out-of-vocabulary (OOV) rate. We concatenated original morpheme pairs of short length or high frequency in order to reduce insertion and deletion errors due to short morphemes. We used a lexicon with multiple pronunciations to reflect inter-morpheme pronunciation variations without severe modification of the search tree. By using the merged morphemes as recognition units, we achieved an OOV rate of 1.7% with a 64k vocabulary, comparable to that of European languages. We implemented a hidden Markov model-based recognizer with vocal tract length normalization and online speaker adaptation by maximum likelihood linear regression. Experimental results showed that the recognizer yielded a 21.8% morpheme error rate for anchor speech and 31.6% for mostly noisy reporter speech.
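
The out-of-vocabulary rate mentioned in this abstract is conventionally the share of running tokens in held-out text that are missing from the recognizer's vocabulary. The Python sketch below illustrates that measure only; it does not reproduce the paper's morpheme merging. The function names, the vocabulary-building rule (keep the most frequent training units), and the toy data are assumptions for illustration.

```python
from collections import Counter


def build_vocabulary(train_tokens, size):
    """Keep the `size` most frequent units from the training text.

    Hypothetical helper; the paper's vocabulary selection is not described here.
    """
    return {unit for unit, _ in Counter(train_tokens).most_common(size)}


def oov_rate(test_tokens, vocabulary):
    """Fraction of running test tokens that are absent from the vocabulary."""
    if not test_tokens:
        raise ValueError("test_tokens must not be empty")
    missing = sum(1 for token in test_tokens if token not in vocabulary)
    return missing / len(test_tokens)


# Toy data invented for this example.
vocabulary = build_vocabulary(["a", "a", "b", "b", "c"], size=2)
print(oov_rate(["a", "c", "b", "d"], vocabulary))
```

Under this definition, the choice of recognition unit changes both the token stream and the vocabulary, which is why the unit (word, morpheme, or merged morpheme) affects the rate at a fixed vocabulary size.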

Relative Difficulty of Various English Writings by Fuzzy Reasoning and Its Application to Selecting Teaching Materials

  • Ban, Hiromi;Dederick, Toby;Nambo, Hidetaka;Oyabu, Takashi
    • Industrial Engineering and Management Systems / v.3 no.1 / pp.85-91 / 2004
  • The writing styles of TIME and Newsweek are analyzed using a specially developed linguistic program. These two news magazines were chosen because of their wide popularity. The results show that neither the frequency curve of words nor that of characters has changed over the past 60 years. We also found that the frequency curves have some inflection points and that the genre of English writings can be identified from these points. After counting the percentage of vocabulary required of junior high school and high school students in English writings, we derive their relative difficulty using fuzzy reasoning. Fuzzy rules are constructed using features of the characteristic curves. We believe this would be a good guide index when selecting textbooks or supplementary readers.