• Title/Summary/Keyword: language size


Statistical Analysis Between Size and Balance of Text Corpus by Evaluation of the effect of Interview Sentence in Language Modeling (언어모델 인터뷰 영향 평가를 통한 텍스트 균형 및 사이즈간의 통계 분석)

  • Jung Eui-Jung; Lee Youngjik
    • Proceedings of the Acoustical Society of Korea Conference / spring / pp.87-90 / 2002
  • This paper statistically analyzes the relationship between the size and the balance of a text corpus by evaluating the effect of interview sentences on the language model of a Korean broadcast news transcription system. The ultimate purpose of our system is to recognize the speech of anchors and reporters in broadcast news shows, not interview speech. However, interview sentences make up approximately 15% of the text corpus gathered to build the language model. Because interview sentences differ from anchors' and reporters' sentences in various respects, they interfere with anchor- and reporter-oriented language modeling. In this paper, we evaluate the effect of interview sentences on the language model and analyze the relationship between corpus size and balance by repeating the same experimental procedure while varying the corpus size.

Building Hybrid Stop-Words Technique with Normalization for Pre-Processing Arabic Text

  • Atwan, Jaffar
    • International Journal of Computer Science & Network Security / v.22 no.7 / pp.65-74 / 2022
  • In natural language processing, commonly used words such as prepositions are referred to as stop-words; they have no inherent meaning and are therefore ignored in indexing and retrieval tasks. Removing stop-words from Arabic text significantly reduces the size of a text corpus, which improves the effectiveness and performance of Arabic-language processing systems. This study investigated the effectiveness of stop-word list elimination combined with normalization as a preprocessing step. The idea was to merge a statistical method with a linguistic method to attain the best efficacy, and to compare how well this two-pronged approach reduces corpus size for Arabic natural language processing systems. Three stop-word lists were considered: an Arabic Text Lookup Stop-list, a Frequency-based Stop-list using Zipf's law, and a Combined Stop-list. An experiment was conducted on a selected file from the Arabic Newswire data set, comparing the size of the corpus after removing the words contained in each list. The best reduction in size was achieved by the Combined Stop-list with normalization, with a word count reduction of 452,930 and a compression rate of 30%.
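A minimal sketch of the three stop-list strategies follows, assuming whitespace tokenization. It is not the author's implementation: the normalization rules (removing diacritics and tatweel, unifying alef variants, mapping taa marbuta to haa and alef maqsura to yaa) are common Arabic preprocessing steps assumed here, the short lookup list and the `top_k` cutoff for the Zipf-based list are hypothetical, and the compression rate is assumed to be the fraction of tokens removed.

```python
import re
from collections import Counter

# Assumed normalization rules (common in Arabic IR, not taken from the paper).
DIACRITICS = re.compile("[\u064B-\u0652\u0640]")     # tanween, harakat, shadda, sukun, tatweel
ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")   # alef with madda / hamza above / hamza below


def normalize(token: str) -> str:
    token = DIACRITICS.sub("", token)
    token = ALEF_VARIANTS.sub("\u0627", token)   # -> bare alef
    token = token.replace("\u0629", "\u0647")    # taa marbuta -> haa
    return token.replace("\u0649", "\u064A")     # alef maqsura -> yaa


# Hypothetical linguistic lookup list, normalized the same way as the text.
LOOKUP_STOPLIST = {normalize(w) for w in ["في", "من", "إلى", "على", "و"]}


def zipf_stoplist(tokens, top_k):
    """Statistical list: the top_k most frequent word types in the corpus."""
    return {word for word, _ in Counter(tokens).most_common(top_k)}


def combined_stoplist(tokens, top_k):
    """Union of the linguistic lookup list and the frequency-based list."""
    return LOOKUP_STOPLIST | zipf_stoplist(tokens, top_k)


def remove_stopwords(tokens, stoplist):
    return [t for t in tokens if t not in stoplist]


def compression_rate(before, after):
    """Fraction of tokens removed (assumed definition)."""
    return 1 - len(after) / len(before)


text = "ذهب الولد إلى المدرسة في الصباح و عاد من المدرسة إلى البيت"
tokens = [normalize(t) for t in text.split()]
kept = remove_stopwords(tokens, combined_stoplist(tokens, top_k=2))
print(len(tokens), len(kept), round(compression_rate(tokens, kept), 2))  # 12 5 0.58
```

The frequency-based list relies on the Zipfian observation that a few very frequent types account for a large share of all tokens. On a toy sentence like this one the cutoff also catches a content word ("school"), which shows why the choice of `top_k` matters.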

A Language Model based on VCCV of Sentence Speech Recognition (문장 음성 인식을 위한 VCCV기반의 언어 모델)

  • 박선희; 홍광석
    • Proceedings of the IEEK Conference / 2003.07e / pp.2419-2422 / 2003
  • To improve the performance of sentence speech recognition systems, both the perplexity of the language model and the number of dictionary words must be considered as the vocabulary grows. In this paper, we propose a language model based on VCCV units for sentence speech recognition. We choose VCCV units as the processing unit of the language model and compare them with clauses and morphemes. Clauses and morphemes have large vocabularies and high perplexity, whereas VCCV units have a small lexicon and a limited vocabulary; the advantage of VCCV units is their low perplexity. We built bigram language models from the given text and calculated the perplexity of each processing unit. The perplexity of VCCV units is lower than that of morphemes and clauses.
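Comparing processing units by bigram perplexity can be sketched as below. This is an illustrative sketch, not the authors' code: the abstract does not state a smoothing method, so add-one (Laplace) smoothing and the `<s>`/`</s>` boundary markers are assumptions. Each sentence is passed in as a list of units, so the same functions apply whether the text is segmented into clauses, morphemes, or VCCV units.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count bigrams over unit sequences, with <s>/</s> sentence boundaries."""
    contexts, bigrams = Counter(), Counter()
    for sent in sentences:
        toks = ["<s>"] + list(sent) + ["</s>"]
        contexts.update(toks[:-1])                 # every unit that precedes another
        bigrams.update(zip(toks[:-1], toks[1:]))
    # Possible next units: every training unit plus the end marker.
    vocab = {u for sent in sentences for u in sent} | {"</s>"}
    return contexts, bigrams, vocab


def perplexity(model, sentences):
    """Per-unit perplexity with add-one smoothing (an assumed choice)."""
    contexts, bigrams, vocab = model
    v = len(vocab)
    log_prob, n = 0.0, 0
    for sent in sentences:
        toks = ["<s>"] + list(sent) + ["</s>"]
        for prev, cur in zip(toks[:-1], toks[1:]):
            p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + v)
            log_prob += math.log(p)
            n += 1
    return math.exp(-log_prob / n)
```

Add-one smoothing was chosen only for brevity; it keeps every probability nonzero, so unseen unit pairs in test text do not make the perplexity infinite.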

Joint Attention and Language Development in Infants from Multi-Cultural Families (다문화 가정 유아들의 함께 주의하기와 언어발달)

  • Park, Young-Shin
    • Korean Journal of Child Studies / v.31 no.6 / pp.35-50 / 2010
  • Joint attention, language development, and the relationship between these two variables were compared in infants from multi-cultural and Korean families. Joint attention was observed both with the Early Social Communication Scale (ESCS) and in infant-mother free play. Language development was evaluated with the MacArthur-Bates Communicative Development Inventory-Korean. There were no group differences in initiating or responding to joint attention in the ESCS. However, in infant-mother free play, infants from multi-cultural families had fewer and shorter joint attention episodes than Korean infants. Both their expressive and receptive vocabularies were also smaller than those of Korean infants. Among Korean infants, the mean duration of joint attention episodes in free play correlated significantly and positively with expressive vocabulary size, and initiating joint attention in the ESCS correlated significantly and positively with receptive vocabulary size. However, no joint attention measure showed a significant relationship with either expressive or receptive vocabulary size in infants from multi-cultural families.

Generalization of Syntax-Directed Compiling by Precedance Analyzer (씬택스 컴파일링의 연산자식 분석기를 통한 일반화)

  • Young Taik Kim
    • 전기의세계 / v.22 no.6 / pp.5-7 / 1973
  • This paper describes a new technique for syntax-directed compilation of the ALGOL 60 programming language. The relatively large language specification is divided into two parts: the language body and the language branch. To minimize compilation time, the language body is compiled with the syntax-directed technique and the language branch with the precedence method. The boundary between the two portions was also studied for optimization.
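The abstract does not say which constructs form the "language branch." Assuming it covers arithmetic expressions, the precedence half of the split can be sketched with Dijkstra's shunting-yard algorithm, a classic operator-precedence technique that translates infix expressions to postfix. This is a swapped-in illustration, not the paper's method; the token set and the precedence table (`*` and `/` binding tighter than `+` and `-`, all left-associative) are assumptions.

```python
import re

# Assumed precedence table; all operators are binary and left-associative.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def tokenize(expr):
    """Split into integers, identifiers, operators, and parentheses."""
    return re.findall(r"\d+|[A-Za-z]\w*|[-+*/()]", expr)


def to_postfix(tokens):
    """Shunting-yard: operator-precedence translation of infix to postfix."""
    output, stack = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            # Pop operators that bind at least as tightly (left associativity).
            while stack and stack[-1] in PRECEDENCE and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]:
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise SyntaxError("unbalanced ')'")
            stack.pop()  # discard the matching "("
        else:
            output.append(tok)  # operand goes straight to the output
    while stack:
        op = stack.pop()
        if op == "(":
            raise SyntaxError("unbalanced '('")
        output.append(op)
    return output


print(" ".join(to_postfix(tokenize("a + b * (c - 2)"))))  # a b c 2 - * +
```

A table-driven routine like this needs no grammar-specific recursion, which is the usual argument for handing expressions to a precedence method while a syntax-directed parser handles statement structure.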

Immersive Learning Technologies in English Language Teaching: A Meta-Analysis

  • Altun, Hamide Kubra; Lee, Jeongmin
    • International Journal of Contents / v.16 no.3 / pp.18-32 / 2020
  • The aim of this study was to perform a meta-analysis of the learning outcomes of immersive learning technologies in English language teaching (ELT). This study examined 12 articles, yielding a total of 20 effect sizes. The Comprehensive Meta-Analysis (CMA) program was employed for data analysis. The findings revealed that the overall effect size was 0.84, implying a large effect size. Additionally, the mean effect sizes of the dependent variables revealed a large effect size for both the cognitive and affective domains. Furthermore, the study analyzed the impact of moderator variables such as sample scale, technology type, tool type, work type, program type, duration (sessions), the degree of immersion, instructional technique, and augmented reality (AR) type. Among the moderators, the degree of immersion was found to be statistically significant. In conclusion, the study results suggested that immersive learning technologies had a positive impact on learning in ELT.
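For readers unfamiliar with the metric, a per-study standardized mean difference can be computed as below. This is a generic sketch, not the internals of the CMA program: it assumes a two-group design reporting means, standard deviations, and sample sizes, applies Hedges' small-sample correction, and labels magnitudes with Cohen's conventional cut-offs (0.2 small, 0.5 medium, 0.8 large), under which the reported overall value of 0.84 counts as large.

```python
import math


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference with Hedges' small-sample correction."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / pooled_sd            # Cohen's d
    j = 1 - 3 / (4 * (n1 + n2) - 9)      # correction factor J = 1 - 3 / (4*df - 1)
    return j * d


def cohen_label(effect_size):
    """Cohen's conventional magnitude labels."""
    size = abs(effect_size)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"


print(cohen_label(0.84))  # large
```

The correction factor J shrinks d slightly and matters most for small samples, which are common in classroom intervention studies.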

Phonological variability with consonant inventory size in late-talkers and normal children (말 늦은 아동과 일반 아동의 자음 목록 크기에 따른 음운변이성)

  • Kim, Hyejin; Lee, Ran; Lee, Eunju
    • Phonetics and Speech Sciences / v.7 no.3 / pp.175-181 / 2015
  • This study compares consonant inventory size and phonological variability between late-talkers and typically developing children matched on expressive language age, in order to examine their phonological development and characteristics, and considers the correlation between the two measures. The participants were fifteen late-talkers and fifteen typically developing, expressive-language-age-matched children (TED group). The results are as follows. First, consonant inventory size differed significantly between the groups: the late-talkers' inventories were smaller than those of the TED group. Second, phonological variability also differed significantly: the late-talkers' variability was higher than that of the TED group. Third, there was no significant correlation between consonant inventory size and phonological variability in late-talkers, whereas the TED group showed a significant negative correlation between the two. Therefore, phonological ability should be considered in the evaluation of and intervention for late-talkers.

A Retrospective, Quantitative Review of the ETAK Journals

  • Lee, Eunpyo; Shin, Myeong-Hee
    • English Language & Literature Teaching / v.18 no.3 / pp.135-148 / 2012
  • This is a retrospective, quantitative review of the English Teachers Association in Korea (ETAK) and its journals over the 18 years since its establishment in August 1994. It examines the history of the association, its domestic and international conferences, and, most importantly, its articles. The purpose was to learn how ETAK developed into a full-fledged organization, which language has been preferred for its articles, how the volume size has changed, and how many articles foreign scholars have contributed. It also examined the number of authors per article to trace the trend toward collaborative work in English education. Research topics were classified into the four language skills, grammar and vocabulary, literature, and linguistics, with all remaining areas categorized as "others." Based on the results, suggestions are made for the future of ETAK in Korean English teaching.

A Semi-supervised Learning of HMM to Build a POS Tagger for a Low Resourced Language

  • Pattnaik, Sagarika; Nayak, Ajit Kumar; Patnaik, Srikanta
    • Journal of information and communication convergence engineering / v.18 no.4 / pp.207-215 / 2020
  • Part-of-speech (POS) tagging is an indispensable part of major NLP models. Progress is evident for a number of languages around the globe, especially European languages. Indian languages, however, have not seen a major breakthrough due to a lack of supporting tools and resources, and Odia in particular has yet to make its mark. To make Odia fit for different NLP operations, this paper attempts to develop a POS tagger for the language on a Hidden Markov Model (HMM) platform. The tagger uses a bigram HMM with the dynamic-programming Viterbi algorithm to produce annotated output text with maximum accuracy. The model is tested on a tourism-domain corpus of approximately 0.2 million tokens. With a 3:1 ratio of training to testing data, the proposed model achieves satisfactory results despite the limited training size.
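A bigram HMM tagger finds the most likely tag sequence with the Viterbi dynamic-programming recursion. The sketch below is generic, not the authors' Odia tagger: the tag names `N`/`V`, the English words, and all probabilities are hypothetical toy values, scores are kept in log space, and the semi-supervised estimation step from the title is not shown.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most likely tag sequence for `words` under a bigram HMM."""

    def lg(p):
        # Work in log space; floor unseen events instead of taking log(0).
        return math.log(max(p, floor))

    # best[i][t]: log-score of the best path that ends in tag t at position i
    best = [{t: lg(start_p.get(t, 0.0)) + lg(emit_p[t].get(words[0], 0.0)) for t in tags}]
    back = [{}]  # back[i][t]: previous tag on that best path
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            # Bigram assumption: the transition score depends only on the previous tag.
            prev = max(tags, key=lambda s: best[i - 1][s] + lg(trans_p[s].get(t, 0.0)))
            best[i][t] = (best[i - 1][prev] + lg(trans_p[prev].get(t, 0.0))
                          + lg(emit_p[t].get(words[i], 0.0)))
            back[i][t] = prev
    # Trace back from the highest-scoring final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Hypothetical toy model.
TAGS = ["N", "V"]
START = {"N": 0.8, "V": 0.2}
TRANS = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
EMIT = {"N": {"dogs": 0.6, "bark": 0.1}, "V": {"dogs": 0.05, "bark": 0.7}}
print(viterbi(["dogs", "bark"], TAGS, START, TRANS, EMIT))  # ['N', 'V']
```

In a real tagger, `START`, `TRANS`, and `EMIT` would be estimated from the annotated training portion (the 3:1 split in the abstract) rather than set by hand.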

A Meta-Analysis on the Effects of Activities Using Picture Books on Language Development in Young Children (그림책을 활용한 활동이 유아의 언어발달에 미치는 효과에 대한 메타분석)

  • Shim, Gyeong-Hwa; Lim, Yangmi; Park, Eun-Young
    • Korean Journal of Childcare and Education / v.15 no.4 / pp.115-134 / 2019
  • Objective: This study aimed to analyze, by meta-analysis, the effects of activities using picture books on young children's language development and to identify factors that cause differences in these effects. Methods: We conducted a homogeneity test of effect sizes on 21 Korean studies published in academic journals from 1990 to February 2018 and calculated the effect size with a random-effects model. Additionally, we conducted a meta-ANOVA to investigate whether the effect sizes differed by type of language development, type of picture book activity, and environmental variables such as place, time, and agent. Results: The effect sizes of the 21 studies were heterogeneous, and the total effect size was 0.90, a large effect according to Cohen's criteria. The effect sizes also varied by type of language development, picture book activity, and environmental variables. Conclusion/Implications: To increase the effects of picture book activities on young children's language development, this study suggests integrating picture book activities with other play areas, teaching methods, and other print materials to develop literacy abilities, and linking homes with early childhood education institutions.
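The homogeneity test and random-effects pooling described in the Methods can be sketched with Cochran's Q and the DerSimonian-Laird estimator, a common choice for random-effects models. The abstract does not name its estimator, so that choice is an assumption, as are the toy effect sizes and variances in the tests. Q is compared with a chi-square distribution on k - 1 degrees of freedom; a significant Q indicates heterogeneous effect sizes, as the study reports.

```python
import math


def random_effects(effects, variances):
    """DerSimonian-Laird random-effects pooling with Cochran's Q statistic."""
    w = [1 / v for v in variances]                                   # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))    # Cochran's Q
    df = len(effects) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                                    # between-study variance
    w_star = [1 / (v + tau2) for v in variances]                     # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, se, q, df, tau2
```

When Q does not exceed its degrees of freedom, tau² is truncated to zero and the result reduces to the fixed-effect estimate; heterogeneity widens the standard error by spreading weight more evenly across studies.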