• Title/Summary/Keyword: word coverage

24 search results

Vocabulary Analysis of Listening and Reading Texts in 2020 EBS-linked Textbooks and CSAT (2020년 EBS 연계교재와 대학수학능력시험의 듣기 및 읽기 어휘 분석)

  • Kang, Dongho
    • The Journal of the Korea Contents Association / v.20 no.10 / pp.679-687 / 2020
  • The present study investigates the lexical coverage of BNC (British National Corpus) word lists and the Ministry of Education's 2015 Basic Vocabulary in the 2020 EBS-linked textbooks and CSAT. For the data analysis, AntWordProfiler was used to find lexical coverage and frequency. The findings showed that students can understand 95% of the tokens with a vocabulary of 3,000 and 4,000 BNC word families in the 2020 EBS-linked listening and reading books, respectively. 98% coverage is reached with 4,000 word families in the EBS-linked listening book, while the same coverage requires 8,000 word families in the EBS-linked reading textbook. In the 2020 CSAT, 95% of the tokens can be understood with 2,000 and 4,000 word families in the listening and reading tests respectively, while 98% requires 4,000 and 7,000 word families respectively. In summary, students need to know more words for the 2020 EBS-linked textbooks than for the 2020 CSAT tests, confirming Kim's (2016) findings.
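Lexical coverage of the kind AntWordProfiler reports is the percentage of a text's running words (tokens) that fall within a given word list. A minimal sketch of that computation, with a made-up text and word list for illustration (it treats each word form independently, ignoring the word-family grouping BNC lists actually use):

```python
from collections import Counter

def lexical_coverage(text_tokens, word_list):
    """Percentage of running words (tokens) in a text that
    belong to a given vocabulary list."""
    counts = Counter(t.lower() for t in text_tokens)
    total = sum(counts.values())
    covered = sum(n for w, n in counts.items() if w in word_list)
    return 100.0 * covered / total

# Toy example: a 10-token "text" and a tiny "high-frequency" list.
text = "the test was easy because the words in the test".split()
high_freq = {"the", "was", "in", "because", "easy"}
print(lexical_coverage(text, high_freq))  # prints 70.0
```

Seven of the ten tokens ("the" three times, plus "was", "easy", "because", "in") are on the list, so coverage is 70%; the 95%/98% thresholds in the abstract are the levels commonly taken as sufficient for adequate and comfortable comprehension.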

Vocabulary Coverage Improvement for Embedded Continuous Speech Recognition Using Part-of-Speech Tagged Corpus (품사 부착 말뭉치를 이용한 임베디드용 연속음성인식의 어휘 적용률 개선)

  • Lim, Min-Kyu;Kim, Kwang-Ho;Kim, Ji-Hwan
    • MALSORI / no.67 / pp.181-193 / 2008
  • In this paper, we propose a vocabulary coverage improvement method for embedded continuous speech recognition (CSR) using a part-of-speech (POS) tagged corpus. We investigate the 152 POS tags defined in the Lancaster-Oslo-Bergen (LOB) corpus and word-POS tag pairs, and derive a new vocabulary through word addition. Words paired with some POS tags have to be included in vocabularies of any size, while the inclusion of words paired with other POS tags varies with the target vocabulary size. The 152 POS tags are categorized according to whether the word addition depends on the size of the vocabulary. Using expert knowledge, we classify the POS tags first, and then apply different word-addition strategies based on the POS tags paired with the words. The performance of the proposed method is measured in terms of coverage and compared with those of vocabularies of the same size (5,000 words) derived from frequency lists. The proposed method achieves 95.18% coverage on the test short message service (SMS) text corpus, while the conventional vocabularies cover only 93.19% and 91.82% of the words appearing in the same corpus.
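The split between size-independent and size-dependent word addition can be sketched as follows; the tag names, toy corpus, and function are hypothetical illustrations, not the paper's actual LOB tag set or procedure:

```python
from collections import Counter

def build_vocabulary(tagged_corpus, mandatory_tags, target_size):
    """tagged_corpus: list of (word, pos_tag) pairs.
    Words carrying a mandatory POS tag are always included;
    the remaining budget is filled by corpus frequency."""
    mandatory = {w for w, t in tagged_corpus if t in mandatory_tags}
    freq = Counter(w for w, t in tagged_corpus if w not in mandatory)
    remaining = max(target_size - len(mandatory), 0)
    by_freq = [w for w, _ in freq.most_common(remaining)]
    return mandatory | set(by_freq)

# Toy tagged corpus: articles and interjections always included,
# nouns/verbs/adverbs compete by frequency for the remaining slots.
corpus = [("the", "ART"), ("send", "VB"), ("message", "NN"),
          ("the", "ART"), ("ok", "UH"), ("message", "NN"), ("now", "RB")]
vocab = build_vocabulary(corpus, mandatory_tags={"ART", "UH"}, target_size=4)
```

With a 4-word budget, "the" and "ok" enter regardless of frequency, and the frequency-ranked remainder fills the last two slots; only the second group would change if the target size changed.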


Designing a large recording script for open-domain English speech synthesis

  • Kim, Sunhee;Kim, Hojeong;Lee, Yooseop;Kim, Boryoung;Won, Yongkook;Kim, Bongwan
    • Phonetics and Speech Sciences / v.13 no.3 / pp.65-70 / 2021
  • This paper proposes a method for designing a large recording script for open-domain English speech synthesis. For read-aloud style text, 12 domains and 294 sub-domains were designed using text from five different news media publications. For conversational style text, 4 domains and 36 sub-domains were designed using movie subtitles. The final script consists of 43,013 sentences (27,085 read-aloud style and 15,928 conversational style), comprising 549,683 tokens and 38,356 types. The completed script is analyzed using four criteria: word coverage (type coverage and token coverage), high-frequency vocabulary coverage, phonetic coverage (diphone coverage and triphone coverage), and readability. The type coverage of the script reaches 36.86% despite its low token coverage of 2.97%. The high-frequency vocabulary coverage of the script is 73.82%, and the diphone and triphone coverage of the whole script are 86.70% and 38.92%, respectively. The average readability of the sentences is 9.03. The results show that the proposed method is effective in producing a large recording script for English speech synthesis, with good coverage in terms of unique words, high-frequency vocabulary, phonetic units, and readability.
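Type coverage and token coverage measure different things. Under one common definition (the paper's exact computation may differ): type coverage is the share of distinct reference-corpus words that appear in the script, while token coverage is the share of the reference corpus's running words those script words account for. A script weighted toward rarer words can show far higher type coverage than token coverage, as in the 36.86% vs. 2.97% figures above; a tiny made-up example:

```python
from collections import Counter

def type_and_token_coverage(script_tokens, reference_tokens):
    """Coverage of a reference corpus by a recording script:
    type coverage  = % of distinct reference words present in the script;
    token coverage = % of reference running words those words account for."""
    script_types = set(script_tokens)
    ref_counts = Counter(reference_tokens)
    types_covered = sum(1 for w in ref_counts if w in script_types)
    tokens_covered = sum(n for w, n in ref_counts.items() if w in script_types)
    type_cov = 100.0 * types_covered / len(ref_counts)
    token_cov = 100.0 * tokens_covered / sum(ref_counts.values())
    return type_cov, token_cov

# Reference: one very frequent word "a" (8 tokens) plus two rare ones.
# A script containing only the rare words covers 2 of 3 types (66.7%)
# but only 2 of 10 tokens (20.0%).
print(type_and_token_coverage(["b", "c"], ["a"] * 8 + ["b", "c"]))
```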

Vocabulary Coverage Improvement for Embedded Continuous Speech Recognition Using Knowledgebase (지식베이스를 이용한 임베디드용 연속음성인식의 어휘 적용률 개선)

  • Kim, Kwang-Ho;Lim, Min-Kyu;Kim, Ji-Hwan
    • MALSORI / v.68 / pp.115-126 / 2008
  • In this paper, we propose a vocabulary coverage improvement method for embedded continuous speech recognition (CSR) using a knowledgebase. A vocabulary in CSR is normally derived from a word frequency list, so vocabulary coverage depends on the corpus. In previous research, we presented an improved way of vocabulary generation using a part-of-speech (POS) tagged corpus. We analyzed all words paired with 101 of the 152 POS tags and decided on a set of words that have to be included in vocabularies of any size. However, for the other 51 POS tags (e.g. nouns, verbs), the inclusion of words paired with those tags is still based on word frequency counted on a corpus. In this paper, we propose a corpus-independent word inclusion method for noun-, verb-, and named entity (NE)-related POS tags using a knowledgebase. For noun-related POS tags, we generate synonym groups and analyze their relative importance using Google search. Then, we categorize verbs by lemma and analyze the relative importance of each lemma from pre-analyzed statistics for verbs. We determine the inclusion order of NEs through Google search. The proposed method shows better coverage for the test short message service (SMS) text corpus.


A Test Algorithm for Word-Line and Bit-line Sensitive Faults in High-Density Memories (고집적 메모리에서 Word-Line과 Bit-Line에 민감한 고장을 위한 테스트 알고리즘)

  • 강동철;양명국;조상복
    • Journal of the Institute of Electronics Engineers of Korea SD / v.40 no.4 / pp.74-84 / 2003
  • Conventional test algorithms do not effectively detect faults caused by word-line and bit-line coupling noise, which grows with memory density. In this paper, the possibility of faults caused by word-line coupling noise is shown, and a new fault model, WLSF (Word-Line Sensitive Fault), is proposed. We also introduce an algorithm that considers word-line and bit-line coupling noise simultaneously. The algorithm increases the probability of exciting such faults, which means improved fault coverage and a more effective test compared to conventional algorithms. The proposed algorithm also covers the conventional basic faults (stuck-at faults, transition faults, and coupling faults) within a five-cell physical neighborhood.

A Corpus-based English Syntax Academic Word List Building and its Lexical Profile Analysis (코퍼스 기반 영어 통사론 학술 어휘목록 구축 및 어휘 분포 분석)

  • Lee, Hye-Jin;Lee, Je-Young
    • The Journal of the Korea Contents Association / v.21 no.12 / pp.132-139 / 2021
  • This corpus-driven study compiled the most frequently occurring academic words in the domain of syntax and compared the extracted word list with Coxhead's (2000) Academic Word List (AWL) and West's (1953) General Service List (GSL) to examine their distribution and coverage within the syntax corpus. A specialized 546,074-token corpus, composed of widely used, must-read syntax textbooks for English education majors, was analyzed with AntWordProfiler 1.4.1. By lexical frequency, the analysis identified 288 (50.5%) AWL word forms that appeared 16 times or more, and 218 (38.2%) AWL items that occurred 15 times or fewer. The analysis also indicated that AWL and GSL coverage accounted for 9.19% and 78.92% respectively, and that GSL and AWL combined amounted to 88.11% of all tokens. Given that the AWL serves broad disciplinary needs, this study highlights the need to compile domain-specific academic word lists as lexical repertoires to promote academic literacy and competence.

Parallel Testing Circuits with Versatile Data Patterns for SOP Image SRAM Buffer (SOP Image SRAM Buffer용 다양한 데이터 패턴 병렬 테스트 회로)

  • Jeong, Kyu-Ho;You, Jae-Hee
    • Journal of the Institute of Electronics Engineers of Korea SD / v.46 no.9 / pp.14-24 / 2009
  • A memory cell array and peripheral circuits are designed for a system-on-panel style frame buffer. Moreover, a parallel test methodology that tests multiple blocks of memory cells is proposed to overcome the low yield of system-on-panel processing technologies. It is capable of faster fault detection than conventional memory tests and is also applicable to tests of various embedded memories and conventional SRAMs. Various patterns of conventional test vectors can be used to enhance fault coverage. The proposed testing method is also applicable to hierarchical bit lines and divided word lines, which are among the design trends of recent memory architectures.

Korean Part-of-Speech Tagging System Using Resolution Rules for Individual Ambiguous Word (어절별 중의성 해소 규칙을 이용한 혼합형 한국어 품사 태깅 시스템)

  • Park, Hee-Geun;Ahn, Young-Min;Seo, Young-Hoon
    • Journal of KIISE: Computing Practices and Letters / v.13 no.6 / pp.427-431 / 2007
  • In this paper we describe a Korean part-of-speech tagging approach using resolution rules for individual ambiguous words together with statistical information. Our approach resolves lexical ambiguities by common rules, rules for individual ambiguous words, and a statistical method. Common rules cover idioms and phrases in common use, including phrases composed of main and auxiliary verbs. We built resolution rules for each word that has several distinct morphological analysis results to improve tagging accuracy. Each rule may reference the morphemes, morphological tags, and/or word senses not only of the ambiguous word itself but also of the words around it. A statistical approach based on HMM is then applied to ambiguous words that are not resolved by rules. Experiments show that this part-of-speech tagging approach achieves high accuracy and broad coverage.

An Analysis of the Perception of News coverage about Inclusive Education Using Big Data (빅데이터를 활용한 통합교육 언론보도에 대한 인식분석)

  • Juhyang Kim;Jeongrang Kim
    • Journal of The Korean Association of Information Education / v.26 no.6 / pp.543-552 / 2022
  • This study analyzed social perception of news coverage of inclusive education using big data analysis techniques. News articles were collected according to the five-year policy periods for the development of special education, and the news big data was analyzed. The frequency of media reports increased steadily from the first policy period (beginning in 1998) to the fifth (through 2022). Over this time, the top topic words in news coverage shifted from words conceptualizing simple definitions to words expressing the active will of students with disabilities toward an actual right to education. In addition, sentiment analysis of the overall keywords in inclusive education news coverage found a high ratio of positive words. The study shows that interest in news coverage of inclusive education is increasing quantitatively in line with changes in special education policy, and that demand for inclusive education is becoming concrete in the direction of guaranteeing the actual right to education of students with disabilities.

Design of Data Retention Test Circuit for Large Capacity DRAMs (대용량 Dynamic RAM의 Data Retention 테스트 회로 설계)

  • 설병수;김대환;유영갑
    • Journal of the Korean Institute of Telematics and Electronics A / v.30A no.9 / pp.59-70 / 1993
  • An efficient test method based on the march test is presented to cover line-leakage failures associated with the bit and word lines of mega-bit DRAM chips. A modified column march (Y-march) pattern is derived to improve fault coverage for data retention failures. A time-delay concept is introduced to develop a new column march test algorithm that detects various data retention failures. A built-in test circuit based on the column march pattern is designed and verified using logic simulation, confirming correct test operation.
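The paper's Y-march pattern with inserted delays targets retention failures specifically. As a generic illustration of how any march test sweeps a memory and flags a faulty cell, here is a sketch of the standard March C- sequence over a simulated memory; the 8-cell model and stuck-at-0 fault are made up for illustration, and real retention testing would additionally insert pauses between march elements:

```python
def march_c_minus(read, write, n):
    """Generic March C- test over an n-cell memory exposed through
    read(addr) and write(addr, bit). Returns failing addresses."""
    fails = set()

    def sweep(addrs, ops):
        # Apply the element's read/write operations to each cell in order.
        for a in addrs:
            for op, bit in ops:
                if op == "r":
                    if read(a) != bit:
                        fails.add(a)
                else:
                    write(a, bit)

    up, down = range(n), range(n - 1, -1, -1)
    sweep(up,   [("w", 0)])             # ⇕(w0)
    sweep(up,   [("r", 0), ("w", 1)])   # ⇑(r0,w1)
    sweep(up,   [("r", 1), ("w", 0)])   # ⇑(r1,w0)
    sweep(down, [("r", 0), ("w", 1)])   # ⇓(r0,w1)
    sweep(down, [("r", 1), ("w", 0)])   # ⇓(r1,w0)
    sweep(down, [("r", 0)])             # ⇕(r0)
    return sorted(fails)

# Simulated 8-cell memory with cell 3 stuck at 0.
mem = [0] * 8
def write(a, b): mem[a] = 0 if a == 3 else b
def read(a): return mem[a]
print(march_c_minus(read, write, 8))  # prints [3]
```

The stuck cell is caught when an element writes 1 and a later element expects to read that 1 back; a fault-free memory passes every read and returns an empty list.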
