• Title/Summary/Keyword: word decoding

Search Results: 56

Examining Line-breaks in Korean Language Textbooks: the Promotion of Word Spacing and Reading Skills (한국어 교재의 행 바꾸기 -띄어쓰기와 읽기 능력의 계발 -)

  • Cho, In Jung;Kim, Danbee
    • Journal of Korean language education
    • /
    • v.23 no.1
    • /
    • pp.77-100
    • /
    • 2012
  • This study investigates issues related to text segmenting, in particular line breaks, in Korean language textbooks. Research on L1 and L2 reading has shown that readers process texts by chunking (grouping words into phrases or meaningful syntactic units) and that phrase-cued texts are therefore helpful for readers whose syntactic knowledge has not yet been fully developed. It is thus important for language textbooks, particularly those for beginner and intermediate learners, to avoid awkward syntactic divisions at the end of a line. According to our analysis of a number of major Korean language textbooks for beginner-level learners, however, many textbooks display line breaks at awkward syntactic divisions. Moreover, some textbooks contain frequent instances where a single word (or eojeol, in the case of Korean) is split across lines. This can hamper not only learners' acquisition of the rules for spacing between eojeols in Korean but also their development of automatic word recognition, an essential part of the reading process. Based on the findings of our textbook analysis and of existing research on reading, this study suggests ways to avoid awkward line breaks in Korean language textbooks.

Analysis of an LDPC Code in the VDSL System (VDSL 시스템에서의 LDPC 코드 연구)

  • Joh, Kyung-Hyun;Kang, Hee-Hoon;Yi, Sang-Hoi;Na, Kuk-Hwan
    • Proceedings of the IEEK Conference
    • /
    • 2006.06a
    • /
    • pp.999-1000
    • /
    • 2006
  • The LDPC code is attracting attention as a powerful FEC (Forward Error Correction) code for 4G mobile communication systems. LDPC codes are used to minimize channel errors, modeling the VDSL system as an AWGN channel. With long code words and iterative decoding, the performance of LDPC codes is better than that of turbo codes. LDPC codes are encoded with a sparse parity-check matrix, and several decoding algorithms exist for them: bit flipping, message passing, and sum-product. Because LDPC codes use low-density parity bits, their mathematical complexity is low and the associated processing time is shortened.

  • PDF
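The bit-flipping algorithm named in the abstract can be sketched in a few lines. The parity-check matrix and error pattern below are a toy illustration, not taken from the paper:

```python
# Hard-decision bit-flipping decoding for an LDPC-style code: each round,
# evaluate all parity checks and flip the bit involved in the most
# unsatisfied checks. H below is a small textbook-style example.

def bit_flip_decode(H, r, max_iters=20):
    n = len(r)
    word = list(r)
    for _ in range(max_iters):
        # Rows of H whose parity check fails over GF(2).
        unsatisfied = [row for row in H
                       if sum(h * b for h, b in zip(row, word)) % 2 == 1]
        if not unsatisfied:
            return word  # all checks satisfied: valid codeword
        # Count, per bit, how many failing checks it participates in.
        votes = [sum(row[j] for row in unsatisfied) for j in range(n)]
        word[votes.index(max(votes))] ^= 1  # flip the most-suspect bit
    return word  # decoding failure after max_iters

# Toy (7,4) parity-check matrix, used here only for illustration.
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]
received = [0, 0, 0, 0, 0, 0, 0]
received[2] ^= 1                     # inject a single channel error
print(bit_flip_decode(H, received))  # recovers the all-zero codeword
```

For code words this short, turbo codes would normally win; the abstract's claim concerns long code words, where sparse checks keep the per-iteration cost low.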

Acoustic Characteristics and Pitch Accent Realization in English Elliptical Sentences - VP-ellipsis, sluicing, gapping - (영어 생략구문의 음성적 특성과 피치악센트 실현 양상-동사구 생략, 슬루싱, 공소화를 중심으로-)

  • Kim, Hee-Sung
    • Speech Sciences
    • /
    • v.11 no.2
    • /
    • pp.119-136
    • /
    • 2004
  • Ellipsis is the figure of speech characterized by the deliberate omission of words that are obviously understood but must be supplied to make a construction grammatically or semantically complete. The purpose of this study is to examine how ellipsis affects its adjacent elements acoustically and phonologically in English VP-ellipsis, sluicing, and gapping. In the experiment, the realizations by English native speakers were set as the criteria for observation and compared with Korean speakers' realizations. The results show that while English native speakers utilized various acoustic cues, such as word duration and pitch range, together with phonological cues such as pitch accent realization, to signal the missing constituent for decoding, Korean learners of English relied only on duration and could not use the various cues effectively.

  • PDF

A high speed huffman decoder using new ternary CAM (새로운 Ternary CAM을 이용한 고속 허프만 디코더 설계)

  • 이광진;김상훈;이주석;박노경;차균현
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.21 no.7
    • /
    • pp.1716-1725
    • /
    • 1996
  • In this paper, the Huffman decoder, which is part of the decoder in the JPEG standard format, is designed using a new ternary CAM. First, a 256-word × 16-bit all-parallel ternary CAM system is designed and verified using SPICE and CADENCE Verilog-XL, and the verified ternary CAM is then applied to a new Huffman decoder architecture for JPEG, verifying the performance of the designed CAM cell and its block. The new ternary CAM has various applications because it provides search-data mask and stored-data mask functions, which enable bit-wise search and don't-care-state storage. When the CAM is used as the Huffman look-up table in the Huffman decoder, it is partitioned according to decoding-symbol frequency. This partitioning scheme overcomes the drawbacks of an all-parallel CAM, namely its high power consumption and heavy load, so both operation speed and power consumption are improved.

  • PDF

Implementation of a 16-Bit Fixed-Point MPEG-2/4 AAC Decoder for Mobile Audio Applications

  • Kim, Byoung-Eul;Hwang, Sun-Young
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.33 no.3C
    • /
    • pp.240-246
    • /
    • 2008
  • An MPEG-2/4 AAC decoder on a 16-bit fixed-point processor is presented in this paper. To meet audio quality criteria despite the small word length, special design methods for a 16-bit fixed-point AAC decoder were devised. This paper presents the particular algorithms for 16-bit AAC decoding, and we have implemented an efficient AAC decoder using the proposed algorithms. Audio content can be replayed on the decoder without quality degradation.

Encoding and Decoding using Cyclic Product Code (순환곱셈코드를 이용한 인코딩 및 디코딩)

  • 김신령;강창언
    • Proceedings of the Korean Institute of Communication Sciences Conference
    • /
    • 1984.10a
    • /
    • pp.11-14
    • /
    • 1984
  • When the received sequence is not identical to the transmitted code word due to channel noise, it is necessary to detect and correct errors. In this paper, it is shown how to construct the encoder and the decoder using cyclic product codes. This system combines random and burst error correction and is easily decodable. The expected performance was obtained.

  • PDF

Phoneme distribution and syllable structure of entry words in the CMU English Pronouncing Dictionary

  • Yang, Byunggon
    • Phonetics and Speech Sciences
    • /
    • v.8 no.2
    • /
    • pp.11-16
    • /
    • 2016
  • This study explores the phoneme distribution and syllable structure of entry words in the CMU English Pronouncing Dictionary to provide phoneticians and linguists with fundamental phonetic data on English word components. Entry words in the dictionary file were syllabified using an R script and examined to obtain the following results: First, English words preferred consonants to vowels in their word components. In addition, monophthongs occurred much more frequently than diphthongs. When all consonants were categorized by manner and place, the distribution indicated the frequency order of stops, fricatives, and nasals according to manner and that of alveolars, bilabials and velars according to place. These results were comparable to the results obtained from the Buckeye Corpus (Yang, 2012). Second, from the analysis of syllable structure, two-syllable words were most favored, followed by three- and one-syllable words. Of the words in the dictionary, 92.7% consisted of one, two or three syllables. This result may be related to human memory or decoding time. Third, the English words tended to exhibit discord between onset and coda consonants and between adjacent vowels. Dissimilarity between the last onset and the first coda was found in 93.3% of the syllables, while 91.6% of the adjacent vowels were different. From the results above, the author concludes that an analysis of the phonetic symbols in a dictionary may lead to a deeper understanding of English word structures and components.
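The syllabification step described above exploits a property of the CMU dictionary's ARPAbet transcriptions: vowel phones carry a stress digit (0/1/2), so counting digit-bearing phones gives the syllable count. A minimal sketch, with sample entries in the dictionary's format (the paper's actual analysis used an R script over the full dictionary):

```python
# Count syllables in CMU-dictionary entries by counting ARPAbet vowel
# phones, which are the phones ending in a stress digit.

entries = {
    "DECODING": ["D", "IY0", "K", "OW1", "D", "IH0", "NG"],
    "WORD":     ["W", "ER1", "D"],
    "SYLLABLE": ["S", "IH1", "L", "AH0", "B", "AH0", "L"],
}

def syllable_count(phones):
    return sum(1 for p in phones if p[-1].isdigit())

for word, phones in entries.items():
    print(word, syllable_count(phones))  # DECODING 3, WORD 1, SYLLABLE 3
```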

LSTM based sequence-to-sequence Model for Korean Automatic Word-spacing (LSTM 기반의 sequence-to-sequence 모델을 이용한 한글 자동 띄어쓰기)

  • Lee, Tae Seok;Kang, Seung Shik
    • Smart Media Journal
    • /
    • v.7 no.4
    • /
    • pp.17-23
    • /
    • 2018
  • We propose an LSTM-based RNN model that effectively performs automatic word spacing. For long or noisy sentences, which are known to be difficult to handle in neural network learning, we defined proper input and decoding data formats and added dropout, bidirectional multi-layer LSTM, layer normalization, and an attention mechanism to improve performance. Although the Sejong corpus contains some spacing errors, the noise-robust learning model developed in this study, which avoids overfitting through dropout, trained successfully and returned meaningful results on Korean word spacing and its patterns. The experimental results showed that the LSTM sequence-to-sequence model achieves an F1-measure of 0.94, which is better than the GRU-CRF-based deep-learning method.

Optimizing Multiple Pronunciation Dictionary Based on a Confusability Measure for Non-native Speech Recognition (타언어권 화자 음성 인식을 위한 혼잡도에 기반한 다중발음사전의 최적화 기법)

  • Kim, Min-A;Oh, Yoo-Rhee;Kim, Hong-Kook;Lee, Yeon-Woo;Cho, Sung-Eui;Lee, Seong-Ro
    • MALSORI
    • /
    • no.65
    • /
    • pp.93-103
    • /
    • 2008
  • In this paper, we propose a method for optimizing a multiple pronunciation dictionary used for modeling pronunciation variations of non-native speech. The proposed method removes some confusable pronunciation variants in the dictionary, resulting in a reduced dictionary size and less decoding time for automatic speech recognition (ASR). To this end, a confusability measure is first defined based on the Levenshtein distance between two different pronunciation variants. Then, the number of phonemes for each pronunciation variant is incorporated into the confusability measure to compensate for ASR errors due to words of a shorter length. We investigate the effect of the proposed method on ASR performance, where Korean is selected as the target language and Korean utterances spoken by Chinese native speakers are considered as non-native speech. It is shown from the experiments that an ASR system using the multiple pronunciation dictionary optimized by the proposed method can provide a relative average word error rate reduction of 6.25%, with 11.67% less ASR decoding time, as compared with that using a multiple pronunciation dictionary without the optimization.

  • PDF
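The confusability measure described above starts from the Levenshtein distance between two pronunciation variants, with the phoneme count folded in to compensate for short words. A minimal sketch; normalizing by the longer variant's length is an illustrative choice, not the paper's exact formula:

```python
# Length-normalized Levenshtein similarity between pronunciation
# variants, each given as a sequence of phoneme symbols.

def levenshtein(a, b):
    """Classic dynamic-programming edit distance, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        cur = [i]
        for j, pb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (pa != pb)))   # substitution
        prev = cur
    return prev[-1]

def confusability(v1, v2):
    """Closer to 1.0 = more confusable; candidates near 1.0 are pruned."""
    return 1 - levenshtein(v1, v2) / max(len(v1), len(v2))

# Two hypothetical pronunciation variants of the same word:
print(confusability(["k", "a", "m", "s", "a"],
                    ["k", "a", "m", "s", "o"]))  # -> 0.8
```

Pruning variants whose pairwise confusability exceeds a threshold is what shrinks the dictionary and the decoding time reported in the abstract.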

A Method for Automatic Detection of Character Encoding of Multi Language Document File (다중 언어로 작성된 문서 파일에 적용된 문자 인코딩 자동 인식 기법)

  • Seo, Min Ji;Kim, Myung Ho
    • KIISE Transactions on Computing Practices
    • /
    • v.22 no.4
    • /
    • pp.170-177
    • /
    • 2016
  • Character encoding is a method of converting a document into a binary file, using a code table, for storage in a computer. When people decode a binary document file so that it can be read, they must know which code table was applied at the encoding stage in order to recover the original document. Identifying the code table used to encode the file is thus an essential part of decoding. In this paper, we propose a method for automatically detecting the character code of a given binary document file. The method uses several techniques to increase the detection rate, such as character-code range detection, escape-character detection, character-code characteristic detection, and commonly-used-word detection. The commonly-used-word detection method uses a multiple-word database, so it can achieve a much higher detection rate for multi-language files than other methods. When a language accounts for less than 20% of a document, conventional methods achieve only about 50% encoding recognition; the proposed method achieves up to 96% encoding recognition regardless of the proportion of each language.
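The character-code range detection mentioned above can be sketched with trial decoding: an encoding is rejected as soon as the byte stream contains a sequence outside its legal ranges. The candidate list and its priority order are illustrative choices, not the paper's detection pipeline:

```python
# Range-based encoding detection: try candidate code tables in order and
# accept the first one for which every byte sequence is legal. Strict
# decoding raises UnicodeDecodeError on any out-of-range sequence.

CANDIDATES = ["ascii", "utf-8", "euc-kr", "cp1252"]

def detect_encoding(data: bytes):
    for enc in CANDIDATES:
        try:
            data.decode(enc)  # strict mode: fails on illegal ranges
            return enc
        except UnicodeDecodeError:
            continue
    return None  # none of the candidates fit

print(detect_encoding("단어".encode("utf-8")))   # -> "utf-8"
print(detect_encoding(b"plain ASCII text"))      # -> "ascii"
```

Range checks alone cannot separate encodings whose legal byte ranges overlap (e.g. the single-byte legacy code pages), which is why the paper layers escape-character, characteristic, and common-word detection on top.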