• Title/Summary/Keyword: word length


A Segmentation-Based HMM and MLP Hybrid Classifier for English Legal Word Recognition (분할기반 은닉 마르코프 모델과 다층 퍼셉트론 결합 영문수표필기단어 인식시스템)

  • 김계경;김진호;박희주
    • Journal of the Korean Institute of Intelligent Systems / v.11 no.3 / pp.200-207 / 2001
  • In this paper, we propose an HMM (hidden Markov model) and MLP (multilayer perceptron) hybrid model for recognizing legal words on English bank checks. We adopt an explicit segmentation-based, word-level architecture to implement an HMM engine with non-scaled and non-normalized symbol vectors. We also introduce an MLP for implicit segmentation-based word recognition. The final recognition model is a hybrid combination of the HMM and the MLP using a new hybrid probability measure. The main contributions of this model are a novel design of segmentation-based variable-length HMMs and an efficient method for combining two heterogeneous recognition engines. Experiments conducted on the CENPARMI legal word database show encouraging results.

The Effects of Korean Lexical Characteristics on Memory Span (한국어 어휘특성들이 기억폭에 미치는 효과)

  • Park Tae-Jin;Park Sun-Hee;Kim Tae-Ho
    • Korean Journal of Cognitive Science / v.17 no.1 / pp.15-27 / 2006
  • The effects on memory span of the number of Hangul syllables, the number and location of batchim (final consonants) in a Hangul word, and compound versus non-compound Hangul words were examined. The results were that (1) the more syllables a word had, the lower its memory span was; (2) the more batchims a two-syllable word had, the lower its memory span was (Korean batchim effect on memory span); and (3) non-compound words had a higher memory span than compound words. The reading speed of these words was also measured, with the following results: (1) the more syllables a word had, the slower its reading speed was; (2) however, a two-syllable word was read fastest when it had a batchim on the second syllable, compared with having no batchim, a batchim on the first syllable, or batchims on both syllables (Korean ending batchim effect on reading speed); and (3) non-compound words were read faster than compound words. The Korean ending batchim effect on reading speed was not compatible with an explanation based on the articulatory loop, but it was compatible with an explanation based on the visual cache, where orthographic information is represented. The results suggest that memory span is influenced not only by phonological information but also by orthographic information.

HMM-based Korean Named Entity Recognition (HMM에 기반한 한국어 개체명 인식)

  • Hwang, Yi-Gyu;Yun, Bo-Hyun
    • The KIPS Transactions:PartB / v.10B no.2 / pp.229-236 / 2003
  • Named entity recognition is an indispensable process for question answering and information extraction systems. This paper presents an HMM-based named entity (NE) recognition method that uses the construction principles of compound words. In Korean, many named entities can be decomposed into more than one word. Moreover, there are contextual relationships among the nouns within an NE, and between an NE and its surrounding words. In this paper, we classify words into those that are an NE by themselves, those that occur inside an NE, and/or those adjacent to an NE, and train an HMM on these NE-related word types and parts of speech. The proposed named entity recognition (NER) system uses a trigram HMM to account for the variable length of NEs. However, the trigram HMM suffers from a serious data sparseness problem, which we address with multi-level back-offs. Experimental results show that our NER system achieves an F-measure of 87.6% on economic articles.
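
The abstract does not specify the back-off scheme. As a rough illustration of how a trigram model can fall back to lower orders when trigram counts are sparse, the sketch below uses the simple "stupid back-off" scheme (a fixed multiplier, here 0.4, per back-off level) over tag sequences. The class name, the toy tags, and the choice of stupid back-off are assumptions for illustration, not the authors' multi-level back-off.

```python
from collections import Counter


class BackoffTrigram:
    """Toy trigram model over tag sequences with 'stupid back-off'.

    Falls back from trigram to bigram to unigram relative frequencies,
    multiplying by a fixed factor at each back-off step. Scores are not
    normalized probabilities (no discounting is applied).
    """

    def __init__(self, sequences, alpha=0.4):
        self.alpha = alpha
        self.uni, self.bi, self.tri = Counter(), Counter(), Counter()
        self.bi_ctx, self.tri_ctx = Counter(), Counter()
        for seq in sequences:
            padded = ["<s>", "<s>"] + list(seq)  # sentence-start padding
            for i in range(2, len(padded)):
                a, b, c = padded[i - 2], padded[i - 1], padded[i]
                self.uni[c] += 1
                self.bi[(b, c)] += 1
                self.bi_ctx[b] += 1
                self.tri[(a, b, c)] += 1
                self.tri_ctx[(a, b)] += 1
        self.total = sum(self.uni.values())

    def score(self, a, b, c):
        """Score tag c given the two preceding tags a and b."""
        if self.tri[(a, b, c)] > 0:  # level 1: trigram evidence exists
            return self.tri[(a, b, c)] / self.tri_ctx[(a, b)]
        if self.bi[(b, c)] > 0:  # level 2: back off to the bigram
            return self.alpha * self.bi[(b, c)] / self.bi_ctx[b]
        # level 3: back off to the unigram (0 for never-seen tags)
        return self.alpha ** 2 * self.uni[c] / self.total
```

For example, after training on the hypothetical sequences `[["N", "N", "V"], ["N", "V"]]`, `score("<s>", "N", "V")` uses trigram counts directly (0.5), while the unseen context in `score("V", "N", "V")` falls back to the bigram `(N, V)`.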

Channel estimation scheme of terrestrial DTV transmission employing unique-word based SC-FDE (Unique-word 채용한 SC-FDE 기반 지상파 DTV 전송의 채널 추정 기법)

  • Shin, Dong-Chul;Kim, Jae-Kil;Ahn, Jae-Min
    • Journal of Broadcast Engineering / v.16 no.2 / pp.207-215 / 2011
  • In SC-FDE (Single Carrier with Frequency Domain Equalizer) transmission, a signal passed through a multipath channel suffers from ISI (Inter-Symbol Interference) and severe distortion caused by channel delay spread and noise. Conventional UW (Unique-Word) based SC-FDE iterative channel estimation improves performance by suppressing the noise components of the estimated CIR (Channel Impulse Response) that lie outside the channel length in the time domain, and by restoring the broken cyclic property through UW reconstruction. In this paper, we propose a channel estimation scheme that also suppresses noise within the channel length. To do so, we estimate the noise standard deviation from the estimated CIR samples outside the channel length, and we define a threshold from this standard deviation and a gain chosen so that the original signal samples are not affected. Estimated CIR samples within the channel length that fall below this threshold are treated as noise and removed. Simulation results show that the proposed scheme achieves good channel MSE (Mean Square Error) and BER (Bit Error Rate) performance.
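
A minimal sketch of the thresholding idea described above, assuming the taps beyond the channel length contain only zero-mean noise, so that their RMS magnitude estimates the noise standard deviation. The function name, the RMS estimator, and the gain value in the example are assumptions; the paper's exact rule for choosing the gain is not given in the abstract.

```python
def suppress_cir_noise(cir, channel_len, gain):
    """Threshold an estimated CIR given as a list of complex taps.

    Taps at index >= channel_len are assumed to contain noise only; their
    RMS magnitude serves as the noise standard deviation estimate.
    Returns the cleaned CIR and the estimated noise standard deviation.
    """
    tail = cir[channel_len:]
    if not tail:
        raise ValueError("need taps beyond the channel length to estimate noise")
    sigma = (sum(abs(t) ** 2 for t in tail) / len(tail)) ** 0.5
    threshold = gain * sigma  # hypothetical criterion: gain x noise std
    cleaned = []
    for i, tap in enumerate(cir):
        if i >= channel_len:
            cleaned.append(0j)  # conventional step: drop taps outside the channel
        elif abs(tap) < threshold:
            cleaned.append(0j)  # in-channel tap below threshold: treated as noise
        else:
            cleaned.append(tap)  # significant tap: kept unchanged
    return cleaned, sigma
```

With a hypothetical 8-tap estimate whose last four taps have magnitude 0.1 and `gain=3`, the threshold is about 0.3, so in-channel taps of magnitude 0.05 and 0.02 are zeroed while taps of magnitude 1 and 0.5 survive.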

Ordering a Left-branching Language: Heaviness vs. Givenness

  • Choi, Hye-Won
    • Language and Information / v.13 no.1 / pp.39-56 / 2009
  • This paper investigates word-order alternation phenomena in Korean using dative construction data from the Sejong Corpus of Modern Korean (Kim, 2000). The paper first shows that syntactic weight and information structure are distinct and independent factors that influence word order in Korean. Moreover, it reveals that heaviness and givenness compete with each other and exert diverging effects on word order, in contrast to the converging effects these factors show in the word orders of right-branching languages such as English. This typological variation in the syntactic weight effect raises interesting theoretical and empirical questions, which are discussed in relation to processing efficiency in ordering.

An Analysis on Strategies and Errors in Word Problems of Linear Equation for Middle School Students (중학생들의 일차 방정식에 관한 문장제 해결 전략 및 오류 분석)

  • 이정은;김원경
    • The Mathematical Education / v.38 no.1 / pp.77-85 / 1999
  • In this paper, we analyze the strategies and error patterns of middle school students solving word problems involving linear equations. From a test administered to a sample of 106 second-year middle school students, we obtained the following results: (1) The most difficult types of word problems are those involving velocity and density; length-related problems are the second most difficult, and number-related problems are the easiest. (2) Regardless of problem type, the most familiar strategy is constructing algebraic equations, but the most successful strategy is trial and error. (3) The most common error patterns are the use of inadequate formulas and incorrect trial and error. Based on these results, a teaching program using various schemas was developed and shown to be effective for mid-level students in the classroom.

Integrated Char-Word Embedding on Chinese NER using Transformer (트랜스포머를 이용한 중국어 NER 관련 문자와 단어 통합 임배딩)

  • Jin, ChunGuang;Joe, Inwhee
    • Proceedings of the Korea Information Processing Society Conference / 2021.05a / pp.415-417 / 2021
  • Since Chinese sentences are written without delimiters between words and the vocabulary is huge, Chinese NER (Named Entity Recognition) has generally been based on character representations. In recent years, many studies have revisited how to integrate word information into Chinese NER models. However, traditional sequence models have complex structures and slow inference, and they require additional dictionary information, which makes them difficult to deploy in industry. The approach in this paper achieves state-of-the-art results and is parallelizable; it integrates char-word embeddings so that the model learns word information. The proposed model is easy to implement, outperforms traditional models in speed and efficiency, and improves the F1-score on two datasets.

Effects of Korean Syllable Structure on English Pronunciation

  • Lee, Mi-Hyun;Ryu, Hee-Kwan
    • Proceedings of the KSPS conference / 2000.07a / pp.364-364 / 2000
  • It has been widely discussed in phonology that the syllable structure of one's mother tongue influences the acquisition of a foreign language. However, the topic has rarely been examined experimentally. We therefore investigated the effects of Korean syllable structure when Korean speakers pronounce English words, focusing especially on consonant strings that are not allowed in Korean. In the experiment, the subjects were divided into three groups: native, experienced, and inexperienced speakers. The native group consisted of one male native speaker of English. The experienced and inexperienced groups each consisted of three male Korean speakers and were distinguished by their length of residence in a country where English is the native language. Forty-one monosyllabic words were prepared, varying the position (onset vs. coda), type (stops, affricates, fricatives), and number of consonants. The duration of each consonant cluster was then measured. To eliminate tempo effects, the measured duration was normalized by the duration of the word 'say' in the carrier sentence. The consonant cluster was measured as the time between the onset of energy (onset/coda) that acoustically represents noise (the consonant portion) and the voicing bar (the vowel portion) in a syllable. Statistical methods were used to estimate the differences among the three groups: for each word, an analysis of variance (ANOVA) and post hoc tests were carried out.
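
The two quantitative steps above, tempo normalization by the duration of 'say' and a per-word one-way ANOVA across speaker groups, can be sketched as follows. This is a generic textbook computation of the F statistic, not the authors' analysis script; the function names and the example numbers are hypothetical.

```python
def normalize(cluster_ms, say_ms):
    """Relative duration: consonant-cluster duration / duration of 'say'."""
    return cluster_ms / say_ms


def one_way_anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of measurements."""
    all_vals = [x for g in groups for x in g]
    k, n = len(groups), len(all_vals)
    if n <= k:
        raise ValueError("need more observations than groups")
    grand_mean = sum(all_vals) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

For instance, a 120 ms cluster in a sentence where 'say' lasts 400 ms normalizes to 0.3, and normalized values from each group would be passed to `one_way_anova_f` word by word. For real analyses, a library routine such as SciPy's `scipy.stats.f_oneway` also returns the p-value.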

A Study on Phone Call Big Data Analytics (전화통화 빅데이터 분석에 관한 연구)

  • Kim, Jeongrae;Jeong, Chanki
    • Journal of Information Technology and Architecture / v.10 no.3 / pp.387-397 / 2013
  • This paper proposes an approach to big data analytics for phone call data. The analytical model for phone call data consists of the PVPF (Parallel Variable-length Phrase Finding) algorithm, which identifies verbal phrases in natural language, and a word count algorithm, which measures the usage frequency of keywords. In the proposed model, we identify words using the PVPF algorithm and measure the usage frequency of the identified words using the word count algorithm in MapReduce. The results can be interpreted from various viewpoints. We design and implement the model on HDFS (Hadoop Distributed File System) and verify the proposed approach through a case study of phone call data, extracting useful results through an analysis of keyword correlation and usage frequency.
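
The word count stage follows the classic MapReduce pattern. The sketch below simulates the map, shuffle, and reduce phases in a single process to show the data flow; on Hadoop each phase would run distributed over HDFS blocks. Whitespace tokenization here stands in for the PVPF phrase-finding step, which is not reproduced, and the sample records are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit (word, 1) for every whitespace-separated token in one record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group intermediate values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the partial counts for one word."""
    return key, sum(values)


def word_count(records):
    pairs = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

For example, `word_count(["refund my order", "cancel my order"])` yields a count of 2 for "my" and "order" and 1 for "refund" and "cancel".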

Text Compression by Word and Etymology Dictionary (단어, 어원 Dictionary에 의한 Text 압축)

  • Lee, Jae-Young;Sung, Koeng-Mo;Lee, Chong-Kak
    • Proceedings of the KIEE Conference / 1988.07a / pp.607-611 / 1988
  • In this paper, a text compression method is proposed that reduces the mean number of bits per character by using a word and etymology dictionary. The dictionary consists of 256 words and 512 etymologies, encoded with 10-bit codes. Using this dictionary, a mean rate of 3.44 bits per character is achieved.
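
The abstract gives the dictionary size and code length but not how characters outside the dictionary are coded. One layout that fills the 10-bit space exactly is 256 literal byte codes plus 256 words and 512 etymologies (256 + 256 + 512 = 1024 = 2^10). The sketch below assumes that layout and a greedy longest-match parser; both are assumptions rather than the paper's stated method, and the dictionary entries in the example are hypothetical.

```python
CODE_BITS = 10  # code length stated in the abstract


def build_codebook(words, etymologies):
    """Hypothetical layout: codes 0-255 are literal bytes; dictionary
    entries (words, then etymologies) take the codes from 256 upward."""
    entries = list(words) + list(etymologies)
    if 256 + len(entries) > 2 ** CODE_BITS:
        raise ValueError("dictionary does not fit in the 10-bit code space")
    return {entry: 256 + i for i, entry in enumerate(entries)}


def encode(text, codebook):
    """Greedy longest-match encoding into a list of 10-bit codes."""
    max_len = max((len(e) for e in codebook), default=1)
    codes, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if piece in codebook:
                codes.append(codebook[piece])
                i += n
                break
        else:
            # No dictionary match: emit the character as a literal byte code
            if ord(text[i]) > 255:
                raise ValueError("literal codes only cover 8-bit characters")
            codes.append(ord(text[i]))
            i += 1
    return codes


def bits_per_char(text, codebook):
    return CODE_BITS * len(encode(text, codebook)) / len(text)
```

With the toy codebook `build_codebook(["the ", "and "], ["tion"])`, the 10-character string "the nation" becomes four codes (two dictionary hits and two literals), or 4.0 bits per character. Relative to 8-bit characters, the reported 3.44 bits per character corresponds to 3.44 / 8 = 0.43 of the original size.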
