• 제목/요약/키워드: unknown word

검색결과 70건 처리시간 0.023초

A comparison of phonological error patterns in the single word and spontaneous speech of children with speech sound disorders (말소리장애 아동의 단어와 자발화 문맥의 음운오류패턴 비교)

  • Park, kayeon;Kim, Soo-Jin
    • Phonetics and Speech Sciences
    • /
    • 제7권3호
    • /
    • pp.165-173
    • /
    • 2015
  • This study was aim to compare the phonological error patterns and PCC(Percentage of Correct Consonants) derived from the single word and spontaneous speech contexts of the speech sound disorders with unknown origin(SSD). The present study suggest that the development phonological error patterns and non-developmental error patterns of the target children, in according to speech context. The subjects were 15 children with SSD up to the age of 5 from 3 years of age. This research use 37 words of APAC(Assessment of Phonology & Articulation for Children) in the single word context and 100 eojeol in the spontaneous speech context. There was no difference of PCC between the single word and the spontaneous speech contexts. Significantly different developmental phonological error patterns between the single word and the spontaneous speech contexts were syllable deletion, word-medial onset deletion, liquid deletion, gliding, affrication, fricative other error, tensing, regressive assimilation. Significantly different non-developmental phonological error patterns were backing, addtion of phoneme, aspirating. The study showed that there was no difference of PCC between elicited single word and spontaneous conversational context. And there were some different phonological error patterns derived from the two contexts of the speech sound disorders. The more important interventions target is the error patterns of the spontaneous speech contexts for the immediate generalization and rising overall intelligibility.

Ternary Decomposition and Dictionary Extension for Khmer Word Segmentation

  • Sung, Thaileang;Hwang, Insoo
    • Journal of Information Technology Applications and Management
    • /
    • 제23권2호
    • /
    • pp.11-28
    • /
    • 2016
  • In this paper, we proposed a dictionary extension and a ternary decomposition technique to improve the effectiveness of Khmer word segmentation. Most word segmentation approaches depend on a dictionary. However, the dictionary being used is not fully reliable and cannot cover all the words of the Khmer language. This causes an issue of unknown words or out-of-vocabulary words. Our approach is to extend the original dictionary to be more reliable with new words. In addition, we use ternary decomposition for the segmentation process. In this research, we also introduced the invisible space of the Khmer Unicode (char\u200B) in order to segment our training corpus. With our segmentation algorithm, based on ternary decomposition and invisible space, we can extract new words from our training text and then input the new words into the dictionary. We used an extended wordlist and a segmentation algorithm regardless of the invisible space to test an unannotated text. Our results remarkably outperformed other approaches. We have achieved 88.8%, 91.8% and 90.6% rates of precision, recall and F-measurement.

The Comparison of Linguistic and Psychological Characteristics in the Writing of Korean and Korean-Chinese Adolescents (한국 및 중국 조선족 청소년의 글에 나타난 언어학적, 심리학적 특성 비교)

  • Park, Min-Jung;Park, Hyewon
    • Korean Journal of Child Studies
    • /
    • 제29권3호
    • /
    • pp.357-373
    • /
    • 2008
  • This study compared the writing of Korean and Korean-Chinese adolescents using K-LIWC (Korean-Linguistic Inquiry Word Count Lee & Yoon, 2005). Three hundred ten (70 : Ulsan, Korea 90 : Yanji, and 150 : Shenyang, China) middle school students wrote a self introductory essay for unknown friends. K-LIWC yielded counts and percentages of word categories using the parts of speech of the Korean language and psychological (emotional, cognitive, sensory/perceptual, social, physical/functional and metaphysical processes) criteria. Results showed that use of pre-noun and present tense correlated with negative mood of the subjects. The writings of Korean-Chinese in Shenyang showed the most negative emotions among the three groups. This was interpreted to be a reflection of better protective factors for Korean-Chinese adolescents in Yanji compared with Shenyang.

  • PDF

A Review of Timing Factors in Speech

  • Yun, Il-Sung
    • Speech Sciences
    • /
    • 제7권3호
    • /
    • pp.87-98
    • /
    • 2000
  • Timing in speech is determined by many factors. In this paper, we introduce and discuss some factors that have generally been regarded as important in speech timing. They include stress, syllable structure, consonant insertion or deletion, tempo, lengthening at clause, phrase and word boundaries, preconsonantal vowel shortening, and compensation between segments or within phonological units (e.g., word, foot), compression due to the increase of syllables in word or foot level, etc. and each of them may playa crucial role in the structuring of speech timing in a language. But some of these timing factors must interact with each other rather than be independent and the effects of each factor on speech timing will vary from language to language. On the other hand, there could well be many other factors unknown so far. Finding out and investigating new timing factors and reinterpreting the already-known timing factors should enhance our understanding of timing structures in a given language or languages.

  • PDF

DroidVecDeep: Android Malware Detection Based on Word2Vec and Deep Belief Network

  • Chen, Tieming;Mao, Qingyu;Lv, Mingqi;Cheng, Hongbing;Li, Yinglong
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • 제13권4호
    • /
    • pp.2180-2197
    • /
    • 2019
  • With the proliferation of the Android malicious applications, malware becomes more capable of hiding or confusing its malicious intent through the use of code obfuscation, which has significantly weaken the effectiveness of the conventional defense mechanisms. Therefore, in order to effectively detect unknown malicious applications on the Android platform, we propose DroidVecDeep, an Android malware detection method using deep learning technique. First, we extract various features and rank them using Mean Decrease Impurity. Second, we transform the features into compact vectors based on word2vec. Finally, we train the classifier based on deep learning model. A comprehensive experimental study on a real sample collection was performed to compare various malware detection approaches. Experimental results demonstrate that the proposed method outperforms other Android malware detection techniques.

Improving Abstractive Summarization by Training Masked Out-of-Vocabulary Words

  • Lee, Tae-Seok;Lee, Hyun-Young;Kang, Seung-Shik
    • Journal of Information Processing Systems
    • /
    • 제18권3호
    • /
    • pp.344-358
    • /
    • 2022
  • Text summarization is the task of producing a shorter version of a long document while accurately preserving the main contents of the original text. Abstractive summarization generates novel words and phrases using a language generation method through text transformation and prior-embedded word information. However, newly coined words or out-of-vocabulary words decrease the performance of automatic summarization because they are not pre-trained in the machine learning process. In this study, we demonstrated an improvement in summarization quality through the contextualized embedding of BERT with out-of-vocabulary masking. In addition, explicitly providing precise pointing and an optional copy instruction along with BERT embedding, we achieved an increased accuracy than the baseline model. The recall-based word-generation metric ROUGE-1 score was 55.11 and the word-order-based ROUGE-L score was 39.65.

Phase-based Model Using Web Documents for Korean Unknown Word Recognition (웹문서를 이용한 단계별 한국어 미등록어 인식 모델)

  • Park, So-Young
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • 제13권9호
    • /
    • pp.1898-1904
    • /
    • 2009
  • Recently, real documents such as newspapers as well as blogs include newly coined words such as "Wikipedia". However, most previous information processing technologies cannot deal with these newly coined words because they construct their dictionaries based on materials acquired during system development. In this paper, we propose a model to automatically recognize Korean unknown words excluded from the previously constructed dictionary. The proposed model consists of an unknown noun recognition phase based on full text analysis, an unknown verb recognition phase based on web document frequency, and an unknown noun recognition phase based on web document frequency. The proposed model can recognize accurately the unknown words occurred once and again in a document by the full text analysis. Also, the proposed model can recognize broadly the unknown words occurred once in the document by using web documents. Besides, the proposed model fan recognize both a Korean unknown verb, which syllables can be changed from its base form by inflection, and a Korean unknown noun, which syllables are not changed in any eojeol. Experimental results shows that the proposed model improves precision 1.01% and recall 8.50% as compared with a previous model.

Korean Unknown-noun Recognition using Strings Following Nouns in Words (명사후문자열을 이용한 미등록어 인식)

  • Park, Ki-Tak;Seo, Young-Hoon
    • The Journal of the Korea Contents Association
    • /
    • 제17권4호
    • /
    • pp.576-584
    • /
    • 2017
  • Unknown nouns which are not in a dictionary make problems not only morphological analysis but also almost all natural language processing area. This paper describes a recognition method for Korean unknown nouns using strings following nouns such as postposition, suffix and postposition, suffix and eomi, etc. We collect and sort words including nouns from documents and divide a word including unknown noun into two parts, candidate noun and string following the noun, by finding same prefix morphemes from more than two unknown words. We use information of strings following nouns extracted from Sejong corpus and decide unknown noun finally. We obtain 99.64% precision and 99.46% recall for unknown nouns occurred more than two forms in news of two portal sites.

Speaker-adaptive Word Recognition Using Mapped Membership Function (사상멤버쉽함수에 의한 화자적응 단어인식)

  • Lee, Ki-Yeong;Choi, Kap-Seok
    • The Journal of the Acoustical Society of Korea
    • /
    • 제11권3호
    • /
    • pp.40-52
    • /
    • 1992
  • In this paper, we propose the speaker adaptive word recognition method using a mapped membership function, in order to absorb a fluctuation owing to personal difference which is a problem of speaker independent speech recognition. In the training procedure of this method, the mapped membership function is made with the fuzzy theory introducded into a mapped codebook, between an unknown speaker's spectrum pattern and a standard speaker's one. In the recognition procedure, an input pattern of an unknown speaker is reconstructed to the pattern which is adapted to that of a standard speaker by the mapped membership function. To show the validity of this method, word recognition experiments are carried out using 28 DDD area names. The recognition rate of the conventional speaker-adaptive method using a mapped codebook by VQ is 64.9[%], and that made by a fuzzy VQ is 76.2[%]. Throughout the experiment using a mapped membership function, we can achieve 95.4[%] recognition rate. This shows that our proposed method is more excellent in recognition performance. Moreover, this method doesn't need an iterative training procedure to make the mapped membership function, and memory capacity and computation requirements for this method are reduced to 1/30 and 1/500 time of those for the conventional method using a mapped codebook, respectively.

  • PDF

A Study on Phoneme Likely Units to Improve the Performance of Context-dependent Acoustic Models in Speech Recognition (음성인식에서 문맥의존 음향모델의 성능향상을 위한 유사음소단위에 관한 연구)

  • 임영춘;오세진;김광동;노덕규;송민규;정현열
    • The Journal of the Acoustical Society of Korea
    • /
    • 제22권5호
    • /
    • pp.388-402
    • /
    • 2003
  • In this paper, we carried out the word, 4 continuous digits. continuous, and task-independent word recognition experiments to verify the effectiveness of the re-defined phoneme-likely units (PLUs) for the phonetic decision tree based HM-Net (Hidden Markov Network) context-dependent (CD) acoustic modeling in Korean appropriately. In case of the 48 PLUs, the phonemes /ㅂ/, /ㄷ/, /ㄱ/ are separated by initial sound, medial vowel, final consonant, and the consonants /ㄹ/, /ㅈ/, /ㅎ/ are also separated by initial sound, final consonant according to the position of syllable, word, and sentence, respectively. In this paper. therefore, we re-define the 39 PLUs by unifying the one phoneme in the separated initial sound, medial vowel, and final consonant of the 48 PLUs to construct the CD acoustic models effectively. Through the experimental results using the re-defined 39 PLUs, in word recognition experiments with the context-independent (CI) acoustic models, the 48 PLUs has an average of 7.06%, higher recognition accuracy than the 39 PLUs used. But in the speaker-independent word recognition experiments with the CD acoustic models, the 39 PLUs has an average of 0.61% better recognition accuracy than the 48 PLUs used. In the 4 continuous digits recognition experiments with the liaison phenomena. the 39 PLUs has also an average of 6.55% higher recognition accuracy. And then, in continuous speech recognition experiments, the 39 PLUs has an average of 15.08% better recognition accuracy than the 48 PLUs used too. Finally, though the 48, 39 PLUs have the lower recognition accuracy, the 39 PLUs has an average of 1.17% higher recognition characteristic than the 48 PLUs used in the task-independent word recognition experiments according to the unknown contextual factor. Through the above experiments, we verified the effectiveness of the re-defined 39 PLUs compared to the 48PLUs to construct the CD acoustic models in this paper.