• Title/Summary/Keyword: Korean corpus


Corpus-Based Ambiguity-Driven Learning of Context-Dependent Lexical Rules for Part-of-Speech Tagging (품사태깅을 위한 어휘문맥 의존규칙의 말뭉치기반 중의성주도 학습)

  • 이상주;류원호;김진동;임해창
    • Journal of KIISE: Software and Applications / v.26 no.1 / pp.178-178 / 1999
  • Most stochastic taggers cannot resolve morphological ambiguities that can be resolved only by referring to lexical contexts, because they use only contextual probabilities based on tag n-grams and lexical probabilities. Existing lexical rules are effective for resolving such ambiguities because they can refer to lexical contexts. However, they have two limitations. One is that human experts tend to make erroneous rules because the rules are deterministic. The other is that the rules are hard and time-consuming to acquire because they must be written manually. In this paper, we propose context-dependent lexical rules, which are lexical rules based on the statistics of a tagged corpus, and an ambiguity-driven learning method, which automatically acquires the proposed rules from a tagged corpus. Using the proposed rules, the proposed tagger can partially annotate an unseen corpus with high accuracy because it is a kind of memorizing tagger that can annotate a training corpus with 100% accuracy. The proposed tagger is therefore useful for improving the accuracy of a stochastic tagger. It is also effective for detecting and correcting tagging errors in a manually tagged corpus. Moreover, the experimental results show that the proposed method is also effective for English part-of-speech tagging.
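
The abstract does not spell out the rule format, so the following is only a minimal sketch of the general idea, assuming a rule is keyed on the previous word, the ambiguous word itself, and the next word. The names `learn_rules` and `apply_rules`, the padding symbols, and the `min_count` parameter are hypothetical and not taken from the paper. Rules are collected only for words seen with more than one tag, and a rule is kept only if its context always led to the same tag in training, which is what lets such a tagger reproduce its training corpus while leaving unseen contexts unannotated.

```python
from collections import Counter, defaultdict


def learn_rules(tagged_sentences, min_count=1):
    """Learn (prev_word, word, next_word) -> tag rules for ambiguous words.

    tagged_sentences: list of sentences, each a list of (word, tag) pairs.
    The trigram-of-words context is an assumption made for illustration.
    """
    # Pass 1: find which words are ambiguous (seen with more than one tag).
    tags_per_word = defaultdict(set)
    for sent in tagged_sentences:
        for word, tag in sent:
            tags_per_word[word].add(tag)

    # Pass 2: count tags per lexical context, for ambiguous words only.
    context_counts = defaultdict(Counter)
    for sent in tagged_sentences:
        words = ["<s>"] + [w for w, _ in sent] + ["</s>"]
        for i, (word, tag) in enumerate(sent):
            if len(tags_per_word[word]) > 1:
                context = (words[i], word, words[i + 2])
                context_counts[context][tag] += 1

    # Keep only contexts that always led to a single tag in training.
    rules = {}
    for context, counts in context_counts.items():
        tag, n = counts.most_common(1)[0]
        if len(counts) == 1 and n >= min_count:
            rules[context] = tag
    return rules


def apply_rules(words, rules):
    """Partially annotate a sentence: a tag where a rule fires, else None."""
    padded = ["<s>"] + list(words) + ["</s>"]
    return [rules.get((padded[i], w, padded[i + 2])) for i, w in enumerate(words)]
```

Positions left as `None` would be handed to another tagger, such as a stochastic one, which matches the abstract's framing of the rules as a complement rather than a replacement.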

Automatic Recognition of Corpus Callosum of Midsagittal Brain MR Images (중앙시상 두뇌자기공명영상의 뇌량자동인식)

  • Lee, Cheol-Hui;Heo, Sin
    • Journal of Biomedical Engineering Research / v.20 no.1 / pp.59-68 / 1999
  • In this paper, we propose an algorithm to locate the corpus callosum automatically from midsagittal brain MR images using the statistical characteristics and shape information of the corpus callosum. In the proposed algorithm, we first extract regions satisfying the statistical characteristics of the corpus callosum and then find a region matching the shape information. In order to match the shape information, a new directed window region-growing algorithm is proposed instead of using conventional contour matching algorithms. Using the proposed algorithm, we adaptively relax the statistical requirement until we find a region matching the shape information. Experiments show promising results.
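
The abstract gives no details of the directed window region-growing algorithm, so the sketch below substitutes plain 4-connected region growing to illustrate only the outer loop: relax an intensity tolerance step by step until the grown region passes a shape test. The function names, the `shape_ok` callback, and the list of tolerances are all hypothetical.

```python
from collections import deque


def grow_region(image, seed, target, tol):
    """Return the 4-connected set of pixels within tol of target, from seed."""
    rows, cols = len(image), len(image[0])
    if abs(image[seed[0]][seed[1]] - target) > tol:
        return set()
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region
                    and abs(image[nr][nc] - target) <= tol):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


def relax_until_match(image, seed, target, shape_ok, tolerances):
    """Try increasingly loose tolerances until the region passes shape_ok."""
    for tol in tolerances:
        region = grow_region(image, seed, target, tol)
        if region and shape_ok(region):
            return tol, region
    return None, set()
```

In practice `shape_ok` would encode whatever shape information is available for the target structure; here it is left as a caller-supplied predicate.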


The fundamental frequency (f0) distribution of American speakers in a spontaneous speech corpus

  • Byunggon Yang
    • Phonetics and Speech Sciences / v.16 no.1 / pp.11-16 / 2024
  • The fundamental frequency (f0), an acoustic measure of vocal fold vibration, serves as an indicator of the speaker's emotional state and of language-specific patterns in daily conversations. This study aimed to examine the f0 distribution in an English corpus of spontaneous speech, establishing normative data for American speakers. The corpus involved 40 participants engaging in free discussions on daily activities and personal viewpoints. Using Praat, f0 values were collected after removing nonspeech sounds and interviewer voices, and outliers were then filtered out. Statistical analyses were performed with R. Results indicated a median f0 value of 145 Hz for all the speakers. The f0 values for all speakers exhibited a right-skewed, pointy distribution within a frequency range of 216 Hz from 75 Hz to 339 Hz. The female f0 range was wider than that of males, with a median of 113 Hz for males and 181 Hz for females. This spontaneous speech corpus provides valuable insights for linguists into f0 variation among individuals or groups in a language. Further research is encouraged to develop analytical and statistical measures for establishing reliable f0 standards for the general population.
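
The study reports its statistics from R. As a rough illustration of the descriptive measures named in the abstract (median, range, and direction of skew), here is a minimal Python sketch. The function name `f0_summary` is hypothetical, and the skewness formula is the ordinary population moment coefficient, which may differ from whatever the authors computed.

```python
import statistics


def f0_summary(f0_values):
    """Summarize a list of f0 measurements in Hz."""
    n = len(f0_values)
    mean = statistics.fmean(f0_values)
    sd = statistics.pstdev(f0_values)  # population standard deviation
    # Moment coefficient of skewness: positive means a longer right tail.
    skewness = sum((x - mean) ** 3 for x in f0_values) / (n * sd ** 3)
    return {
        "median": statistics.median(f0_values),
        "min": min(f0_values),
        "max": max(f0_values),
        "range": max(f0_values) - min(f0_values),
        "skewness": skewness,
    }
```

A positive `skewness` corresponds to the right-skewed shape the abstract describes; the input would be the per-frame f0 values exported from a pitch tracker after outlier filtering.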

Conservative treatment of corpus callosum hemorrhage due to a falling coconut in Indonesia: a case report

  • Hanan Anwar Rusidi;Ferry Wijanarko
    • Journal of Trauma and Injury / v.37 no.1 / pp.79-82 / 2024
  • The potential for traumatic brain injury resulting from falling coconuts is frequently overlooked. These incidents can cause focal lesions in the form of brain hemorrhage. Corpus callosum hemorrhage due to blunt trauma from a falling object is rare and typically associated with poor prognosis. The purpose of this report is to detail a case of corpus callosum hemorrhage caused by a coconut fall and to discuss the conservative management approach employed. We report the case of a 54-year-old woman who was admitted to the hospital with symptoms of unconsciousness, headache, and expressive aphasia after being struck by a falling coconut. Notably, hemorrhage was detected within the body of the corpus callosum, as revealed by imaging findings. The patient received intensive monitoring and treatment in the intensive care unit, including oxygen therapy, saline infusion, an osmotic diuretic, analgesics, and medication to prevent stress ulcers. The patient demonstrated marked clinical improvement while undergoing conservative treatment. Despite the typically unfavorable prognosis of these rare injuries, our patient exhibited meaningful clinical improvement with conservative treatment. Timely diagnosis and appropriate interventions were crucial in managing the patient's condition. This report emphasizes the importance of considering traumatic brain injury caused by falling coconuts and highlights the need for further research and awareness in this area.

Sentence-Chain Based Seq2seq Model for Corpus Expansion

  • Chung, Euisok;Park, Jeon Gue
    • ETRI Journal / v.39 no.4 / pp.455-466 / 2017
  • This study focuses on a method for sequential data augmentation in order to alleviate data sparseness problems. Specifically, we present corpus expansion techniques for enhancing the coverage of a language model. Recent recurrent neural network studies show that a seq2seq model can be applied for addressing language generation issues; it has the ability to generate new sentences from given input sentences. We present a method of corpus expansion using a sentence-chain based seq2seq model. For training the seq2seq model, sentence chains are used as triples. The first two sentences in a triple are used for the encoder of the seq2seq model, while the last sentence becomes a target sequence for the decoder. Using only internal resources, evaluation results show an improvement of approximately 7.6% relative perplexity over a baseline language model of Korean text. Additionally, from a comparison with a previous study, the sentence chain approach reduces the size of the training data by 38.4% while generating 1.4-times the number of n-grams with superior performance for English text.
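
The triple construction described above can be sketched as follows, assuming a chain is simply three consecutive sentences taken with a sliding window and that the two encoder sentences are joined with a separator token. Both are assumptions, since the abstract does not say how chains are selected or how the encoder input is assembled; `sentence_chain_triples` and the `<sep>` token are hypothetical names.

```python
def sentence_chain_triples(sentences, sep=" <sep> "):
    """Build (encoder_input, decoder_target) pairs from consecutive sentences.

    For each window of three sentences, the first two form the encoder
    input and the third is the decoder target.
    """
    pairs = []
    for i in range(len(sentences) - 2):
        first, second, third = sentences[i:i + 3]
        pairs.append((first + sep + second, third))
    return pairs
```

A document of n sentences yields n - 2 training pairs under this windowing, and documents shorter than three sentences yield none.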

A study of flaps in American English based on the Buckeye Corpus (Buckeye corpus에 나타난 탄설음화 현상 분석)

  • Hwang, Byeonghoo;Kang, Seokhan
    • Phonetics and Speech Sciences / v.10 no.3 / pp.9-18 / 2018
  • This paper presents an acoustic and phonological study of the alveolar flaps in American English. Based on the Buckeye Corpus, the flapping tokens produced by twenty men are analyzed at both lexical and post-lexical levels. The data, analyzed with the Praat speech analysis software, include duration, F2 and F3 in voicing during the flap, as well as duration, F1, F2, F3, and f0 in the adjacent vowels. The results provide evidence on two issues: (1) the different ways in which voiced and voiceless alveolar stops give rise to neutralized flaps at the lexical and post-lexical levels, and (2) the extent to which the vowel features (height, frontness, and tenseness) affect flapping sounds. The results show that flaps are affected by pre-consonantal vowel features at the lexical as well as post-lexical levels. Unlike previous studies, this study uses Praat to distinguish flapped from unflapped tokens in the Buckeye Corpus and examines connections between the lexical and post-lexical levels.

A Study on the Vowel Duration of the Buckeye Corpus (벅아이 코퍼스의 모음 길이 연구)

  • Chung, Hyejung;Yoon, Kyuchul
    • Phonetics and Speech Sciences / v.7 no.4 / pp.103-110 / 2015
  • The purpose of this study is to assess vowel properties by examining the duration of the American English vowels found in the Buckeye corpus [6]. The vowel durations were analyzed in terms of various linguistic factors, including the number of syllables in the word containing the vowel, the location of the vowel in the word, the type of stress, function versus content word status, the word frequency in the corpus, and the speech rate calculated from three consecutive words. The findings from this work agreed mostly with those from earlier studies, but with some exceptions. The relationship between the speech rate and the vowel duration proved non-linear.

Chemical Ingredients of Cordyceps militaris

  • Hur, Hyun
    • Mycobiology / v.36 no.4 / pp.233-235 / 2008
  • Medicinal mushrooms, including Cordyceps militaris, have received attention in Korea because of their biological activities. In the fruiting body and in the corpus of C. militaris, the total free amino acid content was 69.32 mg/g and 14.03 mg/g, respectively. In the fruiting body, the most abundant amino acids were lysine, glutamic acid, proline and threonine. The fruiting body was rich in unsaturated fatty acids, which comprised about 70% of the total fatty acids. The most abundant unsaturated fatty acid was linoleic acid. The adenosine and cordycepin contents differed between the fruiting body and the corpus. The adenosine concentration was 0.18% in the fruiting body and 0.06% in the corpus, and the cordycepin concentration was 0.97% in the fruiting body and 0.36% in the corpus.

An Analysis of the Vowel Formants of the Young Females in the Buckeye Corpus (벅아이 코퍼스에서의 젊은 성인 여성의 모음 포먼트 분석)

  • Yoon, Kyuchul
    • Phonetics and Speech Sciences / v.4 no.4 / pp.45-52 / 2012
  • The purpose of this paper is to measure the first two vowel formants of the ten young female speakers from the Buckeye Corpus of Conversational Speech [1] automatically and then to analyze various potential factors that may affect the formant distribution of the eight peripheral vowels of English. The factors that were analyzed included the place of articulation, the content versus function word information, the syllabic stress information, the location in a word, the location in an utterance, the speech rate of the three consecutive words, and the word frequency in the corpus. The results indicate that the overall formant patterns of the female speakers were similar to those of earlier works. The effects of the factors on the realization of the two formants were also similar to those from the male speakers with minor differences.

A corpus-based study on the effects of voicing and gender on American English Fricatives (성대진동 및 성별이 미국영어 마찰음에 미치는 효과에 관한 코퍼스 기반 연구)

  • Yoon, Tae-Jin
    • Phonetics and Speech Sciences / v.10 no.2 / pp.7-14 / 2018
  • The paper investigates the acoustic characteristics of English fricatives in the TIMIT corpus, with a special focus on the role of voicing in rendering fricatives in American English. The TIMIT database includes 630 talkers and 2,342 different sentences, and comprises more than five hours of speech. Acoustic analyses are conducted in the domain of spectral and temporal properties by treating gender, voicing, and place of articulation as independent factors. The results of the acoustic analyses revealed that acoustic signals interact in a complex way to signal the gender, place, and voicing of fricatives. Classification experiments using a multiclass support vector machine (SVM) revealed that 78.7% of fricatives are correctly classified. The majority of errors stem from the misclassification of /θ/ as [f] and /ʒ/ as [z]. The average accuracy of gender classification is 78.7%. Most errors result from the classification of female speakers as male speakers. The paper contributes to the understanding of the effects of voicing and gender on fricatives in a large-scale speech corpus.