• Title/Summary/Keyword: corpus size

Search Result 117, Processing Time 0.027 seconds

Language Model Adaptation for Broadcast News Recognition (방송 뉴스 인식을 위한 언어 모델 적응)

  • Kim Hyun Suk;Jeon Hyung Bae;Kim Sanghun;Choi Joon Ki;Yun Seung
    • MALSORI / no.51 / pp.99-115 / 2004
  • In this paper, we propose language model (LM) adaptation for broadcast news recognition. We collect recent articles from the internet in real time, build a small LM from them, and interpolate this recent LM with an existing LM trained on a large broadcast news corpus. Because the collected recent corpus contains articles both related and unrelated to the test set, we ran interpolation experiments to find the best type of articles to use. The best result came from an adapted LM that combined the existing LM with a recent LM built from articles selected by the Tf-Idf method as similar to the test set: pseudo-morpheme-based recognition improved by 17.2% in ERR, and the number of OOV words fell from 70 to 27.
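
The interpolation step described in this abstract can be illustrated with a minimal sketch. It assumes add-alpha smoothed unigram models and a fixed interpolation weight `lam`; the abstract does not state the model order, smoothing, or weight used, so the toy sentences, the function names, and `lam=0.3` are all hypothetical.

```python
from collections import Counter

def unigram_lm(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def interpolate(p_recent, p_base, lam):
    """Linear interpolation: lam * P_recent(w) + (1 - lam) * P_base(w)."""
    return {w: lam * p_recent[w] + (1.0 - lam) * p_base[w] for w in p_base}

# Toy stand-ins for the large existing corpus and the small recent corpus.
base_tokens = "the market fell today the market rose".split()
recent_tokens = "the typhoon hit the coast today".split()
vocab = set(base_tokens) | set(recent_tokens)

p_base = unigram_lm(base_tokens, vocab)
p_recent = unigram_lm(recent_tokens, vocab)

# lam is a hypothetical weight; in practice it would be tuned on held-out data.
p_adapted = interpolate(p_recent, p_base, lam=0.3)
```

Because both component models are proper distributions over the same vocabulary, the interpolated model is one as well, and words that appear only in the recent text (here `typhoon`) gain probability mass compared with the base model.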


An Analysis of the Vowel Formants of the Young versus Old Speakers in the Buckeye Corpus (벅아이 코퍼스에서의 연령별 모음 포먼트 분석)

  • Km, Ji-Eun;Yoon, Kyuchul
    • Phonetics and Speech Sciences / v.4 no.4 / pp.29-35 / 2012
  • The purpose of this study was to measure the first two vowel formants of forty speakers from the Buckeye Corpus of Conversational Speech (twenty male and twenty female, each group comprising young and old speakers) and to examine vowel formant changes across two generations (younger vs. older). The results indicated that the vowel space of the younger generation (in their thirties or younger) shifted to the lower left compared to that of the older generation (in their forties or older) in both male and female speakers. When the results were compared to those of Peterson & Barney (1952), it appears that differences can be found in the size of the vowel spaces over time.

Interhemispheric Osteolipoma with Agenesis of the Corpus Callosum

  • Park, Yong-Sook;Kwon, Jeong-Taik;Park, Un-Sub
    • Journal of Korean Neurosurgical Society / v.47 no.2 / pp.148-150 / 2010
  • Osteolipoma is an ossified lipoma with distinct components of fat and bone. We present a case of interhemispheric osteolipoma associated with total agenesis of the corpus callosum. A 20-year-old man complained of severe headache, nausea and vomiting. Brain computed tomography showed a low-density mass in the interhemispheric fissure, with high T1 and T2 magnetic resonance signals compatible with fat. The mass measured $4.9 \times 2.9$ cm and showed peripheral calcifications. There was another small mass of the same signal within the lateral ventricular choroid plexus. The interhemispheric lesion was removed by an interhemispheric approach. Osteolipoma is rare in the interhemispheric region; however, it should be included in the differential diagnosis of lesions showing a fat-intensity mass and calcifications.

Enhancement of a language model using two separate corpora of distinct characteristics

  • Cho, Sehyeong;Chung, Tae-Sun
    • Journal of the Korean Institute of Intelligent Systems / v.14 no.3 / pp.357-362 / 2004
  • Language models are essential in predicting the next word in a spoken sentence, thereby enhancing speech recognition accuracy, among other things. However, spoken language domains are too numerous, and developers therefore suffer from a lack of sufficiently large corpora. This paper proposes a method of combining two n-gram language models, one constructed from a very small corpus of the right domain of interest, the other constructed from a large but less adequate corpus, resulting in a significantly enhanced language model. The method is based on the observation that a small corpus from the right domain has high-quality n-grams but suffers from serious sparseness, while a large corpus from a different domain has more n-gram statistics but is incorrectly biased. With our approach, the two sets of n-gram statistics are combined by extending the idea of Katz's backoff; the method is therefore called dual-source backoff. We ran experiments with 3-gram language models constructed from newspaper corpora of several million to tens of millions of words together with models from smaller broadcast news corpora. The target domain was broadcast news. We obtained a significant improvement (30%) by incorporating a small corpus around one thirtieth the size of the newspaper corpus.

Automatic Word Spacing Using Raw Corpus and a Morphological Analyzer (말뭉치와 형태소 분석기를 활용한 한국어 자동 띄어쓰기)

  • Shim, Kwangseob
    • Journal of KIISE / v.42 no.1 / pp.68-75 / 2015
  • This paper proposes a method for the automatic word spacing of unsegmented Korean sentences. In our method, eojeol monograms are used for word spacing, as opposed to the syllable n-grams used in previous studies. The use of a Korean morphological analyzer is limited to the correction of typical word spacing errors. Our method gives a 98.06% syllable accuracy and a 94.15% eojeol recall when 10-fold cross-validated with the Sejong corpus, after filtering out non-hangul eojeols. The processing rate is 250K eojeols or 1.8 MB per second on a typical personal computer. Because syllable accuracy and eojeol recall depend on the size of the eojeol dictionary, better performance is expected with a bigger corpus.
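
One way to space text from single-eojeol (monogram) frequencies is a dynamic-programming search for the most probable sequence of dictionary eojeols. The sketch below shows that general idea only; it is not the paper's algorithm, which the abstract does not spell out. The toy counts, the `max_len` limit, and the penalty for unknown syllables are all assumptions.

```python
import math

def space_words(text, eojeol_counts, max_len=10):
    """Insert spaces by maximizing the sum of eojeol log-probabilities."""
    total = sum(eojeol_counts.values())
    # Hypothetical penalty: an unknown single syllable is treated as very rare.
    unknown_logp = math.log(1.0 / (total * 10))
    n = len(text)
    best = [0.0] + [-math.inf] * n   # best[i]: best score for text[:i]
    back = [0] * (n + 1)             # back[i]: start index of the last piece
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in eojeol_counts:
                score = math.log(eojeol_counts[piece] / total)
            elif end - start == 1:
                score = unknown_logp   # fall back to a single unknown syllable
            else:
                continue               # unknown multi-syllable pieces are skipped
            if best[start] + score > best[end]:
                best[end] = best[start] + score
                back[end] = start
    # Follow the back-pointers from the end of the text to recover the pieces.
    pieces = []
    end = n
    while end > 0:
        start = back[end]
        pieces.append(text[start:end])
        end = start
    return " ".join(reversed(pieces))

# Toy eojeol frequency table; a real one would be counted from a raw corpus.
counts = {"나는": 5, "학교에": 3, "간다": 4, "나": 1, "는": 1}
print(space_words("나는학교에간다", counts))  # -> 나는 학교에 간다
```

A larger frequency table covers more eojeols and leaves fewer syllables to the unknown fallback, which is consistent with the abstract's remark that a bigger corpus should help.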

Extending Korean PropBank for Korean Semantic Role Labeling and Applying Domain Adaptation Technique (한국어 의미역 결정을 위한 Korean PropBank 확장 및 도메인 적응 기술 적용)

  • Bae, Jangseong;Lee, Changki
    • Korean Journal of Cognitive Science / v.26 no.4 / pp.377-392 / 2015
  • Korean semantic role labeling (SRL) is usually performed with machine learning and requires a large corpus. However, the Korean PropBank used in Korean SRL systems is smaller than PropBank, which leads to low performance. We therefore extend the annotated corpus and verb frames of the Korean PropBank for Korean SRL. Most SRL systems have domain-dependent performance, so performance may decrease when the domain changes. In this paper, we use a domain adaptation technique to reduce this decrease, using the existing corpus together with a small corpus from the new domain. We apply the domain adaptation technique to a Structural SVM and a Deep Neural Network. The experimental results show the effectiveness of the domain adaptation technique.

Effect of Unilateral Ovariectomy on Development of Ovarian Follicle, Corpus Luteum and Serum Progesteron Level in Immature Female Rats (미성숙 암흰쥐에 있어서 편측난소척출이 난포발육, 황체 및 혈청 Progesterone 수준에 미치는 영향)

  • 정재혁;김종대;정영채;김창근
    • Korean Journal of Animal Reproduction / v.9 no.2 / pp.97-104 / 1985
  • This study was conducted to investigate the effects of unilateral ovariectomy on the weight of the remaining ovary, the number of ovarian follicles, the number of corpora lutea and the serum progesterone level. Sixty Sprague-Dawley female rats, 23$\pm$2 days old, were divided into 2 groups (control and unilaterally ovariectomized) with 30 animals per group. Each group was further subdivided into 6 groups according to 6 experimental periods: day 4, 8, 12, 16, 20 and 24 after unilateral ovariectomy. Five rats were sacrificed at 4-day intervals for the measurement of ovarian weight and for quantitative histologic examination of the ovary; at the same time, blood samples were taken for the determination of serum progesterone level by radioimmunoassay. The results obtained were as follows: 1. During the experimental periods, a significant hypertrophy occurred in the remaining ovary of the unilaterally ovariectomized group from day 16 after operation. The average ovarian weight of the control group at day 16 was 21.0$\pm$1.7 mg, which is smaller than that of the unilaterally ovariectomized group, 50.5$\pm$8.4 mg (P<0.01). The ovarian weights of the unilaterally ovariectomized rats at day 20 and day 24 were 75.9$\pm$2.2 mg and 63.3$\pm$7.0 mg, heavier than those of the control group, 29.1$\pm$2.3 mg and 26.3$\pm$1.7 mg (P<0.01 and P<0.01). 2. A similar degree of ovarian follicle development was observed in the unilaterally ovariectomized group following unilateral ovariectomy, and there was no change in the total number of follicles larger than 130$\mu$ during the period from day 4 to day 24 after operation. 3. Although the size of the ovarian follicles did not differ significantly between the two groups from day 4 to day 16, the size of the vesicular follicles in the unilaterally ovariectomized group (406.3$\pm$26.2$\mu$) was significantly greater than that of the control group (323.8$\pm$19.3$\mu$) (P<0.05). 4. Corpora lutea in the unilaterally ovariectomized and control groups began to appear from day 16 after operation, and their number then increased slightly. The number of corpora lutea in the unilaterally ovariectomized group at day 24 (13.7$\pm$1.41) was markedly greater than that of the control group (5.2$\pm$2.01) (P<0.01). 5. Serum progesterone levels in the unilaterally ovariectomized group were slightly higher than those of the control group, but there was no significant difference between treatment groups.


Effects of age and gender on spatial orientation of human corpus callosum in healthy Koreans

  • Hwang, Seung-Jun;Park, Chan;Hong, Hea-Nam;Ryu, Ji-Yeon;Park, In-Sung;Rhyu, Im-Joo
    • Animal cells and systems / v.15 no.4 / pp.274-278 / 2011
  • The changes in the corpus callosum (CC) with age and gender remain largely subject to dispute, which might come from the different strategies for analyzing the size and shape of CC. We have investigated this issue by measuring some variables reflecting the spatial orientation of CC on magnetic resonance imaging in Koreans, which minimize individual variances in the brain. The subjects were composed of young adults in their twenties (51 male, 59 female) and elderly adults in their sixties and seventies (60 male, 71 female). The total area of CC, length and height of CC, the central angle and the four angles suggested by Oka et al. were measured. The whole area and the central angle of CC were not significantly affected by age and gender. The height and length of CC were significantly greater in elderly people. The angle connecting genu, upper notch of pons and splenium was significantly larger in the elderly group. Furthermore, all four angles were significantly different between male and female subjects. These results confirm that the spatial orientation of CC is influenced by age and gender.

Cloning, Expression and Hormonal Regulation of Steroidogenic Acute Regulatory Protein Gene in Buffalo Ovary

  • Malhotra, Nupur;Singh, Dheer;Sharma, M.K.
    • Asian-Australasian Journal of Animal Sciences / v.20 no.2 / pp.184-193 / 2007
  • In the mammalian ovary, steroidogenic acute regulatory (StAR) protein mediates the true rate-limiting step of cholesterol transport from the outer to the inner mitochondrial membrane. Appropriate expression of the StAR gene is an indispensable component of steroidogenesis, and its regulation has been found to be species specific. However, limited information is available regarding StAR gene expression during the estrous cycle in buffalo ovary. In the present study, expression, localization and hormonal regulation of StAR mRNA were analyzed by semi-quantitative RT-PCR in buffalo ovary, and a partial cDNA was cloned. Total RNA was isolated from whole follicles of different sizes, granulosa cells from follicles of different sizes, and postovulatory structures such as the corpus luteum and corpus albicans. Semi-quantitative RT-PCR analyses showed StAR mRNA expression in the postovulatory structure, the corpus luteum. No StAR mRNA was detected in total RNA isolated from whole follicles of different sizes, including the preovulatory follicle (>9 mm in diameter). However, granulosa cells isolated from preovulatory follicles showed moderate expression of StAR mRNA. To assess the hormonal regulation of StAR mRNA, primary cultures of buffalo granulosa cells were treated with FSH (100 ng/ml) alone or along with IGF-I (100 ng/ml) for 12 to 18 h. The abundance of StAR mRNA increased in cells treated with FSH alone or FSH with IGF-I. However, the effect of FSH with IGF-I on mRNA expression was found to be highly significant (p<0.01). In conclusion, differential expression of StAR messages was observed during the estrous cycle in buffalo ovary. There was also a synergistic action of IGF-I on FSH stimulation of the StAR gene.

Construction of an Efficient Pre-analyzed Dictionary for Korean Morphological Analysis (한국어 형태소 분석을 위한 효율적 기분석 사전의 구성 방법)

  • Kwak, Sujeong;Kim, Bogyum;Lee, Jae Sung
    • KIPS Transactions on Software and Data Engineering / v.2 no.12 / pp.881-888 / 2013
  • A pre-analyzed dictionary is used to increase the speed and accuracy of morphological analyzers and to reduce over-generation. However, if the dictionary includes 'insufficiently analyzed word-phrases', that is, entries that do not list all possible analyses of the word-phrase, analysis accuracy may decrease. In this paper, we use the Sejong corpus to measure how accuracy changes with word-phrase frequency and with corpus size. The performance of the integrated system (SMA with the pre-analyzed dictionary) is highest when the sufficient-analysis rate of the dictionary is more than 99.82%. Such a dictionary is constructed from word-phrases with a frequency of more than 32 (64) when the corpus size is 1,600,000 (6,300,000) word-phrases.
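
The frequency cutoff mentioned in this abstract can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the corpus is modeled as a list of (word-phrase, analysis) pairs, the function name and toy data are hypothetical, and the way the paper measures its sufficient-analysis rate is not reproduced here.

```python
from collections import Counter, defaultdict

def build_pre_analyzed_dict(tagged_corpus, min_freq):
    """Keep word-phrases seen more than min_freq times, with all observed analyses."""
    freq = Counter(word_phrase for word_phrase, _ in tagged_corpus)
    analyses = defaultdict(set)
    for word_phrase, analysis in tagged_corpus:
        analyses[word_phrase].add(analysis)
    return {
        word_phrase: sorted(analyses[word_phrase])
        for word_phrase, count in freq.items()
        if count > min_freq
    }

# Toy tagged corpus with illustrative morpheme/tag strings.
corpus = [
    ("나는", "나/NP+는/JX"),
    ("나는", "날/VV+는/ETM"),
    ("나는", "나/NP+는/JX"),
    ("학교에", "학교/NNG+에/JKB"),
]
pre_dict = build_pre_analyzed_dict(corpus, min_freq=2)
```

The intuition behind a cutoff is that a word-phrase seen many times is more likely to have had all of its possible analyses observed, so frequent entries are less likely to be insufficiently analyzed. In this toy run only `나는` (seen three times) is kept, with both of its analyses.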