• Title/Summary/Keyword: Korean corpus

The Study Of Lexical Statistics Analysis For Elementary School Textbook : Focusing On Comparing The SEJONG Corpus In Korean (초등학교 교과서의 어휘 통계 분석 연구 : 한국어 세종 코퍼스와의 비교를 중심으로)

  • Yu, Wonhee; Lim, Heuiseok
    • The Journal of Korean Association of Computer Education / v.18 no.1 / pp.99-108 / 2015
  • In this paper, we build an elementary school textbook corpus and perform a statistical analysis of the vocabulary found in elementary textbooks. We also compute Spearman's rank correlation coefficient to explore whether the vocabulary of elementary textbooks is similar to the vocabulary used in everyday life. The study presents the form of the textbook corpus with actual examples, and shows numerically the correlation between the elementary textbooks and a general corpus.
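
The comparison described in this abstract can be illustrated with a minimal sketch, assuming both corpora are already tokenized into word lists. The function names (`average_ranks`, `spearman`, `corpus_spearman`) are hypothetical and not taken from the paper. The sketch computes Spearman's coefficient as the Pearson correlation of tie-averaged frequency ranks over the union vocabulary.

```python
from collections import Counter


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined when one side has constant ranks")
    return cov / (vx * vy) ** 0.5


def corpus_spearman(tokens_a, tokens_b):
    """Correlate word frequencies of two token lists over their union vocabulary."""
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    vocab = sorted(set(freq_a) | set(freq_b))
    return spearman([freq_a[w] for w in vocab], [freq_b[w] for w in vocab])
```

Words missing from one corpus are counted with frequency 0 here. Restricting the comparison to the shared vocabulary is an equally plausible design choice and would give a different coefficient.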

Monophthong Analysis on a Large-scale Speech Corpus of Read-Style Korean (한국어 대용량발화말뭉치의 단모음분석)

  • Yoon, Tae-Jin; Kang, Yoonjung
    • Phonetics and Speech Sciences / v.6 no.3 / pp.139-145 / 2014
  • The paper describes methods of conducting vowel analysis on a large-scale corpus with the aid of forced alignment and optimal formant ceiling methods. The 'Read Style Corpus of Standard Korean' is used to build the forced alignment system, and a subset of the corpus is used for processing and extracting features for vowel analysis based on the optimal formant ceiling. The results of the vowel analysis are reliable and comparable to those obtained using traditional analytical methods. The findings indicate that the methods adopted can be extended to more fine-grained analysis without time-consuming manual labeling and without losing accuracy or reliability.

Statistical Analysis Between Size and Balance of Text Corpus by Evaluation of the effect of Interview Sentence in Language Modeling (언어모델 인터뷰 영향 평가를 통한 텍스트 균형 및 사이즈간의 통계 분석)

  • Jung Eui-Jung; Lee Youngjik
    • Proceedings of the Acoustical Society of Korea Conference / spring / pp.87-90 / 2002
  • This paper statistically analyzes the relationship between the size and balance of a text corpus by evaluating the effect of interview sentences on the language model of a Korean broadcast news transcription system. The ultimate purpose of our system is to recognize the speech of anchors and reporters in broadcast news shows, not interview speech. However, interview sentences make up approximately 15% of the text corpus gathered for building the language model. Interview sentences differ in character from anchor and reporter speech, and therefore disturb language modeling oriented toward anchors and reporters. In this paper, we evaluate the effect of interview sentences on the language model and statistically analyze the relationship between corpus size and balance by repeating the same experimental procedure while varying the size of the corpus.

Relaxation Effects of Eucomiae Cortex in Isolated Rabbit Corpus Cavernosum Smooth Muscle (杜冲의 토끼 음경해면체 평활근 이완효과)

  • Park, Sun Young
    • Journal of Physiology & Pathology in Korean Medicine / v.29 no.6 / pp.485-491 / 2015
  • This study aimed to investigate the relaxation effects of Eucomiae Cortex (EC) extract on isolated rabbit corpus cavernosum smooth muscle and the underlying mechanism. To evaluate the relaxation effect, EC extract was applied to corporal strips precontracted with phenylephrine (PE). To study the mechanism, the response to EC extract infusion in strips pretreated with Nω-nitro-L-arginine (L-NNA) was compared with that in non-pretreated strips. In calcium chloride (Ca2+)-free Krebs solution, 1 mM Ca2+ was applied to corporal strips contracted by PE, after which EC extract and 1 mM Ca2+ were infused in turn. Cell viability, nitric oxide (NO), and endothelial nitric oxide synthase (eNOS) in human umbilical vein endothelial cells (HUVEC) were measured by MTT assay, the Griess reagent system, and histochemical and immunohistochemical methods. EC extract showed a significant relaxation effect on the corporal strips, and this effect was inhibited by pretreatment with L-NNA. EC extract inhibited the increase in contraction caused by Ca2+ influx in Ca2+-free Krebs solution, and treatment with EC extract increased the eNOS-positive reaction in corpus cavernosum and NO production in HUVEC. These results suggest that the relaxation effects of EC extract on isolated corpus cavernosum smooth muscle involve increased eNOS and NO production and blocking of extracellular Ca2+ influx.

Using Corpora for Studying English Grammar

  • Kwon, Heok-Seung
    • Korean Journal of English Language and Linguistics / v.4 no.1 / pp.61-81 / 2004
  • This paper will look at some grammatical phenomena which will illustrate some of the questions that can be addressed with a corpus-based approach. We will use this approach to investigate the following subjects in English grammar: number ambiguity, subject-verb concord, concord with measure expressions, and (reflexive) pronoun choice in coordinated noun phrases. We will emphasize the distinctive features of the corpus-based approach, particularly its strengths in investigating language use, as opposed to traditional descriptions or prescriptions of structure in English grammar. This paper will show that a corpus-based approach has made it possible to conduct new kinds of investigations into grammar in use and to expand the scope of earlier investigations. Native speakers rarely have accurate information about frequency of use. A large representative corpus (i.e., The British National Corpus) is one of the most reliable sources of frequency information. It is important to base an analysis of language on real data rather than intuition. Any description of grammar is more complete and accurate if it is based on a body of real data.

Usage analysis of vocabulary in Korean high school English textbooks using multiple corpora (코퍼스를 통한 고등학교 영어교과서의 어휘 분석)

  • Kim, Young-Mi; Suh, Jin-Hee
    • English Language & Literature Teaching / v.12 no.4 / pp.139-157 / 2006
  • As the Communicative Approach has become the norm in foreign language teaching, the objectives of teaching English in school have changed radically in Korea. The focus in high school English textbooks has shifted from mere mastery of structures to communicative proficiency. This paper studies five polysemous words which appear in twelve high school English textbooks used in Korea. The twelve textbooks are incorporated into a single corpus and analyzed to classify the usage of the selected words. The usage of each word is then compared with that in three other corpus-based sources: the BNC (British National Corpus) Sampler, ICE Singapore (International Corpus of English for Singapore), and the Collins COBUILD learner's dictionary, which is based on the corpus "The Bank of English". The comparisons carried out as part of this study demonstrate that Korean textbooks do not always supply the full range of meanings of polysemous words.

Detecting and correcting errors in Korean POS-tagged corpora (한국어 품사 부착 말뭉치의 오류 검출 및 수정)

  • Choi, Myung-Gil; Seo, Hyung-Won; Kwon, Hong-Seok; Kim, Jae-Hoon
    • Journal of Advanced Marine Engineering and Technology / v.37 no.2 / pp.227-235 / 2013
  • The quality of the part-of-speech (POS) annotation in a corpus plays an important role in developing POS taggers. However, Korean POS-tagged corpora such as the Sejong Corpus contain several kinds of errors, including annotation errors, spelling errors, and insertion and/or deletion of unexpected characters. In this paper, we propose a method for detecting annotation errors using error patterns, and develop a tool for correcting them effectively. Using the proposed method and the developed tool, we hand-corrected annotation errors in the Sejong POS Tagged Corpus. As a result, correction was at least 9 times faster than without any tools. We therefore conclude that the proposed method is effective for correcting annotation errors in POS-tagged corpora.
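
The abstract does not describe the error patterns themselves, so the following is only a rough sketch of what pattern-based detection might look like. It assumes a tagged format in which each word is written as `morpheme/TAG` units joined by `+`, with tags made of uppercase letters. The pattern list and the `find_errors` helper are hypothetical and are not the authors' patterns or tool.

```python
import re

# Hypothetical error patterns, each a (label, regex) pair applied to a single
# "morpheme/TAG" unit. They are illustrative only, not the paper's patterns.
ERROR_PATTERNS = [
    ("missing_tag", re.compile(r"^[^/]+$")),               # no "/TAG" part at all
    ("empty_morpheme", re.compile(r"^/")),                 # nothing before the slash
    ("malformed_tag", re.compile(r"/(?![A-Z]+$)[^/]*$")),  # tag is not all uppercase
]


def find_errors(tagged_lines):
    """Yield (line_no, unit, label) for every unit that matches an error pattern."""
    for line_no, line in enumerate(tagged_lines, start=1):
        for word in line.split():          # whitespace-separated words
            for unit in word.split("+"):   # morpheme/TAG units within a word
                for label, pattern in ERROR_PATTERNS:
                    if pattern.search(unit):
                        yield line_no, unit, label


if __name__ == "__main__":
    sample = ["나/NP+는/JX 학교/NNG+에/jkb 가/VV+ㄴ다"]
    for hit in find_errors(sample):
        print(hit)
```

A detector of this kind only flags candidates. In the workflow the abstract describes, a human annotator still makes the correction, and the tool's role is to shorten the search.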

Architecture of Cerebral Neuroendocrine System in the Larva of Cabbage Butterfly Pieris rapae (배추흰나비 5령유충의 뇌신경내분비계의 구조)

  • 이봉희; 윤혜련; 심재원
    • The Korean Journal of Zoology / v.36 no.2 / pp.285-292 / 1993
  • This investigation was carried out to clarify the structural architecture of the cerebral neuroendocrine system in the fifth instar larva of the cabbage butterfly Pieris rapae. To examine the cerebral neurosecretory cell systems, the brain and retrocerebral neuroendocrine complex were histochemically stained with paraldehyde fuchsin. The brain of the fifth instar larva contains three kinds of neurosecretory cells: medial, lateral, and tritocerebral neurosecretory cells. The axon bundles of the medial and lateral neurosecretory cells form the medial neurosecretory pathway (MNSP) and the lateral neurosecretory pathway (LNSP) within the brain, respectively. In particular, before exiting the brain, the axon bundles of the medial neurosecretory cells located in the left and right cerebral hemispheres decussate in the cerebral medial region and project to the contralateral retrocerebral neuroendocrine complexes. Outside the brain, the axon bundles of the medial and lateral neurosecretory cells form the nervi corporis cardiaci (NCC) I and II, respectively. NCC I and II run together to the retrocerebral neuroendocrine complex, forming large nerve bundles on both the left and right sides. The axon bundles of the tritocerebral neurosecretory cells, which pass through the brain along the tritocerebral neurosecretory pathway (TNSP), form NCC III outside the brain. Some of NCC I and II terminate in the corpus cardiacum, while the others pass through it without terminating. The nerve bundle that passes through the corpus cardiacum forms the nervus corporis allati (NCA) I, which runs between the corpus cardiacum and the corpus allatum and finally innervates the corpus allatum. NCC III projects to the corpus cardiacum. However, most of NCC III passes through the corpus cardiacum without branching and then runs on toward another organ.

Noun Sense Disambiguation Based-on Corpus and Conceptual Information (말뭉치와 개념정보를 이용한 명사 중의성 해소 방법)

  • 이휘봉; 허남원; 문경희; 이종혁
    • Korean Journal of Cognitive Science / v.10 no.2 / pp.1-10 / 1999
  • This paper proposes a noun sense disambiguation method based on corpus and conceptual information. Previous research has restricted the use of linguistic knowledge to the lexical level. Because the knowledge extracted from a corpus is stored at the level of individual words, such methods require a large amount of space for the knowledge and yield a low recall rate. In contrast, we resolve noun sense ambiguity by using concept co-occurrence information extracted from an automatically sense-tagged corpus. In one experimental evaluation the method achieved, on average, a precision of 82.4%, an improvement of 14.6% over the baseline. Considering that the test corpus is completely unrelated to the learning corpus, this is a promising result.
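
To make the idea of concept co-occurrence concrete, here is a toy sketch under strong assumptions: a hand-written sense-to-concept mapping (`CONCEPT_OF`), a tiny sense-tagged training set (`TRAINING`), and a simple count-based score. All names and data are invented for illustration, and the scoring rule is not claimed to be the one used in the paper.

```python
from collections import Counter


def count_concept_pairs(sense_tagged_sentences, concept_of):
    """Count unordered co-occurrences of concepts within each tagged sentence."""
    counts = Counter()
    for senses in sense_tagged_sentences:
        concepts = [concept_of[s] for s in senses]
        for i, first in enumerate(concepts):
            for second in concepts[i + 1:]:
                counts[frozenset((first, second))] += 1
    return counts


def disambiguate(candidate_senses, context_words, concept_of, counts):
    """Pick the candidate whose concept co-occurs most with the context concepts."""
    context_concepts = [concept_of[w] for w in context_words if w in concept_of]

    def score(sense):
        return sum(counts[frozenset((concept_of[sense], c))]
                   for c in context_concepts)

    # On a tie, max() keeps the first candidate in the list.
    return max(candidate_senses, key=score)


# Hypothetical concept hierarchy leaves and training data.
CONCEPT_OF = {
    "bank#finance": "INSTITUTION", "bank#river": "LANDFORM",
    "money": "ASSET", "loan": "ASSET", "deposit": "ASSET",
    "water": "WATER", "stream": "WATER",
}
TRAINING = [
    ["bank#finance", "money"],
    ["bank#finance", "loan"],
    ["bank#river", "water"],
]
COUNTS = count_concept_pairs(TRAINING, CONCEPT_OF)
```

In this toy data the word "deposit" never appears in training, yet it shares the concept `ASSET` with "money" and "loan", so the financial sense is still preferred. That generalization beyond observed word pairs is what distinguishes concept-level knowledge from the word-level knowledge the abstract criticizes.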

A study on the early pregnancy diagnosis by changing of plasma progesterone concentration and morphology of ovary in pregnancy and non-pregnancy cows (소에서 비임신 및 임신 상태의 난소 형태와 혈중 progesterone 농도 변화에 의한 조기 임신진단)

  • Kim, Cheol-Ho; Bhak, Jong-Sik; Shin, Jung-Sub; Kang, Chung-Bo
    • Korean Journal of Veterinary Service / v.31 no.3 / pp.397-414 / 2008
  • To evaluate the conception rate of Hanwoo in the northwestern region of Gyeongsang-nam-do, we investigated the conception rate and the reduction in the reproductive disorder rate after artificial insemination (AI) in 1,000 breeding cows. The study showed that 80.9% of cows were classified as fertile after the 1st and 2nd AI. For accurate pregnancy diagnosis, we performed ovariectomy and hysterotomy on 80 slaughtered cows, including 30 non-pregnant cows, compared them, and used enzyme-linked immunosorbent assay (ELISA) to estimate plasma progesterone concentration and serum luteal hormone. The mean diameter of the non-pregnant corpus luteum was $18.9 \pm 4.2 \times 15.6 \pm 3.6$ mm and that of the pregnant corpus luteum was $22.5 \pm 2.7 \times 18.7 \pm 2.9$ mm, indicating that the corpus luteum is more developed in the ovaries of pregnant cows than in those of non-pregnant cows (P<0.05). By stage of pregnancy, the diameter of the pregnant corpus luteum was $21.3 \pm 2.4 \times 18.4 \pm 2.6$ mm in the early stage (1-3 months), $23.4 \pm 2.8 \times 19.1 \pm 2.7$ mm in the middle stage (4-6 months), and $22.8 \pm 3.0 \times 18.8 \pm 2.4$ mm in the last stage (7-9 months), indicating that the corpus luteum is significantly more developed in the middle and last stages than in the early stage (P<0.05). The mean plasma progesterone concentration was $4.58 \pm 0.92$ ng/ml in cows with a non-pregnant corpus luteum and $8.26 \pm 0.98$ ng/ml in cows with a pregnant corpus luteum, a significant increase in the latter (P<0.02). However, it was as low as $0.58 \pm 0.39$ ng/ml in estrus (corpus albicans). Plasma progesterone concentration over the gestation period rose in proportion to the development of the corpus luteum, and was significantly higher (P<0.05) and sustained in the middle and last stages compared with the early stage. The concentration decreased sharply to $0.56 \pm 0.32$ ng/ml at parturition. Consequently, early pregnancy diagnosis can be practiced by confirming non-pregnancy when the mean plasma progesterone concentration is below 1 ng/ml 19 to 22 days after AI, and this approach can also be used to diagnose reproductive disorders.