• Title/Abstract/Keywords: Korean corpus


Corpus-based analysis of the usage of Korean markers -(n)un and -i/ka in editorial texts

  • Kim, Kyoung-Young
    • Language and Information (Journal of the Korean Society for Language and Information), Vol. 19, No. 2, pp. 19-36, 2015
  • The aim of this paper is to investigate the usage of the Korean markers -(n)un and -i/ka in editorial texts, focusing on information structure. Noun phrases ending in -(n)un or -i/ka were annotated semi-automatically in a corpus obtained from an online newspaper. Two important factors in the choice of marker were examined with the annotated data: referential givenness/newness and position in the sentence. Referential givenness and newness were adopted as indicators of the information-structural categories topic and focus, respectively. In addition to the quantitative analysis, a qualitative analysis was conducted on selected data. The results suggest that both -(n)un and -i/ka can carry either a topic or a focus reading. Sentence position also played a crucial role in the choice of marker: -i/ka was used more frequently than -(n)un in later positions of a sentence.

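
The semi-automatic annotation step can be roughly approximated by a heuristic. It flags space-delimited words (eojeols) ending in 은/는 (-(n)un) or 이/가 (-i/ka) and records where in the sentence each one occurs. The Python sketch below works under that assumption and is not the author's procedure. The function name `tag_markers` is hypothetical, and the sketch ignores referential givenness entirely. It also over-generates: a verb form ending in 는 is flagged too, so a manual pass would still be needed.

```python
from __future__ import annotations

TOPIC = {"은", "는"}       # -(n)un
NOMINATIVE = {"이", "가"}  # -i/ka


def tag_markers(sentence: str) -> list[tuple[str, str, float]]:
    """Return (eojeol, marker, relative_position) for each candidate noun phrase.

    relative_position is 0.0 for the first word and 1.0 for the last.
    Heuristic only: any word whose final syllable is 은/는/이/가 is flagged.
    """
    words = sentence.strip().rstrip(".?!").split()
    hits = []
    for i, word in enumerate(words):
        if len(word) < 2:
            continue  # a lone syllable cannot be noun + marker
        if word[-1] in TOPIC:
            marker = "-(n)un"
        elif word[-1] in NOMINATIVE:
            marker = "-i/ka"
        else:
            continue
        position = i / (len(words) - 1) if len(words) > 1 else 0.0
        hits.append((word, marker, position))
    return hits
```

Consider the sentence 정부는 이번 정책이 효과가 있다고 밝혔다. The sketch returns 정부는 as -(n)un at position 0.0, and 정책이 and 효과가 as -i/ka at positions 0.4 and 0.6. Aggregating such positions over a corpus is one way to compare where each marker tends to occur.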

A Reduction of Speech Database in Corpus-based Speech Synthesis System

  • 장경애;정민화;김재인;구명완
    • Malsori (Journal of the Korean Phonetic Society), No. 44, pp. 145-156, 2002
  • This paper describes how to reduce the database (DB) of a corpus-based Korean speech synthesizer without degrading speech quality. First, we propose that the frequency of each unit in the reduced DB should reflect the frequency of that unit in the Korean language. Accordingly, the target population of each unit is set proportional to its frequency in a large Korean corpus (780k sentences, 45 million phones). Second, instances that are used frequently during synthesis should also be retained in the reduced DB. Finally, we propose that the frequency of each instance be reflected in the clustering criteria and used as an additional criterion for selecting representative instances. In evaluation, the proposed methods yield better speech quality than conventional methods.

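
The abstract's first criterion sets a target population for each unit in proportion to its frequency in a large Korean corpus. This amounts to proportionally allocating a fixed DB budget. The Python sketch below assumes integer frequency counts per unit and a total instance budget. The function name `allocate_targets` and the largest-remainder rounding are choices made here, not taken from the paper. The other two criteria are not modeled: retaining instances that are frequent during synthesis, and frequency-weighted clustering.

```python
from __future__ import annotations

from math import floor


def allocate_targets(freq: dict[str, int], budget: int) -> dict[str, int]:
    """Split `budget` DB instances across units in proportion to corpus frequency.

    Largest-remainder rounding keeps the per-unit targets summing to `budget`.
    """
    total = sum(freq.values())
    quotas = {unit: budget * f / total for unit, f in freq.items()}
    targets = {unit: floor(q) for unit, q in quotas.items()}
    leftover = budget - sum(targets.values())
    # Hand the remaining slots to the units whose quotas were rounded down the most.
    by_remainder = sorted(quotas, key=lambda u: quotas[u] - targets[u], reverse=True)
    for unit in by_remainder[:leftover]:
        targets[unit] += 1
    return targets
```

Take hypothetical units with frequencies {"a": 50, "b": 30, "c": 20} and a budget of 7. The quotas are 3.5, 2.1 and 1.4, so the targets come out as a: 4, b: 2, c: 1. A real system would also need to cap each target at the number of instances actually available for that unit in the full DB.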

Semi-Automatic Annotation Tool to Build Large Dependency Tree-Tagged Corpus

  • Park, Eun-Jin;Kim, Jae-Hoon;Kim, Chang-Hyun;Kim, Young-Kill
    • Proceedings of the 2007 Annual Conference of the Korean Society for Language and Information, pp. 385-393, 2007
  • Corpora annotated with rich linguistic information are required to develop robust statistical natural language processing systems. Building such corpora, however, is expensive, labor-intensive, and time-consuming. To support this work, we design and implement an annotation tool for building a Korean dependency tree-tagged corpus. Compared with other annotation tools, ours has the following features: application independence, error localization, powerful error checking, instant sharing of annotated information, and user-friendliness. Using the tool, we annotated 100,904 Korean sentences with dependency structures. Thirty-three annotators took part, the average annotation time was about 4 minutes per sentence, and the annotation took 5 months in total. We are confident that the tool yields accurate and consistent annotations while reducing labor and time.

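
The abstract lists "powerful error checking" among the tool's features but does not say which checks are run. The Python sketch below shows the kind of well-formedness test such a tool might apply to one annotated sentence. Several things in it are assumptions, not the tool's documented behavior: the head-array encoding, the function name `check_tree`, and the optional head-final rule (assumed here because Korean is head-final).

```python
from __future__ import annotations


def check_tree(heads: list[int], head_final: bool = True) -> list[str]:
    """Return error messages for one sentence's dependency annotation.

    heads[i] is the 1-based index of the head of token i + 1; 0 marks the root.
    An empty list means no error was found.
    """
    n = len(heads)
    errors = []
    roots = [i for i, h in enumerate(heads, start=1) if h == 0]
    if len(roots) != 1:
        errors.append(f"expected exactly one root, found {len(roots)}")
    for i, h in enumerate(heads, start=1):
        if h < 0 or h > n:
            errors.append(f"token {i}: head {h} out of range")
        elif h == i:
            errors.append(f"token {i}: depends on itself")
        elif head_final and h != 0 and h < i:
            errors.append(f"token {i}: head {h} precedes its dependent")
    # Follow head links from every token; stopping on an already-seen token means a cycle.
    for start in range(1, n + 1):
        seen = set()
        node = start
        while 1 <= node <= n and node not in seen:
            seen.add(node)
            node = heads[node - 1]
        if 1 <= node <= n:
            errors.append("cycle in head links")
            break
    return errors
```

Each message names the offending token, so an annotator can jump straight to it. This is in the spirit of the "error localization" feature, though how the actual tool reports errors is not described.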

Development and Evaluation of a Korean Treebank and its Application to NLP

  • Han, Chung-Hye;Han, Na-Rae;Ko, Eon-Suk;Martha Palmer
    • Language and Information (Journal of the Korean Society for Language and Information), Vol. 6, No. 1, pp. 123-138, 2002
  • This paper discusses issues in building a 54,000-word Korean Treebank with phrase structure annotation, along with developing annotation guidelines based on the morpho-syntactic phenomena represented in the corpus. The various methods employed for quality control are presented. An evaluation of the Treebank's quality and some NLP applications under development that use the Treebank are also presented.

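
The abstract mentions quality-control methods and an evaluation of treebank quality without naming a measure. One common way to quantify agreement between two phrase-structure annotations of the same sentence is labeled bracketing F1, in the style of PARSEVAL. It is not necessarily the measure the authors used. The Python sketch below assumes Penn-style bracketed trees in which every bracket opens with a label, and it uses hypothetical tags (S, NP, VP, N, V).

```python
from __future__ import annotations

import re
from collections import Counter


def labeled_spans(tree: str) -> Counter:
    """Collect (label, start, end) constituent spans from a bracketed tree.

    Assumes every bracket opens with a label, e.g. "(S (NP (N w1)) (VP (V w2)))".
    Preterminals (a tag directly over one word) are not counted as constituents.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", tree)
    spans = Counter()
    pos = 0         # current index into tokens
    word_index = 0  # number of words consumed so far

    def parse() -> None:
        nonlocal pos, word_index
        pos += 1                 # skip "("
        label = tokens[pos]
        pos += 1
        if tokens[pos] not in ("(", ")"):
            # Preterminal: (TAG word)
            word_index += 1
            pos += 2             # skip the word and its ")"
            return
        start = word_index
        while tokens[pos] == "(":
            parse()
        pos += 1                 # skip ")"
        spans[(label, start, word_index)] += 1

    parse()
    return spans


def bracket_f1(gold: str, test: str) -> float:
    """Labeled bracketing F1 between two annotations of the same sentence."""
    g, t = labeled_spans(gold), labeled_spans(test)
    matched = sum((g & t).values())
    if matched == 0:
        return 0.0
    precision = matched / sum(t.values())
    recall = matched / sum(g.values())
    return 2 * precision * recall / (precision + recall)
```

Suppose one annotator writes (S (NP (N 철수)) (VP (V 갔다))) and another omits the VP. Then precision is 1.0, recall is 2/3 and F1 is 0.8. Averaging such scores over doubly annotated sentences gives a simple inter-annotator agreement figure.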

Regional Distribution of Interstitial Cells of Cajal (ICC) in Human Stomach

  • Yun, Hyo-Yung;Sung, Ro-Hyun;Kim, Young-Chul;Choi, Woong;Kim, Hun-Sik;Kim, Heon;Lee, Gwang-Ju;You, Ra-Young;Park, Seon-Mee;Yun, Sei-Jin;Kim, Mi-Jung;Kim, Won-Seop;Song, Young-Jin;Xu, Wen-Xie;Lee, Sang-Jin
    • The Korean Journal of Physiology and Pharmacology, Vol. 14, No. 5, pp. 317-324, 2010
  • We characterized the distribution of interstitial cells of Cajal (ICC) in the human stomach, using cryosections and c-Kit immunohistochemistry to identify c-Kit-positive ICC. Before c-Kit staining, we routinely used hematoxylin and eosin (HE) staining to identify every structure of the human stomach, from the mucosa to the longitudinal muscle. HE staining revealed a prominent oblique muscle layer in the greater curvature (GC) of the fundus. c-Kit immunostaining showed that c-Kit-positive ICC had the typical morphology of a dense fusiform cell body with multiple processes protruding from it. In particular, dense processes and ramifications of ICC were observed in the myenteric area and the longitudinal muscle layer of the corpus GC. Interestingly, c-Kit-positive ICC-like cells with a morphology very similar to that of ICC were found in the gastric mucosa. We found no significant difference in the distribution of ICC between the fundus and the corpus, except in the submucosa, where ICC density was much higher in the fundus than in the corpus. Furthermore, there was no significant difference in ICC density among the individual areas of the fundus and corpus, except for the muscularis mucosa. Finally, we found a similar distribution of ICC in normal and cancerous tissue obtained from a patient who underwent pancreotomy and gastrectomy. In conclusion, ICC were found throughout the human stomach. Their density was significantly lower in the muscularis mucosa of both the fundus and the corpus, and higher in the submucosa of the fundus than in that of the corpus.

The Outcome of Corpus Callosotomy for Intractable Epilepsy : 10 Years Experience of Corpus Callosotomy

  • Seo, Jeong-Suk;Lee, Jong-Ju;Lee, Jung-Kyo;Kang, Jung-Gu;Lee, Sang-Am;Ko, Tae-Sung
    • Journal of Korean Neurosurgical Society, Vol. 39, No. 1, pp. 16-19, 2006
  • Objective: The purpose of this study is to evaluate the effect of corpus callosotomy and to identify possible prognostic factors. Methods: The cases of 39 patients who underwent corpus callosotomy were reviewed retrospectively. Clinical outcomes over a follow-up of more than one year were analyzed using Engel's classification, taking into account various presurgical conditions and the extent of callosal resection. Results: A satisfactory outcome (Engel class I or II) was obtained in 20 of the 39 patients (51%). Among the 36 patients with drop-attack seizures, 22 (61%) had a class I or II outcome. By extent of callosal resection, class I or II outcomes were achieved in 50% of patients with anterior 1/2 or 2/3 callosotomy, 50% of those with anterior 4/5 callosotomy, and 57% of those with total callosotomy. The mean follow-up period was 34 months (range, 24 to 58 months). Conclusion: Although the difference was not statistically significant, patients who underwent total callosotomy showed better outcomes than those who underwent partial callosotomy. Corpus callosotomy is efficacious in controlling medically intractable epilepsy in appropriately selected patients.

MRI Findings and Differential Diagnosis of Benign and Malignant Tumors of the Uterine Corpus

  • 김지현;허숙희;신상수;정용연
    • Journal of the Korean Society of Radiology, Vol. 82, No. 5, pp. 1103-1123, 2021
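  • The uterus is broadly divided into the uterine corpus and the cervix. The uterine corpus, composed of the endometrium and the myometrium, gives rise to a variety of tumors ranging from benign to malignant. Ultrasonography and computed tomography are available as noninvasive first-line evaluations, but their nonspecific imaging findings can make differentiation difficult. In contrast, magnetic resonance imaging offers high resolution and can characterize pathologic features. It helps not only to localize lesions but also to assess their histologic characteristics and to stage malignant tumors. This review summarizes the characteristic MRI findings of various benign and malignant tumors of the uterine corpus that radiologists should know, along with the features that distinguish them.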

A Corpus-based study on the Effects of Gender on Voiceless Fricatives in American English

  • Yoon, Tae-Jin
    • Phonetics and Speech Sciences, Vol. 7, No. 1, pp. 117-124, 2015
  • This paper investigates the acoustic characteristics of English fricatives in the TIMIT corpus, with a special focus on the role of gender in the production of fricatives in American English. The TIMIT database includes 630 talkers and 2,342 different sentences, comprising over five hours of speech. Acoustic analyses of spectral and temporal properties are conducted with gender treated as an independent factor. The results revealed that most acoustic properties of voiceless sibilants differ between male and female speakers, whereas those of voiceless non-sibilants do not. A classification experiment using linear discriminant analysis (LDA) correctly classified 85.73% of voiceless fricatives: 88.61% of sibilants, but only 57.91% of non-sibilants. Most of the errors come from misclassifying /θ/ as [f]. The average accuracy of gender classification is 77.67%, and most of its errors come from classifying female speakers on the basis of non-sibilants. The results are explained in terms of biological differences as well as macro-social factors. The paper contributes to the understanding of the role of gender in a large-scale speech corpus.
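
The LDA accuracies above are reported for all voiceless fricatives and for the sibilant and non-sibilant subsets. The sketch below shows how such figures can be tallied from (gold, predicted) label pairs. It assumes the four voiceless fricatives /s ʃ f θ/ under their TIMIT labels s, sh, f, th, and the function name `accuracy_by_group` is hypothetical. It does not perform the LDA itself.

```python
from __future__ import annotations

# Assumed TIMIT labels: "s" and "sh" are sibilants; "f" and "th" are non-sibilants.
SIBILANTS = {"s", "sh"}


def accuracy_by_group(pairs: list[tuple[str, str]]) -> dict[str, float]:
    """Overall and per-group accuracy from (gold, predicted) label pairs."""
    correct = {"all": 0, "sibilant": 0, "non-sibilant": 0}
    total = {"all": 0, "sibilant": 0, "non-sibilant": 0}
    for gold, predicted in pairs:
        group = "sibilant" if gold in SIBILANTS else "non-sibilant"
        for key in ("all", group):
            total[key] += 1
            correct[key] += int(gold == predicted)
    # Skip groups with no tokens to avoid dividing by zero.
    return {key: correct[key] / total[key] for key in total if total[key]}
```

Suppose the overall figure is pooled over all tokens, as in this sketch. Then it is a token-weighted average of the two subset accuracies. That is consistent with 85.73% lying much closer to the sibilant score (88.61%) than to the non-sibilant score (57.91%), which would mean sibilant tokens are the majority.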

Topic Analysis of Science and Technology Articles using CiteSeer Corpus

  • 정한민;강인수;성원경
    • Journal of KIISE: Computing Practices and Letters, Vol. 14, No. 5, pp. 507-511, 2008
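  • Science and technology develop very rapidly, and convergence among subfields occurs frequently. Analyzing these characteristics from a corpus of science and technology information is necessary for tracking trends in research topics and identifying relationships between topics. This study aims to show how various topic analyses can be performed using topics extracted from the CiteSeer corpus, which is widely used in science and technology, particularly in information technology (IT). In particular, we use case studies to demonstrate the roles that topics can play in OntoFrame, a system that supports the full R&D life cycle.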
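
The abstract names two uses of extracted topics: tracking research-topic trends and identifying relationships between topics. Assume each document already carries a publication year and a set of extracted topic labels; the abstract does not describe the extraction method. Both analyses can then be sketched as simple counts in Python. The function names and topic labels are hypothetical and say nothing about how OntoFrame implements these analyses.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from itertools import combinations


def topic_trends(docs: list[tuple[int, set[str]]]) -> dict[str, dict[int, int]]:
    """Number of documents per topic per publication year."""
    trends = defaultdict(Counter)
    for year, topics in docs:
        for topic in topics:
            trends[topic][year] += 1
    return {topic: dict(by_year) for topic, by_year in trends.items()}


def topic_links(docs: list[tuple[int, set[str]]]) -> Counter:
    """Count topic pairs assigned to the same document, a crude relatedness signal."""
    links = Counter()
    for _, topics in docs:
        # Sorting makes each pair's key order-independent.
        for pair in combinations(sorted(topics), 2):
            links[pair] += 1
    return links
```

Take three toy documents: (2005, {"data mining", "databases"}), (2006, {"data mining", "machine learning"}) and (2006, {"machine learning"}). The trend for "machine learning" is {2006: 2}, and the pair ("data mining", "machine learning") co-occurs once. Normalizing by the number of documents per year would be needed before comparing trends across years of different sizes.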

Differential Expression of Multiple Connexins in Rat Corpus and Cauda Epididymis at Various Postnatal Stages

  • Lee, Ki-Ho
    • Journal of Animal Science and Technology, Vol. 55, No. 6, pp. 521-530, 2013
  • Direct cell-cell communication via the transfer of small molecules between neighboring cells in a tissue is accomplished by gap junctions composed of various connexins (Cxs). Proper postnatal development of the epididymis is important for male reproductive function. The epididymal epithelium is composed of several cell types, some of which are connected by gap junctions. The present study was conducted to determine which Cx transcripts are present in the corpus and cauda epididymis. In addition, changes in the transcription of Cxs across postnatal stages were examined by real-time PCR analysis. In both epididymal regions, transcripts of the same nine of the thirteen Cxs tested were detected. In the corpus epididymis, Cxs31.1 and 37 transcripts peaked at 45 days of age. Amounts of Cxs26, 30.3, and 32 transcripts increased with age and subsequently decreased in aged animals. Expression of Cx31 was greatly increased at the adult and elder stages, while Cxs40, 43, and 45 were abundant in the early postnatal stages. In the cauda epididymis, expression of Cxs26, 30.3, 31.1, 37, and 40 reached its highest levels at 5 months of age. The levels of Cxs31 and 32 mRNAs fluctuated throughout the postnatal period. Cxs43 and 45 transcripts were more abundant during the late neonatal and prepubertal ages than at later ages. These findings suggest that regional specification of the epididymis is partly regulated by differential expression of Cx genes during the postnatal developmental period.