• Title/Abstract/Keywords: Korean corpus

Search results: 1,201 items (processing time: 0.023 s)

A Corpus Selection Based Approach to Language Modeling for Large Vocabulary Continuous Speech Recognition

  • 오유리;윤재삼;김홍국
    • 대한음성학회:학술대회논문집 / 대한음성학회 2005년도 추계 학술대회 발표논문집 / pp. 103-106 / 2005
  • In this paper, we propose a language modeling approach that improves the performance of a large vocabulary continuous speech recognition system. The approach is based on an active learning framework that selects a text corpus from the large amount of text data required for language modeling, using perplexity as the selection measure. In recognition experiments on continuous Korean speech, the system using the language model built with the proposed approach reduced the word error rate by about 6.6%, with less computational complexity, compared with a system using a language model built from randomly selected texts.
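
The abstract above names perplexity as the corpus selection measure but gives no further detail. The following is a minimal sketch of perplexity-based text selection, assuming an add-one-smoothed unigram model trained on a small seed corpus and assuming that lower-perplexity candidates are preferred; the function names, the unigram model, and the selection direction are illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Count word frequencies in whitespace-tokenized sentences."""
    counts = Counter(w for s in sentences for w in s.split())
    return counts, sum(counts.values())


def perplexity(sentence, counts, total, vocab_size):
    """Perplexity of one non-empty sentence under an add-one-smoothed unigram model."""
    words = sentence.split()
    log_prob = sum(
        math.log((counts[w] + 1) / (total + vocab_size)) for w in words
    )
    return math.exp(-log_prob / len(words))


def select_by_perplexity(seed, pool, k):
    """Return the k pool sentences with the lowest perplexity under the seed model.

    Assumption: low perplexity means "close to the seed domain". The paper's
    actual selection criterion is not stated in the abstract.
    """
    counts, total = train_unigram(seed)
    # Vocabulary covers seed and pool words so unseen words get nonzero mass.
    vocab_size = len(set(counts) | {w for s in pool for w in s.split()})
    return sorted(
        pool, key=lambda s: perplexity(s, counts, total, vocab_size)
    )[:k]
```

In practice the unigram model would be replaced by the n-gram model used in the recognizer; the ranking step stays the same.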


Morphologic Assessment of Corpus Callosum in the Patient of Alzheimer Disease using Magnetic Resonance Imaging

  • Seoung, Youl-Hun;Choe, Bo-Young
    • 한국자기공명학회논문지 / Vol. 13, No. 2 / pp. 84-95 / 2009
  • The purpose of this study was to evaluate the usefulness of measuring corpus callosum (CC) size in patients with Alzheimer disease using midsagittal magnetic resonance (MR) images. We performed MR scanning in 20 normal elderly subjects, 20 patients with mild cognitive impairment (MCI), and 20 patients with Alzheimer disease (AD). The following parameters were employed in the AD group: TR/TE/FA 6650 ms/66 ms/$90^{\circ}$, NEX 2, thickness/gap 2/0, FOV 220 mm. The magnetic field strength was 3.0 Tesla. We selected the midsagittal image of the brain using the View Forum program and measured CC size: the anteroposterior length and the diameters of the genu, body, narrowing portion, and splenium. The present study demonstrates that CC size in Alzheimer disease, specifically the diameters of the genu, body, and splenium, can be useful for clinical assessment.

A Study on the Lexicalization of {Geuraegajigo} Based on the Spontaneous Speech Corpus

  • 하영우;신지영
    • 한국어학 / Vol. 64 / pp. 195-223 / 2014
  • The aim of this paper is to study the morphemization of {Geuraegajigo} based on a spontaneous speech corpus. To this end, the distributions, semantic functions, and intonational phrase patterns of the connective {Geuraegajigo} were analyzed in the corpus. The results are as follows. First, the coalescence that accompanies morphemization was found, resulting in many variants. Second, {Geuraegajigo} has three functions: [Direct/Indirect interrelationship], [Enumerate conjunction], and [Discourse marker]. This semantic and functional diversity closely resembles that of conjunctive adverbs. Lastly, the intonational phrase (IP) patterns of {Geuraegajigo} accord with those of conjunctive adverbs; in particular, the discourse-strategic IP pattern is associated with the short variant type. In conclusion, {Geuraegajigo} has completed its change into a conjunctive adverb through morphemization.

Data Mining Research on Maehwado Painting Poetry in the Early Joseon Dynasty

  • Haeyoung Park;Younghoon An
    • Journal of Information Processing Systems / Vol. 19, No. 4 / pp. 474-482 / 2023
  • Data mining is a technique for extracting valuable information from vast amounts of data by analyzing statistical and mathematical operations, rules, and relationships. In this study, we employed data mining technology to analyze the painting poetry of Maehwado (plum blossom paintings) from the early Joseon Dynasty. The data were extracted from the Hanguk Munjip Chonggan (Korean Literary Collections in Classical Chinese) in the Hanguk Gojeon Jonghap database (Korea Classics DB). Using computer information processing techniques, we carried out web scraping and classification of the painting poetry from the Hanguk Munjip Chonggan. We then narrowed our focus to the painting poetry specifically related to Maehwado in the early Joseon Dynasty. Based on this refined dataset, we conducted an in-depth analysis and interpretation of the text data at the syllable corpus level. As a result, we found a direct correlation between the corpus statistics for each syllable in Maehwado painting poetry and the symbolic meaning of plum blossoms.
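
Syllable-level corpus statistics of the kind mentioned in the abstract above can be sketched as a character frequency count, on the assumption that each character of a classical Chinese poem is treated as one syllable. The function name and the rule for skipping punctuation and whitespace are illustrative assumptions; the abstract does not describe the authors' actual pipeline.

```python
from collections import Counter


def syllable_frequencies(poems):
    """Return (syllable, count, relative frequency) tuples, most frequent first.

    Assumption: every alphabetic character (including CJK ideographs) is one
    syllable; punctuation, digits, and whitespace are skipped.
    """
    counts = Counter(ch for poem in poems for ch in poem if ch.isalpha())
    total = sum(counts.values())
    return [(ch, n, n / total) for ch, n in counts.most_common()]
```

Called on a list of poem strings, the function yields a ranked table that can then be compared against the imagery the poems are known for.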

Studies on Accurate Diagnosis on Reproductive Failures of Dairy Cows by Ultrasonography

  • 김용준;박희섭;김용수;조성우;신동수;이해이;김수희
    • 한국임상수의학회지 / Vol. 23, No. 2 / pp. 133-143 / 2006
  • Reproductive failures were diagnosed by ultrasonography in 151 dairy cows. To identify the type of reproductive failure, ultrasonography (SA 600, Medison, 5.0 MHz rectal linear transducer) was performed in combination with rectal examination. Of the 151 cows, 13 were pregnant and 40 were in a normal estrous cycle, leaving 98 cows with reproductive failures. 1. Of the 98 cows with reproductive failures, 34 (34.7%) had ovarian diseases and 41 (41.8%) had uterine diseases. 2. The follicle diameter in proestrus was 1.94 cm, larger than that in diestrus (p<0.05). 3. The mean size of the corpus luteum in pregnant cows was larger than that in normal diestrus (p<0.05). 4. The cystic corpus luteum was 3.26 cm long and 1.91 cm wide; excluding the cavity, the corpus luteum tissue was 1.95 cm long and 1.91 cm wide. 5. The mean length and width of follicular cysts were 3.31 and 2.3 cm, respectively. 6. The mean length and width of luteal cysts were 3.45 and 2.25 cm, respectively; excluding the cyst, the corpus luteum tissue had a mean length of 1.15 cm and a mean width of 0.67 cm. 7. The width of the uterine horn associated with endometritis decreased significantly as time elapsed after parturition (p<0.05); the mean width of the uterine horn within 40 days after parturition was 4.55 cm. These results indicate that ultrasonography is of great use for the accurate diagnosis of both ovarian and uterine diseases and is very effective for diagnosing endometritis in dairy cows.

Reproductive Monitoring and Estrus Induction Using Ultrasonography and Hormone Assay in Dairy Cows II. Differential Diagnosis and Treatment of Coexist of Cysts and Corpus luteum

  • 오기석;박상국;김방실;고진성;신종봉;백종환;홍기강;문광식;임원호
    • 한국임상수의학회지 / Vol. 20, No. 3 / pp. 376-383 / 2003
  • To establish differential diagnosis and treatment methods for bovine ovarian cysts, especially ovarian cysts coexisting with a corpus luteum, serum progesterone concentration, rectal palpation findings, and ultrasonographic measurements of cystic wall thickness and of cyst and corpus luteum diameter were investigated in 1,188 dairy cows with ovarian cysts. The plasma progesterone concentrations were 0.3$\pm$0.4 (mean$\pm$SD) ng/ml in 629 cows with follicular cysts, 3.7$\pm$1.1 ng/ml in 431 cows with luteal cysts, and 3.8$\pm$1.2 ng/ml in 128 cows with coexisting ovarian cysts and corpus luteum. The cystic wall thickness measured by ultrasonography was 1.6$\pm$0.4 mm in the 629 cows with follicular cysts, 4.2$\pm$1.5 mm in the 431 cows with luteal cysts, and 1.6$\pm$0.6 mm in the 128 cows with coexisting ovarian cysts and corpus luteum. For follicular cysts, the interval from initial treatment to insemination was 28.1$\pm$6.9 days with GnRH alone, 15.9$\pm$2.9 days with GnRH plus dinoprost, and 15.1$\pm$3.1 days with GnRH plus cloprostenol; the percentages of cows that conceived within 100 days after initial treatment were 61%, 68%, and 73%, respectively. For luteal cysts, the interval from initial treatment to insemination was 3.8$\pm$0.6 days with dinoprost alone and 3.8$\pm$0.7 days with cloprostenol alone; the percentages of cows that conceived within 100 days after initial treatment were 69.5% and 68.5%, respectively. For cysts coexisting with a corpus luteum, the interval from initial treatment to insemination was 3.7$\pm$0.7 days with dinoprost alone and 3.8$\pm$0.6 days with cloprostenol alone; the percentages of cows that conceived within 100 days after initial treatment were 87% and 84%, respectively. These results suggest that the best treatment for ovarian cysts is a combination of GnRH and PGF$_{2\alpha}$ for follicular cysts, and PGF$_{2\alpha}$ for luteal cysts and for cysts coexisting with a corpus luteum. In conclusion, ultrasonography is suggested to be a useful tool for the diagnosis of cystic ovaries in cattle and for the selection of treatment.

Named Entity Recognition Using Distant Supervision and Active Bagging

  • 이성희;송영길;김학수
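    • 정보과학회 논문지 / Vol. 43, No. 2 / pp. 269-274 / 2016
  • Named entity recognition is the task of extracting named entities from sentences and determining the category of each extracted entity. Previous named entity recognition research has mainly used supervised learning. Supervised learning requires a large training corpus manually annotated with named entity categories, and building such a corpus by hand takes considerable time and labor. This paper proposes a semi-supervised learning method that rapidly improves named entity recognition performance while minimizing the cost of building the training corpus. The proposed method uses distant supervision to build an initial training corpus. It then uses active bagging, an ensemble technique that combines bagging and active learning, to effectively remove noisy sentences from the initial training corpus. In experiments, 15 rounds of active bagging improved the named entity recognition F1-score from 67.36% to 76.42%.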
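
The distant supervision step in the abstract above (building an initial training corpus automatically) can be illustrated with a small sketch, assuming a hypothetical gazetteer that maps known entity strings to categories and labels matching tokens automatically. The gazetteer, the tag names, and single-token matching are all assumptions made for illustration; the abstract does not specify the knowledge source used, and the active bagging step is not shown.

```python
def distant_label(tokens, gazetteer):
    """Label each token with its gazetteer category, or "O" if it is not listed.

    Assumption: entities are single tokens. Labels produced this way are noisy,
    which is why a later filtering step is needed.
    """
    return [(tok, gazetteer.get(tok, "O")) for tok in tokens]


def build_initial_corpus(sentences, gazetteer):
    """Keep only sentences in which at least one token received an entity label."""
    labeled = (distant_label(s.split(), gazetteer) for s in sentences)
    return [sent for sent in labeled if any(tag != "O" for _, tag in sent)]
```

Sentences with no gazetteer match are dropped here so that the initial corpus contains only automatically labeled examples.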

Frequency of grammar items for Korean substitution of /u/ for /o/ in the word-final position

  • 윤은경
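    • 말소리와 음성과학 / Vol. 12, No. 1 / pp. 33-42 / 2020
  • This paper aims to examine, based on a spoken corpus, how the raising of Korean /ㅗ/ to /ㅜ/ (e.g., '별로' [별루]) differs across grammatical items. Korean /ㅗ/ and /ㅜ/ share the feature [+round] but are distinguished by tongue height. However, several papers have recently reported an ongoing merger in which the phonetic distinction between the two vowels is becoming blurred. This study examined the frequency and proportion with which word-final /ㅗ/ is phonetically realized as [o] or [u] in The Korean Corpus of Spontaneous Speech, by grammatical item: connective endings, particles, adverbs, and nominals. The results showed that /ㅗ/ was replaced by /ㅜ/ at a rate of about 50% in connective endings, particles, and adverbs, whereas in nominals alone the replacement rate was considerably lower, under 5%. Among high-frequency morphemes, the one with the highest /ㅜ/ substitution rate was '-도 [두]' (59.6%), and among connective endings it was '-고 [구]' (43.5%). The significance of the study lies in examining the difference between actual pronunciation and standard pronunciation through a spoken corpus.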
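
The per-category substitution rate reported in the abstract above is a simple proportion: tokens realized as [u] divided by all word-final /ㅗ/ tokens in that category. A minimal sketch, assuming each corpus token has already been annotated as a (grammatical category, realization) pair; the data layout and the labels "o" and "u" are illustrative assumptions.

```python
from collections import defaultdict


def substitution_rates(tokens):
    """Return {category: share of tokens realized as "u"}.

    `tokens` is an iterable of (category, realization) pairs, where
    realization is "o" or "u" (hypothetical annotation labels).
    """
    totals = defaultdict(int)
    raised = defaultdict(int)
    for category, realization in tokens:
        totals[category] += 1
        if realization == "u":
            raised[category] += 1
    return {category: raised[category] / totals[category] for category in totals}
```

The same counts also give the raw frequencies per category, so both frequency and proportion can be read from one pass over the annotations.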

On the statistics of Korean Phonetic Dictionary - Basic Survey to make corpus of Korean Speech DB -

  • 이용주;김경태;조철우;이태원
    • 대한전기학회:학술대회논문집 / 대한전기학회 1987년도 전기.전자공학 학술대회 논문집(II) / pp. 1606-1609 / 1987
  • Statistical information about spoken Korean was obtained by analyzing the Korean phonetic dictionary. This is one of the basic surveys for building a phoneme-balanced corpus for the Korean Speech Data Base (KSDB).


Extracting Multiword Sentiment Expressions by Using a Domain-Specific Corpus and a Seed Lexicon

  • Lee, Kong-Joo;Kim, Jee-Eun;Yun, Bo-Hyun
    • ETRI Journal / Vol. 35, No. 5 / pp. 838-848 / 2013
  • This paper presents a novel approach to automatically generate Korean multiword sentiment expressions by using a seed sentiment lexicon and a large-scale domain-specific corpus. A multiword sentiment expression consists of a seed sentiment word and its contextual words occurring adjacent to the seed word. The multiword sentiment expressions that are the focus of our study have a different polarity from that of the seed sentiment word. The automatically extracted multiword sentiment expressions show that 1) the contextual words should be defined as a part of a multiword sentiment expression in addition to their corresponding seed sentiment word, 2) the identified multiword sentiment expressions contain various indicators for polarity shift that have rarely been recognized before, and 3) the newly recognized shifters contribute to assigning a more accurate polarity value. The empirical result shows that the proposed approach achieves improved performance of the sentiment analysis system that uses an automatically generated lexicon.