• Title/Abstract/Keywords: corpora

249 search results (processing time: 0.026 s)

An Example-Based English Learning Environment for Writing

  • Miyoshi, Yasuo;Ochi, Youji;Okamoto, Ryo;Yano, Yoneo
    • Korea Intelligent Information Systems Society: Conference Proceedings / Korea Intelligent Information Systems Society 2001, The Pacific Asian Conference on Intelligent Systems 2001 / pp.292-297 / 2001
  • In learning to write in a second or foreign language, a learner has to acquire not only lexical and syntactic knowledge but also the skill of choosing suitable words for the content s/he is interested in. To support this, a learning system should infer the learner's intention and offer example phrases related to that content. However, a learner cannot always express the content of the desired phrase as input to the system. Therefore, the system should be equipped with a function for diagnosing the learner's intention. In addition, it should have an analysis function that scores the similarity between the learner's intention and the phrases stored in the system, at both the syntactic and idiomatic levels, so that it can present appropriate example phrases. In this paper, we propose an architecture for an interactive support method for learning English writing, based on an analogical search technique for sample phrases from corpora. Our system can show candidate variations or next phrases to write, as well as sentences from corpora analogous to what the learner wants to express.


Creation and Assessment of Korean Speech and Noise DB in Car Environments

  • 이광현;김봉완;이용주
    • Malsori: Journal of the Korean Society of Phonetic Sciences and Speech Technology / No. 48 / pp.141-153 / 2003
  • Research into robust speech recognition in noisy environments, especially in car environments, is being actively carried out in the speech community. In this paper we report on three types of corpora that SiTEC (Speech Information TEchnology & industry promotion Center) has created for research into speech recognition in car noise environments. The first consists of recordings of 900 native Korean speakers, distributed by gender, age, and region, who uttered application words in car environments. The second is a collection of mixed noise recorded in three car types by model, under various noise patterns obtained with the engine on or off, at different driving speeds, and in different road conditions with windows open or closed. The third consists of recordings of simulated speech produced by a HATS (Head and Torso Simulator) in car environments with internal and external noise factors added. All three types of recordings were made through synchronized 8-channel microphones fixed in the car. The creation and applications of these corpora are reported in detail.


OBSERVATIONS ON FERTILITY PARAMETERS FOLLOWING SUPEROVULATION IN JERSEY CATTLE

  • Ullah, N.;Javed, M.H.;Akhtar, S.
    • Asian-Australasian Journal of Animal Sciences / Vol. 8, No. 4 / pp.321-323 / 1995
  • Various fertility parameters were recorded for 26 Jersey donor cows following superovulation under tropical conditions. The cows, in their mid-luteal phase, were treated with 2,500-3,000 i.u. PMSG or 28-40 mg FSH, followed by a $500\,\mu\mathrm{g}$ $\mathrm{PGF}_{2\alpha}$ injection 48-60 hours later to induce oestrus. The cows were bred artificially twelve hours after standing oestrus. Embryo collection was carried out 7 days after oestrus. $\mathrm{PGF}_{2\alpha}$ was injected into each donor cow after embryo recovery to regress the corpora lutea. Fertility data ($\mathrm{PGF}_{2\alpha}$-oestrus interval, services per conception, days between embryo collection and successful service, and any pathological condition) were recorded. The $\mathrm{PGF}_{2\alpha}$-oestrus interval and the correlation (r) between the number of corpora lutea and the $\mathrm{PGF}_{2\alpha}$-oestrus interval were $30.9 \pm 6.3$ and 0.17, respectively. Of the 26 treated donors, 19 conceived within $91.7 \pm 18.8$ days after embryo recovery. The average number of services per conception was $2.3 \pm 0.3$. Only two cows developed metritis, and they conceived after treatment with antibiotics. These observations indicate no profound adverse effect of superovulation on the subsequent reproduction of donor cows under tropical conditions, except some effect on services per conception.

Genetical and Physiological Mechanisms of Adult Diapause in Insects

  • Kim, Yong-Gyun
    • Korean Journal of Applied Entomology / Vol. 34, No. 1 / pp.20-32 / 1995
  • Adult diapause in insects is characterized by suppression of reproductive development. It is induced by environmental cues such as photoperiod, temperature, food availability, and other conditions. The diapause-inducing environment is recognized and analyzed by the insect brain. The interpreted information is conveyed via the endocrine system to target tissues such as the ovaries, fat body, and other tissues. Along this brain-endocrine-target tissue signal hierarchy, several factors are involved in expressing the diapause trait in a quantitative mode, even though the insects show a binomial phenotype, being either in diapause or not. Recent work, based on a series of crossing trials between high- and low-diapause lines, estimated that the number of these factors is relatively small. Heritability of diapause is quite high (ca. 70%) in some species. Epistasis, sex linkage, pleiotropism, and nongenetic components also affect diapause inheritance. Most physiological studies have focused on the control mechanisms of juvenile hormone (JH) synthesis in the corpora allata (CA), because the JH level in the hemolymph of teneral adults is critical in deciding the later developmental mode. Allatostatin, an antagonist of JH synthesis, is believed to be a potent brain message to the CA for adult diapause induction.


Understanding recurrent neural network for texts using English-Korean corpora

  • Lee, Hagyeong;Song, Jongwoo
    • Communications for Statistical Applications and Methods / Vol. 27, No. 3 / pp.313-326 / 2020
  • Deep Learning is the most important key to the development of Artificial Intelligence (AI). There are several distinguishable architectures of neural networks, such as MLP, CNN, and RNN. Among them, we try to understand one of the main architectures, the Recurrent Neural Network (RNN), which differs from other networks in handling sequential data, including time series and texts. As one of the main recent tasks in Natural Language Processing (NLP), we consider Neural Machine Translation (NMT) using RNNs. We also summarize the fundamental structures of recurrent networks and some topics in representing natural-language words as numeric vectors. We organize the topics so as to explain the estimation procedure, from representing input source sequences to predicting translated target sequences. In addition, we apply multiple translation models with Gated Recurrent Units (GRUs) in Keras to English-Korean sentences comprising about 26,000 sequence pairs in total from two different corpora, colloquial text and news. We verified some crucial factors that influence the quality of training. We found that, for short sequences, the loss decreases with more recurrent dimensions and with a bidirectional RNN in the encoder. We also computed BLEU scores, the main measure of translation performance, and compared them with the scores of Google Translate on the same test sentences. We summarize some difficulties in training a proper translation model and in dealing with the Korean language. Using Keras in Python for all tasks, from processing raw texts to evaluating the translation model, also allows us to include some useful functions and vocabulary libraries.
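
The BLEU score mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: it assumes a single reference translation, whitespace-tokenized input, no smoothing, and the function names (`ngrams`, `modified_precision`, `bleu`) are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision: each candidate n-gram is credited at most
    as many times as it occurs in the reference."""
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU with one reference and no smoothing (assumption)."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU = 0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty: penalize candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_mean)


reference = "the cat is on the mat".split()
print(modified_precision(["the"] * 7, reference, 1))  # clipped unigram precision
print(bleu(reference, reference))
```

In practice, corpus-level BLEU with smoothing and a standard tokenizer is usually preferred; this sketch only shows the clipping and brevity-penalty ideas.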

Ultrasonographic Diagnosis for the Treatment of Genital Disease and Early Pregnancy Diagnosis in Korean Native Cattle

  • 황광남;김명철;변홍섭;박명호;이경광;한용만;신상태
    • Korean Journal of Animal Reproduction / Vol. 21, No. 1 / pp.31-37 / 1997
  • Ultrasonography was used for the diagnosis of genital disease and for early pregnancy diagnosis in Korean native cattle. The size of the ovarian follicle at the preovulatory stage, at the luteal stage, and in follicular cysts was 18.9, 9.2 and 27.6 mm, respectively, and the thickness of the follicular wall was 2.3, 1.8 and 2.8 mm, respectively. The size of the corpus luteum at the formation, active, and regression stages, and of cystic corpora lutea and luteal cysts, was 6.2, 11.3, 8.6, 26.7 and 25.9 mm, respectively. The thickness of the luteal wall in cystic corpora lutea and luteal cysts was 8.4 and 4.9 mm, respectively. The size of the embryo or fetus on days 25, 27, 30, 35, 40, 45 and 50 was 0.8, 0.9, 1.3, 1.5, 2.2, 2.8 and 3.8 cm, respectively. The size of the amniotic vesicle on days 25, 27 and 30 was 1.2, 2.1 and 3.0 cm, respectively. The diameter of the pregnant uterus on days 25 and 27 was 7.0 and 7.8 cm, respectively. It was concluded that the ultrasonographic values determined in this study can be used as references for the treatment of genital disease and for early pregnancy diagnosis in Korean native cattle.


Detecting and correcting errors in Korean POS-tagged corpora

  • 최명길;서형원;권홍석;김재훈
    • Journal of Advanced Marine Engineering and Technology / Vol. 37, No. 2 / pp.227-235 / 2013
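  • The quality of a POS-tagged corpus plays a very important role in developing a POS tagger. However, many POS-tagged corpora built in Korea, including the Sejong corpus, still contain various types of errors. These errors are highly diverse, including not only POS tagging errors but also spelling errors and character insertions and deletions. In this paper, we develop a tool that detects POS tagging errors using error patterns and corrects them effectively. With the proposed method and tool, errors can be corrected more than nine times faster on average, which confirms that the method is highly effective.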

KKMA: A Tool for Utilizing Sejong Corpus based on Relational Database

  • 이동주;연종홈;황인범;이상구
    • Journal of KIISE: Computing Practices and Letters / Vol. 16, No. 11 / pp.1046-1050 / 2010
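  • Corpora are used as basic material for a wide range of research in linguistics. In Korea, several large-scale corpora have been built through projects such as the 21st Century Sejong Project, but research on tools that many users can easily use to exploit them is still lacking. In this paper, we present the design and implementation of a corpus utilization tool that stores the Sejong Modern Korean Corpus, one of the large-scale Korean corpora, in a relational database and supports its use in various ways. We built a web-based corpus utilization system, and it is in actual use by linguistics researchers.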
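
As a rough illustration of storing a tagged corpus in a relational database, the following sketch uses Python's built-in `sqlite3` module. The schema, table names, helper functions, and the two sample sentences are all hypothetical and made up for illustration; they are not taken from the KKMA system.

```python
import sqlite3

# In-memory database; the schema below is an assumption, not KKMA's design.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sentence (
    id   INTEGER PRIMARY KEY,
    text TEXT NOT NULL
);
CREATE TABLE morpheme (
    sentence_id INTEGER NOT NULL REFERENCES sentence(id),
    position    INTEGER NOT NULL,
    form        TEXT NOT NULL,
    tag         TEXT NOT NULL
);
CREATE INDEX idx_morpheme_form ON morpheme(form);
""")


def add_sentence(conn, text, morphemes):
    """Insert one sentence and its (form, tag) morpheme analysis."""
    cur = conn.execute("INSERT INTO sentence (text) VALUES (?)", (text,))
    sid = cur.lastrowid
    conn.executemany(
        "INSERT INTO morpheme VALUES (?, ?, ?, ?)",
        [(sid, pos, form, tag) for pos, (form, tag) in enumerate(morphemes)],
    )
    return sid


def tag_counts(conn, form):
    """Return how often a surface form occurs with each POS tag."""
    cur = conn.execute(
        "SELECT tag, COUNT(*) FROM morpheme WHERE form = ? "
        "GROUP BY tag ORDER BY tag",
        (form,),
    )
    return dict(cur.fetchall())


# Illustrative sample data with Sejong-style tags (invented for this sketch).
add_sentence(conn, "나는 학교에 간다",
             [("나", "NP"), ("는", "JX"), ("학교", "NNG"),
              ("에", "JKB"), ("가", "VV"), ("ㄴ다", "EF")])
add_sentence(conn, "학교가 크다",
             [("학교", "NNG"), ("가", "JKS"), ("크", "VA"), ("다", "EF")])

print(tag_counts(conn, "가"))
```

Keeping one row per morpheme makes frequency and ambiguity queries (such as a form that appears with several tags) simple SQL aggregations, which is one plausible reason to choose a relational layout.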

COVID-19 recommender system based on an annotated multilingual corpus

  • Barros, Marcia;Ruas, Pedro;Sousa, Diana;Bangash, Ali Haider;Couto, Francisco M.
    • Genomics & Informatics / Vol. 19, No. 3 / pp.24.1-24.7 / 2021
  • Tracking the most recent advances in Coronavirus disease 2019 (COVID-19)-related research is essential, given the disease's novelty and its impact on society. However, with the pace of publication speeding up, researchers and clinicians require automatic approaches to keep up with the incoming information about this disease. A solution to this problem requires the development of text mining pipelines, the efficiency of which strongly depends on the availability of curated corpora. However, there is a lack of COVID-19-related corpora, and even more so for languages other than English. This project's main contribution was the annotation of a multilingual parallel corpus and the generation of a recommendation dataset (EN-PT and EN-ES) covering relevant entities, their relations, and recommendations, providing this resource to the community to improve text mining research on COVID-19-related literature. This work was developed during the 7th Biomedical Linked Annotation Hackathon (BLAH7).

Formulaic Language Development in Asian Learners of English: A Comparative Study of Phrase-frames in Written and Oral Production

  • Yoon Namkung;Ute Romer
    • Asia Pacific Journal of Corpus Research / Vol. 4, No. 2 / pp.1-39 / 2023
  • Recent research in usage-based Second Language Acquisition has provided new insights into second language (L2) learners' development of formulaic language (Wulff, 2019). The current study examines the use of phrase-frames, which are recurring sequences of words including one or more variable slots (e.g., it is * that), in written and oral production data from Asian learners of English across four proficiency levels (beginner, low-intermediate, high-intermediate, advanced) and native English speakers. The variability, predictability, and discourse functions of the most frequent 4-word phrase-frames from the written essay and spoken dialogue sub-corpora of the International Corpus Network of Asian Learners of English (ICNALE) were analyzed and then compared across groups and modes. The results revealed that while learners' phrase-frames in writing became more variable and unpredictable as proficiency increased, no clear developmental patterns were found in speaking, although all groups used more fixed and predictable phrase-frames than the reference group. Further, no developmental trajectories in the functions of the most frequent phrase-frames were found in either mode. Additionally, lower-level learners and the reference group used more variable phrase-frames in speaking, whereas advanced-level learners showed more variability in writing. This study contributes to a better understanding of the development of L2 phraseological competence.
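
The notion of a phrase-frame described in this abstract can be sketched in a few lines of Python. The sketch assumes exactly one variable slot per 4-word frame, restricted to the two internal positions, and a toy token list; the function name `phrase_frames` is hypothetical, and this is not the procedure or tooling used in the study.

```python
from collections import Counter, defaultdict


def phrase_frames(tokens, n=4):
    """Count n-word phrase-frames with one internal variable slot (*).

    Returns (frame counts, slot fillers per frame). Restricting the slot to
    internal positions is an assumption made for this sketch.
    """
    frames = Counter()
    fillers = defaultdict(set)
    for i in range(len(tokens) - n + 1):
        gram = tokens[i:i + n]
        # Blank out each internal position in turn; first and last words stay fixed.
        for slot in range(1, n - 1):
            frame = tuple(gram[:slot] + ["*"] + gram[slot + 1:])
            frames[frame] += 1
            fillers[frame].add(gram[slot])
    return frames, fillers


tokens = "it is clear that it is true that it is likely that".split()
frames, fillers = phrase_frames(tokens)

frame = ("it", "is", "*", "that")
print(frames[frame], sorted(fillers[frame]))
# One simple variability indicator: distinct fillers divided by frame frequency.
print(len(fillers[frame]) / frames[frame])
```

A ratio near 1 means almost every occurrence fills the slot with a different word (a highly variable frame), while a ratio near 0 indicates a nearly fixed expression.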