• Title/Summary/Keyword: corpora

An Example-Based English Learning Environment for Writing

  • Miyoshi, Yasuo;Ochi, Youji;Okamoto, Ryo;Yano, Yoneo
    • Proceedings of the Korea Intelligent Information System Society Conference
    • /
    • 2001.01a
    • /
    • pp.292-297
    • /
    • 2001
  • When learning to write in a second or foreign language, a learner has to acquire not only lexical and syntactic knowledge but also the skill of choosing suitable words for the content s/he is interested in. To support this, a learning system should infer the learner's intention and give example phrases related to that content. However, a learner cannot always express the content of the desired phrase as input to the system. Therefore, the system should be equipped with a function for diagnosing the learner's intention. In addition, the system should be equipped with an analysis function that scores the similarity between the learner's intention and the phrases stored in the system, at both the syntactic and idiomatic levels, in order to present appropriate example phrases to the learner. In this paper, we propose the architecture of an interactive support method for learning English writing, based on an analogical search technique that retrieves sample phrases from corpora. Our system can show candidate variations or next phrases to write, together with sentences from corpora that are analogous to what the learner wants to express.

Creation and Assessment of Korean Speech and Noise DB in Car Environments (자동차 환경에서의 노이즈 DB 및 한국어 음성 DB 구축)

  • Lee Kwang-Hyun;Kim Bong-Wan;Lee Yong-Ju
    • MALSORI
    • /
    • no.48
    • /
    • pp.141-153
    • /
    • 2003
  • Research into robust recognition in noisy environments, especially in car environments, is being actively carried out in the speech community. In this paper we report on three types of corpora that SiTEC (Speech Information TEchnology & industry promotion Center) has created for research into speech recognition in car noise environments. The first consists of recordings of 900 native Korean speakers, distributed according to gender, age, and region, who uttered application words in car environments. The second is a collection of mixed noise recorded in 3 car types by model, covering various noise patterns obtained with the car engine on or off, at different driving speeds, and in different road conditions with windows open or closed. The third consists of recordings of simulated speech produced by a HATS (Head and Torso Simulator) in car environments with internal and external noise factors added. All three types of recordings were made through synchronized 8-channel microphones fixed in a car. The creation and applications of these corpora are reported in detail.

OBSERVATIONS ON FERTILITY PARAMETERS FOLLOWING SUPEROVULATION IN JERSEY CATTLE

  • Ullah, N.;Javed, M.H.;Akhtar, S.
    • Asian-Australasian Journal of Animal Sciences
    • /
    • v.8 no.4
    • /
    • pp.321-323
    • /
    • 1995
  • Observations on various fertility parameters were recorded for 26 Jersey donor cows following superovulation under tropical conditions. The cows, in their mid-luteal phase, were treated with 2,500-3,000 i.u. PMSG or 28-40 mg FSH, followed by a $500\,\mu\mathrm{g}$ $\mathrm{PGF}_{2\alpha}$ injection 48-60 hours later to induce oestrus. The cows were bred artificially twelve hours after standing oestrus. Embryo collection was carried out 7 days after oestrus. $\mathrm{PGF}_{2\alpha}$ was injected into each donor cow after embryo recovery to regress the corpora lutea. Fertility data ($\mathrm{PGF}_{2\alpha}$-oestrus interval, services per conception, days between embryo collection and successful service, and any pathological condition) were recorded. The $\mathrm{PGF}_{2\alpha}$-oestrus interval and the correlation (r) between the number of corpora lutea and the $\mathrm{PGF}_{2\alpha}$-oestrus interval were $30.9 \pm 6.3$ and 0.17, respectively. Of the 26 treated donors, 19 conceived within $91.7 \pm 18.8$ days after embryo recovery. The average number of services per conception was $2.3 \pm 0.3$. Only two cows developed metritis, and they conceived after treatment with antibiotics. These observations indicated no profound adverse effect of superovulation on the subsequent reproduction of donor cows under tropical conditions, except some effect on services per conception.

Genetical and Physiological Mechanisms of Adult Diapause in Insects

  • Kim, Yong-Gyun
    • Korean journal of applied entomology
    • /
    • v.34 no.1
    • /
    • pp.20-32
    • /
    • 1995
  • Adult diapause in insects is characterized by suppression of reproductive development. It is induced by environmental cues such as photoperiod, temperature, food availability, and other conditions. The diapause-inducing environment is recognized and analyzed by the insect brain. The interpreted information is conveyed via the endocrine system to target tissues such as the ovaries, the fat body, and other tissues. Along this signal hierarchy of a brain-endocrine-target tissue axis, several factors are involved in expressing the diapause trait in a quantitative mode, even though the insects show a binomial phenotype of being in diapause or not. Recent work, based on a series of crossing trials between high and low diapause lines, estimated that the number of these factors is relatively small. Heritability of diapause is quite high (ca. 70%) in some species. Epistasis, sex-linkage, pleiotropism, and other nongenetic components also affect diapause inheritance. Most physiological studies have focused on the control mechanisms of juvenile hormone (JH) synthesis in the corpora allata (CA), because the JH level in the hemolymph of teneral adults is critical in deciding the later developmental mode. Allatostatin, an antagonizer of JH synthesis, has been believed to be a potent brain message to the CA for adult diapause induction.

Understanding recurrent neural network for texts using English-Korean corpora

  • Lee, Hagyeong;Song, Jongwoo
    • Communications for Statistical Applications and Methods
    • /
    • v.27 no.3
    • /
    • pp.313-326
    • /
    • 2020
  • Deep Learning is the most important key to the development of Artificial Intelligence (AI). There are several distinguishable architectures of neural networks such as MLP, CNN, and RNN. Among them, we try to understand one of the main architectures, the Recurrent Neural Network (RNN), which differs from other networks in handling sequential data, including time series and texts. As one of the main recent tasks in Natural Language Processing (NLP), we consider Neural Machine Translation (NMT) using RNNs. We also summarize the fundamental structures of recurrent networks and some topics in representing natural-language words as reasonable numeric vectors. We organize the topics so as to explain the estimation procedure from representing input source sequences to predicting target translated sequences. In addition, we apply multiple translation models with Gated Recurrent Units (GRUs) in Keras to English-Korean sentences comprising about 26,000 sentence pairs in total from two different corpora, colloquialism and news. We verified some crucial factors that influence the quality of training. We found that the loss decreases with more recurrent dimensions and with a bidirectional RNN in the encoder when dealing with short sequences. We also computed BLEU scores, which are the main measure of translation performance, and compared them with the score from Google Translate using the same test sentences. We sum up some difficulties in training a proper translation model as well as in dealing with the Korean language. The use of Keras in Python for all tasks, from processing raw texts to evaluating the translation model, also allows us to include some useful functions and vocabulary libraries.

Ultrasonographic Diagnosis for the Treatment of Genital Disease and Early Pregnancy Diagnosis in Korean Native Cattle (한우에서 생식기질환의 치료 및 조기임신진단을 위한 초음파영상진단)

  • 황광남;김명철;변홍섭;박명호;이경광;한용만;신상태
    • Korean Journal of Animal Reproduction
    • /
    • v.21 no.1
    • /
    • pp.31-37
    • /
    • 1997
  • Ultrasonographic diagnosis of genital disease and early pregnancy diagnosis were performed in Korean native cattle. The size of the ovarian follicle in the preovulatory stage, the luteal stage, and follicular cyst was 18.9, 9.2 and 27.6 mm, respectively, and the thickness of the follicular wall was 2.3, 1.8 and 2.8 mm, respectively. The size of the corpora lutea in the formation stage, activity stage, regression stage, cystic corpora lutea and luteal cyst was 6.2, 11.3, 8.6, 26.7 and 25.9 mm, respectively. The thickness of the luteal wall in cystic corpora lutea and luteal cyst was 8.4 and 4.9 mm, respectively. The size of the embryo or fetus on days 25, 27, 30, 35, 40, 45 and 50 was 0.8, 0.9, 1.3, 1.5, 2.2, 2.8 and 3.8 cm, respectively. The size of the amniotic vesicle on days 25, 27 and 30 was 1.2, 2.1 and 3.0 cm, respectively. The diameter of the pregnant uterus on days 25 and 27 was 7.0 and 7.8 cm, respectively. It was concluded that the ultrasonographic values determined in this study can be used as references for the treatment of genital disease and early pregnancy diagnosis in Korean native cattle.

Detecting and correcting errors in Korean POS-tagged corpora (한국어 품사 부착 말뭉치의 오류 검출 및 수정)

  • Choi, Myung-Gil;Seo, Hyung-Won;Kwon, Hong-Seok;Kim, Jae-Hoon
    • Journal of Advanced Marine Engineering and Technology
    • /
    • v.37 no.2
    • /
    • pp.227-235
    • /
    • 2013
  • The quality of the part-of-speech (POS) annotation in a corpus plays an important role in developing POS taggers. However, there are several kinds of errors in Korean POS-tagged corpora such as the Sejong Corpus. These errors take various forms, including annotation errors, spelling errors, and insertion and/or deletion of unexpected characters. In this paper, we propose a method for detecting annotation errors using error patterns, and we also develop a tool for correcting them effectively. Based on the proposed method, we hand-corrected annotation errors in the Sejong POS Tagged Corpus using the developed tool. As a result, correction was at least 9 times faster than without any tool. We therefore conclude that the proposed method is effective for correcting annotation errors in POS-tagged corpora.

KKMA : A Tool for Utilizing Sejong Corpus based on Relational Database (꼬꼬마 : 관계형 데이터베이스를 활용한 세종 말뭉치 활용 도구)

  • Lee, Dong-Joo;Yeon, Jong-Heum;Hwang, In-Beom;Lee, Sang-Goo
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.16 no.11
    • /
    • pp.1046-1050
    • /
    • 2010
  • Corpora are widely used as a fundamental resource for various purposes in linguistic studies. There are several large corpora in Korea, such as the Sejong corpus. However, it is hard to find a tool for utilizing such large corpora. In this paper, we propose a method of utilizing the Sejong corpus based on a relational database. We designed a relational database schema to store the corpus and implemented a Web-based application so that many researchers can easily access and utilize the Sejong corpus.
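
    The idea of storing a POS-tagged corpus in a relational database can be illustrated with a minimal sketch. This is not the KKMA schema: the table names (`sentence`, `token`), the columns, the helper functions, and the toy sentences are all hypothetical, chosen only to show how a tagged corpus might be queried with SQL.

    ```python
    import sqlite3


    def build_corpus_db(tagged_sentences):
        """Load (form, tag) sentences into an in-memory SQLite database.

        The two-table layout is an assumption made for illustration only.
        """
        conn = sqlite3.connect(":memory:")
        conn.executescript(
            """
            CREATE TABLE sentence (
                id   INTEGER PRIMARY KEY,
                text TEXT NOT NULL
            );
            CREATE TABLE token (
                sentence_id INTEGER NOT NULL REFERENCES sentence(id),
                position    INTEGER NOT NULL,
                form        TEXT NOT NULL,
                tag         TEXT NOT NULL,
                PRIMARY KEY (sentence_id, position)
            );
            CREATE INDEX token_form ON token(form);
            """
        )
        for sid, tokens in enumerate(tagged_sentences, start=1):
            text = " ".join(form for form, _ in tokens)
            conn.execute("INSERT INTO sentence VALUES (?, ?)", (sid, text))
            conn.executemany(
                "INSERT INTO token VALUES (?, ?, ?, ?)",
                [(sid, pos, form, tag) for pos, (form, tag) in enumerate(tokens)],
            )
        conn.commit()
        return conn


    def tag_counts(conn, form):
        """Return (tag, count) pairs for one surface form, sorted by tag."""
        cursor = conn.execute(
            "SELECT tag, COUNT(*) FROM token WHERE form = ? "
            "GROUP BY tag ORDER BY tag",
            (form,),
        )
        return cursor.fetchall()


    # Toy data, made up for this sketch; the tags follow the Sejong tag set style.
    sample = [
        [("학교", "NNG"), ("가", "JKS")],
        [("가", "VV"), ("다", "EF")],
    ]
    conn = build_corpus_db(sample)
    counts = tag_counts(conn, "가")
    ```

    With the corpus in this form, a question such as "which tags does a given morpheme receive, and how often" becomes a single indexed query rather than a scan over raw corpus files.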

COVID-19 recommender system based on an annotated multilingual corpus

  • Barros, Marcia;Ruas, Pedro;Sousa, Diana;Bangash, Ali Haider;Couto, Francisco M.
    • Genomics & Informatics
    • /
    • v.19 no.3
    • /
    • pp.24.1-24.7
    • /
    • 2021
  • Tracking the most recent advances in Coronavirus disease 2019 (COVID-19)-related research is essential, given the disease's novelty and its impact on society. However, with the publication pace speeding up, researchers and clinicians require automatic approaches to keep up with the incoming information regarding this disease. A solution to this problem requires the development of text mining pipelines, whose efficiency strongly depends on the availability of curated corpora. However, there is a lack of COVID-19-related corpora, even more so for languages other than English. This project's main contribution was the annotation of a multilingual parallel corpus and the generation of a recommendation dataset (EN-PT and EN-ES) regarding relevant entities, their relations, and recommendation, providing this resource to the community to improve text mining research on COVID-19-related literature. This work was developed during the 7th Biomedical Linked Annotation Hackathon (BLAH7).

Formulaic Language Development in Asian Learners of English: A Comparative Study of Phrase-frames in Written and Oral Production

  • Yoon Namkung;Ute Romer
    • Asia Pacific Journal of Corpus Research
    • /
    • v.4 no.2
    • /
    • pp.1-39
    • /
    • 2023
  • Recent research in usage-based Second Language Acquisition has provided new insights into second language (L2) learners' development of formulaic language (Wulff, 2019). The current study examines the use of phrase-frames, which are recurring sequences of words including one or more variable slots (e.g., it is * that), in written and oral production data from Asian learners of English across four proficiency levels (beginner, low-intermediate, high-intermediate, advanced) and native English speakers. The variability, predictability, and discourse functions of the most frequent 4-word phrase-frames from the written essay and spoken dialogue sub-corpora of the International Corpus Network of Asian Learners of English (ICNALE) were analyzed and then compared across groups and modes. The results revealed that while learners' phrase-frames in writing became more variable and unpredictable as proficiency increased, no clear developmental patterns were found in speaking, although all learner groups used more fixed and predictable phrase-frames than the reference group. Further, no developmental trajectories in the functions of the most frequent phrase-frames were found in either mode. Additionally, lower-level learners and the reference group used more variable phrase-frames in speaking, whereas advanced-level learners showed more variability in writing. This study contributes to a better understanding of the development of L2 phraseological competence.
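
    The notion of a phrase-frame (an n-word sequence with a variable slot, as in "it is * that") can be made concrete with a short sketch. This is not the study's procedure: it assumes a single slot restricted to internal positions, and the function names and the toy sentence are hypothetical. The number of distinct fillers is used here only as a rough proxy for variability.

    ```python
    from collections import defaultdict


    def phrase_frames(tokens, n=4):
        """Collect n-word frames with one internal '*' slot and their fillers.

        Returns a mapping: frame (tuple of words) -> {filler word: count}.
        Restricting the slot to internal positions is an assumption.
        """
        frames = defaultdict(lambda: defaultdict(int))
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            for slot in range(1, n - 1):
                frame = tuple(gram[:slot] + ["*"] + gram[slot + 1:])
                frames[frame][gram[slot]] += 1
        return frames


    def frame_stats(fillers):
        """Return (total occurrences, number of distinct fillers) for one frame."""
        return sum(fillers.values()), len(fillers)


    # Toy input, made up for this sketch.
    text = "it is clear that it is true that it is likely that"
    frames = phrase_frames(text.split())
    target = ("it", "is", "*", "that")
    total, distinct = frame_stats(frames[target])
    ```

    In this toy input the frame "it is * that" occurs three times with three different fillers, so it counts as highly variable; a frame whose slot is always filled by the same word would be fixed and predictable.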