• Title/Summary/Keyword: language data


$O^{2}LDM$: A Language for Object-Oriented Logic Data Modeling (Korean title: $O^{2}LDM$: A Language for an Object-Oriented Logic Data Model)

  • Jeong, Cheol-Yong
    • Asia pacific journal of information systems / v.4 no.2 / pp.3-34 / 1994
  • In this paper we describe a new data modeling language we call $O^{2}LDM$. $O^{2}LDM$ incorporates features from object-oriented and logic approaches. In $O^{2}LDM$ there is a rich collection of objects organized in a type hierarchy. It is possible to compose queries that involve field selection, function application and other constructs which transcend the usual, strictly syntactic, matching of PROLOG. We give the features of $O^{2}LDM$ and motivate its utility for conceptual modeling. We have a prototype implementation for the language, which we have written in ML. In this paper we describe an executable semantics of the deductive process used in the language. We work some examples to illustrate the expressive power of the language, and compare $O^{2}LDM$ to PROLOG.


A Role of English Children's Stories in Primary School English Learners' Language Development

  • Kim, Ji-Sun
    • English Language & Literature Teaching / v.15 no.3 / pp.129-150 / 2009
  • This paper examines the effect of English children's stories on the development of Korean EFL primary school learners' listening and speaking competence and their motivation to learn English. It also discusses the features of English children's stories that make EFL learners' language learning efficient. Participants were 120 students at an elementary school in Chungnam Province, who were randomly chosen and divided into an experimental group and a control group. To collect data, pre- and post-tests of listening and speaking proficiency and pre- and post-questionnaires on the participants' motivation to learn English were administered. The data were analyzed by ANOVA. The results indicate that using English children's stories in EFL learning settings can be an efficient way to improve EFL learners' listening and speaking competence and their motivation to learn the target language. The findings suggest that English children's stories provide language learners with interest, enjoyment, and meaningful, authentic contexts. Pedagogical suggestions and implications are provided for EFL educators and teachers.


Using Corpora for Studying English Grammar

  • Kwon, Heok-Seung
    • Korean Journal of English Language and Linguistics / v.4 no.1 / pp.61-81 / 2004
  • This paper will look at some grammatical phenomena which will illustrate some of the questions that can be addressed with a corpus-based approach. We will use this approach to investigate the following subjects in English grammar: number ambiguity, subject-verb concord, concord with measure expressions, and (reflexive) pronoun choice in coordinated noun phrases. We will emphasize the distinctive features of the corpus-based approach, particularly its strengths in investigating language use, as opposed to traditional descriptions or prescriptions of structure in English grammar. This paper will show that a corpus-based approach has made it possible to conduct new kinds of investigations into grammar in use and to expand the scope of earlier investigations. Native speakers rarely have accurate information about frequency of use. A large representative corpus (i.e., The British National Corpus) is one of the most reliable sources of frequency information. It is important to base an analysis of language on real data rather than intuition. Any description of grammar is more complete and accurate if it is based on a body of real data.

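To make the frequency argument concrete, here is a minimal Python sketch of how concordance lines might be tallied for one concord question: verb agreement after "a number of" versus "the number of". The sentences are invented toy data, not BNC text, and `SENTENCES`, `PATTERN`, and `concord_counts` are hypothetical illustrations rather than the author's procedure; a real study would query a tagged corpus instead of matching raw strings.

```python
import re
from collections import Counter

# Invented toy sentences standing in for corpus concordance lines
# (hypothetical data, not drawn from the British National Corpus).
SENTENCES = [
    "A number of students are absent today.",
    "A number of problems were reported.",
    "The number of students is increasing.",
    "A number of issues is still open.",
    "The number of errors was small.",
]

# Matches "a/the number of <noun> <be-verb>", capturing determiner and verb.
PATTERN = re.compile(r"\b(a|the) number of \w+ (is|are|was|were)\b", re.IGNORECASE)

def concord_counts(sentences):
    """Count singular vs plural verb agreement after 'a/the number of'."""
    counts = Counter()
    for sentence in sentences:
        for det, verb in PATTERN.findall(sentence):
            number = "singular" if verb.lower() in ("is", "was") else "plural"
            counts[(det.lower(), number)] += 1
    return counts

print(concord_counts(SENTENCES))
```

On this toy input the tally is two plural verbs and one singular verb after "a number of", and two singular verbs after "the number of"; only counts over a large, representative corpus would say anything about actual usage.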

Style-Specific Language Model Adaptation using TF*IDF Similarity for Korean Conversational Speech Recognition

  • Park, Young-Hee;Chung, Min-Hwa
    • The Journal of the Acoustical Society of Korea / v.23 no.2E / pp.51-55 / 2004
  • In this paper, we propose a style-specific language model adaptation scheme using n-gram-based tf*idf similarity for Korean spontaneous speech recognition. Korean spontaneous speech shows distinctive style-specific characteristics such as filled pauses, word omission, and contraction, which are related to function words and depend on the preceding or following words. To reflect these style-specific characteristics and to overcome the shortage of language model training data, we estimate an in-domain n-gram model by relevance-weighting out-of-domain text data according to their n-gram-based tf*idf similarity, where the in-domain language model includes a disfluency model. Recognition results show that n-gram-based tf*idf similarity weighting effectively reflects style differences.
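The relevance-weighting idea can be sketched as follows, under stated assumptions: documents are pre-tokenized word lists, tf*idf is computed over word bigrams with idf = log(N/df), similarity is cosine similarity to the in-domain text, and each out-of-domain document's n-gram counts are added with its similarity as the weight. The abstract does not give the exact weighting formula, so `tfidf_vectors`, `weighted_counts`, and this particular scheme are illustrative, not the authors' implementation.

```python
import math
from collections import Counter

def ngrams(tokens, n=2):
    """Return the word n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tfidf_vectors(docs, n=2):
    """Sparse tf*idf vectors over n-grams, with idf = log(N / df)."""
    grams = [Counter(ngrams(d, n)) for d in docs]
    df = Counter(g for c in grams for g in c)  # document frequency per n-gram
    N = len(docs)
    return [{g: tf * math.log(N / df[g]) for g, tf in c.items()} for c in grams]

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def weighted_counts(in_domain, out_domain, n=2):
    """In-domain n-gram counts plus out-of-domain counts weighted by similarity."""
    vecs = tfidf_vectors([in_domain] + out_domain, n)
    target = vecs[0]
    counts = Counter(ngrams(in_domain, n))
    for doc, vec in zip(out_domain, vecs[1:]):
        weight = cosine(target, vec)  # relevance weight for this document
        for g, c in Counter(ngrams(doc, n)).items():
            counts[g] += weight * c
    return counts

# Toy example: the conversational document shares bigrams with the in-domain
# text and gets a nonzero weight; the formal one shares none and gets zero.
in_domain = "uh i mean you know".split()
out_domain = ["uh i mean it is fine".split(),
              "the committee approved the budget".split()]
counts = weighted_counts(in_domain, out_domain)
print(counts[("uh", "i")], counts[("the", "committee")])
```

Using similarity as a soft weight, rather than a hard in/out filter, lets loosely related text still contribute a small amount of evidence.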

A Study of Methodology for Automatic Construction of OWL Ontologies from Sejong Electronic Dictionary (Korean title: A Study on a Methodology for Using the Sejong Electronic Dictionary to Automatically Construct Large-Scale OWL Ontologies)

  • Song Do Gyu
    • Language and Information / v.9 no.1 / pp.19-34 / 2005
  • Ontology is an indispensable component in the intelligent and semantic processing of knowledge and information, as in the Semantic Web. However, ontology construction requires a vast amount of data collection and arduous effort in processing unstructured data. This study proposes a methodology to automatically construct and generate ontologies from the Sejong Electronic Dictionary. Because the Sejong Electronic Dictionary is structured in XML format, it can be processed automatically by software tools into OWL (Web Ontology Language)-based ontologies as specified by the W3C. This paper presents the process and a concrete application of this methodology.

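As a rough illustration of the XML-to-OWL step, the Python sketch below walks an XML fragment with the standard `xml.etree.ElementTree` module and emits OWL class declarations and `rdfs:subClassOf` axioms in Turtle syntax. The `<entry>` element, its `id` and `broader` attributes, and the `http://example.org/onto#` namespace are hypothetical placeholders; the actual Sejong Electronic Dictionary schema and the paper's mapping rules are not described in the abstract.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML fragment; element and attribute names are assumptions,
# not the actual Sejong Electronic Dictionary schema.
SAMPLE = """
<dictionary>
  <entry id="dog" broader="animal"/>
  <entry id="cat" broader="animal"/>
  <entry id="animal"/>
</dictionary>
"""

def xml_to_turtle(xml_text, base="http://example.org/onto#"):
    """Emit OWL classes and rdfs:subClassOf axioms as Turtle text."""
    root = ET.fromstring(xml_text.strip())
    lines = [
        f"@prefix : <{base}> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "",
    ]
    for entry in root.iter("entry"):
        cls = entry.get("id")
        lines.append(f":{cls} a owl:Class .")  # every entry becomes a class
        parent = entry.get("broader")
        if parent:  # a broader term becomes the superclass
            lines.append(f":{cls} rdfs:subClassOf :{parent} .")
    return "\n".join(lines)

print(xml_to_turtle(SAMPLE))
```

Because the source is already structured XML, the mapping reduces to a deterministic walk over elements, which is what makes fully automatic construction plausible.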

Building a Korean conversational speech database in the emergency medical domain (Korean title: Construction of a Korean Spoken Dialogue Database in the Emergency Medical Domain)

  • Kim, Sunhee;Lee, Jooyoung;Choi, Seo Gyeong;Ji, Seunghun;Kang, Jeemin;Kim, Jongin;Kim, Dohee;Kim, Boryong;Cho, Eungi;Kim, Hojeong;Jang, Jeongmin;Kim, Jun Hyung;Ku, Bon Hyeok;Park, Hyung-Min;Chung, Minhwa
    • Phonetics and Speech Sciences / v.12 no.4 / pp.81-90 / 2020
  • This paper describes a method of building Korean conversational speech data in the emergency medical domain and proposes an annotation method for the collected data to improve speech recognition performance. To suggest future research directions, baseline speech recognition experiments were conducted using part of the collected and annotated data. All speech was recorded at 16-bit resolution and a 16 kHz sampling rate. A total of 166 conversations were collected, amounting to 8 hours and 35 minutes. Orthography, pronunciation, dialect, noise, and medical information were manually transcribed using Praat. The baseline speech recognition experiments were used to identify problems specific to speech recognition in the emergency medical domain. The Korean conversational speech data presented in this paper are the first-stage data in the emergency medical domain and are expected to serve as training data for developing conversational systems for emergency medical applications.
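The stated recording format implies a data volume that is easy to check. Assuming uncompressed mono PCM with no file headers (the abstract does not say how many channels were recorded), 16 kHz at 16 bits is 32,000 bytes per second, and 8 h 35 min is 30,900 s, giving roughly 0.99 GB of raw audio. A small helper, `pcm_bytes` (a hypothetical name), reproduces the arithmetic:

```python
def pcm_bytes(hours, minutes, sample_rate_hz=16_000, bits_per_sample=16, channels=1):
    """Size in bytes of uncompressed PCM audio (mono by default; headers ignored)."""
    seconds = hours * 3600 + minutes * 60
    return seconds * sample_rate_hz * (bits_per_sample // 8) * channels

total = pcm_bytes(8, 35)
print(total, f"{total / 1e9:.2f} GB")
```

Passing `channels=2` would double the figure if the conversations were in fact recorded as stereo.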

A Formalism of Iverson Language as a Reference Language for the Organization of Homogeneous Field using many External Media (Korean title: Composing the IVERSON Language as a Reference Language, for the Homogenization of External Media)

  • Yung Taek Kim
    • 전기의세계 (The World of Electricity) / v.24 no.4 / pp.71-72 / 1975
  • A formalism of a reference language and homogeneous field is constructed for the Iverson language to organize a virtual file across external media. To perform data manipulations across these external media, the files must be organized homogeneously, forming a multi-dimensional array as if they were a single medium. This paper shows how the reference language is organized to build the virtual file from many external media, and presents examples of programs and hardware organization to justify the proposals.


Integration of WFST Language Model in Pre-trained Korean E2E ASR Model

  • Junseok Oh;Eunsoo Cho;Ji-Hwan Kim
    • KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.6 / pp.1692-1705 / 2024
  • In this paper, we present a method that integrates a Grammar Transducer as an external language model to enhance the accuracy of the pre-trained Korean End-to-end (E2E) Automatic Speech Recognition (ASR) model. The E2E ASR model utilizes the Connectionist Temporal Classification (CTC) loss function to derive hypothesis sentences from input audio. However, this method reveals a limitation inherent in the CTC approach, as it fails to capture language information from transcript data directly. To overcome this limitation, we propose a fusion approach that combines a clause-level n-gram language model, transformed into a Weighted Finite-State Transducer (WFST), with the E2E ASR model. This approach enhances the model's accuracy and allows for domain adaptation using just additional text data, avoiding the need for further intensive training of the extensive pre-trained ASR model. This is particularly advantageous for Korean, characterized as a low-resource language, which confronts a significant challenge due to limited resources of speech data and available ASR models. Initially, we validate the efficacy of training the n-gram model at the clause-level by contrasting its inference accuracy with that of the E2E ASR model when merged with language models trained on smaller lexical units. We then demonstrate that our approach achieves enhanced domain adaptation accuracy compared to Shallow Fusion, a previously devised method for merging an external language model with an E2E ASR model without necessitating additional training.
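For reference, Shallow Fusion, the baseline named above, combines the E2E model's score with a weighted external LM score. The sketch below applies that combination as n-best rescoring, score = log P_AM + λ · log P_LM + β · length, which is a simplification: shallow fusion is usually applied token by token inside beam search, and the paper's own method composes a clause-level n-gram WFST rather than doing this. The toy unigram LM, its probabilities, and the weights `lam` and `beta` are all hypothetical.

```python
import math

def shallow_fusion_rescore(hypotheses, lm_logprob, lam=0.3, beta=0.0):
    """Rank (text, am_logprob) pairs by am + lam * lm + beta * word count."""
    scored = []
    for text, am_logprob in hypotheses:
        score = am_logprob + lam * lm_logprob(text) + beta * len(text.split())
        scored.append((score, text))
    scored.sort(reverse=True)  # highest combined score first
    return [text for _, text in scored]

# Hypothetical unigram probabilities standing in for an n-gram LM.
UNIGRAM = {"call": 0.2, "an": 0.2, "ambulance": 0.1, "amber": 0.01, "lance": 0.01}

def toy_lm(text, floor=1e-4):
    """Log-probability under the toy unigram LM (unknown words get `floor`)."""
    return sum(math.log(UNIGRAM.get(w, floor)) for w in text.split())

nbest = [("call an amber lance", -4.0), ("call an ambulance", -4.5)]
print(shallow_fusion_rescore(nbest, toy_lm, lam=0.0))  # acoustic score only
print(shallow_fusion_rescore(nbest, toy_lm, lam=0.5))  # with LM fusion
```

With `lam=0.0` the acoustically preferred "call an amber lance" ranks first; with `lam=0.5` the LM evidence moves "call an ambulance" to the top, which is the effect an external LM is meant to have without retraining the acoustic model.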

A Simple Syntax for Complex Semantics

  • Lee, Kiyong
    • Proceedings of the Korean Society for Language and Information Conference / 2002.02a / pp.2-27 / 2002
  • As part of a long-range project that aims at establishing database-theoretic semantics as a model of computational semantics, this presentation focuses on the development of a syntactic component for processing strings of words or sentences to construct semantic data structures. For design and modeling purposes, the present treatment is restricted to the analysis of some problematic constructions of Korean involving semi-free word order, conjunction and temporal anchoring, and adnominal modification and antecedent binding. The present work relies heavily on Hausser's (1999, 2000) SLIM theory of language, which is based on surface compositionality, time-linearity, and two other conditions on natural language processing. Time-linear syntax for natural language has been shown to be conceptually simple and computationally efficient. The associated semantics is complex, however, because it must deal with situated language involving interactive multi-agents. Nevertheless, by processing input word strings in a time-linear mode, the syntax can incrementally construct the necessary semantic structures for relevant queries and valid inferences. The fragment of Korean syntax will be implemented in Malaga, a C-type implementation language that was enriched for both programming and debugging purposes and made particularly suitable for implementing Left-Associative Grammar. This presentation will show how the system of syntactic rules with constraining subrules processes Korean sentences in a step-by-step, time-linear manner to incrementally construct semantic data structures that mainly specify relations with their argument, temporal, and binding structures.

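Time-linear, left-associative processing means that each rule combines the sentence start built so far with exactly one next word. The Python sketch below shows that control structure on the standard textbook language a^n b^n (n ≥ 1); it is not the paper's Korean grammar and not Malaga code, and the rule names `r1` and `r2`, the rule packages, and the list-valued category are illustrative assumptions.

```python
def lag_accepts(words):
    """Toy left-associative (time-linear) recognizer for a^n b^n, n >= 1.

    The sentence-start category is a list of pending 'b' expectations;
    each rule combines the sentence start with exactly one next word.
    """
    if not words or words[0] != "a":
        return False
    category = ["b"]              # category of the sentence start "a"
    rule_package = {"r1", "r2"}   # rules allowed to apply at the next step
    for word in words[1:]:
        if word == "a" and "r1" in rule_package:
            category = ["b"] + category    # r1: add one pending 'b'
            rule_package = {"r1", "r2"}
        elif word == "b" and "r2" in rule_package and category:
            category = category[1:]        # r2: cancel one pending 'b'
            rule_package = {"r2"}          # after a 'b', only more 'b's may follow
        else:
            return False                   # no rule applies: reject
    return not category                    # accept iff nothing is pending

print(lag_accepts("a a b b".split()), lag_accepts("a b a b".split()))
```

The rule package returned by each step is what constrains which rule may fire next, loosely analogous to the "constraining subrules" the abstract mentions; there is no backtracking over earlier words.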

A Study on Finger Language Translation System using Machine Learning and Leap Motion (Korean title: A Study on Implementing a Finger Language Translation System Using Machine Learning and Leap Motion)

  • Son, Da Eun;Go, Hyeong Min;Shin, Haeng yong
    • Proceedings of the Korea Information Processing Society Conference / 2019.10a / pp.552-554 / 2019
  • Deaf-mute people (people with hearing and speech impairments) communicate using sign language, since communicating by voice is difficult for them. However, sign language works only with people who know it, and most people do not use sign language. In this paper, a finger language translation system is proposed and implemented as a means for disabled and non-disabled people to communicate without difficulty. The proposed algorithm recognizes finger language data captured by Leap Motion and learns from the data using machine learning to increase the recognition rate. Simulation results show improved performance.
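The abstract does not name the learning algorithm or the features, so the following is only a sketch under assumptions: each Leap Motion frame is reduced to a fixed-length numeric feature vector (here, five invented finger-extension values), and a k-nearest-neighbor vote assigns a finger-language label. The training data, the labels "A" and "B", and the function `knn_predict` are hypothetical, and no Leap Motion SDK calls are used.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training vectors."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training data: five per-finger extension values (0 = bent,
# 1 = extended) per sample, with invented labels "A" and "B".
TRAIN = [
    ((1.0, 1.0, 0.0, 0.0, 0.0), "A"),
    ((1.0, 1.0, 0.0, 0.0, 0.1), "A"),
    ((0.9, 1.0, 0.0, 0.0, 0.0), "A"),
    ((0.0, 0.0, 1.0, 1.0, 1.0), "B"),
    ((0.0, 0.1, 1.0, 1.0, 1.0), "B"),
    ((0.0, 0.0, 0.9, 1.0, 1.0), "B"),
]

print(knn_predict(TRAIN, (1.0, 0.9, 0.1, 0.0, 0.0)))
```

k-NN is chosen here only because it needs no training step and makes the idea of "more labeled data raises recognition rate" easy to see; the paper may well use a different model.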