• Title/Summary/Keyword: lexical approach

Integration of WFST Language Model in Pre-trained Korean E2E ASR Model

  • Junseok Oh;Eunsoo Cho;Ji-Hwan Kim
    • KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.6 / pp.1692-1705 / 2024
  • In this paper, we present a method that integrates a Grammar Transducer as an external language model to enhance the accuracy of the pre-trained Korean End-to-end (E2E) Automatic Speech Recognition (ASR) model. The E2E ASR model utilizes the Connectionist Temporal Classification (CTC) loss function to derive hypothesis sentences from input audio. However, this method reveals a limitation inherent in the CTC approach, as it fails to capture language information from transcript data directly. To overcome this limitation, we propose a fusion approach that combines a clause-level n-gram language model, transformed into a Weighted Finite-State Transducer (WFST), with the E2E ASR model. This approach enhances the model's accuracy and allows for domain adaptation using just additional text data, avoiding the need for further intensive training of the extensive pre-trained ASR model. This is particularly advantageous for Korean, characterized as a low-resource language, which confronts a significant challenge due to limited resources of speech data and available ASR models. Initially, we validate the efficacy of training the n-gram model at the clause-level by contrasting its inference accuracy with that of the E2E ASR model when merged with language models trained on smaller lexical units. We then demonstrate that our approach achieves enhanced domain adaptation accuracy compared to Shallow Fusion, a previously devised method for merging an external language model with an E2E ASR model without necessitating additional training.
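The Shallow Fusion baseline named in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation and it does not build a WFST: it only rescores a finished n-best list as ASR log-probability plus a weighted language-model log-probability, whereas shallow fusion is usually applied step by step during beam search. The bigram table, the acoustic scores, the floor value, and `lm_weight` are made-up illustrative assumptions.

```python
import math

# Hypothetical toy bigram probabilities standing in for an external n-gram LM.
BIGRAM_LOGP = {
    ("<s>", "recognize"): math.log(0.5),
    ("recognize", "speech"): math.log(0.6),
    ("<s>", "wreck"): math.log(0.1),
    ("wreck", "a"): math.log(0.3),
    ("a", "nice"): math.log(0.2),
    ("nice", "beach"): math.log(0.1),
}
UNSEEN_LOGP = math.log(1e-4)  # crude floor for unseen bigrams (assumption)


def lm_logprob(words):
    """Sum bigram log-probabilities over a sentence, starting from <s>."""
    total = 0.0
    prev = "<s>"
    for word in words:
        total += BIGRAM_LOGP.get((prev, word), UNSEEN_LOGP)
        prev = word
    return total


def shallow_fusion_rescore(hypotheses, lm_weight=0.5):
    """Return (score, text) pairs sorted by asr_logp + lm_weight * lm_logp, best first."""
    scored = [
        (asr_logp + lm_weight * lm_logprob(text.split()), text)
        for text, asr_logp in hypotheses
    ]
    return sorted(scored, reverse=True)


# Made-up ASR scores: the acoustic model slightly prefers the wrong sentence.
nbest = [("wreck a nice beach", -4.0), ("recognize speech", -4.2)]
best_score, best_text = shallow_fusion_rescore(nbest)[0]
```

With these toy numbers the language model overturns the ASR ranking and `best_text` becomes "recognize speech". Changing the text-only table changes the result without touching the ASR scores, which is the domain-adaptation property the abstract emphasizes.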

Graph-based ISA/instanceOf Relation Extraction from Category Structure (그래프 구조를 이용한 카테고리 구조로부터 상하위 관계 추출)

  • Choi, Dong-Hyun;Choi, Key-Sun
    • Journal of KIISE:Software and Applications / v.37 no.6 / pp.464-469 / 2010
  • In this paper, we propose a method to extract isa/instanceOf relations from a category structure. Existing studies use lexical patterns, e.g. head word matching, to determine whether a given category link is an isa/instanceOf relation. We propose a new approach that analyzes the other category links related to the given link to make this determination. The experimental results show that our algorithm covers many cases that the existing algorithms could not handle.
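The head word matching that this abstract cites as the existing lexical baseline can be sketched as follows. This is an illustration of the baseline only, not of the proposed graph-based method. The preposition list, the "last word before the first preposition" heuristic, and the example category names are assumptions made for this sketch.

```python
# Small, assumed list of prepositions that end the head noun phrase.
PREPOSITIONS = {"of", "in", "by", "from", "for", "at", "on"}


def head_word(category):
    """Return the last word before the first preposition, lowercased."""
    words = category.lower().split()
    if not words:
        return ""
    for i, word in enumerate(words):
        if word in PREPOSITIONS and i > 0:
            return words[i - 1]
    return words[-1]


def is_isa_link(child, parent):
    """Treat a category link as isa when both names share the same head word."""
    return head_word(child) == head_word(parent)
```

For example, `is_isa_link("Korean physicists", "Physicists")` holds, while a link from "Korean physicists" to "Physics" is rejected. Links whose names share no head word are the kind of case a purely lexical test cannot decide.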

A Natural Language Question Answering System-an Application for e-learning

  • Gupta, Akash;Rajaraman, Prof. V.
    • Proceedings of the Korea Intelligent Information System Society Conference / 2001.01a / pp.285-291 / 2001
  • This paper describes a natural language question answering system that students can use to get solutions to their queries. Unlike AI question answering systems that focus on the generation of new answers, the present system retrieves existing ones from question-answer files. Unlike information retrieval approaches that rely on a purely lexical metric of similarity between query and document, it uses a semantic knowledge base (WordNet) to improve its ability to match questions. The paper describes the design and the current implementation of the system as an intelligent tutoring system. The main drawback of existing tutoring systems is that the computer poses a question to the students and guides them in reaching the solution to the problem. In the present approach, a student asks any question related to the topic and gets a suitable reply. Based on the query, the student gets either a direct answer or a set of questions (to a maximum of 3 or 4) that bear the greatest resemblance to the user input. We further analyze application fields for this kind of system and discuss the scope for future research in this area.
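The contrast this abstract draws between purely lexical matching and matching helped by a semantic knowledge base can be sketched as below. This is not the system described in the paper: the small `SYNONYMS` table is a hypothetical stand-in for a WordNet lookup, Jaccard overlap is an assumed similarity measure, and the stored questions are made up.

```python
# Hypothetical synonym table standing in for a WordNet lookup.
SYNONYMS = {
    "begin": {"start"},
    "start": {"begin"},
    "program": {"application"},
    "application": {"program"},
}


def expand(tokens):
    """Return the token set plus any listed synonyms."""
    expanded = set(tokens)
    for token in tokens:
        expanded |= SYNONYMS.get(token, set())
    return expanded


def similarity(query, question, use_synonyms=True):
    """Jaccard overlap between the query and a stored question."""
    q = set(query.lower().split())
    s = set(question.lower().split())
    if use_synonyms:
        q, s = expand(q), expand(s)
    union = q | s
    return len(q & s) / len(union) if union else 0.0


def top_matches(query, stored_questions, k=3, use_synonyms=True):
    """Return up to k stored questions ranked by similarity to the query."""
    ranked = sorted(
        stored_questions,
        key=lambda s: similarity(query, s, use_synonyms),
        reverse=True,
    )
    return ranked[:k]


faq = [
    "how do i start the application",
    "how do i install the compiler",
    "what is a variable",
]
matches = top_matches("how do i begin the program", faq, k=2)
```

With synonym expansion the query and the first stored question become identical token sets, while the purely lexical score for the same pair is only 0.5. Returning the top `k` candidates mirrors the "set of questions" fallback described in the abstract.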

Reconstitution of Meteorological Daily Logs in Choseon Dynasty and Analyzing Weather Records of the Annals of King Gojong (조선시대 일기류의 기상일지(氣象日誌)적 재구성과 고종일기의 기상기록 분석)

  • Kim, Il-Gwon
    • Atmosphere / v.25 no.3 / pp.407-433 / 2015
  • The first half of my article focused on analyzing the current state of historical materials regarding weather and climate, and established a list of Korea's weather-related historical literature with which to take a lexical approach to all kinds of weather literature. It also put emphasis on gathering information and data on weather logs from journal-type historical records, which were contained in 48 weather-related journals of the Choseon period. The results of this research are expected to be useful for stimulating the study of historical meteorology. The latter half of my research focused on analyzing the various meteorological states (sunny, cloudy, rainy, snowy, and frosty weather) recorded in the official Annals of King Gojong (1864~1907). It also re-verified the historical rainfall data of the preceding studies by Wada Yuji (1917), Jung-Lim (1994), and Jhun-Moon (1997). As a result, discrepancies were found between their data and mine. This means that the meteorological data of the Annals of King Gojong and the Daily Records of Royal Sungjungwon (1623~1910) from the late Choseon period need to be analyzed and reconstructed anew.

Automatic extraction of similar poetry for study of literary texts: An experiment on Hindi poetry

  • Prakash, Amit;Singh, Niraj Kumar;Saha, Sujan Kumar
    • ETRI Journal / v.44 no.3 / pp.413-425 / 2022
  • The study of literary texts is one of the earliest disciplines practiced around the globe. Poetry is artistic writing in which words are carefully chosen and arranged for their meaning, sound, and rhythm. Poetry usually has a broad and profound sense that makes it difficult to be interpreted even by humans. The essence of poetry is Rasa, which signifies mood or emotion. In this paper, we propose a poetry classification-based approach to automatically extract similar poems from a repository. Specifically, we perform a novel Rasa-based classification of Hindi poetry. For the task, we primarily used lexical features in a bag-of-words model trained using the support vector machine classifier. In the model, we employed Hindi WordNet, Latent Semantic Indexing, and Word2Vec-based neural word embedding. To extract the rich feature vectors, we prepared a repository containing 37 717 poems collected from various sources. We evaluated the performance of the system on a manually constructed dataset containing 945 Hindi poems. Experimental results demonstrated that the proposed model attained satisfactory performance.

Development and Evaluation of a Document Summarization System using Features and a Text Component Identification Method (텍스트 구성요소 판별 기법과 자질을 이용한 문서 요약 시스템의 개발 및 평가)

  • Jang, Dong-Hyun;Myaeng, Sung-Hyon
    • Journal of KIISE:Software and Applications / v.27 no.6 / pp.678-689 / 2000
  • This paper describes an automatic summarization approach that constructs a summary by extracting sentences that are likely to represent the main theme of a document. As a way of selecting summary sentences, the system uses a model that takes into account lexical and statistical information obtained from a document corpus. As such, the system consists of two parts: the training part and the summarization part. The former processes sentences that have been manually tagged for summary sentences and extracts necessary statistical information of various kinds, and the latter uses the information to calculate the likelihood that a given sentence is to be included in the summary. There are at least three unique aspects of this research. First of all, the system uses a text component identification model to categorize sentences into one of the text components. This allows us to eliminate parts of text that are not likely to contain summary sentences. Second, although our statistically-based model stems from an existing one developed for English texts, it applies the framework to individual features separately and computes the final score for each sentence by combining the pieces of evidence using the Dempster-Shafer combination rule. Third, not only were new features introduced but also all the features were tested for their effectiveness in the summarization framework.
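The Dempster-Shafer combination rule mentioned in this abstract can be illustrated with a small worked sketch. It shows only the generic rule on a two-element frame; the feature names, the mass values, and the choice of frame are assumptions for illustration and are not taken from the paper.

```python
from itertools import product

SUMMARY = frozenset({"summary"})
NOT_SUMMARY = frozenset({"not_summary"})
EITHER = SUMMARY | NOT_SUMMARY  # mass left on the whole frame expresses ignorance


def combine(m1, m2):
    """Dempster's rule: multiply masses, keep non-empty intersections,
    and renormalize by 1 minus the conflicting mass."""
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("total conflict: the two sources cannot be combined")
    return {s: mass / (1.0 - conflict) for s, mass in combined.items()}


# Made-up evidence from two hypothetical features for one sentence.
position_feature = {SUMMARY: 0.6, EITHER: 0.4}
keyword_feature = {SUMMARY: 0.5, NOT_SUMMARY: 0.2, EITHER: 0.3}

fused = combine(position_feature, keyword_feature)
```

Here the conflicting mass is 0.6 × 0.2 = 0.12, so the fused belief in "summary" is 0.68 / 0.88 ≈ 0.773. Because the rule is associative, evidence from further features can be folded in one at a time with repeated calls to `combine`.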

A Bi-clausal Account of English 'to'-Modal Auxiliary Verbs

  • Hong, Sungshim
    • Language and Information / v.18 no.1 / pp.33-52 / 2014
  • This paper proposes a unified structural account of some instances of the English Modals and Semi-auxiliaries. The classification and the syntactic/structural description of the English Modal auxiliary verbs and verb-related elements have long been at the center of many proposals in the history of generative syntax. According to van Gelderen (1993) and Lightfoot (2002), it was sometime around 1380 that the Tense-node (T) appeared in the phrasal structures of the English language, and the T-node is the position that the English Modal auxiliaries occupy. Closely related is the existing evidence that English Modals were used as main verbs up to the early sixteenth century (Lightfoot 1991, Han 2000). This paper argues for a bi-clausal approach to English Modal auxiliaries with the infinitival particle 'to', such as 'ought to', 'used to', 'dare (to)', and 'need (to)', and to Semi-auxiliaries including 'be to' and 'have to'. More specifically, 'ought' in 'ought to' constructions, for instance, undergoes V-to-T movement within the matrix clause, just like 'HAVEAux' and all instances of 'BE', whereas 'to' occupies the T position of the embedded complement clause. By proposing the bi-clausal account, Radford's (2004, 2009) problems can be solved. Further, the historical motivation for the account takes a stance along with Norde (2009) and Brinton & Traugott (2005) in that Radford's (2004, 2009) syncretization of the two positions of the infinitival particle 'to' is no different from the 'boundary loss' in the process of Grammaticalization. This line of argument supports Krug's (2011), and in turn Bolinger's (1980), generalization on Auxiliaryhood, while providing a novel insight into Head movement of V-to-T in Present Day English.

An analysis on streetscape using the Model of Emotion Evaluation (가로경관에 대한 감성평가모형 적용 분석 연구)

  • Lee, Jin-Sook;Kim, Ji-Hye
    • Science of Emotion and Sensibility / v.16 no.2 / pp.149-156 / 2013
  • In this study, the Model of Emotion Evaluation, an emotional analysis actively applied in environmental assessment, was divided into two parts, the abbreviated model and the inferential model, through pilot study and experiment. In addition, an analysis was conducted through the experiment on the attributes of the evaluation vocabularies of two additional types of representative models, the EPA Model and PAD Model, and the results show a huge difference in the development approach and lexical constitution of the two models. It was also identified through factor analysis that the vocabularies were abbreviated according to the respective models. Similarity relationships were analyzed using multidimensional scaling and the results show that mutual relationship was established to some degree. Based on this, we can conclude that, rather than a biased use of the Model of Emotion Evaluation in emotion evaluation, a more objective image analysis is possible by analyzing the characteristics of the model before applying it. In this study, the evaluation target was confined only to the environmental assessment of streetscape and continuous research on the Model of Emotion Evaluation that allows for the comparison of evaluation models in various areas is needed.

Ontology Selection Ranking Model based on Semantic Similarity Approach (의미적 유사성에 기반한 온톨로지 선택 랭킹 모델)

  • Oh, Sun-Ju;Ahn, Joong-Ho;Park, Jin-Soo
    • The Journal of Society for e-Business Studies / v.14 no.2 / pp.95-116 / 2009
  • Ontologies have supported the integration of heterogeneous and distributed information. More and more ontologies and tools have been developed in various domains. However, building ontologies requires much time and effort. Therefore, ontologies need to be shared and reused among users. Specifically, finding the desired ontology in an ontology repository will benefit users. In the past, most studies on retrieving and ranking ontologies have focused mainly on lexical-level support. In those cases, it is impossible to find an ontology that includes the concepts users want to use at the semantic level. Most ontology libraries and ontology search engines have not provided semantic matching capability. Retrieving an ontology that users want to use requires a new ontology selection and ranking mechanism based on semantic similarity matching. We propose an ontology selection and ranking model consisting of selection criteria and metrics that are enhanced in semantic matching capabilities. The model we propose presents two novel features that distinguish it from previous research models. First, it enhances the ontology selection and ranking method practically and effectively by enabling semantic matching of taxonomy or relational linkage between concepts. Second, it identifies what measures should be used to rank ontologies in the given context and what weight should be assigned to each selection measure.
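The idea of assigning a weight to each selection measure and ranking ontologies by the result can be sketched as below. The abstract does not state how the measures are aggregated, so the normalized weighted sum here is an assumption, and the measure names, weights, and scores are hypothetical values chosen for illustration.

```python
def rank_ontologies(scores, weights):
    """Rank ontologies by the normalized weighted sum of their measure scores, best first."""
    total_weight = sum(weights.values())
    ranked = []
    for name, measures in scores.items():
        combined = sum(
            weight * measures.get(measure, 0.0) for measure, weight in weights.items()
        ) / total_weight
        ranked.append((combined, name))
    return sorted(ranked, reverse=True)


# Hypothetical measures: one lexical and two semantic (taxonomy and relation matching).
weights = {"lexical_match": 0.2, "taxonomy_match": 0.5, "relation_match": 0.3}
scores = {
    "onto_a": {"lexical_match": 0.9, "taxonomy_match": 0.2, "relation_match": 0.3},
    "onto_b": {"lexical_match": 0.5, "taxonomy_match": 0.8, "relation_match": 0.7},
}
ranking = rank_ontologies(scores, weights)
```

With these numbers `onto_a` wins on lexical match alone, but `onto_b` ranks first overall (0.71 against 0.37) once the semantic measures carry most of the weight.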

Linking Korean Predicates to Knowledge Base Properties (한국어 서술어와 지식베이스 프로퍼티 연결)

  • Won, Yousung;Woo, Jongseong;Kim, Jiseong;Hahm, YoungGyun;Choi, Key-Sun
    • Journal of KIISE / v.42 no.12 / pp.1568-1574 / 2015
  • Relation extraction plays a role in the process of transforming a sentence into a form of knowledge base. In this paper, we focus on predicates in a sentence and aim to identify the relevant knowledge base properties required to elucidate the relationship between entities, which enables a computer to understand the meaning of a sentence more clearly. Distant Supervision is a well-known approach for relation extraction, and it performs lexicalization tasks for knowledge base properties by generating a large amount of labeled data automatically. In other words, the predicate in a sentence is linked or mapped to the possible properties defined by ontologies in the knowledge base. This lexical and ontological linking of information provides us with a way of generating structured information and a basis for enrichment of the knowledge base.
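The automatic labeling step of Distant Supervision described in this abstract can be sketched in a few lines. This is a generic illustration, not the authors' pipeline: entity mentions are found by plain substring matching, and the triples, property names, and sentences are made-up examples.

```python
def distant_labels(sentences, triples):
    """Label a sentence with a KB property when it mentions both entities of a triple."""
    labeled = []
    for sentence in sentences:
        for subject, prop, obj in triples:
            if subject in sentence and obj in sentence:
                labeled.append((sentence, subject, obj, prop))
    return labeled


# Hypothetical knowledge base triples: (subject, property, object).
kb_triples = [
    ("Hangul", "creator", "Sejong"),
    ("Seoul", "capitalOf", "South Korea"),
]
corpus = [
    "Sejong created Hangul in the fifteenth century.",
    "Seoul is the largest city in South Korea.",
    "Busan is a port city.",
]
labeled = distant_labels(corpus, kb_triples)
```

The first sentence pairs the predicate "created" with the `creator` property, which is the kind of predicate-to-property link the abstract describes. The second sentence is labeled `capitalOf` although it never states that relation, showing the noise that automatically generated labels carry.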