• Title/Summary/Keyword: word representation

Improvement of MLLR Speaker Adaptation Algorithm to Reduce Over-adaptation Using ICA and PCA (과적응 감소를 위한 주성분 분석 및 독립성분 분석을 이용한 MLLR 화자적응 알고리즘 개선)

  • 김지운;정재호
    • The Journal of the Acoustical Society of Korea / v.22 no.7 / pp.539-544 / 2003
  • This paper describes how to reduce the effect of the occupation threshold, by which the transformation of HMM mixture components is controlled in a hierarchical tree structure, in order to prevent over-adaptation. To reduce correlations between data elements and to remove low-variance elements, we employ PCA (principal component analysis) and ICA (independent component analysis), which give as good a representation as possible and reduce the effect of over-adaptation. When a lower occupation threshold is set and the number of transformation functions is increased, the ordinary MLLR adaptation algorithm yields a lower recognition rate than the SI (speaker-independent) models, whereas the proposed MLLR adaptation algorithm improves the word recognition rate by over 2% compared with the SI models.
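The occupation threshold discussed above is the usual regression-class-tree rule in HTK-style MLLR: a tree node estimates its own transform only if enough adaptation data (occupancy) falls beneath it; otherwise its mixture components borrow the transform of the nearest qualifying ancestor. The minimal Python sketch below illustrates only that generic rule, not the authors' PCA/ICA method; the `Node` class, the tree shape, and the occupancy values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Regression-class tree node; only leaves carry their own occupancy count."""
    name: str
    children: List["Node"] = field(default_factory=list)
    leaf_occupancy: float = 0.0  # hypothetical occupancy of a leaf's mixture components

    def occupancy(self) -> float:
        # An internal node pools the occupancy of every leaf beneath it.
        if not self.children:
            return self.leaf_occupancy
        return sum(child.occupancy() for child in self.children)


def assign_transforms(node: Node, threshold: float,
                      inherited: Optional[str] = None) -> Dict[str, Optional[str]]:
    """Map each leaf to the node whose MLLR transform it uses.

    A node gets its own transform only if its occupancy reaches the threshold;
    otherwise it reuses the transform of the nearest qualifying ancestor.
    None means no node on the path qualified (the SI parameters stay as-is).
    """
    owner = node.name if node.occupancy() >= threshold else inherited
    if not node.children:
        return {node.name: owner}
    mapping: Dict[str, Optional[str]] = {}
    for child in node.children:
        mapping.update(assign_transforms(child, threshold, owner))
    return mapping


def count_transforms(mapping: Dict[str, Optional[str]]) -> int:
    """Number of distinct transforms the leaves end up using."""
    return len({owner for owner in mapping.values() if owner is not None})


# Hypothetical tree: root -> {A, B}, A -> {a1, a2}, B -> {b1, b2}
tree = Node("root", [
    Node("A", [Node("a1", leaf_occupancy=700.0), Node("a2", leaf_occupancy=300.0)]),
    Node("B", [Node("b1", leaf_occupancy=150.0), Node("b2", leaf_occupancy=50.0)]),
])

for threshold in (1000.0, 100.0):
    mapping = assign_transforms(tree, threshold)
    print(threshold, count_transforms(mapping), mapping)
```

With the threshold at 1000 the leaves share 2 transforms; lowering it to 100 yields 4, and leaf `b1` then gets a transform estimated from an occupancy of only 150. Estimating more transforms, each from less data, is the over-adaptation risk the paper addresses.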

Phonetic Realization of Aspiration of Stops in English /Cr/ and /sCr/ Clusters and their Syllable Structure at the Phonetic Level: a Comparison between Two Speaker Groups (영어의 /Cr/과 /sCr/ 자음군 내 폐쇄음의 기식성 실현과 음성 단위의 음절구조: 두 화자집단 간 비교)

  • Sohn, Hyang-Sook
    • Phonetics and Speech Sciences / v.6 no.3 / pp.121-130 / 2014
  • This study investigates the acoustic property of aspiration realized in English voiceless stops in /Cr/ and /sCr/ clusters. VOT (voice onset time) is measured for stops in these clusters produced by two groups: native speakers of English and native speakers of Korean. Aspiration of stops in different types of clusters is examined in relation to various phonological factors, such as stress location, syllable type, and position in the word. Pursuing the idea that phonetic realization is correlated with phonological representation, an attempt is made to account for the gradient nature of stop aspiration on the basis of syllable structure at the phonetic level, which may vary as a result of resyllabification. Results for voiceless stops in /Cr/ and /sCr/ clusters are further compared with those of a previous study on /sC/ clusters. Variations in aspiration are also characterized in terms of the segmental precedence relations of the stops within the clusters, namely post-[s], pre-[r], or both.
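VOT is the interval from a stop's release burst to the onset of vocal-fold vibration; long positive values indicate aspiration. The short sketch below computes VOT from annotation times and averages it per cluster context. The context labels follow the abstract's post-[s]/pre-[r] distinction, but the function names and all timing values are hypothetical and do not reflect the study's results.

```python
from statistics import mean
from typing import Dict, List, Tuple

# (cluster context, release-burst time in s, voicing-onset time in s): hypothetical
Token = Tuple[str, float, float]


def vot_ms(release_s: float, voicing_onset_s: float) -> float:
    """VOT in ms; positive means voicing lags the release (long lag = aspirated)."""
    return (voicing_onset_s - release_s) * 1000.0


def mean_vot_by_context(tokens: List[Token]) -> Dict[str, float]:
    """Average VOT for each cluster context label."""
    groups: Dict[str, List[float]] = {}
    for context, release_s, onset_s in tokens:
        groups.setdefault(context, []).append(vot_ms(release_s, onset_s))
    return {context: mean(values) for context, values in groups.items()}


tokens = [
    ("pre-[r]", 1.200, 1.280),
    ("pre-[r]", 2.400, 2.490),
    ("post-[s], pre-[r]", 3.600, 3.630),
]
for context, vot in mean_vot_by_context(tokens).items():
    print(f"{context}: {vot:.1f} ms")
```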

A Korean Normative Study of 213 Pictures (한국판 그림자극의 규준연구)

  • 박미자;박태진
    • Korean Journal of Cognitive Science / v.11 no.3_4 / pp.57-72 / 2000
  • A standardized Korean set of pictures has been called for as more and more studies in memory and representation research use picture stimuli. This article presents a standardized Korean set of pictures for studies probing the cognitive mechanisms that underlie picture and word processing, or for studies that simply use pictures as stimuli. The norm provides 213 pictures together with data on several variables, such as name agreement, picture appropriateness, and familiarity. Previously published data on variables such as frequency, category, and frequency within a category have been integrated into this norm. The limitations, usage, and applications of this set are discussed in terms of implicit and explicit memory and the variables mentioned above.
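Name agreement is commonly reported in picture norms either as the percentage of participants who produce the dominant name or as the H statistic, H = Σ p_i log2(1/p_i) over the distinct names given (0 when everyone agrees). The abstract does not say which measure this norm uses, so the sketch below computes both as an assumption; the function name and the eight responses are hypothetical (사과 and 능금 both mean "apple").

```python
import math
from collections import Counter
from typing import List, Tuple


def name_agreement(responses: List[str]) -> Tuple[str, float, float]:
    """Return (dominant name, % agreement, H statistic) for one picture.

    H = sum_i p_i * log2(1 / p_i) over the distinct names given;
    H = 0 means every participant produced the same name.
    """
    counts = Counter(responses)
    total = len(responses)
    dominant, dominant_count = counts.most_common(1)[0]
    percent = 100.0 * dominant_count / total
    h = sum((c / total) * math.log2(total / c) for c in counts.values())
    return dominant, percent, h


# Hypothetical naming responses from 8 participants for one picture
responses = ["사과"] * 6 + ["능금"] * 2
print(name_agreement(responses))  # dominant '사과', 75.0%, H of about 0.811
```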

Morphological Analysis of the Korean Language (한국어의 형태소해석)

  • Lee, Soo-Hyon;Ozawa, S.;Lee, Joo-Keun
    • Journal of the Korean Institute of Telematics and Electronics / v.26 no.4 / pp.53-61 / 1989
  • A morphological analysis method is described for extracting the information required for the syntactic and semantic analysis of Korean. Nouns and particles are separated within noun phrases, selection conditions are specified for analyzing compound nouns, and a restoration rule is given for processing irregular compound nouns. Stems and endings are separated in regular verbals, and a logical representation form is proposed for irregularly inflected words and contracted vowels. The logical representation consists of attribute values and an analysis rule. Noun redundancy in the dictionary is reduced by processing "Noun + HA-" verbs separately as "noun" and "HA-", and the predicative "IDA" is analyzed using a Q parameter. A processing form for negation is also derived, and the morphemes and basic structure of compound predicates are presented.
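A minimal sketch of two steps named above: separating a particle from a noun phrase, and splitting a "noun + HA-" verb so that only the noun needs a dictionary entry. The longest-particle-first strategy, the tiny lexicons, and the function names are hypothetical simplifications; the paper's selection conditions, restoration rules, and logical representation are not reproduced.

```python
from typing import Optional, Set, Tuple

# Hypothetical mini-lexicons; a real analyzer would use full dictionaries.
NOUNS: Set[str] = {"학교", "공부", "사람"}  # school, study, person
PARTICLES: Set[str] = {"은", "는", "이", "가", "을", "를", "에", "에서", "의"}


def split_noun_particle(word: str) -> Optional[Tuple[str, str]]:
    """Split a word form (eojeol) into (noun, particle), trying the longest particle first."""
    for cut in range(1, len(word)):  # shortest noun prefix first = longest particle first
        noun, particle = word[:cut], word[cut:]
        if noun in NOUNS and particle in PARTICLES:
            return noun, particle
    return None


def split_ha_verb(verb: str) -> Optional[Tuple[str, str]]:
    """Split a 'noun + 하다' verb so only the noun needs a dictionary entry."""
    if verb.endswith("하다") and verb[:-2] in NOUNS:
        return verb[:-2], "하-"
    return None


print(split_noun_particle("학교에서"))  # ('학교', '에서')
print(split_ha_verb("공부하다"))        # ('공부', '하-')
```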

PC-KIMMO-based Description of Mongolian Morphology

  • Jaimai, Purev;Zundui, Tsolmon;Chagnaa, Altangerel;Ock, Cheol-Young
    • Journal of Information Processing Systems / v.1 no.1 s.1 / pp.41-48 / 2005
  • This paper presents the development of a morphological processor for the Mongolian language, based on the two-level morphological model introduced by Koskenniemi. The aim of the study is to provide Mongolian syntactic parsers with more effective information on the structure of Mongolian words. First, hand-written rules, which are the core of this model, are compiled into finite-state transducers by a rule-compiling tool. The compiler output was edited by hand for clarity whenever necessary. The rules file and lexicon presented in the paper describe the morphology of Mongolian nouns, adjectives, and verbs. Although the rules illustrated are not sufficient to account for all the processes of Mongolian lexical phonology, other necessary rules can easily be added when new words are added to the lexicon file. The paper's theoretical contribution lies in representing the morphological phenomena of Mongolian within the general, language-independent framework of the two-level morphological model.
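In PC-KIMMO, declarative two-level rules are compiled into finite-state transducers over lexical:surface symbol pairs and run in parallel for both recognition and generation. The toy sketch below hand-codes a single deterministic transducer, in the generation direction only, for a simplified vowel-harmony rule: a hypothetical archiphoneme `A` surfaces as `a` after a back vowel and `e` after a front vowel, and the boundary `+` surfaces as nothing. The vowel classes, stems, and function names are illustrative assumptions, not the paper's actual Mongolian rules.

```python
from typing import List, Tuple

BACK = set("aou")
FRONT = set("eöü")  # 'i' and consonants are neutral: they leave the state unchanged


def step(state: str, lexical: str) -> Tuple[str, str]:
    """One transition: (state, lexical symbol) -> (next state, surface string)."""
    if lexical == "+":  # morpheme boundary: realized as nothing
        return state, ""
    if lexical == "A":  # harmonizing archiphoneme
        surface = "e" if state == "front" else "a"  # default to back when no vowel seen
        return ("front" if surface == "e" else "back"), surface
    if lexical in BACK:
        return "back", lexical
    if lexical in FRONT:
        return "front", lexical
    return state, lexical  # consonants and neutral vowels copy through


def generate(lexical_form: str) -> str:
    """Run the transducer from the start state over a lexical string."""
    state = "start"
    surface: List[str] = []
    for symbol in lexical_form:
        state, out = step(state, symbol)
        surface.append(out)
    return "".join(surface)


def accepts(lexical_form: str, surface_form: str) -> bool:
    """Check a lexical:surface correspondence by generating and comparing."""
    return generate(lexical_form) == surface_form


# Hypothetical stems, not real Mongolian words
print(generate("tal+Ad"))  # talad
print(generate("ker+Ad"))  # kered
```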

Improving methods for normalizing biomedical text entities with concepts from an ontology with (almost) no training data at BLAH5 the CONTES

  • Ferre, Arnaud;Ba, Mouhamadou;Bossy, Robert
    • Genomics & Informatics / v.17 no.2 / pp.20.1-20.5 / 2019
  • Entity normalization, or entity linking in the general domain, is an information extraction task that aims to annotate/bind multiple words/expressions in raw text with semantic references, such as concepts of an ontology. An ontology consists minimally of a formally organized vocabulary or hierarchy of terms that captures knowledge of a domain. Presently, machine-learning methods, often coupled with distributional representations, achieve good performance. However, these require large training datasets, which are not always available, especially for tasks in specialized domains. CONTES (CONcept-TErm System) is a supervised method that addresses entity normalization with ontology concepts using small training datasets. CONTES has some limitations: it does not scale well to very large ontologies, it tends to overgeneralize its predictions, and it lacks valid representations for out-of-vocabulary words. Here, we propose to assess different methods for reducing the dimensionality of the ontology representation. We also propose calibrating parameters to make the predictions more accurate, and a specific method to address the problem of out-of-vocabulary words.
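A minimal sketch in the spirit of CONTES, as we read it (details may differ from the published method): each ontology concept is a vector with one dimension per concept, marking the concept and its ancestors; a term embedding is projected into that space by a learned linear map, and the concept with the highest cosine similarity is predicted. Because the vector length equals the number of concepts, this representation grows with the ontology, which is the scaling problem the dimensionality reduction above targets. The mini-ontology, `WEIGHTS`, and function names are hypothetical.

```python
import math
from typing import Dict, List, Optional

# Hypothetical mini-ontology: concept -> parent (None marks the root)
PARENT: Dict[str, Optional[str]] = {
    "organism": None,
    "bacterium": "organism",
    "gram_negative": "bacterium",
    "plant": "organism",
}
CONCEPTS: List[str] = list(PARENT)


def concept_vector(concept: str) -> List[float]:
    """One dimension per concept: mark the concept and all of its ancestors."""
    vec = [0.0] * len(CONCEPTS)
    node: Optional[str] = concept
    while node is not None:
        vec[CONCEPTS.index(node)] = 1.0
        node = PARENT[node]
    return vec


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def normalize_term(embedding: List[float], weights: List[List[float]]) -> str:
    """Project a term embedding into concept space and return the closest concept."""
    projected = [sum(w * x for w, x in zip(row, embedding)) for row in weights]
    return max(CONCEPTS, key=lambda c: cosine(projected, concept_vector(c)))


# Hypothetical learned projection: 2-D term embeddings -> 4-D concept space
WEIGHTS = [[1.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

print(normalize_term([1.0, 0.0], WEIGHTS))  # gram_negative
print(normalize_term([0.0, 1.0], WEIGHTS))  # plant
```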

Survey of Automatic Query Expansion for Arabic Text Retrieval

  • Farhan, Yasir Hadi;Noah, Shahrul Azman Mohd;Mohd, Masnizah
    • Journal of Information Science Theory and Practice / v.8 no.4 / pp.67-86 / 2020
  • An information need is one of the main motivations for using a search engine. Queries can represent very different information needs. Ironically, a query can be a poor representation of the information need, because users may find it difficult to express that need. Query Expansion (QE) is widely used to address this limitation. While QE can be considered a language-independent technique, recent findings have shown that in certain cases language plays an important role. Arabic has a particularly large vocabulary, rich in words with synonymous shades of meaning, and high morphological complexity. This paper therefore reviews QE for Arabic information retrieval, with the aim of identifying the recent state of the art in this burgeoning area. In this review, we primarily discuss statistical QE approaches, including document analysis, search and browse log analyses, and web knowledge analyses, as well as semantic QE approaches, which use semantic knowledge structures to extract meaningful word relationships. Finally, we conclude that QE for Arabic requires further investigation and research because of the intricate nature of the language.
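One common statistical technique, pseudo-relevance feedback (a form of local document analysis), assumes the top-ranked documents are relevant and appends their most frequent new terms to the query. The bare-bones sketch below uses raw term counts; the function name, parameters, and toy documents are hypothetical, and an Arabic system would first need orthographic normalization and stemming, which are not shown.

```python
from collections import Counter
from typing import Iterable, List


def expand_query(query: List[str], ranked_docs: List[List[str]],
                 top_k: int = 2, n_terms: int = 2,
                 stopwords: Iterable[str] = ()) -> List[str]:
    """Pseudo-relevance feedback: add frequent terms from the top_k documents."""
    stop = set(stopwords)
    feedback: Counter = Counter()
    for doc in ranked_docs[:top_k]:  # assume the top-ranked documents are relevant
        feedback.update(t for t in doc if t not in query and t not in stop)
    # Ties keep first-encountered order, per Counter.most_common
    expansion = [term for term, _ in feedback.most_common(n_terms)]
    return query + expansion


# Hypothetical ranked results for the query "car" (already tokenized)
docs = [
    ["car", "vehicle", "automobile"],
    ["car", "vehicle", "price"],
    ["weather", "rain"],
]
print(expand_query(["car"], docs))  # ['car', 'vehicle', 'automobile']
```

Raising `top_k` pulls in more candidate terms but also more off-topic ones (query drift), which is why the number of feedback documents is usually kept small.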

Aspect-Based Sentiment Analysis with Position Embedding Interactive Attention Network

  • Xiang, Yan;Zhang, Jiqun;Zhang, Zhoubin;Yu, Zhengtao;Xian, Yantuan
    • Journal of Information Processing Systems / v.18 no.5 / pp.614-627 / 2022
  • Aspect-based sentiment analysis aims to determine the sentiment polarity toward an aspect in user-generated natural language text. So far, most methods use only the implicit position information of the aspect in the context, rather than directly exploiting the positional relationship between the aspect and the sentiment terms. In fact, words neighboring the aspect terms should be given more attention than other words in the context. This paper studies the influence of different position embedding methods on the sentiment polarity of given aspects, and proposes a position embedding interactive attention network based on a long short-term memory (LSTM) network. First, the model uses the position information of the context in both the input layer and the attention layer. Second, it uses an interactive attention mechanism to learn the importance of different context words for the aspect. Finally, it generates a valid representation of the aspect and the context for sentiment classification. The proposed model was evaluated on the Semantic Evaluation (SemEval) 2014 datasets. Compared with baseline models, its accuracy increases by about 2% on the restaurant dataset and 1% on the laptop dataset.
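A minimal sketch of the intuition that words near the aspect deserve more attention: weight each context word by a linear decay, 1 - d/n, in its distance d from the aspect span, and fold that weight into softmax attention. This linear scheme is only one of several position methods such a study might compare and is an assumption here; the paper's position embeddings and LSTM-based interactive attention are not reproduced, and the raw attention scores are hypothetical.

```python
import math
from typing import List


def position_weights(n: int, aspect_start: int, aspect_end: int) -> List[float]:
    """Weight 1 - d/n, where d is the token's distance to the aspect span (0 inside it)."""
    weights = []
    for i in range(n):
        if i < aspect_start:
            dist = aspect_start - i
        elif i > aspect_end:
            dist = i - aspect_end
        else:
            dist = 0
        weights.append(1.0 - dist / n)  # stays positive because dist <= n - 1
    return weights


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def position_aware_attention(scores: List[float], weights: List[float]) -> List[float]:
    """Rescale softmax attention by position weights and renormalize."""
    probs = softmax(scores)
    weighted = [p * w for p, w in zip(probs, weights)]
    total = sum(weighted)
    return [x / total for x in weighted]


tokens = ["the", "food", "was", "great", "but", "service", "slow"]
w = position_weights(len(tokens), aspect_start=1, aspect_end=1)  # aspect: "food"
att = position_aware_attention([0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0], w)
for tok, a in zip(tokens, att):
    print(f"{tok:8s} {a:.3f}")
```

Although "great" and "slow" receive the same raw score, "great" sits closer to the aspect "food" and ends up with 2.5 times the attention of "slow" (weights 5/7 versus 2/7).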

Burmese Sentiment Analysis Based on Transfer Learning

  • Mao, Cunli;Man, Zhibo;Yu, Zhengtao;Wu, Xia;Liang, Haoyuan
    • Journal of Information Processing Systems / v.18 no.4 / pp.535-548 / 2022
  • Using a resource-rich language to classify sentiment in a low-resource language is a popular research topic in natural language processing. Burmese is a low-resource language. Given the scarcity of labeled training data for Burmese sentiment classification, this study proposes a transfer learning method for sentiment analysis that transfers sentiment features learned from English. The method generates a cross-language word-embedding representation of Burmese vocabulary to map Burmese text into the semantic space of English text. An English sentiment classification model is then pre-trained using a convolutional neural network and an attention mechanism, and the network is shared between the English and Burmese sentiment models. The network-layer parameters are used to learn cross-language sentiment features, which are then transferred to the Burmese sentiment classification model. Finally, the model is fine-tuned using labeled Burmese data. Experimental results show that the proposed method can significantly improve Burmese sentiment classification compared with a model trained only on a Burmese corpus.
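One common way to build cross-language embeddings, though the abstract does not say it is the one used here, is to learn a linear map W from a seed dictionary of translation pairs so that source vectors X times W approximate target vectors Y. Ordinary least squares gives W = (XᵀX)⁻¹XᵀY, and a source word is then translated by nearest neighbor in the target space. The sketch below restricts itself to 2-D vectors so the matrix inverse stays explicit; all vectors, word labels, and function names are hypothetical.

```python
from typing import Dict, List

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inverse_2x2(m: Matrix) -> Matrix:
    (a, b), (c, d) = m
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("seed vectors are linearly dependent")
    return [[d / det, -b / det], [-c / det, a / det]]


def fit_mapping(src: Matrix, tgt: Matrix) -> Matrix:
    """Least-squares W = (X^T X)^-1 X^T Y for 2-D embeddings (rows = seed pairs)."""
    xt = transpose(src)
    return matmul(inverse_2x2(matmul(xt, src)), matmul(xt, tgt))


def nearest_target(vec: List[float], w: Matrix, tgt_vocab: Dict[str, List[float]]) -> str:
    """Map a source vector with W, then return the closest target word (Euclidean)."""
    mapped = matmul([vec], w)[0]
    return min(tgt_vocab,
               key=lambda word: sum((m - t) ** 2 for m, t in zip(mapped, tgt_vocab[word])))


# Hypothetical seed dictionary: Burmese vectors (X) aligned with English vectors (Y)
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = [[0.0, -1.0], [1.0, 0.0], [1.0, -1.0]]
W = fit_mapping(X, Y)

english = {"good": [0.0, -1.0], "bad": [1.0, 0.0]}
print(nearest_target([0.9, 0.1], W, english))  # good
```

Real systems work in hundreds of dimensions and often constrain W to be orthogonal; the 2-D closed form here only shows the shape of the computation.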

A Study on Semantic Logic Platform of multimedia Sign Language Content (멀티미디어 수화 콘텐츠의 Semantic Logic 플랫폼 연구)

  • Jung, Hoe-Jun;Park, Dea-Woo;Han, Kyung-Don
    • Journal of the Korea Society of Computer and Information / v.14 no.10 / pp.199-206 / 2009
  • With the development of broadband multimedia content, sign language for the deaf is being used in education. Most sign language training content that represents Hangul words in sign language consists of sign language videos for viewing. Learners who are new to sign language are unfamiliar with its characteristics, so the displayed signs are difficult to understand and to reproduce. In this paper, we design and study a video-based platform for online sign language learning that applies a Semantic Logic model to multimedia sign language content, so that learners can express signs with reference to their attributes.