• Title/Summary/Keyword: Multilingual Systems

A Design and Implementation of the Multilingual RDD Registry (다중언어 RDD 레지스트리의 설계 및 구현)

  • 정상원;오원근;윤기송
    • Journal of Broadcast Engineering / v.8 no.4 / pp.381-391 / 2003
  • This paper deals with the Multilingual Registry for the Rights Data Dictionary (RDD), which will be used for the semantic representation of rights on digital content in the MPEG-21 framework. Translation of RDD terms across different language populations often lacks the desired precision. The purpose of this paper is to demonstrate the Multilingual RDD Registry concept, which aims at a more precise and interoperable translation of RDD terms among different DRM systems.

Building and Analysis of Semantic Network on S&T Multilingual Terminology (과학기술 전문용어의 다국어 의미망 생성과 분석)

  • Jeong, Do-Heon;Choi, Hee-Yoon
    • Journal of Information Management / v.37 no.4 / pp.25-47 / 2006
  • A terminology system capable of providing interpretations and classification information for multilingual science and technology (S&T) terminology is essential to establishing an integrated search environment for multilingual S&T information systems. This paper aims to build a base system for managing an integrated information system for multilingual S&T terminology search. It introduces a method for building a search system for S&T terminologies that are internally linked through a multilingual semantic network, and a search technique over the multiple linked nodes. To provide a foundation for further analytical research, it also suggests a basic approach to interpreting the terminology clusters generated with those two search methods.

Combination of Classifiers Decisions for Multilingual Speaker Identification

  • Nagaraja, B.G.;Jayanna, H.S.
    • Journal of Information Processing Systems / v.13 no.4 / pp.928-940 / 2017
  • State-of-the-art speaker recognition systems may work better for the English language. However, if the same system is used to recognize speakers of other languages, it may perform poorly. In this work, the decisions of a Gaussian mixture model-universal background model (GMM-UBM) and a learning vector quantization (LVQ) classifier are combined to improve the recognition performance of a multilingual speaker identification system. The difference between these classifiers lies in their modeling techniques: the former is based on a probabilistic approach and the latter on the fine-tuning of neurons. Since the approaches are different, each modeling technique identifies different sets of speakers for the same database. Therefore, the decisions of the classifiers may be combined to improve performance. In this study, multitaper mel-frequency cepstral coefficients (MFCCs) are used as the features, and monolingual and cross-lingual speaker identification studies are conducted using NIST-2003 and the authors' own database. The experimental results show that the combined system improves performance by nearly 10% compared with that of the individual classifiers.
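
The abstract above does not say which rule is used to combine the two classifiers' decisions. The following is only an illustrative sketch of one common option, a weighted sum of normalized scores. The function names, the `weight` parameter, and the toy scores are assumptions made here for illustration and are not taken from the paper.

```python
import math


def softmax(scores):
    """Turn arbitrary per-speaker scores into values that sum to 1."""
    top = max(scores.values())
    exps = {spk: math.exp(val - top) for spk, val in scores.items()}
    total = sum(exps.values())
    return {spk: val / total for spk, val in exps.items()}


def combine_decisions(gmm_scores, lvq_distances, weight=0.5):
    """Fuse two classifiers' outputs with a weighted sum (hypothetical rule).

    gmm_scores: per-speaker log-likelihoods (higher is better).
    lvq_distances: per-speaker distances to the nearest prototype
        (lower is better), so they are negated before normalization.
    """
    gmm_norm = softmax(gmm_scores)
    lvq_norm = softmax({spk: -d for spk, d in lvq_distances.items()})
    fused = {
        spk: weight * gmm_norm[spk] + (1.0 - weight) * lvq_norm[spk]
        for spk in gmm_norm
    }
    return max(fused, key=fused.get), fused


# Toy example: the GMM-style scores narrowly favour speaker "A",
# while the LVQ-style distances clearly favour speaker "B".
gmm_scores = {"A": -10.0, "B": -10.2, "C": -14.0}
lvq_distances = {"A": 3.0, "B": 1.0, "C": 4.0}
winner, fused = combine_decisions(gmm_scores, lvq_distances)
```

In this toy case the fused decision follows the more confident classifier, which is the general intuition behind combining models that make different errors.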

Subject Searching Using Controlled Vocabulary Versus Uncontrolled Vocabulary in Online Catalog System: Focusing on Multilingual Environment

  • Choi, Hee-Yoon
    • Journal of Information Management / v.26 no.2 / pp.61-79 / 1995
  • The purpose of this paper is to investigate the search efficiency of controlled versus uncontrolled vocabulary subject access in online catalog systems. The question of the effectiveness of controlled versus uncontrolled vocabulary in information retrieval has been raised repeatedly in the literature. A debate continues in the library and information science professions over the relative merit, appropriateness, and efficiency of uncontrolled vocabulary subject access in online catalog systems. In practice, users tend to combine uncontrolled vocabulary subject searching with controlled vocabulary subject searching, but the success of a user's subject search depends on the choice of search terms. In addition, the technical developments that facilitate cooperation among information services in general make it increasingly possible for such cooperation to take place at an international level. In this study, several common types of vocabularies in online catalog systems are described and compared, and the use of vocabularies in multilingual environments in particular is analyzed.

Multilingual Product Retrieval Agent through Semantic Web and Semantic Networks (Semantic Web과 Semantic Network을 활용한 다국어 상품검색 에이전트)

  • Moon Yoo-Jin
    • Journal of Intelligence and Information Systems / v.10 no.2 / pp.1-13 / 2004
  • This paper presents a method for a multilingual product retrieval agent based on XML and semantic networks in e-commerce. Product retrieval is an important process, since it is the customer's point of contact with the e-commerce system. Keyword-based retrieval is efficient as long as the product information is structured and organized. But when the product information is spread across many online shopping malls, and especially when it is expressed in different languages with different cultural backgrounds, buyers' product retrieval needs language translation with ambiguities resolved in a specific context. This paper presents an RDF modeling case that resolves semantic problems in the representation of product information across the boundaries of language domains. By adopting the UNSPSC code system, this paper designs and implements an architecture for multilingual product retrieval agents. The architecture is based on the central repository model of product catalog management with distributed updating processes. It also includes the perspectives of buyers and suppliers. The consistency and version management of product information are controlled by the UNSPSC code system. Multilingual product names are resolved by semantic networks, a thesaurus, and an ontology dictionary for product names.

COVID-19 recommender system based on an annotated multilingual corpus

  • Barros, Marcia;Ruas, Pedro;Sousa, Diana;Bangash, Ali Haider;Couto, Francisco M.
    • Genomics & Informatics / v.19 no.3 / pp.24.1-24.7 / 2021
  • Tracking the most recent advances in Coronavirus disease 2019 (COVID-19)-related research is essential, given the disease's novelty and its impact on society. However, with the publication pace speeding up, researchers and clinicians require automatic approaches to keep up with the incoming information regarding this disease. A solution to this problem requires the development of text mining pipelines, the efficiency of which strongly depends on the availability of curated corpora. However, there is a lack of COVID-19-related corpora, even more so for languages other than English. This project's main contribution was the annotation of a multilingual parallel corpus and the generation of a recommendation dataset (EN-PT and EN-ES) regarding relevant entities, their relations, and recommendation, providing this resource to the community to improve text mining research on COVID-19-related literature. This work was developed during the 7th Biomedical Linked Annotation Hackathon (BLAH7).

Computer Codes for Korean Sounds: K-SAMPA

  • Kim, Jong-mi
    • The Journal of the Acoustical Society of Korea / v.20 no.4E / pp.3-16 / 2001
  • An ASCII encoding of Korean has been developed for extended phonetic transcription of the Speech Assessment Methods Phonetic Alphabet (SAMPA). SAMPA is a machine-readable phonetic alphabet used for multilingual computing. It has been under development since 1987 and has been extended to more than twenty languages. The motivation for creating Korean SAMPA (K-SAMPA) is to label Korean speech for a multilingual corpus, or to transcribe the native-language (L1) interfered pronunciation of a second language learner for bilingual education. Korean SAMPA represents each Korean allophone with a particular SAMPA symbol. Sounds that closely resemble it are represented by the same symbol, regardless of the language they are uttered in. Each of its symbols represents a speech sound that is spectrally and temporally so distinct as to be perceptually different when the components are heard in isolation. Each type of sound has a separate IPA-like designation. Korean SAMPA is superior to other transcription systems with similar objectives. It describes the cross-linguistic sound quality of Korean better than the official Romanization system, proclaimed by the Korean government in July 2000, because it uses an internationally shared phonetic alphabet. It is also phonetically more accurate than the official Romanization in that it dispenses with orthographic adjustments. It is also more convenient for computing than the International Phonetic Alphabet (IPA) because it consists of the symbols on a standard keyboard. This paper demonstrates how Korean SAMPA can express allophonic details and prosodic features by adopting the transcription conventions of the extended SAMPA (X-SAMPA) and the prosodic SAMPA (SAMPROSA).

Syntactic Structured Framework for Resolving Reflexive Anaphora in Urdu Discourse Using Multilingual NLP

  • Nasir, Jamal A.;Din, Zia Ud.
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.4 / pp.1409-1425 / 2021
  • In a wide-ranging information society, fast and easy access to information in the language of one's choice is indispensable, and may be provided by various multilingual Natural Language Processing (NLP) applications. Natural language text contains references among different language elements, called anaphoric links. Resolving anaphoric links is a key problem in NLP and an essential part of NLP applications, since anaphoric links need to be properly interpreted for a clear understanding of natural languages. For this purpose, a mechanism is desirable for the identification and resolution of these naturally occurring anaphoric links. In this paper, a framework based on Hobbs' syntactic approach and a system developed by Lappin & Leass is proposed for the resolution of reflexive anaphoric links present in Urdu text documents. Generally, the anaphora resolution process takes three main steps: identification of the anaphor, location of the candidate antecedent(s), and selection of the appropriate antecedent. The proposed framework explores the syntactic structure of reflexive anaphors to find features for constructing heuristic rules, which are used to develop an algorithm for resolving these anaphoric references. The system takes Urdu text containing reflexive anaphors as input, and outputs Urdu text with resolved reflexive anaphoric links. Despite the scarcity of Urdu resources, the results are encouraging. The proposed framework can be utilized in multilingual NLP (m-NLP) applications.