• Title/Summary/Keyword: Multilingual


An Arabic Script Recognition System

  • Alginahi, Yasser M.;Mudassar, Mohammed;Nomani Kabir, Muhammad
    • KSII Transactions on Internet and Information Systems (TIIS) / v.9 no.9 / pp.3701-3720 / 2015
  • A system for the recognition of machine-printed Arabic script is proposed. The Arabic script is shared by three languages, i.e., Arabic, Urdu and Farsi. The three languages have a decent amount of vocabulary in common, which compounds the problem of identification. In an ideal scenario, therefore, not only must the script be differentiated from other scripts, but the language written in that script must also be recognized. The recognition process involves separating Arabic-script documents from Latin, Han and other-script documents using horizontal and vertical projection profiles, and then identifying the language. Identification mainly involves extracting connected components, which are subjected to a Principal Component Analysis (PCA) transformation to extract uncorrelated features. The traditional K-Nearest Neighbours (KNN) algorithm is then used for recognition. Experiments were carried out by varying the number of principal components and the number of connected components extracted per document to find the combination that gives the best accuracy. An accuracy of 100% is achieved with at least 18 connected components and 15 principal components. The proposed system could play a vital role in the automatic archiving of multilingual documents and in selecting the appropriate Arabic-script language in multilingual Optical Character Recognition (OCR) systems.
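
Two of the building blocks named in this abstract, projection profiles and k-nearest-neighbour voting, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the toy image, the two-dimensional feature vectors, and the function names `projection_profiles` and `knn_predict` are all assumptions made for the example, and the PCA step is omitted.

```python
import math
from collections import Counter

def projection_profiles(image):
    """Return (horizontal, vertical) ink counts for a binary image (1 = ink)."""
    horizontal = [sum(row) for row in image]        # one count per row
    vertical = [sum(col) for col in zip(*image)]    # one count per column
    return horizontal, vertical

def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training vectors."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy 4x5 binary image (assumed data, 1 = ink pixel).
image = [
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0],
]
h, v = projection_profiles(image)

# Toy feature vectors standing in for PCA-reduced connected-component features.
train = [
    ((0.0, 0.0), "Arabic"),
    ((0.1, 0.2), "Arabic"),
    ((1.0, 1.0), "Urdu"),
    ((0.9, 1.1), "Urdu"),
    ((2.0, 0.0), "Farsi"),
]
predicted = knn_predict(train, (0.2, 0.1), k=3)
```

In a real pipeline the feature vectors would come from the connected components after PCA, and `k` would be tuned alongside the number of components.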

The Correlation between Library Users' Fields of Study and the Use of Translated Works in University Libraries (대학도서관에서 대출된 번역서와 대출자 전공과의 관계 연구)

  • Lee Hyun-Young
    • Journal of the Korean Society for Library and Information Science / v.32 no.1 / pp.155-167 / 1998
  • In a climate of increasing calls for academic assessment, the author undertook a study to ascertain the availability of original texts and their translations in the academic library. The objective of this study is to compare how frequently original texts are used, by the academic major of users, in university libraries. To this end, the author collected data stored in the online systems of 3 Korean university libraries from September 10th to 30th, 1995, and tested the hypotheses using the Minitab statistical package. Libraries with multilingual collections and automated systems will find the methodology presented here particularly valuable.


Stroke based Multilingual Input System for Embedded System (임베디드 시스템에서 필획기반 다국어 입력 시스템)

  • Lee, Jin-Yeong;Hong, Sung-Ryrong;Lee, Si-Jin
    • Journal of Internet Computing and Services / v.8 no.6 / pp.145-153 / 2007
  • Recent developments in information technology focus mainly on mobile services, and most mobile users rely on a variety of services delivered over wireless networks. The system software and middleware that enable such mobile services are therefore becoming increasingly important, and the character input/output system is one such component. This paper introduces an alphabet input system that decomposes each character into a series of strokes according to its formation principle. It is designed so that anyone who knows a character can enter it in the same way they would actually write it down.

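
The idea of entering a character as the sequence of strokes used to write it can be illustrated with a small lookup table. The sketch below is only an assumption about how such a system might be organized: the stroke codes (`"v"` for a vertical stroke, `"h"` for a horizontal one), the table contents, and the function names are hypothetical and are not taken from the paper.

```python
# Hypothetical stroke table: each key is the ordered stroke sequence
# a writer would use, each value the character it produces.
STROKE_TABLE = {
    ("v",): "I",
    ("v", "h"): "L",
    ("h", "v"): "T",
    ("v", "h", "h"): "F",
    ("v", "h", "h", "h"): "E",
}

def decode(strokes):
    """Return the character for a complete stroke sequence, or None."""
    return STROKE_TABLE.get(tuple(strokes))

def candidates(prefix):
    """Return characters whose stroke sequence starts with `prefix`."""
    prefix = tuple(prefix)
    return sorted(
        ch for seq, ch in STROKE_TABLE.items() if seq[: len(prefix)] == prefix
    )

typed = ["v", "h"]
exact = decode(typed)          # character matching the strokes so far
suggestions = candidates(typed)  # characters still reachable from here
```

Because one character's stroke sequence can be a prefix of another's, an input method built this way needs an explicit commit action (or a timeout) to decide when a character is finished; `candidates` shows what is still reachable at each step.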

Summarizing the Differences in Chinese-Vietnamese Bilingual News

  • Wu, Jinjuan;Yu, Zhengtao;Liu, Shulong;Zhang, Yafei;Gao, Shengxiang
    • Journal of Information Processing Systems / v.15 no.6 / pp.1365-1377 / 2019
  • Summarizing the differences in Chinese-Vietnamese bilingual news plays an important supporting role in the comparative analysis of news views between China and Vietnam. To address the cross-language problems that arise when analyzing the differences between Chinese and Vietnamese bilingual news, we propose a new method of summarizing the differences based on an undirected graph model. The method extracts elements to represent the sentences, and builds a bridge between the two languages based on Wikipedia's multilingual concept description pages. First, we calculate the similarity between Chinese and Vietnamese news sentences, and filter the bilingual sentences accordingly. Then we use the filtered sentences as nodes and the similarity grade as the edge weight to construct an undirected graph model. Finally, using a random walk algorithm, the weight of each node is calculated from the weights of its edges, and the sentences with the highest weight are extracted as the difference summary. The experimental results show that the proposed approach achieved the highest score of 0.1837 on the annotated test set, outperforming state-of-the-art summarization models.
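
The graph-ranking step described in this abstract can be sketched with a generic weighted, PageRank-style random walk. This is an illustration under stated assumptions rather than the authors' exact formulation: the damping factor, the iteration count, the toy similarity values, and the function name `rank_nodes` are all chosen for the example.

```python
def rank_nodes(weights, damping=0.85, iterations=50):
    """Score nodes of an undirected weighted graph by a damped random walk.

    `weights` maps each node to {neighbour: edge weight}; edges are
    expected to be listed symmetrically.
    """
    nodes = list(weights)
    n = len(nodes)
    score = {node: 1.0 / n for node in nodes}
    # Total edge weight leaving each node, used to normalise transitions.
    strength = {node: sum(weights[node].values()) for node in nodes}
    for _ in range(iterations):
        updated = {}
        for node in nodes:
            incoming = sum(
                score[nb] * w / strength[nb]
                for nb, w in weights[node].items()
                if strength[nb] > 0
            )
            updated[node] = (1 - damping) / n + damping * incoming
        score = updated
    return score

# Toy sentence graph: nodes are sentences, weights are similarity grades.
graph = {
    "s1": {"s2": 0.8, "s3": 0.6},
    "s2": {"s1": 0.8, "s3": 0.1},
    "s3": {"s1": 0.6, "s2": 0.1, "s4": 0.2},
    "s4": {"s3": 0.2},
}
scores = rank_nodes(graph)
top_sentence = max(scores, key=scores.get)
```

Sentences would then be taken in descending score order until the summary length limit is reached.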

Component Analysis for Constructing an Emotion Ontology (감정 온톨로지의 구축을 위한 구성요소 분석)

  • Yoon, Aesun;Kwon, Hyuk-Chul
    • Annual Conference on Human and Language Technology / 2009.10a / pp.19-24 / 2009
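  • In communication, understanding the emotions of the participants is as important as the content of the message. Although more information about emotion is conveyed by non-verbal elements, text also contains a rich variety of linguistic markers of the speaker's emotion. The aim of this study is to design an emotion ontology that can be used in human language technology. Previous work on text-based emotion processing classified emotions, compiled a list of descriptive vocabulary for each emotion, and searched for that vocabulary in text, so the accuracy of the extracted emotions was not high. By contrast, the emotion ontology proposed in this study has the following advantages. First, it classifies emotion expressions by the object of description (verbal vs. non-verbal) and by mode (expressive, descriptive, iconic), and defines correspondences among the resulting six heterogeneous categories, so it can be applied in multimodal environments. Second, it selects 24 emotion specifications so that fine-grained emotions can be classified while remaining distinct from one another, and sets intensity and polarity as attributes for classifying emotions more precisely. Third, it introduces attributes for the experiencer, the object and mode of description, and linguistic features, so that emotion expressions in text can be distinguished explicitly. The ontology was designed with extensibility in mind, so that it is not limited to Korean and can also be used for multilingual processing.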

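
The category structure described in this abstract (two description objects crossed with three modes, plus intensity and polarity attributes) could be represented roughly as follows. All class and field names, the value ranges, and the sample emotion label are assumptions made for illustration; the paper's actual ontology schema and its 24 emotion specifications are not reproduced here.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product

class Target(Enum):
    """Object of description."""
    VERBAL = "verbal"
    NON_VERBAL = "non-verbal"

class Mode(Enum):
    """Mode of expression."""
    EXPRESSIVE = "expressive"
    DESCRIPTIVE = "descriptive"
    ICONIC = "iconic"

@dataclass(frozen=True)
class EmotionExpression:
    emotion: str        # placeholder label, not one of the paper's 24
    target: Target
    mode: Mode
    intensity: float    # assumed range 0.0 to 1.0
    polarity: int       # assumed encoding: -1, 0, or +1
    experiencer: str    # who feels the emotion

# The six categories arise from crossing 2 targets with 3 modes.
CATEGORIES = list(product(Target, Mode))

example = EmotionExpression(
    emotion="joy",
    target=Target.VERBAL,
    mode=Mode.DESCRIPTIVE,
    intensity=0.7,
    polarity=1,
    experiencer="speaker",
)
```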

Relations of multilingual's L1, L2, L3 lexical processing and cerebral activation areas in fMRI (fMRI에 반영된 다중언어화자의 L1, L2, L3 어휘 정보처리 특성과 대뇌 활성화 영역의 관련성)

  • Nam Kichun;Lee Donghoon;Oh Hyun-Gum;Ryu Jaeook
    • Proceedings of the Acoustical Society of Korea Conference / spring / pp.313-316 / 2002
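  • Using functional magnetic resonance imaging (fMRI), this study examined language processing in the brain for each language in multilingual speakers of several languages, including Korean, Japanese, French, and English, and how that processing varies with fluency and age of acquisition. The results suggest that Broca's area, which is reported to play a key role in language processing, is involved in both comprehension and production, and that language production additionally activates areas associated with articulation, beyond those involved in comprehension. Looking at activation for each language by age of acquisition and fluency, higher fluency was associated with less cerebral activation, whereas in low-fluency language conditions activation in the language-processing areas increased, along with activation in the right hemisphere and the prefrontal gyrus.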


Subject Approach to Information Retrieval with Special Reference to Bengali Documents: A Critical Study

  • Halder, Sambhu Nath
    • International Journal of Knowledge Content Development & Technology / v.10 no.3 / pp.51-68 / 2020
  • A library provides its services to satisfy the user's approach, so the development of library services may naturally be determined by considering user satisfaction. This paper traces users' perceptions of subject access, highlighting problems in the retrieval of Bengali documents by subject. The study was designed to assess users' attitudes towards the retrieval of Bengali documents in OPAC through subject headings. To collect data, a representative sample was drawn from a large and heterogeneous population of users in university libraries of West Bengal using a stratified sampling technique. Within each university, the user community was stratified into students, research scholars, and faculty members, and the sample within each stratum was selected on a random basis. Users were met in person to collect relevant data when they came to the library and searched the OPAC. A structured schedule, prepared for the purpose, was presented to library users, and the interviews and interpretations were recorded systematically. In this manner, several factors were identified concerning subject searching and retrieval performance for Bengali documents. The study explores access using subject headings in multilingual information retrieval systems. Moreover, the suitability of subject headings for the retrieval of Bengali resources was ascertained from the users' point of view. The findings call for standard principles and rules for the construction of Bengali subject headings to maintain uniformity and consistency.
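
The sampling design described here, random selection within each user stratum, can be sketched as below. The stratum sizes, the 10% sampling fraction, the fixed seed, and the function name `stratified_sample` are assumptions for illustration; the abstract does not state how many users were drawn from each stratum.

```python
import random

def stratified_sample(population, fraction, seed=0):
    """Draw a simple random sample of the given fraction from every stratum."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    sample = {}
    for stratum, members in population.items():
        size = max(1, round(len(members) * fraction))
        sample[stratum] = rng.sample(members, size)
    return sample

# Toy population for one university (assumed sizes).
population = {
    "students": [f"student-{i}" for i in range(100)],
    "research scholars": [f"scholar-{i}" for i in range(30)],
    "faculty members": [f"faculty-{i}" for i in range(20)],
}
sample = stratified_sample(population, fraction=0.1)
sizes = {stratum: len(members) for stratum, members in sample.items()}
```

Sampling the same fraction from each stratum keeps the sample proportional to the population; a study that wants equal precision for small strata might instead draw a fixed number from each.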

Taboo Word Matching System Using a Common Multilingual Phoneme System (다국어 공통 음소 체계를 이용한 금기어 매칭 시스템)

  • Kim, Da-Hee;Shin, Sa-Im;Jang, Dal-Won;Lee, Jong-Seol;Jang, Sei-Jin
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2015.07a / pp.155-158 / 2015
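  • Word similarity algorithms are used in many areas, including database indexing, filtering, source code analysis software, and speech recognition. However, existing systems that compare only the written form of words cannot measure the similarity of words that sound alike or that contain typos. Measuring similarity should take into account not only the letters but also the pronunciation characteristics of the language. This paper proposes a system that measures word similarity against the taboo words of each country while taking pronunciation characteristics into account, to support multinational companies exporting products or cultural content to the global market. A total of 21,487 taboo words in 4 languages from 11 countries were used as the taboo word data. To evaluate the proposed method, we compared its performance with other algorithms and carried out a user evaluation with speakers of various languages from several countries, and confirmed that the proposed method outperforms algorithms that do not measure pronunciation similarity.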

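
The general idea of comparing words by sound rather than by spelling can be sketched by mapping each word to a rough phoneme string and then applying edit distance. The mapping table below is a hypothetical, heavily simplified stand-in for the paper's common multilingual phoneme system, and the function names are assumptions for this example.

```python
# Hypothetical letter-to-phoneme mapping: letters that often sound alike
# are collapsed onto one symbol. Not the paper's phoneme inventory.
PHONEME_MAP = {"c": "k", "q": "k", "z": "s", "v": "f", "y": "i"}

def to_phonemes(word):
    """Convert a word to a crude phoneme string."""
    word = word.lower().replace("ph", "f")
    return "".join(PHONEME_MAP.get(ch, ch) for ch in word)

def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),  # substitution or match
            ))
        prev = cur
    return prev[-1]

def phonetic_similarity(a, b):
    """Similarity in [0, 1] between the phoneme strings of two words."""
    pa, pb = to_phonemes(a), to_phonemes(b)
    longest = max(len(pa), len(pb)) or 1
    return 1 - edit_distance(pa, pb) / longest
```

A candidate product name would be flagged when its `phonetic_similarity` to any entry in the taboo list exceeds a chosen threshold; the threshold itself would need tuning on labelled data.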

The Dilemma of Language in Education Policies in Ghana and Tanzania

  • Dzahene-Quarshie, Josephine;Moshi, Lioba
    • Cross-Cultural Studies / v.36 / pp.149-173 / 2014
  • This paper examines the language policies of Ghana and Tanzania (both former British colonies) since independence. The view that language use in education is a problem for African countries is evident in the ever-changing language-in-education policies of many of them. Because of the inevitable multilingual situation in many African countries, there are unavoidable challenges in their quest to adopt a language policy that works for the entire country, since it is not practical to adopt all the languages spoken in a country as media of instruction. Ghana is not immune to this challenge and has repeatedly changed its language-in-education policy in an attempt to find a satisfactory one that yields the intended results. Tanzania, however, is one of the few African countries that have maintained a sustainable language-in-education policy since independence. Nonetheless, it has its fair share of challenges as a consequence of the perceived competition between Kiswahili and English as official languages. The paper discusses the challenges that both Ghana and Tanzania face against the background of colonization, and offers a discussion of possible future perspectives for the two countries.

An Empirical Research on Current Status and Developmental Countermeasures of Language Services Industry in China

  • Tong, Ying;Zhang, Mengze;Wu, Chanti;Bae, Ki-Hyung
    • International Journal of Contents / v.15 no.2 / pp.44-52 / 2019
  • Using the Delphi method and statistics provided by the State Administration for Market Regulation of China, this paper attempts to develop a scale index for the language services industry in China. Drawing on practical investigation and the theoretical framework of the SCP paradigm, it analyzes in depth the market behavior, market structure, and market performance of the industry. First, the study indicates that the scale index of the Chinese language services industry followed an upward trend, from 10.481 million RMB in 2008 to 351.403 million RMB in 2017. Second, the majority of language services enterprises are located in China's coastal provinces, and demand for language services varies. Third, standardization of language services is minimal, and most personnel have a single-discipline background. Fourth, most enterprises use language tools, but there is a lack of technological innovation in the use of language resources and the enhancement of service quality. The authors suggest four main strategies: cultivating multilingual services, strengthening industrial informatization and technical innovation, optimizing the talent structure of the industry, and adjusting industrial distribution and regional coordination. These offer meaningful implications for the development of the language services industry in China.