• Title/Summary/Keyword: Hindi


Korean Speakers' Perception of Hindi Stop Consonants (한국인의 힌디어 폐쇄음 인식)

  • Ahn, Hyun-Kee
    • Phonetics and Speech Sciences / v.1 no.3 / pp.57-63 / 2009
  • The two specific research questions pursued in this paper are: (i) how Korean speakers perceive Hindi stops in terms of the three laryngeal categories of Korean stops; (ii) how well Korean speakers do with an ABX perception test that utilizes a total of 52 Hindi minimal pairs where all sounds are identical except for the laryngeal features of a stop in each word. A total of 45 university students participated in this experiment. The results showed that (i) Koreans tended to perceive Hindi voiceless unaspirated stops as Korean fortis ones, voiceless aspirated stops as aspirated ones, voiced stops as lenis ones, and breathy stops as aspirated ones, and (ii) Koreans had difficulty in distinguishing between voiceless aspirated and breathy stops in Hindi.


Hindi Correspondence of Bengali Nominal Suffixes

  • Chatterji, Sanjay
    • Journal of Multimedia Information System / v.8 no.4 / pp.221-232 / 2021
  • One bottleneck in Bengali-to-Hindi transfer-based machine translation systems is the translation of nominal suffixes. The appropriate translation of a nominal suffix often depends on the semantic role of the corresponding noun chunk in the sentence. With the availability of a high-performance Bengali morphological analyzer and a basic Bengali parser, it is possible to identify the role of each noun chunk. This information may be used to build rules for translating the ambiguous nominal suffixes. As there are some similarities between the uses of Bengali and Hindi nominal suffixes, we find that the rules may be identified by linguistically analyzing corpus data. In this paper, we identify rules for four ambiguous Bengali nominal suffixes from corpus data and evaluate their performance. This set of rules is able to resolve a majority of the nominal suffix ambiguities in Bengali-to-Hindi transfer-based machine translation. Using the rules, we are able to translate 98.17% of Bengali nouns correctly, which is much better than the baseline ILMT system's accuracy of 62.8%.

Part-of-speech Tagging for Hindi Corpus in Poor Resource Scenario

  • Modi, Deepa;Nain, Neeta;Nehra, Maninder
    • Journal of Multimedia Information System / v.5 no.3 / pp.147-154 / 2018
  • Natural language processing (NLP) is an emerging research area in which we study how machines can be used to perceive and alter text written in natural languages. We can perform different tasks on natural languages by analyzing them through various annotation tasks such as parsing, chunking, part-of-speech tagging, and lexical analysis. These annotation tasks depend on the morphological structure of a particular natural language. The focus of this work is part-of-speech tagging (POS tagging) for the Hindi language. Part-of-speech tagging, also known as grammatical tagging, is the process of assigning a grammatical category to each word of a given text. These grammatical categories can be noun, verb, time, date, number, etc. Hindi is the most widely used and official language of India. It is also among the top five most spoken languages of the world. For English and other languages, a diverse range of POS taggers is available, but these POS taggers cannot be applied to Hindi, as Hindi is one of the most morphologically rich languages. Furthermore, there is a significant difference between the morphological structures of these languages. Thus, in this work, a POS tagger system is presented for the Hindi language. For Hindi POS tagging, a hybrid approach is presented that combines probability-based and rule-based approaches. For tagging known words, a unigram probability model is used, whereas for tagging unknown words, various lexical and contextual features are used. Various finite-state automata are constructed to demonstrate the different rules, and regular expressions are then used to implement these rules. A tagset is also prepared for this task, which contains 29 standard part-of-speech tags. The tagset also includes two unique tags, i.e., a date tag and a time tag. These date and time tags support all possible formats. Regular expressions are used to implement all pattern-based tags such as time, date, number, and special symbols. The aim of the presented approach is to increase the correctness of automatic Hindi POS tagging while limiting the need for a large human-made corpus. This hybrid approach uses a probability-based model to increase automatic tagging and a rule-based model to limit the need for an already tagged corpus. The approach is based on a very small labeled training set (around 9,000 words) and yields a best precision of 96.54% and an average precision of 95.08%. It also yields a best accuracy of 91.39% and an average accuracy of 88.15%.
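The hybrid scheme described in this abstract (pattern-based tags via regular expressions, a unigram model for known words, and a fallback for unknown words) might be sketched roughly as follows. This is only an illustration under stated assumptions: the tag names, the regular expressions, the order in which the rules are tried, and the `UNK` fallback are all hypothetical, since the abstract does not give the paper's actual rules or tagset.

```python
import re
from collections import Counter, defaultdict

# Hypothetical pattern rules; the paper's real expressions are not given.
PATTERN_RULES = [
    ("DATE", re.compile(r"^\d{1,2}[/-]\d{1,2}[/-]\d{2,4}$")),
    ("TIME", re.compile(r"^\d{1,2}:\d{2}(:\d{2})?$")),
    ("NUM", re.compile(r"^\d+([.,]\d+)*$")),
    ("SYM", re.compile(r"^[^\w\s]+$")),
]


def train_unigram(tagged_words):
    """Map each known word to its most frequent tag in the training data."""
    counts = defaultdict(Counter)
    for word, tag in tagged_words:
        counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def tag_tokens(tokens, model, default="UNK"):
    """Try the pattern rules first, then the unigram model, then a default."""
    tagged = []
    for token in tokens:
        for name, pattern in PATTERN_RULES:
            if pattern.match(token):
                tagged.append((token, name))
                break
        else:
            # No pattern matched: use the unigram tag if the word is known.
            tagged.append((token, model.get(token, default)))
    return tagged


# Tiny made-up training set (assumed tag labels, not the paper's tagset).
model = train_unigram([("राम", "NN"), ("जाता", "VM"), ("है", "VAUX"), ("राम", "NN")])
print(tag_tokens(["राम", "12/05/2018", "10:30", "42", "घर"], model))
```

A real system would replace the `UNK` fallback with the lexical and contextual features the abstract mentions for unknown words.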

Optical Character Recognition for Hindi Language Using a Neural-network Approach

  • Yadav, Divakar;Sanchez-Cuadrado, Sonia;Morato, Jorge
    • Journal of Information Processing Systems / v.9 no.1 / pp.117-140 / 2013
  • Hindi is the most widely spoken language in India, with more than 300 million speakers. As there is no separation between the characters of texts written in Hindi as there is in English, the Optical Character Recognition (OCR) systems developed for the Hindi language have a very poor recognition rate. In this paper we propose an OCR system for printed Hindi text in Devanagari script, using an Artificial Neural Network (ANN), which improves its efficiency. One of the major reasons for the poor recognition rate is error in character segmentation. The presence of touching characters in the scanned documents further complicates the segmentation process, creating a major problem when designing an effective character segmentation technique. Preprocessing, character segmentation, feature extraction, and finally, classification and recognition are the major steps followed by a general OCR system. The preprocessing tasks considered in the paper are conversion of grayscale images to binary images, image rectification, and segmentation of the document's textual contents into paragraphs, lines, words, and then basic symbols. The basic symbols, obtained as the fundamental unit from the segmentation process, are recognized by the neural classifier. In this work, three feature extraction techniques (histogram of projection based on mean distance, histogram of projection based on pixel value, and vertical zero crossing) have been used to improve the rate of recognition. These feature extraction techniques are powerful enough to extract features of even distorted characters/symbols. For development of the neural classifier, a back-propagation neural network with two hidden layers is used. The classifier is trained and tested on printed Hindi texts. A correct recognition rate of approximately 90% is achieved.
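Two of the features named in this abstract can be illustrated on a tiny binary glyph. The definitions below are assumptions: a pixel-count projection per column and a count of 0/1 transitions per column are common readings of "projection histogram" and "vertical zero crossing", but the abstract does not spell out the paper's exact formulations, and the mean-distance variant is not attempted here.

```python
def column_projection(glyph):
    """Count foreground (1) pixels in each column of a binary glyph."""
    return [sum(column) for column in zip(*glyph)]


def vertical_zero_crossings(glyph):
    """Count 0<->1 transitions while scanning each column top to bottom."""
    crossings = []
    for column in zip(*glyph):
        changes = sum(1 for a, b in zip(column, column[1:]) if a != b)
        crossings.append(changes)
    return crossings


# Made-up 4x3 binary symbol (rows of 0/1 pixels), for illustration only.
glyph = [
    [1, 1, 1],
    [0, 1, 0],
    [0, 1, 0],
    [1, 1, 0],
]
features = column_projection(glyph) + vertical_zero_crossings(glyph)
print(features)
```

Concatenating such per-column values gives a fixed-length vector that could be fed to a classifier such as the back-propagation network the abstract describes.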

Automatic extraction of similar poetry for study of literary texts: An experiment on Hindi poetry

  • Prakash, Amit;Singh, Niraj Kumar;Saha, Sujan Kumar
    • ETRI Journal / v.44 no.3 / pp.413-425 / 2022
  • The study of literary texts is one of the earliest disciplines practiced around the globe. Poetry is artistic writing in which words are carefully chosen and arranged for their meaning, sound, and rhythm. Poetry usually has a broad and profound sense that makes it difficult even for humans to interpret. The essence of poetry is Rasa, which signifies mood or emotion. In this paper, we propose a poetry classification-based approach to automatically extract similar poems from a repository. Specifically, we perform a novel Rasa-based classification of Hindi poetry. For the task, we primarily used lexical features in a bag-of-words model trained using the support vector machine classifier. In the model, we employed Hindi WordNet, Latent Semantic Indexing, and Word2Vec-based neural word embedding. To extract the rich feature vectors, we prepared a repository containing 37,717 poems collected from various sources. We evaluated the performance of the system on a manually constructed dataset containing 945 Hindi poems. Experimental results demonstrated that the proposed model attained satisfactory performance.

Hindi version of short form of douleur neuropathique 4 (S-DN4) questionnaire for assessment of neuropathic pain component: a cross-cultural validation study

  • Gudala, Kapil;Ghai, Babita;Bansal, Dipika
    • The Korean Journal of Pain / v.30 no.3 / pp.197-206 / 2017
  • Background: Pain with neuropathic characteristics is generally more severe and associated with a lower quality of life compared to nociceptive pain (NcP). The short form of the Douleur Neuropathique en 4 Questions (S-DN4) is one of the most used and reliable screening questionnaires and is reported to have good diagnostic properties. This study aimed to cross-culturally validate the Hindi version of the S-DN4 in patients with various chronic pain conditions. Methods: The S-DN4 has already been translated into the Hindi language by Mapi Research Trust. This study assessed the psychometric properties of the Hindi version of the S-DN4, including internal consistency and test-retest reliability 3 days after the baseline assessment. Diagnostic performance was also assessed. Results: One hundred sixty patients with chronic pain, 80 each in the neuropathic pain (NeP) present and NeP absent groups, were recruited. Patients with NeP present reported significantly higher S-DN4 scores in comparison to patients in the NeP absent group (mean (SD), 4.7 (1.7) vs. 1.8 (1.6), P < 0.01). The S-DN4 was found to have an AUC of 0.88 with adequate internal consistency (Cronbach's α = 0.80) and test-retest reliability (ICC = 0.92), with an optimal cut-off value of 3 (Youden's index = 0.66, sensitivity and specificity of 88.7% and 77.5%). The diagnostic concordance rate between clinician diagnosis and the S-DN4 questionnaire was 83.1% (kappa = 0.66). Conclusions: Overall, the Hindi version of the S-DN4 has good internal consistency and test-retest reliability along with good diagnostic accuracy.
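As a quick check on the figures reported in this abstract, Youden's index is conventionally defined as sensitivity plus specificity minus one. The short sketch below applies that standard definition to the reported sensitivity and specificity; the function name is illustrative and nothing here is taken from the paper's own analysis code.

```python
def youden_index(sensitivity, specificity):
    """Youden's J statistic: sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


# Sensitivity and specificity as reported in the abstract (88.7%, 77.5%).
j = youden_index(0.887, 0.775)
print(round(j, 2))
```

Rounded to two decimals, this agrees with the Youden's index of 0.66 quoted for the cut-off value of 3.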

An Artificial Intelligence Approach for Word Semantic Similarity Measure of Hindi Language

  • Younas, Farah;Nadir, Jumana;Usman, Muhammad;Khan, Muhammad Attique;Khan, Sajid Ali;Kadry, Seifedine;Nam, Yunyoung
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.6 / pp.2049-2068 / 2021
  • AI combined with NLP techniques has promoted the use of Virtual Assistants and has made people rely on them for many diverse uses. Conversational Agents are the most promising technique for assisting computer users in their operations. An important challenge in developing Conversational Agents globally is transferring the groundbreaking expertise obtained in English to other languages. AI is making it possible to transfer this learning. There is a dire need to develop systems that understand secular languages. One such difficult language is Hindi, which is the fourth most spoken language in the world. Semantic similarity is an important part of Natural Language Processing, with applications such as ontology learning and information extraction, for developing conversational agents. Most of the research is concentrated on English and other European languages. This paper presents a corpus-based word semantic similarity measure for Hindi. An experiment involving the translation of the English benchmark dataset to Hindi is performed, investigating the incorporation of the corpus, with human and machine similarity ratings. A significant correlation between human intuition and the algorithm ratings has been calculated to analyze the accuracy of the proposed similarity measures. The method can be adapted to various applications of word semantic similarity or used as a module for any other language.

The Role of Contrast in Prosodically Induced Acoustic Variation

  • Choi, Han-Sook
    • Phonetics and Speech Sciences / v.1 no.3 / pp.29-37 / 2009
  • This paper presents results from speech production experiments on English, Korean, and Hindi that compare variation in the acoustic expression of dissimilar phonological laryngeal contrasts in stops conditioned by prosodic prominence. Target stops are analyzed in utterance-initial, -medial, and -final positions, with a variation in contrastive focal accent, in speech data from six male American English speakers, five male Seoul Korean speakers, and five male Delhi Hindi speakers. The results show that prosodic prominence conditions enhanced distinctiveness between contrastive segments in the three languages. The manner in which prosodic prominence and prosodic phrase structure are marked at the level of segmental variation is, however, found to be language-specific to some extent. In addition, a correlation between the size of the phonological inventory and the corresponding acoustic variation was found, but a linear correlation was not strongly supported by the findings of the present study.


A Text Processing Method for Devanagari Scripts in Android (안드로이드에서 힌디어 텍스트 처리 방법)

  • Kim, Jae-Hyeok;Maeng, Seung-Ryol
    • The Journal of the Korea Contents Association / v.11 no.12 / pp.560-569 / 2011
  • In this paper, we propose a text processing method for Hindi characters, written in the Devanagari script, on Android. The key points of the text processing are to devise automata, which define the rules for combining letters into syllables, and to implement a font rendering engine, which retrieves and displays the glyph images corresponding to specific characters. In general, an automaton depends on the type and the number of characters. For the soft keyboard, we designed the automata with 14 consonants and 34 vowels based on Unicode. Finally, a combined syllable is converted, using the mapping table, into a glyph index that serves as a handle to load its glyph image. Following the multilingual framework of the FreeType font engine, Devanagari scripts can be supported at the system level by appending the implementation of our method to the font engine as the Hindi module. The proposed method is verified through a simple message system.
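The idea of an automaton that combines Unicode Devanagari letters into syllables can be sketched with a regular expression, which is itself a finite automaton. The rule below is a deliberately simplified assumption (conjunct consonants joined by virama, one optional vowel sign, one optional modifier); it ignores nukta and other cases and is not the automaton designed in the paper.

```python
import re

# Unicode ranges from the Devanagari block (simplified; nukta etc. ignored).
CONSONANT = "\u0915-\u0939"    # KA .. HA
INDEP_VOWEL = "\u0904-\u0914"  # independent vowels
VOWEL_SIGN = "\u093E-\u094C"   # dependent vowel signs (matras)
VIRAMA = "\u094D"              # suppresses the inherent vowel
MODIFIER = "\u0901-\u0903"     # candrabindu, anusvara, visarga

# Assumed rule: (consonant + virama)* consonant [virama | matra] [modifier],
# or an independent vowel followed by an optional modifier.
SYLLABLE = re.compile(
    f"(?:[{CONSONANT}]{VIRAMA})*[{CONSONANT}](?:{VIRAMA}|[{VOWEL_SIGN}])?[{MODIFIER}]?"
    f"|[{INDEP_VOWEL}][{MODIFIER}]?"
)


def split_syllables(text):
    """Return the syllable clusters found in a Devanagari string."""
    return SYLLABLE.findall(text)


print(split_syllables("हिन्दी"))
```

In a renderer, each resulting cluster would then be looked up in a mapping table to obtain a glyph index, as the abstract outlines.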

The Interpretation of the Indian Stupa as Origin of Korean Pagoda (탑의 원조 인도 스투파의 형태 해석 - 인도 전역의 현장 답사를 바탕으로 -)

  • Lee, Hee-Bong
    • Journal of architectural history / v.18 no.6 / pp.103-126 / 2009
  • This study aims to discover historical trends and changes of form across all stupas in India through field observation that is as direct as possible, by classifying, analyzing, and synthesizing the stupas. The study of Indian stupas in Korea has a number of shortcomings, since only an introductory, partial approach has been taken in order to seek the origin of the Korean pagoda. This study also aims to correct errors in the Chinese-character stupa terminology that arose from misinterpretation of the Hindi language and was established by earlier Japanese scholars several decades ago. Piled-up stupas were totally destroyed by pagans; therefore, their remains tell us only of structure, material, size, and disposition. However, remains of carved stone at the torana and drum give us clues as to the original form of the stupa and worshipping activity, as well as the change to a more luxurious form. Many rock cave stupas of India show us both simple forms matching the ascetic age of early Buddhism and luxurious changes in the Mahayana era, introducing us to statues of Buddha. Indians recovered the spherical form of 'anda,' a Hindi term meaning cosmic egg, from the hemispheric form of the piled-up stupa. Therefore we might discard the erroneous term 'bokbal', which means an overturned vessel. Railings and parasols became main factors of stupa design. Carved railings around the stupa became a sign of divinity. Serious worshipping activity made drums long or high and created multi-embossed stripes. The bases of the circular drums of some cave stupas changed their shapes to rectangular or octagonal. Single parasols became multi-parasols with affluent, flowerlike curved stems on carved stupas. The multistoried, elongated, and high parasols of Gandhara stupas are closely related to such factors as diverse changes of form in the Indian subcontinent. The four-sided torana gate and ayaka column of the circular form of original stupas suggest the rectangular form of the subsequent East Asian pagoda, and the higher and wider base of Indian stupas became the origin of the East Asian rectangular pagoda.
