• Title/Summary/Keyword: Script Identification

An Arabic Script Recognition System

  • Alginahi, Yasser M.;Mudassar, Mohammed;Nomani Kabir, Muhammad
    • KSII Transactions on Internet and Information Systems (TIIS) / v.9 no.9 / pp.3701-3720 / 2015
  • A system for the recognition of machine-printed Arabic script is proposed. The Arabic script is shared by three languages: Arabic, Urdu and Farsi. The three languages have a decent amount of vocabulary in common, which compounds the difficulty of identification. Ideally, therefore, not only must the script be differentiated from other scripts, but the language written in that script must also be recognized. The recognition process involves separating Arabic-script documents from Latin, Han and other-script documents using horizontal and vertical projection profiles, followed by identification of the language. Identification mainly involves extracting connected components, which are subjected to a Principal Component Analysis (PCA) transformation to extract uncorrelated features. The traditional K-Nearest Neighbours (KNN) algorithm is then used for recognition. Experiments were carried out by varying the number of principal components and the number of connected components extracted per document to find the combination that gives the best accuracy. An accuracy of 100% is achieved with 18 or more connected components and 15 principal components. The proposed system could play a vital role in the automatic archiving of multilingual documents and in the selection of the appropriate Arabic script in multilingual Optical Character Recognition (OCR) systems.
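
The abstract names horizontal and vertical projection profiles as the tool for separating scripts but gives no implementation details. As a minimal sketch, assuming a binarized page stored as a list of rows (1 for ink, 0 for background), the two profiles could be computed as below. The function name `projection_profiles` and the toy page are illustrative assumptions, not taken from the paper.

```python
def projection_profiles(image):
    """Return (horizontal, vertical) projection profiles of a binary image.

    `image` is a list of equal-length rows; 1 marks an ink pixel, 0 background.
    This representation is an assumption made for the sketch.
    """
    horizontal = [sum(row) for row in image]       # ink count per row
    vertical = [sum(col) for col in zip(*image)]   # ink count per column
    return horizontal, vertical


# Toy 4x5 "page" (hypothetical data): one text line with a gap in the middle.
page = [
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
h_profile, v_profile = projection_profiles(page)
print(h_profile)  # [0, 4, 3, 0]
print(v_profile)  # [2, 2, 0, 2, 1]
```

Zero runs in the horizontal profile mark gaps between text lines, and zero runs in the vertical profile mark gaps between components; how the paper turns such profiles into a script decision is not stated in the abstract.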

Fuzzy-Membership Based Writer Identification from Handwritten Devnagari Script

  • Kumar, Rajiv;Ravulakollu, Kiran Kumar;Bhat, Rajesh
    • Journal of Information Processing Systems / v.13 no.4 / pp.893-913 / 2017
  • Handwriting-based person identification systems typically use structural properties of handwriting, as perceived by their designers, as features. In this paper, we present a system whose features are the structural properties that graphologists and expert handwriting analysts use to determine a writer's personality traits and to make other assessments. The advantage of these features is that their definition rests on sound historical knowledge (i.e., the knowledge discovered by graphologists, psychiatrists, forensic experts, and experts in other domains in analyzing the relationships between handwritten stroke characteristics and the phenomena that embed individuality in a stroke). Hence, each stroke characteristic reflects a personality trait. We measured the effectiveness of these features on a subset of the handwritten Devnagari and Latin script datasets from the Center for Pattern Analysis and Recognition (CPAR-2012), written by 100 people, each of whom wrote three samples of the Devnagari and Latin text that we designed for our experiments. The experiment yielded 100% correct identification on the training set. However, with 200 training samples and 100 test samples, we observed correct identification rates of 88% and 89% on handwritten Devnagari and Latin text, respectively. Introducing majority-voting-based rejection criteria increased the identification accuracy to 97% on both script sets.
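
The abstract does not spell out its majority-voting-based rejection criteria. One plausible reading, sketched below under that assumption, is that several individual decisions vote on a writer label, and the sample is rejected when no label wins clearly. The function name, the `min_share` threshold, and the writer labels are all hypothetical.

```python
from collections import Counter


def majority_vote_with_rejection(votes, min_share=0.5):
    """Return the winning writer label, or None to reject the sample.

    A label wins only if it is the unique top choice and its share of the
    votes exceeds `min_share` (a hypothetical threshold, not from the paper).
    """
    if not votes:
        return None
    (label, count), *rest = Counter(votes).most_common(2)
    tied = bool(rest) and rest[0][1] == count
    if tied or count / len(votes) <= min_share:
        return None
    return label


print(majority_vote_with_rejection(["w7", "w7", "w3"]))        # w7
print(majority_vote_with_rejection(["w7", "w3"]))              # None (tie)
print(majority_vote_with_rejection(["w1", "w2", "w3", "w1"]))  # None (share is only 0.5)
```

Rejecting ambiguous samples trades coverage for accuracy, which is consistent with the higher accuracy the abstract reports once rejection is introduced.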

Principle and Algorithm of Cloth Covering and Application to Script Identification (천 커버링의 원리와 알고리즘 그리고 언어 식별에 응용)

  • Kim, Min-Woo;Oh, Il-Seok
    • The Journal of the Korea Contents Association / v.12 no.3 / pp.67-76 / 2012
  • This paper proposes the concept and an algorithm of cloth covering, a physically based model that computationally simulates the shape of a cloth covering some objects. The goal of cloth covering is to conceal the details of an object and reveal only the outline of its shape. It has one scale parameter, which controls the degree to which fine-scale structures are suppressed. To show the viability of the proposed cloth covering, a script recognition experiment was performed. A comparison of the accuracies of feature extraction using Gaussian and cloth covering showed that cloth covering is superior to Gaussian. We discuss the reason for the superiority.

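Development of a Schematic Representation Technique for Hydrogeologic Maps Using ArcView and the Avenue™ Language (ArcView와 Avenue™ Language를 활용한 수문지질도 도식 표현 기법 개발)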

  • 김규범;조민조;이장룡
    • Proceedings of the Korean Society of Soil and Groundwater Environment Conference / 2000.11a / pp.31-35 / 2000
  • Every year we investigate the groundwater distribution and chemical characteristics of 3 or 5 districts and produce hydrogeologic maps at a scale of 1:50,000. The digital hydrogeologic maps are drawn according to "The Handbook for the Drawing and Management of Hydrogeologic Map", published by MOCT and KOWACO in 1998. However, Stiff diagrams and well notations are difficult to present on the digital map using the commercial ArcView GIS tools. We therefore developed a script file in the Avenue language to represent them in ArcView GIS. First, we designed a database for the groundwater chemical analysis results and well identification, and wrote program code in the Avenue language to display them on the digital map. Next, we tested the usefulness of the program code. We found the script file to be very useful for drawing symbols and diagrams on digital hydrogeologic maps in ArcView GIS.

A Methodology for Urdu Word Segmentation using Ligature and Word Probabilities

  • Khan, Yunus;Nagar, Chetan;Kaushal, Devendra S.
    • International Journal of Ocean System Engineering / v.2 no.1 / pp.24-31 / 2012
  • This paper introduces a word segmentation technique for the recognition of handwritten Urdu script. Word segmentation, or word tokenization, is a primary step in understanding sentences written in Urdu. Several word segmentation techniques are available for other languages, but little work has been done on word segmentation for Urdu Optical Character Recognition (OCR) systems. The method proposed in this paper finds the boundaries of words in a sequence of ligatures using probabilistic formulas that exploit knowledge of the collocation of ligatures and words in the corpus. The word identification rate of this technique is 97.10%, with an identification rate of 66.63% for unknown words.
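
The abstract does not give its probabilistic formulas. As a simplified illustration of the general idea, the sketch below picks the most probable split of a ligature sequence by dynamic programming over word probabilities alone. It ignores the ligature and word collocation statistics the paper mentions, and the lexicon, probabilities, ASCII placeholder ligatures, `unknown_prob`, and `max_len` are all hypothetical.

```python
import math


def segment(ligatures, word_prob, unknown_prob=1e-6, max_len=4):
    """Split a ligature sequence into its most probable word sequence.

    best[i] holds (log-probability, segmentation) for the first i ligatures.
    Words are tuples of ligatures; `word_prob` maps them to probabilities.
    """
    n = len(ligatures)
    best = [(0.0, [])] + [(-math.inf, [])] * n
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = tuple(ligatures[start:end])
            # Assumption: unseen candidates get a small probability that
            # shrinks with length, so long unknown words are discouraged.
            p = word_prob.get(word, unknown_prob ** len(word))
            score = best[start][0] + math.log(p)
            if score > best[end][0]:
                best[end] = (score, best[start][1] + [word])
    return best[n][1]


# Hypothetical lexicon over placeholder ligatures "a".."d".
lexicon = {
    ("a", "b"): 0.3, ("c", "d"): 0.25,
    ("a",): 0.05, ("b",): 0.05, ("c",): 0.2, ("d",): 0.1,
}
print(segment(["a", "b", "c", "d"], lexicon))  # [('a', 'b'), ('c', 'd')]
print(segment(["a", "b", "x"], lexicon))       # [('a', 'b'), ('x',)]
```

The second call shows how an unknown ligature "x" still yields a segmentation, which loosely mirrors the separate unknown-word rate reported in the abstract.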

A Study on Considerations in the Authority Control to Accommodate LRM Nomen (LRM 노멘을 수용하기 위한 전거제어시 고려사항에 관한 연구)

  • Lee, Mihwa
    • Journal of Korean Library and Information Science Society / v.52 no.1 / pp.109-128 / 2021
  • This paper explores considerations in authority control for accommodating LRM nomen entities through a literature review, an analysis of RDA rules, and an opinion survey of domestic catalog experts. As a result, considerations for authority control were proposed in terms of the nomen's attribute elements, catalog description, and the MARC authority format. First, the attributes of the nomen (the category, the scheme, the intended audience, the context of use, the reference source, the language, the script, and the script conversion) should be described in as much detail as possible, together with the status of identification, note, and undifferentiated name indicators added in RDA. Second, the attribute elements and relationship elements of the nomen can be recorded as unstructured descriptions, structured descriptions, identifiers, or IRIs, as suggested in RDA, and a vocabulary encoding scheme (VES) and a string encoding scheme (SES) should be specified for structured descriptions. Cataloging rules for structuring authorized access points and preferred names/titles should also be established. Third, an additional expansion plan based on Maxwell's expansion (draft) was proposed to prepare the MARC 21 authority format to reflect the LRM nomen. (1) The attributes must be described in 4XX and 5XX so that they can be entered for each nomen, and the attributes of the nomen to be described in 1XX, 5XX and 4XX are presented separately. (2) To describe the nomen category, language, script, script conversion, context of use, and date of usage as nomen attributes, fields and subfields must be added to MARC 21. Accordingly, it was proposed to expand the subfields of 368, 381, and 377, and to add fields to describe the context of use and date of usage. The considerations proposed in this paper will serve as a basis for establishing an authority control plan that reflects LRM in Korea.

An acoustical analysis method of numeric sounds by Praat (Praat를 이용한 숫자음의 음향적 분석법)

  • Yang, Byung-Gon
    • Speech Sciences / v.7 no.2 / pp.127-137 / 2000
  • This paper presents a macro script for analyzing numeric sounds with the speech analysis shareware Praat, and analyzes such sounds produced by three students who were born and raised in Pusan. Recording was done in a quiet office. To allow a meaningful comparison, acoustical values were measured at dynamic time points defined relative to the total duration of the voiced segments. Results showed strong correlations among repeated productions of the numeric sounds, both within and across speakers. Very high coefficients were observed for the diphthongal numbers (0 and 6), which usually show wide formant variation. This suggests that each speaker produced the numbers quite consistently. Also, the frequency differences among the three subjects were within a perceptually similar range. Identifying a speaker among others may require finding subtle individual differences within this range. Perceptual experiments with synthesized numeric sounds may help resolve the issue.

Formant Trajectories of English Vowels Produced by American Males (미국인 남성이 발음한 영어 모음의 포먼트 궤적)

  • Yang, Byung-Gon
    • Phonetics and Speech Sciences / v.1 no.3 / pp.65-72 / 2009
  • Formant values are the most important acoustic correlates of English vowels. Classical studies of English vowels reported the first three formant values measured at a single timepoint on a sustained vowel segment. However, many recent studies have revealed that partial onset or offset segments carrying information about dynamic spectral changes may contribute to the identification of English vowels with an accuracy almost comparable to that obtained from the whole vowel segment or word. The purpose of this study was to examine the formant trajectories of nine English vowels collected by Hillenbrand et al. (1995). Acoustic analysis was carried out systematically with a Praat script at six equidistant timepoints over the vowel segment. Results showed that the first formant trajectories played an important role in distinguishing each vowel within the front- or back-vowel groups. The second formant trajectories of the back vowels varied more drastically than those of the front vowels. The third formant values were similar except for the high vowel /i/. In the vowel space on the F1 by F2 axes, the formant trajectory of each vowel clearly showed a transition toward the locus of the following consonant /d/. Other acoustic data revealed some vowel-inherent duration or pitch values. From this study we can conclude that dynamic spectral changes are very important in specifying the acoustic characteristics of English vowels. Further studies on vowels and diphthongs in different contexts are desirable.
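
The study measured formants with a Praat script, which is not reproduced in the abstract. The Python sketch below only illustrates how six equidistant timepoints over a vowel segment could be chosen. Whether the original script included the segment endpoints is not stated, so including them here is an assumption, and the function name and example times are hypothetical.

```python
def equidistant_timepoints(start, end, n=6):
    """Return n equally spaced times from start to end, endpoints included.

    Including both endpoints is an assumption made for this sketch.
    """
    if n < 2:
        raise ValueError("need at least two timepoints")
    step = (end - start) / (n - 1)
    return [start + i * step for i in range(n)]


# Hypothetical vowel segment from 0.10 s to 0.35 s.
times = equidistant_timepoints(0.10, 0.35)
print([round(t, 3) for t in times])
```

A formant tracker would then be queried at each of these times to build the F1, F2 and F3 trajectories the abstract discusses.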

Development of Feedback Data Automated Verification Program for Mission S/W (임무 S/W 시험을 위한 피드백 데이터의 기댓값 검증 자동화 도구 개발)

  • Kwon, GI-Bong;Lee, Ha-Yoeun;Ha, Seok-Wun
    • Journal of the Korean Society for Aeronautical & Space Sciences / v.49 no.10 / pp.871-877 / 2021
  • Aircraft defects are important matters directly related to the operation of the aircraft and the life of the pilot. Defects in the mission software that occur during aircraft control seriously affect the pilot's mission performance and safety. Organizations in charge of aircraft development therefore reinforce their processes to identify and eliminate software defects in the early stages of development, and spend a great deal of labor and time doing so. However, because mission software is strongly coupled functionally with other avionics and is highly complex, existing test methods are limited in their ability to identify and remove software defects. This study develops a tool that automates verification of the expected values of feedback data in the communication data of equipment interfaced with the mission computer, and analyzes its effect on securing mission software integrity and reducing test cost through data integrity verification.

Visualization and Localization of Fusion Image Using VRML for Three-dimensional Modeling of Epileptic Seizure Focus (VRML을 이용한 융합 영상에서 간질환자 발작 진원지의 3차원적 가시화와 위치 측정 구현)

  • 이상호;김동현;유선국;정해조;윤미진;손혜경;강원석;이종두;김희중
    • Progress in Medical Physics / v.14 no.1 / pp.34-42 / 2003
  • In medical imaging, three-dimensional (3D) display using Virtual Reality Modeling Language (VRML) as a portable file format can convey intuitive information more efficiently on the World Wide Web (WWW). Web-based 3D visualization of functional images combined with anatomical images has not been studied much in a systematic way. The goal of this study was to achieve simultaneous observation of 3D anatomic and functional models with planar images on the WWW, providing their locational information in 3D space with a measuring implement built in VRML. MRI and ictal-interictal SPECT images were obtained from one epileptic patient. Subtraction ictal SPECT co-registered to MRI (SISCOM) was performed to improve identification of the seizure focus. SISCOM image volumes were thresholded above one standard deviation (1-SD) and two standard deviations (2-SD). SISCOM foci and the boundaries of gray matter, white matter, and cerebrospinal fluid (CSF) in the MRI volume were segmented and rendered as VRML polygonal surfaces by the marching cubes algorithm. Line profiles along the x- and y-axes, which represent real lengths on an image, were acquired, and their maximum lengths were both 211.67 mm. The ratio of the real size to the rendered VRML surface size was approximately 1 to 605.9. A VRML measuring tool was created and merged with the previous VRML surfaces. User interface tools were embedded with JavaScript routines to display MRI planar images as cross sections of the 3D surface models and to set the transparencies of the 3D surface models. When the transparencies of the 3D surface models were properly controlled, a fused display of the brain geometry with the 3D distributions of focal activated regions intuitively conveyed the spatial correlations among the three 3D surface models. The epileptic seizure focus was in the right temporal lobe of the brain. The real position of the seizure focus could be verified with the VRML measuring tool, and the anatomy corresponding to the seizure focus could be confirmed with MRI planar images crossing the 3D surface models. The VRML application developed in this study may have several advantages. Firstly, 3D fused display and control of anatomic and functional images were achieved on the WWW. Secondly, the vector analysis of a 3D surface model was defined by the VRML measuring tool based on the real size. Finally, the anatomy corresponding to the seizure focus was intuitively detected through correlations with MRI images. Our web-based visualization of 3D fusion images and its localization will be of help to online research and education in diagnostic radiology, therapeutic radiology, and surgery applications.
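
The abstract says the SISCOM volumes were thresholded at 1-SD and 2-SD without stating the reference level. A minimal sketch, assuming the cutoff is the mean of the subtraction values plus k standard deviations and that the volume is flattened to a list, could look like this. The function name and the toy values are hypothetical.

```python
from statistics import mean, pstdev


def threshold_by_sd(values, k):
    """Keep values more than k standard deviations above the mean; zero the rest.

    Measuring the cutoff from the mean (rather than from zero) is an
    assumption made for this sketch.
    """
    cutoff = mean(values) + k * pstdev(values)
    return [v if v > cutoff else 0 for v in values]


# Toy flattened subtraction volume (hypothetical): mean 2, SD 4.
voxels = [0, 0, 0, 0, 10]
print(threshold_by_sd(voxels, 1))  # [0, 0, 0, 0, 10]
print(threshold_by_sd(voxels, 2))  # [0, 0, 0, 0, 0]
```

The stricter 2-SD threshold keeps fewer voxels, so it yields smaller and more conservative candidate foci than the 1-SD threshold.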
