• Title/Summary/Keyword: Text line information

On-Line Linear Combination of Classifiers Based on Incremental Information in Speaker Verification

  • Huenupan, Fernando;Yoma, Nestor Becerra;Garreton, Claudio;Molina, Carlos
    • ETRI Journal / v.32 no.3 / pp.395-405 / 2010
  • A novel multiclassifier system (MCS) strategy is proposed and applied to a text-dependent speaker verification task. The presented scheme optimizes the linear combination of classifiers on an on-line basis. In contrast to ordinary MCS approaches, neither a priori distributions nor pre-tuned parameters are required. The idea is to improve the most accurate classifier by making use of the incremental information provided by the second classifier. The on-line multiclassifier optimization approach is applicable to any pattern recognition problem. The proposed method needs neither a priori distributions nor pre-estimated weights, and makes no assumptions about matching between training and testing conditions. Results with the YOHO database show that the presented approach can lead to reductions in equal error rate as high as 28% when compared with the most accurate classifier, and 11% against a standard method for optimizing the linear combination of classifiers.

A Study on the online of PDF Electronic Documents System (인터넷 원거리출판의 응용과 PDF의 인쇄활용에 관한 연구)

  • 유영수;강영립;김병현;이광수
    • Proceedings of the Korean Printing Society Conference / 2001.06a / pp.63-77 / 2001
  • PDF (Portable Document Format) is a file format that Adobe developed by advancing its PostScript technology, and it is used for managing document information and for electronic publishing (Internet, CD-ROM, DVD). PDF is a document type designed to be read and printed anywhere, independent of the OS, printer type, resolution, and kind of computer. Because it includes a compression function, documents can be transferred as small files over the Internet or an intranet. The format also has various advantages for sharing information and transferring documents in on-line and off-line environments. In this paper, we developed an electronic document system using the PDF format. The system consists of a filter, automatic indexing, a special search system, and a web server. The information used in this paper is a database built with Zwon's DocuCom. The filter recognizes various kinds of document structure and, according to the properties of the document, produces ASCII output. In addition to processing various document formats, the filter can extract keywords from MS Word, Excel, PowerPoint, PDF, and CAD documents, among others. The filter uses the structure of the Windows printer driver and can extract text, page, font type, and font size information from the relevant document. The automatic indexing recognizes the formatting tags of the document from the ASCII text produced by the filter and extracts keywords appropriate to the structure and properties of the document. The PDF electronic document system proposed in this paper can be used over the Internet and PC communication services. Users can choose and read electronic documents in two ways. First, they can choose and read relevant books through the PDF electronic document homepage. Second, they can use the PDF integrated search system, entering a keyword and then choosing the reference field and type of data. At present, however, Adobe's PDF products do not support Korean characters. If this problem is resolved, we think that PDF application systems will come into active use. Although the Zwon DocuCom used in this study has limited functionality, we think that there is no great difficulty in producing electronic documents and building a digital database.

PubMine: An Ontology-Based Text Mining System for Deducing Relationships among Biological Entities

  • Kim, Tae-Kyung;Oh, Jeong-Su;Ko, Gun-Hwan;Cho, Wan-Sup;Hou, Bo-Kyeng;Lee, Sang-Hyuk
    • Interdisciplinary Bio Central / v.3 no.2 / pp.7.1-7.6 / 2011
  • Background: Published manuscripts are the main source of biological knowledge. Since manual examination is almost impossible due to the huge volume of literature data (approximately 19 million abstracts in PubMed), intelligent text mining systems are of great utility for knowledge discovery. However, most current text mining tools have limited applicability because of i) abstract-based rather than sentence-based search, ii) improper use or lack of ontology terms, iii) a design restricted to specific subjects, or iv) slow response times that hamper web services and real-time applications. Results: We introduce an advanced text mining system called PubMine that supports intelligent knowledge discovery based on diverse bio-ontologies. PubMine improves query accuracy and flexibility with advanced search capabilities: fuzzy search, wildcard search, proximity search, range search, and Boolean combinations. Furthermore, PubMine allows users to extract multi-dimensional relationships between genes, diseases, and chemical compounds by using OLAP (On-Line Analytical Processing) techniques. The HUGO gene symbols and the MeSH ontology for diseases, chemical compounds, and anatomy have been included in the current version of PubMine, which is freely available at http://pubmine.kobic.re.kr. Conclusions: PubMine is a unique bio-text mining system that provides flexible searches and analysis of biological entity relationships. We believe that PubMine will serve as a key bioinformatics utility, because its rapid response enables web services for the community and its flexibility accommodates general ontologies.

Experimentation on The Recognition of Arithmetic Expressions (수식 표현의 인식에 관한 연구)

  • Lee, Young Kyo;Kim, Young Po
    • Journal of Korea Society of Digital Industry and Information Management / v.10 no.4 / pp.29-35 / 2014
  • A formula contains text and structural information as well as mathematical symbols. Research on on-line and off-line formula recognition is actively under way in various fields, and recognition systems for various forms of equations have been implemented. Although many documents contain various formulas, it is not easy to enter a formula into a computer. Formula recognition is divided into two processes: symbol recognition and structural analysis. After the location information of each character is analyzed, the effective area of each symbol is specified, and structural analysis based on the proximity between characters recognizes them as a single independent formula. Furthermore, each time symbols are combined, the positional relationship between the preceding and following symbols is analyzed, so that the structure of the entire formula can be updated easily when a symbol is added. In this paper, formulas in books are scanned with a scanner, and the relative size and location information of each symbol is used to interpret the meaning of the recognized symbols. An algorithm is also proposed for separating formulas when several formulas are present at the same time. In a performance evaluation on formulas scanned from books, the proposed algorithms showed a 100% rate of formula separation and recognition.

A Development of Gas Line Safety Management System by GIS (GIS를 이용한 가스관의 안전 관리시스템 개발)

  • 최병길;정영동;김영곤
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.18 no.1 / pp.11-17 / 2000
  • GIS is a system capable of integrating, managing, and analyzing voluminous graphic and text data, which makes it well suited to managing the complex networks of underground utilities in urban areas. A gas line safety management system was developed by constructing a database of the gas line network and topographic data, creating a safety management model, and evaluating safety with GIS. The system is designed so that facilities damaged in a gas line explosion can be evaluated easily through a geographic output system. It is designed to trace and present efficiently the closed valves and the facilities whose gas supply is interrupted when a gas line breaks, and to provide the information needed to take emergency measures quickly. It is also built to prevent accidents during construction work by showing underground utilities and the status of the work.

Development of e-Mail Classifiers for e-Mail Response Management Systems (전자메일 자동관리 시스템을 위한 전자메일 분류기의 개발)

  • Kim, Kuk-Pyo;Kwon, Young-S.
    • Journal of Information Technology Services / v.2 no.2 / pp.87-95 / 2003
  • With the increasing proliferation of the World Wide Web, electronic mail systems have become very widely used communication tools. Research on e-mail classification is important because an e-mail classification system is a major engine of e-mail response management systems, which mine unstructured e-mail messages and automatically categorize them. In this research we develop e-mail classifiers for e-mail Response Management Systems (ERMS) using naive Bayesian learning and centroid-based classification. We analyze which method performs better under which conditions, comparing classification accuracies that may depend on the structure and size of the training data set and on the number of classes, using different data sets from an on-line shopping mall and a credit card company. The developed e-mail classifiers have been successfully implemented in practice. The experimental results show that naive Bayesian learning performs better, while centroid-based classification is more robust in terms of classification accuracy.

Document Layout Analysis Using Coarse/Fine Strategy (Coarse/fine 전략을 이용한 문서 구조 분석)

  • 박동열;곽희규;김수형
    • Proceedings of the IEEK Conference / 2000.06d / pp.198-201 / 2000
  • We propose a method for analyzing document structure. The method consists of two processes, segmentation and classification. The segmentation first divides a low-resolution image, and then finely splits the original document image using projection profiles. The classification labels each segmented region as text, line, table, or image. An experiment with 238 document images shows that the segmentation accuracy is 99.1% and the classification accuracy is 97.3%.

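The projection-profile splitting mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `min_gap` parameter, and the tiny binary image are all assumptions made for illustration, and the coarse low-resolution pass is left out.

```python
def projection_profile(image, axis):
    """Count foreground (1) pixels per row (axis=0) or per column (axis=1)."""
    if axis == 0:
        return [sum(row) for row in image]
    return [sum(col) for col in zip(*image)]


def split_runs(profile, min_gap=1):
    """Return (start, end) pairs, end exclusive, of non-empty runs in a profile.

    Two runs are separated only when at least `min_gap` consecutive
    empty bins lie between them (hypothetical tuning parameter).
    """
    runs = []
    start = None
    gap = 0
    for i, value in enumerate(profile):
        if value > 0:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                runs.append((start, i - gap + 1))
                start = None
                gap = 0
    if start is not None:
        runs.append((start, len(profile) - gap))
    return runs


# Illustrative binary image: 1 = ink, 0 = background.
image = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
]

# Split into horizontal bands, then split the first band into columns.
bands = split_runs(projection_profile(image, axis=0))
top, bottom = bands[0]
columns = split_runs(projection_profile(image[top:bottom], axis=1))
```

Applying the row split and the column split alternately gives a simple recursive segmentation. A real system would also need noise filtering and a separate classifier for the resulting regions.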

Fast Skew Detection of Document Images by Extraction of Center Points of Blank Lines (공백행의 중심점 추출에 의한 고속 문서 기울기 검출)

  • Jeong, Jae-Yeong;Kim, Mun-Hyeon
    • Journal of KIISE:Software and Applications / v.26 no.11 / pp.1342-1349 / 1999
  • In this paper, we propose a fast algorithm to estimate the skew angle of linearly skewed document images. The algorithm is based on the fact that a blank line of uniform thickness lies between two adjacent text lines and that the slope of this blank line reflects the skew of the document. First, we apply a simple morphological operation (dilation) to the image to separate blank lines from text lines, and then sample the result vertically at regular intervals to find the center points of all blank lines along each sampled vertical line. Next, we calculate the slope between neighboring center points on the same blank line. The slopes calculated over the entire image are accumulated in a histogram, and the slope at the peak of the histogram is taken as the estimated skew of the input document. In the experiments, we applied the algorithm to horizontally written documents in a variety of formats, including hand-printed and machine-printed documents.
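The steps in the abstract above can be sketched roughly as follows. This is a simplified illustration, not the published algorithm. It assumes the dilation step has already produced a binary mask (1 = text-line area, 0 = blank). The names `step`, `max_jump`, and `bin_deg` are hypothetical parameters, and pairing each center point with the nearest one in the next sampled column is an assumed matching rule that the abstract does not specify.

```python
import math
from collections import Counter


def blank_centers(mask, x):
    """Centers (row coordinates) of blank runs in column x that lie
    between two text runs; page margins at top and bottom are skipped."""
    centers = []
    height = len(mask)
    y = 0
    seen_text = False
    while y < height:
        if mask[y][x]:
            seen_text = True
            y += 1
            continue
        start = y
        while y < height and not mask[y][x]:
            y += 1
        if seen_text and y < height:
            centers.append((start + y - 1) / 2.0)
    return centers


def estimate_skew(mask, step=4, max_jump=3.0, bin_deg=0.5):
    """Estimate skew in degrees from a histogram of slopes between
    neighboring blank-line centers. Rows grow downward, so a positive
    angle means the lines descend from left to right."""
    width = len(mask[0])
    columns = list(range(0, width, step))
    votes = Counter()
    for x0, x1 in zip(columns, columns[1:]):
        right = blank_centers(mask, x1)
        if not right:
            continue
        for c0 in blank_centers(mask, x0):
            # Assumed matching rule: nearest center in the next column.
            c1 = min(right, key=lambda c: abs(c - c0))
            if abs(c1 - c0) <= max_jump:
                angle = math.degrees(math.atan2(c1 - c0, x1 - x0))
                votes[round(angle / bin_deg)] += 1
    if not votes:
        return None
    # The histogram peak gives the estimated skew.
    return votes.most_common(1)[0][0] * bin_deg


# Synthetic mask: text bands 6 rows thick, repeating every 10 rows,
# shifted down by one row for every 4 columns.
mask = [[1 if (y - x // 4) % 10 < 6 else 0 for x in range(40)]
        for y in range(40)]
skew = estimate_skew(mask)
```

Because only sampled columns are inspected rather than every pixel, the amount of work scales with the number of sampled columns, which is consistent with the speed motivation stated in the abstract.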

On-line dynamic hand gesture recognition system for the korean sign language (KSL) (한글 수화용 동적 손 제스처의 실시간 인식 시스템의 구현에 관한 연구)

  • Kim, Jong-Sung;Lee, Chan-Su;Jang, Won;Bien, Zeungnam
    • Journal of the Korean Institute of Telematics and Electronics C / v.34C no.2 / pp.61-70 / 1997
  • Human hand gestures have long been used as a means of communication among people, being interpreted as streams of tokens for a language. Sign language is a method of communication for hearing-impaired people. Articulated gestures and postures of the hands and fingers are commonly used in sign language. This paper presents a system that recognizes the Korean sign language (KSL) and translates the recognition results into normal Korean text and sound. A pair of data gloves is used as the sensing device for detecting the motions of the hands and fingers. We propose a dynamic gesture recognition method that employs a fuzzy feature analysis method for efficient classification of hand motions and applies a fuzzy min-max neural network to on-line pattern recognition.
