• Title/Summary/Keyword: Text based

Building Topic Hierarchy of e-Documents using Text Mining Technology

  • Kim, Han-Joon
    • Proceedings of the CALSEC Conference / 2004.02a / pp.294-301 / 2004
  • Text-mining approach to organizing e-documents based on a topic hierarchy (machine-learning and information-theory based): the 'category (topic) discovery' problem → document bundle-based, user-constrained document clustering; the 'automatic categorization' problem → accelerated EM with CU-based active learning; the 'hierarchy construction' problem → unsupervised learning of category subsumption relations
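
The abstract only names the hierarchy-construction step. To illustrate what learning a subsumption relation between categories can look like, the sketch below uses the co-occurrence subsumption test of Sanderson and Croft (1999), a swapped-in technique that is not necessarily the one used in this paper. The function name, the 0.8 threshold, and the input format (one set of category labels per document) are assumptions.

```python
from itertools import permutations


def subsumption_pairs(doc_categories, threshold=0.8):
    """Return (parent, child) pairs where parent subsumes child.

    doc_categories: one set of category labels per document (assumed format).
    Parent A subsumes child B when P(A|B) >= threshold and P(B|A) < 1,
    i.e. A almost always co-occurs with B, but not the other way round.
    """
    # Document frequency of each category.
    counts = {}
    for cats in doc_categories:
        for c in cats:
            counts[c] = counts.get(c, 0) + 1

    pairs = []
    for a, b in permutations(sorted(counts), 2):
        # Number of documents carrying both labels.
        both = sum(1 for cats in doc_categories if a in cats and b in cats)
        p_a_given_b = both / counts[b]
        p_b_given_a = both / counts[a]
        if p_a_given_b >= threshold and p_b_given_a < 1:
            pairs.append((a, b))
    return pairs


docs = [{"sports", "soccer"}, {"sports", "soccer"}, {"sports", "tennis"}, {"sports"}]
print(subsumption_pairs(docs))  # [('sports', 'soccer'), ('sports', 'tennis')]
```

The second condition, P(B|A) < 1, keeps two categories that always co-occur from being treated as parent and child of each other.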

A Novel Text Sample Selection Model for Scene Text Detection via Bootstrap Learning

  • Kong, Jun;Sun, Jinhua;Jiang, Min;Hou, Jian
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.2 / pp.771-789 / 2019
  • Text detection has been a popular research topic in computer vision. Prevalent text detection algorithms find it difficult to avoid dependence on datasets. To overcome this problem, we propose a novel unsupervised text detection algorithm inspired by bootstrap learning. First, text candidates in a novel superpixel form are proposed, using image segmentation to improve the text recall rate. Second, we propose a unique text sample selection model (TSSM) that extracts text samples from the current image and eliminates database dependency. Specifically, to improve sample precision, we combine maximally stable extremal regions (MSERs) and a saliency map to generate sample reference maps with a double threshold scheme. Finally, a multiple kernel boosting method is developed to generate a strong text classifier by combining multiple single-kernel SVMs built from the samples selected by TSSM. Experimental results on standard datasets demonstrate that our text detection method is robust to complex backgrounds and multilingual text and performs stably across different standard datasets.
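
A double threshold scheme of the kind described for TSSM can be sketched as follows. The abstract does not say how the MSER and saliency reference maps are combined, so this sketch assumes both are per-pixel scores in [0, 1] and that a pixel becomes a sample only when both maps agree; the function name and the 0.3/0.7 thresholds are hypothetical.

```python
def select_samples(mser_map, saliency_map, low=0.3, high=0.7):
    """Double-threshold sample selection over two reference maps.

    Both maps hold per-pixel scores in [0, 1] (assumed). A pixel becomes a
    text sample (1) only if both maps agree it is strongly text-like, a
    background sample (0) only if both agree it is strongly non-text,
    and is left unlabeled (None) otherwise.
    """
    labels = []
    for mser_row, sal_row in zip(mser_map, saliency_map):
        row = []
        for m, s in zip(mser_row, sal_row):
            if m >= high and s >= high:
                row.append(1)       # confident positive sample
            elif m <= low and s <= low:
                row.append(0)       # confident negative sample
            else:
                row.append(None)    # ambiguous: excluded from training
        labels.append(row)
    return labels


print(select_samples([[0.9, 0.1, 0.5]], [[0.8, 0.2, 0.9]]))  # [[1, 0, None]]
```

Discarding the ambiguous middle band trades sample quantity for precision, which matches the stated aim of improving sample precision; the labeled pixels would then serve as training data for the single-kernel SVMs.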

Text Region Detection using Edge and Regional Minima/Maxima Transformation from Natural Scene Images (에지 및 국부적 최소/최대 변환을 이용한 자연 이미지로부터 텍스트 영역 검출)

  • Park, Jong-Cheon;Lee, Keun-Wang
    • Journal of the Korea Academia-Industrial cooperation Society / v.10 no.2 / pp.358-363 / 2009
  • Text region detection from natural scene images is used in a variety of applications, and much research is still needed in this field. Recent methods detect text regions using algorithms that combine edge-based and connected-component-based approaches. This paper therefore proposes a text region detection method for natural scene images that uses edges and a regional minima/maxima transformation. The method detects and labels the connected components of the edges and of the regional minima/maxima. The labeled regions are analyzed to detect text candidate regions, and the detected candidates are combined into a single text candidate image. The final text regions are then validated by comparing the similarity and adjacency of individual characters. Experimental results show that combining edge and regional minima/maxima connected-component detection improves the correctness of text region detection.
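
The connected-component labeling step can be sketched as below, assuming the edge map and the regional minima/maxima map have already been binarized. The use of 4-connectivity and a pixel-wise OR to merge the two candidate maps are assumptions, and the similarity/adjacency verification is not shown.

```python
from collections import deque


def label_components(binary):
    """4-connected component labeling of a binary image.

    binary: list of rows of 0/1 values. Returns (labels, count), where
    labels has the same shape and 0 marks background pixels.
    """
    h = len(binary)
    w = len(binary[0]) if h else 0
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not labels[y][x]:
                count += 1
                labels[y][x] = count
                queue = deque([(y, x)])
                while queue:  # breadth-first flood fill of one component
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def combine_masks(a, b):
    """Pixel-wise OR of two candidate masks of equal shape (assumed merge rule)."""
    return [[1 if (p or q) else 0 for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


edges = [[1, 1, 0], [0, 0, 0], [0, 0, 0]]
extrema = [[0, 0, 0], [0, 0, 0], [0, 1, 1]]
labels, n = label_components(combine_masks(edges, extrema))
print(n)  # 2
```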

Text Area Extraction Method for Color Images Based on Labeling and Gradient Difference Method (레이블링 기법과 밝기값 변화에 기반한 컬러영상의 문자영역 추출 방법)

  • Won, Jong-Kil;Kim, Hye-Young;Cho, Jin-Soo
    • The Journal of the Korea Contents Association / v.11 no.12 / pp.511-521 / 2011
  • As the use of image input and output devices increases, extracting text areas from color images is becoming more important. To extract text areas from images efficiently, this paper presents a text area extraction method for color images based on labeling and a gradient difference method. The proposed method first eliminates non-text areas through labeling and filtering. It then generates text area candidates using the property that text areas have a high gradient difference, and extracts the text areas through post-processing consisting of noise removal and text area merging. The benefits of the proposed method are its simplicity and its accuracy, which is higher than that of conventional methods. Experimental results show that the precision, recall, and inverse ratio of non-text extraction (IRNTE) of the proposed method are 99.59%, 98.65%, and 82.30%, respectively.
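
The candidate-generation step relies on text areas having a high gradient difference. The abstract does not define the measure, so this sketch assumes it is the maximum minus the minimum intensity in a small horizontal window around each pixel of a grayscale image; the window size and threshold are hypothetical parameters.

```python
def gradient_difference(gray, half_width=1):
    """Max minus min intensity in a horizontal window around each pixel."""
    h = len(gray)
    w = len(gray[0]) if h else 0
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        row = gray[y]
        for x in range(w):
            lo = max(0, x - half_width)
            hi = min(w, x + half_width + 1)
            window = row[lo:hi]  # clipped at the image border
            out[y][x] = max(window) - min(window)
    return out


def text_candidates(gray, threshold, half_width=1):
    """Mark pixels whose gradient difference reaches the threshold."""
    gd = gradient_difference(gray, half_width)
    return [[1 if v >= threshold else 0 for v in row] for row in gd]


print(text_candidates([[10, 10, 200, 10, 10]], threshold=100))  # [[0, 1, 1, 1, 0]]
```

A bright stroke on a dark background produces a band of high values around the stroke, which is why the candidates would still need the noise removal and merging described in the abstract.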

Empirical Analysis on the Effect of Design Pattern of Web Page, Perceived Risk and Media Richness to Customer Satisfaction (콘텐츠 제작방식, 지각된 위험, 미디어 풍부성이 고객만족에 미치는 영향 분석)

  • Park, Bong-Won;Lee, Jung-Mann;Lee, Jong-Won
    • The Journal of the Korea Contents Association / v.11 no.6 / pp.385-396 / 2011
  • Internet web pages can be classified into three major types: text only, images with text, and videos with text. The purpose of this paper is to analyze how customers perceive and respond to these web page design patterns in terms of perceived risk and media richness. Additionally, we examine the extent to which these factors affect customer satisfaction. The analysis of perceived risk revealed that customers perceive lower personal risk, including performance, psychological, and time/convenience risk, when using text-image and text-video web pages than when using text-only web pages. However, customers rate web pages consisting of image-text or video-text higher on the media-richness dimensions of symbolism and social presence than text-only web pages. Finally, we show that personal risk and text-only web pages negatively affect customer satisfaction, whereas symbolism and social presence positively affect it. This study therefore offers a clue as to why video-based web content has not grown as much as many people expected.

WCTT: Web Crawling System based on HTML Document Formalization (WCTT: HTML 문서 정형화 기반 웹 크롤링 시스템)

  • Kim, Jin-Hwan;Kim, Eun-Gyung
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.4 / pp.495-502 / 2022
  • Web crawlers, which are mainly used today to collect text on the web, are difficult to maintain and extend because researchers must implement different collection logic for each collection channel after analyzing the tags and styles of its HTML documents. To solve this problem, a web crawler should be able to collect text by formalizing HTML documents into the same structure. In this paper, we design and implement WCTT (Web Crawling system based on Tag path and Text appearance frequency), a web crawling system that collects text with a single collection logic by formalizing HTML documents based on tag path and text appearance frequency. Because WCTT collects text with the same logic for all collection channels, collection channels are easy to maintain and extend. In addition, it provides preprocessing functions that remove stopwords and extract only nouns, for use in keyword network analysis and similar tasks.
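
The idea of a single collection logic driven by tag paths and text frequency can be sketched with Python's standard `html.parser`. This is an assumed interpretation, not WCTT's actual algorithm: every non-empty text node is keyed by its tag path (for example `html/body/div/p`), and the path that carries the most text nodes is taken as the main content. The class and function names are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}
SKIP_TAGS = {"script", "style"}


class TagPathCollector(HTMLParser):
    """Records (tag_path, text) for every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:  # void elements never get an end tag
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop up to the matching open tag, tolerating unclosed children.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text and not (self.stack and self.stack[-1] in SKIP_TAGS):
            self.texts.append(("/".join(self.stack), text))


def main_text(html):
    """Return the most frequent tag path and the texts found under it."""
    parser = TagPathCollector()
    parser.feed(html)
    parser.close()
    counts = Counter(path for path, _ in parser.texts)
    if not counts:
        return "", []
    best_path, _ = counts.most_common(1)[0]
    return best_path, [t for p, t in parser.texts if p == best_path]


page = ("<html><body><div><p>One</p><p>Two</p><p>Three</p></div>"
        "<span>Menu</span></body></html>")
print(main_text(page))  # ('html/body/div/p', ['One', 'Two', 'Three'])
```

Because the same function runs on every page, a new collection channel needs no site-specific parsing code, which is the maintainability benefit the abstract describes.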

Text Region Detection using Adaptive Character-Edge Map From Natural Image (자연영상에서 적응적 문자-에지 맵을 이용한 텍스트 영역 검출)

  • Park, Jong-Cheon;Hwang, Dong-Guk;Jun, Byoung-Min
    • Journal of the Korea Academia-Industrial cooperation Society / v.8 no.5 / pp.1135-1140 / 2007
  • This paper proposes an edge-based text region detection algorithm using adaptive character-edge maps, which are independent of character size and text string orientation in natural images. First, labeled images are obtained from edge images, and, to search for characters, adaptive character-edge maps are applied to the labeled images by way of a grammar. Next, the selected labels are clustered according to the distance to their neighbors, yielding text region candidates. Finally, the text region candidates are verified using empirical rules and horizontal/vertical projection profiles based on the orientation of the text region. Experimental results show that the proposed algorithm is robust to variations in character size, orientation, and background complexity.
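
Horizontal and vertical projection profiles are simply per-row and per-column counts of edge pixels inside a candidate box. The sketch below computes them and applies one hypothetical verification rule (most bins along the text direction, taken to be the longer side of the box, must contain edges); the paper's actual empirical rules are not given in the abstract.

```python
def projection_profiles(binary):
    """Horizontal profile: edge pixels per row; vertical: edge pixels per column."""
    horizontal = [sum(row) for row in binary]
    vertical = [sum(col) for col in zip(*binary)]
    return horizontal, vertical


def looks_like_text(binary, min_fill=0.6):
    """Hypothetical check: along the text direction (taken to be the longer
    side of the candidate box), most profile bins should contain edges."""
    horizontal, vertical = projection_profiles(binary)
    profile = vertical if len(vertical) >= len(horizontal) else horizontal
    if not profile:
        return False
    filled = sum(1 for v in profile if v > 0)
    return filled / len(profile) >= min_fill


box = [[1, 0, 1, 0],
       [1, 1, 1, 1]]
print(projection_profiles(box))  # ([2, 4], [2, 1, 2, 1])
print(looks_like_text(box))      # True
```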

Pill Identification Algorithm Based on Deep Learning Using Imprinted Text Feature (음각 정보를 이용한 딥러닝 기반의 알약 식별 알고리즘 연구)

  • Lee, Seon Min;Kim, Young Jae;Kim, Kwang Gi
    • Journal of Biomedical Engineering Research / v.43 no.6 / pp.441-447 / 2022
  • In this paper, we propose a pill identification model that uses an engraved text feature together with image features such as shape and color, and we compare it with an identification model that does not use the engraved text feature, to verify whether improving the recognition rate of engraved text can improve identification performance. The data consisted of 100 classes with 10 images per class. The engraved text feature was obtained through deep-learning-based Keras OCR and a 1D CNN, and the image feature was obtained through a 2D CNN. In the identification results, the accuracy of the text recognition model was 90%, and the accuracies of the comparative model and the proposed model were 91.9% and 97.6%, respectively. The accuracy, precision, recall, and F1-score of the proposed model were statistically significantly better than those of the comparative model. These results confirm that expanding the range of features improved the performance of the identification model.
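
For a multi-class task such as 100 pill classes, precision, recall, and F1-score must be averaged over classes. The abstract does not state which averaging was used, so this sketch assumes macro-averaging (an unweighted mean over classes); the toy labels are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1-score."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for c in labels:
        # One-vs-rest counts for class c.
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


y_true = ["a", "a", "b", "b"]
y_pred = ["a", "b", "b", "b"]
print(tuple(round(v, 3) for v in classification_metrics(y_true, y_pred)))
# (0.75, 0.833, 0.75, 0.733)
```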

A Method for Text Entry on a Touch-Screen Keyboard Based on the Fuzzy Touch Scheme (퍼지터치를 이용한 터치스크린에서의 문자 입력 방법에 대한 연구)

  • Kwon, Sung-Hyuk;Lee, Dong-Hun;Chung, Min-K.
    • 한국HCI학회:학술대회논문집 (HCI Society of Korea: Conference Proceedings) / 2008.02a / pp.262-268 / 2008
  • As demand for multimedia services based on wireless technologies and mobile devices increases, full-touch-screen mobile devices with touch-screen keyboards are emerging to cope with limited display size and to take advantage of flexible user interface design. However, text entry, one of the main functions of mobile devices, erodes the competitive advantage of touch-screen keyboards over physical keyboards or keypads because of the lack of physical feedback and frequent mistyping. This study introduces a novel text entry method named Fuzzy Touch and compares it with the conventional text entry method on a touch-screen keyboard in terms of performance (time, number of touches) and subjective ratings (ease of use, overall preference).

Efficient Text Localization using MLP-based Texture Classification (신경망 기반의 텍스춰 분석을 이용한 효율적인 문자 추출)

  • Jung, Kee-Chul;Kim, Kwang-In;Han, Jung-Hyun
    • Journal of KIISE:Software and Applications / v.29 no.3 / pp.180-191 / 2002
  • We present a new method for localizing text in images using a multi-layer perceptron (MLP) and a multiple continuously adaptive mean shift (MultiCAMShift) algorithm. An automatically constructed MLP-based texture classifier generates a text probability image for various types of images without explicit feature extraction. The MultiCAMShift algorithm, which operates on the text probability image produced by the MLP, can place bounding boxes efficiently without analyzing the texture properties of the entire image.
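
The bounding-box step can be illustrated with a single CAMShift-style window on a text probability image: compute the zeroth and first moments inside the window, move the window to the centroid, resize it from the zeroth moment, and repeat until it stops changing. This is a sketch, not the paper's MultiCAMShift, which presumably runs several windows; the square window and the size rule `2 * sqrt(M00)` (in the spirit of the original CAMShift) are assumptions.

```python
import math


def camshift(prob, x, y, w, h, max_iter=20):
    """Move and resize a search window toward the mass of a probability map.

    prob: 2-D list of text probabilities in [0, 1].
    (x, y): top-left corner of the window; w, h: its width and height.
    """
    rows, cols = len(prob), len(prob[0])
    for _ in range(max_iter):
        m00 = m10 = m01 = 0.0  # zeroth and first moments inside the window
        for r in range(max(0, y), min(rows, y + h)):
            for c in range(max(0, x), min(cols, x + w)):
                p = prob[r][c]
                m00 += p
                m10 += c * p
                m01 += r * p
        if m00 == 0:
            break  # nothing to track inside the window
        cx, cy = m10 / m00, m01 / m00  # centroid of the probability mass
        side = max(1, math.floor(2 * math.sqrt(m00) + 0.5))  # assumed size rule
        nx = math.floor(cx - side / 2 + 0.5)  # re-center the (square) window
        ny = math.floor(cy - side / 2 + 0.5)
        if (nx, ny, side, side) == (x, y, w, h):
            break  # converged
        x, y, w, h = nx, ny, side, side
    return x, y, w, h


prob = [[0.0] * 10 for _ in range(10)]
for r, c in ((4, 6), (4, 7), (5, 6), (5, 7)):
    prob[r][c] = 1.0
print(camshift(prob, 3, 2, 6, 6))  # (5, 3, 4, 4)
```

Only pixels inside the current window are visited on each iteration, which reflects the abstract's point that boxes can be placed without analyzing the whole image.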