• Title/Abstract/Keywords: Science text

Search results: 3,789 items (processing time: 0.028 s)

A Fast Algorithm for Korean Text Extraction and Segmentation from Subway Signboard Images Utilizing Smartphone Sensors

  • Milevskiy, Igor;Ha, Jin-Young
    • Journal of Computing Science and Engineering
    • /
    • Vol. 5, No. 3
    • /
    • pp.161-166
    • /
    • 2011
  • We present a fast algorithm for Korean text extraction and segmentation from subway signboard images that uses smartphone sensors to minimize computational time and memory usage. The algorithm can serve as the preprocessing steps for optical character recognition (OCR): binarization, text location, and segmentation. An image of a signboard captured with the smartphone held at an arbitrary angle is rotated by the detected angle, as if it had been taken with the smartphone held horizontally. Binarization is performed only once, on a subset of connected components rather than the whole image area, which greatly reduces computational time. Text location is guided by a marker line that the user places over the region of interest in the binarized image via the smartphone touch screen. Text segmentation then uses the connected-component data obtained in the binarization step to cut the string into individual images of the designated characters. The resulting data can be used as OCR input, thereby solving the most difficult part of OCR on text areas in natural scene images. The experimental results showed that the binarization algorithm of our method is 3.5 and 3.7 times faster than the Niblack and Sauvola adaptive-thresholding algorithms, respectively. In addition, our method achieved better quality than the other methods.
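
For orientation, the two baselines named in the abstract can be sketched as follows. This is a minimal illustration of the standard Niblack and Sauvola threshold formulas for a single local window, not the authors' algorithm; the function names and the parameter values (`k=-0.2`, `k=0.5`, `r=128.0`) are assumptions chosen as commonly cited defaults.

```python
import math


def window_stats(pixels):
    """Return the mean and population standard deviation of a window of gray levels."""
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    return mean, math.sqrt(variance)


def niblack_threshold(pixels, k=-0.2):
    """Niblack: T = m + k * s (k is an assumed default)."""
    mean, std = window_stats(pixels)
    return mean + k * std


def sauvola_threshold(pixels, k=0.5, r=128.0):
    """Sauvola: T = m * (1 + k * (s / R - 1)) (k and R are assumed defaults)."""
    mean, std = window_stats(pixels)
    return mean * (1.0 + k * (std / r - 1.0))


def binarize_window(pixels, threshold):
    """Mark pixels brighter than the threshold as 1, all others as 0."""
    return [1 if p > threshold else 0 for p in pixels]
```

Both formulas need the local mean and standard deviation around every pixel, which is why applying them to the whole image is costly compared with thresholding only selected regions.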

Implementation of a Web-Based Electronic Text for High School's Probability and Statistics Education

  • Choi, Sook-Hee
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 11, No. 2
    • /
    • pp.329-343
    • /
    • 2004
  • With the advancement of computers and networks, the World Wide Web (WWW) has become a common medium of information communication in many fields. In education, the WWW is increasingly used as an alternative medium to classroom teaching or printed materials. In this article, we demonstrate a web-based electronic text on 'probability and statistics', which is one of the six fields of mathematics in the 7th curriculum. This text emphasizes comprehension of the concepts of probability and statistics as an applied science.

Arabic Text Recognition with Harakat Using Deep Learning

  • Ashwag, Maghraby;Esraa, Samkari
    • International Journal of Computer Science & Network Security
    • /
    • Vol. 23, No. 1
    • /
    • pp.41-46
    • /
    • 2023
  • Because of the significant role that harakat plays in Arabic text, this paper used deep learning to extract Arabic text with its harakat from an image. Convolutional neural networks and recurrent neural network algorithms were applied to the dataset, which contained 110 images, each representing one word. The results showed the ability to extract some letters with harakat.

The Effects of Implementing Semantic Mapping Reading Strategy in Science Class on High School Students' Science Text Reading Ability

  • 이수진;남정희
    • Journal of the Korean Chemical Society (대한화학회지)
    • /
    • Vol. 66, No. 5
    • /
    • pp.376-389
    • /
    • 2022
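  • This study aimed to examine the effect of a semantic mapping reading strategy in science classes on high school students' ability to read science texts. To this end, semantic mapping reading strategy lessons were applied for one semester to third-year students (40 students) at a science-focused high school in a small-to-medium-sized city, using eight science texts on socio-scientific issues and chemistry concepts. To determine the effect of the strategy on science text reading ability, the pre- and post-tests of science reading ability completed by the students were compared and analyzed. The analysis showed that the mean science reading ability test score of the experimental group, which received the semantic mapping lessons, was significantly higher than that of the comparison group. Drawing a semantic map before solving a reading task was effective in helping students find information in the text and infer meaning. The students also perceived that semantic maps helped them understand the text, because visualizing the content made the relationships among concepts easier to grasp and let them connect their background knowledge with the text.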

Representation of Texts into String Vectors for Text Categorization

  • Jo, Tae-Ho
    • Journal of Computing Science and Engineering
    • /
    • Vol. 4, No. 2
    • /
    • pp.110-127
    • /
    • 2010
  • In this study, we propose a method for encoding documents into string vectors, instead of numerical vectors. A traditional approach to text categorization usually requires encoding documents into numerical vectors. The usual method of encoding documents therefore causes two main problems: huge dimensionality and sparse distribution. In this study, we modify or create machine learning-based approaches to text categorization, where string vectors are received as input vectors, instead of numerical vectors. As a result, we can improve text categorization performance by avoiding these two problems.
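
The contrast between the two encodings can be illustrated with a small sketch. The abstract does not define a string vector precisely, so the version below rests on an assumption: a string vector is taken to be a fixed-length, ordered list of a document's most frequent words. The function names are hypothetical.

```python
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def numerical_vector(text, vocabulary):
    """Bag-of-words counts: one dimension per vocabulary word, mostly zeros."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


def string_vector(text, size):
    """Assumed form of a string vector: the `size` most frequent words, in rank order."""
    counts = Counter(tokenize(text))
    ranked = sorted(counts, key=lambda word: (-counts[word], word))
    return ranked[:size]
```

The numerical vector grows with the vocabulary of the whole corpus and is mostly zeros for any single document, which corresponds to the dimensionality and sparsity problems the abstract names; the assumed string vector keeps a small fixed length regardless of vocabulary size.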

New Text Steganography Technique Based on Part-of-Speech Tagging and Format-Preserving Encryption

  • Mohammed Abdul Majeed;Rossilawati Sulaiman;Zarina Shukur
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 18, No. 1
    • /
    • pp.170-191
    • /
    • 2024
  • The transmission of confidential data using cover media is called steganography. The three requirements of any effective steganography system are high embedding capacity, security, and imperceptibility. The text file's structure, which makes syntax and grammar more visually obvious than in other media, contributes to its poor imperceptibility. Text is regarded as the most challenging carrier for hiding secret data because it has little redundant data compared with other digital objects. Unicode characters, especially non-printing or invisible ones, are employed for hiding data by mapping a specific number of secret data bits to each character and inserting the character into the spaces of the cover text. These characters are known to offer limited space for embedding secret data. Current studies that used Unicode characters in text steganography focused on increasing the data hiding capacity despite the insufficient redundant data in a text file. A sequential embedding pattern is often selected and applied to all available positions in the cover text. This embedding pattern negatively affects the imperceptibility and security of the text steganography system. Thus, this study attempts to overcome these limitations by combining the part-of-speech (POS) tagging technique with the randomization concept in data hiding. Combining these two techniques allows the Unicode characters to be inserted in randomized patterns at specific positions in the cover text, increasing data hiding capacity with minimal effects on imperceptibility and security. Format-preserving encryption (FPE) is also used to encrypt the secret message without changing its size before the embedding process. A comparison with existing techniques demonstrates that the proposed technique fulfils the cover file's capacity, imperceptibility, and security requirements.
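
The sequential embedding pattern that the abstract criticizes can be sketched as follows. This is only an illustration of that baseline idea, not the proposed POS-based technique, and it omits encryption; the choice of four zero-width code points and the 2-bit mapping are assumptions made for the example.

```python
# Four invisible Unicode code points, each standing for one 2-bit value (assumed mapping):
# ZERO WIDTH SPACE, ZERO WIDTH NON-JOINER, ZERO WIDTH JOINER, WORD JOINER.
CARRIERS = ["\u200b", "\u200c", "\u200d", "\u2060"]


def embed(cover, bits):
    """Insert one carrier after each space, in order, until the bits run out."""
    if len(bits) % 2:
        raise ValueError("bit string length must be even")
    chunks = [bits[i:i + 2] for i in range(0, len(bits), 2)]
    if len(chunks) > cover.count(" "):
        raise ValueError("cover text has too few spaces")
    out, next_chunk = [], 0
    for ch in cover:
        out.append(ch)
        if ch == " " and next_chunk < len(chunks):
            out.append(CARRIERS[int(chunks[next_chunk], 2)])
            next_chunk += 1
    return "".join(out)


def extract(stego):
    """Read the carriers back in order and decode each into its 2-bit value."""
    return "".join(format(CARRIERS.index(ch), "02b") for ch in stego if ch in CARRIERS)
```

Because every space from the start of the text carries a hidden character, the positions are predictable, which is the weakness that randomized placement is meant to address.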

Image Steganography to Hide Unlimited Secret Text Size

  • Almazaydeh, Wa'el Ibrahim A.
    • International Journal of Computer Science & Network Security
    • /
    • Vol. 22, No. 4
    • /
    • pp.73-82
    • /
    • 2022
  • This paper shows how secret text of unlimited size can be hidden in an image using three methods. The first is the traditional steganography method, which conceals the binary value of the text using the least significant bits. The second is a new method that hides the data in an image based on the Exclusive OR operation. The third is a new method that hides the binary data of the text in an image (which may be grayscale or RGB) using Exclusive OR and Huffman coding. The new methods show the hiding process of unlimited text size (data) in an image. Peak Signal to Noise Ratio (PSNR) is applied in the research to evaluate the results.
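
The traditional least-significant-bit method and the PSNR measure mentioned in the abstract can be sketched as below. This is a generic textbook version operating on a flat list of 8-bit pixel values, not the paper's implementation; the one-bit-per-pixel layout and all function names are assumptions.

```python
import math


def text_to_bits(text):
    """Encode text as UTF-8 and expand each byte into 8 bits, most significant first."""
    return [int(b) for byte in text.encode("utf-8") for b in format(byte, "08b")]


def bits_to_text(bits):
    """Pack groups of 8 bits back into bytes and decode them as UTF-8."""
    data = bytes(int("".join(map(str, bits[i:i + 8])), 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


def embed_lsb(pixels, bits):
    """Overwrite the lowest bit of the first len(bits) pixel values."""
    if len(bits) > len(pixels):
        raise ValueError("not enough pixels for the message")
    stego = list(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit
    return stego


def extract_lsb(pixels, n_bits):
    """Read the lowest bit of the first n_bits pixel values."""
    return [p & 1 for p in pixels[:n_bits]]


def psnr(original, modified, max_value=255):
    """PSNR in dB: 10 * log10(MAX^2 / MSE); infinite when the images are identical."""
    mse = sum((a - b) ** 2 for a, b in zip(original, modified)) / len(original)
    if mse == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / mse)
```

Since each pixel value changes by at most 1, the mean squared error never exceeds 1 and the PSNR of this sketch stays above about 48 dB for 8-bit images.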

Representative Batch Normalization for Scene Text Recognition

  • Sun, Yajie;Cao, Xiaoling;Sun, Yingying
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 16, No. 7
    • /
    • pp.2390-2406
    • /
    • 2022
  • Scene text recognition has important application value and has attracted the interest of many researchers. At present, many methods achieve good results, but most existing approaches attempt to improve the performance of scene text recognition at the image level. They work well on regular scene text. However, there are still many obstacles to recognizing text in low-quality images, such as curved, occluded, or blurred text. Uneven image quality makes feature extraction more difficult. In addition, the results of model testing are highly dependent on training data, so there is still room for improvement in scene text recognition methods. In this work, we present a natural scene text recognizer that improves recognition performance at the feature level through feature representation and feature enhancement. For feature representation, we propose an efficient feature extractor that combines Representative Batch Normalization with ResNet. It reduces the dependence of the model on training data and improves the feature representation ability for different instances. For feature enhancement, we use a feature enhancement network to expand the receptive field of the feature maps so that they contain rich feature information. The enhanced feature representation capability helps to improve the recognition performance of the model. Experiments on 7 benchmarks show that this method is highly competitive in recognizing both regular and irregular text. The method achieved top-1 recognition accuracy on four benchmarks: IC03, IC13, IC15, and SVTP.

A Study on the Index Model for Secondary Legal Information Databases

  • 노정란
    • Journal of the Korean BIBLIA Society for Library and Information Science (한국비블리아학회지)
    • /
    • Vol. 8, No. 1
    • /
    • pp.117-134
    • /
    • 1997
  • This study shows that, because of the characteristics of legal information, the quoted legal text functions as an index that represents the contents of the text, so automatic indexing in secondary legal full-text databases is possible without the assistance of experts. When a law is established, amended, or repealed, the index terms can be changed by revising the legal text quoted in the secondary legal full-text databases. Even when the full text of retrospective documents is not entered, automatic indexing is still possible, and expert knowledge and integrated databases can be established and put into practice for retrospective documents. This study indicates that, rather than approaching the information system in a linguistic, statistical, or structuralist way, it is necessary to build into the system the characteristic information that information experts recognize, that is, the experimental and inherent knowledge that only human beings can have, and that doing so can yield a more essential and intelligent information system.

An Optimal Weighting Method in Supervised Learning of Linguistic Model for Text Classification

  • Mikawa, Kenta;Ishida, Takashi;Goto, Masayuki
    • Industrial Engineering and Management Systems
    • /
    • Vol. 11, No. 1
    • /
    • pp.87-93
    • /
    • 2012
  • This paper discusses a new weighting method for text analysis from the viewpoint of supervised learning. The term frequency-inverse document frequency (tf-idf) measure is a well-known weighting method for information retrieval, and it can also be used for text analysis. However, it is an experimental weighting method for information retrieval whose effectiveness has not been clarified from a theoretical viewpoint. Therefore, a more effective weighting measure may be obtainable for document classification problems. In this study, we propose an optimal weighting method for document classification problems from the viewpoint of supervised learning. The proposed measure is more suitable than the tf-idf measure for text classification problems in which training data are used. The effectiveness of our proposal is demonstrated by simulation experiments on text classification problems involving newspaper articles and customer reviews posted on a web site.
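
For reference, the tf-idf baseline that the abstract compares against can be sketched as follows. This is one common variant (relative term frequency times the natural log of the inverse document frequency), not the proposed optimal weighting; the function name and the exact normalization are assumptions.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    weight = (count of term in doc / length of doc) * ln(N / document frequency)
    """
    n_docs = len(documents)
    # Count, for each term, how many documents contain it at least once.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

Note that the weights depend only on term and document counts, never on class labels, which is the gap a supervised weighting method aims to close.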