• Title/Summary/Keyword: Text detection


Knowledge Graph-based Korean New Words Detection Mechanism for Spam Filtering (스팸 필터링을 위한 지식 그래프 기반의 신조어 감지 매커니즘)

  • Kim, Ji-hye;Jeong, Ok-ran
    • Journal of Internet Computing and Services / v.21 no.1 / pp.79-85 / 2020
  • Today, spam texts on smartphones are blocked by simple string comparison between text messages and spam keywords, or by blocking spam phone numbers. As a result, spam texts are sent in gradually changing forms to keep them from being blocked automatically. In particular, words that appear in spam keyword lists are altered with special characters, Chinese characters, and whitespace so that simple string matching does not detect them. Traditional spam filtering methods cannot block such texts well. Therefore, new technologies are needed to respond to changing spam text messages. In this paper, we propose a knowledge graph-based new word detection mechanism that can detect new words frequently used in spam texts and respond to changing spam texts. We also present experimental results on performance when the detected Korean new words are applied to the Naive Bayes algorithm.
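
To illustrate why simple string comparison misses obfuscated spam words, here is a minimal sketch. It is not the knowledge graph mechanism proposed in the paper; the keyword list, the `normalize` helper, and the sample message are all hypothetical.

```python
import unicodedata

SPAM_KEYWORDS = {"loan", "casino"}  # hypothetical keyword list


def naive_match(message: str) -> bool:
    """Simple string comparison: substring search for each keyword."""
    lowered = message.lower()
    return any(keyword in lowered for keyword in SPAM_KEYWORDS)


def normalize(message: str) -> str:
    """Fold compatibility characters (NFKC), then keep letters and digits only."""
    folded = unicodedata.normalize("NFKC", message).lower()
    return "".join(ch for ch in folded if ch.isalnum())


def normalized_match(message: str) -> bool:
    """Substring search after special characters and whitespace are removed."""
    cleaned = normalize(message)
    return any(keyword in cleaned for keyword in SPAM_KEYWORDS)


# Hypothetical message with keywords split by punctuation and spaces.
obfuscated = "Cheap l.o.a.n today, c a s*i*n o bonus"
print(naive_match(obfuscated), normalized_match(obfuscated))
```

Removing whitespace can also create spurious matches across word boundaries (for example, "hello answer" collapses to a string containing "loan"), and this sketch does nothing about substituted Chinese characters or newly coined words, which is the gap the abstract describes.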

A Simple Enzymatic Method for Quantitation of 2'-Fucosyllactose

  • Seydametova, Emine;Shin, Jonghyeok;Yu, Jiwon;Kweon, Dae-Hyuk
    • Journal of Microbiology and Biotechnology / v.28 no.7 / pp.1141-1146 / 2018
  • 2'-Fucosyllactose (2'-FL) is one of the most important human milk oligosaccharides and has several health benefits for infants. The levels of 2'-FL in breast milk or samples from other sources can be quantified by high-performance liquid chromatography. However, this method cannot be used for simultaneous detection of the target compound in numerous samples. Here, we developed a simple method for quantifying 2'-FL in a microplate format. The method involves two steps: (i) release of L-fucose from 2'-FL by α-(1-2,3,4,6)-L-fucosidase and (ii) measurement of NADPH formed during the oxidation of L-fucose by L-fucose dehydrogenase. This method enables measurement of up to 5 g/l 2'-FL in 50 min using a 96-well microplate. The efficiency and simplicity of the proposed method make it suitable for the analyses of a large number of samples simultaneously.

Text Region Extraction Using Pattern Histogram of Character-Edge Map in Natural Images (문자-에지 맵의 패턴 히스토그램을 이용한 자연이미지에세 텍스트 영역 추출)

  • Park, Jong-Cheon;Hwang, Dong-Guk;Lee, Woo-Ram;Jun, Byoung-Min
    • Journal of the Korea Academia-Industrial cooperation Society / v.7 no.6 / pp.1167-1174 / 2006
  • Text region detection from a natural scene is useful in many applications such as vehicle license plate recognition. In this paper, we propose a text region extraction method that uses the pattern histogram of character-edge maps. We create 16 kinds of edge maps from the extracted edges and then combine them into 8 kinds of edge maps that carry character features. We extract candidate text regions using these 8 kinds of character-edge maps. The candidates are verified using the pattern histogram of the character-edge maps and structural features of text regions. Experimental results show that the proposed method effectively extracts text regions with complex backgrounds, various font sizes, and various font colors.


Real-time Character Detection System Using EAST Model and OCR (EAST 모델과 OCR을 이용한 실시간 문자 탐지 시스템)

  • Ye-Jun Choi;Mikyeong Moon
    • Proceedings of the Korean Society of Computer Information Conference / 2023.07a / pp.683-684 / 2023
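  • Web pages and digital documents provide a function for searching for a specific word or phrase. Printed materials such as books and reference books have no function for finding a specific word or phrase in real time, which often causes difficulty. This paper describes the development of a real-time character detection system that uses the EAST model to detect text and EasyOCR to recognize text. With this system, users are expected to be able to find the words or phrases they want in printed materials in real time and quickly read the information they need.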
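
The search step that sits on top of detection and recognition could look like the sketch below. It assumes recognition results arrive as (bounding box, text, confidence) tuples, which is the shape EasyOCR's `readtext` returns by default. The `find_phrase` helper, the confidence threshold, and the sample data are hypothetical, and the EAST detection stage is not shown.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]
OcrResult = Tuple[Sequence[Point], str, float]  # (box corners, text, confidence)


def find_phrase(results: List[OcrResult], query: str,
                min_conf: float = 0.5) -> List[Sequence[Point]]:
    """Return the boxes whose recognized text contains the query phrase."""
    needle = query.casefold()
    return [box for box, text, conf in results
            if conf >= min_conf and needle in text.casefold()]


# Hypothetical recognition output for one camera frame.
sample = [
    ([(0, 0), (80, 0), (80, 20), (0, 20)], "Chapter 3", 0.98),
    ([(0, 30), (200, 30), (200, 50), (0, 50)], "Text detection with EAST", 0.91),
    ([(0, 60), (90, 60), (90, 80), (0, 80)], "detection", 0.32),
]
hits = find_phrase(sample, "detection")
print(hits)
```

In a real-time setting the returned boxes would be drawn over the current frame; the low-confidence third result is dropped by the threshold.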


Font Change Blindness Triggered by the Text Difficulty in Moving Window Technique (움직이는 창 기법에서의 덩이글 난이도에 따른 글꼴 변화맹)

  • Seong-Jun Bak;Joo-Seok Hyun
    • Korean Journal of Cognitive Science / v.34 no.4 / pp.259-275 / 2023
  • The aim of this study was to investigate font change blindness as a function of text difficulty in the "Moving Window Task" originally introduced by McConkie and Rayner (1975). While participants read with the moving window applied, target words were presented in a font style different from that of the surrounding text. When a participant's gaze reached the position of the target word, the font of the target word was changed to match the text font. The font of the target word before the change was sans-serif when the text font was serif, and serif when the text font was sans-serif. After completing the reading task, more than half of the participants (62.5%) reported not detecting the font change. Observation of eye movements at the target word positions revealed that when the content of the text was difficult to understand, the number of regressions increased, gaze duration lengthened, and saccade length decreased. Specifically, the increase in the number of regressions was evident only when the text font was serif, that is, when the font of the target word shifted from sans-serif to serif. These results suggest that sensory interference unrelated to content understanding is not easily detected during reading, but that the possibility of detection increases when comprehension of the content becomes challenging. Furthermore, this exceptional possibility of detection may be higher when the text font is serif than when it is sans-serif.

A Classification Model for Attack Mail Detection based on the Authorship Analysis (작성자 분석 기반의 공격 메일 탐지를 위한 분류 모델)

  • Hong, Sung-Sam;Shin, Gun-Yoon;Han, Myung-Mook
    • Journal of Internet Computing and Services / v.18 no.6 / pp.35-46 / 2017
  • Recently, cyber attacks that attach malicious code to a mail and induce the user to execute it have increased. They are especially dangerous because attaching a document-type file makes the code easy to execute. Authorship analysis is a research area studied in NLP (Natural Language Processing) and text mining; it studies methods of identifying authors by analyzing sentences, texts, and documents in a specific language. Attack mail is created by the attacker. Therefore, by analyzing the contents of the mail and the attached document file and identifying the corresponding author, it is possible to discover features that distinguish it more clearly from normal mail and to improve detection accuracy. In this paper, we propose the IADA2 (Intelligent Attack mail Detection based on Authorship Analysis) model for attack mail detection. We construct feature vectors that can classify and detect attack mail from the features used in existing machine learning-based spam detection models and the features used in authorship analysis of documents, and build the IADA2 detection model on them. We improve on attack mail detection models that simply detect term features by applying n-grams to extract features that reflect the sequence characteristics of words. Experimental results show that the proposed method improves performance depending on feature combinations, feature selection techniques, and the choice of appropriate models.
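
As a rough illustration of n-gram features that reflect the order of words, here is a minimal sketch. It is not the IADA2 feature pipeline; the whitespace tokenizer, the function names, and the sample mail text are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def word_ngrams(text: str, n: int) -> List[Tuple[str, ...]]:
    """Split on whitespace and return every run of n consecutive words."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(text: str, n: int = 2) -> Counter:
    """Count each word n-gram; the counts can serve as a sparse feature vector."""
    return Counter(word_ngrams(text, n))


# Hypothetical mail body.
mail = "please open the attached file and open the attached invoice"
features = ngram_features(mail, 2)
print(features.most_common(2))
```

Unlike single-term counts, the bigram ("open", "the") records that the two words appeared next to each other, which is the kind of sequence information the abstract refers to.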

Decomposition of a Text Block into Words Using Projection Profiles, Gaps and Special Symbols (투영 프로파일, GaP 및 특수 기호를 이용한 텍스트 영역의 어절 단위 분할)

  • Jeong Chang Bu;Kim Soo Hyung
    • Journal of KIISE:Software and Applications / v.31 no.9 / pp.1121-1130 / 2004
  • This paper proposes a method for line and word segmentation of machine-printed text blocks. To separate a text region into lines, it analyzes the horizontal projection profile and performs a recursive projection profile cut. For word segmentation, gaps in the text line are first found by connected component analysis, and between-word gaps are then identified by a hierarchical clustering method. In addition, a special symbol detection technique uses morphological features to find two types of special symbols lying between words. An experiment with 84 text regions from English and Korean documents shows that the proposed method achieves 99.92% word segmentation accuracy, while the commercial OCR software Armi 6.0 Pro™ achieves 97.58%.
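
A single, non-recursive pass of line segmentation by horizontal projection profile can be sketched as follows. This is a simplified illustration, not the paper's method: the binary image format (1 for ink, 0 for background), the `min_pixels` parameter, and the sample page are assumptions, and the recursive cut and gap clustering steps are not implemented.

```python
from typing import List, Tuple


def horizontal_profile(image: List[List[int]]) -> List[int]:
    """Number of ink pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in image]


def segment_lines(image: List[List[int]],
                  min_pixels: int = 1) -> List[Tuple[int, int]]:
    """Return (start_row, end_row) pairs, end exclusive, for runs of inked rows."""
    lines = []
    start = None
    for y, count in enumerate(horizontal_profile(image)):
        if count >= min_pixels and start is None:
            start = y                      # a text line begins
        elif count < min_pixels and start is not None:
            lines.append((start, y))       # a blank row ends the line
            start = None
    if start is not None:                  # text runs to the bottom edge
        lines.append((start, len(image)))
    return lines


# Hypothetical 7x6 page with two text lines separated by blank rows.
page = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 1],
    [0, 1, 1, 0, 1, 0],
]
print(segment_lines(page))
```

Applying the same profile idea to the columns of one extracted line yields the candidate gaps between characters and words.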

A Tensor Space Model based Deep Neural Network for Automated Text Classification (자동문서분류를 위한 텐서공간모델 기반 심층 신경망)

  • Lim, Pu-reum;Kim, Han-joon
    • Database Research / v.34 no.3 / pp.3-13 / 2018
  • Text classification is a text mining technology that assigns a given textual document to its appropriate categories, and it is used in various fields such as spam email detection, news classification, question answering, sentiment analysis, and chatbots. In general, text classification systems use machine learning algorithms; among them, naïve Bayes and support vector machines, which are well suited to text data, are known to perform reasonably well. Recently, with the development of deep learning technology, several studies have applied deep neural networks such as recurrent neural networks (RNN) and convolutional neural networks (CNN) to improve the performance of text classification systems. However, current text classification techniques have not yet reached a perfect level of classification. This paper focuses on the fact that text data is expressed as a vector with word dimensions only, which impairs the semantic information inherent in the text, and proposes a neural network architecture based on the semantic tensor space model.

An Ensemble Classifier Based Method to Select Optimal Image Features for License Plate Recognition (차량 번호판 인식을 위한 앙상블 학습기 기반의 최적 특징 선택 방법)

  • Jo, Jae-Ho;Kang, Dong-Joong
    • The Transactions of The Korean Institute of Electrical Engineers / v.65 no.1 / pp.142-149 / 2016
  • This paper proposes a method to detect the license plates (LPs) of vehicles in indoor and outdoor parking lots. Many conventional methods exist for detecting LPs in restricted environments. However, it is difficult to detect LPs in natural and complex scenes with background clutter, because complicated backgrounds always contain several patterns similar to text or LPs. To verify the performance of LP text detection in natural images, we combine the MB-LGP feature with an ensemble machine learning algorithm in order to select a small number of optimal features from a huge pool. Feature selection is performed by an adaptive boosting algorithm, which achieves a low false positive detection ratio and short computing time when combined with a cascade approach. MSER is used to provide initial text regions of the vehicle LP. In experiments using real images, the proposed method robustly extracts LPs in natural scenes as well as in controlled environments.

Integrated Method for Text Detection in Natural Scene Images

  • Zheng, Yang;Liu, Jie;Liu, Heping;Li, Qing;Li, Gen
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.11 / pp.5583-5604 / 2016
  • In this paper, we present a novel image operator to extract textual information in natural scene images. First, a powerful refiner called the Stroke Color Extension, which extends the widely used Stroke Width Transform by incorporating color information of strokes, is proposed to achieve significantly enhanced performance on intra-character connection and non-character removal. Second, a character classifier is trained using gradient features. The classifier not only eliminates non-character components but also retains a large number of characters. Third, an effective extractor called the Character Color Transform combines color information of characters and geometry features. It is used to extract potential characters that were not correctly extracted in the previous steps. Fourth, a Convolutional Neural Network model is used to verify text candidates, improving the performance of text detection. The proposed technique is tested on two public datasets, the ICDAR2011 and ICDAR2013 datasets. The experimental results show that our approach achieves state-of-the-art performance.