• Title/Summary/Keyword: Image SPAM


Spam Image Detection Model based on Deep Learning for Improving Spam Filter

  • Seong-Guk Nam;Dong-Gun Lee;Yeong-Seok Seo
    • Journal of Information Processing Systems / v.19 no.3 / pp.289-301 / 2023
  • Due to the development and dissemination of modern technology, anyone can easily communicate through services such as social network services (SNS) on a personal computer (PC) or smartphone. These technologies have brought many benefits, but they have also had harmful effects, one of which is spam. Spam refers to unwanted or rejected information received by unspecified users. Continuous exposure to such information inconveniences service users, and if filtering is not performed correctly, the quality of service deteriorates. Recently, spammers have been creating more malicious spam by distorting images of spam text so that optical character recognition (OCR)-based spam filters cannot easily detect it. Fortunately, image spam circulated on social media is not yet heavily transformed. In mail systems, however, spammers (senders of spam) have applied various modifications to spam images to neutralize OCR, and the same could happen with spam images on social media. Spammers have been shown to interfere with OCR through geometric transformations such as image distortion, noise addition, and blurring. Various techniques have been studied to filter image spam, but methods that evade image spam identification by using obfuscated images also continue to develop. In this paper, we propose a deep learning-based spam image detection model that improves on the detection performance of existing OCR-based spam image detection and compensates for its vulnerabilities. The proposed model extracts text features and image features from the image using four sub-models. First, the OCR-based text model extracts text-related features, whether the image contains spam words, and a word embedding vector from the input image. Then, the convolutional neural network (CNN)-based image model extracts an image obfuscation feature and an image feature vector from the input image. Finally, a spam image classifier uses the extracted features to determine whether the input is a spam image. In terms of F1-score, the proposed model performed about 14 points better than OCR-based spam image detection.
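The model described above ends with a single classifier that consumes the text and image features together. A minimal Python sketch of that last step follows, assuming the sub-model outputs are already available as plain numbers. The `Features` container, the whitespace tokenization in `contains_spam_word`, and the logistic output layer are illustrative assumptions, not the authors' architecture.

```python
import math
from dataclasses import dataclass


@dataclass
class Features:
    """Hypothetical outputs of the four sub-models for one image."""
    text_features: list[float]    # OCR-based text-related features
    has_spam_word: float          # 1.0 if the OCR text contains a known spam word
    word_embedding: list[float]   # embedding of the OCR text
    obfuscation_score: float      # CNN estimate that the image is obfuscated
    image_embedding: list[float]  # CNN image feature vector


def contains_spam_word(ocr_text: str, spam_words: set[str]) -> float:
    """Return 1.0 if any whitespace-separated token is a listed spam word."""
    tokens = ocr_text.lower().split()
    return 1.0 if any(t in spam_words for t in tokens) else 0.0


def fuse(f: Features) -> list[float]:
    """Concatenate all sub-model outputs into one flat feature vector."""
    return (f.text_features + [f.has_spam_word] + f.word_embedding
            + [f.obfuscation_score] + f.image_embedding)


def spam_probability(x: list[float], weights: list[float], bias: float) -> float:
    """Logistic stand-in for the final spam image classifier."""
    if len(x) != len(weights):
        raise ValueError("feature and weight lengths differ")
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))
```

Because the classifier only sees a flat vector, an OCR engine or CNN could be replaced without changing the classifier's interface, provided the vector length stays the same.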

Improved Spam Filter via Handling of Text Embedded Image E-mail

  • Youn, Seongwook;Cho, Hyun-Chong
    • Journal of Electrical Engineering and Technology / v.10 no.1 / pp.401-407 / 2015
  • The increase in image spam, a kind of spam in which the text message is embedded in an attached image to defeat spam filtering techniques, is a major problem for current e-mail systems. For nearly a decade, content-based filtering using text classification or machine learning has been the dominant approach in anti-spam filtering systems. Recently, spammers have tried to defeat anti-spam filters using many techniques, and embedding text in attached images is one of them. We previously proposed an ontology-based spam filter; however, that system handles only text e-mail, while the percentage of e-mail with attached images is increasing sharply. The contribution of this paper is to add image e-mail handling to the anti-spam filtering system while keeping the advantages of the previous text-based spam filter. The proposed system also yields a low false negative value, meaning that users' legitimate e-mail is rarely classified as spam.
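The central idea is that text recovered from attached images flows into the same text filter already used for plain e-mail. Here is a minimal Python sketch of that idea. `filter_email`, `ocr`, and `text_filter` are hypothetical names: the two callables stand in for an OCR engine and for the authors' ontology-based filter, and neither is specified here.

```python
from typing import Callable, Iterable


def filter_email(body_text: str,
                 attached_images: Iterable[bytes],
                 ocr: Callable[[bytes], str],
                 text_filter: Callable[[str], bool]) -> bool:
    """Return True if the e-mail is judged to be spam.

    Text recovered from each attached image is appended to the body, so the
    existing text-based filter evaluates body and image text together.
    """
    image_texts = [ocr(image) for image in attached_images]
    combined = "\n".join([body_text, *image_texts])
    return text_filter(combined)
```

For example, with a toy `ocr` that simply decodes the bytes and a `text_filter` that looks for one keyword, an e-mail with a harmless body but a keyword inside its image is flagged. Leaving the text filter untouched is what allows the earlier text-only system's behavior to carry over.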

Experimental Verification of the Versatility of SPAM-based Image Steganalysis

  • Kim, Jaeyoung;Park, Hanhoon;Park, Jong-Il
    • Journal of Broadcast Engineering / v.23 no.4 / pp.526-535 / 2018
  • Many steganography algorithms have been studied, and steganalysis for detecting stego images (images to which steganography has been applied) has been studied in parallel. In image steganalysis in particular, features such as ALE, SPAM, and SRMQ are extracted from the statistical characteristics of the image, and stego images are classified by training classifiers with various machine learning algorithms. However, these studies did not consider the effect of image size, aspect ratio, or message-embedding rate, so the features might not work properly for images whose conditions differ from those used in the studies. In this paper, we analyze the classification rate of SPAM-based image steganalysis across various image sizes, aspect ratios, and message-embedding rates to verify its versatility.
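SPAM (Subtractive Pixel Adjacency Matrix) features model the differences between neighboring pixels as a Markov chain. The Python sketch below computes only the first-order, left-to-right transition matrix for a grayscale image given as nested lists. The full SPAM feature set averages such matrices over eight directions, and its second-order variant with T = 3 has 2 × 7³ = 686 dimensions. The function name and the default T are illustrative choices.

```python
from collections import Counter


def spam_first_order_horizontal(image: list[list[int]], T: int = 4) -> list[list[float]]:
    """Left-to-right first-order SPAM transition matrix of a grayscale image.

    D[i][j] = I[i][j] - I[i][j+1] is clipped to [-T, T], and
    M[u + T][v + T] estimates Pr(D[i][j+1] = u | D[i][j] = v).
    """
    def clip(d: int) -> int:
        return max(-T, min(T, d))

    pair_counts: Counter = Counter()   # (u, v) -> count of consecutive difference pairs
    cond_counts: Counter = Counter()   # v -> count of pairs whose first difference is v
    for row in image:
        diffs = [clip(row[j] - row[j + 1]) for j in range(len(row) - 1)]
        for v, u in zip(diffs, diffs[1:]):
            pair_counts[(u, v)] += 1
            cond_counts[v] += 1

    size = 2 * T + 1
    M = [[0.0] * size for _ in range(size)]
    for (u, v), count in pair_counts.items():
        M[u + T][v + T] = count / cond_counts[v]
    return M
```

Clipping to [-T, T] keeps the matrix small: most natural-image differences are near zero, and embedding changes mainly perturb those small differences.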

A Chinese Spam Filter Using Keyword and Text-in-Image Features

  • Chen, Ying-Nong;Wang, Cheng-Tzu;Lo, Chih-Chung;Han, Chin-Chuan;Fana, Kuo-Chin
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2009.01a / pp.32-37 / 2009
  • Electronic mail (e-mail) has become one of the most popular means of communication in our society, and in this environment spam increasingly congests the Internet. In this paper, Chinese spam is detected effectively using text and image features. For the text features, keywords and reference templates in Chinese mail are selected automatically using a genetic algorithm (GA). In addition, spam containing a promotional image is filtered out by detecting the text characters in the image. Experimental results are given to show the effectiveness of the proposed method.
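Keyword selection with a genetic algorithm can be sketched in Python as follows. The sketch assumes each chromosome is a bit mask over candidate keywords and that fitness is the accuracy of the rule "spam if any selected keyword occurs". The abstract does not specify the encoding, operators, or parameters, so all of them are assumptions for illustration, and reference-template selection is not covered. Substring matching is used because Chinese text has no spaces between words.

```python
import random


def fitness(mask: list[int], keywords: list[str], mails: list[tuple[str, bool]]) -> float:
    """Accuracy of the rule 'spam if the mail contains any selected keyword'."""
    selected = [k for k, bit in zip(keywords, mask) if bit]
    correct = sum(any(k in text for k in selected) == is_spam for text, is_spam in mails)
    return correct / len(mails)


def select_keywords(keywords: list[str], mails: list[tuple[str, bool]],
                    pop_size: int = 20, generations: int = 30,
                    mutation_rate: float = 0.1, seed: int = 0) -> tuple[list[str], float]:
    """Evolve keyword bit masks; returns (selected keywords, training accuracy).

    pop_size must be at least 4 so that two parents can always be sampled.
    """
    rng = random.Random(seed)
    n = len(keywords)

    def score(mask: list[int]) -> float:
        return fitness(mask, keywords, mails)

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]          # truncation selection (elitist)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n) if n > 1 else 0  # one-point crossover
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in a[:cut] + b[cut:]]       # bit-flip mutation
            children.append(child)
        population = parents + children
    best = max(population, key=score)
    return [k for k, bit in zip(keywords, best) if bit], score(best)
```

Keeping the better half of each generation unchanged means that the best training accuracy never decreases from one generation to the next.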


Image Forensic Decision Algorithm using Edge Energy Information of Forgery Image

  • Rhee, Kang Hyeon
    • Journal of the Institute of Electronics and Information Engineers / v.51 no.3 / pp.75-81 / 2014
  • In the distribution of digital images, a serious problem is that pirates distribute illegally forged images. To address this problem, this paper proposes an image forensic decision algorithm that uses the edge energy information of forged images. The algorithm uses SA (Streaking Artifacts) and SPAM (Subtractive Pixel Adjacency Matrix) to extract edge energy information from the original image at several JPEG compression rates (QF = 90, 70, 50, and 30) and from the query image. It then decides whether the query image is forged by comparing the edge information of the original and query images. The edge information of the original and query images is matched against each threshold in TCJCR (Threshold by Combination of JPEG Compression Ratios). In the matching experiments, TP (True Positive) and FN (False Negative) were 87.2% and 13.8%, respectively, and the minimum average decision error was 0.1349. The proposed algorithm is also rated 'Excellent (A)', since its AUROC (Area Under the Receiver Operating Characteristic curve), computed from sensitivity and 1-specificity, is 0.9388.
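The evaluation figures in this abstract (AUROC from sensitivity and 1-specificity, and the minimum average decision error) can be computed from any detector's scores. The Python sketch below assumes that higher scores mean "forged". It uses the common definition of the minimum average decision error: the minimum over thresholds of (false-positive rate + false-negative rate) / 2. It does not reproduce the edge-energy comparison itself, and the function names are illustrative.

```python
def roc_points(pos_scores: list[float], neg_scores: list[float]) -> list[tuple[float, float]]:
    """(1 - specificity, sensitivity) pairs, sweeping the threshold from high to low.

    A sample is labeled 'forged' when its score is >= the threshold.
    """
    points = [(0.0, 0.0)]
    for t in sorted(set(pos_scores) | set(neg_scores), reverse=True):
        sensitivity = sum(s >= t for s in pos_scores) / len(pos_scores)
        fall_out = sum(s >= t for s in neg_scores) / len(neg_scores)  # 1 - specificity
        points.append((fall_out, sensitivity))
    return points


def auc(points: list[tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def min_average_error(points: list[tuple[float, float]]) -> float:
    """Minimum over thresholds of (false-positive rate + false-negative rate) / 2."""
    return min((fpr + (1.0 - tpr)) / 2 for fpr, tpr in points)
```

A perfectly separating detector gives an AUROC of 1.0 and a minimum average decision error of 0.0. A detector that assigns identical scores to both classes gives 0.5 for both measures.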

Extraction of Text Regions from Spam-Mail Images Using Color Layers

  • Kim Ji-Soo;Kim Soo-Hyung;Han Seung-Wan;Nam Taek-Yong;Son Hwa-Jeong;Oh Sung-Ryul
    • The KIPS Transactions:PartB / v.13B no.4 s.107 / pp.409-416 / 2006
  • In this paper, we propose an algorithm for extracting text regions from spam-mail images using color layers. The CLTE (color layer-based text extraction) algorithm divides the input image into eight planes, called color layers. It extracts connected components from each of the eight planes and then classifies them into text and non-text regions based on component size. We also propose an algorithm for recovering damaged text strokes in the extracted text image. In the binary image, there are two types of damaged strokes: (1) middle strokes such as 'ㅣ' or 'ㅡ' are deleted, and (2) the first and/or last strokes such as 'ㅇ' or 'ㅁ' are filled with black pixels. An experiment with 200 spam-mail images shows that the proposed approach is over 10% more accurate than conventional methods.
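The abstract does not say how the eight color layers are formed. One plausible reading, used here purely as an assumption, is to threshold each of the R, G, and B channels at 128, which gives 2³ = 8 layers. The Python sketch labels 4-connected components within each layer and keeps those whose pixel count falls within hypothetical size bounds as text candidates. The stroke-recovery step is not shown.

```python
from collections import deque


def color_layer(pixel: tuple[int, int, int], threshold: int = 128) -> int:
    """Map an RGB pixel to one of 8 layers by thresholding each channel (assumption)."""
    r, g, b = pixel
    return (int(r >= threshold) << 2) | (int(g >= threshold) << 1) | int(b >= threshold)


def text_components(image, min_size: int = 2, max_size: int = 500):
    """Return (layer, pixel list) for 4-connected components whose size looks text-like."""
    h, w = len(image), len(image[0])
    layer = [[color_layer(image[y][x]) for x in range(w)] for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    candidates = []
    for y in range(h):
        for x in range(w):
            if seen[y][x]:
                continue
            k = layer[y][x]
            component, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:                      # breadth-first flood fill within layer k
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and layer[ny][nx] == k:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if min_size <= len(component) <= max_size:   # size-based text/non-text rule
                candidates.append((k, component))
    return candidates
```

Each pixel belongs to exactly one layer, so flood-filling pixels that share a layer label gives the same components as running connected-component labeling separately on each of the eight binary planes.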

Downscaling Forgery Detection using Pixel Value's Gradients of Digital Image

  • RHEE, Kang Hyeon
    • Journal of the Institute of Electronics and Information Engineers / v.53 no.2 / pp.47-52 / 2016
  • Digital images used on smart devices and small displays are often downscaled. In this paper, downscaling image forgery is detected using a feature vector derived from pixel-value gradients. In the proposed algorithm, AR (autoregressive) coefficients are computed from the pixel-value gradients of the image, and these coefficients are used as feature vectors to train an SVM (Support Vector Machine) classifier that serves as the downscaling forgery detector. At a 90% downscaling ratio, the proposed algorithm performs better than the MFR (Median Filter Residual) scheme, which uses feature vectors of the same dimension (10), and the 686-dimensional SPAM (Subtractive Pixel Adjacency Matrix) scheme. It also achieves a higher detection ratio on average-filtered ($3{\times}3$) and median-filtered ($3{\times}3$) images. In particular, for average and median filtering ($3{\times}3$), the AUC (Area Under Curve) computed from sensitivity and 1-specificity approaches 1 for all measured items. Thus, the proposed algorithm is graded 'Excellent (A)'.
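AR coefficients of a 1-D signal are usually obtained by solving the Yule-Walker equations, and the Levinson-Durbin recursion is a standard way to solve them. The Python sketch below applies this to horizontal pixel-value gradients. Three choices are assumptions: processing the image row by row, concatenating the rows into one signal, and using order 10 to match the 10-dimensional feature size mentioned in the abstract. The resulting vector would then be passed to an SVM, which is not shown.

```python
def autocorrelation(x: list[float], max_lag: int) -> list[float]:
    """Biased autocorrelation estimates r[0..max_lag] of a mean-removed signal."""
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    return [sum(xc[i] * xc[i + k] for i in range(n - k)) / n for k in range(max_lag + 1)]


def levinson_durbin(r: list[float], order: int) -> list[float]:
    """Solve the Yule-Walker equations for a[1..order] in x[t] ~ sum_k a[k] * x[t-k]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0:                      # degenerate (e.g. constant) signal: stop early
            break
        kappa = (r[m] - sum(a[k] * r[m - k] for k in range(1, m))) / err  # reflection coeff.
        new_a = a[:]
        new_a[m] = kappa
        for k in range(1, m):
            new_a[k] = a[k] - kappa * a[m - k]
        a = new_a
        err *= 1.0 - kappa * kappa        # prediction error shrinks at each order
    return a[1:]


def gradient_ar_features(image: list[list[int]], order: int = 10) -> list[float]:
    """AR coefficients of horizontal pixel-value gradients (row-wise differences)."""
    gradients = [float(row[j + 1] - row[j]) for row in image for j in range(len(row) - 1)]
    return levinson_durbin(autocorrelation(gradients, order), order)
```

The recursion solves the Toeplitz system in O(p²) operations for order p, so computing a 10-dimensional feature costs little compared with computing the gradients.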

Forensic Classification of Median Filtering by Hough Transform of Digital Image

  • RHEE, Kang Hyeon
    • Journal of the Institute of Electronics and Information Engineers / v.54 no.5 / pp.42-47 / 2017
  • Median filtering can be used to forge distributed digital images, which is a serious problem. This paper proposes an image forensic detection algorithm for classifying median filtering (MF), based on a 42-dimensional feature vector. The edges of the forged image are processed by the Hough transform, with 32, 64, and 128 detected edges, respectively, and features are extracted from the start and end point coordinates of the resulting Hough lines. The Hough peaks in the angle-distance plane are also extracted, and the two sets of features are combined into the proposed feature vector. This 42-dimensional feature vector is used to train an SVM (Support Vector Machine) classifier for MF classification of forged images. The results of the proposed MF detection algorithm are compared with those of the 10-dimensional MFR and the 686-dimensional SPAM schemes. The MF forensic classification ratio is confirmed to be above 99% for all test image types: unaltered, average-filtered ($3{\times}3$), JPEG-compressed (QF = 90 and 70), and Gaussian-filtered ($3{\times}3$ and $5{\times}5$) images.
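The Hough transform maps each edge point to a sinusoid in the angle-distance plane, rho = x*cos(theta) + y*sin(theta), so collinear points vote for the same cell. This minimal Python accumulator assumes 1° angle steps and 1-pixel distance bins. The paper's extraction of line start and end points and its exact 42-dimensional feature layout are not reproduced.

```python
import math
from collections import Counter


def hough_accumulator(edge_points, theta_step_deg: int = 1) -> Counter:
    """Vote in the (theta in degrees, rho in pixels) plane for each edge point."""
    acc: Counter = Counter()
    for x, y in edge_points:
        for theta in range(0, 180, theta_step_deg):
            t = math.radians(theta)
            rho = round(x * math.cos(t) + y * math.sin(t))   # 1-pixel rho bins
            acc[(theta, rho)] += 1
    return acc


def hough_peaks(acc: Counter, num_peaks: int) -> list[tuple[int, int]]:
    """The num_peaks cells with the most votes (ties keep first-seen order)."""
    return [cell for cell, _ in acc.most_common(num_peaks)]
```

For points on the vertical line x = 5, the cell (theta = 0°, rho = 5) collects one vote per point. With coarse bins, neighboring angles can tie with it, which is one reason peak detectors usually suppress cells near an accepted peak.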

On the Security of Image-based CAPTCHA using Multi-image Composition

  • Byun, Je-Sung;Kang, Jeon-Il;Nyang, Dae-Hun;Lee, Kyung-Hee
    • Journal of the Korea Institute of Information Security & Cryptology / v.22 no.4 / pp.761-770 / 2012
  • CAPTCHAs (Completely Automated Public Turing tests to tell Computers and Humans Apart) have been widely used to prevent automated attacks such as spam mail and DDoS attacks. In the early stages, text-based CAPTCHAs made by distorting random characters were mainly used to frustrate automated bots. However, many studies showed that text-based CAPTCHAs can be broken with AI or image processing techniques. For this reason, image-based CAPTCHAs, which use images instead of text, have been considered and proposed. In many image-based CAPTCHAs, however, a huge number of source images is required to guarantee a reasonable level of security. In 2008, Kang et al. proposed a new image-based CAPTCHA whose test images are composed of multiple source images, reducing the number of source images required while maintaining the security level. In their paper, the authors demonstrated the usability of their CAPTCHA through a user study, but they did not verify its security level. In this paper, we verify the security of the image-based CAPTCHA proposed by Kang et al. by performing several attacks in various scenarios, and we consider other attacks that could occur in the real world.

Probabilistic Anatomical Labeling of Brain Structures Using Statistical Probabilistic Anatomical Maps

  • Kim, Jin-Su;Lee, Dong-Soo;Lee, Byung-Il;Lee, Jae-Sung;Shin, Hee-Won;Chung, June-Key;Lee, Myung-Chul
    • The Korean Journal of Nuclear Medicine / v.36 no.6 / pp.317-324 / 2002
  • Purpose: The statistical parametric mapping (SPM) program is increasingly used to analyze brain PET and SPECT images. SPM uses Montreal Neurological Institute (MNI) coordinates as its standard anatomical framework. Although most researchers consult the Talairach atlas to report the localization of activations detected by SPM, there is a significant disparity between the MNI templates and the Talairach atlas. This disparity between Talairach and MNI coordinates makes interpreting SPM results time-consuming, subjective, and inaccurate. The purpose of this study was to develop a program that provides objective anatomical information for each x-y-z position in ICBM coordinates. Materials and Methods: The program was designed to provide anatomical information for a given x-y-z position in MNI coordinates based on the Statistical Probabilistic Anatomical Map (SPAM) images of ICBM. When an x-y-z position was given to the program, it tabulated the names of the anatomical structures with non-zero probability and the probability that the position belongs to each structure. The program was coded in IDL and Java for easy porting to any operating system or platform. Its utility was shown by comparing its results with those of the SPM program. A preliminary validation study was performed by applying the program to the analysis of a PET brain activation study of human memory in which the anatomical information on the activated areas was previously known. Results: The programs achieved real-time retrieval of probabilistic information with 1 mm spatial resolution. The validation study showed the relevance of the program: the probability that the activated area for memory belonged to the hippocampal formation was more than 80%. Conclusion: These programs will be useful for interpreting the results of image analyses performed in MNI coordinates, as done in the SPM program.
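The lookup the program performs can be sketched in Python as follows. The sketch assumes that each structure's probability map is a 3-D array on a 1 mm grid and that the voxel index of the MNI origin (0, 0, 0) is known. `label_position`, the nested-list volumes, and `origin_voxel` are hypothetical. Real SPAM images would be read from files, and millimeter coordinates would be mapped to voxels through the image's affine transform, which may include axis flips that are ignored here.

```python
def label_position(xyz_mm, prob_maps, origin_voxel):
    """List (structure, probability) pairs with non-zero probability at an MNI point.

    prob_maps maps a structure name to a 3-D nested list of probabilities on a
    1 mm grid; origin_voxel is the (i, j, k) index of MNI (0, 0, 0) (assumed known).
    Results are sorted from most to least probable.
    """
    i, j, k = (round(c) + o for c, o in zip(xyz_mm, origin_voxel))
    rows = []
    for name, vol in prob_maps.items():
        inside = 0 <= i < len(vol) and 0 <= j < len(vol[i]) and 0 <= k < len(vol[i][j])
        if inside and vol[i][j][k] > 0:
            rows.append((name, vol[i][j][k]))
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Positions outside the map return an empty table rather than raising an error. The explicit bounds check also prevents Python's negative indexing from silently wrapping to the far side of the volume.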