• Title/Summary/Keyword: Text Spam

Search Result 37, Processing Time 0.022 seconds

An Image-based CAPTCHA System with Correction of Sub-images (서브 이미지의 교정을 통한 이미지 기반의 CAPTCHA 시스템)

  • Chung, Woo-Keun;Ji, Seung-Hyun;Cho, Hwan-Gue
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.16 no.8
    • /
    • pp.873-877
    • /
    • 2010
  • CAPTCHA is a security tool that prevents the automatic sign-up by a spam or a robot. This CAPTCHA usually depends on the smart readability of humans. However, the common and plain CAPTCHA with text-based system is not difficult to be solved by intelligent web-bot and machine learning tools. In this paper, we propose a new sub-image based CAPTCHA system totally different from the text based system. Our system offers a set of cropped sub-image from a whole digital picture and asks user to identify the correct orientation. Though there are some nice machine learning tools for this job, but they are useless for a cropped sub-images, which was clearly revealed by our experiment. Experiment showed that our sub-image based CAPTCHA is easy to human solver, but very hard to all kinds of machine learning or AI tools. Also our CAPTCHA is easy to be generated automatical without any human intervention.

Automatic Inter-Phoneme Similarity Calculation Method Using PAM Matrix Model (PAM 행렬 모델을 이용한 음소 간 유사도 자동 계산 기법)

  • Kim, Sung-Hwan;Cho, Hwan-Gue
    • The Journal of the Korea Contents Association
    • /
    • v.12 no.3
    • /
    • pp.34-43
    • /
    • 2012
  • Determining the similarity between two strings can be applied various area such as information retrieval, spell checker and spam filtering. Similarity calculation between Korean strings based on dynamic programming methods firstly requires a definition of the similarity between phonemes. However, existing methods have a limitation that they use manually set similarity scores. In this paper, we propose a method to automatically calculate inter-phoneme similarity from a given set of variant words using a PAM-like probabilistic model. Our proposed method first finds the pairs of similar words from a given word set, and derives derivation rules from text alignment results among the similar word pairs. Then, similarity scores are calculated from the frequencies of variations between different phonemes. As an experimental result, we show an improvement of 10.1%~14.1% and 8.1%~11.8% in terms of sensitivity compared with the simple match-mismatch scoring scheme and the manually set inter-phoneme similarity scheme, respectively, with a specificity of 77.2%~80.4%.

Design and Implementation of Text Classification System based on ETOM+RPost (ETOM+RPost기반의 문서분류시스템의 설계 및 구현)

  • Choi, Yun-Jeong
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.11 no.2
    • /
    • pp.517-524
    • /
    • 2010
  • Recently, the size of online texts and textual information is increasing explosively, and the automated classification has a great potential for handling data such as news materials and images. Text classification system is based on supervised learning which needs laborous work by human expert. The main goal of this paper is to reduce the manual intervention, required for the task. The other goal is to increase accuracy to be high. Most of the documents have high complexity in contents and the high similarities in their described style. So, the classification results are not satisfactory. This paper shows the implementation of classification system based on ETOM+RPost algorithm and classification progress using SPAM data. In experiments, we verified our system with right-training documents and wrong-training documents. The experimental results show that our system has high accuracy and stability in all situation as 16% improvement in accuracy.

On the Security of Image-based CAPTCHA using Multi-image Composition (복수의 이미지를 합성하여 사용하는 캡차의 안전성 검증)

  • Byun, Je-Sung;Kang, Jeon-Il;Nyang, Dae-Hun;Lee, Kyung-Hee
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.22 no.4
    • /
    • pp.761-770
    • /
    • 2012
  • CAPTCHAs(Completely Automated Public Turing tests to tell Computer and Human Apart) have been widely used for preventing the automated attacks such as spam mails, DDoS attacks, etc.. In the early stages, the text-based CAPTCHAs that were made by distorting random characters were mainly used for frustrating automated-bots. Many researches, however, showed that the text-based CAPTCHAs were breakable via AI or image processing techniques. Due to the reason, the image-based CAPTCHAs, which employ images instead of texts, have been considered and suggested. In many image-based CAPTCHAs, however, the huge number of source images are required to guarantee a fair level of security. In 2008, Kang et al. suggested a new image-based CAPTCHA that uses test images made by composing multiple source images, to reduce the number of source images while it guarantees the security level. In their paper, the authors showed the convenience of their CAPTCHA in use through the use study, but they did not verify its security level. In this paper, we verify the security of the image-based CAPTCHA suggested by Kang et al. by performing several attacks in various scenarios and consider other possible attacks that can happen in the real world.

A Text Mining-based Intrusion Log Recommendation in Digital Forensics (디지털 포렌식에서 텍스트 마이닝 기반 침입 흔적 로그 추천)

  • Ko, Sujeong
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.2 no.6
    • /
    • pp.279-290
    • /
    • 2013
  • In digital forensics log files have been stored as a form of large data for the purpose of tracing users' past behaviors. It is difficult for investigators to manually analysis the large log data without clues. In this paper, we propose a text mining technique for extracting intrusion logs from a large log set to recommend reliable evidences to investigators. In the training stage, the proposed method extracts intrusion association words from a training log set by using Apriori algorithm after preprocessing and the probability of intrusion for association words are computed by combining support and confidence. Robinson's method of computing confidences for filtering spam mails is applied to extracting intrusion logs in the proposed method. As the results, the association word knowledge base is constructed by including the weights of the probability of intrusion for association words to improve the accuracy. In the test stage, the probability of intrusion logs and the probability of normal logs in a test log set are computed by Fisher's inverse chi-square classification algorithm based on the association word knowledge base respectively and intrusion logs are extracted from combining the results. Then, the intrusion logs are recommended to investigators. The proposed method uses a training method of clearly analyzing the meaning of data from an unstructured large log data. As the results, it complements the problem of reduction in accuracy caused by data ambiguity. In addition, the proposed method recommends intrusion logs by using Fisher's inverse chi-square classification algorithm. So, it reduces the rate of false positive(FP) and decreases in laborious effort to extract evidences manually.

A Classification Model for Attack Mail Detection based on the Authorship Analysis (작성자 분석 기반의 공격 메일 탐지를 위한 분류 모델)

  • Hong, Sung-Sam;Shin, Gun-Yoon;Han, Myung-Mook
    • Journal of Internet Computing and Services
    • /
    • v.18 no.6
    • /
    • pp.35-46
    • /
    • 2017
  • Recently, attackers using malicious code in cyber security have been increased by attaching malicious code to a mail and inducing the user to execute it. Especially, it is dangerous because it is easy to execute by attaching a document type file. The author analysis is a research area that is being studied in NLP (Neutral Language Process) and text mining, and it studies methods of analyzing authors by analyzing text sentences, texts, and documents in a specific language. In case of attack mail, it is created by the attacker. Therefore, by analyzing the contents of the mail and the attached document file and identifying the corresponding author, it is possible to discover more distinctive features from the normal mail and improve the detection accuracy. In this pager, we proposed IADA2(Intelligent Attack mail Detection based on Authorship Analysis) model for attack mail detection. The feature vector that can classify and detect attack mail from the features used in the existing machine learning based spam detection model and the features used in the author analysis of the document and the IADA2 detection model. We have improved the detection models of attack mails by simply detecting term features and extracted features that reflect the sequence characteristics of words by applying n-grams. Result of experiment show that the proposed method improves performance according to feature combinations, feature selection techniques, and appropriate models.

The Design and Implementation of a Effective web-based electronic mailing system (효율적인 웹기반 전자 우편 시스템의 설계 및 구현)

  • An, Syung-Og;Yoo, Sung-Jung;Yoo, Hyun-Ggung
    • The Journal of Engineering Research
    • /
    • v.4 no.1
    • /
    • pp.5-22
    • /
    • 2002
  • With the rapid advance of internet service and the corresponding migration of service environment from the text-based one to WWW (World Wide Web) environment, the number of internet users is growing rapidly due to its easy usage. Accordingly, usage of internet as services for sending electronic mails to the other party over the network is becoming increasingly prevalent. Web-based electronic mailing system is comprised of a server and a client. The former provides the users with e-mail accounts and services, while the latter serves as a user interface. In other words, it enables those public users who dos not own e-mail accounts on the existing mail server to have an access to the mailing service through the web. In this paper, we designed a effective web-based electronic mailing system which is based on the internet explorer and linux operating system, which overcomes limitations of the existing e-mail systems and meets the need of a cost-efficient alternative. Our electronic mailing system also supports the convenience of users through appropriate handling of preregistered spam e-mails and multiple e-mails, and this facilitates the development of a stable e-mail system by being able to avoiding the low system performance due to the bursty characteristics of e-mail messages and the increasing number of users

  • PDF