• Title/Summary/Keyword: Word Detection

Search Result 220, Processing Time 0.023 seconds

GCNXSS: An Attack Detection Approach for Cross-Site Scripting Based on Graph Convolutional Networks

  • Pan, Hongyu;Fang, Yong;Huang, Cheng;Guo, Wenbo;Wan, Xuelin
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.12
    • /
    • pp.4008-4023
    • /
    • 2022
  • Since machine learning was introduced into cross-site scripting (XSS) attack detection, many researchers have conducted related studies and achieved significant results, such as saving time and labor costs by not maintaining a rule database, which is required by traditional XSS attack detection methods. However, this topic came across some problems, such as poor generalization ability, significant false negative rate (FNR) and false positive rate (FPR). Moreover, the automatic clustering property of graph convolutional networks (GCN) has attracted the attention of researchers. In the field of natural language process (NLP), the results of graph embedding based on GCN are automatically clustered in space without any training, which means that text data can be classified just by the embedding process based on GCN. Previously, other methods required training with the help of labeled data after embedding to complete data classification. With the help of the GCN auto-clustering feature and labeled data, this research proposes an approach to detect XSS attacks (called GCNXSS) to mine the dependencies between the units that constitute an XSS payload. First, GCNXSS transforms a URL into a word homogeneous graph based on word co-occurrence relationships. Then, GCNXSS inputs the graph into the GCN model for graph embedding and gets the classification results. Experimental results show that GCNXSS achieved successful results with accuracy, precision, recall, F1-score, FNR, FPR, and predicted time scores of 99.97%, 99.75%, 99.97%, 99.86%, 0.03%, 0.03%, and 0.0461ms. Compared with existing methods, GCNXSS has a lower FNR and FPR with stronger generalization ability.

Design of a Korean Speech Recognition Platform (한국어 음성인식 플랫폼의 설계)

  • Kwon Oh-Wook;Kim Hoi-Rin;Yoo Changdong;Kim Bong-Wan;Lee Yong-Ju
    • MALSORI
    • /
    • no.51
    • /
    • pp.151-165
    • /
    • 2004
  • For educational and research purposes, a Korean speech recognition platform is designed. It is based on an object-oriented architecture and can be easily modified so that researchers can readily evaluate the performance of a recognition algorithm of interest. This platform will save development time for many who are interested in speech recognition. The platform includes the following modules: Noise reduction, end-point detection, met-frequency cepstral coefficient (MFCC) and perceptually linear prediction (PLP)-based feature extraction, hidden Markov model (HMM)-based acoustic modeling, n-gram language modeling, n-best search, and Korean language processing. The decoder of the platform can handle both lexical search trees for large vocabulary speech recognition and finite-state networks for small-to-medium vocabulary speech recognition. It performs word-dependent n-best search algorithm with a bigram language model in the first forward search stage and then extracts a word lattice and restores each lattice path with a trigram language model in the second stage.

  • PDF

A Study on the Synchronous Signal Detection and Error Correction in Radio Data System (RDS 수신 시스템에서 동기식 신호복원과 에러정정에 관한 연구)

  • 김기근;류흥균
    • Journal of the Korean Institute of Telematics and Electronics A
    • /
    • v.29A no.8
    • /
    • pp.1-9
    • /
    • 1992
  • Radio data system is a next-generation broadcasting system of digital information communication which multiplexes the digital data into the FM stereo signal in VHF/FM band and provides important and convenient service features. And radio data are composed of groups which are divided into 4 blocks with information word and check word. In this paper, radio data receiver is developed which recovers and process radio data to provide services. Then we confirm that 7dB SNR is required to be 10S0-5TBER of demodulation. Deconding process of shortened-cyclic-decoder has been simulated by computer. Also, the time-compression (by 16 times) method has been adopted for the RDS features post-processing. Via the error probability calculation, simulation and experimentation, the developed receiver system is proved to satisfy the system specification of EBU and implemented by general logic gates and analog circuits.

  • PDF

Profane or Not: Improving Korean Profane Detection using Deep Learning

  • Woo, Jiyoung;Park, Sung Hee;Kim, Huy Kang
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.1
    • /
    • pp.305-318
    • /
    • 2022
  • Abusive behaviors have become a common issue in many online social media platforms. Profanity is common form of abusive behavior in online. Social media platforms operate the filtering system using popular profanity words lists, but this method has drawbacks that it can be bypassed using an altered form and it can detect normal sentences as profanity. Especially in Korean language, the syllable is composed of graphemes and words are composed of multiple syllables, it can be decomposed into graphemes without impairing the transmission of meaning, and the form of a profane word can be seen as a different meaning in a sentence. This work focuses on the problem of filtering system mis-detecting normal phrases with profane phrases. For that, we proposed the deep learning-based framework including grapheme and syllable separation-based word embedding and appropriate CNN structure. The proposed model was evaluated on the chatting contents from the one of the famous online games in South Korea and generated 90.4% accuracy.

Word Extraction from Table Regions in Document Images (문서 영상 내 테이블 영역에서의 단어 추출)

  • Jeong, Chang-Bu;Kim, Soo-Hyung
    • The KIPS Transactions:PartB
    • /
    • v.12B no.4 s.100
    • /
    • pp.369-378
    • /
    • 2005
  • Document image is segmented and classified into text, picture, or table by a document layout analysis, and the words in table regions are significant for keyword spotting because they are more meaningful than the words in other regions. This paper proposes a method to extract words from table regions in document images. As word extraction from table regions is practically regarded extracting words from cell regions composing the table, it is necessary to extract the cell correctly. In the cell extraction module, table frame is extracted first by analyzing connected components, and then the intersection points are extracted from the table frame. We modify the false intersections using the correlation between the neighboring intersections, and extract the cells using the information of intersections. Text regions in the individual cells are located by using the connected components information that was obtained during the cell extraction module, and they are segmented into text lines by using projection profiles. Finally we divide the segmented lines into words using gap clustering and special symbol detection. The experiment performed on In table images that are extracted from Korean documents, and shows $99.16\%$ accuracy of word extraction.

Linguistic Features Discrimination for Social Issue Risk Classification (사회적 이슈 리스크 유형 분류를 위한 어휘 자질 선별)

  • Oh, Hyo-Jung;Yun, Bo-Hyun;Kim, Chan-Young
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.5 no.11
    • /
    • pp.541-548
    • /
    • 2016
  • The use of social media is already essential as a source of information for listening user's various opinions and monitoring. We define social 'risks' that issues effect negative influences for public opinion in social media. This paper aims to discriminate various linguistic features and reveal their effects for building an automatic classification model of social risks. Expecially we adopt a word embedding technique for representation of linguistic clues in risk sentences. As a preliminary experiment to analyze characteristics of individual features, we revise errors in automatic linguistic analysis. At the result, the most important feature is NE (Named Entity) information and the best condition is when combine basic linguistic features. word embedding, and word clusters within core predicates. Experimental results under the real situation in social bigdata - including linguistic analysis errors - show 92.08% and 85.84% in precision respectively for frequent risk categories set and full test set.

A Detection Algorithm for Pulse Repetition Interval Sequence of Radar Signals based on Finite State Machine (유한 상태 머신 기반 레이더 신호의 펄스 반복 주기 검출 알고리즘)

  • Park, Sang-Hwan;Ju, Young-Kwan;Kim, Kwan-Tae;Jeon, Joongnam
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.53 no.7
    • /
    • pp.85-91
    • /
    • 2016
  • Typically, radar systems change the pulse repetition interval of their modulated signal in order to avoid detection. On the other hand the radar-signal detection system tries to detect the modulation pattern. The histogram or auto-correlation methods are usually used to detect the PRI pattern of the radar signal. However these methods tend to lost the sequence information of the PRI pulses. This paper proposes a PRI-sequence detection algorithm based on the finite-state machine that could detect not only the PRI pattern but also their sequence.

A study on Countermeasures by Detecting Trojan-type Downloader/Dropper Malicious Code

  • Kim, Hee Wan
    • International Journal of Advanced Culture Technology
    • /
    • v.9 no.4
    • /
    • pp.288-294
    • /
    • 2021
  • There are various ways to be infected with malicious code due to the increase in Internet use, such as the web, affiliate programs, P2P, illegal software, DNS alteration of routers, word processor vulnerabilities, spam mail, and storage media. In addition, malicious codes are produced more easily than before through automatic generation programs due to evasion technology according to the advancement of production technology. In the past, the propagation speed of malicious code was slow, the infection route was limited, and the propagation technology had a simple structure, so there was enough time to study countermeasures. However, current malicious codes have become very intelligent by absorbing technologies such as concealment technology and self-transformation, causing problems such as distributed denial of service attacks (DDoS), spam sending and personal information theft. The existing malware detection technique, which is a signature detection technique, cannot respond when it encounters a malicious code whose attack pattern has been changed or a new type of malicious code. In addition, it is difficult to perform static analysis on malicious code to which code obfuscation, encryption, and packing techniques are applied to make malicious code analysis difficult. Therefore, in this paper, a method to detect malicious code through dynamic analysis and static analysis using Trojan-type Downloader/Dropper malicious code was showed, and suggested to malicious code detection and countermeasures.

A Study of Energy Parameter without Windowing Influence in Speech Signal (윈도우의 영향이 제거된 에너지 파라미터에 관한 연구)

  • 조태수;신동성;배명진
    • Proceedings of the IEEK Conference
    • /
    • 2001.06d
    • /
    • pp.277-280
    • /
    • 2001
  • The preprocessing is very important course in speech signal processing. It influence the compression-rate in speech coding and the recognition-rate in speech recognition etc. In this paper, we propose that minimizing window-influence method with pitch period and start points. The proposed method is available for voiced detection and word labeling.

  • PDF

Design of New Channel Codes for Speed Up Coding Procedure (코딩 속도향상을 위한 채널 코드의 설계)

  • 공형윤;이창희
    • Proceedings of the IEEK Conference
    • /
    • 2000.06a
    • /
    • pp.5-8
    • /
    • 2000
  • In this paper, we present a new cぉnet coding method, so called MLC (Multi-Level Codes), for error detection and correction in digital wireless communication. MLC coding method we the same coding procedure wed in the convolutional coding but it is distinguished from the existing convolutional coding in point of generating the code word by using multi-level information data (M-ary signal) and in point of speed of coding procedure Through computer simulation, we analyze the performance of the coding method suggested here compared to convolutional coding method in case of modulo-operation and in case of non-binary coding Procedure respectively under various channel environments.

  • PDF