• Title/Summary/Keyword: Korean parsing


Protocol Monitor System Between Cortex M7 Based PLC And HMI

  • Kim, Ki-Su; Lee, Jong-Chan; Ha, Heon-Seong
    • Journal of the Korea Society of Computer and Information / v.25 no.6 / pp.17-23 / 2020
  • In this paper, we propose a method that lets users collect the real-time data frames exchanged over RS232 between the HMI and PLC of automation equipment without depending on any modification of the PLC or HMI systems: an MCU sniffs the data frames in real time while both devices remain unmodified. The user extracts the necessary information from the sniffed data by parsing it, and the original communication link is preserved by forwarding each sniffed frame to its destination. The MCU's UART interface circuit is physically designed according to the RS232 standard, and the MCU's internal DMA controller is used for data transfer, which is more efficient than a purely interrupt-driven design. In addition, data frame I/O is performed by logically separating the work of the DMA interrupt service routine from that of the main thread through a circular queue. With this method, the user receives the sniffed frames exchanged between the HMI and PLC in RS232 format, while frames transferred between the PLC and HMI still arrive normally at their original destinations. It was confirmed that data frames sniffed without any further modification of the PLC or HMI arrive normally at the user system.
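
The circular-queue split between the DMA interrupt service routine and the main thread can be illustrated with a minimal single-producer/single-consumer ring buffer. This is a host-side Python sketch under stated assumptions, not the authors' firmware: `on_dma_transfer_complete` and `main_loop_step` are hypothetical names, and the carriage-return frame terminator is an assumed placeholder, since the abstract does not describe the frame format.

```python
class RingBuffer:
    """Fixed-capacity circular queue with one producer (ISR) and one consumer (main loop).

    One slot is always left empty so that head == tail means "empty" and
    (head + 1) % capacity == tail means "full".
    """

    def __init__(self, capacity):
        self._buf = [0] * capacity
        self._cap = capacity
        self._head = 0  # next write index, advanced only by the producer
        self._tail = 0  # next read index, advanced only by the consumer

    def push(self, byte):
        nxt = (self._head + 1) % self._cap
        if nxt == self._tail:
            return False  # queue full: the byte is dropped
        self._buf[self._head] = byte
        self._head = nxt
        return True

    def pop(self):
        if self._tail == self._head:
            return None  # queue empty
        byte = self._buf[self._tail]
        self._tail = (self._tail + 1) % self._cap
        return byte


def on_dma_transfer_complete(chunk, queue):
    """Hypothetical ISR body: copy the DMA chunk into the queue and return quickly.

    Returns the number of bytes dropped because the queue was full.
    """
    dropped = 0
    for byte in chunk:
        if not queue.push(byte):
            dropped += 1
    return dropped


def main_loop_step(queue, forward, frames, partial, terminator=0x0D):
    """Hypothetical main-thread step: relay every byte and split a copy into frames.

    `forward` stands in for the UART toward the original destination, so the
    PLC-HMI link is preserved; `frames` collects complete frames for the user.
    The terminator byte is an assumption, not part of the original protocol description.
    """
    while (byte := queue.pop()) is not None:
        forward.append(byte)      # pass the byte through unchanged
        partial.append(byte)      # keep a sniffed copy for parsing
        if byte == terminator:
            frames.append(bytes(partial))
            partial.clear()
```

Keeping the ISR to a plain copy and doing all parsing in the main loop is the point of the separation: the interrupt path stays short and bounded, and a partially received frame simply waits in `partial` until the rest arrives.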

An Interpretable Log Anomaly System Using Bayesian Probability and Closed Sequence Pattern Mining (베이지안 확률 및 폐쇄 순차패턴 마이닝 방식을 이용한 설명가능한 로그 이상탐지 시스템)

  • Yun, Jiyoung; Shin, Gun-Yoon; Kim, Dong-Wook; Kim, Sang-Soo; Han, Myung-Mook
    • Journal of Internet Computing and Services / v.22 no.2 / pp.77-87 / 2021
  • With the development of the Internet and personal computers, diverse and complex attacks have emerged. As attacks grow more complex, signature-based detection becomes difficult, which has motivated research on behavior-based log anomaly detection. Recent work uses deep learning to learn the order of log events and performs well. Despite this performance, such models provide no explanation for their predictions. Without explanations, it is difficult to discover contaminated data or vulnerabilities in the model itself, and users consequently lose trust in the model. To address this problem, this work proposes an explainable log anomaly detection system. In this study, the logs are first parsed. Sequential rules are then extracted using Bayesian posterior probability, yielding a rule set of the form "if condition, then result, with posterior probability". A sample that matches the rule set is classified as normal; otherwise, it is classified as an anomaly. Experiments on the HDFS dataset yield an F1 score of 92.7% on the test set.
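
The "if condition, then result, with posterior probability" rules and the match-or-anomaly decision can be sketched as follows. This is a deliberately simplified stand-in, not the paper's algorithm: it uses contiguous k-event contexts instead of closed sequential pattern mining, and it scores each rule with the posterior mean of a Beta(alpha, beta) prior; `k`, `alpha`, `beta`, and `min_posterior` are hypothetical parameters.

```python
from collections import Counter


def mine_rules(sequences, k=2, alpha=1.0, beta=1.0, min_posterior=0.5):
    """Extract rules 'if the last k events are C, then the next event is R'.

    Each rule is scored with the posterior mean of P(R | C) under a
    Beta(alpha, beta) prior: (n + alpha) / (N + alpha + beta), where n counts
    C followed by R and N counts all occurrences of C with a successor.
    """
    context_counts = Counter()
    rule_counts = Counter()
    for seq in sequences:
        for i in range(k, len(seq)):
            context = tuple(seq[i - k:i])
            context_counts[context] += 1
            rule_counts[(context, seq[i])] += 1

    rules = {}
    for (context, result), n in rule_counts.items():
        posterior = (n + alpha) / (context_counts[context] + alpha + beta)
        if posterior >= min_posterior:
            rules[(context, result)] = posterior  # the rule carries its own explanation
    return rules


def is_normal(seq, rules, k=2):
    """A sequence is normal only if every transition matches a mined rule."""
    return all((tuple(seq[i - k:i]), seq[i]) in rules for i in range(k, len(seq)))
```

Because each decision reduces to specific rules and their posteriors, an anomalous sequence can be explained by pointing at the first transition with no matching rule.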

A Study on the Feature Point Extraction Methodology based on XML for Searching Hidden Vault Anti-Forensics Apps (은닉형 Vault 안티포렌식 앱 탐색을 위한 XML 기반 특징점 추출 방법론 연구)

  • Kim, Dae-gyu; Kim, Chang-soo
    • Journal of Internet Computing and Services / v.23 no.2 / pp.61-70 / 2022
  • Ordinary smartphone users often use Vault apps to protect personal content such as their photos and videos. However, criminals are increasingly using Vault app features for anti-forensic purposes, such as hiding illegal videos. These apps are available on Google Play. This paper proposes a methodology for extracting feature points through XML-based keyword frequency analysis, applying text mining techniques, in order to find the Vault apps used by criminals. The strings.xml files included in the apps were compared and analyzed for 15 apps in each of two groups: hidden Vault anti-forensics apps and non-hidden Vault apps. In the hidden Vault anti-forensics apps, hiding-related words appear more frequently after both the first and second rounds of term processing. Unlike most conventional methods, which statically analyze APK files from an engineering point of view, this paper is meaningful in that it approaches the classification of anti-forensics apps from a humanities and sociological point of view. In conclusion, applying text mining techniques through XML parsing can provide basic data for exploring hidden Vault anti-forensics apps.
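
The XML-parsing and keyword-frequency step can be sketched with Python's standard `xml.etree.ElementTree`, which reads the `<string>` elements of an Android `strings.xml` resource file. This is an illustrative sketch only: `HIDING_TERMS` is a hypothetical keyword list (the abstract does not give the paper's terms), and the simple lowercase tokenizer stands in for the paper's terminology processing.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical hiding-related vocabulary; the paper's actual term list is not reproduced here.
HIDING_TERMS = {"hide", "hidden", "secret", "private", "disguise", "fake"}


def string_tokens(xml_text):
    """Parse strings.xml content and return lowercase word tokens from every <string> element."""
    root = ET.fromstring(xml_text)
    tokens = []
    for elem in root.iter("string"):
        text = "".join(elem.itertext()).lower()
        tokens.extend(re.findall(r"[a-z]+", text))
    return tokens


def hiding_term_frequency(xml_text):
    """Return (number of hiding-related tokens, their share of all tokens)."""
    counts = Counter(string_tokens(xml_text))
    total = sum(counts.values())
    hits = sum(counts[term] for term in HIDING_TERMS)
    return hits, (hits / total if total else 0.0)
```

Comparing the returned share across the two groups of apps is one simple way to turn the frequency observation into a feature; the paper's own feature definition may differ.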

Design and Implementation of Content-based Video Database using an Integrated Video Indexing Method (통합된 비디오 인덱싱 방법을 이용한 내용기반 비디오 데이타베이스의 설계 및 구현)

  • Lee, Tae-Dong; Kim, Min-Koo
    • Journal of KIISE:Computing Practices and Letters / v.7 no.6 / pp.661-683 / 2001
  • The use of digital video has increased rapidly in recent years, making the efficient management of video databases increasingly important. High-speed data networks and digital technology have given rise to new multimedia applications, such as Internet broadcasting and Video on Demand (VOD), that combine video data processing with computing. A video database should support fast, efficient searching, which requires extracting accurate feature information from video, a medium that is more massive and more complex than traditional data. Video databases differ fundamentally from traditional databases, and these differences raise new issues in video search and data modeling, calling for new methods of generating databases and retrieving video efficiently. In this paper, we propose a method for constructing and generating a content-based video database that can store the meaningful structure of video together with prior production information. Using this method, we implemented a video database from which new content for Internet broadcasting can be produced. To support this production, we propose a video indexing method that integrates annotation-based retrieval and content-based retrieval: video feature information is extracted and retrieved using the relationship between the meaningful structure and the prior production information during video parsing and representative key-frame extraction. The integrated indexing method improves video content retrieval because it simultaneously uses content-based metadata, represented at a low level, and annotation-based metadata, expressed at a high level where feature information is difficult to extract automatically.
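
One way to picture an index that combines high-level annotations with low-level content features is a shot record holding both, queried in two stages. This is a hypothetical sketch, not the paper's schema: the `Shot` fields, the use of a normalized color histogram of the representative key frame as the low-level feature, and histogram intersection as the similarity measure are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Shot:
    """Hypothetical index entry for one parsed shot."""
    shot_id: str
    key_frame_hist: list  # low-level: normalized color histogram of the representative key frame
    annotations: set = field(default_factory=set)  # high-level: production info, scene tags, cast


def histogram_intersection(h1, h2):
    """Similarity of two normalized histograms (1.0 means identical)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def integrated_search(shots, keywords, query_hist, top_k=3):
    """Filter by annotations (high level), then rank by key-frame similarity (low level)."""
    candidates = [s for s in shots if keywords <= s.annotations]
    ranked = sorted(
        candidates,
        key=lambda s: histogram_intersection(s.key_frame_hist, query_hist),
        reverse=True,
    )
    return [s.shot_id for s in ranked[:top_k]]
```

Filtering on annotations first keeps the content-based comparison to a small candidate set; the opposite order, or a weighted sum of both scores, would be equally valid design choices.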


A New Approach to Automatic Keyword Generation Using Inverse Vector Space Model (키워드 자동 생성에 대한 새로운 접근법: 역 벡터공간모델을 이용한 키워드 할당 방법)

  • Cho, Won-Chin; Rho, Sang-Kyu; Yun, Ji-Young Agnes; Park, Jin-Soo
    • Asia pacific journal of information systems / v.21 no.1 / pp.103-122 / 2011
  • Recently, numerous documents have been made available electronically. Internet search engines and digital libraries commonly return query results containing hundreds or even thousands of documents, and it is virtually impossible for users to examine every document in full to decide whether it is useful. For this reason, some online documents are accompanied by a list of keywords specified by the authors to help users filter the results. A set of keywords is thus often considered a condensed version of the whole document, and it plays an important role in document retrieval, Web page retrieval, document clustering, summarization, text mining, and so on. Since many academic journals ask authors to provide five or six keywords on the first page of an article, keywords are most familiar in the context of journal articles. However, many other types of documents, including Web pages, email messages, news reports, magazine articles, and business papers, do not benefit from keywords. Although the potential benefit is large, implementation is the obstacle: manually assigning keywords to all documents is a daunting, even impractical, task because it is extremely tedious and time-consuming and requires a certain level of domain knowledge. It is therefore highly desirable to automate keyword generation. There are two main approaches: keyword assignment and keyword extraction. Both use machine learning methods and require, for training, a set of documents to which keywords are already attached. In the former approach, a vocabulary is given, and the aim is to match its terms to the texts; in other words, keyword assignment selects the words from a controlled vocabulary that best describe a document.
Although this approach is domain-dependent and not easy to transfer or extend, it can generate implicit keywords that do not appear in a document. In the latter approach, by contrast, the aim is to extract keywords according to their relevance in the text, without a prior vocabulary. Here automatic keyword generation is treated as a classification task, and keywords are commonly extracted with supervised learning techniques: keyword extraction algorithms classify the candidate keywords in a document as positive or negative examples. Several systems, such as Extractor and Kea, were developed using the keyword extraction approach. Because the most indicative words in a document are selected as its keywords, keyword extraction is limited to terms that appear in the document and cannot generate implicit keywords. According to Turney's experimental results, about 64% to 90% of the keywords assigned by authors can be found in the full text of an article. Conversely, 10% to 36% of author-assigned keywords do not appear in the article and therefore cannot be generated by keyword extraction algorithms. Our preliminary experiment also shows that 37% of author-assigned keywords are not included in the full text. This is why we adopted the keyword assignment approach. In this paper, we propose a new approach to automatic keyword assignment, IVSM (Inverse Vector Space Model). The model is based on the vector space model, a conventional information retrieval model that represents documents and queries as vectors in a multidimensional space. IVSM generates an appropriate keyword set for a specific document by measuring the distance between the document and the keyword sets.
The keyword assignment process of IVSM is as follows: (1) calculate the vector length of each keyword set from its keyword weights; (2) preprocess and parse a target document that has no keywords; (3) calculate the vector length of the target document from its term frequencies; (4) measure the cosine similarity between each keyword set and the target document; and (5) generate the keywords with high similarity scores. Two keyword generation systems were implemented with IVSM: an IVSM system for a Web-based community service and a stand-alone IVSM system. The first is implemented in a community service for sharing knowledge and opinions on current trends such as fashion, movies, social problems, and health information. The stand-alone IVSM system is dedicated to generating keywords for academic papers, and it has been tested on a number of academic papers, including those published by the Korean Association of Shipping and Logistics, the Korea Research Academy of Distribution Information, the Korea Logistics Society, the Korea Logistics Research Association, and the Korea Port Economic Association. We measured the performance of IVSM by the number of matches between IVSM-generated keywords and author-assigned keywords. In our experiments, the precision of IVSM was 0.75 for the Web-based community service and 0.71 for the academic journals. Both systems perform much better than baseline systems that generate keywords based on simple probability, and IVSM performs comparably to Extractor, a representative keyword extraction system developed by Turney. As the number of electronic documents grows, we expect that IVSM can be applied to many electronic documents in Web-based communities and digital libraries.
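
The five IVSM steps above map onto a short cosine-similarity routine. This is a sketch under assumptions, not the authors' implementation: each keyword set is modeled as a hypothetical dictionary of single-word keywords to weights, the document is tokenized into lowercase words with raw term frequencies, and step (5) returns the keywords of the `top_n` best-scoring sets.

```python
import math
import re
from collections import Counter


def vector_length(vec):
    """Euclidean length of a sparse term -> weight vector."""
    return math.sqrt(sum(w * w for w in vec.values()))


def assign_keywords(keyword_sets, document, top_n=1):
    """Return keywords from the keyword sets most similar to `document`."""
    # Step (1): vector length of each keyword set from its keyword weights.
    set_lengths = [vector_length(ks) for ks in keyword_sets]
    # Step (2): preprocess and parse the target document (assumed: lowercase word tokens).
    doc_tf = Counter(re.findall(r"[a-z]+", document.lower()))
    # Step (3): vector length of the document from its term frequencies.
    doc_length = vector_length(doc_tf)
    if doc_length == 0:
        return []
    # Step (4): cosine similarity between each keyword set and the document.
    scored = []
    for ks, ks_length in zip(keyword_sets, set_lengths):
        if ks_length == 0:
            continue
        dot = sum(weight * doc_tf[term] for term, weight in ks.items())
        scored.append((dot / (ks_length * doc_length), ks))
    # Step (5): keep the keywords of the highest-scoring sets.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    keywords = []
    for score, ks in scored[:top_n]:
        if score <= 0:
            break
        for keyword in ks:
            if keyword not in keywords:
                keywords.append(keyword)
    return keywords
```

Because a whole keyword set is assigned once it matches, the result can include a keyword such as "logistics" that never occurs in the document, which is the implicit-keyword property that motivates the assignment approach.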