• Title/Summary/Keyword: Sequence information

Mining High Utility Sequential Patterns Using Sequence Utility Lists (시퀀스 유틸리티 리스트를 사용하여 높은 유틸리티 순차 패턴 탐사 기법)

  • Park, Jong Soo
    • KIPS Transactions on Software and Data Engineering / v.7 no.2 / pp.51-62 / 2018
  • High utility sequential pattern (HUSP) mining is an important research topic in data mining. Although several algorithms have been proposed, they suffer from a large search space for HUSPs. A tighter utility upper bound on a sequence can prune more unpromising patterns early in the search. In this paper, we propose the sequence expected utility (SEU) as a new utility upper bound for each sequence; it is the maximum expected utility of a sequence and all its descendant sequences. A sequence utility list for each pattern is used as a new data structure to maintain the essential information for mining HUSPs. We devise an algorithm, high sequence utility list-span (HSUL-Span), to identify HUSPs by employing SEU. Experimental results on both synthetic and real datasets from different domains show that HSUL-Span generates considerably fewer candidate patterns and outperforms other algorithms in terms of execution time.

The Specification of Air-to-Air Combat Tactics Using UML Sequence Diagram (UML Sequence Diagram을 활용한 공대공 교전 전술 명세)

  • Park, Myunghwan;Oh, Jihyun;Kim, Cheonyoung;Seol, Hyeonju
    • Journal of the Korea Institute of Military Science and Technology / v.24 no.6 / pp.664-675 / 2021
  • Air force air-to-air combat tactics unfold at high speed in three-dimensional space. Specifying the tactics requires handling a considerable amount of information, which makes it a challenge to describe the maneuvering procedure accurately. Natural language is not suitable for specifying air-to-air tactics because of its intrinsic ambiguity. Therefore, this paper proposes using the UML Sequence Diagram to describe air-to-air combat tactics. Since the current Sequence Diagram notation is not sufficient to express all aspects of the tactics, we extend its syntax to accommodate the required features of air-to-air combat tactics. We evaluate the applicability of the extended Sequence Diagram using a case example, the manned-unmanned teaming combat tactic. The result shows that the Sequence Diagram specification is more advantageous than the natural language specification in terms of readability, conciseness, and accuracy. However, the Sequence Diagram is evaluated to be less expressive than natural language, and further study is required to address this issue.

The Sequence Labeling Approach for Text Alignment of Plagiarism Detection

  • Kong, Leilei;Han, Zhongyuan;Qi, Haoliang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.9 / pp.4814-4832 / 2019
  • Plagiarism detection is increasingly exploiting text alignment. Text alignment involves extracting the plagiarized passages from a pair consisting of a suspicious document and its source document. Heuristics have achieved excellent performance in text alignment. However, further improvement of the heuristic methods depends mainly on the experience of experts, which leaves the heuristics without the capacity for continuous improvement. Machine learning may be a suitable way to address this problem. Considering the positional relations and the context of text segment pairs, we formalize the text alignment task as a sequence labeling problem, improving the current methods at the model level. In particular, this paper proposes using a probabilistic graphical model to tag the observed sequence of pairs of text segments. We therefore present a sequence labeling approach for text alignment in plagiarism detection based on Conditional Random Fields. The proposed approach is evaluated on the PAN@CLEF 2012 artificial high obfuscation plagiarism corpus and the simulated paraphrase plagiarism corpus, and compared with the methods that achieved the best performance in PAN@CLEF 2012, 2013, and 2014. Experimental results demonstrate that the proposed approach significantly outperforms the state-of-the-art methods.

Improving Malicious Web Code Classification with Sequence by Machine Learning

  • Paik, Incheon
    • IEIE Transactions on Smart Processing and Computing / v.3 no.5 / pp.319-324 / 2014
  • Web applications make life more convenient. Many web applications accept several kinds of user input (e.g., personal information, a user's comments on commercial goods, etc.) for their activities. On the other hand, there is a range of vulnerabilities in the input functions of web applications. Malicious actions can be attempted because many web applications are freely accessible. These input vulnerabilities can be exploited by injecting malicious web code, which enables a variety of illegal actions, such as SQL Injection Attacks (SQLIAs) and Cross Site Scripting (XSS). These actions come down to theft, replacement of personal information, or phishing. The existing solutions use a parser for the code, are limited to fixed and very small patterns, and are difficult to adapt to variations. A machine learning method can cover a far broader range of malicious web code and is easy to adapt to variations and changes. Therefore, this paper suggests an adaptable classification of malicious web code by machine learning approaches for detecting the exploitation of user inputs. The approach usually identifies "looks-like malicious" code as real malicious code. A more detailed classification using sequence information is also introduced. The precision is 99% for the "looks-like malicious" code and 90% for the precise classification with sequence information.

Sequence driven features for prediction of subcellular localization of proteins (단백질의 세포내 소 기관별 분포 예측을 위한 서열 기반의 특징 추출 방법)

  • Kim, Jong-Kyoung;Choi, Seung-Jin
    • Proceedings of the Korean Information Science Society Conference / 2005.07b / pp.226-228 / 2005
  • Predicting the cellular location of an unknown protein gives valuable information for inferring the possible function of the protein. A more accurate prediction system needs a good feature extraction method that transforms the raw sequence data into a numerical feature vector while minimizing information loss. In this paper we propose new methods of extracting underlying features from the sequence data alone by computing pairwise sequence alignment scores. In addition, we use composition-based features to improve prediction accuracy. To construct an SVM ensemble from separately trained SVM classifiers, we propose specificity-based weighted majority voting. The overall prediction accuracy evaluated by 5-fold cross-validation reached 88.53% for the eukaryotic animal data set. By comparing the prediction accuracy of various feature extraction methods, we gained biological insight into the location of targeting information. Our numerical experiments confirm that our new feature extraction methods are very useful for predicting subcellular localization of proteins.
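The idea of turning pairwise sequence alignment scores into a numerical feature vector can be sketched as follows. This is only an illustration, not the authors' method: the abstract does not state the alignment algorithm or scoring scheme, so the global (Needleman-Wunsch) alignment, the +1/-1/-1 scores, and the reference sequences below are all assumptions.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    The scoring values are illustrative assumptions, not taken from the paper.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] with an empty prefix costs i gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def alignment_features(query, references):
    """Feature vector: one alignment score per reference sequence."""
    return [global_alignment_score(query, ref) for ref in references]


# Hypothetical reference sequences, made up for this example.
REFERENCES = ["MKTAYIAK", "MKVLAAGI", "GSHMLEDP"]
features = alignment_features("MKTAYIAK", REFERENCES)
print(features)
```

Each protein is then represented by its scores against a fixed set of references, and that vector could be passed to a classifier such as an SVM.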

Development of the Recommender System of Arabic Books Based on the Content Similarity

  • Alotaibi, Shaykhah Hajed;Khan, Muhammad Badruddin
    • International Journal of Computer Science & Network Security / v.22 no.8 / pp.175-186 / 2022
  • This article develops a content-based recommendation system for Arabic books that helps users find the right book and predicts books suited to their literary style. The system directs users toward books that meet their needs from a large dataset. It bases its predictions on data gathered from different books, which it converts to vectors using TF-IDF. Recommendation algorithms (cosine similarity, sequence matcher similarity, and semantic similarity) then aggregate the data to produce recommendations. This approach is advantageous for recommending previously unrated books to users with unique interests. The results show that cosine similarity on the full content of the books, sequence matcher similarity on the Arabic titles, and semantic similarity on the English titles give the best results and are extremely close to the average human-assigned (annotated) similarity. A Flask web application with a simple interface was developed to show the Arabic books recommended by the cosine similarity, sequence matcher similarity, and semantic similarity algorithms across all the experiments conducted.
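Two of the similarity measures named above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the authors' implementation: the smoothed IDF formula, the whitespace tokenization, and the sample book texts are all made up here, and `difflib.SequenceMatcher` stands in for the "sequence matcher similarity" the abstract mentions.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF (an assumption) keeps weights positive for common terms.
        vectors.append({
            t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * \
        math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def title_similarity(a, b):
    """Character-level similarity ratio in [0, 1] from difflib."""
    return SequenceMatcher(None, a, b).ratio()


# Hypothetical book contents, tokenized by whitespace.
contents = [
    "poetry of the desert and the sea".split(),
    "poetry of the sea and the stars".split(),
    "introduction to linear algebra".split(),
]
vecs = tfidf_vectors(contents)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
print(title_similarity("Desert Poems", "Desert Poetry"))
```

A recommender built this way would rank candidate books by similarity to a book the user liked. The semantic similarity measure is not shown because it needs a language model or embedding resource that the abstract does not specify.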

M-sequence and its applications to nonlinear system identification

  • Kashiwagi, Hiroshi
    • Proceedings of the Institute of Control, Robotics and Systems Conference (제어로봇시스템학회:학술대회논문집) / 1994.10a / pp.7-12 / 1994
  • This paper outlines the pseudorandom M-sequence and its applications to measurement and control engineering. First, the generation and properties of the M-sequence are briefly described; then its applications are presented: delay time measurement, information transmission using the M-array, two-dimensional positioning, fault detection of logic circuits, fault detection of RAM, and linear and nonlinear system identification.
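As a small illustration of M-sequence generation and the correlation property that makes delay time measurement possible, the sketch below generates one period from a linear recurrence and checks its periodic autocorrelation. The degree-4 recurrence (characteristic polynomial x^4 + x + 1), the start state, and the function names are choices made for this example, not details taken from the paper.

```python
def m_sequence(degree=4, taps=(0, 1)):
    """One period of the binary recurrence a[n+degree] = XOR of a[n+t], t in taps.

    With degree=4 and taps=(0, 1) the characteristic polynomial is
    x^4 + x + 1, which is primitive, so the period is 2**4 - 1 = 15.
    """
    bits = [1] + [0] * (degree - 1)  # any non-zero start state works
    period = 2 ** degree - 1
    while len(bits) < period:
        n = len(bits) - degree
        new = 0
        for t in taps:
            new ^= bits[n + t]
        bits.append(new)
    return bits


def periodic_autocorrelation(bits, shift):
    """Circular autocorrelation after mapping bits 0/1 to chips +1/-1."""
    chips = [1 - 2 * b for b in bits]
    n = len(chips)
    return sum(chips[i] * chips[(i + shift) % n] for i in range(n))


def estimate_delay(reference, received):
    """Delay (in chips) that maximizes the circular cross-correlation."""
    ref = [1 - 2 * b for b in reference]
    rx = [1 - 2 * b for b in received]
    n = len(ref)
    scores = [sum(ref[i] * rx[(i + d) % n] for i in range(n)) for d in range(n)]
    return scores.index(max(scores))


seq = m_sequence()
delayed = seq[-5:] + seq[:-5]  # the same sequence circularly delayed by 5 chips
print(seq)
print([periodic_autocorrelation(seq, s) for s in range(len(seq))])
print(estimate_delay(seq, delayed))
```

The autocorrelation has a single sharp peak at zero shift and a constant low value elsewhere, so the shift at which a received copy correlates most strongly with the reference identifies the delay.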

MODEL FOR DESIGN MANAGEMENT IN COLLABORATIVE ENVIRONMENT USING DESIGN STRUCTURE MATRIX AND DESIGN PARAMETERS' INFORMATION

  • Salman Akram;Jeonghwan Kim;Jongwon Seo
    • International conference on construction engineering and project management / 2009.05a / pp.1307-1312 / 2009
  • Design is an act based on multidisciplinary information. The involvement of various stakeholders makes it difficult to process, plan, and integrate. Iteration is frequent in most engineering design and development projects, including construction. Design iterations cause rework, and extra effort is required to find the optimal sequence and to manage the projects. Simple project management techniques are insufficient to fulfill the requirements of integrated design. This paper covers two things: the design structure matrix and a model based on design parameters' information. Emphasis is given to the optimal sequence and crucial iterations, using the design structure matrix analysis technique. The design projects have been studied using survey data from industry. The optimal sequence and crucial iteration results have been used in the proposed model. The model integrates two kinds of information: the key design parameters produced and required, and the design changes made during the design process. It will help practitioners become familiar with design management in order to meet contemporary needs.

Performance Comparison and Improvement of STDR/SSTDR Schemes Using Various Sequences (여러 가지 수열을 적용한 STDR/SSTDR 기법의 성능 비교 및 개선)

  • Han, Jeong Jae;Park, So Ryoung
    • The Journal of Korean Institute of Communications and Information Sciences / v.39A no.11 / pp.637-644 / 2014
  • This paper investigates the fault-location detection performance of STDR (sequence time domain reflectometry) and SSTDR (spread spectrum time domain reflectometry) with sequences of various lengths and types, and then proposes an improved detection technique that eliminates the injected signal in SSTDR. The detection error rates are compared and analyzed in a power line channel model with various fault locations, fault types, and spreading sequences such as the m-sequence, binary Barker sequence, and 4-phase Frank sequence. The proposed technique is shown to clearly improve detection performance when the reflected signal is weak or the fault location is extremely close.
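The general idea of locating a fault by correlating the received signal with the injected sequence, after first removing the known injected signal, can be sketched as below. This is a toy baseband illustration under strong assumptions (a noiseless line, a single reflection, chip-spaced sampling, and a length-13 binary Barker probe); it is not the authors' SSTDR technique or their power line channel model, and all function names are hypothetical.

```python
# Length-13 binary Barker sequence used as the probe signal.
BARKER_13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]


def simulate_line(probe, delay, reflection, length):
    """Received samples: the injected probe plus one scaled, delayed echo."""
    rx = [0.0] * length
    for i, chip in enumerate(probe):
        rx[i] += chip
        rx[i + delay] += reflection * chip
    return rx


def correlate(received, probe):
    """Sliding correlation of the received samples against the probe."""
    n = len(probe)
    return [sum(received[lag + i] * probe[i] for i in range(n))
            for lag in range(len(received) - n + 1)]


def locate_fault(received, probe):
    """Subtract the known injected signal, then pick the strongest echo lag."""
    residual = list(received)
    for i, chip in enumerate(probe):
        residual[i] -= chip
    corr = correlate(residual, probe)
    lag = max(range(len(corr)), key=lambda k: abs(corr[k]))
    return lag, corr[lag]


# A weak, inverted echo from a fault only 3 chips away (illustrative values).
rx = simulate_line(BARKER_13, delay=3, reflection=-0.2, length=40)
lag, peak = locate_fault(rx, BARKER_13)
print(lag, peak)
```

Removing the injected signal before correlating keeps its large zero-lag peak and sidelobes from masking a weak or very close echo; the sign of the correlation peak follows the sign of the reflection.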