• Title/Summary/Keyword: Data Sequence

Mining High Utility Sequential Patterns Using Sequence Utility Lists (시퀀스 유틸리티 리스트를 사용하여 높은 유틸리티 순차 패턴 탐사 기법)

  • Park, Jong Soo
    • KIPS Transactions on Software and Data Engineering / v.7 no.2 / pp.51-62 / 2018
  • High utility sequential pattern (HUSP) mining is considered an important research topic in data mining. Although some algorithms have been proposed for this topic, they produce a large search space for HUSPs. A tighter utility upper bound on a sequence can prune more unpromising patterns early in the search. In this paper, we propose the sequence expected utility (SEU) as a new utility upper bound for each sequence; it is the maximum expected utility of a sequence and all its descendant sequences. A sequence utility list for each pattern is used as a new data structure to maintain the information essential for mining HUSPs. We devise an algorithm, high sequence utility list-span (HSUL-Span), to identify HUSPs by employing SEU. Experimental results on both synthetic and real datasets from different domains show that HSUL-Span generates considerably fewer candidate patterns and outperforms other algorithms in terms of execution time.
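
The abstract does not define how the utility of a pattern is measured. As a rough illustration only, the sketch below assumes the convention common in HUSP work that a pattern's utility in one quantitative sequence is the largest total utility over all of its occurrences, and it simplifies every element to a single item. The name `pattern_utility` and the data layout are hypothetical; this is neither the SEU bound nor HSUL-Span.

```python
def pattern_utility(qsequence, pattern):
    """Maximum utility of `pattern` over all its occurrences in `qsequence`.

    qsequence: list of (item, utility) pairs in time order (assumed layout).
    pattern:   list of items that must appear in this order, gaps allowed.
    Returns 0 when the pattern does not occur.
    """
    missing = float("-inf")
    # best[j] = best utility of any occurrence of pattern[:j] seen so far
    best = [0] + [missing] * len(pattern)
    for item, utility in qsequence:
        # Go right to left so one element is never used twice in a match.
        for j in range(len(pattern), 0, -1):
            if pattern[j - 1] == item and best[j - 1] != missing:
                best[j] = max(best[j], best[j - 1] + utility)
    return best[-1] if best[-1] != missing else 0


qseq = [("a", 2), ("b", 3), ("a", 5), ("b", 1)]
print(pattern_utility(qseq, ["a", "b"]))
```

A mining algorithm would sum this value over all sequences in the database and compare it with a minimum utility threshold; an upper bound such as SEU is what lets it skip that computation for unpromising branches.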

Predictive Convolutional Networks for Learning Stream Data (스트림 데이터 학습을 위한 예측적 컨볼루션 신경망)

  • Heo, Min-Oh;Zhang, Byoung-Tak
    • KIISE Transactions on Computing Practices / v.22 no.11 / pp.614-618 / 2016
  • As information on the internet and data from smart devices grow, the amount of stream data in the real world is also increasing. Stream data, which is potentially large, requires models and algorithms that can learn online. In this paper, we propose a novel class of models, predictive convolutional neural networks, that can perform online learning. Because they stack convolutional operations (detection and max-pooling on the time axis), these models deal with longer patterns at higher layers. As a preliminary check of the concept, we chose a GPS data sequence gathered over two months as the observation sequence. After learning it with the proposed method, we compared the original sequence with the sequence regenerated from the abstract information of the models. The results show that the models can encode long-range patterns and can regenerate a raw observation sequence with low error.
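
To make the "longer patterns at higher layers" remark concrete, here is a minimal sketch of detection (a 1D convolution) and max-pooling along the time axis, plus the standard receptive-field arithmetic for stacking them. The function names, kernel size, and pooling width are illustrative assumptions, not the architecture from the paper.

```python
def detect(signal, kernel):
    """Valid 1D cross-correlation along the time axis (the detection step)."""
    k = len(kernel)
    return [sum(signal[t + i] * kernel[i] for i in range(k))
            for t in range(len(signal) - k + 1)]


def max_pool(signal, width):
    """Non-overlapping max-pooling along the time axis."""
    return [max(signal[t:t + width])
            for t in range(0, len(signal) - width + 1, width)]


def receptive_field(num_layers, kernel_size, pool_width):
    """Number of raw time steps seen by one unit after `num_layers` layers."""
    field, jump = 1, 1
    for _ in range(num_layers):
        field += (kernel_size - 1) * jump  # convolution with stride 1
        field += (pool_width - 1) * jump   # pooling window
        jump *= pool_width                 # pooling stride widens later steps
    return field


features = max_pool(detect([1, 2, 3, 4], [1, 1]), 2)
print(features)
print([receptive_field(n, kernel_size=3, pool_width=2) for n in (1, 2, 3)])
```

With a kernel of 3 and a pooling width of 2, one unit covers 4, 10, and 22 raw time steps after one, two, and three layers, which is the sense in which higher layers respond to longer patterns.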

Design & Implementation of Extractor for Design Sequence of DB tables using Data Flow Diagrams (자료흐름도를 사용한 테이블 설계순서 추출기의 설계 및 구현)

  • Lim, Eun-Ki
    • Journal of Korea Society of Industrial Information Systems / v.17 no.3 / pp.43-49 / 2012
  • Information obtained from DFDs (Data Flow Diagrams) is very important in system maintenance, because most legacy systems were analyzed using DFDs in structured analysis. In this work, we design and implement an extractor that derives the design sequence of database tables from DFDs. Our extractor takes DFDs as input, transforms them into a directed graph, and extracts the design sequence of the DB tables. We show the practicality of our extractor by applying it to a software system in operation.
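
The abstract does not say how the design sequence is read off the directed graph. One plausible reading, offered here purely as an assumption, is a topological ordering in which a table is designed only after the tables whose data flows into it. The sketch uses Python's standard `graphlib` module (Python 3.9+); the table names and the `design_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter


def design_order(flows):
    """Order tables so that every source table precedes its targets.

    flows: iterable of (source_table, target_table) pairs, one per data
    flow that passes from one data store to another through a process.
    Raises graphlib.CycleError if the flows contain a cycle.
    """
    sorter = TopologicalSorter()
    for source, target in flows:
        sorter.add(target, source)  # target depends on source
    return list(sorter.static_order())


flows = [("customer", "order"), ("product", "order"), ("order", "invoice")]
print(design_order(flows))
```

Cyclic data flows, which real DFDs can contain, would need extra handling that this sketch leaves out.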

Assembly Sequence Determination from Design Data Using Voxelization (복셀화를 통한 디자인 데이타로부터의 조립순서 결정)

  • Lee, Changho;Cho, Hyunbo;Jung, Mooyoung
    • Journal of the Korean Society for Precision Engineering / v.13 no.6 / pp.90-101 / 1996
  • Determining the assembly sequence of components is a key issue in assembly operations. Although a number of articles dealing with assembly sequence determination have appeared, an efficient and general methodology for complex products has yet to appear. The objective of this paper is to present the problems and models used to generate an assembly sequence from design data. The essential idea of this research is to acquire a finite number of voxels from any complex geometric entity, such as 3D planar polygons, hollow spheres, cylinders, cones, tori, etc. To find a feasible assembly sequence, the following four steps are needed: (1) The components composing an assembly product are identified, and the geometric entities of each component are extracted. (2) The geometric entities extracted in the first step are translated into a number of voxels. (3) All the mating or coupling relations between components are found by considering relations between voxels. (4) The components to be disassembled are determined using component coupling graphs (CCGs).
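
Steps (2) and (3) can be sketched in a few lines under strong simplifying assumptions: components are axis-aligned boxes on an integer grid, and two components are treated as coupled when any of their voxels share a face. The box coordinates, the face-adjacency rule, and all function names are illustrative choices, not the paper's method, and step (4) is not attempted.

```python
from itertools import combinations, product

# The six face-adjacent offsets of a voxel.
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def voxelize_box(lo, hi):
    """Voxels of an axis-aligned box; `hi` is exclusive on every axis."""
    return set(product(*(range(a, b) for a, b in zip(lo, hi))))


def touches(voxels_a, voxels_b):
    """True if some voxel of A shares a face with some voxel of B."""
    return any((x + dx, y + dy, z + dz) in voxels_b
               for (x, y, z) in voxels_a
               for dx, dy, dz in NEIGHBORS)


def coupling_graph(components):
    """Undirected graph linking every pair of touching components."""
    graph = {name: set() for name in components}
    for a, b in combinations(components, 2):
        if touches(components[a], components[b]):
            graph[a].add(b)
            graph[b].add(a)
    return graph


components = {
    "base": voxelize_box((0, 0, 0), (4, 4, 1)),
    "peg": voxelize_box((1, 1, 1), (2, 2, 3)),
    "cap": voxelize_box((0, 0, 3), (3, 3, 4)),
}
print(coupling_graph(components))
```

Here the peg couples with both the base and the cap, while the base and the cap never touch, so a disassembly planner working on this graph would have to remove the cap before the peg.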

DNA Information Hiding Method for DNA Data Storage (DNA 데이터 저장을 위한 DNA 정보 은닉 기법)

  • Lee, Suk-Hwan;Kwon, Ki-Ryong
    • Journal of the Institute of Electronics and Information Engineers / v.51 no.10 / pp.118-127 / 2014
  • DNA data storage refers to any technique for storing massive digital data in the base sequence of DNA, and it has recently been recognized as a future storage medium. This paper presents an information hiding method for DNA data storage in which the massive data is hidden in the non-coding strand using DNA steganography. Our method maps the encrypted data to a data base sequence using a numerical mapping table and then hides it in the non-coding strand using a key that consists of the seed and the sector length. As a result, our method can preserve the protein, extract the hidden data without knowledge of the host DNA sequence, and detect the positions of mutation errors. Experimental results verify that our method has a higher data capacity than conventional methods and also detects the positions of mutation errors using the parity bases.
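
The abstract mentions a numerical mapping table without giving it. The sketch below assumes the simplest such table, two bits per base (00→A, 01→C, 10→G, 11→T), to show how binary data becomes a base sequence and back. The table and function names are assumptions; encryption, key-based placement in the non-coding strand, and parity bases are all omitted.

```python
BASES = "ACGT"  # assumed mapping: index 0..3 is the 2-bit value of each base


def encode(data: bytes) -> str:
    """Map each byte to four bases, most significant bit pair first."""
    return "".join(BASES[(byte >> shift) & 0b11]
                   for byte in data
                   for shift in (6, 4, 2, 0))


def decode(strand: str) -> bytes:
    """Inverse of encode; the strand length must be a multiple of four."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


strand = encode(b"\x1b")
print(strand, decode(strand))
```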

The Conversion of a Set, a Sequence, and a Map in VDM to a Linked List in a Programming Language (VDM의 자료구조인 set, sequency, map의 프로그래밍 언어 자료구조인 linked list로의 변환)

  • Yu, Mun-Seong
    • The KIPS Transactions:PartD / v.8D no.4 / pp.421-426 / 2001
  • A formal development method is used to develop software rigorously and systematically. In a formal development method, we specify a system in a formal specification language and gradually refine it into more concrete forms until it can be implemented. VDM is one such formal specification language. VDM uses mathematical data structures such as sets, sequences, and maps to specify a system, but most programming languages do not have such data structures, so they must be converted. Mathematical data structures in VDM can be converted to a linked list, a data structure available in programming languages. In this article, we propose a method to convert a set, a sequence, and a map in VDM to a linked list in a programming language, and we prove the correctness of this conversion mathematically.
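
As an informal illustration of the idea, the sketch below represents a set, a sequence, and a map on top of one singly linked list type. It is an assumption about what such a conversion might look like, with hypothetical function names; the paper's actual construction and its correctness proof are not reproduced.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Node:
    value: Any
    next: Optional["Node"] = None


def to_list(head):
    """Collect the node values into a Python list, front to back."""
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out


def seq_append(head, x):
    """Sequence: order matters and duplicates are kept, so append at the end."""
    if head is None:
        return Node(x)
    cur = head
    while cur.next is not None:
        cur = cur.next
    cur.next = Node(x)
    return head


def set_add(head, x):
    """Set: no duplicates, so insert only if x is not already present."""
    cur = head
    while cur is not None:
        if cur.value == x:
            return head
        cur = cur.next
    return Node(x, head)


def map_put(head, key, val):
    """Map: a list of (key, value) pairs holding at most one pair per key."""
    cur = head
    while cur is not None:
        if cur.value[0] == key:
            cur.value = (key, val)
            return head
        cur = cur.next
    return Node((key, val), head)


s = q = m = None
for x in [1, 2, 1]:
    s, q = set_add(s, x), seq_append(q, x)
for k, v in [("a", 1), ("b", 2), ("a", 3)]:
    m = map_put(m, k, v)
print(to_list(s), to_list(q), to_list(m))
```

The invariants that distinguish the three structures (uniqueness of elements, preservation of order, uniqueness of keys) live in the insertion functions rather than in the list itself, which is what a correctness proof of such a conversion would need to establish.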

Building an Integrated Protein Data Management System Using the XPath Query Process

  • Cha Hyo Soung;Jung Kwang Su;Jung Young Jin;Ryu Keun Ho
    • Proceedings of the KSRS Conference / 2004.10a / pp.99-102 / 2004
  • Recently, with the development of bioinformatics techniques, there has been a great deal of research on large amounts of biological data, and a variety of files and databases are being used to manage these data efficiently. However, because of a lack of standardization, it is difficult to manage the data and to convert between heterogeneous formats. We are interested in integrating, saving, and managing gene and protein sequence data generated through sequencing. Accordingly, the goal of this paper is to implement a system that manages sequence data and transforms one sequence file format into another. To satisfy these requirements, we adopt BSML (Bioinformatics Sequence Markup Language) as the standard for managing bioinformatics data. We then integrate and store the heterogeneous flat file formats using a BSML-schema-based DTD. We also developed a system that applies the characteristics of an object-oriented database and processes XPath queries, an efficient kind of structural query, so that XML documents can be saved and managed easily.
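
To give a feel for querying sequence records with XPath, the sketch below runs a path expression over a small, heavily simplified BSML-like document using Python's standard `xml.etree.ElementTree`, which supports a limited XPath subset. The document content, the element and attribute layout, and the helper name are assumptions made for illustration; the system described in the abstract uses an object-oriented database, which is not modeled here.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical BSML-like document (not validated against the DTD).
DOC = """
<Bsml>
  <Definitions>
    <Sequences>
      <Sequence id="s1" molecule="dna"><Seq-data>ATGGCC</Seq-data></Sequence>
      <Sequence id="s2" molecule="aa"><Seq-data>MA</Seq-data></Sequence>
    </Sequences>
  </Definitions>
</Bsml>
"""


def sequences_by_molecule(root, molecule):
    """Return {id: residues} for every Sequence of the given molecule type."""
    path = f".//Sequence[@molecule='{molecule}']"
    return {seq.get("id"): seq.findtext("Seq-data") for seq in root.findall(path)}


root = ET.fromstring(DOC)
print(sequences_by_molecule(root, "aa"))
print(sequences_by_molecule(root, "dna"))
```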

Clustering Technique for Sequence Data Sets in Multidimensional Data Space (다차원 데이타 공간에서 시퀀스 데이타 세트를 위한 클러스터링 기법)

  • Lee, Seok-Lyong;Lim, Tong-Hyeok;Chung, Chin-Wan
    • Journal of KIISE:Databases / v.28 no.4 / pp.655-664 / 2001
  • Continuous data such as video streams and analog voice signals can be modeled as multidimensional data sequences (MDSs) in the feature space. In this paper, we investigate a clustering technique for multidimensional data sequences. Each sequence is represented by a small number of hyper-rectangular clusters for subsequent storage and similarity search processing. We present a linear clustering algorithm that guarantees a predefined level of clustering quality, and we show its effectiveness via experiments on various video data sets.
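
The idea of summarizing a sequence by a few hyper-rectangles can be sketched with a single greedy pass: consecutive points are absorbed into the current bounding box until its longest side would exceed a limit. The side-length limit stands in for the paper's clustering-quality criterion, which the abstract does not specify, and `segment_into_boxes` is a hypothetical name rather than the published algorithm.

```python
def segment_into_boxes(points, max_side):
    """Greedily split a sequence of points into bounding hyper-rectangles.

    points:   sequence of equal-length numeric tuples, in time order.
    max_side: largest allowed side length of any box (assumed criterion).
    Returns a list of (low_corner, high_corner) tuples, one per cluster.
    """
    boxes = []
    lo = hi = None
    for p in points:
        if lo is None:
            lo, hi = list(p), list(p)
            continue
        new_lo = [min(a, b) for a, b in zip(lo, p)]
        new_hi = [max(a, b) for a, b in zip(hi, p)]
        if max(h - l for l, h in zip(new_lo, new_hi)) <= max_side:
            lo, hi = new_lo, new_hi           # point fits: grow the box
        else:
            boxes.append((tuple(lo), tuple(hi)))
            lo, hi = list(p), list(p)         # start a new box at this point
    if lo is not None:
        boxes.append((tuple(lo), tuple(hi)))
    return boxes


track = [(0, 0), (1, 1), (2, 1), (10, 10), (11, 10)]
print(segment_into_boxes(track, max_side=2))
```

Each point is visited once, so the pass is linear in the sequence length; a similarity search can then compare boxes instead of individual points.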

A Novel Hitting Frequency Point Collision Avoidance Method for Wireless Dual-Channel Networks

  • Quan, Hou-De;Du, Chuan-Bao;Cui, Pei-Zhang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.9 no.3 / pp.941-955 / 2015
  • In dual-channel networks (DCNs), all frequency hopping (FH) sequences used for data channels are derived from the original FH sequence used for the control channel by shifting it to different initial phases. As the number of data channels increases, the hitting frequency point problem becomes considerably more serious, because a DCN is a non-orthogonal synchronization network and the FH sequences are non-orthogonal. The increasing severity of the hitting frequency point problem consequently reduces resource utilization efficiency. To solve this problem, we propose a novel hitting frequency point collision avoidance method, which consists of a sequence-selection strategy called sliding correlation (SC) and a collision avoidance strategy called keeping silent on hitting frequency point (KSHF). SC is used to find the optimal phase-shifted FH sequence, the one with the minimum number of hitting frequency points, for a new data channel. The hitting frequency points and their locations in this optimal sequence are also derived for KSHF according to the SC strategy. In KSHF, the transceivers transmit or receive symbol information not on the hitting frequency point but on the next frequency point during the next FH period. Analytical and simulation results demonstrate that, unlike the traditional method, the proposed method can effectively reduce the number of hitting frequency points and improve the efficiency of code resource utilization.
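
A minimal sketch of the sequence-selection idea follows, under the assumption that a "hit" is a time slot in which the candidate sequence uses the same frequency as any sequence already in use. The function names, the cyclic-shift model, and the tie-breaking rule (smallest shift wins) are illustrative assumptions; the paper's SC and KSHF procedures are not reproduced in detail.

```python
def shifted(sequence, shift):
    """Cyclic phase shift of an FH sequence."""
    return sequence[shift:] + sequence[:shift]


def hit_positions(candidate, in_use):
    """Time slots where the candidate collides with any sequence in use."""
    return [t for t, freq in enumerate(candidate)
            if any(other[t] == freq for other in in_use)]


def sliding_correlation(control, used_shifts):
    """Pick the unused shift with the fewest hits against the used ones.

    Returns (shift, hit_slots), or None if every shift is already taken.
    """
    in_use = [shifted(control, s) for s in used_shifts]
    best = None
    for s in range(len(control)):
        if s in used_shifts:
            continue
        hits = hit_positions(shifted(control, s), in_use)
        if best is None or len(hits) < len(best[1]):
            best = (s, hits)
    return best


control = [1, 2, 1, 3, 2, 4]  # frequency indices of the control channel
print(sliding_correlation(control, used_shifts=[0]))
print(hit_positions(shifted(control, 2), [control]))
```

The returned hit slots are what a KSHF-style rule would consume: on those slots the transceivers would stay silent and defer the symbol to a later frequency point.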

Applied Computational Tools for Crop Genome Research

  • Love Christopher G;Batley Jacqueline;Edwards David
    • Journal of Plant Biotechnology / v.5 no.4 / pp.193-195 / 2003
  • A major goal of agricultural biotechnology is the discovery of genes or genetic loci which are associated with characteristics beneficial to crop production. This knowledge of genetic loci may then be applied to improve crop breeding. Agriculturally important genes may also benefit crop production through transgenic technologies. Recent years have seen the application of high-throughput technologies to agricultural biotechnology, leading to the production of large amounts of genomic data. The challenge today is the effective structuring of this data to permit researchers to search, filter and, importantly, make robust associations within a wide variety of datasets. At the Plant Biotechnology Centre, Primary Industries Research Victoria in Melbourne, Australia, we have developed a series of tools and computational pipelines to assist in the processing and structuring of genomic data to aid its application to agricultural biotechnology research. These tools include a sequence database, ASTRA, for the processing and annotation of expressed sequence tag data. Tools have also been developed for the discovery of simple sequence repeat (SSR) and single nucleotide polymorphism (SNP) molecular markers from large sequence datasets. Application of these tools to Brassica research has assisted in the production of genetic and comparative physical maps as well as candidate gene discovery for a range of agronomically important traits.
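
As a small illustration of what SSR discovery involves, the sketch below scans a DNA string for short motifs repeated in tandem, using a regular-expression backreference. The motif-length range, the minimum repeat count, and the name `find_ssrs` are assumptions chosen for the example; this is not the tool described in the abstract.

```python
import re


def find_ssrs(sequence, min_unit=2, max_unit=6, min_repeats=3):
    """Find tandem repeats of short motifs in an uppercase DNA string.

    Returns a list of (start, motif, repeat_count) tuples sorted by start.
    Motifs that are themselves repeats of a shorter motif are skipped.
    """
    found = []
    for k in range(min_unit, max_unit + 1):
        # A motif of length k followed by at least min_repeats - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for match in pattern.finditer(sequence):
            unit = match.group(1)
            if any(unit == unit[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue  # e.g. "ACAC" is really the motif "AC"
            found.append((match.start(), unit, len(match.group(0)) // k))
    return sorted(found)


dna = "TTGACACACACAGGCATCATCATCATTT"
print(find_ssrs(dna))
```

On this toy sequence the scan reports an (AC)4 repeat starting at offset 3 and a (CAT)4 repeat starting at offset 14; real pipelines typically add flanking-sequence extraction for primer design.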