• Title/Summary/Keyword: sequence database

Design of Postal Address File for Address Interpretation and Retrieval (주소해석 및 검색을 위한 우편주소파일 설계)

  • Chang, Tai-Woo;Kim, Ho-Yon;Lim, Kil-Taek
    • Journal of Korea Society of Industrial Information Systems / v.12 no.4 / pp.74-88 / 2007
  • In order to automate the process of mail sorting by delivery sequence, it is necessary to prepare a postal address database and to interpret the addresses written on mail pieces using the database and OCR technology. The address database is a critical factor in the automation and informatization of postal service, since it can be used not only in address recognition but also in various kinds of mail processing. In this study, we design the schema of a postal address database, design the postal address file based on it, and explain the method of address interpretation and retrieval that uses it. We analyze the information requirements for transforming postal addresses into a standardized format and take them into account in the design process. The postal address file can be used by address matching or retrieval systems as well as by Hangul address recognition systems for automated delivery-sequence mail sorting.

A Management Technique for Protein Version Information based on Local Sequence Alignment and Trigger (로컬 서열 정렬과 트리거 기반의 단백질 버전 정보 관리 기법)

  • Jung Kwang-Su;Park Sung-Hee;Ryu Keun-Ho
    • The KIPS Transactions:PartD / v.12D no.1 s.97 / pp.51-62 / 2005
  • After determining the function of an amino acid sequence, we can infer the function of other amino acid sequences that have a similar composition. In addition, a protein whose function is known can be altered into a useful protein using genetic engineering methods. In this process, an original protein amino acid sequence produces various protein sequences with different sequence compositions. A systematic technique is therefore needed to manage these protein version sequences and their reference data. In this paper, we propose a technique for managing protein version sequences based on local sequence alignment and a technique for managing historical protein reference data using triggers. The method automatically determines the similarity between an original sequence and each version sequence as the protein version sequences are stored in the database. When this technique is employed, the storage space required for protein sequences is also reduced. By storing the historical information of proteins and analyzing the changes in protein sequences, we expect that new useful proteins and drugs can be discovered based on analysis of the version sequences.
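
The similarity check between an original sequence and a version sequence can be illustrated with a small local alignment sketch. This is not the authors' implementation: it assumes a plain Smith-Waterman score with a linear gap penalty, and the scoring constants and the helper name `version_similarity` are hypothetical choices made only for illustration.

```python
# Hypothetical scoring scheme; the abstract does not state the one actually used.
MATCH, MISMATCH, GAP = 2, -1, -2


def local_alignment_score(a: str, b: str) -> int:
    """Best Smith-Waterman local alignment score between a and b."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + GAP, curr[j - 1] + GAP)
            best = max(best, curr[j])
        prev = curr
    return best


def version_similarity(original: str, version: str) -> float:
    """Score normalised by the best possible score of the shorter sequence."""
    shortest = min(len(original), len(version))
    if shortest == 0:
        return 0.0
    return local_alignment_score(original, version) / (MATCH * shortest)


if __name__ == "__main__":
    print(version_similarity("ACDEFG", "ACDQFG"))
```

A database trigger could call a routine like this on insert and keep only the differing region of a highly similar version, which is one way the storage saving mentioned above might arise; that wiring is an assumption and is not shown here.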

Automatic Vowel Sequence Reproduction for a Talking Robot Based on PARCOR Coefficient Template Matching

  • Vo, Nhu Thanh;Sawada, Hideyuki
    • IEIE Transactions on Smart Processing and Computing / v.5 no.3 / pp.215-221 / 2016
  • This paper describes an automatic vowel sequence reproduction system for a talking robot built to reproduce the human voice based on the working behavior of the human articulatory system. A sound analysis system is developed to record a sentence spoken by a human (mainly vowel sequences in the Japanese language) and then to analyze that sentence to produce the correct command packet so that the talking robot can repeat it. An algorithm based on a short-time energy method is developed to separate and count sound phonemes. Template matching using partial correlation (PARCOR) coefficients is applied to find the voice in the talking robot's database that is most similar to the spoken voice. By combining the sound separation and counting results with the detection of vowels in human speech, the talking robot can reproduce a vowel sequence similar to the one spoken by the human. Two tests are performed to verify the working behavior of the robot. The results indicate that the robot can repeat a sequence of vowels spoken by a human with an average success rate of more than 60%.
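
The short-time energy step can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the frame length, the fixed threshold, and the function names are assumptions, and a real system would likely use overlapping windows and an adaptive threshold.

```python
import math


def short_time_energy(signal, frame_len):
    """Mean squared amplitude of consecutive non-overlapping frames."""
    return [
        sum(x * x for x in signal[i:i + frame_len]) / frame_len
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def count_voiced_segments(signal, frame_len=100, threshold=0.01):
    """Count runs of consecutive frames whose energy exceeds the threshold."""
    segments = 0
    in_segment = False
    for energy in short_time_energy(signal, frame_len):
        voiced = energy > threshold
        if voiced and not in_segment:
            segments += 1  # a new sound starts at a rising edge
        in_segment = voiced
    return segments


if __name__ == "__main__":
    silence = [0.0] * 300
    tone = [0.5 * math.sin(2 * math.pi * n / 40) for n in range(500)]
    # Two bursts of sound separated by silence.
    print(count_voiced_segments(silence + tone + silence + tone + silence))
```

Each rising edge of the energy envelope is counted as one sound, so separating and counting reduce to thresholding a single per-frame value.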

Bioinformatics for the Korean Functional Genomics Project

  • Kim, Sang-Soo
    • Proceedings of the Korean Society for Bioinformatics Conference / 2000.11a / pp.45-52 / 2000
  • The genomic approach produces a massive amount of data within a short time period. New high-throughput automatic sequencers can generate over a million nucleotides of sequence information overnight. A typical DNA chip experiment produces tens of thousands of expression measurements, not to mention tens of megabytes of image files. These data must be handled automatically by computer and stored in an electronic database. Thus there is a need for a systematic approach to data collection, processing, and analysis. DNA sequence information is translated into amino acid sequence and analyzed for key motifs related to its biological and/or biochemical function. Functional genomics will play a significant role in identifying novel drug targets and diagnostic markers for serious diseases. As an enabling technology for functional genomics, bioinformatics is in great demand worldwide. In Korea, a new functional genomics project has recently been launched, and it focuses on identifying genes associated with cancers prevalent in Korea, namely gastric and hepatic cancers. This involves gene discovery by high-throughput sequencing of cancer cDNA libraries, gene expression profiling by DNA microarray and proteomics, and SNP profiling in the Korean patient population. Our bioinformatics team will support all these activities by collecting, processing, and analyzing these data.

Content similarity matching for video sequence identification

  • Kim, Sang-Hyun
    • International Journal of Contents / v.6 no.3 / pp.5-9 / 2010
  • To manage large video database systems, effective video indexing and retrieval are required. A large number of video retrieval algorithms have been presented for frame-wise user queries or video content queries, whereas only a few video identification algorithms have been proposed for video sequence queries. In this paper, we propose an effective video identification algorithm for video sequence queries that employs the Cauchy function of histograms between successive frames and the modified Hausdorff distance. To match video sequences effectively with a low computational load, we use the key frames extracted by the cumulative Cauchy function and compare the sets of key frames using the modified Hausdorff distance. Experimental results with several color video sequences show that the proposed video identification algorithm yields remarkably higher performance than conventional algorithms such as the Euclidean metric and directed divergence methods.
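
Comparing two sets of key frames with a modified Hausdorff distance can be sketched as below. The abstract does not give the exact definition used, so this assumes the common mean-of-minimum-distances variant with Euclidean distance between per-frame feature vectors (for example, colour histograms); the function names are hypothetical, and the Cauchy-function key-frame extraction is not reproduced.

```python
import math


def directed_distance(set_a, set_b):
    """Average distance from each vector in set_a to its nearest vector in set_b."""
    return sum(min(math.dist(a, b) for b in set_b) for a in set_a) / len(set_a)


def modified_hausdorff(set_a, set_b):
    """Symmetric distance between two non-empty sets of feature vectors."""
    return max(directed_distance(set_a, set_b), directed_distance(set_b, set_a))


if __name__ == "__main__":
    query_keyframes = [(0.0, 0.0), (0.0, 2.0)]
    stored_keyframes = [(0.0, 0.0)]
    print(modified_hausdorff(query_keyframes, stored_keyframes))
```

Averaging the nearest-neighbour distances, rather than taking their maximum as in the classical Hausdorff distance, makes the measure less sensitive to a single outlying key frame.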

A Method for Time Warping Based Similarity Search in Sequence Databases (시퀀스 데이터베이스를 위한 타임 워핑 기반 유사 검색)

  • Kim, Sang-Wook;Park, Sang-Hyun
    • Journal of Industrial Technology / v.20 no.B / pp.219-226 / 2000
  • In this paper, we propose a novel method for similarity search that supports time warping. Our primary goal is to improve search performance in large databases without false dismissal. To attain this goal, we devise a new distance function $D_{tw-lb}$ that consistently underestimates the time warping distance and also satisfies the triangle inequality. $D_{tw-lb}$ uses a 4-tuple feature vector extracted from each sequence and is invariant to time warping. For efficient processing, we employ a multidimensional index that uses the 4-tuple feature vector as indexing attributes and $D_{tw-lb}$ as a distance function. We prove that our method does not incur false dismissal. To verify the superiority of our method, we perform extensive experiments. The results reveal that our method achieves a speedup of up to 43 times with real-world S&P 500 stock data.
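
The filtering idea can be sketched as follows. The abstract does not name the four features, so this sketch assumes they are the first, last, greatest, and smallest elements of a sequence and that the lower bound is the largest absolute difference between corresponding features; the time warping distance shown is the common sum-of-absolute-differences form, which may differ from the paper's definition. All function names are hypothetical.

```python
def time_warping_distance(s, q):
    """Dynamic-programming time warping distance with |a - b| as base cost."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(q)
    for a in s:
        curr = [inf] * (len(q) + 1)
        for j, b in enumerate(q, start=1):
            # Extend the cheapest of the three admissible predecessor cells.
            curr[j] = abs(a - b) + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[len(q)]


def features(seq):
    """Assumed 4-tuple: first, last, greatest, smallest element."""
    return (seq[0], seq[-1], max(seq), min(seq))


def lower_bound(s, q):
    """Largest feature difference; never exceeds the warping distance above."""
    return max(abs(x - y) for x, y in zip(features(s), features(q)))


def range_search(database, query, eps):
    """Return sequences within eps of the query, pruning with the lower bound."""
    hits = []
    for seq in database:
        if lower_bound(seq, query) > eps:
            continue  # safe to skip: the true distance is at least as large
        if time_warping_distance(seq, query) <= eps:
            hits.append(seq)
    return hits


if __name__ == "__main__":
    db = [[1, 2, 3, 4], [0, 5, 0], [1, 1, 2, 3, 3, 4]]
    print(range_search(db, [1, 2, 3, 4], eps=0.5))
```

Because the first and last elements must be matched to each other, and the extreme element of one sequence must be matched to some element of the other, each feature difference is a lower bound on the warping cost. Pruning on it therefore cannot discard a true answer, and since the features are fixed-size, they can be stored in a multidimensional index instead of being scanned linearly as in this sketch.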

Genotyping of HLA-DRB1 by Polymerase Chain Reaction-Sequence Specific Primer (Polymerase Chain Reaction-Sequence Specific Primer를 이용한 HLA-DRB1 유전자의 DNA 다형성)

  • Jang, Soon-Mo
    • Korean Journal of Clinical Laboratory Science / v.37 no.3 / pp.139-142 / 2005
  • Most expressed HLA (human leukocyte antigen) loci exhibit a remarkable degree of allelic polymorphism, which is derived from sequence differences predominantly localized to discrete hypervariable regions of the amino-terminal domain of the molecule. In this study, the HLA-DRB1 genotypes of twenty students were determined using the PCR-SSP (polymerase chain reaction-sequence specific primer) technique. Two specific primer pairs were used to assign the DRB1 gene. In the PCR-SSP results, the $HLA-DRB1^{\ast}0101$ primer detected nine people and the $HLA-DRB1^{\ast}1501$ primer detected three. This study shows that the PCR-SSP technique is a relatively simple, fast, and practical tool for determining HLA-DRB1 genotypes. Moreover, these genotype frequency results for the HLA-DRB1 gene could be useful as database material before being applied to individual identification and transplantation immunity.

Feature selection and frequent pattern analysis in protein motif sequence (모티프 서열에서의 특징추출 및 빈발패턴 분석)

  • Kim, Dae-Sung;Lee, Bum-Ju;Ryu, Keun-Ho
    • Proceedings of the Korea Information Processing Society Conference / 2007.05a / pp.10-13 / 2007
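  • A motif is a region of a protein sequence that has remained locally highly conserved through evolution. Motifs are used to predict the function and structure of proteins, or to describe the common characteristics of biologically related proteins. In addition, the correlation between motifs and protein sequences is essential for predicting biological function, and this prediction problem can be approached by analyzing protein sequences through the frequent sequence patterns and structural patterns found in them by motif search. In this paper, we search for the secondary-structure features and frequent patterns present in protein sequences and use the extracted information for protein function classification.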

MOTIF BASED PROTEIN FUNCTION ANALYSIS USING DATA MINING

  • Lee, Bum-Ju;Lee, Heon-Gyu;Ryu, Keun-Ho
    • Proceedings of the KSRS Conference / v.2 / pp.812-815 / 2006
  • Proteins are essential agents for controlling, effecting, and modulating cellular functions. Proteins with similar sequences have diverged from a common ancestral gene and have similar structures and functions. Function prediction for unknown proteins remains one of the most challenging problems in bioinformatics. Recently, various computational approaches have been developed for identifying short sequences that are conserved within a family of closely related protein sequences. Protein function is often correlated with highly conserved motifs. A motif is the smallest unit of protein structure and function and tends to form the core of a protein's structural and functional components. Therefore, prediction methods using data mining or machine learning have been developed. In this paper, we describe an approach to protein function prediction with motif-based models using data mining. Our work consists of three phases. We prepare training and test data sets and construct a classifier using the training set. Then, through experiments, we compare our classifier with other classifiers in terms of classification accuracy.

Database using Personal Information Management System

  • Kim, Jae-Woo;Kim, Don-Go;Kang, Sang-Gil;Kim, Dong-Hyun;Kim, Won-Il
    • International Journal of Fuzzy Logic and Intelligent Systems / v.8 no.4 / pp.260-263 / 2008
  • In this paper, we propose a Personal Information Management System for a library database. It manages the personal search patterns of a given user and provides a specific book list for the library book search system. With the proposed system, the time spent on repeated, overlapping searches is decreased by using personalized information and search history. The system manages individual data according to personal search pattern, sequence, and usability. Therefore, users can locate the book information they need more accurately, based on their distinct interests and search history.