• Title/Abstract/Keywords: DNA sequence database

Search results: 209 items; processing time: 0.028 s

Development of the TFSCAN Search Program

  • 이병욱;박기정;김기봉;박완;박용하
    • 한국미생물·생명공학회지 / Vol. 24, No. 3 / pp.371-375 / 1996
  • TFD is a transcription factor database consisting of short functional DNA sequences, called signals, and their references. SIGNAL SCAN, developed by Dan S. Prestridge, is used to determine which TFD signals may exist in a DNA sequence. This program searches the TFD database using a simple character string comparison algorithm. We developed TFSCAN, which aims to search for signals in an input DNA sequence more efficiently than SIGNAL SCAN. Our algorithm consists of two parts: one constructs an automaton by scanning the sequences of TFD, and the other searches for signals through this automaton. Searching for signal-related references is made much faster by an indexing method. TFSCAN is simple to use and its output is easy to interpret. We developed and installed a TFSCAN input form and a CGI program on the GINet Web server so that TFSCAN can be used there. The automaton-based algorithm drastically reduced computing time. This approach may also apply to recognizing other biological patterns. We are continuing to develop the algorithm to optimize the automaton and to search for signals more sensitively.
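
The two-part design described in this abstract (build an automaton from the signal set, then run the input sequence through it) can be illustrated with a short sketch. The abstract does not say which automaton TFSCAN uses; the code below assumes an Aho-Corasick automaton, a standard choice for matching many patterns in one pass, and the function names and example signals are hypothetical.

```python
from collections import deque

def build_automaton(signals):
    """Build an Aho-Corasick automaton (an assumed stand-in for TFSCAN's)."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link for each state
    out = [[]]    # signals recognized on entering each state
    for signal in signals:
        state = 0
        for ch in signal:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(signal)
    # Breadth-first pass: depth-1 states keep failure link 0.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit signals that end at the failure target.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out

def scan(sequence, automaton):
    """Return (start_offset, signal) for every signal occurrence."""
    goto, fail, out = automaton
    hits = []
    state = 0
    for i, ch in enumerate(sequence):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for signal in out[state]:
            hits.append((i - len(signal) + 1, signal))
    return hits
```

For example, `scan("GCTATAAGC", build_automaton(["TATA", "ATAA", "GC"]))` reports every occurrence, including overlapping ones, in a single left-to-right pass. The scan cost grows with the sequence length and the number of hits rather than with the number of signals, which is the kind of saving over repeated string comparison that the abstract reports.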

A DNA Index Structure Using Frequency and Position Information of Genetic Alphabet

  • 김우철;박상현;원정임;김상욱;윤지희
    • 한국정보과학회논문지:데이타베이스 / Vol. 32, No. 3 / pp.263-275 / 2005
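  • Indexing techniques are widely used to search large DNA databases quickly for sequences of interest. However, most indexing techniques need more storage space than the original database and are difficult to couple tightly with a DBMS. This paper proposes an efficient disk-based indexing technique and a query processing technique that use little space and support exact match, wildcard match, and approximate match queries such as k-mismatch. For indexing, a sliding window of fixed size is placed over the DNA sequence, a signature is extracted from the occurrence frequency of each character within the window, and the signature is stored in a multidimensional spatial index such as an R*-tree. In particular, assigning a weight to each position within the window suppresses the tendency of signatures to concentrate in one part of the index space. The proposed query processing method converts a query sequence into a multidimensional rectangle and retrieves from the index the signatures that overlap that rectangle. In experiments on data used by practicing biologists, the proposed method outperformed a suffix-tree-based method by more than 3 times for exact match, more than 2 times for wildcard match, and several tens of times for k-mismatch.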
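
The signature extraction step can be sketched as follows. This is a minimal illustration, not the paper's implementation: the abstract does not give the weighting formula, so the `weights` parameter and the function name are assumptions, and storing the points in an R*-tree is left out.

```python
def window_signatures(sequence, window, weights=None):
    """Yield (offset, signature) for every sliding window.

    The signature is a 4-dimensional point (A, C, G, T): for each base,
    the sum of the weights of the window positions where it occurs.
    With all weights equal to 1 this is the plain occurrence count.
    Characters other than A, C, G, T raise ValueError.
    """
    alphabet = "ACGT"
    if weights is None:
        weights = [1.0] * window
    if len(weights) != window:
        raise ValueError("need one weight per window position")
    for start in range(len(sequence) - window + 1):
        sig = [0.0] * 4
        for pos in range(window):
            sig[alphabet.index(sequence[start + pos])] += weights[pos]
        yield start, tuple(sig)
```

With plain counts, the windows `ACGT` and `CGTA` of `"ACGTA"` both map to the point (1, 1, 1, 1). With the hypothetical weights `[1, 2, 3, 4]` they map to (1, 2, 3, 4) and (4, 1, 2, 3), which shows how position weights can spread out signatures that would otherwise coincide.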

Comparative Analysis of Expressed Sequence Tags from Flammulina velutipes at Different Developmental Stages

  • Joh, Joong-Ho;Kim, Kyung-Yun;Lim, Jong-Hyun;Son, Eun-Suk;Park, Hye-Ran;Park, Young-Jin;Kong, Won-Sik;Yoo, Young-Bok;Lee, Chang-Soo
    • Journal of Microbiology and Biotechnology / Vol. 19, No. 8 / pp.774-780 / 2009
  • Flammulina velutipes is a popular edible basidiomycete mushroom found in East Asia and is commonly known as winter mushroom. Mushroom development, which shows dramatic morphological changes in response to different environmental factors, is of scientific and commercial interest. To create a genetic database and isolate genes regulated during mushroom development, cDNA libraries were constructed from three developmental stages (mycelium, primordium, and fruit body) of F. velutipes. We generated a total of 5,431 expressed sequence tags (ESTs) from randomly selected clones from the three cDNA libraries. Of these, 3,332 unique genes (unigenes) were identified, consisting of 2,442 (73%) singlets and 890 (27%) contigs. This corresponds to a redundancy of 39%. Using a homology search in the gene ontology database, the EST unigenes were classified into the three categories of molecular function (28%), biological process (29%), and cellular component (6%). Comparative analysis found great variation in the unigene expression pattern among the three unigene sets generated from the cDNA libraries of mycelium, primordium, and fruit body. Between 19% and 34% of the total unigenes were unique to each unigene set, and only 3% were shared among all three sets. The unique and common representation of F. velutipes unigenes across the three cDNA libraries suggests strongly differential gene expression profiles during the different developmental stages of the F. velutipes mushroom.
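
The counts quoted in this abstract can be checked with a line of arithmetic. The abstract does not define redundancy; the check below assumes it is the fraction of ESTs that do not contribute a new unigene, which reproduces the reported 39%.

```latex
\[
2{,}442 + 890 = 3{,}332, \qquad
\text{redundancy} = 1 - \frac{3{,}332}{5{,}431} \approx 0.386 \approx 39\%
\]
```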

Analysis of Partial cDNA Sequence from Human Fetal Liver

  • Kim, Jae-Wha;Song, Jae-Chan;Lee, In-Ae;Lee, Young-Hee;Nam, Myoung-Soo;Hahn, Yoon-Soo;Chung, Jae-Hoon;Choe, In-Seong
    • BMB Reports / Vol. 28, No. 5 / pp.402-407 / 1995
  • Single-run partial cDNA sequencing was conducted on 1,592 randomly selected human fetal liver cDNA clones of Korean origin to isolate novel genes related to liver functions. Each partial cDNA sequence determined was analyzed by comparing it with the GenBank, Protein Information Resource (PIR), and SWISS-PROT Protein Sequence Data Bank databases. Of the 1,592 cDNA clones reported here, 1,433 (90.0% of the total) were informative cDNA sequences. The other 159 clones were identified as DNA sequences that had originated from the cloning vector. Among the 1,433 informative partial cDNA sequences, 851 (59.3%) clones were identical to known human genes. These known genes were classified into 225 different kinds of genes. In addition, 340 clones (23.7%) showed various degrees of homology to previously known human genes. Ninety-four (6.6%) clones contained various repeated sequences. Twenty-four (1.7%) partial cDNA sequences had considerable homology to known genes from evolutionarily distant organisms such as yeast, rice, Arabidopsis, mouse, and rat, based on database matches, whereas 124 (8.7%) had no significant matches. Human homologues of functionally characterized genes from other organisms could be classified as candidates for novel human genes with similar functions. Information from the partial cDNA sequences in this study may facilitate the analysis of genes expressed in human fetal liver.

Genes expression monitoring using cDNA microarray: Protocol and Application

  • Muramatsu Masa-aki
    • 한국독성학회:학술대회논문집 / 한국독성학회 2000년도 국제심포지움 및 추계학술대회 / pp.31-41 / 2000
  • The major issue in the post-genome-sequencing era is the determination of gene expression patterns in a variety of biological systems. A microarray system is a powerful technology for analyzing the expression profile of thousands of genes in one experiment. In this study, we constructed a cDNA microarray carrying 2,304 cDNAs derived from an oligo-capped mouse cDNA library. Using this hand-made microarray, we determined gene expression in various biological systems. To determine tissue-specific genes, we compared adult mouse brain with kidney, liver, and skeletal muscle. Tissue distribution analysis using the DNA microarray extracted nine genes that were predominantly expressed in the brain. A database search showed that five of the nine genes, MBP, SC1, HiAT3, S100 protein-beta, and SNAP25, were previously known to be expressed at high levels in the brain and in the nervous system. One gene showed high sequence similarity to rat S-Rex-s/human NSP-C, suggesting that it is a mouse homologue. The remaining three genes did not match known genes in the GenBank/EMBL database, indicating that they are novel genes highly expressed in the brain. Our DNA microarray was also used to detect differentiation-specific genes, hormone-dependent genes, and transcription-factor-induced genes. We conclude that the DNA microarray is an excellent tool for identifying differentially expressed genes.

Application of rDNA-PCR Amplification and DGGE Fingerprinting for Detection of Microbial Diversity in a Malaysian Crude Oil

  • Liew, Pauline Woan Ying;Jong, Bor Chyan
    • Journal of Microbiology and Biotechnology / Vol. 18, No. 5 / pp.815-820 / 2008
  • Two culture-independent methods, namely ribosomal DNA libraries and denaturing gradient gel electrophoresis (DGGE), were adopted to examine the microbial community of a Malaysian light crude oil. In this study, both 16S and 18S rDNAs were PCR-amplified from bulk DNA of crude oil samples, cloned, and sequenced. Restriction fragment length polymorphism (RFLP) and phylogenetic analyses clustered the 16S and 18S rDNA sequences into seven and six groups, respectively. The ribosomal DNA sequences obtained showed between 90% and 100% sequence similarity to those available in the GenBank database. The closest relatives documented for the 16S rDNAs include member species of Thermoincola and Rhodopseudomonas, whereas the closest fungal relatives include Acremonium, Ceriporiopsis, Xeromyces, Lecythophora, and Candida. Others were affiliated with uncultured bacteria and an uncultured ascomycete. The 16S rDNA library was dominated by a single uncultured bacterial type at >80% relative abundance. This predominance was confirmed by DGGE analysis.

KUGI: A Database and Search System for Korean Unigene and Pathway Information

  • Yang, Jin-Ok;Hahn, Yoon-Soo;Kim, Nam-Soon;Yu, Ung-Sik;Woo, Hyun-Goo;Chu, In-Sun;Kim, Yong-Sung;Yoo, Hyang-Sook;Kim, Sang-Soo
    • 한국생물정보학회:학술대회논문집 / 한국생물정보시스템생물학회 2005년도 BIOINFO 2005 / pp.407-411 / 2005
  • The KUGI (Korean UniGene Information) database contains annotation information for cDNA sequences obtained from samples of diseases prevalent in Koreans. A total of about 157,000 5'-EST high-throughput sequences collected from cDNA libraries of stomach, liver, and some cancer tissues or established cell lines from Korean patients were clustered into about 35,000 contigs. From each cluster, a representative clone having the longest high-quality sequence or the start codon was selected. We stored the sequences of the representative clones and the clustered contigs in the KUGI database together with the information obtained by running BLAST against the RefSeq, human mRNA, and UniGene databases from NCBI. We provide a web-based search engine for the KUGI database with two types of user interface: attribute-based search and sequence similarity search. For attribute-based search we use DBMS technology, while for similarity search we use BLAST, which supports various search options. The search system allows not only multiple queries but also various query types. The results include: 1) information on clones and libraries, 2) accession keys, genome location, gene ontology, and pathway links to public databases, 3) links to external programs, and 4) sequence information for contigs and the 5'-ends of clones. We believe that the KUGI database and search system may provide very useful information for studies aimed at elucidating the causes of diseases that are prevalent in Koreans.

Construction of the cDNA Library from Bombyx mori Larvae and Analysis of the Partial cDNA Sequences

  • 김상현;윤은영
    • 한국잠사곤충학회지 / Vol. 38, No. 1 / pp.13-18 / 1996
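  • To analyze the diverse functions of insects at the gene level, we set out to secure genetic resources from the silkworm, a major beneficial insect. First, a cDNA library was constructed from fifth-instar silkworm larvae, yielding 1.3 × 10^6 cDNA clones. Plaques were selected at random from the silkworm larval cDNA library, converted to plasmids, and partially sequenced using the SK primer. The partial sequences of the cDNA clones were searched against the GenBank database, producing 37 expressed sequence tags. Of these, 15 had a compared region of at least 150 bp and DNA homology of about 60% or higher, a relatively high level of significance; they resembled several storage proteins found in hemolymph, genes related to insect motility, body wall formation, and enzymes, and Drosophila mutant genotypes. Of these 15 expressed sequence tags, 3 had already been identified in the silkworm and the rest were identified in the silkworm for the first time, so these clones require accurate identification.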

Promoter Prediction using Genetic Algorithm

  • 오민경;김창훈;김기봉;공은배;김승목
    • 한국정보과학회:학술대회논문집 / 한국정보과학회 1999년도 가을 학술발표논문집 Vol.26 No.2 (2) / pp.12-14 / 1999
  • A promoter is a special region of DNA located upstream of the transcription start site to which RNA polymerase binds with high affinity, and DNA transcription begins from it. Promoters are known to regulate specific transcription through combinations of distinctive patterns that are shared within groups of function-specific or tissue-specific genes, so a promoter can reveal some information about its gene. We collected human housekeeping gene promoters from EPD (eukaryotic promoter database) and the EMBL nucleic acid sequence database, and used a genetic algorithm, a well-known optimization algorithm, to search for all patterns that occur significantly among them.
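
A genetic algorithm search for a shared pattern can be sketched as below. The abstract does not describe the encoding, fitness function, or operators the authors used, so everything here is an assumption chosen for illustration: candidates are fixed-length k-mers, fitness is the total of each sequence's best match count, and the operators are elitism, tournament selection, one-point crossover, and point mutation. The function names are hypothetical.

```python
import random

BASES = "ACGT"

def fitness(pattern, sequences):
    """Sum, over sequences, of the best per-position match count.

    Every sequence must be at least as long as the pattern.
    """
    k = len(pattern)
    total = 0
    for seq in sequences:
        total += max(
            sum(a == b for a, b in zip(pattern, seq[i:i + k]))
            for i in range(len(seq) - k + 1)
        )
    return total

def evolve_pattern(sequences, k, pop_size=40, generations=60,
                   mutation_rate=0.1, seed=0):
    """Evolve a k-mer (k >= 2) that matches all sequences as well as possible."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    score = lambda p: fitness(p, sequences)
    pop = ["".join(rng.choice(BASES) for _ in range(k))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        nxt = pop[:2]                          # elitism: keep the two best
        while len(nxt) < pop_size:
            # Tournament selection: best of three random candidates.
            mum = max(rng.sample(pop, 3), key=score)
            dad = max(rng.sample(pop, 3), key=score)
            cut = rng.randrange(1, k)          # one-point crossover
            child = list(mum[:cut] + dad[cut:])
            for i in range(k):                 # point mutation
                if rng.random() < mutation_rate:
                    child[i] = rng.choice(BASES)
            nxt.append("".join(child))
        pop = nxt
    return max(pop, key=score)
```

Calling `evolve_pattern(promoters, 6)` on a list of promoter strings returns the best 6-mer found. A genetic algorithm is a heuristic, so the result is a good candidate rather than a guaranteed optimum, and a real study would also need a significance test for the patterns it reports.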

Construction of a Full-length cDNA Library from Korean Stewartia (Stewartia koreana Nakai) and Characterization of EST Dataset

  • 임수빈;김준기;최영인;최선희;권혜진;송호경;임용표
    • 원예과학기술지 / Vol. 29, No. 2 / pp.116-122 / 2011
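  • In this study, we constructed an EST library of Korean stewartia (Stewartia koreana Nakai), a species endemic to Korea that grows wild on Mt. Jiri, and analyzed its sequences. A cDNA library was made from young leaves of Korean stewartia, and 1,392 cDNAs were partially sequenced. EST and unigene sequences were analyzed by computer-based filtering, manual curation, and NCBI BLAST analysis. After removing vector sequences and sequences of 100 bp or shorter, 1,301 ESTs were analyzed. In total, 150 contigs and 743 singletons were separated, giving 893 unigenes, and sequence analysis identified 95 microsatellites. A BLASTX homology search against the NCBI database showed that 65% of the ESTs were highly homologous to genes of known function and 11.6% to genes whose function has not yet been reported. The remaining 23.2% of the ESTs showed no homology to genes previously reported in the databases. Similarity-based functional analysis against various databases confirmed that the Korean stewartia ESTs were highly similar to sequences from grapevine and poplar. In the functional classification, nucleotide binding was the most frequent molecular function, transport the most frequent biological process, and plastid the most frequent cellular component. The EST data obtained in this study will serve as useful baseline material for research on new genetic resources of Korean stewartia.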
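
Microsatellite screening of EST sequences, mentioned in this abstract, can be illustrated with a simple regular-expression scan. The abstract does not state the tool or the criteria the authors used. The motif length range (2 to 6 bases), the minimum repeat count, and the function name below are assumptions for illustration only.

```python
import re

def find_microsatellites(sequence, min_repeats=4):
    """Return (start, motif, repeat_count) for tandem repeats of 2-6 bp motifs."""
    # A short motif followed by at least (min_repeats - 1) further copies.
    pattern = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for m in pattern.finditer(sequence.upper()):
        motif = m.group(1)
        if len(set(motif)) == 1:   # skip homopolymer runs such as AAAA...
            continue
        hits.append((m.start(), motif, len(m.group(0)) // len(motif)))
    return hits
```

The lazy quantifier makes the scan prefer the shortest motif, so a run of `ATATATAT` is reported as four copies of `AT` rather than two copies of `ATAT`.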