• Title/Summary/Keyword: Sequence database


Construction of a Full-length cDNA Library from Korean Stewartia (Stewartia koreana Nakai) and Characterization of EST Dataset (노각나무(Stewartia koreana Nakai)의 cDNA library 제작 및 EST 분석)

  • Im, Su-Bin;Kim, Joon-Ki;Choi, Young-In;Choi, Sun-Hee;Kwon, Hye-Jin;Song, Ho-Kyung;Lim, Yong-Pyo
    • Horticultural Science & Technology / v.29 no.2 / pp.116-122 / 2011
  • In this study, we report the generation and analysis of 1,392 expressed sequence tags (ESTs) from Korean Stewartia (Stewartia koreana Nakai). A cDNA library was generated from young leaf tissue, and a total of 1,392 cDNAs were partially sequenced. EST and unigene sequence quality was determined by computational filtering, manual review, and BLAST analyses. After removal of vector sequence and filtering for a minimum length of 100 nucleotides, 1,301 ESTs were retained. Assembly identified a total of 893 unigenes, consisting of 150 contigs and 743 singletons. We also identified 95 new microsatellite-containing sequences among the unigenes and classified them according to their repeat unit. According to a homology search with BLASTX against the NCBI database, 65% of ESTs were homologous to sequences of known function and 11.6% matched sequences of putative or unknown function. The remaining 23.2% of ESTs showed no significant similarity to any protein sequence in the public database. Annotation-based searches against multiple databases, including wine grape and Populus sequences, helped to identify putative functions of ESTs and unigenes. Gene ontology (GO) classification showed that the most abundant GO terms were transport, nucleotide binding, and plastid in terms of biological process, molecular function, and cellular component, respectively. The sequence data will be used to characterize potential roles of new genes in Stewartia and will serve as a genetic resource.
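The length filter and the classification of microsatellites by repeat unit mentioned in this abstract can be sketched roughly as follows. This is an illustration only, not the authors' pipeline: the abstract does not name the tool or thresholds used, so the function names, the regex-based detection of perfect tandem repeats, and the minimum copy numbers in `MIN_REPEATS` are all assumptions. Only the 100-nucleotide minimum length comes from the abstract.

```python
import re
from collections import Counter

MIN_LENGTH = 100  # minimum EST length in nucleotides, taken from the abstract

# Hypothetical minimum copy numbers per repeat-unit length (not from the abstract).
MIN_REPEATS = {2: 6, 3: 5, 4: 4}


def filter_ests(seqs, min_length=MIN_LENGTH):
    """Keep only sequences that are at least min_length nucleotides long."""
    return {name: s for name, s in seqs.items() if len(s) >= min_length}


def is_primitive(unit):
    """True if the unit is not itself a repetition of a shorter unit."""
    n = len(unit)
    return not any(n % d == 0 and unit[:d] * (n // d) == unit for d in range(1, n))


def find_ssrs(seq):
    """Return (repeat_unit, copy_number, start) for each perfect tandem repeat."""
    seq = seq.upper()
    hits = []
    for unit_len, min_copies in MIN_REPEATS.items():
        # Group 2 is the repeat unit; group 1 is the whole run of copies.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (unit_len, min_copies - 1))
        for m in pattern.finditer(seq):
            unit = m.group(2)
            if is_primitive(unit):  # skip e.g. "ATAT", which is really an "AT" repeat
                hits.append((unit, len(m.group(1)) // unit_len, m.start()))
    return hits


def classify_by_unit_length(hits):
    """Count detected repeats by the length of their repeat unit."""
    return Counter(len(unit) for unit, _, _ in hits)


# Toy sequence, made up for illustration.
toy = "GGC" + "AT" * 7 + "CCG" + "GAA" * 5 + "T"
toy_hits = find_ssrs(toy)
toy_classes = classify_by_unit_length(toy_hits)
```

On the toy sequence, `toy_hits` holds one dinucleotide and one trinucleotide repeat, and `toy_classes` counts them by unit length. A dedicated SSR finder would also handle imperfect and compound repeats, which this sketch ignores.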

Construction of Web-Based Database for Anisakis Research (고래회충 연구를 위한 웹기반 데이터베이스 구축)

  • Lee, Yong-Seok;Baek, Moon-Ki;Jo, Yong-Hun;Kang, Se-Won;Lee, Jae-Bong;Han, Yeon-Soo;Cha, Hee-Jae;Yu, Hak-Sun;Ock, Mee-Sun
    • Journal of Life Science / v.20 no.3 / pp.411-415 / 2010
  • Anisakis simplex is a parasitic nematode with a complex life cycle involving crustaceans, fish, squid, and whales. When people eat under-processed or raw fish, it causes anisakidosis and can induce serious allergic reactions. However, no web-based database on A. simplex at the DNA or protein level has been reported so far. We therefore constructed a web-based database for Anisakis research in the following steps. First, sequences of the order Ascaridida were downloaded and converted into multi-FASTA format, which was stored as a database for stand-alone BLAST. Second, all nucleotide and EST sequences were clustered and assembled, and EST sequences were translated into amino acid sequences for nuclear localization signal prediction. In addition, we added vector, E. coli, and repeat sequences to the database to check for potential contamination. The web-based database offers several advantages. BLAST searches retrieve only nucleotide sequences directly related to the order Ascaridida. It is also convenient for checking contamination when constructing cDNA or genomic libraries from Anisakis. Furthermore, BLAST results on Anisakis sequence information can be accessed quickly. Taken together, the web-based database on A. simplex will be valuable for developing species-specific PCR markers and for studying SNPs in future A. simplex-related research.

A Study on method of load attribute for Spatial Scheduling (공간일정계획에서의 부하조정을 위한 방법론 연구)

  • Back Dong-Sik;Yoon Duck-Young;Kwak Hyun Ho
    • Proceedings of the Korea Committee for Ocean Resources and Engineering Conference / 2004.05a / pp.96-100 / 2004
  • In the shipbuilding industry, various erection problems are encountered because bottlenecks form in the block erection flow. These problems accumulate during real-time erection on the floor. Addressing them requires supporting data on the entire erection sequence. Here, planning is done by reasoning about future events in order to verify that a reasonable series of actions exists to accomplish a goal. This technique helps to handle search complications, resolve goal conflicts, and anticipate bottleneck formation well in advance so that the necessary countermeasures can be taken, and it strengthens the decision support system. The data are being evaluated and an anticipatory function is to be developed; this function is relevant to day-to-day planning operations. The system updates the database by rearranging off-critical blocks in the erection sequence diagram. With such a system, planners can look months ahead and make effective decisions about controlling loads on manpower, machines, and the work flow, leading to efficient load management. This look-ahead concept helps eliminate backtracking-related adjustment, which is less efficient. An attempt is made to develop a computer program that updates the database of block arrangement patterns based on a heuristic formulation.


Metagenome Analysis of Protein Domain Collocation within Cellulase Genes of Goat Rumen Microbes

  • Lim, SooYeon;Seo, Jaehyun;Choi, Hyunbong;Yoon, Duhak;Nam, Jungrye;Kim, Heebal;Cho, Seoae;Chang, Jongsoo
    • Asian-Australasian Journal of Animal Sciences / v.26 no.8 / pp.1144-1151 / 2013
  • In this study, protein domains with cellulase activity in goat rumen microbes were investigated using metagenomic and bioinformatic analyses. After the complete genome of goat rumen microbes was obtained using a shotgun sequencing method, 217,892,109 paired reads were filtered, retaining only those with 70% identity, 100-bp matches, and E-values below $10^{-10}$, using METAIDBA. The filtered contigs were assembled and annotated using blastN against the NCBI nucleotide database. As a result, a microbial community structure with 1431 species was analyzed, among which Prevotella ruminicola 23 and Butyrivibrio proteoclasticus B316 were the dominant groups. In parallel, 201 sequences related to cellulase activity (EC 3.2.1.4) were obtained through BLAST searches using the enzyme.dat file provided by the NCBI database. After translating the nucleotide sequences into protein sequences using InterProScan, 28 protein domains with cellulase activity were identified using the HMMER package with E-values below $10^{-5}$. Profiling of cellulase-activity protein domains showed that major protein domains such as lipase GDSL, cellulase, and Glyco hydro 10 were present in bacterial species with strong cellulase activity. Furthermore, correlation plots clearly displayed a strong positive correlation between some protein domain groups, which was indicative of microbial adaptation in the goat rumen based on feeding habits. This is the first metagenomic analysis of cellulase-activity protein domains from the goat rumen using bioinformatics.
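The hit-filtering criteria quoted in this abstract (70% identity, 100-bp matches, and an E-value threshold read here as 1e-10) could be applied to alignment output along the following lines. This is a hedged sketch, not the authors' workflow: it assumes the hits are available in BLAST tabular format (`-outfmt 6`), which the abstract does not state, and the function name and toy identifiers are made up for illustration.

```python
import csv
import io

# Default column order of BLAST tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(handle, min_identity=70.0, min_length=100, max_evalue=1e-10):
    """Return (query, subject) pairs for hits that pass all three thresholds."""
    reader = csv.DictReader(handle, fieldnames=FIELDS, delimiter="\t")
    kept = []
    for row in reader:
        if (float(row["pident"]) >= min_identity
                and int(row["length"]) >= min_length
                and float(row["evalue"]) < max_evalue):
            kept.append((row["qseqid"], row["sseqid"]))
    return kept


# Toy input with made-up identifiers, for illustration only.
example = io.StringIO(
    "read1\tsubjA\t95.0\t150\t7\t0\t1\t150\t10\t159\t1e-50\t250\n"
    "read2\tsubjB\t65.0\t150\t52\t0\t1\t150\t10\t159\t1e-20\t90\n"
    "read3\tsubjC\t99.0\t80\t1\t0\t1\t80\t10\t89\t1e-30\t140\n"
    "read4\tsubjD\t88.0\t120\t14\t0\t1\t120\t10\t129\t1e-05\t60\n"
)
kept_hits = filter_hits(example)
```

In the toy input, only `read1` passes: `read2` fails the identity threshold, `read3` the length threshold, and `read4` the E-value threshold.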

A Study of Similarity Measures on Multidimensional Data Sequences Using Semantic Information (의미 정보를 이용한 다차원 데이터 시퀀스의 유사성 척도 연구)

  • Lee, Seok-Lyong;Lee, Ju-Hong;Chun, Seok-Ju
    • The KIPS Transactions:PartD / v.10D no.2 / pp.283-292 / 2003
  • One-dimensional time-series data have been studied in various database applications such as data mining and data warehousing. However, in the current complex business environment, multidimensional data sequences (MDSs) are becoming increasingly important in addition to one-dimensional time-series data. For example, a video stream can be modeled as an MDS in a multidimensional space with respect to color and texture attributes. In this paper, we propose effective similarity measures on which similar-pattern retrieval is based. An MDS is partitioned into segments, each of which is represented by various geometric and semantic features. The similarity measures are defined on the basis of these segments. Using the measures, irrelevant segments are pruned from a database with respect to a given query. Both data sequences and query sequences are partitioned into segments, and query processing is based on comparing the features of data and query segments instead of scanning all data elements of entire sequences.
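The segment-and-prune idea in this abstract can be illustrated with a small sketch. The abstract does not specify the geometric or semantic features, so everything concrete here is an assumption: fixed-size segments, a minimum bounding rectangle (MBR) as the only geometric feature, and a Euclidean gap between rectangles as the pruning distance. Semantic features are left out entirely.

```python
def segment(points, size):
    """Split a sequence of d-dimensional points into fixed-size segments."""
    return [points[i:i + size] for i in range(0, len(points), size)]


def mbr(seg):
    """Minimum bounding rectangle of a segment, as (low_corner, high_corner)."""
    dims = range(len(seg[0]))
    low = tuple(min(p[d] for p in seg) for d in dims)
    high = tuple(max(p[d] for p in seg) for d in dims)
    return low, high


def mbr_distance(a, b):
    """Smallest Euclidean distance between two rectangles (0 if they overlap)."""
    (a_low, a_high), (b_low, b_high) = a, b
    total = 0.0
    for d in range(len(a_low)):
        gap = max(b_low[d] - a_high[d], a_low[d] - b_high[d], 0.0)
        total += gap * gap
    return total ** 0.5


def prune(data_segments, query_mbr, eps):
    """Indices of data segments whose MBR lies within eps of the query MBR."""
    return [i for i, seg in enumerate(data_segments)
            if mbr_distance(mbr(seg), query_mbr) <= eps]


# Toy two-dimensional sequence (e.g. one color and one texture value per frame).
data = [(0, 0), (1, 1), (10, 10), (11, 12)]
segments = segment(data, 2)
candidates = prune(segments, ((2, 1), (3, 2)), eps=2.0)
```

Only the first segment survives pruning here, so a full comparison against the query would be needed for that segment alone rather than for every point in the sequence.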

A Study on Clustering and Identifying Gene Sequences using Suffix Tree Clustering Method and BLAST (서픽스트리 클러스터링 방법과 블라스트를 통합한 유전자 서열의 클러스터링과 기능검색에 관한 연구)

  • Han, Sang-Il;Lee, Sung-Gun;Kim, Kyung-Hoon;Lee, Ju-Yeong;Kim, Young-Han;Hwang, Kyu-Suk
    • Journal of Institute of Control, Robotics and Systems / v.11 no.10 / pp.851-856 / 2005
  • DNA and protein data from diverse species are discovered daily and deposited in public archives according to each archive's established format. Database systems in the public archives provide not only an easy-to-use, flexible interface to the public but also in silico analysis tools for unidentified sequence data. Among these tools, multiple sequence alignment [1] methods relying on pairwise alignment and the Smith-Waterman algorithm [2] make it possible to identify unknown DNA or protein sequences, or phylogenetic relations among several species. However, in existing multiple alignment methods the runtime increases exponentially as the number of sequences increases. To remedy this problem, we adopted a parallel-processing suffix tree algorithm that can search for common subsequences in a single pass without pairwise alignment. Because the common subsequences found may include cross-matching subsequences that trigger inexact matching, this paper also proposes a cross-matching masking process. To identify the function of the clusters generated by suffix tree clustering, BLAST was combined with the clustering tool. Our clustering and annotation tool proceeds in the following steps: (1) construction of the suffix tree; (2) masking of cross-matching pairs; (3) clustering of gene sequences; and (4) annotation of gene clusters by BLAST search. The system was successfully evaluated with 22 gene sequences in the pyruvate pathway of bacteria, producing 7 clusters and finding representative common subsequences for each cluster.
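Step (3), clustering sequences that share a common subsequence, can be illustrated with a much simpler stand-in for the suffix tree. The sketch below swaps in a plain k-mer index and a union-find structure: two sequences land in the same cluster when they share at least one exact substring of length `k`. It is not the authors' algorithm and it omits the cross-matching masking and the BLAST annotation steps; the function name and the value of `k` are assumptions.

```python
from collections import defaultdict


def shared_kmer_clusters(seqs, k=8):
    """Cluster sequence names that share at least one exact substring of length k."""
    # Index every k-mer to the set of sequences that contain it.
    index = defaultdict(set)
    for name, s in seqs.items():
        for i in range(len(s) - k + 1):
            index[s[i:i + k]].add(name)

    # Union-find over sequence names.
    parent = {name: name for name in seqs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Merge all sequences that share a k-mer.
    for names in index.values():
        ordered = sorted(names)
        for other in ordered[1:]:
            parent[find(other)] = find(ordered[0])

    clusters = defaultdict(list)
    for name in seqs:
        clusters[find(name)].append(name)
    return sorted(sorted(members) for members in clusters.values())


# Toy sequences, made up for illustration.
toy_seqs = {
    "a": "ACGTACGTTT",
    "b": "GGACGTACGTCC",
    "c": "TTTTTTTTTTTT",
}
toy_clusters = shared_kmer_clusters(toy_seqs, k=8)
```

Sequences `a` and `b` share the substring `ACGTACGT` and are grouped together, while `c` stays on its own. A suffix tree finds common substrings of any length without fixing `k` in advance, which is the advantage the abstract relies on.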

Proteomics Approach on Puroindoline Gene of Pre-harvest Sprouting Wheat

  • Kamal, Abu Hena Mostafa;Park, Cheol-Soo;Heo, Hwa-Young;Chung, Keun-Yook;Cho, Yong-Gu;Kim, Hong-Sig;Song, Beom-Heon;Lee, Chul-Won;Woo, Sun-Hee
    • Korean Journal of Breeding Science / v.41 no.3 / pp.205-212 / 2009
  • Wheat (Triticum aestivum L.) grain texture is an important determinant of milling properties and end-product use. Two linked genes, puroindoline a (PINA) and puroindoline b (PINB), control most of the genetic variation in wheat grain texture. Wheat seed proteins were examined to identify the PINA and PINB genes in two pre-harvest sprouting wheat cultivars, Jinpum (resistant) and Keumgang (susceptible). Seed proteins were separated by two-dimensional electrophoresis with IEF gels over the pH range 3-10. A total of 73 spots were digested with trypsin, and the resulting peptide fragments were analyzed by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF/MS). Mass spectra were automatically processed and searched against the NCBInr, SWISS-PROT, and MSDB databases using monoisotopic masses, and complete gene sequences were found in the UniProt database. Puroindoline a and puroindoline b are responsible for grain texture, which is related to baking performance and roughness. Two spots were identified in Jinpum, Pin b (16.7 kDa) and Pin a (16.3 kDa), compared with seven spots identified in Keumgang as Pin a (16.1 kDa, 16.3 kDa) and Pin b (16.7 kDa, 9.5 kDa, and 14.4 kDa). Some selected spots in Keumgang were identified as puroindoline-like grain softness protein (16.9 kDa, 17 kDa, and 18.1 kDa). Moreover, to better support the identification of puroindoline-related proteins using proteomics, we obtained the complete gene sequences of PINA and PINB in pre-harvest sprouting wheat seeds of the resistant (Jinpum) and susceptible (Keumgang) cultivars.

Analysis for Diagnosis of Patients with Cerebral Infarction by Sequence Modeling (순차규칙 모델링을 활용한 뇌경색증 환자 진단 분석)

  • Shin, A.M.;Park, H.J.;Lee, I.H.;Kim, Y.N.
    • Journal of rehabilitation welfare engineering & assistive technology / v.2 no.1 / pp.51-56 / 2009
  • This study analyzed the diagnoses of patients with cerebral infarction using sequence modeling, a data mining method, to find the preceding diseases and complications of these patients. Records with the cerebral infarction diagnosis code I63 from 2000 to 2007 were extracted from the database of hospital A, and a data mart was constructed for analysis. In total, 2,267 patients were diagnosed with cerebral infarction, and 32,692 related diagnosis cases were extracted. Sequence modeling in Clementine 12.0 was used to analyze the diagnoses, and 8 meaningful rules were found. These results could serve as basic data for developing programs to prevent secondary cerebral infarction and its complications.
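A sequence rule of the kind this abstract reports ("diagnosis A is later followed by diagnosis B") is usually scored by its support and confidence. The sketch below shows that calculation on made-up patient histories. It is not the Clementine algorithm, and the function name and the toy data are assumptions; the ICD-10 codes are used only as labels (I63 cerebral infarction, I10 essential hypertension, E11 type 2 diabetes mellitus).

```python
def rule_stats(histories, antecedent, consequent):
    """Support and confidence of the ordered rule antecedent -> consequent.

    Each history is one patient's diagnosis codes in chronological order.
    The rule holds for a patient when the consequent appears after the
    first occurrence of the antecedent.
    """
    with_antecedent = 0
    with_both_in_order = 0
    for codes in histories:
        if antecedent in codes:
            with_antecedent += 1
            first = codes.index(antecedent)
            if consequent in codes[first + 1:]:
                with_both_in_order += 1
    support = with_both_in_order / len(histories)
    confidence = with_both_in_order / with_antecedent if with_antecedent else 0.0
    return support, confidence


# Toy patient histories, invented for illustration only.
toy_histories = [
    ["I10", "I63"],
    ["E11", "I10", "I63"],
    ["I63", "I10"],
    ["E11"],
]
support, confidence = rule_stats(toy_histories, "I10", "I63")
```

In the toy data the rule I10 -> I63 holds for two of four patients (support 0.5) and for two of the three patients who have I10 at all (confidence about 0.67). The third patient does not count because the order is reversed.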


Genetic mapping and sequence analysis of Phi class Glutathione S-transferases (BrGSTFs) candidates from Brassica rapa

  • Park, Tae-Ho;Jin, Mi-Na;Lee, Sang-Choon;Hong, Joon-Ki;Kim, Jung-Sun;Kim, Jin-A;Kwon, Soo-Jin;Zang, Yun-Xiang;Park, Young-Doo;Park, Beom-Seok
    • Journal of Plant Biotechnology / v.35 no.4 / pp.265-274 / 2008
  • Glutathione S-transferases (GSTs) are multifunctional proteins encoded by a large gene family divided into Phi, Tau, Theta, Zeta, Lambda and DHAR classes on the basis of sequence identity. The Phi (F) and Tau (U) classes are plant-specific and ubiquitous. Their roles have been defined as herbicide detoxification and responses to biotic and abiotic stresses. Fifty-two members of the GST superfamily were identified in the Arabidopsis thaliana genome, 13 of which belong to the Phi class of GSTs (AtGSTFs). Based on sequence similarity to AtGSTFs, 11 BAC clones were identified from Brassica rapa. Seven unique ORF sequences, designated Phi class GST candidates from B. rapa (BrGSTFs), were detected in these 11 BAC clones by BLAST search and sequence alignment. Some BrGSTFs were present in the same BAC clones, indicating that BrGSTFs may also be clustered, as is usual in plants. They were mapped on B. rapa linkage groups 2, 3, 9 and 10, and their nucleotide and amino acid sequences were highly similar to those of AtGSTFs. In addition, in silico analysis of BrGSTFs using the Korea Brassica Genome Project 24K oligochip and a microarray database for cold, salt and drought stresses revealed 15 unigenes highly similar to AtGSTFs; six of these were identical to one of the BrGSTFs identified in the BAC clones, indicating that they are expressed. The sequences of BrGSTFs and unigenes identified in this study will facilitate further studies on applying GST genes for medical and agricultural purposes.

CGRID construction based on Etherboot technology and its utilization to sequence analysis (Etherboot 기반의 CGRID 구축과 서열분석에의 적용)

  • Kim Tae-Kyung;Cho Wan-Sup
    • Journal of the Korea Society of Computer and Information / v.10 no.6 s.38 / pp.195-208 / 2005
  • Recently, the amount of biological data such as sequences has been increasing rapidly owing to the deployment of computational techniques and advances in experimental tools. In bioinformatics, extracting knowledge from such huge biological data sets is very important. Sequence comparisons are the most frequently used means of predicting the function of genes or proteins. However, processing the ever-increasing data takes a great deal of time. In this paper, we propose a hardware-based grid, CGRID (Chungbuk National University GRID), to improve performance and complement the existing middleware-only approach, and we apply it to sequence comparison. With the hardware-based approach, the grid is easy to construct, maintain, and manage because software does not need to be installed individually on every node. We reduced the orthologous database construction time from 33 weeks to just one week. Furthermore, CGRID guarantees that performance increases proportionally as nodes are added.
