• Title/Summary/Keyword: contig assembly


Characterization of a Strain of Malva Vein Clearing Virus in Alcea rosea via Deep Sequencing

  • Wang, Defu;Cui, Liyan;Pei, Yanni;Ma, Zhennan;Shen, Shaofei;Long, Dandan;Li, Lingyu;Niu, Yanbing
    • The Plant Pathology Journal / v.36 no.5 / pp.468-475 / 2020
  • Malva vein clearing virus (MVCV) is a member of the genus Potyvirus and reduces the ornamental value of Alcea rosea. It was first reported in Germany in 1957, but complete genome sequence data for it remain scarce. In the present work, A. rosea leaves with vein-clearing and mosaic symptoms were sampled and analyzed by small RNA deep sequencing. By de novo assembly of the raw sequences of virus-derived small interfering RNAs (vsiRs), followed by whole-genome amplification of the malva vein clearing virus SX strain (MVCV-SX) with specific primers targeting the gaps between identified contigs, the full-length genome sequence (9,645 nucleotides) of MVCV-SX was determined; it contains an open reading frame long enough to encode 3,096 amino acids. Phylogenetic analysis showed that MVCV-SX clustered with euphorbia ringspot virus and yam mosaic virus. Further analysis of the vsiR profiles revealed that the most abundant MVCV vsiRs were 21 and 22 nucleotides long, with a strong bias toward "A" and "U" at the 5′-terminal residue. Polarity assessment indicated that sense-strand and antisense-strand MVCV vsiRs were present in almost equal amounts, and the main hot-spot region in the MVCV-SX genome was located in the cylindrical inclusion (CI) coding region. In conclusion, our findings could provide new insights into the RNA silencing-mediated host defence mechanism in A. rosea infected with MVCV-SX and offer a basis for the prevention and treatment of this viral disease.
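The vsiR profiling summarized above (length distribution, 5′-terminal nucleotide bias, and strand polarity) reduces to simple counting once reads have been mapped to the viral genome. A minimal Python sketch, assuming the reads are already identified as virus-derived and tagged with the strand they map to; the function name `vsir_profile` and the input layout are hypothetical, not the authors' method:

```python
from collections import Counter


def vsir_profile(reads):
    """Summarize virus-derived siRNA (vsiR) reads.

    Hypothetical input layout: a list of (sequence, strand) tuples, where
    strand is "+" for reads mapping to the sense (genomic) strand and "-"
    for reads mapping to the antisense strand.
    """
    # Length distribution, e.g. to check whether 21-22 nt reads dominate.
    lengths = Counter(len(seq) for seq, _ in reads)
    # 5'-terminal nucleotide; DNA-style "T" is normalized to RNA "U".
    first_nt = Counter(seq[0].upper().replace("T", "U") for seq, _ in reads)
    # Polarity: counts of sense versus antisense reads.
    polarity = Counter(strand for _, strand in reads)
    return lengths, first_nt, polarity


# Toy reads for illustration only (not data from the study).
toy_reads = [("A" + "C" * 20, "+"), ("U" + "G" * 21, "-"), ("A" + "G" * 20, "-")]
lengths, first_nt, polarity = vsir_profile(toy_reads)
# lengths == {21: 2, 22: 1}, first_nt == {"A": 2, "U": 1}, polarity == {"+": 1, "-": 2}
```

Dividing each count by the total number of reads turns these tallies into the percentage profiles typically reported for vsiRs.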

Single Nucleotide Polymorphism Marker Discovery from Transcriptome Sequencing for Marker-assisted Backcrossing in Capsicum

  • Kang, Jin-Ho;Yang, Hee-Bum;Jeong, Hyeon-Seok;Choe, Phillip;Kwon, Jin-Kyung;Kang, Byoung-Cheorl
    • Horticultural Science & Technology / v.32 no.4 / pp.535-543 / 2014
  • Backcross breeding is the method most commonly used to introgress new traits into elite lines. Conventional backcross breeding requires at least 4-5 generations to recover the genomic background of the recurrent parent. Marker-assisted backcrossing (MABC) is a newer breeding approach that can substantially reduce breeding time and cost. Successful MABC requires highly polymorphic markers with known positions on each chromosome. Single nucleotide polymorphism (SNP) markers have many advantages over other marker systems for MABC because of their high abundance and amenability to automated genotyping. To facilitate MABC in hot pepper (Capsicum annuum), we developed SNP markers from expressed sequence tags (ESTs) in this study. For SNP identification, we prepared a reference sequence by de novo assembly of ESTs from the Bukang F1 hybrid pepper. We performed large-scale transcriptome sequencing of eight accessions on the Illumina Genome Analyzer IIx (IGA IIx) platform (Solexa), which generated short sequence fragments of about 90-100 bp. By aligning each contig to the reference sequence, we identified 58,151 SNPs. After filtering for polymorphism, segregation ratio, and distance from other SNPs and from exon/intron boundaries, a total of 1,910 putative SNPs were selected and positioned on a pepper linkage map. We further selected 412 SNPs evenly distributed across the chromosomes, designed primers for high-throughput SNP assays, and tested them on a genetic diversity panel of 27 Capsicum accessions. The SNP markers clearly distinguished every accession. These results suggest that the SNP marker set developed in this study will be valuable for MABC, genetic mapping, and comparative genome analysis.
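Calling SNPs by aligning contigs to a reference and then discarding SNPs that sit too close to other SNPs can be sketched as below. This assumes gap-free alignments with a known start offset (real pipelines use an aligner and also weigh read depth and base quality). The function names and the `min_distance` threshold are hypothetical: the abstract does not state the distance cutoff used, and the exon/intron boundary filter is not shown.

```python
def call_snps(reference, contig, offset=0):
    """Return (position, ref_base, alt_base) wherever an aligned contig differs.

    Assumes the contig aligns to `reference` without gaps, starting at `offset`;
    zip() stops at whichever sequence ends first. Ambiguous "N" bases are skipped.
    """
    snps = []
    for i, (ref_base, base) in enumerate(zip(reference[offset:], contig)):
        if base != ref_base and "N" not in (base, ref_base):
            snps.append((offset + i, ref_base, base))
    return snps


def filter_isolated(snps, min_distance):
    """Keep only SNPs with no other SNP closer than `min_distance` bp."""
    positions = [pos for pos, _, _ in snps]
    return [
        (pos, ref, alt)
        for pos, ref, alt in snps
        if all(p == pos or abs(p - pos) >= min_distance for p in positions)
    ]


reference = "ACGTACGTACGTACGTACGT"
contig = "ACGAACGTACGTACCAACGT"  # toy contig with three mismatches
snps = call_snps(reference, contig)  # mismatches at positions 3, 14 and 15
kept = filter_isolated(snps, min_distance=5)  # only position 3 survives
```

Dropping clustered SNPs matters for marker design because a neighbouring polymorphism under a primer or probe can disrupt the genotyping assay.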

Simple sequence repeat marker development from Codonopsis lanceolata and genetic relation analysis

  • Kim, Serim;Jeong, Ji Hee;Chung, Hee;Kim, Ji Hyeon;Gil, Jinsu;Yoo, Jemin;Um, Yurry;Kim, Ok Tae;Kim, Tae Dong;Kim, Yong-Yul;Lee, Dong Hoon;Kim, Ho Bang;Lee, Yi
    • Journal of Plant Biotechnology / v.43 no.2 / pp.181-188 / 2016
  • In this study, we developed 15 novel polymorphic simple sequence repeat (SSR) markers from Codonopsis lanceolata by constructing an SSR-enriched genomic library. The assembly process yielded a total of 226 non-redundant contig sequences, from which primer sets were designed. The markers were applied to 53 accessions representing cultivated C. lanceolata in South Korea. Fifteen markers were sufficiently polymorphic and were used to analyze the genetic relationships among the cultivated accessions. The 15 SSR markers detected 103 alleles in total, ranging from 3 to 19 alleles per locus with an average of 6.87. Cluster analysis revealed clear genetic differences among most of the accessions, with genetic distances ranging from 0.73 to 0.93. Phylogenetic analysis showed that accessions collected from the same area were distributed evenly throughout the phylogenetic tree, indicating no correlation between genetic relationship and geographic origin. These markers will be useful for differentiating C. lanceolata genetic resources and for selecting suitable lines in a systematic breeding program.
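The allele statistics above (103 alleles over 15 loci, mean 6.87) come from counting the distinct allele sizes observed at each locus across all genotyped accessions. A small Python sketch, assuming genotypes are stored as allele-size pairs per locus with `None` for missing calls; the data layout and the name `allele_summary` are hypothetical:

```python
def allele_summary(genotypes):
    """Count distinct alleles per SSR locus, the total, and the mean per locus.

    Hypothetical layout: {locus_name: [(allele_bp, allele_bp), ...]},
    one tuple per accession; None marks a missing allele call.
    """
    per_locus = {}
    for locus, calls in genotypes.items():
        distinct = {allele for pair in calls for allele in pair if allele is not None}
        per_locus[locus] = len(distinct)
    total = sum(per_locus.values())
    mean = total / len(per_locus)
    return per_locus, total, mean


# Toy genotypes for two loci (illustrative values, not study data).
toy = {
    "locus_A": [(100, 102), (100, 104)],
    "locus_B": [(200, 200), (210, None)],
}
per_locus, total, mean = allele_summary(toy)
# per_locus == {"locus_A": 3, "locus_B": 2}, total == 5, mean == 2.5
```

Applied to the study's reported totals, the same arithmetic gives 103 / 15 ≈ 6.87 alleles per locus.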

Functional Analysis of Expressed Sequence Tags from Hanwoo (Korean Cattle) cDNA Libraries

  • Lim, Da-Jeong;Byun, Mi-Jeong;Cho, Yong-Min;Yoon, Du-Hak;Lee, Seung-Hwan;Shin, Youn-Hee;Im, Seok-Ki
    • Journal of Animal Science and Technology / v.51 no.1 / pp.1-8 / 2009
  • We generated 57,598 expressed sequence tags (ESTs) from three Hanwoo (Korean cattle) cDNA libraries: fat, loin, and liver. Liver, intermuscular fat, and longissimus dorsi tissues were obtained from a 24-month-old Hanwoo steer immediately after slaughter, and the cDNA libraries were constructed by the oligo-capping method. The EST data were clustered and assembled into unique sequences comprising 4,759 contigs and 7,587 singletons. For functional analysis, we performed Gene Ontology (GO) annotation and identified significant leaf nodes by searching for significant p-values from second-level GO terms down to leaf nodes, applying the Bonferroni correction. We found 13, 26, and 8 significant leaf nodes unique to the transcripts in the three GO categories of molecular function, biological process, and cellular component, respectively. In addition, digital gene expression profiling was performed with Audic's test, and tissue-specific genes were detected in the three libraries.
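Audic's test (the Audic-Claverie statistic) asks whether a transcript's EST count differs between two libraries of different sizes. With library sizes N1 and N2, the probability of observing y tags in library 2 given x tags in library 1 is p(y | x) = (N2/N1)^y (x + y)! / (x! y! (1 + N2/N1)^(x+y+1)). The Python sketch below evaluates this formula in log space and adds the Bonferroni adjustment mentioned for the GO analysis; the function names and the example counts are hypothetical, not values from the study.

```python
import math


def audic_claverie_p(x, y, n1, n2):
    """p(y | x): probability of y tags in library 2 given x tags in library 1.

    n1 and n2 are the total tag (EST) counts of the two libraries.
    Computed in log space with lgamma so large counts do not overflow.
    """
    r = n2 / n1
    log_p = (
        y * math.log(r)
        + math.lgamma(x + y + 1)
        - math.lgamma(x + 1)
        - math.lgamma(y + 1)
        - (x + y + 1) * math.log(1 + r)
    )
    return math.exp(log_p)


def audic_claverie_upper_tail(x, y, n1, n2):
    """One-sided P(Y >= y | x); small values suggest higher expression in library 2.

    Uses 1 - P(Y < y), which loses precision for very small p-values.
    """
    return 1.0 - sum(audic_claverie_p(x, k, n1, n2) for k in range(y))


def bonferroni(p, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_tests)


# Hypothetical counts: a transcript seen 2 times among 10,000 liver ESTs
# and 15 times among 12,000 fat ESTs.
p = audic_claverie_upper_tail(2, 15, 10_000, 12_000)
p_adjusted = bonferroni(p, n_tests=100)
```

When many transcripts are tested, the Bonferroni step multiplies each p-value by the number of tests, which keeps the family-wise error rate at or below the chosen threshold at the cost of some power.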

EST Analysis system for panning gene

  • Hur, Cheol-Goo;Lim, So-Hyung;Goh, Sung-Ho;Shin, Min-Su;Cho, Hwan-Gue
    • Proceedings of the Korean Society for Bioinformatics Conference / 2000.11a / pp.21-22 / 2000
  • Expressed sequence tags (ESTs) are partial cDNA segments produced by single-pass sequencing from the 5′ or 3′ end of cDNA clones; they are error-prone and generated in highly redundant sets. Advances in genomics have led biologists to generate huge numbers of ESTs from a variety of organisms, including humans, microorganisms, and plants, and the cumulative number of ESTs now exceeds 5.3 million. As EST data accumulate ever more rapidly, the need for analysis tools that extract biological meaning from them grows accordingly. Among the various EST analyses, extracting protein sequences or functional motifs from ESTs is important for identifying their functions in vivo. To achieve this, the coding sequence (CDS) regions must first be identified precisely and accurately, which helps in extracting genuine CDSs and protein motifs from EST collections. Although several public tools are available for EST analysis, none of them accomplishes this goal, and they target human or microbial ESTs rather than plant ESTs. Therefore, to meet the urgent needs of collaborators working with plant ESTs and to provide an analysis system usable as general-purpose public software, we constructed a pipelined EST analysis system by integrating public software components: Phred/Cross-match for quality control and vector screening, NCBI BLAST for similarity searching, ICATools for EST clustering, Phrap for EST contig assembly, and BLOCKS/Prosite for protein motif searching. The sample data set used to build and verify this system consisted of 1,386 ESTs from human intrathymic T cells, which had been verified against the NCBI UniGene and nr databases.
CDSs were extracted from the sample data set by comparing the sample sequences with protein sequence and motif databases, selecting the matched protein sequences and motifs that satisfied our defined parameters, and extracting the regions showing similarity. In the near future, software for peptide mass fingerprint analysis, a proteomics application, is also planned for integration into the system alongside these components. This pipelined EST analysis system will extend our knowledge of plant ESTs and proteins by identifying unknown genes.
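The contig assembly step in this pipeline (Phrap) merges overlapping ESTs into longer contigs, while reads that overlap nothing remain singletons. Phrap itself uses base-quality information and a considerably more elaborate algorithm, so the sketch below substitutes a plain greedy suffix-prefix overlap merge on error-free reads, purely to illustrate the idea. The function names and the `min_len` threshold are hypothetical.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Only overlaps of at least `min_len` characters count; otherwise returns 0.
    """
    start = 0
    while True:
        # Candidate positions in `a` where b's first min_len characters occur.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # The earliest position whose suffix is a prefix of `b` is the longest overlap.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads, min_len):
    """Repeatedly merge the pair of sequences with the largest overlap.

    Stops when no pair overlaps by at least `min_len`; unmerged reads
    are left in the output, analogous to singletons.
    """
    contigs = list(reads)
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Toy error-free reads (illustrative only).
reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
contigs = greedy_assemble(reads, min_len=3)
# contigs == ["ATTAGACCTGCCGGAATAC"]
```

Greedy merging can mis-join reads from repeated sequences, which is one reason production assemblers rely on scored alignments and quality information rather than exact overlaps.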
