• Title/Summary/Keyword: Seq2Seq(Sequence to Sequence)

Genome Survey and Microsatellite Marker Selection of Tegillarca granosa (꼬막(Tegillarca granosa)의 유전적 다양성 분석을 위한 드래프트 게놈분석과 마이크로새틀라이트 마커 발굴)

  • Kim, Jinmu;Lee, Seung Jae;Jo, Euna;Choi, Eunkyung;Kim, Hyeon Jin;Lee, Jung Sick;Park, Hyun
    • Journal of Marine Life Science / v.6 no.1 / pp.38-46 / 2021
  • The blood clam, Tegillarca granosa, is an economically important marine bivalve harvested by fisheries along the western Pacific coast, especially in Korea, China, and Japan. The chromosome number of the blood clam is known to be 2n=38, but its genome size and genetic information remain unclear. To predict the genome size of T. granosa, an in silico analysis was performed on short DNA reads obtained with the Illumina HiSeq NGS platform. The genome size of T. granosa was estimated to be 770.61 Mb. A draft genome was then assembled with the MaSuRCA assembler, and simple sequence repeats (SSRs) were analyzed with the QDD pipeline. A total of 43,944 SSRs were detected in the T. granosa genome, consisting of 69.51% di-nucleotide, 16.68% tri-nucleotide, 12.96% tetra-nucleotide, 0.82% penta-nucleotide, and 0.03% hexa-nucleotide repeats. One hundred primer sets suitable for genetic diversity studies were selected. This study should support future work on the genetic diversity and population genetics of T. granosa, and help distinguish the origin of otherwise homogeneous groups.
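
The SSR detection and classification step described above can be illustrated with a minimal sketch. It is not the QDD pipeline: the regular-expression search, the `MIN_REPEATS` thresholds, and the toy sequence are all assumptions chosen for illustration, and only perfect (uninterrupted) repeats are found.

```python
import re
from collections import Counter

# Assumed thresholds: minimum copies required per motif length.
# These are illustrative values, not the defaults of QDD or any other tool.
MIN_REPEATS = {2: 5, 3: 4, 4: 4, 5: 4, 6: 4}
CLASS_NAMES = {2: "di", 3: "tri", 4: "tetra", 5: "penta", 6: "hexa"}


def find_ssrs(seq):
    """Return sorted (start, motif, copies) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        # One motif of `size` bases followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "ATAT" or "AA").
            if any(motif == motif[:k] * (size // k)
                   for k in range(1, size) if size % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // size))
    return sorted(hits)


def class_percentages(hits):
    """Share of each repeat class (di, tri, ...) among the detected SSRs."""
    counts = Counter(CLASS_NAMES[len(motif)] for _, motif, _ in hits)
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


# Toy sequence (hypothetical): two AC repeats and one CAG repeat.
toy = "GG" + "AC" * 6 + "TTT" + "CAG" * 5 + "T" + "AC" * 7 + "GATTACA"
hits = find_ssrs(toy)
print(hits)                     # [(2, 'AC', 6), (17, 'CAG', 5), (33, 'AC', 7)]
print(class_percentages(hits))  # {'di': 66.67, 'tri': 33.33}
```

Real pipelines additionally handle imperfect repeats, flanking-sequence extraction, and primer design, none of which this sketch attempts.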

Transcriptome analysis of a medicinal plant, Pistacia chinensis

  • Choi, Ki-Young;Park, Duck Hwan;Seong, Eun-Soo;Lee, Sang Woo;Hang, Jin;Yi, Li Wan;Kim, Jong-Hwa;Na, Jong-Kuk
    • Journal of Plant Biotechnology / v.46 no.4 / pp.274-281 / 2019
  • Pistacia chinensis Bunge has been used as a medicinal plant to treat various illnesses, and its young shoots and leaves are eaten as vegetables. In addition, P. chinensis is used as a rootstock for Pistacia vera (pistachio). Here, the transcriptome of P. chinensis was sequenced with Illumina RNA-seq to enrich genetic resources and identify secondary metabolite biosynthetic pathways. De novo assembly of 19 million RNA-seq reads yielded 18,524 unigenes with an average length of 873 bp. A Kyoto Encyclopedia of Genes and Genomes (KEGG) annotation tool assigned KO (KEGG orthology) numbers to 6,553 (36.2%) unigenes, of which 4,061 were mapped to 391 different metabolic pathways. For the terpenoid backbone and carotenoid biosynthesis pathways, 44 and 22 unigenes encode enzymes corresponding to 30 and 16 entries, respectively. For the phenylpropanoid and flavonoid biosynthesis pathways, 63 and 24 unigenes were homologous to 17 and 14 entry proteins, respectively. SSR mining identified 2,599 simple sequence repeats in the P. chinensis unigenes. These results provide a valuable resource for in-depth comparative and functional genomics studies aimed at unraveling the mechanisms underlying the medicinal properties of Pistacia L.

Characterization of simple sequence repeats (SSRs) in Pleurotus pulmonarius cultivars (산느타리(Pleurotus pulmonarius) 품종의 초위성체(simple sequence repeats) 특성구명)

  • Choi, Jong In;Na, Kyeong Sook;Oh, Min-Ji;Ryu, Jae-San
    • Journal of Mushroom / v.19 no.4 / pp.341-346 / 2021
  • Simple sequence repeats (SSRs) were isolated from major Pleurotus pulmonarius cultivars in Korea, namely 'HS47' (monokaryon, gamete of 'Santari'), 'GB19' (monokaryon, gamete of 'Santari'), 'Hosan,' 'Yeoleumneutali1,' 'Sambok,' 'Gangsan,' 'Yaksan,' 'Jasan,' 'Hyangsan,' and 'Yeoleumneutali2,' and characterized via HiSeq genome sequencing and bioinformatic analysis. The genome sizes of the monokaryons 'HS47' and 'GB19' were estimated to be 37.3 and 37.2 Mb, respectively, and those of the other, dikaryotic cultivars ranged from 47.1 to 61.1 Mb. The 'HS47' genome contained the fewest SSRs (711) and the 'Gangsan' genome the most (1,106, about 1.5 times as many). Hexanucleotide and octanucleotide motifs accounted for the top two fractions of all SSRs. CGA/TCG, A/T, and CTC/GAG were the most frequently detected motifs in the SSRs. Most of the SSRs were 21-30 nucleotides long (a hypervariable range suited to marker applications), accounting for 70% of all SSRs.
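
Motif labels such as CGA/TCG pair a motif with its reverse complement, since the same repeat reads differently on the two DNA strands. The sketch below shows one way to tally motifs under that convention; the `observed` list is invented for illustration and the helper names are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(motif):
    """Return the motif as read on the opposite strand."""
    return motif.translate(COMPLEMENT)[::-1]


def strand_pair(motif):
    """Label a motif together with its reverse complement, e.g. 'CGA/TCG'."""
    a, b = sorted((motif, reverse_complement(motif)))
    return a if a == b else f"{a}/{b}"


# Hypothetical motif observations, not data from the study.
observed = ["CGA", "TCG", "A", "T", "A", "GAG", "CTC", "TCG"]
tally = Counter(strand_pair(m) for m in observed)
print(tally.most_common())
# [('CGA/TCG', 3), ('A/T', 3), ('CTC/GAG', 2)]
```

This sketch does not merge rotations of the same motif (for example CGA, GAC, and ACG); whether the original analysis did so is not stated in the abstract.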

RNA-Seq Transcriptome Analysis of the Cutlass Fish Reveals Photoreceptors Gene Expression in Peripheral Tissues (RNA-Seq transcriptome 분석을 통한 갈치 광수용체 유전자 탐색 및 mRNA 조직발현)

  • Hyeon, Ji-Yeon;Kim, Mun-Kwan;Lim, Bong-Soo;Byun, Jun-Hwan;Moon, Ji-Sung;Kang, Hyeong-Cheol;Hur, Sung-Pyo;Oh, Seong-Rip
    • Ocean and Polar Research / v.39 no.2 / pp.149-158 / 2017
  • The opsin family of light-sensitive proteins comprises the universal photoreceptor molecules of all visual systems in vertebrates, including teleosts. Upon light absorption, opsins change their conformation from a resting state to a signaling state, which activates the G-protein coupled receptor and triggers a signaling cascade that produces physiological responses. However, the cutlass fish is poorly characterized at the molecular level because little sequence information is available in public databases. We investigated the opsin family of the nocturnal cutlass fish using whole transcriptome sequencing. The opsin genes were cloned and their expression in tissues and organs was examined by qPCR. We cloned 6 opsin genes (RRH, Opn4, Rh1, Rh2, VA-opsin, and Opn3) from retina and brain tissue. They contained the seven presumed transmembrane domains that are characteristic of the G-protein-coupled receptor family. However, short wavelength sensitive pigment (SWS) and long wavelength sensitive pigment (LWS) were not detected in this study. mRNA expression of the 6 photoreceptor genes was detected in the retina and in peripheral tissues. Our studies will lead to further investigation of the photic entrainment mechanism at the molecular and cellular levels in cutlass fish and can be used in comparative studies of other fishes.

Microbial Community of Tannery Wastewater Involved in Nitrification Revealed by Illumina MiSeq Sequencing

  • Ma, Xiaojian;Wu, Chongde;Jun, Huang;Zhou, Rongqing;Shi, Bi
    • Journal of Microbiology and Biotechnology / v.28 no.7 / pp.1168-1177 / 2018
  • The aim of this study was to investigate, by Illumina MiSeq sequencing, the microbial communities involved in nitrification in three tannery wastewater treatment plants (WWTPs). The results showed that highly diverse communities were present in tannery wastewater. Six phyla, Proteobacteria (37-41%), Bacteroidetes (6.04-16.80%), Planctomycetes (3.65-16.55%), Chloroflexi (2.51-11.48%), Actinobacteria (1.91-9.21%), and Acidobacteria (3.04-6.20%), were identified as the main phyla, and Proteobacteria dominated in all samples. Within Proteobacteria, Betaproteobacteria was the most abundant class, with sequence percentages ranging from 9.66% to 17.44%. Analysis of the community at the genus level suggested that Thauera, Gp4, Ignavibacterium, Phycisphaera, and Arenimonas were the core genera shared by at least two tannery WWTPs. A detailed analysis of the abundance of ammonia-oxidizing bacteria (AOB) and nitrite-oxidizing bacteria (NOB) indicated that Nitrosospira and Nitrosomonas were the main AOB and Nitrospira was the main NOB in tannery wastewater, all at relatively high abundance in every sample. In addition, real-time quantitative PCR was conducted to validate the results by quantifying the abundance of AOB and total bacteria, and similar results were obtained. Overall, these results may provide new insights into the key microorganisms and the entire community of tannery wastewater and contribute to improving nitrogen removal efficiency.
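
The notion of "core genera shared by at least two WWTPs" can be made concrete with a short sketch. The genus names come from the abstract, but their assignment to plants below is invented, and the `core_genera` helper and plant labels are hypothetical.

```python
from collections import Counter

# Hypothetical presence sets: which genera were detected in which plant.
plants = {
    "WWTP_A": {"Thauera", "Gp4", "Nitrosomonas"},
    "WWTP_B": {"Thauera", "Arenimonas"},
    "WWTP_C": {"Gp4", "Arenimonas", "Nitrospira"},
}


def core_genera(samples, min_samples=2):
    """Genera detected in at least `min_samples` of the given samples."""
    seen = Counter(genus for genera in samples.values() for genus in genera)
    return sorted(g for g, n in seen.items() if n >= min_samples)


print(core_genera(plants))  # ['Arenimonas', 'Gp4', 'Thauera']
```

In practice the presence sets would be derived from read counts after applying an abundance threshold, a step this sketch leaves out.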

Microbial Forensics: Comparison of MLVA Results According to NGS Methods, and Forensic DNA Analysis Using MLVA (미생물법의학: 차세대염기서열분석 방법에 따른 MLVA 결과 비교 및 이를 활용한 DNA 감식)

  • Hyeongseok Yun;Seungho Lee;Seunghyun Lim;Daesang Lee;Sehun Gu;Jungeun Kim;Juhwan Jeong;Seongjoo Kim;Gyeunghaeng Hur;Donghyun Song
    • Journal of the Korea Institute of Military Science and Technology / v.27 no.4 / pp.507-515 / 2024
  • Microbial forensics is a scientific discipline for analyzing evidence related to biological crimes by identifying the origin of microorganisms. Multiple locus variable number tandem repeat analysis (MLVA) is a microbiological typing method that distinguishes subtypes within a species based on the number of tandem repeats in the genome, and advances in next generation sequencing (NGS) technology have enabled in silico analysis of full-length whole genome sequences. In this paper, we analyzed unknown samples provided by the Robert Koch Institute (RKI) through the external quality assessment exercise (EQAE) project of the United Nations Secretary-General's Mechanism (UNSGM), in which we officially participated in 2023. We confirmed through nucleic acid isolation and genetic sequence analysis that the 3 unknown samples were B. anthracis. MLVA results for 32 loci of B. anthracis were analyzed using genome sequences obtained from NGS (NextSeq and MinION) and Sanger sequencing. MLVA typing with the short-read NGS platform (NextSeq) showed a high probability of assembly errors when the tandem repeats were longer than 200 bp, while the long-read NGS platform (MinION) was more accurate than NextSeq, although insertions and deletions were observed. We also showed that hybrid assembly can correct most indel errors caused by MinION. Based on the MLVA results, genetic identification was performed by comparison with 2,975 entries in published MLVA databases of B. anthracis, and the MLVA results of 10 strains were identical to those of the 3 unknown samples. Whole genome alignment of the 10 strains and 3 unknown samples identified all samples as B. anthracis strain A4564, which is associated with injectional anthrax isolates in heroin users.
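
The database comparison described above amounts to matching a profile of repeat counts per locus against stored profiles. The following sketch assumes a simple mismatch count as the comparison rule; the locus names, strain names, and repeat counts are all hypothetical, and the study's actual matching procedure is not given in the abstract.

```python
def mlva_distance(query, reference):
    """Count query loci whose repeat number differs from the reference.

    A locus missing from the reference counts as a mismatch.
    """
    return sum(reference.get(locus) != copies for locus, copies in query.items())


def identical_strains(query, database):
    """Names of database strains whose profile matches the query at every locus."""
    return sorted(name for name, profile in database.items()
                  if mlva_distance(query, profile) == 0)


# Hypothetical profiles: repeat copies at three illustrative loci.
unknown_sample = {"locus_1": 10, "locus_2": 4, "locus_3": 7}
database = {
    "strain_X": {"locus_1": 10, "locus_2": 4, "locus_3": 7},
    "strain_Y": {"locus_1": 10, "locus_2": 4, "locus_3": 8},
    "strain_Z": {"locus_1": 10, "locus_2": 4, "locus_3": 7},
}

print(identical_strains(unknown_sample, database))  # ['strain_X', 'strain_Z']
```

Because several strains can share an identical MLVA profile, as in this toy case, a higher-resolution comparison such as whole genome alignment is needed to separate them.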

Analysis of germinating seed stage expressed sequence tags in Oryza sativa L. (벼 발아종자 발현유전자의 발현특성분석)

  • Yoon, Ung-Han;Lee, Gang-Seob;Kim, Chang-Kug;Lee, Jung-Sook;Hahn, Jang-Ho;Yun, Doh-Won;Ji, Hyeon-So;Lee, Tae-Ho;Lee, Jeong-Hwa;Park, Sung-Han;Kim, Gun-Wook;Seo, Mi-Suk;Kim, Yong-Hwan
    • Journal of Plant Biotechnology / v.36 no.3 / pp.281-288 / 2009
  • Seed germination is an important stage in which many genes regulating energy metabolism, starch degradation, and cell division are expressed as the seed leaves dormancy. For the functional analysis of seed germination mechanisms, we analyzed rice cDNA clones (Oryza sativa cultivar Ilpum) obtained from seeds imbibed for 48 hours. A total of 18,101 expressed sequence tags (ESTs) were clustered using the SeqMan program, and 8,836 of them were identified as unique clones. From full-length cDNA analysis, we identified a chitinase gene specifically expressed during seed germination and an amylase gene involved in starch degradation, and several genes were registered in NCBI GenBank. To analyze genes expressed in both immature and germinated seeds, 25,66 immature-seed ESTs and 18,101 germinated-seed ESTs were clustered using the SeqMan program, which identified 2,514 clones as commonly expressed unigenes. Among them, alpha-globulin and alcohol dehydrogenase I were presumed to be LEA genes expressed only in the immature and germinated seed stages. To cluster orthologous group genes, we further analyzed the 8,836 EST clones from germinating seeds using the NCBI clusters of orthologous groups database. Among the clones, 5,076 were categorized into information storage and processing, cellular processes and signaling, metabolism, and poorly characterized genes, with 783 (14.29%), 1,484 (27%), 1,363 (24.8%), and 1,869 (34%) clones in these four categories, respectively.

Development of SNP markers for the identification of apple flesh color based on RNA-Seq data (RNA-Seq data를 이용한 사과 과육색 판별 SNP 분자표지 개발)

  • Kim, Se Hee;Park, Seo Jun;Cho, Kang Hee;Lee, Han Chan;Lee, Jung Woo;Choi, In Myung
    • Journal of Plant Biotechnology / v.44 no.4 / pp.372-378 / 2017
  • To compare transcription profiles in apple (Malus domestica L.) cultivars differing in flesh color, two cDNA libraries were constructed. Differences in gene expression between the red-fleshed cultivar 'Redfield' and the white-fleshed cultivar 'Granny Smith' were investigated by next-generation sequencing (NGS). Expressed sequence tags (ESTs) of clones from the red-fleshed and white-fleshed cultivars were selected for nucleotide sequence determination and homology searches. The high resolution melting (HRM) technique measures temperature-induced strand separation of short PCR amplicons and can detect differences as small as a single base between red-fleshed and white-fleshed apple cultivars. We applied HRM analysis to discover single nucleotide polymorphisms (SNPs) based on predicted SNP information derived from the apple EST database. All 103 pairs of SNPs were discriminated, and the HRM profiles of the amplicons were established. Putative SNPs screened from the apple EST contigs by HRM analysis showed specific differences between 10 red-fleshed and 11 white-fleshed apple cultivars. In this study, we report an efficient method to develop SNP markers from an EST database with HRM analysis in apple. These SNP markers could be useful for marker-assisted breeding in apple and provide a good reference for research on the molecular mechanisms of color variation in apple cultivars.

Complete Mitochondrial Genome Sequences of Korean Phytophthora infestans Isolates and Comparative Analysis of Mitochondrial Haplotypes

  • Seo, Jin-Hee;Choi, Jang-Gyu;Park, Hyun-Jin;Cho, Ji-Hong;Park, Young-Eun;Im, Ju-Sung;Hong, Su-Young;Cho, Kwang-Soo
    • The Plant Pathology Journal / v.38 no.5 / pp.541-549 / 2022
  • Potato late blight caused by Phytophthora infestans is a destructive disease in Korea. To elucidate genomic variation in the mitochondrial (mt) genome, we assembled complete mt genomes and compared their sequences among different haplotypes. The mt genome sequences of four Korean P. infestans isolates were determined by Illumina HiSeq. The sizes of the circular mt genomes of the four major genotypes, KR_1_A1, KR_2_A2, SIB-1, and US-11, were 39,872, 39,836, 39,872, and 39,840 bp, respectively. All genotypes contained the same 61 genes in the same order, comprising two RNA-encoding genes, 16 ribosomal genes, 25 transfer RNA, 17 genes encoding electron transport and ATP synthesis, 11 open reading frames of unknown function, and one protein import-related gene, tatC. The coding region comprised 91% of the genome, and GC content was 22.3%. The haplotypes were further analyzed based on sequence polymorphism at two hypervariable regions: HVRi, carrying a 2 kb insertion/deletion sequence, and HVRii, carrying 36 bp variable number tandem repeats (VNTRs). All four genotypes carried the 2 kb insertion/deletion sequence in HVRi, whereas HVRii had two VNTRs in KR_1_A1 and SIB-1 but three VNTRs in US-11 and KR_2_A2. In minimal spanning network and phylogenetic analyses based on 5,814 bp of mtDNA sequences from five loci, KR_1_A1 and SIB-1 were classified as haplotype IIa-6, and isolates KR_2_A2 and US-11 as haplotypes IIa-5 and IIb-2, respectively. The mtDNA sequences of KR_1_A1 and SIB-1 shared 100% sequence identity, and both were 99.9% similar to those of KR_2_A2 and US-11.
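
Distinguishing isolates by the number of VNTR copies in a region such as HVRii comes down to counting back-to-back copies of a known repeat unit. The sketch below uses a short invented 6 bp unit in place of the real 36 bp unit, which the abstract does not give; the function name and sequences are hypothetical.

```python
def max_tandem_copies(seq, unit):
    """Largest number of back-to-back copies of `unit` found anywhere in seq."""
    best = 0
    start = seq.find(unit)
    while start != -1:
        copies = 0
        pos = start
        # Walk forward one unit at a time while the repeat continues.
        while seq.startswith(unit, pos):
            copies += 1
            pos += len(unit)
        best = max(best, copies)
        start = seq.find(unit, start + 1)
    return best


# Hypothetical repeat unit and flanking bases, for illustration only.
unit = "ATTGCA"
two_copy_region = "GG" + unit * 2 + "TT"
three_copy_region = "GG" + unit * 3 + "TT"

print(max_tandem_copies(two_copy_region, unit))    # 2
print(max_tandem_copies(three_copy_region, unit))  # 3
```

This exact-match count would miss repeat copies that carry point mutations, so it is a simplification of how VNTRs are typed on real sequence data.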

Data Processing and Analysis of Non-Intrusive Electrical Appliances Load Monitoring in Smart Farm (스마트팜 개별 전기기기의 비간섭적 부하 식별 데이터 처리 및 분석)

  • Kim, Hong-Su;Kim, Ho-Chan;Kang, Min-Jae;Jwa, Jeong-Woo
    • Journal of IKEEE / v.24 no.2 / pp.632-637 / 2020
  • Non-intrusive load monitoring (NILM) is an important approach to cost-effective, real-time monitoring of the energy consumption and time of use of each appliance in a home or business, using aggregate energy readings from a single meter. In this paper, we collected data from the smart farm's power consumption data acquisition system, transferred it to a server via an LTE modem, converted the total power consumption and the power of the individual electric devices into HDF5 format, and performed NILM analysis. The analysis used open-source denoising autoencoder (DAE), long short-term memory (LSTM), gated recurrent unit (GRU), and sequence-to-point (seq2point) learning methods.
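
Of the methods listed, sequence-to-point learning has the most distinctive data layout: the model sees a window of aggregate power and predicts the appliance's power at the window's midpoint. The sketch below only builds such training pairs; it does not train a network, and the function name, window length, and readings (in watts) are assumptions for illustration.

```python
def seq2point_pairs(aggregate, appliance, window):
    """Pair each window of aggregate power with the appliance reading at its midpoint."""
    if window % 2 == 0:
        raise ValueError("window must be odd so that it has a single midpoint")
    if len(aggregate) != len(appliance):
        raise ValueError("aggregate and appliance series must be aligned")
    half = window // 2
    pairs = []
    for mid in range(half, len(aggregate) - half):
        inputs = aggregate[mid - half: mid + half + 1]  # model input sequence
        target = appliance[mid]                         # single-point target
        pairs.append((inputs, target))
    return pairs


# Hypothetical aligned readings: total farm load and one device (e.g. a pump).
aggregate = [120, 125, 900, 910, 905, 130, 128]
pump = [0, 0, 780, 785, 780, 0, 0]

pairs = seq2point_pairs(aggregate, pump, window=3)
print(len(pairs))  # 5
print(pairs[1])    # ([125, 900, 910], 780)
```

A sequence-to-sequence variant would instead pair each input window with the whole matching window of appliance readings, which is the main design difference between the two formulations.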