• Title/Summary/Keyword: seq2seq

Next Location Prediction with a Graph Convolutional Network Based on a Seq2seq Framework

  • Chen, Jianwei;Li, Jianbo;Ahmed, Manzoor;Pang, Junjie;Lu, Minchao;Sun, Xiufang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.5 / pp.1909-1928 / 2020
  • Predicting human mobility has always been an important task in location-based social networks. Previous efforts fail to capture spatial dependence effectively, which is mainly reflected in their weakening of the location topology information. In this paper, we propose a neural network-based method that captures spatial-temporal dependence to predict a person's next location. Specifically, we combine a graph convolutional network (GCN) with a seq2seq framework to capture the location topology information and the temporal dependence, respectively. The encoder of the seq2seq framework first generates the hidden state and cell state of the historical trajectories. The GCN is then used to generate graph embeddings of the location topology graph. Finally, we predict future trajectories by aggregating the temporal dependence and the graph embeddings in the decoder. For evaluation, we use two real-world datasets, Foursquare and Gowalla. The experimental results demonstrate that our model outperforms the compared models.

Automatic Conversion of English Pronunciation Using Sequence-to-Sequence Model (Sequence-to-Sequence Model을 이용한 영어 발음 기호 자동 변환)

  • Lee, Kong Joo;Choi, Yong Seok
    • KIPS Transactions on Software and Data Engineering / v.6 no.5 / pp.267-278 / 2017
  • Because the same letter can be pronounced differently depending on the word context, one must refer to a lexicon to pronounce a word correctly. Both the phonetic alphabets that lexicons adopt and the pronunciations they give for the same word can differ from lexicon to lexicon. In this paper, we use a sequence-to-sequence model, which is widely used in deep learning research, to convert automatically from one pronunciation to another. Twelve seq2seq models are implemented using pronunciation training data collected from 4 different lexicons. The exact accuracy of the models ranges from 74.5% to 89.6%. This study has two aims. One is to understand the properties of the phonetic alphabets and pronunciations used in various lexicons. The other is to understand the characteristics of seq2seq models by analyzing their errors.

Q&A and management AI chatbot service in the context of a university non-face-to-face remote lecture using the Seq2Seq model (Seq2Seq 모델을 활용한 대학교 비대면 원격강의 상황에서 질문 문답 및 관리 인공지능 챗봇 서비스)

  • Na, Dongjun;Ahn, Jaewook;Park, Sejin
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2020.11a / pp.325-327 / 2020
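  • The share of non-face-to-face remote lectures has recently increased, but because lectures are delivered remotely, students lack immediate interaction with and feedback from the professor on their questions, and professors likewise have difficulty answering questions because communicating with students is harder in a non-face-to-face setting. To address these problems, this paper proposes an AI chatbot web service that gives students immediate answers to their questions and lets professors manage the questions and answers. The web service is provided separately for students taking the lecture and for professors giving it. The implementation uses a Seq2Seq model, which was trained and tested on a question-answer dataset.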

ChIP-seq Analysis of Histone H3K27ac and H3K27me3 Showing Different Distribution Patterns in Chromatin

  • Kang, Jin;Kim, AeRi
    • Biomedical Science Letters / v.28 no.2 / pp.109-119 / 2022
  • Histone proteins can be modified by the addition of an acetyl or methyl group to specific amino acids. The modifications have different distribution patterns in chromatin. Recently, histone modifications have been studied using ChIP-seq data, which requires the sequencing data to be analyzed in a way that suits their distribution patterns. Here we analyzed histone H3K27ac and H3K27me3 ChIP-seq data and found that H3K27ac is enriched in narrow regions while H3K27me3 is distributed broadly. To analyze the ChIP-seq data properly, we called peaks for H3K27ac and H3K27me3 using the MACS2 (narrow option and broad option) and SICER methods, and compared the adequacy of the peaks using the signal-to-background ratio. As a result, H3K27ac-enriched regions were well identified by both methods, while H3K27me3 peaks were properly identified by SICER, which indicates that the peak calling method is more critical for broadly distributed histone modifications. When ChIP-seq data were compared at different sequencing depths (15, 30, 60, and 120 M), high sequencing depth caused a high false-positive rate in H3K27ac peak calling, but it reflected the broad distribution pattern of H3K27me3 more accurately. These results suggest that sequencing depth affects peak calling from ChIP-seq data and that high sequencing depth is required for H3K27me3. Taken together, the peak calling tool and sequencing depth should be chosen according to the distribution pattern of the histone modification in ChIP-seq analysis.

Integration of Single-Cell RNA-Seq Datasets: A Review of Computational Methods

  • Yeonjae Ryu;Geun Hee Han;Eunsoo Jung;Daehee Hwang
    • Molecules and Cells / v.46 no.2 / pp.106-119 / 2023
  • With the increased number of single-cell RNA sequencing (scRNA-seq) datasets in public repositories, integrative analysis of multiple scRNA-seq datasets has become commonplace. Batch effects among different datasets are inevitable because of differences in cell isolation and handling protocols, library preparation technology, and sequencing platforms. To remove these batch effects for effective integration of multiple scRNA-seq datasets, a number of methodologies have been developed based on diverse concepts and approaches. These methods have proven useful for examining whether cellular features identified from a certain dataset, such as cell subpopulations and marker genes, are consistently present, or whether their condition-dependent variations, such as increases in cell subpopulations in particular disease-related conditions, are consistently observed in different datasets generated under similar or distinct conditions. In this review, we summarize the concepts and approaches of the integration methods and their pros and cons as reported in the previous literature.

Comparative analysis of HiSeq3000 and BGISEQ-500 sequencing platform with shotgun metagenomic sequencing data

  • Animesh Kumar;Espen M. Robertsen;Nils P. Willassen;Juan Fu;Erik Hjerde
    • Genomics & Informatics / v.21 no.4 / pp.49.1-49.11 / 2023
  • Recent advances in sequencing technologies and platforms have made it possible to generate metagenomic sequences using different sequencing platforms. In this study, we analyzed and compared shotgun metagenomic sequences generated by the HiSeq3000 and BGISEQ-500 platforms from 12 sediment samples collected across the Norwegian coast. Metagenomic DNA sequences were normalized to an equal number of bases for both platforms and further evaluated using different taxonomic classifiers, reference databases, and assemblers. Normalized BGISEQ-500 sequences retained more reads and base counts after preprocessing, while a slightly higher fraction of HiSeq3000 sequences were taxonomically classified. Kaiju classified a higher percentage of reads than Kraken2 for both platforms, and a comparison of reference databases for taxonomic classification showed that the MAR database outperformed RefSeq. Assembly using MEGAHIT produced longer assemblies and a higher total contig count in the majority of HiSeq3000 samples than assembly using metaSPAdes, but the assembly statistics notably improved with unprocessed or normalized reads. Our results indicate that both platforms perform comparably in terms of the percentage of taxonomically classified reads and assembled contig statistics for metagenomic samples. This study provides valuable insights for researchers selecting an appropriate sequencing platform and bioinformatics pipeline for their metagenomics studies.

A demonstration of the H3 trimethylation ChIP-seq analysis of galline follicular mesenchymal cells and male germ cells

  • Chokeshaiusaha, Kaj;Puthier, Denis;Nguyen, Catherine;Sananmuang, Thanida
    • Asian-Australasian Journal of Animal Sciences / v.31 no.6 / pp.791-797 / 2018
  • Objective: Trimethylation of histone 3 (H3) at the 4th lysine of the N-terminus (H3K4me3) in gene promoter regions is a universal marker of active genes specific to a cell lineage. In contrast, coexistence of trimethylation at the 27th lysine (H3K27me3) at the same loci, the bivalent H3K4me3/H3K27me3 pattern, is known to suspend gene transcription in germ cells and can also be inherited by the developed stem cells. For galline species, a thorough example of H3K4me3 and H3K27me3 ChIP-seq analysis had not yet been provided. We therefore designed and demonstrated such procedures using ChIP-seq and mRNA-seq data from chicken follicular mesenchymal cells and male germ cells. Methods: An analytical workflow was designed and is provided in this study. ChIP-seq and RNA-seq datasets of follicular mesenchymal cells and male germ cells were acquired and preprocessed. Peak calling with Model-based Analysis of ChIP-seq 2 was performed to identify H3K4me3- or H3K27me3-enriched regions (fold change ≥ 2, FDR ≤ 0.01) in gene promoter regions. Integrative Genomics Viewer was used to explore the cellular retinoic acid binding protein 1 (CRABP1), growth differentiation factor 10 (GDF10), and gremlin 1 (GREM1) genes. Results: The results indicated that follicular mesenchymal cells and germ cells shared several unique gene promoter regions enriched with H3K4me3 (5,704 peaks), as well as unique regions of bivalent H3K4me3/H3K27me3 shared between all cell types and germ cells (1,909 peaks). Subsequent observation of the follicular mesenchyme-specific genes CRABP1, GDF10, and GREM1 correctly revealed vigorous transcription of these genes in follicular mesenchymal cells. As expected, the bivalent H3K4me3/H3K27me3 pattern was manifested in the gene promoter regions in germ cells, and thus suspended their transcription. Conclusion: According to the results, an example of chicken H3K4me3/H3K27me3 ChIP-seq data analysis was successfully demonstrated in this study. We hope the provided methodology will be useful for galline ChIP-seq data analysis in the future.

Long-tail Query Expansion using Extractive and Generative Methods (롱테일 질의 확장을 위한 추출 및 생성 기반 모델)

  • Kim, Lae-Seon;Kim, Seong-soon;Jang, Heon-Seok;Park, Seok-Won;Kang, In-Ho
    • Annual Conference on Human and Language Technology / 2020.10a / pp.267-273 / 2020
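  • Among the queries entered into a search engine, those that are entered infrequently but are relatively long are called long-tail queries. Although long-tail queries account for a large share of the overall search log, their forms are highly varied, their search intent is detailed, and the volume of each individual query is often insufficient, so recommending suitable search terms for them is a difficult problem. To provide suitable search-term recommendations when a long-tail query is entered, this paper proposes a query expansion methodology that uses an extractive model based on query-document click information and generative models based on Seq2seq and GPT-2. Experiments and analysis of the results show that the proposed method can naturally expand long-tail queries that could not previously be handled. By applying the results of this study to a real service, we improved search convenience for users and also confirmed the potential of language-modeling-based query expansion.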

Development of Block-based Code Generation and Recommendation Model Using Natural Language Processing Model (자연어 처리 모델을 활용한 블록 코드 생성 및 추천 모델 개발)

  • Jeon, In-seong;Song, Ki-Sang
    • Journal of The Korean Association of Information Education / v.26 no.3 / pp.197-207 / 2022
  • In this paper, to reduce learners' cognitive load during coding education, we develop a machine-learning-based block code generation and recommendation model. Using a natural language processing model and fine-tuning, the model learns the blocks a learner has built in a block programming environment and then generates and recommends selectable blocks for the next step. To develop the model, the training dataset was produced by pre-processing 50 block codes from the popular block programming language website 'Entry'. After dividing the pre-processed blocks into training, validation, and test datasets, we developed models that generate block code based on LSTM, Seq2Seq, and GPT-2. In the performance evaluation, GPT-2 outperformed the LSTM and Seq2Seq models on the BLEU and ROUGE scores, which measure sentence similarity. The results generated by the GPT-2 model show that performance was relatively similar across the BLEU and ROUGE scores except when the number of blocks was 1 or 17.

COVID-19 Chat Bot by using Deep Learning (딥러닝을 이용한 코로나 챗봇)

  • Lee, Se-Hoon;Jeong, Ji-Seok;Kim, Young-Jin;Kwon, Hyeon-guen;Seo, Hee-Ju
    • Proceedings of the Korean Society of Computer Information Conference / 2020.07a / pp.315-316 / 2020
  • This paper presents a chatbot built with Seq2seq technology that provides the information people are likely to want in their daily lives about COVID-19, a current issue.