• Title/Abstract/Keywords: seq2seq

Search results: 219 items (processing time: 0.032 seconds)

Computational approaches for prediction of protein-protein interaction between Foot-and-mouth disease virus and Sus scrofa based on RNA-Seq

  • Park, Tamina;Kang, Myung-gyun;Nah, Jinju;Ryoo, Soyoon;Wee, Sunghwan;Baek, Seung-hwa;Ku, Bokkyung;Oh, Yeonsu;Cho, Ho-seong;Park, Daeui
    • Korean Journal of Veterinary Service
    • /
    • Vol. 42, No. 2
    • /
    • pp.73-83
    • /
    • 2019
  • Foot-and-mouth disease (FMD) is a highly contagious transboundary viral disease caused by FMD virus (FMDV), and it causes huge economic losses. FMDV infects cloven-hoofed (two-toed) mammals such as cattle, sheep, goats, pigs, and various wildlife species. To control FMDV, it is necessary to understand its life cycle and pathogenesis in the host. In particular, knowledge of the protein-protein interactions between FMDV and the host will help to explain the survival cycle of the virus in the host cell and to establish new therapeutic strategies. However, computational approaches to protein-protein interaction between FMDV and pig hosts have not been applied to studies of the onset mechanism of FMDV. In the present work, we predicted the pig proteins that interact with FMDV based on RNA-Seq data, protein sequence, and structure information. After identifying the virus-host interactions, we looked for meaningful pathways and anticipated changes in the host caused by FMDV infection. A total of 78 pig proteins were predicted to interact with FMDV. The 156 interactions include 94 predicted by the sequence-based method and 62 predicted by the structure-based method using domain information. The protein interaction network contained integrin as well as STYK1, VTCN1, IDO1, CDH3, SLA-DQB1, FER, and FGFR2, which were related to the up-regulation of inflammation and the down-regulation of cell adhesion and host defense systems such as macrophages and leukocytes. These results provide clues to the mechanism by which FMDV affects the host cell.

A Transliteration Model based on the Seq2seq Learning and Methods for Phonetically-Aware Partial Match for Transliterated Terms in Korean

  • 박주희;박원준;서희철
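    • Korean Institute of Information Scientists and Engineers (KIISE), Language Engineering Research Group: Conference Proceedings (Hangul and Korean Information Processing)
    • /
    • KIISE Language Engineering Research Group, 30th Conference on Hangul and Korean Information Processing, 2018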
    • /
    • pp.443-448
    • /
    • 2018
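  • Improving the quality of web search results requires not only exact query matching but also flexible matching, such as matching a Hangul string and an English string that refer to the same entity (e.g., 네이버-naver). This paper presents a method that transliterates English strings into Hangul strings through sequence-to-sequence learning. It also presents a pronunciation-similarity-based partial matching method that can match the Hangul string produced by transliteration against the various transliterations of the same English string, and evaluates both quantitatively using Wikipedia redirect keywords. The results show that seq2seq-based transliteration can take complex context into account, and that using jamo similarity in the computation of the Damerau-Levenshtein distance enables more effective partial matching between Hangul keywords than previous approaches.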
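The abstract does not spell out how jamo similarity enters the distance computation, so the following is only a minimal sketch under stated assumptions. It uses the optimal-string-alignment variant of the Damerau-Levenshtein distance and a hypothetical substitution cost equal to the fraction of jamo (initial, medial, final) that differ between two syllables. The function names and the cost scheme are illustrative and are not taken from the paper, and the sketch computes a whole-string distance rather than the paper's partial matching.

```python
HANGUL_BASE = 0xAC00   # first precomposed Hangul syllable (가)
HANGUL_COUNT = 11172   # 19 initials * 21 medials * 28 finals


def decompose(ch):
    """Return (initial, medial, final) jamo indices, or (ch,) for non-Hangul."""
    code = ord(ch) - HANGUL_BASE
    if not 0 <= code < HANGUL_COUNT:
        return (ch,)
    return (code // 588, (code % 588) // 28, code % 28)


def substitution_cost(a, b):
    """Assumed cost: share of differing jamo; 1.0 if either is not Hangul."""
    if a == b:
        return 0.0
    ja, jb = decompose(a), decompose(b)
    if len(ja) != 3 or len(jb) != 3:
        return 1.0
    return sum(x != y for x, y in zip(ja, jb)) / 3


def jamo_osa_distance(s, t):
    """Optimal-string-alignment distance with jamo-weighted substitutions."""
    n, m = len(s), len(t)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = float(i)
    for j in range(m + 1):
        d[0][j] = float(j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + 1.0,                                       # deletion
                d[i][j - 1] + 1.0,                                       # insertion
                d[i - 1][j - 1] + substitution_cost(s[i - 1], t[j - 1]),  # substitution
            )
            # Transposition of two adjacent syllables.
            if i > 1 and j > 1 and s[i - 1] == t[j - 2] and s[i - 2] == t[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1.0)
    return d[n][m]
```

Under this assumed cost, 네이버 and 네이바 differ only in the vowel of the last syllable, so their distance is 1/3 rather than the 1 that a syllable-level edit distance would give.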

Identification of Hemimethylated DNA Binding Activity in the seqA Mutant

  • Lee, Ho;Kang, Suk-Hyun;Yim, Jeong-Bin;Hwang, Deog-Su
    • Animal cells and systems
    • /
    • Vol. 2, No. 3
    • /
    • pp.351-353
    • /
    • 1998
  • A 245 bp segment of the E. coli chromosomal replication origin, oriC, contains 11 repeats of the GATC sequence, in which adenine is methylated by Dam methylase. Newly replicated oriC is hemimethylated: the parental strand is methylated, but the nascent strand remains unmethylated until Dam methylase acts on it. Hemimethylated oriC plays an important role in the regulation of chromosomal replication. An activity that binds preferentially to hemimethylated DNA, but not to fully methylated DNA, was identified in the seqA mutant. This activity may participate in the sequestration of initiation of chromosomal replication.

Mitigating Hate Speech in Korean Open-domain Chatbot using CTRL

  • 좌승연;차영록;한문수;신동훈
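    • Korean Institute of Information Scientists and Engineers (KIISE), Language Engineering Research Group: Conference Proceedings (Hangul and Korean Information Processing)
    • /
    • KIISE Language Engineering Research Group, 33rd Conference on Hangul and Korean Information Processing, 2021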
    • /
    • pp.365-370
    • /
    • 2021
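  • Language models trained on large corpora also learn the social biases and hate speech contained in those corpora. This study presents a method for mitigating hate speech generation in a Korean open-domain dialogue model. Building on BART [1], a seq2seq architecture, we add control codes to regulate hate speech generation. Compared with a baseline model that does not use control codes, the model trained with control codes generated less hate speech, with no change in dialogue quality.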
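As a rough illustration of control-code conditioning in general, the sketch below prepends a code to the encoder input during training and fixes it to the desired value at inference time. The code strings, the field names, and the choice to place the code on the source side are all hypothetical, since the abstract does not describe how the authors attach control codes to BART.

```python
# Hypothetical control codes; the paper's actual codes are not given in the abstract.
CONTROL_CODES = {True: "<offensive>", False: "<clean>"}


def make_training_example(context, response, is_hateful):
    """Prefix the dialogue context with the code matching the response label."""
    return {
        "source": f"{CONTROL_CODES[is_hateful]} {context}",
        "target": response,
    }


def make_inference_input(context):
    """At inference time, always request the non-hateful code."""
    return f"{CONTROL_CODES[False]} {context}"
```

The design idea is that the model learns to associate each code with a style of response during training, so fixing the code at generation time steers the output without changing the decoding procedure.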

Korean Generation-based Dialogue State Tracking using Korean Token-Free Pre-trained Language Model KeByT5

  • 이기영;신종훈;임수종;권오욱
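    • Korean Institute of Information Scientists and Engineers (KIISE), Language Engineering Research Group: Conference Proceedings (Hangul and Korean Information Processing)
    • /
    • KIISE Language Engineering Research Group, 35th Conference on Hangul and Korean Information Processing, 2023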
    • /
    • pp.644-647
    • /
    • 2023
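  • In a dialogue system, dialogue state tracking plays an important role in determining the system response by identifying the user's intent as the conversation proceeds. It is essential in task-oriented dialogue in particular, where the user's goal must be satisfied. Recently, many downstream natural language processing tasks have achieved good performance by using a pre-trained language model as the backbone network and fine-tuning it on the target domain task. This paper introduces a Korean generation-based dialogue state tracking model that uses KeByT5, a Korean token-free pre-trained language model, fine-tuned in an end-to-end seq2seq manner, and describes the related experimental results.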
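"Token-free" models in the ByT5 family operate on UTF-8 bytes instead of a learned subword vocabulary. The sketch below shows that idea using the ByT5 convention of shifting each byte by 3 to leave room for the pad, end-of-sequence, and unknown ids. Whether KeByT5 uses exactly the same id layout is an assumption here, and the function names are illustrative.

```python
def encode_bytes(text, offset=3, eos_id=1):
    """Map text to byte-level ids (byte value + offset), then append EOS."""
    return [b + offset for b in text.encode("utf-8")] + [eos_id]


def decode_bytes(ids, offset=3):
    """Invert encode_bytes, skipping special ids outside the byte range."""
    raw = bytes(i - offset for i in ids if offset <= i < offset + 256)
    return raw.decode("utf-8", errors="ignore")
```

One consequence worth noting is sequence length: a single Hangul syllable occupies three UTF-8 bytes, so Korean inputs become roughly three ids per syllable.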

Deep Learning-based Delinquent Taxpayer Prediction: A Scientific Administrative Approach

  • YongHyun Lee;Eunchan Kim
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 18, No. 1
    • /
    • pp.30-45
    • /
    • 2024
  • This study introduces an effective method for predicting individual local tax delinquencies using prevalent machine learning and deep learning algorithms. The evaluation of credit risk holds great significance in the financial realm, impacting both companies and individuals. While credit risk prediction has been explored using statistical and machine learning techniques, their application to tax arrears prediction remains underexplored. We forecast individual local tax defaults in the Republic of Korea using machine and deep learning algorithms, including convolutional neural networks (CNN), long short-term memory (LSTM), and sequence-to-sequence (seq2seq). Our model incorporates diverse credit and public information such as loan history, delinquency records, credit card usage, and public taxation data, offering richer insights than prior studies. The results highlight the superior predictive accuracy of the CNN model. Anticipating local tax arrears more effectively could lead to efficient allocation of administrative resources. By leveraging advanced machine learning, this research offers a promising avenue for refining tax collection strategies and resource management.

Analysis of opposing histone modifications H3K4me3 and H3K27me3 reveals candidate diagnostic biomarkers for TNBC and gene set prediction combination

  • Park, Hyoung-Min;Kim, HuiSu;Lee, Kang-Hoon;Cho, Je-Yoel
    • BMB Reports
    • /
    • Vol. 53, No. 5
    • /
    • pp.266-271
    • /
    • 2020
  • Breast cancer accounts for a major portion of human cancers and must be carefully monitored for appropriate diagnosis and treatment. Among the many types of breast cancer, triple negative breast cancer (TNBC) has the worst prognosis and the fewest reported cases. To gain a better understanding and a more decisive precursor for TNBC, two major histone modifications, the activating modification H3K4me3 and the repressive modification H3K27me3, were analyzed using data from normal breast cell lines against TNBC cell lines. The combination of these two histone markers on the gene promoter regions showed a strong correlation with gene expression. According to the distribution of these histone modifications on the promoter regions, a list of signature genes was defined as active (highly enriched H3K4me3), including NOVA1, NAT8L, and MMP16, or repressive (highly enriched H3K27me3), including IRX2 and ADRB2. To further enhance the investigation, potential candidates were also compared with other types of breast cancer to identify signs specific to TNBC. RNA-seq data were used to confirm and verify gene regulation governed by the histone modifications. Combinations of the biomarkers based on H3K4me3 and H3K27me3 showed a diagnostic value of AUC 93.28% with a P-value of 1.16e-226. The results of this study suggest that analysis of opposing histone modifications may be valuable for developing biomarkers and targets for TNBC.

RNA-seq profiling of skin in temperate and tropical cattle

  • Morenikeji, Olanrewaju B.;Ajayi, Oyeyemi O.;Peters, Sunday O.;Mujibi, Fidalis D.;De Donato, Marcos;Thomas, Bolaji N.;Imumorin, Ikhide G.
    • Journal of Animal Science and Technology
    • /
    • Vol. 62, No. 2
    • /
    • pp.141-158
    • /
    • 2020
  • Skin is a major thermoregulatory organ in the body controlling homeothermy, a critical function for climate adaptation. We compared genes expressed between tropical- and temperate-adapted cattle to better understand genes involved in climate adaptation and hence thermoregulation. We profiled the skin of representative tropical and temperate cattle using RNA-seq. A total of 214,754,759 reads were generated and assembled into 72,993,478 reads, which were mapped to unique regions in the bovine genome. Gene coverage of unique regions of the reference genome showed that of 24,616 genes, only 13,130 genes (53.34%) displayed more than one count per million reads in at least two libraries and were considered suitable for downstream analyses. Our results revealed that of 255 differentially expressed genes, 98 were upregulated and 157 were downregulated in tropically adapted White Fulani (WF; Bos indicus) compared to Angus (AG; Bos taurus). Fifteen pathways were identified from the differential gene sets through gene ontology and pathway analyses. These include the significantly enriched melanin metabolic process, proteinaceous extracellular matrix, inflammatory response, defense response, calcium ion binding, and response to wounding. Quantitative PCR was used to validate six representative genes which are associated with skin thermoregulation and epithelia dysfunction (mean correlation 0.92; p < 0.001). Our results contribute to identifying genes and understanding molecular mechanisms of skin thermoregulation that may influence strategic genomic selection in cattle to withstand climate adaptation, microbial invasion and mechanical damage.
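The expression filter described in the abstract (more than one count per million reads in at least two libraries) can be sketched as below. This is an illustrative reimplementation under assumptions, not the authors' pipeline: the function name and input layout are hypothetical, and library size is taken to be the sum of the raw counts in each library.

```python
def filter_by_cpm(counts, min_cpm=1.0, min_libraries=2):
    """Keep genes whose CPM exceeds min_cpm in at least min_libraries libraries.

    counts maps gene name -> list of raw read counts, one per library.
    """
    if not counts:
        return []
    n_libs = len(next(iter(counts.values())))
    # Library size: total raw counts per library (an assumption of this sketch).
    lib_sizes = [sum(row[k] for row in counts.values()) for k in range(n_libs)]
    kept = []
    for gene, row in counts.items():
        n_pass = sum(
            1
            for c, size in zip(row, lib_sizes)
            if size > 0 and c * 1e6 / size > min_cpm
        )
        if n_pass >= min_libraries:
            kept.append(gene)
    return kept
```

The retained fraction reported in the abstract is consistent with its own counts: 13,130 / 24,616 is approximately 0.5334, that is, 53.34%.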

Identification of a novel immune-related gene in the immunized black soldier fly, Hermetia illucens (L.)

  • Jung, Seong-Tae;Goo, Tae-Won;Kim, Seong Ryul;Choi, Gwang-Ho;Kim, Sung-Wan;Nga, Pham Thi;Park, Seung-Won
    • International Journal of Industrial Entomology and Biomaterials
    • /
    • Vol. 36, No. 2
    • /
    • pp.25-30
    • /
    • 2018
  • The larvae of Hermetia illucens have a high probability of coming into contact with microorganisms such as bacteria and fungi. Therefore, the survival of H. illucens depends primarily on protecting itself against microbial infection, which in turn depends on the development of the innate immune system. Antimicrobial peptides (AMPs) exhibit antimicrobial activity against bacterial strains and can provide important data for understanding the basis of the innate immunity of H. illucens. In this study, we injected larvae with Enterococcus faecalis (gram-positive bacteria) and Serratia marcescens (gram-negative bacteria) to test the hypothesis that H. illucens is protected from infection by its immune-related gene expression repertoire. To identify the inducible immune-related genes, we generated and cataloged the transcriptomes by RNA-Seq analysis. We compared the transcriptomes of whole larvae and obtained, by RACE, a DNA fragment of 465 bp including the poly(A) tail as a novel H. illucens immune-related gene against bacteria. Expression of the novel target mRNA was higher in the groups of larvae immunized with E. faecalis and S. marcescens than in the non-immunized group. We expect our study to provide evidence that the global RNA-Seq approach allowed for the identification of a gene of interest, which was further analyzed by quantitative RT-PCR, together with genes chosen from the available literature.

Single-cell RNA sequencing identifies distinct transcriptomic signatures between PMA/ionomycin- and αCD3/αCD28-activated primary human T cells

  • Jung Ho Lee;Brian H Lee;Soyoung Jeong;Christine Suh-Yun Joh;Hyo Jeong Nam;Hyun Seung Choi;Henry Sserwadda;Ji Won Oh;Chung-Gyu Park;Seon-Pil Jin;Hyun Je Kim
    • Genomics & Informatics
    • /
    • Vol. 21, No. 2
    • /
    • pp.18.1-18.11
    • /
    • 2023
  • Immunologists have activated T cells in vitro using various stimulation methods, including phorbol myristate acetate (PMA)/ionomycin and αCD3/αCD28 agonistic antibodies. PMA stimulates protein kinase C, activating nuclear factor-κB, and ionomycin increases intracellular calcium levels, resulting in activation of nuclear factor of activated T cells. In contrast, αCD3/αCD28 agonistic antibodies activate T cells through ZAP-70, which phosphorylates linker for activation of T cells and SH2-domain-containing leukocyte protein of 76 kDa. However, despite the use of these two different in vitro T cell activation methods for decades, the differential effects of chemical-based and antibody-based activation of primary human T cells have not yet been comprehensively described. Using single-cell RNA sequencing (scRNA-seq) technologies to analyze gene expression in an unbiased manner at the single-cell level, we compared the transcriptomic profiles of the non-physiological and physiological activation methods on human peripheral blood mononuclear cell-derived T cells from four independent donors. Remarkable transcriptomic differences in the expression of cytokines and their respective receptors were identified. We also identified activated CD4 T cell subsets (CD55+) enriched specifically by PMA/ionomycin activation. We believe this activated human T cell transcriptome atlas derived from two different activation methods will enhance our understanding, highlight the optimal use of these two in vitro T cell activation assays, and serve as a reference standard when analyzing activated, disease-specific T cells through scRNA-seq.