• Title/Abstract/Keywords: sequence data

Search results: 3,108 items (processing time: 0.027 s)

Improvements in Patch-Based Machine Learning for Analyzing Three-Dimensional Seismic Sequence Data

  • 이동욱;문혜진;김충호;문성훈;이수환;주형태
    • 지구물리와물리탐사
    • /
    • Vol. 25, No. 2
    • /
    • pp.59-70
    • /
    • 2022
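  • Recent studies have extended machine learning to seismic interpretation, including the development of convolutional neural networks that perform seismic sequence classification, an important task in seismic interpretation. Supervised learning, however, requires a large amount of training data, and because of cost and time constraints, a shortage of training data can be a problem for supervised learning of seismic sequence classification. In this study, patch splitting and data augmentation were applied to seismic sections to mitigate the data shortage. In addition, to supply spatial information that may be lost through patch splitting, an artificial channel that accounts for depth was generated and added. U-Net was used as the training model for the experiments, and the training and prediction results were evaluated using the F3 block data, for which training data for sequence classification are available. The analysis confirmed that data augmentation and the additional artificial channel can improve the patch-based sequence classification model.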
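
The patch splitting and depth channel described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the authors' implementation: the function names, the choice to normalize depth to the range 0 to 1, the non-overlapping stride, and the horizontal flip as the augmentation are all hypothetical, since the abstract does not specify them.

```python
def depth_channel(n_rows, n_cols):
    """Grid whose value is the normalized depth: 0.0 at the top row, 1.0 at the bottom row.
    (Assumed encoding; the abstract only says the channel 'accounts for depth'.)"""
    denom = max(n_rows - 1, 1)
    return [[r / denom for _ in range(n_cols)] for r in range(n_rows)]


def extract_patches(section, patch, stride):
    """Split a 2-D section (list of rows) into square patches.
    Each patch carries the matching slice of the depth channel, so the
    absolute depth of the patch is not lost when the section is cut up."""
    n_rows, n_cols = len(section), len(section[0])
    depth = depth_channel(n_rows, n_cols)
    patches = []
    for top in range(0, n_rows - patch + 1, stride):
        for left in range(0, n_cols - patch + 1, stride):
            amp = [row[left:left + patch] for row in section[top:top + patch]]
            dep = [row[left:left + patch] for row in depth[top:top + patch]]
            patches.append({"top": top, "left": left, "amplitude": amp, "depth": dep})
    return patches


def flip_horizontal(patch2d):
    """One simple augmentation (assumed): mirror a patch left to right."""
    return [list(reversed(row)) for row in patch2d]


# Toy 4 x 6 "section" with hypothetical amplitude values.
section = [[float(r * 6 + c) for c in range(6)] for r in range(4)]
patches = extract_patches(section, patch=2, stride=2)
```

In a real pipeline the amplitude and depth grids would be stacked as two input channels of the network; the dictionary layout above is only a convenient way to show the pairing.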

The Design and Implementation of RIA-Based DNA Sequence Analysis Tools

  • 김명관;조충효
    • 한국인터넷방송통신학회논문지
    • /
    • Vol. 9, No. 2
    • /
    • pp.29-36
    • /
    • 2009
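  • With advances in bioinformatics, analysis tools are used to analyze vast amounts of DNA sequence data efficiently. Existing tools, however, are inconvenient in that the user must locate the data to be analyzed and then apply the tool to it. To solve this problem, this paper proposes an analysis tool implemented as a Web 2.0-based RIA (Rich Internet Application). The RIA-based tool addresses the shortcomings of the conventional web approach: it searches for DNA sequence data on a Web 2.0 platform and displays the analysis results in real time. The web application was developed with Flex2 on a Windows system.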


Detection of hydin Gene Duplication in Personal Genome Sequence Data

  • Kim, Jong-Il;Ju, Young-Seok;Kim, Shee-Hyun;Hong, Dong-Wan;Seo, Jeong-Sun
    • Genomics & Informatics
    • /
    • Vol. 7, No. 3
    • /
    • pp.159-162
    • /
    • 2009
  • Human personal genome sequencing can be done with high efficiency by aligning a huge number of short reads derived from various next-generation sequencing (NGS) technologies to the reference genome sequence. One of the major obstacles is the incompleteness of the human reference genome. We analyzed the effect of hidden gene duplication on NGS data using the known example of the hydin gene. Hydin2, a duplicated copy of hydin on chromosome 16q22, has recently been found to be localized to chromosome 1q21 and is not included in the current version of the standard human genome reference. We found that none of the eight personal genome data sets published so far contains hydin2, and that there is a large number of nsSNPs in hydin. The heterozygosity of those nsSNPs was significantly higher than expected. The sequence coverage depth in the hydin gene was about twice the average depth. We believe that these unique findings for hydin can be used as indicators to discover new hidden multiplications in the human genome.

Mining Maximal Frequent Contiguous Sequences in Biological Data Sequences

  • Kang, Tae-Ho;Yoo, Jae-Soo;Kim, Hak-Yong;Lee, Byoung-Yup
    • International Journal of Contents
    • /
    • Vol. 3, No. 2
    • /
    • pp.18-24
    • /
    • 2007
  • Biological sequences such as DNA and amino acid sequences typically contain a large number of items. They have contiguous sequences that ordinarily consist of hundreds of frequent items or more. In biological sequence analysis (BSA), a frequent contiguous sequence search is one of the most important operations. Many studies have addressed the efficient mining of sequential patterns. Most of the existing methods for mining sequential patterns are based on the Apriori algorithm. In particular, the prefixSpan algorithm is one of the most efficient sequential pattern mining schemes based on the Apriori algorithm. However, since the algorithm expands the sequential patterns from frequent patterns of length 1, it is not suitable for biological datasets with long frequent contiguous sequences. In recent years, the MacosVSpan algorithm was proposed based on the idea of the prefixSpan algorithm to significantly reduce its recursive process. However, the algorithm is still inefficient for mining frequent contiguous sequences from long biological data sequences. In this paper, we propose an efficient method to mine maximal frequent contiguous sequences in large biological data sequences by constructing a spanning tree with a fixed length. To verify the superiority of the proposed method, we perform experiments in various environments. The experiments show that the proposed method is much more efficient than MacosVSpan in terms of retrieval performance.
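
To make the term "maximal frequent contiguous sequence" concrete, the following is a brute-force sketch of the problem definition only. It is not the fixed-length spanning tree method the abstract proposes, nor prefixSpan or MacosVSpan, and it would be far too slow for real genomes. The function names and the definition of support (the number of input sequences that contain a substring) are assumptions made for illustration.

```python
from collections import defaultdict


def frequent_contiguous(sequences, min_support):
    """Return every contiguous subsequence (substring) that occurs in at
    least min_support of the input sequences. Brute force: enumerates all
    substrings, so cost grows quadratically with sequence length."""
    support = defaultdict(set)
    for idx, seq in enumerate(sequences):
        for i in range(len(seq)):
            for j in range(i + 1, len(seq) + 1):
                support[seq[i:j]].add(idx)
    return {s for s, ids in support.items() if len(ids) >= min_support}


def maximal_frequent_contiguous(sequences, min_support):
    """Keep only frequent substrings that are not contained in a longer
    frequent substring."""
    frequent = frequent_contiguous(sequences, min_support)
    return sorted(s for s in frequent
                  if not any(s != t and s in t for t in frequent))


# Hypothetical toy input.
reads = ["ACGTAC", "TACGT", "GACGA"]
print(maximal_frequent_contiguous(reads, min_support=3))  # ['ACG']
print(maximal_frequent_contiguous(reads, min_support=2))  # ['ACGT', 'TAC']
```

With a support threshold of 2, `ACG` is still frequent but no longer maximal, because it is contained in the frequent substring `ACGT`.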

Multihop Vehicle-to-Infrastructure Routing Based on the Prediction of Valid Vertices for Vehicular Ad Hoc Networks

  • Shrestha, Raj K.;Moh, Sangman;Chung, IlYong;Shin, Heewook
    • 대한임베디드공학회논문지
    • /
    • Vol. 5, No. 4
    • /
    • pp.243-253
    • /
    • 2010
  • Multihop data delivery in vehicular ad hoc networks (VANETs) suffers from the fact that vehicles are highly mobile and inter-vehicle links are frequently disconnected. In such networks, efficient multihop routing of road safety information (e.g., road accident and emergency messages) to the area of interest requires reliable communication and fast delivery with minimum delay. In this paper, we propose a multihop vehicle-to-infrastructure routing protocol named Vertex-Based Predictive Greedy Routing (VPGR), which predicts a sequence of valid vertices (or junctions) from a source vehicle to fixed infrastructure (or a roadside unit) in the area of interest and then forwards data to the fixed infrastructure through the sequence of vertices in urban environments. The well-known predictive directional greedy routing mechanism is used for the data forwarding phase in VPGR. The proposed VPGR leverages the geographic position, velocity, direction, and acceleration of vehicles for both the calculation of a sequence of valid vertices and the predictive directional greedy routing. Simulation results show significant performance improvement compared to conventional routing protocols in terms of packet delivery ratio, end-to-end delay, and routing overhead.

Gene Duplications Revealed during the Process of SNP Discovery in Soybean [Glycine max (L.) Merr.]

  • Cai, Chun Mei;Van, Kyu-Jung;Lee, Suk-Ha
    • Journal of Crop Science and Biotechnology
    • /
    • Vol. 10, No. 4
    • /
    • pp.237-242
    • /
    • 2007
  • Genome duplication (i.e., polyploidy) is a common phenomenon in the evolution of plants. The objective of this study was to gain a comprehensive understanding of genome duplication during SNP discovery, using thymine/adenine (TA) cloning for confirmation. Primer pairs were designed from 793 EST contigs expressed in the roots of a supernodulating soybean mutant and screened between 'Pureunkong' and 'Jinpumkong 2' by direct sequencing. Almost 27% of the primer sets failed to yield sequence data because of multiple bands on agarose gel or poor-quality sequence data from a single band. TA cloning was able to identify duplicate genes, and the paralogous sequences coincided with the nonspecific peaks in direct sequencing. Our study confirmed that heterogeneous products from the co-amplification of gene family members were the main cause of multiple bands or poor-quality sequence data in direct sequencing. Counts of amplified bands on agarose gel and of peaks in sequencing traces suggested that almost 27% of nonrepetitive soybean sequences were present in as many as four copies, with an average of 2.33 duplications per segment. Copy numbers would be underestimated because of the presence of long introns between primer binding sites or mutations at priming sites. Also, the copy numbers were not accurately estimated due to deletion or tandem duplication in the entire soybean genome.


Fast Burst Imaging: Focusing on the Pulse Sequence

  • 강호경;노용만
    • Investigative Magnetic Resonance Imaging
    • /
    • Vol. 3, No. 1
    • /
    • pp.13-19
    • /
    • 1999
  • MRI provides many benefits, such as noninvasive, three-dimensional imaging capabilities. However, it has a relatively serious drawback: a long data collection time compared with other imaging modalities. Many studies have been performed on fast MR imaging, but EPI and SEPI (4-6) require expensive hardware. In this paper, we introduce the Burst imaging technique. It can reduce imaging time by using a multiple RF excitation technique, and it is easily implemented on a normal MRI system. However, the pixel profile in the conventional burst sequence is so poor that the area excited by the burst sequence is only a small portion of a pixel, which causes a poor signal-to-noise ratio in the burst image. Therefore, frequency sweeping of the RF pulse is proposed for the burst imaging sequence to improve the pixel profile. A burst pulse train is shaped by a linear or nonlinear frequency sweeping function so that all the spins within a pixel are excited, thereby improving the signal-to-noise ratio. It is also shown that the pixel profiles depend on how the frequency sweep is made. Computer simulations with the Bloch equation and experimental results obtained using a 1.0 T NMR imaging system are presented.


DNA Chip Technologies

  • Hwang, Seoung-Yong;Lim, Geun-Bae
    • Biotechnology and Bioprocess Engineering:BBE
    • /
    • Vol. 5, No. 3
    • /
    • pp.159-163
    • /
    • 2000
  • The genome sequencing project has generated and will continue to generate enormous amounts of sequence data. Since the first complete genome sequence of the bacterium Haemophilus influenzae was published in 1995, the complete genome sequences of 2 eukaryotic and about 22 prokaryotic organisms have been determined. Given this ever-increasing amount of sequence information, new strategies are necessary to efficiently pursue the phase of the genome project concerned with the elucidation of gene expression patterns and gene product function on a whole-genome scale. In order to assign functional information to the genome sequence, DNA chip technology was developed to efficiently identify the differential expression patterns of independent biological samples. The DNA chip provides a new tool for genome expression analysis that may revolutionize many aspects of human life, including new drug discovery and human disease diagnostics.


SSR-Primer Generator: A Tool for Finding Simple Sequence Repeats and Designing SSR-Primers

  • Hong, Chang-Pyo;Choi, Su-Ryun;Lim, Yong-Pyo
    • Genomics & Informatics
    • /
    • Vol. 9, No. 4
    • /
    • pp.189-193
    • /
    • 2011
  • Simple sequence repeats (SSRs) are ubiquitous short tandem duplications found within eukaryotic genomes. Their length variability and abundance throughout the genome have led them to be widely used as molecular markers for crop-breeding programs, facilitating the use of marker-assisted selection as well as estimation of genetic population structure. Here, we report a software application, "SSR-Primer Generator," for SSR discovery, SSR-primer design, and homology-based search of in silico amplicons from a DNA sequence dataset. On submission of multiple FASTA-format DNA sequences, those analyses are batch processed in a pipeline on a Java runtime environment (JRE) platform, and the resulting data are visualized in HTML tabular format. This application will be a useful tool for reducing the time and costs associated with the development and application of SSR markers.
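
The SSR discovery step mentioned in this abstract can be pictured with a small regular-expression sketch. This is an illustrative assumption, not the algorithm used by SSR-Primer Generator (which the abstract does not describe and which runs on Java, not Python). The function name `find_ssrs`, its parameters, and the default thresholds are hypothetical.

```python
import re


def find_ssrs(sequence, motif_len=2, min_repeats=4):
    """Find tandem repeats of a motif of fixed length.

    Returns a list of (start, motif, repeat_count) tuples, where start is
    the 0-based position of the first repeat unit. The pattern captures one
    motif and then requires at least min_repeats - 1 further copies of it
    via a backreference."""
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_repeats - 1))
    hits = []
    for m in pattern.finditer(sequence.upper()):
        motif = m.group(1)
        # Skip single-base runs such as "AAAA" that would otherwise be
        # reported as a longer motif ("AA"). Other compound motifs,
        # e.g. "ATAT" for motif_len=4, are not filtered in this sketch.
        if motif_len > 1 and len(set(motif)) == 1:
            continue
        hits.append((m.start(), motif, len(m.group(0)) // motif_len))
    return hits


# Hypothetical toy sequence containing (AC)5 and (AT)4.
print(find_ssrs("TTGACACACACACGGATATATATCC"))  # [(3, 'AC', 5), (15, 'AT', 4)]
```

A fuller tool would loop over several motif lengths and pass the flanking regions of each hit to a primer-design step; both are outside the scope of this sketch.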

Optical Fiber Code-Division Multiple-Access Networks Using Concatenated Codes

  • Lam, Pham-Manh;Minh, Do-Quang
    • Journal of Communications and Networks
    • /
    • Vol. 4, No. 3
    • /
    • pp.170-175
    • /
    • 2002
  • An optical fiber code-division multiple-access (CDMA) network is proposed in which encoding is based on the use of concatenated sequences of relatively large weight. The first short component sequence in the concatenated sequence permits realistic electronic encoding of each data bit. The chips of this sequence are then all-optically encoded at a substantially higher rate. In spite of the relatively large weight of the sequence, the all-optical encoder is practical by virtue of the shortness of the component sequences. The use of Gold and Lempel sequences as component sequences for generating the concatenated sequences is studied, and the bit-error rate (BER) performance of the proposed system is presented as a function of the received optical power, with the number of simultaneous users as a parameter.