• Title/Abstract/Keywords: sequence data

Search results: 3,115 (processing time: 0.039 s)

Implementation of Search Method based on Sequence and Adjacency Relationship of User Query

  • 소병철;정진우
    • Journal of Korean Institute of Intelligent Systems / Vol. 21, No. 6 / pp. 724-729 / 2011
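  • Information retrieval is the process of finding the parts of a large body of data that a user wants. Databases are generally used to manage large data sets, but in environments where many complex document structures coexist, such as the Internet, it is difficult to find exactly the document the user wants in a single attempt, so documents are commonly ranked and presented to the user in that order. In this paper, we implemented a search method that not only searches for the words contained in the data but also considers the order and adjacency of those words, by applying term frequency-inverse document frequency (TF-IDF) and n-gram techniques. As a result, more accurate search became possible, with an accuracy of 73% on a collection of more than 19,000 documents.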
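
The ranking idea can be sketched in Python. This is a minimal illustration, not the authors' implementation: documents and the query are assumed to be already tokenized, TF is normalized by document length, IDF is log(N/df), and every query bigram that also appears in a document adds a fixed bonus (a hypothetical weighting), so documents containing the query words in the same order and next to each other rank higher.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Contiguous n-grams of a token list, as tuples (order and adjacency preserved)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_scores(docs, query, n=2, bonus=1.0):
    """Score each tokenized document against a tokenized query.

    score = sum over query terms of tf(t, d) * idf(t)
            + bonus * (number of query n-grams that also occur in d)

    The additive n-gram bonus is a hypothetical weighting chosen for illustration.
    """
    n_docs = len(docs)
    df = Counter()
    for d in docs:
        df.update(set(d))                      # document frequency of each term
    query_grams = set(ngrams(query, n))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in set(query):
            if tf[term]:                       # term present, so df[term] >= 1
                idf = math.log(n_docs / df[term])
                score += (tf[term] / len(d)) * idf
        # reward query word pairs that appear in the same order, side by side
        score += bonus * len(query_grams & set(ngrams(d, n)))
        scores.append(score)
    return scores
```

With docs [["sequence", "data", "search"], ["data", "sequence", "search"], ["protein", "database"]] and query ["sequence", "data"], the first two documents get the same TF-IDF part, but only the first contains the bigram ("sequence", "data"), so it ranks above the second.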

Survey of the Applications of NGS to Whole-Genome Sequencing and Expression Profiling

  • Lim, Jong-Sung;Choi, Beom-Soon;Lee, Jeong-Soo;Shin, Chan-Seok;Yang, Tae-Jin;Rhee, Jae-Sung;Lee, Jae-Seong;Choi, Ik-Young
    • Genomics & Informatics / Vol. 10, No. 1 / pp. 1-8 / 2012
  • Recently, DNA sequence variation and gene expression profiling technologies have been widely used as approaches in genome biology and genetics. Their application to genome studies has advanced particularly with the introduction of the next-generation DNA sequencers (NGS) Roche/454 and Illumina/Solexa, together with bioinformatics technologies for whole-genome de novo assembly, expression profiling, DNA variation discovery, and genotyping. Both massive whole-genome shotgun paired-end sequencing data and mate-pair sequencing data are important for constructing a de novo assembly of a novel genome. For effective de novo assembly, DNA sequence information from multiplatform NGS is needed, with at least 2× and 30× depth of genome coverage using Roche/454 and Illumina/Solexa, respectively. Massive short-read data from the Illumina/Solexa system are sufficient to discover DNA variation, which reduces the cost of DNA sequencing. Whole-genome expression profile data are useful for approaching genome systems biology through quantification of the RNAs expressed from a whole-genome transcriptome, which vary with the tissue sample. Hybrid mRNA sequences from Roche/454 and Illumina/Solexa are more powerful for finding novel genes through de novo assembly in any whole-genome sequenced species. Coverage of 20× and 50× of the estimated transcriptome using Roche/454 and Illumina/Solexa, respectively, is effective for creating novel expressed reference sequences. However, an average coverage of only 30× of a transcriptome with Illumina/Solexa short reads is sufficient to check expression quantification against reference expressed sequence tag (EST) sequences.
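
Coverage depths such as 2× and 30× follow the standard relation C = N × L / G: the average depth C produced by N reads of length L over a genome of size G. The sketch below solves this relation for N. The genome size and read lengths in the usage note are illustrative assumptions, not values from the survey.

```python
import math


def reads_needed(genome_size_bp, read_length_bp, depth):
    """Smallest number of reads N with N * L / G >= depth (C = N * L / G solved for N)."""
    return math.ceil(depth * genome_size_bp / read_length_bp)


def coverage(n_reads, read_length_bp, genome_size_bp):
    """Average depth of coverage C = N * L / G."""
    return n_reads * read_length_bp / genome_size_bp
```

For a hypothetical 500 Mb genome, 2× with 400 bp reads needs reads_needed(500_000_000, 400, 2) = 2,500,000 reads, and 30× with 100 bp reads needs reads_needed(500_000_000, 100, 30) = 150,000,000 reads. Each mate of a paired-end read counts as one read here.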

An Optimization Approach to the Construction of a Sequence of Benchmark Targets in DEA-Based Benchmarking

  • 박재훈;임성묵;배혜림
    • Journal of the Korean Institute of Industrial Engineers / Vol. 40, No. 6 / pp. 628-641 / 2014
  • Stepwise efficiency improvement in data envelopment analysis (DEA)-based benchmarking is a realistic and effective method by which inefficient decision-making units (DMUs) can choose benchmarks in a stepwise manner and thereby achieve gradual performance improvement. Most previous research on stepwise efficiency improvement has focused on how to stratify DMUs into multiple layers and how to select immediate benchmark targets in leading levels for lagging-level DMUs. As a result, the sequence of benchmark targets has been constructed myopically, which can limit its effectiveness. To address this issue, this paper proposes an optimization approach to constructing a sequence of benchmarks in DEA-based benchmarking, in which two optimization criteria are employed: similarity of input-output use patterns and proximity of input-output use levels between DMUs. To illustrate the proposed method, we applied it to the benchmarking of 23 national universities in South Korea.
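
One way to picture a non-myopic sequence of benchmarks is as a shortest path through the layers, solved by dynamic programming. In this hypothetical sketch, each DMU is a vector of positive input-output values, pattern similarity is measured by cosine dissimilarity, level proximity by Euclidean distance, and the two are blended with a weight w. The step-cost formula and the names below are assumptions; the paper's actual optimization model is not reproduced here.

```python
import math


def cosine_dissimilarity(a, b):
    """1 - cosine similarity: 0 when two input-output vectors have the same mix."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def step_cost(a, b, w):
    """Hypothetical step cost: weighted pattern dissimilarity plus level distance."""
    return w * cosine_dissimilarity(a, b) + (1 - w) * math.dist(a, b)


def best_benchmark_path(start, layers, w=0.5):
    """Choose one benchmark per layer (next layer first, frontier last) so that
    the summed step cost from `start` is minimal. Returns (cost, path)."""
    # best[k] = cheapest (cost, path) ending at the k-th DMU of the current layer
    best = [(step_cost(start, d, w), [d]) for d in layers[0]]
    for layer in layers[1:]:
        best = [
            min(((c + step_cost(path[-1], d, w), path + [d]) for c, path in best),
                key=lambda t: t[0])
            for d in layer
        ]
    return min(best, key=lambda t: t[0])
```

Unlike a greedy rule that only takes the cheapest next step, the dynamic program compares complete paths across all layers before choosing.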

Proteomics Data Analysis using Representative Database

  • Kwon, Kyung-Hoon;Park, Gun-Wook;Kim, Jin-Young;Park, Young-Mok;Yoo, Jong-Shin
    • Bioinformatics and Biosystems / Vol. 2, No. 2 / pp. 46-51 / 2007
  • In proteomics research using mass spectrometry, a protein database search provides protein information from the peptide sequences that best match the tandem mass spectra. The protein sequence database has been a powerful knowledge base for this protein identification. However, as protein sequence information accumulates, the database grows very large, and it has become hard to consider every protein sequence in a search because doing so consumes much computing time. For high-throughput proteome analysis, non-redundant refined databases such as the IPI human database of the European Bioinformatics Institute have usually been used. Although a non-redundant database returns search results quickly, it misses variation in protein sequences. In this study, we examined proteomics data from the standpoint of protein similarity and used a network analysis tool to build a new analysis method. This method is expected to save computing time in the database search while keeping sequence variation, so that modified peptides can still be caught.
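
Collapsing similar protein sequences while keeping their variants can be illustrated with a small similarity graph. This is a hypothetical sketch, not the authors' network analysis: difflib's SequenceMatcher ratio stands in for a real alignment score, sequences whose similarity reaches a threshold are linked, each connected component forms one cluster, and the longest member is taken as the representative while all members are retained.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in similarity in [0, 1]; a real pipeline would use an alignment score."""
    return SequenceMatcher(None, a, b).ratio()


def representative_clusters(seqs, threshold=0.8):
    """Link sequences whose similarity >= threshold, take connected components,
    and map each component's longest sequence to the full member list."""
    parent = list(range(len(seqs)))          # union-find forest over sequence indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]    # path halving
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if similarity(seqs[i], seqs[j]) >= threshold:
                parent[find(i)] = find(j)    # merge the two components

    groups = {}
    for i, s in enumerate(seqs):
        groups.setdefault(find(i), []).append(s)
    return {max(members, key=len): members for members in groups.values()}
```

Searching only the representatives keeps the search small, and the retained members can then be consulted for sequence variants. The pairwise loop is quadratic, so it only suits small examples.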

Common Due-Date Assignment and Scheduling with Sequence-Dependent Setup Times: a Case Study on a Paper Remanufacturing System

  • Kim, Jun-Gyu;Kim, Ji-Su;Lee, Dong-Ho
    • Management Science and Financial Engineering / Vol. 18, No. 1 / pp. 1-12 / 2012
  • In this paper, we report a case study on the common due-date assignment and scheduling problem in a paper remanufacturing system that produces corrugated cardboard from collected waste paper for a given set of orders in a make-to-order (MTO) environment. Since the system produces corrugated cardboard in an integrated process and has sequence-dependent setups, the problem can be regarded as common due-date assignment and sequencing on a single machine with sequence-dependent setup times. The objective is to minimize the sum of the penalties associated with due-date assignment, earliness, and tardiness. In the study, the earliness and tardiness penalties were obtained from inventory holding and backorder costs, respectively. To solve the problem, we adopted two types of algorithms: (a) a branch-and-bound algorithm that gives optimal solutions, and (b) heuristic algorithms. Computational experiments on data generated from the case show that both types of algorithms work well for the case data. In particular, the branch-and-bound algorithm found optimal solutions quickly. However, the heuristic algorithms are recommended for large instances, especially when solution time is critical.
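
To make the objective concrete, the sketch below evaluates the total penalty (due-date assignment, earliness, tardiness) of a job sequence with sequence-dependent setups, and finds the exact optimum of a tiny instance by enumeration. It is a brute-force baseline, not the paper's branch-and-bound or heuristic algorithms. The nonnegative linear weights alpha, beta, gamma and the setup-matrix layout are assumptions.

```python
from itertools import permutations


def completion_times(seq, proc, setup):
    """Completion time of each job when jobs run in `seq` order on one machine.

    setup[i][j] is the setup time when job j follows job i; index 0 is the
    initial machine state, so jobs are numbered 1..n and proc[0] is unused.
    """
    t, prev, times = 0, 0, []
    for j in seq:
        t += setup[prev][j] + proc[j]
        times.append(t)
        prev = j
    return times


def total_penalty(times, due, alpha, beta, gamma):
    """alpha * due + beta * total earliness + gamma * total tardiness."""
    return alpha * due + sum(beta * max(0, due - c) + gamma * max(0, c - due) for c in times)


def brute_force(jobs, proc, setup, alpha, beta, gamma):
    """Exact optimum by enumerating all sequences (small instances only).

    With nonnegative weights, the penalty of a fixed sequence is piecewise linear
    and convex in the due date, so trying due = 0 and each completion time suffices.
    """
    best = None
    for seq in permutations(jobs):
        times = completion_times(seq, proc, setup)
        for due in [0] + times:
            cost = total_penalty(times, due, alpha, beta, gamma)
            if best is None or cost < best[0]:
                best = (cost, seq, due)
    return best
```

Enumeration grows factorially with the number of jobs, which is why exact methods such as branch and bound prune the search and why heuristics are preferred for large instances.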

Application of quasi-Monte Carlo methods in multi-asset option pricing

  • 모은비;박종선
    • Journal of the Korean Data and Information Science Society / Vol. 24, No. 4 / pp. 669-677 / 2013
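  • In this study, we compared quasi-Monte Carlo methods based on low-discrepancy sequences for estimating multi-asset option prices, through simulations over various combinations of the number of assets, correlation coefficients, asset values, and standard deviations. The results showed that using quasi-random numbers with Moro's inversion was more accurate than the basic Monte Carlo method, and that, regardless of the number of assets, the hybrid methods were the most effective of the quasi-random methods.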
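
A minimal quasi-Monte Carlo sketch for a two-asset case, assuming geometric Brownian motion with equal volatilities and a call on the average of the two prices. Points come from a two-dimensional Halton sequence (bases 2 and 3), and normal variates are obtained with Python's statistics.NormalDist.inv_cdf rather than Moro's inversion. The hybrid methods compared in the paper are not shown.

```python
import math
from statistics import NormalDist


def radical_inverse(i, base):
    """Van der Corput radical inverse of integer i >= 1 in the given base, in (0, 1)."""
    result, f = 0.0, 1.0 / base
    while i > 0:
        i, digit = divmod(i, base)
        result += digit * f
        f /= base
    return result


def qmc_basket_call(s0, k, r, sigma, t, rho, n=4096):
    """Discounted price of max((S1 + S2) / 2 - K, 0) using n Halton points."""
    inv = NormalDist().inv_cdf                  # stand-in for Moro's inversion
    drift = (r - 0.5 * sigma ** 2) * t
    vol = sigma * math.sqrt(t)
    total = 0.0
    for i in range(1, n + 1):
        z1 = inv(radical_inverse(i, 2))         # Halton coordinate 1 (base 2)
        zb = inv(radical_inverse(i, 3))         # Halton coordinate 2 (base 3)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * zb   # 2x2 Cholesky correlation
        s1 = s0 * math.exp(drift + vol * z1)
        s2 = s0 * math.exp(drift + vol * z2)
        total += max(0.5 * (s1 + s2) - k, 0.0)
    return math.exp(-r * t) * total / n
```

With rho = 1 the two assets coincide, so qmc_basket_call(100.0, 100.0, 0.05, 0.2, 1.0, rho=1.0) should be close to the Black-Scholes price of a single call with the same parameters (about 10.45).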

Generalized Self Spread-Spectrum Communications with Turbo Soft Despreading and Decoding

  • Tomasin, Stefano;Veronesi, Daniele
    • Journal of Communications and Networks / Vol. 8, No. 3 / pp. 267-274 / 2006
  • Self-spreading (SSP) is a spread-spectrum technique in which the spreading sequence is generated from the data bits. Although SSP allows communication with a low probability of interception by unintended receivers, despreading by the intended receiver is prone to error propagation. In this paper, we propose both a new transmitter and a new receiver based on SSP, with the aims of (a) reducing error propagation and (b) increasing the concealment of the transmission. We first describe a new technique for generating the SSP spreading sequence, which generalizes the SSP schemes in the existing literature. We also include coding at the transmitter to further reduce the effects of error propagation at the receiver. For the receiver, we propose a turbo architecture based on the exchange of information between a soft despreader and a soft-input soft-output decoder. We design the despreader to fully exploit the information provided by the decoder. Lastly, we propose a chip decoder that extracts from the received signal the information on data bits contained in the spreading sequence. The performance of the proposed scheme is evaluated and compared with existing spread-spectrum systems.
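
The basic self-spreading idea can be illustrated under one simple assumed form of SSP: each data bit is spread by the previous N data bits (mapped to ±1), and transmitter and receiver share an N-bit initial seed. The receiver rebuilds every spreading sequence from its own earlier decisions, which is why one wrong decision corrupts the next N sequences. The paper's generalized sequence generation, coding, and turbo receiver are not modeled.

```python
def to_pm(bits):
    """Map bits {0, 1} to chips {-1, +1}."""
    return [1 if b else -1 for b in bits]


def ssp_spread(bits, seed):
    """Spread each data bit with the N most recent data bits (the seed fills the start)."""
    n = len(seed)
    history = list(seed)
    chips = []
    for b in bits:
        code = to_pm(history[-n:])      # spreading sequence built from past data
        sym = 1 if b else -1
        chips.extend(sym * c for c in code)
        history.append(b)
    return chips


def ssp_despread(chips, seed):
    """Correlate each N-chip block with a code rebuilt from earlier decisions."""
    n = len(seed)
    history = list(seed)
    bits = []
    for k in range(0, len(chips), n):
        code = to_pm(history[-n:])
        corr = sum(x * c for x, c in zip(chips[k:k + n], code))
        b = 1 if corr > 0 else 0
        bits.append(b)
        history.append(b)               # decisions feed back: an error propagates
    return bits
```

In a noiseless channel, ssp_despread(ssp_spread(data, seed), seed) returns data; real-valued noisy chips can be passed to ssp_despread unchanged.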

Efficient Signal Detection Technique Using Orthogonal Sequence for Quantum Communication

  • 김윤현;김진영
    • Journal of Satellite, Information and Communications / Vol. 7, No. 1 / pp. 21-26 / 2012
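  • Although Korea has aimed to be a leader in digital information technology for the past 20 years or so, there has been almost no research or investment in quantum information science, in which advanced countries have already begun investing, and Korea's level of quantum information and communication technology still falls far short of that of the leading countries. Research on quantum information processing and communication based on quantum mechanics is currently being pursued actively worldwide. Research on quantum information theory, which gained momentum in the 1990s, has been developing in fields such as quantum computing, quantum communication, and quantum information theory, and by the late 1990s it had begun to produce major results in areas such as quantum cryptographic communication and quantum algorithms. In this paper, we discuss an efficient quantum signal detection scheme using orthogonal sequences for efficient quantum signal transmission and detection in quantum communication systems.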
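
As a classical illustration of detection with orthogonal sequences (it does not model the quantum channel), the sketch below builds Walsh-Hadamard sequences with the Sylvester construction and decides which sequence was sent by picking the one with the largest correlation. The choice of Walsh-Hadamard codes is an assumption; the abstract does not state which orthogonal sequences the paper uses.

```python
def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix; n must be a power of 2."""
    h = [[1]]
    while len(h) < n:
        # [[H, H], [H, -H]] doubles the size while keeping rows mutually orthogonal
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def detect(received, codes):
    """Index of the code whose correlation with the received samples is largest."""
    scores = [sum(r * c for r, c in zip(received, code)) for code in codes]
    return max(range(len(codes)), key=lambda i: scores[i])
```

Because distinct rows have zero correlation, a transmitted row keeps a large correlation with its own code while the other codes see only the perturbation; for example, detect applied to a scaled copy of row 3 of hadamard(8) returns 3.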

Analysis of the Genetic Relationship among Mulberry (Morus spp.) Cultivars Using Inter-Simple Sequence Repeat (ISSR) Markers

  • Park, Eun-Ju;Kang, Min-Uk;Choi, Myoung-Seob;Sung, Gyoo-Byung;Nho, Si-Kab
    • International Journal of Industrial Entomology and Biomaterials / Vol. 41, No. 2 / pp. 56-62 / 2020
  • Mulberry (Morus spp.; family Moraceae) is of prime importance in the sericulture industry, and its foliage is the only natural feed of the silkworm Bombyx mori L. Traditional classification methods based on morphological traits have been largely unsuccessful in assessing the diversity and relationships among mulberry species because those traits are influenced by the environment. For these reasons, it is difficult to differentiate between varieties and cultivars of Morus spp. In the present study, inter-simple sequence repeat (ISSR) markers were used to investigate the genetic diversity of 48 mulberry samples genotyped with nine ISSR primers. The ISSR markers revealed 53.2% polymorphism among the mulberry genotypes. Similarity coefficients estimated from these ISSR markers ranged from 0.67 to 0.99 for the combined pooled data. The phenogram drawn with the UPGMA cluster method from the combined pooled ISSR data divided the 48 mulberry genotypes into seven major groups. No genetic association with collection area was found, and the mulberry lines showed a mixed pattern. The different mulberry species have likely become genetically homogenized through natural hybridization.
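
The band-scoring steps can be sketched on a binary matrix (1 = band present, 0 = absent, one row per sample). This is an illustrative sketch: the Jaccard coefficient is assumed as the similarity measure (the abstract does not name one), distance is taken as 1 - similarity, and UPGMA merges the closest pair of clusters using size-weighted average distances. Every sample is assumed to show at least one band.

```python
def percent_polymorphic(matrix):
    """Share (in %) of band positions whose state is not identical across all samples."""
    n_bands = len(matrix[0])
    poly = sum(1 for k in range(n_bands) if len({row[k] for row in matrix}) > 1)
    return 100.0 * poly / n_bands


def jaccard(a, b):
    """Shared bands divided by bands present in either sample."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return shared / either


def upgma(names, dist):
    """Average-linkage (UPGMA) clustering; returns a nested-tuple tree."""
    clusters = {i: (name, 1) for i, name in enumerate(names)}   # id -> (subtree, size)
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}
    next_id = len(names)
    while len(clusters) > 1:
        a, b = min(d, key=d.get)             # closest pair of clusters, a < b
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        del d[(a, b)]
        for c in clusters:
            d_ac = d.pop((min(a, c), max(a, c)))
            d_bc = d.pop((min(b, c), max(b, c)))
            # size-weighted average distance from the merged cluster to c
            d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0]
```

For rows A = [1, 1, 0, 1, 1], B = [1, 1, 0, 0, 1], C = [0, 0, 1, 1, 1], four of the five bands are polymorphic (80%), A and B are the most similar pair, and UPGMA returns ("C", ("A", "B")).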

Mitochondrial sequence based characterization and morphometric assessment of Diara buffalo population

  • Singh, Karan Veer;Purohit, Hitesh;Singh, Ramesh Kumar
    • Animal Bioscience / Vol. 35, No. 7 / pp. 949-954 / 2022
  • Objective: The present study aimed at phenotypic characterization and mitochondrial D-loop analysis of the indigenous "Diara" buffalo population, which is mostly confined to villages on the South and North Gangetic marshy plains in the state of Bihar, India. These buffaloes are well adapted to, and best suited for, ploughing and puddling wet fields for paddy cultivation. Methods: Biometric data on 172 buffaloes were collected using a standard flexible measuring tape. The animals are medium-sized; typical morphometric features are a long head with a broad forehead and moderately long, erect ears. Genomic DNA was isolated from unrelated animals. Sequence data for a 358-bp fragment of the mtDNA D-loop were generated and compared with 338 sequences from riverine and swamp buffaloes. Results: Based on the mitochondrial D-loop analysis, the Diara buffaloes grouped with the haplotypes reported for riverine buffalo. Sequence analysis revealed 7 mitochondrial D-loop haplotypes with a haplotype diversity of 0.9643. Five of the haplotypes were shared with established swamp breeds and with the buffalo population of Orissa, India. Conclusion: Morphometric analyses clearly show distinguishing features, such as a long and broad forehead, that may be useful for identification. The Diara buffalo germplasm is well adapted to the marshy banks of the river Ganga and its tributaries. It constitutes a valuable genetic resource that needs to be conserved on a priority basis.
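
Haplotype diversity is commonly computed with Nei's unbiased estimator, Hd = n/(n - 1) × (1 - Σ p_i²), where n is the number of sequences and p_i is the frequency of haplotype i. The abstract does not say which estimator was used, so the sketch below is an assumption, and the haplotype labels are hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: Hd = n/(n-1) * (1 - sum of p_i squared)."""
    n = len(haplotypes)
    counts = Counter(haplotypes)                      # sequences per haplotype
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_p2)
```

For four hypothetical sequences labeled H1, H1, H2, H3, the frequencies are 0.5, 0.25, and 0.25, so Hd = 4/3 × (1 - 0.375) = 5/6 ≈ 0.833. Hd approaches 1 when almost every sequence carries a distinct haplotype.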