A Comprehensive Review of Emerging Computational Methods for Gene Identification

  • Yu, Ning (Dept. of Computer Science, Georgia State University)
  • Yu, Zeng (Dept. of Computer Science, Georgia State University)
  • Li, Bing (Dept. of Computer Science, Georgia State University)
  • Gu, Feng (Dept. of Computer Science, College of Staten Island, City University of New York)
  • Pan, Yi (Dept. of Computer Science, Georgia State University)
  • Received : 2016.01.29
  • Accepted : 2016.02.23
  • Published : 2016.03.31

Abstract

Gene identification is central to genomic studies. Although the first phase of the Encyclopedia of DNA Elements (ENCODE) project has been declared complete, the annotation of functional elements is far from finished. Computational methods for gene identification therefore continue to play an important role in this area and in related problems. A great deal of work has been done, and many computational methods and approaches have been developed. Many review papers have summarized these methods and related work, but most of them examine the methodologies from one particular perspective. In contrast, this paper aims to summarize the mainstream computational methods in gene identification comprehensively and to provide a concise technical reference for future studies. It also highlights the emerging trends and cutting-edge techniques that are expected to lead research in this field.

References

  1. W. Klimke, C. O'Donovan, O. White, J. R. Brister, K. Clark, B. Fedoro, and T. Tatusova, "Solving the problem: genome annotation standards before the data deluge," Standards in Genomic Sciences, vol. 5, no. 1, pp. 168-193, 2011. https://doi.org/10.4056/sigs.2084864
  2. ENCODE Project Consortium, "An integrated encyclopedia of DNA elements in the human genome," Nature, vol. 489, no. 7414, pp. 57-74, 2012. https://doi.org/10.1038/nature11247
  3. S. Djebali, C. A. Davis, A. Merkel, A. Dobin, T. Lassmann, A. Mortazavi, et al., "Landscape of transcription in human cells," Nature, vol. 489, no. 7414, pp. 101-108, 2012. https://doi.org/10.1038/nature11233
  4. J. Harrow, A. Nagy, A. Reymond, T. Alioto, L. Patthy, S. Antonarakis, and R. Guigo, "Identifying protein-coding genes in genomic sequences," Genome Biology, vol. 10, no. 1, article ID. 201, 2009.
  5. M. Hiller, B. T. Schaar, and G. Bejerano, "Hundreds of conserved noncoding genomic regions are independently lost in mammals," Nucleic Acids Research, vol. 40, no. 22, pp. 11463-11476, 2012. https://doi.org/10.1093/nar/gks905
  6. M. E. Dinger, K. C. Pang, T. R. Mercer, and J. S. Mattick, "Differentiating protein-coding and noncoding RNA: challenges and ambiguities," PLoS Computational Biology, vol. 4, no. 11, article ID. e1000176, 2008.
  7. J. W. Fickett, "Finding genes by computer: the state of the art," Trends in Genetics, vol. 12, no. 8, pp. 316-320, 1996. https://doi.org/10.1016/0168-9525(96)10038-X
  8. C. Mathe, M. F. Sagot, T. Schiex, and P. Rouze, "Current methods of gene prediction, their strengths and weaknesses," Nucleic Acids Research, vol. 30, no. 19, pp. 4103-4117, 2002. https://doi.org/10.1093/nar/gkf543
  9. R. She, "Fast and accurate gene prediction by protein homology," Ph.D. dissertation, Simon Fraser University, Burnaby, British Columbia, Canada, 2010.
  10. N. Goel, S. Singh, and T. C. Aseri, "A review of soft computing techniques for gene prediction," ISRN Genomics, vol. 2013, article ID. 191206, 2013.
  11. C. Yang, E. Bolotin, T. Jiang, F. M. Sladek, and E. Martinez, "Prevalence of the initiator over the TATA box in human and yeast genes and identification of DNA motifs enriched in human TATA-less core promoters," Gene, vol. 389, no. 1, pp. 52-65, 2007. https://doi.org/10.1016/j.gene.2006.09.029
  12. P. Bucher, "Weight matrix descriptions of four eukaryotic RNA polymerase II promoter elements derived from 502 unrelated promoter sequences," Journal of Molecular Biology, vol. 212, no. 4, pp. 563-578, 1990. https://doi.org/10.1016/0022-2836(90)90223-9
  13. M. Q. Zhang, "Computational prediction of eukaryotic protein-coding genes," Nature Reviews Genetics, vol. 3, no. 9, pp. 698-709, 2002. https://doi.org/10.1038/nrg890
  14. C. Trapnell, L. Pachter, and S. L. Salzberg, "TopHat: discovering splice junctions with RNA-seq," Bioinformatics, vol. 25, no. 9, pp. 1105-1111, 2009. https://doi.org/10.1093/bioinformatics/btp120
  15. M. Akhtar, J. Epps, and E. Ambikairajah, "Signal processing in sequence analysis: advances in eukaryotic gene prediction," IEEE Journal of Selected Topics in Signal Processing, vol. 2, no. 3, pp. 310-321, 2008. https://doi.org/10.1109/JSTSP.2008.923854
  16. J. W. Fickett, "Recognition of protein coding regions in DNA sequences," Nucleic Acids Research, vol. 10, no. 17, pp. 5303-5318, 1982. https://doi.org/10.1093/nar/10.17.5303
  17. D. Kotlar and Y. Lavner, "Gene prediction by spectral rotation measure: a new method for identifying proteincoding regions," Genome Research, vol. 13, no. 8, pp. 1930-1937, 2003. https://doi.org/10.1101/gr.1261703
  18. N. Yu, X. Guo, F. Gu, and Y. Pan, "DNA AS X: an information-coding based model to improve the sensitivity in comparative gene analysis," in Proceedings of the 11th International Symposium on Bioinformatics Research and Applications, Norfolk, VA, 2015, pp. 366-377.
  19. R. F. Voss, "Evolution of long-range fractal correlations and 1/f noise in DNA base sequences," Physical Review Letters, vol. 68, no. 25, pp. 3805-3808, 1992. https://doi.org/10.1103/PhysRevLett.68.3805
  20. I. Cosic, "Macromolecular bioactivity: is it resonant interaction between macromolecules? Theory and applications," IEEE Transactions on Biomedical Engineering, vol. 41, no. 12, pp. 1101-1114, 1994. https://doi.org/10.1109/10.335859
  21. H. K. Kwan and S. Arniker, "Numerical representation of DNA sequences," in Proceedings of IEEE International Conference on Electro/Information Technology (eit'09), Windsor, ON, 2009, pp. 307-310.
  22. B. D. Silverman and R. Linsker, "A measure of DNA periodicity," Journal of Theoretical Biology, vol. 118, no. 3, pp. 295-300, 1986. https://doi.org/10.1016/S0022-5193(86)80060-1
  23. S. Tiwari, S. Ramachandran, A. Bhattacharya, S. Bhattacharya, and R. Ramaswamy, "Prediction of probable genes by fourier analysis of genomic sequences," Computer Applications in the Biosciences (CABIOS), vol. 13, no. 3, pp. 263-270, 1997.
  24. D. Anastassiou, "Frequency-domain analysis of biomolecular sequences," Bioinformatics, vol. 16, no. 12, pp. 1073-1081, 2000. https://doi.org/10.1093/bioinformatics/16.12.1073
  25. N. Rao and S. Shepherd, "Detection of 3-periodicity for small genomic sequences based on AR technique," in Proceedings of 2004 International Conference on Communications, Circuits and Systems (ICCCAS2004), Cheongdu, China, 2004, pp. 1032-1036.
  26. G. Liu and Y. Luan, "Identification of protein coding regions in the eukaryotic DNA sequences based on marple algorithm and wavelet packets transform," Abstract and Applied Analysis, vol. 2014, article ID. 402567, 2014.
  27. G. Zhang and G. Zhou, "The Marple algorithm for the autoregressive spectral estimates of the SMMW Fourier transform spectroscopy data," International Journal of Infrared and Millimeter Waves, vol. 10, no. 2, pp. 257-267, 1989. https://doi.org/10.1007/BF01011241
  28. I. Barrodale, L. M. Delves, R. E. Erickson, and C. A. Zala, "Computational experience with Marple's algorithm for autoregressive spectrum analysis," Geophysics, vol. 48, no. 9, pp. 1274-1286, 1983. https://doi.org/10.1190/1.1441551
  29. O. Abbasi, A. Rostami, and G. Karimian, "Identification of exonic regions in DNA sequences using crosscorrelation and noise suppression by discrete wavelet transform," BMC Bioinformatics, vol. 12, article ID. 430, 2011.
  30. S. Deng, L. Yuan, K. Feng, G. Ding, and Y. Li, "A new approach for identifying protein-coding regions by combining chirp z and wavelet transform," Current Bioinformatics, vol. 8, no. 5, pp. 557-563, 2013. https://doi.org/10.2174/1574893611308050006
  31. H. K. Kwan, R. Atwal, and B. Y. M. Kwan, "Wavelet analysis of DNA sequences," in Proceedings of International Conference on Communications, Circuits and Systems (ICCCAS2008), Fujian, China, 2008, pp. 816-820.
  32. E. Ambikairajah, J. Epps, and M. Akhtar, "Gene and exon prediction using time domain algorithms," in Proceedings of the 8th International Symposium on Signal Processing and Its Applications (ISSPA2005), Sydney, Australia, 2005, pp. 199-202.
  33. M. Akhtar, J. Epps, and E. Ambikairajah, "Time and frequency domain methods for gene and exon prediction in eukaryotes," in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2007), Honolulu, HI, 2007, pp. 573-576.
  34. M. Roy and S. Barman, "Effective gene prediction by high resolution frequency estimator based on least-norm solution technique," EURASIP Journal on Bioinformatics and Systems Biology, vol. 2014, no. 1, pp. 1-13, 2014. https://doi.org/10.1186/1687-4153-2014-1
  35. S. S. Sahu and G. Panda, "Identification of protein-coding regions in DNA sequences using a time-frequency filtering approach," Genomics, Proteomics & Bioinformatics, vol. 9, no. 1-2, pp. 45-55, 2011. https://doi.org/10.1016/S1672-0229(11)60007-7
  36. S. Deng, Y. Shi, L. Yuan, Y. Li, and G. Ding, "Detecting the borders between coding and non-coding DNA regions in prokaryotes based on recursive segmentation and nucleotide doublets statistics," BMC Genomics, vol. 13, no. Suppl 8, article ID. S19, 2012. https://doi.org/10.1186/1471-2164-13-S6-S19
  37. S. Mereuta and V. Munteanu, "A new information theoretic approach to exon-intron classification," in Proceedings of International Symposium on Signals, Circuits and Systems (ISSCS2007), Iasi, Romania, 2007, pp. 1-4.
  38. W. Zhu, A. Lomsadze, and M. Borodovsky, "Ab initio gene identification in metagenomic sequences," Nucleic Acids Research, vol. 38, no. 12, article ID. e132, 2010.
  39. M. Borodovsky and J. McIninch, "Genmark: parallel gene recognition for both DNA strands," Computers & Chemistry, vol. 17, no. 2, pp. 123-133, 1993. https://doi.org/10.1016/0098-1354(93)85018-H
  40. C. Burge and S. Karlin, "Prediction of complete gene structures in human genomic DNA," Journal of Molecular Biology, vol. 268, no. 1, pp. 78-94, 1997. https://doi.org/10.1006/jmbi.1997.0951
  41. A. Lomsadze, V. Ter-Hovhannisyan, Y. O. Chernoff, and M. Borodovsky, "Gene identification in novel eukaryotic genomes by self-training algorithm," Nucleic Acids Research, vol. 33, no. 20, pp. 6494-6506, 2005. https://doi.org/10.1093/nar/gki937
  42. D. Kulp, D. Haussler, M. G. Reese, and F. H. Eeckman, "A generalized hidden Markov model for the recognition of human genes in DNA," in Proceeding of the 4th International Conference on Intelligent Systems for Molecular Biology, St. Louis, MO, 1996, pp. 134-142.
  43. L. R. Rabiner, "A tutorial on hidden markov models and selected applications in speech recognition," in Readings in Speech Recognition, A. Waibel and K. F. Lee, Eds. San Francisco, CA: Morgan Kaufmann Publishers, 1990, pp. 267-296.
  44. D. Sankoff, "Efficient optimal decomposition of a sequence into disjoint regions, each matched to some template in an inventory," Mathematical Biosciences, vol. 111, no. 2, pp. 279-293, 1992. https://doi.org/10.1016/0025-5564(92)90075-8
  45. A. J. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Transactions on Information Theory, vol. 13, no. 2, pp. 260-269, 1967. https://doi.org/10.1109/TIT.1967.1054010
  46. V. Ter-Hovhannisyan, A. Lomsadze, Y. O. Chernoff, and M. Borodovsky, "Gene prediction in novel fungal genomes using an ab initio algorithm with unsupervised training," Genome Research, vol. 18, no. 12, p. 1979- 1990, 2008. https://doi.org/10.1101/gr.081612.108
  47. A. Lomsadze, P. D. Burns, and M. Borodovsky, "Integration of mapped RNA-seq reads into automatic training of eukaryotic gene finding algorithm," Nucleic Acids Research, vol. 42, no. 15, article ID. e119, 2014.
  48. R. Staden, "Computer methods to locate signals in nucleic acid sequences," Nucleic Acids Research, vol. 12, no. 1 (Pt 2), pp. 505-519, 1984. https://doi.org/10.1093/nar/12.1Part2.505
  49. R. Guigo, S. Knudsen, N. Drake, and T. Smith, "Prediction of gene structure," Journal of Molecular Biology, vol. 226, no. 1, pp. 141-157, 1992. https://doi.org/10.1016/0022-2836(92)90130-C
  50. E. E. Snyder and G. D. Stormo, "Identification of protein coding regions in genomic DNA," Journal of Molecular Biology, vol. 248, no. 1, pp. 1-18, 1995. https://doi.org/10.1006/jmbi.1995.0198
  51. M. Q. Zhang and T. G. Marr, "A weight array method for splicing signal analysis," Computer applications in the Biosciences (CABIOS), vol. 9, no. 5, pp. 499-509, 1993.
  52. J. Henderson, S. Salzberg, and K. H. Fasman, "Finding genes in DNA with a hidden Markov model," Journal of Computational Biology, vol. 4, no. 2, pp. 127-141, 1997. https://doi.org/10.1089/cmb.1997.4.127
  53. I. Korf, P. Flicek, D. Duan, and M. R. Brent, "Integrating genomic homology into gene structure prediction," Bioinformatics, vol. 17, no. Suppl 1, pp. S140-S148, 2001. https://doi.org/10.1093/bioinformatics/17.suppl_1.S140
  54. J. Wu and D. Haussler, "Coding exon detection using comparative sequences," Journal of Computational Biology, vol. 13, no. 6, pp. 1148-1164, 2006. https://doi.org/10.1089/cmb.2006.13.1148
  55. W. H. Majoros, M. Pertea, and S. L. Salzberg, "TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders," Bioinformatics, vol. 20, no. 16, pp. 2878-2879, 2004. https://doi.org/10.1093/bioinformatics/bth315
  56. E. C. Uberbacher and R. J. Mural, "Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network approach," Proceedings of the National Academy of Sciences, vol. 88, no. 24, pp. 11261- 11265, 1991.
  57. R. Ranawana and V. Palade, "A neural network based multi-classifier system for gene identification in DNA sequences," Neural Computing & Applications, vol. 14, no. 2, pp. 122-131, 2005. https://doi.org/10.1007/s00521-004-0447-7
  58. Y. Xu, J. R. Einstein, R. Mural, M. Shah, and E. C. Uberbacher, "An improved system for exon recognition and gene modeling in human DNA sequences," in Proceedings of the 2nd International Conference on Intelligent Systems for Molecular Biology, San Francisco, CA, 1994, pp. 376-384.
  59. L. Roberts, N. Steele, C. Reeves, and G. King, "Training neural networks to identify coding regions in genomic DNA," in Proceedings of the 4th International Conference on Artificial Neural Networks, Cambridge, UK, 1995, pp. 399-403.
  60. E. E. Snyder and G. D. Stormo, "Identification of coding regions in genomic DNA sequences: an application of dynamic programming and neural networks." Nucleic Acids Research, vol. 21, no. 3, p. 607-613, 1993. https://doi.org/10.1093/nar/21.3.607
  61. Y. Xu, R. Mural, J. Einstein, M. Shah, and E. Uberbacher, "GRAIL: a multi-agent neural network system for gene identification," Proceedings of the IEEE, vol. 84, no. 10, pp. 1544-1552, 1996.
  62. J. Hertz, A. Krogh, and R. G. Palmer, Introduction to the Theory of Neural Computation. Redwood City, CA: Addison-Wesley, 1991.
  63. C. Li, P. He, and J. Wang, "Artificial neural network method for predicting protein-coding genes in the yeast genome," Internet Electronic Journal of Molecular Design, vol. 2, pp. 527-538, 2003.
  64. M. K. K. Leung, H. Y. Xiong, L. J. Lee, and B. J. Frey, "Deep learning of the tissue-regulated splicing code," Bioinformatics, vol. 30, no. 12, pp. i121-i129, 2014. https://doi.org/10.1093/bioinformatics/btu277
  65. Y. Bengio, A. Courville, and P. Vincent, "Representation learning: a review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798-1828, 2013. https://doi.org/10.1109/TPAMI.2013.50
  66. G. Hinton, P. Dayan, B. Frey, and R. Neal, "The 'wake-sleep' algorithm for unsupervised neural networks," Science, vol. 268, no. 5214, pp. 1158-1161, 1995. https://doi.org/10.1126/science.7761831
  67. G. E. Hintonemail, "Learning multiple layers of representation," Trends in Cognitive Sciences, vol. 11, no. 10, pp. 428-434, 2007. https://doi.org/10.1016/j.tics.2007.09.004
  68. L. Deng, G. Hinton, and B. Kingsbury, "New types of deep neural network learning for speech recognition and related applications: an overview," in Proceedings of 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vancouver, BC, 2013, pp. 8599-8603.
  69. P. Di Lena, K. Nagata, and P. Baldi, "Deep architectures for protein contact map prediction," Bioinformatics, vol. 28, no. 19, pp. 2449-2457, 2012. https://doi.org/10.1093/bioinformatics/bts475
  70. J. Eickholt and J. Cheng, "Predicting protein residue-residue contacts using deep networks and boosting," Bioinformatics, vol. 28, no. 23, pp. 3066-3072, 2012. https://doi.org/10.1093/bioinformatics/bts598
  71. A. Ben-Hur, C. S. Ong, S. Sonnenburg, B. Scholkopf, and G. Ratsch, "Support vector machines and kernels for computational biology," PLoS Computational Biology, vol. 4, no. 10, article ID. e1000173, 2008.
  72. A. Zien, G. Rätsch, S. Mika, B. Schölkopf, T. Lengauer, and K. R. Muller, "Engineering support vector machine kernels that recognize translation initiation sites," Bioinformatics, vol. 16, no. 9, pp. 799-807, 2000. https://doi.org/10.1093/bioinformatics/16.9.799
  73. S. Sonnenburg, A. Zien, and G. Ratsch, "ARTS: accurate recognition of transcription starts in human," Bioinformatics, vol. 22, no. 14, pp. e472-e480, 2006. https://doi.org/10.1093/bioinformatics/btl250
  74. S. Sonnenburg, G. Schweikert, P. Philips, J. Behr, and G. Ratsch, "Accurate splice site prediction using support vector machines," BMC Bioinformatics, vol. 8, no. Suppl 10, article ID. S7, 2007.
  75. H. Liu, H. Han, J. Li, and L. Wong, "An in-silico method for prediction of polyadenylation signals in human sequences," Genome Informatics, vol. 14, pp. 84-93, 2003.
  76. B. Scholkopf and A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. Cambridge, MA: MIT Press, 2002.
  77. G. Ratsch and S. Sonnenburg, "Large scale hidden semi-Markov SVMs," in Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press, 2007, pp. 1161-1168.
  78. C. Cortes and V. Vapnik, "Support-vector networks," Machine Learning, vol. 20, no. 3, pp. 273-297, 1995. https://doi.org/10.1007/BF00994018
  79. C. Yu, M. Deng, L. Zheng, R. L. He, J. Yang, and S. S. T. Yau, "DFA7, a new method to distinguish between intron-containing and intronless genes," PLoS ONE, vol. 9, no. 7, article ID. e101363, 2014.
  80. Y. Liu, J. Guo, G. Hu, and H. Zhu, "Gene prediction in metagenomic fragments based on the SVM algorithm," BMC Bioinformatics, vol. 14, no. Suppl 5, article ID. S12, 2013.
  81. C. Leslie, E. Eskin, and W. S. Noble, "The spectrum kernel: a string kernel for SVM protein classification," Pacific Symposium on Biocomputing, vol. 7, pp. 564-575, 2002.
  82. G. Ratsch, S. Sonnenburg, and B. Scholkopf, "RASE: recognition of alternatively spliced exons in C. elegans," Bioinformatics, vol. 21, no. Suppl 1, pp. i369-i377, 2005. https://doi.org/10.1093/bioinformatics/bti1053
  83. S. Sonnenburg, G. Rätsch, C. Schafer, and B. Scholkopf, "Large scale multiple kernel learning," Journal of Machine Learning Research, vol. 7, pp. 1531-1565, 2006.
  84. C. S. Leslie, E. Eskin, A. Cohen, J. Weston, and W. S. Noble, "Mismatch string kernels for discriminative protein classification," Bioinformatics, vol. 20, no. 4, pp. 467-476, 2004. https://doi.org/10.1093/bioinformatics/btg431
  85. P. Meinicke, M. Tech, B. Morgenstern, and R. Merkl, "Oligo kernels for data mining on biological sequences: a case study on prokaryotic translation initiation sites," BMC Bioinformatics, vol. 5, article ID. 169, 2004.
  86. D. Haussler, "Convolution kernels on discrete structures," University of California at Santa Cruz, CA, Technical Report UCS-CRL-99-10, 1999.
  87. L. Sun, H. Luo, D. Bu, G. Zhao, K. Yu, C. Zhang, Y. Liu, R. Chen, and Y. Zhao, "Utilizing sequence intrinsic composition to classify protein-coding and long non-coding transcripts," Nucleic Acids Research, vol. 41, no. 17, article ID. e166, 2013.
  88. L. Liao and W. S. Noble, "Combining pairwise sequence similarity and support vector machines for detecting remote protein evolutionary and structural relationships," Journal of Computational Biology, vol. 10, no. 6, pp. 857-868, 2003. https://doi.org/10.1089/106652703322756113
  89. H. Saigo, J. P. Vert, N. Ueda, and T. Akutsu, "Protein homology detection using string alignment kernels," Bioinformatics, vol. 20, no. 11, pp. 1682-1689, 2004. https://doi.org/10.1093/bioinformatics/bth141
  90. J. Vert, H. Saigo, and T. Akutsu, "Local alignment kernels for biological sequences," in Kernel Methods in Computational Biology, B. Scholkopf, K. Tsuda, and J. P. Vert, Eds. Cambridge, MA: MIT Press, 2004, pp. 131- 154.
  91. K. Tsuda, M. Kawanabe, G. Rtsch, S. Sonnenburg, and K. R. Muller, "A new discriminative kernel from probabilistic models," Neural Computation, vol. 14, no. 10, pp. 2397-2414, 2002. https://doi.org/10.1162/08997660260293274
  92. M. Seeger, "Covariance kernels from Bayesian generative models," in Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press, 2002, pp. 905-912.
  93. K. Tsuda, T. Kin, and K. Asai, "Marginalized kernels for biological sequences," Bioinformatics, vol. 18, no. Suppl 1, pp. S268-S275, 2002. https://doi.org/10.1093/bioinformatics/18.suppl_1.S268
  94. G. Schweikert, A. Zien, G. Zeller, J. Behr, C. Dieterich, C. S. Ong, et al., "mGENE: accurate svm-based gene finding with an application to nematode genomes," Genome Research, vol. 19, no. 11, pp. 2133-2143, 2009. https://doi.org/10.1101/gr.090597.108
  95. U. Kamath, K. De Jong, and A. Shehu, "Effective automated feature construction and selection for classification of biological sequences," PLoS ONE, vol. 9, no. 7, article ID. e99982, 2014.
  96. R. Zhang and C. T. Zhang, "Z curves, an intuitive tool for visualizing and analyzing the DNA sequences," Journal of Biomolecular Structure and Dynamics, vol. 11, no. 4, pp. 767-782, 1994. https://doi.org/10.1080/07391102.1994.10508031
  97. S. Schwartz, W. J. Kent, A. Smit, Z. Zhang, R. Baertsch, R. C. Hardison, D. Haussler, and W. Miller, "Humanmouse alignments with BLASTZ," Genome Research, vol. 13, no. 1, pp. 103-107, 2003. https://doi.org/10.1101/gr.809403
  98. S. F. Altschul, T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman, "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs," Nucleic Acids Research, vol. 25, no. 17, pp. 3389-3402, 1997. https://doi.org/10.1093/nar/25.17.3389
  99. B. Ma, J. Tromp, and M. Li, "PatternHunter: faster and more sensitive homology search," Bioinformatics, vol. 18, no. 3, pp. 440-445, 2002. https://doi.org/10.1093/bioinformatics/18.3.440
  100. M. Chaisson and G. Tesler, "Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory," BMC Bioinformatics, vol. 13, no. 1, article ID. 238, 2012.
  101. T. Wiehe, S. Gebauer-Jung, T. Mitchell-Olds, and R. Guigo, "SGP-1: prediction and validation of homologous genes based on sequence alignments," Genome Research, vol. 11, no. 9, pp. 1574-1583, 2001. https://doi.org/10.1101/gr.177401
  102. R. Guigo, E. T. Dermitzakis, P. Agarwal, C. P. Ponting, G. Parra, A. Reymond, et al., "Comparison of mouse and human genomes followed by experimental verification yields an estimated 1,019 additional genes," Proceedings of the National Academy of Sciences, vol. 100, no. 3, pp. 1140-1145, 2003.
  103. S. Batzoglou, L. Pachter, J. P. Mesirov, B. Berger, and E. S. Lander, "Human and mouse gene structure: comparative analysis and application to exon prediction," Genome Research, vol. 10, no. 7, pp. 950-958, 2000. https://doi.org/10.1101/gr.10.7.950
  104. S. Kurtz, A. Phillippy, A. Delcher, M. Smoot, M. Shumway, C. Antonescu, and S. Salzberg, "Versatile and open software for comparing large genomes," Genome Biology, vol. 5, no. 2, article ID. R12, 2004.
  105. R. A. Cartwright, "Ngila: global pairwise alignments with logarithmic and affine gap costs," Bioinformatics, vol. 23, no. 11, pp. 1427-1428, 2007. https://doi.org/10.1093/bioinformatics/btm095
  106. V. Bafna and D. H. Huson, "The conserved exon method for gene finding," in Proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology, La Jolla, CA, 2000, pp. 3-12.
  107. P. S. Novichkov, M. S. Gelfand, and A. A. Mironov, "Gene recognition in eukaryotic DNA by comparison of genomic sequences," Bioinformatics, vol. 17, no. 11, pp. 1011-1018, 2001. https://doi.org/10.1093/bioinformatics/17.11.1011
  108. P. Blayo, P. Rouzé, and M. F. Sagot, "Orphan gene finding: an exon assembly approach," Theoretical Computer Science, vol. 290, no. 3, pp. 1407-1431, 2003. https://doi.org/10.1016/S0304-3975(02)00043-9
  109. S. Altschul, W. Gish, W. Miller, E. Myers, and D. Lipman, "Basic local alignment search tool," Journal of Molecular Biology, vol. 215, no. 3, pp. 403-410, 1990. https://doi.org/10.1016/S0022-2836(05)80360-2
  110. X. Huang, M. D. Adams, H. Zhou, and A. R. Kerlavage, "A tool for analyzing and annotating genomic sequences," Genomics, vol. 46, no. 1, pp. 37-45, 1997. https://doi.org/10.1006/geno.1997.4984
  111. L. Florea, G. Hartzell, Z. Zhang, G. M. Rubin, and W. Miller, "A computer program for aligning a cDNA sequence with a genomic DNA sequence," Genome Research, vol. 8, no. 9, pp. 967-974, 1998.
  112. S. J. Wheelan, D. M. Church, and J. M. Ostell, "Spidey: a tool for mRNA-to-genomic alignments," Genome Research, vol. 11, no. 11, pp. 1952-1957, 2001. https://doi.org/10.1101/gr.195301
  113. Y. Fukunishi, H. Suzuki, M. Yoshino, H. Konno, and Y. Hayashizaki, "Prediction of human cDNA from its homologous mouse full-length cDNA and human shotgun database," FEBS Letters, vol. 464, no. 3, pp. 129- 132, 1999. https://doi.org/10.1016/S0014-5793(99)01696-8
  114. J. Jiang and H. J. Jacob, "EbEST: an automated tool using expressed sequence tags to delineate gene structure," Genome Research, vol. 8, no. 3, pp. 268-275, 1998.
  115. R. Mott, "EST-GENOME: a program to align spliced DNA sequences to unspliced genomic DNA," Computer Applications in the Biosciences (CABIOS), vol. 13, no. 4, pp. 477-478, 1997.
  116. Z. Kan, E. C. Rouchka, W. R. Gish, and D. J. States, "Gene structure prediction and alternative splicing analysis using genomically aligned ESTs," Genome Research, vol. 11, no. 5, pp. 889-900, 2001. https://doi.org/10.1101/gr.155001
  117. X. J. Min, G. Butler, R. Storms, and A. Tsang, "OrfPredictor: predicting protein-coding regions in EST-derived sequences," Nucleic Acids Research, vol. 33, no. Suppl 2, pp. W677-W680, 2005. https://doi.org/10.1093/nar/gki394
  118. M. L. Metzker, "Sequencing technologies the next generation," Nature Reviews Genetics, vol. 11, no. 1, pp. 31- 46, 2010. https://doi.org/10.1038/nrg2626
  119. O. Keller, M. Kollmar, M. Stanke, and S. Waack, "A novel hybrid gene prediction method employing protein multiple sequence alignments," Bioinformatics, vol. 27, no. 6, pp. 757-763, 2011. https://doi.org/10.1093/bioinformatics/btr010
  120. S. Washietl, S. Findeiss, S. A. Müller, S. Kalkhof, M. von Bergen, I. L. Hofacker, P. F. Stadler, and N. Goldman, "RNAcode: robust discrimination of coding and noncoding regions in comparative sequence data," RNA, vol. 17, no. 4, p. 578-594, 2011. https://doi.org/10.1261/rna.2536111
  121. L. Wang, H. J. Park, S. Dasari, S. Wang, J. P. Kocher, and W. Li, "CPAT: coding-potential assessment tool using an alignment-free logistic regression model," Nucleic Acids Research, vol. 41, no. 6, article ID. e74, 2013.
  122. W. Trimble, K. Keegan, M. D'Souza, A. Wilke, J. Wilkening, J. Gilbert, and F. Meyer, "Short-read readingframe predictors are not created equal: sequence error causes loss of signal," BMC Bioinformatics, vol. 13, no. 1, article ID. 183, 2012.
  123. N. E. Castellana, S. H. Payne, Z. Shen, M. Stanke, V. Bafna, and S. P. Briggs, "Discovery and revision of arabidopsis genes by proteogenomics," Proceedings of the National Academy of Sciences, vol. 105, no. 52, pp. 21034-21038, 2008.
  124. J. Usuka and V. Brendel, "Gene structure prediction by spliced alignment of genomic DNA with protein sequences: increased accuracy by differential splice site scoring," Journal of Molecular Biology, vol. 297, no. 5, pp. 1075-1085, 2000. https://doi.org/10.1006/jmbi.2000.3641
  125. E. Birney, M. Clamp, and R. Durbin, "GeneWise and genomewise," Genome Research, vol. 14, no. 5, p. 988- 995, 2004. https://doi.org/10.1101/gr.1865504
  126. I. B. Rogozin, L. Milanesi, and N. A. Kolchanov, "Gene structure prediction using information on homologous protein sequence," Computer Applications in the Biosciences (CABIOS), vol. 12, no. 3, pp. 161-170, 1996.
  127. O. Gotoh, "Homology-based gene structure prediction: simplified matching algorithm using a translated codon (tron) and improved accuracy by allowing for long gaps," Bioinformatics, vol. 16, no. 3, pp. 190-202, 2000. https://doi.org/10.1093/bioinformatics/16.3.190
  128. S. Hunter, R. Apweiler, T. K. Attwood, A. Bairoch, A. Bateman, D. Binns, et al., "Interpro: the integrative protein signature database," Nucleic Acids Research, vol. 37, no. Suppl 1, pp. D211-D215, 2009. https://doi.org/10.1093/nar/gkn785
  129. M. O. Dayhoff and R. M. Schwartz, "A model of evolutionary change in proteins," Atlas of Protein Sequence and Structure, vol. 5, pp. 345-252, 1978.
  130. S. Henikoff and J. G. Henikoff, "Amino acid substitution matrices from protein blocks," Proceedings of the National Academy of Sciences, vol. 89, no. 22, pp. 10915-10919, 1992.
  131. J. Wu, "Improving the specificity of exon prediction using comparative genomics," BMC Genomics, vol. 9, no. Suppl 2, article ID. S13, 2008.
  132. M. S. Gelfand, A. A. Mironov, and P. A. Pevzner, "Gene recognition via spliced sequence alignment." Proceedings of the National Academy of Sciences, vol. 93, no. 17, pp. 9061-9066, 1996.
  133. M. Stanke, A. Tzvetkova, and B. Morgenstern, "AUGUSTUS at EGASP: using EST, protein and genomic alignments for improved gene prediction in the human genome," Genome Biology, vol. 7, no. Suppl 1, article ID. S11, 2006.
  134. Y. Xu and E. C. Uberbacher, "Gene prediction by pattern recognition and homology search," in Proceeding of the 4th International Conference on Intelligent Systems for Molecular Biology, St. Louis, MO, 1996, pp. 241-251.
  135. Y. Cai and P. Bork, "Homology-based gene prediction using neural nets," Analytical Biochemistry, vol. 265, no. 2, pp. 269-274, 1998. https://doi.org/10.1006/abio.1998.2876
  136. D. Rose, M. Hiller, K. Schutt, J. Hackermller, R. Backofen, and P. F. Stadler, "Computational discovery of human coding and non-coding transcripts with conserved splice sites," Bioinformatics, vol. 27, no. 14, pp. 1894-1900, 2011. https://doi.org/10.1093/bioinformatics/btr314
  137. J. E. Allen and S. L. Salzberg, "JIGSAW: integration of multiple sources of evidence for gene prediction," Bioinformatics, vol. 21, no. 18, pp. 3596-3603, 2005. https://doi.org/10.1093/bioinformatics/bti609
  138. R. Guigo, P. Flicek, J. Abril, A. Reymond, J. Lagarde, F. Denoeud, et al., "EGASP: the human ENCODE genome annotation assessment project," Genome Biology, vol. 7, no. Suppl 1, article ID. S2, 2006.
  139. L. Pachter, M. Alexandersson, and S. Cawley, "Applications of generalized pair hidden Markov models to alignment and gene finding problems," Journal of Computational Biology, vol. 9, no. 2, pp. 389-399, 2002. https://doi.org/10.1089/10665270252935520
  140. T. Larsen and A. Krogh, "EasyGene: a prokaryotic gene finder that ranks ORFs by statistical significance," BMC Bioinformatics, vol. 4, article ID. 21, 2003.
  141. G. Parra, P. Agarwal, J. F. Abril, T. Wiehe, J. W. Fickett, and R. Guigo, "Comparative gene prediction in human and mouse," Genome Research, vol. 13, no. 1, pp. 108-117, 2003. https://doi.org/10.1101/gr.871403
  142. R. A. Tesorero, N. Yu, J. O. Wright, J. P. Svencionis, Q. Cheng, J. H. Kim, and K. H. Cho, "Novel regulatory small RNAs in Streptococcus pyogenes," PLoS ONE, vol. 8, no. 6, article ID. e64021, 2013.
  143. Y. Zhou, Y. Liang, C. Hu, L. Wang, and X. Shi, "An artificial neural network method for combining gene prediction based on equitable weights," Neurocomputing, vol. 71, no. 4-6, pp. 538-543, 2008. https://doi.org/10.1016/j.neucom.2007.07.019
  144. A. Krogh, "Two methods for improving performance of an HMM and their application for gene finding," in Proceedings of the 5th International Conference on Intelligent Systems for Molecular Biology, Chalkidiki, Greece, 1997, pp. 179-186.
  145. A. L. Delcher, D. Harmon, S. Kasif, O. White, and S. L. Salzberg, "Improved microbial gene identification with GLIMMER," Nucleic Acids Research, vol. 27, no. 23, pp. 4636-4641, 1999. https://doi.org/10.1093/nar/27.23.4636
  146. M. Burset and R. Guigo, "Evaluation of gene structure prediction programs," Genomics, vol. 34, no. 3, pp. 353-367, 1996. https://doi.org/10.1006/geno.1996.0298
  147. J. Nasiri, M. Naghavi, S. N. Rad, T. Yolmeh, M. Shirazi, R. Naderi, M. Nasiri, and S. Ahmadi, "Gene identification programs in bread wheat: a comparison study," Nucleosides, Nucleotides and Nucleic Acids, vol. 32, no. 10, pp. 529-554, 2013. https://doi.org/10.1080/15257770.2013.832773
  148. W. Kent, C. Sugnet, T. Furey, K. Roskin, T. Pringle, A. Zahler, and D. Haussler, "UCSC genome browser," Genome Research, vol. 12, no. 6, pp. 996-1006, 2002. https://doi.org/10.1101/gr.229102
  149. A. Coghlan, T. J. Fiedler, S. J. McKay, P. Flicek, T. W. Harris, D. Blasiar, and L. D. Stein, "nGASP: the nematode genome annotation assessment project," BMC Bioinformatics, vol. 9, article ID. 549, 2008.
  150. C. elegans Sequencing Consortium, "Genome sequence of the nematode C. elegans: a platform for investigating biology," Science, vol. 282, no. 5396, pp. 2012-2018, 1998. https://doi.org/10.1126/science.282.5396.2012
  151. N. Chen, T. W. Harris, I. Antoshechkin, C. Bastiani, T. Bieri, D. Blasiar, et al., "WormBase: a comprehensive data resource for Caenorhabditis biology and genomics," Nucleic Acids Research, vol. 33, no. Suppl 1, pp. D383-D389, 2005.
  152. A. Rogers, I. Antoshechkin, T. Bieri, D. Blasiar, C. Bastiani, P. Canaran, et al., "WormBase 2007," Nucleic Acids Research, vol. 36, no. Suppl 1, pp. D612-D617, 2008.
  153. T. Steijger, J. F. Abril, P. G. Engström, F. Kokocinski, RGASP Consortium, T. J. Hubbard, R. Guigo, J. Harrow, and P. Bertone, "Assessment of transcript reconstruction methods for RNA-seq," Nature Methods, vol. 10, no. 12, pp. 1177-1184, 2013. https://doi.org/10.1038/nmeth.2714
  154. M. Vilardell, G. Parra, and S. Civit, "WISCOD: a statistical web-enabled tool for the identification of significant protein coding regions," BioMed Research International, vol. 2014, article ID. 282343, 2014.
  155. G. St Laurent, D. Shtokalo, M. Tackett, Z. Yang, T. Eremina, C. Wahlestedt, et al., "Intronic RNAs constitute the major fraction of the noncoding RNA in mammalian cells," BMC Genomics, vol. 13, no. 1, article ID. 504, 2012.
  156. Y. Bai, J. Hassler, A. Ziyar, P. Li, Z. Wright, R. Menon, et al., "Novel bioinformatics method for identification of genome-wide non-canonical spliced regions using RNA-Seq data," PLoS ONE, vol. 9, no. 7, article ID. e100864, 2014.
  157. H. Wang, P. J. Chung, J. Liu, I. C. Jang, M. Kean, J. Xu, and N. H. Chua, "Genome-wide identification of long noncoding natural antisense transcripts and their responses to light in Arabidopsis," Genome Research, vol. 24, no. 3, pp. 444-453, 2014. https://doi.org/10.1101/gr.165555.113
  158. S. Spicuglia, M. A. Maqbool, D. Puthier, and J. C. Andrau, "An update on recent methods applied for deciphering the diversity of the noncoding RNA genome structure and function," Methods, vol. 63, no. 1, pp. 3-17, 2013. https://doi.org/10.1016/j.ymeth.2013.04.003
  159. J. W. Nam and D. P. Bartel, "Long noncoding RNAs in C. elegans," Genome Research, vol. 22, no. 12, pp. 2529-2540, 2012. https://doi.org/10.1101/gr.140475.112
  160. R. Weikard, F. Hadlich, and C. Kuehn, "Identification of novel transcripts and noncoding RNAs in bovine skin by deep next generation sequencing," BMC Genomics, vol. 14, no. 1, article ID. 789, 2013.
  161. N. L. Barbosa-Morais, M. Irimia, Q. Pan, H. Y. Xiong, S. Gueroussov, L. J. Lee, et al., "The evolutionary landscape of alternative splicing in vertebrate species," Science, vol. 338, no. 6114, pp. 1587-1593, 2012. https://doi.org/10.1126/science.1230612
  162. Q. Pan, O. Shai, L. J. Lee, B. J. Frey, and B. J. Blencowe, "Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing," Nature Genetics, vol. 40, no. 12, pp. 1413-1415, 2008. https://doi.org/10.1038/ng.259
  163. H. Ohmiya, M. Vitezic, M. Frith, M. Itoh, P. Carninci, A. Forrest, et al., "RECLU: a pipeline to discover reproducible transcriptional start sites and their alternative regulation using capped analysis of gene expression (CAGE)," BMC Genomics, vol. 15, no. 1, article ID. 269, 2014.
  164. Y. Li, H. Li-Byarlay, P. Burns, M. Borodovsky, G. E. Robinson, and J. Ma, "TrueSight: a new algorithm for splice junction detection using RNA-seq," Nucleic Acids Research, vol. 41, no. 4, article ID. e51, 2013.
  165. P. D. Burns, Y. Li, J. Ma, and M. Borodovsky, "UnSplicer: mapping spliced RNA-seq reads in compact genomes and filtering noisy splicing," Nucleic Acids Research, vol. 42, no. 4, article ID. e25, 2014.
  166. M. Hiller, S. Agarwal, J. H. Notwell, R. Parikh, H. Guturu, A. M. Wenger, and G. Bejerano, "Computational methods to detect conserved non-genic elements in phylogenetically isolated genomes: application to zebrafish," Nucleic Acids Research, vol. 41, no. 15, article ID. e151, 2013.
  167. S. Lertampaiporn, C. Thammarongtham, C. Nukoolkit, B. Kaewkamnerdpong, and M. Ruengjitchatchawalya, "Identification of non-coding RNAs with a new composite feature in the hybrid random forest ensemble algorithm," Nucleic Acids Research, vol. 42, no. 11, article ID. e93, 2014.
  168. C. De Filippo, M. Ramazzotti, P. Fontana, and D. Cavalieri, "Bioinformatic approaches for functional annotation and pathway inference in metagenomics data," Briefings in Bioinformatics, vol. 13, no. 6, pp. 696-710, 2012. https://doi.org/10.1093/bib/bbs070
  169. H. Soueidan and M. Nikolski, "Machine learning for metagenomics: methods and tools," Oct. 2015; http://arxiv.org/pdf/1510.06621v1.pdf.
  170. E. Wijaya, M. C. Frith, P. Horton, and K. Asai, "Finding protein-coding genes through human polymorphisms," PLoS ONE, vol. 8, no. 1, article ID. e54210, 2013.
  171. M. Rho, H. Tang, and Y. Ye, "FragGeneScan: predicting genes in short and error-prone reads," Nucleic Acids Research, vol. 38, no. 20, article ID. e191, 2010.
  172. D. Hyatt, G. L. Chen, P. F. LoCascio, M. L. Land, F. W. Larimer, and L. J. Hauser, "Prodigal: prokaryotic gene recognition and translation initiation site identification," BMC Bioinformatics, vol. 11, article ID. 119, 2010.
  173. H. Noguchi, T. Taniguchi, and T. Itoh, "MetaGeneAnnotator: detecting species-specific patterns of ribosomal binding site for precise gene prediction in anonymous prokaryotic and phage genomes," DNA Research, vol. 15, no. 6, pp. 387-396, 2008. https://doi.org/10.1093/dnares/dsn027
  174. F. S. Collins, L. D. Brooks, and A. Chakravarti, "A DNA polymorphism discovery resource for research on human genetic variation," Genome Research, vol. 8, no. 12, pp. 1229-1231, 1998. https://doi.org/10.1101/gr.8.12.1229
  175. S. J. Lee, K. A. Usmani, B. Chanas, B. Ghanayem, T. Xi, E. Hodgson, H. W. Mohrenweiser, and J. A. Goldstein, "Genetic findings and functional studies of human CYP3A5 single nucleotide polymorphisms in different ethnic groups," Pharmacogenetics, vol. 13, no. 8, pp. 461-472, 2003. https://doi.org/10.1097/00008571-200308000-00004
  176. N. Elango and S. V. Yi, "Functional relevance of CpG island length for regulation of gene expression," Genetics, vol. 187, no. 4, pp. 1077-1083, 2011. https://doi.org/10.1534/genetics.110.126094
  177. P. Deininger, "Alu elements: know the SINEs," Genome Biology, vol. 12, no. 12, article ID. 236, 2011.
  178. B. Hutter, V. Helms, and M. Paulsen, "Tandem repeats in the CpG islands of imprinted genes," Genomics, vol. 88, no. 3, pp. 323-332, 2006. https://doi.org/10.1016/j.ygeno.2006.03.019
  179. A. L. Brunner, D. S. Johnson, S. W. Kim, A. Valouev, T. E. Reddy, N. F. Neff, et al., "Distinct DNA methylation patterns characterize differentiated human embryonic stem cells and developing human fetal liver," Genome Research, vol. 19, no. 6, pp. 1044-1056, 2009. https://doi.org/10.1101/gr.088773.108
  180. H. Wu, B. Caffo, H. A. Jaffee, R. A. Irizarry, and A. P. Feinberg, "Redefining CpG islands using hidden Markov models," Biostatistics, vol. 11, no. 3, pp. 499-514, 2010. https://doi.org/10.1093/biostatistics/kxq005
  181. N. Yu, X. Guo, A. Zelikovsky, and Y. Pan, "GaussianCpG: a Gaussian model for detection of human CpG island," in Proceedings of IEEE 5th International Conference on Computational Advances in Bio and Medical Sciences (ICCABS), Miami, FL, 2015.
  182. L. Deng and D. Yu, "Deep learning: methods and applications," May 2014; http://research.microsoft.com/apps/pubs/default.aspx?id=209355.
  183. I. Wallach, M. Dzamba, and A. Heifets, "AtomNet: a deep convolutional neural network for bioactivity prediction in structure-based drug discovery," Oct. 2015; http://arxiv.org/pdf/1510.02855v1.pdf.
  184. B. Ramsundar, S. Kearnes, P. Riley, D. Webster, D. Konerding, and V. Pande, "Massively multitask networks for drug discovery," Feb. 2015; http://arxiv.org/pdf/1502.02072v1.pdf.
  185. D. Chicco, P. Sadowski, and P. Baldi, "Deep autoencoder neural networks for gene ontology annotation predictions," in Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (BCB'14), Washington, DC, 2014, pp. 533-540.
  186. R. Raina, A. Madhavan, and A. Y. Ng, "Large-scale deep unsupervised learning using graphics processors," in Proceedings of the 26th Annual International Conference on Machine Learning (ICML'09), Montreal, QC, 2009, pp. 873-880.
  187. X. Guo, Y. Meng, N. Yu, and Y. Pan, "Cloud computing for detecting high-order genome-wide epistatic interaction via dynamic clustering," BMC Bioinformatics, vol. 15, no. 1, article ID. 102, 2014.
  188. T. H. Chang, S. L. Wu, W. J. Wang, J. T. Horng, and C. W. Chang, "A novel approach for discovering condition-specific correlations of gene expressions within biological pathways by using cloud computing technology," BioMed Research International, vol. 2014, article ID. 18, 2014.
  189. X. Guo, N. Yu, B. Li, and Y. Pan, "Cloud computing for NGS data analysis," in Computational Methods for Next Generation Sequencing Data Analysis. Hoboken, NJ: Wiley, 2016.
  190. J. Yee, M. S. Kwon, T. Park, and M. Park, "A modified entropy-based approach for identifying gene-gene interactions in case-control study," PLoS ONE, vol. 8, no. 7, article ID. e69321, 2013.
  191. A. Motahari, G. Bresler, and D. Tse, "Information theory of DNA shotgun sequencing," IEEE Transactions on Information Theory, vol. 59, no. 10, pp. 6273-6289, 2013. https://doi.org/10.1109/TIT.2013.2270273
  192. A. Ghosh and R. K. De, "A fuzzy entropy based approach for development of gene prediction networks (GPNs): detecting altered dependency in carcinogenic state," in Proceedings of the 2nd ACM Conference on Bioinformatics, Computational Biology and Biomedicine (BCB'11), Chicago, IL, 2011, pp. 320-324.
  193. L. Galleani and R. Garello, "The minimum entropy mapping spectrum of a DNA sequence," IEEE Transactions on Information Theory, vol. 56, no. 2, pp. 771-783, 2010. https://doi.org/10.1109/TIT.2009.2037041
  194. Z. Ouyang, H. Zhu, J. Wang, and Z. S. She, "Multivariate entropy distance method for prokaryotic gene identification," Journal of Bioinformatics and Computational Biology, vol. 2, no. 2, pp. 353-373, 2004. https://doi.org/10.1142/S0219720004000624
  195. S. Zhu, D. Wang, K. Yu, T. Li, and Y. Gong, "Feature selection for gene expression using model-based entropy," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 25-36, 2010. https://doi.org/10.1109/TCBB.2008.35
  196. P. Ramachandran and A. Antoniou, "Identification of hot-spot locations in proteins using digital filters," IEEE Journal of Selected Topics in Signal Processing, vol. 2, no. 3, pp. 378-389, 2008. https://doi.org/10.1109/JSTSP.2008.923850
  197. M. Sardaraz, M. Tahir, A. A. Ikram, and H. Bajwa, "SeqCompress: an algorithm for biological sequence compression," Genomics, vol. 104, no. 4, pp. 225-228, 2014. https://doi.org/10.1016/j.ygeno.2014.08.007
  198. L. Krause, A. C. McHardy, T. W. Nattkemper, A. Pühler, J. Stoye, and F. Meyer, "GISMO: gene identification using a support vector machine for ORF classification," Nucleic Acids Research, vol. 35, no. 2, pp. 540-549, 2007. https://doi.org/10.1093/nar/gkl1083
  199. K. Vervier, P. Mathé, M. Tournoud, J. B. Veyrieras, and J. P. Vert, "Large-scale machine learning for metagenomics sequence classification," Bioinformatics, 2015, http://dx.doi.org/10.1093/bioinformatics/btv683.
  200. M. Welling, "Are machine learning and statistics complementary?" Dec. 2015; https://www.ics.uci.edu/~welling/publications/papers/WhyMLneedsStatistics.pdf.

Cited by

  1. Investigating Apache Hama: a bulk synchronous parallel computing framework vol.73, pp.9, 2017, https://doi.org/10.1007/s11227-017-1987-9
  2. Word clustering based on POS feature for efficient twitter sentiment analysis vol.8, pp.1, 2018, https://doi.org/10.1186/s13673-018-0140-y