http://dx.doi.org/10.5351/KJAS.2016.29.6.1107

Analysis of massive data in astronomy  

Shin, Min-Su (Korea Astronomy and Space Science Institute)
Publication Information
The Korean Journal of Applied Statistics / v.29, no.6, 2016, pp. 1107-1116
Abstract
Recent astronomical survey observations have not only produced substantial amounts of data but also fundamentally changed the conventional methods of analyzing astronomical data. Both classical statistical inference and modern machine learning methods are used at every step of data analysis, from data calibration to the inference of physical models. Machine learning methods are becoming increasingly popular for classical problems of astronomical data analysis, driven by low-cost data acquisition with inexpensive large-scale detectors and by fast computer networks that allow large volumes of data to be shared. The analysis of big astronomical data commonly has to account for the effects of inhomogeneous spatial and temporal coverage. The growing size of the data requires parallel and distributed computing environments as well as machine learning algorithms. However, distributed data analysis systems have not been widely adopted for the general analysis of massive astronomical data. In astronomy, gathering adequate training data requires expensive observations, and training data are generally collected from multiple sources; therefore, semi-supervised and ensemble machine learning methods will become important for the analysis of big astronomical data.
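The argument that labeled training data are scarce and expensive is the usual motivation for semi-supervised learning. As a minimal sketch that is not taken from the article, self-training with a nearest-centroid classifier shows the basic idea: a few labeled objects seed the class centroids, and unlabeled objects are pseudo-labeled only when the assignment is unambiguous. The function names, the `margin` confidence rule, and the toy two-class feature values are hypothetical choices made for illustration; a real survey pipeline would use a stronger base classifier and calibrated probabilities.

```python
import math
from collections import defaultdict


def class_centroids(points, labels):
    """Return the mean feature vector of each class label."""
    sums = {}
    counts = defaultdict(int)
    for x, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    return {y: tuple(s / counts[y] for s in acc) for y, acc in sums.items()}


def self_train(labeled_x, labeled_y, unlabeled_x, margin=1.0, max_iter=10):
    """Self-training with a nearest-centroid base classifier (illustrative).

    In each round, an unlabeled point is pseudo-labeled only if it is closer
    to its nearest class centroid than to the runner-up by at least `margin`
    (a hypothetical confidence rule). Accepted points join the training set
    and the centroids are recomputed; the loop stops when nothing is accepted.
    Assumes the labeled set contains at least two classes.
    """
    xs, ys = list(labeled_x), list(labeled_y)
    pending = list(range(len(unlabeled_x)))
    pseudo = {}  # index into unlabeled_x -> pseudo-label
    for _ in range(max_iter):
        cents = class_centroids(xs, ys)
        accepted = []
        for idx in pending:
            x = unlabeled_x[idx]
            # (distance, label) pairs sorted from nearest to farthest centroid
            ranked = sorted((math.dist(x, c), y) for y, c in cents.items())
            if ranked[1][0] - ranked[0][0] >= margin:
                accepted.append((idx, ranked[0][1]))
        if not accepted:
            break
        for idx, y in accepted:
            pseudo[idx] = y
            xs.append(unlabeled_x[idx])
            ys.append(y)
        newly = {idx for idx, _ in accepted}
        pending = [i for i in pending if i not in newly]
    return pseudo, class_centroids(xs, ys)


# Toy 2-D feature vectors (made-up values standing in for, e.g., two colors).
labeled_x = [(0.0, 0.0), (5.0, 5.0)]
labeled_y = ["star", "galaxy"]
unlabeled_x = [(0.3, -0.2), (4.8, 5.1), (0.5, 0.4), (5.2, 4.7), (2.4, 2.6)]

pseudo, cents = self_train(labeled_x, labeled_y, unlabeled_x)
print(pseudo)  # -> {0: 'star', 1: 'galaxy', 2: 'star', 3: 'galaxy'}
```

The object at index 4 lies between the two classes and stays unlabeled instead of being forced into a class, which limits the propagation of early mistakes. An ensemble variant could, for example, accept a pseudo-label only when several independently trained classifiers agree; that extension is not shown here.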
Keywords
astronomical data; statistical inference; machine learning; parallel computing; distributed computing
Citations & Related Records
Times Cited By KSCI: 1