Performance Improvement of Deep Clustering Networks for Multi Dimensional Data

A Method for Improving the Performance of Deep Clustering Networks for Multidimensional Data

  • Lee, Hyunjin (Division of ICT Engineering, Korea Soongsil Cyber University)
  • Received : 2018.07.09
  • Accepted : 2018.07.19
  • Published : 2018.08.31

Abstract

Clustering is one of the most fundamental tasks in machine learning. Clustering performance depends on the distribution of the data, and it degrades as the number of samples or the number of dimensions grows. For this reason, we use a stacked autoencoder, a deep learning model, to reduce the dimensionality of the data and generate a feature vector that best represents the input. We use k-means, a widely used algorithm, for clustering. Since the reduced feature vectors are still multidimensional, we use cosine similarity in addition to the Euclidean distance to improve performance, treating each data point and each cluster center as a vector when computing their similarity. The deep clustering network, which combines a stacked autoencoder and k-means, re-trains the network whenever the k-means result changes. During re-training, the loss function of the stacked autoencoder and the loss function of k-means are combined to improve the performance and stability of the network. Experiments on benchmark image and document datasets empirically validate the effectiveness of the proposed algorithm.
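
The abstract does not state how the Euclidean distance and the cosine similarity are combined, nor how the two loss functions are weighted. The following is therefore only a minimal sketch under stated assumptions: the two measures are blended by a weighted sum, and the joint objective is the reconstruction error plus a weighted clustering cost. The weights `alpha` and `lam` and all function names are hypothetical. The stacked autoencoder itself is omitted, and its feature vectors and reconstructions are taken as given.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_similarity(u, v):
    """Cosine of the angle between u and v (0.0 if either vector is all zeros)."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)

def dissimilarity(z, center, alpha=0.5):
    """ASSUMED blend: alpha * Euclidean distance + (1 - alpha) * cosine distance."""
    cosine_distance = 1.0 - cosine_similarity(z, center)
    return alpha * euclidean(z, center) + (1.0 - alpha) * cosine_distance

def assign(features, centers, alpha=0.5):
    """Return, for every feature vector, the index of its closest center."""
    return [
        min(range(len(centers)), key=lambda k: dissimilarity(z, centers[k], alpha))
        for z in features
    ]

def combined_loss(inputs, reconstructions, features, centers, labels,
                  lam=0.1, alpha=0.5):
    """ASSUMED joint objective: mean squared reconstruction error
    plus lam times the mean clustering cost of the current assignment."""
    n = len(inputs)
    recon = sum(euclidean(x, r) ** 2 for x, r in zip(inputs, reconstructions)) / n
    cluster = sum(
        dissimilarity(z, centers[k], alpha) for z, k in zip(features, labels)
    ) / n
    return recon + lam * cluster

# Toy feature vectors standing in for the autoencoder's low-dimensional output.
features = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 1.1]]
centers = [[1.0, 0.0], [0.0, 1.0]]
labels = assign(features, centers)
print(labels)
```

In a full training loop, a change in `labels` between iterations would be the trigger for re-training the network on the combined loss, as the abstract describes. How the centers are updated under a blended measure is a further design choice that this sketch leaves open.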
