An Improved Automated Spectral Clustering Algorithm

  • Xiaodan Lv (Institute of Automotive Engineers, Hubei University of Automotive Technology)
  • Received : 2022.11.22
  • Accepted : 2023.03.11
  • Published : 2024.04.30

Abstract

This paper proposes an improved automated spectral clustering (IASC) algorithm that addresses a key limitation of the traditional spectral clustering (TSC) algorithm: its inability to determine the number of clusters automatically. First, a cluster number evaluation factor based on the optimal clustering principle is proposed. The algorithm iterates over candidate values of k and selects the value with the largest evaluation factor as the optimal number of clusters. Second, the IASC algorithm uses a density-sensitive distance to measure the similarity between sample points, which assigns a high similarity to points lying in the same high-density region. Third, to improve clustering accuracy, the IASC algorithm classifies the eigenvectors with a cosine angle classification method instead of K-means. The proposed algorithm is compared with six algorithms (K-means, fuzzy C-means, TSC, EIGENGAP, DBSCAN, and density peak) on six datasets. The results show that the IASC algorithm not only determines the number of clusters automatically but also achieves better clustering accuracy on both synthetic and UCI datasets.
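The three ingredients named in the abstract can be illustrated with a minimal sketch. Everything here is an assumption rather than the paper's method, because the abstract gives no formulas. The hop cost `rho**d - 1` with shortest-path accumulation is one form commonly used for density-sensitive distances. The `1/(1 + d)` similarity is an illustrative choice. `evaluation_factor` is a hypothetical caller-supplied function standing in for the paper's evaluation factor. `assign_by_cosine` is a generic largest-cosine assignment, not necessarily the paper's cosine angle classification method.

```python
import math


def density_sensitive_distances(points, rho=2.0):
    """All-pairs density-sensitive distances (assumed form, rho > 1).

    Each direct hop of Euclidean length d costs rho**d - 1, so one long jump
    across a sparse gap is dearer than many short hops through a dense region.
    """
    n = len(points)
    dist = [[rho ** math.dist(p, q) - 1.0 for q in points] for p in points]
    # Floyd-Warshall: cheapest path between every pair under the hop cost.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via = dist[i][k] + dist[k][j]
                if via < dist[i][j]:
                    dist[i][j] = via
    return dist


def similarity_from_distance(dist):
    """Map distances to similarities in (0, 1]; 1/(1 + d) is an assumed choice."""
    return [[1.0 / (1.0 + d) for d in row] for row in dist]


def select_cluster_number(candidates, evaluation_factor):
    """Return the candidate k whose evaluation factor is largest.

    evaluation_factor is a hypothetical callable k -> float supplied by the
    caller; the paper's actual factor is not reproduced here.
    """
    return max(candidates, key=evaluation_factor)


def assign_by_cosine(rows, directions):
    """Label each row with the index of the direction of largest cosine."""
    def cosine(u, v):
        norm = math.hypot(*u) * math.hypot(*v)
        return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0

    return [max(range(len(directions)), key=lambda c: cosine(row, directions[c]))
            for row in rows]
```

For four collinear points spaced one unit apart with `rho = 2`, the direct hop between the two end points costs 2^3 - 1 = 7, whereas the path through the two intermediate points costs 3 × (2^1 - 1) = 3. The density-sensitive distance between the end points is therefore 3, which shows how points connected through a dense region end up closer than their straight-line separation suggests.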

Acknowledgement

This study was supported by the 2022 Annual Scientific Research Plan Project of the Hubei Provincial Department of Education (No. B2022352).
