Semi-Supervised Recursive Learning of Discriminative Mixture Models for Time-Series Classification

  • Kim, Minyoung (Department of Electronics & IT Media Engineering, Seoul National University of Science & Technology)
  • Received : 2013.01.09
  • Accepted : 2013.06.07
  • Published : 2013.09.25

Abstract

We pose pattern classification as a density estimation problem, considering mixtures of generative models in partially labeled data settings. Unlike traditional approaches that estimate the density everywhere in the data space, we focus on the density along the decision boundary, which can yield more discriminative models with superior classification performance. We extend our earlier work on recursive estimation of discriminative mixture models to semi-supervised learning setups, where some of the data points lack class labels. Our model exploits the mixture structure within the functional gradient framework: it greedily searches for the base mixture component model that maximizes the conditional class likelihood of the labeled data while minimizing the uncertainty of class label prediction for the unlabeled data points. This objective can be cast as learning individual mixture components on weighted data, so mixture learning is typically highly efficient for popular base generative models such as Gaussians or hidden Markov models. Moreover, compared with the expectation-maximization (EM) algorithm, the proposed recursive estimation has several advantages, including no need to fix the mixture order in advance and robustness to the choice of initial parameters. We demonstrate the benefits of the proposed approach through a comprehensive set of evaluations on diverse time-series classification problems in semi-supervised scenarios.
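The greedy recursive step described above can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes 1-D Gaussian class-conditional mixtures, Shannon entropy as the measure of prediction uncertainty on unlabeled points, and an objective J = Σ_labeled log P(y_i | x_i) − λ Σ_unlabeled H(P(· | x_j)) with a hypothetical trade-off weight `lam`. Each step computes per-point weights from the functional gradient of J with respect to the class density p(x | c), scaled by p(x | c) and clipped at zero (an assumed heuristic in the spirit of boosted density estimation), fits one new Gaussian to the weighted data, and mixes it in as p_new = (1 − α) p_old + α f, keeping the class and the α (from a fixed grid) that most increase J. All function names are illustrative.

```python
import math

def gauss(x, mu, var):
    """Density of a 1-D Gaussian N(mu, var) at x."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

def mix_density(x, comps):
    """Class-conditional mixture p(x|c); comps is a list of (weight, mu, var)."""
    return max(sum(w * gauss(x, mu, v) for w, mu, v in comps), 1e-300)

def posteriors(x, models, priors):
    """Class posteriors P(c|x), proportional to prior_c * p(x|c)."""
    joint = [pi * mix_density(x, m) for pi, m in zip(priors, models)]
    z = sum(joint)
    return [j / z for j in joint]

def entropy(q):
    """Shannon entropy of a discrete distribution q."""
    return -sum(qc * math.log(qc) for qc in q if qc > 0)

def objective(labeled, unlabeled, models, priors, lam):
    """J = sum_labeled log P(y|x) - lam * sum_unlabeled H(P(.|x)) (assumed form)."""
    ll = sum(math.log(max(posteriors(x, models, priors)[y], 1e-300))
             for x, y in labeled)
    ent = sum(entropy(posteriors(x, models, priors)) for x in unlabeled)
    return ll - lam * ent

def component_weights(c, labeled, unlabeled, models, priors, lam):
    """Clipped p(x|c) * dJ/dp(x|c) at every data point (hypothetical heuristic)."""
    pts, ws = [], []
    for x, y in labeled:
        q = posteriors(x, models, priors)
        # p_c * d log P(y|x) / d p_c = [c == y] - P(c|x)
        pts.append(x)
        ws.append(max((1.0 if y == c else 0.0) - q[c], 0.0))
    for x in unlabeled:
        q = posteriors(x, models, priors)
        # p_c * d(-lam * H) / d p_c = lam * P(c|x) * (log P(c|x) + H)
        g = lam * q[c] * (math.log(q[c]) + entropy(q)) if q[c] > 0 else 0.0
        pts.append(x)
        ws.append(max(g, 0.0))
    return pts, ws

def fit_weighted_gaussian(pts, ws, min_var=1e-3):
    """Weighted maximum-likelihood estimate of a 1-D Gaussian."""
    s = sum(ws)
    mu = sum(w * x for w, x in zip(ws, pts)) / s
    var = sum(w * (x - mu) ** 2 for w, x in zip(ws, pts)) / s
    return mu, max(var, min_var)

def greedy_step(labeled, unlabeled, models, priors, lam=0.5,
                alphas=(0.05, 0.1, 0.2, 0.3, 0.5)):
    """Try one new component per class; keep the candidate with the largest J."""
    best_j = objective(labeled, unlabeled, models, priors, lam)
    best = None
    for c in range(len(models)):
        pts, ws = component_weights(c, labeled, unlabeled, models, priors, lam)
        if sum(ws) <= 0:
            continue  # all clipped weights are zero for class c
        mu, var = fit_weighted_gaussian(pts, ws)
        for a in alphas:
            # p_new(x|c) = (1 - a) * p_old(x|c) + a * N(x; mu, var)
            new_c = [((1 - a) * w, m, v) for w, m, v in models[c]] + [(a, mu, var)]
            trial = models[:c] + [new_c] + models[c + 1:]
            j = objective(labeled, unlabeled, trial, priors, lam)
            if j > best_j:
                best_j, best = j, trial
    return (best if best is not None else models), best_j

if __name__ == "__main__":
    labeled = [(-3.0, 0), (-2.8, 0), (3.0, 0), (3.1, 0), (0.0, 1), (0.2, 1)]
    unlabeled = [-3.2, 2.9, -0.1, 0.1]
    models, priors = [[(1.0, 0.0, 9.0)], [(1.0, 0.0, 1.0)]], [0.5, 0.5]
    for _ in range(3):
        models, j = greedy_step(labeled, unlabeled, models, priors)
        print(round(j, 3), [len(m) for m in models])
```

Calling `greedy_step` repeatedly grows the class mixtures one component at a time, so the number of components is not fixed in advance; once no candidate improves J the model is returned unchanged, which gives a natural stopping rule. Because a candidate is accepted only when J strictly increases, the objective is non-decreasing across steps, regardless of how the initial components were chosen.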
