Discriminative Training of Sequence Taggers via Local Feature Matching

  • Kim, Minyoung (Department of Electronics and IT Media Engineering, Seoul National University of Science and Technology)
  • Received : 2014.08.26
  • Accepted : 2014.09.20
  • Published : 2014.09.25

Abstract

Sequence tagging is the task of predicting frame-wise labels for a given input sequence, and it has important applications in diverse domains. Probabilistic sequence models such as conditional random fields (CRFs) have achieved great success in a variety of these settings. However, conventional training methods such as maximum likelihood (ML) learning match global features between the empirical and model distributions rather than local features, even though it is the mismatch in local features that translates directly into frame-wise prediction errors. In this paper, we introduce a novel discriminative CRF learning algorithm that minimizes local feature mismatches. Unlike the overall data fitting that results from global feature matching in ML learning, our approach reduces the total error over all frames in a sequence. We also provide an efficient gradient-based learning method based on a gradient forward-backward recursion, which has the same computational complexity as ML learning. On several real-world sequence tagging problems, we empirically demonstrate that the proposed learning algorithm yields significantly more accurate predictions than standard estimators.
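To make the contrast between global and local feature matching concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a toy linear-chain CRF with hypothetical log-potentials `node` and `trans`, computes per-frame label marginals with the standard forward-backward recursion, and compares the ML gradient of a per-label bias feature (a sum over frames) with the frame-wise residuals from which that sum is built. The function names and the bias-only feature set are illustrative assumptions; the abstract does not specify the proposed objective at this level of detail.

```python
import math


def forward_backward(node, trans):
    """Per-frame label marginals p(y_t = k | x) for a linear-chain CRF.

    node[t][k]  : hypothetical log-potential of label k at frame t
    trans[j][k] : hypothetical log-potential of the transition j -> k
    """
    T, K = len(node), len(node[0])
    alpha = [[0.0] * K for _ in range(T)]  # mass of prefixes ending in k at t
    beta = [[1.0] * K for _ in range(T)]   # mass of suffixes following j at t
    alpha[0] = [math.exp(node[0][k]) for k in range(K)]
    for t in range(1, T):
        for k in range(K):
            alpha[t][k] = math.exp(node[t][k]) * sum(
                alpha[t - 1][j] * math.exp(trans[j][k]) for j in range(K))
    for t in range(T - 2, -1, -1):
        for j in range(K):
            beta[t][j] = sum(
                math.exp(trans[j][k] + node[t + 1][k]) * beta[t + 1][k]
                for k in range(K))
    Z = sum(alpha[T - 1])  # partition function
    return [[alpha[t][k] * beta[t][k] / Z for k in range(K)]
            for t in range(T)]


def feature_mismatch(marginals, labels):
    """Return (global residual per label, total absolute local residual).

    The local residual at frame t for label k is 1[y_t = k] - p(y_t = k | x).
    Summing it over frames gives the ML gradient for a per-label bias weight,
    which is the 'global' quantity that ML learning drives to zero.
    """
    T, K = len(marginals), len(marginals[0])
    local = [[(1.0 if labels[t] == k else 0.0) - marginals[t][k]
              for k in range(K)] for t in range(T)]
    global_residual = [sum(local[t][k] for t in range(T)) for k in range(K)]
    local_total = sum(abs(r) for row in local for r in row)
    return global_residual, local_total


if __name__ == "__main__":
    # Toy inputs (assumed values, chosen only for illustration).
    node = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.9]]
    trans = [[0.2, -0.1], [0.0, 0.4]]
    labels = [0, 1, 0]
    marginals = forward_backward(node, trans)
    global_residual, local_total = feature_mismatch(marginals, labels)
    print("global residual per label:", global_residual)
    print("total absolute local residual:", local_total)
```

Because the global residual adds signed per-frame terms, frames where the model overshoots can cancel frames where it undershoots, so the global mismatch can be small while the per-frame mismatch remains large.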
