http://dx.doi.org/10.4218/etrij.2019-0396

Automatic extraction of similar poetry for study of literary texts: An experiment on Hindi poetry  

Prakash, Amit (Department of Computer Science and Engineering, Birla Institute of Technology)
Singh, Niraj Kumar (Department of Computer Science and Engineering, Birla Institute of Technology)
Saha, Sujan Kumar (Department of Computer Science and Engineering, Birla Institute of Technology)
Publication Information
ETRI Journal / v.44, no.3, 2022, pp. 413-425
Abstract
The study of literary texts is one of the earliest disciplines practiced around the globe. Poetry is artistic writing in which words are carefully chosen and arranged for their meaning, sound, and rhythm. It usually carries a broad and profound sense that makes it difficult to interpret, even for humans. The essence of poetry is Rasa, which signifies mood or emotion. In this paper, we propose a poetry classification-based approach to automatically extract similar poems from a repository. Specifically, we perform a novel Rasa-based classification of Hindi poetry. For this task, we primarily used lexical features in a bag-of-words model trained with a support vector machine classifier. In the model, we employed Hindi WordNet, Latent Semantic Indexing, and Word2Vec-based neural word embeddings. To extract rich feature vectors, we prepared a repository of 37 717 poems collected from various sources. We evaluated the performance of the system on a manually constructed dataset of 945 Hindi poems. Experimental results demonstrated that the proposed model attained satisfactory performance.
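The classify-then-retrieve idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes raw term counts as the bag-of-words features and swaps the support vector machine for a simple nearest-centroid classifier so that it runs on the Python standard library alone. It also omits Hindi WordNet, Latent Semantic Indexing, and Word2Vec. The function names, the class labels (`heroic`, `love`), and the toy repository are hypothetical placeholders, not data from the paper.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase, split on whitespace, and count terms."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_label(query_vec, labeled_vecs):
    """Nearest-centroid stand-in for the classifier (assumption, not an SVM).

    labeled_vecs is a list of (label, vector) pairs. Each class centroid is
    the unnormalized sum of its vectors; cosine similarity ignores scale.
    """
    centroids = {}
    for label, vec in labeled_vecs:
        centroids.setdefault(label, Counter()).update(vec)
    return max(centroids, key=lambda lab: cosine(query_vec, centroids[lab]))


def similar_poems(query, repository, top_k=2):
    """Predict the query's class, then rank same-class poems by similarity.

    repository is a list of (poem_id, label, text) tuples.
    """
    q = bag_of_words(query)
    vectors = [(pid, label, bag_of_words(text)) for pid, label, text in repository]
    label = predict_label(q, [(lab, vec) for _, lab, vec in vectors])
    scored = [(cosine(q, vec), pid) for pid, lab, vec in vectors if lab == label]
    scored.sort(key=lambda item: (-item[0], item[1]))  # best score first
    return label, [pid for _, pid in scored[:top_k]]


# Toy repository with made-up English placeholder text and labels.
REPOSITORY = [
    ("p1", "heroic", "sword battle courage warrior"),
    ("p2", "heroic", "warrior courage victory battle"),
    ("p3", "love", "love moon beloved night"),
    ("p4", "love", "beloved love spring flower"),
]

print(similar_poems("courage warrior sword", REPOSITORY))
# → ('heroic', ['p1', 'p2'])
```

Restricting the ranking to poems of the predicted class is what makes the retrieval classification-based: a poem that happens to share surface vocabulary with the query but belongs to a different class is never returned.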
Keywords
feature extraction; literary study; literary text extraction; poetry classification; word embedding