Automatic extraction of similar poetry for study of literary texts: An experiment on Hindi poetry

Prakash, Amit;Singh, Niraj Kumar;Saha, Sujan Kumar;

doi:10.4218/etrij.2019-0396

ETRI Journal

Volume 44, Issue 3 / Pages 413-425 / 2022 / 1225-6463 (pISSN) / 2233-7326 (eISSN)

Electronics and Telecommunications Research Institute (한국전자통신연구원)

Prakash, Amit (Department of Computer Science and Engineering, Birla Institute of Technology) ;
Singh, Niraj Kumar (Department of Computer Science and Engineering, Birla Institute of Technology) ;
Saha, Sujan Kumar (Department of Computer Science and Engineering, Birla Institute of Technology)

Received: 2019.08.20
Accepted: 2022.03.15
Published: 2022.06.10

Abstract

The study of literary texts is one of the earliest disciplines practiced around the globe. Poetry is artistic writing in which words are carefully chosen and arranged for their meaning, sound, and rhythm. It usually carries a broad and profound sense that makes it difficult to interpret, even for humans. The essence of poetry is Rasa, which signifies mood or emotion. In this paper, we propose a poetry classification-based approach to automatically extract similar poems from a repository. Specifically, we perform a novel Rasa-based classification of Hindi poetry. For this task, we primarily used lexical features in a bag-of-words model trained with a support vector machine classifier. The model also employed Hindi WordNet, latent semantic indexing, and Word2Vec-based neural word embeddings. To extract rich feature vectors, we prepared a repository of 37 717 poems collected from various sources. We evaluated the system on a manually constructed dataset of 945 Hindi poems. Experimental results demonstrated that the proposed model attained satisfactory performance.
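
The classification-based extraction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a tiny toy corpus in which English placeholder words stand in for Hindi tokens, two illustrative Rasa labels ("karuna" and "veera") chosen only for the example, and a hand-written linear support vector machine trained by subgradient descent on the hinge loss. Hindi WordNet, latent semantic indexing, and Word2Vec features are left out. All names (`TRAIN`, `REPOSITORY`, `similar_poems`, and so on) are hypothetical.

```python
from collections import Counter

# Toy training poems: English placeholder words stand in for Hindi tokens.
# Labels are illustrative assumptions: -1 = "karuna" (pathos), +1 = "veera" (heroism).
TRAIN = [
    ("tears sorrow grief night", -1),
    ("grief tears lonely farewell", -1),
    ("sword battle courage victory", +1),
    ("courage warrior battle flag", +1),
]
LABEL_NAMES = {-1: "karuna", +1: "veera"}
VOCAB = sorted({word for text, _ in TRAIN for word in text.split()})


def bag_of_words(text):
    """Map a poem to word counts over VOCAB; out-of-vocabulary words are ignored."""
    counts = Counter(text.split())
    return [float(counts[word]) for word in VOCAB]


def score(w, b, x):
    """Signed distance-like score of vector x under the linear model (w, b)."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, epochs=20, lr=0.1, lam=0.01):
    """Binary linear SVM: subgradient descent on the L2-regularised hinge loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * score(w, b, xi)
            w = [wj * (1.0 - lr * lam) for wj in w]  # L2 shrinkage step
            if margin < 1.0:  # hinge loss is active for this example
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def predict(w, b, text):
    """Predict -1 or +1 for a raw poem string."""
    return 1 if score(w, b, bag_of_words(text)) >= 0 else -1


W, B = train_linear_svm([bag_of_words(t) for t, _ in TRAIN], [y for _, y in TRAIN])


def similar_poems(query, repository):
    """Return the repository poems whose predicted class matches the query's."""
    label = predict(W, B, query)
    return [poem for poem in repository if predict(W, B, poem) == label]


REPOSITORY = [
    "lonely night of tears",
    "warrior with sword and flag",
    "farewell and sorrow",
]
print(similar_poems("grief in the night", REPOSITORY))
```

On this toy data the query is assigned to the pathos-like class, so the two repository poems that share that predicted label are returned. A realistic system would need a multi-class classifier, Hindi tokenization, and the richer features named in the abstract.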

Acknowledgement

The authors would like to thank the associate editor and the anonymous reviewers for their valuable comments and suggestions, which improved the quality of this paper.
