Acknowledgments
We sincerely thank Mr Zahoor Ahmad Shora, Chief Editor of "Daily Roshni," for generously sharing raw data for our collection. We also thank the students and scholars of the Department of Urdu and Linguistics, Aligarh Muslim University, Aligarh, who helped generate topics and assess relevance for our text collection.
References
- A. Hardie, Developing a tag-set for automated part-of-speech tagging in Urdu, in Proc. Corpus Linguistics (Lancaster, UK), Mar. 2003.
- P. Baker et al., Corpus data for South Asian language processing, in Proc. Workshop South Asian Lang. Process. (EACL), (Budapest, Hungary), Apr. 2003, pp. 1-8.
- K. Riaz, Baseline for Urdu IR evaluation, in Proc. ACM Workshop Improving Non-English Web Searching (Napa Valley, CA, USA), Oct. 2008, pp. 97-100.
- A. Daud, W. Khan, and D. Che, Urdu language processing: A survey, Artif. Intell. Rev. 47 (2017), 279-311. https://doi.org/10.1007/s10462-016-9482-x
- M. Sharjeel, R. M. A. Nawab, and P. Rayson, COUNTER: Corpus of Urdu news text reuse, Lang. Res. Eval. 51 (2017), 777-803. https://doi.org/10.1007/s10579-016-9367-2
- M. Humayoun, H. Hammarström, and A. Ranta, Urdu morphology, orthography and lexicon extraction, M.S. thesis, Department of Computer Science and Engineering, Chalmers tekniska högskola, Göteborg, Sweden, 2006.
- V. Gupta, N. Joshi, and I. Mathur, Design & development of rule based inflectional and derivational Urdu stemmer, in Proc. Int. Conf. Futuristic Trends Comput. Anal. Knowl. Manag. (ABLAZE), (Greater Noida, India), Feb. 2015, pp. 7-12.
- I. Rasheed, H. Banka, and H. M. Khan, Pseudo-relevance feedback based query expansion using boosting algorithm, Artif. Intell. Rev. (2021), https://doi.org/10.1007/s10462-021-09972-4
- D. Becker and K. Riaz, A study in Urdu corpus construction, in Proc. Workshop Asian Lang. Resour. Int. Stand., vol. 12, (Stroudsburg, PA, USA), Aug. 2002, pp. 1-5.
- K. Riaz, Concept search in Urdu, in Proc. PhD Workshop Inf. Knowl. Manag. (Napa Valley, CA, USA), Oct. 2008, pp. 33-40.
- S. Urooj et al., CLE Urdu digest corpus, in Proc. Conf. Lang. Technol. (SNLP), (Lahore, Pakistan), 2012, pp. 47-53.
- F. Baseer, A. Habib, and J. Ashraf, Romanized Urdu corpus development (RUCD) model: Edit-distance based most frequent unique unigram extraction approach using real-time interactive dataset, in Proc. Int. Conf. Innov. Comput. Technol. (INTECH), (Dublin, Ireland), Aug. 2016, pp. 513-518.
- S. A. Ali et al., Salience analysis of news corpus using heuristic approach in Urdu language, Int. J. Comput. Sci. Netw. Secur. (IJCSNS) 16 (2016), no. 4, 28-36.
- Q. Abbas, Building a hierarchical annotated corpus of Urdu: The URDU.KON-TB treebank, in International Conference on Intelligent Text Processing and Computational Linguistics, Springer, Berlin, Germany, 2012, pp. 66-79.
- M. Ijaz and S. Hussain, Corpus based Urdu lexicon development, in Proc. Conf. Lang. Technol. (CLT07), vol. 73, (Peshawar, Pakistan), Aug. 2007.
- I. Hanif et al., Cross-language Urdu-English (CLUE) text alignment corpus, in Proc. Working notes CLEF (Toulouse, France), Sept. 2015.
- R. Rahimi, A. Shakery, and I. King, Extracting translations from comparable corpora for cross-language information retrieval using the language modeling framework, Inf. Process. Manag. 52 (2016), no. 2, 299-318. https://doi.org/10.1016/j.ipm.2015.08.001
- M. Karthikeyan and P. Aruna, Probability based document clustering and image clustering using content-based image retrieval, Appl. Soft Comp. 13 (2013), no. 2, 959-966. https://doi.org/10.1016/j.asoc.2012.09.013
- Z. Ahmad et al., Urdu Nastaleeq optical character recognition, World Acad. Sci., Eng. Technol. 26 (2007), 249-252.
- M. Humayoun et al., Urdu summary corpus, in Proc. Int. Conf. Lang. Resour. Eval. (Reykjavik, Iceland), May 2014, pp. 796-800, https://github.com/humsha/USCorpus
- Q. A. Akram, A. Naseer, and S. Hussain, Assasband, an affix-exception-list based Urdu stemmer, in Proc. Workshop Asian Lang. Resour. (Suntec, Singapore), Aug. 2009, pp. 40-47.
- I. Rasheed and H. Banka, Query expansion in information retrieval for Urdu language, in Proc. Int. Conf. Inf. Retr. Knowl. Manag. (CAMP), (Kota Kinabalu, Malaysia), Mar. 2018, pp. 171-176.
- I. Rasheed et al., Urdu text classification: A comparative study using machine learning techniques, in Proc. Int. Conf. Digit. Inf. Manag. (ICDIM) (Berlin, Germany), Sept. 2018, pp. 274-278.
- K. Batri, S. Lakshmi, and B. Sathiyabhama, Trade-off between the number of index-terms and the information retrieval system's performance, Kuwait J. Sci. 44 (2017), no. 4, 49-56.
- N. Craswell et al., Overview of the TREC-2003 web track, in Proc. Text Retr. Conf. (TREC), vol. 3, (Gaithersburg, MD, USA), 2002.
- A. AleAhmad et al., Hamshahri: A standard Persian text collection, Knowl. Based Syst. 22 (2009), no. 5, 382-387. https://doi.org/10.1016/j.knosys.2009.05.002
- A. Kanapala and S. Pal, Test collection for legal IR from online discussion forums, in Proc. Forum Inf. Retr. Eval. (Bangalore, India), Dec. 2014, pp. 126-129.
- J. M. Ponte and W. B. Croft, A language modeling approach to information retrieval, in Proc. Int. ACM SIGIR Conf. Res. Dev. Inf. Retr. (Melbourne, Australia), Aug. 1998, pp. 275-281.
- I. Ounis et al., Terrier information retrieval platform, in Advances in Information Retrieval, vol. 3408, Springer, Berlin, Germany, 2005, pp. 517-519.
- E. M. Voorhees, Overview of TREC 2003, in Proc. Text Retr. Conf. (TREC), (Gaithersburg, MD, USA), Nov. 2003, pp. 1-13, https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=150467
- L. Cohen, L. Manion, and K. Morrison, The ethics of educational and social research, in Research Methods in Education, 8th ed., Routledge, London, UK, 2013, https://doi.org/10.4324/9780203720967
- S. E. Robertson et al., Okapi at TREC-4, in Proc. Text REtrieval Conf. (London, UK), Oct. 1996, pp. 73-96, http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.33.3342
- G. Amati and C. J. Van Rijsbergen, Probabilistic models of information retrieval based on measuring the divergence from randomness, ACM Trans. Inf. Syst. (TOIS) 20 (2002), no. 4, 357-389. https://doi.org/10.1145/582415.582416
- G. Salton, A. Wong, and C. S. Yang, A vector space model for automatic indexing, Commun. ACM 18 (1975), no. 11, 613-620. https://doi.org/10.1145/361219.361220
- C. D. Manning and H. Schütze, Foundations of Statistical Natural Language Processing, vol. 999, MIT Press, Cambridge, MA, USA, 1999, https://nlp.stanford.edu/fsnlp/.
- P. Clough and M. Sanderson, Evaluating the performance of information retrieval systems using test collections, Inf. Res. 18 (2013), no. 2.
- W. B. Croft, D. Metzler, and T. Strohman, Search Engines: Information Retrieval in Practice, Pearson Education, Boston, MA, USA, 2010.
- A. K. McCallum, MALLET: A machine learning for language toolkit, 2002, http://mallet.cs.umass.edu/.
- E. Frank et al., Weka: A machine learning workbench for data mining, in Data Mining and Knowledge Discovery Handbook, Springer, Boston, MA, USA, 2009, pp. 1269-1277.
- T. Zia, M. P. Akhter, and Q. Abbas, Comparative study of feature selection approaches for Urdu text categorization, Malaysian J. Comput. Sci. 28 (2015), no. 2, 93-109.
- I. Haneef et al., Design and development of a large cross-lingual plagiarism corpus for Urdu-English language pair, Sci. Program. 2019 (2019), 1-11.
- N. Khan, M. P. Bakht, and R. A. Wagan, Corpus construction and structure study of Urdu language using empirical laws, in Proc. Int. Conf. Data Sci. (Karachi, Pakistan), Feb. 2019, pp. 9-14.
- S. Hussain, Resources for Urdu language processing, in Proc. Workshop Asian Lang. Resour. IJCNLP, (Hyderabad, India), Jan. 2008, pp. 99-100, https://www.aclweb.org/anthology/I08-7017.pdf