http://dx.doi.org/10.3745/JIPS.02.0041

A Hybrid Approach for the Morpho-Lexical Disambiguation of Arabic  

Bousmaha, Kheira Zineb (Dept. of Computer Science, LRIIR, University of Oran1)
Rahmouni, Mustapha Kamel (Dept. of Computer Science, LRIIR, University of Oran1)
Kouninef, Belkacem (National Institute of Telecommunication and Information and Communication Technology of Oran (INTTIC))
Hadrich, Lamia Belguith (Dept. of Computer Science at Faculty of Economics and Management of Sfax (FSEGS), University of Sfax)
Publication Information
Journal of Information Processing Systems / v.12, no.3, 2016, pp. 358-380
Abstract
To substantially reduce the ambiguity rate, we propose in this article a disambiguation approach based on selecting the correct diacritics at different levels of analysis. This hybrid approach combines a linguistic approach with a multi-criteria decision approach, and it can be considered an alternative way to solve the morpho-lexical ambiguity problem regardless of the diacritization rate of the processed text. To evaluate it, we applied the disambiguation to the output of the online Alkhalil morphological analyzer (the proposed approach can be used with any morphological analyzer of the Arabic language) and obtained encouraging results, with an F-measure of more than 80%.
Keywords
Alkhalil Morphological Analyzer; Approach to Multi-Criteria Decision (MCA); Arabic Language Processing (ALP); Augmented Transition Networks (ATNs); Contextual Exploration; Tagging; Diacritization; Disambiguation Method; Segmentation