Acknowledgement
This research was supported by the National Research Foundation of Korea (No. NRF-2020R1A2B5B01002134) and the BK21 FOUR (Fostering Outstanding Universities for Research; No. 5199990914048).