Construction of Text Summarization Corpus in Economics Domain and Baseline Models

Sawittree Jumpathong;Akkharawoot Takhom;Prachya Boonkwan;Vipas Sutantayawalee;Peerachet Porkaew;Sitthaa Phaholphinyo;Charun Phrombut;Khemarath Choke-mangmi;Saran Yamasathien;Nattachai Tretasayuth;Kasidis Kanwatchara;Atiwat Aiemleuk;Thepchai Supnithi;

doi:10.56977/jicce.2024.22.1.33

Journal of information and communication convergence engineering

Volume 22 Issue 1 / Pages 33-43 / 2024 / 2234-8255 (pISSN) / 2234-8883 (eISSN)

The Korea Institute of Information and Communication Engineering (한국정보통신학회)

Sawittree Jumpathong (Language and Semantic Technology Laboratory, National Electronics and Computer Technology Center) ;
Akkharawoot Takhom (Faculty of Engineering, Thammasat School of Engineering, Thammasat University) ;
Prachya Boonkwan (Language and Semantic Technology Laboratory, National Electronics and Computer Technology Center) ;
Vipas Sutantayawalee (Promes Co., Ltd., Backyard Group) ;
Peerachet Porkaew (Language and Semantic Technology Laboratory, National Electronics and Computer Technology Center) ;
Sitthaa Phaholphinyo (Language and Semantic Technology Laboratory, National Electronics and Computer Technology Center) ;
Charun Phrombut (Language and Semantic Technology Laboratory, National Electronics and Computer Technology Center) ;
Khemarath Choke-mangmi (PTT Digital Solutions Company Limited) ;
Saran Yamasathien (PTT Digital Solutions Company Limited) ;
Nattachai Tretasayuth (PTT Digital Solutions Company Limited) ;
Kasidis Kanwatchara (PTT Digital Solutions Company Limited) ;
Atiwat Aiemleuk (PTT Digital Solutions Company Limited) ;
Thepchai Supnithi (Language and Semantic Technology Laboratory, National Electronics and Computer Technology Center)

Received : 2023.04.01
Accepted : 2023.11.04
Published : 2024.03.31

Abstract

Automated text summarization (ATS) condenses large volumes of text into shorter summaries, reducing the time required to extract information from extensive textual data. ATS systems rely on language resources as datasets, but creating these datasets is a complex and labor-intensive task that requires linguists to annotate the data extensively. Consequently, publicly accessible ATS datasets for languages such as Thai are scarce compared with those for more widely used languages. This study introduces ThEconSum, an ATS dataset for the Thai language built from economy-related data, together with baseline models. The evaluation reveals significant remaining tasks and limitations for Thai ATS.

Keywords

Acknowledgement

This study was conducted with collaborative support from Promes Co., Ltd. (Backyard Group) as part of the Joint Research Project funded by the National Science and Technology Development Agency (NSTDA), Thailand. We also thank PTT Digital Solutions Co., Ltd., Thailand, for generously providing public datasets for Thai ATS. The Program Management Unit for National Competitiveness Enhancement, under the Office of the National Higher Education Science Research and Innovation Policy Council in Thailand, provided financial support for data collection and construction.
