http://dx.doi.org/10.9708/jksci.2022.27.12.041

Rule-based Normalization of Relative Temporal Information  

Jeong, Young-Seob (Department of Computer Engineering, Chungbuk National University)
Lim, Chaegyun (School of Computing, KAIST)
Lee, SeungDong (Department of Computer Engineering, Chungbuk National University)
Mswahili, Medard Edmund (Department of Computer Engineering, Chungbuk National University)
Ndomba, Goodwill Erasmo (Department of Computer Engineering, Chungbuk National University)
Choi, Ho-Jin (School of Computing, KAIST)
Abstract
Documents often contain relative time expressions, so it is important to define a schema for relative time information and to develop a system that extracts such information from a corpus. In this study, to handle relative time expressions, we propose seven additional attributes of timex3: year, month, day, week, hour, minute, and second. We also propose a way to represent the normalized values of relative time expressions, such as before, after, and count, and we design a set of rules to extract relative time information from texts. We built a new corpus of 1,041 documents (dialog, news, and history documents) annotated with the new attributes. On this corpus, our rule-set achieved roughly 70% accuracy overall. Accuracy was higher for the most frequent attributes, such as year, day, and week, than for the other attributes. Our proposed timex3 attributes and rule-set are expected to be useful for developing services such as question-answering systems and chatbots.
Keywords
Time information; Timex3; Relative temporal information; Normalization; Information extraction; Rule-based
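To make the proposed representation more concrete, here is a minimal Python sketch. It is not the authors' implementation. The sketch assumes a hypothetical value encoding in which each of the seven attributes holds a (relation, amount) pair, where the relation is BEFORE, AFTER, or COUNT. It also uses a few hypothetical English regular-expression rules. The abstract specifies neither the actual rules, the exact value syntax, nor the language of the texts. The names `RelativeTimex3`, `RULES`, and `extract` are illustrative only.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# The seven relative-time attributes proposed in the abstract.
UNITS = ("year", "month", "day", "week", "hour", "minute", "second")

# Hypothetical value encoding (an assumption, not the paper's syntax):
# (relation, amount), where relation is "BEFORE", "AFTER", or "COUNT".
Value = Tuple[str, int]


@dataclass
class RelativeTimex3:
    text: str
    attrs: Dict[str, Value] = field(default_factory=dict)

    def to_xml(self) -> str:
        # Serialize as a TIMEX3-like tag; the attribute format is assumed.
        parts = " ".join(f'{u}="{rel}:{n}"' for u, (rel, n) in self.attrs.items())
        return f"<TIMEX3 {parts}>{self.text}</TIMEX3>"


# Hypothetical English rules: group 1 is the amount, group 2 is the unit.
_UNIT = "(" + "|".join(UNITS) + ")s?"
_NUM_WORDS = {"a": 1, "an": 1, "one": 1, "two": 2, "three": 3}
_NUM = r"(\d+|a|an|one|two|three)"

RULES = [
    (re.compile(rf"\b{_NUM} {_UNIT} (?:ago|before)\b", re.I), "BEFORE"),
    (re.compile(rf"\b{_NUM} {_UNIT} (?:later|after)\b", re.I), "AFTER"),
    (re.compile(rf"\bin {_NUM} {_UNIT}\b", re.I), "AFTER"),
    (re.compile(rf"\bfor {_NUM} {_UNIT}\b", re.I), "COUNT"),
]


def _to_int(token: str) -> int:
    # Digits are parsed directly; a few number words are looked up.
    return int(token) if token.isdigit() else _NUM_WORDS[token.lower()]


def extract(sentence: str) -> List[RelativeTimex3]:
    found = []
    taken = []  # character spans already claimed by an earlier rule
    for pattern, relation in RULES:
        for m in pattern.finditer(sentence):
            # Skip matches overlapping a span an earlier rule claimed.
            if any(m.start() < end and start < m.end() for start, end in taken):
                continue
            taken.append(m.span())
            unit = m.group(2).lower()
            timex = RelativeTimex3(m.group(0), {unit: (relation, _to_int(m.group(1)))})
            found.append((m.start(), timex))
    # Return expressions in the order they appear in the sentence.
    return [timex for _, timex in sorted(found, key=lambda pair: pair[0])]
```

For example, `extract("She left 3 days ago and returns in two weeks; the meeting ran for an hour.")` yields three expressions. Their attributes are day BEFORE 3, week AFTER 2, and hour COUNT 1. The first serializes as `<TIMEX3 day="BEFORE:3">3 days ago</TIMEX3>`. Rules are applied in order, and a later rule skips any span an earlier rule already claimed. This is a simple way to keep one normalized value per expression. A realistic rule-set would need many more patterns and would also have to handle expressions that set several attributes at once.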