http://dx.doi.org/10.22925/apjcr.2021.2.2.31

A Study on the Diachronic Evolution of Ancient Chinese Vocabulary Based on a Large-Scale Rough Annotated Corpus  

Yuan, Yiguo (Nanjing Normal University)
Li, Bin (Nanjing Normal University)
Publication Information
Asia Pacific Journal of Corpus Research / v.2, no.2, 2021, pp. 31-41
Abstract
This paper presents a quantitative analysis of the diachronic evolution of ancient Chinese vocabulary by constructing a large-scale, roughly annotated corpus and computing statistics over it. The texts from Si Ku Quan Shu (a collection of ancient Chinese books) are automatically segmented to obtain ancient Chinese vocabulary with time information, which is then used to compute statistics on word frequency, standardized type/token ratio, and the proportions of monosyllabic and dissyllabic words. The data analysis yields four findings. First, the high-frequency words in ancient Chinese are stable to a certain extent. Second, there is no obvious dissyllabic trend in ancient Chinese vocabulary. Third, the Northern and Southern Dynasties (420-589 AD) and the Yuan Dynasty (1271-1368 AD) are probably the two periods with the most abundant vocabulary in ancient Chinese. Finally, the high-frequency words unique to each dynasty are mainly official titles with real power. This study breaks away from the qualitative methods used in traditional research on Chinese language history and instead uses quantitative methods to draw macroscopic conclusions from a large-scale corpus.
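The three statistics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the chunk size, and the toy token list are assumptions made for illustration, and the abstract does not state the chunk size actually used. The sketch also assumes that each Chinese character corresponds to one syllable, so a segmented word's length in characters stands in for its syllable count.

```python
from collections import Counter


def word_frequencies(tokens):
    """Count how often each segmented word occurs."""
    return Counter(tokens)


def standardized_ttr(tokens, chunk_size=1000):
    """Mean type/token ratio over consecutive full chunks of chunk_size tokens.

    chunk_size is an assumed parameter. A trailing partial chunk is ignored;
    if the text is shorter than one chunk, the plain type/token ratio is returned.
    """
    if not tokens:
        return 0.0
    starts = range(0, len(tokens) - chunk_size + 1, chunk_size)
    chunks = [tokens[i:i + chunk_size] for i in starts]
    if not chunks:
        return len(set(tokens)) / len(tokens)
    return sum(len(set(chunk)) / len(chunk) for chunk in chunks) / len(chunks)


def syllable_proportions(tokens):
    """Share of tokens that are one or two characters long.

    Assumes one character equals one syllable.
    """
    total = len(tokens)
    if total == 0:
        return {"monosyllabic": 0.0, "dissyllabic": 0.0}
    lengths = Counter(len(token) for token in tokens)
    return {
        "monosyllabic": lengths[1] / total,
        "dissyllabic": lengths[2] / total,
    }


# Hypothetical segmented tokens, not taken from the paper's corpus.
sample_tokens = ["之", "之", "不", "天下", "君子", "也"]
print(word_frequencies(sample_tokens).most_common(1))
print(standardized_ttr(sample_tokens, chunk_size=3))
print(syllable_proportions(sample_tokens))
```

Averaging the ratio over fixed-size chunks makes the value comparable across periods whose texts differ greatly in length, which is the usual reason for standardizing the type/token ratio.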
Keywords
Ancient Chinese; Lexical Evolution; Quantitative Study; Corpus-based Analysis; Computational Linguistics;