A Study on the Diachronic Evolution of Ancient Chinese Vocabulary Based on a Large-Scale Rough Annotated Corpus

  • Submitted: 2021.09.30
  • Reviewed: 2021.12.08
  • Published: 2021.12.31

Abstract

This paper presents a quantitative analysis of the diachronic evolution of ancient Chinese vocabulary, based on a large-scale, roughly annotated corpus that was constructed and analyzed statistically. Texts from the Si Ku Quan Shu (a collection of ancient Chinese books) are automatically segmented to obtain ancient Chinese vocabulary with time information. This vocabulary is then used to compute word frequency, the standardized type/token ratio, and the proportions of monosyllabic and disyllabic words. The data analysis yields four findings. First, the high-frequency words in ancient Chinese are stable to a certain extent. Second, there is no obvious trend toward disyllabic words in ancient Chinese vocabulary. Third, the Northern and Southern Dynasties (420-589 AD) and the Yuan Dynasty (1271-1368 AD) are probably the two periods with the richest vocabulary in ancient Chinese. Finally, the high-frequency words unique to each dynasty are mainly official titles that carried real power. Departing from the qualitative methods of traditional research on Chinese language history, this study uses quantitative methods to draw macroscopic conclusions from a large-scale corpus.
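The corpus statistics named above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It rests on four assumptions:

- The text has already been segmented into a list of words.
- One Chinese character corresponds to one syllable.
- The standardized type/token ratio is the mean type/token ratio over consecutive 1,000-token chunks. The chunk size is a common default, not a setting stated in the paper.
- A period's "unique" words are those that never occur in any other period's sub-corpus. This definition is hypothetical.

All function names are likewise hypothetical.

```python
from collections import Counter
from statistics import mean


def word_frequencies(tokens):
    """Count occurrences of each segmented word."""
    return Counter(tokens)


def standardized_ttr(tokens, chunk_size=1000):
    """Standardized type/token ratio (STTR).

    Splits the token list into consecutive chunks of `chunk_size` tokens,
    computes the type/token ratio of each full chunk, and returns the mean.
    A trailing partial chunk is ignored so every ratio uses the same base.
    ASSUMPTION: chunk_size=1000 is a common default, not the paper's setting.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    ratios = [
        len(set(tokens[start:start + chunk_size])) / chunk_size
        for start in range(0, len(tokens) - chunk_size + 1, chunk_size)
    ]
    if not ratios:
        raise ValueError("text is shorter than one chunk")
    return mean(ratios)


def syllable_proportions(tokens):
    """Return (monosyllabic share, disyllabic share) over word tokens.

    ASSUMPTION: one Chinese character corresponds to one syllable, and
    proportions are taken over tokens rather than over distinct types.
    """
    if not tokens:
        raise ValueError("no tokens")
    mono = sum(1 for word in tokens if len(word) == 1)
    di = sum(1 for word in tokens if len(word) == 2)
    return mono / len(tokens), di / len(tokens)


def unique_top_words(freq_by_period, n=10):
    """For each period, list its n most frequent words that never occur
    in any other period (HYPOTHETICAL definition of 'unique')."""
    result = {}
    for period, freq in freq_by_period.items():
        # Collect every word attested in any other period.
        elsewhere = set()
        for other, other_freq in freq_by_period.items():
            if other != period:
                elsewhere.update(other_freq)
        unique = [(w, c) for w, c in freq.most_common() if w not in elsewhere]
        result[period] = unique[:n]
    return result


if __name__ == "__main__":
    # Toy, pre-segmented text (illustrative only, not data from the paper).
    tokens = ["天下", "之", "民", "天下", "太守", "之", "民", "將軍"]
    print(word_frequencies(tokens).most_common(2))
    print(standardized_ttr(tokens, chunk_size=4))  # chunks: 3/4 and 4/4 -> 0.875
    print(syllable_proportions(tokens))  # (0.5, 0.5)
    by_period = {
        "Han": Counter(["太守", "太守", "天下"]),
        "Tang": Counter(["節度使", "天下"]),
    }
    print(unique_top_words(by_period))
```

The trailing partial chunk is dropped so that every ratio shares the same token base, which is the purpose of standardizing. A raw type/token ratio falls as a text grows longer, so it cannot fairly compare dynasties whose sub-corpora differ in size.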

Acknowledgements

This research was supported by the Project of Social Science Foundation of Jiangsu Province (20JYB004), the National Social Science Project of China (18BYY127) and the Project for Jiangsu Higher Institutions' Excellent Innovative Team for Philosophy and Social Sciences. We thank the participants of the CLSW2021 Nanjing conference for their feedback.

참고문헌

  1. Baker, M. (2000). Towards a methodology for investigating the style of a literary translator. International Journal of Translation Studies, 12(2), 241-266. https://doi.org/10.1075/target.12.2.04bak
  2. Cheng, N., Li, B., Ge, S., Hao, X. & Feng, M. (2020). A joint model of automatic sentence segmentation and lexical analysis for ancient Chinese based on BiLSTM-CRF model (in Chinese). Journal of Chinese Information Processing, 34(4), 1-9.
  3. Dong, X. (2002). Research of lexicalization of syntactic structure (in Chinese). Studies in Language and Linguistics, 3, 56-65.
  4. Guo, J., & Yang, E. (2015). A study on the lexicalization of combined idioms in mencius. In Lu, Q., & Gao, H. (Eds.), Workshop on Chinese Lexical Semantics, (pp. 307-319). Cham: Springer.
  5. Jiang, S. (1989). Review and prospect of the study of Chinese language history (in Chinese). Language Teaching and Linguistic Studies, 2, 124-129.
  6. Jin, H., & Dong, Y. (2019). Investigation on the lexicalization process and causes of "Guzhi". In Hong, J. F., Zhang, Y., & Liu, P. (Eds.), Workshop on Chinese Lexical Semantics, (pp. 275-283). Cham: Springer.
  7. Li, S. (2007) The development of mid-ancient Chinese word formation from dissyllabic word data (in Chinese). Journal of Ningxia University (Social Science Edition), 3, 1-7.
  8. Wang, L. (1980). The Manuscript of Chinese History (in Chinese). Beijing: Zhonghua Book Company.