http://dx.doi.org/10.3745/KTSDE.2022.11.2.93

Deep Learning Based Semantic Similarity for Korean Legal Field  

Kim, Sung Won (Graduate School of Knowledge Service Engineering, KAIST)
Park, Gwang Ryeol (Inha University Law School)
Publication Information
KIPS Transactions on Software and Data Engineering, Vol. 11, No. 2, 2022, pp. 93-100
Abstract
Keyword-oriented search is the dominant approach to data retrieval, but it is poorly suited to the legal field, where specialized terminology is used extensively. This paper therefore proposes an effective data search method for the legal domain. We describe embedding methods optimized for determining similarity between sentences in legal natural language processing. After embedding legal sentences with keyword-based TF-IDF vectors or with semantic embeddings from the Universal Sentence Encoder, we combine them with a BERT model that checks sentence-level similarity, and propose this combination as an optimal way to search legal data.
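As a rough illustration of the pipeline the abstract describes (keyword embedding with TF-IDF, semantic embedding with the multilingual Universal Sentence Encoder, and a combined similarity ranking), the sketch below assumes scikit-learn and TensorFlow Hub; the example sentences, the query, and the equal weighting of the two scores are placeholders and not the authors' configuration.

```python
# Minimal sketch of the search pipeline described in the abstract.
# Assumptions (not from the paper): the example sentences, the 0.5/0.5
# weighting of the two scores, and plain whitespace tokenization for TF-IDF.
import numpy as np
import tensorflow_hub as hub
import tensorflow_text  # noqa: F401 -- registers the SentencePiece ops the multilingual USE needs
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Illustrative "corpus" of legal sentences and a query (placeholders).
corpus = [
    "계약은 당사자 간의 의사표시의 합치로 성립한다.",
    "임대인은 목적물을 임차인에게 인도하여야 한다.",
    "불법행위로 인한 손해는 배상하여야 한다.",
]
query = "계약이 성립하려면 무엇이 필요한가?"

# (1) Keyword-based similarity: TF-IDF vectors + cosine similarity.
# Note: without morphological analysis, surface forms with particles
# (e.g. 계약은 vs. 계약이) do not match, which is why a proper Korean
# tokenizer such as MeCab-ko would normally be plugged in here.
tfidf = TfidfVectorizer()
doc_tfidf = tfidf.fit_transform(corpus)
query_tfidf = tfidf.transform([query])
tfidf_scores = cosine_similarity(query_tfidf, doc_tfidf).ravel()

# (2) Semantic similarity: multilingual Universal Sentence Encoder embeddings.
use = hub.load("https://tfhub.dev/google/universal-sentence-encoder-multilingual/3")
doc_use = use(corpus).numpy()
query_use = use([query]).numpy()
use_scores = cosine_similarity(query_use, doc_use).ravel()

# (3) Combine both scores and rank candidate sentences.
combined = 0.5 * tfidf_scores + 0.5 * use_scores
for i in np.argsort(-combined):
    print(f"{combined[i]:.3f}  {corpus[i]}")
```

The BERT-based similarity check mentioned in the abstract is omitted here; one option would be to re-rank the top candidates with a Korean BERT model fine-tuned for sentence similarity, but the exact model and combination strategy used by the authors are not described in this abstract.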
Keywords
NLP; LegalTech; Semantic Similarity; BERT; Legal;
Citations & Related Records
Times Cited by KSCI: 1