A Study on the Law2Vec Model for Searching Related Law

  • Kim, Nari (Department of Big Data Application and Security, Korea University)
  • Kim, Hyoung Joong (Department of Big Data Application and Security, Korea University)
  • Received: 2017.10.19
  • Reviewed: 2017.11.25
  • Published: 2017.11.30

Abstract

The ultimate goal of legal knowledge search is to obtain the most relevant legal information on the basis of statutes and precedents. Text mining research is being actively pursued to support efficient retrieval from large-scale data, and a representative approach is word embedding, a neural-network-based learning method. This paper studies how to search for related information by applying word embedding to Korean legal data. First, we extract the reference laws cited in each precedent, in order, and use them as input to the Law2Vec model. The model learns an embedding for each law by predicting the laws that surround it: the algorithm moves over each law in the corpus, treats it as the center law, and repeats the training step. After training, the relationships between laws can be inferred from the embeddings. Search performance was evaluated using precision and recall, computed from how closely the returned related laws match the search keyword. The experimental results indicate that the proposed method is more useful than existing keyword-only search for inferring related laws.
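
The two mechanical steps described in the abstract can be illustrated with a minimal sketch: building (center law, context law) training pairs from the ordered reference laws of one precedent, and scoring a set of retrieved laws with precision and recall. This is not the authors' implementation. The skip-gram-style pairing, the `window` parameter, the function names, and the placeholder identifiers `law_A` to `law_E` are all assumptions made for illustration, and the embedding training itself is omitted.

```python
from typing import Iterable, List, Sequence, Tuple


def skipgram_pairs(laws: Sequence[str], window: int = 2) -> List[Tuple[str, str]]:
    """Build (center law, context law) pairs from one precedent's ordered citations.

    `window` is a hypothetical hyperparameter: how many neighbours on each side
    of the center law count as its context.
    """
    pairs = []
    for i, center in enumerate(laws):
        start = max(0, i - window)
        stop = min(len(laws), i + window + 1)
        for j in range(start, stop):
            if j != i:  # a law is not its own context
                pairs.append((center, laws[j]))
    return pairs


def precision_recall(retrieved: Iterable[str], relevant: Iterable[str]) -> Tuple[float, float]:
    """Precision = hits / retrieved, recall = hits / relevant (0.0 for empty sets)."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Placeholder data (hypothetical, not taken from the paper).
precedent_refs = ["law_A", "law_B", "law_C"]
pairs = skipgram_pairs(precedent_refs, window=1)

precision, recall = precision_recall(
    retrieved=["law_B", "law_C", "law_D"],           # laws returned by the model
    relevant=["law_B", "law_C", "law_E", "law_A"],   # laws judged related
)
```

In a full pipeline, pairs like these would feed an embedding trainer, and the nearest neighbours of a law's vector would form the `retrieved` set that is scored here.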
