A Collaborative Framework for Discovering the Organizational Structure of Social Networks Using NER Based on NLP

  • Received : 2011.12.01
  • Accepted : 2012.03.13
  • Published : 2012.04.30

Abstract

Many methods have been developed to improve the accuracy of extracting information from vast amounts of data. This paper combines several natural language processing methods, such as named entity recognition (NER), sentence extraction, and part-of-speech tagging, to carry out text analysis. The data source consists of texts obtained from the web using a domain-specific data extraction agent. Using these natural language processing methods, we developed a framework for extracting information from unstructured data. We simulated the performance of the framework in extracting and analyzing texts for the detection of organizational structures. The simulation results show that our approach outperformed other NER classifiers such as MUC and CoNLL on information extraction.
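
As a rough illustration of how sentence extraction and NER can feed a structure-detection step, the following minimal sketch links entities that are mentioned in the same sentence. It is not the framework described in the paper: the gazetteer lookup is a toy stand-in for a trained NER classifier, part-of-speech tagging is omitted, and all names (`GAZETTEER`, `split_sentences`, `tag_entities`, `cooccurrence_edges`) and the sample text are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical gazetteer standing in for a trained NER model (assumption).
GAZETTEER = {
    "Alice Kim": "PERSON",
    "Bob Lee": "PERSON",
    "Acme Corp": "ORGANIZATION",
}


def split_sentences(text):
    """Naive sentence extraction: split after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tag_entities(sentence):
    """Return (entity, label) pairs whose surface form occurs in the sentence."""
    return [(name, label) for name, label in GAZETTEER.items() if name in sentence]


def cooccurrence_edges(text):
    """Count how often two entities are mentioned in the same sentence."""
    edges = Counter()
    for sentence in split_sentences(text):
        names = sorted(name for name, _ in tag_entities(sentence))
        for pair in combinations(names, 2):
            edges[pair] += 1
    return edges


# Made-up sample text, used only to exercise the sketch.
sample = (
    "Alice Kim leads Acme Corp. "
    "Bob Lee reports to Alice Kim. "
    "Bob Lee joined Acme Corp in 2010."
)
edges = cooccurrence_edges(sample)
for (a, b), weight in sorted(edges.items()):
    print(a, "--", b, weight)
```

The weighted pairs can be read as edges of a candidate network. In a real setting the gazetteer would be replaced by an actual NER classifier, and the naive splitter by a proper sentence segmenter.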
