Proceedings of the Korean Society for Bioinformatics Conference (한국생물정보학회:학술대회논문집)
- Proceedings of the 2nd Annual Conference of the Korean Society for Bioinformatics and Systems Biology, 2003
- Pages 95-100
- 2003
SVM-based Protein Name Recognition using Edit-Distance Features Boosted by Virtual Examples
- Yi, Eun-Ji (Department of Computer Science and Engineering, POSTECH) ;
- Lee, Gary-Geunbae (Department of Computer Science and Engineering, POSTECH) ;
- Park, Soo-Jun (Bioinformatics Research Team, Computer and Software Research Lab, ETRI)
- Published: 2003.10.31
Abstract
In this paper, we address two of the main difficulties of named entity recognition in the biomedical domain: the large number of spelling variants and the lack of annotated corpora for training. To handle spelling variants, we propose using edit distance as a feature for the SVM. To address the lack of annotated data, we propose using virtual examples to expand the annotated corpus automatically, which extends the corpus quickly, efficiently, and easily. The experimental results show that introducing edit-distance features yields some improvement in protein name recognition performance, and that the model trained on the corpus expanded with virtual examples outperforms the model trained on the original corpus. With the proposed methods, we achieve an F-measure of 75.80 (71.89% precision, 80.15% recall) in protein name recognition on the GENIA corpus (ver. 3.0).
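As a rough illustration of the edit-distance idea, the sketch below computes the Levenshtein distance between a token and the entries of a small name lexicon and turns the closest match into a numeric feature. The abstract does not specify how the feature is defined, so `min_edit_distance_feature`, its length normalization, and the example lexicon are assumptions made for illustration only, not the authors' formulation. The `f_measure` helper simply checks that the reported F-measure is the harmonic mean of the reported precision and recall.

```python
from typing import Iterable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn `a` into `b`."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of `a`
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of `b`
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def min_edit_distance_feature(token: str, lexicon: Iterable[str]) -> float:
    """Hypothetical feature (an assumption, not taken from the paper):
    length-normalized distance from `token` to its closest lexicon entry,
    compared case-insensitively. 0.0 is an exact match."""
    t = token.lower()
    scores = [edit_distance(t, e.lower()) / max(len(t), len(e), 1)
              for e in lexicon]
    return min(scores) if scores else 1.0  # no lexicon: treat as maximally far


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    lexicon = ["IL-2", "NF-kappa B"]  # illustrative entries only
    print(min_edit_distance_feature("IL2", lexicon))
    print(round(f_measure(71.89, 80.15), 2))
```

A feature of this kind lets a variant such as "IL2" stay close to the lexicon entry "IL-2" even though the two strings do not match exactly, which is the motivation the abstract gives for using edit distance.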
Keywords