An Extraction Method of Bibliographic Information from the US Patents: Using an HTML Parsing Technique

Han, Yoo-Jin; Oh, Seung-Woo

doi:10.3743/KOSIM.2010.27.2.007

Journal of the Korean Society for Information Management

Vol. 27, No. 2 / pp. 7-20 / 2010 / pISSN 1013-0799 / eISSN 2586-2073

Korean Society for Information Management

Han, Yoo-Jin (School of Global Service, Sookmyung Women's University);
Oh, Seung-Woo (Technology Management, Economics and Policy Program, Seoul National University)

Received: 2010.04.16
Reviewed: 2010.06.13
Published: 2010.06.30

Abstract

This study presents a method for extracting the most recent bibliographic information from US patent documents. It adopts an HTML parsing technique that connects directly to the US Patent and Trademark Office (USPTO) Web pages. After a keyword search returns a list of 50 documents, the proposed algorithm uses HTML parsing to extract key bibliographic fields, such as the patent number, the applicant, and the US patent class, directly from that list. The study also presents an algorithm that exploits the citation relationship between earlier and subsequent patents, a distinctive feature of US patent documents, to extract the US patent classes of a patent and of its subsequent patents at the same time. Although the proposed method has several limitations, it can effectively supplement existing databases in terms of timeliness and comprehensiveness.
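The two steps described in the abstract (parsing a search-result list into bibliographic fields, then pairing a patent's US class with the classes of its later citing patents) can be sketched with Python's standard-library `html.parser`. This is a minimal illustration under stated assumptions, not the authors' implementation: the markup in `SAMPLE_HIT_LIST`, its three-column layout, and the helper names `HitListParser`, `parse_hit_list`, and `class_pairs` are all hypothetical. Real USPTO pages use a different layout, and the `referenced_by` mapping (patent number to later citing patents) is assumed to have been collected beforehand. Network fetching is omitted.

```python
from html.parser import HTMLParser

# Hypothetical hit-list markup. Real USPTO result pages use a different,
# changing layout; the three columns here (patent number, applicant,
# US class) are an assumption made for illustration only.
SAMPLE_HIT_LIST = """
<table>
  <tr><th>Patent No.</th><th>Applicant</th><th>US Class</th></tr>
  <tr><td><a href="/pat/7654321">7,654,321</a></td>
      <td>Acme Corp.</td><td>257/414</td></tr>
  <tr><td><a href="/pat/7123456">7,123,456</a></td>
      <td>Beta Inc.</td><td>435/6</td></tr>
</table>
"""


class HitListParser(HTMLParser):
    """Collects the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows; each row is a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside a cell (including inside nested <a> links) is kept.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows use <th>, so they end up empty
                self.rows.append(self._row)
            self._row = None


def parse_hit_list(html):
    """Extract patent number, applicant and US class from each result row."""
    parser = HitListParser()
    parser.feed(html)
    parser.close()
    records = []
    for row in parser.rows:
        if len(row) < 3:
            continue  # skip rows that do not match the assumed layout
        number, applicant, us_class = row[:3]
        records.append({
            "patent_no": number.replace(",", ""),
            "applicant": applicant,
            "us_class": us_class,
        })
    return records


def class_pairs(classes, referenced_by):
    """Pair a patent's US class with the class of each later patent citing it.

    classes: patent number -> US class
    referenced_by: patent number -> list of later (citing) patent numbers
    """
    pairs = []
    for patent_no, citing in referenced_by.items():
        for later_no in citing:
            if patent_no in classes and later_no in classes:
                pairs.append((classes[patent_no], classes[later_no]))
    return pairs


if __name__ == "__main__":
    records = parse_hit_list(SAMPLE_HIT_LIST)
    print(records[0])
    # {'patent_no': '7654321', 'applicant': 'Acme Corp.', 'us_class': '257/414'}
    classes = {r["patent_no"]: r["us_class"] for r in records}
    print(class_pairs(classes, {"7123456": ["7654321"]}))
    # [('435/6', '257/414')]
```

Keeping the parsing step separate from page fetching lets the extraction logic be checked offline against saved pages. With this kind of HTML scraping, the column mapping has to be rechecked whenever the source site changes its layout.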
