• Title/Summary/Keyword: KRISTAL-IRMS


HKIB-20000 & HKIB-40075: Hangul Benchmark Collections for Text Categorization Research

  • Kim, Jin-Suk;Choe, Ho-Seop;You, Beom-Jong;Seo, Jeong-Hyun;Lee, Suk-Hoon;Ra, Dong-Yul
    • Journal of Computing Science and Engineering, v.3 no.3, pp.165-180, 2009
  • The HKIB, or Hankookilbo, test collections are two archives of Korean newswire stories manually categorized with semi-hierarchical or hierarchical category taxonomies. The base newswire stories were made available by the Hankook Ilbo (The Korea Daily) for research purposes. First, Chungnam National University and KISTI collaborated to manually tag 40,075 news stories using a semi-hierarchical, balanced three-level classification scheme in which each news story has exactly one level-3 category (single-labeling). We refer to this original data set as the HKIB-40075 test collection. Then Yonsei University and KISTI collaborated to select 20,000 newswire stories from the HKIB-40075 test collection, to rearrange the classification scheme so that it is fully hierarchical but unbalanced, and to assign one or more categories to each news story (multi-labeling). We refer to this modified data set as the HKIB-20000 test collection. We benchmark a k-NN categorization algorithm on both HKIB-20000 and HKIB-40075, illustrating properties of the collections, providing baseline results for future studies, and suggesting new directions for further research on the Korean text categorization problem.
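The abstract does not state the features, similarity measure, or value of k used in the benchmark, so the following is only a minimal sketch of how a k-NN text categorizer of this general kind might work. It assumes whitespace tokenization, raw term-frequency vectors, and cosine similarity; real Korean text would normally need morphological analysis first. The function names, the toy English stories, and the category labels are all hypothetical and are not taken from the HKIB collections. Leaving `threshold` unset returns a single category (as in the single-labeled HKIB-40075), while setting it returns every category whose vote reaches the threshold (as in the multi-labeled HKIB-20000).

```python
import math
from collections import Counter


def tf_vector(text):
    """Term-frequency vector; whitespace tokenization is an assumption."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_categorize(query, training, k=3, threshold=None):
    """Categorize `query` from its k most similar training stories.

    training: list of (text, set_of_categories) pairs.
    threshold=None  -> single-labeling: return the top-voted category.
    threshold=float -> multi-labeling: return all categories whose
                       similarity-weighted vote is >= threshold.
    """
    q = tf_vector(query)
    neighbours = sorted(
        ((cosine(q, tf_vector(text)), cats) for text, cats in training),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]

    # Each neighbour votes for its categories, weighted by similarity.
    votes = Counter()
    for sim, cats in neighbours:
        for cat in cats:
            votes[cat] += sim

    if not votes:
        return []
    if threshold is None:
        return [max(votes, key=votes.get)]
    return sorted(cat for cat, score in votes.items() if score >= threshold)


# Hypothetical toy data, for illustration only.
training = [
    ("stock market shares rise", {"economy"}),
    ("central bank raises interest rates", {"economy"}),
    ("team wins baseball final", {"sports"}),
    ("baseball club sponsorship boosts shares", {"sports", "economy"}),
]

single = knn_categorize("bank shares rise on market", training, k=3)
multi = knn_categorize("bank shares rise on market", training, k=3, threshold=0.1)
```

Weighting votes by similarity rather than counting neighbours equally is one common design choice; whether the benchmark in the paper does the same is not stated in the abstract.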

Development of the Management Tool for S&T information in distributed retrieval database (분산 저장된 과학기술정보 서비스를 위한 검색 데이터베이스 관리 도구의 설계 및 개발)

  • Lee, Seok-Hyoung;Yoon, Hee-Jun;Yeo, Il-Yeon;Choi, Sung-Pil;Yoon, Hwa-Mook
    • Proceedings of the Korea Contents Association Conference, 2006.11a, pp.677-681, 2006
  • In this paper, we propose a GUI management tool, named K-Manager, for managing and serving science and technology (S&T) information stored in distributed retrieval databases. Web-based S&T content services generally require a retrieval database system. However, such systems are inconvenient for content managers and system administrators, because information retrieval database systems offer no tool that supports the S&T information management process in the way TOAD does for relational databases. Using K-Manager, a content manager can process S&T content and a system manager can manage the databases easily. The proposed tool actively and effectively controls the information stored in the distributed retrieval databases, supporting safe management of the stored content and efficient retrieval performance. The tool consists of two subsystems: a content manager and a database manager for YESKISTI, which is based on KRISTAL-IRMS.
