• Title/Abstract/Keywords: test collections

64 search results (processing time: 0.13 seconds)

HKIB-20000 & HKIB-40075: Hangul Benchmark Collections for Text Categorization Research

  • Kim, Jin-Suk;Choe, Ho-Seop;You, Beom-Jong;Seo, Jeong-Hyun;Lee, Suk-Hoon;Ra, Dong-Yul
    • Journal of Computing Science and Engineering / Vol. 3, No. 3 / pp. 165-180 / 2009
  • The HKIB, or Hankookilbo, test collections are two archives of Korean newswire stories manually categorized with semi-hierarchical or hierarchical category taxonomies. The base newswire stories were made available by the Hankook Ilbo (The Korea Daily) for research purposes. First, Chungnam National University and KISTI collaborated to manually tag 40,075 news stories with categories from a semi-hierarchical, balanced three-level classification scheme, in which each news story has only one level-3 category (single-labeling). We refer to this original data set as the HKIB-40075 test collection. Then Yonsei University and KISTI collaborated to select 20,000 newswire stories from the HKIB-40075 test collection, rearrange the classification scheme to be fully hierarchical but unbalanced, and assign one or more categories to each news story (multi-labeling). We refer to this modified data set as the HKIB-20000 test collection. We benchmark a k-NN categorization algorithm on both HKIB-20000 and HKIB-40075, illustrating properties of the collections, providing baseline results for future studies, and suggesting new directions for further research on the Korean text categorization problem.
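
As a rough illustration of the kind of k-NN categorization baseline mentioned in the abstract above, the following is a minimal sketch for the single-label case. The bag-of-words representation, cosine similarity, similarity-weighted voting, and the names `cosine` and `knn_categorize` are all assumptions made for illustration; the abstract does not specify how the authors' k-NN categorizer was configured.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_categorize(query_tokens, training_set, k=3):
    """Assign one category to a tokenized story (single-labeling).

    training_set is a list of (tokens, category) pairs. The k most
    similar training stories vote, each weighted by its similarity
    (an illustrative choice, not taken from the paper).
    Returns None when there is no training data.
    """
    query = Counter(query_tokens)
    scored = sorted(
        ((cosine(query, Counter(tokens)), category)
         for tokens, category in training_set),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter()
    for similarity, category in scored[:k]:
        votes[category] += similarity
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Toy, hypothetical training data (not from HKIB).
training = [
    (["election", "vote", "party"], "politics"),
    (["vote", "president"], "politics"),
    (["goal", "match", "team"], "sports"),
    (["team", "league", "goal"], "sports"),
]
predicted = knn_categorize(["team", "goal", "score"], training, k=3)
```

For a multi-labeled collection such as HKIB-20000, the voting step would need to return every category whose score passes a threshold rather than a single winner.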

Development of a Framework for Semi-automatic Building Test Collection Specialized in Evaluating Relation Extraction between Technical Terminologies (기술용어 간 관계추출의 성능평가를 위한 반자동 테스트 컬렉션 구축 프레임워크 개발)

  • Jeong, Chang-Hoo;Choi, Sung-Pil;Lee, Min-Ho;Choi, Yun-Soo
    • The Journal of the Korea Contents Association / Vol. 10, No. 2 / pp. 481-489 / 2010
  • With growing attention to relation extraction systems, the construction of test collections for assessing their performance has emerged as an important task. In this paper, we propose a semi-automatic framework capable of constructing test collections for relation extraction on a large scale. Based on this framework, we develop a test collection that can assess the performance of various approaches to extracting relations between technical terminologies in scientific literature. This framework can minimize the cost of constructing this kind of collection and reduce the intrinsic fluctuations that may arise from differences among collection developers. Furthermore, the proposed framework lets us construct balanced and objective collections by controlling the selection of seed documents and terminologies.

Developing the KRIST Test Collection for Researches in Information Retrieval (정보 검색 연구를 위한 KRIST 테스트 컬렉션의 개발)

  • 이준호
    • Journal of the Korean Society for Information Management / Vol. 12, No. 2 / pp. 225-232 / 1995
  • Test collections are known to play an important role in information retrieval research. A variety of test collections have been created in other countries and have been heavily used by researchers. Although research interest in Hangul information retrieval has grown rapidly in Korea in recent years, the lack of Hangul test collections makes it difficult to develop retrieval techniques for Hangul texts. This study describes the development of the KRIST test collection. The KRIST test collection consists of 13,515 bibliographic records, 30 queries, and a list of documents relevant to the queries.

Use Studies of Library Collections (장서평가에 관한 소고 -특히 이용조사를 중심으로-)

  • Yoo Chae-Ock
    • Journal of the Korean Society for Library and Information Science / Vol. 15 / pp. 175-195 / 1988
  • Use studies of library collections have been conducted as a method of evaluating collections in a library. The main purpose of use studies is to evaluate the quality of a library collection in terms of the extent and mode of its use. In addition to use studies, both quantitative and qualitative methods can be used to evaluate a library collection. However, the quantitative and qualitative collection evaluation methods are more concerned with the collection itself than with its use. Use studies have been conducted in large academic libraries for the following specific purposes: 1) to identify little-used portions of collections that can be retired to less accessible and less expensive storage areas; 2) to identify core collections that can satisfy some degree of circulation demand in the near future; 3) to identify use patterns of selected subject areas or types of books that can be used to adjust collection development practices or fund allocations; and 4) to assess the document delivery capability of a library in order to improve availability. The methodologies employed for these purposes fall into four major categories: 1) the circulation analysis method, 2) the last circulation method, 3) the relative use method, and 4) the document delivery test. Each method is briefly reviewed along with its limitations.

Construction of a Balanced Test Collection for Evaluation of Information Retrieval System (정보 검색 시스템 평가를 위한 균형 테스트 컬렉션 구축)

  • 맹성현;이석훈;이준호;이응봉;송사광
    • Journal of the Korean Society for Information Management / Vol. 16, No. 2 / pp. 135-148 / 1999
  • There has been some research in Korea on test collections for evaluation of information retrieval (IR) systems. The test collections constructed as an outcome of that research have provided a starting point and opportunities to test Korean IR systems in an objective manner. However, they fall well short of standard practice in the broader IR community in that they are small and usually unbalanced in terms of the characteristics of the documents and the queries (such as the subject domains). In this article, we describe our research effort to alleviate this problem and the resulting test collection, called HANTEC (Hangul TEst Collection). HANTEC is balanced in terms of subject domains, document lengths, and user types, and currently consists of 120,000 documents divided into three groups: general area, social science area, and science/technology area. The 30 queries in the collection are grouped into the same three areas in one dimension and into three distinct user groups in the other dimension.

The Characteristics of Identical Color Coordination In Contemporary Women's Fashion - Centered on the Collections of Paris, Milan, New York, London - (현대(現代) 여성(女性) 패션에 나타난 동일색채(同一色彩) 코디네이션의 특성(特性) - 파리, 밀란, 뉴욕, 런던 컬렉션 중심(中心)으로 -)

  • Kwon, Hae-Sook
    • Journal of Fashion Business / Vol. 9, No. 1 / pp. 21-33 / 2005
  • The main objective of this research was to understand the characteristics of identical color coordination by analyzing modern female fashion color coordination as it appears in the 'Collections'. A total of 2,026 data items were collected through a review of the 'prêt-à-porter Collections' of four cities: Milan, London, New York, and Paris. Statistical analysis of frequency with the $\chi^2$-test and a qualitative interpretation of identical color coordination characteristics were then completed. The main findings were as follows. The color coordination of modern women's fashion produces a unified theme, or monochromatic harmony, through the use and coordination of identical colors. A clear contrast of tones portrays a strong image, especially in achromatic color coordination, and texture variation makes monochromatic color coordination even more compelling. The tone variation observed most often in monochromatic color coordination was the black and white contrast, which enhances simplicity and clarity. Within chromatic color combinations, tone-on-tone color coordination was achieved by varying brightness. Furthermore, the observation of Faux Camaïeu indicates that the coordination of different textures is often used in identical color coordination. While achromatic colors can lead to a hard and rough feeling, this is offset by the use of varying textures. In addition, a variety of textures can add subtle interest to the simplicity of white. Lastly, in all four collections, chromatic identical color coordination was found more frequently than achromatic. In Paris, N.Y., and London, chromatic identical coordination was used more often than achromatic. Milan showed the most use of achromatic coordination. The use of tones showed similar trends in all four collections, with contrasting tones used most often, followed by similar and identical tones.

Characteristics of Natural Prints Design in Fashion Collections - Paris, Milan & New York from 2011 SS to 2012 SS - (패션 컬렉션에 나타난 자연문양디자인의 특성 - 2011 S/S ~2012 S/S 파리, 밀란, 뉴욕 컬렉션을 중심으로 -)

  • Kwon, Hae-Sook
    • Journal of the Korea Fashion and Costume Design Association / Vol. 15, No. 1 / pp. 91-109 / 2013
  • The main objective of this research was to understand the latest trends in natural print design through quantitative and qualitative analysis of fashion that appeared in contemporary female collections. The research scope was defined as three seasons, from 2011 S/S to 2012 S/S. A total of 726 data items were collected through a review of the 'prêt-à-porter Collections' of three major fashion cities: Paris, Milan, and N.Y. Statistical analysis of frequency with the chi-square test was conducted, and a qualitative interpretation of natural print design characteristics was also completed. The main findings were as follows. The average occurrence rate of natural print design from 2011 S/S to 2012 S/S was 6.4% in Milan, 5.5% in Paris, and 6.8% in N.Y. Five source types of natural prints in contemporary women's fashion collections were identified; in order of appearance, they were flowers, plants, animals, insects and marine organisms, and compound types. The plant prints were expressed with a stylized or realistic touch. Flower patterns showed more variation than plant patterns; however, there was no big difference in their image and major characteristics. The animal prints demonstrated two aspects: one used a typical animal print of fur or skin, while the other drew the animal figure in the manner of a painting. The compound source type presented the most interesting and fresh pattern design ideas. Among insects and marine organisms, mainly butterflies, seashells, and starfish appeared, either in realistic shapes or sometimes stylized.

Implementation of a KORMARC/EAD integrated system for the Myongji Digital Library Collections (디지털 도서관 콘텐츠 관리를 위한 KORMARC/EAD 통합시스템 구현)

  • Kim, Hyun-Hee
    • Journal of Korean Society of Archives and Records Management / Vol. 2, No. 1 / pp. 119-131 / 2002
  • The study designs and implements a KORMARC/EAD integrated system for the Myongji Digital Library Collections. The purpose of this paper is to design metadata for the Myongji Korean History Collections in order to provide high-quality digital information to clients, and to develop and implement a model for managing digital library collections. In order to test the model and the quality of the derived metadata, we built a metadata management system that is connected to the existing KORMARC system. The system consists of two modules: a retrieval module and an input module. In retrieval mode, one can retrieve KORMARC records of books and archival items, with links to modified EAD files for archival items or to image files for books. In input mode, one can enter two types of data: catalog data and inventory data. Finally, we evaluated the proposed system via mail questionnaires and offer three suggestions for making it more comprehensive and effective.

Development of a Clustering Model for Automatic Knowledge Classification (지식 분류의 자동화를 위한 클러스터링 모형 연구)

  • 정영미;이재윤
    • Journal of the Korean Society for Information Management / Vol. 18, No. 2 / pp. 203-230 / 2001
  • The purpose of this study is to develop a document clustering model for automatic classification of knowledge. Two test collections, one of newspaper article texts and one of journal article abstracts, are built for the clustering experiment. Various feature reduction criteria as well as term weighting methods are applied to the term sets of the test collections, and cosine and Jaccard coefficients are used as similarity measures. The performances of the complete linkage and K-means clustering algorithms are compared using different feature selection methods and various term weights. It was found that complete linkage clustering outperforms the K-means algorithm and that feature reduction up to almost 10% of the total feature sets does not lower the performance of document clustering to any significant extent.
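
The two similarity measures named in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: documents are represented as raw term-frequency vectors for the cosine coefficient and as plain term sets for the Jaccard coefficient, and the function names are hypothetical. The abstract does not say which term weights or which Jaccard variant the authors used.

```python
import math
from collections import Counter


def cosine_coefficient(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_coefficient(a, b) -> float:
    """Set-based Jaccard: shared terms divided by all distinct terms."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


# Toy, hypothetical documents (not from the paper's test collections).
doc1 = Counter(["news", "korea", "news"])
doc2 = Counter(["korea", "sports"])

cos_sim = cosine_coefficient(doc1, doc2)   # 1 / sqrt(10)
jac_sim = jaccard_coefficient(doc1, doc2)  # 1 shared term out of 3
```

Cosine takes term frequencies into account while the set-based Jaccard coefficient only looks at term presence, which is one reason the two can rank document pairs differently in a clustering experiment.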

Developing a Test Collection for Korean Text Categorization (한국어 문서분류 테스트컬렉션 개발)

  • Ra, Dong-Yul;Kim, Yunsik;Shin, Hyun-Joo;Lee, Kyu-Hee;Kim, Tae-Kyu;Kang, Hyun-Kyu;Choe, Ho-Seop;Yoon, Hwa-Mook
    • Proceedings of the Korea Contents Association Conference / Proceedings of the Korea Contents Association 2007 Fall General Conference / pp. 435-439 / 2007
  • Document categorization systems are important in the internet age, in which a huge number of documents are created and need to be handled. For this reason, a lot of research has been done in this field. Supervised learning methods are widely used to develop such systems, and this approach requires a test collection as a prerequisite. For English, several test collections are available, and they provide a lot of help for developing systems and doing research. For Korean, however, no public test collections have been reported or made available. To improve the situation for Korean, we are constructing a Korean test collection. This paper describes the approaches being used and the current stage of the collection.
