• Title/Abstract/Keywords: Controlled vocabulary

Search results: 55 (processing time: 0.023 s)

A New Approach to Automatic Keyword Generation Using Inverse Vector Space Model

  • 조원진;노상규;윤지영;박진수
    • Asia Pacific Journal of Information Systems / Vol. 21, No. 1 / pp. 103-122 / 2011
  • Recently, numerous documents have been made available electronically. Internet search engines and digital libraries commonly return query results containing hundreds or even thousands of documents. In this situation, it is virtually impossible for users to examine complete documents to determine whether they might be useful. For this reason, some online documents are accompanied by a list of keywords specified by the authors to guide users and facilitate filtering. A set of keywords is thus often considered a condensed version of the whole document and plays an important role in document retrieval, Web page retrieval, document clustering, summarization, text mining, and so on. Since many academic journals ask authors to provide a list of five or six keywords on the first page of an article, keywords are most familiar in the context of journal articles. However, many other types of documents could also benefit from the use of keywords, including Web pages, email messages, news reports, magazine articles, and business papers. Although the potential benefit is large, implementation is the obstacle: manually assigning keywords to all documents is a daunting, even impractical, task because it is extremely tedious and time-consuming and requires a certain level of domain knowledge. Therefore, it is highly desirable to automate the keyword generation process. There are two main approaches to achieving this aim: keyword assignment and keyword extraction. Both approaches use machine learning methods and require, for training purposes, a set of documents with keywords already attached. In the former approach, there is a given vocabulary, and the aim is to match its terms to the texts. In other words, the keyword assignment approach seeks to select the words from a controlled vocabulary that best describe a document. 
Although this approach is domain dependent and is not easy to transfer and expand, it can generate implicit keywords that do not appear in a document. In the latter approach, by contrast, the aim is to extract keywords according to their relevance in the text without a prior vocabulary. In this approach, automatic keyword generation is treated as a classification task, and keywords are commonly extracted using supervised learning techniques. Thus, keyword extraction algorithms classify candidate keywords in a document into positive or negative examples. Several systems, such as Extractor and Kea, were developed using the keyword extraction approach. The most indicative words in a document are selected as keywords for that document; as a result, keyword extraction is limited to terms that appear in the document and cannot generate implicit keywords that are not included in it. According to the experimental results of Turney, about 64% to 90% of keywords assigned by the authors can be found in the full text of an article. Conversely, 10% to 36% of the keywords assigned by the authors do not appear in the article and therefore cannot be generated by keyword extraction algorithms. Our preliminary experiment also shows that 37% of keywords assigned by the authors are not included in the full text. This is why we have decided to adopt the keyword assignment approach. In this paper, we propose a new approach to automatic keyword assignment, namely IVSM (Inverse Vector Space Model). The model is based on the vector space model, a conventional information retrieval model that represents documents and queries by vectors in a multidimensional space. IVSM generates an appropriate keyword set for a specific document by measuring the distance between the document and the keyword sets. 
The keyword assignment process of IVSM is as follows: (1) calculating the vector length of each keyword set based on each keyword weight; (2) preprocessing and parsing a target document that does not have keywords; (3) calculating the vector length of the target document based on the term frequency; (4) measuring the cosine similarity between each keyword set and the target document; and (5) generating keywords that have high similarity scores. Two keyword generation systems were implemented applying IVSM: an IVSM system for a Web-based community service and a stand-alone IVSM system. The first is implemented in a community service for sharing knowledge and opinions on current trends such as fashion, movies, social problems, and health information. The stand-alone IVSM system is dedicated to generating keywords for academic papers and has been tested on a number of academic papers, including those published by the Korean Association of Shipping and Logistics, the Korea Research Academy of Distribution Information, the Korea Logistics Society, the Korea Logistics Research Association, and the Korea Port Economic Association. We measured the performance of IVSM by the number of matches between the IVSM-generated keywords and the author-assigned keywords. According to our experiment, the precision of IVSM applied to the Web-based community service and to academic journals was 0.75 and 0.71, respectively. Both systems perform much better than baseline systems that generate keywords based on simple probability. IVSM also shows performance comparable to Extractor, a representative keyword extraction system developed by Turney. As the number of electronic documents increases, we expect that the IVSM proposed in this paper can be applied to many electronic documents in Web-based communities and digital libraries.
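
The five steps listed in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes each candidate keyword is represented by a weighted term vector (the abstract does not say how those weights are built), uses whitespace tokenization in place of real preprocessing, and all names (`KEYWORD_SETS`, `assign_keywords`, `top_n`) and the toy weights are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))  # vector length (steps 1 and 3)
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_keywords(text, keyword_sets, top_n=2):
    """Rank candidate keywords by cosine similarity to the document."""
    # Steps 2-3: parse the target document into a term-frequency vector.
    # Assumption: lowercase whitespace tokenization stands in for preprocessing.
    doc_vec = Counter(text.lower().split())
    # Step 4: similarity between each keyword vector and the document.
    scored = [(cosine(vec, doc_vec), kw) for kw, vec in keyword_sets.items()]
    # Step 5: keep the highest-scoring keywords, dropping zero-similarity ones.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [kw for score, kw in scored[:top_n] if score > 0]


# Hypothetical controlled vocabulary with made-up term weights.
KEYWORD_SETS = {
    "logistics": {"shipping": 2.0, "port": 1.0, "cargo": 1.0},
    "information retrieval": {"query": 2.0, "documents": 1.0, "search": 1.0},
    "fashion": {"clothing": 2.0, "trend": 1.0},
}

doc = "search engines return documents for each query and users filter the documents"
print(assign_keywords(doc, KEYWORD_SETS))
```

In this toy run the phrase "information retrieval" never occurs in the document text, which shows how an assignment approach can produce an implicit keyword that extraction could not.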

The Equality of Keywords of Journal of KAPD with Medical Subject Headings

  • 김은희;김아현;심연수;안은숙;전은영;안소연
    • Journal of the Korean Academy of Pediatric Dentistry / Vol. 43, No. 2 / pp. 123-128 / 2016
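  • The purpose of this study was to analyze the agreement between the keywords used in the Journal of the Korean Academy of Pediatric Dentistry and Medical Subject Headings (MeSH). A total of 4,353 keywords from 1,165 papers published in the journal from 1998 to 2014 were classified into terms that matched MeSH and terms that did not. Of the keywords, 24.9% matched MeSH terms and 75.1% did not. This result shows that agreement between the journal's keywords and MeSH is low. Authors therefore need to understand MeSH more specifically and accurately. Using appropriate keywords, such as MeSH terms, is necessary to meet international standards. Authors should take care to use appropriate MeSH terms as keywords.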
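
As a rough illustration of the classification described above, the following Python sketch splits author keywords into MeSH-matching and non-matching terms and reports the agreement rate. It assumes case-insensitive exact matching, which may differ from the criteria used in the study; the function name `mesh_agreement` and the sample lists are hypothetical, with only a few MeSH headings included for demonstration.

```python
def mesh_agreement(keywords, mesh_terms):
    """Split keywords into MeSH matches and non-matches; return the match rate in %."""
    # Assumption: a keyword "matches" when it equals a MeSH heading, ignoring case.
    vocabulary = {term.strip().lower() for term in mesh_terms}
    matched = [k for k in keywords if k.strip().lower() in vocabulary]
    unmatched = [k for k in keywords if k.strip().lower() not in vocabulary]
    rate = 100.0 * len(matched) / len(keywords) if keywords else 0.0
    return matched, unmatched, rate


# Small hand-picked sample of MeSH headings (not the full vocabulary).
mesh_sample = ["Dental Caries", "Tooth, Deciduous", "Pit and Fissure Sealants"]
# Hypothetical author keywords.
author_keywords = ["dental caries", "primary teeth", "Pit and Fissure Sealants", "space maintainer"]

matched, unmatched, rate = mesh_agreement(author_keywords, mesh_sample)
print(matched, unmatched, rate)
```

A keyword such as "primary teeth" counts as a non-match here even though MeSH covers the concept under "Tooth, Deciduous", which is the kind of mismatch a controlled vocabulary is meant to prevent.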

Open versus closed reduction of mandibular condyle fractures: A systematic review of comparative studies

  • Kim, Jong-Sik;Seo, Hyun-Soo;Kim, Ki-Young;Song, Yun-Jung;Kim, Seon-Ah;Hong, Soon-Min;Park, Jun-Woo
    • Journal of the Korean Association of Oral and Maxillofacial Surgeons / Vol. 34, No. 1 / pp. 99-107 / 2008
  • Objective: The objective of this review was to provide reliable comparative results regarding the effectiveness of open and closed interventions used in the management of fractured mandibular condyles. Patients and Methods: MEDLINE and Cochrane were searched for studies published since 1990. Controlled vocabulary terms were used; the MeSH terms were "Mandibular condyle" AND "Fractures, bone". Only comparative studies were considered, selected using the "limit" function. According to the criteria, two review authors independently assessed the abstracts of the studies returned by the searches. The studies were grouped according to several criteria, and the following were measured: ramus height, sagittal condyle displacement, condyle displacement on Towne's image, maximum opening length, protrusion and lateral excursion, TMJ pain, malocclusion, and TMJ disorder. Results: Many studies were analyzed to review the post-operative results of the two treatment methods. Ramus height decreased more with closed reduction than with open reduction. Sagittal condyle displacement was greater with closed reduction. Condyle displacement on Towne's image was greater with closed reduction. Maximum opening length was lower with closed reduction. Protrusive and lateral movement was less with closed reduction than with ORIF. Closed reduction showed a greater occurrence of malocclusion than ORIF. However, post-operative pain and discomfort were greater with ORIF. Conclusion: In almost all categories, ORIF showed better results than CRIF. However, the use of the open reduction method should be considered carefully because of the potential surgical morbidity and increased hospitalization time and cost. At present, endoscopic surgical techniques for ORIF (EORIF) are in their infancy, with the specific aims of eliminating concern for damage to the facial nerve and of reducing or eliminating facial scars. 
Before any type of treatment is performed, patients must be informed about both treatment methods, and the best method should be chosen with their consent.

A Preliminary Study on Extending OAK Metadata for Research Data

  • 이미화;이은주;노지현
    • Journal of Korean Library and Information Science Society / Vol. 51, No. 3 / pp. 27-51 / 2020
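  • This study aims to propose an extension of OAK metadata so that research data can be described in OAK, the open access repository of the National Library of Korea. To this end, a literature review, case studies, and interviews with stakeholders were conducted. The following extensions to the existing OAK metadata for describing research data were derived. First, as a model for research data, the existing collection > item > file structure is retained, with the collection serving as a higher-level group that bundles the relevant research data and the item bundling the metadata and files of the research data. Second, metadata from standards and case institutions were mapped to the existing OAK metadata, and the elements judged necessary to add to OAK for describing research data were selected as OAK extension elements. Third, controlled vocabularies and syntax were also specified so that the structured data can be used for searching and, later, for statistics. By extending OAK metadata to describe research data, this study provides a foundation on which research data produced in Korea can be officially collected, stored, and used, and thereby contributes to building an information environment that prevents duplication of research at the national level and allows research outputs to be shared and reused.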
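
The collection > item > file structure and the use of a controlled vocabulary can be pictured with a short Python sketch. It is only an illustration under stated assumptions: the field names (`title`, `data_type`, `size_bytes`) and the vocabulary values in `DATA_TYPES` are hypothetical and are not the OAK elements proposed in the study.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical controlled vocabulary for one metadata element.
DATA_TYPES = {"dataset", "image", "software"}


@dataclass
class File:
    name: str
    size_bytes: int


@dataclass
class Item:
    """One research data record: its metadata plus the files it bundles."""
    title: str
    data_type: str
    files: List[File] = field(default_factory=list)

    def __post_init__(self):
        # Reject values outside the controlled vocabulary so the field stays searchable.
        if self.data_type not in DATA_TYPES:
            raise ValueError(f"data_type must be one of {sorted(DATA_TYPES)}")


@dataclass
class Collection:
    """Higher-level group that bundles related research data items."""
    name: str
    items: List[Item] = field(default_factory=list)


survey = Item(
    title="Reader survey 2020",
    data_type="dataset",
    files=[File("responses.csv", 20480), File("codebook.pdf", 51200)],
)
collection = Collection(name="Library user studies", items=[survey])
print(sum(len(item.files) for item in collection.items))
```

Validating the value at construction time is one simple way to keep a vocabulary-controlled field consistent enough for later search and statistics.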

Personalized Recommendation System for IPTV using Ontology and K-medoids

  • 윤병대;김종우;조용석;강상길
    • Journal of Intelligence and Information Systems / Vol. 16, No. 3 / pp. 147-161 / 2010
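  • With the recent convergence of broadcasting and telecommunications, communication technology has been combined with TV, bringing many changes to how people watch TV. This change broadens the range of services to choose from, but viewers must spend a lot of time selecting a program. To address this drawback, this paper builds a customer usage-information ontology from customers' viewing information in an IPTV environment that offers diverse content, and clusters the customers accordingly using the k-medoids method. On this basis, a method for recommending content that customers prefer is proposed. The experiments show the superiority of the proposed method in comparison with existing methods.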
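
The clustering step mentioned above can be sketched in plain Python. This is a generic alternating k-medoids procedure, not the paper's system: the ontology is omitted, viewers are assumed to be represented by hypothetical per-genre viewing hours, Manhattan distance is an arbitrary choice, and the first `k` points are used as initial medoids to keep the run deterministic.

```python
def manhattan(a, b):
    """Manhattan distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def assign(points, medoids):
    """Map each medoid index to the indices of the points closest to it."""
    clusters = {m: [] for m in medoids}
    for i, p in enumerate(points):
        nearest = min(medoids, key=lambda m: manhattan(p, points[m]))
        clusters[nearest].append(i)
    return clusters


def k_medoids(points, k, max_iter=100):
    medoids = list(range(k))  # assumption: deterministic initialization
    for _ in range(max_iter):
        clusters = assign(points, medoids)
        # New medoid = cluster member with the smallest total distance to the others.
        new_medoids = [
            min(members, key=lambda c: sum(manhattan(points[c], points[j]) for j in members))
            if members else m
            for m, members in clusters.items()
        ]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return assign(points, medoids)


# Hypothetical weekly viewing hours per viewer: [drama, sports, news].
viewers = [
    [9, 1, 0],
    [8, 2, 1],
    [1, 9, 0],
    [0, 8, 2],
    [9, 0, 1],
    [1, 8, 1],
]

clusters = k_medoids(viewers, k=2)
print(clusters)
```

Because each medoid is an actual viewer, its profile can serve directly as a representative of the cluster, for example by recommending the genres that viewer watches most to the other members.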