• Title/Summary/Keyword: 내포된 문서 ("embedded documents")

Resampling Feedback Documents Using Overlapping Clusters (중첩 클러스터를 이용한 피드백 문서의 재샘플링 기법)

  • Lee, Kyung-Soon
    • The KIPS Transactions:PartB / v.16B no.3 / pp.247-256 / 2009
  • Typical pseudo-relevance feedback methods assume the top-retrieved documents are relevant and use these pseudo-relevant documents to expand terms. The initial retrieval set can, however, contain a great deal of noise. In this paper, we present a cluster-based resampling method to select better pseudo-relevant documents based on the relevance model. The main idea is to use document clusters to find dominant documents for the initial retrieval set, and to repeatedly feed the documents to emphasize the core topics of a query. Experimental results on large-scale web TREC collections show significant improvements over the relevance model. For justification of the resampling approach, we examine relevance density of feedback documents. The resampling approach shows higher relevance density than the baseline relevance model on all collections, resulting in better retrieval accuracy in pseudo-relevance feedback. This result indicates that the proposed method is effective for pseudo-relevance feedback.
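
The resampling idea in this abstract can be illustrated with a small sketch. The abstract does not say how clusters are formed or ranked, so everything concrete here is an assumption: each document forms a cluster with its `k` most similar documents (so clusters overlap), clusters are ranked by the mean initial retrieval score of their members, and the toy documents and scores are invented. Documents that appear in several top-ranked clusters are fed back several times.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def resample_feedback(docs, scores, k=1, top_clusters=2):
    """Return a Counter mapping doc id -> how often it is fed back.

    docs:   {doc_id: Counter of term counts} for the top-retrieved documents
    scores: {doc_id: initial retrieval score}
    Cluster ranking by mean score is an assumption made for this sketch.
    """
    clusters = []
    for doc_id in docs:
        # Similarity of this document to every other retrieved document.
        sims = {o: cosine(docs[doc_id], docs[o]) for o in docs if o != doc_id}
        # The document's cluster: itself plus its k most similar documents.
        neighbours = sorted((o for o in sims if sims[o] > 0),
                            key=lambda o: (-sims[o], o))[:k]
        members = [doc_id] + neighbours
        mean_score = sum(scores[m] for m in members) / len(members)
        clusters.append((mean_score, members))

    # Keep the best clusters; a document in several of them is repeated.
    clusters.sort(key=lambda c: -c[0])
    feedback = Counter()
    for _, members in clusters[:top_clusters]:
        feedback.update(members)
    return feedback


# Invented toy data: d4 is off-topic noise with a fairly high initial score.
docs = {
    "d1": Counter({"jaguar": 2, "car": 1}),
    "d2": Counter({"jaguar": 1, "car": 2}),
    "d3": Counter({"jaguar": 1, "cat": 2}),
    "d4": Counter({"football": 3}),
}
scores = {"d1": 0.9, "d2": 0.8, "d3": 0.5, "d4": 0.7}
feedback = resample_feedback(docs, scores)
```

In this toy run the two mutually similar documents are each selected twice, while the isolated noisy document drops out even though its initial score is higher than that of `d3`.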

Clustering Technique Using a Node and Level of XML tree (XML 트리의 노드와 레벨을 사용한 군집화 방법)

  • Kim, Woosaeng
    • Journal of the Korea Institute of Information and Communication Engineering / v.17 no.3 / pp.649-655 / 2013
  • Efficient techniques for accessing, querying, and managing XML documents, which are widely used on the Internet, have recently been an active research topic. In this paper, we propose a new method for clustering XML documents efficiently. An element of an XML document corresponds to a node of the corresponding tree, and an inclusion relationship corresponds to a level of that tree. Therefore, when two XML documents are similar, the node names and levels of their corresponding trees are also similar. We cluster XML documents using the node names and levels of the corresponding tree as document features. Experiments show that the proposed method performs well.
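
As a rough illustration of using node names and levels as document features, the sketch below collects `(element name, level)` pairs from each XML tree and compares two documents by the overlap of those pairs. The Jaccard measure and the sample documents are assumptions made for this example; the abstract does not state which similarity measure or clustering algorithm the paper uses.

```python
import xml.etree.ElementTree as ET


def node_level_features(xml_text):
    """Return the set of (element name, level) pairs in an XML document."""
    root = ET.fromstring(xml_text)
    features = set()
    stack = [(root, 0)]  # (element, level); the root is at level 0
    while stack:
        element, level = stack.pop()
        features.add((element.tag, level))
        for child in element:
            stack.append((child, level + 1))
    return features


def jaccard(a, b):
    """Jaccard similarity of two feature sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical sample documents.
doc_a = "<book><title/><author><name/></author></book>"
doc_b = "<book><title/><author><name/><email/></author></book>"
doc_c = "<order><item><title/></item></order>"

fa, fb, fc = (node_level_features(d) for d in (doc_a, doc_b, doc_c))
sim_ab = jaccard(fa, fb)  # the two documents share 4 of 5 distinct features
sim_ac = jaccard(fa, fc)  # "title" occurs in both, but at different levels
```

Because the level is part of each feature, `title` at level 1 and `title` at level 2 do not match, so `doc_a` and `doc_c` share no features. A pairwise similarity like this could then be handed to any standard clustering routine.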

Terms Based Sentiment Classification for Online Review Using Support Vector Machine (Support Vector Machine을 이용한 온라인 리뷰의 용어기반 감성분류모형)

  • Lee, Taewon;Hong, Taeho
    • Information Systems Review / v.17 no.1 / pp.49-64 / 2015
  • Customer reviews containing subjective opinions about products or services in online stores are being generated rapidly, and their influence on customers has become immense with the widespread use of SNS. In addition, a number of studies have focused on opinion mining to analyze positive and negative opinions and to improve customer support and sales. For opinion mining, it is very important to select the key terms that reflect customers' sentiment in the reviews. We propose a document-level, terms-based sentiment classification model that selects the optimal terms using part-of-speech (POS) tags. SVMs (support vector machines) are used to build the predictor, and we combine POS tags with four term extraction methods for SVM feature selection. To validate the proposed model, we applied it to customer reviews on Amazon. After crawling 80,000 reviews, we eliminated meaningless terms (stopwords) and extracted useful terms using POS tagging. The extracted terms were ranked by document frequency, TF-IDF, information gain, and the chi-squared statistic, and 20 of the ranked terms were used as features of the SVM model. Our experimental results show that the SVM model with four POS tags outperforms the benchmark model, which is built by extracting only adjective terms. In addition, among the SVM models built with the four term extraction methods, the one based on the chi-squared statistic performs best. The proposed opinion mining model is expected to improve customer service and yield a competitive advantage for online stores.
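
One of the four term-ranking criteria named above, the chi-squared statistic, can be sketched in a few lines. This uses the standard 2x2 contingency form for a term and a class; the tiny review set is invented for illustration, and the POS tagging and SVM training described in the abstract are left out.

```python
def chi_squared(term, docs, labels, target):
    """Chi-squared statistic between a term's presence and a target class.

    docs:   list of token sets, one per review
    labels: list of class labels aligned with docs
    """
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        present = term in tokens
        if label == target:
            if present:
                a += 1  # in class, term present
            else:
                c += 1  # in class, term absent
        else:
            if present:
                b += 1  # not in class, term present
            else:
                d += 1  # not in class, term absent
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def top_terms(docs, labels, target, n_terms):
    """Rank the vocabulary by chi-squared score and keep the first n_terms."""
    vocabulary = set().union(*docs)
    ranked = sorted(
        vocabulary,
        key=lambda t: (-chi_squared(t, docs, labels, target), t),
    )
    return ranked[:n_terms]


# Invented toy reviews, already tokenized.
docs = [
    {"great", "battery"},
    {"great", "screen"},
    {"poor", "battery"},
    {"poor", "screen"},
]
labels = ["pos", "pos", "neg", "neg"]

score_great = chi_squared("great", docs, labels, "pos")
score_battery = chi_squared("battery", docs, labels, "pos")
selected = top_terms(docs, labels, "pos", 2)
```

Terms that occur equally often in both classes (`battery`, `screen`) score zero, while terms tied to one class (`great`, `poor`) score highest and would be kept as features.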

Design and Implementation of Integration ebXML Document Editing System (통합형 ebXML 문서 편집 시스템의 설계 및 구현)

  • 임지훈;김창수;정회경;오수영;정문영
    • Proceedings of the Korean Information Science Society Conference / 2002.04a / pp.364-366 / 2002
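  • With the spread of Internet/Web business, business-to-business (B2B) transaction processing is also undergoing revolutionary change. Behind this change, however, lie many technical constraints that must be resolved. Above all, electronic B2B transaction processing requires a common language that lets computer systems exchange structured information. To address this common-language problem, UN/CEFACT and OASIS agreed to establish and jointly develop ebXML (electronic business XML), which is based on XML (eXtensible Markup Language), as the standard for next-generation e-business, making it possible to build a single worldwide e-commerce market on XML. In this paper, we design and implement an integrated ebXML document editing system for editing ebXML documents, comprising an XML editor for creating XML-based e-business documents, an XML DTD generator, and a Schema editor for writing XML Schema.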

Analysis of Guidance for the Proper Use of Shipboard ECDIS (선박 ECDIS의 올바른 사용을 위한 지침 분석)

  • Lee, Bo-Gyeong;Kim, Dae-Hae;Jo, Ik-Sun
    • Proceedings of the Korean Institute of Navigation and Port Research Conference / 2016.05a / pp.64-66 / 2016
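  • Since SOLAS made the use of ECDIS mandatory from 2012 on merchant ships of 500 GT or more engaged in international voyages, ships have faced a dramatic change: navigating with electronic charts instead of conventional paper charts. ECDIS is a complex electronic system that combines hardware, software, and data. For safe navigation, it is essential to secure the reliability of the data that ECDIS provides and the safety of the system, and for users to understand ECDIS properly and be proficient with it. However, the arrival of this new navigation equipment aboard ships has revealed a variety of unforeseen problems and carries the risk that further problems will be discovered. To address these problems and to support the stable introduction of ECDIS equipment aboard ships, the IMO issued its circulars on identified ECDIS anomalies and precautions as a single consolidated document. Based on the consolidated ECDIS circular 'ECDIS-Good Practice' issued in 2015, this study analyzes how ships can use and adopt ECDIS stably, with a view to its proper use.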

XML Clustering Technique by Genetic Algorithm (유전자 알고리즘을 통한 XML 군집화 방법)

  • Kim, Woo-Saeng
    • Journal of the Institute of Electronics Engineers of Korea CI / v.49 no.3 / pp.1-7 / 2012
  • Efficient techniques for accessing, querying, and managing XML documents, which are widely used on the Internet, have recently been an active research topic. In this paper, we propose a new method for clustering XML documents efficiently. An element of an XML document corresponds to a node of the corresponding tree, and an inclusion relationship in the document corresponds to a parent-child relationship between nodes of the tree. Therefore, similar XML documents have similar node names and levels in their corresponding trees. We build an evaluation function from this characteristic and use it to cluster XML documents with a genetic algorithm. Experiments show that the proposed method performs better than existing methods.

Implementation of SGML Basic Parser (SGML(Standardized General Markup Language)에 대한 기본 파서의 구현)

  • 홍은선;정회경;이수연
    • The Journal of Korean Institute of Communications and Information Sciences / v.17 no.5 / pp.495-508 / 1992
  • This paper describes the implementation of an SGML (Standard Generalized Markup Language) parser that can analyze SGML documents and their DTDs (Document Type Definitions) defined according to SGML (ISO 8879). We constructed a yacc definition file that expresses the rules of SGML DTDs and documents, by which incoming SGML DTDs and documents can be parsed into the appropriate tokens. From these tokens, a database with structures such as an entity table and an element table is built to validate the logical structure of the incoming SGML documents. Additional functions of the parser include the automatic transformation of incoming documents containing short references into complete SGML documents. Several test SGML documents were used to verify the implementation, and the experimental results are satisfactory.

The Study of Storing and Query Processing Strategy based on Transition of XML to RDF (XML의 RDF 변환과 저장 및 질의 처리에 관한 연구)

  • 김연희;김병곤;이재호;임해철
    • Proceedings of the Korean Information Science Society Conference / 2003.10b / pp.154-156 / 2003
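  • XML, widely regarded as the standard for representing and exchanging data on the Web, enables more accurate retrieval by exploiting logical structure and content information. However, to search the ever more rapidly growing volume of data more accurately and richly, approaches that use metadata have been considered, and metadata description languages such as RDF are being studied extensively. Because RDF is written using XML syntax, an XML document can be authored in RDF form, or an existing XML document can be converted to RDF form with minor modifications. Since the conversion of XML to RDF is likely to become widespread owing to its several advantages, research on storage and retrieval that takes the characteristics of RDF into account is needed. In this paper, we therefore introduce basic rules for converting XML into a basic RDF form and propose a storage structure for the converted RDF documents. The proposed storage structure is built on a relational database for easy integration with existing web applications, and consists of three kinds of tables that reflect RDF's basic resource/property/value structure. We also consider keyword query processing over RDF documents and define the resource as the unit of a query result. Finally, to evaluate the relative importance of the resources returned for the given keywords, we propose a ranking method that considers the proximity between keywords, the degree to which keywords are contained, and other resources connected through various property relationships.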
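
A relational layout with three tables for resources, properties, and values, queried by keyword with the resource as the result unit, might look roughly like the sketch below. The table and column names, the sample triples, and the scoring rule are all assumptions made for illustration: the abstract does not give the actual schema, and this sketch ranks only by how many of the query keywords a resource contains, leaving out keyword proximity and links to other resources.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE resources  (id INTEGER PRIMARY KEY, uri  TEXT UNIQUE);
    CREATE TABLE properties (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE prop_values (
        resource_id INTEGER REFERENCES resources(id),
        property_id INTEGER REFERENCES properties(id),
        content     TEXT
    );
""")


def add_triple(conn, uri, prop, content):
    """Store one (resource, property, value) statement."""
    conn.execute("INSERT OR IGNORE INTO resources (uri) VALUES (?)", (uri,))
    conn.execute("INSERT OR IGNORE INTO properties (name) VALUES (?)", (prop,))
    conn.execute(
        """INSERT INTO prop_values (resource_id, property_id, content)
           SELECT r.id, p.id, ? FROM resources r, properties p
           WHERE r.uri = ? AND p.name = ?""",
        (content, uri, prop),
    )


def keyword_search(conn, keywords):
    """Return resource URIs ordered by how many keywords they contain."""
    scores = {}
    for keyword in keywords:
        rows = conn.execute(
            """SELECT DISTINCT r.uri FROM resources r
               JOIN prop_values v ON v.resource_id = r.id
               WHERE v.content LIKE ?""",
            (f"%{keyword}%",),
        )
        for (uri,) in rows:
            scores[uri] = scores.get(uri, 0) + 1
    return sorted(scores, key=lambda u: (-scores[u], u))


# Hypothetical sample statements.
add_triple(conn, "urn:doc:1", "title", "XML storage")
add_triple(conn, "urn:doc:1", "creator", "Kim")
add_triple(conn, "urn:doc:2", "title", "RDF query processing")
add_triple(conn, "urn:doc:2", "subject", "XML and RDF")

results = keyword_search(conn, ["XML", "RDF"])
```

Here `urn:doc:2` ranks first because it contains both keywords, while `urn:doc:1` contains only one. Keeping values in their own table means a keyword match always resolves back to a whole resource rather than to a single matching row.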

Harmful Document Classification Using the Harmful Word Filtering and SVM (유해어 필터링과 SVM을 이용한 유해 문서 분류 시스템)

  • Lee, Won-Hee;Chung, Sung-Jong;An, Dong-Un
    • The KIPS Transactions:PartB / v.16B no.1 / pp.85-92 / 2009
  • As the World Wide Web has become more popular, users are flooded with information from web pages. Despite the convenience of the web, this uncontrolled flood of information is also creating many problems. Pornographic, violent, and other harmful information is freely available to young people, whom society must protect, and to other users who lack judgment or self-control, and this is creating serious social problems. Various methods have been proposed and studied to address this problem. This paper proposes and implements a system that protects young Internet users from harmful content. To classify content as harmful or harmless effectively, the system uses a two-step classification consisting of harmful-word filtering and SVM learning-based filtering. The system achieved an average precision of 92.1%.

Representative Keyword Extraction from Few Documents through Fuzzy Inference (퍼지추론을 이용한 소수 문서의 대표 키워드 추출)

  • 노순억;김병만;허남철
    • Journal of the Korean Institute of Intelligent Systems / v.11 no.9 / pp.837-843 / 2001
  • In this work, we propose a new method of extracting and weighting representative keywords (RKs) from a few documents that might interest a user. To extract RKs, we first extract candidate terms and then choose a number of them, called initial representative keywords (IRKs), through fuzzy inference. Then, by expanding and reweighting the IRKs using term co-occurrence similarity, the final RKs are obtained. The performance of our approach is heavily influenced by the effectiveness of the IRK selection method, so we chose fuzzy inference because it is more effective in handling the uncertainty inherent in selecting representative keywords of documents. The problem addressed in this paper can be viewed as that of calculating the center of document vectors. To show the usefulness of our approach, we therefore compare it with two well-known methods, Rocchio and Widrow-Hoff, on a number of document collections. The results show that our approach outperforms the other approaches.
