• Title/Summary/Keyword: Document Filtering


Retrieval Model using Subject Classification Table, User Profile, and LSI (전공분류표, 사용자 프로파일, LSI를 이용한 검색 모델)

  • Woo Seon-Mi
    • The KIPS Transactions:PartD / v.12D no.5 s.101 / pp.789-796 / 2005
  • Because existing information retrieval systems, in particular library retrieval systems, rely on exact keyword matching against the user's query, they present users with massive result sets that include irrelevant information, so users spend extra time and effort extracting the relevant items. This paper therefore proposes SULRM, a retrieval model using a Subject Classification Table, a user profile, and LSI (Latent Semantic Indexing), to provide more relevant results. Within the results of keyword-based retrieval, SULRM applies a document filtering technique to classified data and a document ranking technique to non-classified data. The filtering technique uses the Subject Classification Table, and the ranking technique uses the user profile and LSI. We evaluated the filtering technique, the user profile updating method, and the document ranking technique experimentally on results from our university's digital library system. When many documents are retrieved, the proposed techniques can provide the user with filtered and ranked results according to the user's subject area and preferences.
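The two-branch post-processing described in the SULRM abstract can be sketched as follows. This is a minimal illustration, assuming each retrieved record either carries a subject code from the classification table or has none; the subject codes, the profile weights, and the names `post_process` and `cosine` are hypothetical. Plain term-frequency cosine similarity stands in for the LSI projection the paper actually uses.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def post_process(results, user_subjects, profile):
    """Split keyword-search results into filtered (classified) and ranked (unclassified) ids.

    results: list of dicts with "id", "terms" (list of str) and an optional "subject".
    user_subjects: subject codes tied to the user's major (hypothetical table lookup).
    profile: term -> weight dict describing the user's preferences.
    """
    filtered, unclassified = [], []
    for doc in results:
        subject = doc.get("subject")
        if subject is None:
            unclassified.append(doc)
        elif subject in user_subjects:
            # Classified data: keep only documents in the user's subject area.
            filtered.append(doc)
    # Non-classified data: rank by similarity to the user profile.
    # NOTE: the paper uses LSI here; raw term-frequency cosine is a simplification.
    ranked = sorted(
        unclassified,
        key=lambda d: cosine(Counter(d["terms"]), profile),
        reverse=True,
    )
    return [d["id"] for d in filtered], [d["id"] for d in ranked]
```

Classified records outside the user's subjects are dropped outright, while unclassified records are never dropped, only reordered, which mirrors the filter/rank split the abstract describes.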

Accelerating Keyword Search Processing over XML Documents using Document-level Ranking (문서 단위 순위화를 통한 XML 문서에 대한 키워드 검색 성능 향상)

  • Lee, Hyung-Dong;Kim, Hyoung-Joo
    • Journal of KIISE:Databases / v.33 no.5 / pp.538-550 / 2006
  • XML keyword search lets users obtain information easily without knowing the structure of the documents, and it returns specific, useful partial-document results instead of whole documents. Element-level query processing makes this possible, but its computational overhead grows significantly as the number of documents increases. In this paper, we present a document-level ranking scheme over XML documents that predicts the results of element-level processing in order to reduce processing cost. To this end, we propose the notion of 'keyword proximity' for document ranking: the correlation of keywords within a document that affects the results of element-level query processing, computed from the path information of the nodes where the keywords occur and the similarity among those paths. Thanks to this document-centric view, processing time can be reduced by using the ranked document list or by filtering out low-scoring documents. Our experimental evaluation shows that document-level processing with a ranked document list is effective and improves performance through early termination for top-k queries.
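Early termination over a ranked document list can be sketched as below. The sketch assumes that each document-level score is an upper bound on the score of any element-level result inside that document; that assumption, the function name `top_k_elements`, and the toy scores are hypothetical, and the paper's keyword-proximity measure itself is not reproduced.

```python
import heapq

def top_k_elements(docs, doc_score, element_results, k):
    """Return the k best element-level results, visiting documents in doc-score order.

    doc_score(doc) -> float: document-level score, assumed to upper-bound every
        element score inside that document.
    element_results(doc) -> list of (score, element_id): the costly element-level step.
    Returns (results sorted best-first, number of documents actually processed).
    """
    ranked = sorted(docs, key=doc_score, reverse=True)
    heap = []  # min-heap holding the current top-k (score, element_id) pairs
    processed = 0
    for doc in ranked:
        # Early termination: no remaining document can beat the current k-th result.
        if len(heap) == k and doc_score(doc) <= heap[0][0]:
            break
        processed += 1
        for score, elem in element_results(doc):
            if len(heap) < k:
                heapq.heappush(heap, (score, elem))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, elem))
    return sorted(heap, reverse=True), processed
```

The saving comes from the `break`: once the k-th best element score already meets the next document's bound, the element-level step is skipped for every remaining document.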

Query Expansion and Term Weighting Method for Document Filtering (문서필터링을 위한 질의어 확장과 가중치 부여 기법)

  • Shin, Seung-Eun;Kang, Yu-Hwan;Oh, Hyo-Jung;Jang, Myung-Gil;Park, Sang-Kyu;Lee, Jae-Sung;Seo, Young-Hoon
    • The KIPS Transactions:PartB / v.10B no.7 / pp.743-750 / 2003
  • In this paper, we propose a query expansion and term weighting method for document filtering that increases the precision of Web search engine results. Query expansion for document filtering uses ConceptNet, an encyclopedia, and the top 10% most similar documents. The term weighting method is used to calculate query-document similarity. In the first step, we expand the initial query into a first expanded query using ConceptNet and the encyclopedia, weight the first expanded query, and calculate its similarity to the documents. Next, we create a second expanded query from the top 10% most similar documents and calculate the second expanded query-document similarity. We combine the two similarities from the first and second steps, re-rank the documents according to the combined similarity, and filter out non-relevant documents whose similarity falls below a threshold. Our experiments show that this document filtering method notably improves retrieval effectiveness as measured by both precision-recall and F-measure.
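The two-step similarity combination and threshold filter can be sketched as follows. This is an illustration under stated assumptions: the `concept_terms` dict stands in for terms drawn from ConceptNet and the encyclopedia, the second expanded query is built as the centroid of the top 10% documents, and `alpha` and `threshold` are hypothetical values rather than the paper's settings.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def filter_documents(query, concept_terms, docs, alpha=0.5, threshold=0.2, top_ratio=0.1):
    """query: list of terms; concept_terms: term -> weight (hypothetical external expansion);
    docs: dict doc_id -> list of terms. Returns doc ids re-ranked and thresholded."""
    vectors = {d: Counter(terms) for d, terms in docs.items()}
    # Step 1: first expanded query = original terms (weight 1.0) + weighted concept terms.
    q1 = {t: 1.0 for t in query}
    for t, w in concept_terms.items():
        q1[t] = max(q1.get(t, 0.0), w)
    sim1 = {d: cosine(q1, v) for d, v in vectors.items()}
    # Step 2: second expanded query = centroid of the top 10% most similar documents.
    n_top = max(1, int(len(docs) * top_ratio))
    top = sorted(sim1, key=sim1.get, reverse=True)[:n_top]
    q2 = Counter()
    for d in top:
        for t, c in vectors[d].items():
            q2[t] += c / n_top
    sim2 = {d: cosine(q2, v) for d, v in vectors.items()}
    # Combine both similarities, re-rank, and drop documents under the threshold.
    combined = {d: alpha * sim1[d] + (1 - alpha) * sim2[d] for d in docs}
    ranked = sorted(combined, key=combined.get, reverse=True)
    return [d for d in ranked if combined[d] >= threshold]
```

In a toy run, a document that shares only the expansion terms with the query can still rank above one that shares the original query term, because the second step rewards similarity to the best first-step documents.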

Harmful Document Classification Using the Harmful Word Filtering and SVM (유해어 필터링과 SVM을 이용한 유해 문서 분류 시스템)

  • Lee, Won-Hee;Chung, Sung-Jong;An, Dong-Un
    • The KIPS Transactions:PartB / v.16B no.1 / pp.85-92 / 2009
  • As the World Wide Web has become widespread, the web is flooded with information through web pages. Despite this convenience, the uncontrolled flood of information creates many problems. Pornographic, violent, and other harmful information that is freely available to young people, whom society must protect, and to other users who lack judgment or self-control is causing serious social problems. Various methods have been proposed and studied to address such harmful content. This paper proposes and implements a system that protects young Internet users from harmful content. To classify harmful and harmless content effectively, the system uses a two-step classification: harmful-word filtering followed by SVM learning-based filtering. The system achieved an average precision of 92.1%.
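The two-step classification can be sketched as a lexicon check followed by a linear SVM. This is a minimal sketch, assuming a hypothetical `HARMFUL_WORDS` lexicon, a hypothetical hit-ratio cutoff for step 1, and bag-of-words features; the SVM here is a plain linear one trained by subgradient descent on the L2-regularized hinge loss, not necessarily the kernel or training setup used in the paper.

```python
# Hypothetical lexicon; the paper's harmful-word list is not reproduced here.
HARMFUL_WORDS = {"porn", "gore", "drugs"}

def harmful_word_filter(tokens, ratio=0.1):
    """Step 1: flag a document whose share of lexicon words reaches `ratio` (hypothetical cutoff)."""
    if not tokens:
        return False
    hits = sum(1 for t in tokens if t in HARMFUL_WORDS)
    return hits / len(tokens) >= ratio

def featurize(tokens, vocab):
    """Bag-of-words term counts over a fixed vocabulary."""
    return [tokens.count(w) for w in vocab]

def train_linear_svm(X, y, lr=0.1, lam=0.01, epochs=200):
    """Linear SVM via subgradient descent on L2-regularized hinge loss; labels y are +1 / -1."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [wj * (1 - lr * lam) for wj in w]  # regularization shrink
            if margin < 1:  # hinge loss is active: push toward the correct side
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b

def classify(tokens, vocab, w, b):
    """Return True (harmful) if step 1 fires; otherwise defer to the SVM (step 2)."""
    if harmful_word_filter(tokens):
        return True
    score = sum(wj * xj for wj, xj in zip(w, featurize(tokens, vocab))) + b
    return score >= 0
```

Putting the cheap lexicon check first means obvious cases never reach the learned model, which only has to decide documents that avoid the listed words.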

A Study on Sensitive Information Filtering Requirements for Supporting Original Information Disclosure (원문정보공개 지원을 위한 민감정보 필터링 요건에 관한 연구)

  • Oh, Jin-Kwan;Oh, Seh-La;Choi, Kwang-Hoon;Yim, Jin-Hee
    • Journal of Korean Society of Archives and Records Management / v.17 no.1 / pp.51-71 / 2017
  • Approximately 10 million electronic approval documents have been released online since the original information disclosure service began. However, it is practically impossible for the staff in charge of information disclosure to review this large volume of electronic approval documents one by one before disclosing the originals. Recently, some public organizations have begun using personal information filtering tools to filter personal information at the document production stage, but other types of sensitive information are not yet managed with such solutions. In this study, we analyzed the filtering tools currently used to support original information disclosure, set out directions for improving them, and redesigned the text of approval documents and the original information disclosure process around the use of the filtering tool.
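One common approach in filtering tools of this kind is pattern matching over document text. The sketch below masks a few identifiers before disclosure; the pattern set (a Korean resident registration number in `YYMMDD-NNNNNNN` form, a mobile number, an e-mail address), the mask token, and the name `mask_sensitive` are illustrative assumptions, not the requirements proposed in the study.

```python
import re

# Illustrative patterns only; a production tool needs validation and many more categories.
PATTERNS = {
    "resident_id": re.compile(r"\b\d{6}-[1-4]\d{6}\b"),
    "mobile": re.compile(r"\b01[016789]-\d{3,4}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}

def mask_sensitive(text, mask="***"):
    """Replace every match with `mask` and report how many hits each category had."""
    counts = {}
    for name, pattern in PATTERNS.items():
        text, n = pattern.subn(mask, text)
        counts[name] = n
    return text, counts
```

The per-category counts are what a reviewer would see before release: a document with non-zero counts can be routed to a person instead of being disclosed automatically.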

XML Document Filtering based on Segments (세그먼트 기반의 XML 문서 필터링)

  • Kwon, Joon-Ho;Rao, Praveen;Moon, Bong-Ki;Lee, Suk-Ho
    • Journal of KIISE:Databases / v.35 no.4 / pp.368-378 / 2008
  • In recent years, publish-subscribe (pub-sub) systems based on XML document filtering have received much attention. In a typical pub-sub system, subscribers specify their interests in profiles expressed in the XPath language, and each new piece of content is matched against the user profiles so that it is delivered only to the interested subscribers. Because the number of subscribers and profiles can grow very large, scalability is critical to the success of pub-sub services. In this paper, we propose SFiST, a fast and scalable XML filtering system that extends the FiST system. In SFiST, sharable segments are extracted from twig patterns and stored in a hash-based Segment Table. The segments are used to represent user profiles as Terse Sequences, which are stored in the Compact Segment Index during filtering. Our experimental study shows that SFiST outperforms FiST in both filtering time and memory usage.
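The idea of sharing segments across profiles through a hash table can be illustrated with linear XPath profiles. This sketch is not SFiST's algorithm: it assumes segments are simply the child-axis runs between `//` steps, ignores twig branching, wildcards, and predicates, and uses hypothetical names (`split_segments`, `SegmentTable`). Profiles become sequences of segment ids, and a root-to-leaf tag path matches when the segments occur in order.

```python
def split_segments(xpath):
    """'/a/b//c/d' -> (True, [('a', 'b'), ('c', 'd')]); the flag marks a root-anchored path."""
    anchored = not xpath.startswith("//")
    parts = [p.strip("/") for p in xpath.split("//")]
    return anchored, [tuple(p.split("/")) for p in parts if p]

class SegmentTable:
    """Hash-based table storing each distinct segment once, shared across profiles."""

    def __init__(self):
        self.ids = {}        # segment tuple -> segment id
        self.segments = []   # segment id -> segment tuple
        self.profiles = {}   # profile name -> (anchored, tuple of segment ids)

    def add_profile(self, name, xpath):
        anchored, segs = split_segments(xpath)
        seq = []
        for seg in segs:
            if seg not in self.ids:
                self.ids[seg] = len(self.segments)
                self.segments.append(seg)
            seq.append(self.ids[seg])
        self.profiles[name] = (anchored, tuple(seq))

    def matches(self, name, path):
        """Check whether a root-to-leaf tag path contains the profile's segments in order."""
        anchored, seq = self.profiles[name]
        pos = 0
        for k, sid in enumerate(seq):
            seg = self.segments[sid]
            n = len(seg)
            if k == 0 and anchored:
                # Child axis from the root: the first segment must start at position 0.
                if tuple(path[:n]) != seg:
                    return False
                pos = n
                continue
            # Descendant axis: the segment may start anywhere at or after `pos`.
            for i in range(pos, len(path) - n + 1):
                if tuple(path[i:i + n]) == seg:
                    pos = i + n
                    break
            else:
                return False
        return True

    def matching_profiles(self, path):
        return sorted(name for name in self.profiles if self.matches(name, path))
```

Taking the earliest occurrence of each segment is safe here because it leaves the most room for the segments that follow.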

Document Filtering Algorithm for Efficient Preprocessing of XML Information Retrieval (XML 정보검색의 효율적 전처리를 위한 문서여과 알고리즘)

  • Kong Yong-Hae;Kim Myung-Sook
    • Journal of the Korea Academia-Industrial cooperation Society / v.6 no.1 / pp.1-11 / 2005
  • This paper proposes a preprocessing method for efficiently processing XML queries in information retrieval over a large number of XML documents. Conventional preprocessing methods filter XML documents either by parsing each document for the query keywords or by comparing query signatures with signatures generated from the XML documents. However, these methods depend on the query and are very inefficient for large numbers of XML documents. To address this, we generate a universal DTD based on a domain ontology. The universal DTD applies to XML documents that contain information from the same domain, even when they have different structures and attributes. Using the universal DTD, we then filter out XML documents that do not belong to the domain. We evaluate the performance of this method experimentally.
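A query-independent prefilter against a universal DTD can be sketched with the standard library's ElementTree. The sketch assumes the universal DTD is reduced to a map from each element to its permitted children, and that the domain ontology contributes a synonym table that maps source-specific tag names onto universal ones; both `UNIVERSAL_DTD` and `SYNONYMS` are hypothetical examples, not the paper's DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical universal DTD for a "book" domain: element -> permitted child elements.
UNIVERSAL_DTD = {
    "book": {"title", "author", "publisher"},
    "author": {"name"},
    "title": set(), "publisher": set(), "name": set(),
}
# Hypothetical ontology synonyms mapping source-specific tags onto universal names.
SYNONYMS = {"writer": "author", "heading": "title"}

def norm(tag):
    return SYNONYMS.get(tag, tag)

def in_domain(xml_text):
    """Keep a document only if its root and every parent/child pair fit the universal DTD."""
    root = ET.fromstring(xml_text)
    if norm(root.tag) not in UNIVERSAL_DTD:
        return False
    for parent in root.iter():
        allowed = UNIVERSAL_DTD.get(norm(parent.tag), set())
        for child in parent:
            if norm(child.tag) not in allowed:
                return False
    return True

def prefilter(docs):
    """docs: doc_id -> XML text. Returns ids of documents that belong to the domain."""
    return [doc_id for doc_id, text in docs.items() if in_domain(text)]
```

Because the check never looks at the query, it can run once when documents are loaded, which is the property the abstract contrasts with keyword- and signature-based preprocessing.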


A Personalized XML Documents Delivery System (사용자 정보에 기반한 XML문서 전달 시스템)

  • 유상원;이형동;김형주
    • Journal of KIISE:Computing Practices and Letters / v.9 no.5 / pp.487-497 / 2003
  • Many filtering systems exist for e-mail and news, and the documents they filter consist of plain text or HTML. XML is emerging as a new standard for information exchange, so filtering systems need new approaches for handling XML documents. Our system provides a method for describing user profiles that exploits XML's ability to represent schema and structure. A user profile is built from DTD information and points to a specific part of a document conforming to that DTD. Unlike existing systems, ours extracts part of a document rather than delivering it whole: the user profile is reflected in an XML query that retrieves the relevant part of an XML document.
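Turning a profile into a query that extracts only part of a document can be sketched with ElementTree's limited path syntax. The profile format here (a DTD-derived element path plus a keyword) and the names `profile_to_query` and `deliver` are hypothetical; ElementTree's path subset stands in for whatever XML query language the system actually used.

```python
import xml.etree.ElementTree as ET

def profile_to_query(profile):
    """Turn a DTD-derived profile path, e.g. ["article", "title"], into an ElementTree path."""
    return "./" + "/".join(profile["path"])

def deliver(xml_text, profile):
    """Return only the parts of the document the profile selects, not the whole document."""
    root = ET.fromstring(xml_text)
    parts = []
    for elem in root.findall(profile_to_query(profile)):
        text = "".join(elem.itertext())
        if profile["keyword"].lower() in text.lower():
            parts.append(ET.tostring(elem, encoding="unicode"))
    return parts
```

A subscriber whose profile names `article` receives whole matching articles, while one naming `article/title` receives only the matching titles from the same source document.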

Recommendation System using Associative Web Document Classification by Word Frequency and α-Cut (단어 빈도와 α-cut에 의한 연관 웹문서 분류를 이용한 추천 시스템)

  • Jung, Kyung-Yong;Ha, Won-Shik
    • The Journal of the Korea Contents Association / v.8 no.1 / pp.282-289 / 2008
  • Although collaborative filtering has seen some technological improvements, existing methods still do not fully reflect the actual relationships among items. In this paper, we propose a recommendation system that uses associative web document classification by word frequency and α-cut to address these shortcomings of collaborative filtering. The proposed method extracts words from web documents through morphological analysis and accumulates term-frequency weights. It generates association rules with the Apriori algorithm, applying the term-frequency weights to the rule confidence, and calculates the similarity among words using hypergraph partitioning. Finally, it classifies related web documents using the α-cut and calculates similarity using adjusted cosine similarity. The results show that the proposed method significantly outperforms existing methods.
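The final step, adjusted cosine similarity combined with an α-cut, can be sketched as below. This is a simplification: it treats item-item adjusted cosine similarity directly as a fuzzy membership degree and keeps the pairs at or above α, whereas the paper applies the α-cut after its word-frequency, Apriori, and hypergraph steps. The function names and toy ratings are hypothetical.

```python
import math
from itertools import combinations

def adjusted_cosine(ratings, i, j):
    """Adjusted cosine similarity of items i and j; ratings: user -> {item: rating}.

    Each rating is centered on that user's mean rating before the cosine is taken.
    """
    num = den_i = den_j = 0.0
    for user_ratings in ratings.values():
        if i in user_ratings and j in user_ratings:
            mean = sum(user_ratings.values()) / len(user_ratings)
            di, dj = user_ratings[i] - mean, user_ratings[j] - mean
            num += di * dj
            den_i += di * di
            den_j += dj * dj
    return num / math.sqrt(den_i * den_j) if den_i and den_j else 0.0

def alpha_cut(ratings, alpha):
    """Keep item pairs whose similarity (treated as membership degree) is at least alpha."""
    items = sorted({item for r in ratings.values() for item in r})
    return [(i, j) for i, j in combinations(items, 2)
            if adjusted_cosine(ratings, i, j) >= alpha]
```

Centering on each user's mean removes rating-scale bias, so two users who rate everything high or everything low still contribute comparable evidence about which items go together.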

The Information Filtering Agent System with a Customized Document Summary (사용자 맞춤의 문서 요약을 제공하는 정보 여과 에이전트 시스템)

  • 조영희;김교정
    • Proceedings of the Korea Intelligent Information System Society Conference / 2000.04a / pp.377-386 / 2000
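  • In the current situation of information overload, there is a strong need for tools that help users find relevant information within large volumes of data and shield them from unnecessary information. Information retrieval systems such as web search engines, the most widely used of these tools, have two drawbacks: users must select appropriate search terms, and no efficient summary of the results is provided. To compensate for these drawbacks and protect users from unnecessary information under information overload, this paper proposes an information filtering agent that, based on the user's profile, delivers information together with a personalized summary: "The Information Filtering Agent System with a Customized Document Summary."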
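A profile-driven filter with a personalized summary can be sketched as keyword-weighted extractive summarization. The abstract does not describe how the profile or the summary is built, so the representation of the profile as keyword weights, the document threshold, the sentence-scoring rule, and the function names are all assumptions for illustration.

```python
import re

def profile_score(text, profile):
    """Sum of profile weights over the words in `text` (profile: keyword -> weight)."""
    return sum(profile.get(w, 0.0) for w in re.findall(r"\w+", text.lower()))

def personalized_summary(text, profile, n=2):
    """Pick the n sentences with the highest profile score, kept in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    best = sorted(range(len(sentences)),
                  key=lambda k: profile_score(sentences[k], profile),
                  reverse=True)[:n]
    return " ".join(sentences[k] for k in sorted(best))

def filter_and_summarize(docs, profile, threshold=1.0, n=2):
    """Drop documents scoring below `threshold`; attach a personalized summary to the rest."""
    return {doc_id: personalized_summary(text, profile, n)
            for doc_id, text in docs.items()
            if profile_score(text, profile) >= threshold}
```

The same profile does double duty: it decides which documents reach the user and which sentences represent each document, so two users receive different summaries of the same page.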
