• Title/Summary/Keyword: Document searching

Baseline Searching Method for Document Skew Detection (문서 영상의 기울기 검출을 위한 기준선 탐색 기법)

  • Shin, Myoung-Jin;Kim, Do-Hyeon;Cha, Eui-Young
    • Journal of Korea Multimedia Society / v.10 no.2 / pp.218-225 / 2007
  • This paper presents a technique for detecting the document skew that often occurs during scanning. Correcting a skewed document is essential for automatic processing systems that perform character segmentation, character recognition, and similar tasks. The proposed algorithm detects the skew angle accurately by searching, within a candidate area, for the character baselines that carry the document's slant information. To reduce processing time, we downscaled the image and then established an ROI (region of interest) using morphology operations and connected-component analysis. We compared our method with an existing method based on morphology operations and verified the correctness and efficiency of the proposed algorithm through experiments and analysis on various kinds of document images.

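As a rough illustration of how a skew angle can be read off a text baseline, the sketch below fits a least-squares line through baseline points and converts its slope to degrees. This is not the authors' algorithm: the function name `skew_angle_degrees`, the input format (a list of `(x, y)` pixel coordinates, for example the bottom centers of character components), and the sample points are all assumptions made for the example.

```python
import math

def skew_angle_degrees(baseline_points):
    """Estimate skew from (x, y) baseline points with a least-squares line fit.

    Hypothetical helper: the points are assumed to lie along one text line.
    In image coordinates y grows downward, so a positive angle means the
    baseline descends from left to right on screen.
    """
    n = len(baseline_points)
    if n < 2:
        raise ValueError("need at least two baseline points")
    mean_x = sum(x for x, _ in baseline_points) / n
    mean_y = sum(y for _, y in baseline_points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in baseline_points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in baseline_points)
    if sxx == 0:
        raise ValueError("baseline points must span more than one x position")
    slope = sxy / sxx                      # least-squares slope of the baseline
    return math.degrees(math.atan(slope))  # slope -> angle in degrees

# Made-up points on a baseline that drops 5 pixels every 50 pixels.
points = [(0, 100), (50, 105), (100, 110), (150, 115)]
angle = skew_angle_degrees(points)
```

Rotating the image by the negative of this angle would level the baseline; a real system would first have to locate the baseline points, which is the part the paper addresses.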

Implementation of Text Summarize Automation Using Document Length Normalization (문서 길이 정규화를 이용한 문서 요약 자동화 시스템 구현)

  • 이재훈;김영천;이성주
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2001.12a / pp.51-55 / 2001
  • With the rapid growth of the World Wide Web and electronic information services, information is becoming available online at an incredible rate. One result is the oft-decried information overload. No one has time to read everything, yet we often have to make critical decisions based on what we are able to assimilate. The technology of automatic text summarization is becoming indispensable for dealing with this problem. Text summarization is the process of distilling the most important information from a source to produce an abridged version for a particular user or task. Information retrieval (IR) is the task of searching a set of documents for query-relevant documents. Text summarization, by contrast, can be considered the task of searching a document, that is, a set of sentences, for topic-relevant sentences. In this paper, we show that document length normalization, a technique taken from information retrieval, yields document information that is more reliable and better suited to the query. Experimental results on newspaper articles show that the document length normalization method is superior to other methods that use the query itself.


Query Space Exploration Using Genetic Algorithm

  • Lee, Jae-Hoon;Kim, Young-Cheon;Lee, Sung-Joo
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2003.09a / pp.683-689 / 2003
  • Information retrieval must be able to find, within a document set, the documents best suited to the user's need. If document suitability is predicted only from the degree of similarity between a document and the query expressed in a QL (Query Language), documents that the searcher does not want are also retrieved. In this paper, we show that a genetic algorithm that searches the whole document space can find the documents best suited to the user's request, and we use knowledge-based operators to address the problems of various models.


Query Space Exploration Model Using Genetic Algorithm

  • Lee, Jae-Hoon;Lee, Sung-Joo
    • International Journal of Fuzzy Logic and Intelligent Systems / v.3 no.2 / pp.222-226 / 2003
  • Information retrieval must be able to find, within a document set, the documents best suited to the user's need. If document suitability is predicted only from the degree of similarity between a document and the query expressed in a QL (Query Language), documents that the searcher does not want are also retrieved. In this paper, we show that a genetic algorithm that searches the whole document space can find the documents best suited to the user's request, and we use knowledge-based operators to address the problems of various models.

A Study of the Behaviours in Searching Full-Text Databases - Subject Specialists vs. Professional Searchers - (전문데이터베이스의 탐색특성에 관한 연구 - 주제전문가와 탐색전문가 -)

  • Lee Eung-Bong
    • Journal of the Korean Society for Library and Information Science / v.30 no.2 / pp.51-86 / 1996
  • The primary purpose of this study is to verify the differences in behavioural characteristics between subject specialists and professional searchers when searching full-text databases. The major findings and conclusions are summarized as follows. Subject specialists and professional searchers differ significantly in three areas: search questions (the degree of understanding of the search questions, the degree of difficulty in selecting terms, and the degree of expectation of search results); search processes (the number of search terms used, the number of Boolean operators and qualifiers used, the number of documents browsed, and the search time, namely the connecting time, the time spent per output document, and the time spent per relevant output document); and search results (the searching efficiency, namely the number of relevant documents, the recall ratio, and the precision ratio; the search cost, namely the total search cost, the search cost per output document, and the search cost per relevant output document; and the degree of satisfaction with search results).


Document Clustering based on Level-wise Stop-word Removing for an Efficient Document Searching (효율적인 문서검색을 위한 레벨별 불용어 제거에 기반한 문서 클러스터링)

  • Joo, Kil Hong;Lee, Won Suk
    • The Journal of Korean Association of Computer Education / v.11 no.3 / pp.67-80 / 2008
  • Various document categorization methods have been studied to give users an effective way of browsing a large collection of documents. They automatically partition a set of documents into groups of semantically similar documents. However, automatic categorization suffers from low accuracy. This paper proposes a semi-automatic document categorization method based on the domains of documents. Each document belongs to an initial domain. All the documents in each domain are recursively clustered in a level-wise manner, so that a category tree of the documents can be built. To find the clusters of documents, the stop-words of each document are removed based on the document frequency of each word in the domain. For each cluster, its cluster keywords are extracted from the keywords common to its documents and are used as the category of the domain. Recursively, each cluster is regarded as a more specific domain, and the same procedure is repeated until the user terminates it. At each level of clustering, the user can reassign any incorrectly clustered documents to improve the accuracy of the categorization.

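The stop-word step in the abstract above can be sketched as follows: count, for each word, how many documents in the domain contain it, then drop the words that appear in most of them. The function name `remove_domain_stop_words`, the `max_df_ratio` threshold of 0.8, and the toy documents are assumptions for illustration; the abstract does not state how the paper sets its cutoff.

```python
from collections import Counter

def remove_domain_stop_words(docs, max_df_ratio=0.8):
    """Drop words whose document frequency within the domain is too high.

    docs: list of token lists, all belonging to one domain.
    max_df_ratio: hypothetical cutoff; a word found in more than this
    fraction of the domain's documents is treated as a stop-word.
    """
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each word once per document
    stop_words = {word for word, count in df.items() if count / n > max_df_ratio}
    filtered = [[t for t in tokens if t not in stop_words] for tokens in docs]
    return filtered, stop_words

# Toy domain of three tokenized documents.
docs = [
    ["the", "tax", "court", "ruling"],
    ["the", "image", "skew", "angle"],
    ["the", "query", "genetic", "search"],
]
filtered, stop_words = remove_domain_stop_words(docs)
```

Because the frequency is recomputed inside each domain, a word that is discriminative at the top level can become a stop-word once clustering descends into a narrower sub-domain, which is what makes the removal level-wise.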

Evaluation of the Newspaper Library -With Emphasis on the Document Delivery Capability and Retrieval Effectiveness- (신문사 자료실에 대한 평가 -문헌전달능력과 검색효율을 중심으로-)

  • 노동조
    • Journal of the Korean BIBLIA Society for library and Information Science / v.7 no.1 / pp.319-351 / 1994
  • This research is a case study of the newspaper libraries in Seoul, and its primary purpose is to investigate their document delivery capability. To this end, representative users visited seven newspaper libraries and measured their searching time. Document delivery capability was measured in hours, minutes, and seconds of searching time. Retrieval effectiveness was tested through the recall ratio and the precision ratio. The major findings of the study are summarized as follows: 1) Most of the newspaper libraries showed excellent document delivery capability; six newspaper libraries delivered material related to the subject. 2) Across all materials, the newspaper libraries showed a mean recall ratio of 50.1% and a mean precision ratio of 84.8%. 3) For their own articles, the newspaper libraries showed a recall ratio of 71.4% and a precision ratio of 90.0%, which means that their own articles were retrieved more effectively than others. 4) The Kookmin Ilbo library had the most excellent system, and the precision ratio of the Dong-A Ilbo library was superior to its recall ratio. The Han Kyoreh Shinmun library had an excellent arrangement of its own articles, but the Segye Times library had problems in every respect.

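For readers unfamiliar with the two measures used in the abstract above: the recall ratio is the share of all relevant items that a search retrieved, and the precision ratio is the share of retrieved items that are relevant. A minimal sketch follows; the function name and the sample item identifiers are hypothetical and are not data from the study.

```python
def recall_and_precision(retrieved, relevant):
    """Return (recall, precision) for one search.

    retrieved: items the search returned.
    relevant:  items that are actually relevant to the question.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant items that were retrieved
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision

# Made-up example: 4 items retrieved, 5 relevant overall, 2 in common.
recall, precision = recall_and_precision(
    retrieved=["a", "b", "c", "d"],
    relevant=["a", "b", "e", "f", "g"],
)
```

Here recall is 2/5 and precision is 2/4. High precision with lower recall, as in the reported means, indicates that what the libraries delivered was mostly relevant but that a sizable share of the relevant material was not found.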

A Study on the Depth-Oriented Decomposition Indexing Method for Creating and Searching Structured Documents Based on XML (XML을 이용한 구조적 문서 생성 및 탐색을 위한 깊이중심분할 색인기법에 관한 연구)

  • Yang, Ok-Yul;Lee, Yong-Ju
    • The KIPS Transactions:PartD / v.9D no.6 / pp.1025-1042 / 2002
  • The goal of this study is to generate structured documents that improve the performance of an information retrieval system by using a thesaurus, that is, information on the relations between words (terms), and to study a technique for searching these structured documents. To accomplish this goal, we propose a DODI (Depth-Oriented Decomposition Index) technique for structured documents, together with an algorithm that uses a thesaurus to search for related information efficiently through this index. We establish a storage system in which the structured documents generated by this indexing technique are saved in a database through OpenXML, and XML documents are generated through ForXML methods.

A Document Management System That Can Handle over Terabyte Order Data - An Integration of Self Organized Picture Search, 3D Graphics and DVD Changer Control Technology - (테라바이트급 데이터를 축적.검색표시할 수 있는 문서관리 시스템 - 3D 그래픽과 화상검색 및 DVD 체인저 제어기술의 융합 -)

  • Yoshihiro, Mori;Hiroyuki, Nitta;Mitsuji, Inoue;Koji, Kimura;Izuru, Shimamoto;Hiroharu, Ito;Atsushi, Kitamachi
    • Journal of Information Management / v.32 no.1 / pp.108-119 / 2001
  • Creating digital documents by scanning paper, using a digital camera, or using a computer is a daily task in every office. The number of digital documents is increasing at a rapid pace. Their volume is approaching the maximum capacity of online storage and is degrading search efficiency. To solve these problems, we developed a document management system (ChronoStar) that integrates various search methods (picture search, full-text search, and related-document search), 3D graphics, and a DVD changer.


A Study on the Improvement Model of Document Retrieval Efficiency of Tax Judgment (조세심판 문서 검색 효율 향상 모델에 관한 연구)

  • Lee, Hoo-Young;Park, Koo-Rack;Kim, Dong-Hyun
    • Journal of the Korea Convergence Society / v.10 no.6 / pp.41-47 / 2019
  • When a case is being judged, it is very important to search for and obtain examples of similar judgments. Existing judgment document search relies on keywords entered by the user. However, this requires an accurate keyword, and if the keyword is unknown the necessary document cannot be found. In addition, the retrieved documents may have different contents. In this paper, we aim to improve the effectiveness of a method that vectorizes documents into a three-dimensional space, calculates cosine similarity, and searches for nearby documents in order to find accurate judgment examples. To that end, after analyzing the similarity of the words used in judgment examples, we extract the mode and insert it into the document text, thereby improving the cosine similarity of the documents to be retrieved. We hope that the proposed model will give users a fast, accurate search when they look for tax-related judgment examples.
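
The similarity search described in the abstract above rests on cosine similarity between document vectors. The sketch below ranks a few three-dimensional vectors against a query vector. The vectors, document names, and function names are invented for the example; how the paper actually maps a tax judgment to a vector is not stated in the abstract.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank_by_similarity(query, docs):
    """Return (name, score) pairs sorted from most to least similar to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical 3-D document vectors.
docs = {
    "doc_a": (1.0, 0.0, 0.0),
    "doc_b": (1.0, 1.0, 0.0),
    "doc_c": (0.0, 0.0, 1.0),
}
ranking = rank_by_similarity((1.0, 1.0, 0.0), docs)
```

Because cosine similarity depends only on direction, adding shared terms to two documents pulls their vectors toward each other, which is one plausible reading of why inserting the extracted mode raises the similarity of related judgments.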