• Title/Summary/Keyword: Document Search

A Study on the online of PDF Electronic Documents System (인터넷 원거리출판의 응용과 PDF의 인쇄활용에 관한 연구)

  • 유영수;강영립;김병현;이광수
    • Proceedings of the Korean Printing Society Conference / 2001.06a / pp.63-77 / 2001
  • PDF (Portable Document Format) is a file format that Adobe developed from PostScript technology; it is used for managing document information and for electronic publishing (Internet, CD-ROM, DVD). PDF was devised so that documents can be read and printed anywhere, independent of operating system, printer type, resolution, and kind of computer. Because the format includes compression, documents can be transferred as small files over the Internet or an intranet. The format also has other advantages, such as sharing information and transferring documents in online or offline environments. In this paper, we developed an electronic document system using the PDF format. The system consists of a filter, automatic indexing, a special search system, and a web server. The information used in this paper is a database built with Zwon's DocuCom. The filter recognizes various kinds of document structure and produces ASCII output according to the properties of the document. In addition to processing various document formats, the filter can extract keywords from MS Word, Excel, PowerPoint, PDF, and CAD documents. The filter uses the structure of a Windows printer driver and can extract text, page, font type, and font size information from the relevant document. The automatic indexing recognizes the format tags of a document from the ASCII text produced by the filter and extracts keywords appropriate to the structure and properties of the document. The PDF electronic document system proposed in this paper can be used over the Internet and PC communication services. Users can choose and read electronic documents in two ways. First, they can choose and read relevant books through the PDF electronic document homepage. Second, they can use the PDF integrated search system, entering a keyword and then choosing the reference field and type of data. At present, however, Adobe's PDF products do not support Korean characters. If this problem is resolved, we think that PDF application systems will come into active use. Although the Zwon DocuCom used in this study has limited functionality, we think that there is no great difficulty in producing electronic documents and building a digital database.

A Tensor Space Model based Semantic Search Technique (텐서공간모델 기반 시멘틱 검색 기법)

  • Hong, Kee-Joo;Kim, Han-Joon;Chang, Jae-Young;Chun, Jong-Hoon
    • The Journal of Society for e-Business Studies / v.21 no.4 / pp.1-14 / 2016
  • Semantic search is known as a series of activities and techniques that improve search accuracy by clearly understanding users' search intent without requiring great cognitive effort from them. Usually, semantic search engines require an ontology and semantic metadata to analyze user queries. However, building a particular ontology and semantic metadata for large amounts of data is a very time-consuming and costly task, which is why semantic search has seen little commercialization. To resolve this problem, we propose a novel semantic search method that takes advantage of our previous semantic tensor space model. Since the model represents each term as a 2nd-order 'document-by-concept' tensor (i.e., a matrix) and each concept as a 2nd-order 'document-by-term' tensor, our proposed semantic search method does not require building an ontology. Through extensive experiments using the OHSUMED document collection and SCOPUS journal abstract data, we show that our proposed method outperforms the vector space model-based search method.

Automated networked knowledge map using keyword-based document networks (키워드 기반 문서 네트워크를 이용한 네트워크형 지식지도 자동 구성)

  • Yoo, Keedong
    • Knowledge Management Research / v.19 no.3 / pp.47-61 / 2018
  • A knowledge map, a taxonomy of knowledge repositories, must be able to support and enhance the knowledge user's activity of searching for and selecting proper knowledge for problem-solving. Conventional knowledge maps, however, have been hierarchically categorized and could not support this activity, which must coincide with the user's cognitive process for knowledge utilization. This paper, therefore, aims to develop and verify a methodology for building a networked knowledge map that supports the user's activity of searching for and retrieving proper knowledge based on referential navigation between content-relevant knowledge. This paper deploys keywords as the semantic information between knowledge, because they can represent the overall contents of a given document and can serve as semantic information on the link between related documents. By aggregating links between documents, a document network can be formed, and a keyword-based networked knowledge map can finally be built. A domain expert-based validation test was also conducted on a networked knowledge map of 50 research papers, which confirmed that the proposed methodology performs outstandingly with respect to precision and recall.
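
The abstract above does not give the linking procedure in detail, so the following is only a minimal sketch of the general idea: two documents are linked when they share a keyword, and the aggregated links form a document network. The function names and the sample keywords are hypothetical and are not taken from the paper.

```python
from itertools import combinations

def build_document_network(doc_keywords):
    """Link every pair of documents that share at least one keyword.

    doc_keywords maps a document id to an iterable of keywords.
    Returns a dict mapping (doc_a, doc_b) to the sorted shared keywords.
    """
    edges = {}
    for a, b in combinations(sorted(doc_keywords), 2):
        shared = set(doc_keywords[a]) & set(doc_keywords[b])
        if shared:
            edges[(a, b)] = sorted(shared)
    return edges

def neighbors(edges, doc_id):
    """Documents one link away from doc_id (one step of referential navigation)."""
    linked = set()
    for a, b in edges:
        if a == doc_id:
            linked.add(b)
        elif b == doc_id:
            linked.add(a)
    return sorted(linked)

# Hypothetical sample data, for illustration only.
docs = {
    "paper1": ["knowledge map", "taxonomy"],
    "paper2": ["knowledge map", "document network"],
    "paper3": ["document network", "keyword"],
}
network = build_document_network(docs)
```

Here `neighbors(network, "paper2")` lists the documents a user could navigate to from `paper2`; a real system would presumably also weight links, which this sketch leaves out.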

Clustering Techniques for XML Data Using Data Mining

  • Kim, Chun-Sik
    • Proceedings of the CALSEC Conference / 2005.03a / pp.189-194 / 2005
  • Many studies have been conducted to classify documents and to extract useful information from them. However, most search engines use a keyword-based method, which does not search and classify documents effectively. Based on the fact that an XML document is a structured document, this paper identifies the structures of XML documents using the set theory suggested by Broder, and tests the clustering of XML documents by applying a k-nearest neighbor algorithm. In addition, this study investigates the effectiveness of the clustering technique for large-scale data, compared to the existing bitmap method, by applying a test that reveals differences between clause-based documents, rather than using vectors, in order to measure similarity.
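
Broder's set-based resemblance of two documents is the size of the intersection of their element sets divided by the size of their union. The sketch below applies that measure to XML structure and picks the nearest neighbors by resemblance. Treating each root-to-element tag path as a set element is an assumption made here for illustration; the abstract does not say how the paper derives its sets, and the function names and sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

def tag_paths(xml_text):
    """Return the set of root-to-element tag paths in an XML document."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths

def resemblance(a, b):
    """Set resemblance |A & B| / |A | B| (the Jaccard coefficient)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def nearest(query, corpus, k=1):
    """Names of the k corpus documents whose structure most resembles query."""
    q = tag_paths(query)
    ranked = sorted(
        corpus,
        key=lambda name: resemblance(q, tag_paths(corpus[name])),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical sample documents, for illustration only.
corpus = {
    "book": "<book><title/><author/></book>",
    "order": "<order><item><price/></item></order>",
}
query = "<book><title/><year/></book>"
```

A k-nearest neighbor classifier would then assign the query to the cluster most common among the documents returned by `nearest(query, corpus, k)`.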

Keyword Weight based Paragraph Extraction Algorithm (키워드 가중치 기반 문단 추출 알고리즘)

  • Lee, Jongwon;Joo, Sangwoong;Lee, Hyunju;Jung, Hoekyung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2017.10a / pp.504-505 / 2017
  • Existing morpheme analyzers classify the words used in written documents, and systems that extract sentences and paragraphs based on a morpheme analyzer are being developed. However, there are very few systems that condense documents and extract their important paragraphs. The algorithm proposed in this paper calculates the weights of the keywords in a document and extracts the paragraphs containing those keywords. Users can reduce the time needed to understand a document by reading only the paragraphs that contain the keywords instead of the entire document. In addition, since the number of extracted paragraphs differs according to the number of keywords used in the search, the user can search in more varied patterns than with existing systems.
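
The abstract does not state how keyword weights are computed, so the sketch below makes a simple assumption: a keyword's weight is its frequency in the whole document, and a paragraph's score is the sum of the weights of the keywords it contains. It also uses plain word splitting in place of a morpheme analyzer and handles single-word keywords only. All names are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase word tokens; a stand-in for a real morpheme analyzer."""
    return re.findall(r"\w+", text.lower())

def extract_paragraphs(document, keywords):
    """Return the paragraphs containing any keyword, highest score first."""
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
    all_tokens = tokenize(document)
    # Assumed weighting: frequency of the keyword in the whole document.
    weights = {k.lower(): all_tokens.count(k.lower()) for k in keywords}
    scored = []
    for index, paragraph in enumerate(paragraphs):
        tokens = set(tokenize(paragraph))
        score = sum(w for k, w in weights.items() if k in tokens)
        if score > 0:
            scored.append((score, index, paragraph))
    # Sort by descending score, then by original position.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [paragraph for _, _, paragraph in scored]

# Hypothetical sample text, for illustration only.
sample = (
    "Search engines index documents.\n\n"
    "The weather was mild.\n\n"
    "An index maps each keyword to documents, and a keyword query uses the index."
)
result = extract_paragraphs(sample, ["index", "keyword"])
```

Adding or removing keywords changes both the weights and the set of matching paragraphs, which is the behavior the abstract describes.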

Document Ranking of Web Document Retrieval Systems (웹 정보검색 시스템의 문서 순위 결정)

  • An, Dong-Un;Kang, In-Ho
    • Journal of Information Management / v.34 no.2 / pp.55-66 / 2003
  • The Web is rich with various sources of information. It contains the contents of documents, multimedia data, shopping materials, and so on. Because web document collections are massive and heterogeneous, users want to find various types of target pages. We can classify user queries into three categories according to users' intent: content search, site search, and service search. In this paper, we show that different strategies are needed to meet the needs of a user. We also show the properties of content information, link information, and URL information according to the class of a user query. In the content search, content information alone gave good results, and performance dropped when link information and URL information were combined with it. In the site search, combining link information and URL information increased performance.
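
One way to picture a query-class-dependent strategy is a weighted sum of the three evidence scores, with weights chosen per query class. The sketch below is an illustration only: the weight values, score ranges, and names are hypothetical, and the abstract does not say that the authors combine evidence this way. The service search class is left out because the abstract reports no finding for it.

```python
# Hypothetical weights: content search relies on content evidence alone,
# while site search also uses link and URL evidence.
WEIGHTS = {
    "content": {"content": 1.0, "link": 0.0, "url": 0.0},
    "site": {"content": 0.4, "link": 0.3, "url": 0.3},
}

def rank(candidates, query_class):
    """Order page ids by the weighted sum of their evidence scores."""
    weights = WEIGHTS[query_class]

    def score(page):
        return sum(weights[name] * page[name] for name in weights)

    return [page["id"] for page in sorted(candidates, key=score, reverse=True)]

# Hypothetical evidence scores in [0, 1], for illustration only.
pages = [
    {"id": "article", "content": 0.9, "link": 0.1, "url": 0.1},
    {"id": "homepage", "content": 0.5, "link": 0.9, "url": 0.9},
]
```

With these made-up numbers, `rank(pages, "content")` puts the article first and `rank(pages, "site")` puts the homepage first, mirroring the qualitative finding in the abstract.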

A Document Management System That Can Handle over Terabyte Order Data - An Integration of Self Organized Picture Search, 3D Graphics and DVD Changer Control Technology - (테라바이트급 데이터를 축적.검색표시할 수 있는 문서관리 시스템 - 3D 그래픽과 화상검색 및 DVD 체인저 제어기술의 융합 -)

  • Yoshihiro, Mori;Hiroyuki, Nitta;Mitsuji, Inoue;Koji, Kimura;Izuru, Shimamoto;Hiroharu, Ito;Atsushi, Kitamachi
    • Journal of Information Management / v.32 no.1 / pp.108-119 / 2001
  • Creating digital documents by scanning paper, using a digital camera, or using a computer is a daily task in every office. The number of digital documents is increasing at a high pace. Their quantity is almost beyond the maximum capacity of online storage and is degrading search efficiency. To solve these problems, we developed a document management system (ChronoStar) by integrating various search methods (Picture, Full-Text Related), 3D graphics, and a DVD changer.

Evaluation of Mobile Unified Search Contents of Naver and Google Korea (네이버와 구글의 모바일 통합 검색 컨텐츠 평가)

  • Park, So-Yeon
    • Journal of Korean Library and Information Science Society / v.42 no.4 / pp.263-280 / 2011
  • This study aims to investigate the current status of the mobile search services of Korean search portals and to analyze the mobile unified search contents of Naver and Google Korea. In particular, this study analyzed characteristics of mobile unified search such as the number of retrieved documents, collection distribution, and yearly distribution. Documents were also evaluated in terms of relevance, credibility, and currency. This study compared the quality of Naver's unified Web best and unified Web documents with that of Google's best Web documents and Web documents. The correlation between a document's ranking and its relevance was analyzed. The results of this study can be applied to portals' effective development of mobile search services.

Construction of Local Document Management System based on Associative Search

  • Kasagi, Yoshimasa;Yamaguchi, Toru;Takama, Yasufumi
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2003.09a / pp.146-149 / 2003
  • As the amount of information that can be collected from the web into a local database increases, we propose a system that suggests related local documents when a new document arrives. We also propose constructing an association dictionary using web search engines for similarity calculation. A prototype system has also been developed and is described in detail.

An Efficient Search Method For XML document

  • Qian, Xie;Cho, Dong-Sub
    • Proceedings of the Korea Information Processing Society Conference / 2011.04a / pp.1287-1290 / 2011
  • With the rapid development of the Internet, more and more documents are stored in XML-based formats. When there are a great many XML documents, how to obtain valuable information is an important subject. This paper proposes an effective XML document search method that searches both the text contents and the structures of XML documents. We build a keyword matrix for the text contents and structure matrices for the structures of XML documents to reduce query time. When there are a great many XML documents, the proposed search method can greatly improve query time.
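
The abstract does not define its matrices, so the following is a sketch under stated assumptions: a keyword matrix with one row per word and one column per document, and a structure matrix with one row per root-to-element tag path, both holding 0/1 entries. A query is answered by combining one row from each matrix instead of re-parsing the documents. The function names and sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

def build_matrices(docs):
    """Build 0/1 keyword-by-document and path-by-document matrices."""
    names = sorted(docs)
    keyword_rows, path_rows = {}, {}
    for col, name in enumerate(names):
        root = ET.fromstring(docs[name])
        stack = [(root, root.tag)]
        while stack:
            node, path = stack.pop()
            path_rows.setdefault(path, [0] * len(names))[col] = 1
            for word in (node.text or "").lower().split():
                keyword_rows.setdefault(word, [0] * len(names))[col] = 1
            stack.extend((child, path + "/" + child.tag) for child in node)
    return names, keyword_rows, path_rows

def search(names, keyword_rows, path_rows, keyword, path):
    """Documents that contain the keyword and also contain the tag path.

    Simplification: the keyword need not occur under that exact path.
    """
    empty = [0] * len(names)
    k_row = keyword_rows.get(keyword.lower(), empty)
    p_row = path_rows.get(path, empty)
    return [n for n, k, p in zip(names, k_row, p_row) if k and p]

# Hypothetical sample documents, for illustration only.
docs = {
    "a": "<lib><book><title>xml search</title></book></lib>",
    "b": "<lib><paper><title>xml index</title></paper></lib>",
}
names, keyword_rows, path_rows = build_matrices(docs)
```

The matrices are built once; each later query costs two row lookups and one pass over the columns, which is the kind of saving in query time the abstract refers to.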