• Title/Summary/Keyword: Document searching

Fiber Identification via the TISS and DELTA Systems (TISS system 및 DELTA system에 의한 섬유식별)

  • 전수경
    • Journal of the Korea Furniture Society / v.10 no.1 / pp.1-12 / 1999
  • Among the vast number of plant taxa in the world, wood is one of the most useful resources. Identifying the fibers of wood and pulp is important both for plant taxonomy and for practical uses, but information on these fibers is scarce, especially in computerized form. Fiber identification is a difficult task. Beyond plant taxonomy and the fiber-using industries, such identification is also important in many other fields, including education and document examination. For these purposes, fibers must be distinguished accurately. The TISS system, which I programmed to identify various woods, would also be useful for identifying fibers to genus and species from the features of unknown samples, and for retrieving the features of a species from its scientific name. Similar programs, which retrieve species names from the cell features of woody materials, are being developed in many other countries. Based on a survey of the available literature, the fiber features of 124 softwood and hardwood species were examined under electron and optical microscopes. Each species was coded and carded by feature, and databases were built. The micrographs were digitized with a slide film scanner and input into a personal computer program. A new program, TISS 2, was developed in the C language, and Korean fonts were added to it. TISS 2 can be used to add and search images of fiber features for both known and unknown fibers. The databases were also coded for the DELTA system, which was developed by Dallwitz and Paine in Australia in 1986.
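
To make feature-based identification concrete, here is a minimal Python sketch of a DELTA-style multi-access lookup. Each species is coded as a set of allowed character states, and an unknown sample matches every species whose coded states agree with all observed ones. A reverse lookup returns the coded features for a species name. The character names, states, and species records are hypothetical placeholders, not data from the paper, and the sketch does not reproduce TISS 2 or the DELTA file format.

```python
from typing import Dict, List, Set

# Hypothetical coded records: character -> set of allowed states.
# Several allowed states mean the character varies within the species.
FeatureCodes = Dict[str, Set[str]]

DATABASE: Dict[str, FeatureCodes] = {
    "Species A": {"pit_type": {"bordered"}, "spiral_thickening": {"absent"}},
    "Species B": {"pit_type": {"bordered"}, "spiral_thickening": {"present"}},
    "Species C": {"pit_type": {"simple"}, "spiral_thickening": {"absent", "present"}},
}


def identify(observed: Dict[str, str], database: Dict[str, FeatureCodes]) -> List[str]:
    """Return species whose coded states are consistent with every observation.

    A character missing from a species record is treated as unknown and does
    not exclude that species."""
    matches = []
    for species, codes in database.items():
        if all(state in codes.get(char, {state}) for char, state in observed.items()):
            matches.append(species)
    return matches


def describe(species: str, database: Dict[str, FeatureCodes]) -> FeatureCodes:
    """Reverse lookup: the coded features recorded for a species name."""
    return database[species]


print(identify({"pit_type": "bordered"}, DATABASE))           # → ['Species A', 'Species B']
print(identify({"spiral_thickening": "present"}, DATABASE))   # → ['Species B', 'Species C']
```

Adding more observed characters narrows the candidate list, which is how a multi-access key differs from a fixed dichotomous key: the characters can be entered in any order.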

Composite Document Object Retrieval and Searching System-[IN2] DOR (복합문서 개체 검색 시스템- [IN2] DOR)

  • Ahn, Tae-Sung;Yim, Joong-Su;Kim, Myung-Hoon;Ahn, Woo-Ram;Lee, Kyung-Il
    • Annual Conference on Human and Language Technology / 2003.10d / pp.113-118 / 2003
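  • Conventional document retrieval systems simply extract the text from a document and then index and search that text. This paper presents a method for developing a system that analyzes, indexes, and searches document objects such as text, tables, images, charts, and videos in documents of various formats, including MS Word, Excel, and HWP. The proposed system converts a document's internal data structure into CDML (Composite Document Markup Language) and indexes and stores it, effectively overcoming the limitations of existing full-text retrieval systems. It also improves usability by automatically navigating to and highlighting the matched object within the document. In a performance evaluation, the system achieved average success rates above 97% for both CDML conversion and object retrieval across various document formats, and extracting objects directly from binary files yielded very high analysis and indexing speeds. The new document-retrieval paradigm introduced in this paper is expected to have a range of technical and commercial impacts.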
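
The key difference from plain full-text indexing is that each posting points to a document object rather than to a whole document, so a hit can be used to jump to and highlight that object. The Python sketch below illustrates this with a small inverted index. The `DocObject` record and its fields are hypothetical stand-ins for what a CDML conversion might produce; the abstract does not describe the CDML schema.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class DocObject:
    doc_id: str     # source file name (hypothetical)
    object_id: int  # position of the object inside its document
    kind: str       # "text", "table", "image", "chart", ...
    text: str       # searchable content: paragraph body, cell text, caption, ...


def tokenize(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())


class ObjectIndex:
    """Inverted index whose postings identify (document, object) pairs."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[Tuple[str, int]]] = defaultdict(set)
        self.objects: Dict[Tuple[str, int], DocObject] = {}

    def add(self, obj: DocObject) -> None:
        key = (obj.doc_id, obj.object_id)
        self.objects[key] = obj
        for term in tokenize(obj.text):
            self.postings[term].add(key)

    def search(self, query: str, kind: Optional[str] = None) -> List[DocObject]:
        """Objects containing every query term, optionally restricted to one kind."""
        terms = tokenize(query)
        if not terms:
            return []
        keys = set.intersection(*(self.postings.get(t, set()) for t in terms))
        hits = [self.objects[k] for k in sorted(keys)]
        return [o for o in hits if kind is None or o.kind == kind]


index = ObjectIndex()
index.add(DocObject("report.docx", 0, "text", "Quarterly sales summary"))
index.add(DocObject("report.docx", 1, "table", "Region sales 2003"))
index.add(DocObject("budget.xlsx", 0, "chart", "Sales by month"))
print([(o.doc_id, o.object_id) for o in index.search("sales", kind="table")])  # → [('report.docx', 1)]
```

Because each hit carries `object_id`, a viewer can scroll to and highlight that table or chart instead of only opening the file.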

Evaluation of Mobile Unified Search Contents of Naver and Google Korea (네이버와 구글의 모바일 통합 검색 컨텐츠 평가)

  • Park, So-Yeon
    • Journal of Korean Library and Information Science Society / v.42 no.4 / pp.263-280 / 2011
  • This study aims to investigate the current status of the mobile search services of Korean search portals and to analyze the mobile unified search content of Naver and Google Korea. In particular, it analyzes characteristics of mobile unified search such as the number of retrieved documents, the distribution across collections, and the distribution by year. Documents were also evaluated in terms of relevance, credibility, and currency. The study compared the quality of Naver's "unified Web best" and "unified Web" results with Google's "best Web documents" and "Web documents," and analyzed the correlation between a document's rank and its relevance. The results can help portals develop effective mobile search services.
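
The abstract does not say which correlation measure was used. One common choice for relating a result's rank position to a graded relevance judgement is Spearman's rank correlation, computed here as the Pearson correlation of the two rank vectors with average ranks for ties. The relevance scores below are made up purely for illustration.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 0-based positions i..j -> mean 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical data: result positions 1..6 and graded relevance (0 to 2).
positions = [1, 2, 3, 4, 5, 6]
relevance = [2, 2, 1, 1, 0, 0]
print(round(spearman(positions, relevance), 3))  # → -0.956
```

A strongly negative value means relevance falls as the position number grows, that is, the engine tends to place relevant documents near the top.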

Dynamic index storage and integrated searching service development (동적 색인 스토리지 및 통합 검색 서비스 개발)

  • Lee, Wang-Woo;Lee, Seok-Hyoung;Choe, Ho-Seop;Yoon, Hwa-Mook;Kim, Jong-Hwan;Hur, Yoon-Young
    • Proceedings of the Korea Contents Association Conference / 2007.11a / pp.346-349 / 2007
  • This paper introduces an integrated search system built for a web news and review retrieval service. We built XSLTRobot, which extracts the title, date, author, and content from HTML documents such as news articles and reviews for the search service. XSLTRobot uses XSLT to extract the desired parts of an HTML page. The Integrated Information Retrieval System (IIRS) supports various search data formats. We also introduce Dynamic Index Storage, a module of IIRS, which is used in environments that need fast index updates, such as news. Its design focuses on retrieval performance because relatively few documents need to be updated in real time.
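
The abstract gives no internals for Dynamic Index Storage, so the following is only a generic Python sketch of the requirement it states: an index in which additions, replacements, and deletions take effect immediately, as a news feed needs, rather than through periodic batch rebuilds. The `DynamicIndex` class and its method names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> Set[str]:
    return set(re.findall(r"\w+", text.lower()))


class DynamicIndex:
    """In-memory inverted index whose postings are updated in place."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = defaultdict(set)
        self.doc_terms: Dict[str, Set[str]] = {}  # forward index, for cheap removal

    def upsert(self, doc_id: str, text: str) -> None:
        """Add a document, or replace it if it is already indexed."""
        self.remove(doc_id)
        terms = tokenize(text)
        self.doc_terms[doc_id] = terms
        for term in terms:
            self.postings[term].add(doc_id)

    def remove(self, doc_id: str) -> None:
        for term in self.doc_terms.pop(doc_id, set()):
            self.postings[term].discard(doc_id)
            if not self.postings[term]:
                del self.postings[term]

    def search(self, query: str) -> List[str]:
        """Sorted ids of documents containing every query term."""
        terms = tokenize(query)
        if not terms:
            return []
        return sorted(set.intersection(*(self.postings.get(t, set()) for t in terms)))


idx = DynamicIndex()
idx.upsert("news-1", "Typhoon approaches the coast")
idx.upsert("news-2", "Election results announced")
idx.upsert("news-1", "Typhoon weakens before landfall")  # correction replaces the old text
print(idx.search("coast"))    # → []
print(idx.search("typhoon"))  # → ['news-1']
```

Keeping a forward index (`doc_terms`) costs memory but lets an update touch only the postings of the changed article, which suits a workload with frequent small updates.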

A Way to Speed up Evaluation of Path-oriented Queries using An Abbreviation-paths and An Extendible Hashing Technique (단축-경로와 확장성 해싱 기법을 이용한 경로-지향 질의의 평가속도 개선 방법)

  • Park Hee-Sook;Cho Woo-Hyun
    • The KIPS Transactions:PartD / v.11D no.7 s.96 / pp.1409-1416 / 2004
  • With the popularity and explosive growth of the Internet, information exchange over it is increasing dramatically, and XML is becoming both a standard and a major tool for data exchange on the Internet. Speeding up the evaluation of path-oriented queries is therefore a key issue in retrieving XML documents. In this paper, we propose a new indexing technique that improves the search performance of path-oriented queries in document databases. The technique generates an abbreviation-path file for evaluating path-oriented queries efficiently, and the hash-code values of its entries can be used as index keys. The technique can be further enhanced by combining the abbreviation-path file with Extendible Hashing to speed up retrieval.
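
Extendible hashing keeps a directory of 2^d pointers into fixed-capacity buckets and doubles the directory only when an overflowing bucket already uses all d bits. The Python sketch below applies it to path strings used as keys. It assumes plain full paths, because the abstract does not describe the paper's abbreviation-path encoding, and it uses `zlib.crc32` as an arbitrary stand-in hash function.

```python
import zlib
from typing import Any, Dict, List, Optional


class Bucket:
    def __init__(self, depth: int, capacity: int) -> None:
        self.depth = depth        # local depth: hash bits this bucket distinguishes
        self.capacity = capacity
        self.items: Dict[str, Any] = {}


class ExtendibleHash:
    """Extendible hash table indexed by the low-order bits of a 32-bit hash.

    Sketch only: no deletion, and no overflow handling for more than
    `capacity` keys sharing an identical 32-bit hash."""

    def __init__(self, capacity: int = 4) -> None:
        self.global_depth = 1
        self.directory: List[Bucket] = [Bucket(1, capacity), Bucket(1, capacity)]

    @staticmethod
    def _hash(key: str) -> int:
        return zlib.crc32(key.encode("utf-8"))  # stand-in hash function

    def _slot(self, key: str) -> int:
        return self._hash(key) & ((1 << self.global_depth) - 1)

    def get(self, key: str) -> Optional[Any]:
        return self.directory[self._slot(key)].items.get(key)

    def put(self, key: str, value: Any) -> None:
        bucket = self.directory[self._slot(key)]
        if key in bucket.items or len(bucket.items) < bucket.capacity:
            bucket.items[key] = value
            return
        self._split(bucket)
        self.put(key, value)  # retry; may split again if keys still collide

    def _split(self, bucket: Bucket) -> None:
        if bucket.depth == self.global_depth:
            # Double the directory: slots i and i + 2**d share a bucket.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.depth += 1
        sibling = Bucket(bucket.depth, bucket.capacity)
        new_bit = 1 << (bucket.depth - 1)
        # Slots that pointed at the old bucket and have the new bit set move over.
        for i, b in enumerate(self.directory):
            if b is bucket and i & new_bit:
                self.directory[i] = sibling
        # Redistribute the old entries under the finer split.
        old_items, bucket.items = bucket.items, {}
        for k, v in old_items.items():
            self.directory[self._slot(k)].items[k] = v


index = ExtendibleHash(capacity=2)
paths = [f"/library/book[{i}]/title" for i in range(20)]
for node_id, path in enumerate(paths):
    index.put(path, node_id)
print(index.get("/library/book[7]/title"))  # → 7
```

A lookup costs one hash and one directory access regardless of how many paths are stored, and growth happens one bucket at a time instead of rehashing the whole table.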

Text-Mining Analyses of News Articles on Schizophrenia (조현병 관련 주요 일간지 기사에 대한 텍스트 마이닝 분석)

  • Nam, Hee Jung;Ryu, Seunghyong
    • Korean Journal of Schizophrenia Research / v.23 no.2 / pp.58-64 / 2020
  • Objectives: In this study, we conducted an exploratory analysis of current media coverage of schizophrenia using text-mining methods. Methods: First, web crawling was used to extract text from 575 news articles published in 10 major newspapers between 2018 and 2019, selected by searching for "schizophrenia" on Naver News. After pre-processing, we built a document-term matrix (DTM) and a term-document matrix (TDM). Using the DTM and TDM, we performed frequency analysis, co-occurrence network analysis, and topic modeling. Results: Frequency analysis showed that keywords such as "police," "mental illness," "admission," "patient," "crime," "apartment," "lethal weapon," "treatment," "Jinju," and "residents" were frequently mentioned in news articles on schizophrenia. Within the article text, many of these keywords were highly correlated with the term "schizophrenia" and were also interconnected in the co-occurrence network. The latent Dirichlet allocation (LDA) model yielded 10 topics, each a combination of keywords, including "police-Jinju," "hospital-admission," "research-finding," "care-center," "schizophrenia-symptom," "society-issue," "family-mind," "woman-school," and "disabled-facilities." Conclusion: The results highlight that in recent years the media have been reporting on violence involving patients with schizophrenia, raising important issues about the hospitalization and community management of these patients.
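
The DTM, frequency, and co-occurrence steps in the Methods can be sketched in a few lines of Python. This toy example uses three invented, already-tokenized English snippets; real Korean news text would first need morphological analysis and stop-word removal, which is not shown, and the LDA step is omitted.

```python
from collections import Counter
from itertools import combinations
from typing import List, Tuple


def build_dtm(docs: List[List[str]]) -> Tuple[List[str], List[List[int]]]:
    """Return the sorted vocabulary and a document-term count matrix."""
    vocab = sorted({t for doc in docs for t in doc})
    col = {t: j for j, t in enumerate(vocab)}
    dtm = [[0] * len(vocab) for _ in docs]
    for i, doc in enumerate(docs):
        for t in doc:
            dtm[i][col[t]] += 1
    return vocab, dtm


def transpose(matrix: List[List[int]]) -> List[List[int]]:
    """The term-document matrix (TDM) is the transpose of the DTM."""
    return [list(row) for row in zip(*matrix)]


def term_frequencies(vocab: List[str], dtm: List[List[int]]) -> Counter:
    """Corpus-wide count of each term (column sums of the DTM)."""
    return Counter({t: sum(row[j] for row in dtm) for j, t in enumerate(vocab)})


def cooccurrence(docs: List[List[str]]) -> Counter:
    """Number of documents in which each unordered pair of distinct terms co-occurs."""
    pairs: Counter = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            pairs[(a, b)] += 1
    return pairs


# Invented, pre-tokenized snippets for illustration only.
docs = [
    ["police", "patient", "crime"],
    ["patient", "treatment", "hospital"],
    ["police", "crime", "patient"],
]
vocab, dtm = build_dtm(docs)
print(term_frequencies(vocab, dtm).most_common(1))  # → [('patient', 3)]
print(cooccurrence(docs)[("crime", "police")])      # → 2
```

The co-occurrence counts are the edge weights of a co-occurrence network: each term is a node, and pairs that appear together in many articles are joined by heavier edges.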

Experiment of Searching Candidate Text Pair for Searching Similar Texts among Massive Document Repository (대용량 문서 집합에서 유사문서 탐색을 위한 후보 문서 쌍 검색 실험)

  • Park, Sun-Young;Chung, Woo-Keun;Cho, Hwan-Gue
    • Proceedings of the Korean Information Science Society Conference / 2010.06c / pp.275-278 / 2010
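  • As issues related to document plagiarism have surged, research on finding similar documents has become very active. In particular, because the Internet gives ordinary users easy access to vast numbers of electronic documents, the speed and accuracy of searching large document collections have become increasingly important. One way to find similar documents quickly in a large collection is to use a global dictionary to extract candidate document pairs (pairs of documents likely to be similar) and then run a detailed comparison only on those pairs, which reduces inspection time. This approach relies on a data structure called the global dictionary (Global DICtionary, GDIC) to find candidate documents, and using the GDIC effectively can further reduce the time needed to find candidate pairs. This paper describes a method that uses the GDIC more effectively to greatly reduce candidate-pair search time, and measures the resulting performance improvement experimentally. Experiments on a 20,000-document test corpus and 6,263 real report documents showed performance improvements of about 2.5% to 4.6% in GDIC generation and 1% to 15.4% in candidate-pair search. Future work will study ways to further reduce GDIC generation time by minimizing update queries.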
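
The abstract does not give the GDIC layout, so here is a generic Python sketch of the two-stage idea it describes. A global dictionary maps each word to the documents that contain it, candidate pairs are document pairs sharing at least `min_shared` sufficiently rare words, and only those pairs would go on to detailed comparison. The `max_df` and `min_shared` thresholds are hypothetical parameters chosen for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple


def build_global_dictionary(docs: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map every word in the repository to the set of documents containing it."""
    gdic: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, words in docs.items():
        for w in words:
            gdic[w].add(doc_id)
    return gdic


def candidate_pairs(gdic: Dict[str, Set[str]], max_df: int = 2,
                    min_shared: int = 2) -> List[Tuple[str, str]]:
    """Document pairs sharing at least `min_shared` words whose document
    frequency is at most `max_df`. Very common words are skipped because they
    would pair almost every document with every other one."""
    shared: Counter = Counter()
    for doc_ids in gdic.values():
        if len(doc_ids) > max_df:
            continue
        for pair in combinations(sorted(doc_ids), 2):
            shared[pair] += 1
    return sorted(p for p, n in shared.items() if n >= min_shared)


docs = {
    "d1": "the global dictionary speeds plagiarism screening".split(),
    "d2": "a global dictionary speeds plagiarism detection".split(),
    "d3": "the weather today is mild".split(),
    "d4": "the cat is the pet".split(),
}
gdic = build_global_dictionary(docs)
print(candidate_pairs(gdic))  # → [('d1', 'd2')]
```

Only the pairs returned here would be passed to an expensive similarity check, so the cost of the second stage scales with the number of candidates rather than with all n(n-1)/2 document pairs.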

A Proposal of Methods for Extracting Temporal Information of History-related Web Document based on Historical Objects Using Machine Learning Techniques (역사객체 기반의 기계학습 기법을 활용한 웹 문서의 시간정보 추출 방안 제안)

  • Lee, Jun;KWON, YongJin
    • Journal of Internet Computing and Services / v.16 no.4 / pp.39-50 / 2015
  • When retrieving information through a search engine, some users want documents that correspond to a specific time period. For example, a user who wants documents about the period before the 'Japanese invasions of Korea' may use 'Japanese invasions of Korea' as the query keyword. The search engine then returns all documents about the 'Japanese invasions of Korea' regardless of time period, which forces the user to do additional work. Moreover, for many history-related documents, the date a document was created differs from the time period its content describes. If the time period of a document's content can be extracted, it can provide useful information for retrieval and various other applications. In this paper, we therefore study how to extract the time period of Joseon-era historical documents, using historical literature on the Joseon era to infer the period that a document's content corresponds to. We define historical objects based on historical literature collected from the Web and confirm that the time period of a web document can feasibly be extracted with machine learning techniques. In addition, we propose and apply similarity filtering based on comparisons between historical objects. Finally, we evaluate the accuracy of the resulting temporal indexing and the improvement achieved.
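
The abstract does not name the machine learning method. As one possible illustration, the Python sketch below trains a multinomial naive Bayes classifier with add-one smoothing that maps the historical objects mentioned in a document (people, institutions, events) to a period label. The two period labels and the tiny training set are invented placeholders, and the paper's similarity filtering is not reproduced.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple


class PeriodClassifier:
    """Multinomial naive Bayes over historical-object mentions."""

    def __init__(self) -> None:
        self.class_counts: Counter = Counter()
        self.object_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab: Set[str] = set()

    def fit(self, examples: List[Tuple[List[str], str]]) -> None:
        for objects, period in examples:
            self.class_counts[period] += 1
            self.object_counts[period].update(objects)
            self.vocab.update(objects)

    def log_posterior(self, objects: List[str], period: str) -> float:
        """Unnormalized log P(period) + sum of log P(object | period)."""
        total_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[period] / total_docs)
        counts = self.object_counts[period]
        denom = sum(counts.values()) + len(self.vocab)  # add-one smoothing
        for obj in objects:
            if obj in self.vocab:  # objects never seen in training are ignored
                score += math.log((counts[obj] + 1) / denom)
        return score

    def predict(self, objects: List[str]) -> str:
        return max(self.class_counts, key=lambda p: self.log_posterior(objects, p))


# Toy training data; labels are relative to the Japanese invasions of 1592.
train = [
    (["King Sejong", "Hunminjeongeum", "Jiphyeonjeon"], "before_invasions"),
    (["King Sejong", "Jiphyeonjeon"], "before_invasions"),
    (["Yi Sun-sin"], "invasions_or_later"),
    (["Gwanghaegun", "Daedongbeop"], "invasions_or_later"),
]
clf = PeriodClassifier()
clf.fit(train)
print(clf.predict(["Hunminjeongeum"]))  # → before_invasions
print(clf.predict(["Yi Sun-sin"]))      # → invasions_or_later
```

In practice, the objects would come from an entity list built from historical literature, and the predicted label would be stored as a temporal index field alongside the document.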

An Efficient Preprocessing System for Searching Similar Texts among Massive Document Repository (대용량 문서 집합에서 유사 문서 탐색을 위한 효과적인 전처리 시스템의 설계)

  • Park, Sun-Young;Kim, Ji-Hun;Kim, Seon-Yeong;Kim, Hyung-Joon;Cho, Hwan-Gue
    • Journal of KIISE:Computing Practices and Letters / v.16 no.5 / pp.626-630 / 2010
  • Since plagiarism of papers has become an important social issue, systems for measuring the similarity between papers are needed. The speed and accuracy of such systems are key features, and many researchers are studying them. In this paper, we propose a preprocessing method that uses a 'Global Dictionary' model to improve system performance. The global dictionary contains information about all words in the document repository, and the system uses this model to find similar papers with little computation time. Our experiments showed that our filtering technique reduced a set of more than 20,000 documents to about 50 documents, demonstrating the effectiveness of the system.
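
To illustrate how word statistics held in a global dictionary can shrink a large repository to a short list before any expensive comparison, the Python sketch below records each word's document frequency. It then scores every repository document by the IDF-weighted words it shares with a query paper and keeps only the top `top_n`. The IDF weighting, the `top_n` cut-off, and the `GlobalDictionary` class are assumptions for this sketch; the abstract does not say how the paper's filter scores documents.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Set


def words(text: str) -> Set[str]:
    return set(re.findall(r"\w+", text.lower()))


class GlobalDictionary:
    """Document frequency of every word in the repository (hypothetical design)."""

    def __init__(self, repository: Dict[str, str]) -> None:
        self.doc_words = {doc_id: words(t) for doc_id, t in repository.items()}
        self.df = Counter(w for ws in self.doc_words.values() for w in ws)
        self.n_docs = len(repository)

    def idf(self, word: str) -> float:
        return math.log(self.n_docs / self.df[word]) if self.df[word] else 0.0

    def candidates(self, query_text: str, top_n: int = 50) -> List[str]:
        """Ids of the `top_n` documents sharing the most IDF weight with the query."""
        q = words(query_text)
        scores = {d: sum(self.idf(w) for w in q & ws) for d, ws in self.doc_words.items()}
        ranked = sorted((d for d, s in scores.items() if s > 0),
                        key=lambda d: (-scores[d], d))
        return ranked[:top_n]


repo = {
    "p1": "global dictionary filtering for plagiarism detection",
    "p2": "plagiarism detection with suffix trees",
    "p3": "crop yield forecasting with weather data",
    "p4": "survey of weather radar",
}
gd = GlobalDictionary(repo)
query = "fast plagiarism detection using a global dictionary"
print(gd.candidates(query))  # → ['p1', 'p2']
```

Words shared by every document get zero weight, so common vocabulary cannot make unrelated papers look similar; rare shared words dominate the score.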

A Study on the Improvement of Retrieval Efficiency Based on the CRFMD (공통기술표현포맷에 기반한 다매체자료의 검색효율 향상에 관한 연구)

  • Park, Il-Jong;Jeong, Ki-Tai
    • Journal of the Korean Society for information Management / v.23 no.3 s.61 / pp.5-21 / 2006
  • In recent years, theories of image and sound analysis have been proposed for use alongside text retrieval systems, and they have advanced quickly with the rapid growth in data processing speed. This study proposes a common representation format for multimedia documents (CRFMD) that combines images and text in a single data structure. It also shows that image classification on a given test set improves dramatically when text features are encoded together with image features. CRFMD may be applicable to other areas of multimedia document retrieval and processing, such as medical image retrieval, World Wide Web searching, and museum collection retrieval.
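
The abstract does not specify the CRFMD layout. A minimal way to picture images and text in a single data structure is to L2-normalize an image feature vector and a text feature vector separately and concatenate them, so that one classifier sees both. The Python sketch below does this and classifies with a nearest-centroid rule. The feature values, the `text_weight` parameter, and the classifier choice are illustrative assumptions, not the paper's method.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else [float(x) for x in v]


def combine(image_feats: Sequence[float], text_feats: Sequence[float],
            text_weight: float = 1.0) -> List[float]:
    """One vector: normalized image part followed by weighted, normalized text part."""
    return l2_normalize(image_feats) + [text_weight * x for x in l2_normalize(text_feats)]


def centroids(examples: Dict[str, List[List[float]]]) -> Dict[str, List[float]]:
    return {label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in examples.items()}


def classify(vec: Sequence[float], cents: Dict[str, List[float]]) -> str:
    return min(cents, key=lambda label: math.dist(vec, cents[label]))


# Hypothetical features: 2-d image descriptor, 2-d text counts ("x-ray", "painting").
train = {
    "medical": [combine([1.0, 1.0], [3, 0]), combine([1.0, 0.9], [2, 0])],
    "artwork": [combine([1.0, 1.0], [0, 3]), combine([0.9, 1.0], [0, 2])],
}
cents = centroids(train)
print(classify(combine([1.0, 1.0], [1, 0]), cents))  # → medical
```

Here the image descriptor [1.0, 1.0] is equally close to both classes, so the text part alone decides the label. This mirrors, in miniature, the abstract's claim that encoding text features alongside image features improves classification.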