• Title/Summary/Keyword: Document Databases

A Document Collection Method for More Accurate Search Engine (정확도 높은 검색 엔진을 위한 문서 수집 방법)

  • Ha, Eun-Yong;Gwon, Hui-Yong;Hwang, Ho-Yeong
    • The KIPS Transactions:PartA
    • /
    • v.10A no.5
    • /
    • pp.469-478
    • /
    • 2003
  • Internet information search engines using web robots visit servers connected to the Internet, periodically or non-periodically. They extract and classify the collected data according to their own methods and construct the databases that form the basis of web information search engines. These procedures are repeated very frequently on the Web. Many search engine sites operate this process strategically to become popular internet portal sites that provide users with ways to find information on the web. A web search engine contacts thousands of web servers to maintain its existing databases and navigates to collect data about newly connected web servers. These jobs, however, are decided and conducted by the search engines alone: they run web robots to collect data from web servers without any knowledge of the states of those servers. Each search engine issues many requests to, and receives responses from, web servers, which is one cause of increased traffic on the Internet. If each web server notified web robots of a summary of its public documents, and each web robot then ran its collecting operations against only the corresponding documents, unnecessary internet traffic would be eliminated, the accuracy of the data held by search engines would increase, and the processing overhead of web-related jobs on both web servers and search engines would decrease. In this paper, a monitoring system on the web server is designed and implemented, which monitors the states of documents on the web server, summarizes the changes of modified documents, and sends the summary information to web robots that want to get documents from that server. An efficient web robot for a web search engine is also designed and implemented, which uses the notified summary to get the corresponding documents from the web servers, extract index terms, and update its databases.
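The selective collection the abstract describes can be sketched in a few lines: the robot compares the server's published change summary against its own fetch history and requests only new or modified documents. The URLs, timestamps, and function names below are illustrative, not the paper's implementation.

```python
from datetime import datetime

# Hypothetical change summary published by a web server: URL -> last-modified
# time (analogous to a sitemap's <lastmod> entries).
server_summary = {
    "/docs/a.html": datetime(2003, 5, 1),
    "/docs/b.html": datetime(2003, 6, 15),
    "/docs/c.html": datetime(2003, 6, 20),
}

# The robot's record of when it last fetched each document.
robot_last_fetch = {
    "/docs/a.html": datetime(2003, 6, 1),
    "/docs/b.html": datetime(2003, 6, 1),
    # /docs/c.html is new: never fetched.
}

def select_documents_to_fetch(summary, last_fetch):
    """Return only documents that are new or modified since the last crawl."""
    to_fetch = []
    for url, modified in summary.items():
        fetched = last_fetch.get(url)
        if fetched is None or modified > fetched:
            to_fetch.append(url)
    return sorted(to_fetch)

print(select_documents_to_fetch(server_summary, robot_last_fetch))
# ['/docs/b.html', '/docs/c.html']
```

Only two of the three documents are requested; the unmodified one generates no traffic at all, which is the source of the bandwidth saving the abstract claims.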

Efficient Linear Path Query Processing using Information Retrieval Techniques for Large-Scale Heterogeneous XML Documents (정보 검색 기술을 이용한 대규모 이질적인 XML 문서에 대한 효율적인 선형 경로 질의 처리)

  • Park, Young-Ho;Han, Wook-Shin;Whang, Kyu-Young
    • Journal of KIISE:Databases
    • /
    • v.31 no.5
    • /
    • pp.540-552
    • /
    • 2004
  • We propose XIR-Linear, a novel method for processing partial match queries on large-scale heterogeneous XML documents using information retrieval (IR) techniques. XPath queries are written as path expressions on the tree structure representing an XML document, and an XPath query in its major form is a partial match query. The objective of XIR-Linear is to efficiently support this type of query for large-scale documents of heterogeneous schemas. XIR-Linear builds on the schema-level methods using relational tables and drastically improves their efficiency and scalability with an inverted index technique. The method indexes the labels in label paths as keywords in texts, and allows the label paths that match a query to be found far more efficiently than with the string matching used in conventional methods. We demonstrate the efficiency and scalability of XIR-Linear by comparing it with XRel and XParent using XML documents crawled from the Internet. The results show that XIR-Linear is more efficient than both XRel and XParent by several orders of magnitude for linear path expressions as the number of XML documents increases.
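The core idea of indexing labels in label paths as keywords can be sketched as follows: an inverted index maps each label to the label paths containing it, and a linear path query is answered by intersecting postings lists and then verifying the label order. The sample paths and function names are invented for illustration; this is not the XIR-Linear implementation.

```python
from collections import defaultdict

# Hypothetical label paths extracted from heterogeneous XML documents.
label_paths = [
    "/library/book/title",
    "/library/book/author/name",
    "/catalog/item/title",
]

# Inverted index: label -> set of label-path ids containing that label.
inverted = defaultdict(set)
for pid, path in enumerate(label_paths):
    for label in path.strip("/").split("/"):
        inverted[label].add(pid)

def match_linear_path(query):
    """Find label paths ending with the query path, e.g. 'book/title'."""
    labels = query.strip("/").split("/")
    # Intersect postings lists, then verify the linear order with a suffix check.
    candidates = set.intersection(*(inverted[l] for l in labels))
    suffix = "/" + "/".join(labels)
    return [label_paths[pid] for pid in sorted(candidates)
            if label_paths[pid].endswith(suffix)]

print(match_linear_path("book/title"))   # ['/library/book/title']
```

The intersection of short postings lists prunes almost all candidates before any string comparison, which is where the speedup over whole-path string matching comes from.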

Partitioning and Merging an Index for Efficient XML Keyword Search (효율적 XML 키워드 검색을 위한 인덱스 분할 및 합병)

  • Kim, Sung-Jin;Lee, Hyung-Dong;Kim, Hyoung-Joo
    • Journal of KIISE:Databases
    • /
    • v.33 no.7
    • /
    • pp.754-765
    • /
    • 2006
  • In XML keyword search, a search result is defined as a set of the smallest elements (i.e., least common ancestors) containing all query keywords, and the granularity of indexing is an XML element instead of a document. Under the conventional index structure, all least common ancestors produced by combinations of elements, each containing a query keyword, are considered part of the search result. In this paper, to avoid unnecessary operations in producing least common ancestors and to reduce query processing time, we describe a way to construct a partitioned index composed of several partitions and to produce a search result by merging those partitions when necessary. When the search result is restricted to least common ancestors whose depths are greater than a given minimum depth, search systems using the proposed partitioned index structure can reduce query processing time by considering only combinations of elements belonging to the same partition. Even when the minimum depth is not given or is unknown, search systems can obtain the search result with the partitioned index in the same query processing time as with a non-partitioned index. Our experiments were conducted with the XML documents provided by the DBLP site and INEX2003, and the partitioned index reduced query processing time substantially when a minimum depth was given.
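The least-common-ancestor computation at the heart of this style of XML keyword search can be sketched with Dewey-style element IDs, where an element's ID is the path of child positions from the root and the LCA of two elements is their longest common prefix. The element IDs and keyword hits below are invented for illustration.

```python
# Hypothetical Dewey-style element IDs: the LCA of two elements is the
# longest common prefix of their IDs.
def lca(dewey_a, dewey_b):
    """Least common ancestor of two elements identified by Dewey IDs."""
    prefix = []
    for a, b in zip(dewey_a, dewey_b):
        if a != b:
            break
        prefix.append(a)
    return tuple(prefix)

# Elements containing the query keywords "xml" and "index" (illustrative).
xml_hits = [(1, 2, 1), (1, 3)]
index_hits = [(1, 2, 4), (2, 1)]

# Candidate answers: LCAs of every element pair, one element per keyword.
answers = {lca(a, b) for a in xml_hits for b in index_hits}

# With a minimum depth of 2, shallow LCAs (here, the root and level 1)
# can be discarded -- the combinations that a depth-based partitioned
# index would avoid generating in the first place.
deep_answers = {a for a in answers if len(a) >= 2}
print(sorted(deep_answers))  # [(1, 2)]
```

Under a depth-partitioned index, element pairs whose LCA cannot reach the minimum depth live in different partitions, so the filtered combinations are never enumerated at all.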

Risk of Breast Cancer and Total Malignancies in Rheumatoid Arthritis Patients Undergoing TNF-α Antagonist Therapy: a Meta-analysis of Randomized Control Trials

  • Liu, Yang;Fan, Wei;Chen, Hao;Yu, Ming-Xia
    • Asian Pacific Journal of Cancer Prevention
    • /
    • v.15 no.8
    • /
    • pp.3403-3410
    • /
    • 2014
  • Context: Interest exists in whether TNF-α antagonists increase the risk of breast cancer and total malignancies in patients with rheumatoid arthritis (RA). Objectives: To analyze the risk of malignancies, especially breast cancer, in patients with RA enrolled in randomized control trials (RCTs). Methods: A systematic literature search for RCTs published from 1 January 1998 to 1 July 2013 was conducted in online databases including PubMed, WILEY, EMBASE, ISI Web of Knowledge and the Cochrane Library. Included studies were RCTs that compared the safety of at least one dose of the five TNF-α antagonists with placebo or methotrexate (MTX) (or TNF-α antagonists plus MTX vs placebo plus MTX) in RA patients for more than 24 weeks; all references were imported into the document management software EndNote X6. Two independent reviewers selected studies and extracted data on study design, patients' characteristics, and the type and number of all malignancies. Results: 28 RCTs from 34 records with 11,741 patients were analyzed. Of the total, 97 patients developed at least one malignancy during the double-blind trials, and breast cancer was observed in 17 patients (17.5% of total malignancies). However, no statistically significant increased risk was observed in either the per protocol (PP) model (OR 0.65, 95%CI [0.22, 1.93]) or the modified intention to treat (mITT) model (OR 0.75, 95%CI [0.25, 2.21]). There was also no significant trend toward increased risk of total malignancies on anti-TNF-α therapy administered at approved doses in either model (OR 1.06, 95%CI [0.64, 1.75], and OR 1.30, 95%CI [0.80, 2.14], respectively). Of the two models, the mITT analysis led to higher estimates than the PP analysis. Conclusions: This study did not find a significantly increased risk of breast cancer or total malignancies in adult RA patients treated with TNF-α antagonists at approved doses. However, it cannot be ignored that more patients developed malignancies with TNF-α antagonist therapy than with placebo or MTX, despite the lack of statistical significance, so stricter clinical trials and long-term follow-up are needed, and both mITT and PP analyses should be used in such safety analyses.
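The odds ratios with 95% confidence intervals reported above can be illustrated with a standard 2x2-table calculation (Woolf's logit method). The counts below are invented for demonstration and are not data from the meta-analysis.

```python
import math

# Illustrative 2x2 table (not data from the paper):
# malignancy events vs. arm totals in treated and control arms.
treated_events, treated_total = 12, 800
control_events, control_total = 5, 400

def odds_ratio_ci(a, n1, c, n2, z=1.96):
    """Odds ratio with a 95% CI (Woolf's method) for a 2x2 table."""
    b, d = n1 - a, n2 - c          # non-events in each arm
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

or_, lo, hi = odds_ratio_ci(treated_events, treated_total,
                            control_events, control_total)
print(f"OR {or_:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

A confidence interval that contains 1.0, as in the paper's PP and mITT results, is what "no statistically significant increased risk" means here.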

Semantic Query Expansion based on Concept Coverage of a Deep Question Category in QA systems (질의 응답 시스템에서 심층적 질의 카테고리의 개념 커버리지에 기반한 의미적 질의 확장)

  • Kim Hae-Jung;Kang Bo-Yeong;Lee Sang-Jo
    • Journal of KIISE:Databases
    • /
    • v.32 no.3
    • /
    • pp.297-303
    • /
    • 2005
  • When confronted with a query, question answering systems endeavor to extract the most exact answers possible by determining the answer type that fits the key terms used in the query. However, the efficacy of such systems is limited by the fact that the terms used in a query may appear in a syntactic form different from that of the same words in a document. In this paper, we present an efficient semantic query expansion methodology based on a question category concept list comprised of terms that are semantically close to the terms used in a query. The semantically close terms may be hypernyms, synonyms, or terms in a different syntactic category. The proposed system constructs a concept list for each question type and then builds the concept list for each question category using a learning algorithm. In question answering experiments on 42,654 Wall Street Journal documents of the TREC collection, the traditional system achieved an MRR of 0.223, while the proposed system achieved an MRR of 0.50, superior to the traditional question answering system. The results of these experiments suggest the promise of the proposed method.
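MRR (mean reciprocal rank), the evaluation measure behind the 0.223 and 0.50 figures above, averages the reciprocal of the rank at which the first correct answer appears for each query. The ranks below are illustrative, not the paper's data.

```python
# Mean Reciprocal Rank: average of 1/rank of the first correct answer,
# counting 0 for queries with no correct answer returned.
def mrr(first_correct_ranks):
    """first_correct_ranks: rank of the first correct answer per query,
    or None if no correct answer was returned."""
    total = sum(1.0 / r for r in first_correct_ranks if r is not None)
    return total / len(first_correct_ranks)

# Four queries: correct answer at ranks 1, 2, 4, and not found.
print(mrr([1, 2, 4, None]))  # (1 + 0.5 + 0.25 + 0) / 4 = 0.4375
```

An MRR of 0.50 thus roughly corresponds to the first correct answer appearing at rank 2 on average, versus below rank 4 for an MRR of 0.223.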

Semantic Search System using Ontology-based Inference (온톨로지기반 추론을 이용한 시맨틱 검색 시스템)

  • Ha Sang-Bum;Park Yong-Tack
    • Journal of KIISE:Software and Applications
    • /
    • v.32 no.3
    • /
    • pp.202-214
    • /
    • 2005
  • The semantic web is a web paradigm that represents not merely links between documents but the semantics of, and relations between, documents; in addition, it enables software agents to understand the semantics of documents. We propose a semantic search based on inference with ontologies, which has the following characteristics. First, our search engine enables retrieval by reasoning over explicit ontologies even when a search keyword differs from the terms in the documents. Second, even when concepts in two ontologies do not match exactly, similar results can be found through a rule-based translator and ontological reasoning. Third, our approach increases accuracy and precision by using explicit ontologies to reason about the meanings of documents rather than guessing meanings from keywords alone. Fourth, the domain ontology enables users to pose more detailed queries through an ontology-based automated query generator whose search scope and accuracy are similar to NLP. Fifth, it enables agents to automatically search not only for documents matching a keyword but also for user-preferred information and knowledge from the ontologies. It can perform search more accurately than current retrieval systems, which send queries to databases or rely on keyword matching. We demonstrate that our system, which uses inference over explicit ontologies, can perform better than keyword matching approaches.
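One of the simplest forms of the ontological reasoning described above is transitive closure over a subclass relation: a query for a concept also matches documents about any of its (direct or indirect) subclasses. The tiny taxonomy and function below are invented for illustration and are not the paper's system.

```python
# Hypothetical taxonomy: concept -> its direct superclass (subClassOf).
subclass_of = {
    "SportsCar": "Car",
    "Car": "Vehicle",
    "Truck": "Vehicle",
}

def subclasses(concept):
    """All concepts that are (transitively) subclasses of `concept`."""
    result = set()
    changed = True
    while changed:  # iterate to a fixed point over the subclass relation
        changed = False
        for sub, sup in subclass_of.items():
            if (sup == concept or sup in result) and sub not in result:
                result.add(sub)
                changed = True
    return result

print(sorted(subclasses("Vehicle")))  # ['Car', 'SportsCar', 'Truck']
```

A keyword matcher searching for "Vehicle" would miss a document that only says "SportsCar"; inference over the explicit ontology recovers it.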

Access Control of XML Documents Including Update Operators (갱신 연산을 고려한 XML문서의 접근제어)

  • Lim Chung-Hwan;Park Seog
    • Journal of KIISE:Databases
    • /
    • v.31 no.6
    • /
    • pp.567-584
    • /
    • 2004
  • As XML becomes popular as a way of presenting information on the web, how to secure XML data becomes an important issue. So far, studies on XML security have focused on securing data communications with digital signature or encryption technology. But as XML data becomes more complicated and bigger, it is now necessary not only to secure XML data in transit but also to manage the query processes that access XML data. XML data queries can be managed with access control techniques; current XML access control, however, deals only with read operations and has no way to process XML update queries. In this paper, we present an XML access control model and technique that support both read and update operations, and we propose operations for XML document update. We also define the action type as a new concept for managing authorization information and processing update queries. This both minimizes the number of access control steps and reduces memory cost. In addition, queries that have no access rights to the XML data can be filtered out, which removes the unnecessary work of processing unauthorized queries. The performance evaluation shows that our access control model is better than other access control models for update queries, although it incurs a little overhead for deciding the action type in select queries.
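The query-filtering step described above can be sketched as a lookup over granted authorizations before any XML processing takes place. The authorization table, subjects, paths, and action names below are invented for illustration; they are not the paper's model.

```python
# Hypothetical authorization table: (subject, XML path prefix) -> permitted
# actions. A query is rejected up front unless some grant covers it.
authorizations = {
    ("alice", "/doc/public"): {"read", "update"},
    ("bob",   "/doc/public"): {"read"},
}

def is_authorized(subject, path, action):
    """Filter step: accept a query only if a grant covers its target path
    and requested action; unauthorized queries are never executed."""
    for (subj, granted_path), actions in authorizations.items():
        if subj == subject and path.startswith(granted_path) and action in actions:
            return True
    return False

print(is_authorized("alice", "/doc/public/section", "update"))  # True
print(is_authorized("bob", "/doc/public/section", "update"))    # False
```

Rejecting "bob"'s update before query evaluation is what saves the unnecessary processing the abstract mentions.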

The Study of Information Strategy Plan to Design OASIS' Future Model (오아시스(전통의학정보포털)의 미래모형 설계를 위한 정보화전략계획 연구)

  • Yea, Sang-Jun;Kim, Chul;Kim, Jin-Hyun;Kim, Sang-Kyun;Jang, Hyun-Chul;Kim, Ik-Tae;Jang, Yun-Ji;Seong, Bo-Seok;Song, Mi-Young
    • Korean Journal of Oriental Medicine
    • /
    • v.17 no.2
    • /
    • pp.63-71
    • /
    • 2011
  • Objectives : We studied a five-year ISP (information strategy plan) for OASIS. From this study we aimed to produce a total road map for upgrading the service systematically and carrying out the related projects. If pursued according to this road map, OASIS will be a core infrastructure service contributing to the improvement of TKM (traditional Korean medicine) research capability. Methods : We carried out a three-step ISP method composed of environmental analysis, current status analysis, and future planning. We used papers, reports, and trend analysis documents as base materials, and conducted a survey to gather opinions from users and TKM experts. We limited this study to drawing up the conceptual design of OASIS. Results : From the environmental analysis we found that China and the USA have built up the largest TM databases. We conducted a survey on ways to activate OASIS, and benchmarked advanced services through the current status analysis. Finally, we determined 'maximize the research value based on the open TKM knowledge infra' as the vision of OASIS, and designed a future system for OASIS composed of a service layer, an application layer, and a contents layer. Conclusion : First, TKM-related documents, research materials, researcher information, and standards are merged to elevate the level of TKM information. Concretely, large-scale TKM information infrastructure projects, such as TKM information classification code development, TKM library network building, and CAM research information offering, are carried out at the same time.

Implementing Space-based Networked Documentation for Donghae-Nambu Railway Areas in Busan Metropolitan City (공간 중심의 연계형 기록화의 실행 방안 부산지역 동해남부선을 사례로)

  • Seol, Moon-won;Kim, Jeong-hyeon
    • The Korean Journal of Archival Studies
    • /
    • no.36
    • /
    • pp.233-269
    • /
    • 2013
  • This study aims to explore practicable and sustainable strategies for locality documentation through the networking, linking, and recontextualization of records in digital environments. It tentatively applies the spanDoc (Space-based Networked Documentation) Model to document the Donghae-Nambu Railway areas in Busan Metropolitan City. Considering that mobility and openness are the main characteristics of Busan, the railway areas and their stations are proper places for representing such localities of the city. Moreover, there are many experiences and memories of residents surrounding those areas, because the Donghae-Nambu Railway has been used as a short-distance transportation facility across the inner city of Busan since the 1930s. This study implements the documentation strategy for the selected space by following the procedures of the spanDoc Model. First, it develops the structure of the subjects by investigating the related information sources and archives from various collecting institutions. Second, it carries out records surveys to identify the essential record types for documenting the Donghae-Nambu Railway areas. Third, it describes the subjects and sub-subjects, and the entries of places, people, and subjects to be added to the dictionaries. Finally, it links entities such as subjects, records, and dictionary entries, and builds the databases for the inter-links and the systematic accumulation of the outputs of each step.

A Study of Development and Application of an Inland Water Body Training Dataset Using Sentinel-1 SAR Images in Korea (Sentinel-1 SAR 영상을 활용한 국내 내륙 수체 학습 데이터셋 구축 및 알고리즘 적용 연구)

  • Eu-Ru Lee;Hyung-Sup Jung
    • Korean Journal of Remote Sensing
    • /
    • v.39 no.6_1
    • /
    • pp.1371-1388
    • /
    • 2023
  • Floods are becoming more severe and frequent due to global warming-induced climate change, and water disasters are rising in Korea due to severe rainfall and wet seasons. This makes preventive climate change measures and efficient responses to water catastrophes crucial, and synthetic aperture radar (SAR) satellite imagery can help. To reflect the properties of domestic water bodies observed in Sentinel-1 satellite radar imagery, this research created 1,423 water body training datasets for individual water body regions along the Han and Nakdong river systems. We created a document with exact data annotation criteria for the many situations encountered. After the dataset was processed, U-Net, a deep learning model, was used to analyze the water body detection results. To validate water body monitoring on a national scale, the results of applying the trained model to water body locations not involved in the training process were also studied. The analysis showed that the model detected water bodies in the trained regions accurately (F1-Score: 0.987, Intersection over Union [IoU]: 0.955). Other domestic water body regions not used for training and evaluation showed similar accuracy (F1-Score: 0.941, IoU: 0.89). Both outcomes show that the model accurately detected water bodies in most areas, although small streams and shadowed areas remained problematic. This work should improve the surveillance of water resource change and disaster damage. Future studies will likely include more water body attribute datasets; such databases could help manage and monitor water bodies nationwide and shed light on misclassified regions.
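The F1-Score and IoU figures reported above are both computed from the pixel-wise confusion of a predicted water mask against a ground-truth mask. The tiny one-dimensional "masks" below are illustrative only; real evaluation runs over full image rasters.

```python
# F1-score and IoU for a binary water mask, the two metrics reported above.
def f1_and_iou(pred, truth):
    """Compute pixel-wise F1 and IoU from two binary masks of equal length."""
    tp = sum(p and t for p, t in zip(pred, truth))        # water, correctly found
    fp = sum(p and not t for p, t in zip(pred, truth))    # false water pixels
    fn = sum(not p and t for p, t in zip(pred, truth))    # missed water pixels
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou = tp / (tp + fp + fn)
    return f1, iou

pred  = [1, 1, 1, 0, 0, 1]   # model output (illustrative)
truth = [1, 1, 0, 0, 1, 1]   # annotation  (illustrative)
f1, iou = f1_and_iou(pred, truth)
print(round(f1, 3), round(iou, 3))  # 0.75 0.6
```

IoU is always at or below F1 for the same confusion counts, which matches the reported pairs (0.987 vs. 0.955, 0.941 vs. 0.89).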