• Title/Summary/Keyword: databases


Implementation issues for Uncertain Relational Databases

  • Yu, Hairong;Ramer, Arthur
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 1998.06a / pp.128-133 / 1998
  • This paper presents ideas for implementing Uncertain Relational Databases (URD), which extend classical relational databases. First, our system uses possibility distributions and probability theory to represent and manipulate fuzzy and probabilistic information. Second, it adopts flexible mechanisms that manage uncertain data through the resources provided by available relational database management systems and front-end interfaces. Lastly, it uses dynamic SQL to enhance the versatility and adjustability of the system.

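The abstract above does not give the authors' schema or queries, so the following is only a minimal sketch of how dynamic SQL over possibility degrees might look on top of an ordinary relational DBMS. The table `fuzzy_value`, its columns, the sample rows, and the function `build_query` are all hypothetical. The sketch covers possibility degrees only (not probabilities) and takes the minimum degree as the fuzzy conjunction, which is a common convention rather than something the source states.

```python
import sqlite3


def build_query(conditions):
    """Assemble a parameterized SELECT at run time (hypothetical helper).

    conditions: list of (attribute, label, min_possibility) tuples that must
    all hold for an employee to be returned.
    """
    if not conditions:
        raise ValueError("at least one condition is required")
    clauses, params = [], []
    for attribute, label, alpha in conditions:
        clauses.append("(attribute = ? AND label = ? AND possibility >= ?)")
        params.extend([attribute, label, alpha])
    # A row group qualifies only if every condition matched once; the overall
    # degree is the minimum of the matched possibility degrees.
    sql = (
        "SELECT emp_id, MIN(possibility) AS degree "
        "FROM fuzzy_value WHERE " + " OR ".join(clauses) +
        " GROUP BY emp_id HAVING COUNT(*) = ? ORDER BY emp_id"
    )
    params.append(len(conditions))
    return sql, params


def demo():
    conn = sqlite3.connect(":memory:")
    # Assumed schema: one possibility degree per (employee, attribute, label).
    conn.execute(
        "CREATE TABLE fuzzy_value ("
        "emp_id INTEGER, attribute TEXT, label TEXT, possibility REAL, "
        "PRIMARY KEY (emp_id, attribute, label))"
    )
    rows = [
        (1, "age", "young", 0.9), (1, "salary", "high", 0.4),
        (2, "age", "young", 0.6), (2, "salary", "high", 0.8),
        (3, "age", "young", 0.2),
    ]
    conn.executemany("INSERT INTO fuzzy_value VALUES (?, ?, ?, ?)", rows)
    sql, params = build_query([("age", "young", 0.5), ("salary", "high", 0.3)])
    result = conn.execute(sql, params).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(demo())
```

Because the WHERE clause is generated from a list, a front end could add or drop conditions without changing the stored schema; values are always passed as bound parameters rather than spliced into the SQL string.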

An assessment of the taxonomic reliability of DNA barcode sequences in publicly available databases

  • Jin, Soyeong;Kim, Kwang Young;Kim, Min-Seok;Park, Chungoo
    • ALGAE / v.35 no.3 / pp.293-301 / 2020
  • DNA barcoding has a wide range of applications, such as taxonomic studies that help elucidate cryptic species and phylogenetic relationships, and the analysis of environmental samples for biodiversity monitoring and conservation assessments of species. After DNA barcode sequences are obtained, sequence similarity-based homology analysis is commonly used; that is, the obtained barcode sequences are compared against DNA barcode reference databases. This bioinformatic analysis implies that the overall quantity and quality of the reference databases must be stringently monitored so that they do not adversely affect the accuracy of species identification. With the development of next-generation sequencing techniques, a large number of DNA barcode sequences have been produced and stored in online databases, but their validity, accuracy, and reliability have not been extensively investigated. In this study, we investigated the amount and types of erroneous barcode sequences deposited in publicly accessible databases. Over 4.1 million sequences were investigated in three large-scale DNA barcode databases (NCBI GenBank, Barcode of Life Data System [BOLD], and Protist Ribosomal Reference database [PR2]) for four major DNA barcodes (cytochrome c oxidase subunit 1 [COI], internal transcribed spacer [ITS], ribulose bisphosphate carboxylase large chain [rbcL], and 18S ribosomal RNA [18S rRNA]); approximately 2% of the barcode sequences were found to be erroneous, and their taxonomic distributions were uneven. Our findings provide compelling evidence of data quality problems, along with insufficient and unreliable annotation of taxonomic data, in DNA barcode databases. We therefore suggest that if ambiguous taxa appear during barcoding analysis, further validation with other DNA barcode loci or morphological characters should be mandated.

Full-text databases as a means for resource sharing (자원공유 수단으로서의 전문 데이터베이스)

  • 노진구
    • Journal of Korean Library and Information Science Society / v.24 / pp.45-79 / 1996
  • Rising publication costs and declining financial resources have renewed librarians' interest in resource sharing. Although the idea of sharing resources is not new, there is a sense of urgency not seen in the past. Driven by rising publication costs and static, often shrinking budgets, librarians are embracing resource sharing as an idea whose time may finally have come. Resource sharing in electronic environments is shifting the concept of the library from a warehouse of print-based collections to a point of access to needed information. Much of the library's material will be delivered in electronic form, or printed. In this new paradigm, libraries cannot be expected to support research from their own collections. These changes, along with improved communications, computerization of administrative functions, fax and digital delivery of articles, and advances in data storage technologies, are improving the procedures and means for delivering needed information to library users. For resource sharing to be truly effective and efficient, however, automation and data communication are essential. This article examines the possibility of using full-text online databases as a supplement to interlibrary loan for document delivery. The findings of the study can be summarized as follows. First, the turnaround time and cost of getting a hard copy of a journal article from online full-text databases were comparable to those of other document delivery services. Second, the use of full-text online databases should be considered as a method for promoting interlibrary loan services, as it is more cost-effective and labour saving. Third, for full-text databases to work as a document delivery system, the databases must contain as many periodicals as possible and be loaded on as many systems as possible. Fourth, to include many scholarly research journals in full-text databases, guidelines are needed that cover electronic document delivery and electronic reserves. Fifth, for a database to be truly full-text, more advanced information technologies are needed.


Evaluation of the Redundancy in Decoy Database Generation for Tandem Mass Analysis (탠덤 질량 분석을 위한 디코이 데이터베이스 생성 방법의 중복성 관점에서의 성능 평가)

  • Li, Honglan;Liu, Duanhui;Lee, Kiwook;Hwang, Kyu-Baek
    • KIISE Transactions on Computing Practices / v.22 no.1 / pp.56-60 / 2016
  • Peptide identification in tandem mass spectrometry is usually done by searching the spectra against target databases consisting of reference protein sequences. To control false discovery rates for high-confidence peptide identification, spectra are also searched against decoy databases constructed by permuting the reference protein sequences. In this case, a peptide with the same sequence can appear in both the target and the decoy databases, or the same peptide can appear multiple times in the decoy database. These phenomena complicate protein identification, so it is important to minimize the number of such redundant peptides for accurate protein identification. In this regard, we examined two popular methods for decoy database generation: 'pseudo-shuffling' and 'pseudo-reversing'. We experimented with target databases of varying sizes and investigated the effect of the maximum number of missed cleavage sites allowed in a peptide (MC), which is one of the parameters for target and decoy database generation. In our experiments, the level of redundancy in decoy databases was proportional to the target database size and the value of MC, due to the increase in the number of short peptides (7 to 10 AA). Moreover, 'pseudo-reversing' always generated decoy databases with lower levels of redundancy than 'pseudo-shuffling'.
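The abstract does not define the two decoy methods in detail, so the sketch below assumes the commonly described convention: the protein is digested in silico with trypsin (cleaving after K or R, with the proline exception ignored here for brevity), and each peptide's residues are reversed or shuffled while its C-terminal K/R stays in place. All function names are hypothetical, and this is an illustration of the idea of counting redundant decoy peptides, not the authors' implementation.

```python
import random
import re


def tryptic_digest(protein, missed_cleavages=0):
    """Split after K/R and join up to `missed_cleavages` adjacent pieces."""
    pieces = re.findall(r"[^KR]*[KR]|[^KR]+$", protein)
    peptides = []
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


def pseudo_reverse(peptide):
    """Reverse the peptide but keep a C-terminal K/R fixed (assumed rule)."""
    if peptide and peptide[-1] in "KR":
        return peptide[:-1][::-1] + peptide[-1]
    return peptide[::-1]


def pseudo_shuffle(peptide, rng):
    """Shuffle the peptide but keep a C-terminal K/R fixed (assumed rule)."""
    if peptide and peptide[-1] in "KR":
        body, tail = list(peptide[:-1]), peptide[-1]
    else:
        body, tail = list(peptide), ""
    rng.shuffle(body)
    return "".join(body) + tail


def redundancy(target_peptides, decoy_peptides):
    """Count decoys that also occur in the target set, and repeated decoys."""
    targets = set(target_peptides)
    shared = sum(1 for pep in decoy_peptides if pep in targets)
    duplicates = len(decoy_peptides) - len(set(decoy_peptides))
    return shared, duplicates


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the shuffle is repeatable
    targets = tryptic_digest("ABAKCDEKGGR", missed_cleavages=1)
    reversed_decoys = [pseudo_reverse(p) for p in targets]
    shuffled_decoys = [pseudo_shuffle(p, rng) for p in targets]
    print("pseudo-reversing :", redundancy(targets, reversed_decoys))
    print("pseudo-shuffling :", redundancy(targets, shuffled_decoys))
```

Short or low-complexity peptides such as `ABAK` or `GGR` map onto themselves under either transformation, which gives some intuition for why the abstract attributes redundancy to the growing number of short peptides.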

Topological Consistency for Collapse Operator on Multi-Scale Databases (다중축척 공간 데이터베이스에서 축소연산자를 위한 위상 일관성)

  • 권오제;강혜경;이기준
    • Proceedings of the Korean Association of Geographic Information Studies Conference / 2004.10a / pp.27-40 / 2004
  • When we derive multi-scale databases from a source spatial database, the geometries and topological relations in the source database are transformed according to a predefined set of constraints. The derived databases should therefore be checked, during construction or update, to see whether the constraints are respected, so that the consistency of the multi-scale databases is maintained. In this paper, we focus on the topological consistency between the source and derived databases, which is one of the important constraints to respect. In particular, we deal with a method for assessing topological consistency when 2-dimensional objects are collapsed to 1-dimensional ones. We introduce eight types of topological relations between 2-dimensional objects and 19 between 1-dimensional objects, and propose four different strategies to convert 2-dimensional topological relations in the source database to 1-dimensional ones in the target database. With these strategies, we guarantee the topological consistency between multi-scale databases.


Development of middle-ware for integration of heterogeneous databases (이기종 데이터베이스 통합을 위한 미들웨어 개발)

  • Jung, Da-Un;Park, Si-Hyoung;Choo, Young-Yeol
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2012.05a / pp.101-103 / 2012
  • Applications store information in different databases according to their respective goals. However, to serve data from several databases with a single application program, the data in those databases must be integrated. This paper describes the development of an integrated middleware system that selects only the necessary information from heterogeneous databases and saves it.


A Study on User Interfaces: Focusing on Internet Patent Information Databases (이용자 인터페이스에 관한 연구-인터넷 특허정보 데이터베이스를 중심으로-)

  • 최경화;이란주
    • Journal of Korean Library and Information Science Society / v.29 / pp.213-239 / 1998
  • The purpose of this study is to promote more effective use of Internet patent information databases and to help develop a user-friendly interface design. Search functions, level of user-friendliness, and user support were closely analyzed for USPTO, QPAT-US, and IBM, as well as the relatively well-known, domestically developed Patrom. The factors used in this quality evaluation combine those of CD-ROM databases and traditional databases. The results are as follows. Patrom most effectively utilizes the characteristics of databases. USPTO provides the most adequate interface for a novice user. QPAT-US employs various auxiliary services, while IBM uses an assortment of search methods and images. In addition, the findings are expected to contribute to the development of user-friendly interfaces for databases.


BioStore: A Repository System for Registering and Distributing Public Biology Databases

  • Tae, Hong-Seok;Han, Jeong-Min;Ahn, Bu-Young;Park, Kie-Jung
    • Genomics & Informatics / v.7 no.1 / pp.49-51 / 2009
  • Although abundant biology data have accumulated in public biology databases, such as GenBank and PIR, few services offer users an easy interface for accessing or updating them. We have developed a system, named BioStore, composed of several programs that help users not only access public data but also share their own data easily. The service can be used to maintain a local database as a repository of raw data files from several public databases and to distribute the data files to other users. Currently, BioStore handles major bio-databases and will expand to include more databases and more useful interfaces.

Methods for Quality Control and Evaluation in the Scientific and Technical Bibliographic Databases (과학기술분야 서지 DB의 품질관리 및 평가 방안: KORDIC의 KRISTAL DB를 중심으로)

  • Lee, Jae-Whoan
    • Journal of the Korean Society for Library and Information Science / v.31 no.3 / pp.109-134 / 1997
  • This study discusses the quality of large scientific and technical (S&T) bibliographic databases in South Korea. In detail, it develops criteria for evaluating the quality of S&T bibliographic databases, evaluates the quality of two selected databases, the UNION DB and SATURN DB of KORDIC, and finally suggests both organizational and technical methods for improving the quality of such bibliographic databases.


Evaluation of 16S rRNA Databases for Taxonomic Assignments Using a Mock Community

  • Park, Sang-Cheol;Won, Sungho
    • Genomics & Informatics / v.16 no.4 / pp.24.1-24.4 / 2018
  • Taxonomic identification is fundamental to all microbiology studies. It is even more important in metagenomics, which identifies the composition of microorganisms using thousands of sequences. Identification is inevitably affected by the choice of database. This study was conducted to evaluate the accuracy of three widely used 16S databases (Greengenes, Silva, and EzBioCloud) and to suggest basic guidelines for selecting reference databases. Using public mock community data, each database was used to assign taxonomy and was tested for accuracy. We show that EzBioCloud performs well compared with the other existing databases.