• Title/Summary/Keyword: schema for datasets

ShEx Schema Generator for RDF Graphs Created by Direct Mapping

  • Choi, Ji-Woong
    • Journal of the Korea Society of Computer and Information / v.23 no.10 / pp.33-43 / 2018
  • In this paper, we propose a method to automatically generate a description of an RDF graph's structure. The description is expressed in the Shape Expression language (ShEx), which was developed by the W3C and provides a syntax for describing the structure of RDF data. The method applies to RDF graphs generated by the direct mapping, the W3C algorithm for transforming relational data into RDF. A relational database consists of a schema, including integrity constraints, and instance data. While instance data can be published as RDF by standard methods such as the direct mapping, a corresponding translation of the schema has so far been missing. Unlike users of relational databases, users of RDF datasets have been forced to write repeated, imprecise SPARQL queries to obtain exact results, because no schema for the RDF data is available to them. The ShEx documents generated by our method can be consulted as a schema when writing SPARQL queries, and they can also validate data during RDF graph update operations using ShEx validators; in other words, they play the role that integrity constraints play in relational databases.
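A minimal sketch of the idea, not the paper's actual generator: given a simple relational table description, emit a ShEx shape whose IRIs follow the W3C Direct Mapping naming convention (<Table> as the row class, <Table#column> as property IRIs). The table, its columns, and the SQL-to-XSD mapping below are illustrative assumptions.

    # A minimal sketch, not the paper's algorithm: derive a ShEx shape from a
    # simple relational table description. Column names, types, and the
    # SQL-to-XSD mapping are illustrative assumptions.
    SQL_TO_XSD = {"INTEGER": "xsd:integer", "VARCHAR": "xsd:string", "DATE": "xsd:date"}

    def table_to_shex(table, columns):
        """columns: list of (name, sql_type, nullable) tuples."""
        lines = [f"  a [ <{table}> ]"]                  # rdf:type constraint for the row class
        for name, sql_type, nullable in columns:
            datatype = SQL_TO_XSD.get(sql_type, "xsd:string")
            cardinality = " ?" if nullable else ""      # nullable column -> optional triple
            lines.append(f"  <{table}#{name}> {datatype}{cardinality}")
        body = " ;\n".join(lines)
        return ("PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>\n"
                f"<{table}Shape> {{\n{body}\n}}")

    # NOT NULL columns become mandatory triple constraints; nullable ones become optional.
    print(table_to_shex("People", [("ID", "INTEGER", False),
                                   ("name", "VARCHAR", False),
                                   ("birthdate", "DATE", True)]))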

R2RML Based ShEx Schema

  • Choi, Ji-Woong
    • Journal of the Korea Society of Computer and Information / v.23 no.10 / pp.45-55 / 2018
  • R2RML is a W3C standard language that defines how to expose relational data as RDF triples. The output of an R2RML mapping is only an RDF dataset; by definition, the dataset has no schema. This lack of a schema makes datasets in linked data portals impractical to integrate and analyze. To address this issue, we propose an approach for automatically generating schemas for RDF graphs populated by R2RML mappings. More precisely, we represent the schema in ShEx, a language for validating and describing RDF. Our approach generates ShEx schemas as well as RDF datasets from R2RML mappings. The resulting ShEx schema benefits both data providers and ordinary users: data providers can verify and guarantee the structural integrity of the dataset against the schema, and users can write SPARQL queries efficiently by referring to it. In this paper, we describe the data structures and algorithms of the system that derives ShEx documents from R2RML documents and present a brief demonstration of its use.
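As a rough illustration of the input side only (not the authors' system), the sketch below uses rdflib to walk a small, made-up R2RML TriplesMap and print the class and (predicate, source column) pairs that a ShEx generator would turn into a shape with triple constraints. The mapping document and all example.org IRIs are invented.

    # A minimal sketch, assuming rdflib is installed.
    from rdflib import Graph, Namespace, RDF

    RR = Namespace("http://www.w3.org/ns/r2rml#")

    R2RML_DOC = """
    @prefix rr: <http://www.w3.org/ns/r2rml#> .
    @prefix ex: <http://example.org/> .
    <#EmpMap> a rr:TriplesMap ;
        rr:logicalTable [ rr:tableName "EMP" ] ;
        rr:subjectMap [ rr:template "http://example.org/emp/{EMPNO}" ; rr:class ex:Employee ] ;
        rr:predicateObjectMap [ rr:predicate ex:name ; rr:objectMap [ rr:column "ENAME" ] ] .
    """

    g = Graph()
    g.parse(data=R2RML_DOC, format="turtle", publicID="http://example.org/mapping")

    for tmap in g.subjects(RDF.type, RR.TriplesMap):
        subject_map = g.value(tmap, RR.subjectMap)
        print("shape for class:", g.value(subject_map, RR["class"]))
        for pom in g.objects(tmap, RR.predicateObjectMap):
            predicate = g.value(pom, RR.predicate)
            column = g.value(g.value(pom, RR.objectMap), RR.column)
            # each pair would become one ShEx triple constraint, typed from the source column
            print("  triple constraint:", predicate, "<- column", column)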

A Study on Recent Trends in Building Linked Data for Overseas Libraries: Focusing on Published Datasets, Reused Vocabulary, and Interlinked External Datasets (해외 도서관 링크드 데이터 구축의 최근 동향 연구 - 발행 데이터세트, 재사용 어휘집, 인터링킹 외부 데이터세트를 중심으로 -)

  • Sung-Sook Lee
    • Journal of the Korean Society for Library and Information Science / v.56 no.4 / pp.5-28 / 2022
  • In this study, linked data (LD) construction cases at overseas libraries were analyzed with a focus on published datasets, reused vocabularies, and interlinked external datasets, and the results were used as baseline data for LD construction plans at domestic libraries. The analysis of 21 library cases shows that overseas libraries have built substantial authority LD and launched new services using the published LD. To this end, they collaborated, under library leadership, with other libraries and cultural institutions at the regional and national levels, and published specialized datasets based on this cooperation. They used Schema.org to increase the visibility of published LD and used BIBFRAME to subdivide descriptions, defining various entities and building LD on top of them. These entities were then used to link related information, display results, support browsing, and enable bulk download. Overseas libraries were also concerned with keeping interlinked external datasets continuously up to date, and used external data directly to enrich catalog information. Based on the implications derived from the analysis, this study proposes points for domestic libraries to consider when publishing LD. The results can serve as baseline data when domestic libraries plan LD services or upgrade existing ones.

Vector space based augmented structural kinematic feature descriptor for human activity recognition in videos

  • Dharmalingam, Sowmiya;Palanisamy, Anandhakumar
    • ETRI Journal / v.40 no.4 / pp.499-510 / 2018
  • A vector space based augmented structural kinematic (VSASK) feature descriptor is proposed for human activity recognition. An action descriptor is built by integrating the structural and kinematic properties of the actor using vector space based augmented matrix representation. Using the local or global information separately may not provide sufficient action characteristics. The proposed action descriptor combines both the local (pose) and global (position and velocity) features using augmented matrix schema and thereby increases the robustness of the descriptor. A multiclass support vector machine (SVM) is used to learn each action descriptor for the corresponding activity classification and understanding. The performance of the proposed descriptor is experimentally analyzed using the Weizmann and KTH datasets. The average recognition rate for the Weizmann and KTH datasets is 100% and 99.89%, respectively. The computational time for the proposed descriptor learning is 0.003 seconds, which is an improvement of approximately 1.4% over the existing methods.
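For context only, here is a toy sketch of the classification stage with synthetic data: local (pose) and global (position/velocity) features are simply concatenated in place of the paper's augmented-matrix descriptor and fed to a multiclass SVM. scikit-learn is assumed; all sizes and labels are made up.

    # A toy sketch with random, synthetic features; not the VSASK descriptor itself.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    n_clips, n_pose, n_global = 200, 30, 4                # illustrative sizes
    pose_feats = rng.normal(size=(n_clips, n_pose))       # local: per-clip pose features
    global_feats = rng.normal(size=(n_clips, n_global))   # global: position and velocity
    labels = rng.integers(0, 6, size=n_clips)             # six hypothetical action classes

    X = np.hstack([pose_feats, global_feats])             # combined descriptor (simplified)
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    clf.fit(X[:150], labels[:150])
    print("held-out accuracy:", clf.score(X[150:], labels[150:]))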

An Approach for Integrated Modeling of Protein Data using a Fact Constellation Schema and a Tree based XML Model (Fact constellation 스키마와 트리 기반 XML 모델을 적용한 실험실 레벨의 단백질 데이터 통합 기법)

  • Park, Sung-Hee;Li, Rong-Hua;Ryu, Keun-Ho
    • The KIPS Transactions: Part D / v.11D no.3 / pp.519-532 / 2004
  • With the explosion of bioinformatics data such as proteins and genes, biologists need an integrated system for analyzing and organizing large datasets that span heterogeneous types of biological data. In this paper, we propose an integration system based on a mediated data warehouse architecture that uses an XML model to combine protein-related data at biology laboratories. In this system, a fact constellation model serves as the common model for integration, and the integrated schema is translated into an XML schema. In addition, to track source changes and data provenance in the integrated database, the system employs incremental updates and sequence version management. This paper demonstrates integrated modeling of protein structures, sequences, and structural classifications using the proposed system.
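A small illustration of the kind of tree-based XML record such a mediated warehouse could hold for one protein, combining sequence, structure, and classification from different sources. All element names, identifiers, and values below are hypothetical and are not the paper's actual XML schema.

    # Hypothetical element names and values; only illustrates a tree-shaped
    # integrated protein record built with the standard library.
    import xml.etree.ElementTree as ET

    protein = ET.Element("protein", id="P00001", version="2")
    ET.SubElement(protein, "sequence", source="UniProt").text = "MKTAYIAKQRQISFVKSHFSRQ"
    structure = ET.SubElement(protein, "structure", source="PDB", entryId="1ABC")
    ET.SubElement(structure, "method").text = "X-ray diffraction"
    classification = ET.SubElement(protein, "classification", scheme="SCOP")
    ET.SubElement(classification, "fold").text = "Immunoglobulin-like beta-sandwich"

    print(ET.tostring(protein, encoding="unicode"))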

A Study on METS Design Using DDI Metadata (DDI 메타데이터를 활용한 METS 설계에 관한 연구)

  • Park, Jin Ho
    • Journal of the Korean Society for Information Management / v.38 no.4 / pp.153-171 / 2021
  • This study suggests a method of using METS together with DDI metadata to manage, preserve, and provide access to datasets. DDI is a standard for describing statistical data and currently comes in two versions, DDI Codebook (DDI-C) and DDI Lifecycle (DDI-L); this study mainly uses the elements of DDI-C. The structures and elements of METS and DDI-C were first analyzed, and the major elements of the two standards were then mapped to each other, with METS taken as the final expression format. Since METS and DDI-C do not map perfectly one to one, the DDI-C element that best matches each METS element was selected. As a result, a new METS-based standard for dataset management and transmission that uses DDI-C metadata elements was designed and presented.
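A minimal sketch of the general mechanism rather than the study's full element mapping: a METS record whose descriptive metadata section (dmdSec/mdWrap with MDTYPE="DDI") embeds a DDI-C citation block. The study title is a placeholder, and the file and structural sections are omitted.

    # Minimal sketch: METS dmdSec wrapping DDI Codebook elements; fileSec and
    # structMap are omitted for brevity, and the title is a placeholder.
    import xml.etree.ElementTree as ET

    METS = "http://www.loc.gov/METS/"
    DDI = "ddi:codebook:2_5"                 # DDI Codebook 2.5 namespace
    ET.register_namespace("mets", METS)
    ET.register_namespace("ddi", DDI)

    mets = ET.Element(f"{{{METS}}}mets")
    dmd = ET.SubElement(mets, f"{{{METS}}}dmdSec", ID="DMD1")
    wrap = ET.SubElement(dmd, f"{{{METS}}}mdWrap", MDTYPE="DDI")
    xml_data = ET.SubElement(wrap, f"{{{METS}}}xmlData")

    codebook = ET.SubElement(xml_data, f"{{{DDI}}}codeBook")
    citation = ET.SubElement(ET.SubElement(codebook, f"{{{DDI}}}stdyDscr"), f"{{{DDI}}}citation")
    title_stmt = ET.SubElement(citation, f"{{{DDI}}}titlStmt")
    ET.SubElement(title_stmt, f"{{{DDI}}}titl").text = "Example Survey Dataset, 2021"

    print(ET.tostring(mets, encoding="unicode"))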

CDISC Extension for Supporting Multinational Clinical Trials (다국적 임상시험 지원을 위한 CDISC 표준의 확장)

  • Yeom, Ji-Hyeon;Chai, In-Young;Kim, Suk-Il;Kim, Hyeak-Man
    • Journal of KIISE: Computing Practices and Letters / v.15 no.8 / pp.566-575 / 2009
  • The Clinical Data Interchange Standards Consortium (CDISC) developed global, platform-independent data standards to improve inefficient processes in clinical trial studies. Despite its goal of global cooperation, the current version of the CDISC standard cannot describe clinical trial data in multiple languages for multinational investigators or reviewers. This problem applies not only to tabulated datasets in the Study Data Tabulation Model (SDTM) but also to the XML representation of those datasets in Operational Data Model (ODM) instances. To address this issue, we propose extending the current versions of SDTM and ODM for collecting data in multinational clinical trials: SDTM gains a new special-purpose domain for multi-language representation, and the ODM XML schema is extended using a subtyping (type inheritance) mechanism. Our extensions of SDTM and ODM make it possible to express any granule of the study data tabulation model, or of the XML data entities, in whichever language is appropriate. This will make it easier to collect multi-language data in multinational clinical trials.
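By way of illustration only (the paper's actual SDTM domain and ODM schema extension are not reproduced here), the sketch below shows the general mechanism of carrying language-tagged alternative values for an ODM ItemData element in a separate extension namespace. The item OID, the value, and the extension namespace URI are invented.

    # A minimal illustration of the general idea, not the paper's schema extension.
    import xml.etree.ElementTree as ET

    ODM = "http://www.cdisc.org/ns/odm/v1.3"
    EXT = "http://example.org/ns/multilang"            # hypothetical extension namespace
    ET.register_namespace("odm", ODM)
    ET.register_namespace("ml", EXT)
    XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

    item = ET.Element(f"{{{ODM}}}ItemData", ItemOID="IT.AETERM", Value="Headache")
    for lang, text in [("ko", "두통"), ("ja", "頭痛")]:
        alt = ET.SubElement(item, f"{{{EXT}}}TranslatedValue")
        alt.set(XML_LANG, lang)                        # language of this alternative value
        alt.text = text

    print(ET.tostring(item, encoding="unicode"))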

Standard-based Integration of Heterogeneous Large-scale DNA Microarray Data for Improving Reusability

  • Jung, Yong;Seo, Hwa-Jeong;Park, Yu-Rang;Kim, Ji-Hun;Bien, Sang Jay;Kim, Ju-Han
    • Genomics & Informatics / v.9 no.1 / pp.19-27 / 2011
  • The Gene Expression Omnibus (GEO) holds the largest collection of gene-expression microarray data, and the collection has grown exponentially. Microarray data in GEO have been generated in many different formats and often lack standardized annotation and documentation, so it is hard to know whether, and in what way, preprocessing has been applied to a dataset. Standard-based integration of the heterogeneous data formats and metadata is necessary for comprehensive data querying, analysis, and mining. We attempted to integrate the heterogeneous microarray data in GEO based on the Minimum Information About a Microarray Experiment (MIAME) standard. We unified the data fields of the GEO Data table and mapped the attributes of GEO metadata onto MIAME elements. We also discriminated non-preprocessed raw datasets from processed ones using a two-step classification method. Most of the procedures were developed as semi-automated algorithms incorporating some text mining techniques. We localized 2,967 Platforms, 4,867 Series, and 103,590 Samples covering 279 organisms, integrated them into a standard-based relational schema, and developed a comprehensive query interface for extracting them. Our tool, GEOQuest, is available at http://www.snubi.org/software/GEOQuest/.
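A minimal sketch of the kind of standard-based relational schema and query interface described, using SQLite. The table and column names are assumptions for illustration and are not GEOQuest's actual schema.

    # Made-up table/column names; only illustrates integrating Platforms, Series
    # and Samples into one relational schema that can be queried uniformly.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE platform (gpl_id TEXT PRIMARY KEY, organism TEXT, technology TEXT);
    CREATE TABLE series   (gse_id TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE sample   (gsm_id TEXT PRIMARY KEY,
                           gse_id TEXT REFERENCES series(gse_id),
                           gpl_id TEXT REFERENCES platform(gpl_id),
                           preprocessed INTEGER);       -- 0 = raw, 1 = preprocessed
    """)
    conn.execute("INSERT INTO platform VALUES ('GPL570', 'Homo sapiens', 'in situ oligonucleotide')")
    conn.execute("INSERT INTO series VALUES ('GSE00001', 'Example expression study')")
    conn.execute("INSERT INTO sample VALUES ('GSM00001', 'GSE00001', 'GPL570', 0)")

    # e.g. list raw (non-preprocessed) human samples across all series
    rows = conn.execute("""
        SELECT s.gsm_id FROM sample s JOIN platform p ON s.gpl_id = p.gpl_id
        WHERE p.organism = 'Homo sapiens' AND s.preprocessed = 0
    """).fetchall()
    print(rows)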

An Optimization Technique for RDFS Inference Using the Application Order of RDF Schema Entailment Rules (RDF 스키마 함의 규칙 적용 순서를 이용한 RDFS 추론 엔진의 최적화)

  • Kim, Ki-Sung;Yoo, Sang-Won;Lee, Tae-Whi;Kim, Hyung-Joo
    • Journal of KIISE: Databases / v.33 no.2 / pp.151-162 / 2006
  • RDF Semantics, one of the W3C Recommendations, provides the RDFS entailment rules used for RDFS inference. Sesame, a well-known RDF repository, supports RDBMS-based RDFS inference using a forward-chaining strategy. Because forward-chaining inference is performed at data loading time, loading in Sesame is slowed down by inferencing. In this paper, we propose an ordering scheme for applying the RDFS entailment rules that improves inference performance. The proposed application order lets the inference process terminate without re-iterating over the rules in most cases, while still guaranteeing the completeness of the inference result. The application order also helps reduce redundant results during inference by predicting which results have already been produced by previously applied rules. We show that our approach improves inference performance compared with the original Sesame on several real-life RDF datasets.
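A toy sketch of the underlying idea only: four RDFS entailment rules applied in a fixed order (close rdfs:subPropertyOf and rdfs:subClassOf under transitivity first, then propagate properties and types once), so that typical data needs no re-application of earlier rules. This is neither Sesame's engine nor the paper's full rule set; the example triples are invented.

    RDF_TYPE, SUB_CLASS, SUB_PROP = "rdf:type", "rdfs:subClassOf", "rdfs:subPropertyOf"

    def transitive_closure(triples, pred):
        """rdfs5 / rdfs11: transitive closure of subPropertyOf / subClassOf."""
        changed = True
        while changed:
            changed = False
            pairs = {(s, o) for s, p, o in triples if p == pred}
            for s, o in pairs:
                for o2 in [b for a, b in pairs if a == o]:
                    if (s, pred, o2) not in triples:
                        triples.add((s, pred, o2))
                        changed = True

    def rdfs_closure(triples):
        triples = set(triples)
        transitive_closure(triples, SUB_PROP)                          # rdfs5
        transitive_closure(triples, SUB_CLASS)                         # rdfs11
        for sp, _, q in [t for t in triples if t[1] == SUB_PROP]:      # rdfs7
            for s, _p, o in [t for t in triples if t[1] == sp]:
                triples.add((s, q, o))
        for c, _, d in [t for t in triples if t[1] == SUB_CLASS]:      # rdfs9
            for x in [t[0] for t in triples if t[1] == RDF_TYPE and t[2] == c]:
                triples.add((x, RDF_TYPE, d))
        return triples

    data = {("ex:Dog", SUB_CLASS, "ex:Mammal"),
            ("ex:Mammal", SUB_CLASS, "ex:Animal"),
            ("ex:rex", RDF_TYPE, "ex:Dog"),
            ("ex:hasPet", SUB_PROP, "ex:knows"),
            ("ex:alice", "ex:hasPet", "ex:rex")}
    for triple in sorted(rdfs_closure(data)):
        print(triple)

Because the subClassOf/subPropertyOf closures are computed first, the single pass of rdfs7 and rdfs9 already yields every derivable triple for data like this; choosing such an order is the kind of effect the proposed application scheme aims for.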

A Study for Sharing Patient Medical Information with Demographic Datasets (환자 의료 정보 공유 및 데이터 통합을 위한 데모그래픽 데이터 활용 연구)

  • Lim, Jongwoo;Jung, Eun-Young;Jeong, Byoung-Hui;Park, Dong Kyun;Whangbo, Taeg-Keun
    • Journal of the Institute of Electronics and Information Engineers / v.51 no.10 / pp.128-136 / 2014
  • Although the amount of information used and shared over the Internet has recently grown exponentially, the patient information held by individual medical centers has not been shared among them, because of patient privacy protection and differences between database schemas. To address this problem, we study the data structures and standards used for patient medical information and propose a patient information sharing system design that allows each medical center to use and share patient information with other centers, despite their different patient information systems, while protecting patient privacy.
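As a small illustration of one common technique in this space (not necessarily the design proposed in the paper), the sketch below derives a salted hash of normalized demographic fields so that two centers can check whether they hold records for the same patient without exchanging raw identifiers. The field choice, normalization, and shared salt are assumptions for this sketch.

    # Illustrative only: a privacy-preserving demographic match key.
    import hashlib
    import unicodedata

    SHARED_SALT = b"agreed-between-centers"     # hypothetical pre-shared secret

    def match_key(name, birth_date, sex):
        # normalize case, whitespace, and Unicode form before hashing
        normalized = unicodedata.normalize(
            "NFC", f"{name.strip().lower()}|{birth_date}|{sex.upper()}")
        return hashlib.sha256(SHARED_SALT + normalized.encode("utf-8")).hexdigest()

    # the same patient registered at two centers with cosmetic differences
    print(match_key("Hong Gildong ", "1980-01-02", "m") ==
          match_key("hong gildong", "1980-01-02", "M"))   # True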