• Title/Summary/Keyword: Linked Data Dataset

Designing Dataset Management and Service System for Digital Libraries Using DCAT (DCAT을 활용한 디지털도서관 데이터셋 관리와 서비스 설계)

  • Park, Jin Ho
    • Journal of the Korean Society for Library and Information Science, v.53 no.2, pp.247-266, 2019
  • This study proposes using DCAT, a W3C standard, to manage and serve datasets, which are becoming increasingly important as new knowledge and information resources. To this end, we first analyzed the four core classes of DCAT and their properties. We then modeled and presented a system that can manage and serve various datasets in a digital library based on DCAT. The system is divided into source data, dataset management, linked data connection, and user service. In particular, a DCAT mapping function is proposed for dataset management. This function can ensure interoperability among various datasets.
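
A minimal sketch of what a DCAT mapping function could look like, assuming a local metadata record with the hypothetical fields `name`, `summary`, and `tags`. The field names, the example URI, and the function `to_dcat_turtle` are illustrative assumptions and are not taken from the paper; only the DCAT and Dublin Core terms (`dcat:Dataset`, `dct:title`, `dct:description`, `dcat:keyword`) come from the published vocabularies.

```python
# Prefix declarations for the vocabularies used below.
PREFIXES = (
    "@prefix dcat: <http://www.w3.org/ns/dcat#> .\n"
    "@prefix dct: <http://purl.org/dc/terms/> .\n"
)

# Hypothetical mapping from a local metadata schema to DCAT / Dublin Core terms.
FIELD_TO_DCAT = {
    "name": "dct:title",
    "summary": "dct:description",
    "tags": "dcat:keyword",
}


def to_dcat_turtle(dataset_uri, record):
    """Render one local metadata record as a dcat:Dataset in Turtle."""
    lines = [f"<{dataset_uri}> a dcat:Dataset"]
    for field, prop in FIELD_TO_DCAT.items():
        if field not in record:
            continue  # fields missing from the record are simply not emitted
        values = record[field]
        if not isinstance(values, list):
            values = [values]
        for value in values:
            # Escape backslashes and double quotes for a Turtle string literal.
            escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'    {prop} "{escaped}"')
    return " ;\n".join(lines) + " ."


# Example record (illustrative data only).
record = {"name": "Loan statistics", "tags": ["library", "loans"]}
print(PREFIXES + to_dcat_turtle("http://example.org/dataset/1", record))
```

Keeping the mapping in a plain dictionary means a new source schema only needs a new table, which is one simple way to get the interoperability the abstract describes.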

R2RML Based ShEx Schema

  • Choi, Ji-Woong
    • Journal of the Korea Society of Computer and Information, v.23 no.10, pp.45-55, 2018
  • R2RML is a W3C standard language that defines how to expose relational data as RDF triples. The output of an R2RML mapping is only an RDF dataset, which by definition has no schema. The lack of a schema makes datasets in linked data portals impractical for integrating and analyzing data. To address this issue, we propose an approach for automatically generating schemas for RDF graphs populated by R2RML mappings. More precisely, we represent the schema using ShEx, a language for validating and describing RDF. Our approach generates ShEx schemas as well as RDF datasets from R2RML mappings. The ShEx schema benefits both data providers and ordinary users: data providers can verify and guarantee the structural integrity of the dataset against the schema, and users can write SPARQL queries efficiently by referring to it. In this paper, we describe the data structures and algorithms of the system that derives ShEx documents from R2RML documents and present a brief demonstration of its proper use.
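
To illustrate the general idea of deriving a ShEx shape from a mapping, here is a rough sketch. It is not the paper's algorithm. It assumes a simplified in-memory stand-in for an R2RML triples map (the classes `TriplesMap` and `PredicateObjectMap`, and all `ex:` names, are hypothetical) and emits one shape in ShExC syntax, marking predicates backed by nullable columns as optional with `?`.

```python
from dataclasses import dataclass, field


@dataclass
class PredicateObjectMap:
    """Simplified stand-in for an R2RML predicate-object map (hypothetical)."""
    predicate: str          # e.g. "ex:name"
    datatype: str           # e.g. "xsd:string"
    nullable: bool = False  # True if the source column may be NULL


@dataclass
class TriplesMap:
    """Simplified stand-in for an R2RML triples map (hypothetical)."""
    shape: str      # label for the generated shape, e.g. "ex:EmployeeShape"
    rdf_class: str  # class assigned by the subject map, e.g. "ex:Employee"
    po_maps: list = field(default_factory=list)


def to_shex(tm):
    """Emit a ShExC shape with one triple constraint per predicate-object map."""
    constraints = [f"  a [{tm.rdf_class}]"]
    for po in tm.po_maps:
        # A nullable column may yield no triple, so the constraint is optional.
        cardinality = " ?" if po.nullable else ""
        constraints.append(f"  {po.predicate} {po.datatype}{cardinality}")
    return f"{tm.shape} {{\n" + " ;\n".join(constraints) + "\n}"


employee = TriplesMap(
    shape="ex:EmployeeShape",
    rdf_class="ex:Employee",
    po_maps=[
        PredicateObjectMap("ex:name", "xsd:string"),
        PredicateObjectMap("ex:email", "xsd:string", nullable=True),
    ],
)
print(to_shex(employee))
```

A shape like this is what lets a user see, before writing a SPARQL query, which predicates every `ex:Employee` node is expected to carry.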

Analysis of LinkedIn Jobs for Finding High Demand Job Trends Using Text Processing Techniques

  • Kazi, Abdul Karim;Farooq, Muhammad Umer;Fatima, Zainab;Hina, Saman;Abid, Hasan
    • International Journal of Computer Science & Network Security, v.22 no.10, pp.223-229, 2022
  • LinkedIn is one of the most widely used job-hunting and career-growth applications in the world, offering many opportunities and jobs. According to statistics, LinkedIn has 738M+ members, 14M+ open jobs, and 55M+ listed companies, and many vacancies are posted daily. LinkedIn data was used for the research carried out in this paper. This in turn can significantly help tackle the challenges that LinkedIn and other job-posting applications face in improving the level of jobs available in the industry. This research applies text processing, a natural language processing technique, to LinkedIn datasets to find the jobs that appear most often in a month and/or year. The large volume of data is thereby turned into a usable source. The study uses Multinomial Naïve Bayes and Linear Support Vector Machine learning algorithms for text classification and develops a trained multilingual dataset. The results indicate the most in-demand job vacancies in any field. This will help students, job seekers, and entrepreneurs with their career decisions.

A Study on Recent Trends in Building Linked Data for Overseas Libraries: Focusing on Published Datasets, Reused Vocabulary, and Interlinked External Datasets (해외 도서관 링크드 데이터 구축의 최근 동향 연구 - 발행 데이터세트, 재사용 어휘집, 인터링킹 외부 데이터세트를 중심으로 -)

  • Sung-Sook Lee
    • Journal of the Korean Society for Library and Information Science, v.56 no.4, pp.5-28, 2022
  • This study analyzed LD construction cases of overseas libraries, focusing on published datasets, reused vocabularies, and interlinked external datasets, and from the results derived basic data for the LD construction plans of domestic libraries. The analysis of 21 library cases showed that overseas libraries have established faithful authority LD and launched new services using published LD. To this end, overseas libraries collaborated with other libraries and cultural institutions within the region, within the country, and nationally under the leadership of the library, and published specialized datasets based on this cooperation. Overseas libraries used Schema.org to increase the visibility of published LD, and used BIBFRAME for finer-grained description, defining various entities and building LD based on them. They used the defined entities to link related information, display results, browse, and download in bulk. Overseas libraries were interested in keeping interlinked external datasets continuously up to date, and directly used external data to reinforce catalog information. Based on the derived implications, this study proposes points for domestic libraries to consider when publishing LD. The research results can be used as basic data when domestic libraries plan LD services or upgrade existing services in the future.

BIM data mapping based on M-BDL for BIM-BEMS connection (BIM-BEMS 연계를 위한 M-BDL 기반 BIM 데이터 맵핑)

  • Kang, Tae-Wook
    • Journal of the Korea Academia-Industrial cooperation Society, v.19 no.9, pp.348-354, 2018
  • This study proposes MF (Model Filter)-based M-BDL (MF-based BIM Data Linkage), a model filter-based data mapping method for linking BIM (Building Information Modeling) with BEMS (Building Energy Management System). Recently, BEMS have been actively utilizing 3D spatial information, which allows users to intuitively manage facility energy linked to spatial information. To use BIM data in energy management systems, it is essential to link BEMS only with the BIM data that meets the user requirements. Because BIM is a rich dataset, linking it as it is would force the user to manage unnecessary information. By mapping only the data required for BEMS out of the heavy BIM data through M-BDL, the BIM data can be lightened and the amount of data required for maintenance can be reduced. The proposed technique provides a mapping method that can link the BIM data with the filtered BIM data.
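
The filtering idea can be sketched as follows. This is only an illustration under assumed data shapes, not the M-BDL method itself: BIM elements are modeled as plain dictionaries, and the function `apply_model_filter`, the element types, and the property names are all hypothetical.

```python
def apply_model_filter(bim_elements, required_types, required_props):
    """Keep only the element types and properties that the BEMS side needs."""
    filtered = []
    for element in bim_elements:
        if element["type"] not in required_types:
            continue  # drop element types the energy system does not use
        props = {
            key: value
            for key, value in element["props"].items()
            if key in required_props
        }
        filtered.append({"id": element["id"], "type": element["type"], "props": props})
    return filtered


# Illustrative BIM data: a space with one relevant and one irrelevant property,
# and a door that the (assumed) BEMS requirements do not need at all.
bim_elements = [
    {"id": "space-101", "type": "Space", "props": {"area_m2": 42.0, "finish": "paint"}},
    {"id": "door-7", "type": "Door", "props": {"width_mm": 900}},
]
print(apply_model_filter(bim_elements, {"Space"}, {"area_m2"}))
```

Because each filtered record keeps its original `id`, it can still be linked back to the full BIM element when more detail is needed.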

OryzaGP: rice gene and protein dataset for named-entity recognition

  • Larmande, Pierre;Do, Huy;Wang, Yue
    • Genomics & Informatics, v.17 no.2, pp.17.1-17.3, 2019
  • Text mining has become an important research method in biology, originally intended to extract biological entities such as genes, proteins, and phenotypic traits in order to extend knowledge from scientific papers. However, few thorough studies on text mining and application development have been performed for plant molecular biology data, especially for rice, resulting in a lack of datasets for solving named-entity recognition tasks for this species. Because few benchmarks are available for rice, we faced various difficulties in exploiting advanced machine learning methods for accurate analysis of the rice literature. To evaluate several approaches to automatically extracting information on gene/protein entities, we built a new dataset for rice as a benchmark. The dataset consists of titles and abstracts extracted from scientific papers focusing on rice and downloaded from PubMed. During the 5th Biomedical Linked Annotation Hackathon, a portion of the dataset was uploaded to PubAnnotation for sharing. Our ultimate goal is to offer a shared task of rice gene/protein name recognition through the BioNLP Open Shared Tasks framework using the dataset, to facilitate an open comparison and evaluation of different approaches to the task.

A biomedically oriented automatically annotated Twitter COVID-19 dataset

  • Hernandez, Luis Alberto Robles;Callahan, Tiffany J.;Banda, Juan M.
    • Genomics & Informatics, v.19 no.3, pp.21.1-21.5, 2021
  • The use of social media data, such as Twitter, for biomedical research has been gradually increasing over the years. With the coronavirus disease 2019 (COVID-19) pandemic, researchers have turned to more non-traditional sources of clinical data to characterize the disease in near-real time and to study the societal implications of interventions as well as the sequelae that recovered COVID-19 cases present. However, manually curated social media datasets are difficult to come by because of the high cost of manual annotation and the effort needed to identify the correct texts. When datasets are available, they are usually very small and their annotations do not generalize well over time or to larger sets of documents. As part of the 2021 Biomedical Linked Annotation Hackathon, we release our dataset of over 120 million automatically annotated tweets for biomedical research purposes. Incorporating best practices, we identify tweets with potentially high clinical relevance. We evaluated our work by comparing several spaCy-based annotation frameworks against a manually annotated gold-standard dataset. After selecting the best method for automatic annotation, we annotated 120 million tweets and released them publicly for future downstream usage within the biomedical domain.

Utilizing Artificial Neural Networks for Establishing Hearing-Loss Predicting Models Based on a Longitudinal Dataset and Their Implications for Managing the Hearing Conservation Program

  • Thanawat Khajonklin;Yih-Min Sun;Yue-Liang Leon Guo;Hsin-I Hsu;Chung Sik Yoon;Cheng-Yu Lin;Perng-Jy Tsai
    • Safety and Health at Work, v.15 no.2, pp.220-227, 2024
  • Background: Though the artificial neural network (ANN) technique has been used to predict noise-induced hearing loss (NIHL), the established prediction models have primarily relied on cross-sectional datasets, and hence, they may not comprehensively capture the chronic nature of NIHL as a disease linked to long-term noise exposure among workers. Methods: A comprehensive dataset was utilized, encompassing eight-year longitudinal personal hearing threshold levels (HTLs) as well as information on seven personal variables and two environmental variables to establish NIHL predicting models through the ANN technique. Three subdatasets were extracted from the aforementioned comprehensive dataset to assess the advantages of the present study in NIHL predictions. Results: The dataset was gathered from 170 workers employed in a steel-making industry, with a median cumulative noise exposure and HTL of 88.40 dBA-year and 19.58 dB, respectively. Utilizing the longitudinal dataset demonstrated superior prediction capabilities compared to cross-sectional datasets. Incorporating the more comprehensive dataset led to improved NIHL predictions, particularly when considering variables such as noise pattern and use of personal protective equipment. Despite fluctuations observed in the measured HTLs, the ANN predicting models consistently revealed a discernible trend. Conclusions: A consistent correlation was observed between the measured HTLs and the results obtained from the predicting models. However, it is essential to exercise caution when utilizing the model-predicted NIHLs for individual workers due to inherent personal fluctuations in HTLs. Nonetheless, these ANN models can serve as a valuable reference for the industry in effectively managing its hearing conservation program.

Author Entity Identification using Representative Properties in Linked Data (대표 속성을 이용한 저자 개체 식별)

  • Kim, Tae-Hong;Jung, Han-Min;Sung, Won-Kyung;Kim, Pyung
    • The Journal of the Korea Contents Association, v.12 no.1, pp.17-29, 2012
  • In recent years, Linked Data published under open licenses has grown rapidly and has come into the spotlight for its interoperability and openness, especially among governments of developed countries. However, out-links are relatively few compared with the total number of links, and most links refer to a few hub datasets. This is due to the absence of technology that identifies entities in Linked Data. In this paper, we present an improved author entity resolution method that uses representative properties. To solve the problems of previous methods, which rely on relations with other entities (owl:sameAs, owl:differentFrom, and so on) or depend on curation, we design and evaluate an automated real-time resolution process based on multiple ontologies that respects each entity's type and logical characteristics in order to verify entity consistency. The evaluation of author entity resolution shows positive results (the average K measure is 0.8533) on 29 confirmed author records.

Monitoring People's Emotions and Symptoms after COVID-19 Vaccine

  • Najwa N. Alshahrani;Sara N. Abduljaleel;Ghidaa A. Alnefaiy;Hanan S. Alshanbari
    • International Journal of Computer Science & Network Security, v.23 no.6, pp.202-206, 2023
  • Today, social media has become a vital tool. The world communicates and accesses news and other people's opinions through social media accounts. Recently, considerable research has been done on analyzing social media because of its rich data content. At the same time, since the beginning of the COVID-19 pandemic, which has afflicted so many around the world, the search for a vaccine has been intense. There have been many studies analyzing people's feelings during a crisis. This study aims to understand people's opinions about available coronavirus vaccines through a learning model developed for this purpose. The dataset was collected using Twitter's streaming Application Programming Interface (API), then combined with another dataset that had already been collected. The final dataset was cleaned, then analyzed using Python. Polarity and subjectivity functions were used to obtain the results. The results showed that most people had positive opinions toward vaccines in general and toward the Pfizer one. Our study should help governments and decision-makers dispel people's fears and discover new symptoms linked to those listed by the World Health Organization.