• Title/Summary/Keyword: Data science


Computing Resource Sharing and Utilization System for Efficient Research Data Utilization (연구데이터 활용성 극대화 위한 컴퓨팅 리소스 공유활용 체계)

  • Song, Sa-kwang;Cho, Minhee;Lee, Mikyoung;Yim, Hyung-Jun
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.10a / pp.430-432 / 2022
  • With the recent increase in interest in the open science movement in science and technology fields, such as open access, open data, and open source, the effort to share and utilize publicly funded research products is taking concrete shape. In line with this trend, many efforts are being made in Korea to establish and revitalize a system for sharing and utilizing research data, a key resource for research. These efforts mainly focus on collecting research data by field and institution and linking it with DataON, the national research data platform, for search and utilization. Developed countries, however, are building systems that share and utilize not only research data but also various types of R&D-related computing resources such as IaaS, PaaS, SaaS, and MLaaS; EOSC (European Open Science Cloud), ARDC (Australian Research Data Commons), and CSTCloud (China S&T Cloud) are representative examples. In Korea, the Korea Research Data Commons (KRDC) has been designed, and a core framework is being developed to facilitate the sharing of such computing resources. This study introduces the necessity, concept, composition, and future plans of KRDC.

A Scalable Data Integrity Mechanism Based on Provable Data Possession and JARs

  • Zafar, Faheem;Khan, Abid;Ahmed, Mansoor;Khan, Majid Iqbal;Jabeen, Farhana;Hamid, Zara;Ahmed, Naveed;Bashir, Faisal
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.6 / pp.2851-2873 / 2016
  • Cloud storage as a service provides high scalability and availability as per the user's needs, without large investment in infrastructure. However, data security risks, such as confidentiality, privacy, and integrity of the outsourced data, are associated with the cloud-computing model. Over the years, techniques such as remote data checking (RDC), data integrity protection (DIP), provable data possession (PDP), proof of storage (POS), and proof of retrievability (POR) have been devised to frequently and securely check the integrity of outsourced data. In this paper, we improve the efficiency of the PDP scheme in terms of computation, storage, and communication cost for large data archives. By utilizing the capabilities of JAR and ZIP technology, the cost of searching the metadata in the proof generation process is reduced from O(n) to O(1). Moreover, direct access to the metadata reduces disk I/O cost, resulting in 50 to 60 times faster proof generation for large datasets. Furthermore, our proposed scheme achieves a 50% reduction in the storage size of data and the respective metadata, providing storage and communication efficiency.
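
  The O(n) → O(1) metadata lookup the abstract claims can be illustrated with a minimal sketch (our own, not the authors' implementation): a ZIP/JAR archive maintains a central directory, so a named metadata entry can be read directly instead of scanning all n blocks. The block names and the SHA-256 "tag" standing in for PDP metadata are our assumptions.

  ```python
  import hashlib
  import io
  import zipfile

  def build_archive(blocks: dict) -> bytes:
      """Pack data blocks plus a per-block metadata (tag) entry into one archive."""
      buf = io.BytesIO()
      with zipfile.ZipFile(buf, "w") as zf:
          for name, data in blocks.items():
              zf.writestr(f"data/{name}", data)
              # Hypothetical tag: a hash standing in for the PDP metadata.
              zf.writestr(f"meta/{name}", hashlib.sha256(data).hexdigest())
      return buf.getvalue()

  def prove_possession(archive: bytes, name: str) -> bool:
      """Toy 'proof' for one challenged block via direct metadata access."""
      with zipfile.ZipFile(io.BytesIO(archive)) as zf:
          data = zf.read(f"data/{name}")          # direct lookup by entry name
          tag = zf.read(f"meta/{name}").decode()  # no scan over other blocks
      return hashlib.sha256(data).hexdigest() == tag

  archive = build_archive({"b0": b"alpha", "b1": b"beta"})
  print(prove_possession(archive, "b1"))  # True
  ```

  The central-directory lookup is what keeps per-challenge cost independent of the number of stored blocks.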

Development of a Data Science Education Program for High School Students Taking the High School Credit System (고교학점제 수강 고등학생을 위한 데이터과학교육 프로그램 개발)

  • Semin Kim;SungHee Woo
    • Journal of Practical Engineering Education / v.14 no.3 / pp.471-477 / 2022
  • In this study, an educational program was developed that allows students taking data science courses under the high school credit system to explore related fields after completing data science education. Existing research on and requirements for data science education were analyzed, a learning plan was designed, and the program was developed following a step-by-step educational design. Since no existing studies address data science education for the high school credit system, the program was structured around the stages of problem definition, data collection, data preprocessing, data analysis, data visualization, and simulation, with reference to studies on data science education previously conducted in schools. Through this study, it is expected that research on data science education in the high school credit system will become more active.

A Big Data-Driven Business Data Analysis System: Applications of Artificial Intelligence Techniques in Problem Solving

  • Donggeun Kim;Sangjin Kim;Juyong Ko;Jai Woo Lee
    • The Journal of Bigdata / v.8 no.1 / pp.35-47 / 2023
  • It is crucial to develop effective and efficient big data analytics methods for problem-solving in the field of business in order to improve the performance of data analytics and reduce costs and risks in the analysis of customer data. In this study, a big data-driven data analysis system using artificial intelligence techniques is designed to increase the accuracy of big data analytics alongside the rapid growth of the field of data science. We present a key direction for big data analysis systems through missing value imputation, outlier detection, feature extraction, utilization of explainable artificial intelligence techniques, and exploratory data analysis. Our objective is not only to develop big data analysis techniques for the complex structures of business data but also to bridge the gap between theoretical ideas in artificial intelligence methods and the analysis of real-world business data.
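
  Two of the preprocessing steps the abstract lists, missing value imputation and outlier detection, can be sketched minimally (our illustration, assuming median imputation and z-score detection; the paper's actual methods may differ):

  ```python
  import statistics

  def impute_median(values):
      """Replace None entries with the median of the observed values."""
      observed = [v for v in values if v is not None]
      med = statistics.median(observed)
      return [med if v is None else v for v in values]

  def zscore_outliers(values, threshold=3.0):
      """Return indices whose z-score magnitude exceeds the threshold."""
      mean = statistics.fmean(values)
      stdev = statistics.pstdev(values)
      return [i for i, v in enumerate(values)
              if stdev and abs(v - mean) / stdev > threshold]

  sales = [10.0, 12.0, None, 11.0, 200.0]
  filled = impute_median(sales)        # None -> 11.5, median of observed values
  print(zscore_outliers(filled, 1.5))  # [4]: the 200.0 entry stands out
  ```

  In practice the imputation strategy and outlier threshold would be chosen per feature during exploratory data analysis.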

Analysis on NDN Testbeds for Large-scale Scientific Data: Status, Applications, Features, and Issues (과학 빅데이터를 위한 엔디엔 테스트베드 분석: 현황, 응용, 특징, 그리고 이슈)

  • Lim, Huhnkuk;Sin, Gwangcheon
    • Journal of the Korea Institute of Information and Communication Engineering / v.24 no.7 / pp.904-913 / 2020
  • As data volumes and complexity rapidly increase, data-intensive science handling large-scale scientific data needs to investigate new techniques for intelligent storage and data distribution over networks. Recently, the Named Data Networking (NDN) and data-intensive science communities have inspired innovative changes in the distribution and management of large-scale experimental data. This article presents an analysis of NDN testbeds for large-scale scientific data such as climate science data and High Energy Physics (HEP) data, and is the first attempt to analyze existing NDN testbeds for this purpose. The testbeds are described and discussed in terms of status, NDN-based applications, and features: the NDN testbed instance for climate science, the instance for both climate science and HEP, and the NDN testbed in the SANDIE project. Finally, various issues for avoiding pitfalls in establishing NDN testbeds for large-scale scientific data, drawn from the descriptions of the testbeds and their features, are analyzed and discussed.

Current Status and Proposal of University Library Research Data Management Service: Focused on Science and Technology Specialized Universities (대학도서관 연구데이터 관리 서비스 현황 및 제안 - 과학기술특성화 대학을 중심으로 -)

  • Juseop Kim;Suntae Kim
    • Journal of the Korean Society for Library and Information Science / v.57 no.3 / pp.279-301 / 2023
  • The data-driven research environment is rapidly changing, and domestic university libraries are accordingly preparing to establish and operate research data management services to support university researchers. This study proposes a research data management service to support researchers in the libraries of science and technology specialized universities. To develop the proposal, 11 science and technology specialized universities were selected from overseas and domestic institutions and their research data management services were analyzed. From the analysis results, three key categories were derived: research data management, electronic research notebooks, and RDM training. In particular, the 'research data management' category includes DMP, data collection, data management, data preservation, data sharing and publishing, data reuse, infrastructure and tools, and RDM guides and policies. The results of this study will be helpful in introducing and operating research data management services in science and technology specialized university libraries.

Functional Requirements for Research Data Repositories

  • Kim, Suntae
    • International Journal of Knowledge Content Development & Technology / v.8 no.1 / pp.25-36 / 2018
  • Research data must be testable; science is all about verification and testing. To make data testable, the tools used to produce, collect, and examine data during the research must be available. Quite often, however, these data become inaccessible once the work is over and the results are published. Hence, information and the related context must be provided on how research data are preserved and how they can be reproduced. Open Science is the international movement for making scientific research data properly accessible to the research community, and one of its major goals is building data repositories to foster wide dissemination of open data. The objectives of this research are to examine the features of research data, common repository platforms, and community requests for the purpose of designing functional requirements for research data repositories. To analyze the features of research data, we use the data curation profiles available from the Data Curation Center of Purdue University, USA. For common repository platforms, we examine Fedora Commons, iRODS, DataONE, Dataverse, Open Science Data Cloud (OSDC), and Figshare. We also analyze requests from the research community. To design a technical solution that meets public needs for data accessibility and sharing, we take the requirements of the RDA Repository Interest Group and the requests for the DataNest Community Platform developed by the Korea Institute of Science and Technology Information (KISTI). As a result, we particularize 75 requirement items grouped into 13 categories: metadata; identifiers; authentication and permission management; data access; policy support; publication; submission/ingest/management; data configuration; location; integration; preservation and sustainability; user interface; and data and product quality. We hope that the functional requirements set down in this study will be of help to organizations that consider deploying or designing data repositories.

How to retrieve the encrypted data on the blockchain

  • Li, Huige;Zhang, Fangguo;Luo, Peiran;Tian, Haibo;He, Jiejie
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.11 / pp.5560-5579 / 2019
  • A searchable symmetric encryption (SSE) scheme can search encrypted data directly without revealing the plain data or keywords. Many constructive SSE schemes have been proposed, but they cannot truly resist a malicious adversary (i.e., the cloud server), which may delete some important data; as a result, the returned search results are very likely to be incorrect. To better guarantee the integrity of outsourced data and, at the same time, ensure the correctness of the returned search results, in this paper we combine SSE with blockchain (BC) and propose an SSE-on-BC framework model. We then construct two concrete schemes based on the size of the data, which better provide privacy protection and integrity verification for data. Lastly, we present security and performance analyses, which show that the schemes are secure and feasible.
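
  The core SSE idea, searching without revealing keywords, can be sketched as follows (a toy illustration of the general technique using deterministic HMAC tokens, not the paper's scheme; the document names and key handling are our assumptions):

  ```python
  import hashlib
  import hmac
  import secrets

  KEY = secrets.token_bytes(32)  # held by the data owner, never by the server

  def token(keyword: str) -> bytes:
      """Deterministic search token; the server never learns the keyword."""
      return hmac.new(KEY, keyword.encode(), hashlib.sha256).digest()

  def build_index(docs: dict) -> dict:
      """Encrypted index mapping keyword tokens to document ids (outsourced)."""
      index = {}
      for doc_id, text in docs.items():
          for word in set(text.split()):
              index.setdefault(token(word), []).append(doc_id)
      return index

  def search(index: dict, keyword: str) -> list:
      """The server matches tokens, seeing neither plaintext nor keywords."""
      return sorted(index.get(token(keyword), []))

  index = build_index({"d1": "cloud storage audit", "d2": "cloud ledger"})
  print(search(index, "cloud"))  # ['d1', 'd2']
  ```

  A plain index like this is exactly what a malicious server could silently truncate; the abstract's point is that anchoring the index on a blockchain lets the client verify that results are complete.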

Data Firewall: A TPM-based Security Framework for Protecting Data in Thick Client Mobile Environment

  • Park, Woo-Ram;Park, Chan-Ik
    • Journal of Computing Science and Engineering / v.5 no.4 / pp.331-337 / 2011
  • Recently, Virtual Desktop Infrastructure (VDI) has been widely adopted to ensure secure protection of enterprise data and provide users with a centrally managed execution environment. However, user experience may be restricted by the limited functionality of thin clients in VDI, and if thick client devices like laptops are used instead, data leakage becomes possible through malicious software installed on them. In this paper, we present Data Firewall, a security framework to manage and protect security-sensitive data on thick client mobile devices. Data Firewall consists of three components: Virtual Machine (VM) image management, client VM integrity attestation, and key management for Protected Storage. Data Firewall manages two types of execution VMs: a Normal VM and a Secure VM. In the Normal VM, a user can execute any applications installed on the laptop in the same manner as before; security-sensitive data can be accessed only in the Secure VM, whose integrity must be checked before access is granted. All security-sensitive data are stored in a space called Protected Storage, whose access keys are managed by Data Firewall. Key management and exchange between client and server are handled via the Trusted Platform Module (TPM). We have analyzed the security characteristics of the framework and built a prototype to measure its performance overhead.
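
  The attestation gate described above can be modeled in a few lines (a toy sketch of the general pattern, not the paper's TPM protocol; the expected measurement and key handling here are our stand-ins for TPM sealing):

  ```python
  import hashlib
  import secrets

  # Stand-in for a TPM-sealed reference: the known-good Secure VM image hash.
  EXPECTED_MEASUREMENT = hashlib.sha256(b"secure-vm-image-v1").hexdigest()
  STORAGE_KEY = secrets.token_bytes(32)  # key for Protected Storage

  def attest_and_release(vm_image: bytes):
      """Release the storage key only for an untampered Secure VM image."""
      measured = hashlib.sha256(vm_image).hexdigest()
      if measured != EXPECTED_MEASUREMENT:
          return None  # integrity check failed: no key, no data access
      return STORAGE_KEY

  print(attest_and_release(b"secure-vm-image-v1") is not None)  # True
  print(attest_and_release(b"tampered-image") is None)          # True
  ```

  In the real framework the measurement and key release happen inside the TPM, so the key never depends on software the attacker could modify.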

A Level Evaluation Model for Data Governance (데이터 거버넌스 수준평가 모델 개발의 제안)

  • Jang, Kyoung-Ae;Kim, Woo-Je
    • Journal of the Korean Operations Research and Management Science Society / v.42 no.1 / pp.65-77 / 2017
  • The purpose of this paper is to develop a level evaluation model for data governance that can diagnose and verify insufficient areas in the operation of data governance. We expanded previous studies on the attribute indices of data governance and developed a level evaluation model and evaluation items; the model comprises evaluation levels and 400 component evaluation items. We drew on previous studies and expert opinion analysis, including the Delphi technique and the KJ method. This study contributes to the literature by developing a level evaluation model for data governance at an early phase. The paper can serve as baseline data for objective evidence of performance in companies and agencies operating data governance.
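
  The shape of such a level evaluation can be sketched as follows (a hypothetical illustration, not the paper's model: the category names, the 0-5 item scale, and the five level labels are our assumptions):

  ```python
  LEVELS = ["Initial", "Managed", "Defined", "Quantified", "Optimizing"]

  def category_scores(items: dict) -> dict:
      """Mean item score per category (items scored 0-5)."""
      return {cat: sum(scores) / len(scores) for cat, scores in items.items()}

  def overall_level(items: dict) -> str:
      """Map the mean of the category means onto a maturity level label."""
      means = category_scores(items)
      overall = sum(means.values()) / len(means)
      return LEVELS[min(int(overall), 4)]

  scores = {"data quality": [4, 3, 4], "metadata": [2, 3], "security": [3, 3]}
  print(overall_level(scores))  # Quantified
  ```

  Per-category means are what let such a model point at the specific insufficient areas rather than only report one aggregate level.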