• Title/Summary/Keyword: Data science

55,244 search results

Data Framework Design of EDISON 2.0 Digital Platform for Convergence Research

  • Sunggeun Han;Jaegwang Lee;Inho Jeon;Jeongcheol Lee;Hoon Choi
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.17 no.8
    • /
    • pp.2292-2313
    • /
    • 2023
  • With improving computing performance, various digital platforms are being developed to enable easy utilization of high-performance computing environments. EDISON 1.0 is an online simulation platform widely used in computational science and engineering education. As the research paradigm changes, the demand for developing the simulation-centered EDISON 1.0 platform into the data- and AI-centered EDISON 2.0 platform is growing. Herein, a data framework, the core module for data-centric research on the EDISON 2.0 digital platform, is proposed. The proposed data framework provides the following three functions. First, it provides a data repository suited to the data lifecycle to increase research reproducibility. Second, it provides a new data model that can integrate, manage, search, and utilize heterogeneous data to support a data-driven interdisciplinary convergence research environment. Finally, it provides an exploratory data analysis (EDA) service and AI-model-based data enrichment, both developed to strengthen data reliability and maximize the efficiency and effectiveness of research endeavors. Using the EDISON 2.0 data framework, researchers can conduct interdisciplinary convergence research on heterogeneous data and easily perform data pre-processing through the web-based UI. Further, the derived data obtained through AI technology can be leveraged to gain insights and create new research topics.
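As a concrete illustration of the kind of exploratory data analysis step such a service automates, the stdlib-only Python sketch below summarizes a measurement column and imputes a missing value. The column name and values are invented for illustration; the abstract does not specify the real service's interface.

```python
import statistics

# Hypothetical measurement column with one missing value (invented data).
temperature_K = [293.1, 295.4, None, 301.2]

observed = [v for v in temperature_K if v is not None]
summary = {
    "count": len(observed),
    "mean": round(statistics.mean(observed), 1),
    "stdev": round(statistics.stdev(observed), 1),
    "missing": temperature_K.count(None),
}
print(summary)  # counts, mean, spread, and missing-value tally per column

# One simple enrichment/pre-processing step a web UI could expose:
# impute missing values with the column mean.
filled = [v if v is not None else summary["mean"] for v in temperature_K]
```

A real platform would run such summaries across every column of a dataset and surface them in the browser before the researcher chooses a pre-processing step.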

Degree Programs in Data Science at the School of Information in the States (미국 정보 대학의 데이터사이언스 학위 현황 연구)

  • Park, Hyoungjoo
    • Journal of Korean Library and Information Science Society
    • /
    • v.53 no.2
    • /
    • pp.305-332
    • /
    • 2022
  • This preliminary study examined data science degree programs at Schools of Information in the United States. The focus was the data science degrees offered at Schools of Information within the 64 Library and Information Science (LIS) programs accredited by the American Library Association (ALA) in 2022. In addition, this study examined the degrees, majors, minors, specialized tracks, and certificates in data science, as well as potential careers after earning a data science degree. Overall, eight Schools of Information (iSchools) offered 12 data science degrees. Data science courses at these schools focus on topics such as introduction to data science, information retrieval, data mining, databases, data and humanities, machine learning, metadata, research methods, data analysis and visualization, internship/capstone, ethics and security, users, policy, and curation and management. Most schools did not offer traditional LIS courses. After earning a data science degree at a School of Information, potential careers included data scientist, data engineer, and data analyst. The researcher hopes the findings of this study can serve as a starting point for discussing the direction of data science programs from the perspective of the information field, specifically the degrees, majors, minors, specialized tracks, and certificates in data science.

Comparison of Various Criteria for Designing ECOC

  • Seok, Kyeong-Ha;Lee, Seung-Chul;Jeon, Gab-Dong
    • Journal of the Korean Data and Information Science Society
    • /
    • v.17 no.2
    • /
    • pp.437-447
    • /
    • 2006
  • Error Correcting Output Coding (ECOC) is used to solve multi-class problems and is known to improve classification accuracy. In this paper, we compared various criteria for designing the code matrix in the encoding step. In addition, we propose an ensemble that uses the ability of each classifier in the decoding step. We investigate the justification of the proposed method on real and synthetic data.

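To make the encoding/decoding idea concrete, here is a minimal sketch of ECOC decoding (not the paper's specific design criteria): each class is assigned a codeword row in a code matrix, one binary classifier is trained per column, and a test point is assigned to the class whose codeword is nearest in Hamming distance to the classifiers' outputs. The 4-class exhaustive code below is a standard textbook choice, not taken from the paper.

```python
# Rows = classes, columns = binary subproblems; every pair of rows
# differs in 4 positions, so any single classifier error is corrected.
CODE_MATRIX = [
    [+1, +1, +1, +1, +1, +1, +1],   # class 0
    [-1, -1, -1, -1, +1, +1, +1],   # class 1
    [-1, -1, +1, +1, -1, -1, +1],   # class 2
    [-1, +1, -1, +1, -1, +1, -1],   # class 3
]

def ecoc_decode(bit_predictions, code_matrix=CODE_MATRIX):
    """Return the class whose codeword has the fewest disagreements
    with the binary classifiers' +1/-1 outputs."""
    def hamming(row):
        return sum(p != b for p, b in zip(bit_predictions, row))
    return min(range(len(code_matrix)), key=lambda c: hamming(code_matrix[c]))

# Class 1's codeword with one bit flipped still decodes to class 1.
print(ecoc_decode([-1, -1, -1, +1, +1, +1, +1]))  # -> 1
```

The design criteria the paper compares concern how to choose such a matrix; the decoding rule above simply uses plain Hamming distance, whereas the proposed ensemble additionally weights each classifier's reliability.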

Proposals for Korean Space Observation Data Strategies (한국 우주관측 자료 전략 수립 제안)

  • Baek, Ji-Hye;Choi, Seonghwan;Park, Jongyeob;Kim, Sujin;Sim, Chae Kyung;Yang, Tae-Yong;Jeong, Minsup;Jo, Young-Soo;Choi, Young-Jun
    • Journal of Space Technology and Applications
    • /
    • v.1 no.2
    • /
    • pp.241-255
    • /
    • 2021
  • Space observation data include observations of stars, galaxies, the Sun, space plasma, planets, and minor bodies acquired through space missions, as well as the products of processing and utilizing those observations. Astronomy and space science observation systems are growing larger, and space mission opportunities and data volumes are increasing. Accordingly, the need for systematic and efficient management of space observation data is growing, and a strategy and policy for space observation data should be established in Korea. As a preparatory step, we analyzed the data strategy of the National Aeronautics and Space Administration (NASA), which developed from an extensive understanding of and long-term experience with space observation data. Based on the analysis results, we propose a strategic direction and ten recommendations for Korean space observation data strategies that will serve as a basis for establishing space observation data policies in the future.

Verification Control Algorithm of Data Integrity Verification in Remote Data sharing

  • Xu, Guangwei;Li, Shan;Lai, Miaolin;Gan, Yanglan;Feng, Xiangyang;Huang, Qiubo;Li, Li;Li, Wei
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.2
    • /
    • pp.565-586
    • /
    • 2022
  • Cloud storage's elastic scalability not only provides flexible services for data owners to store their data remotely, but also reduces the storage, operation, and management costs of data sharing. However, data outsourced to the storage space of a cloud service provider raises security concerns about data integrity. Data integrity verification has become an important technology for detecting the integrity of remotely shared data. However, verification requests from users without data access rights cause unnecessary overhead for the data owner and the cloud service provider; in particular, malicious users who constantly launch data integrity verification greatly waste service resources. Since the data owner is a consumer purchasing cloud services, it must bear both the cost of data storage and that of data verification. This paper proposes a verification control algorithm for the integrity verification of remotely outsourced data, designing an attribute-based encryption verification control scheme for multiple verifiers. The data owner and the cloud service provider jointly construct a common access structure and generate a verification sentinel that checks the authority of verifiers against the access structure. Finally, since the cloud service provider knows neither the access structure nor the sentinel generation operation, it can authenticate only verifiers satisfying the access policy to verify the integrity of the corresponding outsourced data. Theoretical analysis and experimental results show that the proposed algorithm achieves fine-grained access control over multiple verifiers for data integrity verification.
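The core access-control idea can be illustrated with a deliberately simplified sketch: only verifiers whose attributes satisfy a shared access structure may run integrity verification. The real scheme uses attribute-based encryption and a cryptographic sentinel; the plain attribute-set check below is an assumption standing in for that construction, and the policy attributes are invented.

```python
# Hypothetical access structure: a verifier needs both attributes to qualify.
ACCESS_POLICY = {"auditor", "project_member"}

def may_verify(verifier_attributes):
    """Grant verification rights only when every required attribute
    in the access policy is held by the verifier."""
    return ACCESS_POLICY.issubset(verifier_attributes)

print(may_verify({"auditor", "project_member", "intern"}))  # authorized
print(may_verify({"intern"}))                               # rejected
```

In the paper's scheme this gatekeeping is enforced cryptographically, so the cloud provider can reject unauthorized verification requests without ever learning the access structure itself.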

An Open Science 'State of the Art' for Hong Kong: Making Open Research Data Available to Support Hong Kong Innovation Policy

  • Sharif, Naubahar;Ritter, Waltraut;Davidson, Robert L;Edmunds, Scott C
    • Journal of Contemporary Eastern Asia
    • /
    • v.17 no.2
    • /
    • pp.200-221
    • /
    • 2018
  • Open Science is an umbrella term that involves various movements aiming to remove the barriers to sharing any kind of output, resources, methods or tools at any stage of the research process. While the study of open science is relatively advanced in Western countries, we know of no scholarship that attempts to understand open science in Hong Kong. This paper provides a broad-based background on the major research data management organisations, policies and institutions with the intention of laying a foundation for more rigorous future research that quantifies the benefits of open access and open data policies. We explore the status and prospects for open science (open access and open data) in the context of Hong Kong and how open science can contribute to innovation in Hong Kong. Surveying Hong Kong's policies and players, we identify both lost research potential and provide positive examples of Hong Kong's contribution to scientific research. Finally, we offer suggestions regarding what changes can be made to address the gaps we identify.

DEVELOPMENT AND TESTS OF THE ALGORITHM FOR DIRECT DATA TRANSMISSION BETWEEN RVDB AND HUGE CAPACITY DATA SERVER (RVDB와 대용량 서버 간의 직접 데이터 전송 알고리즘 개발과 시험에 관한 연구)

  • Roh, Duk-Gyoo;Oh, Se-Jin;Yeom, Jae-Hwan;Jung, Dong-Kyu;Oh, Chung-Sik;Yun, Young-Joo;Kim, Hyo-Ryoung;Ozeki, Kensuke
    • Publications of The Korean Astronomical Society
    • /
    • v.29 no.3
    • /
    • pp.45-52
    • /
    • 2014
  • This paper describes the development of an algorithm for direct data transmission between the Raw VLBI Data Buffer (RVDB) and the Huge Capacity Data Server (HCDS) operated at the Korea-Japan Correlation Center (KJCC). The transmitted data are VLBI observations recorded at each radio telescope site, and the transmission rate varies from 1 Gbps in the usual case up to 8 Gbps. The developed algorithm enables direct data transmission between the RVDB and the HCDS through a 10 Gbps optical network using the VLBI Data Interchange Format (VDIF). The proposed method adopts the conventional UDP/IP protocol, but to prevent data loss during transmission, packet error monitoring and data re-transmission functions are newly designed. The VDIF specification and the VDIF Control Protocol (VDIFCP) are used for the direct data transmission between the RVDB and the HCDS. To validate the developed algorithm, we conducted data transmission from the RVDB to the HCDS and compared the transmitted data with the original data bit by bit. We confirmed that the transmitted data are identical to the original data without any loss and were recovered well even when some packets were lost.
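The loss-recovery idea the abstract describes can be sketched in a few lines: detect missing UDP packets by sequence-number gaps and request their retransmission. This is a generic illustration of that pattern, not the KJCC implementation; the packet contents and helper names are invented.

```python
def receive_stream(packets):
    """packets: iterable of (seq, payload) in arrival order.
    Returns the received payloads plus the sequence numbers to re-request."""
    received, missing = {}, []
    expected = 0
    for seq, payload in packets:
        # A gap between the expected and observed sequence number
        # marks the skipped packets for re-transmission.
        if seq > expected:
            missing.extend(range(expected, seq))
        received[seq] = payload
        expected = seq + 1
    return received, missing

sent = [(0, b"a"), (1, b"b"), (3, b"d"), (4, b"e")]   # packet 2 lost in transit
received, missing = receive_stream(sent)
print(missing)   # -> [2]

# The sender resends only the missing packets, restoring the full stream.
received.update({seq: b"c" for seq in missing})
```

Layering this monitoring on top of plain UDP keeps the high throughput of an unreliable transport while guaranteeing, as the paper verifies bit by bit, that the delivered data match the original.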

Data Science Degree and Curriculum in Korea and its Implications for the Information Field (국내 데이터사이언스 학위 및 교과 운영 현황과 문헌정보학과로의 함의)

  • Park, Hyoungjoo;Lee, Heejin
    • Journal of Korean Library and Information Science Society
    • /
    • v.53 no.3
    • /
    • pp.431-454
    • /
    • 2022
  • This study examined data science degree programs and courses offered by universities, and those offered by Library and Information Science (LIS) degree programs, to understand their implications for LIS programs in Korea. The research assessed the status of data science degrees at 439 schools using the list released by the Korea Educational Development Institute in 2022. Specifically, this study analyzed universities, colleges, majors, sub-majors, interdisciplinary majors, convergence majors, micro-degrees, nanodegrees, tracks, modules, and industry-university cooperative programs within the data science field. The research examined 1,148 courses offered by data science degree programs and 1,325 courses offered by LIS degree programs. Data science degrees in Korea offer introductory, technical, practical, applied, and in-depth subjects related to data science. Although LIS programs in Korea do not always offer data science, when courses were offered they included topics such as introduction to data science, databases, data visualization, data curation, metadata, big data, and information technology. The researchers hope the findings of this study will be useful as a starting point for developing and revising LIS curricula on data science in Korea.

A Study on Elementary Education Examples for Data Science using Entry (엔트리를 활용한 초등 데이터 과학 교육 사례 연구)

  • Hur, Kyeong
    • Journal of The Korean Association of Information Education
    • /
    • v.24 no.5
    • /
    • pp.473-481
    • /
    • 2020
  • Data science starts with small-data analysis and extends to machine learning and deep learning for big-data analysis. Data science is a core area of artificial intelligence technology and should be systematically reflected in the school curriculum. For data science education, Entry provides a data analysis tool suitable for elementary education. In big-data analysis, data samples are extracted and analysis results are interpreted through statistical inference and judgment. In this paper, the big-data analysis area, which requires statistical knowledge, is excluded from the elementary level, and data science education examples focusing on the elementary level are proposed. To this end, the general stages of data science education are explained first, and stages of elementary data science education are newly proposed. Following these stages, an example of comparing the values of data variables and an example of analyzing the correlation between data variables are presented, using public small data provided by Entry. Using the Entry data-analysis examples proposed in this paper, data science convergence education can be provided in elementary schools with data generated from various subjects. In addition, data science educational materials combining text, audio, and video recognition AI tools can be developed using Entry.
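The correlation-analysis example the paper proposes can be mirrored in a few lines of textual code (Entry itself is block-based, so Python stands in here): compare two small data variables and compute their Pearson correlation. The variables and values below are invented for illustration, not the public Entry datasets.

```python
# Hypothetical small data: two variables a pupil might compare.
temperatures = [21, 24, 27, 30, 33]          # daily highs (degrees C)
ice_cream_sales = [110, 135, 155, 180, 200]  # sales counts

def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from first principles."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

r = pearson_r(temperatures, ice_cream_sales)
print(round(r, 3))  # close to 1: a strong positive relationship
```

In an elementary classroom the same steps would be assembled from Entry's data-analysis blocks and read off a scatter chart rather than computed as a formula.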

Data Technology: New Interdisciplinary Science & Technology (데이터 기술: 지식창조를 위한 새로운 융합과학기술)

  • Park, Sung-Hyun
    • Journal of Korean Society for Quality Management
    • /
    • v.38 no.3
    • /
    • pp.294-312
    • /
    • 2010
  • Data Technology (DT) is a new technology that deals with data collection, data analysis, the generation of information from data, the generation of knowledge from modelling, and future prediction. DT is a newly emerged interdisciplinary science and technology of the 21st-century knowledge society. Although the main body of DT is applied statistics, it also encompasses management information systems (MIS), quality management, process system analysis, and so on. It is therefore an interdisciplinary science and technology spanning statistics, management science, industrial engineering, computer science, and social science. In this paper, the definition of DT is given first, followed by the effects and basic properties of DT, the differences between IT and DT, a six-step process for applying DT, and a DT example. Finally, the relationship among DT, e-Statistics, and data mining is explained, and a direction for the development of DT is proposed.