• Title/Summary/Keyword: Data Paper


Bayesian Estimation for the Multiple Regression with Censored Data: Multivariate Normal Error Terms

  • Yoon, Yong-Hwa
    • Journal of the Korean Data and Information Science Society
    • /
    • v.9 no.2
    • /
    • pp.165-172
    • /
    • 1998
  • This paper considers a linear regression model with censored data in which the error terms follow a multivariate normal distribution, under a diffuse prior distribution for the model parameters. For the censored data we derive the full conditional densities of the parameters of the multiple regression model and obtain the marginal posterior densities of the relevant parameters through the Gibbs sampler, which was proposed by Geman and Geman (1984) and applied from a statistical viewpoint by Gelfand and Smith (1990) (a minimal illustrative sketch of such a sampler follows this entry).

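The entry above names a concrete algorithm (a Gibbs sampler under a diffuse prior), so a minimal sketch may help orient readers. The sketch below assumes i.i.d. univariate normal errors, a single known right-censoring point, and simulated data; it does not reproduce the paper's multivariate-normal setting or its derivations.

```python
# A minimal sketch of a data-augmentation Gibbs sampler for right-censored
# linear regression under a diffuse (Jeffreys) prior.  Simplifications:
# i.i.d. univariate normal errors (not the paper's multivariate setting),
# one known censoring point, and simulated data.
import numpy as np
from scipy.stats import truncnorm, invgamma

rng = np.random.default_rng(0)

# --- simulate right-censored data (hypothetical) ---
n, c = 200, 2.0
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true, sigma_true = np.array([1.0, 0.8]), 1.0
y_star = X @ beta_true + rng.normal(scale=sigma_true, size=n)
censored = y_star > c
y_obs = np.where(censored, c, y_star)

# --- Gibbs sampler ---
XtX_inv = np.linalg.inv(X.T @ X)
beta, sigma2 = np.zeros(2), 1.0
draws = []
for it in range(3000):
    # 1) impute latent responses for censored cases from a truncated normal
    mu_c = X[censored] @ beta
    a = (c - mu_c) / np.sqrt(sigma2)              # truncated below at c
    y_lat = y_obs.copy()
    y_lat[censored] = truncnorm.rvs(a, np.inf, loc=mu_c,
                                    scale=np.sqrt(sigma2), random_state=rng)
    # 2) beta | sigma2, y_lat ~ N(beta_hat, sigma2 (X'X)^-1)
    beta_hat = XtX_inv @ X.T @ y_lat
    beta = rng.multivariate_normal(beta_hat, sigma2 * XtX_inv)
    # 3) sigma2 | beta, y_lat ~ Inv-Gamma(n/2, SSE/2) under the Jeffreys prior
    resid = y_lat - X @ beta
    sigma2 = invgamma.rvs(n / 2, scale=resid @ resid / 2, random_state=rng)
    if it >= 1000:
        draws.append(beta)

print("posterior mean of beta:", np.mean(draws, axis=0))
```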

A Mixed Model for Ordered Response Categories

  • Choi, Jae-Sung
    • Journal of the Korean Data and Information Science Society
    • /
    • v.15 no.2
    • /
    • pp.339-345
    • /
    • 2004
  • This paper deals with a mixed logit model for ordered polytomous data. Two types of factors affect the response variable: a fixed factor with finite quantitative levels, and a random factor arising from an experimental structure such as a randomized complete block design. The paper discusses how to set up the model for analyzing ordered polytomous data and illustrates how to estimate the parameters in the given model (a generic formulation of this type of model is sketched after this entry).

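For orientation only, one common way to write a mixed cumulative-logit (proportional-odds) model with fixed-factor effects and a random block effect is shown below; the symbols ($\theta_k$, $\boldsymbol{\beta}$, $b_j$, $\sigma_b^2$) are generic notation and are not taken from the paper.

```latex
% Generic mixed cumulative-logit (proportional-odds) formulation, shown only
% to illustrate the type of model described above; notation is not the paper's.
\[
  \operatorname{logit}\Pr(Y_{ij} \le k \mid b_j)
    \;=\; \theta_k \;-\; \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} \;-\; b_j,
  \qquad b_j \sim N(0,\sigma_b^{2}),\quad k = 1,\dots,K-1,
\]
% where the cut-points satisfy $\theta_1 < \dots < \theta_{K-1}$,
% $\boldsymbol{\beta}$ carries the fixed-factor effects, and $b_j$ is the
% random block effect coming from the experimental design.
```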

A Study on the Sharing of Research Data in Library and Information Science Field (문헌정보학 분야 연구데이터 공유에 관한 연구)

  • Cho, Jane
    • Journal of the Korean Society for Information Management
    • /
    • v.34 no.4
    • /
    • pp.59-79
    • /
    • 2017
  • This study analyzed the type, subject, and openness level of library and information science research data shared on Figshare, and statistically analyzed the characteristics of the data with relatively high reusability. The analysis showed that datasets and papers were the most common data types, that open access and research data were the most common keywords, and that 70% of the data were published in forms that cannot be processed by machine, such as PDF. Analyzing the relationship between the characteristics of the research data and the degree of sharing, open-access-related subjects such as APC (Article Processing Charge) were found to be the most common; by data type, however, gray literature such as papers was found to be utilized more heavily than datasets.

Data Reduction Method in Massive Data Sets

  • Namo, Gecynth Torre;Yun, Hong-Won
    • Journal of information and communication convergence engineering
    • /
    • v.7 no.1
    • /
    • pp.35-40
    • /
    • 2009
  • Many researchers are working to improve the performance of RFID systems, and many papers address one of the major drawbacks of this potent technology: data management. Because an RFID system captures billions of records, dirty data and sheer data volume are pressing problems, and the RFID community is actively looking for ways to address them. Effective data management is therefore essential for handling such large volumes of data. This paper presents data reduction techniques as one way to address these issues and introduces a new data reduction algorithm that can serve as an alternative for reducing data in RFID systems, together with a process for extracting data from the reduced database. A performance study is conducted to analyze the new algorithm; our analysis shows the utility and feasibility of the categorization reduction algorithm.
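
The entry above does not spell out the categorization algorithm itself, so the following is only a hedged sketch of one common style of RFID data reduction: duplicate reads of the same tag at the same reader within a time window are collapsed into a single aggregated record. The function `reduce_reads`, the field names, and the 60-second window are assumptions made for this illustration, not the paper's algorithm.

```python
# Hedged sketch of window-based RFID read aggregation: duplicate reads of the
# same tag at the same reader within a time window collapse into one record.
from collections import defaultdict

WINDOW = 60  # seconds per aggregation bucket (assumed)

def reduce_reads(reads):
    """reads: iterable of (tag_id, reader_id, timestamp). Returns one record
    per (tag, reader, window) with first/last seen times and a read count."""
    buckets = defaultdict(lambda: {"first": None, "last": None, "count": 0})
    for tag, reader, ts in reads:
        key = (tag, reader, int(ts // WINDOW))
        b = buckets[key]
        b["first"] = ts if b["first"] is None else min(b["first"], ts)
        b["last"] = ts if b["last"] is None else max(b["last"], ts)
        b["count"] += 1
    return [{"tag": t, "reader": r, "window": w, **b}
            for (t, r, w), b in buckets.items()]

raw = [("TAG1", "R1", 0), ("TAG1", "R1", 5), ("TAG1", "R1", 59),
       ("TAG1", "R1", 61), ("TAG2", "R1", 3)]
print(reduce_reads(raw))   # 5 raw reads reduce to 3 aggregated records
```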

A Data-driven Approach for Computational Simulation: Trend, Requirement and Technology

  • Lee, Sunghee;Ahn, Sunil;Joo, Wonkyun;Yang, Myungseok;Yu, Eunji
    • Journal of Internet Computing and Services
    • /
    • v.19 no.1
    • /
    • pp.123-130
    • /
    • 2018
  • With the emergence of the new paradigms of Open Science and Big Data, the need for data sharing and collaboration is also growing in computational science. In this paper, we analyzed data-driven research cases in computational science by field: materials design, bioinformatics, and high-energy physics. We also studied the characteristics of computational science data and the associated data management issues. Managing computational science data effectively requires data quality management, increased data reliability, flexibility to support a variety of data types, and tools for analysis and linkage to the computing infrastructure. In addition, we analyzed trends in platform technology for the efficient sharing and management of computational science data. The main contribution of this paper is to review the various computational science data repositories and related platform technologies, analyze the characteristics of computational science data and the problems of data management, and present design considerations for building a future computational science data platform.

Transmission Method of Periodic and Aperiodic Real-Time Data on a Timer-Controlled Network for Distributed Control Systems (분산제어시스템을 위한 타이머 제어형 통신망의 주기 및 실시간 비주기 데이터 전송 방식)

  • Moon, Hong-ju;Park, Hong-Seong
    • Journal of Institute of Control, Robotics and Systems
    • /
    • v.6 no.7
    • /
    • pp.602-610
    • /
    • 2000
  • Communication networks used in safety-critical systems, such as the control systems of nuclear power plants, carry three types of data traffic: urgent (asynchronous) hard real-time data, hard real-time periodic data, and soft real-time periodic data. A suitable bandwidth must be allocated to each type of traffic in order to meet its real-time constraints. This paper proposes a method that meets the real-time constraints of all three traffic types simultaneously under a timer-controlled token bus protocol (the IEEE 802.4 token bus protocol), and verifies the validity of the method with an example. The paper derives, for each station, the admissible region of the high-priority token-hold time and the target token-rotation time within which the real-time constraints of the three traffic types are met (the generic form of such timing constraints is sketched after this entry). Since scheduling the data traffic can reduce the possibility of abrupt increases in network load, the paper also proposes a brief heuristic method for building a scheduling table that satisfies the real-time constraints.

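For orientation, the generic protocol constraints studied in the timed-token literature can be written as below; the symbols ($H_i$, $\tau$, $T_{\mathrm{TTRT}}$, $P_{\min}$) are generic, and the paper derives its own admissible region rather than this exact form.

```latex
% Generic timed-token protocol constraints, shown for orientation only; the
% paper derives its own admissible region for the token-holding parameters.
\[
  \sum_{i=1}^{n} H_i \;+\; \tau \;\le\; T_{\mathrm{TTRT}},
  \qquad
  T_{\mathrm{TTRT}} \;\le\; \frac{P_{\min}}{2},
\]
% where $H_i$ is the high-priority token-hold time of station $i$, $\tau$ is
% the total token-passing overhead per rotation, $T_{\mathrm{TTRT}}$ is the
% target token-rotation time, and $P_{\min}$ is the shortest period (or
% deadline) among the hard real-time periodic messages.
```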

A study on the standardization strategy for building of learning data set for machine learning applications (기계학습 활용을 위한 학습 데이터세트 구축 표준화 방안에 관한 연구)

  • Choi, JungYul
    • Journal of Digital Convergence
    • /
    • v.16 no.10
    • /
    • pp.205-212
    • /
    • 2018
  • With the development of high-performance CPUs/GPUs, artificial intelligence algorithms such as deep neural networks, and large amounts of data, machine learning has spread to a wide range of applications. In particular, the large volumes of data collected from the Internet of Things, social network services, web pages, and public data are accelerating the use of machine learning. Learning data sets for machine learning exist in various formats depending on the application field and data type, which makes it difficult to process the data effectively and apply them to machine learning. This paper therefore studies how to build a learning data set for machine learning following standardized procedures. It first analyzes the requirements of learning data sets according to problem type and data type, and on that basis presents a reference model for building learning data sets for machine learning applications. The paper also identifies target standardization organizations and a standards development strategy for building learning data sets.
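
As a purely hypothetical illustration of the kind of metadata a standardized description of a learning data set might carry, a small schema is sketched below; the class `LearningDataSet` and its field names are invented for this example and are not the paper's reference model.

```python
# Hypothetical metadata schema for a learning data set (problem type, data
# type, feature schema, splits, licence).  Field names are invented for this
# illustration; they do not reproduce the paper's reference model.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class LearningDataSet:
    name: str
    problem_type: str            # e.g. "classification", "regression"
    data_type: str               # e.g. "image", "text", "tabular", "time-series"
    features: Dict[str, str]     # feature name -> value type
    label: str                   # name of the target field
    splits: Dict[str, int]       # split name -> number of examples
    licence: str
    provenance: List[str] = field(default_factory=list)  # collection sources

example = LearningDataSet(
    name="iot-sensor-faults",
    problem_type="classification",
    data_type="time-series",
    features={"temperature": "float", "vibration": "float"},
    label="fault_class",
    splits={"train": 8000, "validation": 1000, "test": 1000},
    licence="CC-BY-4.0",
    provenance=["IoT gateway logs (hypothetical)"],
)
print(example.name, example.splits)
```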

Intelligent Data Governance for the Federated Integration of Air Quality Databases in the Railway Industry (철도 산업의 공기 질 데이터베이스 연합형 통합을 위한 지능형 데이터 거버넌스)

  • Minjeong, Kim;Jong-Un, Won;Sangchan, Park;Gayoung, Park
    • Journal of Korean Society for Quality Management
    • /
    • v.50 no.4
    • /
    • pp.811-830
    • /
    • 2022
  • Purpose: In this paper, we discuss 1) how to prioritize the databases to be integrated; 2) which data elements should be emphasized in federated database integration; and 3) the degree of efficiency of the integration. The paper aims to lay the groundwork for building data governance by presenting guidelines for database integration, using metrics to identify and evaluate the capabilities of the UK's air quality databases. Methods: This paper performs relative efficiency analysis using Data Envelopment Analysis, one of the multi-criteria decision-making methods. In federated database integration, it is important to identify databases with high integration efficiency when prioritizing the databases to be integrated. Results: The outcome of this paper is not a set of performance indicators for implementing and evaluating data governance, but rather a discussion of which criteria should be used when performing federated integration. Using Data Envelopment Analysis in the process of implementing intelligent data governance, the authors establish and present practical strategies for discovering databases with high integration efficiency. Conclusion: Through this study, it was possible to establish internal guidelines from an integrated data governance point of view. The flexibility of federated database integration under data governance practice makes it possible to integrate databases quickly, easily, and effectively. By utilizing the guidelines presented in this study, the authors anticipate that the process of integrating multiple databases, including the air quality databases, will evolve into intelligent data governance based on federated database integration as data governance practice is established in the railway industry.
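
Since the entry above names Data Envelopment Analysis as the method for ranking integration candidates, a minimal sketch of the input-oriented CCR multiplier model may be useful; the inputs and outputs used here (integration cost, staff, records served, downstream reuse) are hypothetical and are not the metrics used in the paper.

```python
# Minimal sketch of input-oriented CCR DEA (multiplier form) with scipy,
# illustrating how relative efficiency scores could rank candidate databases.
# The inputs/outputs below are hypothetical, not the paper's actual metrics.
import numpy as np
from scipy.optimize import linprog

def ccr_efficiency(X, Y):
    """X: (n_dmu, n_inputs), Y: (n_dmu, n_outputs). Returns efficiency per DMU."""
    n, m = X.shape
    s = Y.shape[1]
    scores = []
    for o in range(n):
        # Decision variables: output weights u (s of them), then input weights v (m).
        c = np.concatenate([-Y[o], np.zeros(m)])          # maximise u.y_o
        A_eq = np.concatenate([np.zeros(s), X[o]])[None]  # v.x_o = 1
        b_eq = [1.0]
        A_ub = np.hstack([Y, -X])                         # u.y_j - v.x_j <= 0
        b_ub = np.zeros(n)
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                      bounds=[(0, None)] * (s + m))
        scores.append(-res.fun)
    return scores

# Hypothetical example: 4 candidate databases,
# inputs = [integration cost, staff], outputs = [records served, downstream reuse].
X = np.array([[4., 2.], [6., 3.], [3., 4.], [5., 2.]])
Y = np.array([[60., 5.], [70., 4.], [40., 6.], [80., 7.]])
print(ccr_efficiency(X, Y))   # a score of 1.0 marks the most efficient databases
```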

A Study on Data Resource Management Comparing Big Data Environments with Traditional Environments (전통적 환경과 빅데이터 환경의 데이터 자원 관리 비교 연구)

  • Park, Jooseok;Kim, Inhyun
    • The Journal of Bigdata
    • /
    • v.1 no.2
    • /
    • pp.91-102
    • /
    • 2016
  • In traditional environments the data life cycle has been called DIKW, for data-information-knowledge-wisdom. In big data environments, on the other hand, it is called DIA, for data-insight-action. The difference between the two data life cycles results in a new architecture for data resource management. In this paper, we study a data resource management architecture for big data environments; in particular, the main components of the architecture are proposed.


Brief Paper: An Analysis of Curricula for Data Science Undergraduate Programs

  • Cho, Soosun
    • Journal of Multimedia Information System
    • /
    • v.9 no.2
    • /
    • pp.171-176
    • /
    • 2022
  • Today, it is imperative to educate students on how best to prepare for the new data-driven era. Undergraduate education plays an important role in giving students more Data Science opportunities and expanding the supply of Data Science talent. This paper surveys and analyzes the curricula of Data Science-related bachelor's degree programs in the United States. The 'required' and 'elective' courses in each curriculum were evaluated by course weight to indicate how necessary they are for obtaining the B.S. degree. As a result, it was possible to identify which courses are important in Data Science programs and which areas are emphasized for B.S. degrees in Data Science. We found that courses belonging to the Data Science area, such as data management, data visualization, and data modeling, were more often required for Data Science B.S. degrees in the United States.