• Title/Abstract/Keyword: Data Paper

Bayesian Estimation for the Multiple Regression with Censored Data: Multivariate Normal Error Terms

  • Yoon, Yong-Hwa
    • Journal of the Korean Data and Information Science Society / Vol. 9, No. 2 / pp. 165-172 / 1998
  • This paper considers a linear regression model with censored data in which each error term follows a multivariate normal distribution. A diffuse prior distribution is assumed for the parameters of the linear regression model. Given the censored data, we derive the full conditional densities of the parameters of the multiple regression model and use them to obtain the marginal posterior densities of the relevant parameters via the Gibbs sampler, which was proposed by Geman and Geman (1984) and applied from a statistical viewpoint by Gelfand and Smith (1990).

A Mixed Model for Ordered Response Categories

  • Choi, Jae-Sung
    • Journal of the Korean Data and Information Science Society / Vol. 15, No. 2 / pp. 339-345 / 2004
  • This paper deals with a mixed logit model for ordered polytomous data. Two types of factors affect the response variable. One is a fixed factor with finite quantitative levels, and the other is a random factor arising from an experimental structure such as a randomized complete block design. The paper discusses how to set up the model for analyzing ordered polytomous data and illustrates how to estimate the parameters of the given model.

A Study on the Sharing of Research Data in Library and Information Science Field

  • 조재인
    • 정보관리학회지 / Vol. 34, No. 4 / pp. 59-79 / 2017
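  • This study analyzed the types, subjects, and openness levels of library and information science research data shared through Figshare, and statistically interpreted the characteristics of data with relatively high reusability. The analysis showed that dataset and paper were the most common data types and that open access and research data were the most common subject areas. Nearly 70% of the research data were released in formats such as PDF that are not easy to edit and reuse. An analysis of the relationship between the characteristics of research data and their degree of use further showed that, by subject, the open access area, including APC (Article Processing Charge), was used the most, and that, by data type, paper was used the most.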

Data Reduction Method in Massive Data Sets

  • Namo, Gecynth Torre;Yun, Hong-Won
    • Journal of information and communication convergence engineering / Vol. 7, No. 1 / pp. 35-40 / 2009
  • Many researchers are working to improve the performance of RFID systems, and many papers have been written to address one of the major drawbacks of this potent technology: data management. Because RFID systems capture billions of data items, problems arising from dirty data and large data volumes have caused an uproar in the RFID community, and researchers are looking for ways to address them. In particular, effective data management is important for handling large volumes of data. This paper presents data reduction techniques that attempt to address these issues and introduces a new data reduction algorithm that might serve as an alternative for reducing data in RFID systems. A process for extracting data from the reduced database is also presented. A performance study is conducted to analyze the new data reduction algorithm, and the analysis shows the utility and feasibility of our categorization reduction algorithms.

A Data-driven Approach for Computational Simulation: Trend, Requirement and Technology

  • Lee, Sunghee;Ahn, Sunil;Joo, Wonkyun;Yang, Myungseok;Yu, Eunji
    • 인터넷정보학회논문지 / Vol. 19, No. 1 / pp. 123-130 / 2018
  • With the emergence of the new paradigms of Open Science and Big Data, the need for data sharing and collaboration is also emerging in the computational science field. In this paper, we analyze data-driven research cases in computational science by field: material design, bioinformatics, and high energy physics. We also study the characteristics of computational science data and the associated data management issues. Managing computational science data effectively requires data quality management, increased data reliability, the flexibility to support a variety of data types, and tools for analysis and linkage to the computing infrastructure. In addition, we analyze trends in platform technology for the efficient sharing and management of computational science data. The main contribution of this paper is to review various computational science data repositories and related platform technologies, to analyze the characteristics of computational science data and the problems of data management, and to present design considerations for building a future computational science data platform.

Transmission Method of Periodic and Aperiodic Real-Time Data on a Timer-Controlled Network for Distributed Control Systems

  • 문홍주;박홍성
    • 제어로봇시스템학회논문지 / Vol. 6, No. 7 / pp. 602-610 / 2000
  • In communication networks used in safety-critical systems, such as control systems in nuclear power plants, there are three types of data traffic: urgent or asynchronous hard real-time data, hard real-time periodic data, and soft real-time periodic data. A suitable bandwidth must be allocated to each type of traffic in order to meet its real-time constraints. This paper proposes a method that meets the real-time constraints for the three types of data traffic simultaneously under a timer-controlled token bus protocol, or the IEEE 802.4 token bus protocol, and verifies the validity of the method with an example. It derives the proper region of the high priority token hold time and the target token rotation time for each station within which the real-time constraints for the three types of data traffic are met. Since scheduling the data traffic may reduce the possibility of an abrupt increase in the network load, this paper also proposes a brief heuristic method for building a scheduling table that satisfies the real-time constraints.

A study on the standardization strategy for building of learning data set for machine learning applications

  • 최정열
    • 디지털융복합연구 / Vol. 16, No. 10 / pp. 205-212 / 2018
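  • With the development of high-performance CPUs/GPUs, artificial intelligence algorithms such as deep neural networks, and the availability of large amounts of data, machine learning is being extended to a variety of application fields. In particular, large volumes of data collected from the Internet of Things, social network services, web pages, and public data are accelerating the use of machine learning. Learning datasets for machine learning exist in various formats depending on the application field and the data type, which makes it difficult to process the data effectively and apply them to machine learning. This paper therefore studies a method for building learning datasets for machine learning according to a standardized procedure. First, the requirements that a learning dataset should satisfy are analyzed by problem type and data type. Based on this analysis, a reference model for building learning datasets for machine learning applications is proposed. In addition, to develop the reference model into an international standard, the selection of a target standardization body and a standardization strategy are presented.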

Intelligent Data Governance for the Federated Integration of Air Quality Databases in the Railway Industry

  • 김민정;원종운;박상찬;박가영
    • 품질경영학회지 / Vol. 50, No. 4 / pp. 811-830 / 2022
  • Purpose: This paper discusses 1) how to prioritize the databases to be integrated; 2) which data elements should be emphasized in federated database integration; and 3) the degree of efficiency of the integration. It aims to lay the groundwork for building data governance by presenting guidelines for database integration, using metrics to identify and evaluate the capabilities of the UK's air quality databases. Methods: Relative efficiency analysis is performed using Data Envelopment Analysis, one of the multi-criteria decision-making methods. In federated database integration, it is important to identify databases with high integration efficiency when prioritizing the databases to be integrated. Results: The aim is not to present performance indicators for implementing and evaluating data governance, but rather to discuss what criteria should be used when performing 'federated integration'. Using Data Envelopment Analysis in the process of implementing intelligent data governance, the authors establish and present practical strategies for discovering databases with high integration efficiency. Conclusion: Through this study, it was possible to establish internal guidelines from an integrated data governance point of view. The flexibility of federated database integration under data governance practice makes it possible to integrate databases quickly, easily, and effectively. The authors anticipate that, by applying the guidelines presented in this study, the process of integrating multiple databases, including the air quality databases, will evolve into intelligent data governance based on federated database integration when data governance practice is established in the railway industry.

A Study on Data Resource Management Comparing Big Data Environments with Traditional Environments

  • 박주석;김인현
    • 한국빅데이터학회지 / Vol. 1, No. 2 / pp. 91-102 / 2016
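  • In traditional environments, the data life cycle is summarized as a data-information-knowledge-wisdom transformation process. In big data environments, by contrast, it is summarized as a data-insight-action transformation process. This difference in the transformation process also requires changes in the data resource management that supports the data life cycle. This paper studies data resource management for big data environments in comparison with traditional data resource management. In particular, it proposes the main components of big data resource management.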

Brief Paper: An Analysis of Curricula for Data Science Undergraduate Programs

  • Cho, Soosun
    • Journal of Multimedia Information System / Vol. 9, No. 2 / pp. 171-176 / 2022
  • Today, it is imperative to educate students on how best to prepare themselves for the new data-driven era. Undergraduate education plays an important role in providing students with more Data Science opportunities and expanding the supply of Data Science talent. This paper surveys and analyzes the curricula of Data Science-related bachelor's degree programs in the United States. The 'required' and 'elective' courses in a curriculum for obtaining a B.S. degree were evaluated by course weight to indicate their necessity. As a result, it was possible to find out which courses were important in Data Science programs and which areas were emphasized for B.S. degrees in Data Science. We found that courses belonging to the Data Science area, such as data management, data visualization, and data modeling, were more often required for Data Science B.S. degrees in the United States.