• Title/Summary/Keyword: Source Data

An Empirical Study on the Effects of Source Data Quality on the Usefulness and Utilization of Big Data Analytics Results (원천 데이터 품질이 빅데이터 분석결과의 유용성과 활용도에 미치는 영향)

  • Park, Sohyun;Lee, Kukhie;Lee, Ayeon
    • Journal of Information Technology Applications and Management
    • /
    • v.24 no.4
    • /
    • pp.197-214
    • /
    • 2017
  • This study sheds light on source data quality in big data systems. Previous studies on big data success have called for further examination of quality factors and the importance of source data. This study extracted the quality factors of source data from the user's viewpoint and empirically tested the effects of source data quality on the usefulness and utilization of big data analytics results. Based on previous research and a focus group evaluation, four quality factors were established: accuracy, completeness, timeliness, and consistency. After setting up 11 hypotheses on how source data quality contributes to the usefulness, utilization, and ongoing use of big data analytics results, an e-mail survey was conducted at the level of independent departments using big data in domestic firms. The results of the hypothesis tests identified the characteristics and impact of source data quality in big data systems and yielded some meaningful findings about big data characteristics.

A Distributed Privacy-Utility Tradeoff Method Using Distributed Lossy Source Coding with Side Information

  • Gu, Yonghao;Wang, Yongfei;Yang, Zhen;Gao, Yimu
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.11 no.5
    • /
    • pp.2778-2791
    • /
    • 2017
  • In the age of big data, distributed data providers need to ensure privacy, while data analysts need to mine the value of data; finding the privacy-utility tradeoff has therefore become a research hotspot. Moreover, an adversary may have background knowledge of the data source, so it is important to solve the privacy-utility tradeoff problem in a distributed environment with side information. This paper proposes a distributed privacy-utility tradeoff method using distributed lossy source coding with side information, and quantitatively gives the privacy-utility tradeoff region and the rate-distortion-leakage region. Four results are shown in the simulation analysis. First, both the source rate and the privacy leakage decrease as the source distortion increases. Second, the stronger the correlation between the public and private data of the source, the finer the perturbation of the source needed to obtain the same privacy protection. Third, the greater the variance of the data source, the smaller the distortion chosen to ensure more data utility. Fourth, under the same privacy restriction, the smaller the variance of the side information, the smaller the distortion of the data source chosen to ensure more data utility. Finally, the proposed method is compared with current ones in five aspects to show its advantage.

An Evaluation Study on Artificial Intelligence Data Validation Methods and Open-source Frameworks (인공지능 데이터 품질검증 기술 및 오픈소스 프레임워크 분석 연구)

  • Yun, Changhee;Shin, Hokyung;Choo, Seung-Yeon;Kim, Jaeil
    • Journal of Korea Multimedia Society
    • /
    • v.24 no.10
    • /
    • pp.1403-1413
    • /
    • 2021
  • In this paper, we investigate automated data validation techniques for artificial intelligence training, and survey open-source frameworks, such as Google's TensorFlow Data Validation (TFDV), that support automated data validation in the AI model development process. We also introduce an experimental study using public data sets to demonstrate the effectiveness of the open-source data validation framework. In particular, we present experimental results of the data validation functions for schema testing and discuss the limitations of current open-source frameworks for semantic data. Finally, we introduce the latest studies on semantic data validation using machine learning techniques.
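
The schema-testing idea described above can be sketched in a few lines: infer a per-column schema (type and observed range) from training data, then flag serving rows that deviate. This is a minimal illustration of the concept only; the function names and schema format here are hypothetical and do not reflect TFDV's actual API.

```python
# Minimal sketch of automated schema testing for training data.
# Illustrates the idea behind frameworks such as TFDV; names are
# hypothetical, not TFDV's API.

def infer_schema(rows):
    """Infer expected type and observed value range for each column."""
    schema = {}
    for row in rows:
        for col, val in row.items():
            entry = schema.setdefault(col, {"type": type(val), "min": val, "max": val})
            entry["min"] = min(entry["min"], val)
            entry["max"] = max(entry["max"], val)
    return schema

def validate(rows, schema):
    """Return anomalies: missing columns, type drift, or range drift."""
    anomalies = []
    for i, row in enumerate(rows):
        for col, spec in schema.items():
            if col not in row:
                anomalies.append((i, col, "missing"))
            elif not isinstance(row[col], spec["type"]):
                anomalies.append((i, col, "type"))
            elif not (spec["min"] <= row[col] <= spec["max"]):
                anomalies.append((i, col, "range"))
    return anomalies

train = [{"age": 25, "income": 40000}, {"age": 61, "income": 90000}]
schema = infer_schema(train)
serving = [{"age": 30, "income": 50000}, {"age": -5}]
anomalies = validate(serving, schema)
print(anomalies)  # [(1, 'age', 'range'), (1, 'income', 'missing')]
```

Real frameworks add distribution-drift checks and schema evolution on top of this basic type/range validation.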

A Novel Redundant Data Storage Algorithm Based on Minimum Spanning Tree and Quasi-randomized Matrix

  • Wang, Jun;Yi, Qiong;Chen, Yunfei;Wang, Yue
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.12 no.1
    • /
    • pp.227-247
    • /
    • 2018
  • For intermittently connected wireless sensor networks deployed in harsh environments, sensor nodes may fail at any time due to internal or external causes. Data collection and recovery must therefore be as fast as possible, so that all the sensed data can be restored by accessing as few surviving nodes as possible. In this paper, a novel redundant data storage algorithm based on a minimum spanning tree and a quasi-randomized matrix (QRNCDS) is proposed. QRNCDS disseminates k source data packets to n sensor nodes in the network (n > k) according to a minimum spanning tree traversal mechanism. Every node stores only one encoded data packet, the XOR of the source data packets it receives, in accordance with quasi-randomized matrix theory. The algorithm adopts the minimum spanning tree traversal rule to reduce the messaging complexity of disseminating the source packets. To solve the problem that some source packets cannot be restored when the random matrix is not of full column rank, a semi-randomized network coding method is used in QRNCDS: each source node stores only its own source data packet, and the storage nodes choose whether to receive it. In the decoding phase, Gaussian elimination and belief propagation are combined to improve the probability and efficiency of data decoding. As a result, part of the source data can be recovered even when the semi-random matrix lacks full column rank. The simulation results show that QRNCDS has lower energy consumption, higher data collection efficiency, higher decoding efficiency, smaller data storage redundancy, and larger network fault tolerance.
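
The encode/decode core of such a scheme can be sketched as follows: each stored packet is the XOR of the source packets selected by one row of a binary coefficient matrix, and decoding runs Gaussian elimination over GF(2). This is a simplified illustration under assumed inputs; the actual QRNCDS builds its matrix quasi-randomly during MST traversal and combines elimination with belief propagation, which is omitted here.

```python
# XOR encoding per a binary coefficient matrix, and Gauss-Jordan
# decoding over GF(2). Matrix and packet values are illustrative.

def encode(sources, rows):
    """Each stored packet is the XOR of sources selected by one row."""
    stored = []
    for row in rows:
        pkt = 0
        for bit, s in zip(row, sources):
            if bit:
                pkt ^= s
        stored.append(pkt)
    return stored

def decode(rows, stored):
    """Recover source packets by Gauss-Jordan elimination over GF(2)."""
    # Represent each equation as (coefficient bitmask, XOR value).
    eqs = [(sum(b << j for j, b in enumerate(r)), v)
           for r, v in zip(rows, stored)]
    pivots = {}                      # pivot column -> (mask, value)
    for mask, val in eqs:
        # Reduce the new equation by every existing pivot row.
        for col, (pm, pv) in pivots.items():
            if mask >> col & 1:
                mask ^= pm
                val ^= pv
        if mask == 0:
            continue                 # redundant equation
        col = (mask & -mask).bit_length() - 1
        # Eliminate the new pivot column from earlier pivot rows.
        for c, (pm, pv) in list(pivots.items()):
            if pm >> col & 1:
                pivots[c] = (pm ^ mask, pv ^ val)
        pivots[col] = (mask, val)
    # A source is decodable when its reduced row has a single coefficient.
    return {col: v for col, (m, v) in pivots.items() if m == 1 << col}

sources = [10, 6, 12]                         # k = 3 source packets
rows = [[1, 0, 0], [1, 1, 0], [0, 1, 1], [1, 1, 1]]   # n = 4 storage nodes
stored = encode(sources, rows)
recovered = decode(rows, stored)
print(recovered)  # {0: 10, 1: 6, 2: 12}
```

If the matrix lacks full column rank, `decode` simply returns the subset of sources whose reduced rows isolate a single coefficient, matching the partial-recovery behavior described in the abstract.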

Blended-Transfer Learning for Compressed-Sensing Cardiac CINE MRI

  • Park, Seong Jae;Ahn, Chang-Beom
    • Investigative Magnetic Resonance Imaging
    • /
    • v.25 no.1
    • /
    • pp.10-22
    • /
    • 2021
  • Purpose: To overcome the difficulty of building a large, high-quality data set in medical imaging, a concept of 'blended-transfer learning' (BTL), which uses a combination of both source data and target data for the target task, is proposed. Materials and Methods: The source and target tasks were defined as training the source and target networks, respectively, to reconstruct cardiac CINE images from undersampled data. In transfer learning (TL), the entire neural network (NN), or some parts of it, trained on a source task using an open data set is adopted as the initial network for the target task to improve learning speed and performance. Using BTL, an NN effectively learned the target data while preserving knowledge from the source data to the maximum extent possible. The ratio of source data to target data was reduced stepwise from 1 in the initial stage to 0 in the final stage. Results: NNs trained with BTL showed improved performance compared to those trained with TL or standalone learning (SL), and better generalization was achieved. The learning curve was evaluated using the normalized mean square error (NMSE) of reconstructed images for both target and source data. BTL reduced the learning time by a factor of 1.25 to 100 and provided better image quality; its NMSE was 3% to 8% lower than with SL. Conclusion: The NN trained with the proposed BTL showed the best performance in terms of learning speed and learning curve, as well as the highest reconstructed-image quality with the lowest NMSE for the test data set. Thus, BTL is an effective way of training NNs in the medical-imaging domain, where both the quality and quantity of data are always limited.
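
The stepped mixing schedule at the heart of BTL can be sketched as a batch sampler whose source-data fraction decreases from 1 to 0 across training stages. The stage count, batch size, and linear schedule below are assumptions for illustration; the abstract specifies only that the ratio steps from 1 down to 0.

```python
import random

def blended_batches(source, target, stages=5, batch_size=4, seed=0):
    """Yield training batches whose source-data fraction steps down
    from 1.0 in the first stage to 0.0 in the last (BTL-style)."""
    rng = random.Random(seed)
    for stage in range(stages):
        ratio = 1.0 - stage / (stages - 1)     # 1.0, 0.75, ..., 0.0
        n_src = round(batch_size * ratio)
        # Draw the stage's batch from both pools at the current ratio.
        batch = rng.sample(source, n_src) + rng.sample(target, batch_size - n_src)
        yield stage, ratio, batch

source = [f"src{i}" for i in range(10)]   # open source-domain data
target = [f"tgt{i}" for i in range(10)]   # scarce target-domain data
for stage, ratio, batch in blended_batches(source, target):
    print(stage, ratio, batch)
```

Early stages train almost entirely on source data (preserving transferred knowledge), and the final stage fine-tunes on target data alone.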

Development of Multichannel Marine Seismic Data Acquisition System and its Application (다중채널 해양탄성파탐사 시스템개발과 응용)

  • Shin, Sung-Ryul;Kim, Chan-Su;Yeo, Eun-Min;Kim, Young-Jun
    • Proceedings of the Korean Society of Marine Engineers Conference
    • /
    • 2005.11a
    • /
    • pp.144-145
    • /
    • 2005
  • In this study, we have developed a high-resolution multichannel seismic data acquisition system and a shallow marine seismic source. The source system, which utilizes high-electrical-power piezoelectric transducers, is easy to operate, and the number of transducers can be easily changed according to water depth, survey conditions, and purpose in order to maximize field applicability. In the recording part, we used a 24-bit, 8-channel high-speed A/D board to improve data quality and acquisition efficiency. The developed system was field-tested while varying data acquisition parameters such as source-receiver offset and the number of transducers versus water depth.

Facilitating Data Source Movement with Time-Division Access in Content-Centric Networking

  • Priyono, Olivica;Kong, In-Yeup;Hwang, Won-Joo
    • Journal of Korea Multimedia Society
    • /
    • v.17 no.4
    • /
    • pp.433-440
    • /
    • 2014
  • Wireless communication offers more flexibility for node movement in the spatial dimension than wired communication, not only in the IP architecture but also in Content-Centric Networking. Despite this advantage, the intra-domain movement of a node, especially the data source node, affects its communication with the access point node, which in turn affects the acceptance ratio of the client node that requests data packets from the data source node. In this paper, we use a time-division access method to maintain the acceptance ratio of the client node under intra-domain data source node movement in Content-Centric Networking. The simulation results show that the acceptance ratio of the client node can be maintained using the time-division access method as long as the interval access time is less than the coherence time.
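
The paper's maintenance condition can be expressed as a simple check: in a time-division schedule, a node revisits its slot every `num_nodes * slot_time` seconds, and the acceptance ratio holds while that revisit interval stays below the channel coherence time. The parameter names and values below are illustrative assumptions, not from the paper.

```python
def acceptance_maintained(num_nodes, slot_time, coherence_time):
    """Time-division access: each node transmits in its own slot, so a
    node's interval access time is num_nodes * slot_time. Per the paper's
    condition, the client's acceptance ratio is maintained while that
    interval is less than the channel coherence time."""
    interval = num_nodes * slot_time
    return interval < coherence_time

print(acceptance_maintained(4, 0.005, 0.05))   # 0.02 s interval, maintained
print(acceptance_maintained(20, 0.005, 0.05))  # 0.10 s interval, violated
```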

A Comparison of Data Extraction Techniques and an Implementation of Data Extraction Technique using Index DB -S Bank Case- (원천 시스템 환경을 고려한 데이터 추출 방식의 비교 및 Index DB를 이용한 추출 방식의 구현 -ㅅ 은행 사례를 중심으로-)

  • 김기운
    • Korean Management Science Review
    • /
    • v.20 no.2
    • /
    • pp.1-16
    • /
    • 2003
  • Previous research on data extraction and integration for data warehousing has concentrated mainly on relational DBMSs, or partly on object-oriented DBMSs. Mostly, it describes issues related to change-data (delta) capture and incremental update using the triggering capability of active database systems. Little attention has been paid to data extraction from other types of source systems, such as hierarchical DBMSs, or from source systems without triggering capability. This paper argues, from a practical point of view, that in order to find appropriate data extraction techniques for different source systems, we need to consider not only the types of information sources and the capabilities of ETT tools but also other factors such as operational characteristics (e.g., whether the systems support a DBMS log, a user log, or no log, and whether they provide timestamps) and DBMS characteristics (e.g., whether they have triggering capability). Having applied several different data extraction techniques (DBMS log, user log, triggering, timestamp-based extraction, and file comparison) to S bank's source systems (IMS, DB2, ORACLE, and SAM files), we discovered that the data extraction techniques available in a commercial ETT tool do not completely support extraction from the DBMS log of the IMS system. For such IMS systems, a new data extraction technique is proposed that first creates an Index database and then updates the data warehouse using it. We illustrate this technique with an example application.
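
Of the techniques compared above, timestamp-based extraction is the simplest to sketch: each source row carries a last-modified timestamp, and each run pulls only rows changed since the previous extraction. The table layout and column names below are illustrative assumptions, not from the S bank case.

```python
from datetime import datetime

def extract_deltas(table, last_extracted):
    """Timestamp-based change capture: return rows modified after the
    previous extraction time. Requires the source to maintain a
    reliable last-modified timestamp on every row."""
    return [row for row in table if row["updated_at"] > last_extracted]

table = [
    {"id": 1, "balance": 100, "updated_at": datetime(2003, 5, 1, 9, 0)},
    {"id": 2, "balance": 250, "updated_at": datetime(2003, 5, 2, 14, 30)},
    {"id": 3, "balance": 80,  "updated_at": datetime(2003, 5, 3, 11, 15)},
]
last_run = datetime(2003, 5, 2, 0, 0)
deltas = extract_deltas(table, last_run)
print([row["id"] for row in deltas])   # [2, 3]
```

Unlike log- or trigger-based capture, this approach misses deletions and intermediate updates between runs, which is one reason the paper weighs source-system characteristics before choosing a technique.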

ACA: Automatic search strategy for radioactive source

  • Jianwen Huo;Xulin Hu;Junling Wang;Li Hu
    • Nuclear Engineering and Technology
    • /
    • v.55 no.8
    • /
    • pp.3030-3038
    • /
    • 2023
  • Nowadays, mobile robots are used to search for uncontrolled radioactive sources in indoor environments to avoid radiation exposure for technicians. However, in indoor environments, especially in the presence of obstacles, making robots with limited sensing capabilities search for a radioactive source automatically remains a major challenge, and the search efficiency of the robots needs further improvement to meet practical constraints such as limited exploration time. This paper proposes an automatic source search strategy, abbreviated as ACA: the location of the source is estimated by a convolutional neural network (CNN), and the path is planned by the A-star algorithm. First, the search area is represented as an occupancy grid map. Then, the radiation dose distribution of the radioactive source in the occupancy grid map is obtained by Monte Carlo (MC) simulation, and multiple sets of radiation data are collected through the eight-neighborhood self-avoiding random walk (ENSAW) algorithm as the radiation data set. This data set is fed into the designed CNN architecture to train the network model in advance. When the searcher enters a search area containing a radioactive source, the location of the source is estimated by the network model, the search path is planned by the A-star algorithm, and this process is iterated until the searcher reaches the source. The experimental results show that the average number of radiometric measurements and the average number of moving steps of the ACA algorithm are only 2.1% and 33.2% of those of the gradient search (GS) algorithm in an obstacle-free indoor environment. In an indoor environment shielded by concrete walls, the GS algorithm fails to find the source, while the ACA algorithm succeeds with fewer moving steps and sparse radiometric data.
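
The planning half of the strategy described above can be sketched as a standard A-star search on an occupancy grid; the CNN source-locator is omitted, with the goal cell standing in for the network's estimated source location. The grid and coordinates below are illustrative assumptions.

```python
import heapq

def a_star(grid, start, goal):
    """4-connected A-star with Manhattan heuristic on an occupancy
    grid (0 = free, 1 = obstacle). Returns the shortest path or None."""
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    open_set = [(h(start), 0, start, [start])]   # (f, g, cell, path)
    seen = set()
    while open_set:
        _, g, pos, path = heapq.heappop(open_set)
        if pos == goal:
            return path
        if pos in seen:
            continue
        seen.add(pos)
        r, c = pos
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                heapq.heappush(open_set, (g + 1 + h((nr, nc)), g + 1,
                                          (nr, nc), path + [(nr, nc)]))
    return None

grid = [[0, 0, 0],      # a wall blocks the direct route,
        [1, 1, 0],      # forcing a detour around the right side
        [0, 0, 0]]
path = a_star(grid, (0, 0), (2, 0))
print(path)
```

In the full ACA loop, the searcher would replan with this routine each time the CNN refines its estimate of the source location.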

A Study on the Improvement of the Water Source Energy Distribution Regulation for High Efficient Data Center Cooling System in Korea (데이터센터 냉방시스템 고효율화를 위한 국내 수열에너지 보급 제도 개선에 관한 연구)

  • Cho, Yong;Choi, Jong Min
    • Journal of the Korean Society for Geothermal and Hydrothermal Energy
    • /
    • v.17 no.3
    • /
    • pp.21-29
    • /
    • 2021
  • In this study, the current regulations on water source energy, one of the renewable energy sources, were analyzed, and an improvement plan for high-efficiency data center cooling systems was suggested. Under the improvement plan, the design and construction guidelines for water source energy systems would permit cooling and heating systems with or without a heat pump, and would also cover systems operated in cooling mode only, all year round. Domestic test standards that consider water source operating conditions should be developed. In particular, it is highly recommended that the test standards cover systems with forced-cooling and free-cooling modes, which are relevant to enhanced data center cooling systems adopting water source energy.