• Title/Summary/Keyword: Distributed Data


Design of Distributed Processing Framework Based on H-RTGL One-class Classifier for Big Data (빅데이터를 위한 H-RTGL 기반 단일 분류기 분산 처리 프레임워크 설계)

  • Kim, Do Gyun;Choi, Jin Young
    • Journal of Korean Society for Quality Management / v.48 no.4 / pp.553-566 / 2020
  • Purpose: This study designs a framework for generating a one-class classification algorithm based on hyper-rectangles (H-RTGL) in a distributed environment connected by a network. Methods: First, we devised an H-RTGL-based one-class classifier that can be executed by distributed computing nodes, taking both model and data parallelism into account. We then designed supporting components for executing the distributed processing. Finally, we validated the effectiveness and efficiency of the classifier obtained from the proposed framework through a numerical experiment on a data set from the UCI Machine Learning Repository. Results: We designed a distributed processing framework capable of H-RTGL-based one-class classification in a distributed environment consisting of physically separated computing nodes. It includes components that implement model and data parallelism, which enable distributed generation of the classifier. In the numerical experiment, a statistical test showed no significant change in classification performance, and elapsed time decreased when distributed processing was applied to data sets of considerable size. Conclusion: Based on these results, we conclude that applying distributed processing to classifier generation preserves classification performance while improving the efficiency of the classification algorithm. We also discuss the limitations of this work and suggest directions for future research.
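
The abstract above does not give the H-RTGL algorithm itself, so the following is only a minimal sketch of the data-parallel idea under a strong simplifying assumption: the classifier is a single axis-aligned bounding hyper-rectangle. Each node fits a local box on its own partition, and the local boxes are merged by taking per-feature minima and maxima. All function names (`fit_local_box`, `merge_boxes`, `is_target`) and the sample data are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]
Box = Tuple[List[float], List[float]]  # (lower bounds, upper bounds) per feature


def fit_local_box(points: Sequence[Point]) -> Box:
    """Fit the bounding hyper-rectangle of one node's local data partition."""
    dims = len(points[0])
    lower = [min(p[d] for p in points) for d in range(dims)]
    upper = [max(p[d] for p in points) for d in range(dims)]
    return lower, upper


def merge_boxes(boxes: Sequence[Box]) -> Box:
    """Combine local boxes into one global box (the reduce step)."""
    dims = len(boxes[0][0])
    lower = [min(b[0][d] for b in boxes) for d in range(dims)]
    upper = [max(b[1][d] for b in boxes) for d in range(dims)]
    return lower, upper


def is_target(box: Box, x: Point) -> bool:
    """Classify x as the target class if it lies inside the box."""
    lower, upper = box
    return all(lo <= v <= hi for lo, v, hi in zip(lower, x, upper))


# Two hypothetical partitions, as if held by two separate computing nodes.
partitions = [
    [(1.0, 2.0), (2.0, 3.0)],
    [(0.5, 2.5), (1.5, 4.0)],
]
global_box = merge_boxes([fit_local_box(part) for part in partitions])
```

Because per-feature minima and maxima are associative, the merged box in this simplified setting equals the box fitted on the pooled data, which is one way to see why partitioning the data need not change the resulting classifier.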

Design of Distributed Cloud System for Managing large-scale Genomic Data

  • Seine Jang;Seok-Jae Moon
    • International Journal of Internet, Broadcasting and Communication / v.16 no.2 / pp.119-126 / 2024
  • The volume of genomic data is constantly increasing in various modern industries and research fields. This growth presents new challenges and opportunities in terms of the quantity and diversity of genetic data. In this paper, we propose a distributed cloud system for integrating and managing large-scale gene databases. By introducing a distributed data storage and processing system based on the Hadoop Distributed File System (HDFS), various formats and sizes of genomic data can be efficiently integrated. Furthermore, by leveraging Spark on YARN, efficient management of distributed cloud computing tasks and optimal resource allocation are achieved. This establishes a foundation for the rapid processing and analysis of large-scale genomic data. Additionally, by utilizing BigQuery ML, machine learning models are developed to support genetic search and prediction, enabling researchers to more effectively utilize data. It is expected that this will contribute to driving innovative advancements in genetic research and applications.

Design of a ParamHub for Machine Learning in a Distributed Cloud Environment

  • Su-Yeon Kim;Seok-Jae Moon
    • International Journal of Internet, Broadcasting and Communication / v.16 no.2 / pp.161-168 / 2024
  • As the size of big data models grows, distributed training is emerging as an essential element of large-scale machine learning tasks. In this paper, we propose ParamHub for distributed data training. During training, this agent uses the provided data to adjust various aspects of the model, such as the model structure, learning algorithm, hyperparameters, and bias, aiming to minimize the error between the model's predictions and the actual values. It also operates autonomously, collecting and updating data in a distributed environment, which reduces the load-balancing burden that arises in a centralized system. Through communication between agents, resource management and learning processes can be coordinated, enabling efficient management of distributed data and resources. This approach enhances the scalability and stability of distributed machine learning systems while remaining flexible enough to be applied in various learning environments.

Optimization and Performance Analysis of Cloud Computing Platform for Distributed Processing of Big Data (대용량 데이터의 분산 처리를 위한 클라우드 컴퓨팅 환경 최적화 및 성능평가)

  • Hong, Seung-Tae;Shin, Young-Sung;Chang, Jae-Woo
    • Spatial Information Research / v.19 no.4 / pp.55-71 / 2011
  • Interest in cloud computing, which provides IT resources as a service, has recently been increasing. As a result, much research has been done on distributed data processing techniques that store and manage large amounts of data across many servers. Meanwhile, to make effective use of spatial data, which is growing rapidly with the advance of GIS technology, distributed processing of spatial data using cloud computing is essential. In this paper, we therefore review representative distributed data processing techniques and analyze the optimization requirements for improving their performance on large amounts of data. In addition, we use Hadoop to evaluate the performance of the distributed data processing techniques with respect to these optimization requirements.

An Efficient Data Distribution Method on a Distributed Shared Memory Machine (분산공유 메모리 시스템 상에서의 효율적인 자료분산 방법)

  • Min, Ok-Gee
    • The Transactions of the Korea Information Processing Society / v.3 no.6 / pp.1433-1442 / 1996
  • Data distribution in the SPMD (Single Program Multiple Data) pattern is one of the main features of HPF (High Performance Fortran). This paper describes design issues for such data distribution and an efficient execution model for it on the TICOM IV computer, named SPAX (Scalable Parallel Architecture computer based on X-bar network). SPAX has a hierarchical clustering structure that uses distributed shared memory (DSM). With such a memory structure, uniformly applying either SMDD (Shared Memory Data Distribution) or DMDD (Distributed Memory Data Distribution) cannot fully utilize the system. We therefore propose another data distribution model, DSMDD (Distributed Shared Memory Data Distribution), based on a hierarchical masters-slaves scheme. In this model, a remote master and slaves are designated in each node; a shared address scheme is used within a node and a message passing scheme between nodes. In our simulation, assuming a node size at which system performance degradation is minimized, DSMDD is more effective than SMDD and DMDD. In particular, the larger the number of logical processors and the smaller the data dependency between distributed data, the better the performance.
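
The abstract above does not specify how DSMDD assigns array elements, so the following is only an illustrative sketch of a two-level block partitioning that mirrors the hierarchy it describes: an index range is first split across nodes (where, per the abstract, message passing would apply) and each node's block is then split across the processors inside that node (where a shared address scheme would apply). The function names and the example sizes are hypothetical.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open index range [start, stop)


def block_ranges(start: int, stop: int, parts: int) -> List[Range]:
    """Split [start, stop) into contiguous blocks whose sizes differ by at most one."""
    base, extra = divmod(stop - start, parts)
    ranges = []
    lo = start
    for i in range(parts):
        # The first `extra` blocks each take one leftover element.
        hi = lo + base + (1 if i < extra else 0)
        ranges.append((lo, hi))
        lo = hi
    return ranges


def two_level_distribution(n: int, nodes: int, procs_per_node: int) -> List[List[Range]]:
    """Outer split across nodes, inner split across processors within each node."""
    return [
        block_ranges(lo, hi, procs_per_node)
        for lo, hi in block_ranges(0, n, nodes)
    ]


# Hypothetical example: 10 array elements, 2 nodes, 2 processors per node.
layout = two_level_distribution(10, 2, 2)
```

In this sketch only the outer block boundaries separate data held by different nodes; the inner ranges merely divide work among processors that can address the same node-local block.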


Network Type Distributed Control System with Considering Data Collision (데이터 충돌을 고려한 네트워크형 분산 제어 시스템)

  • Choi, Goon-Ho
    • Journal of the Korean Institute of Illuminating and Electrical Installation Engineers / v.29 no.1 / pp.113-120 / 2015
  • A network-type distributed control system uses a communication line, called the bus, to exchange data among its subsystems. Usually only one data item may be present on the bus at a time, so a control algorithm that prevents collisions or manages data priority is important. Many kinds of fieldbus used for distributed control systems, including the CAN protocol, prevent data collisions by controlling transmission time. However, a system that must generate a control signal or read sensor data at fixed times runs into problems when it is built on a network-type distributed control structure. This paper discusses some of these cases and proposes solutions for preventing data collisions. The validity of the proposed methods is verified using the Arago Disk System, which has a structure for inner-loop control.

A Study on a Distributed Data Fabric-based Platform in a Multi-Cloud Environment

  • Moon, Seok-Jae;Kang, Seong-Beom;Park, Byung-Joon
    • International Journal of Advanced Culture Technology / v.9 no.3 / pp.321-326 / 2021
  • In a multi-cloud environment, physical data movement must be minimized so that distributed source data can interoperate efficiently without building a data warehouse or data lake. There is also a need for a data platform that can easily access data anywhere in a multi-cloud environment. In this paper, we propose a new data fabric-based platform, centered on a distributed platform suited to cloud environments, that overcomes the limitations of legacy systems. The platform applies a knowledge graph database technique to the physical linkage of source data so that distributed data can interoperate. By integrating all data into one scalable platform in a multi-cloud environment, it uses the Holochain technique so that companies can easily access and move data, with security and authority guaranteed, regardless of where the data is stored. The knowledge graph database mitigates heterogeneity conflicts in data interoperability in a decentralized environment, and Holochain speeds up the memory and security processing of traditional blockchains. As a result, access to and sharing of distributed data become more flexible, and metadata matching is handled flexibly and effectively.

The Transfer Technique among Decision Tree Models for Distributed Data Mining (분산형 데이터마이닝 구현을 위한 의사결정나무 모델 전송 기술)

  • Kim, Choong-Gon;Woo, Jung-Geun;Baik, Sung-Wook
    • Journal of Digital Contents Society / v.8 no.3 / pp.309-314 / 2007
  • For distributed data mining, a decision tree algorithm must be modified to suit distributed and collaborative environments. The distributed data mining system proposed in this paper consists of several agents and a mediator. Each agent performs local data mining on the data at its local site and communicates with the other agents to build the global decision tree model. The mediator helps the agents communicate with one another efficiently. One advantage of distributed data mining is that several agents can analyze huge data sets in much less time. The paper focuses on a technique for transferring local decision tree models among agents that reduces the heavy communication overhead among them.


Ontology data processing method in distributed semantic web environment (분산 시맨틱웹 환경에서의 온톨로지 데이터 처리 기법 연구)

  • Kim, Byung-Gon;Oh, Sung-Kyun
    • Journal of Digital Contents Society / v.9 no.2 / pp.277-284 / 2008
  • As user demand for Internet web services grows, ontologies are becoming increasingly important for constructing the semantic web. Early Internet data processing was studied in the form of data integration through centralized ontology construction. However, because the Internet is a distributed environment, integrating data from distributed sites requires peer-to-peer data processing at each site in order to keep up with the rapid pace of change on the Internet. In this paper, we propose a data processing method for distributed environments that constructs an ontology at each site using the ontology language OWL. Furthermore, through a relational representation of OWL, we propose a system that supports distributed query processing over data constructed at different sites with different methods.


Data Resource Management under Distributed Computing Environment (분산 컴퓨팅 환경하에서의 데이타 자원 관리)

  • 조희경;안중호
    • Proceedings of the Korea Database Society Conference / 1994.09a / pp.105-129 / 1994
  • Corporate information systems are facing a new environment characterized by miniaturization, decentralization, and open systems. It is therefore of the utmost importance for corporations to adapt flexibly to this new environment by making corresponding changes to their existing information systems. The objectives of this study are to identify the new environment faced by today's information systems and to develop effective methods for data resource management under it. The study assumes that this new environment can be specified as a distributed computing environment, and it presents client/server architecture as the representative computing structure for achieving such a system. Client/server architecture is defined here as a computing architecture that specializes the functionality of the client system and the server system so that an application can be distributed and processed cooperatively on the best platform. Furthermore, from among the five structures used in client/server architecture for distributing an application and processing it cooperatively between server and client, this study presents two data management methods for the client/server environment: the "Remote Data Management Method", which uses a file server or database server, and the "Distributed Data Management Method", which uses a distributed database management system.
The study concludes that, in a client/server environment, even though applications are assumed to be distributed, data may be either centralized (with a file server or database server) or decentralized (with a distributed database system). Managing data through a distributed database system, in which full responsibility and authority over the control of data are given to the users of that data, is not only more adaptable to today's flexible corporate environment but also, in terms of system operation, a more efficient and lower-cost alternative to existing data management methods.
