• Title/Summary/Keyword: Big data processing

Study on Data Processing of the IOT Sensor Network Based on a Hadoop Cloud Platform and a TWLGA Scheduling Algorithm

  • Li, Guoyu;Yang, Kang
    • Journal of Information Processing Systems
    • /
    • v.17 no.6
    • /
    • pp.1035-1043
    • /
    • 2021
  • An Internet of Things (IoT) sensor network is an effective solution for monitoring environmental conditions. However, IoT sensor networks generate such massive data that storing, processing, and querying the data become technical challenges. To solve this problem, a Hadoop cloud platform is proposed. Using the time and workload genetic algorithm (TWLGA), the data processing platform enables the work of one node to be shared with other nodes, which not only raises the efficiency of a single node but also provides compatibility support that reduces possible software and hardware risk. In this experiment, a Hadoop cluster platform with the TWLGA scheduling algorithm is developed, and the performance of the platform is tested. The results show that the Hadoop cloud platform is suitable for the big data processing requirements of IoT sensor networks.
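The abstract does not spell out how TWLGA weighs time against workload, so the following minimal Python sketch only illustrates the general shape of a genetic-algorithm task scheduler for a small cluster; the task costs, fitness weights, and genetic operators are assumptions for illustration, not the authors' formulation.

```python
# Hypothetical sketch of a time-and-workload genetic scheduler (TWLGA-style)
# assigning tasks to cluster nodes. Costs, weights, and operators are assumed.
import random

TASK_COST = [4, 2, 7, 1, 3, 5, 6, 2]   # assumed per-task processing cost
NUM_NODES = 3

def fitness(assignment):
    """Lower is better: weighted sum of makespan and workload imbalance."""
    loads = [0.0] * NUM_NODES
    for task, node in enumerate(assignment):
        loads[node] += TASK_COST[task]
    return 0.7 * max(loads) + 0.3 * (max(loads) - min(loads))  # assumed weights

def evolve(pop_size=30, generations=100):
    pop = [[random.randrange(NUM_NODES) for _ in TASK_COST] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        survivors = pop[: pop_size // 2]          # keep the fitter half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, len(TASK_COST))
            child = a[:cut] + b[cut:]             # one-point crossover
            if random.random() < 0.1:             # occasional mutation
                child[random.randrange(len(child))] = random.randrange(NUM_NODES)
            children.append(child)
        pop = survivors + children
    return min(pop, key=fitness)

best = evolve()
print("best assignment:", best, "fitness:", round(fitness(best), 2))
```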

The Overview of the Public Opinion Survey and Emerging Ethical Challenges in the Healthcare Big Data Research (보건의료빅데이터 연구에 대한 대중의 인식도 조사 및 윤리적 고찰)

  • Cho, Su Jin;Choe, Byung In
    • The Journal of KAIRB
    • /
    • v.4 no.1
    • /
    • pp.16-22
    • /
    • 2022
  • Purpose: Traditional ethical studies offer only a blurred insight into research using medical big data, especially in the rapidly changing and demanding environment called the "Fourth Industrial Revolution." Current institutional and ethical issues in big data research need to be approached with the thoughtful insight of past ethical studies while reflecting an understanding of present conditions. This study examines the ethical issues emerging in recent healthcare big data research. As part of the process of public discourse, it surveys public perceptions of healthcare big data and the acceptance of the utility and provision of big data research by subjects of healthcare information. In addition, the emerging ethical challenges and how to comply with ethical principles in accordance with the principles of the Belmont Report are discussed. Methods: The survey was conducted online between June and September 2020 through voluntary participation by Internet users. A total of 319 people who completed the survey (±5.49%P at the 95% confidence level) were analyzed. Results: From the public's perspective, the survey showed that medical information is useful for new medical development, but it is also necessary to obtain consent from subjects in order to use that medical information for various research purposes. In addition, many people were concerned about the possibility of re-identifying personal information in medical big data, and therefore mentioned the necessity of transparency and privacy protection in the use of medical information. Conclusion: Medical big data are a core resource for the development of medicine directly related to human life, and opening up medical data is necessary to realize the public good, but ethical principles should not be overlooked. The right to self-determination must be guaranteed by means of clear and diverse consent or withdrawal by subjects, and personal information must be processed in a lawful, fair, and transparent manner. In addition, the scientific and ethical validity of medical big data research is indispensable. Such ethical use of healthcare data is the key that will lead to innovation in the future.

Design of Incremental K-means Clustering-based Radial Basis Function Neural Networks Model (증분형 K-means 클러스터링 기반 방사형 기저함수 신경회로망 모델 설계)

  • Park, Sang-Beom;Lee, Seung-Cheol;Oh, Sung-Kwun
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.66 no.5
    • /
    • pp.833-842
    • /
    • 2017
  • In this study, a design methodology for radial basis function neural networks (RBFNNs) based on incremental K-means clustering is introduced for learning and processing big data. When the training dataset is very large, conventional clustering may fail to learn it due to the lack of memory capacity. However, on-line processing of big data can be effectively realized through the parameter updates of recursive least squares estimation together with the sequential operation of the incremental clustering algorithm. Radial basis function neural networks consist of a condition part, a conclusion part, and an aggregation part. In the condition part, the incremental K-means clustering algorithm is used to obtain the center points of the data, and the fitness is computed using the Gaussian function as the activation function. The connection weights of the conclusion part are given as a linear function, and their parameters are calculated using recursive least squares estimation. In the aggregation part, the final output is obtained by the center-of-gravity method. Using machine learning data, performance indices are reported and compared with other models, and the incremental K-means clustering based RBFNNs are further optimized by using PSO. This study demonstrates that the proposed model is superior from the viewpoint of on-line processing of big data.
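As a rough, self-contained sketch of the mechanism described above (not the authors' code), the class below updates Gaussian RBF centers with an incremental K-means rule and updates the linear output weights with recursive least squares, so each sample is processed once and discarded; the widths, initial covariance, and center initialization are assumptions.

```python
# Minimal sketch: incremental K-means centers + recursive least squares (RLS)
# output weights for an RBF network processed one sample at a time.
import numpy as np

class IncrementalRBFN:
    def __init__(self, centers, sigma=1.0, lam=1e3):
        self.centers = np.asarray(centers, dtype=float)   # initial center guesses
        self.counts = np.ones(len(self.centers))
        self.sigma = sigma
        self.w = np.zeros(len(self.centers))              # conclusion-part weights
        self.P = np.eye(len(self.centers)) * lam          # RLS covariance

    def _phi(self, x):
        d2 = np.sum((self.centers - x) ** 2, axis=1)
        return np.exp(-d2 / (2 * self.sigma ** 2))        # Gaussian activations

    def update(self, x, y):
        x = np.asarray(x, dtype=float)
        # incremental K-means: move the nearest center toward the new sample
        k = np.argmin(np.sum((self.centers - x) ** 2, axis=1))
        self.counts[k] += 1
        self.centers[k] += (x - self.centers[k]) / self.counts[k]
        # recursive least squares update of the linear output weights
        phi = self._phi(x)
        Pphi = self.P @ phi
        gain = Pphi / (1.0 + phi @ Pphi)
        self.w += gain * (y - phi @ self.w)
        self.P -= np.outer(gain, Pphi)

    def predict(self, x):
        return self._phi(np.asarray(x, dtype=float)) @ self.w
```

A usage loop would simply call `update(x, y)` for each incoming sample and `predict(x)` for new inputs, which is what makes the scheme suitable for streams too large to hold in memory.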

Design of a Large-scale Task Dispatching & Processing System based on Hadoop (하둡 기반 대규모 작업 배치 및 처리 기술 설계)

  • Kim, Jik-Soo;Cao, Nguyen;Kim, Seoyoung;Hwang, Soonwook
    • Journal of KIISE
    • /
    • v.43 no.6
    • /
    • pp.613-620
    • /
    • 2016
  • This paper presents MOHA (Many-Task Computing on Hadoop), a framework that aims to effectively apply Many-Task Computing (MTC) technologies, originally developed for high-performance processing of very many tasks, to the existing big data processing platform Hadoop. We present the basic concepts, motivation, preliminary proof-of-concept results based on a distributed message queue, and future research directions of MOHA. MTC applications may have relatively low I/O requirements per task, but a very large number of tasks must be processed efficiently, potentially with heavy file-based inter-task communication. Therefore, MTC applications can show a data-intensive workload pattern different from existing Hadoop applications, which are typically based on relatively large data block sizes. Through an effective convergence of MTC and big data technologies, the MOHA framework can support large-scale scientific applications alongside the Hadoop ecosystem, which is evolving into a multi-application platform.
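To make the message-queue-based dispatching idea concrete, here is a deliberately simplified Python sketch in the spirit of the proof of concept: a local multiprocessing queue stands in for MOHA's distributed queue, and the task payload is a placeholder for a short scientific job.

```python
# Illustrative many-task dispatching via a queue; a local multiprocessing
# Queue stands in for a distributed message queue, tasks are placeholders.
from multiprocessing import Process, Queue

def worker(task_queue, result_queue):
    while True:
        task = task_queue.get()
        if task is None:                              # poison pill: shut down
            break
        task_id, payload = task
        result_queue.put((task_id, payload ** 2))     # stand-in for a short task

if __name__ == "__main__":
    tasks, results = Queue(), Queue()
    workers = [Process(target=worker, args=(tasks, results)) for _ in range(4)]
    for w in workers:
        w.start()
    for i in range(100):                              # enqueue many small tasks
        tasks.put((i, i))
    for _ in workers:                                 # one poison pill per worker
        tasks.put(None)
    outputs = [results.get() for _ in range(100)]
    for w in workers:
        w.join()
    print("completed", len(outputs), "tasks")
```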

Pre-processing for IPC Classification of Patent Documents (특허문서의 IPC 분류를 위한 데이터 변환 및 통합)

  • Su-Hyun Park;Jin Kim
    • Annual Conference of KIPS
    • /
    • 2023.11a
    • /
    • pp.367-368
    • /
    • 2023
  • With the Fourth Industrial Revolution, diverse technologies and ideas are emerging, and the number of patents registered to protect them is increasing every year. However, patent documents are currently classified manually, so a classifier that can automate this process is needed. This paper covers the data transformation and integration steps of the preprocessing required to feed patent documents into such a classifier.
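The short abstract does not describe the concrete pipeline, so the following pandas sketch is only a plausible illustration of "transformation and integration" for IPC classification; the column names (`title`, `abstract`, `claims`, `ipc_code`) and the choice to keep only the IPC section letter are assumptions, not the paper's actual design.

```python
# Hypothetical preprocessing sketch for IPC classification; field names and
# steps are assumed for illustration only.
import re
import pandas as pd

def transform(df):
    """Merge text fields into one column and extract the IPC section letter."""
    df = df.copy()
    text_cols = ["title", "abstract", "claims"]            # assumed field names
    df["text"] = df[text_cols].fillna("").agg(" ".join, axis=1)
    df["text"] = df["text"].str.lower().map(lambda s: re.sub(r"[^a-z0-9가-힣 ]", " ", s))
    df["ipc_section"] = df["ipc_code"].str[0]              # e.g. 'G06F 17/30' -> 'G'
    return df[["text", "ipc_section"]]

def integrate(frames):
    """Integrate several patent dumps into one deduplicated training table."""
    return pd.concat([transform(f) for f in frames], ignore_index=True).drop_duplicates()
```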

Big data-based information recommendation system (빅데이터 기반 정보 추천 시스템)

  • Lee, Jong-Chan;Lee, Moon-Ho
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.22 no.3
    • /
    • pp.443-450
    • /
    • 2018
  • With the improvement in quality of life, health care has become a main concern of modern people, and demand for healthcare systems is naturally increasing. However, it is difficult to provide customized wellness information suitable for a specific user because medical information on the Internet is diverse and its reliability is hard to estimate. In this study, we propose a user-centered service that classifies big data through text mining and provides personalized medical information, rather than a simple search function. We built a big data system and measured the data processing time while increasing the number of Hadoop slave nodes for efficient big data analysis. The results confirm that the big data system is more efficient than the existing system.
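The paper's own pipeline is not detailed in the abstract; the toy sketch below only shows the general idea of classifying medical text with text mining and then recommending items that match a user's interest. The documents, labels, and model choice (TF-IDF plus naive Bayes) are assumptions for illustration.

```python
# Toy text-mining classification used to recommend medical content matching
# a user's interest; data and model choice are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = ["tips for managing blood pressure", "stretching routine for back pain",
        "low sodium diet for hypertension", "exercises to relieve back pain"]
labels = ["hypertension", "back_pain", "hypertension", "back_pain"]  # toy labels

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)
model = MultinomialNB().fit(X, labels)

# recommend documents whose predicted class matches the user's profile
user_interest = "hypertension"
matches = [d for d, y in zip(docs, model.predict(X)) if y == user_interest]
print(matches)
```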

Design and Implementation of a Big Data Analytics Framework based on Cargo DTG Data for Crackdown on Overloaded Trucks

  • Kim, Bum-Soo
    • Journal of the Korea Society of Computer and Information
    • /
    • v.24 no.12
    • /
    • pp.67-74
    • /
    • 2019
  • In this paper, we design and implement an analytics platform based on bulk cargo DTG data for cracking down on overloaded trucks. A DTG (digital tachograph) is a device that stores the driving record in real time; that is, it records vehicle driving data such as GPS position, speed, RPM, braking, and moving distance at one-second intervals. Fast processing of DTG data is essential for finding vehicle driving patterns and for analytics, and in particular a big data analytics platform is required for preprocessing and converting large amounts of DTG data. We therefore implement a big data analytics framework for cargo DTG data using Spark, an open-source big data framework. As a result of the implementation, our proposed platform converts real large cargo DTG data sets into GIS data, which are visualized on a map, and it also recommends crackdown points.
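A hedged PySpark sketch of the kind of job described above follows; the input path, field names, grid resolution, and the criterion used to flag candidate crackdown points are all assumptions made for illustration, not the paper's actual rules.

```python
# Sketch of a Spark job that snaps one-second DTG records to a coarse grid and
# flags busy, slow-moving cells as candidate crackdown points (assumed schema).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dtg-crackdown-sketch").getOrCreate()

# assumed DTG record fields: vehicle_id, ts, lat, lon, speed, rpm, distance
dtg = spark.read.csv("hdfs:///data/cargo_dtg/*.csv", header=True, inferSchema=True)

candidates = (dtg
    .withColumn("lat_cell", F.round("lat", 3))        # ~100 m grid cell (assumed)
    .withColumn("lon_cell", F.round("lon", 3))
    .groupBy("lat_cell", "lon_cell")
    .agg(F.countDistinct("vehicle_id").alias("trucks"),
         F.avg("speed").alias("avg_speed"))
    .filter("trucks > 50 AND avg_speed < 30")          # assumed thresholds
    .orderBy(F.desc("trucks")))

candidates.write.mode("overwrite").csv("hdfs:///out/crackdown_points")
```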

A Development and Application of Data Visualization Education Program for 3rd Grade Students in Elementary School (초등학교 3학년 학생들을 위한 데이터 시각화 교육 프로그램 개발 및 적용)

  • Jiseon Woo;Kapsu Kim
    • Journal of The Korean Association of Information Education
    • /
    • v.26 no.6
    • /
    • pp.481-490
    • /
    • 2022
  • With the development of computing technology, the big data era has arrived, and we live surrounded by large amounts of data. Elementary school students are no exception, so learning to process data from elementary school onward is very important. Since elementary school students think intuitively, data visualization, which expresses data directly in pictures, is an important learning element. In this study, we examine how effectively elementary school students can visualize data from their daily lives in order to improve their information processing competency. An eight-lesson data visualization education program suitable for third graders was developed, in which students organize and visualize data using data visualization tools and then experience the process of interaction. The program was applied to 186 students in 7 classes, and their knowledge information processing competency was evaluated before and after the lessons. The pre- and post-tests showed a significant difference in knowledge information processing competency; therefore, the data visualization program developed in this study is effective.

Comparison analysis of big data integration models (빅데이터 통합모형 비교분석)

  • Jung, Byung Ho;Lim, Dong Hoon
    • Journal of the Korean Data and Information Science Society
    • /
    • v.28 no.4
    • /
    • pp.755-768
    • /
    • 2017
  • As big data become the core of the Fourth Industrial Revolution, big data based processing and analysis capabilities are expected to influence a company's future competitiveness. Although RHadoop and RHIPE, which integrate R with the Hadoop environment, have each been discussed separately, comparative studies of the two have received little attention. In this paper, we constructed big data platforms with RHadoop and RHIPE applicable to large-scale data and implemented machine learning algorithms such as multiple regression and logistic regression on the MapReduce framework. We studied the performance and scalability of these implementations for various sample sizes of actual and simulated data. The experiments demonstrated that both RHadoop and RHIPE scale well and efficiently process large data sets on commodity hardware, and that RHIPE is generally faster than RHadoop for almost all data sets.
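The paper's implementations are written in R on RHadoop and RHIPE; the Python sketch below only illustrates the underlying MapReduce scheme for multiple regression, in which each mapper emits partial sufficient statistics (X'X and X'y) for its data block and a single reduce step sums them and solves for the coefficients.

```python
# MapReduce-style multiple regression: mappers emit partial X'X and X'y,
# the reducer sums them and solves the normal equations once.
import numpy as np

def map_partition(X_block, y_block):
    return X_block.T @ X_block, X_block.T @ y_block     # partial sufficient stats

def reduce_partials(partials):
    XtX = sum(p[0] for p in partials)
    Xty = sum(p[1] for p in partials)
    return np.linalg.solve(XtX, Xty)                    # regression coefficients

rng = np.random.default_rng(0)
X = np.hstack([np.ones((1000, 1)), rng.normal(size=(1000, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.1, size=1000)

blocks = np.array_split(np.arange(1000), 4)             # stand-in for HDFS splits
partials = [map_partition(X[idx], y[idx]) for idx in blocks]
print(reduce_partials(partials))                        # close to [1.0, 2.0, -0.5]
```

Because the partial sums are associative, the same scheme runs unchanged whether the blocks live on one machine or are spread across a Hadoop cluster, which is what makes the comparison between RHadoop and RHIPE meaningful.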

Query Optimization on Large Scale Nested Data with Service Tree and Frequent Trajectory

  • Wang, Li;Wang, Guodong
    • Journal of Information Processing Systems
    • /
    • v.17 no.1
    • /
    • pp.37-50
    • /
    • 2021
  • Query applications over nested data, the most commonly used form of data representation on the web, and precise queries in particular, are becoming more widely used. MapReduce, a distributed architecture with parallel computing power, provides a good solution for big data processing. In practice, however, query requests are usually concurrent, which causes bottlenecks in server processing. To solve this problem, this paper first combines a column storage structure with an inverted index to build an index for nested data on MapReduce. On this basis, it puts forward an optimization strategy that combines a query execution service tree with frequent sub-query trajectories to reduce the response time of frequent queries and further improve the efficiency of multi-user concurrent queries over large-scale nested data. Experiments show that this method greatly improves the efficiency of nested data queries.
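As a small single-machine illustration (not the paper's MapReduce implementation), the sketch below flattens nested records into column-like field paths and builds an inverted index from (field path, value) pairs to record ids, which is the kind of structure such precise queries over nested data rely on.

```python
# Toy flattening of nested records plus an inverted index from
# (field_path, value) pairs to record ids; single machine, for illustration.
from collections import defaultdict

def flatten(record, prefix=""):
    """Yield (field_path, value) pairs from arbitrarily nested dicts/lists."""
    if isinstance(record, dict):
        for k, v in record.items():
            yield from flatten(v, f"{prefix}{k}.")
    elif isinstance(record, list):
        for v in record:
            yield from flatten(v, prefix)
    else:
        yield prefix.rstrip("."), record

records = [
    {"user": {"name": "kim", "tags": ["spark", "hadoop"]}},
    {"user": {"name": "lee", "tags": ["hadoop"]}},
]

index = defaultdict(set)
for rid, rec in enumerate(records):
    for path, value in flatten(rec):
        index[(path, value)].add(rid)

print(index[("user.tags", "hadoop")])   # ids of records matching the nested field
```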