• Title/Summary/Keyword: HADOOP

Search Results: 398

Estimation of ship operational efficiency from AIS data using big data technology

  • Kim, Seong-Hoon;Roh, Myung-Il;Oh, Min-Jae;Park, Sung-Woo;Kim, In-Il
    • International Journal of Naval Architecture and Ocean Engineering / v.12 no.1 / pp.440-454 / 2020
  • To prevent pollution from ships, the Energy Efficiency Design Index (EEDI) is a mandatory guideline for all new ships. The Ship Energy Efficiency Management Plan (SEEMP) has also been applied by MARPOL to all existing ships. SEEMP provides the Energy Efficiency Operational Indicator (EEOI) for monitoring the operational efficiency of a ship. By monitoring the EEOI, the shipowner or operator can establish strategic plans, such as routing, hull cleaning, decommissioning, and new building. The key parameter in calculating the EEOI is Fuel Oil Consumption (FOC), which can only be measured on board while a ship is operating. This means that only the shipowner or operator can calculate the EEOI of their own ships. If the EEOI can be calculated without the actual FOC, however, then other stakeholders, such as shipbuilding companies and classification societies that do not have the measured FOC, can check how efficiently their ships are operating compared to other ships. In this study, we propose a method to estimate the EEOI without requiring the actual FOC. Automatic Identification System (AIS) data, ship static data, and environment data that can be publicly obtained are used to calculate the EEOI. Since these public data are large in volume, big data technologies, specifically Hadoop and Spark, are used. We verify the proposed method using actual data, and the results show that the proposed method can estimate the EEOI from public data without the actual FOC.
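
A minimal PySpark sketch of the kind of pipeline the abstract describes: joining public AIS dynamic data with ship static data and rolling the result up into an EEOI-style indicator (EEOI = Σ FC_j·C_F / Σ m_cargo·D). The file paths, column names, and the cubic speed-power approximation of FOC are illustrative assumptions, not the authors' actual estimation model.

```python
# Sketch only: column names, paths, and the FOC approximation are assumed.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("eeoi-estimation").getOrCreate()

# AIS dynamic data and ship static data (hypothetical CSV layouts on HDFS).
ais = spark.read.csv("hdfs:///ais/positions.csv", header=True, inferSchema=True)
static = spark.read.csv("hdfs:///ais/static.csv", header=True, inferSchema=True)
legs = ais.join(static, "mmsi")

CF = 3.114  # tonnes of CO2 per tonne of HFO (IMO carbon factor)

# Crude FOC estimate: daily fuel burn at design speed scaled by the cube of
# the speed ratio, prorated over the duration of each AIS leg.
legs = legs.withColumn(
    "foc_tonnes",
    F.col("daily_foc_at_design_speed")
    * F.pow(F.col("speed_knots") / F.col("design_speed_knots"), 3)
    * F.col("leg_hours") / 24.0,
)

# EEOI per ship: total CO2 emitted divided by total transport work.
eeoi = (legs.groupBy("mmsi")
        .agg((CF * F.sum("foc_tonnes")
              / F.sum(F.col("cargo_tonnes") * F.col("leg_distance_nm"))).alias("eeoi")))
eeoi.show()
```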

An Analysis of Existing Studies on Parallel and Distributed Processing of the Rete Algorithm (Rete 알고리즘의 병렬 및 분산 처리에 관한 기존 연구 분석)

  • Kim, Jaehoon
    • The Journal of Korean Institute of Information Technology / v.17 no.7 / pp.31-45 / 2019
  • The core technologies for today's intelligent services are deep learning, that is, neural networks, and parallel and distributed processing technologies such as GPU parallel computing and big data. However, for future intelligent services and knowledge-sharing services based on globally shared ontologies, there is a technology better suited than neural networks for representing and reasoning over knowledge: the IF-THEN knowledge representation of RIF or SWRL, the standard rule languages of the Semantic Web, over which inference can be performed efficiently using the rete algorithm. However, when the rete algorithm runs on a single computer with around 100,000 rules, its performance degrades severely, with inference taking several tens of minutes, which is an obvious limitation. Therefore, in this paper, we analyze past and current studies on parallel and distributed processing of the rete algorithm and examine which aspects should be considered to implement an efficient rete algorithm.
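
To make concrete what the rete algorithm has to optimize, the sketch below shows naive forward chaining over IF-THEN rules: every cycle re-matches every rule against every fact, which is exactly the cost that becomes prohibitive at around 100,000 rules. The triple-style rule and fact formats are purely illustrative and are not taken from the surveyed systems.

```python
# Naive forward chaining over IF-THEN rules (illustrative, not rete itself).
facts = {("socrates", "is", "human")}
rules = [
    # (antecedent pattern, consequent template); "?x" is a variable.
    (("?x", "is", "human"), ("?x", "is", "mortal")),
]

def match(pattern, fact):
    """A pattern element matches if it is a variable or equals the fact element."""
    return all(p.startswith("?") or p == f for p, f in zip(pattern, fact))

changed = True
while changed:                      # repeat until no new fact is derived
    changed = False
    for antecedent, consequent in rules:
        for fact in list(facts):
            if match(antecedent, fact):
                binding = {p: f for p, f in zip(antecedent, fact) if p.startswith("?")}
                derived = tuple(binding.get(t, t) for t in consequent)
                if derived not in facts:
                    facts.add(derived)
                    changed = True

print(facts)   # {('socrates', 'is', 'human'), ('socrates', 'is', 'mortal')}
```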

An Analysis of Impact on the Quality of Life for Chronic Patients based Big Data (빅데이터 기반 만성질환자의 삶의 질에 미치는 영향분석)

  • Kim, Min-kyoung;Cho, Young-bok
    • Journal of the Korea Institute of Information and Communication Engineering / v.23 no.11 / pp.1351-1356 / 2019
  • The purpose of this study is to investigate, on a big data platform, the effect of individual and community factors on the quality of life of people with and without chronic disease. As the study method, secondary data from the 2017 Community Health Survey and from Statistics Korea at the city/county/district (si/gun/gu) level were used, and a multi-level analysis was conducted after separating the EQ-5D index, individual factors, and community factors. As a result, being male, age, education level, monthly household income, economic activity, and the number of sports facilities were positively associated with quality of life, while poor subjective health and severe perceived stress were negatively associated with quality of life. Future research will continue toward a hardware-independent platform that can utilize the cloud and open-source tools for medical big data analysis.
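
A small sketch of the kind of multi-level (mixed-effects) model the abstract describes, with individuals nested in communities: a random intercept per region alongside individual-level fixed effects on an EQ-5D-like index. The synthetic data, variable names, and coefficients are assumptions for illustration only, not the study's dataset or results.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500

df = pd.DataFrame({
    "region": rng.integers(0, 20, n),        # community (si/gun/gu) identifier
    "age": rng.integers(30, 80, n),
    "male": rng.integers(0, 2, n),
    "income": rng.normal(300, 80, n),        # monthly household income (10,000 KRW)
})
# A community-level factor, constant within each region.
infra = {r: int(rng.integers(0, 50)) for r in range(20)}
df["sports_infra"] = df["region"].map(infra)

# Synthetic EQ-5D-like index on a 0-1 scale.
df["eq5d"] = (0.95 - 0.003 * df["age"] + 0.02 * df["male"]
              + 0.0002 * df["income"] + rng.normal(0, 0.05, n)).clip(0, 1)

# Multi-level model: individual fixed effects + random intercept per community.
model = smf.mixedlm("eq5d ~ age + male + income + sports_infra",
                    data=df, groups=df["region"])
print(model.fit().summary())
```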

FAST Design for Large-Scale Satellite Image Processing (대용량 위성영상 처리를 위한 FAST 시스템 설계)

  • Lee, Youngrim;Park, Wanyong;Park, Hyunchun;Shin, Daesik
    • Journal of the Korea Institute of Military Science and Technology / v.25 no.4 / pp.372-380 / 2022
  • This study proposes a distributed parallel processing system, called the Fast Analysis System for remote sensing daTa (FAST), for large-scale satellite image processing and analysis. FAST organizes jobs into vertices and sequences, and distributes and processes them simultaneously. FAST manages data based on the Hadoop Distributed File System, controls the entire job flow based on Apache Spark, and performs tasks in parallel on multiple slave nodes based on Docker containers. FAST enables the high-performance processing of progressively accumulated large-volume satellite images. Because each unit task runs in Docker, existing source code can be reused when designing and implementing unit tasks, and the system is robust against software and hardware faults. To demonstrate the capability of the proposed system, we performed an experiment generating ortho-images from original satellite images, a pre-processing step for all image analyses. In the experiment, when FAST was configured with eight slave nodes, processing a satellite image took less than 30 seconds. These results demonstrate the suitability and practical applicability of the FAST design.
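
A minimal sketch of the FAST-style division of labour described above: Spark distributes a list of scene identifiers, and each worker runs the actual ortho-rectification step inside a Docker container, so existing processing code can be reused unchanged. The container image name, volume paths, and CLI flags are hypothetical.

```python
import subprocess
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("fast-orthorectify").getOrCreate()
sc = spark.sparkContext

scenes = ["scene_0001", "scene_0002", "scene_0003"]   # normally listed from HDFS

def orthorectify(scene_id):
    """Run one unit task in a Docker container on whichever slave node gets it."""
    cmd = ["docker", "run", "--rm",
           "-v", "/data:/data",
           "ortho-tool:latest",                        # hypothetical processing image
           "--input", f"/data/raw/{scene_id}.tif",
           "--output", f"/data/ortho/{scene_id}.tif"]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return scene_id, result.returncode

# One partition per scene so the containers run concurrently across slave nodes.
statuses = sc.parallelize(scenes, numSlices=len(scenes)).map(orthorectify).collect()
print(statuses)
```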

Development of Big Data and AutoML Platforms for Smart Plants (스마트 플랜트를 위한 빅데이터 및 AutoML 플랫폼 개발)

  • Jin-Young Kang;Byeong-Seok Jeong
    • The Journal of Bigdata / v.8 no.2 / pp.83-95 / 2023
  • Big data analytics and AI play a critical role in the development of smart plants. This study presents a big data platform for plant data and an 'AutoML platform' for AI-based plant O&M (Operation and Maintenance). The big data platform collects, processes, and stores the large volumes of data generated in plants using Hadoop, Spark, and Kafka. The AutoML platform is a machine learning automation system aimed at constructing predictive models for equipment prognostics and process optimization in plants. The developed platforms configure a data pipeline with compatibility with existing plant OISs (Operation Information Systems) in mind and employ a web-based GUI to enhance both accessibility and convenience for users. They also allow user-customizable modules to be loaded into the data processing and learning algorithms, which increases process flexibility. This paper demonstrates the operation of the platforms for a specific process of an oil company in Korea and presents an example of an effective data utilization platform for smart plants.
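
A minimal sketch of the ingestion path the abstract outlines: plant sensor messages arrive on Kafka, Spark Structured Streaming parses them, and the result is appended to HDFS for later AutoML training. The broker address, topic name, and JSON message schema are assumptions, and the job requires the Spark-Kafka connector package on the classpath.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("plant-ingest").getOrCreate()

# Assumed shape of one sensor message: {"tag_id": "...", "value": 1.0, "ts": "..."}
schema = StructType([
    StructField("tag_id", StringType()),
    StructField("value", DoubleType()),
    StructField("ts", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")   # hypothetical broker
       .option("subscribe", "plant-sensors")               # hypothetical topic
       .load())

parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(F.from_json("json", schema).alias("m"))
          .select("m.*"))

# Land the parsed stream on HDFS as Parquet for downstream model training.
query = (parsed.writeStream
         .format("parquet")
         .option("path", "hdfs:///plant/sensors")
         .option("checkpointLocation", "hdfs:///plant/_chk/sensors")
         .outputMode("append")
         .start())
query.awaitTermination()
```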

Optimization and Performance Analysis of Distributed Parallel Processing Platform for Terminology Recognition System (전문용어 인식 시스템을 위한 분산 병렬 처리 플랫폼 최적화 및 성능평가)

  • Choi, Yun-Soo;Lee, Won-Goo;Lee, Min-Ho;Choi, Dong-Hoon;Yoon, Hwa-Mook;Song, Sa-kwang;Jung, Han-Min
    • The Journal of the Korea Contents Association / v.12 no.10 / pp.1-10 / 2012
  • Many statistical methods have been adopted to improve the accuracy of terminology recognition. However, since previous studies were carried out on a single core or a single machine, they have difficulty analysing an explosively increasing number of documents in real time. In this study, the tasks where bottlenecks occur in terminology recognition are classified into linguistic processing in the 'candidate terminology extraction' step and collection of statistical information in the 'terminology weight assignment' step. A terminology recognition system that addresses each task by means of MapReduce-based distributed parallel processing is implemented and evaluated. The experiments were performed in two ways. The first experiment showed that distributed parallel processing on 12 nodes improves processing speed 11.27-fold compared with a single machine. The second experiment compared 1) the default environment, 2) multiple reducers, 3) a combiner, and 4) the combination of 2) and 3), and the use of 3) showed the best performance. Our terminology recognition system helps speed up knowledge extraction from large-scale science and technology documents.
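
A Hadoop Streaming-style sketch of the counting stage described above. Real candidate terminology extraction involves linguistic processing; here it is reduced to picking capitalised word bigrams purely for illustration. The same reducer can be registered as a combiner, which is the optimisation the second experiment found most effective; the job invocation in the trailing comment is an assumed example, not the paper's actual configuration.

```python
import re
import sys

def mapper(stream):
    """Emit (candidate term, 1); a toy stand-in for linguistic candidate extraction."""
    for line in stream:
        words = re.findall(r"[A-Za-z]+", line)
        for w1, w2 in zip(words, words[1:]):
            if w1[0].isupper() and w2[0].isupper():
                print(f"{w1} {w2}\t1")

def reducer(stream):
    """Sum counts per term; usable both as the combiner and as the reducer."""
    current, count = None, 0
    for line in stream:
        term, n = line.rstrip("\n").rsplit("\t", 1)
        if term != current and current is not None:
            print(f"{current}\t{count}")
            count = 0
        current = term
        count += int(n)
    if current is not None:
        print(f"{current}\t{count}")

if __name__ == "__main__":
    # e.g. hadoop jar hadoop-streaming.jar -files term_count.py \
    #        -mapper "python3 term_count.py map" -combiner "python3 term_count.py" \
    #        -reducer "python3 term_count.py" -input /docs -output /terms
    (mapper if sys.argv[1:] == ["map"] else reducer)(sys.stdin)
```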

Design of Splunk Platform based Big Data Analysis System for Objectionable Information Detection (Splunk 플랫폼을 활용한 유해 정보 탐지를 위한 빅데이터 분석 시스템 설계)

  • Lee, Hyeop-Geon;Kim, Young-Woon;Kim, Ki-Young;Choi, Jong-Seok
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.11 no.1 / pp.76-81 / 2018
  • The Internet of Things (IoT), which is emerging as a future economic growth engine, has been actively introduced in areas close to our daily lives. However, there are still IoT security threats to be resolved. In particular, with the spread of smart homes and smart cities, an explosive number of closed-circuit televisions (CCTVs) has been installed. The Internet protocol (IP) addresses and even port numbers assigned to CCTVs are open to the public via web portal search engines and social media platforms such as Facebook and Twitter, and with simple tools these devices can easily be hacked. For this reason, a big data analysis system is needed that supports quick responses to data that may contain security risk factors, and to illegal websites that may cause social problems, by assisting in the analysis of data collected from the search engines and social media platforms frequently used by Internet users, as well as data on illegal websites.
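
A small sketch of one detection rule such a system might apply to collected posts before they are indexed for analysis: flagging text that exposes an IP address together with a port number, the CCTV exposure pattern described above. The sample posts and the idea that a single regular expression is sufficient are illustrative assumptions, not the paper's Splunk-based design.

```python
import re

# IPv4 address immediately followed by a port number, e.g. 203.0.113.7:8080.
EXPOSED = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}:\d{2,5}\b")

posts = [
    "Check my new cam at 203.0.113.7:8080, default password still set",
    "Nice weather in Seoul today",
]

for post in posts:
    for hit in EXPOSED.findall(post):
        print(f"possible exposed device {hit} in: {post[:40]}...")
```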

Development of Information Technology Infrastructures through Construction of Big Data Platform for Road Driving Environment Analysis (도로 주행환경 분석을 위한 빅데이터 플랫폼 구축 정보기술 인프라 개발)

  • Jung, In-taek;Chong, Kyu-soo
    • Journal of the Korea Academia-Industrial cooperation Society / v.19 no.3 / pp.669-678 / 2018
  • This study developed information technology infrastructures for building a driving environment analysis platform using various big data, such as vehicle sensing data and public data. First, on the hardware side, a small platform server with a parallel structure for distributed big data processing was developed. Next, on the software side, programs for big data collection/storage, processing/analysis, and information visualization were developed. The collection software was developed as a collection interface using Kafka, Flume, and Sqoop. The storage software was divided between the Hadoop Distributed File System and a Cassandra DB according to how the data are used. The processing software performs spatial unit matching and time-interval interpolation/aggregation of the collected data by applying the grid index method. The analysis software was developed as an analytical tool based on the Zeppelin notebook for applying and evaluating the developed algorithms. Finally, the information visualization software was developed as a Web GIS engine program for providing and visualizing various driving environment information. In the performance evaluation, the number of executors, the optimal memory capacity, and the number of cores for the development server were derived, and the computation performance was superior to that of other cloud computing environments.
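
A minimal PySpark sketch of the grid-index matching and time-interval aggregation step mentioned above: each GPS record is snapped to a fixed-size grid cell and a 5-minute window so that records from different sources can be joined on the same spatial-temporal unit. The 0.001-degree cell size and the record layout are illustrative assumptions.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("grid-index-matching").getOrCreate()

points = spark.createDataFrame(
    [("veh01", 37.5665, 126.9780, "2018-03-01 08:00:05"),
     ("veh02", 37.5651, 126.9895, "2018-03-01 08:00:07")],
    ["vehicle_id", "lat", "lon", "ts"])

cell = 0.001   # grid cell size in degrees (assumed)

per_cell = (points
            .withColumn("grid_x", F.floor(F.col("lon") / cell))
            .withColumn("grid_y", F.floor(F.col("lat") / cell))
            .groupBy("grid_x", "grid_y",
                     F.window(F.to_timestamp("ts"), "5 minutes"))   # time-interval unit
            .count())
per_cell.show(truncate=False)
```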

Design of a MapReduce-Based Mobility Pattern Mining System for Next Place Prediction (다음 장소 예측을 위한 맵리듀스 기반의 이동 패턴 마이닝 시스템 설계)

  • Kim, Jongwhan;Lee, Seokjun;Kim, Incheol
    • KIPS Transactions on Software and Data Engineering / v.3 no.8 / pp.321-328 / 2014
  • In this paper, we present a MapReduce-based mobility pattern mining system which can efficiently predict the next place of mobile users. It learns each user's mobility pattern model, represented by a Hidden Markov Model (HMM), from a large-scale trajectory dataset, and then predicts the next place the user will visit by applying the learned model to the current trajectory. Our system consists of two parts: the back-end, in which the mobility pattern models are learned for individual users, and the front-end, in which the next place for a given user is predicted based on those models. The back-end comprises three distinct MapReduce modules for POI extraction, trajectory transformation, and mobility pattern model learning, while the front-end has two modules for candidate route generation and next place prediction. The map and reduce functions of each module were designed to fully utilize the underlying Hadoop infrastructure and maximize parallel processing. We evaluated the performance of the proposed system on a large-scale open benchmark dataset, GeoLife, and the experimental results confirmed its high performance.
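
A simplified, single-machine sketch of the prediction idea: a first-order transition table over visited places, standing in for the paper's HMMs and MapReduce modules. The toy trajectories stand in for POI sequences extracted from a dataset such as GeoLife.

```python
from collections import Counter, defaultdict

trajectories = [
    ["home", "subway", "office", "cafe", "office", "subway", "home"],
    ["home", "subway", "office", "gym", "home"],
]

# Count place-to-place transitions (the learning step).
transitions = defaultdict(Counter)
for traj in trajectories:
    for here, nxt in zip(traj, traj[1:]):
        transitions[here][nxt] += 1

def predict_next(place):
    """Return the most frequently observed next place, or None if unseen."""
    if place not in transitions:
        return None
    return transitions[place].most_common(1)[0][0]

print(predict_next("office"))   # 'cafe' for this toy data
```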

Digital Forensic Investigation of HBase (HBase에 대한 디지털 포렌식 조사 기법 연구)

  • Park, Aran;Jeong, Doowon;Lee, Sang Jin
    • KIPS Transactions on Computer and Communication Systems / v.6 no.2 / pp.95-104 / 2017
  • As smart device technology advances and Social Network Services (SNS) become more common, data that are difficult to process with existing RDBMSs are increasing. As a result, NoSQL databases are gaining popularity as an alternative for processing the massive, unstructured data generated in real time. Although database forensic techniques have been researched mainly for RDBMSs, demand for digital investigation techniques for NoSQL databases is growing as more businesses introduce them into their systems. New digital forensic investigation techniques are needed because a NoSQL database has no schema to normalize and its storage method differs depending on the type of database and the operating environment. Research has been done on document-based NoSQL databases, but it is not directly applicable to other types of NoSQL databases. Therefore, this paper presents the operation and data model, the operating environment, the collection and analysis of artifacts, and a technique for recovering deleted data in HBase, a column-based NoSQL database. The proposed digital forensic investigation technique for HBase is also verified with an experimental scenario.
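
A minimal sketch of inspecting live HBase contents over the Thrift gateway with the happybase client. This covers only one small part of the artifact collection an examiner would perform (on-disk HFiles, WALs, and ZooKeeper state are out of scope here), and the host, table name, and column family are assumptions.

```python
import happybase

# Requires the HBase Thrift server to be running on the target host.
connection = happybase.Connection("hbase-thrift-host")
try:
    print(connection.tables())                  # enumerate tables in the instance

    table = connection.table("user_messages")   # hypothetical table
    # Scan one column family; HBase keeps per-cell timestamps, which matter forensically.
    for row_key, data in table.scan(columns=[b"cf"], include_timestamp=True):
        print(row_key, data)
finally:
    connection.close()
```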