• Title/Summary/Keyword: HDFs

Spatial Big Data Query Processing System Supporting SQL-based Query Language in Hadoop (Hadoop에서 SQL 기반 질의언어를 지원하는 공간 빅데이터 질의처리 시스템)

  • Joo, In-Hak
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology, v.10 no.1, pp.1-8, 2017
  • In this paper we present a spatial big data query processing system that can store spatial data in Hadoop and query the data with an SQL-based query language. The system stores large-scale spatial data in an HDFS-based storage system and supports spatial queries expressed in an SQL-based query language extended for spatial data processing. The query language supports the standard spatial data types and functions defined in the OGC Simple Feature model. This paper presents the development of the system's core functions, including query language parsing, query validation, query planning, and the connection with the storage system. We compare the performance of the suggested system with an existing system; our experiments show about a 58% improvement in query execution time over the existing system when executing region queries on spatial data stored in Hadoop.
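
As a concrete illustration of the kind of query such a system accepts (a sketch only: the JDBC URL, driver, and table schema below are hypothetical, not the paper's published interface), a region query using OGC Simple Feature functions might look like this:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class RegionQueryExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical JDBC endpoint for the SQL-based spatial query layer.
        try (Connection conn = DriverManager.getConnection("jdbc:spatialbd://querynode:10000/geo");
             Statement stmt = conn.createStatement();
             // Region query: the OGC functions ST_GeomFromText/ST_Contains
             // select the points of interest inside a rectangular region.
             ResultSet rs = stmt.executeQuery(
                 "SELECT id, name FROM poi WHERE ST_Contains(" +
                 "ST_GeomFromText('POLYGON((126.9 37.4, 127.1 37.4, " +
                 "127.1 37.6, 126.9 37.6, 126.9 37.4))'), geom)")) {
            while (rs.next()) {
                System.out.println(rs.getString("id") + "\t" + rs.getString("name"));
            }
        }
    }
}
```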

Development of Procurement Announcement Analysis Support System (전자조달공고 분석지원 시스템 개발)

  • Lim, Il-kwon;Park, Dong-Jun;Cho, Han-Jin
    • Journal of the Korea Convergence Society, v.9 no.8, pp.53-60, 2018
  • Domestic public e-procurement has been recognized for its excellence at home and abroad. Nevertheless, it is difficult for procurement companies to check related announcements and to grasp the status of procurement announcements at a glance. In this paper, we propose an e-Procurement Announcement Analysis Support System that uses HDFS, Apache Spark, and collaborative filtering to provide a procurement announcement recommendation service and an announcement/contract trend analysis service for an effective e-procurement system. The recommendation service relieves procurement companies of searching for announcements that match their characteristics. The trend analysis service visualizes procurement announcement and contract information so that procurement companies and demand organizations can see the analysis of electronic procurement at a glance.
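
A minimal sketch of the recommendation side, assuming Spark MLlib's ALS collaborative filtering over an interaction log of (companyId, announcementId, score) triples; the column names and input path are illustrative, not the paper's:

```java
import org.apache.spark.ml.recommendation.ALS;
import org.apache.spark.ml.recommendation.ALSModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class AnnouncementRecommender {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("ProcurementALS").getOrCreate();

        // Hypothetical interaction log: companyId, announcementId, score.
        Dataset<Row> interactions = spark.read()
                .option("header", "true")
                .option("inferSchema", "true")
                .csv("hdfs:///procurement/interactions.csv");

        ALS als = new ALS()
                .setUserCol("companyId")
                .setItemCol("announcementId")
                .setRatingCol("score")
                .setRank(10)
                .setMaxIter(10)
                .setImplicitPrefs(true); // views/bids rather than explicit ratings

        ALSModel model = als.fit(interactions);
        // Top-5 announcements recommended per procurement company.
        model.recommendForAllUsers(5).show(false);
        spark.stop();
    }
}
```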

A Distributed Cache Management Scheme for Efficient Accesses of Small Files in HDFS (HDFS에서 소형 파일의 효율적인 접근을 위한 분산 캐시 관리 기법)

  • Oh, Hyunkyo;Kim, Kiyeon;Hwang, Jae-Min;Park, Junho;Lim, Jongtae;Bok, Kyoungsoo;Yoo, Jaesoo
    • The Journal of the Korea Contents Association, v.14 no.11, pp.28-38, 2014
  • In this paper, we propose a distributed cache management scheme for efficient access to small files in the Hadoop Distributed File System (HDFS). The proposed scheme reduces the number of metadata entries managed by the name node, since many small files are merged and stored in a single chunk. It also reduces file access costs by keeping information about requested files in a client cache and in data node caches: the client cache keeps the small files a user has requested, together with their metadata, while each data node cache keeps the small files that users request most frequently. Performance evaluation shows that the proposed scheme significantly reduces processing time compared with the existing scheme.
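
The client-cache idea can be sketched with a small LRU structure: recently requested small files stay in memory so repeated reads avoid another round trip to the name node and data nodes. The capacity policy and byte[] payload below are illustrative assumptions, not the paper's design:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SmallFileClientCache {
    private final Map<String, byte[]> cache;

    public SmallFileClientCache(int capacity) {
        // accessOrder=true makes LinkedHashMap behave as an LRU structure;
        // the eldest (least recently used) entry is evicted past capacity.
        this.cache = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Returns the cached file contents, or null on a cache miss. */
    public byte[] get(String fileName) { return cache.get(fileName); }

    /** Caches a small file fetched from HDFS, evicting the LRU entry if full. */
    public void put(String fileName, byte[] content) { cache.put(fileName, content); }
}
```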

Data Access Frequency based Data Replication Method using Erasure Codes in Cloud Storage System (클라우드 스토리지 시스템에서 데이터 접근빈도와 Erasure Codes를 이용한 데이터 복제 기법)

  • Kim, Ju-Kyeong;Kim, Deok-Hwan
    • Journal of the Institute of Electronics and Information Engineers, v.51 no.2, pp.85-91, 2014
  • A cloud storage system uses a distributed file system for storing and managing data. A traditional distributed file system triplicates data so that it can be restored after a disk failure, but this replication policy consumes extra storage and incurs additional I/O operations during the replication process. In this paper, we propose a data replication method that uses erasure codes in a cloud storage system to improve storage space efficiency and I/O performance. In particular, the proposed method reduces the number of data replicas according to data access frequency, while the erasure codes preserve the same data recovery capability. Experimental results show that the proposed method improves storage efficiency by 40%, read throughput by 11%, and write throughput by 10% compared with HDFS.
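
For intuition behind the storage-efficiency claim, compare the raw bytes stored per byte of user data: n-way replication costs n, while a Reed-Solomon code with k data blocks and m parity blocks costs (k+m)/k. A small sketch (the RS(6,3) parameters are illustrative, not taken from the paper):

```java
public class StorageOverhead {
    // Raw bytes stored per byte of user data under n-way replication.
    static double replicationOverhead(int n) { return n; }

    // Raw bytes stored per byte of user data under RS(k, m) erasure coding:
    // k data blocks plus m parity blocks; any k of the k+m recover the data.
    static double erasureOverhead(int k, int m) { return (k + m) / (double) k; }

    public static void main(String[] args) {
        System.out.printf("3-way replication: %.2fx%n", replicationOverhead(3)); // 3.00x
        System.out.printf("RS(6,3) coding:    %.2fx%n", erasureOverhead(6, 3));  // 1.50x
    }
}
```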

Design of a Platform for Collecting and Analyzing Agricultural Big Data (농업 빅데이터 수집 및 분석을 위한 플랫폼 설계)

  • Nguyen, Van-Quyet;Nguyen, Sinh Ngoc;Kim, Kyungbaek
    • Journal of Digital Contents Society, v.18 no.1, pp.149-158, 2017
  • Big data present exciting opportunities and challenges for economic development. In the agriculture sector, for instance, combining various agricultural data (e.g., weather data, soil data) and then analyzing them delivers valuable and helpful information to farmers and agribusinesses. However, massive agricultural data are generated every minute by many kinds of devices and services, such as sensors and agricultural web markets, which raises the big data challenges of data collection, data storage, and data analysis. Although some systems have been proposed to address these challenges, they are still restricted in the type of data, the type of storage, or the size of data they can handle. In this paper, we propose a novel design for a platform for collecting and analyzing agricultural big data. The proposed platform supports (1) multiple methods of collecting data from various data sources using Flume and MapReduce; (2) multiple choices of data storage, including HDFS, HBase, and Hive; and (3) big data analysis modules based on Spark and Hadoop.
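
A minimal sketch of the HDFS storage choice, writing one collected sensor record through the standard Hadoop FileSystem API; the name node URI, path, and record layout are placeholders:

```java
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SensorRecordWriter {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // fs.defaultFS would normally come from core-site.xml; placeholder URI.
        conf.set("fs.defaultFS", "hdfs://namenode:8020");
        try (FileSystem fs = FileSystem.get(conf);
             FSDataOutputStream out = fs.create(new Path("/agri/weather/2017-01-01.csv"))) {
            // Hypothetical record layout: station, timestamp, temperature, humidity.
            out.write("ST01,2017-01-01T00:00:00,3.2,71\n".getBytes(StandardCharsets.UTF_8));
        }
    }
}
```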

Processing Method of Mass Small File Using Hadoop Platform (하둡 플랫폼을 이용한 대량의 스몰파일 처리방법)

  • Kim, Chang-Bok;Chung, Jae-Pil
    • Journal of Advanced Navigation Technology, v.18 no.4, pp.401-408, 2014
  • Hadoop consists of the MapReduce programming model for distributed processing and the HDFS distributed file system. Hadoop is a suitable framework for big data processing, but processing masses of small files raises several problems: Hadoop creates one mapper per file, and a large amount of memory is needed to store the metadata of every file. This paper compares and evaluates various methods of processing masses of small files on the Hadoop platform. Processing with a general compression format is inadequate because each file is handled by a single mapper regardless of the data size. Processing with SequenceFiles and Hadoop archive files removes the name node's memory problem by compressing and combining the small files; Hadoop archive files combine small files faster than SequenceFiles do. Processing with the CombineFileInputFormat class requires no prior combining of small files and achieves a speed similar to ordinary big data processing.
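
A sketch of the SequenceFile approach the paper evaluates: many local small files are packed into a single SequenceFile (file name as key, contents as value), so the name node tracks one large file instead of thousands of entries. Paths are placeholders:

```java
import java.io.File;
import java.nio.file.Files;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SmallFilePacker {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        File[] smallFiles = new File("/data/small-files").listFiles();
        if (smallFiles == null) return;
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(new Path("hdfs://namenode:8020/packed.seq")),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class))) {
            for (File f : smallFiles) {
                // Key: original file name; value: raw file bytes.
                writer.append(new Text(f.getName()),
                              new BytesWritable(Files.readAllBytes(f.toPath())));
            }
        }
    }
}
```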

Analysis on sharing between terrestrial FS and FSS of 40GHz bands, related with HDFSS identification (우리나라 HD-FSS 주파수 분배에 대비한 40GHz 지상망과의 간섭영향 분석)

  • 이일용;성향숙
    • The Journal of Korean Institute of Communications and Information Sciences, v.29 no.2A, pp.181-186, 2004
  • We analyzed sharing between GSO FSS and terrestrial systems in the 40 GHz band, in connection with the World Radiocommunication Conference 2003 issues of sharing between terrestrial services and FSS and of identifying HDFSS downlink bands, assuming that both systems operate in Korea. According to simulations using the characteristic parameters of GSO FSS and terrestrial FS systems at 40 GHz described in ITU-R Recommendations, the GSO system causes the worst interference to the FS system when the elevation and azimuth angles of the FS station antenna point directly at the geostationary satellite. This situation can occur when 40 GHz FS stations are installed in urban areas with high-rise buildings. If high-density FS stations operate in the 40 GHz band in the future, interference mitigation techniques that avoid the GSO arc should be considered.
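
For a sense of scale in such sharing studies, the free-space path loss that enters any interference budget at 40 GHz follows the standard formula FSPL(dB) = 92.45 + 20 log10(f/GHz) + 20 log10(d/km); the snippet below is textbook background, not the paper's simulation:

```java
public class PathLoss {
    // Free-space path loss in dB for frequency in GHz and distance in km.
    static double fsplDb(double fGHz, double dKm) {
        return 92.45 + 20 * Math.log10(fGHz) + 20 * Math.log10(dKm);
    }

    public static void main(String[] args) {
        // At 40 GHz even 1 km of separation attenuates heavily, which is why
        // pointing geometry (main-beam coupling toward the GSO arc) dominates
        // the interference analysis rather than distance alone.
        System.out.printf("FSPL(40 GHz, 1 km)  = %.1f dB%n", fsplDb(40, 1));   // ~124.5 dB
        System.out.printf("FSPL(40 GHz, 10 km) = %.1f dB%n", fsplDb(40, 10));  // ~144.5 dB
    }
}
```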

Improvement of the Biocompatibility of Chitosan Dermal Scaffold by Rigorous Dry Heat Treatment

  • Kim, Chun-Ho;Park, Hyun-Sook;Gin, Yong-Jae;Son, Young-Sook;Lim, Sae-Hwan;Park, Young-Ju;Park, Ki-Sook;Park, Chan-Woong
    • Macromolecular Research, v.12 no.4, pp.367-373, 2004
  • We have developed a rigorous heat treatment method to improve the biocompatibility of chitosan as a tissue-engineering scaffold. The chitosan scaffold was prepared by a controlled freezing and lyophilizing method using dilute acetic acid, and it was then heat-treated at 110 °C in vacuo for 1-3 days. To explore changes in the physicochemical properties of the heat-treated scaffold, we analyzed the degree of deacetylation by colloid titration with poly(vinyl potassium sulfate), and the structural changes by scanning electron microscopy, Fourier transform infrared (FT-IR) spectroscopy, wide-angle X-ray diffractometry (WAXD), and lysozyme susceptibility. The degree of deacetylation of the chitosan scaffolds decreased significantly from 85 to 30% as the heat treatment time increased. FT-IR and WAXD data indicated the formation of amide bonds between the amino groups of chitosan and the carbonyl group of acetic acid, and of interchain hydrogen bonds between the carbonyl groups in the C-6 residues of chitosan and the N-acetyl groups. The rigorous heat treatment also made the scaffold more susceptible to lysozyme. We further examined the changes in the biocompatibility of the chitosan scaffold after rigorous heat treatment by measuring the initial cell binding capacity and the cell growth rate. Human dermal fibroblasts (HDFs) adhered and spread more effectively on the heat-treated chitosan than on the untreated sample. When the growth of HDFs on the film or the scaffold was analyzed by an MTT assay, we found that rigorous heat treatment stimulated cell growth by 1.5- to 1.95-fold relative to untreated chitosan. We conclude that the rigorous dry heat treatment process increases the biocompatibility of the chitosan scaffold by decreasing the degree of deacetylation and by increasing cell attachment and growth.

A Performance Analysis Based on Hadoop Application's Characteristics in Cloud Computing (클라우드 컴퓨팅에서 Hadoop 애플리케이션 특성에 따른 성능 분석)

  • Keum, Tae-Hoon;Lee, Won-Joo;Jeon, Chang-Ho
    • Journal of the Korea Society of Computer and Information, v.15 no.5, pp.49-56, 2010
  • In this paper, we implement a Hadoop-based cluster for cloud computing and evaluate its performance according to application characteristics by executing the RandomTextWriter, WordCount, and PI applications. RandomTextWriter creates a given amount of random words and stores them in the HDFS (Hadoop Distributed File System). WordCount reads an input file and determines the frequency of each word per block. The PI application estimates the value of π using the Monte Carlo method. In the experiments, we investigate the effect of the data block size and the number of replicas on application execution time. The results confirm that the execution time of RandomTextWriter is proportional to the number of replicas, whereas the execution times of WordCount and PI are not affected by it. Moreover, the execution time of WordCount is best when the block size is 64-256 MB. These results show that the performance of a cloud computing system can be enhanced by a scheduling scheme that considers application characteristics.
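
A minimal sketch of a WordCount driver exposing the two knobs the experiment varies, block size and replication factor, using Hadoop's built-in TokenCounterMapper and IntSumReducer; the paths are placeholders, and note that dfs.blocksize affects files written under this configuration (an input's block size is fixed when the input is written):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.map.TokenCounterMapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setLong("dfs.blocksize", 128L * 1024 * 1024); // 128 MB blocks
        conf.setInt("dfs.replication", 3);                 // 3 replicas

        Job job = Job.getInstance(conf, "wordcount");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(TokenCounterMapper.class); // emits (word, 1) per token
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);     // sums counts per word
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path("/input"));
        FileOutputFormat.setOutputPath(job, new Path("/output"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```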

Lambda Architecture Used Apache Kudu and Impala (Apache Kudu와 Impala를 활용한 Lambda Architecture 설계)

  • Hwang, Yun-Young;Lee, Pil-Won;Shin, Yong-Tae
    • KIPS Transactions on Computer and Communication Systems, v.9 no.9, pp.207-212, 2020
  • The amount of data has increased significantly due to advances in technology, and various big data processing platforms are emerging to handle it. Among them, the most widely used platform is Hadoop, developed by the Apache Software Foundation, and Hadoop is also used in the IoT field. However, the existing Hadoop-based environment for collecting and analyzing IoT sensor data has two problems: the small-file problem of HDFS, Hadoop's core storage component, overloads the name node, and ingested data cannot be updated or deleted. This paper designs a Lambda Architecture using Apache Kudu and Impala. The proposed architecture classifies IoT sensor data into Cold-Data and Hot-Data, stores each in storage suited to its characteristics, and uses the Batch View created through batch processing together with the Real-time View generated through Apache Kudu and Impala, solving the problems of the existing Hadoop-based IoT sensor data environment and shortening the time it takes users to reach the analyzed data.
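
A minimal sketch of the Hot-Data path, assuming the Java Kudu client: unlike HDFS files, a Kudu table supports per-row upserts and deletes, which is the property this architecture relies on. The master address, table name, and columns are illustrative:

```java
import org.apache.kudu.client.KuduClient;
import org.apache.kudu.client.KuduSession;
import org.apache.kudu.client.KuduTable;
import org.apache.kudu.client.Upsert;

public class HotDataWriter {
    public static void main(String[] args) throws Exception {
        try (KuduClient client = new KuduClient.KuduClientBuilder("kudu-master:7051").build()) {
            KuduTable table = client.openTable("iot_sensor_hot");
            KuduSession session = client.newSession();
            Upsert upsert = table.newUpsert();
            upsert.getRow().addString("sensor_id", "S-001");
            upsert.getRow().addLong("ts", System.currentTimeMillis());
            upsert.getRow().addDouble("value", 23.7);
            session.apply(upsert); // inserts, or updates the row if the key exists
            session.close();       // flushes any pending operations
        }
    }
}
```

Impala can then query the same Kudu table with ordinary SQL to serve the Real-time View, while the Batch View is produced from the Cold-Data store.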