• Title/Summary/Keyword: 비병렬 데이터

Search Result 303, Processing Time 0.031 seconds

VLSI Architecture Design of Reconstruction Filter for Morphological Image Segmentation (형태학적 영상 분할을 위한 재구성 필터의 VLSI 구조 설계)

  • Lee, Sang-Yeol;Chung, Eui-Yoon;Lee, Ho-Young;Kim, Hee-Soo;Ha, Yeong-Ho
    • Journal of the Korean Institute of Telematics and Electronics S
    • /
    • v.36S no.12
    • /
    • pp.41-50
    • /
    • 1999
  • In this paper, the new VLSI architecture of a reconstruction filter for morphological image segmentation is proposed. The filter, based on the $h_{max}$ operation, simplifies the interior of each region while preserving the boundary information. The proposed architecture adopts a partitioned memory structure and an efficient image scanning strategy to reduce the operations. The proposed memory partitioning scheme makes it possible that every data required for processing can be read from each memory at a time, resulting in parallel data processing. By the extended connectivity consideration, the operation is much decreased because more simplification is achieved in scanning stage. The selective raster scan strategy endows the satisfactory noise removal capability with negligible hardware complexity increase. The proposed architecture is designed using VHDL, and functional evaluation is performed by the CAD tool, Mentor. The experiment results show that the proposed architecture can simplify image profile with less than 18% operations of the conventional method.

  • PDF

A Design of AES-based CCMP Core for IEEE 802.11i Wireless LAN Security (IEEE 802.11i 무선 랜 보안을 위한 AES 기반 CCMP Core 설계)

  • Hwang Seok-Ki;Lee Jin-Woo;Kim Chay-Hyeun;Song You-Soo;Shin Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.9 no.4
    • /
    • pp.798-803
    • /
    • 2005
  • This paper describes a design of AES(Advanced Encryption Standard)-based CCMP core for IEEE 802.1li wireless LAN security. To maximize its performance, two AES cores ate used, one is for counter mode for data confidentiality and the other is for CBC(Cipher Block Chaining) mode for authentication and data integrity. The S-box that requires the largest hardware in AES core is implemented using composite field arithmetic, and the gate count is reduced by about $20\%$ compared with conventional LUT(Lookup Table)-based design. The CCMP core designed in Verilog-HDL has 13,360 gates, and the estimated throughput is about 168 Mbps at 54-MHz clock frequency. The functionality of the CCMP core is verified by Excalibur SoC implementation.

The Design of Hardware MPI Units for MPSoC (MPSoC를 위한 저비용 하드웨어 MPI 유닛 설계)

  • Jeong, Ha-Young;Chung, Won-Young;Lee, Yong-Surk
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.36 no.1B
    • /
    • pp.86-92
    • /
    • 2011
  • In this paper, we propose a novel hardware MPI(Message Passing Interface) unit which supports message passing in multiprocessor system which use distributed memory architecture. MPI Hardware unit processes data synchronization, transmission and completion, and it supports processor non-blocking operation so it reduces overhead according to synchronization. Additionally, MPI hardware unit combines ready entry, request entry, reserve entry which save and manage the synchronized messages and performs the multiple outstanding issue and out of order completion. According to BFM(Bus Functional Model) simulation result, the performance is increased by 25% on many to many communication. After we designed MPI unit using HDL, with synopsys design compiler we synthesized, and for synthesis library we used MagnaChip $0.18{\mu}m$. And then we making prototype chip. The proposed message transmission interface hardware shows high performance for its increase in size. Thus, as we consider low-cost design and scalability, MPI hardware unit is useful in increasing overall performance of embedded MPSoC(Multi-Processor System-on-Chip).

A Design of AES-based CCMP core for IEEE 802.11i Wireless LAN Security (IEEE 802.11i 무선 랜 보안을 위한 AES 기반 CCMP 코어 설계)

  • Hwang Seok-Ki;Kim Jong-Whan;Shin Kyung-Wook
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.31 no.6A
    • /
    • pp.640-647
    • /
    • 2006
  • This paper describes a design of AES-based CCMP(Counter mode with CBC-MAC Protocol) core for IEEE 802.11i wireless LAN security. To maximize the performance of CCMP core, two AES cores are used, one is the counter mode for data confidentiality and the other is the CBC node for authentication and data integrity. The S-box that requires the largest hardware in ARS core is implemented using composite field arithmetic, and the gate count is reduced by about 27% compared with conventional LUT(Lookup Table)-based design. The CCMP core was verified using Excalibur SoC kit, and a MPW chip is fabricated using a 0.35-um CMOS standard cell technology. The test results show that all the function of the fabricated chip works correctly. The CCMP processor has 17,000 gates, and the estimated throughput is about 353-Mbps at 116-MHz@3.3V, satisfying 54-Mbps data rate of the IEEE 802.11a and 802.11g specifications.

Design of an Efficient VLSI Architecture of SADCT Based on Systolic Array (시스톨릭 어레이에 기반한 SADCT의 효율적 VLSl 구조설계)

  • Gang, Tae-Jun;Jeong, Ui-Yun;Gwon, Sun-Gyu;Ha, Yeong-Ho
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.38 no.3
    • /
    • pp.282-291
    • /
    • 2001
  • In this paper, an efficient VLSI architecture of Shape Adaptive Discrete Cosine Transform(SADCT) based on systolic array is proposed. Since transform size in SADCT is varied according to the shape of object in each block, it are dropped that both usability of processing elements(PE´s) and throughput rate in time-recursive SADCT structure. To overcome these disadvantages, it is proposed that the architecture based on a systolic way structure which doesn´t need memory. In the proposed architecture, throughput rate is improved by consecutive processing of one-dimensional SADCT without memory and PE´s in the first column are connected to that in the last one for improvement of usability of PE. And input data are put into each column of PE in parallel according to the maximum data number in each rearranged block. The proposed architecture is described by VHDL. Also, its function is evaluated by MentorTM. Even though the hardware complexity is somewhat increased, the throughput rate is improved about twofold.

  • PDF

A Design of AES-based CCMP Core for IEEE 802.11i Wireless LAN Security (IEEE 802.11i 무선 랜 보안을 위한 AES 기반 CCMP Core 설계)

  • Hwang, Seok-Ki;Lee, Jin-Woo;Kim, Chay-Hyeun;Song, You-Soo;Shin, Kyung-Wook
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • v.9 no.1
    • /
    • pp.367-370
    • /
    • 2005
  • This paper describes a design of AES(Advanced Encryption Standard)-based CCMP core for IEEE 802.11i wireless LAN security. To maximize its performance, two AES cores are used, one is for counter mode for data confidentiality and the other is for CBC(Cipher Block Chaining)mode for authentication and data integrity. The S-box that requires the largest hardware in AES core is implemented using composite field arithmetic, and the gate count is reduced by about 25% compared with conventional LUT(Lookup Table)-based design. The CCMP core designed in Verilog-HDL has 15,450 gates, and the estimated throughput is about 128 Mbps at 50-MHz clock frequency). The functionality of the CCMP core is verified by Excalibur SoC implementation.

  • PDF

Distributed AI Learning-based Proof-of-Work Consensus Algorithm (분산 인공지능 학습 기반 작업증명 합의알고리즘)

  • Won-Boo Chae;Jong-Sou Park
    • The Journal of Bigdata
    • /
    • v.7 no.1
    • /
    • pp.1-14
    • /
    • 2022
  • The proof-of-work consensus algorithm used by most blockchains is causing a massive waste of computing resources in the form of mining. A useful proof-of-work consensus algorithm has been studied to reduce the waste of computing resources in proof-of-work, but there are still resource waste and mining centralization problems when creating blocks. In this paper, the problem of resource waste in block generation was solved by replacing the relatively inefficient computation process for block generation with distributed artificial intelligence model learning. In addition, by providing fair rewards to nodes participating in the learning process, nodes with weak computing power were motivated to participate, and performance similar to the existing centralized AI learning method was maintained. To show the validity of the proposed methodology, we implemented a blockchain network capable of distributed AI learning and experimented with reward distribution through resource verification, and compared the results of the existing centralized learning method and the blockchain distributed AI learning method. In addition, as a future study, the thesis was concluded by suggesting problems and development directions that may occur when expanding the blockchain main network and artificial intelligence model.

Declustering of High-dimensional Data by Cyclic Sliced Partitioning (주기적 편중 분할에 의한 다차원 데이터 디클러스터링)

  • Kim Hak-Cheol;Kim Tae-Wan;Li Ki-Joune
    • Journal of KIISE:Databases
    • /
    • v.31 no.6
    • /
    • pp.596-608
    • /
    • 2004
  • A lot of work has been done to reduce disk access time in I/O intensive systems, which store and handle massive amount of data, by distributing data across multiple disks and accessing them in parallel. Most of the previous work has focused on an efficient mapping from a grid cell to a disk number on the assumption that data space is regular grid-like partitioned. Although we can achieve good performance for low-dimensional data by grid-like partitioning, its performance becomes degenerate as grows the dimension of data even with a good disk allocation scheme. This comes from the fact that they partition entire data space equally regardless of distribution ratio of data objects. Most of the data in high-dimensional space exist around the surface of space. For that reason, we propose a new declustering algorithm based on the partitioning scheme which partition data space from the surface. With an unbalanced partitioning scheme, several experimental results show that we can remarkably reduce the number of data blocks touched by a query as grows the dimension of data and a query size. In this paper, we propose disk allocation schemes based on the layout of the resultant data blocks after partitioning. To show the performance of the proposed algorithm, we have performed several experiments with different dimensional data and for a wide range of number of disks. Our proposed disk allocation method gives a performance within 10 additive disk accesses compared with strictly optimal allocation scheme. We compared our algorithm with Kronecker sequence based declustering algorithm, which is reported to be the best among the grid partition and mapping function based declustering algorithms. We can improve declustering performance up to 14 times as grows dimension of data.

A Benchmark of AI Application based on Open Source for Data Mining Environmental Variables in Smart Farm (스마트 시설환경 환경변수 분석을 위한 Open source 기반 인공지능 활용법 분석)

  • Min, Jae-Ki;Lee, DongHoon
    • Proceedings of the Korean Society for Agricultural Machinery Conference
    • /
    • 2017.04a
    • /
    • pp.159-159
    • /
    • 2017
  • 스마트 시설환경은 대표적으로 원예, 축산 분야 등 여러 형태의 농업현장에 정보 통신 및 데이터 분석 기술을 도입하고 있는 시설화된 생산 환경이라 할 수 있다. 근래에 하드웨어적으로 급증한 스마트 시설환경에서 생산되는 방대한 생육/환경 데이터를 올바르고 적합하게 사용하기 위해서는 일반 산업 현장과는 차별화 된 분석기법이 요구된다고 할 수 있다. 소프트웨어 공학 분야에서 연구된 빅데이터 처리 기술을 기계적으로 농업 분야의 빅데이터에 적용하기에는 한계가 있을 수 있다. 시설환경 내/외부의 다양한 환경 변수는 시계열 데이터의 난해성, 비가역성, 불특정성, 비정형 패턴 등에 기인하여 예측 모델 연구가 매우 난해한 대상이기 때문이라 할 수 있다. 본 연구에서는 근래에 관심이 급증하고 있는 인공신경망 연구 소프트웨어인 Tensorflow (www.tensorflow.org)와 대표적인 Open source인 OpenNN (www.openn.net)을 스마트 시설환경 환경변수 상호간 상관성 분석에 응용하였다. 해당 소프트웨어 라이브러리의 운영환경을 살펴보면 Tensorflow 는 Linux(Ubuntu 16.04.4), Max OS X(EL capitan 10.11), Windows (x86 compatible)에서 활용가능하고, OpenNN은 별도의 운영환경에 대한 바이너리를 제공하지 않고 소스코드 전체를 제공하므로, 해당 운영환경에서 바이너리 컴파일 후 활용이 가능하다. 소프트웨어 개발 언어의 경우 Tensorflow는 python이 기본 언어이며 python(v2.7 or v3.N) 가상 환경 내에서 개발이 수행이 된다. 주의 깊게 살펴볼 부분은 이러한 개발 환경의 제약으로 인하여 Tensorflow의 주요한 장점 중에 하나인 고속 연산 기능 수행이 일부 운영 환경에 국한이 되어 제공이 된다는 점이다. GPU(Graphics Processing Unit)의 제공하는 하드웨어 가속기능은 Linux 운영체제에서 활용이 가능하다. 가상 개발 환경에 운영되는 한계로 인하여 실시간 정보 처리에는 한계가 따르므로 이에 대한 고려가 필요하다. 한편 근래(2017.03)에 공개된 Tensorflow API r1.0의 경우 python, C++, Java언어와 함께 Go라는 언어를 새로 지원하여 개발자의 활용 범위를 매우 높였다. OpenNN의 경우 C++ 언어를 기본으로 제공하며 C++ 컴파일러를 지원하는 임의의 개발 환경에서 모두 활용이 가능하다. 특징은 클러스터링 플랫폼과 연동을 통해 하드웨어 가속 기능의 부재를 일부 극복했다는 점이다. 상기 두 가지 패키지를 이용하여 2016년 2월부터 5월 까지 충북 음성군 소재 딸기 온실 내부에서 취득한 온도, 습도, 조도, CO2에 대하여 Large-scale linear model을 실험적(시간단위, 일단위, 주단위 분할)으로 적용하고, 인접한 세그먼트의 환경변수 예측 모델링을 수행하였다. 동일한 조건의 학습을 수행함에 있어, Tensorflow가 개발 소요 시간과 학습 실행 속도 측면에서 매우 우세하였다. OpenNN을 이용하여 대등한 성능을 보이기 위해선 병렬 클러스터링 기술을 활용해야 할 것이다. 오프라인 일괄(Offline batch)처리 방식의 한계가 있는 인공신경망 모델링 기법과 현장 보급이 불가능한 고성능 하드웨어 연산 장치에 대한 대안 마련을 위한 연구가 필요하다.

  • PDF

Design of Bit Manipulation Accelerator fo Communication DSP (통신용 DSP를 위한 비트 조작 연산 가속기의 설계)

  • Jeong Sug H.;Sunwoo Myung H.
    • Journal of the Institute of Electronics Engineers of Korea TC
    • /
    • v.42 no.8 s.338
    • /
    • pp.11-16
    • /
    • 2005
  • This paper proposes a bit manipulation accelerator (BMA) having application specific instructions, which efficiently supports scrambling, convolutional encoding, puncturing, and interleaving. Conventional DSPs cannot effectively perform bit manipulation functions since かey have multiply accumulate (MAC) oriented data paths and word-based functions. However, the proposed accelerator can efficiently process bit manipulation functions using parallel shift and Exclusive-OR (XOR) operations and bit jnsertion/extraction operations on multiple data. The proposed BMA has been modeled by VHDL and synthesized using the SEC $0.18\mu m$ standard cell library and the gate count of the BMA is only about 1,700 gates. Performance comparisons show that the number of clock cycles can be reduced about $40\%\sim80\%$ for scrambling, convolutional encoding and interleaving compared with existing DSPs.