• Title/Summary/Keyword: parallel library

Design and implementation of the SliM image processor chip (SliM 이미지 프로세서 칩 설계 및 구현)

  • 옹수환;선우명훈
    • Journal of the Korean Institute of Telematics and Electronics A / v.33A no.10 / pp.186-194 / 1996
  • The SliM (sliding memory plane) array processor has been proposed to alleviate the disadvantages of existing mesh-connected SIMD (single instruction stream, multiple data streams) array processors, such as inter-PE (processing element) communication overhead, data I/O overhead, and complicated interconnections. This paper presents the design and implementation of the SliM image processor ASIC (application specific integrated circuit) chip, which consists of a mesh-connected 5 × 5 array of PEs. The PE architecture implemented here is quite different from the originally proposed PE. We performed the front-end design, including VHDL (VHSIC hardware description language) modeling, logic synthesis, and simulation, and completed the back-end design procedure. The SliM ASIC chip, built with the VTI 0.8 $\mu$m standard cell library (v8r4.4), has 55,255 gates and twenty-five 128 × 9 bit SRAM modules. The chip has a die size of 326.71 × 313.24 mil$^{2}$ and is packaged in a 144-pin MQFP. The chip operates correctly at 25 MHz and delivers 625 MIPS. For performance evaluation, we developed parallel algorithms, and the results showed an improvement over existing image processors.
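
The headline figures in this abstract can be cross-checked with simple arithmetic. The sketch below assumes that each of the 25 PEs completes one instruction per clock cycle and owns one SRAM module; the abstract states neither, but both are consistent with the reported 625 MIPS at 25 MHz and the twenty-five SRAM modules. All variable names are illustrative.

```python
# Cross-check of the figures quoted in the abstract.
# Assumptions (not stated in the abstract): each PE completes one
# instruction per clock cycle, and each PE owns one SRAM module.

MIL_TO_MM = 0.0254  # 1 mil = 0.001 inch = 0.0254 mm

pe_count = 5 * 5                   # 5 x 5 mesh of processing elements
clock_mhz = 25                     # reported operating frequency
peak_mips = pe_count * clock_mhz   # million instructions per second

sram_bits_per_module = 128 * 9     # one 128 x 9 bit SRAM module
total_sram_bits = pe_count * sram_bits_per_module

die_w_mm = 326.71 * MIL_TO_MM      # die width converted from mil to mm
die_h_mm = 313.24 * MIL_TO_MM      # die height converted from mil to mm

print(f"peak throughput: {peak_mips} MIPS")
print(f"on-chip SRAM: {total_sram_bits} bits")
print(f"die size: {die_w_mm:.2f} mm x {die_h_mm:.2f} mm")
```

Under these assumptions the product of PE count and clock frequency reproduces the 625 MIPS figure quoted above.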

A HIGH PERFORMANCE CLUSTER FOR ASTRONOMICAL COMPUTATIONS (천문 계산용 고성능 클러스터 구축)

  • KIM JONGSOO;KIM BONG GYU;YIM IN SUNG;BAEK CHANG HYUN;NAM HYUN WOONG;RYU DONGSU;KANG YOUNG WOON
    • Publications of The Korean Astronomical Society / v.19 no.1 / pp.77-81 / 2004
  • A high-performance computing cluster for astronomical computations has been built at Korea Astronomy Observatory. The 64-node cluster, interconnected with Gigabit Ethernet, comprises 128 Intel Xeon processors, 160 GB of memory, 6 TB of global storage space, and an LTO (Linear Tape-Open) tape library. The cluster was installed and has been managed with the Open Source Cluster Application Resource (OSCAR) framework. Its performance for parallel computations was measured with a three-dimensional hydrodynamic code and showed good scalability as the number of computational cells increased. The cluster has already been used for several computational research projects, some of which have resulted in publications, even though it has been in full operation for less than one year. As a major resource of the K*Grid testbed, the cluster has also been used for Grid computations.

Development of an Embedded Bluetooth Audio Streaming Solution on SoC Platform (SoC 플랫폼 상에서 임베디드 블루투스 오디오 스트리밍 솔루션 개발)

  • Kim, Tae-Hyoun
    • The KIPS Transactions:PartA / v.13A no.7 s.104 / pp.589-598 / 2006
  • In this paper, we describe the development and optimization of an embedded Bluetooth solution on an SoC platform for real-time audio streaming over a Bluetooth wireless link. The solution includes an embedded Bluetooth protocol stack and profiles implemented on a virtual operating system for portability, together with optimization techniques that fully exploit the benefits of a multimedia-oriented SoC. The optimization techniques implemented in this paper are memory access minimization using on-chip scratch pad memory, codec library optimization with the DSP and parallel memory access instruction set, and dynamic audio quality adjustment according to the current wireless link status. Experimental results show that the optimized solution can support high-quality audio streaming without external memory.

Visual Cell OOK Modulation : A Case Study of MIMO CamCom (시각 셀 OOK 변조 : MIMO CamCom 연구 사례)

  • Le, Nam-Tuan;Jang, Yeong Min
    • The Journal of Korean Institute of Communications and Information Sciences / v.38C no.9 / pp.781-786 / 2013
  • Multiplexing information over parallel data channels, following the RF MIMO concept, makes it possible to achieve considerable data rates over large transmission ranges with just a single transmitting element. Visual MIMO multiplexing techniques send independent bit streams using the multiple elements of the light transmitter array, and recording over a group of camera pixels can further enhance the data rates. The proposed system combines computer vision algorithms for tracking with OOK cell frame modulation. The LED array is controlled to transmit messages as digital information using ON-OFF signaling (ON = bit 1, OFF = bit 0). A camera captures image frames of the array, which are then individually processed and sequentially decoded to retrieve the data. To demodulate the transmitted data, a motion tracking algorithm is implemented in OpenCV (Open Source Computer Vision Library) to classify the transmission pattern. One of the main advantages of the proposed architecture is that computer vision (CV) based image analysis techniques can be used to spatially separate signals and remove interference from ambient light. This approach presents both challenges and opportunities for future mobile communication networking research.
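
The ON-OFF mapping described in this abstract (ON = bit 1, OFF = bit 0, frames decoded one after another) can be illustrated with a short sketch. It is not the paper's implementation: it assumes the LED array has already been located and cropped, so that each frame is a small grid of mean cell brightness values, and it replaces the OpenCV tracking step with a fixed brightness threshold. The function names, the threshold of 128, and the sample frames are all hypothetical.

```python
# Minimal OOK cell-frame decoder sketch (illustrative, not the paper's code).
# Each frame is a 2D grid of mean brightness values (0-255), one value per
# LED cell, as if the LED array had already been located and cropped.

from typing import List

Frame = List[List[float]]


def decode_frame(frame: Frame, threshold: float = 128.0) -> List[int]:
    """Map each LED cell to a bit: ON (bright) = 1, OFF (dark) = 0."""
    return [1 if cell >= threshold else 0 for row in frame for cell in row]


def decode_frames(frames: List[Frame], threshold: float = 128.0) -> List[int]:
    """Decode frames sequentially and concatenate their bits."""
    bits: List[int] = []
    for frame in frames:
        bits.extend(decode_frame(frame, threshold))
    return bits


def bits_to_bytes(bits: List[int]) -> bytes:
    """Pack bits (most significant bit first) into bytes; drop a partial tail."""
    usable = len(bits) - len(bits) % 8
    out = bytearray()
    for i in range(0, usable, 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


# Two hypothetical 2 x 4 frames, each carrying one ASCII character.
frames = [
    [[20, 230, 15, 10], [240, 235, 250, 245]],  # bits 0100 1111
    [[12, 228, 18, 9], [251, 14, 243, 239]],    # bits 0100 1011
]
message = bits_to_bytes(decode_frames(frames))
print(message)
```

A fixed threshold is the simplest possible classifier; a real receiver would also need the tracking and ambient-light handling that the abstract attributes to computer vision techniques.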

A Study of High Performance WebKit Mobile Web Browser (WebKit 모바일 웹 브라우저의 성능 향상을 위한 기법 연구)

  • Kim, Cheong-Ghil
    • Journal of Satellite, Information and Communications / v.7 no.1 / pp.48-52 / 2012
  • With the growing popularity of smartphones, mobile web browsing has become one of the most important and popular applications on mobile devices. Furthermore, the demand for PC-like full browser performance on mobile devices is clearly increasing. WebKit is an open source web browser engine adopted by Google Android. This paper proposes a technique for increasing the performance of WebKit by parallelizing its libraries. The method was applied to the JPEG library, and the performance evaluation was conducted in a PC environment. The results were used to predict performance on a multi-core mobile embedded architecture and to show the feasibility of using the proposed method to estimate the performance gain on a heterogeneous multi-core embedded architecture.

Implementation and Performance Analysis of Hadoop MapReduce over Lustre Filesystem (러스터 파일 시스템 기반 하둡 맵리듀스 실행 환경 구현 및 성능 분석)

  • Kwak, Jae-Hyuck;Kim, Sangwan;Huh, Taesang;Hwang, Soonwook
    • KIISE Transactions on Computing Practices / v.21 no.8 / pp.561-566 / 2015
  • Hadoop is becoming widely adopted in scientific and commercial areas as an open-source distributed data processing framework. Recently, attempts have been made to apply high-performance computing technologies to Hadoop for real-time processing and analysis of data. In this paper, we extended the Hadoop Filesystem library to support Lustre, a popular high-performance parallel distributed filesystem, and implemented a Hadoop MapReduce execution environment over the Lustre filesystem. We analysed Hadoop MapReduce over Lustre using Hadoop standard benchmark tools. We found that Hadoop MapReduce over Lustre performs 2 to 13 times better than a typical Hadoop MapReduce execution.

Application of Clinical Laboratory Tests in Musculoskeletal Diseases (근골격계 질환에서 진단의학검사의 활용)

  • Ha, Won-Bae;Geum, Ji-Hye;Shin, Seon-Ho;Lee, Jung-Han
    • The Journal of Churna Manual Medicine for Spine and Nerves / v.13 no.2 / pp.109-125 / 2018
  • Objectives : It is difficult to make an accurate diagnosis of musculoskeletal disease because of its multiple, subjective, and non-specific symptoms. Errors in differential diagnosis can be reduced through detailed history taking and physical examination in parallel with laboratory tests based on clinical decisions. Methods : Korean and foreign on-line databases (PubMed, Cochrane Library, NDSL, KISS and OASIS) were searched for articles discussing laboratory tests in musculoskeletal diseases. Results : Laboratory tests can be applied usefully to various musculoskeletal diseases. In this review, the laboratory components available for these musculoskeletal diseases are summarized, and the significance and usefulness of disease-specific laboratory examinations are described. Conclusions : When examining musculoskeletal patients, it is necessary to make an accurate differential diagnosis through a full interview and physical examination, to select the required tests based on a thorough understanding of laboratory tests, and to judge the prognosis precisely.

Design of Partial Product Accumulator using Multi-Operand Decimal CSA and Improved Decimal CLA (다중 피연산자 십진 CSA와 개선된 십진 CLA를 이용한 부분곱 누산기 설계)

  • Lee, Yang;Park, TaeShin;Kim, Kanghee;Choi, SangBang
    • Journal of the Institute of Electronics and Information Engineers / v.53 no.11 / pp.56-65 / 2016
  • In this paper, in order to reduce the delay and area of the partial product accumulation (PPA) of a parallel decimal multiplier, a tree architecture composed of multi-operand decimal CSAs and an improved CLA is proposed. The proposed tree uses multi-operand CSAs to reduce the partial products quickly. Since the input range of the CSA recoder is limited, the CSA can use the simplest logic. In addition, using multi-operand decimal CSAs to add decimal numbers that have a limited range at specific locations in the architecture reduces the partial products efficiently. The final BCD result can also be obtained faster by improving the logic of the decimal CLA. To evaluate the performance of the proposed partial product accumulation, synthesis was performed using Design Compiler with a 180 nm CMOS technology library. Synthesis results show that the delay of the proposed partial product accumulation is reduced by 15.6% and the area by 16.2% compared with the general method. The total delay and area are still reduced even though the delay and area of the CLA are increased.
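
The idea of reducing partial products with carry-save adders before a single final carry-propagating addition can be sketched in software. The code below is a plain digit-wise 3:2 decimal carry-save reduction applied to schoolbook partial products. It is an assumption-laden illustration only: it does not model the multi-operand CSA, the recoder, or the improved CLA of the paper, and Python's integer addition stands in for the final CLA. All function names and the sample operands are hypothetical.

```python
# Software sketch of decimal carry-save accumulation (illustrative only).
# Digit vectors are stored least significant digit first, each digit 0..9.

from typing import List, Tuple

Digits = List[int]


def to_digits(n: int, width: int) -> Digits:
    """Split a non-negative integer into `width` decimal digits."""
    return [(n // 10**i) % 10 for i in range(width)]


def from_digits(digits: Digits) -> int:
    """Rebuild the integer represented by a digit vector."""
    return sum(d * 10**i for i, d in enumerate(digits))


def decimal_csa(a: Digits, b: Digits, c: Digits) -> Tuple[Digits, Digits]:
    """3:2 reduction: each digit position is summed independently, so no
    carry ripples between positions. Returns (sum vector, carry vector)."""
    sums: Digits = []
    carries: Digits = [0]          # carries are shifted one position left
    for x, y, z in zip(a, b, c):
        total = x + y + z          # 0..27 for valid decimal digits
        sums.append(total % 10)
        carries.append(total // 10)
    if carries.pop():              # carry out of the most significant digit
        raise OverflowError("width too small for these operands")
    return sums, carries


def accumulate(operands: List[int], width: int) -> int:
    """Reduce operands three at a time until at most two rows remain, then
    do one carry-propagating add (integer addition stands in for the CLA)."""
    rows = [to_digits(n, width) for n in operands]
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3
        reduced: List[Digits] = []
        for i in range(0, full, 3):
            reduced.extend(decimal_csa(rows[i], rows[i + 1], rows[i + 2]))
        reduced.extend(rows[full:])  # leftover rows pass through unchanged
        rows = reduced
    return sum(from_digits(r) for r in rows)


# Schoolbook partial products of a hypothetical 4-digit multiplication.
multiplicand, multiplier = 9876, 5432
partials = [
    multiplicand * int(d) * 10**i
    for i, d in enumerate(reversed(str(multiplier)))
]
product = accumulate(partials, width=10)
print(product == multiplicand * multiplier)
```

The point of the carry-save step is that each level of the tree works on all digit positions independently, so carry propagation is paid only once, in the final addition.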

Implementation of Hardware Data Prefetcher Adaptable for Various State-of-the-Art Workload (다양한 최신 워크로드에 적용 가능한 하드웨어 데이터 프리페처 구현)

  • Kim, KangHee;Park, TaeShin;Song, KyungHwan;Yoon, DongSung;Choi, SangBang
    • Journal of the Institute of Electronics and Information Engineers / v.53 no.12 / pp.20-35 / 2016

High Noise Margin LVDS I/O Circuits for Highly Parallel I/O Environments (다수의 병렬 입.출력 환경을 위한 높은 노이즈 마진을 갖는 LVDS I/O 회로)

  • Kim, Dong-Gu;Kim, Sam-Dong;Hwang, In-Seok
    • Journal of the Institute of Electronics Engineers of Korea SC / v.44 no.1 / pp.85-93 / 2007
  • This paper presents new LVDS I/O circuits with a high noise margin for use in highly parallel I/O environments. The proposed LVDS I/O includes transmitter and receiver parts. The transmitter circuits consist of a differential phase splitter and an output stage with common mode feedback (CMFB). The differential phase splitter generates a pair of differential signals that have a balanced duty cycle and a $180^{\circ}$ phase difference over a wide supply voltage variation caused by SSO (simultaneous switching output) noise. The CMFB output stage produces the required constant output current and maintains the required VCM (common mode voltage) within a $\pm$0.1 V tolerance without external circuits in an SSO environment. The proposed receiver circuits use a three-stage structure (single-ended differential amplifier, common source amplifier, output stage) to accurately receive high-speed signals. The receiver part employs a very wide common mode input range differential amplifier (VCDA). As a result, the receiver improves immunity to common mode noise and to the supply voltage difference between the transmitter and receiver sides, represented by Vgdp. The receiver also produces a rail-to-rail, full-swing output voltage with a balanced duty cycle (50% $\pm$ 3%) without external circuits in an SSO environment, which enables correct data recovery. The proposed LVDS I/O circuits have been designed and simulated with a 0.18 $\mu$m TSMC library using H-SPICE.