• Title/Summary/Keyword: 8-bit parallel processing

Search Result 45, Processing Time 0.018 seconds

Performance Analysis of Implementation on Image Processing Algorithm for Multi-Access Memory System Including 16 Processing Elements (16개의 처리기를 가진 다중접근기억장치를 위한 영상처리 알고리즘의 구현에 대한 성능평가)

  • Lee, You-Jin;Kim, Jea-Hee;Park, Jong-Won
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.49 no.3
    • /
    • pp.8-14
    • /
    • 2012
  • Improving the speed of image processing is in great demand according to spread of high quality visual media or massive image applications such as 3D TV or movies, AR(Augmented reality). SIMD computer attached to a host computer can accelerate various image processing and massive data operations. MAMS is a multi-access memory system which is, along with multiple processing elements(PEs), adequate for establishing a high performance pipelined SIMD machine. MAMS supports simultaneous access to pq data elements within a horizontal, a vertical, or a block subarray with a constant interval in an arbitrary position in an $M{\times}N$ array of data elements, where the number of memory modules(MMs), m, is a prime number greater than pq. MAMS-PP4 is the first realization of the MAMS architecture, which consists of four PEs in a single chip and five MMs. This paper presents implementation of image processing algorithms and performance analysis for MAMS-PP16 which consists of 16 PEs with 17 MMs in an extension or the prior work, MAMS-PP4. The newly designed MAMS-PP16 has a 64 bit instruction format and application specific instruction set. The author develops a simulator of the MAMS-PP16 system, which implemented algorithms can be executed on. Performance analysis has done with this simulator executing implemented algorithms of processing images. The result of performance analysis verifies consistent response of MAMS-PP16 through the pyramid operation in image processing algorithms comparing with a Pentium-based serial processor. Executing the pyramid operation in MAMS-PP16 results in consistent response of processing time while randomly response time in a serial processor.

Highly Efficient and Low Power FIR Filter Chip for PRML Read Channel (PRML Read Channel용 고효율, 저전력 FIR 필터 칩)

  • Jin Yong, Kang;Byung Gak, Jo;Myung Hoon, Sunwoo
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.41 no.9
    • /
    • pp.115-124
    • /
    • 2004
  • This paper proposes a high efficient and low power FIR filter chip for partial-response maximum likelihood (PRML) disk drive read channels; it is a 6-bit, 8-tap digital FIR filter. The proposed filter employs a parallel processing architecture and consists of 4 pipeline stages. It uses the modified Booth algorithm for multiplication and compressor logic for addition. CMOS pass-transistor logic is used for low power consumption and single-rail logic is used to reduce the chip area. The proposed filter is actually implemented and the chip dissipates 120mV at 100MHz, uses a 3.3V power supply and occupies 1.88 ${\times}$ 1.38 $\textrm{mm}^2$. The implemented filter requires approximately 11.7% less power compared with the existing architectures that use the similar technology.

A Scalable Montgomery Modular Multiplier (확장 가능형 몽고메리 모듈러 곱셈기)

  • Choi, Jun-Baek;Shin, Kyung-Wook
    • Journal of IKEEE
    • /
    • v.25 no.4
    • /
    • pp.625-633
    • /
    • 2021
  • This paper describes a scalable architecture for flexible hardware implementation of Montgomery modular multiplication. Our scalable modular multiplier architecture, which is based on a one-dimensional array of processing elements (PEs), performs word parallel operation and allows us to adjust computational performance and hardware complexity depending on the number of PEs used, NPE. Based on the proposed architecture, we designed a scalable Montgomery modular multiplier (sMM) core supporting eight field sizes defined in SEC2. Synthesized with 180-nm CMOS cell library, our sMM core was implemented with 38,317 gate equivalents (GEs) and 139,390 GEs for NPE=1 and NPE=8, respectively. When operating with a 100 MHz clock, it was evaluated that 256-bit modular multiplications of 0.57 million times/sec for NPE=1 and 3.5 million times/sec for NPE=8 can be computed. Our sMM core has the advantage of enabling an optimized implementation by determining the number of PEs to be used in consideration of computational performance and hardware resources required in application fields, and it can be used as an IP (intellectual property) in scalable hardware design of elliptic curve cryptography (ECC).

Shallow Marine Seismic Refraction Data Acquisition and Interpretation Using digital Technique (디지털 技法을 이용한 淺海底 屈折法 彈性波 探査資料의 取得과 解析)

  • 이호영;김철민
    • 한국해양학회지
    • /
    • v.27 no.1
    • /
    • pp.19-34
    • /
    • 1992
  • Marine seismic refraction surveys have been carried out by Korea Institute of Geology, Mining and Materials(KIGAM) since 1984. The recording of refraction data was based on analog instrumentation. Therefore the resolution of refraction data was not good enough to distinguish many layers. The objective of the interpretation of seismic refraction data is the determination of intervals and critically refracted seismic wave propagation velocities through the layers beneath the sea floor. To determine intervals and velocities precisely, the resolution of refraction data should be enhanced. The intent of the study is to improve the quality of shallow marine refraction data by the digital technique using microcomputer- based acquisition and processing system. The system consists of an IBM AT microcomputer clone, an analog-digital(A/D) converter. A mass storage unit and a parallel processing board. The A/D converter has 12 bits of precision and 250 kHz of conversion rate. The magneto-optical disk drive is used for the mass storage of seismic refraction data. Shallow marine seismic refraction surveys have been carried out using the system at 6 locations off Ulsan and Pusan area. The refraction data were acquired by the radio sonobuoy. The refraction profiles have been produced by the laser printer with 300 dpi resolution after the basic computer processing. 5-9 layers were interpreted from digital refraction profiles, whereas 2-4 layers were interpreted from analog refraction profiles. the propagation velocities of sediments were interpreted as 1.6-2.1 km/sec. The propagation velocities of acoustic basement were interpreted as 2.4-2.7 km/sec off Ulsan area, 4.8 km/sec off Pusan area.

  • PDF

A Frequency Domain DV-to-MPEG-2 Transcoding (DV에서 MPEG-2로의 주파수 영역 변환 부호화)

  • Kim, Do-Nyeon;Yun, Beom-Sik;Choe, Yun-Sik
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.38 no.2
    • /
    • pp.138-148
    • /
    • 2001
  • Digital Video (DV) coding standards for digital video cassette recorder are based mainly on DCT and variable length coding. DV has low hardware complexity but high compressed bit rate of about 26 Mb/s. Thus, it is necessary to encode video with low complex video coding at the studios and then transcode compressed video into MPEG-2 for video-on-demand system. Because these coding methods exploit DCT, transcoding in the DCT domain can reduce computational complexity by excluding duplicated procedures. In transcoding DV into MPEC-2 intra coding, multiplying matrix by transformed data is used for 4:1:1-to-4:2:2 chroma format conversion and the conversion from 2-4-8 to 8-8 DCT mode, and therefore enables parallel processing. Variance of sub block for MPEG-2 rate control is computed completely in the DCT domain. These are verified through experiments. We estimate motion hierarchically using DCT coefficients for transcoding into MPEG-2 inter coding. First, we estimate motion of a macro block (MB) only with 4 DC values of 4 sub blocks and then estimate motion with 16-point MB using IDCT of 2$\times$2 low frequencies in each sub block, and finish estimation at a sub pixel as the fifth step. ME with overlapped search range shows better PSNR performance than ME without overlapping.

  • PDF