• Title/Summary/Keyword: Pipeline implementation

Montgomery Multiplier Supporting Dual-Field Modular Multiplication (듀얼 필드 모듈러 곱셈을 지원하는 몽고메리 곱셈기)

  • Kim, Dong-Seong;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering / v.24 no.6 / pp.736-743 / 2020
  • Modular multiplication is one of the most important arithmetic operations in public-key cryptography such as elliptic curve cryptography (ECC) and RSA, and the performance of the modular multiplier is a key factor in the performance of public-key cryptographic hardware. This paper describes an efficient hardware implementation of the word-based Montgomery modular multiplication algorithm. Our modular multiplier supports eleven field sizes for the prime field GF(p) and the binary field GF(2^k) defined in the SEC 2 standard for ECC, making it suitable for lightweight hardware implementations of ECC processors. The proposed architecture pipelines the partial-product generation and addition operation with the modular reduction operation, reducing the number of clock cycles required for a modular multiplication by 50%. The hardware operation of the modular multiplier was verified on an FPGA. When synthesized with a 65-nm CMOS cell library, it was realized with 33,635 gate equivalents, and its maximum operating clock frequency was estimated at 147 MHz.
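The word-based Montgomery algorithm described above can be sketched in Python for the prime-field case. This is a minimal software illustration, not the paper's hardware design: the word width `w = 32`, the function name `mont_mul`, and the operand-scanning loop order are assumptions, and the binary-field GF(2^k) variant (which replaces integer addition and multiplication with XOR and carry-less multiplication) is not shown. Each iteration performs the two steps that the proposed architecture pipelines, partial-product generation/addition followed by one word of modular reduction; in software they simply run back to back.

```python
def mont_mul(a, b, n, w=32):
    """Return a * b * R^-1 mod n with R = 2**(w*s), scanning `a` one w-bit word per iteration."""
    if n % 2 == 0 or not (0 <= a < n and 0 <= b < n):
        raise ValueError("need an odd modulus n and operands 0 <= a, b < n")
    s = (n.bit_length() + w - 1) // w        # number of w-bit words
    mask = (1 << w) - 1
    n_prime = (-pow(n, -1, 1 << w)) & mask   # -n^-1 mod 2^w (Python 3.8+)
    t = 0
    for i in range(s):
        a_i = (a >> (w * i)) & mask
        t += a_i * b                         # partial-product generation and addition
        m = ((t & mask) * n_prime) & mask    # chosen so the low word of t + m*n is zero
        t = (t + m * n) >> w                 # modular reduction by one word
    return t - n if t >= n else t            # t < 2n, so one subtraction suffices
```

Because the result carries a factor R^-1, a plain product a*b mod n can be recovered with a second call against R^2 mod n, e.g. `mont_mul(mont_mul(a, b, n), R * R % n, n)`.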

3D Volumetric Capture-based Dynamic Face Production for Hyper-Realistic Metahuman (극사실적 메타휴먼을 위한 3D 볼류메트릭 캡쳐 기반의 동적 페이스 제작)

  • Oh, Moon-Seok;Han, Gyu-Hoon;Seo, Young-Ho
    • Journal of Broadcast Engineering / v.27 no.5 / pp.751-761 / 2022
  • With the development of digital graphics technology, the metaverse has become a significant trend in the content market, and demand for technology that generates high-quality 3D models is rising rapidly. Accordingly, various technical attempts are being made to create high-quality 3D virtual humans, typified by digital humans. 3D volumetric capture is attracting attention as a technology that can create a 3D human model faster and more precisely than existing 3D modeling methods. In this study, we analyze high-precision 3D facial production technology based on practical cases, covering the difficulties in content production and the technologies applied in creating volumetric 3D and 4D models. Based on an actual model implemented through 3D volumetric capture, we examined techniques for producing 3D virtual human faces and produced a new metahuman using a graphics pipeline designed for efficient facial generation.

Design of a Delayed Dual-Core Lock-Step Processor with Automatic Recovery in Soft Errors (소프트 에러 발생 시 자동 복구하는 이중 코어 지연 락스텝 프로세서의 설계)

  • Juho Kim;Seonghyun Yang;Seongsoo Lee
    • Journal of IKEEE / v.27 no.4 / pp.683-686 / 2023
  • In this paper, we designed a Delayed Dual-Core Lock-Step (D-DCLS) processor, in which two cores execute the same instructions with a time delay and their results are compared, to mitigate soft errors and common-mode failures in automotive electronic systems. Because D-DCLS cannot tell which core an error occurred in, both cores must be restored to a point before the error occurred, but returning all intermediate values in the pipeline stages requires complex hardware modifications. To simplify the hardware implementation, the proposed design saves all register values to a buffer whenever a branch instruction is executed. When an error is detected, the saved register values are automatically restored, and the 'BX LR' instruction is then executed to return to the last branch point. The proposed D-DCLS processor was designed in Verilog HDL and was confirmed to resume normal operation after automatically recovering from an error.
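The recovery scheme above can be illustrated with a toy software model. Everything here is hypothetical: a two-instruction toy ISA (`add` and a register-neutral `branch`), a single-bit fault injected only into the leading core, and a checkpoint taken when the trailing core passes a branch whose results matched. Restoring the saved registers and resuming after that branch stands in for the paper's restore-then-`BX LR` sequence; pipeline internals and actual ARM behavior are not modeled, and `delay` must be at least 1.

```python
from collections import deque

def execute(instr, regs):
    """Toy ISA: ("add", rd, rs, imm) sets rd = rs + imm; ("branch", ...) leaves registers unchanged."""
    op, rd, rs, imm = instr
    if op == "add":
        regs[rd] = regs[rs] + imm

def run_d_dcls(program, delay=2, fault_pc=None, nregs=4):
    """Run a leading and a trailing core `delay` instructions apart and compare results.

    On a mismatch both cores roll back to the registers saved at the last verified branch.
    Returns (final registers, number of recoveries).
    """
    regs_a, regs_b = [0] * nregs, [0] * nregs   # leading / trailing core register files
    pc_a = pc_b = 0
    history = deque()                           # leading-core results awaiting comparison
    checkpoint = (0, [0] * nregs)               # (resume pc, saved registers)
    fault_pending = fault_pc is not None
    recoveries = 0
    while pc_b < len(program):
        # the leading core runs ahead until it is `delay` instructions in front
        while pc_a < len(program) and pc_a - pc_b < delay:
            execute(program[pc_a], regs_a)
            if fault_pending and pc_a == fault_pc:
                regs_a[0] ^= 1                  # inject a one-time single-bit soft error
                fault_pending = False
            history.append(list(regs_a))
            pc_a += 1
        execute(program[pc_b], regs_b)
        if regs_b != history.popleft():
            # the faulty core is unknown, so restore both from the branch-point buffer
            pc, saved = checkpoint
            regs_a, regs_b = list(saved), list(saved)
            pc_a = pc_b = pc
            history.clear()
            recoveries += 1
            continue
        if program[pc_b][0] == "branch":
            checkpoint = (pc_b + 1, list(regs_b))  # save registers at a verified branch
        pc_b += 1
    return regs_b, recoveries
```

With a fault injected, the run detects the mismatch once, rolls back to the last branch, and finishes with the same registers as a fault-free run.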

Development of Product Recommendation System Using MultiSAGE Model and ESG Indicators (MultiSAGE 모델과 ESG 지표를 적용한 상품 추천 시스템 개발)

  • Hyeon-woo Kim;Yong-jun Kim;Gil-sang Yoo
    • Journal of Internet Computing and Services / v.25 no.1 / pp.69-78 / 2024
  • Recently, consumers have increasingly sought information on environmental, social, and governance (ESG) aspects in order to choose products with higher social value and environmental friendliness. In this paper, we propose a product recommendation system that applies ESG indicators, tailored to the recent consumer trend of value-based consumption, using a model called MultiSAGE that combines GraphSAGE and GAT. To this end, ESG rating data for 1,033 companies in 2022, collected from the Korea ESG Standard Institute, and actual product data from N companies were transformed into a heterogeneous graph through a data processing pipeline. The MultiSAGE model was then applied, via machine learning, to implement a recommendation system that, given a specific product, suggests eco-friendly alternatives. The implementation results indicate that consumers can easily compare and purchase products with ESG indicators applied, and this system is expected to be used to recommend products with social value and environmental friendliness.
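MultiSAGE combines GraphSAGE-style neighborhood aggregation with GAT attention; the paper's exact layer is not given in the abstract. The sketch below shows only a single GraphSAGE layer with a mean aggregator over a small product/company graph, in pure Python, plus a hypothetical `recommend_greener` helper that ranks products by embedding similarity among those with a higher ESG score. Attention weights, L2 normalization, training, and the ESG-scoring scheme are omitted or assumed.

```python
import math

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def sage_layer(features, neighbors, W):
    """One GraphSAGE layer, mean aggregator: h_v' = ReLU(W [h_v || mean of neighbor h_u])."""
    out = {}
    for v, h_v in features.items():
        nbrs = neighbors.get(v, [])
        dim = len(h_v)
        if nbrs:
            mean = [sum(features[u][i] for u in nbrs) / len(nbrs) for i in range(dim)]
        else:
            mean = [0.0] * dim
        out[v] = relu(matvec(W, h_v + mean))   # list concatenation = [h_v || mean]
    return out

def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def recommend_greener(query, products, emb, esg):
    """Hypothetical helper: products with a higher ESG score than `query`, most similar first."""
    candidates = [p for p in products if p != query and esg[p] > esg[query]]
    return sorted(candidates, key=lambda p: cosine(emb[query], emb[p]), reverse=True)
```

Here product and company nodes share one feature space for simplicity; a heterogeneous graph would normally use a separate weight matrix per node or edge type.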

Development of an Automated ESG Document Review System using Ensemble-Based OCR and RAG Technologies

  • Eun-Sil Choi
    • Journal of the Korea Society of Computer and Information / v.29 no.9 / pp.25-37 / 2024
  • This study proposes a novel automation system that integrates Optical Character Recognition (OCR) and Retrieval-Augmented Generation (RAG) technologies to make the ESG (Environmental, Social, and Governance) document review process more efficient. The proposed system improves text recognition accuracy by applying an ensemble-model-based image preprocessing algorithm and hybrid information extraction models in the OCR process. In addition, the RAG pipeline improves the reliability of information retrieval and answer generation by implementing layout analysis algorithms, re-ranking algorithms, and ensemble retrievers. The system's performance was evaluated using certificate images from online portals and corporate internal regulations obtained from various sources, such as company websites. The system achieved an accuracy of 93.8% for certification reviews and 92.2% for company regulation reviews, indicating that it effectively supports human evaluators in the ESG assessment process.
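The abstract does not specify how the ensemble retriever merges the results of its component retrievers. One common choice, assumed here, is weighted reciprocal rank fusion (RRF), where each document's score is the sum of w / (k + rank) over the ranked lists it appears in. The function name and the constant `k = 60` (a frequently used default) are assumptions for illustration.

```python
def reciprocal_rank_fusion(rankings, weights=None, k=60):
    """Fuse several ranked lists of document ids; higher fused score ranks first."""
    weights = weights or [1.0] * len(rankings)
    scores = {}
    for ranking, wt in zip(rankings, weights):
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + wt / (k + rank)
    # ties are broken by document id so the output is deterministic
    return sorted(scores, key=lambda d: (-scores[d], d))
```

For example, fusing a keyword (BM25-style) ranking with a dense-embedding ranking lifts documents that both retrievers rank well; the fused list could then be passed to a re-ranker.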

Parallel Distributed Implementation of GHT on Ethernet Multicluster (이더넷 다중 클러스터에서 GHT의 병렬 분산 구현)

  • Kim, Yeong-Soo;Kim, Myung-Ho;Choi, Heung-Moon
    • Journal of the Institute of Electronics Engineers of Korea CI / v.46 no.3 / pp.96-106 / 2009
  • Extending the scale of distributed processing in a single Ethernet cluster is physically restricted by the maximum number of ports per switch. This paper presents an implementation of an MPI-based multicluster consisting of multiple Ethernet switches to extend the scale of distributed processing, together with an asymptotic analysis of communication overhead using an execution-time analysis model. To determine the optimum task partitioning, we analyzed the processing time of various partitioning schemes and finally chose the AAP (accumulator array partitioning) scheme, which minimizes the overall communication overhead. The scope of data partitioned in AAP was modified to accommodate the increased number of nodes, and a suitable load-balancing algorithm was implemented. We sought to reduce the communication overhead by using pipelined broadcast and flat-tree-based result gathering, and by overlapping communication with computation. Linear pipelined broadcast was used to reduce the communication overhead between clusters, which are interconnected by a single link. Experimental results show nearly linear speedup for the proposed parallel distributed generalized Hough transform (GHT) implemented on an MPI-based Ethernet multicluster with four 100 Mbps Ethernet switches and up to 128 Pentium PC nodes.
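Accumulator array partitioning can be sketched as follows: each node owns a contiguous band of rows of the GHT accumulator, receives every edge point (which is why the broadcast cost matters), votes only into its own band, and the bands are gathered in order. This is a serial Python simulation under simplifying assumptions: no rotation or scale dimensions, gradient orientation pre-quantized to an integer bin, and a loop standing in for MPI ranks. The function names are hypothetical.

```python
def build_r_table(template_edges, ref):
    """R-table: orientation bin -> displacement vectors from an edge point to the reference point."""
    rx, ry = ref
    table = {}
    for x, y, phi in template_edges:
        table.setdefault(phi, []).append((rx - x, ry - y))
    return table

def vote(edges, r_table, width, height, rows=None):
    """Accumulate GHT votes; if `rows` = (y0, y1) is given, vote only into that band."""
    y0, y1 = rows if rows else (0, height)
    acc = [[0] * width for _ in range(y1 - y0)]
    for x, y, phi in edges:
        for dx, dy in r_table.get(phi, []):
            cx, cy = x + dx, y + dy
            if 0 <= cx < width and y0 <= cy < y1:
                acc[cy - y0][cx] += 1
    return acc

def partitioned_vote(edges, r_table, width, height, nodes):
    """AAP: split accumulator rows across `nodes` workers, vote independently, gather in order."""
    bounds = [height * i // nodes for i in range(nodes + 1)]
    gathered = []
    for i in range(nodes):  # with MPI, each band would be computed on its own rank
        gathered.extend(vote(edges, r_table, width, height, (bounds[i], bounds[i + 1])))
    return gathered
```

Because each band receives every vote that falls inside it, the gathered result is identical to a single-node accumulator; only the edge list must be broadcast, and no accumulator data has to be reduced across nodes.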

Design of a Bit-Level Super-Systolic Array (비트 수준 슈퍼 시스톨릭 어레이의 설계)

  • Lee Jae-Jin;Song Gi-Yong
    • Journal of the Institute of Electronics Engineers of Korea SD / v.42 no.12 / pp.45-52 / 2005
  • A systolic array, formed by interconnecting a set of identical data-processing cells in a uniform manner, combines an algorithm with the circuit that implements it and is conceptually closely related to an arithmetic pipeline. High-performance computation on a large array of cells has been an important feature of systolic arrays. To achieve an even higher degree of concurrency, it is desirable to make the cells of a systolic array themselves systolic arrays. A systolic array whose cells consist of other systolic arrays is called a super-systolic array. This paper proposes a scalable bit-level super-systolic array that can be adopted in VLSI design, with the regular interconnections and functional primitives typical of a systolic architecture. The architecture focuses on highly regular computational structures that avoid the large number of global interconnections required in general VLSI implementations. A bit-level super-systolic FIR filter is chosen as an example of a bit-level super-systolic array. The derived filter was modeled and simulated at the RT level in VHDL, then synthesized with Synopsys Design Compiler using a Hynix 0.35 μm cell library. Compared with a conventional word-level systolic array, the proposed bit-level super-systolic array is more efficient in area and throughput.
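For reference, the conventional word-level systolic FIR filter that the paper compares against can be simulated cycle by cycle. This sketch assumes one common systolic organization (stationary weights, input samples moving through two registers per cell, partial sums through one register per cell); it does not model the bit-level inner arrays of the proposed super-systolic design, where each multiply-accumulate cell would itself be a bit-level systolic array.

```python
def systolic_fir(weights, xs):
    """Cycle-level simulation of y[n] = sum_j weights[j] * xs[n - j] on a linear systolic array."""
    k = len(weights)
    xd1 = [0] * k   # first x delay register after each cell
    xd2 = [0] * k   # second x delay register after each cell
    yr = [0] * k    # partial-sum register after each cell
    outputs = []
    # append k - 1 zero samples so the last outputs drain out of the array
    for t, x in enumerate(list(xs) + [0] * (k - 1)):
        x_in = [x] + [xd2[j - 1] for j in range(1, k)]     # x arrives two cycles per cell
        y_in = [0] + [yr[j - 1] for j in range(1, k)]      # partial sums one cycle per cell
        y_out = [y_in[j] + weights[j] * x_in[j] for j in range(k)]
        # clock edge: shift the delay registers (xd2 takes the old xd1)
        xd2 = xd1[:]
        xd1 = x_in[:]
        yr = y_out[:]
        if t >= k - 1:                                     # latency of k - 1 cycles
            outputs.append(y_out[k - 1])
    return outputs
```

Every cell talks only to its neighbor, which is the local-interconnect property the abstract contrasts with global wiring in general VLSI designs.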

An FPGA Implementation of the Synthesis Filter for MPEG-1 Audio Layer III by a Distributed Arithmetic Lookup Table (분산산술연산방식을 이용한 MPEG-1 오디오 계층 3 합성필터의 FPGA 구현)

  • Koh Sung-Shik;Choi Hyun-Yong;Kim Jong-Bin;Ku Dae-Sung
    • The Journal of the Acoustical Society of Korea / v.23 no.8 / pp.554-561 / 2004
  • As semiconductor and multimedia communication technologies have improved, high-quality video and multichannel audio have been highlighted, and MPEG Audio Layer 3 decoders have been implemented as processors based on the standard. Since the synthesis filter is the most computation-intensive part of the entire MPEG-1 Audio Layer 3 decoder, a synthesis filter that reduces the amount of computation is needed to design a high-speed processor. Therefore, in this paper, the synthesis filter, the most important part of MPEG Audio, is implemented on an FPGA using the DAULT (distributed arithmetic look-up table) method. For a high-speed synthesis filter, the DAULT method is used instead of a multiplier, together with a pipeline structure. A further 30% performance improvement is obtained by also storing the products of the data and the cosine function in the table. All hardware in this paper is described in VHDL (VHSIC Hardware Description Language). ALDEC Active-HDL 6.1 is used for VHDL simulation, and ModelSim and Synplify Pro 7.2V are used for simulation and synthesis. The design targets the Xilinx XC4013E, XC4020EX, and XC4052XL device libraries, and XACT M1.4 is used as the P&R tool. The implemented processor operates from 20 MHz to 70 MHz.
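The DA look-up-table idea can be sketched for a single inner product: for K fixed coefficients and B-bit two's complement inputs, a 2^K-entry table holds every possible sum of coefficients, and the result is accumulated bit-plane by bit-plane with shifts, with the sign bit's term subtracted. The integer coefficients below stand in for pre-scaled cosine terms; function names and word lengths are assumptions, and the paper's table layout is not reproduced.

```python
def build_da_lut(coeffs):
    """Entry `addr` holds the sum of coeffs[i] for every bit i set in addr (2**K entries)."""
    k = len(coeffs)
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1) for addr in range(1 << k)]

def da_dot(coeffs, xs, bits):
    """Inner product sum(coeffs[i] * xs[i]) for `bits`-bit two's complement xs, without multipliers."""
    lut = build_da_lut(coeffs)
    acc = 0
    for b in range(bits):
        # the table address is formed from bit b of every input sample
        addr = 0
        for i, x in enumerate(xs):
            addr |= ((x >> b) & 1) << i
        term = lut[addr] << b
        # the most significant bit carries negative weight in two's complement
        acc += -term if b == bits - 1 else term
    return acc
```

In hardware the inner loop over samples is just wiring (one bit from each shift register), so each cycle costs one table read and one shift-add.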

An Implementation of Low Power MAC using Improvement of Multiply/Subtract Operation Method and PTL Circuit Design Methodology (승/감산 연산방법의 개선 및 PTL회로설계 기법을 이용한 저전력 MAC의 구현)

  • Sim, Gi-Hak;O, Ik-Gyun;Hong, Sang-Min;Yu, Beom-Seon;Lee, Gi-Yeong;Jo, Tae-Won
    • Journal of the Institute of Electronics Engineers of Korea SD / v.37 no.4 / pp.60-70 / 2000
  • An 8×8+20-bit MAC is designed with low-power design methodologies at each level of system design. At the algorithm level, a new multiply/subtract operation method is proposed, which reduces the transistor count compared with conventional methods in hardware realization. At the circuit level, a new Booth selector circuit using NMOS pass-transistor logic is also proposed; it outperforms CMOS-based circuits in power-delay product. At the architecture level, we adopted as the final adder an ELM adder, which is known to be the most efficient in power consumption, operating frequency, area, and design regularity. For registers, dynamic CMOS single-edge-triggered flip-flops are used because they need fewer transistors per bit. To increase the operating frequency, a 2-stage pipeline architecture is adopted, and fast 4:2 compressors are used in the Wallace tree block. Simulation shows that the MAC, designed in a 0.6 μm 1-poly 3-metal CMOS process, operates at 200 MHz and 3.3 V with 35 mW of power consumption in multiply operation, and at 100 MHz with 29 mW in MAC operation.
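The Booth selector mentioned above chooses among 0, ±A, and ±2A for each digit of a radix-4 (modified) Booth recoding of the multiplier. The sketch below shows that recoding for an 8-bit signed multiplier and a MAC step with a 20-bit two's complement accumulator. The wrap-around accumulator behavior and the function names are assumptions; the paper's proposed multiply/subtract method, the pass-transistor circuits, and the Wallace tree with 4:2 compressors are not modeled (the partial products are simply summed).

```python
def booth_radix4_digits(b, bits=8):
    """Recode a signed `bits`-bit multiplier into radix-4 Booth digits in {-2, ..., 2}."""
    digits = []
    prev = 0  # implicit bit b_{-1} = 0
    for i in range(0, bits, 2):
        b0 = (b >> i) & 1
        b1 = (b >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)  # digit = -2*b_{i+1} + b_i + b_{i-1}
        prev = b1
    return digits

def booth_multiply(a, b, bits=8):
    """Sum the Booth partial products: each selector outputs d * a, shifted by 2 bits per digit."""
    return sum((d * a) << (2 * i) for i, d in enumerate(booth_radix4_digits(b, bits)))

def mac(acc, a, b, acc_bits=20):
    """acc + a * b, wrapped to an `acc_bits`-bit two's complement result."""
    total = (acc + booth_multiply(a, b)) & ((1 << acc_bits) - 1)
    return total - (1 << acc_bits) if total >> (acc_bits - 1) else total
```

An 8-bit multiplier yields only four Booth digits, so the adder tree sums four partial products instead of eight.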

Implementation of Random Controlling of Convergence Point in VR Image Content Production (VR 영상콘텐츠 제작을 위한 컨버전스 포인트 임의조절 구현)

  • Jin, Hyung Woo;Baek, Gwang Ho;Kim, Mijin
    • Smart Media Journal / v.4 no.4 / pp.111-119 / 2015
  • As a variety of HMDs (head-mounted displays) have come on the market, the production of 3D images incorporating VR (virtual reality) technologies has helped activate the production of tangible and immersive image content. VR-based image content has expanded its applicability across the entertainment industry, from animation and games to realistic imagery. At the same time, the development of solutions for producing VR image content has also gained momentum. However, among the production solutions used until now, fixed stereo-camera-based shooting has the limitation that the binocular disparity presented to the user is fixed. This not only restricts the modes of expression a producer intends to direct, but may also prevent the 3D or spatial effect from being sufficiently perceived, because viewing conditions on the user's side are not adequately considered. This study aims to resolve, with techniques applied in the later stage of 3D image production, the problem that convergence points can be adjusted only within restrictions when producing VR image content. In the later stage of 3D imaging work, the significance of adjusting convergence points is analyzed through the visualization of binocular disparity and applied to game engines, so that a function allowing the user to adjust the convergence point arbitrarily can be implemented.
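One common way to let a user move the convergence point in a game engine, assumed here because the abstract does not give the method, is an off-axis stereo rig: two parallel cameras whose projections are shifted so that depth C lies at zero parallax. For interaxial separation e, a point at depth Z then has screen parallax p = e(1 - C/Z), measured at the convergence plane: zero at Z = C, positive (behind the screen) for Z > C, and negative (in front of the screen) for Z < C. The sketch below evaluates that relation for a few scene objects; the names and values are hypothetical.

```python
def screen_parallax(interaxial, convergence, depth):
    """Parallax at the convergence plane for an off-axis stereo pair: e * (1 - C / Z)."""
    return interaxial * (1.0 - convergence / depth)

def parallax_table(interaxial, convergence, scene):
    """Parallax of every named object in `scene` (name -> depth) for the chosen convergence distance."""
    return {name: screen_parallax(interaxial, convergence, z) for name, z in scene.items()}

# hypothetical scene: depths in metres, 65 mm interaxial, user converges on the actor
scene = {"actor": 2.0, "wall": 6.0, "prop": 1.0}
table = parallax_table(0.065, scene["actor"], scene)
```

Re-running `parallax_table` with a new `convergence` value is the software equivalent of the user dragging the convergence point: objects nearer than C move in front of the screen, farther ones behind it.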