• Title/Summary/Keyword: Matrix Multiplication

Efficient Implementation of Finite Field Operations in NIST PQC Rainbow (NIST PQC Rainbow의 효율적 유한체 연산 구현)

  • Kim, Gwang-Sik;Kim, Young-Sik
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.31 no.3
    • /
    • pp.527-532
    • /
    • 2021
  • In this paper, we propose an efficient finite field computation method for the Rainbow algorithm, which is the only multivariate quadratic-equation-based digital signature scheme among the finalists of the current US NIST PQC standardization process. Recently, Chou et al. proposed a new efficient implementation method for Rainbow in the Cortex-M4 environment. This paper proposes a new multiplication method over the finite field that reduces the number of XOR operations by more than 13.7% compared to the method of Chou et al. In addition, we present a multiplicative inversion over the same field that can be performed with a 4x4 matrix inverse instead of a table lookup. Finally, performance is measured by porting the software that applies the new method to a Raspberry Pi 3B+.

A Study on GPU Computing of Bi-conjugate Gradient Method for Finite Element Analysis of the Incompressible Navier-Stokes Equations (유한요소 비압축성 유동장 해석을 위한 이중공액구배법의 GPU 기반 연산에 대한 연구)

  • Yoon, Jong Seon;Jeon, Byoung Jin;Jung, Hye Dong;Choi, Hyoung Gwon
    • Transactions of the Korean Society of Mechanical Engineers B
    • /
    • v.40 no.9
    • /
    • pp.597-604
    • /
    • 2016
  • A parallel algorithm for the bi-conjugate gradient method was developed in CUDA for parallel computation of the incompressible Navier-Stokes equations. The governing equations were discretized using a splitting P2P1 finite element method. An asymmetric stenotic flow problem was solved to validate the proposed algorithm, and the parallel performance of the GPU was then examined by measuring elapsed times. The GPU performance for sparse matrix-vector multiplication was also investigated with a matrix from a fluid-structure interaction problem. A kernel was written to simultaneously compute the inner product of each row of the sparse matrix with a vector, and it was optimized using both parallel reduction and memory coalescing. The effect of the warp on the parallel performance of the CUDA implementation was also examined during kernel construction. The GPU computation was more than 7 times faster than a single CPU in double precision.
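
The row-wise inner product at the heart of sparse matrix-vector multiplication can be illustrated with a minimal serial sketch. The abstract does not name a storage format, so the CSR (compressed sparse row) layout and the names `csr_spmv`, `values`, `col_idx`, and `row_ptr` are assumptions made for illustration. Each iteration of the outer loop corresponds to the per-row work that a GPU kernel would hand to a thread or a group of threads.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Return y = A @ x for a matrix A stored in CSR form (assumed layout)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        # Inner product of one sparse row with x; rows are independent,
        # which is what makes this loop a candidate for parallel execution.
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Hypothetical 3x3 matrix [[4, 0, 1], [0, 3, 0], [2, 0, 5]] in CSR form.
values = [4.0, 1.0, 3.0, 2.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
y = csr_spmv(values, col_idx, row_ptr, [1.0, 2.0, 3.0])
print(y)
```

In a GPU version, the accumulation inside one row is where a parallel reduction would apply, and the order in which `values` and `col_idx` are read is what memory coalescing targets.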

High-Risk Area for Human Infection with Avian Influenza Based on Novel Risk Assessment Matrix (위험 매트릭스(Risk Matrix)를 활용한 조류인플루엔자 인체감염증 위험지역 평가)

  • Sung-dae Park;Dae-sung Yoo
    • Korean Journal of Poultry Science
    • /
    • v.50 no.1
    • /
    • pp.41-50
    • /
    • 2023
  • Over the last decade, avian influenza (AI) has been considered an emerging disease that could become the next pandemic, particularly in countries like South Korea that have continuous animal outbreaks. In this situation, risk assessment is needed to prevent and prepare for human infection with AI. We therefore developed a risk assessment matrix for high-risk areas of human infection with AI in South Korea, based on the notion that risk is the product of hazard and vulnerability. In this matrix, highly pathogenic avian influenza (HPAI) in poultry farms was taken as the hazard and the number of poultry-associated production facilities as the vulnerability. The average number of HPAI in poultry farms at the level of 229 municipalities, which forms the hazard axis of the matrix, was predicted using a negative binomial regression on nationwide outbreak data from 2003 to 2018. The two components of the matrix were each classified into five groups using the K-means clustering algorithm and then multiplied, producing the area-specific risk level of human infection. As a result, Naju-si, Jeongeup-si, and Namwon-si were categorized as high-risk areas for human infection with AI. These findings can contribute to designing policies on human infection that minimize socio-economic damage.
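
The multiplication step of such a risk matrix can be sketched as follows. The abstract says only that each component is classified into five groups and that the groups are multiplied; labeling the groups with ordinal scores 1 to 5 and the names `risk_level` and `risk_matrix` are assumptions made for illustration.

```python
def risk_level(hazard_group, vulnerability_group):
    """Risk = hazard x vulnerability, with both groups scored 1..5 (assumed)."""
    for group in (hazard_group, vulnerability_group):
        if not 1 <= group <= 5:
            raise ValueError("group scores are assumed to lie in 1..5")
    return hazard_group * vulnerability_group


# Full 5x5 matrix: rows are hazard groups, columns are vulnerability groups.
risk_matrix = [[risk_level(h, v) for v in range(1, 6)] for h in range(1, 6)]
for row in risk_matrix:
    print(row)
```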

Multiplication of Infectious Flacherie and Densonucleosis Viruses in the Silkworm, Bombyx mori (가잠의 전염성 연화병 및 농핵병 바이러스 증식에 관한 연구)

  • 김근영;강석권
    • Journal of Sericultural and Entomological Science
    • /
    • v.25 no.2
    • /
    • pp.1-31
    • /
    • 1984
  • Flacherie, one of the most prevalent silkworm diseases, causes severe economic damage to the sericultural industry, and its pathogens have been shown to be flacherie virus (FV) and densonucleosis virus (DNV). Multiplication of these viruses in larvae of the silkworm, Bombyx mori, was studied by sucrose density gradient centrifugation and electron microscopy. Quantitative and qualitative changes in nucleic acids and proteins were investigated in the midgut and hemolymph of silkworm larvae infected separately with FV and DNV. Histopathological changes in the epithelial cells of the infected midgut were also examined by electron microscopy.
    1. Purified fractions of FV or DNV in sucrose density gradient centrifugation yielded one homogeneous, sharp peak without a shoulder, suggesting no heterogeneous materials in the preparation. Electron microscopy also revealed that FV and DNV were spherical particles, 27nm and 21nm in diameter, respectively.
    2. Silkworm larvae showed a decrease in body weight on the 6th day and in midgut weight on the 3rd day after inoculation with FV or DNV.
    3. DNA content was higher in the midgut when infected with FV or DNV, but the hemolymph of the infected larvae showed no difference during the first 6 days after inoculation, after which DNA concentration declined rapidly.
    4. RNA synthesis in silkworm larvae infected separately with FV and DNV was stimulated in the midgut, but RNA content was reduced in the hemolymph at the early stage of virus multiplication. At the late stage of virus multiplication, however, it was greatly reduced in both midgut and hemolymph.
    5. The concentration of protein in the midgut and hemolymph of silkworm larvae infected separately with FV and DNV showed no difference from that of healthy larvae at the early stage of virus multiplication, but it was significantly reduced at the late stage.
    6. There was no difference in the electrophoretic patterns of RNAs extracted from the midgut of healthy or virus-infected larvae.
    7. Electrophoresis of proteins extracted from the midgut infected with FV or DNV, carried out on the 1st and 5th day after virus inoculation, showed no difference from that of healthy larvae. On the 8th day after inoculation, however, an additional band with medium mobility appeared, a band with low mobility present in healthy larvae disappeared in the infected larvae, and a band with high mobility in healthy larvae separated into two fractions in the infected larvae.
    8. The electrophoretic pattern of hemolymph proteins of silkworm larvae infected separately with FV and DNV was similar to that of healthy larvae, but the concentration of hemolymph proteins in the infected larvae was lower than that of healthy larvae at the late stage.
    9. Two types of inclusion bodies were shown by pyronin-methyl green double staining in the columnar cells of the midgut on the 8th day after FV inoculation.
    10. Electron microscopy of the infected midgut revealed that the 'cytoplasmic wall' of the goblet cell thickened on the 5th day after FV inoculation, and several types of cytopathogenic structures, such as virus-specific vesicles, virus particles, linear structures, tubular structures, and high electron-dense matrices, were observed in the cytoplasm of the goblet cell. Virus particles were also observed in the microvilli, and structures similar to spherical virus particles were observed around the virus-specific vesicles, suggesting virus assembly in the cytoplasm.
    11. Fluorescence micrographs of the infected midgut stained with acridine orange showed that the nucleus, the site of DNV multiplication in the columnar cell, enlarged on the 5th day after virus inoculation.
    12. Electron microscopic examination of the DNV-infected midgut revealed that the nucleolus of the columnar cell was broken into granules, and those granules dispersed into the apical region of the nucleus on the 5th day after virus inoculation. On the 8th day after inoculation, the nucleus of the columnar cell was filled with high electron-dense virogenic stroma, which were similar to virus particles. These facts suggest that the virogenic stroma were the sites of virus assembly in the process of DNV multiplication.

An Efficient Load-Sharing Scheme for Internet-Based Clustering Systems (인터넷 기반 클러스터 시스템 환경에서 효율적인 부하공유 기법)

  • Choi, In-Bok;Lee, Jae-Dong
    • Journal of Korea Multimedia Society
    • /
    • v.7 no.2
    • /
    • pp.264-271
    • /
    • 2004
  • In Internet-based clustering systems, a load-sharing algorithm must deal with load imbalance caused by network characteristics and node heterogeneity. This paper proposes the Efficient Load-Sharing algorithm, which creates a scheduler based on the WF (Weighted Factoring) algorithm and then allocates tasks using an adaptive granularity strategy and a refined fixed granularity algorithm for better performance. In the adaptive granularity strategy, the master node reallocates tasks of relatively slower nodes to faster nodes; the refined fixed granularity algorithm overlaps the time slave nodes spend on computation with the time spent on network communication. For the simulation, matrix multiplication using PVM is performed in a heterogeneous clustering environment consisting of two different networks. Compared to other algorithms such as Send, GSS, and Weighted Factoring, the proposed algorithm improves performance by 75%, 79%, and 17%, respectively.
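
The weighted factoring baseline that the scheduler builds on can be sketched in simplified form: tasks are handed out in batches, each batch covers roughly half of the remaining work, and within a batch each node receives a chunk proportional to its weight. This is an illustrative sketch of that general idea only, not the paper's Efficient Load-Sharing algorithm; the adaptive and refined fixed granularity steps are not modeled, and the function name, the rounding rule, and the node weights are assumptions.

```python
import math


def weighted_factoring_chunks(total_tasks, weights):
    """Return a list of batches; each batch lists one chunk size per node."""
    total_weight = sum(weights)
    remaining = total_tasks
    batches = []
    while remaining > 0:
        # Each batch targets about half of the work that is still unassigned.
        batch_size = math.ceil(remaining / 2)
        batch = []
        for w in weights:
            # Faster nodes (larger weight) receive proportionally larger chunks.
            chunk = min(remaining, math.ceil(batch_size * w / total_weight))
            batch.append(chunk)
            remaining -= chunk
        batches.append(batch)
    return batches


# Hypothetical cluster: two slow nodes and one node that is twice as fast.
batches = weighted_factoring_chunks(100, [1, 1, 2])
for batch in batches:
    print(batch)
```

Chunks shrink from batch to batch, so a node that turns out slower than its weight suggests delays only a small amount of work near the end of the run.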

GPGPU Acceleration of SAT Algorithm with Propagation Routine Parallelization (전달 루틴의 병렬화를 통한 SAT 알고리즘의 GPGPU 가속화)

  • Kang, Hyeong-Ju
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.20 no.10
    • /
    • pp.1919-1926
    • /
    • 2016
  • Because of its enormous processing capability, the General-Purpose Graphics Processing Unit (GPGPU) has been applied to many fields, including electronic design automation. The SAT algorithm is one of the core algorithms in many electronic design automation tools. There have been some efforts to apply GPGPU to the SAT algorithm, but the algorithm is difficult to parallelize because of its characteristics. In this paper, I applied GPGPU to the SAT algorithm by parallelizing the propagation routine, which is relatively well suited to parallel processing. Based on the similarity of the propagation routine to sparse matrix multiplication, the data structure for the SAT problem is constructed and the parallel propagation routine is described. Atomic operations are used to prevent data loss between parallel threads. The experimental results for several benchmark SAT problems show that the proposed algorithm is superior to the previous GPGPU-based SAT solver.

An Algorithm For Load-Sharing and Fault-Tolerance In Internet-Based Clustering Systems (인터넷 기반 클러스터 시스템 환경에서 부하공유 및 결함허용 알고리즘)

  • Choi, In-Bok;Lee, Jae-Dong
    • The KIPS Transactions:PartA
    • /
    • v.10A no.3
    • /
    • pp.215-224
    • /
    • 2003
  • Because the Internet contains various networks and heterogeneous nodes, existing load-sharing algorithms are difficult to adapt to Internet-based clustering systems. In such systems, a load-sharing algorithm must consider conditions such as node heterogeneity, network characteristics, and load imbalance. This paper proposes an expanded-WF algorithm, based on the WF (Weighted Factoring) algorithm, for load-sharing in Internet-based clustering systems. The proposed algorithm uses an adaptive granularity strategy for load-sharing and duplicate execution of partial jobs for fault-tolerance. For the simulation, matrix multiplication using PVM is performed in a heterogeneous clustering environment consisting of two different networks. Compared to other algorithms such as Send, GSS, and Weighted Factoring, the proposed algorithm improves performance by 55%, 63%, and 20%, respectively. This paper also shows that the algorithm can provide fault-tolerance.

PSNR Comparison of DCT-domain Image Resizing Methods (DCT 영역 영상 크기 조절 방법들에 대한 PSNR 비교)

  • Kim Do nyeon;Choi Yoon sik
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.29 no.10C
    • /
    • pp.1484-1489
    • /
    • 2004
  • Given a video frame in terms of its 8×8 block-DCT coefficients, we wish to obtain a downsized or upsized version of this frame, also in terms of 8×8 block-DCT coefficients. The DCT, being a linear unitary transform, is distributive over matrix multiplication. This fact has been used for downsampling video frames in the DCT domain in Dugad's, Mukherjee's, and Park's methods. The downsampling and upsampling schemes combined preserve all the low-frequency DCT coefficients of the original image. This implies tremendous savings in coding the difference between the original frame (unsampled image) and its prediction (the upsampled image), which is desirable for many applications based on scalable video encoding. In this paper, we extend the earlier works to various DCT sizes for the case where an image is downsampled and then upsampled by a factor of two. In our experiments, the PSNR values improved whenever the DCT block size was increased. However, because the complexity also increases, there is a tradeoff. The experimental results provide important data for developing fast algorithms for compressed-domain image/video resizing.
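
The distributive property mentioned above can be checked numerically. With an orthonormal DCT matrix C, the 2-D transform of a block X is C X Cᵀ, and because CᵀC = I the transform of a product AB equals the product of the transforms of A and B. The sketch below is an illustration only: it uses 4×4 blocks rather than 8×8 to stay short, and the helper names and the test matrices are assumptions.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
                     for j in range(n)])
    return rows


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def dct2(block, c):
    """2-D DCT of a square block: C * block * C^T."""
    return matmul(matmul(c, block), transpose(c))


c = dct_matrix(4)
a = [[1, 2, 0, 1], [0, 1, 3, 2], [2, 0, 1, 1], [1, 1, 0, 2]]
b = [[2, 1, 1, 0], [0, 1, 2, 1], [1, 0, 1, 3], [1, 2, 0, 1]]

lhs = dct2(matmul(a, b), c)            # DCT(A B)
rhs = matmul(dct2(a, c), dct2(b, c))   # DCT(A) DCT(B)
max_diff = max(abs(lhs[i][j] - rhs[i][j]) for i in range(4) for j in range(4))
print(max_diff < 1e-9)
```

This is why a spatial-domain filtering or resampling step written as a matrix product can be carried out directly on DCT coefficients.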

Optimized hardware implementation of CIE1931 color gamut control algorithms for FPGA-based performance improvement (FPGA 기반 성능 개선을 위한 CIE1931 색역 변환 알고리즘의 최적화된 하드웨어 구현)

  • Kim, Dae-Woon;Kang, Bong-Soon
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.25 no.6
    • /
    • pp.813-818
    • /
    • 2021
  • This paper proposes an optimized hardware implementation of an existing CIE1931 color gamut control algorithm. Among the post-processing methods for dehazing algorithms, the existing algorithm requires relatively little computation but has the disadvantage of consuming many hardware resources, because it uses a Split multiplier to calculate wide bit widths. The proposed algorithm reduces computation and hardware size by merging the two predefined matrix multiplication operations of the existing algorithm into one. In addition, optimizing the Split multiplier computation makes the hardware more efficient to mount. The hardware was designed in Verilog HDL, and the logic synthesis results from Xilinx Vivado were compared to verify real-time processing performance in 4K environments. Furthermore, this paper verifies the performance of the proposed hardware with results from mounting it on two FPGAs.
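
Merging two predefined matrix multiplications into one relies on associativity: M2 (M1 v) = (M2 M1) v, so the product M2 M1 can be computed once ahead of time. The sketch below illustrates only that identity. The abstract does not give the matrices, so `M1`, `M2`, and the pixel vector are hypothetical integer values, and a fixed-point hardware version would additionally have to quantize the merged coefficients.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matvec(m, v):
    return [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]


# Hypothetical 3x3 color transforms (placeholders, not CIE1931 constants).
M1 = [[2, 1, 0], [0, 1, 1], [1, 0, 3]]
M2 = [[1, 0, 2], [0, 2, 1], [1, 1, 0]]

# Computed once, offline: the two transforms merged into a single matrix.
M_combined = matmul(M2, M1)

pixel = [10, 20, 30]
two_step = matvec(M2, matvec(M1, pixel))   # two multiplications per pixel
one_step = matvec(M_combined, pixel)       # one multiplication per pixel
print(two_step, one_step)
```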

Vehicle ECU Design Incorporating LIN/CAN Vehicle Interface with Kalman Filter Function (LIN/CAN 차량용 인터페이스와 칼만 필터 기능을 통합한 차량용 ECU 설계)

  • Jeong, Seonwoo;Kim, Yongbin;Lee, Seongsoo
    • Journal of IKEEE
    • /
    • v.25 no.4
    • /
    • pp.762-765
    • /
    • 2021
  • In this paper, an automotive ECU (electronic control unit) with a Kalman filter accelerator is designed and implemented. RISC-V is used as the processor core. An accelerator for Kalman filter matrix operations, a CAN (controller area network) controller for the in-vehicle network, and a LIN (local interconnect network) controller are designed and embedded. The Kalman filter operation consists of a time update process and a measurement update process. The current state variable and its error covariance are estimated in the time update process. Final values are corrected using the input measurement data and the Kalman gain in the measurement update process. Software implementations usually use floating-point multiplication, but this paper uses a fixed-point multiplier, chosen with accuracy analysis in mind, to reduce hardware area. In 28nm silicon fabrication, the operating frequency, area, and gate count are 100MHz, 0.37mm², and 760k gates, respectively.
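
The two Kalman filter phases described above can be sketched for the scalar case, where every matrix reduces to a single number. This is a floating-point illustration of the standard update equations only; it does not model the paper's fixed-point accelerator, and the function names, parameter names, and example values are assumptions.

```python
def time_update(x, p, a, q):
    """Predict the state and its error covariance (scalar case)."""
    x_pred = a * x
    p_pred = a * p * a + q
    return x_pred, p_pred


def measurement_update(x_pred, p_pred, z, h, r):
    """Correct the prediction with measurement z using the Kalman gain."""
    k = p_pred * h / (h * p_pred * h + r)   # Kalman gain
    x_new = x_pred + k * (z - h * x_pred)   # corrected state
    p_new = (1.0 - k * h) * p_pred          # corrected error covariance
    return x_new, p_new, k


# Hypothetical values: static state model, unit measurement noise.
x_pred, p_pred = time_update(x=0.0, p=1.0, a=1.0, q=0.0)
x_new, p_new, k = measurement_update(x_pred, p_pred, z=2.0, h=1.0, r=1.0)
print(x_new, p_new, k)
```

In the matrix form, each product above becomes a matrix multiplication and the division becomes a matrix inverse, which is the part a matrix accelerator would take over.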