• Title/Summary/Keyword: non-parallel data (비병렬 데이터)

A new warp scheduling technique for improving the performance of GPUs by utilizing MSHR information (GPU 성능 향상을 위한 MSHR 정보 기반 워프 스케줄링 기법)

  • Kim, Gwang Bok;Kim, Jong Myon;Kim, Cheol Hong
    • The Journal of Korean Institute of Next Generation Computing / v.13 no.3 / pp.72-83 / 2017
  • GPUs can provide high throughput with latency hiding by executing many warps in parallel. The MSHRs (Miss Status Holding Registers) of the L1 data cache track cache miss requests until the required data is serviced from lower-level memory. In recent GPUs, excessive requests for cache resources cause cache resource reservation failures, which leave GPU resources underutilized. In this paper, we propose a new warp scheduling technique to reduce stall cycles under MSHR resource shortage. The cache miss rate of each warp is predicted based on the observation that each warp shows a similar cache miss rate over a long period. Warps with low miss rates, or computation-intensive warps, are given high priority for issue when the MSHR is full. Our proposal improves GPU performance by using cache resources more efficiently, based on cache miss rate prediction and monitoring of the MSHR entries. According to our experimental results, the proposed scheduling technique reduces reservation fail cycles by 25.7% and increases IPC by 6.2% compared to the loose round-robin scheduler.
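
The scheduling policy described in this abstract can be illustrated with a minimal sketch. The abstract does not give the paper's predictor or hardware details, so everything here is an assumption: the names `Warp` and `pick_warp` are hypothetical, the miss rate is simply the running ratio of misses to accesses, and the fallback policy is a simplified loose round robin.

```python
from dataclasses import dataclass


@dataclass
class Warp:
    """Hypothetical per-warp bookkeeping; not taken from the paper."""
    wid: int
    hits: int = 0
    misses: int = 0

    def miss_rate(self) -> float:
        # Running miss ratio, used as a stand-in for the paper's predictor.
        total = self.hits + self.misses
        return self.misses / total if total else 0.0


def pick_warp(ready, last_issued, mshr_used, mshr_entries):
    """Return the next warp to issue, or None if no warp is ready."""
    if not ready:
        return None
    if mshr_used >= mshr_entries:
        # MSHR is full: prefer the warp least likely to need a new entry.
        return min(ready, key=lambda w: (w.miss_rate(), w.wid))
    # Otherwise fall back to round robin: the next warp id after the
    # last issued one, wrapping around to the lowest id.
    later = [w for w in ready if w.wid > last_issued]
    return min(later or ready, key=lambda w: w.wid)
```

With three ready warps whose miss ratios are 0.9, 0.1, and 0.5, a full MSHR selects the 0.1 warp regardless of round-robin order, while a non-full MSHR simply advances to the next warp id.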

Design of Binary Constant Envelope System using the Pre-Coding Scheme in the Multi-User CDMA Communication System (다중 사용자 CDMA 통신 시스템에서 프리코딩 기법을 사용한 2진 정진폭 시스템 설계)

  • 김상우;유흥균;정순기;이상태
    • The Journal of Korean Institute of Electromagnetic Engineering and Science / v.15 no.5 / pp.486-492 / 2004
  • In this paper, we propose a binary CA-CDMA (constant amplitude CDMA) system that uses a pre-coding method to solve the high PAPR problem caused by multi-user signal transmission in CDMA systems. The 4-user CA-CDMA, the basis of the proposed binary CA-CDMA system, produces a binary output signal for 4 input users. It produces an output of binary ($\pm 2$) amplitude by using a parity signal resulting from the XOR operation on the 4 users' data. No additional sub-channel or bandwidth is necessary, because the parity signal is transmitted together with the user data and can be easily recovered in the receiver. The number of users can be extended by simple repetition of the basic binary 4-user CA-CDMA. For example, a binary 16-user CA-CDMA is easily built by placing four 4-user CA-CDMA systems in parallel and feeding their four outputs to a fifth 4-user CA-CDMA system as input, because the output signal of each 4-user CA-CDMA is also binary. By the same extension procedure, binary 64-user and 256-user CA-CDMA systems with constant amplitude can be built. As a result, the code rate of the proposed CA-CDMA system is exactly 1, and binary CA-CDMA does not change the transmission rate while keeping the output signal constant (PAPR = 0 dB). Therefore, the power efficiency of the HPA can be maximized without nonlinear distortion. The simulation results verify that the conventional CDMA system has a multi-level output signal, whereas the proposed binary CA-CDMA system always produces a binary output. It is also found that the BER of the conventional CDMA system is increased by a nonlinear HPA, whereas the BER of the proposed binary CA-CDMA system is unchanged.

Performance Evaluation of Multi-User Detectors Employing Subtractive Interference Cancellation Schemes for a DS-CDMA System (감산형 간섭제거기법을 적용한 DS-CDMA 다중사용자 검파기의 성능분석)

  • Seo, Jung-Wook;Kim, Young-Chul;Oh, Chang-Heon;Ko, Bong-Jin;Cho, Sung-Joon
    • Journal of Advanced Navigation Technology / v.6 no.1 / pp.17-26 / 2002
  • In this paper, we analyze the BER (Bit Error Rate) performance of multi-user detectors employing SIC (Successive Interference Cancellation) and PIC (Parallel Interference Cancellation), the representative subtractive interference cancellation schemes. We also consider an MUD structure employing HIC (Hybrid Interference Cancellation), which combines the SIC and PIC schemes, and analyze its BER performance. We evaluate the BER performance of the SIC and HIC schemes, which make soft decisions to generate the tentative data bits used for interference cancellation, in MAI and noise environments. Numerical analysis and computer simulation show that HIC removes the effect of MAI more efficiently than the other schemes, which improves the BER performance and increases the capacity of DS-CDMA systems regardless of the power control conditions. The reason is that the SIC stage at the front of the HIC can solve the near-far problem caused by imperfect power control, and the PIC stage at the rear can improve the performance further.

Design of a Neural Network PI Controller for F/M of Heavy Water Reactor Actuator Pressure (신경회로망과 PI제어기를 이용한 중수로 핵연료 교체 로봇의 구동압력 제어)

  • Lim, Dae-Yeong;Lee, Chang-Goo;Kim, Young-Baik;Kim, Young-Chul;Chong, Kil-To
    • Journal of the Korea Academia-Industrial cooperation Society / v.13 no.3 / pp.1255-1262 / 2012
  • At the Wolsong nuclear power plant, the required operating pressure is currently regulated by a PI controller. A PI controller has a simple structure and can satisfy the design requirements through gain tuning. However, with fixed gains it is difficult to maintain control when parameters change, for example through losses in the valves and pipes. To solve this problem, it is desirable to configure the controller so that the PI gains change dynamically or the PI controller output is compensated. The aim of this research is to design a controller that remains stable under parameter variations and reduces error and vibration. The proposed PI/NN control technique combines a PI controller and a neural network controller in parallel; the neural network part compensates the controller output for parameter changes, so that the design is robust. Evaluating controller performance directly on the real process is difficult. Therefore, we developed a simulator model using real process data and compared the simulation results against the process characteristics under parameter changes. As a result, the PI/NN controller was confirmed to reduce error and vibration.

Implementation of Hardware Data Prefetcher Adaptable for Various State-of-the-Art Workload (다양한 최신 워크로드에 적용 가능한 하드웨어 데이터 프리페처 구현)

  • Kim, KangHee;Park, TaeShin;Song, KyungHwan;Yoon, DongSung;Choi, SangBang
    • Journal of the Institute of Electronics and Information Engineers / v.53 no.12 / pp.20-35 / 2016
  • In this paper, in order to reduce the delay and area of the partial product accumulation (PPA) of a parallel decimal multiplier, we propose a tree architecture composed of multi-operand decimal CSAs and an improved CLA. The proposed tree using multi-operand CSAs reduces the partial products quickly. Since the input range of the recoder of the CSA is limited, the CSA can use the simplest logic. In addition, using the multi-operand decimal CSAs to add decimal numbers that have a limited range at specific locations of the architecture reduces the partial products efficiently. Also, the final BCD result is obtained faster by improving the logic of the decimal CLA. To evaluate the performance of the proposed partial product accumulation, it was synthesized using Design Compiler with a 180 nm CMOS technology library. Synthesis results show that the delay of the proposed partial product accumulation is reduced by 15.6% and the area by 16.2% compared with the general method. Also, the total delay and area are still reduced even though the delay and area of the CLA are increased.

Fast Image Pre-processing Algorithms Using SSE Instructions (SSE 명령어를 이용한 영상의 고속 전처리 알고리즘)

  • Park, Eun-Soo;Cui, Xuenan;Kim, Jun-Chul;Im, Yu-Cheong;Kim, Hak-Il
    • Journal of the Institute of Electronics Engineers of Korea SP / v.46 no.2 / pp.65-77 / 2009
  • This paper proposes fast image processing algorithms using SSE (Streaming SIMD Extensions) instructions. CPUs supporting SSE instructions have 128-bit XMM registers; the data held in these registers are processed simultaneously in SIMD (Single Instruction Multiple Data) mode. This paper develops new SIMD image processing algorithms for the mean filter, the Sobel horizontal edge detector, and the morphological erosion operation, which are the most widely used in automated optical inspection systems, and compares their processing times. To evaluate the processing time objectively, the developed algorithms are compared with OpenCV 1.0 operating in SISD (Single Instruction Single Data) mode, and with Intel's IPP 5.2 and MIL 8.0, which are fast image processing libraries supporting SIMD mode. The experimental results show that the proposed algorithms are on average 8 times faster than the SISD-mode image processing library and 1.4 times faster than the SIMD fast image processing libraries. The proposed algorithms demonstrate their applicability to practical high-speed image processing systems without commercial image processing libraries or additional hardware.

A Design of AES-based WiBro Security Processor (AES 기반 와이브로 보안 프로세서 설계)

  • Kim, Jong-Hwan;Shin, Kyung-Wook
    • Journal of the Institute of Electronics Engineers of Korea SD / v.44 no.7 s.361 / pp.71-80 / 2007
  • This paper describes an efficient hardware design of a WiBro security processor (WBSec) supporting the security sub-layer of the WiBro wireless internet system. The WBSec processor, which is based on the AES (Advanced Encryption Standard) block cipher algorithm, performs data encryption/decryption, authentication/integrity, and key encryption/decryption for packet data protection in wireless networks. It carries out the ECB, CTR, CBC, and CCM modes and key wrap/unwrap with two AES cores working in parallel. To achieve an area-efficient implementation, two design techniques are considered. First, the round transformation block within the AES core is designed using a shared structure for encryption/decryption. Second, the SubByte/InvSubByte blocks, which require the largest hardware in the AES core, are implemented using a field transformation technique. As a result, the gate count of WBSec is reduced by about 25% compared with a conventional LUT (Look-Up Table)-based design. The WBSec processor designed in Verilog-HDL has about 22,350 gates, and the estimated throughput is about 16 Mbps in key wrap mode and a maximum of 213 Mbps in CCM mode; thus it can be used for the hardware design of WiBro security systems.

A Study on the Implementation of the Multi-Process Structured ISDN Terminal Adaptor for Sending the Ultra Sound Medical Images (다중처리 구조를 갖는 초음파 의료영상 전송용 ISDN(Integrated Services Digital Network) TA(Terminal Adaptor) 구현에 관한 연구)

  • 남상규;이영후
    • Journal of Biomedical Engineering Research / v.15 no.3 / pp.317-324 / 1994
  • This paper proposes a new method for implementing the ISDN (integrated services digital network) LAPD (link access procedure on the D-channel) and LAPB (link access procedure on the B-channel) protocols. The proposed method implements the ISDN LAPD protocol on a multi-tasking operating system and adopts a kernel part, the operating system modified for the target board. The features of the implemented system are as follows: (1) parallel processing of the events generated at each layer; (2) software support from the kernel part for the timers needed to implement the ISDN LAPD protocol; (3) the SAP (Service Access Point) recommended by CCITT, composed using the port function of the operating system. With the proposed method, the protocols of ISDN layer 1, layer 2, and layer 3 (call control) were implemented using the kernel part, and related tests were carried out by connecting the ISDN terminal simulator to the ISDN S-interface system using the ISDN LAPD protocol. The results showed that ISDN S-interface terminals could be discriminated by TEI (Terminal Equipment Identifier) assignment in layer 2 (LAPD), and the message transmission of layer 3 was verified by establishing multi-frame transmission. Then, through the path established by the LAPD protocol, user data was transferred and received on the B-channel with the LAPB protocol. Therefore, since a new, efficient ISDN S-interface environment was implemented in this work, it was verified that the implemented system can be connected to ISDN in the future to transfer medical image data.

Implementation of High-radix Modular Exponentiator for RSA using CRT (CRT를 이용한 하이래딕스 RSA 모듈로 멱승 처리기의 구현)

  • 이석용;김성두;정용진
    • Journal of the Korea Institute of Information Security & Cryptology / v.10 no.4 / pp.81-93 / 2000
  • As a methodological approach to improving the processing performance of modular exponentiation, the primary arithmetic operation in the RSA crypto algorithm, we present a new RSA hardware architecture based on high-radix modular multiplication and the CRT (Chinese Remainder Theorem). By implementing the modular multiplier using radix-16 arithmetic, we reduced the number of PEs (Processing Elements) to a quarter of that in the binary arithmetic scheme. This in turn reduces the number of clock cycles and the delay of the pipelining flip-flops to a quarter, respectively. Because the receiver knows p and q, the factors of N, it is possible to apply the CRT to the decryption process. To use the CRT, we made two s/2-bit multipliers operate in parallel during decryption, which achieved 4 times faster performance than without the CRT. In the encryption phase, the two s/2-bit multipliers can be connected to form an s-bit linear multiplier for the s-bit arithmetic operation. We limited the encryption exponent size to 17 bits to maintain high speed. We implemented a linear array modular multiplier by projecting the DG of the Montgomery algorithm horizontally. The hardware proposed here performs encryption at a bit rate of 15 Mbps and decryption at 1.22 Mbps, when estimated with reference to the Samsung 0.5 µm CMOS Standard Cell Library, which is the fastest among the publications at present.
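
The CRT decryption step mentioned in this abstract can be illustrated in software. The sketch below is not the paper's hardware design; it assumes the standard textbook formulation (two half-size exponentiations recombined with Garner's formula), and the function name `rsa_decrypt_crt` and the toy key are hypothetical. It needs Python 3.8 or later for `pow(q, -1, p)`.

```python
def rsa_decrypt_crt(c, d, p, q):
    """Decrypt c with private exponent d using the known factors p and q."""
    # Reduced exponents: each exponentiation works on half-size operands.
    dp = d % (p - 1)
    dq = d % (q - 1)
    # The two exponentiations are independent, which is what allows two
    # s/2-bit multipliers to run in parallel in a hardware design.
    m1 = pow(c, dp, p)
    m2 = pow(c, dq, q)
    # Garner's recombination of the two residues into a value mod p*q.
    q_inv = pow(q, -1, p)
    h = (q_inv * (m1 - m2)) % p
    return m2 + h * q


# Toy key for illustration only; real keys are far larger.
p, q, e, d = 61, 53, 17, 2753
n = p * q
message = 65
cipher = pow(message, e, n)
recovered = rsa_decrypt_crt(cipher, d, p, q)
```

Here `recovered` equals the result of the direct computation `pow(cipher, d, n)`; the CRT form reaches it with two exponentiations on half-length moduli.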

Improving Lifetime Prediction Modeling for SiON Dielectric nMOSFETs with Time-Dependent Dielectric Breakdown Degradation (SiON 절연층 nMOSFET의 Time Dependent Dielectric Breakdown 열화 수명 예측 모델링 개선)

  • Yeohyeok Yun
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.16 no.4 / pp.173-179 / 2023
  • This paper analyzes the time-dependent dielectric breakdown (TDDB) degradation mechanism for each stress region of Peri devices manufactured with a 4th-generation VNAND process, and presents a complementary lifetime prediction model that improves speed and accuracy over a wider reliability evaluation region compared to the conventional models. SiON dielectric nMOSFETs were measured 10 times each under 5 constant voltage stress (CVS) conditions. The analysis of stress-induced leakage current (SILC) confirmed the significance of the field-based degradation mechanism in the low electric field region and of the current-based degradation mechanism in the high field region. Time-to-failure (TF) was extracted from the Weibull distribution to ascertain the lifetime prediction limitations of the conventional E-model and 1/E-model, and a parallel complementary model including both the electric-field-based and current-based degradation mechanisms was proposed by extracting and combining the thermal bond breakage rate constant (k) of each model. Finally, when predicting the lifetime from the measured TDDB data, the proposed complementary model predicts lifetime faster and more accurately, even in the wider electric field region, than the conventional E-model and 1/E-model.
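
One way to read the "parallel complementary model" in this abstract is as a sum of the two rate constants. The abstract does not state the paper's formula, so the sketch below assumes the commonly used forms TF = A·exp(−γE) for the E-model and TF = B·exp(G/E) for the 1/E-model, with k = 1/TF for each. All parameter values and function names are hypothetical placeholders, not fitted results.

```python
import math

# Hypothetical parameters (field in MV/cm, time in seconds); not from the paper.
A, GAMMA = 1e15, 2.0     # E-model prefactor and field acceleration factor
B, G = 6.7e-3, 120.0     # 1/E-model prefactor and field constant


def tf_e_model(field):
    """E-model: lifetime falls exponentially with the field."""
    return A * math.exp(-GAMMA * field)


def tf_inv_e_model(field):
    """1/E-model: lifetime falls exponentially with the inverse field."""
    return B * math.exp(G / field)


def tf_parallel(field):
    """Assumed combination: the two mechanisms act in parallel, so rates add."""
    k_total = 1.0 / tf_e_model(field) + 1.0 / tf_inv_e_model(field)
    return 1.0 / k_total


low_field, high_field = 3.0, 12.0
```

With these placeholder parameters the combined lifetime follows the E-model at the low field and the 1/E-model at the high field, which matches the abstract's statement about which mechanism matters in each region.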