• Title/Summary/Keyword: Memory access

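Performance Analysis and Comparison of Bus Arbitration Protocols for Multiprocessor Computer Systems (다중프로세서 컴퓨터시스템을 위한 버스중재 프로토콜의 성능 분석 및 비교)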

  • 김병량
    • Proceedings of the Korea Society for Simulation Conference / 1992.10a / pp.2-2 / 1992
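  • As computers are used in an ever-wider range of fields and the demand for higher computing power grows, much research is being done on improving system architecture, alongside faster processors, to raise computer performance. Super-minicomputer-class mid-range computers with a multiprocessor system architecture, in which several CPUs exist in one system, commonly adopt a bus as the interconnection network. A bus structure has simple hardware and is easy to implement, but because multiple system resources (processors, memory modules, and I/O modules) share the bus, contention causes delays. Ways to reduce the resulting performance degradation include increasing the number of buses and designing an optimal control protocol. This study analyzes and compares the performance of four representative bus arbitration protocols in a multiprocessor system with multiple buses in order to identify the optimal protocol. For systems implemented with such large-scale hardware, analyzing and comparing system performance as a function of the main design factors is an essential step in the design phase. However, because building hardware for analysis takes considerable time and cost, software simulation is widely used. Our research team developed a multiprocessor system simulator in the dedicated simulation language SLAM II, made the bus arbitration protocol easy to change, and compared the performance of each protocol. The protocols compared are the fixed-priority scheme, the FIFO (first-in first-out) scheme, the round-robin scheme, and the rotating-priority scheme. The experiments analyzed a variety of system environments by varying the main system components: the numbers of processors, memory modules, and buses. The workload, the inter-memory access request time interval, was modeled with either a fixed value or a probability distribution function as needed. In particular, we verified that the performance of each protocol can differ depending on the characteristics of the program being executed, and we also compared the protocols' performance with respect to memory locality.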
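The four arbitration policies compared above can be sketched as interchangeable functions plugged into one simulator, echoing the idea of making the protocol easy to swap. This is a minimal Python sketch, not the authors' SLAM II model: it assumes a single bus, one-cycle transfers, and per-processor request queues, and the round-robin and rotating-priority rules are assumed definitions because the abstract does not spell them out. The names `simulate`, `fixed_priority`, `fifo`, `round_robin`, and `rotating_priority` are hypothetical.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Optional, Tuple

# pending maps processor id -> arrival cycle of its oldest waiting request
Arbiter = Callable[[Dict[int, int], int, dict], Optional[int]]


def fixed_priority(pending, n, state):
    """Processor 0 always has the highest priority."""
    return min(pending) if pending else None


def fifo(pending, n, state):
    """Grant the oldest waiting request; ties go to the lower processor id."""
    return min(pending, key=lambda p: (pending[p], p)) if pending else None


def round_robin(pending, n, state):
    """Assumed definition: search starts just after the last granted processor."""
    last = state.get("last", n - 1)
    for offset in range(1, n + 1):
        p = (last + offset) % n
        if p in pending:
            state["last"] = p
            return p
    return None


def rotating_priority(pending, n, state):
    """Assumed definition: the top-priority position advances by one every
    arbitration cycle, whether or not anyone was granted."""
    top = state.get("top", 0)
    state["top"] = (top + 1) % n
    for offset in range(n):
        p = (top + offset) % n
        if p in pending:
            return p
    return None


def simulate(arbiter: Arbiter, requests: List[Tuple[int, int]], n: int) -> List[int]:
    """Single bus, one-cycle transfers. requests holds (arrival_cycle, processor)
    pairs; returns the processor granted in each busy cycle, in order."""
    arrivals = sorted(requests)
    queues: Dict[int, Deque[int]] = defaultdict(deque)
    state: dict = {}
    grants: List[int] = []
    cycle = i = 0
    while i < len(arrivals) or any(queues.values()):
        # enqueue every request that has arrived by this cycle
        while i < len(arrivals) and arrivals[i][0] <= cycle:
            t, p = arrivals[i]
            queues[p].append(t)
            i += 1
        pending = {p: q[0] for p, q in queues.items() if q}
        winner = arbiter(pending, n, state)
        if winner is not None:
            queues[winner].popleft()
            grants.append(winner)
        cycle += 1
    return grants
```

For example, with requests `[(0, 2), (0, 0), (0, 1), (1, 0)]` on three processors, fixed priority grants `[0, 0, 1, 2]` (processor 0 jumps ahead again at cycle 1), while FIFO grants `[0, 1, 2, 0]`. A workload generator drawing inter-request intervals from a probability distribution could feed `simulate` in the same way.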

TinyECCK : Efficient Implementation of Elliptic Curve Cryptosystem over GF$(2^m)$ on 8-bit Micaz Mote (TinyECCK : 8 비트 Micaz 모트에서 GF$(2^m)$상의 효율적인 타원곡선 암호 시스템 구현)

  • Seo, Seog-Chung;Han, Dong-Guk;Hong, Seok-Hie
    • Journal of the Korea Institute of Information Security & Cryptology / v.18 no.3 / pp.9-21 / 2008
  • In this paper, we revisit a generally accepted opinion: implementing an Elliptic Curve Cryptosystem (ECC) over GF$(2^m)$ on sensor motes with a small word size is not appropriate because partial XOR multiplication over GF$(2^m)$ is not efficiently supported by current low-power microprocessors. Although there are some implementations over GF$(2^m)$ on sensor motes, their performance is unsatisfactory because redundant memory accesses make field multiplication and reduction inefficient. We therefore propose several techniques for eliminating unnecessary memory access instructions. With the proposed strategies, the running times of field multiplication and reduction over GF$(2^{163})$ decrease by 21.1% and 24.7%, respectively. These savings noticeably reduce the execution time of Elliptic Curve Digital Signature Algorithm (ECDSA) operations (signing and verification), by around $15{\sim}19\%$.
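The field operations whose memory traffic the paper targets, multiplication and reduction over GF$(2^{163})$, can be illustrated with a plain Python sketch that represents each element as an integer bit vector. It assumes the NIST reduction polynomial $f(x) = x^{163} + x^7 + x^6 + x^3 + 1$; the function names are hypothetical, and because Python integers hide word-level loads and stores, the sketch shows only the arithmetic, not the paper's memory-access optimizations for 8-bit words.

```python
M = 163
# Reduction polynomial f(x) = x^163 + x^7 + x^6 + x^3 + 1
# (the pentanomial NIST uses for GF(2^163)); bit i holds the coefficient of x^i.
F = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1


def clmul(a: int, b: int) -> int:
    """Carry-less polynomial multiplication: partial products are combined with XOR."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def gf_reduce(c: int) -> int:
    """Reduce c(x) modulo f(x), clearing bits from the top down to bit M."""
    for i in range(c.bit_length() - 1, M - 1, -1):
        if (c >> i) & 1:
            c ^= F << (i - M)
    return c


def gf_mul(a: int, b: int) -> int:
    """Field multiplication in GF(2^163): multiply, then reduce."""
    return gf_reduce(clmul(a, b))
```

For instance, `gf_mul(1 << 162, 0b10)` multiplies $x^{162}$ by $x$, and the reduction folds $x^{163}$ back to $x^7 + x^6 + x^3 + 1$.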

Design of an Optimized GPGPU for Data Reuse in Deep Learning Convolution (딥러닝 합성곱에서 데이터 재사용에 최적화된 GPGPU 설계)

  • Nam, Ki-Hun;Lee, Kwang-Yeob;Jung, Jun-Mo
    • Journal of IKEEE / v.25 no.4 / pp.664-671 / 2021
  • This paper proposes a GPGPU architecture that reduces the number of operations and memory accesses by effectively applying data reuse to convolutional neural networks (CNNs). Convolution is a two-dimensional operation on a kernel and input data, performed by sliding the kernel across the input. Instead of repeatedly loading the kernel from cache memory until the convolution operation is completed, we propose a reuse method that keeps it in internal registers. Because the GPGPU executes instructions in SIMT fashion, a serial operation method is applied to the convolution to increase the benefit of data reuse. For register-based data reuse, the kernel size was fixed at 4×4, and the GPGPU was designed with the warp size and register banks chosen to support it effectively. To verify the performance of the designed GPGPU on CNNs, we implemented it on an FPGA, ran LeNet, and measured performance on AlexNet in comparison with TensorFlow. The measured training time for one iteration of AlexNet is 0.468 s, and the inference time is 0.135 s.
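The kernel-reuse idea can be illustrated by counting kernel loads in two versions of the same sliding-window convolution. In this Python sketch, `CountingMemory` is a hypothetical stand-in for cache memory and a local list stands in for the internal registers; it models only the reuse pattern, not SIMT execution, warps, or register banks. The 4×4 kernel size follows the abstract.

```python
from typing import List

Matrix = List[List[float]]


class CountingMemory:
    """Hypothetical stand-in for cache memory that counts every element load."""

    def __init__(self, data: Matrix) -> None:
        self.data = data
        self.loads = 0

    def load(self, r: int, c: int) -> float:
        self.loads += 1
        return self.data[r][c]


def conv2d_reload(inp: CountingMemory, ker: CountingMemory, k: int) -> Matrix:
    """'Valid' 2D convolution (CNN-style cross-correlation) that re-reads all
    k*k kernel weights from memory at every output position."""
    out_h = len(inp.data) - k + 1
    out_w = len(inp.data[0]) - k + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            out[y][x] = sum(inp.load(y + i, x + j) * ker.load(i, j)
                            for i in range(k) for j in range(k))
    return out


def conv2d_register_reuse(inp: CountingMemory, ker: CountingMemory, k: int) -> Matrix:
    """Same result, but the kernel is loaded once into a local array (standing in
    for registers) and reused for every position of the sliding window."""
    regs = [[ker.load(i, j) for j in range(k)] for i in range(k)]
    out_h = len(inp.data) - k + 1
    out_w = len(inp.data[0]) - k + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            out[y][x] = sum(inp.load(y + i, x + j) * regs[i][j]
                            for i in range(k) for j in range(k))
    return out
```

On a 6×6 input with a 4×4 kernel there are 3×3 output positions, so `conv2d_reload` performs 9 × 16 = 144 kernel loads while `conv2d_register_reuse` performs 16, with identical outputs.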

Analysis of Unmet Healthcare Needs and Risk Factors to Improve the Life Care of Osteoporosis Patients (골다공증 환자의 라이프 케어 증진을 위한 미충족 의료실태와 위험요인 분석)

  • Park, Hyeon-Hee
    • Journal of Korea Entertainment Industry Association / v.14 no.2 / pp.225-235 / 2020
  • Purpose: This is a descriptive, secondary-analysis study that uses panel data to analyze unmet healthcare needs and their risk factors in order to improve the life care of osteoporosis patients. Methods: The subjects were 941 patients diagnosed with osteoporosis in the Korea Medical Panel 2015 data (β-version 1.0). Data were analyzed with chi-square tests and logistic regression in SPSS/WIN 22.0. Results: The rate of unmet healthcare needs among osteoporosis patients was 22.6%. The factors associated with unmet healthcare needs were education level and age in Model I (demographic factors), and eating problems, memory problems, activity limitation, and disability in Model II. In Model III, which added socio-psychological factors, eating problems, memory problems, total family income, and pain/discomfort were identified. Conclusion: These results should be considered when planning medical policies to improve the life care of osteoporosis patients; reducing unmet healthcare needs requires better access to medical services and realistic preventive and mediating interventions.

4-way Search Window for Improving The Memory Bandwidth of High-performance 2D PE Architecture in H.264 Motion Estimation (H.264 움직임추정에서 고속 2D PE 아키텍처의 메모리대역폭 개선을 위한 4-방향 검색윈도우)

  • Ko, Byung-Soo;Kong, Jin-Hyeung
    • Journal of the Institute of Electronics Engineers of Korea SD / v.46 no.6 / pp.6-15 / 2009
  • In this paper, a new 4-way search window is designed for a high-performance 2D PE architecture in H.264 Motion Estimation (ME) to improve memory bandwidth. Whereas existing 2D PE architectures reuse the overlapping data of adjacent search windows scanned in 1 or 3 directions, the new window reuses the overlapping data not only of adjacent search windows but also of adjacent multiple scanning (window) paths, increasing the reuse of retrieved search window data. To scan adjacent windows along multiple paths, rather than along a single raster or zigzag path, bidirectional row and column window scanning is used, resulting in the 4-way (up, down, left, right) search window. The proposed 4-way search window improves the reuse of overlapping window data and reduces the redundancy access factor to 3.1, whereas the 1/3-way search windows require $7.7{\sim}11$ times redundant data retrieval. Thus, the new 4-way search window scheme improves memory bandwidth by $58{\sim}70\%$ compared with the 1/3-way search windows. The 2D PE architecture for the 4-way search window consists of a $16{\times}16$ PE array, which computes the absolute differences between the current and reference frames, and a $5{\times}16$ reuse array, which stores the overlapping data of adjacent search windows and multiple scanning paths. Reference data can be loaded upward or downward into the new 2D PE depending on the scanning direction, and the reuse array is combined with the PE array, rotating left as well as right, to exploit the overlapping data of adjacent scan paths. In experiments, the new 4-way search window implementation in a Magnachip $0.18{\mu}m$ process handles HD ($1280{\times}720$) video at 30 fps at 149.25 MHz, with 1 reference frame, a $48{\times}48$ search area, and $16{\times}16$ macroblocks.
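What the $16{\times}16$ PE array computes for each candidate position is a sum of absolute differences (SAD) between the current macroblock and a block of the reference frame. The following Python sketch is an exhaustive full-search block matcher under that assumption; it does not model the 4-way scanning order or the reuse array, which determine how often reference pixels are re-fetched rather than what is computed. The names `sad` and `full_search` are hypothetical.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]


def sad(cur: Frame, ref: Frame, by: int, bx: int, ry: int, rx: int, n: int) -> int:
    """Sum of absolute differences between the n x n block of cur at (by, bx)
    and the n x n candidate block of ref at (ry, rx)."""
    return sum(abs(cur[by + i][bx + j] - ref[ry + i][rx + j])
               for i in range(n) for j in range(n))


def full_search(cur: Frame, ref: Frame, by: int, bx: int, n: int, r: int
                ) -> Optional[Tuple[Tuple[int, int], int]]:
    """Exhaustive block matching over displacements in [-r, r] x [-r, r].
    Candidates falling outside ref are skipped. Returns ((dy, dx), best_sad)."""
    best = None
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            ry, rx = by + dy, bx + dx
            if 0 <= ry and 0 <= rx and ry + n <= len(ref) and rx + n <= len(ref[0]):
                cost = sad(cur, ref, by, bx, ry, rx, n)
                if best is None or cost < best[1]:
                    best = ((dy, dx), cost)
    return best
```

With $16{\times}16$ macroblocks, a $48{\times}48$ search area presumably corresponds to `n=16, r=16`, that is, 33 × 33 = 1089 candidate positions per macroblock.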

Generating Local Addresses for Block-Cyclic Distributed Array (블록-순환으로 분배된 배열의 지역 주소 생성)

  • Kwon, Oh-Young;Kim, Tae-Geun;Han, Tack-Don;Yang, Sung-Bong;Kim, Shin-Dug
    • The Transactions of the Korea Information Processing Society / v.5 no.11 / pp.2835-2844 / 1998
  • Most data-parallel languages provide the block-cyclic distribution (cyclic(k)), one of the most general regular distributions. Generating local addresses for an array section A(l:h:s) under a block-cyclic distribution requires efficient compile-time or run-time methods. In this paper, two local address generation methods for the block-cyclic distribution are presented. One is a simple scan method modified from the virtual-block scheme. The other is a linear-time method for constructing a ${\Delta}M$ table, which contains the local memory access information; this method is simpler than other algorithms for generating a ${\Delta}M$ table. Experimental results for 10,000 array elements show that the simple scan method performs poorly, but the linear-time ${\Delta}M$ table generation method is faster than other algorithms in both ${\Delta}M$ table generation time and access time.
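The mapping behind local address generation can be sketched directly: under cyclic(k) on P processors, global index i belongs to processor (i div k) mod P and sits at local offset (i div kP)·k + (i mod k). This Python sketch assumes 0-based indices, a positive stride s, and an array starting at index 0; it enumerates the section by brute force rather than implementing either method from the paper, and `delta_m` simply lists the gaps between consecutive local accesses, roughly the information a ${\Delta}M$ table encodes. All names are hypothetical.

```python
from typing import List


def owner(i: int, nprocs: int, k: int) -> int:
    """Processor that owns global index i under cyclic(k) (0-based)."""
    return (i // k) % nprocs


def local_address(i: int, nprocs: int, k: int) -> int:
    """Offset of global index i within its owner's local memory."""
    return (i // (k * nprocs)) * k + i % k


def local_accesses(l: int, h: int, s: int, nprocs: int, k: int, p: int) -> List[int]:
    """Local addresses touched by processor p for section A(l:h:s), in order.
    Brute force: walks the whole section, unlike the paper's linear-time method."""
    return [local_address(i, nprocs, k)
            for i in range(l, h + 1, s) if owner(i, nprocs, k) == p]


def delta_m(addrs: List[int]) -> List[int]:
    """Gaps between consecutive local addresses (the per-access strides)."""
    return [b - a for a, b in zip(addrs, addrs[1:])]
```

For example, `local_accesses(0, 15, 3, 2, 2, 0)` (section A(0:15:3), P = 2, k = 2, processor 0) visits local addresses `[0, 5, 6]`, giving gaps `[5, 1]`.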

Design of RFID Packaging for Construction Materials (건축자재용 RFID 패키징 설계)

  • Shin, Jae-Hui;Hwang, Suk-Seung
    • The Journal of the Korea institute of electronic communication sciences / v.8 no.6 / pp.923-931 / 2013
  • RFID (Radio Frequency Identification), a type of electronic tag, is a wireless device that uses radio frequencies to read ID information. It has a variety of applications, such as bus cards, gate access cards, the distribution industry, and the management of construction materials. The performance and size of an RFID tag depend on penetrability, recognition rate, memory size, multi-tag recognition, external contamination such as dust, and external impact, so RFID tags require packaging that protects them with these factors in mind. Recently, RFID has been widely used to manage construction materials effectively, and attaching RFID tags to construction materials requires packaging that is robust to external impact. In this paper, we propose an RFID packaging for construction materials that is robust to external impact and allows a broken RFID tag to be replaced. To allow replacement, the cast and the body of the packaging are separated. We also present detailed drawings of the proposed packaging and evaluate the performance of a prototype manufactured with a 3D printer.

Web-Based Distributed Visualization System for Large Scale Geographic Data (대용량 지형 데이터를 위한 웹 기반 분산 가시화 시스템)

  • Hwang, Gyu-Hyun;Yun, Seong-Min;Park, Sang-Hun
    • Journal of Korea Multimedia Society / v.14 no.6 / pp.835-848 / 2011
  • In this paper, we propose a client-server distributed/parallel system to visualize huge geographic data effectively. The system consists of a web-based client GUI program and a distributed/parallel server program that runs on multiple PC clusters. To make the client program run on mobile devices as well as PCs, the graphical user interface was designed with JOGL, the Java-based OpenGL graphics library, and the client sends information about its currently available memory and maximum display resolution so that the server can minimize the amount of work. The PC clusters acting as the server access the requested geographic data from distributed disks, resample it appropriately, and send the results back to the client. To minimize the latency of repeatedly accessing the distributed geographic data, cache data structures are maintained on every server node and on the client.
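The abstract does not say which replacement policy the server and client caches use; a least-recently-used (LRU) cache is one common choice and is assumed in this Python sketch. `LRUCache` and its `fetch` callback (standing in for a read from a distributed disk) are hypothetical names.

```python
from collections import OrderedDict
from typing import Callable, Hashable


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int, fetch: Callable[[Hashable], object]) -> None:
        self.capacity = capacity
        self.fetch = fetch              # called on a miss, e.g. a distributed-disk read
        self.entries: "OrderedDict[Hashable, object]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> object:
        if key in self.entries:
            self.entries.move_to_end(key)   # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        value = self.fetch(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # drop the least recently used entry
        return value
```

Keys could be, for example, `(level, tile_x, tile_y)` tuples; the same class could sit on each server node and on the client, as the abstract describes.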

Design of High-Speed EEPROM IP Based on a BCD Process (BCD 공정기반의 고속 EEPROM IP 설계)

  • Jin, RiJun;Park, Heon;Ha, Pan-Bong;Kim, Young-Hee
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.10 no.5 / pp.455-461 / 2017
  • In this paper, a local DL (Data Line) sensing method with smaller parasitic capacitance is proposed to replace the previous distributed DB sensing method, which has large parasitic capacitance, in order to reduce the time needed to transfer the BL (Bit Line) voltage to the DL in read mode. A new BL switching circuit that turns on NMOS switches faster is also proposed. Furthermore, because the BL node voltage is clamped at 0.6 V by a DL clamping circuit instead of being precharged to VDD-VT, and a differential amplifier is used, the read-mode access time is reduced from 40 ns to 35.63 ns, meeting the requirement. The layout size of the designed 512Kb EEPROM memory IP in a $0.13{\mu}m$ BCD process is $923.4{\mu}m{\times}1150.96{\mu}m$ ($=1.063mm^2$).
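To see why smaller parasitic capacitance helps, a rough first-order model may be useful (our simplification, not the paper's analysis): if a line with total capacitance $C$ is charged or discharged by an approximately constant current $I$, the time to develop a voltage swing $\Delta V$ is

```latex
t \approx \frac{C \, \Delta V}{I}
```

Under this model, lowering $C$ (local rather than distributed sensing) and, presumably, the smaller swing implied by clamping the BL at 0.6 V rather than precharging it to VDD-VT both shorten the transfer time. Real sensing circuits are nonlinear, so this is only a qualitative guide.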

A 0.8-V Static RAM Macro Design utilizing Dual-Boosted Cell Bias Technique (이중 승압 셀 바이어스 기법을 이용한 0.8-V Static RAM Macro 설계)

  • Shim, Sang-Won;Jung, Sang-Hoon;Chung, Yeon-Bae
    • Journal of the Institute of Electronics Engineers of Korea SD / v.44 no.1 / pp.28-35 / 2007
  • In this paper, an ultra-low-voltage SRAM design method based on a dual-boosted cell bias technique is described. In each read/write cycle, the wordline and the cell power node of the selected SRAM cells are boosted to two different voltage levels. This improves the SNM (Static Noise Margin) sufficiently without increasing the cell size, even at sub-1-V supply voltages. It also improves SRAM circuit speed owing to the increased cell read current. The proposed technique has been demonstrated with a 0.8-V, 32-Kbyte SRAM macro designed in a $0.18{\mu}m$ CMOS technology. Compared with the conventional cell bias technique, simulation confirms a 135% improvement in cell SNM and 31% faster operation at a 0.8-V supply. The prototype chip shows an access time of 23 ns and a power dissipation of $125\;{\mu}W/Hz$.