• Title/Summary/Keyword: Memory reduction

GPU-Based Acceleration of Quantum-Inspired Evolutionary Algorithm (GPU를 이용한 Quantum-Inspired Evolutionary Algorithm 가속)

  • Ryoo, Ji-Hyun;Park, Han-Min;Choi, Ki-Young
    • Journal of the Institute of Electronics Engineers of Korea SD / v.49 no.8 / pp.1-9 / 2012
  • Quantum-Inspired Evolutionary Algorithm (QEA) contains enough data-level parallelism to be accelerated naturally on GPUs. To reduce execution time efficiently, however, tasks must be mapped carefully to reflect the different characteristics of the CPU and the GPU. Furthermore, when deciding which parts of the application should run on the GPU, we need to consider not only data-level parallelism but also the data transfer between the CPU and GPU memory spaces. In addition, the use of zero-copy host memory, a proper choice of execution configuration, and a thread organization that exploits memory coalescing are important for further reducing execution time. With all these techniques, QEA ran 3.69 times faster on average than a multithreaded CPU implementation on the 0-1 knapsack problem with 30,000 items.
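
The per-bit independence that makes QEA a good fit for GPUs can be seen in a minimal, CPU-only Python sketch for the 0-1 knapsack problem. It assumes the common angle representation of Q-bits (the probability of observing 1 is sin²θ), a simple front-to-back repair heuristic, and a simplified rotation that nudges every Q-bit toward the best solution found so far. The names `observe`, `repair`, and `qea_knapsack`, the rotation step, and the population size are hypothetical and not taken from the paper.

```python
import math
import random

def observe(thetas, rng):
    # Each Q-bit collapses to 1 with probability sin^2(theta), independently of the
    # others; this per-bit independence is the data-level parallelism a GPU can exploit.
    return [1 if rng.random() < math.sin(t) ** 2 else 0 for t in thetas]

def repair(bits, weights, capacity):
    # Assumed simple repair: drop selected items from the front until the load fits.
    total = sum(w for b, w in zip(bits, weights) if b)
    for i, w in enumerate(weights):
        if total <= capacity:
            break
        if bits[i]:
            bits[i] = 0
            total -= w
    return bits

def qea_knapsack(weights, profits, capacity, pop=10, gens=100, seed=0):
    rng = random.Random(seed)
    n = len(weights)
    delta = 0.01 * math.pi                       # rotation step (assumed value)
    q = [[math.pi / 4] * n for _ in range(pop)]  # every Q-bit starts in equal superposition
    best, best_profit = [0] * n, 0
    for _ in range(gens):
        # Individuals are independent: on a GPU each could map to a thread block,
        # and each Q-bit within an individual to a thread.
        for thetas in q:
            x = repair(observe(thetas, rng), weights, capacity)
            profit = sum(p for b, p in zip(x, profits) if b)
            if profit > best_profit:
                best, best_profit = x[:], profit
        # Simplified rotation gate: move every Q-bit toward the best solution so far.
        for thetas in q:
            for i in range(n):
                step = delta if best[i] else -delta
                thetas[i] = min(max(thetas[i] + step, 0.0), math.pi / 2)
    return best, best_profit
```

The two inner loops (over individuals and over Q-bits) are where a CUDA port would assign threads; the zero-copy and coalescing issues in the abstract concern how the angles and solutions would be laid out in GPU memory, which this sketch does not model.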

The Experimental Study on the Effects of Hangbujapalmultang on Enhancing Learning and Memory in Rats with Radial Arm Maze (향부자팔물탕(香附子八物湯)이 흰쥐의 방사형 미로학습(迷路學習)과 기억(記億)에 미치는 영향(影響))

  • Ryu Jae-Myun;Kim Jong-Woo;Whang Wei-Wan;Kim Hyun-Taek;Lee Hong-Jae
    • Journal of Oriental Neuropsychiatry / v.9 no.2 / pp.45-51 / 1998
  • Purpose: This study examined whether Hyangbujapalmultang improves learning and memory in rats, in order to find a way to alleviate memory impairment, a symptom of dementia. Method: The rats were divided into a control group (14 rats) given the excipient and a sample group (17 rats) given Hyangbujapalmultang. Learning ability and memory were tested with a radial arm maze task. In the learning ability test, a rat was considered to have passed when, while visiting the 8 arms, it made no more than one error on each of 3 consecutive days. The first experiment compared the number of days the control group needed to pass with the number the sample group needed. The memory test was conducted 24 hours after the learning ability test was completed: after a rat had visited 4 arms, the gates were closed for 30 seconds, and the number of errors in the control group was then compared with that in the sample group. Result: In the learning ability test, the sample group needed 5.82 ± 0.37 days to pass and the control group needed 6.43 ± 0.67 days. In the memory test, the sample group made 0.29 ± 0.37 errors and the control group made 1.86 ± 0.78 errors. Conclusion: In the learning ability test, the sample group passed earlier than the control group, but the difference was not statistically significant. In the memory test, the sample group showed a significant reduction in the number of errors compared with the control group.

A Word Spacing System based on Syllable Patterns for Memory-constrained Devices (메모리 제약적 기기를 위한 음절 패턴 기반 띄어쓰기 시스템)

  • Kim, Shin-Il;Yang, Seon;Ko, Young-Joong
    • Journal of KIISE:Software and Applications / v.37 no.8 / pp.653-658 / 2010
  • In this paper, we propose a word spacing system that requires only a small amount of memory. We focus on significantly reducing memory usage while keeping performance comparable to that of recent studies. The proposed method is based on the Hidden Markov Model and uses only probability information, without any added rules. Two types of features are employed: 1) the spacing patterns associated with each individual syllable, and 2) the transition probabilities between two syllable patterns. Using only the first type of features, we achieved an accuracy of more than 91% while using 53% less memory than other systems developed for mobile applications. Using both types of features, we achieved an accuracy of more than 94% while using 76% less memory than a system that employs syllable bigrams as its features.
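
A minimal sketch of HMM-based word spacing, assuming each syllable is tagged 1 if a word boundary follows it and 0 otherwise, with emission probabilities P(syllable | tag) and tag-transition probabilities estimated from add-one-smoothed counts and decoded with the Viterbi algorithm. This is a generic bigram-tag HMM, not the paper's exact feature set; `train` and `space` are hypothetical names.

```python
import math
from collections import Counter

def train(sentences):
    """Count start tags, tag transitions and (tag, syllable) emissions from spaced text.
    Tag 1 means the syllable ends a word; tag 0 means the next syllable is attached."""
    start, trans, emit = Counter(), Counter(), Counter()
    vocab = set()
    for sentence in sentences:
        syls, tags = [], []
        for word in sentence.split():
            for j, ch in enumerate(word):
                syls.append(ch)
                tags.append(1 if j == len(word) - 1 else 0)
        if not syls:
            continue
        start[tags[0]] += 1
        for prev, cur in zip(tags, tags[1:]):
            trans[prev, cur] += 1
        for ch, tag in zip(syls, tags):
            emit[tag, ch] += 1
            vocab.add(ch)
    return start, trans, emit, vocab

def space(text, model):
    """Viterbi-decode the most likely tag sequence and reinsert spaces."""
    start, trans, emit, vocab = model
    size = len(vocab) + 1  # +1 reserves probability mass for unseen syllables
    per_tag = {t: sum(c for (tag, _), c in emit.items() if tag == t) for t in (0, 1)}
    out_of = {p: trans[p, 0] + trans[p, 1] for p in (0, 1)}
    n_start = start[0] + start[1]

    def log_emit(t, ch):
        return math.log((emit[t, ch] + 1) / (per_tag[t] + size))

    def log_trans(p, t):
        return math.log((trans[p, t] + 1) / (out_of[p] + 2))

    syls = [ch for ch in text if not ch.isspace()]
    if not syls:
        return ""
    score = [math.log((start[t] + 1) / (n_start + 2)) + log_emit(t, syls[0]) for t in (0, 1)]
    back = []
    for ch in syls[1:]:
        ptr, new = [], []
        for t in (0, 1):
            best = max((0, 1), key=lambda p: score[p] + log_trans(p, t))
            ptr.append(best)
            new.append(score[best] + log_trans(best, t) + log_emit(t, ch))
        back.append(ptr)
        score = new
    # Backtrack from the best final tag.
    tags = [max((0, 1), key=lambda t: score[t])]
    for ptr in reversed(back):
        tags.append(ptr[tags[-1]])
    tags.reverse()
    return "".join(ch + (" " if t else "") for ch, t in zip(syls, tags)).rstrip()
```

Because Python iterates strings by code point and precomposed Hangul syllables are single code points, the same code accepts Korean input once trained on spaced Korean sentences. The model keeps at most two emission counts per syllable plus a 2×2 transition table, which is the kind of compactness the abstract emphasizes.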

Memory Access Reduction Scheme for H.264/AVC Decoder Motion Compensation (H.264/AVC 디코더의 움직임 보상을 위한 메모리 접근 감소 기법)

  • Park, Kyoung-Oh;Hong, You-Pyo
    • The Journal of Korean Institute of Communications and Information Sciences / v.34 no.4C / pp.349-354 / 2009
  • In this paper, we propose a new motion compensation scheme that reduces the frequency of external memory access, one of the major bottlenecks in real-time decoding. Because reference pictures are large, most H.264/AVC decoders store them in external memory and read reference blocks into the decoder core as needed during decoding. If reference data are fetched separately for each reference block in decoding order, the required memory bandwidth can be too high for real-time decoding. By analyzing the reference data access pattern of each macroblock, the proposed scheme reads as much reference data as possible per access, reducing the number of memory accesses. Experimental results show that the proposed scheme reduces the memory bandwidth requirement by approximately 30%.
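
The idea of reading more reference data per access can be illustrated with a hypothetical greedy planner: each reference block is widened by the 5 extra rows and columns that the H.264 luma 6-tap interpolation filter needs, and two regions are fetched as one bounding box whenever that box is no larger than fetching them separately. The rectangle representation, the merge rule, the assumption that every motion vector is fractional, and the names `padded` and `plan_fetches` are illustrative assumptions; the paper's per-macroblock analysis may differ.

```python
def padded(block):
    """Reference region needed by a block: the H.264 luma 6-tap filter reads 2 samples
    before and 3 after in each direction (assumes a fractional motion vector)."""
    x, y, w, h = block
    return (x - 2, y - 2, w + 5, h + 5)

def area(r):
    return r[2] * r[3]

def union(a, b):
    """Smallest rectangle (x, y, w, h) covering both a and b."""
    x0, y0 = min(a[0], b[0]), min(a[1], b[1])
    x1 = max(a[0] + a[2], b[0] + b[2])
    y1 = max(a[1] + a[3], b[1] + b[3])
    return (x0, y0, x1 - x0, y1 - y0)

def plan_fetches(blocks):
    """Greedily merge reference regions into shared reads (hypothetical rule)."""
    fetches = []
    for region in map(padded, blocks):
        for i, f in enumerate(fetches):
            merged = union(f, region)
            # Merge only if one read of the bounding box costs no more than two reads.
            if area(merged) <= area(f) + area(region):
                fetches[i] = merged
                break
        else:
            fetches.append(region)
    return fetches
```

For two horizontally adjacent 8×8 blocks, the padded regions overlap heavily, so one 21×13 read (273 samples) replaces two 13×13 reads (338 samples) and one access request.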

Performance Enhancement of Embedded Software Using Register Promotion (레지스터 프로모션을 이용한 내장형 소프트웨어의 성능 향상)

  • Lee Jong-Yeol
    • The KIPS Transactions:PartA / v.11A no.5 / pp.373-382 / 2004
  • In this paper, a register promotion technique that converts memory accesses into register accesses is presented to improve the performance of embedded software. In the proposed method, the source code is first profiled to generate a memory trace. From the profiling results, target functions with high dynamic call counts are selected, and register promotion is applied only to these functions to save compilation time. The memory trace of each target function is searched for memory accesses whose replacement by register accesses reduces the cycle count; these are converted to register accesses by modifying the intermediate code and allocating promotion registers. Experiments on the MediaBench and DSPstone benchmark programs show that the proposed method improves performance by 14% and 18% on average for ARM and MCORE, respectively.
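
The selection step (keep only accesses whose promotion lowers the cycle count) can be sketched with a toy cost model: a memory access costs `mem_cycles`, a register access costs `reg_cycles`, and a promoted variable pays one load on entry plus one store on exit if it is ever written. The cost model, the trace format, and `select_promotions` are hypothetical; a real compiler must also prove that the promoted address is not aliased.

```python
from collections import Counter

def select_promotions(trace, num_regs, mem_cycles=3, reg_cycles=1):
    """trace: (address, op) pairs from a profiled target function, op in {'r', 'w'}.
    Returns up to num_regs addresses whose promotion saves the most cycles."""
    accesses = Counter(addr for addr, _ in trace)
    written = {addr for addr, op in trace if op == "w"}
    gains = {}
    for addr, n in accesses.items():
        before = n * mem_cycles
        # After promotion: one load on entry, n register accesses,
        # and one store on exit if the value was ever written.
        after = mem_cycles + n * reg_cycles + (mem_cycles if addr in written else 0)
        if before > after:
            gains[addr] = before - after
    return sorted(gains, key=gains.get, reverse=True)[:num_regs]
```

For example, with one promotion register, an address read 10 times (30 cycles before, 13 after) wins over one read 3 times and written twice (15 before, 11 after), while an address touched only once is never worth promoting.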

Memory Delay Comparison between 2D GPU and 3D GPU (2차원 구조 대비 3차원 구조 GPU의 메모리 접근 효율성 분석)

  • Jeon, Hyung-Gyu;Ahn, Jin-Woo;Kim, Jong-Myon;Kim, Cheol-Hong
    • Journal of the Korea Society of Computer and Information / v.17 no.7 / pp.1-11 / 2012
  • As process technology scales down, the number of cores integrated into a processor increases dramatically, leading to significant performance improvement. In particular, a GPU (Graphics Processing Unit), which contains many cores, can deliver high computational performance by maximizing parallelism. In the GPU architecture, the access latency to main memory is one of the major factors limiting performance improvement. In this work, we quantitatively analyze the performance improvement of a 3D GPU architecture over a 2D GPU architecture and investigate the potential problems of the 3D GPU architecture. In general, memory instructions account for 30% of all instructions, and global/local memory instructions constitute 60% of memory instructions. Therefore, the 3D GPU is expected to perform significantly better than the 2D GPU by reducing the delay of memory instructions. However, our experimental results show that the 3D architecture improves GPU performance by only 2% over the 2D architecture because of a memory bottleneck: the performance loss caused by the memory bottleneck increases by 245% in the 3D GPU architecture compared with the 2D architecture. By analyzing the efficiency of the memory architecture in a 3D GPU, this paper provides guidelines for suitable memory design.
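
Combining the two percentages quoted in the abstract gives the share of all instructions that access global/local memory:

$$
\frac{\text{global/local memory instructions}}{\text{all instructions}} = 0.30 \times 0.60 = 0.18
$$

so, under those general figures, roughly one instruction in every five to six is exposed to main-memory latency, which is why a lower memory delay was expected to matter.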

Optimized Hardware Design of Deblocking Filter for H.264/AVC (H.264/AVC를 위한 디블록킹 필터의 최적화된 하드웨어 설계)

  • Jung, Youn-Jin;Ryoo, Kwang-Ki
    • Journal of the Institute of Electronics Engineers of Korea SD / v.47 no.1 / pp.20-27 / 2010
  • This paper describes the design of a 5-stage pipelined deblocking filter with a power reduction scheme, and proposes an efficient memory architecture and filter order for a high-performance H.264/AVC decoder. The deblocking filter removes block boundary artifacts and enhances image quality. However, it requires many memory accesses and repeated operations, because the filter operation is applied four times to each edge. This paper therefore proposes an optimized filter order and an efficient hardware architecture that reduce memory accesses and total filter cycles. The proposed filter supports parallel processing through a 5-stage pipeline consisting of memory read, threshold decision, pre-calculation, filter operation, and write-back. It also reduces power consumption through a clock gating scheme that disables unnecessary clock switching. In addition, the new filter order decreases the total number of filtering cycles. The proposed filter was designed in Verilog-HDL and functionally verified within a complete H.264/AVC decoder using the ModelSim 6.2g simulator. The input vectors are QCIF images generated by the JM9.4 reference encoder software. Experimental results show that the filter reduces total filter cycles by about 20% and requires only a small transposition buffer.
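
The threshold-decision and filter-operation stages correspond to the H.264/AVC luma edge filter for boundary strength bS < 4. The sketch below follows the standard's decision rule and p0/q0 update as we understand them, with α, β, and tC0 passed in rather than looked up from the QP-indexed tables, 8-bit samples assumed, and the optional p1/q1 updates and the bS = 4 strong filter omitted. `filter_edge_samples` is a hypothetical name, and this is not the paper's hardware design.

```python
def clip3(lo, hi, v):
    return max(lo, min(hi, v))

def filter_edge_samples(p, q, bs, alpha, beta, tc0):
    """p = (p0, p1, p2) and q = (q0, q1, q2) are 8-bit luma samples on either side of
    an edge, ordered outward from it. Returns the filtered (p0, q0)."""
    if bs >= 4:
        raise NotImplementedError("bS = 4 uses the strong filter, not sketched here")
    p0, p1, p2 = p
    q0, q1, q2 = q
    # Threshold decision: filter only if the step across the edge looks like a blocking artifact.
    if not (bs > 0 and abs(p0 - q0) < alpha and abs(p1 - p0) < beta and abs(q1 - q0) < beta):
        return p0, q0
    # Pre-calculation: the clipping range grows when each side is locally smooth.
    tc = tc0 + int(abs(p2 - p0) < beta) + int(abs(q2 - q0) < beta)
    # Filter operation on the two samples adjacent to the edge.
    delta = clip3(-tc, tc, (((q0 - p0) << 2) + (p1 - q1) + 4) >> 3)
    return clip3(0, 255, p0 + delta), clip3(0, 255, q0 - delta)
```

In a pipeline like the one described, the `abs` comparisons would plausibly sit in the threshold-decision stage and the `tc`/`delta` arithmetic in the pre-calculation and filter-operation stages.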

Memory Reduction of IFFT Using Combined Integer Mapping for OFDM Transmitters (CIM(Combined Integer Mapping)을 이용한 OFDM 송신기의 IFFT 메모리 감소)

  • Lee, Jae-Kyung;Jang, In-Gul;Chung, Jin-Gyun;Lee, Chul-Dong
    • Journal of the Institute of Electronics Engineers of Korea TC / v.47 no.10 / pp.36-42 / 2010
  • The FFT (Fast Fourier Transform) processor is one of the key components in implementing OFDM systems for many wireless standards, such as IEEE 802.22. To improve FFT processor performance, various studies have sought to reduce the complexity of multipliers, memory interfaces, control schemes, and so on. While the number of FFT stages grows logarithmically ($\log_2 N$) with the FFT point size $N$, the number of required registers (or memory words) grows linearly. In large point-size FFT designs, registers occupy more than 70% of the chip area. In this paper, to reduce the memory size of the IFFT in OFDM transmitters, we propose a new IFFT design method based on a combined mapping of modulated data, pilot, and null signals. The proposed method focuses on reducing the register sizes in the first two stages of the IFFT architecture, since these two stages require 75% of the total registers. Simulations of a 2048-point IFFT design for cognitive radio systems show that the proposed method achieves more than 38.5% area reduction compared with previous IFFT designs.
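
The 75% figure is consistent with a radix-2 single-path delay feedback (SDF) pipeline, an architecture the abstract does not name and that is assumed here: stage k of an N-point SDF uses a delay line of N/2^k words, so the first two stages hold N/2 + N/4 = 3N/4 of the N − 1 words in total. A quick check for N = 2048, with `sdf_delay_words` as a hypothetical helper:

```python
def sdf_delay_words(n):
    """Delay-line length of each stage in a radix-2 SDF pipeline FFT/IFFT (assumed architecture)."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    stages = n.bit_length() - 1                     # log2(N) butterfly stages
    return [n >> k for k in range(1, stages + 1)]   # N/2, N/4, ..., 1

delays = sdf_delay_words(2048)
first_two_share = sum(delays[:2]) / sum(delays)     # 1536 / 2047
```

The share approaches exactly 75% as N grows, which is why shrinking the registers of the first two stages has the largest effect on total storage.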

Comparison of Fatigue Strength Criteria for TiNi/Al6061-T6 and TiNi/Al2024-T4 Shape Memory Alloy Composite (TiNi/Al6061-T6과 TiNi/Al2024-T4 형상기억복합재료에 대한 피로강도기준의 비교)

  • Jo, Young-Jik;Park, Young-Chul
    • Transactions of the Korean Society of Mechanical Engineers A / v.33 no.2 / pp.99-107 / 2009
  • This study derived design curves and fatigue limits for TiNi/Al composites with varying volume ratios and reduction ratios. Because the stress-life curve often does not show a fatigue limit, the results were presented as probabilistic stress-life curves. A Goodman diagram was used to analyze both the fatigue strength of materials with a finite life under repeated loading and the fatigue strength at the endurance limit (infinite life). The fatigue tests used Schenck-type plane bending specimens of identical shape and were conducted under constant stress amplitude. From the results, the following were examined: (i) the optimal hot-pressing conditions for TiNi/Al; (ii) the effect of varying reduction ratio and volume ratio on the fatigue limit of TiNi/Al composites; and (iii) the probability distribution of the fatigue limit for TiNi/Al2024 and TiNi/Al6061.
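
For reference, the (modified) Goodman line used in such diagrams relates the alternating stress σa and mean stress σm to the endurance limit Se and ultimate strength Su as σa/Se + σm/Su = 1/n, where n is the safety factor. A small sketch; the function names and the numbers in the example are hypothetical, not data from the study.

```python
def goodman_allowable_amplitude(mean_stress, endurance_limit, ultimate_strength):
    """Largest alternating stress for infinite life at a given mean stress (n = 1)."""
    return endurance_limit * (1.0 - mean_stress / ultimate_strength)

def goodman_safety_factor(amplitude, mean_stress, endurance_limit, ultimate_strength):
    """Safety factor n from sigma_a / S_e + sigma_m / S_u = 1 / n."""
    return 1.0 / (amplitude / endurance_limit + mean_stress / ultimate_strength)
```

For example, with Se = 200 MPa and Su = 400 MPa, a mean stress of 100 MPa leaves an allowable amplitude of 150 MPa, and a load of σa = 75 MPa with σm = 50 MPa has a safety factor of 2.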

Lightweight control system that can be mounted on micro-controller (초소형 마이크로 컨트롤러에 탑재 가능한 경량화 컨트롤 시스템)

  • Kim, Doan;Kim, Mingyu;Min, Chanhong;Jung, Hoekyung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2018.05a / pp.501-502 / 2018
  • Because of their small size, traditional miniature microcontrollers have little memory and therefore handle only communication. As a result, it is difficult to provide a user-friendly UI in the control system, and many functions cannot be added. To solve this problem, this paper proposes a weight reduction and encoding method for the control system. The proposed system also improves user convenience by overcoming the difficulty of handling Korean in existing systems. In addition, various functions can be added using the memory space freed by reducing the system's weight.
