• Title/Summary/Keyword: 메모리 요구량 (memory requirements)


A Post-mortem Detection Tool of First Races to Occur in Shared-Memory Programs with Nested Parallelism (내포병렬성을 가진 공유메모리 프로그램에서 최초경합의 수행후 탐지도구)

  • Kang, Mun-Hye;Sim, Gab-Sig
    • Journal of the Korea Society of Computer and Information
    • /
    • v.19 no.4
    • /
    • pp.17-24
    • /
    • 2014
  • Detecting data races is important for debugging shared-memory programs with nested parallelism, because races result in unintended non-deterministic executions of the program. Detecting the races that occur first is especially important for effective debugging, because removing such races may make other affected races disappear or appear. Previous dynamic tools for detecting first races cannot guarantee that the detected races are unaffected ones; moreover, they either ignore nesting levels or require the support of other techniques. This paper suggests a post-mortem tool that collects candidate accesses during program execution and then detects the first races to occur after execution. This technique is efficient because it guarantees that the first races reported by analyzing a nesting level are the races that occur first at that level, and it does not require analyzing nesting levels higher than the current one.
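
The notion of concurrency across nesting levels can be illustrated with a toy fork-join model (illustrative names and data layout; not the paper's candidate-collection algorithm):

```python
from itertools import combinations

def concurrent(p, q):
    """In a fork-join nesting tree, two accesses are concurrent iff
    neither thread path is a prefix (ancestor) of the other."""
    n = min(len(p), len(q))
    return p[:n] != q[:n]

def candidate_races(accesses):
    """accesses: (seq, var, is_write, path) tuples in logged order.
    Report conflicting concurrent pairs, earliest pairs first -- a
    crude stand-in for 'first to occur', not the paper's algorithm."""
    races = [(a, b) for a, b in combinations(accesses, 2)
             if a[1] == b[1] and (a[2] or b[2]) and concurrent(a[3], b[3])]
    return sorted(races, key=lambda r: (r[0][0], r[1][0]))
```

Here two accesses race if they touch the same variable, at least one writes, and neither thread path is an ancestor of the other.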

Design and Implementation of Human-Detecting Radar System for Indoor Security Applications (실내 보안 응용을 위한 사람 감지 레이다 시스템의 설계 및 구현)

  • Jang, Daeho;Kim, Hyeon;Jung, Yunho
    • Journal of IKEEE
    • /
    • v.24 no.3
    • /
    • pp.783-790
    • /
    • 2020
  • In this paper, a human-detecting radar system for indoor security applications is proposed, and its FPGA-based implementation results are presented. To minimize computational complexity and memory requirements, only the top half of the spectrogram is used for feature extraction; feature extraction techniques that require complex computation are excluded, and features are chosen considering both classification performance and complexity. In addition, memory requirements are minimized by a pipeline structure that avoids storing the entire spectrogram. Classification experiments on humans, dogs, and robot cleaners confirmed an accuracy of 96.2%. The proposed system was implemented in Verilog-HDL, and a low-area design using 1,140 logic elements and 6.5 Kb of memory was confirmed to be feasible.
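
The idea of cheap features over only the top half of the spectrogram can be sketched as follows (the column-energy statistics here are assumptions for illustration, not the paper's feature set):

```python
def top_half_features(spectrogram):
    """Keep only the top half of the spectrogram (rows = frequency bins,
    cols = time) and compute cheap per-column energy statistics."""
    top = spectrogram[:len(spectrogram) // 2]
    cols = list(zip(*top))                       # transpose: one tuple per time step
    energy = [sum(v * v for v in c) for c in cols]
    mean = sum(energy) / len(energy)
    peak = max(energy)
    return mean, peak
```

Because each column is reduced to a scalar as it arrives, a pipeline never needs the whole spectrogram in memory, which is the abstract's memory argument.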

Optimized Binary-Search-on-Range Architecture for IP Address Lookup (IP 주소 검색을 위한 최적화된 영역분할 이진검색 구조)

  • Park, Kyong-Hye;Lim, Hye-Sook
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.33 no.12B
    • /
    • pp.1103-1111
    • /
    • 2008
  • Internet routers forward an incoming packet to an output port toward its final destination through IP address lookup. Since each incoming packet should be forwarded at wire speed, high-speed search performance is essential. In this paper, IP address lookup algorithms using binary search are studied. Most binary search algorithms do not provide a balanced search, and hence the required number of memory accesses is excessive and the search performance is poor. The binary-search-on-range algorithm, on the other hand, provides high-speed search performance but requires a large amount of memory. This paper presents an optimized binary-search-on-range structure that reduces the memory requirement by deleting unnecessary entries and an entry field. With this optimization, it is shown that binary search on range can be performed on a routing table with a similar or smaller number of entries than the number of prefixes. Using real backbone routing data, the optimized structure is compared with the original binary-search-on-range algorithm in terms of search performance, and a performance comparison with various binary search algorithms is also provided.
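
The core of binary search on range can be sketched in a toy form: each prefix becomes an address range, the range boundaries are sorted, and a lookup is a single binary search (8-bit toy addresses and a brute-force longest-prefix helper for table construction; not the paper's optimized structure):

```python
import bisect

def lpm(addr, prefixes, bits=8):
    """Brute-force longest-prefix match, used only to build the table."""
    best = None
    for p, l, hop in prefixes:
        if l == 0 or (addr >> (bits - l)) == (p >> (bits - l)):
            if best is None or l > best[0]:
                best = (l, hop)
    return best[1] if best else None

def build_table(prefixes, bits=8):
    """Boundary points are every range start and end+1; each interval
    between consecutive boundaries has one next hop."""
    pts = {0}
    for p, l, _ in prefixes:
        pts.add(p)
        pts.add((p | ((1 << (bits - l)) - 1)) + 1)
    bounds = sorted(x for x in pts if x < (1 << bits))
    hops = [lpm(b, prefixes, bits) for b in bounds]
    return bounds, hops

def lookup(addr, bounds, hops):
    """One binary search instead of a per-bit trie walk."""
    return hops[bisect.bisect_right(bounds, addr) - 1]
```

The paper's optimization then shrinks this boundary table further by removing redundant entries; the toy above keeps all of them.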

An Efficient and Scalable 3D-WT Compression Scheme (효율적이고 확장가능한 3D-WT 압축기법)

  • 김성민;박시용;이승원;이화세;정기동
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2003.04d
    • /
    • pp.614-616
    • /
    • 2003
  • Conventional video coding commonly removes the temporal correlation between consecutive frames with motion estimation, which predicts the current frame from information in the previous frame. For video data, which is far larger than still images, most of the compression is achieved through this motion estimation. However, motion estimation requires many computations and thus increases the overall encoder complexity. In contrast, 3D-WT performs no motion estimation and can therefore reduce encoder complexity, but the biggest drawbacks of existing 3D-WT schemes are the memory required for encoding and the receiver-side delay for decoding. This paper therefore introduces an efficient and scalable 3D-WT scheme that minimizes both the memory requirement and the receiver-side delay.
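
The memory argument behind 3D-WT style coding can be hinted at with a one-level temporal Haar transform, which only ever needs two frames buffered at a time (a toy sketch, not the paper's scheme):

```python
def temporal_haar(frames):
    """One-level temporal Haar: average and difference of frame pairs.
    Only two frames are held at once, instead of a whole group of frames.
    Each frame is a flat list of pixel values."""
    lo, hi = [], []
    for f0, f1 in zip(frames[0::2], frames[1::2]):
        lo.append([(a + b) / 2 for a, b in zip(f0, f1)])
        hi.append([(a - b) / 2 for a, b in zip(f0, f1)])
    return lo, hi
```

Deeper temporal decompositions buffer more frames and add receiver delay, which is exactly the trade-off the abstract targets.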


Index block mapping for flash memory system (플래쉬 메모리 시스템을 위한 인덱스 블록 매핑)

  • Lee, Jung-Hoon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.15 no.8
    • /
    • pp.23-30
    • /
    • 2010
  • Flash memory is non-volatile and retains data even after the system is powered off. It also offers fast access speed, low power consumption, good shock resistance, small size, and light weight. As its price decreases and its capacity increases, flash memory is expected to be widely used in consumer electronics, embedded systems, and mobile devices. Flash storage systems generally adopt a software layer called the FTL. In this research, we propose a new FTL mechanism that overcomes the major drawback of the conventional block mapping algorithm: in addition to the block mapping table, an index block mapping table of small size is used to indicate sector locations. Simulation results show that the proposed FTL is about 45% faster on average than a conventional hybrid mapping algorithm, while the mapping memory requirement is reduced by about 12%.
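
A toy rendition of a block-mapping FTL with a small per-block sector index (names and layout are illustrative assumptions, not the paper's design; garbage collection is omitted):

```python
class IndexedBlockFTL:
    """Coarse logical-block -> physical-block map, plus a tiny sector
    index so an updated sector is found without scanning the block."""
    def __init__(self, sectors_per_block=4, num_blocks=8):
        self.spb = sectors_per_block
        self.block_map = {}      # logical block -> physical block
        self.sector_index = {}   # (phys block, sector offset) -> slot
        self.flash = {}          # (phys block, slot) -> data
        self.free = list(range(num_blocks))
        self.fill = {}           # phys block -> next free slot

    def write(self, lsn, data):
        lb, off = divmod(lsn, self.spb)
        if lb not in self.block_map:
            self.block_map[lb] = self.free.pop(0)
            self.fill[self.block_map[lb]] = 0
        pb = self.block_map[lb]
        slot = self.fill[pb]
        self.fill[pb] += 1
        self.flash[(pb, slot)] = data
        self.sector_index[(pb, off)] = slot   # latest copy wins

    def read(self, lsn):
        lb, off = divmod(lsn, self.spb)
        pb = self.block_map[lb]
        return self.flash[(pb, self.sector_index[(pb, off)])]
```

Because the per-block index maps a sector offset straight to its latest slot, updates are out-of-place (flash-friendly) while reads stay one lookup each.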

Quantization of LPC Coefficients Using a Multi-frame AR-model (Multi-frame AR model을 이용한 LPC 계수 양자화)

  • Jung, Won-Jin;Kim, Moo-Young
    • The Journal of the Acoustical Society of Korea
    • /
    • v.31 no.2
    • /
    • pp.93-99
    • /
    • 2012
  • For speech coding, the vocal tract is modeled using Linear Predictive Coding (LPC) coefficients. The LPC coefficients are typically transformed into Line Spectral Frequency (LSF) parameters, which are advantageous for linear interpolation and quantization. If multidimensional LSF data are quantized directly using Vector Quantization (VQ), high rate-distortion performance can be obtained by fully utilizing intra-frame correlation. In practice, since this direct VQ system cannot be used due to its high computational complexity and memory requirement, Split VQ (SVQ) is used, where the multidimensional vector is split into multiple sub-vectors for quantization. The LSF parameters also have high inter-frame correlation, so Predictive SVQ (PSVQ) is utilized; PSVQ provides better rate-distortion performance than SVQ. In this paper, to implement optimal predictors in PSVQ for voice storage devices, we propose Multi-Frame AR-model based SVQ (MF-AR-SVQ), which considers inter-frame correlations with multiple previous frames. Compared with conventional PSVQ, the proposed MF-AR-SVQ provides a 1-bit gain in terms of spectral distortion, without a significant increase in complexity or memory requirement.
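
Split VQ and its predictive variant can be sketched in a few lines (the codebooks and the per-frame scalar AR coefficients below are illustrative assumptions):

```python
def nearest(codebook, v):
    """Codeword with the smallest squared Euclidean distance to v."""
    return min(codebook, key=lambda c: sum((a - b) ** 2 for a, b in zip(c, v)))

def split_vq(vec, codebooks):
    """Split VQ: quantize each sub-vector with its own small codebook,
    trading some intra-frame correlation for low complexity and memory."""
    out, i = [], 0
    for cb in codebooks:
        d = len(cb[0])
        out += nearest(cb, vec[i:i + d])
        i += d
    return out

def predictive_split_vq(vec, prev_frames, ar_coeffs, codebooks):
    """Multi-frame AR flavour: predict each component from several past
    frames, split-VQ the residual, then add the prediction back."""
    pred = [sum(a * f[j] for a, f in zip(ar_coeffs, prev_frames))
            for j in range(len(vec))]
    res = [x - p for x, p in zip(vec, pred)]
    return [p + q for p, q in zip(pred, split_vq(res, codebooks))]
```

With good prediction the residual is small, so the same codebooks cover it more finely, which is where the rate-distortion gain over plain SVQ comes from.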

An Image Interpolation Method using an Improved Least Square Estimation (개선된 Least Square Estimation을 이용한 영상 보간 방법)

  • Lee Dong Ho;Na Seung Je
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.29 no.10C
    • /
    • pp.1425-1432
    • /
    • 2004
  • Because of its high performance in edge regions, the existing LSE (Least Square Estimation) method provides much better results than other methods. However, since it emphasizes not only edge components but also noise components, parts of the interpolated image can look unnatural. It also requires very high computational complexity and memory for implementation. We propose a new LSE interpolation method that requires much lower complexity and memory, yet provides better performance than the existing method. To reduce the computational complexity, we propose a simple sample window, and we adopt a direction detector to reduce the memory size without blurring the image. To avoid emphasizing noise components, a bi-linear interpolation term is added to the LSE formula. Simulation results show that the proposed method provides better subjective and objective performance with lower complexity than the existing method.
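
The bilinear term blended into the LSE formula is the standard low-complexity interpolator; a 2x-upscaling sketch of it (generic bilinear interpolation, not the proposed method itself):

```python
def bilinear_2x(img):
    """2x upscaling by bilinear interpolation: copy originals to even
    coordinates, then average 2 or 4 neighbours for the new pixels."""
    h, w = len(img), len(img[0])
    out = [[0.0] * (2 * w - 1) for _ in range(2 * h - 1)]
    for y in range(h):
        for x in range(w):
            out[2 * y][2 * x] = img[y][x]
    for y in range(2 * h - 1):
        for x in range(2 * w - 1):
            if y % 2 == 0 and x % 2 == 1:          # between two originals, same row
                out[y][x] = (out[y][x - 1] + out[y][x + 1]) / 2
            elif y % 2 == 1 and x % 2 == 0:        # between two originals, same column
                out[y][x] = (out[y - 1][x] + out[y + 1][x]) / 2
            elif y % 2 == 1 and x % 2 == 1:        # centre of four originals
                out[y][x] = (out[y - 1][x - 1] + out[y - 1][x + 1]
                             + out[y + 1][x - 1] + out[y + 1][x + 1]) / 4
    return out
```

Bilinear smooths noise but blurs edges; an edge-adaptive LSE estimate does the opposite, which is why the paper mixes the two.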

An Effective Face Authentication Method for Resource - Constrained Devices (제한된 자원을 갖는 장치에서 효과적인 얼굴 인증 방법)

  • Lee Kyunghee;Byun Hyeran
    • Journal of KIISE:Software and Applications
    • /
    • v.31 no.9
    • /
    • pp.1233-1245
    • /
    • 2004
  • Although biometric authentication is a good tool in terms of security and convenience, typical biometric authentication algorithms may not be executable on resource-constrained devices such as smart cards. Thus, to run biometric processing on resource-constrained devices, it is desirable to develop a lightweight authentication algorithm that requires only a small amount of memory and computation. Among biological features, the face is one of the most acceptable biometrics, because humans use it in visual interaction and acquiring face images is non-intrusive. We present a new face authentication algorithm in this paper. Our contribution is two-fold. First, we present a face authentication algorithm with a low memory requirement, which uses support vector machines (SVM) with a feature set extracted by genetic algorithms (GA). Second, we suggest a method to further reduce, if needed, the amount of memory required for authentication, at the expense of verification rate, by changing a controllable system parameter for the feature set size. Given a pre-defined amount of memory, this capability is quite effective for mounting our algorithm on memory-constrained devices. Experimental results on various databases show that our face authentication algorithm, using an SVM whose input vectors consist of discriminating features extracted by GA, performs much better than the algorithm without the GA feature selection process, in terms of both accuracy and memory requirement. Experiments also show that the number of features to be selected is controllable by a system parameter.
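
GA-based feature selection of the kind described can be sketched with a toy genetic algorithm over bit-mask feature subsets (the fitness function in a real system would be the SVM verification rate, possibly with a subset-size penalty; here it is a caller-supplied placeholder):

```python
import random

def ga_select(n_features, fitness, pop=20, gens=30, seed=1):
    """Evolve bit masks: keep the fitter half, refill with one-point
    crossover children, mutate occasionally, return the best mask."""
    rng = random.Random(seed)
    P = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop)]
    for _ in range(gens):
        P.sort(key=fitness, reverse=True)
        elite = P[:pop // 2]
        kids = []
        while len(kids) < pop - len(elite):
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_features)
            child = a[:cut] + b[cut:]
            if rng.random() < 0.1:                 # occasional bit-flip mutation
                child[rng.randrange(n_features)] ^= 1
            kids.append(child)
        P = elite + kids
    return max(P, key=fitness)
```

Shrinking the allowed subset size directly shrinks the SVM input vectors, which is the memory-vs-accuracy knob the abstract describes.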

A Design of a Flash Memory Swapping File System using LFM (LFM 기법을 이용한 플래시 메모리 스와핑 파일 시스템 설계)

  • Han, Dae-Man;Koo, Yong-Wan
    • Journal of Internet Computing and Services
    • /
    • v.6 no.4
    • /
    • pp.47-58
    • /
    • 2005
  • There are two major types of flash memory products, namely NAND-type and NOR-type flash memory. NOR-type flash memory is generally deployed as ROM BIOS code storage because it offers byte I/O and fast read operations. However, NOR-type flash memory is more expensive than NAND-type flash memory in terms of cost per byte, and hence NAND-type flash memory is more widely used for large data storage such as embedded Linux file systems. In this paper, we design an efficient flash memory file system for embedded systems and present a swapping scheme that compensates for the weak system performance of a NAND-type flash file system, together with a swapping algorithm that guarantees a bounded execution time. Through implementation and simulation studies, we improve NAND-type flash memory performance to meet the requirements of embedded systems.


CUDA based Lossless Asynchronous Compression of Ultra High Definition Game Scenes using DPCM-GR (DPCM-GR 방식을 이용한 CUDA 기반 초고해상도 게임 영상 무손실 비동기 압축)

  • Kim, Youngsik
    • Journal of Korea Game Society
    • /
    • v.14 no.6
    • /
    • pp.59-68
    • /
    • 2014
  • The memory bandwidth requirements of UHD (Ultra High Definition, 4096×2160) game scenes have been increasing sharply. This paper presents a lossless DPCM-GR based compression algorithm using CUDA that addresses the memory bandwidth problem without sacrificing image quality; it is modified from DDPCM-GR [4] to support bit-parallel pipelining. Memory bandwidth efficiency increases through the use of CUDA shared memory. Various asynchronous transfer configurations, which overlap kernel execution with data transfers between the host and the GPU, are implemented with page-locked host memory. Experimental results show a maximum speedup of 31.3 with respect to the CPU time, and a maximum 30.3% reduction in computation time among the various configurations.
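
A generic DPCM plus Golomb-Rice coder conveys the flavor of DPCM-GR (row-wise DPCM, zig-zag mapping of signed residuals, Rice coding with parameter k >= 1; this is an illustrative sketch, not the paper's exact bitstream):

```python
def rice_encode(n, k):
    """Rice code for non-negative n: unary quotient, '0' stop bit,
    then k remainder bits (k >= 1 assumed)."""
    q, r = n >> k, n & ((1 << k) - 1)
    return '1' * q + '0' + format(r, f'0{k}b')

def dpcm_gr_row(pixels, k=2):
    """DPCM along a row (predict each pixel from its left neighbour),
    map signed residuals to non-negative, Rice-code each one."""
    prev, bits = 0, []
    for p in pixels:
        d = p - prev
        m = 2 * d if d >= 0 else -2 * d - 1   # 0,-1,1,-2,2 -> 0,1,2,3,4
        bits.append(rice_encode(m, k))
        prev = p
    return ''.join(bits)
```

Each pixel depends only on its left neighbour, so rows compress independently, which is what makes the scheme easy to parallelize across CUDA threads.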