• Title/Summary/Keyword: Cache memory

Accelerating Medical Image Processing on Integrated GPU Using OpenCL (OpenCL을 이용한 내장형 GPU에서의 의학영상처리 가속화)

  • Kim, Beom-Jun;Shin, Byeong-seok
    • Journal of the Korea Computer Graphics Society
    • /
    • v.23 no.2
    • /
    • pp.1-10
    • /
    • 2017
  • A variety of filters are applied to improve the quality of noisy, low-resolution medical images. Such filtering is needed to reduce the patient's radiation dose and to make better use of older imaging equipment. In the conventional approach, filtering is performed on the CPU of a PC. However, it is difficult to produce results in real time by applying computationally intensive filters to high-resolution human images using only the CPU of a typical hospital PC. In this paper, we analyze the structure and performance of the GPU integrated into Intel CPUs and propose a method for performing image filtering with OpenCL parallel processing. By applying complex, computationally demanding filters to medical images, high-quality images can be generated in real time.
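
As a rough illustration of the OpenCL-based filtering the abstract describes, the C host sketch below runs a simple 3x3 mean filter on the first GPU device reported by the platform. The kernel name mean3x3, the 512x512 image size, and the filter itself are illustrative assumptions rather than details taken from the paper; error handling and resource release are omitted for brevity.

    /* Minimal sketch: 3x3 mean filter on an (integrated) GPU via OpenCL. */
    #define CL_TARGET_OPENCL_VERSION 120
    #include <stdio.h>
    #include <stdlib.h>
    #include <CL/cl.h>

    static const char *src =
        "__kernel void mean3x3(__global const float *in, __global float *out,\n"
        "                      const int w, const int h) {\n"
        "  int x = get_global_id(0), y = get_global_id(1);\n"
        "  if (x <= 0 || y <= 0 || x >= w - 1 || y >= h - 1) {\n"
        "    out[y*w + x] = in[y*w + x]; return;\n"        /* copy border pixels */
        "  }\n"
        "  float s = 0.0f;\n"
        "  for (int dy = -1; dy <= 1; dy++)\n"
        "    for (int dx = -1; dx <= 1; dx++) s += in[(y+dy)*w + (x+dx)];\n"
        "  out[y*w + x] = s / 9.0f;\n"
        "}\n";

    int main(void) {
        const int w = 512, h = 512;                        /* assumed image size */
        size_t bytes = (size_t)w * h * sizeof(float);
        float *img = malloc(bytes), *res = malloc(bytes);
        for (int i = 0; i < w * h; i++) img[i] = (float)(i % 255);

        cl_platform_id plat; cl_device_id dev; cl_int err;
        clGetPlatformIDs(1, &plat, NULL);
        clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL);   /* integrated GPU */
        cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, &err);
        cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, &err);

        cl_mem din  = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, bytes, img, &err);
        cl_mem dout = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, bytes, NULL, &err);

        cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, &err);
        clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
        cl_kernel k = clCreateKernel(prog, "mean3x3", &err);
        clSetKernelArg(k, 0, sizeof(cl_mem), &din);
        clSetKernelArg(k, 1, sizeof(cl_mem), &dout);
        clSetKernelArg(k, 2, sizeof(int), &w);
        clSetKernelArg(k, 3, sizeof(int), &h);

        size_t global[2] = { (size_t)w, (size_t)h };       /* one work-item per pixel */
        clEnqueueNDRangeKernel(q, k, 2, NULL, global, NULL, 0, NULL, NULL);
        clEnqueueReadBuffer(q, dout, CL_TRUE, 0, bytes, res, 0, NULL, NULL);

        printf("filtered pixel (1,1) = %f\n", res[w + 1]);
        free(img); free(res);
        return 0;
    }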

The Node Scheduling of Multi-Threaded Process for CC-NUMA System (CC-NUMA 시스템을 위한 다중 스레드 프로세스의 노드 스케줄링 설계 및 구현)

  • Kim, Jeong-Nyeo;Kim, Hae-Jin;Lee, Cheol-Hoon
    • The Transactions of the Korea Information Processing Society
    • /
    • v.7 no.2
    • /
    • pp.488-496
    • /
    • 2000
  • This paper describes the design and implementation of node scheduling for MX Server, a CC-NUMA system. COMSIX, the operating system of MX Server, is designed to suit the CC-NUMA architecture. MX Server consists of up to 8 nodes, and the nodes are connected by an SCI ring. The node scheduling scheme considers data locality to improve the performance of the Oracle8i DBMS on the CC-NUMA architecture. For a DBMS such as Oracle8i, a multi-threaded process may be tied to a particular disk. We have developed a CG binding function that binds such a multi-threaded process to a node. Since an actual CC-NUMA platform was not yet available, we developed the node scheduling scheme for multi-threaded processes on a PC test-bed instead of the MX Server and tested it completely.
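
The paper's node scheduling is implemented in COMSIX on MX Server and is not shown in the abstract; as a loosely analogous sketch on Linux, the C program below uses libnuma to bind all threads of a process to one node and to prefer local memory allocations, which is the same locality idea the CG binding function pursues. The node number and buffer size are assumptions.

    /* Illustrative sketch (not the paper's COMSIX implementation): keep the
     * threads and memory of a multi-threaded process on one NUMA node. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <numa.h>                      /* link with -lnuma */

    int main(void) {
        if (numa_available() < 0) {
            fprintf(stderr, "NUMA not supported on this system\n");
            return 1;
        }
        int node = 0;                      /* target node (assumed) */

        /* Run all threads of this process on the CPUs of the chosen node. */
        if (numa_run_on_node(node) != 0) {
            perror("numa_run_on_node");
            return 1;
        }
        /* Prefer local allocations so DBMS buffers stay on the same node. */
        numa_set_preferred(node);

        void *buf = numa_alloc_onnode(1 << 20, node);   /* 1 MiB local buffer */
        printf("process bound to node %d, local buffer at %p\n", node, buf);
        numa_free(buf, 1 << 20);
        return 0;
    }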

A Study on the Circuit Design Method of CNTFET SRAM Considering Carbon Nanotube Density (탄소나노튜브 밀도를 고려한 CNTFET SRAM 디자인 방법에 관한 연구)

  • Cho, Geunho
    • Journal of IKEEE
    • /
    • v.25 no.3
    • /
    • pp.473-478
    • /
    • 2021
  • Although CNTFETs have attracted great attention due to their potential to increase semiconductor device performance by about 13 times, their commercialization has been challenging because of the immature CNT deposition process. To overcome these difficulties, circuit design methods that take the known limitations of the CNTFET manufacturing process into account are receiving increasing attention. SRAM is a major building block of microprocessors and is laid out regularly and repeatedly in cache memory, so CNTs can be deposited more easily and densely in SRAM than in other circuit blocks. To exploit this advantage, this paper presents a circuit design method for SRAM cells that considers CNT density and evaluates its performance improvement using HSPICE simulation. The simulation results show that applying CNTFETs to SRAM can reduce the gate width by about a factor of 1.7, and that increasing the CNT density at the same gate width can improve the read speed by about a factor of 2.

Design of Caching Scheme for Mobile Underground Geospatial Information Map System (모바일용 지하공간정보지도 관리 시스템에서 응답속도 향상을 위한 캐싱 기법)

  • Kim, Yong-Tae;Kouh, Hoon-Joon
    • Journal of Convergence for Information Technology
    • /
    • v.12 no.1
    • /
    • pp.7-14
    • /
    • 2022
  • Unlike general maps, underground geospatial information is served by a system built to view underground information in 3D. The system is managed with tile maps to keep the data lightweight. However, the underground space contains various structures represented as 3D data, so the data size is large. Therefore, when a client mobile program requests a tile map, the service server fetches the requested tile map from the DB server and transmits it to the client, which causes a transmission delay problem. In this paper, we design a tile caching scheme that improves the response speed for the tile map data provided to clients in the mobile underground geospatial information system. We propose a method in which the service server predicts and prefetches the next tile map while the client is viewing the current one, and the prefetched data is stored in the memory of the client's mobile terminal. This alleviates the transmission delay problem.
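
A minimal sketch of the caching and prefetching idea summarized above, assuming a simple direct-mapped in-memory cache keyed by tile coordinates and a stub fetch_tile_from_server() in place of the real service/DB round trip; none of these names or sizes come from the paper.

    #include <stdio.h>

    #define CACHE_SLOTS 64

    typedef struct { int x, y; int valid; char data[256]; } Tile;
    static Tile cache[CACHE_SLOTS];

    static int slot_of(int x, int y) {
        return (int)(((unsigned)x * 73856093u ^ (unsigned)y * 19349663u) % CACHE_SLOTS);
    }

    /* Stand-in for the real network/DB fetch of a 3D tile map. */
    static void fetch_tile_from_server(int x, int y, char *out, size_t n) {
        snprintf(out, n, "tile(%d,%d) payload", x, y);
    }

    static const char *get_tile(int x, int y) {
        Tile *t = &cache[slot_of(x, y)];
        if (!(t->valid && t->x == x && t->y == y)) {      /* cache miss */
            fetch_tile_from_server(x, y, t->data, sizeof t->data);
            t->x = x; t->y = y; t->valid = 1;
        }
        return t->data;
    }

    /* Predict the next tile from the current panning direction and load it
     * ahead of time; dx/dy would come from the client's movement history. */
    static void prefetch_next(int x, int y, int dx, int dy) {
        (void)get_tile(x + dx, y + dy);
    }

    int main(void) {
        printf("%s\n", get_tile(10, 20));   /* miss: fetched from server */
        prefetch_next(10, 20, 1, 0);        /* viewer panning east */
        printf("%s\n", get_tile(11, 20));   /* hit: served from local memory */
        return 0;
    }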

Low-Complexity Deeply Embedded CPU and SoC Implementation (낮은 복잡도의 Deeply Embedded 중앙처리장치 및 시스템온칩 구현)

  • Park, Chester Sungchung;Park, Sungkyung
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.17 no.3
    • /
    • pp.699-707
    • /
    • 2016
  • This paper proposes a low-complexity central processing unit (CPU) that is suitable for deeply embedded systems, including Internet of things (IoT) applications. The core features a 16-bit instruction set architecture (ISA) that leads to high code density, as well as a multicycle architecture with a counter-based control unit and adder sharing that lead to a small hardware area. A co-processor, instruction cache, AMBA bus, internal SRAM, external memory, on-chip debugger (OCD), and peripheral I/Os are placed around the core to make a system-on-a-chip (SoC) platform. This platform is based on a modified Harvard architecture to facilitate memory access by reducing the number of access clock cycles. The SoC platform and CPU were simulated and verified at the C and the assembly levels, and FPGA prototyping with integrated logic analysis was carried out. The CPU was synthesized at the ASIC front-end gate netlist level using a 0.18 μm digital CMOS technology with a 1.8 V supply, resulting in a gate count of merely 7,700 at a 50 MHz clock speed. The SoC platform was embedded in an FPGA on a miniature board and applied to deeply embedded IoT applications.

A Kernel Module to Support High-Performance Intra-Node Communication for Multi-Core Systems (멀티 코어 시스템을 위한 고속 노드내 통신 지원 모듈)

  • Jin, Hyun-Wook;Kang, Hyun-Goo;Kim, Jong-Soon
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.34 no.9
    • /
    • pp.407-415
    • /
    • 2007
  • In parallel cluster computing systems, the efficiency of communication between computing nodes is one of the important factors that determine overall system performance. Accordingly, many researchers have studied high-performance inter-node communication. The recently introduced multi-core processors, however, also increase the importance of intra-node communication, because the more cores a node has, the more parallel processes run on the same node. Although there have been studies on intra-node communication, they give limited consideration to state-of-the-art systems. In this paper, we propose a Linux kernel module that minimizes the number of data copies by exploiting the memory-mapping mechanism for high-performance intra-node communication. The proposed kernel module supports Linux kernel version 2.6. Performance measurements on a multi-core system show that the proposed kernel module achieves up to 62% lower latency and up to 144% higher throughput than an existing kernel module approach. In addition, the measurements reveal that the performance of intra-node communication can vary significantly depending on whether the cores running the communicating processes belong to the same processor package (i.e., share the L2 cache).
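
The proposed module itself is Linux 2.6 kernel code that the abstract does not reproduce; as a user-space stand-in, the sketch below conveys the same memory-mapping idea: two processes on one node share a mapped region, so a message is written once and read in place instead of being copied through the kernel on both sides. The shared-memory name and size are assumptions.

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        const char *name = "/intra_node_demo";   /* assumed object name */
        const size_t len = 4096;

        int fd = shm_open(name, O_CREAT | O_RDWR, 0600);   /* may need -lrt on older glibc */
        ftruncate(fd, len);
        char *region = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

        if (fork() == 0) {                   /* "sender" process on the same node */
            strcpy(region, "hello from the same node");
            _exit(0);
        }
        wait(NULL);                          /* "receiver" reads the data in place */
        printf("received: %s\n", region);

        munmap(region, len);
        shm_unlink(name);
        return 0;
    }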

The Bit-Map Trie Structure for Giga-Bit Forwarding Lookup in High-Speed Routers (고속 라우터의 기가비트 포워딩 검색을 위한 비트-맵 트라이 구조)

  • Oh, Seung-Hyun;Ahn, Jong-Suk
    • Journal of KIISE:Information Networking
    • /
    • v.28 no.2
    • /
    • pp.262-276
    • /
    • 2001
  • Recently, much research has been devoted to developing forwarding tables that support fast routers without employing special hardware or new protocols. This article introduces a new software-based forwarding data structure that enables forwarding lookup to be performed at gigabit speed. The forwarding table is known to be a bottleneck in router performance because its lookup complexity grows with the table size. Recent software-based research uses a Patricia trie and its variants, as well as hash functions keyed on prefix length, among other approaches. The proposed structure builds the forwarding table as a bit-stream array: a trie is constructed from the routing table prefix entries, and each pointer to a child node and each associated forwarding table entry is represented with a single bit. Representing the trie structure and routing prefix pointers as linked lists or arrays requires a large amount of memory, but the proposed data structure is compact because it represents this information with single bits. In addition, a lookup method that starts searching at a chosen middle level shortens the search path. The introduced data structure, called a bit-map trie, shows that a fast forwarding engine can be implemented on a conventional Pentium processor by compressing the backbone routing table so that it fits into the Level 2 cache of a Pentium II processor and by shortening the search path. Our experiments show that the bit-map trie achieves 5.7 million lookups per second.
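
To make the one-bit representation concrete, the sketch below shows the core bitmap trick for a single trie node: a bit marks each child that exists, the existing children are packed into a small array, and a popcount of the bits to the left of a position yields the packed index. The 4-bit stride is an assumption for illustration, not the paper's layout.

    #include <stdint.h>
    #include <stdio.h>

    typedef struct {
        uint16_t bitmap;      /* bit i set => child for 4-bit value i exists */
        int      child[16];   /* compact array; only "popcount" entries used */
    } BitmapNode;

    /* Return the packed index of child `v`, or -1 if that child is absent. */
    static int child_index(const BitmapNode *n, unsigned v) {
        if (!(n->bitmap & (1u << v)))
            return -1;
        uint16_t below = n->bitmap & ((1u << v) - 1);   /* bits left of position v */
        return __builtin_popcount(below);               /* offset into child[] */
    }

    int main(void) {
        /* Children exist for nibble values 2, 5, and 9 only. */
        BitmapNode n = { .bitmap = (1u << 2) | (1u << 5) | (1u << 9),
                         .child  = { 100, 200, 300 } };
        printf("child(5) -> slot %d\n", child_index(&n, 5));   /* slot 1, holds 200 */
        printf("child(7) -> slot %d\n", child_index(&n, 7));   /* -1: absent */
        return 0;
    }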

A Dynamic Transaction Routing Algorithm with Primary Copy Authority (주사본 권한을 이용한 동적 트랜잭션 분배 알고리즘)

  • Kim, Ki-Hyung;Cho, Hang-Rae;Nam, Young-Hwan
    • The KIPS Transactions:PartD
    • /
    • v.10D no.7
    • /
    • pp.1067-1076
    • /
    • 2003
  • Database sharing systems (DSS) are systems for high-performance transaction processing. In a DSS, the processing nodes are locally coupled via a high-speed network and share a common database at the disk level. Each node has local memory and a separate copy of the operating system. To reduce the number of disk accesses, each node caches database pages in its local memory buffer. In this paper, we propose a dynamic transaction routing algorithm that balances the load across the nodes of a DSS. The proposed algorithm is novel in that it supports node-specific locality of reference by utilizing the primary copy authority assigned to each node; hence, it achieves better cache hit ratios and thus fewer disk I/Os. Furthermore, the proposed algorithm prevents a specific node from being overloaded by considering the current workload of each node. To evaluate the performance of the proposed algorithm, we develop a simulation model of the DSS and analyze the simulation results. The results show that the proposed algorithm outperforms existing algorithms in transaction processing rate. In particular, the proposed algorithm performs better when the number of concurrently executing transactions is high and the data page access patterns of the transactions are not evenly distributed.
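
A hedged reading of the routing policy described above, as a small C sketch: a transaction is routed to the node holding primary copy authority (PCA) for the pages it mostly accesses, where the buffer cache is likely warm, unless that node's load exceeds a threshold, in which case the least loaded node is chosen. The node count, load figures, and threshold are assumptions, not the paper's parameters.

    #include <stdio.h>

    #define NODES 4
    #define OVERLOAD_THRESHOLD 0.8   /* assumed utilization cut-off */

    static double load[NODES] = { 0.35, 0.90, 0.20, 0.55 };   /* current utilization */

    static int least_loaded_node(void) {
        int best = 0;
        for (int i = 1; i < NODES; i++)
            if (load[i] < load[best]) best = i;
        return best;
    }

    /* pca_node: node holding primary copy authority for the pages the
     * transaction will mostly access. */
    static int route_transaction(int pca_node) {
        if (load[pca_node] < OVERLOAD_THRESHOLD)
            return pca_node;            /* exploit node-specific locality */
        return least_loaded_node();     /* avoid overloading the PCA node */
    }

    int main(void) {
        printf("txn with PCA on node 0 -> node %d\n", route_transaction(0)); /* 0 */
        printf("txn with PCA on node 1 -> node %d\n", route_transaction(1)); /* 2 */
        return 0;
    }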

A Study on the Digital Forensics Artifacts Collection and Analysis of Browser Extension-Based Crypto Wallet (브라우저 익스텐션 기반 암호화폐 지갑의 디지털 포렌식 아티팩트 수집 및 분석 연구)

  • Ju-eun Kim;Seung-hee Seo;Beong-jin Seok;Heoyn-su Byun;Chang-hoon Lee
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.33 no.3
    • /
    • pp.471-485
    • /
    • 2023
  • Recently, because blockchain guarantees users' anonymity, it is increasingly being exploited for crimes such as illegal transactions. However, cryptocurrency is protected in cryptocurrency wallets, which makes it difficult to recover criminal funds. Therefore, to track and recover cryptocurrency used in crime, this study acquires artifacts from the data and memory areas of a local PC based on user behavior in four browser-extension wallets (Metamask, Binance, Phantom, and Kaikas), and analyzes how to use them from a digital forensics perspective. As a result of the analysis, the type of wallet and cryptocurrency used by the suspect was confirmed through the API names obtained from the browser's cache data, and the URL and wallet address used for the remittance transaction were obtained. We also identified client IDs in cookie data that can identify the devices used, and confirmed that the mnemonic code can be obtained from memory. In addition, we propose an algorithm that measures the persistence of the obtainable mnemonic code and automates its acquisition.

An Application-Specific and Adaptive Power Management Technique for Portable Systems (휴대장치를 위한 응용프로그램 특성에 따른 적응형 전력관리 기법)

  • Egger, Bernhard;Lee, Jae-Jin;Shin, Heon-Shik
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.34 no.8
    • /
    • pp.367-376
    • /
    • 2007
  • In this paper, we introduce an application-specific, adaptive power management technique for portable systems that support dynamic voltage scaling (DVS). We exploit both the idle time of multitasking systems running soft real-time tasks and memory- or CPU-bound code regions. Detailed power and execution time profiles guide an adaptive power manager (APM) that is linked to the operating system. A post-pass optimizer marks candidate regions for DVS by inserting calls to the APM. At runtime, the APM monitors the CPU's performance counters to dynamically determine the affinity of each marked region. For each region, the APM computes the voltage and frequency setting that is optimal in terms of energy consumption and switches the CPU to that setting during the execution of the region. Idle time is exploited by monitoring system idle time and switching to the most energy-economical setting without prolonging execution. We show that our method is most effective for periodic workloads such as video or audio decoding. We have implemented our method in a multitasking operating system (Microsoft Windows CE) running on an Intel XScale processor, achieving up to 9% total system power savings over the standard power management policy, which puts the CPU into a low-power mode during idle periods.
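
As a sketch of the per-region decision described above (not the authors' implementation), the C fragment below classifies a marked region by the ratio of memory accesses to executed instructions, as would be read from performance counters, and picks a lower frequency/voltage pair for memory-bound regions, which gain little from a faster clock. The counter values, thresholds, and operating points are illustrative assumptions.

    #include <stdio.h>

    typedef struct { int mhz; int mv; } OpPoint;

    /* Assumed XScale-like operating points (frequency, core voltage). */
    static const OpPoint points[] = { { 200, 1000 }, { 300, 1100 }, { 400, 1300 } };

    /* Pick an operating point from the memory-boundness of a region. */
    static OpPoint select_op_point(unsigned long long mem_accesses,
                                   unsigned long long instructions) {
        double ratio = (double)mem_accesses / (double)instructions;
        if (ratio > 0.10) return points[0];   /* strongly memory-bound: slow clock */
        if (ratio > 0.05) return points[1];
        return points[2];                     /* CPU-bound: full speed */
    }

    int main(void) {
        /* Counter readings for a marked region (hypothetical numbers). */
        OpPoint p = select_op_point(1200000ULL, 9000000ULL);
        printf("run region at %d MHz / %d mV\n", p.mhz, p.mv);
        return 0;
    }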