• Title/Summary/Keyword: 이종 메모리

Search Result 124, Processing Time 0.021 seconds

Improvement of Address Pointer Assignment in DSP Code Generation (DSP용 코드 생성에서 주소 포인터 할당 성능 향상 기법)

  • Lee, Hee-Jin;Lee, Jong-Yeol
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.45 no.1
    • /
    • pp.37-47
    • /
    • 2008
  • Exploitation of address generation units which are typically provided in DSPs plays an important role in DSP code generation since that perform fast address computation in parallel to the central data path. Offset assignment is optimization of memory layout for program variables by taking advantage of the capabilities of address generation units, consists of memory layout generation and address pointer assignment steps. In this paper, we propose an effective address pointer assignment method to minimize the number of address calculation instructions in DSP code generation. The proposed approach reduces the time complexity of a conventional address pointer assignment algorithm with fixed memory layouts by using minimum cost-nodes breaking. In order to contract memory size and processing time, we employ a powerful pruning technique. Moreover our proposed approach improves the initial solution iteratively by changing the memory layout for each iteration because the memory layout affects the result of the address pointer assignment algorithm. We applied the proposed approach to about 3,000 sequences of the OffsetStone benchmarks to demonstrate the effectiveness of the our approach. Experimental results with benchmarks show an average improvement of 25.9% in the address codes over previous works.

An Adaptive Buffer Tuning Mechanism for striped transport layer connection on multi-homed mobile host (멀티홈 모바일 호스트상에서 스트라이핑 전송계층 연결을 위한 적응형 버퍼튜닝기법)

  • Khan, Faraz-Idris;Huh, Eui-Nam
    • Journal of Internet Computing and Services
    • /
    • v.10 no.4
    • /
    • pp.199-211
    • /
    • 2009
  • Recent advancements in wireless networks have enabled support for mobile applications to transfer data over heterogeneous wireless paths in parallel using data striping technique [2]. Traditionally, high performance data transfer requires tuning of multiple TCP sockets, at sender's end, based on bandwidth delay product (BDP). Moreover, traditional techniques like Automatic TCP Buffer Tuning (ATBT), which balance memory and fulfill network demand, is designed for wired infrastructure assuming single flow on a single socket. Hence, in this paper we propose a buffer tuning technique at senders end designed to ensure high performance data transfer by striping data at transport layer across heterogeneous wireless paths. Our mechanism has the capability to become a resource management system for transport layer connections running on multi-homed mobile host supporting features for wireless link i.e. mobility, bandwidth fluctuations, link level losses. We show that our proposed mechanism performs better than ATBT, in efficiently utilizing memory and achieving aggregate throughput.

  • PDF

Design and Simulation of ARM Processor using VHDL (VHDL을 이용한 ARM 프로세서의 설계 및 모의실행)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.18 no.5
    • /
    • pp.229-235
    • /
    • 2018
  • As of in the year of 2016, 40 million ARM processors are being shipped everyday and more than 86 billion ARM processors are mounted in mobile communications, consumer electronics, enterprises, and embedded systems. Nationally, we are capable of designing high-end memory semiconductors, but not in processors, resulting in unbalance. Generally, highly expensive software programs are necessary for designing processors which makes it difficult to set up proper environments. However, ModelSim simulator provided by Altera is free and everybody can use it. In this paper, the VHDL language which is widely used in Europe, universities, and research centers around the world for the ASIC design is selected for designing 32-bit ARM processor and simulated by ModelSim. As a result, 37 instructions of ARMv4 has been successfully executed.

Design and Simulation of ARM Processor with Interrupts (인터럽트 기능을 갖는 ARM 프로세서의 설계 및 모의실행)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.19 no.6
    • /
    • pp.183-189
    • /
    • 2019
  • Despite its low cost, ARM is widely used in smartphones, digital cameras, home network devices, and wireless technologies because of its low power consumption and reliable performance. The domestic memory semiconductor design technology is in the world's highest level, but that of the processor is far less than that, which results in the technology unbalance between the memory and the processor. When designing a processor, exception and interrupt capabilities are requisite, but this is often omitted in the research stage. However, exception processing and interrupts must be included in order for the processor to function fully. In this paper, we design a 32-bit ARMv4 family of processors with exception handling and interrupts using VHDL and verify with ModelSim. As a result, ARM's exception and interrupts are successfully performed.

Implementation and Performance Evaluation of a Software-based DSM Sytem for a Windows-NT Workstations Cluster (Windows-NT 워크스테이션 클러스터를위한 소프트웨어 기반 분산 공유 메모리 시스템의 구현 및 성능 평가)

  • Lee, Jong-U
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.5 no.2
    • /
    • pp.176-184
    • /
    • 1999
  • 지금까지의 소프트웨어 기반 분산 공유 메모리(이하 DSM이라 칭함)시스템은 유닉스 워크스테이션 클러스터를 목표로 하는 것이 대부분이었다. 그러나 현재 Windows-NT 는 서버급 시스템과 PC 모두를 위한 운영체제로서 유닉스와 더불어 널리 사용되고 있는 실정이다. 본 논문에서는 Windows-NT 워크스테이션 클러스터 환경을 위한 DSM 시스템을 구현하고, 구현된 DSM 시스템의 성능 평가 결과를 제시한다. 구현된 DSM 시스템은 Win32 API와 표준 실행-시간 라이브러리를 이용해 구현되었기 때문에 모든 Windows-NT 워크스테이션에서 실행 가능하며 , 프로그래머는 몇 라인의 코드 추가만으로 DSM 시스템 상에서 수행되는 병렬 응용 프로그램을 작성할 수 있다. 워크스테이션 간의 상호연결망으로 범용성을 위해 이더넷 LAN을 지원하였고, 아울러 성능 향상을 위해 기가비트 SAN(System Area Network)도 지원하였다. 기가비트 SAN을 위한 하드웨어로는 Dolphin 사의 PCI-SCI 타입 제품인 Clustar를 사용하였다. 우리는 성능 평가를 통해, 구현된 DSM 시스템이 정확히 동작함은 물론 확장성이 뛰어나다는 것을 확인하였다. 특히 , 기가비트 SAN을 사용할 경우 일부 병렬 벤치 마크 프로그램에서는 노드 수 증가에 따라 성능이 거의 선형적으로 향상된다는 것을 알 수 있었다. 본 논문이 기여하는 바는 Windows-NT 기반 소프트웨어 DSM 시스템의 원천 기술을 확보함으로써 향후 Windows-NT 워크스테이션 클러스터 환경에서의 분산 및 병렬 처리 연구에 도움을 줄 수 있다는 점이다.

FPGA Implementation and Verification of A Pipelined 32-bit ARM Processor (파이프라인 방식의 32 비트 ARM 프로세서에 대한 FPGA 구현 및 검증)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.22 no.5
    • /
    • pp.105-110
    • /
    • 2022
  • Domestically, we are capable of designing high-end memory semiconductors, but not in processors, resulting in unbalance. Using Vivado as a development enivronment and implementing the processor on a Xilinx FPGA reduces time and cost dramatically. In this paper, the popular language VHDL which is widely used in Europe, universities, and research centers around the world for the digital system design is used for designing a pipelined 32-bit ARM processor, implemented on FPGA and verified by Integrated Logic Analyzer. As a result, the ARM processor implemented on FPGA could execute ARM instructions successfully.

CTIS: Cross-platform Tester Interface Software for Memory Semiconductor (메모리 반도체 검사 장비 인터페이스를 위한 크로스플랫폼 소프트웨어 기술)

  • Kim, Dong Su;Kang, Dong Hyun;Lee, Eun Seok;Lee, Kyu Sung;Eom, Young Ik
    • KIISE Transactions on Computing Practices
    • /
    • v.21 no.10
    • /
    • pp.645-650
    • /
    • 2015
  • Tester Interface Software (TIS) provides all software functions that are necessary for a testing device to perform the test process on a memory semiconductor package from the time the device is put into the test equipment until the device is discharged from the equipment. TIS should perform the same work over all types of equipment regardless of their tester models. However, TIS has been developed and managed independently of the tester models because there are various equipment and computer models that are used in the test process. Therefore, more maintenance, time and cost are required for development, which adversely affects the quality of the software, and the problem becomes more serious when the new tester model is introduced. In this paper, we propose the Cross-platform Tester Interface Software (CTIS) framework, which can be integrated and operated on heterogeneous equipment and OSs.

Comparison of Parallel Computation Performances for 3D Wave Propagation Modeling using a Xeon Phi x200 Processor (제온 파이 x200 프로세서를 이용한 3차원 음향 파동 전파 모델링 병렬 연산 성능 비교)

  • Lee, Jongwoo;Ha, Wansoo
    • Geophysics and Geophysical Exploration
    • /
    • v.21 no.4
    • /
    • pp.213-219
    • /
    • 2018
  • In this study, we simulated 3D wave propagation modeling using a Xeon Phi x200 processor and compared the parallel computation performance with that using a Xeon CPU. Unlike the 1st generation Xeon Phi coprocessor codenamed Knights Corner, the 2nd generation x200 Xeon Phi processor requires no additional communication between the internal memory and the main memory since it can run an operating system directly. The Xeon Phi x200 processor can run large-scale computation independently, with the large main memory and the high-bandwidth memory. For comparison of parallel computation, we performed the modeling using the MPI (Message Passing Interface) and OpenMP (Open Multi-Processing) libraries. Numerical examples using the SEG/EAGE salt model demonstrated that we can achieve 2.69 to 3.24 times faster modeling performance using the Xeon Phi with a large number of computational cores and high-bandwidth memory compared to that using the 12-core CPU.

Chip Implementation of 830-Mb/s/pin Transceiver for LPDDR2 Memory Controller (LPDDR2 메모리 컨트롤러를 위한 830-Mb/s/pin 송수신기 칩 구현)

  • Jong-Hyeok, Lee;Chang-Min, Song;Young-Chan, Jang
    • Journal of IKEEE
    • /
    • v.26 no.4
    • /
    • pp.659-670
    • /
    • 2022
  • An 830-Mb/s/pin transceiver for a controller supporting ×32 LPDDR2 memory is designed. The transmitter consists of eight unit circuits has an impedance in the range of 34Ω ∽ 240Ω, and its impedance is controlled by an impedance correction circuit. The transmitted DQS signal has a phase shifted by 90° compared to the DQ signals. In the receive operation, the read time calibration is performed by per-pin skew calibration and clock-domain crossing within a byte. The implemented transceiver for the LPDDR2 memory controller is designed by using a 55-nm process using a 1.2V supply voltage and has a maximum signal transmission rate of 830 Mb/s/pin. The area and power consumption of each lane are 0.664 mm2 and 22.3 mW, respectively.

Cache Sensitive T-tree Main Memory Index for Range Query Search (범위질의 검색을 위한 캐시적응 T-트리 주기억장치 색인구조)

  • Choi, Sang-Jun;Lee, Jong-Hak
    • Journal of Korea Multimedia Society
    • /
    • v.12 no.10
    • /
    • pp.1374-1385
    • /
    • 2009
  • Recently, advances in speed of the CPU have for out-paced advances in memory speed. Main-memory access is increasingly a performance bottleneck for main-memory database systems. To reduce memory access speed, cache memory have incorporated in the memory subsystem. However cache memories can reduce the memory speed only when the requested data is found in the cache. We propose a new cache sensitive T-tree index structure called as $CST^*$-tree for range query search. The $CST^*$-tree reduces the number of cache miss occurrences by loading the reduced internal nodes that do not have index entries. And it supports the sequential access of index entries for range query by connecting adjacent terminal nodes and internal index nodes. For performance evaluation, we have developed a cost model, and compared our $CST^*$-tree with existing CST-tree, that is the conventional cache sensitive T-tree, and $T^*$-tree, that is conventional the range query search T -tree, by using the cost model. The results indicate that cache miss occurrence of $CST^*$-tree is decreased by 20~30% over that of CST-tree in a single value search, and it is decreased by 10~20% over that of $T^*$-tree in a range query search.

  • PDF