• Title/Summary/Keyword: 인 메모리

Search Result 195, Processing Time 0.023 seconds

TP-Sim: A Trace-driven Processing-in-Memory Simulator (TP-Sim: 트레이스 기반의 프로세싱 인 메모리 시뮬레이터)

  • Jeonggeun Kim
    • Journal of the Semiconductor & Display Technology
    • /
    • v.22 no.3
    • /
    • pp.78-83
    • /
    • 2023
  • This paper proposes a lightweight trace-driven Processing-In-Memory (PIM) simulator, TP-Sim. TP-Sim is a General Purpose PIM (GP-PIM) simulator that evaluates various PIM system performance-related metrics. Based on instruction and memory traces extracted from the Intel Pin tool, TP-Sim can replay trace files for multiple models of PIM architectures to compare its performance. To verify the availability of TP-Sim, we estimated three different system configurations on the STREAM benchmark. Compared to the traditional Host CPU-only systems with conventional memory hierarchy, simple GP-PIM architecture achieved better performance; even the Host CPU has the same number of in-order cores. For further study, we also extend TP-Sim as a part of a heterogeneous system simulator that contains CPU, GPGPU, and PIM as its primary and co-processors.

  • PDF

A method of Securing Mass Storage for SQL Server by Sharing Network Disks - on the Amazon EC2 Windows Environments - (네트워크 디스크를 공유하여 SQL 서버의 대용량 스토리지 확보 방법 - Amazon EC2 Windows 환경에서 -)

  • Kang, Sungwook;Choi, Jungsun;Choi, Jaeyoung
    • Journal of Internet Computing and Services
    • /
    • v.17 no.2
    • /
    • pp.1-9
    • /
    • 2016
  • Users are provided infrastructure such as CPU, memory, network, and storage as IaaS (Infrastructure as a Service) service on cloud computing environments. However storage instances cannot support the maximum storage capacity that SQL servers can use, because the capacity of instances provided by service providers is usually limited. In this paper, we propose a method of securing mass storage capacity for SQL servers by sharing network disks with limited storage capacity. We confirmed through experiments that it is possible to secure mass storage capacity, which exceeds the maximum storage capacity provided by an instance with Amazon EBS on Amazon EC2 Windows environments, and it is possible to improve the overall performance of the SQL servers by increasing the disk capacity and performance.

Real-Time Compressed Video Acquisition System for Stereo 360 VR (Stereo 360 VR을 위한 실시간 압축 영상 획득 시스템)

  • Choi, Minsu;Paik, Joonki
    • Journal of Broadcast Engineering
    • /
    • v.24 no.6
    • /
    • pp.965-973
    • /
    • 2019
  • In this paper, Stereo 4K@60fps 360 VR real-time video capture system which consists of video stream capture, video encoding and stitching module is been designed. The system captures stereo 4K@60fps 360 VR video by stitching 6 of 2K@60fps stream which are captured through HDMI interface from 6 cameras in real-time. In video capture phase, video is captured from each camera using multi-thread in real-time. In video encoding phase, raw frame memory transmission and parallel encoding are used to reduce the resource usage in data transmission between video capture and video stitching modules. In video stitching phase, Real-time stitching is secured by stitching calibration preprocessing.

Searching Algorithms for Protein Sequences and Weighted Strings (단백질 시퀀스와 가중치 스트링에 대한 탐색 알고리즘)

  • Kim, Sung-Kwon
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.29 no.8
    • /
    • pp.456-462
    • /
    • 2002
  • We are developing searching algorithms for weighted strings such as protein sequences. Let${\sum}$ be an alphabet and for each $a{\in}{\sum}$ its weight ${\mu}(a)$ is given. Given a string $A=a_1a_2…a_n\; with each ai{\in}{\sum}$, a substring<$A(i.j)=a_ia_{i+1}…a_j$ has weight ${\in}(A(i.j))={\in}(a_i)+{\in}(a_i+1)+…+{\in}(a_j)$.The problem we are dealing with is to preprocess A to build a searching structure, and later, given a query weight M, the structure is used to answer the question of whether there is a substring A(i,j) such that$M={\in}(A(i,j))$.In this paper an algorithm that improves over the previous result will be presented. The previously best known algorithm answers a query in $0(\frac{nlog\;logn}{log\; n})$time using a searching structure that requires O(n) amount of memory. Our algorithm reduces the memory requirement to $0(\frac{n}{log\; n})$ while achieving the same query answer time.

Characteristic of the Class Library for Embedded Java System (내장형 자바 시스템을 위한 클래스 라이브러리의 특성)

  • 양희재
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.7 no.4
    • /
    • pp.788-797
    • /
    • 2003
  • Class library is one of the most crucial element of Java runtime environment in addition to Java virtual machine. In particular, embedded Java system depends heavily on the class library due to having a low bandwidth communication link and a small amount of memory which are a common restriction of embedded system. It is therefore quite necessary to find the characteristic of the class library for embedded Java system to build an efficient Java runtime environment. In this paper we have analyzed the characteristic of the class library for embedded system. The analysis includes sorts of classes in the library, typical size of the file which contains the class, and the composition of constant pool which is a major part of the file. We also have found typical number of field and method a class contains, the sizes of stack and local variable array each method requires, and the length of bytecode in the method. The result of this study can be used to estimate the startup time for class loading and the size of memory to create an instance of class which are a mandatory information to design an efficient embedded Java virtual machine.

Real-time Implementation of a GSM-EFR Speech Coder on a 16 Bit Fixed-point DSP (16 비트 고정 소수점 DSP를 이용한 GSM-EFR 음성 부호화기의 실시간 구현)

  • 최민석;변경진;김경수
    • The Journal of the Acoustical Society of Korea
    • /
    • v.19 no.7
    • /
    • pp.42-47
    • /
    • 2000
  • This paper describes a real-time implementation of a GSM-EFR (Global System for Mobil communications Enhanced Full Rate) speech coder using OakDSP core; a 16bit fixed-point Digital Signal Processor (DSP) by DSP Group, Inc. The real-time implemented speech coder required about 24MIPS for computation and 7.06K words and 12.19K words for code and data memory, respectively. The implemented GSM-EFR speech coder passes all of test vectors provided by ETSI (European Telecommunication Standard Institute), and perceptual speech quality measurement using MNB algorithm shows that the quality of the GSM-EFR speech coder is similar to the one of 32kbps ADPCM. The real-time implemented GSM-EFR speech coder which is the highest bit-rate mode of the GSM-AMR speech coder will be used as the basic structure of the GSM-AMR speech coder which is embedded in MODEM ASIC of IMT2000 asynchronous mode mobile station.

  • PDF

Performance Analysis on Various Design Issues of Turbo Decoder (다양한 Design Issue에 대한 터보 디코더의 성능분석)

  • Park Taegeun;Kim Kiwhan
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.29 no.12A
    • /
    • pp.1387-1395
    • /
    • 2004
  • Turbo decoder inherently requires large memory and intensive hardware complexity due to iterative decoding, despite of excellent decoding efficiency. To decrease the memory space and reduce hardware complexity, various design issues have to be discussed. In this paper, various design issues on Turbo decoder are investigated and the tradeoffs between the hardware complexity and the performance are analyzed. Through the various simulations on the fixed-length analysis, we decided 5-bits for the received data, 6-bits for a priori information, and 7-bits for the quantization state metric, so the performance gets close to that of infinite precision. The MAX operation which is the main function of Log-MAP decoding algorithm is analyzed and the error correction term for MAX* operation can be efficiently implemented with very small hardware overhead. The size of the sliding window was decided as 32 to reduce the state metric memory space and to achieve an acceptable BER.

(Turbo Decoder Design with Sliding Window Log Map for 3G W-CDMA) (3세대 이동통신에 적합한 슬라이딩 윈도우 로그 맵 터보 디코더 설계)

  • Park, Tae-Gen;Kim, Ki-Hwan
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.42 no.9 s.339
    • /
    • pp.73-80
    • /
    • 2005
  • The Turbo decoders based on Log-MAP decoding algorithm inherently requires large amount of memory and intensive complexity of hardware due to iterative decoding, despite of excellent decoding efficiency. To decrease the large amount of memory and reduce hardware complexity, the result of previous research. And this paper design the Turbo decoder applicable to the 3G W-CDMA systems. Through the result of previous research, we decided 5-bits for the received data 6-bits for a priori information, and 7-bits for the quantization state metrics. The error correction term for $MAX^{*}$ operation which is the main function of Log-MAP decoding algorithm is implemented with very small hardware overhead. The proposed Turbo decoder is synthesized in $0.35\mu$m Hynix CMOS technology. The synthesized result for the Turbo decoder shows that it supports a maximum 9Mbps data rate, and a BER of $10^{-6}$ is achieved(Eb/No=1.0dB, 5 iterations, and the interleaver size $\geq$ 2000).

Design and Performance Evaluation of MIN for Nonuniform Traffic (비균등 트래픽을 위한 MIN의 설계 및 성능 평가)

  • Choe, Chang-Hun;Kim, Seong-Cheon
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.37 no.6
    • /
    • pp.1-9
    • /
    • 2000
  • This paper presents a Cluster Oriented Multistage Interconnection Network called COMR. COMR can be constructed suitable for the parallel application with localized communication by providing the shortcut path inside the processor-memory cluster which has frequent data communication. We evaluate the performance of COMR with respect to probability of acceptance, bandwidth, cost-effectiveness and average distance under varying degrees of localized communication. According to the result of analysis for performance evaluation, COMR shows higher performance than the regular MINs of the same network size in the highly localized communication. In the worst case, the diameter of an N$\times$N COMR is only n+1 which has only one stage more as compared the MIN with the same network size. Therefore COMR can be used as an attractive interconnection network for parallel applications with not only the localized communication distribution but also the uniform distribution in shared-memory multiprocessor system.

  • PDF

Design of Viterbi Decoders Using a Modified Register Exchange Method (변형된 레지스터 교환 방식의 비터비 디코더 설계)

  • 이찬호;노승효
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.40 no.1
    • /
    • pp.36-44
    • /
    • 2003
  • This paper proposes a Viterbi decoding scheme without trace-back operations to reduce the amount of memory storing the survivor path information, and to increase the decoding speed. The proposed decoding scheme is a modified register exchange scheme, and is verified by a simulation to give the same results as those of the conventional decoders. It is compared with the conventional decoding schemes such as the trace-back and the register exchange scheme. The memory size of the proposed scheme is reduced to 1/(5 x constraint length) of that of the register exchange scheme, and the throughput is doubled compared with that of the trace-back scheme. A decoder with a code rate of 2/3, a constraint length, K=3 and a trace-back depth of 15 is designed using VHDL and implemented in an FPGA. It is also shown that the modified register exchange scheme can be applied to a block decoding scheme.