• Title/Summary/Keyword: CPU 시간

Search Result 518, Processing Time 0.025 seconds

A channel Routing System using CMOS Standard Cell Library (CMOS 표준 Cell Library를 이용하는 수평 트랙 배선 시스템)

  • 정태성;경종민
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.22 no.1
    • /
    • pp.68-74
    • /
    • 1985
  • In this Paper, we present a non-doglegging channel routing system for If layout using standard cells. This system produces a final two-layer wiring pattern in the horizontal track between two rows, each of which is a linear placement of standard cells of identical heights, satisfying the given net list specification. The layout of CMOS cell library Including nine primitive cells used in this paper is represented in CIF (Caltech Intermediate Form) using λ(Lambda) of 2 microns in Mead-Conway layout representation scheme. The cell dimension and 1/0 characteristics such as name, position and layer type of the pins are stored in Component Library to be used in the channel routing progranl, CROUT. 4 subprogram, NET-PLOT, was used to report a schemdtic layout result, and another subprogram, NETCIF was used to with a full-fledged final layout representation in GIF, A test run for realizing a dynamicmaster-slave D flip-flop with set/reset using primitive cells was shown to take 4 CPU seconds on VAX 11/780.

  • PDF

Fast Non-Adjacent Form (NAF) Conversion through a Bit-Stream Scan (비트열 스캔을 통한 고속의 Non-Adjacent Form (NAF) 변환)

  • Hwang, Doo-Hee;Shin, Jin-Myeong;Choi, Yoon-Ho
    • Journal of KIISE
    • /
    • v.44 no.5
    • /
    • pp.537-544
    • /
    • 2017
  • As a special form of the signed-digit representation, the NAF(non-adjacent form) minimizes the hamming weight by reducing the average density of the non-zero bits from the binary representation of the positive integer k. Due to this advantage, the NAF is used in various fields; in particular, it is actively used in cryptology. The existing NAF-conversion algorithm, however, is problematic because the conversion speed decreases when the LSB(least significant bit) frequently becomes "1" during the binary positive integer conversion process. This paper suggests a method for the improvement of the NAF-conversion speed for which the problems that occur in the existing NAF-conversion process are solved. To verify the performance improvement of the algorithm, the CPU cycle for the various inputs were measured on the ATmega128, a low-performance 8-bit microprocessor. The results of this study show that, compared with the existing algorithm, the suggested algorithm not only improved the processing speed of the major patterns by 20% or more on average, but it also reduced the NAF-conversion time by 13% or more.

Neural networks optimization for multi-dimensional digital signal processing in IoT devices (IoT 디바이스에서 다차원 디지털 신호 처리를 위한 신경망 최적화)

  • Choi, KwonTaeg
    • Journal of Digital Contents Society
    • /
    • v.18 no.6
    • /
    • pp.1165-1173
    • /
    • 2017
  • Deep learning method, which is one of the most famous machine learning algorithms, has proven its applicability in various applications and is widely used in digital signal processing. However, it is difficult to apply deep learning technology to IoT devices with limited CPU performance and memory capacity, because a large number of training samples requires a lot of memory and computation time. In particular, if the Arduino with a very small memory capacity of 2K to 8K, is used, there are many limitations in implementing the algorithm. In this paper, we propose a method to optimize the ELM algorithm, which is proved to be accurate and efficient in various fields, on Arduino board. Experiments have shown that multi-class learning is possible up to 15-dimensional data on Arduino UNO with memory capacity of 2KB and possible up to 42-dimensional data on Arduino MEGA with memory capacity of 8KB. To evaluate the experiment, we proved the effectiveness of the proposed algorithm using the data sets generated using gaussian mixture modeling and the public UCI data sets.

A New Overlap Save Algorithm for Fast Convolution (고속 컨벌루션을 위한 새로운 중첩보류기법)

  • Kuk, Jung-Gap;Cho, Nam-Ik
    • Journal of Broadcast Engineering
    • /
    • v.14 no.5
    • /
    • pp.543-550
    • /
    • 2009
  • The most widely used block convolution method is the overlap save algorithm (OSA), where a block of M data to be convolved with a filter is concatenated with the previous block and 2M-point FFT and multiplications are performed for this overlapped block. By discarding half of the results, we obtain linear convolution results from the circular convolution. This paper proposes a new transform which reduces the block size to only M for the block convolution. The proposed transform can be implemented as the M multiplications followed by M-point FFT Hence, existing efficient FFT libraries and hardware can be exploited for the implementation of proposed method. Since the required transform size is half that of the conventional method, the overall computational complexity is reduced. Also the reduced transform size results in the reduction of data access time and cash miss-hit ratio, and thus the overall CPU time is reduced. Experiments show that the proposed method requires less computation time than the conventional OSA.

Efficient Maximum Intensity Projection using SIMD Instruction and Streaming Memory Transfer (단일 명령 복수 데이터 연산과 순차적 메모리 참조를 이용한 효율적인 최대 휘소 투영 볼륨 가시화)

  • Kye, Hee-Won
    • Journal of Korea Multimedia Society
    • /
    • v.12 no.4
    • /
    • pp.512-520
    • /
    • 2009
  • Maximum intensity projection (MIP) is a volume rendering method which extracts maximum values along the viewing direction through volume data. It visualizes high-density structures, such as angio-graphic datasets so that it is frequently used in medical imaging systems. We have proposed an efficient two-step MIP acceleration method that uses the recent CPUs. First, we exploited SIMD instructions to reduce conditional branch instructions which take up a considerable part of whole rendering process, so that we improved rendering speed. Second, we proposed a new method, which accesses volume and image data successively by modifying the shear-warp rendering. This method improves memory access patterns so that cache misses are reduced. Using the current CPUs, our method improved the rendering speed by a factor of 7 than that of the shear-warp rendering.

  • PDF

Cross Compressed Replication Scheme for Large-Volume Column Storages (대용량 컬럼 저장소를 위한 교차 압축 이중화 기법)

  • Byun, Siwoo
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.14 no.5
    • /
    • pp.2449-2456
    • /
    • 2013
  • The column-oriented database storage is a very advanced model for large-volume data analysis systems because of its superior I/O performance. Traditional data storages exploit row-oriented storage where the attributes of a record are placed contiguously in hard disk for fast write operations. However, for search-mostly datawarehouse systems, column-oriented storage has become a more proper model because of its superior read performance. Recently, solid state drive using MLC flash memory is largely recognized as the preferred storage media for high-speed data analysis systems. In this paper, we introduce fast column-oriented data storage model and then propose a new storage management scheme using a cross compressed replication for the high-speed column-oriented datawarehouse system. Our storage management scheme which is based on two MLC SSD achieves superior performance and reliability by the cross replication of the uncompressed segment and the compressed segment under high workloads of CPU and I/O. Based on the results of the performance evaluation, we conclude that our storage management scheme outperforms the traditional scheme in the respect of update throughput and response time of the column segments.

Grid Acceleration Structure for Efficiently Tracing the Secondary Rays in Dynamic Scenes on Mobile Platforms (모바일 환경에서의 동적 장면의 효율적인 이차 광선 추적을 위한 격자 가속 구조)

  • Seo, Woong;Choi, Byeongjun;Ihm, Insung
    • Journal of KIISE
    • /
    • v.44 no.6
    • /
    • pp.573-580
    • /
    • 2017
  • Despite the recent remarkable advances in the computing power of mobile devices, the heat and battery problems still restrict their performances, particularly compared to PCs. Therefore, in the application of the ray-tracing technique for high-quality rendering, the consideration of a method that traces only the secondary rays while the effects of the primary rays are generated through rasterization-based OpenGL ES rendering is worthwhile. Given that most of the rendering time is for the secondary-ray processing in such a method, a new volume-grid technique for dynamic scenes that enhances the tracing performance of the secondary rays with a low coherence is proposed here. The proposed method attempts to model all of the possible spatial secondary rays in a fixed number of sampling rays, thereby alleviating the visitation problem regarding all of the cells along the ray in a uniform grid. Also, a hybrid rendering pipeline that speeds up the overall rendering performance by exploiting the mobile-device CPU and GPU is presented.

Performance Comparison between Haskell Eval Monad and Cloud Haskell (Haskell Eval 모나드와 Cloud Haskell 간의 성능 비교)

  • Kim, Yeoneo;An, Hyungjun;Byun, Sugwoo;Woo, Gyun
    • Journal of KIISE
    • /
    • v.44 no.8
    • /
    • pp.791-802
    • /
    • 2017
  • Competition in the modern CPU market has shifted from speeding up the clock speed of a single core to increasing the number of cores. As such, there is a growing interest in parallel programming to maximize the use of resources of many core processors. In this paper, we propose parallel programming models in Haskell to find an advisable parallel programming model for many-core environments. Specifically, we used Eval monad and Cloud Haskell to develop two versions of parallel programs: plagiarism detection and K-means. Then, we evaluated the performance of the developed programs in 32-core and 120-core environments. The results of our experiment show that the Eval monad is highly efficient in an environment with a small number of cores. On the other hand, the Cloud Haskell runtime shows 37% improvement over Eval monad and the scalability shows a 134% improvement over Eval monad as the number of cores increases.

Optimization of Graph Processing based on In-Storage Processing (스토리지 내 프로세싱 방식을 사용한 그래프 프로세싱의 최적화 방법)

  • Song, Nae Young;Han, Hyuck;Yeom, Heon Young
    • KIISE Transactions on Computing Practices
    • /
    • v.23 no.8
    • /
    • pp.473-480
    • /
    • 2017
  • In recent years, semiconductor-based storage devices such as flash memory (SSDs) have been developed to high performance. In addition, a trend has been observed of optimally utilizing resources such as the central processing unit (CPU) and memory of the internal controller in the storage device according to the needs of the application. This concept is called In-Storage Processing (ISP). In a storage device equipped with the ISP function, it is possible to process part of the operation executed on the host system, thus reducing the load on the host. Moreover, since the data is processed in the storage device, the data transferred to the host are reduced. In this paper, we propose a method to optimize graph query processing by utilizing these ISP functions, and show that the optimized graph processing method improves the performance of the graph 500 benchmark by up to 20%.

Automatic Virtual Platform Generation for Fast SoC Verification (고속 SoC 검증을 위한 자동 가상 플랫폼 생성)

  • Jung, Jun-Mo
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.9 no.5
    • /
    • pp.1139-1144
    • /
    • 2008
  • In this paper, we propose an automatic generation method of transaction level(TL) model from algorithmic model to verify system specification fast and effectively using virtual platform. The TL virtual platform including structural properties such as timing, synchronization and real-time is one of the effective verification frameworks. However, whenever change system specification or HW/SW mapping, we must rebuild virtual platform and additional design/verification time is required. And the manual description is very time-consuming and error-prone process. To solve these problems, we build TL library which consists of basic components of virtual platform such as CPU, memory, timer. We developed a set of design/verification tools in order to generate a virtual platform automatically. Our tools generate a virtual platform which consists of embedded real-time operating system (RTOS) and hardware components from an algorithmic modeling. And for communication between HW and SW, memory map and device drivers are generated. The effectiveness of our proposed framework has been successfully verified with a Joint Photographic Expert Group (JPEG) and H.264 algorithm. We claim that our approach enables us to generate an application specific virtual platform $100x{\tims}1000x$ faster than manual designs. Also, we can refine an initial platform incrementally to find a better HW/SW mapping. Furthermore, application software can be concurrently designed and optimized as well as RTOS by the generated virtual platform