• Title/Summary/Keyword: CPU 시간 (CPU time)

Basic Study on Performance Comparison of Structural Optimization Software Systems (구조최적설계 소프트웨어의 성능 비교에 대한 기초연구)

  • Choi, Wook Han;Huang, Cheng Guo;Park, Gyung-Jin;Kim, Tai-Kyung
    • Transactions of the Korean Society of Mechanical Engineers A / v.38 no.12 / pp.1403-1413 / 2014
  • Structural optimization is widely accepted in industry as a means of improving structural performance, and its use has grown recently thanks to well-developed commercial design software. Three popular commercial structural optimization systems are investigated and compared: MSC.Nastran, Genesis, and OptiStruct. Their performance is analyzed in terms of the quality of the optimum solution and the computational time. Size, shape, and topology optimizations under linear static response are explored and compared using several test examples. For a fair comparison, the systems are run in the same environment and the optimization parameters that affect performance are unified. The optimization results are analyzed, and the performance and characteristics of each software system are discussed.

An Efficient Hardware Implementation of CABAC Using H/W-S/W Co-design (H/W-S/W 병행설계를 이용한 CABAC의 효율적인 하드웨어 구현)

  • Cho, Young-Ju;Ko, Hyung-Hwa
    • Journal of Advanced Navigation Technology / v.18 no.6 / pp.600-608 / 2014
  • In this paper, a CABAC H/W module is developed using a co-design method. After the entire H.264/AVC encoder was developed in C using the reference software (JM), the CABAC H/W IP was developed as a block of the H.264/AVC encoder. The context modeller of CABAC is included in the hardware to update the changed values during binary encoding, which enables efficient memory usage and an efficient I/O stream design. The hardware IP operates together with the H.264/AVC reference software JM and runs on a Virtex-4 FX60 FPGA on an ML410 board. Functional simulation is done using ModelSim. Compared with an existing CABAC H/W module designed at the register level, the development time is greatly reduced and software engineers can design the H/W module more easily. As a result, the number of slices used by the CABAC module is less than 1/3 of that used by the CAVLC module. The proposed co-design method is useful for providing a hardware accelerator where a highly efficient video encoder in an embedded system needs to be sped up.

A Study of Purity-based Page Allocation Scheme for Flash Memory File Systems (플래시 메모리 파일 시스템을 위한 순수도 기반 페이지 할당 기법에 대한 연구)

  • Baek, Seung-Jae;Choi, Jong-Moo
    • The KIPS Transactions:PartA / v.13A no.5 s.102 / pp.387-398 / 2006
  • In this paper, we propose a new page allocation scheme for flash memory file systems. The proposed scheme allocates pages by exploiting the concept of purity, which is defined as the fraction of blocks where valid pages and invalid pages coexist. The purity determines the cost of block cleaning, that is, the portion of pages to be copied and blocks to be erased during block cleaning. To enhance the purity, the scheme classifies data as hot-modified or cold-modified and allocates the two classes to different blocks. The hot/cold classification is based on both static properties, such as the attributes of the data, and dynamic properties, such as the frequency of modifications. We have implemented the proposed scheme in YAFFS and evaluated its performance on an embedded board equipped with a 400 MHz XScale CPU, 64 MB of SDRAM, and 64 MB of NAND flash memory. Performance measurements show that the proposed scheme reduces block cleaning time by up to 15.4 seconds, with an average of 7.8 seconds, compared with the typical YAFFS. The improvement also grows as the utilization of flash memory increases.

A Parallel Bulk Loading Method for $B^+$-Tree Using CUDA (CUDA를 활용한 병렬 $B^+$-트리 벌크로드 기법)

  • Sung, Joo-Ho;Lee, Yoon-Woo;Han, A;Choi, Won-Ik;Kwon, Dong-Seop
    • Journal of KIISE:Computing Practices and Letters / v.16 no.6 / pp.707-711 / 2010
  • Most relational database systems provide $B^+$-trees as their main index structures and use bulk-loading techniques to create new $B^+$-trees on existing data from scratch. Although bulk loading is more efficient than inserting keys one by one, it is still time-consuming because all the keys of a large data set have to be sorted. To improve bulk-loading performance, this paper proposes an efficient parallel bulk-loading method for $B^+$-trees based on CUDA, a parallel computing architecture developed by NVIDIA that harnesses the computing power of graphics processing units for general-purpose computing. Experimental results show that the proposed method improves performance by more than 70 percent compared with existing bulk-loading methods.
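To make the bulk-loading idea concrete, here is a minimal sequential sketch in Python. It is not the paper's CUDA method: it only illustrates the sort-then-build-bottom-up pattern the abstract describes. The names `bulk_load` and `contains`, the `fanout` parameter, and the simplified node layout (each internal node stores the smallest key of each child) are assumptions made for illustration.

```python
from bisect import bisect_right


def bulk_load(keys, fanout=4):
    """Build a simplified B+-tree bottom-up; returns levels, leaves first, root last."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    ordered = sorted(set(keys))  # the sorting step that dominates the cost
    # Pack the sorted keys into leaves holding at most `fanout` keys each.
    level = [ordered[i:i + fanout] for i in range(0, len(ordered), fanout)] or [[]]
    levels = [level]
    # Group nodes under parents until a single root remains.
    while len(level) > 1:
        mins = [node[0] for node in level]  # smallest key of each child
        level = [mins[i:i + fanout] for i in range(0, len(mins), fanout)]
        levels.append(level)
    return levels


def contains(levels, key, fanout=4):
    """Descend from the root to a leaf and test membership (same fanout as the build)."""
    idx = 0  # position of the current node within its level
    for depth in range(len(levels) - 1, 0, -1):
        node = levels[depth][idx]
        # Follow the last child whose smallest key is <= key (or the first child).
        slot = max(bisect_right(node, key) - 1, 0)
        idx = idx * fanout + slot
    return key in levels[0][idx]
```

For example, `bulk_load([9, 3, 7, 1], fanout=2)` sorts the keys once and then packs them level by level, so no node is ever split. The sort and the per-level packing are the kind of steps a parallel implementation would target.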

An Implicit Integration Method for Joint Coordinate Subsystem Synthesis Method (조인트 좌표계를 이용한 부분시스템 합성방법의 내재적 적분기법)

  • Jo, Jun-Youn;Kim, Myoung-Ho;Kim, Sung-Soo
    • Transactions of the Korean Society of Mechanical Engineers A / v.36 no.4 / pp.437-442 / 2012
  • To analyze a multibody system, this paper proposes an implicit numerical integration method for the joint-coordinate subsystem synthesis method. To verify the proposed method, a multibody model of an unmanned robot vehicle, which consists of six identical independent suspension systems, is developed. A symbolic method is applied to compute the system Jacobian matrix for the implicit integration method. The proposed method is also verified by performing a rough-terrain run-over simulation and comparing it with the conventional implicit integration method. In addition, to evaluate the efficiency of the proposed method, its CPU time is compared with that of the conventional implicit method.
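As a generic illustration of implicit integration and CPU-time measurement, the sketch below applies backward Euler with Newton iterations to a scalar ODE and times it with Python's process CPU clock. It is not the subsystem synthesis method of the paper: the scalar problem, the function names `backward_euler` and `cpu_time_of`, and the tolerances are all assumptions chosen for illustration.

```python
import time


def backward_euler(f, dfdy, y0, h, steps, tol=1e-12, max_iter=50):
    """Integrate y' = f(y) with backward Euler, solving each step by Newton's method."""
    y = y0
    for _step in range(steps):
        y_next = y  # initial guess: the previous value
        for _iteration in range(max_iter):
            # Residual of the implicit equation y_next = y + h * f(y_next).
            residual = y_next - y - h * f(y_next)
            # Scalar counterpart of the system Jacobian of the residual.
            jacobian = 1.0 - h * dfdy(y_next)
            delta = residual / jacobian
            y_next -= delta
            if abs(delta) < tol:
                break
        y = y_next
    return y


def cpu_time_of(func, *args):
    """Return (result, CPU seconds) for one call, using the process CPU clock."""
    start = time.process_time()
    result = func(*args)
    return result, time.process_time() - start
```

A call such as `cpu_time_of(backward_euler, lambda y: -50.0 * y, lambda y: -50.0, 1.0, 0.1, 100)` returns the integrated value together with the CPU time consumed, which is how two integrators could be compared on equal terms.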

Analysis of GPU Performance and Memory Efficiency according to Task Processing Units (작업 처리 단위 변화에 따른 GPU 성능과 메모리 접근 시간의 관계 분석)

  • Son, Dong Oh;Sim, Gyu Yeon;Kim, Cheol Hong
    • Smart Media Journal / v.4 no.4 / pp.56-63 / 2015
  • A modern GPU can execute massively parallel computations by exploiting its many cores. The GPGPU architecture, one approach to exploiting the outstanding computational resources of a GPU, effectively executes general-purpose applications as well as graphics applications. In this paper, we investigate how the number of CTAs (Cooperative Thread Arrays) on an SM (Streaming Multiprocessor) affects memory efficiency and performance, since understanding this relationship can guide researchers who study GPUs in improving performance. Our simulation results show that, for most benchmarks, increasing the number of CTAs on an SM improves performance. Some benchmarks, however, show no performance improvement. This is because the number of CTAs generated from the same kernel is small or the number of CTAs executed simultaneously is insufficient. To analyze performance according to the number of CTAs on an SM more precisely, we also analyze the relations between performance and memory stalls, DRAM stalls due to interconnect congestion, and pipeline stalls at the memory stage. We expect our analysis to help studies that aim to improve parallelism and memory efficiency on GPGPU architectures.

Improved Power Allocation to Enhance the Capacity in OFDMA System for Proportional Resource Allocation (Proportional 자원할당을 위한 OFDMA 시스템에서 채널 용량을 증대시키기 위한 향상된 전력 할당 기법)

  • Var, Puthnith;Shrestha, Robin;Kim, JaeMoung
    • The Journal of Korean Institute of Communications and Information Sciences / v.38A no.7 / pp.580-591 / 2013
  • Orthogonal Frequency Division Multiple Access (OFDMA) is considered a novel modulation and multiple access technique for 4th generation wireless systems. In this paper, we formulate a base station's power allocation algorithm for each user to maximize the users' sum rate, subject to constraints on total power, bit error rate, and rate proportionality among the users, for a better proportional rate adaptive (RA) resource allocation method for OFDMA-based systems. We propose a novel power allocation method based on the proportion of subcarrier allocation and the user's normalized proportionality constant. We adapt a greedy algorithm and the waterfilling technique for allocating the subcarriers among the users. In an end-to-end simulation, we validate that the proposed technique achieves higher system capacity and lower CPU execution time while maintaining acceptable rate proportionality among users.
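The waterfilling technique mentioned above can be sketched in its classic single-user form: each subcarrier i receives power p_i = max(0, mu - 1/g_i), where the water level mu is chosen so that the powers sum to the budget. This is not the paper's proportional allocation method. The function name `water_filling` and the interpretation of `gains` as gain-to-noise ratios are assumptions made for illustration.

```python
def water_filling(gains, total_power):
    """Return per-subcarrier powers p_i = max(0, mu - 1/g_i) that sum to total_power."""
    if total_power < 0 or any(g <= 0 for g in gains):
        raise ValueError("total_power must be non-negative and gains positive")
    floors = sorted(1.0 / g for g in gains)  # inverse gains, best channel first
    mu = 0.0
    # Try the k best channels, from all of them down to one, and stop at the
    # first k whose water level rises above the worst floor among those k.
    for k in range(len(floors), 0, -1):
        mu = (total_power + sum(floors[:k])) / k
        if mu > floors[k - 1]:
            break
    return [max(0.0, mu - 1.0 / g) for g in gains]
```

With gains `[2.0, 1.0, 0.1]` and a budget of 1.0, the weakest subcarrier sits above the water level and receives no power, while the two stronger ones share the budget. Stronger channels always receive at least as much power as weaker ones.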

Runtime Prediction Based on Workload-Aware Clustering (병렬 프로그램 로그 군집화 기반 작업 실행 시간 예측모형 연구)

  • Kim, Eunhye;Park, Ju-Won
    • Journal of Korean Society of Industrial and Systems Engineering / v.38 no.3 / pp.56-63 / 2015
  • Several fields of science have demanded large-scale workflow support, which requires thousands of CPU cores or more. To support such large-scale scientific workflows, high-capacity parallel systems such as supercomputers are widely used. To increase the utilization of these systems, most schedulers use a backfilling policy: small jobs are moved ahead to fill holes in the schedule as long as they do not delay large jobs. Since backfilling requires an estimate of the runtime, most parallel systems rely on users' runtime estimates. However, these estimates are found to be extremely inaccurate because users overestimate the runtime of their jobs. Therefore, in this paper, we propose a novel system for runtime prediction based on workload-aware clustering, with the goal of improving prediction performance. The proposed method for runtime prediction of parallel applications consists of three main phases. First, feature selection based on factor analysis is performed to identify important input features. Then, a clustering analysis of historical data is performed using a self-organizing map, followed by hierarchical clustering to find the cluster boundaries from the weight vectors. Finally, prediction models are constructed using support vector regression with the clustered workload data. Using a separate prediction model for each clustered data pattern can reduce the error rate compared with a single model for the whole data pattern. In the experiments, we use workload logs from parallel systems (i.e., iPSC, LANL-CM5, SDSC-Par95, SDSC-Par96, and CTC-SP2) to evaluate the effectiveness of our approach. Compared with other techniques, experimental results show that the proposed method improves accuracy by up to 69.08%.

Fuzzy Logic-based Grid Job Scheduling Model for Computational Grid (계산 그리드를 위한 퍼지로직 기반의 그리드 작업 스케줄링 모델)

  • Park, Yang-Jae;Jang, Sung-Ho;Cho, Kyu-Cheol;Lee, Jong-Sik
    • Journal of the Korea Society of Computer and Information / v.12 no.5 / pp.49-56 / 2007
  • This paper deals with grid job allocation and grid resource scheduling to provide a stable and faster job processing service to grid users. We propose a fuzzy logic-based grid job scheduling model for effective job scheduling in a computational grid environment. The model measures the resource efficiency of all grid resources with a fuzzy logic system based on diverse input parameters, such as CPU speed and network latency, and divides the resources into several groups by resource efficiency. The model then allocates jobs to the resources of the group with the highest resource efficiency. For performance evaluation, we implemented the model in the DEVS modeling and simulation environment and measured the reduction rates of turnaround time, job loss, and communication messages in comparison with existing job scheduling models such as the random scheduling model and the MCT (Minimum Completion Time) model. Experimental results show that the proposed model is useful for improving the QoS of the grid job processing service.
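The MCT (Minimum Completion Time) baseline named above can be sketched as a simple greedy rule: each job, taken in arrival order, goes to the resource on which it would finish earliest. This illustrates only the baseline, not the proposed fuzzy logic model. The function name `mct_schedule` and the modelling of jobs as lengths and resources as speeds are assumptions made for illustration.

```python
def mct_schedule(job_lengths, resource_speeds):
    """Assign each job, in order, to the resource with the earliest completion time.

    Returns (assignment, makespan), where assignment[i] is the resource index
    chosen for job i.
    """
    ready = [0.0] * len(resource_speeds)  # time at which each resource becomes free
    assignment = []
    for length in job_lengths:
        # Completion time of this job on every resource.
        finish = [ready[r] + length / speed for r, speed in enumerate(resource_speeds)]
        best = min(range(len(finish)), key=finish.__getitem__)  # ties: lowest index
        ready[best] = finish[best]
        assignment.append(best)
    return assignment, max(ready, default=0.0)
```

Because the rule considers only the current job, it can leave slower resources idle early on. Grouping resources by a broader efficiency measure, as the abstract describes, is one way to refine such a baseline.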

Image Mosaic from a Video Sequence using Block Matching Method (블록매칭을 이용한 비디오 시퀀스의 이미지 모자익)

  • 이지근;정성태
    • Journal of the Korea Institute of Information and Communication Engineering / v.7 no.8 / pp.1792-1801 / 2003
  • These days, image mosaicking is attracting interest in fields such as advertising, tourism, games, and medical imaging, driven by the development of internet technology and the performance of personal computers. The main problem in image mosaicking is correctly finding corresponding points in the overlapping area between images. However, previous methods require a lot of CPU time and data processing to find corresponding points, and they need repeated recording with a 360-degree revolution around objects or the background. This paper presents a new image mosaic method that generates a panorama image from a video sequence recorded by a general video camera. Our method finds the corresponding points between two successive images using a new direction-oriented 3-step block matching method. Experimental results show that the suggested method is more efficient than methods based on existing block matching algorithms, such as the full search and K-step search algorithms.
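For reference, the full search baseline mentioned above can be sketched as follows: a block from one frame is compared against every candidate position within a search radius in the next frame, and the displacement with the lowest sum of absolute differences (SAD) wins. This is the exhaustive baseline, not the paper's direction-oriented 3-step method. The function names, the use of SAD as the cost, and frames stored as lists of rows are assumptions made for illustration.

```python
def sad(frame_a, frame_b, ay, ax, by, bx, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(frame_a[ay + dy][ax + dx] - frame_b[by + dy][bx + dx])
        for dy in range(size)
        for dx in range(size)
    )


def full_search(prev, curr, y, x, size, radius):
    """Return the displacement (dy, dx) of the block at (y, x) in prev within curr."""
    height, width = len(curr), len(curr[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ty, tx = y + dy, x + dx
            # Skip candidates whose block would fall outside the frame.
            if 0 <= ty <= height - size and 0 <= tx <= width - size:
                cost = sad(prev, curr, y, x, ty, tx, size)
                if best is None or cost < best[0]:
                    best = (cost, dy, dx)
    if best is None:
        raise ValueError("no candidate block fits inside the frame")
    return best[1], best[2]
```

The search evaluates up to (2 * radius + 1) squared candidates per block, which is the CPU cost that step-based searches reduce by testing only a few candidates per step and narrowing the step size.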