Search | Korea Science

A Hybrid Value Predictor using Dynamic Classification in Superscalar Processors (슈퍼스칼라 프로세서에서 동적 분류를 사용한 하이브리드 결과 값 예측기)

Shin, Young-Ho;Yoon, Sung-Lyong;Park, Hong-Jun;Lee, Won-Mo;Kim, Ju-Ik;Cho, Young-Il
- Proceedings of the Korea Information Processing Society Conference / 2000.04a / pp.544-549 / 2000
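To improve the performance of superscalar processors, stalls caused by data dependencies must be eliminated. Recent studies have investigated mechanisms that predict the result values of instructions in order to remove these data dependencies. Among value prediction mechanisms, hybrid schemes that combine several predictors can achieve better performance than schemes that use a single predictor. However, existing hybrid predictors store the same instruction redundantly and therefore require a large amount of hardware. This paper proposes a new hybrid prediction mechanism that achieves high performance by exploiting the strengths of several predictors. It also dynamically identifies hard-to-predict instructions and does not predict them, which reduces the misprediction penalty and raises prediction accuracy. Simulation results on the SPECint95 benchmark programs show that the proposed hybrid predictor improves the prediction rate from an average of 79% to 90% and lowers the misprediction rate from an average of 12% to 2%.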
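The idea of combining predictors while declining to predict hard instructions can be illustrated with a minimal sketch. This is not the paper's design: the class `HybridValuePredictor`, the choice of last-value and stride components, the 2-bit confidence counters, and the threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    last: int = 0          # most recent result value seen for this instruction
    stride: int = 0        # difference between the last two result values
    conf_last: int = 0     # saturating confidence of the last-value component
    conf_stride: int = 0   # saturating confidence of the stride component


class HybridValuePredictor:
    """Toy hybrid of last-value and stride prediction (hypothetical design)."""

    THRESHOLD = 2   # assumed: predict only when confidence reaches this level
    MAX_CONF = 3    # assumed: 2-bit saturating counters

    def __init__(self):
        self.table = {}  # program counter -> Entry

    def predict(self, pc):
        """Return a predicted value, or None when the instruction is not predicted."""
        e = self.table.get(pc)
        if e is None:
            return None
        if e.conf_stride >= e.conf_last and e.conf_stride >= self.THRESHOLD:
            return e.last + e.stride
        if e.conf_last >= self.THRESHOLD:
            return e.last
        return None  # low confidence: treated as hard to predict

    def update(self, pc, actual):
        """Train both components with the actual result value."""
        e = self.table.get(pc)
        if e is None:
            self.table[pc] = Entry(last=actual)
            return
        e.conf_last = self._bump(e.conf_last, actual == e.last)
        e.conf_stride = self._bump(e.conf_stride, actual == e.last + e.stride)
        e.stride = actual - e.last
        e.last = actual

    def _bump(self, conf, hit):
        return min(conf + 1, self.MAX_CONF) if hit else max(conf - 1, 0)
```

With this sketch, an instruction producing 0, 4, 8, 12 is predicted as 16 on its next execution, while an instruction with irregular results never reaches the threshold and is simply left unpredicted, which is how a misprediction penalty would be avoided.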

A Design of Multimedia Application SoC based with Processor using BTB (BTB를 이용한 프로세서 기반 멀티미디어 응용 SoC 설계)

Jung, Younjin;Lee, Byungyup;Ryoo, Kwangki
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2009.10a / pp.397-400 / 2009
This paper describes the ASIC design of a multimedia application SoC platform based on a RISC processor with a BTB (Branch Target Buffer). To improve platform performance, we use a simple branch prediction scheme, the BTB structure, which stores the target address of each branch instruction to remove pipeline hazards. The platform also includes a number of peripherals, such as a VGA controller, AC97 controller, UART controller, SRAM interface, and debug interface. The platform was designed and verified on a Xilinx Virtex-4 FPGA using a number of test programs for functional tests and timing constraints. Finally, the platform was implemented as a single ASIC chip that operates at a 100 MHz clock frequency using the Chartered 0.18 um process. Performance estimation shows that the proposed platform achieves about a 5~9% performance improvement over the previous SoC platform.
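How a BTB stores and returns branch targets can be sketched in software. The following is a minimal model under stated assumptions, not the paper's hardware: a direct-mapped organization, 4-byte aligned instructions, and the names `BranchTargetBuffer`, `lookup`, and `update` are all hypothetical.

```python
class BranchTargetBuffer:
    """Direct-mapped BTB model: indexed by low PC bits, tagged with the rest."""

    def __init__(self, entries=64):
        assert entries & (entries - 1) == 0, "entries must be a power of two"
        self.entries = entries
        self.tags = [None] * entries   # None marks an empty entry
        self.targets = [0] * entries

    def _split(self, pc):
        word = pc >> 2  # assumption: instructions are 4-byte aligned
        return word % self.entries, word // self.entries

    def lookup(self, pc):
        """Return the stored target for this PC, or None on a BTB miss."""
        index, tag = self._split(pc)
        return self.targets[index] if self.tags[index] == tag else None

    def update(self, pc, target):
        """Record the resolved target of a taken branch, replacing any alias."""
        index, tag = self._split(pc)
        self.tags[index] = tag
        self.targets[index] = target
```

On a hit, the fetch stage could redirect to the stored target in the same cycle instead of waiting for the branch to resolve; on a miss it would continue sequentially.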

A Validation Check of Simulation Model with the Model Transformation (모델변환에 의한 시뮬레이션 모델의 타당성 검사)

정영식
- Proceedings of the Korea Society for Simulation Conference / 1992.10a / pp.9-9 / 1992
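Simulation is a scientific problem-solving approach that seeks to understand, analyze, predict, and evaluate the behavior of a real system so that the system can be operated effectively and efficiently. A simulation study is divided into a modeling phase, which builds a valid model that accurately reflects the behavior of the real system, and an implementation phase, which writes the intended instructions for the model as a computer program. Simulation models vary according to viewpoints such as time, state, random variables, and interaction rules. The DEVS (Discrete Event System Specification) model can analyze the state of a system according to events that occur discretely in continuous time, and it provides a solid theoretical foundation for formalizing modeling and simulation methodology. The DEVS model also offers modular and hierarchical properties and a mathematical formalism based on set theory, which supports systematic analysis of the real system and enables more realistic modeling. However, if an invalid DEVS model is built, the analysis results obtained through simulation become unreliable, yielding no benefit and only economic loss. Existing validation of DEVS models requires much time and effort and demands the specialized, experience-based knowledge that comes from repeated DEVS modeling. Moreover, under the experimental frame set by the model designer, it is difficult to check that the commutative property is preserved for the state transition function, time advance function, and output function that make up a DEVS model. To solve these problems, this study proposes a simple and effective validation method in which the DEVS model is transformed into an SPN (Stochastic Petri Net) model and the SPN model is used for validation. First, we review the concept of the DEVS model and existing validation methods for it, and explain their problems in detail. We then re-examine the concept of the SPN model used for validation and the viewpoint needed to transform a DEVS model into a behaviorally equivalent SPN model. We present a transformation theory showing that a DEVS model can be expressed as an SPN model whose state representation is the same from an identical viewpoint, and we present a model transformation procedure based on this theory. Based on the transformation theory and procedure, we define a new homogeneous function for validation and, together with the properties of the SPN model, propose a new validation method for DEVS models.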
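For readers unfamiliar with DEVS, the three functions named in the abstract (state transition, time advance, output) can be sketched on a toy atomic model. This is only an illustration of the general formalism, not the paper's model or its SPN transformation; the `Processor` model, its drop-when-busy behavior, and the `simulate` loop are assumptions.

```python
import math


class Processor:
    """Toy atomic DEVS-style model: idle until a job arrives, then busy."""

    def __init__(self, service_time=3.0):
        self.service_time = service_time
        self.phase = "idle"
        self.job = None
        self.sigma = math.inf  # time remaining until the next internal event

    def time_advance(self):
        """Time advance function: how long the model stays in this state."""
        return self.sigma

    def output(self):
        """Output function: evaluated just before an internal transition."""
        return self.job

    def delta_int(self):
        """Internal state transition: the job is finished."""
        self.phase, self.job, self.sigma = "idle", None, math.inf

    def delta_ext(self, elapsed, job):
        """External state transition: a job arrives after `elapsed` time."""
        if self.phase == "idle":
            self.phase, self.job, self.sigma = "busy", job, self.service_time
        else:
            self.sigma -= elapsed  # keep the remaining service time; drop the job


def simulate(model, arrivals):
    """Run the model over (time, job) arrivals sorted by time; return outputs."""
    t_last, outputs, pending = 0.0, [], list(arrivals)
    while True:
        t_int = t_last + model.time_advance()
        t_ext = pending[0][0] if pending else math.inf
        if math.isinf(t_int) and math.isinf(t_ext):
            return outputs
        if t_int <= t_ext:  # internal event fires first (ties go to internal)
            outputs.append((t_int, model.output()))
            model.delta_int()
            t_last = t_int
        else:
            when, job = pending.pop(0)
            model.delta_ext(when - t_last, job)
            t_last = when
```

Validating such a model means checking that these functions together reproduce the intended behavior, which is the step the abstract proposes to carry out on an equivalent SPN model instead.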

Web-based Practice Education Supporting System for Computational Chemistry (웹기반 계산화학 실습교육 지원시스템 개발)

Ahn, Bu-Young;Lee, Jong-Suk Ruth;Cho, Kum-Won
- The Journal of Korean Institute for Practical Engineering Education / v.3 no.2 / pp.18-26 / 2011
Computational chemistry is a field of chemistry that tackles theoretical chemistry problems through computer calculation; it can be described as a chemistry lab moved into computer space. With recent gains in computer processing capability, high-performance computers have become essential in computational chemistry for complex calculations and simulations of very large molecular structures. Running such calculations on a high-performance computer requires commands and consoles, yet most students in natural science and engineering are not computer experts and are likely to be unfamiliar with UNIX. A web-based educational support system is therefore needed so that they can practice computational chemistry without knowing UNIX commands. In this study, e-Chem, one such educational support system, is developed on the Liferay portal platform, an open-source Java portal that is more standards-oriented than other web portals and strong in content management and collaboration. With this system, even students who are not familiar with computers are expected to take part in lab classes, save the time otherwise spent learning UNIX commands, and learn more efficiently through a familiar interface.

The Design of 32 Bit Microprocessor for Sequence Control Using FPGA (FPGA를 이용한 시퀀스 제어용 32비트 마이크로프로세서 설계)

Yang, Oh
- Journal of the Institute of Electronics Engineers of Korea SD / v.40 no.6 / pp.431-441 / 2003
This paper presents the design of a 32-bit microprocessor for sequence control using a field-programmable gate array (FPGA). The microprocessor was designed in VHDL with a top-down method, and the program memory was separated from the data memory for high-speed execution of sequence instructions, so that sequence instructions can execute during the instruction fetch cycle. To reduce the instruction decoding time and the data memory interface time, the instruction code size was set to 32 bits. Real-time debug operations, namely single-step run, PC breakpoint run, and data memory breakpoint run, were implemented to make the designed processor easy to debug. For sequence logic control, the microprocessor also implements pulse instructions, step controllers, master controllers, BM and BCD type arithmetic instructions, and barrel shift instructions. The design was synthesized with Xilinx Foundation 4.2i Project Manager for a V600EHQ240 FPGA, which contains 600,000 gates. Simulation and experiments were both performed successfully. To demonstrate its performance, the designed microprocessor was compared with the H8S/2148 microprocessor, which has many bit instructions for sequence logic control, and it showed good performance.

Simple Recovery Mechanism for Branch Misprediction in Global-History-Based Branch Predictors Allowing the Speculative Update of Branch History (분기 히스토리의 모험적 갱신을 허용하는 전역 히스토리 기반 분기예측기에서 분기예측실패를 위한 간단한 복구 메커니즘)

Ko, Kwang-Hyun;Cho, Young-Il
- Journal of KIISE:Computer Systems and Theory / v.32 no.6 / pp.306-313 / 2005
Conditional branch prediction is an important technique for improving processor performance. Branch mispredictions, however, waste a large number of cycles, inhibit out-of-order execution, and waste electric power on mis-speculated instructions. Hence, a branch predictor with higher accuracy is necessary for good processor performance. In global-history-based predictors such as gshare and GAg, many mispredictions come from updating the history only at commit time. Some works on this subject have discussed the need for speculative update of the history and for recovery mechanisms after branch mispredictions. In this paper, we present a simple mechanism for recovering the branch history after a misprediction. The proposed mechanism adds an age_counter to the original predictor and doubles the size of the branch history register. The age_counter counts the number of outstanding branches, and this count is used to recover the branch history register. Simulation results on the SimpleScalar 3.0/PISA tool set and the SPECINT95 benchmarks show that gshare and GAg with the proposed recovery mechanism improved the average prediction accuracy by 2.14% and 9.21%, respectively, and the average IPC by 8.75% and 18.08%, respectively, over the original predictor.
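One plausible reading of the recovery scheme can be sketched as follows. The abstract does not give the exact update rules, so everything here is an assumption: branches resolve in order, a misprediction squashes all younger branches, and the names `SpeculativeHistory`, `predict_update`, and `resolve` are hypothetical. The sketch shows why a doubled register helps: older history bits pushed out of the low half by speculative updates are still present and can be shifted back.

```python
class SpeculativeHistory:
    """Global history with speculative update and age_counter-based recovery."""

    def __init__(self, n_bits=8):
        self.n = n_bits
        self.bhr = 0          # 2*n bits: the predictor reads only the low n bits
        self.age_counter = 0  # branches predicted but not yet resolved

    def history(self):
        """History bits visible to the predictor (low n bits)."""
        return self.bhr & ((1 << self.n) - 1)

    def predict_update(self, predicted_taken):
        """Speculatively shift the predicted outcome into the history."""
        assert self.age_counter < self.n, "sketch assumes at most n outstanding branches"
        mask = (1 << (2 * self.n)) - 1
        self.bhr = ((self.bhr << 1) | int(predicted_taken)) & mask
        self.age_counter += 1

    def resolve(self, correct, actual_taken=None):
        """Resolve the oldest outstanding branch (in-order resolution assumed)."""
        if correct:
            self.age_counter -= 1
            return
        # Misprediction: drop this branch's bit and every younger speculative
        # bit, then insert the actual outcome. Younger branches are squashed.
        self.bhr >>= self.age_counter
        self.bhr = (self.bhr << 1) | int(actual_taken)
        self.age_counter = 0
```

The right shift by age_counter restores the history to its state before the mispredicted branch without any checkpoint storage, which matches the abstract's claim of a simple mechanism.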

Accelerating Symmetric and Asymmetric Cryptographic Algorithms with Register File Extension for Multi-words or Long-word Operation (다수 혹은 긴 워드 연산을 위한 레지스터 파일 확장을 통한 대칭 및 비대칭 암호화 알고리즘의 가속화)

Lee Sang-Hoon;Choi Lynn
- Journal of the Institute of Electronics Engineers of Korea CI / v.43 no.2 s.308 / pp.1-11 / 2006
In this paper, we propose a new register file architecture called the Register File Extension for Multi-words or Long-word Operation (RFEMLO) to accelerate both symmetric and asymmetric cryptographic algorithms. Based on the observation that most cryptographic algorithms make heavy use of multi-word or long-word operations, RFEMLO allows multiple contiguous registers to be specified as a single operand. Thus, a single instruction can specify a SIMD-style multi-word operation or a long-word operation. RFEMLO can be applied to general-purpose processors by adding instructions for multi-word or long-word operands and functional units for the additional instructions. To evaluate the performance of RFEMLO, we use SimpleScalar/ARM 3.0 (with gcc 2.95.2) and run detailed simulations on various symmetric and asymmetric cryptographic algorithms. Applying RFEMLO reduces the total instruction count by up to 62% for symmetric and up to 70% for asymmetric cryptographic algorithms. Performance results also show that, when RFEMLO is applied to a processor with an in-order pipeline, a speedup of 1.4 to 2.6 can be obtained for symmetric cryptographic algorithms and a speedup of 2.5 to 3.3 for asymmetric cryptographic algorithms. We also found that RFEMLO improves the performance of these cryptographic algorithms at much lower cost than the issue-width increase available in superscalar implementations. Moreover, RFEMLO can also be applied to a superscalar processor, leading to additional performance gains of 83% and 138% for symmetric and asymmetric cryptographic algorithms, respectively.
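The notion of treating contiguous registers as a single long-word operand can be illustrated in software. This sketch only models the semantics such an instruction might have; the 32-bit register width, the least-significant-word-first layout, and the names `long_add`, `to_words`, and `from_words` are assumptions and do not come from the paper.

```python
MASK32 = 0xFFFFFFFF


def to_words(value, nwords):
    """Split an integer into nwords 32-bit words, least significant first."""
    return [(value >> (32 * i)) & MASK32 for i in range(nwords)]


def from_words(words):
    """Join 32-bit words (least significant first) back into one integer."""
    return sum(w << (32 * i) for i, w in enumerate(words))


def long_add(regs, rd, rs, rt, nwords):
    """Model of one hypothetical long-word add over contiguous registers.

    Treats regs[rs:rs+nwords] and regs[rt:rt+nwords] as single operands,
    writes the sum to regs[rd:rd+nwords], and returns the carry out.
    """
    carry = 0
    for i in range(nwords):
        total = regs[rs + i] + regs[rt + i] + carry
        regs[rd + i] = total & MASK32
        carry = total >> 32
    return carry
```

Without such an operand form, the same 128-bit addition would take one add and three add-with-carry instructions on a 32-bit machine, which is the kind of instruction count reduction the abstract reports.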

A Hybrid Value Predictor using Static and Dynamic Classification in Superscalar Processors (슈퍼스칼라 프로세서에서 정적 및 동적 분류를 사용한 혼합형 결과 값 예측기)

김주익;박홍준;조영일
- Journal of KIISE:Computer Systems and Theory / v.30 no.10 / pp.569-578 / 2003
Data dependencies are one of the major hurdles limiting ILP (Instruction Level Parallelism), and several related works have suggested that the limit they impose can be overcome to some extent with data value prediction. A hybrid value predictor can obtain high prediction accuracy by combining the advantages of various predictors, but it has the drawback that the same instruction occupies overlapping entries in every component predictor. In this paper, we propose a new hybrid value predictor that achieves high performance by using static and dynamic classification information. The proposed predictor improves prediction accuracy and efficiently reduces the prediction table size, because it uses the static classification information to allocate each instruction to the single best-suited predictor during the fetch stage. It further improves prediction accuracy by using the dynamic classification mechanism to select the best-suited prediction method for "Unknown" pattern instructions. Simulation results based on the SimpleScalar/PISA tool set and the SPECint95 benchmarks show an average correct prediction rate of 85.1% with the static classification mechanism alone, and an average correct prediction rate of 87.6% with both static and dynamic classification.

Optimal-synchronous Parallel Simulation for Large-scale Sensor Network (대규모 센서 네트워크를 위한 최적-동기식 병렬 시뮬레이션)

Kim, Bang-Hyun;Kim, Jong-Hyun
- Journal of KIISE:Computer Systems and Theory / v.35 no.5 / pp.199-212 / 2008
Software simulation has been widely used for the design and application development of large-scale wireless sensor networks. The level of detail of the simulation must be high in order to verify the behavior of the network and to estimate the execution time and power consumption of an application program as accurately as possible. However, as the level of detail rises, the simulation time increases, and as the number of sensor nodes grows, the time tends to become extremely long. We propose an optimal-synchronous parallel discrete-event simulation method to shorten the simulation time for a large-scale sensor network. In this method, sensor nodes are partitioned into subsets, and each PC, interconnected with the others through a network, is in charge of simulating one of the subsets. Experiments with the parallel simulator developed in this study show that, when the number of sensor nodes is large, the speedup tends to approach the square of the number of PCs participating in the simulation. In such a case, the overhead of parallel simulation is so small relative to the total simulation time that it can be ignored. Therefore, as long as PCs are available, the number of sensor nodes that can be simulated is not limited. In addition, our parallel simulation environment can be constructed easily and at low cost, because PCs interconnected through a LAN are used without modification.

Power consumption estimation of active RFID system using simulation (시뮬레이션을 이용한 능동형 RFID 시스템의 소비 전력 예측)

Lee, Moon-Hyoung;Lee, Hyun-Kyo;Lim, Kyoung-Hee;Lee, Kang-Won
- Journal of the Korea Institute of Information and Communication Engineering / v.20 no.8 / pp.1569-1580 / 2016
For 2.4 GHz active RFID to succeed in the market, one requirement is longer battery life. However, no accurate method for estimating power consumption is currently available. In this study, we develop a simulation model that can be used to estimate the power consumption of a tag accurately. Six different simulation models are proposed, depending on the collision algorithm and the query command method. To improve estimation accuracy, we classify the tag operating modes as wake-up receive, UHF receive, sleep timer, tag response, and sleep. Power consumption and operating time are identified for each tag operating mode. The simulation uses a query command that simplifies the collection and ack command procedure, together with a newly developed collision control algorithm. Other performance measures, such as throughput, recognition time for multiple tags, and tag recognition rate, as well as power consumption, are compared with those of the current standard ISO/IEC 18000-7.
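Once power and operating time are known per mode, the basic arithmetic of an estimate is a sum of power times time over the modes. The sketch below shows only that arithmetic, not the paper's simulation models; the function names are hypothetical, and any numbers passed in would be placeholders rather than measured values.

```python
def energy_joules(profile):
    """Total energy for a profile mapping mode -> (power in watts, time in seconds)."""
    return sum(power * seconds for power, seconds in profile.values())


def average_power_watts(profile):
    """Average power over the whole profile; zero if no time has elapsed."""
    total_time = sum(seconds for _, seconds in profile.values())
    return energy_joules(profile) / total_time if total_time else 0.0
```

A profile would hold one entry for each of the five operating modes in the abstract, with the time in each mode supplied by the simulation of the collision and query procedures.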
https://doi.org/10.6109/jkiice.2016.20.8.1569
