• Title/Summary/Keyword: Algorithm Instruction

Search Result 155, Processing Time 0.026 seconds

Parallel Simulation of Bounded Petri Nets using Data Packing Scheme (데이터 중첩을 통한 페트리네트의 병렬 시뮬레이션)

  • 김영찬;김탁곤
    • Journal of the Korea Society for Simulation
    • /
    • v.11 no.2
    • /
    • pp.67-75
    • /
    • 2002
  • This paper proposes a parallel simulation algorithm for bounded Petri nets in a single processor, which exploits the SIMD(Single Instruction Multiple Data)-type parallelism. The proposed algorithm is based on a data packing scheme which packs multiple bytes data in a single register, thereby being manipulated simultaneously. The parallelism can reduce simulation time of bounded Petri nets in a single processor environment. The effectiveness of the algorithm is demonstrated by presenting speed-up of simulation time for two bounded Petri nets.

  • PDF

An analysis of fractional division instruction emphasizing algebraic thinking (대수적 사고를 강조한 분수 나눗셈 수업의 분석)

  • Cho, SeonMi;Pang, JeongSuk
    • The Mathematical Education
    • /
    • v.60 no.4
    • /
    • pp.409-429
    • /
    • 2021
  • This study investigated instructional methods for fractional division emphasizing algebraic thinking with sixth graders. Specifically, instructional elements for fractional division emphasizing algebraic thinking were derived from literature reviews, and the fractional division instruction was reorganized on the basis of key elements. The instructional elements were as follows: (a) exploring the relationship between a dividend and a divisor; (b) generalizing and representing solution methods; and (c) justifying solution methods. The instruction was analyzed in terms of how the key elements were implemented in the classroom. This paper focused on the fractional division instruction with problem contexts to calculate the quantity of a dividend corresponding to the divisor 1. The students in the study could explore the relationship between the two quantities that make the divisor 1 with different problem contexts: partitive division, determination of a unit rate, and inverse of multiplication. They also could generalize, represent, and justify the solution methods of dividing the dividend by the numerator of the divisor and multiplying it by the denominator. However, some students who did not explore the relationship between the two quantities and used only the algorithm of fraction division had difficulties in generalizing, representing, and justifying the solution methods. This study would provide detailed and substantive understandings in implementing the fractional division instruction emphasizing algebraic thinking and help promote the follow-up studies related to the instruction of fractional operations emphasizing algebraic thinking.

PC-Based Realtime Implementation of H.263 CODEC Using SIMD Method (SIMD기법에 의한 H.263 코덱의 PC기반 실시간 구현)

  • 하교동;남수영;김남철
    • Proceedings of the IEEK Conference
    • /
    • 2001.09a
    • /
    • pp.947-950
    • /
    • 2001
  • This paper implements H.263 codec using SIMD(single instruction multiple data) method in real time based on PC. This system uses INS algorithm previously proposed by the authors as motion estimation module. SIMD method is used in DCT, IDCT, quantization, motion estimation, and display module. The developed algorithms are implemented using TMN5. Using the above algorithm, H.263 Codec can communicate more than 15 frames/sec in CIF resolution on a Pentium-IV 1.7GHz computer.

  • PDF

The Design of A Code Generator for RISC Architecture (RISC 아키텍춰의 코드 생성기 설계)

  • 박종덕;임인칠
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.27 no.8
    • /
    • pp.1221-1230
    • /
    • 1990
  • This paper presents a code generation method and an effective handling algorithm of ingeger constant multiplication for RISC machines in compiler design. As RISC Architectures usually use faster and more simply formed instructions than CISC's and most RISC processors do not have an integer multiplication instruction, it is required an effective algorithm to process integer multiplication. For the proposed code generator, Portable C Compiler(PCC) is redesigned to be suitable for an RISC machine, and composed an addition chain is built up to allow fast execution of constant multiplication, a part of integer one whicch appears very frequency in code generation phase.

  • PDF

A Program Code Compression Method with Very Fast Decoding for Mobile Devices (휴대장치를 위한 고속복원의 프로그램 코드 압축기법)

  • Kim, Yong-Kwan;Wee, Young-Cheul
    • Journal of KIISE:Software and Applications
    • /
    • v.37 no.11
    • /
    • pp.851-858
    • /
    • 2010
  • Most mobile devices use a NAND flash memory as their secondary memory. A compressed code of the firmware is stored in the NAND flash memory of mobile devices in order to reduce the size and the loading time of the firmware from the NAND flash memory to a main memory. In order to use a demand paging properly, a compressed code should be decompressed very quickly. The thesis introduces a new dictionary based compression algorithm for the fast decompression. The introduced compression algorithm uses a different method with the current LZ method by storing the "exclusive or" value of the two instructions when the instruction for compression is not equal to the referenced instruction. Therefore, the thesis introduces a new compression format that minimizes the bit operation in order to improve the speed of decompression. The experimental results show that the decoding time is reduced up to 5 times and the compression ratio is improved up to 4% compared to the zlib. Moreover, the proposed compression method with the fast decoding time leads to 10-20% speed up of booting time compared to the booting time of the uncompressed method.

TDES CODER USING SSE2 TECHNOLOGY

  • Koo, In-Hoi;Kim, Tae-Hoon;Ahn, Sang-Il
    • Proceedings of the KSRS Conference
    • /
    • 2007.10a
    • /
    • pp.114-117
    • /
    • 2007
  • DES is an improvement of the algorithm Lucifer developed by IBM in the 1977. IBM, the National Security Agency (NSA) and the National Bureau of Standards (NBS now National Institute of Standards and Technology NIST) developed the DES algorithm. The DES has been extensively studied since its publication and is the most widely used symmetric algorithm in the world. But nowadays, Triple DES (TDES) is more widely used than DES especially in the application in case high level of data security is required. Even though TDES can be implemented based on standard algorithm, very high speed TDES codec performance is required to process when encrypted high resolution satellite image data is down-linked at high speed. In this paper, Intel SSE2 (Streaming SIMD (Single-Instruction Multiple-Data) Extensions 2 of Intel) is applied to TDES Decryption algorithm and proved its effectiveness in the processing time reduction by comparing the time consumed for two cases; original TDES Decryption and TDES Decryption with SSE2

  • PDF

A Custom Code Generation Technique for ASIPs from High-level Language (고급 언어에서 ASIP을 위한 전용 부호 생성 기술 연구)

  • Alam, S.M. Shamsul;Choi, Goangseog
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.11 no.3
    • /
    • pp.31-43
    • /
    • 2015
  • In this paper, we discuss a code generation technique for custom transport triggered architecture (TTA) from a high-level language structure. This methodology is implemented by using TTA-based Co-design Environment (TCE) tool. The results show how the scheduler exploits instruction level parallelism in the custom target architecture and source program. Thus, the scheduler generates parallel TTA instructions using lower cycle counts than the sequential scheduling algorithm. Moreover, we take Tensilica tool to make a comparison with TCE. Because of the efficiency of TTA, TCE takes less execution cycles compared to Tensilica configurations. Finally, this paper shows that it requires only 7 cycles to generate the parallel TTA instruction set for implementing Cyclic Redundancy Check (CRC) applications as an input design, and presents the code generation technique to move complexity from the processor software to hardware architecture. This method can be applicable lots of channel Codecs like CRC and source Codecs like High Efficiency Video Coding (HEVC).

Pair Register Allocation Algorithm for 16-bit Instruction Set Architecture (ISA) Processor (16비트 명령어 기반 프로세서를 위한 페어 레지스터 할당 알고리즘)

  • Lee, Ho-Kyoon;Kim, Seon-Wook;Han, Young-Sun
    • The KIPS Transactions:PartA
    • /
    • v.18A no.6
    • /
    • pp.265-270
    • /
    • 2011
  • Even though 32-bit ISA based microprocessors are widely used more and more, 16-bit ISA based processors are still being frequently employed for embedded systems. Intel 8086, 80286, Motorola 68000, and ADChips AE32000 are the representatives of the 16-bit ISA based processors. However, due to less expressiveness of the 16-bit ISA from its narrow bit width, we need to execute more 16-bit instructions for the same implementation compared to 32-bit instructions. Because the number of executed instructions is a very important factor in performance, we have to resolve the problem by improving the expressiveness of the 16-bit ISA. In this paper, we propose a new pair register allocation algorithm to enhance an original graph-coloring based register allocation algorithm. Also, we explain about both the performance result and further research directions.

A Fast SAD Algorithm for Area-based Stereo Matching Methods (영역기반 스테레오 영상 정합을 위한 고속 SAD 알고리즘)

  • Lee, Woo-Young;Kim, Cheong Ghil
    • Journal of Satellite, Information and Communications
    • /
    • v.7 no.2
    • /
    • pp.8-12
    • /
    • 2012
  • Area-based stereo matchng algorithms are widely used for image analysis for stereo vision. SAD (Sum of Absolute Difference) algorithm is one of well known area-based stereo matchng algorithms with the characteristics of data intensive computing application. Therefore, it requires very high computation capabilities and its processing speed becomes very slow with software realization. This paper proposes a fast SAD algorithm utilizing SSE (Streaming SIMD Extensions) instructions based on SIMD (Single Instruction Multiple Data) parallism. CPU supporing SSE instructions has 16 XMM registers with 128 bits. For the performance evaluation of the proposed scheme, we compare the processing speed between SAD with/without SSE instructions. The proposed scheme achieves four times performance improvement over the general SAD, which shows the possibility of the software realization of real time SAD algorithm.

A New Synchronization Scheme for Parallel Processing on Perfectly Nested Do Loops (완전 중첩 루프에서 병렬처리를 위한 새로운 동기화 기법)

  • 이광형;황종선;박두순;김병수
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.31B no.10
    • /
    • pp.1-10
    • /
    • 1994
  • In most application programs, loops usually contain most of the computation in a program and are the most improtant source of parallelism. When loops are executed on multiprocessors, the cross iteration data dependences need to be enforced by synchronization between processors. In this paper, we propose a new synchronization scheme(Free/Hold) for reducing overgeads occured by synchronization variables in data oriented scheme and delay of time occured by synchronization instruction in statement oriented scheme. The Free/Hold mechanism enforces the correct execution order by inserting synchronization instruction between each instance with data dependence relationship using the RD(Real dependence Distance). We also present an algorithm for removing unnecessary dependences in one-to-many dependences.

  • PDF