Search | Korea Science

Loop unrolling and type casting operation for performance improvement in embedded system

Sung, Woon;Shin, Dong-Young;Park, Joon-Seok
- Proceedings of the Korean Society of Computer Information Conference / 2012.01a / pp.1-4 / 2012
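In embedded systems, the effectiveness of optimization techniques depends on the cross compiler, the execution context, and the characteristics of the target hardware. In this paper, we applied two optimization techniques, loop unrolling and type casting, to image processing code and measured the resulting performance. Relative to the baseline without these techniques, performance improved by 55%.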
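
The abstract does not name the image-processing kernel or the exact casting scheme, so the following is only a minimal sketch of the two techniques on a hypothetical pixel-scaling loop (`scale_baseline` and `scale_unrolled` are illustrative names, not from the paper). The baseline does float arithmetic and a float-to-integer cast per pixel; the optimized version unrolls the body by four and replaces the float math with an equivalent integer shift, so no conversion happens inside the loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Baseline (hypothetical kernel): out = in * 0.5 + 10, computed in float
   and cast back to an 8-bit pixel on every iteration. */
void scale_baseline(const uint8_t *in, uint8_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (uint8_t)(in[i] * 0.5f + 10.0f);
}

/* Optimized: loop body unrolled by 4 and the float math replaced by an
   integer shift, which gives identical results for 8-bit inputs. */
void scale_unrolled(const uint8_t *in, uint8_t *out, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        out[i]     = (uint8_t)((in[i]     >> 1) + 10);
        out[i + 1] = (uint8_t)((in[i + 1] >> 1) + 10);
        out[i + 2] = (uint8_t)((in[i + 2] >> 1) + 10);
        out[i + 3] = (uint8_t)((in[i + 3] >> 1) + 10);
    }
    for (; i < n; i++)   /* remainder when n is not a multiple of 4 */
        out[i] = (uint8_t)((in[i] >> 1) + 10);
}
```

Whether the unrolled version is actually faster depends, as the abstract notes, on the cross compiler and the target hardware; many compilers already unroll such loops at higher optimization levels.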

Portable Projection-Based Multimedia Display System

Oh, Ji-Hyun;Lee, Moon-Hyun;Park, Han-Hoon;Kim, Jae-Soo;Park, Jong-Il
- Proceedings of the Korean Society of Broadcast Engineers Conference / 2006.11a / pp.265-268 / 2006
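Desktop multimedia display systems can provide high-resolution, large-screen images, but because they operate only in a fixed space, they cannot be carried around. Portable multimedia display systems such as PDAs, PMPs, and mobile phones have low resolution and cannot give the user a sufficient sense of immersion. In this paper, we propose a portable projection-based multimedia display system that extends an existing desktop projection-based augmented reality system to a mobile platform. The proposed system combines a PDA with a pocket projector: multimedia images preprocessed on the PDA are projected by the pocket projector onto a screen of arbitrary shape without distortion. The development environment was a PDA built on an ARM platform running Windows Mobile 5.0, and to optimize the system, the OpenCV library, which is optimized for the x86 platform, was ported for mobile use. In addition, because floating-point operations slow the system down on the mobile platform, processing speed was improved by converting floating-point operations into integer operations. By handling the technical tasks required for a projection-based display system directly in a mobile environment, we demonstrate the feasibility of a portable projection-based multimedia system.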
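
Converting floating-point operations to integer operations is commonly done with fixed-point arithmetic. The sketch below assumes a Q16.16 format and applies a 2x3 affine map to integer pixel coordinates; the format, the affine model, and the names `q16_mul` and `affine_map_q16` are illustrative assumptions, since the abstract does not say which transform or number format the system uses (projection onto arbitrarily shaped screens likely needs a richer warp). It also assumes right shifts of negative values are arithmetic, as on common ARM and x86 compilers.

```c
#include <stdint.h>

typedef int32_t q16_t;          /* Q16.16: 16 integer bits, 16 fraction bits */
#define Q16_ONE (1 << 16)

/* Done once, offline or at setup: convert a float coefficient to Q16.16 with rounding. */
q16_t q16_from_double(double x)
{
    return (q16_t)(x * Q16_ONE + (x >= 0 ? 0.5 : -0.5));
}

/* Fixed-point multiply: widen to 64 bits so the product cannot overflow. */
q16_t q16_mul(q16_t a, q16_t b)
{
    return (q16_t)(((int64_t)a * b) >> 16);
}

/* Hypothetical per-pixel step: map (x, y) through the affine matrix
   m = {a, b, tx, c, d, ty} using only integer operations. */
void affine_map_q16(const q16_t m[6], int x, int y, int *u, int *v)
{
    q16_t qx = x * Q16_ONE, qy = y * Q16_ONE;   /* assumes |x|, |y| < 32768 */
    q16_t fu = q16_mul(m[0], qx) + q16_mul(m[1], qy) + m[2];
    q16_t fv = q16_mul(m[3], qx) + q16_mul(m[4], qy) + m[5];
    *u = (fu + Q16_ONE / 2) >> 16;              /* round to nearest pixel */
    *v = (fv + Q16_ONE / 2) >> 16;
}
```

Only the coefficient conversion touches floating point, so the inner per-pixel loop runs entirely on the integer ALU.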

Real-Time Implementation of the EHSX Speech Coder Using a Floating Point DSP

이인성;박동원;김정호
- The Journal of the Acoustical Society of Korea / v.23 no.5 / pp.420-427 / 2004
This paper presents a real-time implementation of the 4kbps EHSX (Enhanced Harmonic Stochastic Excitation) speech coder, which combines harmonic vector excitation coding with time-separated transition coding. Harmonic vector excitation coding uses harmonic excitation coding for voiced frames and analysis-by-synthesis vector excitation coding for unvoiced frames. For transition frames that mix voiced and unvoiced signal, we use time-separated transition coding. In this paper, we present methods for optimizing the speech coder implementation on the TMS320C6701® DSP. To reduce complexity for real-time operation, we optimize the algorithm by replacing the complex sinusoidal synthesis method with an IFFT, and we apply fully pipelined hand-written assembly after converting the floating-point source code to fixed-point source code. To generate more efficient code, we also make use of available TMS320C6701® resources such as the Fastest67x library and the memory organization.

Implementation of Music Signals Discrimination System for FM Broadcasting

Kang, Hyun-Woo
- The KIPS Transactions:PartB / v.16B no.2 / pp.151-156 / 2009
This paper proposes a Gaussian mixture model (GMM)-based music discrimination system for FM broadcasting. The goal of the system is to automatically archive music from audio broadcast programs, which normally mix human voices, songs, commercial music, and other sounds. To improve performance and robustness, and to accurately cut the start and end points of each recording, we also added a post-processing module. Experimental results on various FM radio program inputs in a PC environment show excellent performance of the proposed system. A fixed-point simulation gives the same results while requiring less than 3 MIPS of computational power.
https://doi.org/10.3745/KIPSTB.2009.16-B.2.151

Performance Analysis of Error Correction Codes for 3GPP Standard

신나나;이창우
- The Journal of Korean Institute of Electromagnetic Engineering and Science / v.15 no.1 / pp.81-88 / 2004
Turbo codes have been adopted in the 3GPP standard because their performance is very close to the Shannon limit. However, a turbo decoder requires many computations, and its memory requirement grows with the turbo code block size. To reduce the complexity of the turbo decoder, the Log-MAP, Max-Log-MAP, and sliding window algorithms have been proposed. In this paper, the performance of the turbo codes adopted in the 3GPP standard is analyzed using both floating-point and fixed-point implementations. An efficient decoding method is also proposed, and its BER performance is shown to be close to that of the Log-MAP algorithm.

An exact floating point square root calculator using multiplier

Cho, Gyeong-Yeon
- Journal of the Korea Institute of Information and Communication Engineering / v.13 no.8 / pp.1593-1600 / 2009
There are two major families of algorithms for computing the square root of a floating-point number: the Newton-Raphson and Goldschmidt algorithms, which compute it approximately by iterated multiplication, and the SRT algorithm, which computes it exactly by iterated subtraction. This paper proposes an exact floating-point square root algorithm that uses only multiplication. First, an approximate inverse square root is computed with the Newton-Raphson algorithm; we then propose an exact square root algorithm that reduces the error of this approximation, together with a compensation algorithm. The proposed algorithm is verified on every single-precision floating-point number and on 1 billion random double-precision floating-point numbers. Because it requires only multipliers and no additional hardware, it can be widely used in embedded systems and mobile products that require an exact floating-point square root.
https://doi.org/10.6109/JKIICE.2009.13.8.1593

A Variable Latency Goldschmidt's Floating Point Number Divider

Kim, Sung-Gi;Song, Hong-Bok;Cho, Gyeong-Yeon
- Journal of the Korea Institute of Information and Communication Engineering / v.9 no.2 / pp.380-389 / 2005
The Goldschmidt iterative algorithm computes a floating-point division by performing a fixed number of multiplications. In this paper, a variable-latency Goldschmidt division algorithm is proposed that performs a variable number of multiplications, stopping once the error becomes smaller than a given value. To compute a floating-point division $\frac{N}{F}$, multiply both the numerator and the denominator by $T=\frac{1}{F}+e_t$, so that the quotient becomes $\frac{TN}{TF}=\frac{N_0}{F_0}$. The algorithm then repeats the operations $R_i=2-e_r-F_i$, $N_{i+1}=N_i R_i$, $F_{i+1}=F_i R_i$ for $i\in\{0,1,\ldots,n-1\}$. The bits to the right of the $p$ fractional bits in the intermediate multiplication results are truncated, and this truncation error is less than $e_r=2^{-p}$. The value of $p$ is 29 for single-precision and 59 for double-precision floating point. Letting $F_i=1+e_i$, we have $F_{i+1}=F_i R_i=1-e_{i+1}$, where $e_{i+1}=e_i^2+e_r(1+e_i)$ before truncation. If $|F_i-1|<2^{\frac{-p+3}{2}}$, then $e_{i+1}<16e_r$, which is smaller than the smallest increment representable in the floating-point significand, so $N_{i+1}$ approximates $\frac{N}{F}$. Since the number of multiplications performed by the proposed algorithm depends on the input values, the average number of multiplications per operation is derived for reciprocal tables ($T=\frac{1}{F}+e_t$) of various sizes. The advantage of this algorithm is demonstrated by comparing this average with the fixed number of multiplications of the conventional algorithm. Because the proposed algorithm performs multiplications only until the error falls below a given value, it can be used to improve the performance of a divider. It can also be used to construct optimized approximate reciprocal tables. The results of this paper can be applied to many areas that use floating-point numbers, such as digital signal processing, computer graphics, multimedia, and scientific computing.

Design of Square Root and Inverse Square Root Arithmetic Units for Mobile 3D Graphic Processing

Lee, Chan-Ho
- Journal of the Institute of Electronics Engineers of Korea SD / v.46 no.3 / pp.20-25 / 2009
We propose a hardware architecture for floating-point square root and inverse square root arithmetic units based on lookup tables. These units are used in lighting engines and shader processors for 3D graphics processing. The architecture is based on Taylor series expansion and consists of lookup tables and correction units, which reduces the size of the lookup tables. It can be applied to the IEEE-754 32-bit floating-point format and to a reduced 24-bit floating-point format. Square root and inverse square root units for 32-bit and 24-bit floating-point numbers are designed with the proposed architecture. They operate in a single cycle and satisfy the precision of $10^{-5}$ required by OpenGL ES 1.x. They are designed in Verilog-HDL, and the RTL code is verified on an FPGA.

Design of Floating-point Processing Unit for Multi-chip Superscalar Microprocessor

이영상;강준우
- Proceedings of the IEEK Conference / 1998.10a / pp.1153-1156 / 1998
We describe the design of a simple but efficient floating-point processing architecture that exploits concurrent execution of scalar instructions for high performance in general-purpose microprocessors. The architecture employs a three-stage pipeline that works asynchronously with the integer processing unit to regulate the instruction flow between the two arithmetic units.

A Design of Dual-Phase Instructions for a effective Logarithm and Exponent Arithmetic

Kim, Chi-Yong;Lee, Kwang-Yeob
- Journal of IKEEE / v.14 no.2 / pp.64-68 / 2010
This paper proposes efficient logarithm and exponent calculation methods for mobile environments that use a dual-phase instruction set and require no additional ALU. Using the dual-phase instructions, the method extracts the exponent and mantissa from the floating-point representation and computes a 24-bit single-precision floating-point approximation of the logarithm with a Taylor series expansion. The dual-phase instruction set also reduces instruction execution cycles. The proposed dual-phase architecture limits performance degradation while keeping the hardware small.
