Search | Korea Science

An Efficient Hardware Design for Scaling and Transform Coefficients Decoding (스케일링과 변환계수 복호를 위한 효율적인 하드웨어 설계)

Jung, Hongkyun;Ryoo, Kwangki
- Journal of the Korea Institute of Information and Communication Engineering / v.16 no.10 / pp.2253-2260 / 2012
In this paper, an efficient hardware architecture is proposed for the inverse transform (IT) and inverse quantization (IQ) of an H.264/AVC decoder. The previous inverse transform and quantization architecture decodes AC and DC coefficients in different orders. In the proposed architecture, IQ is performed after IT for both DC and AC coefficients. A common operation unit is also proposed to reduce the computational complexity of inverse quantization. Because the previous architecture includes a division operation, changing the processing order would introduce errors. To solve this problem, the proposed architecture performs the division after IT. The architecture is implemented with a 3-stage pipeline, and the vertical and horizontal IDCTs are implemented in parallel to reduce the operation cycle count. An analysis of the operation cycles for one macroblock shows that the proposed ITIQ architecture improves on the previous one by 45%.
https://doi.org/10.6109/jkiice.2012.16.10.2253

Computation Optimization of Color Conversion in JPEG Image Decoding (JPEG 영상 복원에서 컬러변환의 계산 최적화)

Kim, Young-Ju
- Proceedings of the Korean Society of Computer Information Conference / 2009.01a / pp.241-244 / 2009
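As mobile phones are now equipped with camera modules of 5 megapixels or more, demand for encoding and decoding high-resolution images on mobile devices has grown considerably, and with it the need for image codecs that run in real time on low-performance systems. This paper proposes a technique that optimizes the computational complexity of color conversion, the final stage of JPEG decoding, and evaluates its performance. Exploiting the linearity between the IDCT (Inverse Discrete Cosine Transform) and color conversion in JPEG decoding, the proposed technique reorders these operations to reduce the number of computations required for color conversion, and applies integer mapping to the reordered floating-point operations to reduce computational complexity, thereby optimizing execution time. Performance evaluation on an embedded system development platform showed that the proposed technique greatly reduces execution time compared with existing color conversion techniques, but that the quality of the reconstructed image is relatively degraded.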
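The integer mapping mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's method: it does not reorder the IDCT and color conversion, and it only shows how the floating-point coefficients of the common JFIF YCbCr-to-RGB conversion might be replaced by scaled integers. The 16-bit scale factor and all function names are assumptions made for illustration.

```python
# Illustrative sketch only: fixed-point ("integer-mapped") YCbCr -> RGB.
# SHIFT = 16 is an assumed scale; the paper's actual mapping is not given.
SHIFT = 16
HALF = 1 << (SHIFT - 1)  # added before the shift to round to nearest


def to_fixed(c):
    """Map a floating-point coefficient to a scaled integer."""
    return int(round(c * (1 << SHIFT)))


# Common JFIF conversion coefficients, mapped to integers once.
K_R_CR = to_fixed(1.402)
K_G_CB = to_fixed(0.344136)
K_G_CR = to_fixed(0.714136)
K_B_CB = to_fixed(1.772)


def clamp8(v):
    """Clamp a value to the 8-bit sample range."""
    return 0 if v < 0 else 255 if v > 255 else v


def ycbcr_to_rgb_float(y, cb, cr):
    """Reference conversion using floating-point arithmetic."""
    cb, cr = cb - 128, cr - 128
    r = y + 1.402 * cr
    g = y - 0.344136 * cb - 0.714136 * cr
    b = y + 1.772 * cb
    return tuple(clamp8(int(round(v))) for v in (r, g, b))


def ycbcr_to_rgb_int(y, cb, cr):
    """Same conversion using only integer multiplies, adds and shifts."""
    cb, cr = cb - 128, cr - 128
    r = y + ((K_R_CR * cr + HALF) >> SHIFT)
    g = y - ((K_G_CB * cb + K_G_CR * cr + HALF) >> SHIFT)
    b = y + ((K_B_CB * cb + HALF) >> SHIFT)
    return clamp8(r), clamp8(g), clamp8(b)
```

With this scale the integer version stays within one level of the floating-point reference per channel, which is consistent with the abstract's observation that speed is gained at some cost in precision.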

A Design of Efficient Modular Multiplication based on Montgomery Algorithm (효율적인 몽고메리 모듈러 곱셈기의 설계)

Park, Hye-Young;Yoo, Kee-Young
- Proceedings of the Korea Information Processing Society Conference / 2004.05a / pp.1003-1006 / 2004
This paper proposes an efficient modular multiplier based on the Montgomery modular multiplication algorithm. The proposed multiplier is designed on a programmable cellular automata (PCA) structure to reduce hardware complexity, and uses the Montgomery algorithm to perform modular operations without ordinary division, minimizing time complexity. The proposed multiplier is simple and effective in both time and area, and could be used efficiently as a hardware building block for exponentiation or in error correcting code computations.
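As a rough software illustration of how Montgomery multiplication avoids ordinary division, the sketch below implements the textbook Montgomery reduction (REDC) with shifts and masks. It says nothing about the paper's PCA-based hardware; the function names are hypothetical, and the choice R = 2^k with k equal to the bit length of the modulus is an assumption. It requires an odd modulus and Python 3.8 or later for the modular inverse via `pow`.

```python
# Illustrative sketch only: textbook Montgomery reduction, not the
# PCA-based multiplier described in the abstract.

def redc(t, n, k, n_prime):
    """Return t * R^-1 mod n for R = 2**k, assuming 0 <= t < R * n."""
    mask = (1 << k) - 1
    m = ((t & mask) * n_prime) & mask   # m = (t mod R) * n' mod R
    u = (t + m * n) >> k                # exact division by R (a shift)
    return u - n if u >= n else u       # single conditional subtraction


def mont_mul(a, b, n):
    """Compute a * b mod n (n odd, 0 <= a, b < n) via Montgomery form."""
    k = n.bit_length()
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r      # n * n' == -1 (mod R)
    r2 = (r * r) % n                    # one-time precomputation
    a_bar = redc(a * r2, n, k, n_prime)         # a * R mod n
    b_bar = redc(b * r2, n, k, n_prime)         # b * R mod n
    prod_bar = redc(a_bar * b_bar, n, k, n_prime)  # a * b * R mod n
    return redc(prod_bar, n, k, n_prime)        # back to ordinary form
```

The only true modular reduction is the one-time computation of `r2`; inside `redc` the division by R is a right shift, which is what makes the method attractive for hardware and for long chains of multiplications such as exponentiation.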

An Intra Prediction Hardware Architecture Design for Computational Complexity Reduction of HEVC Decoder (HEVC 복호기의 연산 복잡도 감소를 위한 화면내 예측 하드웨어 구조 설계)

Jung, Hongkyun;Ryoo, Kwangki
- Journal of the Korea Institute of Information and Communication Engineering / v.17 no.5 / pp.1203-1212 / 2013
In this paper, an intra prediction hardware architecture is proposed to reduce the computational complexity of intra prediction in an HEVC decoder. The architecture uses shared operation units and common operation units, and adopts a fast smoothing decision algorithm and a fast algorithm for generating filter coefficients. The shared operation unit shares adders that process common equations to remove computational redundancy. It also computes the average value in DC mode to reduce the number of execution cycles in that mode. To reduce the number of operation units, the common operation unit uses a single operation unit to generate both predicted pixels and filtered pixels in all prediction modes. To reduce processing time and the number of operators, the decision algorithm uses only bit-comparators and the fast algorithm uses a LUT instead of multipliers. The proposed architecture uses four shared operation units and eight common operation units, which reduces the execution cycles of intra prediction. The architecture is synthesized using TSMC 0.13um CMOS technology. The gate count and the maximum operating frequency are 40.5k and 164MHz, respectively. Performance measured with data extracted from HM 7.1 shows that the execution cycle count of the architecture is about 93.7% lower than that of the previous design.
https://doi.org/10.6109/jkiice.2013.17.5.1203

A Novel Fixed-Complexity Signal Detection Technique Using Lattice Reduction for Multiple Antenna Systems (다중 안테나 시스템을 위한 고정된 연산 복잡도를 갖는 격자 감소 기반 신호 검출 기법)

Yang, Yusik;Suh, Dong Geun;Kim, Jaekwon
- The Journal of Korean Institute of Communications and Information Sciences / v.38A no.1 / pp.10-18 / 2013
Recently, a fixed-complexity lattice reduction (fcLR) technique was proposed. A QR-LRL signal detection method was also proposed, in which all constellation symbols are tried as the symbol corresponding to the least reliable layer (LRL), thereby achieving high error performance. In this paper, we combine these two efficient methods to propose a novel detection method. When the LRL is disregarded in the LR process, the worst-case complexity of LR is significantly reduced. The proposed method is also shown to be superior to the conventional fcLR-based detection method in terms of error performance. Simulations are performed to demonstrate the efficacy of the proposed method.
https://doi.org/10.7840/kics.2013.38A.1.10

Using Java Objects in C through the JNI Function Calls (JNI 함수 호출을 통한 C에서의 자바 객체 사용)

이창환;오세만
- Proceedings of the Korean Information Science Society Conference / 2002.04b / pp.340-342 / 2002
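JNI is an interface for interoperation between Java and native code, and it allows Java objects to be used from C. To operate on a Java object from C, one must use a fixed pattern of JNI function calls that depends on the kind of object operation. When users write operations on Java objects directly, they must learn complex function call patterns and supply the information each pattern needs by hand, so errors from incorrectly written patterns or incorrect information are likely. This paper proposes and implements a method that allows operations on Java objects in C using the dot (".") operator, just as objects are operated on in Java. The proposed method translates a dot-operator expression on a Java object into the equivalent sequence of JNI function calls, removing the problems of hand-written calls and reducing both the complexity of use and the likelihood of errors.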

Design of Parallel Inverse Quantization and Inverse Transform Architecture for High Performance H.264/AVC Decoder (고성능 H.264/AVC 복호기를 위한 병렬 역양자화 및 역변환 구조 설계)

Jung, Hong-Kyun;Ryoo, Kwang-Ki
- Proceedings of the KAIS Fall Conference / 2011.12b / pp.434-437 / 2011
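This paper proposes a parallel inverse quantization architecture and an inverse transform architecture to improve the performance of an H.264/AVC decoder. The proposed inverse quantization architecture uses a common operation unit to reduce computational complexity, and uses four common operation units to reduce inverse quantization to one cycle. The proposed inverse transform architecture uses four transform units and takes two cycles to perform the inverse transform. The proposed architecture also adopts a parallel structure that performs inverse quantization and the horizontal inverse transform simultaneously, reducing the combined inverse quantization and inverse transform to two cycles. Synthesized with the Magnachip 0.18um CMOS process library, the architecture has a gate count of 14,173 at an operating frequency of 1.5MHz, and performance measured with data extracted from the standard reference software JM 9.4 shows that the execution cycle count improves by 38.74% over the previous architecture.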

A Study on Fixed-point Implementation of MPEG-1 Audio Decoder (MPEG-1 Audio Decoder의 고정소수점 구현에 관한 연구)

김선태
- Proceedings of the Korean Information Science Society Conference / 2000.10c / pp.213-215 / 2000
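Implementing digital signal processing algorithms calls for fixed-point implementation for reasons of speed and memory use. In particular, on an integer-only processor, a fixed-point implementation performs far better than software-emulated floating point. Because of the complexity of digital signal processing algorithms and the limited processing power of general-purpose processors, real-time implementations have so far mostly used dedicated processors or processors with hardware-implemented instructions for digital signal processing. As clock frequencies of general-purpose processors have risen, however, complex digital signal processing algorithms can now be processed in real time. Even so, floating-point operations on an integer-only processor make real-time processing very difficult. In this study, we converted a floating-point algorithm to fixed-point form on a general-purpose integer processor with fixed data types (ARM RISC 32-bit CPU) and compared performance in terms of speed and memory.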
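A minimal sketch of the kind of floating-point to fixed-point conversion described here is given below. The abstract does not state which number format the study used, so the Q15 format (1 sign bit, 15 fraction bits) and all function names are assumptions chosen for illustration.

```python
# Illustrative sketch only: Q15 fixed-point arithmetic. The Q15 format
# is an assumption; the study's actual format is not stated.
Q = 15
Q15_MAX = (1 << Q) - 1      #  32767, just below +1.0
Q15_MIN = -(1 << Q)         # -32768, exactly -1.0


def saturate(v):
    """Clamp to the 16-bit signed range instead of wrapping around."""
    return max(Q15_MIN, min(Q15_MAX, v))


def to_q15(x):
    """Convert a float in [-1, 1) to a Q15 integer."""
    return saturate(int(round(x * (1 << Q))))


def from_q15(v):
    """Convert a Q15 integer back to a float (for inspection only)."""
    return v / (1 << Q)


def q15_mul(a, b):
    """Multiply two Q15 values with rounding, using integers only."""
    return saturate((a * b + (1 << (Q - 1))) >> Q)


def q15_dot(xs, ys):
    """Dot product with a wide accumulator and one final rounding shift."""
    acc = sum(a * b for a, b in zip(xs, ys))
    return saturate((acc + (1 << (Q - 1))) >> Q)
```

Accumulating products at full width and shifting once at the end, as `q15_dot` does, is a common way to limit rounding error in filter-style computations on integer processors.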

224-bit ECC Processor supporting the NIST P-224 elliptic curve (NIST P-224 타원곡선을 지원하는 224-비트 ECC 프로세서)

Park, Byung-Gwan;Shin, Kyung-Wook
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2017.05a / pp.188-190 / 2017
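This paper describes the design of a 224-bit elliptic curve cryptography (ECC) processor that supports scalar multiplication using projective coordinates. It supports finite field operations over the prime field GF(p), such as addition, subtraction, and multiplication, and reduces hardware complexity by eliminating division, which is costly in both computation and hardware resources. Scalar multiplication is controlled using a modified Montgomery ladder algorithm, which makes it more resistant to simple power analysis. Scalar multiplication takes at most 2,615,201 clock cycles. The designed ECC-P224 processor was functionally verified using Xilinx ISim. Synthesized for a Xilinx Virtex5 FPGA device, it was implemented in 7,078 slices and operated at up to 79 MHz.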
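The control flow of a Montgomery ladder can be sketched as follows. This is the basic textbook ladder, not the modified version used in the processor, and it differs from the design in two labeled ways: it uses a small textbook curve (y^2 = x^3 + 2x + 2 over GF(17)) rather than NIST P-224, and it uses affine coordinates with a modular inverse, whereas the processor uses projective coordinates precisely to avoid division. All names are hypothetical. Python 3.8 or later is assumed for the modular inverse via `pow`.

```python
# Illustrative sketch only: basic Montgomery ladder on a toy curve.
# Toy parameters (assumed for illustration, NOT P-224):
P_MOD, A = 17, 2          # y^2 = x^3 + 2x + 2 over GF(17)
G = (5, 1)                # a point on this curve
INF = None                # point at infinity


def point_add(p1, p2):
    """Affine point addition (also handles doubling and infinity)."""
    if p1 is INF:
        return p2
    if p2 is INF:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INF                                   # P + (-P)
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD)
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD)
    lam %= P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def ladder(k, point):
    """Compute k * point, doing one add and one double per key bit."""
    r0, r1 = INF, point              # invariant: r1 == r0 + point
    for bit in bin(k)[2:]:           # most significant bit first
        if bit == "1":
            r0, r1 = point_add(r0, r1), point_add(r1, r1)
        else:
            r0, r1 = point_add(r0, r0), point_add(r0, r1)
    return r0
```

Because every key bit triggers exactly one addition and one doubling, the sequence of operations does not depend on the bit values, which is the property that helps against simple power analysis. A Python `if` on the bit is of course not constant-time; the sketch only shows the operation pattern.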

Systolic Architecture for Efficient Power-Sum Operation in GF(2$^{m}$) (GF(2$^{m}$)상에서 효율적인 Power-Sum 연산을 위한 시스톨릭 구조의 설계)

김남연;김현성;이원호;김기원;유기영
- Proceedings of the Korea Institutes of Information Security and Cryptology Conference / 2001.11a / pp.293-296 / 2001
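This paper proposes a new algorithm for the power-sum operation in GF(2$^{m}$) and a corresponding parallel-in/parallel-out architecture. The new algorithm is based on a most-significant-bit-first scheme, and the proposed architecture has lower hardware complexity and less latency than existing architectures. It can serve as a basic building block for inversion and division and for cryptographic processor chip design, and its simplicity, regularity, and parallelism make it well suited to VLSI implementation.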
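The abstract does not define the power-sum operation; in the related literature it usually denotes AB^2 + C over GF(2^m), and the sketch below assumes that definition. It is a plain software illustration built from an MSB-first shift-and-add multiplier in polynomial basis, not the paper's fused algorithm or its systolic architecture. The example field GF(2^4) with x^4 + x + 1 and all names are assumptions.

```python
# Illustrative sketch only: power-sum A*B^2 + C in GF(2^m), polynomial
# basis, elements stored as integers whose bits are the coefficients.
# Assumed example field: GF(2^4) with x^4 + x + 1.
M = 4
POLY = 0b10011


def gf_mul(a, b, m=M, poly=POLY):
    """MSB-first multiplication: scan b from its top bit downward."""
    r = 0
    for i in reversed(range(m)):
        r <<= 1                      # multiply the partial result by x
        if (r >> m) & 1:
            r ^= poly                # reduce modulo the field polynomial
        if (b >> i) & 1:
            r ^= a                   # addition in GF(2^m) is XOR
    return r


def power_sum(a, b, c, m=M, poly=POLY):
    """Return a * b^2 + c in GF(2^m) (assumed definition of power-sum)."""
    b_squared = gf_mul(b, b, m, poly)
    return gf_mul(a, b_squared, m, poly) ^ c
```

For example, with a = x, b = x^3 and c = 1 in this field, `power_sum(0b0010, 0b1000, 0b0001)` evaluates x * x^6 + 1. Repeated power-sum steps are a standard way to build up inversion and division, which matches the use the abstract suggests.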

Search Result 1,176, Processing Time 0.028 seconds
