• Title/Summary/Keyword: SIMD instruction


Fall detection based on acceleration sensor attached to wrist using feature data in frequency space (주파수 공간상의 특징 데이터를 활용한 손목에 부착된 가속도 센서 기반의 낙상 감지)

  • Roh, Jeong Hyun;Kim, Jin Heon
    • Smart Media Journal / v.10 no.3 / pp.31-38 / 2021
  • It is hard to predict when and where a fall will happen, and a fall can become life-threatening if follow-up measures are not taken quickly, so techniques that detect falls automatically have become necessary. Among such techniques, fall detection with an IMU (inertial measurement unit) sensor attached to the wrist is harder because of how much the wrist moves, but it is regarded as easy to wear and highly accessible. To overcome the difficulty of obtaining fall data, this study proposes an algorithm that learns efficiently from a small amount of data using machine learning methods such as KNN (k-nearest neighbors) and SVM (support vector machine). In addition, to improve the performance of these mathematical classifiers, the study uses feature data acquired in the frequency space. The effect of the proposed algorithm was analyzed by varying the parameters of the model and of the frequency feature extractor in experiments on standard datasets. The proposed algorithm coped adequately with the practical problem that fall data are difficult to obtain. Because it is lighter than other classifiers, it was also easy to implement on small embedded systems in which SIMD (single instruction multiple data) processing devices are difficult to mount.
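
The pairing of frequency-space features with a k-nearest-neighbors classifier described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the 32-sample window, the 8 DFT bins, and the synthetic training windows are all assumptions made for the example.

```python
import cmath
import math
from collections import Counter


def dft_magnitudes(signal, n_bins):
    """Return the magnitudes of the first n_bins DFT coefficients of a real signal."""
    n = len(signal)
    mags = []
    for k in range(n_bins):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        mags.append(abs(coeff) / n)
    return mags


def knn_predict(train, query, k=3):
    """Classify query by majority vote among its k nearest training feature vectors."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Synthetic, made-up windows of acceleration magnitude (NOT real fall data).
flat = [1.0] * 32
wobble = [1.0 + 0.05 * math.sin(t) for t in range(32)]
spike = [1.0] * 16 + [4.0] + [1.0] * 15
double_spike = [1.0] * 15 + [3.5, 3.0] + [1.0] * 15

train = [
    (dft_magnitudes(flat, 8), "no_fall"),
    (dft_magnitudes(wobble, 8), "no_fall"),
    (dft_magnitudes(spike, 8), "fall"),
    (dft_magnitudes(double_spike, 8), "fall"),
]
```

Because DFT magnitudes do not change when a spike shifts in time, a query window with an impact at a different sample index still lands near the "fall" examples, which is one plausible reason frequency features help when training data are scarce.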

Accelerating Symmetric and Asymmetric Cryptographic Algorithms with Register File Extension for Multi-words or Long-word Operation (다수 혹은 긴 워드 연산을 위한 레지스터 파일 확장을 통한 대칭 및 비대칭 암호화 알고리즘의 가속화)

  • Lee Sang-Hoon;Choi Lynn
    • Journal of the Institute of Electronics Engineers of Korea CI / v.43 no.2 s.308 / pp.1-11 / 2006
  • In this paper, we propose a new register file architecture called the Register File Extension for Multi-words or Long-word Operation (RFEMLO) to accelerate both symmetric and asymmetric cryptographic algorithms. Based on the observation that most cryptographic algorithms make heavy use of multi-word or long-word operations, RFEMLO allows multiple contiguous registers to be specified as a single operand. Thus, a single instruction can specify a SIMD-style multi-word operation or a long-word operation. RFEMLO can be applied to general-purpose processors by adding instructions for multi-word or long-word operands and functional units for the added instructions. To evaluate the performance of RFEMLO, we use Simplescalar/ARM 3.0 (with gcc 2.95.2) and run detailed simulations on various symmetric and asymmetric cryptographic algorithms. By applying RFEMLO, we obtained reductions of up to 62% and 70% in the total instruction count of symmetric and asymmetric cryptographic algorithms, respectively. Performance results also show that, when RFEMLO is applied to a processor with an in-order pipeline, a speedup of 1.4 to 2.6 can be obtained for symmetric cryptographic algorithms and a speedup of 2.5 to 3.3 for asymmetric cryptographic algorithms. We also found that RFEMLO can effectively improve the performance of these cryptographic algorithms at much lower cost than the issue-width increase available in superscalar implementations. Moreover, RFEMLO can also be applied to a superscalar processor, leading to additional performance gains of 83% and 138% for symmetric and asymmetric cryptographic algorithms, respectively.
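
To make the two operation types in this abstract concrete, the sketch below models a 128-bit operand as four contiguous 32-bit words. It only illustrates what a conventional processor must do word by word in software; it does not model RFEMLO itself, and the word size and all function names are assumptions for the example.

```python
WORD_BITS = 32
WORD_MASK = (1 << WORD_BITS) - 1


def to_words(value, n_words):
    """Split a non-negative integer into n_words little-endian 32-bit words."""
    return [(value >> (WORD_BITS * i)) & WORD_MASK for i in range(n_words)]


def from_words(words):
    """Reassemble little-endian 32-bit words into one integer."""
    return sum(w << (WORD_BITS * i) for i, w in enumerate(words))


def long_word_add(a_words, b_words):
    """Long-word operation: the carry links each word to the next one."""
    result, carry = [], 0
    for a, b in zip(a_words, b_words):
        total = a + b + carry
        result.append(total & WORD_MASK)
        carry = total >> WORD_BITS
    return result, carry


def multi_word_xor(a_words, b_words):
    """SIMD-style multi-word operation: every word is processed independently."""
    return [a ^ b for a, b in zip(a_words, b_words)]
```

Each loop iteration above stands for at least one scalar instruction; specifying the whole register group as a single operand is what would let one instruction replace the loop.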

Fast Distributed Video Coding using Parallel LDPCA Encoding (LDPCA 병렬 부호화를 이용한 고속 분산비디오부호화)

  • Park, Jongbin;Kim, Jaehwan;Jeon, Byeungwoo
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2010.11a / pp.136-137 / 2010
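  • This paper proposes a parallel processing method to further speed up a transform-domain Wyner-Ziv distributed video encoder, which is suited to fast, low-power video encoding. Conventionally, transform-domain Wyner-Ziv distributed video coding decomposes the quantized information into bit planes and LDPCA-encodes them sequentially, so LDPCA accounts for about 54% of the total encoder computation, and this share grows further as the bit rate increases. To improve on this, the proposed method bundles several bit planes into a single symbol and performs LDPCA encoding on it, so that multiple data items are processed simultaneously by a single operation. It is a kind of speed-up through single instruction, multiple data (SIMD) processing. Compared with the conventional sequential method, the proposed method improves LDPCA encoding speed by up to 8 times at low bit rates and up to 55 times at high bit rates. As a result, the relative complexity share of LDPCA in the overall encoding drops to about 4%, and the encoding speed of Wyner-Ziv frames also improves by about 1.5 to 2 times. The proposed method is expected to be applicable to other Wyner-Ziv distributed video coding architectures that use LDPCA.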
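
The idea of bundling several bit planes into one symbol can be sketched as follows, assuming the usual picture of LDPCA encoding as XOR-based syndrome formation followed by modulo-2 accumulation. The check-node connections used in the test are a toy example, not a real LDPCA code, and the function names are hypothetical.

```python
def syndromes_per_plane(bits, checks):
    """Sequential baseline: accumulated syndromes of ONE bit plane (list of 0/1)."""
    out, acc = [], 0
    for check in checks:
        s = 0
        for idx in check:
            s ^= bits[idx]
        acc ^= s  # accumulate modulo 2
        out.append(acc)
    return out


def syndromes_packed(symbols, checks):
    """Packed variant: bit b of every integer symbol belongs to bit plane b."""
    out, acc = [], 0
    for check in checks:
        s = 0
        for idx in check:
            s ^= symbols[idx]  # one XOR updates every bit plane at once
        acc ^= s
        out.append(acc)
    return out


def extract_plane(values, b):
    """Return bit plane b of a list of integers as a list of 0/1."""
    return [(v >> b) & 1 for v in values]
```

Since XOR acts on each bit position independently, plane b of the packed result equals the result of encoding plane b alone, so the packed loop does the work of all planes in a single pass.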


Design and implementation of the SliM image processor chip (SliM 이미지 프로세서 칩 설계 및 구현)

  • 옹수환;선우명훈
    • Journal of the Korean Institute of Telematics and Electronics A / v.33A no.10 / pp.186-194 / 1996
  • The SliM (sliding memory plane) array processor has been proposed to alleviate the disadvantages of existing mesh-connected SIMD (single instruction stream, multiple data streams) array processors, such as inter-PE (processing element) communication overhead, data I/O overhead, and complicated interconnections. This paper presents the design and implementation of the SliM image processor ASIC (application specific integrated circuit) chip, which consists of a 5 X 5 mesh of connected PEs. The PE architecture implemented here is quite different from the originally proposed PE. We performed the front-end design, including VHDL (VHSIC hardware description language) modeling, logic synthesis, and simulation, and completed the back-end design procedure. The SliM ASIC chip, built with the VTI 0.8 $\mu$m standard cell library (v8r4.4), has 55,255 gates and twenty-five 128 X 9 bit SRAM modules. The die measures 326.71 X 313.24 mil$^{2}$ and is packaged in a 144-pin MQFP. The chip operates correctly at 25 MHz and delivers 625 MIPS. For performance evaluation, we developed parallel algorithms, and the results showed an improvement over existing image processors.


Architecture Exploration of Optimal Many-Core Processors for a Vector-based Rasterization Algorithm (래스터화 알고리즘을 위한 최적의 매니코어 프로세서 구조 탐색)

  • Son, Dong-Koo;Kim, Cheol-Hong;Kim, Jong-Myon
    • IEMEK Journal of Embedded Systems and Applications / v.9 no.1 / pp.17-24 / 2014
  • In this paper, we implement a vector-based rasterization algorithm for 3D graphics on a SIMD (single instruction multiple data) many-core processor architecture and evaluate its performance. In addition, we evaluate the impact of the data-per-processing-element (DPE) ratio, defined as the amount of data directly mapped to each processing element (PE) within the many-core processor, on performance, energy efficiency, and area efficiency. For the experiment, we use seven different PE configurations obtained by varying the DPE ratio (or the number of PEs), all implemented in the same 130 nm CMOS technology with a 500 MHz clock frequency. Experimental results indicate that the optimal PE configuration is achieved when the DPE ratio is in the range from 16,384 to 256 (or the number of PEs is in the range from 16 to 1,024), which meets the requirements of mobile devices in terms of performance and efficiency.
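
The relation between the DPE ratio and the number of PEs quoted in this abstract can be checked with a few lines of arithmetic. The total data size below is inferred from the two stated endpoints (16 PEs at a DPE ratio of 16,384, and 1,024 PEs at 256) and is an assumption, since the abstract does not state it; the function name is hypothetical.

```python
def dpe_ratio(total_elements, num_pes):
    """Amount of data mapped to each PE when the data are divided evenly."""
    if total_elements % num_pes != 0:
        raise ValueError("data do not divide evenly among the PEs")
    return total_elements // num_pes


# Inferred, not stated in the abstract: 16 PEs x 16,384 elements = 262,144 elements.
TOTAL_ELEMENTS = 16 * 16_384

for pes in (16, 1_024):
    print(pes, "PEs ->", dpe_ratio(TOTAL_ELEMENTS, pes), "elements per PE")
```

Both endpoints give the same product, so they are consistent with a fixed workload that is split across more or fewer PEs.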

Design and Implementation of Realtime MPEG-2 to MPEG-4 Transcoder (실시간 MPEG-2 to MPEG-4 트랜스코더의 설계 및 구현)

  • Kim Je Woo;Kim Yong-Hwan;Kim Tae-Wan;Choi Beong-Ho
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2003.11a / pp.143-146 / 2003
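  • With the recent popularization of digital broadcasting and mobile communication terminals, demand is growing for services that play back high-quality, high-resolution multimedia content on mobile terminals. To provide such a playback service, digital broadcast content must be converted into content suitable for the terminal. This paper proposes design and implementation techniques for a transcoder that converts, in real time, MPEG-2 content (the digital broadcasting standard) into MPEG-4 SP (Simple Profile) content supported by mobile terminals. The implemented transcoder includes algorithms for maintaining picture quality and reducing computation, such as adaptive motion vector reconstruction, macroblock mode selection, and motion vector scaling, and it is optimized using the SIMD (single instruction, multiple data) instructions provided by Intel. The transcoder can convert 30 fps, 8 Mbps, $720\times480$ multimedia content into 30 fps, $352\times240$ MPEG-4 content at various bit rates in real time.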
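
As a rough illustration of the motion vector scaling step mentioned in this abstract, the sketch below scales a vector by the ratio of output to input resolution. Plain proportional scaling is an assumption here; the abstract does not describe the authors' adaptive reconstruction, and the function name and the integer vector units are hypothetical.

```python
from fractions import Fraction


def scale_motion_vector(mv, src_size, dst_size):
    """Scale (mvx, mvy) by the output/input resolution ratio.

    Results are rounded to the nearest integer unit; Python's round()
    resolves exact .5 ties toward the even integer.
    """
    sx = Fraction(dst_size[0], src_size[0])
    sy = Fraction(dst_size[1], src_size[1])
    return (round(mv[0] * sx), round(mv[1] * sy))


# Resolutions from the abstract: 720x480 input, 352x240 output.
scaled = scale_motion_vector((16, -8), (720, 480), (352, 240))
```

Reusing scaled input vectors as a starting point avoids a full motion search in the output encoder, which is the usual motivation for this step in a transcoder.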


Design of an Image Processing ASIC Architecture using Parallel Approach with Zero or Little (통신부담을 감소시킨 영상처리를 위한 병렬처리 방식 ASIC구조 설계)

  • 안병덕;정지원;선우명훈
    • The Journal of Korean Institute of Communications and Information Sciences / v.19 no.10 / pp.2043-2052 / 1994
  • This paper proposes a new parallel ASIC architecture for real-time image processing, called the Sliding Memory Plane (SliM) Image Processor, that reduces inter-processing element (inter-PE) communication overhead. The SliM Image Processor consists of $3\times3$ processing elements (PEs) connected in a mesh topology. Because the topology scales easily, a set of SliM Image Processors can form a mesh-connected SIMD parallel architecture, called the SliM Array Processor. Sliding means that all pixels are slid into all neighboring PEs without interrupting the PEs and without a coprocessor or a DMA controller. Since inter-PE communication and computation occur simultaneously, the inter-PE communication overhead, a significant disadvantage of existing machines, is greatly diminished. Two I/O planes provide buffering and reduce the data I/O overhead. In addition, a by-passing path provides eight-way connectivity even with only four links. With these salient features, SliM shows a significant performance improvement. This paper presents the architectures of a PE and of the SliM Image Processor, and describes the design of an instruction set.


A Study on Ray Tracing Method for Wave Propagation Prediction with Acceleration Methods (가속 방법을 이용하는 전파 광선 추적법에 관한 연구)

  • Kwon, Se-Woong;Moon, Hyun-Wook;Oh, Jae-Rim;Lim, Jae-Woo;Bae, Seok-Hee;Kim, Young-Gyu;Park, Joung-Soo;Yoon, Young-Joong
    • The Journal of Korean Institute of Electromagnetic Engineering and Science / v.20 no.5 / pp.471-479 / 2009
  • In this paper, we propose an improved ray tracing method that combines an improved visible tree structure, a visible face determination method, and a non-uniform random test point method. The proposed visible tree structure reduces the number of tree nodes by merging similar nodes. In the visible face determination method, a ray hit test using packet rays reduces the test time: ray tracing with the packet ray hit test speeds up tree construction by up to 3.3 times compared with ray tracing with a single ray hit test. Furthermore, seeding non-uniform random test points on a face improves tree construction time by up to 1.11 times. Received powers predicted by the proposed ray tracing agree well with measured results, with an RMS error of 1.9 dB.

Design Space Exploration of Many-Core Processor for High-Speed Cluster Estimation (고속의 클러스터 추정을 위한 매니코어 프로세서의 디자인 공간 탐색)

  • Seo, Jun-Sang;Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information / v.19 no.10 / pp.1-12 / 2014
  • This paper implements a computationally intensive subtractive clustering algorithm on a single instruction, multiple data (SIMD) based many-core processor and improves its performance. In addition, this paper implements five different processing element (PE) architectures (PEs=16, 64, 256, 1,024, 4,096) and selects an optimal PE architecture for the subtractive clustering algorithm by estimating execution time and energy efficiency. Experimental results using two different medical images at three different resolutions ($128{\times}128$, $256{\times}256$, $512{\times}512$) show that PEs=4,096 achieves the highest performance and energy efficiency in all cases.
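
For readers unfamiliar with subtractive clustering, a minimal sketch of the commonly used formulation follows: every point receives a potential that sums Gaussian-weighted contributions from all other points, the highest-potential point becomes a center, and its influence is then subtracted. The radius values, the 1.5 ratio between them, and the fixed number of clusters are assumptions for the example and are not taken from the paper.

```python
import math


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def subtractive_clustering(points, ra=1.0, n_clusters=2):
    """Pick n_clusters centers from points by repeated potential subtraction."""
    alpha = 4.0 / ra ** 2
    rb = 1.5 * ra  # assumed penalty radius, larger than ra
    beta = 4.0 / rb ** 2

    # Each point's potential depends only on read-only data, so all
    # potentials can be computed independently (data-parallel work).
    potential = [
        sum(math.exp(-alpha * squared_distance(p, q)) for q in points)
        for p in points
    ]

    centers = []
    for _ in range(n_clusters):
        best = max(range(len(points)), key=potential.__getitem__)
        peak = potential[best]
        centers.append(points[best])
        # Subtract the chosen center's influence from every potential.
        potential = [
            pot - peak * math.exp(-beta * squared_distance(p, points[best]))
            for p, pot in zip(points, potential)
        ]
    return centers
```

The potential computation is quadratic in the number of points, which is consistent with the abstract's description of the algorithm as computationally intensive and explains why spreading it over many PEs pays off.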

Multi-Sever based Distributed Coding based on HEVC/H.265 for Studio Quality Video Editing

  • Kim, Jongho;Lim, Sung-Chang;Jeong, Se-Yoon;Kim, Hui-Yong
    • Journal of Multimedia Information System / v.5 no.3 / pp.201-208 / 2018
  • High Efficiency Video Coding range extensions (HEVC RExt) is an extension of HEVC designed specifically for handling high-quality images. HEVC RExt is essential for studio editing, which handles very high-quality images of various types. Handling such massive data in studio editing raises several problems, and one of the most important is the re-encoding and decoding procedure during editing. Various codecs are widely used for studio data editing, but most of them share common problems when handling this massive data. In particular, re-encoding and decoding occur frequently during studio data editing, which consumes an enormous amount of time and degrades video quality. In this paper, we suggest a new video coding structure for efficient studio video editing, called "ultra-low delay (ULD)". It has a very simple, low-delay referencing structure. Simplifying the referencing structure minimizes the number of frames that need to be decoded and re-encoded, and it also prevents the quality degradation caused by frequent re-encoding. Various fast coding algorithms are also proposed for efficient editing, such as tool-level optimization, multi-server-based distributed coding, and SIMD (single instruction, multiple data) based parallel processing. These reduce the enormous computational complexity of the editing procedure. The proposed method achieves 9500 times faster coding speed with negligible loss of quality. It also shows better coding gain than the "intra only" structure. We confirm that the proposed method can efficiently solve the existing problems of studio video editing.