• Title/Summary/Keyword: computation time reduction (연산시간 감소)

Search Result 400, Processing Time 0.032 seconds

A Low-Complexity Alamouti Space-Time Transmission Scheme for Asynchronous Cooperative Systems (비동기 협력 통신 시스템을 위한 저복잡도 Alamouti 시공간 전송 기법)

  • Lee, Young-Po;Chong, Da-Hae;Lee, Young-Yoon;Song, Chong-Han;Yoon, Seok-Ho
    • The Journal of Korean Institute of Communications and Information Sciences / v.35 no.5C / pp.479-486 / 2010
  • In this paper, we propose a novel low-complexity Alamouti coded orthogonal frequency division multiplexing (OFDM) scheme for asynchronous cooperative communications. By combining OFDM symbols at the source node and using only simple operations (sign change and complex product) at the relay node, the proposed scheme achieves cooperative diversity gain without the time-reversal and shifting operations required by the conventional scheme of Li and Xia. In addition, by using cyclic prefix (CP) removal and insertion operations at the relay node, the proposed scheme avoids a considerable degradation of bit-error-rate (BER) performance even when perfect timing synchronization is not achieved at the relay node. Simulation results demonstrate that the BER performance of the proposed scheme is much better than that of the conventional scheme in the presence of timing synchronization error at the relay node. They also show that the proposed scheme obtains twice the diversity gain of the conventional scheme at the cost of halving the transmission efficiency.
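For readers unfamiliar with the Alamouti code that this scheme builds on, the following is a minimal sketch of the textbook two-branch encoding and linear combining over a flat channel. It is not the paper's OFDM relay scheme; the function names and the noise-free channel model are assumptions made for illustration.

```python
def alamouti_encode(s1, s2):
    """Return the symbol pairs sent by the two branches in two time slots."""
    # Slot 1 sends (s1, s2); slot 2 sends (-conj(s2), conj(s1)).
    return [(s1, s2), (-s2.conjugate(), s1.conjugate())]


def receive(slots, h1, h2):
    """Noise-free flat-fading channel (an assumption): one gain per branch."""
    return [h1 * a + h2 * b for a, b in slots]


def alamouti_combine(r1, r2, h1, h2):
    """Linear combining that separates s1 and s2 using known channel gains."""
    gain = abs(h1) ** 2 + abs(h2) ** 2
    s1_hat = (h1.conjugate() * r1 + h2 * r2.conjugate()) / gain
    s2_hat = (h2.conjugate() * r1 - h1 * r2.conjugate()) / gain
    return s1_hat, s2_hat


# Hypothetical symbols and channel gains, chosen only for the demonstration.
s1, s2 = complex(1, 1), complex(-1, 1)
h1, h2 = complex(0.8, -0.3), complex(-0.2, 0.6)
r1, r2 = receive(alamouti_encode(s1, s2), h1, h2)
s1_hat, s2_hat = alamouti_combine(r1, r2, h1, h2)
```

The sign change and conjugation in the second slot are what make the two symbols separable at the receiver with only multiplications and additions.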

Applying Static Analysis to Improve Performance of Programs using Flash Memory Storage (플래시 메모리 저장 장치를 사용하는 프로그램의 성능 향상을 위한 정적 분석 기법의 응용)

  • Paik, Joon-Young;Cho, Eun-Sun
    • Journal of KIISE:Computing Practices and Letters / v.16 no.12 / pp.1177-1187 / 2010
  • Flash memory has become a popular storage medium for small devices due to its efficiency, portability, low power consumption, and large capacity. Unlike on hard disks, however, a write operation on flash memory is much more expensive than a read operation, so reducing the number of write operations is critical for performance. This paper proposes a static analysis that rewrites a program to reduce the total number of write operations by merging writable data into a minimum number of pages. To achieve this, we collect information about writable areas by static analysis and, for practicality, information about frequently executed paths by profiling, and combine both to rewrite the application program so that its data is reallocated. The performance enhancement gained from the proposed methods is shown using a FAST simulator.
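The idea of merging writable data into as few pages as possible can be illustrated with a small sketch. The page size, the object layout, and the function names below are hypothetical; the paper's actual analysis and rewriting steps are not reproduced here.

```python
PAGE_SIZE = 4096  # hypothetical flash page size in bytes


def dirty_pages(layout):
    """Return the page indices that hold at least one writable object.

    `layout` is a list of (size_in_bytes, is_writable) pairs placed back to back.
    """
    pages = set()
    offset = 0
    for size, writable in layout:
        if writable:
            first = offset // PAGE_SIZE
            last = (offset + size - 1) // PAGE_SIZE
            pages.update(range(first, last + 1))
        offset += size
    return pages


def group_writable(layout):
    """Reallocate data so that all writable objects are contiguous."""
    return [o for o in layout if o[1]] + [o for o in layout if not o[1]]


# Four small writable objects interleaved with larger read-only ones.
original = [(3000, False), (200, True)] * 4
pages_before = len(dirty_pages(original))
pages_after = len(dirty_pages(group_writable(original)))
```

In this layout the writable objects are spread over four pages before regrouping and fit into a single page afterwards, so fewer pages need to be rewritten when the data changes.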

2D/3D image Conversion Method using Simplification of Level and Reduction of Noise for Optical Flow and Information of Edge (Optical flow의 레벨 간소화 및 노이즈 제거와 에지 정보를 이용한 2D/3D 변환 기법)

  • Han, Hyeon-Ho;Lee, Gang-Seong;Lee, Sang-Hun
    • Journal of the Korea Academia-Industrial cooperation Society / v.13 no.2 / pp.827-833 / 2012
  • In this paper, we propose an improved optical flow algorithm that reduces both computational complexity and noise level. The algorithm reduces computation time by applying a level simplification technique and removes noise by using eigenvectors of objects. Optical flow is an accurate way to generate depth information from two image frames, using vectors that track the motion of pixels. It has the disadvantage, however, of a very long computation time because of its pixel-based calculation, and it can introduce noise. The level simplification technique is applied to reduce the computation time, and noise is removed by applying optical flow only to areas that have an eigenvector and then using the edge image to generate the depth information of the background area. Three-dimensional images were created from two-dimensional images with the proposed method, which first generates the depth information and then converts the image into three dimensions using the depth information and the DIBR (Depth Image Based Rendering) technique. The error rate was measured using SSIM (Structural SIMilarity index).

Area Efficient Hardware Design for Performance Improvement of SAO (SAO의 성능개선을 위한 저면적 하드웨어 설계)

  • Choi, Jisoo;Ryoo, Kwangki
    • Journal of the Korea Institute of Information and Communication Engineering / v.17 no.2 / pp.391-396 / 2013
  • In this paper, an SAO hardware design for HEVC decoding with less processing time and reduced area is proposed. The proposed SAO hardware architecture processes $8{\times}8$ CUs to reduce the hardware area and uses internal registers to support $64{\times}64$ CU processing. Instead of the previous top-down block partitioning, it uses bottom-up block partitioning to minimize the amount of calculation and the processing time. When the proposed architecture is synthesized with the TSMC $0.18{\mu}m$ library, the gate area is 30.7k and the maximum frequency is 250 MHz. The proposed SAO hardware architecture can decode a macroblock in 64 cycles.

An Efficient Concurrency Control Algorithm for Multi-dimensional Index Structures (다차원 색인구조를 위한 효율적인 동시성 제어기법)

  • 김영호;송석일;유재수
    • Journal of KIISE:Databases / v.30 no.1 / pp.80-94 / 2003
  • In this paper, we propose an enhanced concurrency control algorithm that efficiently minimizes query delay. In multi-dimensional index structures, the factors that delay search operations and degrade concurrency are node splits and MBR updates. To reduce the query delay caused by split operations, our algorithm optimizes the exclusive latching time on a split node: it holds exclusive latches not for the whole split but only during the physical node split, which occupies a small part of the whole split time. To avoid the query delay caused by MBR updates, we introduce the partial lock coupling (PLC) technique. The PLC technique increases concurrency by using lock coupling only for MBR shrinking operations, which are less frequent than MBR expansion operations. For performance evaluation, we implement the proposed algorithm and one of the existing link technique-based algorithms on MIDAS-III, the storage system of the BADA-III DBMS. We show through various experiments that the proposed algorithm outperforms the existing algorithm in terms of throughput and response time.

Hardware Design of Arccosine Function for Mobile Vector Graphics Processor (모바일 벡터 그래픽 프로세서용 역코사인 함수의 하드웨어 설계)

  • Choi, Byeong-Yoon;Lee, Jong-Hyoung
    • Journal of the Korea Institute of Information and Communication Engineering / v.13 no.4 / pp.727-736 / 2009
  • In this paper, an arccos ($\cos^{-1}$) arithmetic unit for a mobile graphics accelerator is designed. Mobile vector graphics applications face tighter area, execution time, power dissipation, and accuracy constraints than desktop PC applications. The designed processor adopts a 2nd-order polynomial approximation scheme based on the IEEE floating-point data format to satisfy the speed and accuracy conditions, and reduces area via a hardware sharing structure. The arccosine processor consists of 15,280 gates, and its estimated operating frequency is about 125 MHz in $0.35{\mu}m$ CMOS technology. Because the processor can execute the arccosine function within 7 clock cycles, it has an execution rate of about 17 MOPS (million arccos operations per second) and is applicable to a mobile OpenVG processor. Because of its flexible architecture, it can also be applied to various transcendental functions, such as exponential, trigonometric, and logarithmic functions, by replacing the ROM and making minor hardware modifications.
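A table-driven 2nd-order polynomial approximation and the quoted throughput figure can be sketched as follows. The segment count, the use of Taylor coefficients at segment midpoints, and all names are assumptions for illustration; the abstract does not describe the actual coefficient ROM or segmentation.

```python
import math

SEGMENTS = 64  # hypothetical number of table (ROM) entries over [-1, 1]


def build_table(segments=SEGMENTS):
    """Store (midpoint, c0, c1, c2) per segment: a quadratic Taylor fit of arccos."""
    table = []
    width = 2.0 / segments
    for k in range(segments):
        m = -1.0 + (k + 0.5) * width
        c0 = math.acos(m)
        c1 = -1.0 / math.sqrt(1.0 - m * m)        # first derivative at m
        c2 = -m / (2.0 * (1.0 - m * m) ** 1.5)    # half the second derivative at m
        table.append((m, c0, c1, c2))
    return table


TABLE = build_table()


def arccos_poly2(x):
    """Evaluate the segment's quadratic in Horner form."""
    k = min(int((x + 1.0) * SEGMENTS / 2.0), SEGMENTS - 1)
    m, c0, c1, c2 = TABLE[k]
    d = x - m
    return c0 + d * (c1 + d * c2)


# Throughput implied by the abstract: 125 MHz clock, 7 cycles per operation.
throughput_mops = 125e6 / 7 / 1e6

# Worst-case error on a grid over [-0.9, 0.9]; accuracy degrades near +/-1.
max_error = max(
    abs(arccos_poly2(i / 1000.0) - math.acos(i / 1000.0))
    for i in range(-900, 901)
)
```

The division gives roughly 17.9 million operations per second, consistent with the "about 17 MOPS" stated above. Swapping the table contents is what would let the same datapath serve other functions.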

Reduction of Computing Time in Aircraft Control by Delta Operating Singular Perturbation Technique (델타연산자 섭동방법에 의한 항공기 동력학의 연산시간 감소)

  • Sim, Gyu Hong;Sa, Wan
    • Journal of the Korean Society for Aeronautical & Space Sciences / v.31 no.3 / pp.39-49 / 2003
  • The delta operator approach and the singular perturbation technique are introduced. The former reduces the round-off error in numerical computation. The latter reduces computing time by decoupling the original system into fast and slow subsystems. The aircraft dynamics consist of the Phugoid and short-period motions whether the model is longitudinal or lateral. In this paper, approximate solutions of the lateral dynamic model of the Beaver, obtained by using these two methods, are compared with the exact solution. The approximate solution becomes identical to the exact solution with only one iteration for the open-loop system and without iteration for the closed-loop system. It is therefore shown that these approaches are very effective in flight dynamics and control.
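The decoupling into slow and fast subsystems can be sketched on a scalar two-time-scale system, $\dot{x} = a_{11}x + a_{12}z$, $\varepsilon\dot{z} = a_{21}x + a_{22}z$. The coefficients below are hypothetical and unrelated to the Beaver model, and the delta operator part of the method is not shown; this only illustrates the zeroth-order (no iteration) slow approximation.

```python
import math


def exact_eigenvalues(a11, a12, a21, a22, eps):
    """Eigenvalues of the full 2x2 system [[a11, a12], [a21/eps, a22/eps]]."""
    trace = a11 + a22 / eps
    det = (a11 * a22 - a12 * a21) / eps
    root = math.sqrt(trace * trace - 4.0 * det)  # assumes real eigenvalues
    return (trace + root) / 2.0, (trace - root) / 2.0


def reduced_slow_eigenvalue(a11, a12, a21, a22):
    """Set eps = 0, so z = -(a21 / a22) * x, and substitute into the slow equation."""
    return a11 - a12 * a21 / a22


# Hypothetical coefficients with a clear time-scale separation.
a11, a12, a21, a22, eps = -1.0, 1.0, 1.0, -2.0, 0.01
slow_exact, fast_exact = exact_eigenvalues(a11, a12, a21, a22, eps)
slow_approx = reduced_slow_eigenvalue(a11, a12, a21, a22)
fast_approx = a22 / eps
```

Each subsystem is a lower-order problem than the original, which is where the saving in computing time comes from; iterating on the approximation would shrink the remaining error further.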

Multiple Supply Voltage Scheduling Techniques for Minimal Energy Consumption (에너지 소모 최소화를 위한 다중 전압 스케줄링 기법)

  • Jeong, Woo-Sung;Shin, Hyun-Chul
    • Journal of the Institute of Electronics Engineers of Korea SD / v.46 no.9 / pp.49-57 / 2009
  • In this paper, we propose a multiple voltage scheduling method that reduces energy consumption while considering both timing constraints and resource constraints. In other multiple voltage scheduling techniques, a high voltage is assigned to operations on the longest path and a low voltage to operations that are not on the longest path, so voltages are assigned to specific operations restrictively. We use a simulated annealing technique in which several voltages are assigned to operations flexibly, regardless of whether they are on the longest path. A post-processing algorithm is also proposed to further reduce the energy consumption. In some cases, designers may want to reduce the number of level shifters. To trade off the total energy against the number (or energy) of level shifters, a weighted term can be added to the cost function. When the level shifter energy is weighted six times, for example, the number of level shifters is reduced by about 24% and their energy consumption by about 20%.
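A cost function with a weighted level-shifter term might look like the sketch below. The energy model (proportional to capacitance times voltage squared), the rule that a shifter is needed when a lower-voltage operation drives a higher-voltage one, and every name and number are assumptions; the paper's actual cost function is not given in the abstract.

```python
def schedule_cost(voltage, edges, cap, w_shifter=1.0, e_shifter=1.0):
    """Return (cost, shifter_count) for one voltage assignment.

    voltage: dict mapping operation -> supply voltage
    edges:   list of (producer, consumer) data dependencies
    cap:     dict mapping operation -> switched capacitance (arbitrary units)
    """
    # Assumed dynamic energy model: E ~ C * V^2 per operation.
    op_energy = sum(cap[op] * voltage[op] ** 2 for op in voltage)
    # Assumed rule: a shifter is needed where a low-voltage op feeds a high-voltage op.
    shifters = sum(1 for u, v in edges if voltage[u] < voltage[v])
    return op_energy + w_shifter * e_shifter * shifters, shifters


edges = [("a", "b"), ("b", "c")]
cap = {"a": 1.0, "b": 1.0, "c": 1.0}
mixed = {"a": 5.0, "b": 3.3, "c": 5.0}      # lower energy, one level shifter
all_high = {"a": 5.0, "b": 5.0, "c": 5.0}   # higher energy, no level shifter

cost_mixed_w1, n_mixed = schedule_cost(mixed, edges, cap, w_shifter=1.0)
cost_mixed_w20, _ = schedule_cost(mixed, edges, cap, w_shifter=20.0)
cost_high_w20, n_high = schedule_cost(all_high, edges, cap, w_shifter=20.0)
```

With a small weight the mixed assignment is cheaper; with a large enough weight the shifter-free assignment wins, which is the tradeoff a simulated annealing search would explore when it evaluates candidate assignments.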

Multi-DNN Acceleration Techniques for Embedded Systems with Tucker Decomposition and Hidden-layer-based Parallel Processing (터커 분해 및 은닉층 병렬처리를 통한 임베디드 시스템의 다중 DNN 가속화 기법)

  • Kim, Ji-Min;Kim, In-Mo;Kim, Myung-Sun
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.6 / pp.842-849 / 2022
  • With the development of deep learning technology, DNNs are increasingly used in embedded systems such as unmanned vehicles, drones, and robots. In an autonomous driving system, for example, it is crucial to run several highly accurate, computation-heavy DNNs at the same time. However, running multiple DNNs simultaneously on an embedded system with relatively low performance increases the time required for inference. This delay can cause malfunctions because the action that depends on the inference result is not performed in time. To solve this problem, the solution proposed in this paper first reduces the computation by applying Tucker decomposition to DNN models with a large amount of computation, and then runs the DNN models in parallel as much as possible, in units of hidden layers, inside the GPU. The experimental results show that the DNN inference time decreases by up to 75.6% compared to the case before applying the proposed technique.
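How Tucker decomposition reduces computation can be sketched by counting multiply-accumulate operations for one convolution layer. The sketch assumes the commonly used Tucker-2 form (a 1x1 convolution, a smaller kxk core convolution, and another 1x1 convolution); the layer shape and ranks are hypothetical, and the abstract does not state which form or ranks the paper uses.

```python
def conv_macs(h, w, c_in, c_out, k):
    """Multiply-accumulates of a standard k x k convolution on an h x w output."""
    return h * w * c_in * c_out * k * k


def tucker2_macs(h, w, c_in, c_out, k, r_in, r_out):
    """MACs after replacing the layer with 1x1 -> k x k core -> 1x1 convolutions."""
    reduce_in = h * w * c_in * r_in        # 1x1: c_in -> r_in channels
    core = h * w * r_in * r_out * k * k    # k x k on the reduced channels
    expand_out = h * w * r_out * c_out     # 1x1: r_out -> c_out channels
    return reduce_in + core + expand_out


# Hypothetical layer: 14x14 output, 256 -> 256 channels, 3x3 kernel, ranks 64.
original = conv_macs(14, 14, 256, 256, 3)
decomposed = tucker2_macs(14, 14, 256, 256, 3, 64, 64)
ratio = decomposed / original
```

For this hypothetical layer the decomposed form needs about 12% of the original multiply-accumulates. The saving depends entirely on the chosen ranks, which also affect accuracy.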

Design of Parallel Decimal Multiplier using Limited Range of Signed-Digit Number Encoding (제한된 범위의 Signed-Digit Number 인코딩을 이용한 병렬 십진 곱셈기 설계)

  • Hwang, In-Guk;Kim, Kanghee;Yoon, WanOh;Choi, SangBang
    • Journal of the Institute of Electronics and Information Engineers / v.50 no.3 / pp.50-58 / 2013
  • In this paper, a parallel decimal fixed-point multiplier is proposed that uses a limited-range Signed-Digit (SD) number encoding and a reduction step. The partial products are generated without carry propagation delay by encoding the multiplicand and the multiplier into a limited range of SD numbers. With the limited range of SD numbers, the proposed multiplier improves the partial product reduction step by increasing the number of possible operands for multi-operand SD addition. To evaluate the proposed parallel decimal multiplier, it is synthesized using Design Compiler with the SMIC 180nm CMOS technology library. Synthesis results show that the delay of the proposed parallel decimal multiplier is reduced by 4.3% and the area by 5.3% compared to the existing SD parallel decimal multiplier. Despite the slightly increased delay and area of the partial product generation step, the total delay and area are reduced because the partial product reduction step accounts for the largest share.
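Recoding a decimal number into a limited range of signed digits can be sketched as follows. The digit range used here, [-4, 5], and the function names are assumptions; the abstract does not state the exact range of the paper's encoding, and a hardware encoder would be structured differently from this sequential loop.

```python
def to_signed_digits(n):
    """Recode a non-negative integer into decimal signed digits, least significant first.

    Each output digit lies in [-4, 5] (an assumed range).
    """
    digits = []
    carry = 0
    while n > 0 or carry:
        d = n % 10 + carry
        n //= 10
        if d > 5:
            d -= 10      # replace a large digit by a negative one ...
            carry = 1    # ... and push the difference into the next position
        else:
            carry = 0
        digits.append(d)
    return digits or [0]


def from_signed_digits(digits):
    """Evaluate a least-significant-first signed-digit list."""
    return sum(d * 10 ** i for i, d in enumerate(digits))


example = to_signed_digits(278)   # 278 = 3*100 - 2*10 - 2
```

Keeping the digit magnitudes small means digit-by-digit products and sums stay in a narrow range, which is what leaves headroom for adding more operands at once in the reduction step.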