• Title/Summary/Keyword: floating-point

Design of a High-Performance Mobile GPGPU with SIMT Architecture based on a Small-size Warp Scheduler (작은 크기의 Warp 스케쥴러 기반 SIMT구조 고성능 모바일 GPGPU 설계)

  • Lee, Kwang-Yeob
    • Journal of IKEEE
    • /
    • v.25 no.3
    • /
    • pp.479-484
    • /
    • 2021
  • This paper proposes and designs a structure that achieves high performance with a small number of cores in a GPGPU with a SIMT architecture. A GPGPU intended for mobile devices needs a structure that increases performance relative to power consumption. To reduce power consumption, the number of cores was decreased; to improve performance, the warp size managed by the warp scheduler was set to 4, far smaller than the 32 used in general GPGPUs. Reducing the warp size can reduce the number of idle cycles in the pipelines and make more efficient use of memory latency, reducing the miss penalty when accessing cache memory. The computational performance of the designed GPGPU was measured using a test program that includes floating-point operations, and power consumption was measured for a 28 nm CMOS process, yielding a performance per power of 104.5 GFLOPS/W. This is about four times better performance per power than Nvidia's Tegra K1.

Adaptive Wavelet Transform for Hologram Compression (홀로그램 압축을 위한 적응적 웨이블릿 변환)

  • Kim, Jin-Kyum;Oh, Kwan-Jung;Kim, Jin-Woong;Kim, Dong-Wook;Seo, Young-Ho
    • Journal of Broadcast Engineering
    • /
    • v.26 no.2
    • /
    • pp.143-154
    • /
    • 2021
  • In this paper, we propose a method of compressing the standardized digital hologram data provided by JPEG Pleno. In numerical reconstruction of digital holograms, the addition of random phases for visualization reduces speckle noise due to interference and doubles the compression efficiency of holograms. Holograms consist entirely of complex floating-point data, and because of their ultra-high resolution and speckle noise, it is essential to develop a compression technology tailored to the characteristics of the hologram. First, the frequency characteristics of hologram data are analyzed using various wavelet filters to determine the energy concentration for each filter type. Second, we introduce a subband selection algorithm based on energy concentration. Finally, holograms are compressed and restored with JPEG2000, SPIHT, and H.264 using the Daubechies 9/7 wavelet filter of JPEG2000 and with the proposed method, and the efficiency is analyzed through quantitative quality evaluation against the compression rate.
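
The energy-concentration analysis described in the abstract above can be illustrated with a minimal sketch. It is not the authors' method: it substitutes a plain orthonormal Haar filter for the wavelet filters the paper evaluates, works on a 1-D signal rather than a 2-D hologram, and the function names (`haar_step`, `subband_energy_fractions`) are hypothetical. It only shows how the share of energy per subband can be computed for complex-valued samples.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform (valid for complex samples).

    Assumes an even number of samples; a trailing odd sample is ignored.
    """
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    low = [(signal[i] + signal[i + 1]) * s for i in pairs]
    high = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return low, high


def energy(band):
    """Sum of squared magnitudes of the coefficients in a band."""
    return sum(abs(c) ** 2 for c in band)


def subband_energy_fractions(signal, levels):
    """Return the fraction of total energy held by each subband.

    Keys are 'H1'..'H<levels>' for the detail bands and 'L<levels>' for the
    final approximation band. The signal must have non-zero energy.
    """
    total = energy(signal)
    fractions = {}
    low = list(signal)
    for level in range(1, levels + 1):
        low, high = haar_step(low)
        fractions[f"H{level}"] = energy(high) / total
    fractions[f"L{levels}"] = energy(low) / total
    return fractions


if __name__ == "__main__":
    # Illustrative complex-valued samples, not real hologram data.
    samples = [complex(math.cos(0.3 * n), math.sin(0.3 * n)) for n in range(16)]
    for name, share in subband_energy_fractions(samples, 3).items():
        print(name, round(share, 4))
```

A subband selection rule could then keep only the bands whose energy share exceeds some threshold. That threshold and rule are assumptions here, since the abstract does not specify them.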

Evaluation of Punching Shear Safety of a Two-Way Void Plywood Slab System with Form (거푸집 패널이 부착된 2방향 중공슬래브의 뚫림 전단 안전성 평가)

  • Hur, Moo-Won;Woo, Hyung-Sik;Park, Jung-Min;Kang, Hyun-Wook;Park, Tae-Won
    • Journal of the Korea institute for structural maintenance and inspection
    • /
    • v.25 no.5
    • /
    • pp.182-189
    • /
    • 2021
  • The Void Plywood Slab System (VPS) has an optimized hollow-material shape. In addition, it prevents the hollow material from floating and from separating under the working load. In this study, the punching shear capacity of a flat plate was evaluated using the Void Plywood Slab System with the formwork panel proposed in the previous study. As a result of the test, the strength of the VSPS specimen in which the hollow material was placed beyond 2.0 times the column width from the loading point was reduced by 9.4% compared to the reference specimen. However, the strength was about 1.57 times higher than the design value suggested by KBC 2016. There was no change in stiffness compared to the reference specimen until shear failure occurred in the VSPS specimen in which the hollow material was placed. The specimens in this experiment failed in shear because the flexural reinforcing bars provided sufficient reinforcement.

Performance Analysis of Short Baseline Integer PPP (IPPP) for Time Comparison

  • Lee, Young Kyu;Yang, Sung-hoon;Lee, Ho Seong;Lee, Jong Koo;Hwang, Sang-wook;Rhee, Joon Hyo
    • Journal of Positioning, Navigation, and Timing
    • /
    • v.10 no.4
    • /
    • pp.379-385
    • /
    • 2021
  • To synchronize a remote system time to a reference time such as Coordinated Universal Time (UTC), the time difference between the two clocks must be compared. GNSS Precise Point Positioning (PPP) is one of the most general geodetic positioning methods and can be used for time and frequency transfer applications that require more precise time comparison performance than GNSS code. However, the PPP technique has the main drawback of day-boundary discontinuity, which comes from the PPP model, in which the code measurements are applied to resolve the floating carrier-phase ambiguities. The Integer PPP (IPPP) technique is one of the methods that has been studied to compensate for the day-boundary discontinuities existing in conventional PPP. In this paper, we investigate the time and frequency capabilities of PPP and IPPP using measurement data obtained from two closely located time transfer receivers that use common reference 1 Pulse Per Second (PPS) and RF signals. The experiment shows that the IPPP method can effectively compensate for the day-boundary discontinuities without producing a frequency offset. However, the PPP method can generate a frequency offset, which can severely degrade the time comparison performance with long-term data.

A Study on LIT Girder Performance Improvement (LIT 거더 성능 개선에 대한 연구)

  • Kim, Sung;Park, Sungjin
    • Journal of Urban Science
    • /
    • v.11 no.2
    • /
    • pp.19-24
    • /
    • 2022
  • Conventional RC beams for crossing small and medium-sized rivers do not have a cross-sectional area, so floating debris accumulates and disasters such as damage to bridges occur. To improve this, the PSC method was invented. However, this also had problems such as transverse curvature, increased dead weight due to the cross-sectional shape, and negative moment generated during serialization, so it was necessary to develop a new type of girder. This study therefore proposes a LIT (Leton Interaction Thrust) girder bridge that is safer and performs better than the conventional PSC girder, with improved section efficiency. Unlike existing girder bridges, the LIT girder has the feature that the change in the strands of the entire girder occurs only in the vertical direction when the first tension is applied, because the tendon arrangement is made symmetrical by applying the raised portion. In addition, slab continuation generates a secondary moment that is advantageous to the continuous point, effectively controlling the negative moment and preventing corrosion of the tendon. The dimensions of the cross section were determined, and the arrangement of the strands was designed in order to conduct structural analysis and detailed analysis. The structural analysis showed that the stress of the girder was within the allowable compressive stress and that the deflection was within the allowable deflection. In addition, a detailed analysis was performed to examine the stress distribution around the girder body and the anchorage area and the stress distribution of the embossed portion; as a result, the stress of the girder body due to the tension force was at a stable level.

A Comprehensive Survey of Lightweight Neural Networks for Face Recognition (얼굴 인식을 위한 경량 인공 신경망 연구 조사)

  • Yongli Zhang;Jaekyung Yang
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.46 no.1
    • /
    • pp.55-67
    • /
    • 2023
  • Lightweight face recognition models, one of the most popular and long-standing topics in the field of computer vision, have developed vigorously and have been widely used in many real-world applications because of their fewer parameters, fewer floating-point operations, and smaller model size. However, few surveys have reviewed lightweight models and reimplemented them using the same computing resources and training dataset. In this survey article, we present a comprehensive review of recent research advances in end-to-end efficient lightweight face recognition models and reimplement several of the most popular models. To start with, we give an overview of face recognition with lightweight models. Then, based on how the models are constructed, we categorize the lightweight models into: (1) manually designed lightweight FR models, (2) pruned models for face recognition, (3) efficient automatic neural network architecture design based on neural architecture search, (4) knowledge distillation, and (5) low-rank decomposition. As an example, we also introduce SqueezeFaceNet and EfficientFaceNet, obtained by pruning SqueezeNet and EfficientNet. Additionally, we reimplement and present a detailed performance comparison of different lightweight models on nine different test benchmarks. Finally, challenges and future work are discussed. Our survey makes three main contributions: first, the categorized lightweight models can be conveniently identified so that new lightweight models for face recognition can be explored; second, comprehensive performance comparisons are carried out so that one can choose a model when a state-of-the-art end-to-end face recognition system is deployed on mobile devices; third, the challenges and future trends are stated to inspire future work.

MP3 Encoder Chip Design Based on HW/SW Co-Design (하드웨어 소프트웨어 Co-Design을 통한 MP3 부호화 칩 설계)

  • Park Jong-In;Park Ju Sung;Kim Tae-Hoon
    • The Journal of the Acoustical Society of Korea
    • /
    • v.25 no.2
    • /
    • pp.61-71
    • /
    • 2006
  • An MP3 encoder chip has been designed and fabricated using hardware and software co-design concepts. On the software side, the calculation cycles of the distortion control loop, which requires most of the calculation cycles in the MP3 encoding procedure, have been reduced to $67\%$ of the original algorithm through 'scale factor pre-calculation'. By using a 32-bit floating-point DSP core and implementing the FFT block in hardware, we obtain a further reduction in calculation cycles in addition to the software optimization. The designed chip has been verified using HW emulation and fabricated in 0.25 μm CMOS technology. The fabricated chip has a size of $6.2 \times 6.2\,\mathrm{mm}^2$ and operates normally on the test board in both qualitative and quantitative terms.

Three-Dimensional Convolutional Vision Transformer for Sign Language Translation (수어 번역을 위한 3차원 컨볼루션 비전 트랜스포머)

  • Horyeor Seong;Hyeonjoong Cho
    • The Transactions of the Korea Information Processing Society
    • /
    • v.13 no.3
    • /
    • pp.140-147
    • /
    • 2024
  • In the Republic of Korea, people with hearing impairments are the second-largest demographic within the registered disability community, following those with physical disabilities. Despite this demographic significance, research on sign language translation technology is limited for several reasons, including the limited market size and the lack of adequately annotated datasets. Despite these difficulties, a few researchers continue to improve the performance of sign language translation technologies by employing recent advances in deep learning such as the transformer architecture, since transformer-based models have demonstrated noteworthy performance in tasks such as action recognition and video classification. This study focuses on enhancing the recognition performance of sign language translation by combining transformers with 3D-CNN. Through experimental evaluations using the PHOENIX-Weather-2014T dataset [1], we show that the proposed model exhibits comparable performance to existing models in terms of Floating Point Operations Per Second (FLOPs).

A Study on Characteristics of Jinsatak(陳士鐸)'s Clinic Theory (진사탁(陳士鐸) 임상 이론의 특징에 관한 연구)

  • Jeong, Kyung-Ho;Kim, Ki-Wook;Park, Hyun-Guk
    • Journal of Korean Medical classics
    • /
    • v.22 no.3
    • /
    • pp.31-51
    • /
    • 2009
  • The characteristics of Jin's ideas on clinical theory can be arranged as follows. 1. Jin emphasized warming and tonifying[溫補] in treatment, and this is shown best in his care[調理] of the Vital gate[命門], kidney, liver, and spleen. His ideas were based on his understanding of the origin of human life and were influenced by the Vital gate and source Gi theory(元氣說) of Seolgi(薛己), Joheon-ga(趙獻可) and Janggaebin(張介賓). Scholastically he has that in common with them, but he was criticized by later doctors such as Oksamjon(玉三尊) as a 'literary doctor(文字醫)' who followed the ideas of "Uigwan(醫貫)". 2. The warming and tonifying school[溫補學派], who were influenced by Taoism, said in their theory of disease outbreak[發病學說] that since one must not hurt one's Yin essence and Yang fire[陰精陽火], there is more deficiency than excess, which was why they used tonifying methods. Jin shared this view, and it is applied universally in internal medicine, gynecology, pediatric medicine, surgery and so on. 3. Jin, who saw the negative side of pulse diagnosis[診脈], emphasized following symptoms over pulse diagnosis in the spirit of 'finding truth based on truth[實事求是]' in "Maekgyeolcheonmi(脈訣闡微)", but also emphasized 'the combination of pulse and symptoms[脈證合參]'. He understood pulse diagnosis as a defining tool for symptoms, and in "Seoksilbirok(石室秘錄)" simplified pulse diagnosis into 10 methods: floating/sunken(浮沉), slow/fast(遲數), large/fine(大小), vacuous/replete(虛實) and slippery/rough(滑澀). 4. Jin frequently used 'large formulas(大方)', which usually featured a large dose, and in "Bonchosinpyeon(本草新編)" he treated the seven formulas(七方) and ten preparations(十劑) as the standard when using medicine. He did away with old customs and presented a 'new(新)' and 'extra(奇)' point of view. In particular, he used a lot of Insam(人蔘) when tonifying Gi and Geumeunhwa(金銀花) when treating sores and ulcers. 5. In the area of surgery, Jin gave priority to the early detection and treatment of disease with internal treatment[內治] and was against the overuse of acupuncture. However, his records of surgical measures in special situations such as lung abscesses(肺癰) and liver abscesses(肝癰), of anesthetic measures using 'Manghyeongju(忘形酒)' and 'Singoiyak(神膏異藥)', of opening the abdomen or skull, and of organ transplants using a dog's tongue are important data. 6. Jin discussed the diseases of Gi and blood broadly. In the principles of treating blood in particular, he held that blood diseases had to be forwarded[順] and that Gi regulation[理氣] was the first priority, and he stated the following two treatments. First, in "Jeonggiinhyeolpyeon(精氣引血篇)" in volume 6 of "Oegyeongmieon(外經微言)", as the rule for treating blood he stated the pattern identification of finding Gi in blood and blood in Gi. Second, he emphasized Gi regulation(理氣) in blood diseases and stated that the Gi must be tonified after finding the source of the loss of blood.

Real-Time Implementation of Acoustic Echo Canceller for Mobile Handset Using TeakLite DSP Core (Teaklite DSP Core 를 이용한 이동통신 단말기용 음향반향제거기의 실시간 구현)

  • Gwon, Hong-Seok;Kim, Si-Ho;Jang, Byeong-Uk;Bae, Geon-Seong
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.39 no.2
    • /
    • pp.128-136
    • /
    • 2002
  • In this paper, we developed a real-time acoustic echo canceller using the TeakLite DSP Core, which will be placed in the vocoder chip of a mobile handset. Considering the limited computational capacity given to the acoustic echo canceller in a vocoder chip, we employed an FIR-type adaptive filter using a conventional NLMS algorithm. To begin with, we designed and implemented an acoustic echo canceller in floating-point C source code, and then converted it into fixed-point format through integer simulation. We then programmed and optimized it at the assembly level to make it run in real time. After the optimization procedure, the implemented echo canceller occupies approximately 624 words of program memory and 811 words of data memory. With an 8 kHz sampling rate and 256 filter taps in the echo canceller, which corresponds to 32 msec of echo delay, it requires 14.12 MIPS of computational capacity. For coverage of 16 msec of echo delay, i.e., 128 filter taps, 9 MIPS is required.
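
The FIR-type NLMS adaptive filter named in the abstract above can be sketched in floating point as follows. This is a textbook NLMS update, not the authors' TeakLite implementation: the step size `mu`, the regularizer `eps`, and the function names are assumptions made for illustration, and no fixed-point conversion is attempted. The helper `echo_delay_ms` reproduces the tap-count arithmetic quoted in the abstract.

```python
def nlms_echo_canceller(far_end, mic, num_taps=128, mu=0.5, eps=1e-6):
    """Return the residual (echo-cancelled) signal from an FIR NLMS filter.

    far_end: samples sent to the loudspeaker (reference input).
    mic:     microphone samples containing the echo of far_end.
    mu, eps: step size and regularizer; illustrative values, not from the paper.
    """
    w = [0.0] * num_taps   # adaptive FIR coefficients (echo path estimate)
    x = [0.0] * num_taps   # delay line of far-end samples, newest first
    residual = []
    for far, d in zip(far_end, mic):
        x.insert(0, far)   # push the newest sample into the delay line
        x.pop()            # drop the oldest sample
        y = sum(wi * xi for wi, xi in zip(w, x))   # estimated echo
        e = d - y                                  # residual after cancellation
        norm = sum(xi * xi for xi in x) + eps      # input power for normalization
        step = mu * e / norm
        for i in range(num_taps):
            w[i] += step * x[i]                    # NLMS coefficient update
        residual.append(e)
    return residual


def echo_delay_ms(num_taps, sample_rate_hz):
    """Echo delay covered by an FIR filter of the given length, in milliseconds."""
    return 1000.0 * num_taps / sample_rate_hz


if __name__ == "__main__":
    print(echo_delay_ms(256, 8000))  # 32.0, matching the 32 msec in the abstract
    print(echo_delay_ms(128, 8000))  # 16.0, matching the 16 msec in the abstract
```

Each sample costs one filtering pass and one coefficient update over all taps, so the work per sample grows with the number of taps. That is in line with the lower MIPS figure the abstract reports for 128 taps than for 256.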