Search | Korea Science

Fast and Efficient Implementation of Neural Networks using CUDA and OpenMP (CUDA와 OPenMP를 이용한 빠르고 효율적인 신경망 구현)

Park, An-Jin;Jang, Hong-Hoon;Jung, Kee-Chul
- Journal of KIISE:Software and Applications
- v.36 no.4
- pp.253-260
- 2009
Many computer vision and pattern recognition algorithms have recently been implemented on the GPU (graphics processing unit) to reduce computation time. However, such implementations face two problems. First, the programmer must master graphics shading languages, which require prior knowledge of computer graphics. Second, in jobs that need close cooperation between the CPU and GPU, which is common in image processing and pattern recognition but not in graphics, the CPU should generate as much raw feature data as possible for GPU processing in order to utilize the GPU effectively. This paper proposes a faster and more efficient implementation of neural networks on both the GPU and a multi-core CPU. To solve the first problem, we use CUDA (Compute Unified Device Architecture), which is easy to program because of its C-like style, instead of a graphics shading language. Moreover, OpenMP (Open Multi-Processing) is used to process multiple data concurrently with a single instruction on the multi-core CPU, which makes effective use of GPU memory. In the experiments, we implemented a neural network-based text extraction system using the proposed architecture, and it ran about 15 times faster than an implementation on the GPU alone without OpenMP.
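
The CPU/GPU division of labor described in this abstract can be sketched roughly as follows. This is only an illustration in Python, not the authors' code: a thread pool stands in for the OpenMP parallel region, a plain batched layer stands in for the CUDA kernel, and `extract_features`, `forward_batch`, and `classify_windows` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
import math


def extract_features(window):
    """Hypothetical CPU-side feature extractor: mean and range of a pixel window."""
    return [sum(window) / len(window), max(window) - min(window)]


def forward_batch(batch, weights, biases):
    """One fully connected sigmoid layer applied to a whole batch of feature vectors.

    In the paper's setting this batched step is what would run on the GPU.
    """
    outputs = []
    for x in batch:
        row = []
        for w_row, b in zip(weights, biases):
            z = sum(w * xi for w, xi in zip(w_row, x)) + b
            row.append(1.0 / (1.0 + math.exp(-z)))
        outputs.append(row)
    return outputs


def classify_windows(windows, weights, biases, workers=4):
    """Extract features for many windows concurrently, then process them as one batch."""
    # Assumption: threads stand in for OpenMP workers; this shows the structure only.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        batch = list(pool.map(extract_features, windows))
    return forward_batch(batch, weights, biases)
```

The point of the structure is that the CPU workers fill a whole batch before a single batched call is made, so the accelerator receives many inputs at once rather than one window at a time.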

Antimutagenic Effects of Fermented Soybean Extracts

이효진;문선영;전윤영;최승필;이득식;함승시
- Proceedings of the Korean Society of Postharvest Science and Technology of Agricultural Products Conference
- 2003.04a
- pp.146.2-147
- 2003
Fermented soybean foods have long been not only a source of protein but also an indispensable part of the diet. In the past, when research on fermented foods was limited, fermented soybean foods were regarded as important only as a food group and drew little attention. Recently, however, they have begun to attract attention as many researchers have progressively identified the functional components produced during soybean fermentation and their physiological activities. In this study, we therefore examined antimutagenic effects with the Ames test to investigate the physiological activity produced by soybean fermentation. Domestic soybeans were co-fermented with Bacillus sp. and Aspergillus sp. isolated from meju, then freeze-dried and ground for use in the experiments. The fermented soybean powder was subjected to proximate analysis, extracted three times with 70% ethanol, concentrated under reduced pressure, fractionated into hexane, chloroform, ethyl acetate, butanol, and aqueous fractions, and freeze-dried; a reverse mutation assay was then carried out with S. typhimurium strains TA98 and TA100. The 70% ethanol extract and the individual fractions showed no mutagenicity of their own. In the antimutagenicity experiments, the carcinogens used were the direct mutagens 4NQO and MNNG and the indirect mutagen Trp-P-1. In particular, against MNNG (0.4 $\mu\textrm{g}$/plate) in strain TA100, the ethyl acetate fraction showed an inhibition of 86.6%, higher than the other fractions, and most fractions showed an inhibition of 70% or more. The inhibitory effect also increased in a concentration-dependent manner in each fraction and differed among fractions.

2-D DCT/IDCT Processor Design Reducing Adders in DA Architecture (DA구조 이용 가산기 수를 감소한 2-D DCT/IDCT 프로세서 설계)

Jeong Dong-Yun;Seo Hae-Jun;Bae Hyeon-Deok;Cho Tae-Won
- Journal of the Institute of Electronics Engineers of Korea SD
- v.43 no.3 s.345
- pp.48-58
- 2006
This paper presents an 8x8 two-dimensional DCT/IDCT processor based on an adder-based distributed arithmetic (DA) architecture that does not use the ROM units found in conventional memory-based designs. To reduce the hardware cost of the DCT and IDCT coefficient matrices, the odd part of the coefficient matrix is shared. The proposed architecture uses only 29 adders for the coefficient operations in the 2-D DCT/IDCT processor, while the 1-D DCT processor uses 18 adders. This architecture reduces the number of adders by 48.6% compared with the 8x8 1-D DCT NEDA architecture. The paper also proposes a new transpose network that differs from the conventional transpose memory block. The proposed transpose network block uses 64 registers and reduces the number of transistors by 18% compared with the conventional memory architecture. In addition, to improve throughput, the eight inputs accept eight pixels every clock cycle, and eight pixels are produced at the outputs accordingly.
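
The row-column structure that a 2-D DCT processor with a transpose stage relies on can be illustrated with a short sketch. This is a floating-point reference model only, under the assumption of an orthonormal DCT-II; it does not model the adder-based distributed arithmetic or the register-based transpose network of the paper, and the function names are hypothetical.

```python
import math

N = 8  # block size used in the abstract


def dct_1d(x):
    """Orthonormal 8-point DCT-II of one row (reference model, not hardware)."""
    out = []
    for k in range(N):
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        acc = sum(
            x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n in range(N)
        )
        out.append(scale * acc)
    return out


def transpose(block):
    """Swap rows and columns; in hardware this is the transpose stage."""
    return [list(col) for col in zip(*block)]


def dct_2d(block):
    """2-D DCT as two 1-D passes with a transpose in between."""
    row_pass = [dct_1d(row) for row in block]
    col_pass = [dct_1d(col) for col in transpose(row_pass)]
    return transpose(col_pass)
```

For a flat 8x8 block, all of the energy ends up in the DC coefficient at position (0, 0), which makes a convenient sanity check for any implementation of the same transform.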

Millimeter-wave Broadband Amplifier integrating Shunt Peaking Technology with Cascode Configuration (Cascode 구조에 Shunt Peaking 기술을 접목시킨 밀리미터파 광대역 Amplifier)

Kwon, Hyuk-Ja;An, Dan;Lee, Mun-Kyo;Lee, Sang-Jin;Moon, Sung-Woon;Baek, Tae-Jong;Park, Hyun-Chang;Rhee, Jin-Koo
- Journal of the Institute of Electronics Engineers of Korea TC
- v.43 no.10 s.352
- pp.90-97
- 2006
We report a millimeter-wave broadband amplifier that integrates shunt peaking with a cascode configuration. The millimeter-wave broadband cascode amplifier was designed and fabricated in MIMIC technology using a $0.1\;{\mu}m$ ${\Gamma}$-gate GaAs PHEMT, CPW, and a passive library. The fabricated PHEMT showed a transconductance of 346.3 mS/mm, a current gain cut-off frequency ($f_T$) of 113 GHz, and a maximum oscillation frequency ($f_{max}$) of 180 GHz. To prevent oscillation of the cascode amplifier, a parallel resistor and capacitor were connected to the drain of the common-gate device. To extend the bandwidth and flatten the gain, we inserted a short stub into the bias circuits and a compensation transmission line between the common-source and common-gate devices, and then optimized their lengths. The input and output stages were designed using a matching method to obtain a broadband characteristic. The measurements confirmed that integrating shunt peaking with the cascode configuration extends the bandwidth and flattens the gain. The cascode amplifier shows a broadband characteristic from 19 GHz to 53.5 GHz, with an average gain of about 6.5 dB over the bandwidth.

Design and Implementation of the Central Queue Based Loop Scheduling Method (중앙 큐 기반의 루프 스케쥴링 기법의 설계 및 구현)

Kim, Hyun-Chul;Kim, Hyo-Cheol;Yoo, Kee-Young
- Journal of the Institute of Electronics Engineers of Korea CI
- v.38 no.5
- pp.16-26
- 2001
In this paper, we present a new central-queue-based scheduling method called CDSS (Carried-Dependence Self-Scheduling) for the efficient execution of loops with dependences between iterations, and we implement it on a shared-memory system in Java. We also study modifications that convert existing central-task-queue self-scheduling methods for parallel loops into a form applicable to loops with loop-carried dependences. The proposed method is a self-scheduling method that assigns loop iterations at three levels, taking into account the synchronization points determined by the dependence distance of the loop. To make the proposed scheme and the modified methods portable to various platforms, including uniprocessor systems, we use threads for the implementation. Compared with other assignment algorithms under various application and system parameters, CDSS is more efficient in overall execution time, including scheduling overhead. CDSS outperforms the modified SS, Factoring, GSS, and CSS by about 0.02%, 40.5%, 46.1%, and 53.6%, respectively. CDSS achieves its best performance across the application programs with a small number of threads, equal to the dependence distance.
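
For readers unfamiliar with central-queue self-scheduling, the sketch below shows the baseline idea that GSS and CSS share: idle workers take the next chunk of iterations from one shared counter. It is a minimal illustration for independent iterations only, written in Python rather than the paper's Java; it does not implement CDSS or any handling of loop-carried dependences, and all names are hypothetical.

```python
import math
import threading


def gss_chunk(remaining, workers):
    """Guided self-scheduling: take ceil(remaining / workers) iterations."""
    return math.ceil(remaining / workers)


def css_chunk_factory(k):
    """Chunk self-scheduling: always take a fixed chunk of k iterations."""
    def chunk(remaining, workers):
        return min(k, remaining)
    return chunk


def run_self_scheduled(n_iters, workers, chunk_fn, body):
    """Run body(i) for i in range(n_iters), handing out chunks from a central counter."""
    next_index = 0
    lock = threading.Lock()
    chunks = []  # (start, size) pairs in the order they were handed out

    def worker():
        nonlocal next_index
        while True:
            with lock:  # the central queue is the only shared scheduling state
                remaining = n_iters - next_index
                if remaining <= 0:
                    return
                size = chunk_fn(remaining, workers)
                start = next_index
                next_index += size
                chunks.append((start, size))
            for i in range(start, start + size):
                body(i)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return chunks
```

With 100 iterations and 4 workers, the GSS rule starts with a chunk of 25 and shrinks toward single iterations, which trades scheduling overhead early against load balance late.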

A Design of Adaptive Channel Estimate Algorithm for ICS Repeater (ICS 중계기를 위한 적응형 채널추정 알고리듬 설계)

Lee, Suk-Hui;Song, Ho-Sup;Bang, Sung-Il
- Journal of the Institute of Electronics Engineers of Korea TC
- v.46 no.3
- pp.19-25
- 2009
This paper designs an effective interference cancellation algorithm for ICS repeater systems, which improve frequency efficiency. The convergence speed and accuracy of the LMS algorithm are influenced by the reference signal. To improve on the LMS algorithm, we propose an adaptive channel estimation algorithm. Using the channel characteristics, the adaptive channel estimation algorithm makes the reference signal similar to the interference signal through a convolution operation, compensating for the weakness of the LMS algorithm. To approximate a practical channel, we apply Jakes' Rayleigh multipath model with five random paths and a Doppler frequency of 130 Hz. The LMS algorithm and the proposed adaptive channel estimation algorithm, each with 16 taps, are applied to an ICS repeater system under the Rayleigh multipath channel and simulated in MATLAB. In the simulation, the squared error of the ICS repeater system with the LMS algorithm converges to -40 dB after 150 data iterations, while that of the system with the adaptive channel estimation algorithm converges to -80 dB after 200 data iterations. According to the simulation results, the proposed adaptive channel estimation algorithm shows about three times better iteration performance than the LMS algorithm and 40 dB better accuracy.
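
The baseline LMS algorithm that this abstract compares against can be sketched as a simple system-identification loop. This is a generic textbook LMS update in Python (the paper's simulations use MATLAB), assuming a noise-free static channel rather than the Rayleigh multipath model; it is not the proposed adaptive channel estimation algorithm, and `lms_identify` and `fir` are hypothetical names.

```python
import random


def fir(signal, channel):
    """Convolve a signal with channel taps, assuming zeros before the first sample."""
    out = []
    for n in range(len(signal)):
        out.append(
            sum(c * signal[n - k] for k, c in enumerate(channel) if n - k >= 0)
        )
    return out


def lms_identify(reference, desired, taps, mu):
    """Adapt `taps` FIR weights so the filtered reference tracks `desired`.

    Returns the final weights and the error at every step.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference sample first
    errors = []
    for x, d in zip(reference, desired):
        history = [x] + history[:-1]
        y = sum(w * h for w, h in zip(weights, history))
        e = d - y
        # LMS update: step along the instantaneous gradient estimate.
        weights = [w + mu * e * h for w, h in zip(weights, history)]
        errors.append(e)
    return weights, errors
```

A usage example under these assumptions: feeding a random ±1 reference through a two-tap channel and running `lms_identify` with four taps recovers the channel taps in the first two weights, with the remaining weights near zero.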

A Digital Input Class-D Audio Amplifier (디지털 입력 시그마-델타 변조 기반의 D급 오디오 증폭기)

Jo, Jun-Gi;Noh, Jin-Ho;Jeong, Tae-Seong;Yoo, Chang-Sik
- Journal of the Institute of Electronics Engineers of Korea SD
- v.47 no.11
- pp.6-12
- 2010
A sigma-delta modulator based class-D audio amplifier is presented. The parallel digital input is serialized to a two-bit output by a fourth-order digital sigma-delta noise shaper. The output of the digital sigma-delta noise shaper is applied to a fourth-order analog sigma-delta modulator whose three-level output drives the power switches. The pulse density modulated (PDM) output of the power switches is low-pass filtered by an LC filter and is also fed back to the input of the analog sigma-delta modulator. The first integrator of the analog sigma-delta modulator is a hybrid of a continuous-time (CT) and a switched-capacitor (SC) integrator. While the sampled input is applied to the SC path, the continuous-time feedback signal is applied to the CT path to suppress the noise of the PDM output. The class-D audio amplifier is fabricated in a standard 0.13-$\mu$m CMOS process and operates over a signal bandwidth of 100 Hz to 20 kHz. With a 4-$\Omega$ load, the maximum output power is 18.3 mW. The total harmonic distortion plus noise and the dynamic range are 0.035% and 80 dB, respectively. The modulator consumes 457 $\mu$W from a 1.2-V power supply.
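
The pulse density modulation idea behind this design can be illustrated with a much simpler modulator than the one in the paper. The sketch below is a first-order, two-level discrete-time sigma-delta loop, offered only as an assumption-laden illustration; the amplifier described above uses fourth-order loops and a three-level output, and `sigma_delta_first_order` is a hypothetical name.

```python
def sigma_delta_first_order(samples):
    """Convert samples in [-1, 1] to a +1/-1 pulse-density stream.

    The integrator accumulates the difference between the input and the
    previous output, so the local average of the output tracks the input.
    """
    integrator = 0.0
    feedback = 0.0
    stream = []
    for x in samples:
        integrator += x - feedback
        bit = 1.0 if integrator >= 0 else -1.0
        feedback = bit
        stream.append(bit)
    return stream
```

Averaging the output stream plays the role of the LC low-pass filter in the abstract: for a constant input of 0.5, the density of +1 pulses settles so that the mean of the stream is close to 0.5.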

Efficient polynomial exponentiation in $GF(2^m)$ with a trinomial using weakly dual basis ($GF(2^m)$에서 삼항 기약 다항식을 이용한 약한 쌍대 기저 기반의 효율적인 지수승기)

Kim, Hee-Seok;Chang, Nam-Su;Lim, Jong-In;Kim, Chang-Han
- Journal of the Institute of Electronics Engineers of Korea SD
- v.44 no.8
- pp.30-37
- 2007
An exponentiation in $GF(2^m)$ is a basic operation in many algorithms used in cryptography, digital signal processing, error-correcting codes, and so on. Existing hardware implementations of exponentiation are organized using the Right-to-Left method because it is well suited to parallel circuits. This paper proposes a polynomial exponentiation structure with a trinomial that is organized using the Left-to-Right method and uses a weakly dual basis. The basic idea is to decrease the time delay using precomputation tables, because one of the two inputs in the Left-to-Right method is fixed. Since $T_{sqr}$ (squarer time delay) + $T_{mul}$ (multiplier time delay) of our method is smaller than $T_{mul}$ of existing methods, our method reduces the time delays of the existing Left-to-Right and Right-to-Left methods by 17% and 10%, respectively, for $x^m+x+1$ (irreducible polynomial), by 21% and 9% for $x^m+x^k+1$ $(1<k<m/2)$, and by 15% and 1% for $x^m+x^{m/2}+1$.
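
The Left-to-Right method mentioned in this abstract can be sketched in software as square-and-multiply over the exponent bits, most significant bit first. The sketch below is an assumption-based illustration: it uses a polynomial basis with field elements stored as integer bit masks, not the weakly dual basis or precomputation tables of the paper, and the function names are hypothetical.

```python
def gf_mul(a, b, m, k):
    """Multiply a and b in GF(2^m) modulo the trinomial x^m + x^k + 1.

    Elements are integers whose bit i is the coefficient of x^i.
    """
    modulus = (1 << m) | (1 << k) | 1
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> m) & 1:  # degree reached m: reduce by the trinomial
            a ^= modulus
    return result


def gf_pow_left_to_right(base, exponent, m, k):
    """Left-to-Right exponentiation: square every step, multiply by base on 1 bits."""
    result = 1
    for bit in bin(exponent)[2:]:
        result = gf_mul(result, result, m, k)
        if bit == "1":
            # One multiplier input is always `base`, the fixed operand that
            # the abstract says precomputation can exploit.
            result = gf_mul(result, base, m, k)
    return result
```

As a small check in $GF(2^3)$ with $x^3+x+1$, raising $x$ (bit mask 2) to the third power gives $x+1$ (bit mask 3), and raising it to the seventh power gives 1, as expected for a field with seven nonzero elements.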

De-duplication of Parity Disk in SSD-Based RAID System (SSD 기반의 RAID 시스템에서 패리티 디스크의 중복 제거)

Yang, Yu-Seok;Lee, Seung-Kyu;Kim, Deok-Hwan
- Journal of the Institute of Electronics and Information Engineers
- v.50 no.1
- pp.105-113
- 2013
RAID systems, which connect several disks in a parallel structure, have been widely used to resolve the delay and bottleneck of data I/O. Recently, SSD-based RAID systems have been emerging because SSDs offer better I/O performance than HDDs. However, the endurance and power consumption problems caused by frequent write operations in SSD-based RAID systems need to be resolved. In this paper, we propose a de-duplication method for the parity disk in SSD-based RAID systems, where updates are expensive. The proposed method segments each chunk of parity data into small pieces and removes duplicate data; it can therefore reduce wear-leveling and power consumption by decreasing write operations for duplicated parity data. Experimental results show that, in a RAID-6 system using the EVENODD erasure code, the bit update rate of the proposed method is 16% lower across all disks and 31% lower on the parity disks than that of the existing method, and its power consumption is 30% lower. In a RAID-5 system, the bit update rate of the proposed method is 12% lower across all disks and 32% lower on the parity disk than that of the existing method, and its power consumption is 36% lower.
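
The general idea of segmenting parity into small pieces and skipping duplicate writes can be sketched as follows. The abstract does not give the piece size, the duplicate-detection scheme, or the on-disk layout, so everything here is an assumption: XOR parity as in RAID-5, SHA-256 fingerprints for duplicate detection, a dictionary as the piece store, and hypothetical function names.

```python
import hashlib


def xor_parity(blocks):
    """RAID-5 style parity: bytewise XOR of equally sized data blocks."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)


def dedup_write(parity, piece_size, store):
    """Split parity into pieces and store only pieces not seen before.

    Returns the list of piece fingerprints (the logical parity chunk) and
    the number of pieces that actually had to be written.
    """
    refs = []
    written = 0
    for offset in range(0, len(parity), piece_size):
        piece = parity[offset:offset + piece_size]
        key = hashlib.sha256(piece).hexdigest()
        if key not in store:  # assumed duplicate check via fingerprint lookup
            store[key] = piece
            written += 1
        refs.append(key)
    return refs, written
```

In this sketch, a parity chunk made of two identical 4-byte pieces costs one physical write instead of two, which is the kind of saving in write operations that the abstract attributes to removing duplicated parity data.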
https://doi.org/10.5573/ieek.2013.50.1.105

Design of High-Power and High-Efficiency Broadband Amplifier Using 1:4 Transmission Line Transformer (1:4 전송 선로 트랜스포머를 이용한 고출력 고효율 광대역 전력 증폭기의 설계)

Kim, Kyung-Won;Seo, Min-Cheol;Cho, Jae-Yong;Yoo, Sung-Cheol;Kim, Min-Su;Kim, Hyung-Cheol;Oh, Jun-Hee;Sim, Jae-Woo;Yang, Youn-Goo
- The Journal of Korean Institute of Electromagnetic Engineering and Science
- v.21 no.2
- pp.121-128
- 2010
This paper presents the design of a 100 W high-efficiency power amplifier whose operating frequency band extends from 30 to 512 MHz, using a negative feedback network, a push-pull structure, a broadband RF choke, and a transmission line transformer in a balun configuration. The push-pull amplifier was tuned for higher output power using a shunt capacitor as a matching component at its load, especially in the high-frequency region. The implemented power amplifier exhibited a very flat power gain of $18.34{\pm}0.9\;dB$ throughout the operating frequency band and a high power-added efficiency (PAE) of greater than 40% at an output power of 100 W. It also showed second- and third-harmonic distortion levels below -34 dBc and -12 dBc, respectively, across the entire operating frequency band.
https://doi.org/10.5515/KJKIEES.2010.21.2.121

Search Result 1,180, Processing Time 0.023 seconds
