Search | Korea Science

Analysis of GPU Performance and Memory Efficiency according to Task Processing Units (작업 처리 단위 변화에 따른 GPU 성능과 메모리 접근 시간의 관계 분석)

Son, Dong Oh;Sim, Gyu Yeon;Kim, Cheol Hong
- Smart Media Journal / v.4 no.4 / pp.56-63 / 2015
Modern GPUs can execute massively parallel computation by exploiting many GPU cores. The GPGPU architecture, one approach to exploiting the outstanding computational resources of GPUs, effectively executes general-purpose applications as well as graphics applications. In this paper, we investigate how the number of CTAs (Cooperative Thread Arrays) on an SM (Streaming Multiprocessor) affects memory efficiency and performance, since analyzing this relationship can guide researchers who study GPUs to improve performance. Our simulation results show that, for most benchmarks, increasing the number of CTAs on an SM improves performance. Some benchmarks, however, show no performance improvement. This is because the number of CTAs generated from the same kernel is small or the number of CTAs executed simultaneously is insufficient. To classify the performance analysis more precisely, we also analyze the relationship between performance and memory stalls, DRAM stalls due to interconnect congestion, and pipeline stalls at the memory stage. We expect that our analysis will help studies aimed at improving parallelism and memory efficiency on GPGPU architectures.

Torus Ring : Improving Performance of Interconnection Networks by Modifying Hierarchical Ring (Torus Ring : 계층 링 구조의 변형을 통한 상호 연결망의 성능 개선)

Kwak, Jong-Wook;Ban, Hyong-Jin;Jhon, Chu-Shik
- Journal of KIISE:Computer Systems and Theory / v.32 no.5 / pp.196-208 / 2005
In multiprocessor systems, interconnection network design is critical to overall system performance. Commonly considered interconnection networks are meshes, rings, and hierarchical rings. In this paper, we propose 'Torus Ring', a modified version of the hierarchical ring. Torus Ring has the same complexity as the hierarchical ring and differs only in the way it connects the local rings. It has an advantage over the hierarchical ring when the destination of a packet is the neighboring local ring in the reverse direction. Although the average number of hops in Torus Ring equals that of the hierarchical ring under a uniform distribution of transactions, the hop-count benefit is expected to be larger in real parallel programming environments because of spatial locality. In simulation, latencies in the interconnection network are reduced by up to 19%, and execution times are reduced by up to 10%.

QoS and SLA Aware Web Service Composition in Cloud Environment

Wang, Dandan;Ding, Hao;Yang, Yang;Mi, Zhenqiang;Liu, Li;Xiong, Zenggang
- KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.12 / pp.5231-5248 / 2016
As a service-oriented paradigm, web service composition has attracted great attention from both academia and industry, especially in the area of cloud services. More and more web services that provide the same function but differ in QoS are now available in the cloud, so an important task of a service composition strategy is to select the optimal composition solution according to QoS. Furthermore, the selected composition solution should satisfy the service level agreement (SLA), which defines the user's requirements for the performance of the composite service, such as price and response time. A composite service is feasible only if its QoS satisfies the user's requirements. To obtain a composite service with optimal QoS while avoiding SLA violations, in this paper we first propose a QoS evaluation method that takes SLA satisfaction into account. We then design a service selection algorithm based on this QoS evaluation method. Finally, we put forward a parallel running strategy for the proposed selection algorithm. The simulation results show that our approach outperforms existing approaches in terms of the optimality and feasibility of its solutions. With our running strategy, the computation time can be reduced to a large extent.
https://doi.org/10.3837/tiis.2016.12.006
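
The feasibility condition in the abstract above (a composite service is feasible only if its QoS satisfies the user's SLA) can be illustrated with a minimal sketch. This is not the paper's evaluation method or selection algorithm. It assumes a purely sequential composition, in which price and response time simply add up, and the `Service` class, the function names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    """Hypothetical candidate service with two QoS attributes."""
    name: str
    price: float          # cost per invocation, arbitrary units
    response_time: float  # seconds


def aggregate_sequential(services):
    """Aggregate QoS assuming the services run one after another.

    Assumption: for a sequential workflow, both price and response
    time are the sum over the component services.
    """
    total_price = sum(s.price for s in services)
    total_time = sum(s.response_time for s in services)
    return total_price, total_time


def is_feasible(services, max_price, max_response_time):
    """Return True if the composite QoS stays within the SLA bounds."""
    price, time = aggregate_sequential(services)
    return price <= max_price and time <= max_response_time


# Illustrative composition plan (values are made up).
plan = [Service("auth", 1.0, 0.25), Service("search", 2.0, 0.5)]
print(aggregate_sequential(plan))
print(is_feasible(plan, max_price=4.0, max_response_time=1.0))
```

Workflows with parallel or conditional branches would need different aggregation rules (for example, the maximum response time over parallel branches), which this sketch does not cover.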

Design and Implementation of Acoustic Echo Canceller (Acoustic Echo Canceller 설계 및 구현)

장수안;문대철
- The Journal of Korean Institute of Communications and Information Sciences / v.29 no.2C / pp.291-297 / 2004
In this paper, a new structure for the AEC (Acoustic Echo Canceller) is proposed that effectively eliminates echo signal components that can arise in mobile communications. Block Data Flow Architecture is a parallel architecture that achieves high performance, high efficiency, high throughput, and almost linear speedup. The proposed architecture employs the AEC and is implemented on the TMS320C6711 for real-time applications. The proposed AEC shows improved performance by eliminating echoes on a 55 ms delay path. Since the proposed AEC can also be implemented in firmware, it is expected to work effectively on various types of echoes if applied to CDMA mobile devices. The TMS320C6711 shows much better performance than previous DSPs. For experimental verification, filtering with an adaptive algorithm is performed on a TMS320C6711 board, the error signals resulting from the computations are monitored on a PC, and the performance of the implemented AEC is then verified through ERLE computation. According to the simulation results, a good characteristic of 100 dB is shown after 500 data samples.
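
The ERLE (Echo Return Loss Enhancement) check mentioned in the abstract above can be sketched as follows. This uses the commonly cited definition, ten times the base-10 logarithm of the ratio of echo power to residual-error power. It is not the authors' DSP code, and the function name and the test signals are assumptions made for illustration.

```python
import math


def erle_db(echo, residual):
    """ERLE in dB: 10*log10(mean echo power / mean residual power).

    `echo` is the echo signal before cancellation and `residual` is
    the error signal left after the canceller, sample by sample.
    """
    if not echo or len(echo) != len(residual):
        raise ValueError("signals must be non-empty and equally long")
    echo_power = sum(x * x for x in echo) / len(echo)
    residual_power = sum(e * e for e in residual) / len(residual)
    if echo_power == 0:
        raise ValueError("echo signal has zero power")
    if residual_power == 0:
        return math.inf  # perfect cancellation
    return 10.0 * math.log10(echo_power / residual_power)


# Made-up signals: the residual is the echo attenuated by a factor of 100
# in amplitude, which corresponds to roughly 40 dB of ERLE.
echo = [1.0, -1.0, 1.0, -1.0]
residual = [0.01 * x for x in echo]
print(round(erle_db(echo, residual), 6))
```

In practice ERLE is usually tracked over a sliding window so that convergence of the adaptive filter can be observed over time.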

XCP-OFDM System using Cross-handed Circular Polarization (역선회 원편파를 이용한 XCP-OFDM 시스템)

김병옥;하덕호
- The Journal of Korean Institute of Electromagnetic Engineering and Science / v.13 no.3 / pp.316-322 / 2002
Orthogonal Frequency Division Multiplexing (OFDM) is a special case of multicarrier transmission in which a single data stream is divided among many subcarriers and transmitted in parallel. It reduces the necessary bandwidth by using the orthogonality between the subcarriers and therefore requires a transmission channel with stable characteristics. When the delay spread of the channel exceeds the guard interval, the orthogonality of the subcarriers cannot be maintained and, as a result, system performance degrades. In this paper, the XCP-OFDM (OFDM using cross-handed Circular Polarization) system is newly proposed. This system divides the channel by using cross-handed circular polarization in order to eliminate the overlap of the subcarrier spectra. Therefore, the proposed XCP-OFDM system can improve performance without increasing the guard interval. Both theoretical analysis and simulation results are described.
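
The subcarrier orthogonality that the abstract above relies on can be checked numerically. The sketch below is a generic discrete-time illustration, not the XCP-OFDM system itself: it builds two complex-exponential subcarriers over one symbol of `N` samples and computes their inner product. The value `N = 64` and the subcarrier indices are arbitrary choices.

```python
import cmath


def subcarrier(k, n_samples):
    """Samples of the k-th complex-exponential subcarrier over one symbol."""
    return [cmath.exp(2j * cmath.pi * k * n / n_samples)
            for n in range(n_samples)]


def inner_product(a, b):
    """Discrete inner product sum(a[n] * conj(b[n]))."""
    return sum(x * y.conjugate() for x, y in zip(a, b))


N = 64  # samples per OFDM symbol (illustrative)

# A subcarrier correlated with itself gives N ...
same = abs(inner_product(subcarrier(3, N), subcarrier(3, N)))
# ... while two different subcarriers give (numerically) zero.
cross = abs(inner_product(subcarrier(3, N), subcarrier(5, N)))

print(same, cross)
```

The zero cross term holds only when the receiver integrates over exactly one symbol period of each subcarrier, which is why a delay spread longer than the guard interval breaks the orthogonality.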

An Effect of Electrical Interconnect in Optical Transceiver Module (광송수신 모듈 구현을 위한 전기 접속부에 관한 연구)

조인귀;한상필;윤근병;정명영
- The Journal of Korean Institute of Electromagnetic Engineering and Science / v.14 no.8 / pp.863-870 / 2003
Digital transmission systems have entered the RF regime as digital systems now use IC chips with faster edge rates and clock speeds, and optical paths have come into practical use to obtain more capacity. In this paper, we describe, through simulation and experiment, the importance of the electrical interconnect for signal integrity in an optical module. 12-channel $\times$ 2.5 Gb/s parallel optical transmitter modules were manufactured by two different methods: access lines of the microstrip type and of the stripline type. We clearly show that the optical module adopting the microstrip type, with $S_{11} \geq -10$ dB, exhibits distortion, whereas the optical module adopting the stripline type, with $S_{11} \leq -15$ dB, achieves an open eye in the 2.5 Gb/s optical eye pattern response.

A Study on the Corona Discharge Simulation Using FEM-FCT Method (FEM-FCT 기법을 이용한 코로나 방전 시뮬레이션에 대한 연구)

Min, Ung-Gi;Kim, Hyeong-Seok;Lee, Seok-Hyeon;Han, Song-Yeop
- The Transactions of the Korean Institute of Electrical Engineers C / v.48 no.3 / pp.200-208 / 1999
In this paper, corona discharge is analyzed by the Finite Element Method (FEM) combined with the Flux-Corrected Transport (FCT) algorithm. Previous papers used the Finite Difference Method (FDM) combined with FCT. In the FDM, the region of interest is usually discretized with structured grids. However, to refine local regions to the same resolution, structured grids require many more grid points than unstructured grids. Therefore, we propose the FEM-FCT method to simulate corona discharge. The proposed method offers good flexibility in model shape and can reduce the computational cost through local refinement where the physical quantities have steep gradients. Using the proposed method, we study streamer growth between parallel plate electrodes initiated by low and high perturbation densities. We find that varying the initial perturbation density has very little effect on streamer propagation. We also simulate the corona discharge of a rod-to-plane electrode. On the surface of the rod electrode, the high concentration of the electric field gives rise to many streamer seeds. The strong axial streamer propagates to the plane electrode, while the weaker non-axial streamers repel each other and stop growing. The results are very similar to those of papers that used the FDM-FCT method on structured grids. Thus we conclude that the proposed FEM-FCT method is more efficient than the conventional FDM-FCT method by virtue of the reduction in the number of computational grid points.

Magnetism and Magnetocrystalline Anisotropy of Ni/Fe(001) Surface: A First Principles Study (Ni/Fe(001)의 자성과 자기이방성에 대한 제일원리계산)

Kwon, Oryong;Hong, Soon Cheol
- Journal of the Korean Magnetics Society / v.25 no.4 / pp.101-105 / 2015
Recent theoretical calculations predicted that a system composed exclusively of 3d transition metals, without 4d/5d transition metals or rare earth metals, can have strong perpendicular magnetocrystalline anisotropy (MCA) if Fe and Ni layers are arranged appropriately. Those calculations considered only Fe-terminated surfaces, noting that Fe/MgO(001) and CoFeB/MgO(001) show strong perpendicular MCA. In this paper, we investigate the magnetism and MCA of the Ni/Fe(001) surface, where the Ni layer is positioned at the surface, by using two complementary first-principles methods: the Vienna Ab-initio Simulation Package (VASP) and the Full-potential Linearized Augmented Plane Wave (FLAPW) method. Comparing the magnetism and MCA obtained by VASP with those obtained by the FLAPW method, we find that the VASP results do not differ greatly from the FLAPW results. The magnetic moments of Fe and Ni are enhanced due to strong hybridization between the Fe and Ni bands. The MCA of Ni/Fe(001) is parallel to the surface, which implies that the surface termination plays a crucial role in determining the MCA of a system.
https://doi.org/10.4283/JKMS.2015.25.4.101

The Analysis of Fire-Driven Flow and Temperature in The Railway Tunnel with Ventilation (환기를 동반한 철도터널 화재 연기유속 및 온도장 해석)

Jang, Yong-Jun;Lee, Chang-Hyun;Kim, Hag-Beom;Lee, Woo-Dong
- Proceedings of the KSR Conference / 2008.06a / pp.1794-1801 / 2008
Fire-driven flow and temperature distribution in a ventilated tunnel were analyzed by Large Eddy Simulation using the FDS code. The simulated tunnel is 182 m long, 5.4 m wide, and 2.4 m high. A pool fire was located 112 m from the tunnel entrance and was taken as a heat source of $0.89m^2$. The heat is assumed to be released uniformly throughout the whole simulated time. The fire strength was 2.76 MW and the fuel burnt was octane. A parallel computational method was employed to reduce the computing time and to manage the large number of grid points, which cannot be handled on a single CPU. The total number of grid points was $2.4{\times}10^6$, and 7 CPUs were used to solve the momentum and energy equations. The simulated results compared well with the experiments.
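
The figures quoted in the abstract above give a rough sense of the per-CPU workload and the fire intensity. The back-of-the-envelope sketch below makes two assumptions that the abstract does not state: the grid points are split evenly across the 7 CPUs, and $0.89m^2$ is the surface area of the pool fire. The variable names are illustrative.

```python
# Values taken from the abstract.
GRID_POINTS = 2.4e6      # total grid points
CPUS = 7                 # processors used
FIRE_POWER_MW = 2.76     # fire strength in MW
POOL_AREA_M2 = 0.89      # heat source area in m^2 (assumed to be the pool surface)

# Assumption: an even partition of the grid across processors.
points_per_cpu = GRID_POINTS / CPUS

# Heat release rate per unit area of the pool, in MW/m^2.
hrr_per_area = FIRE_POWER_MW / POOL_AREA_M2

print(f"about {points_per_cpu:,.0f} grid points per CPU")
print(f"about {hrr_per_area:.2f} MW/m^2 heat release rate per unit area")
```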

Single Board Realtime 2-D IIR Filtering System (실시간 2차원 디지털 IIR 필터의 구현)

Jeong, Jae-Gil
- The Journal of Engineering Research / v.2 no.1 / pp.39-47 / 1997
This paper presents a single-board digital signal processing system that can perform two-dimensional (2-D) digital infinite impulse response (IIR) filtering in real time. We have developed an architecture that provides not only the necessary computational power but also a balance between the system's input/output and computational requirements. The architecture achieves large system throughput by using highly parallel processing at both the system and processor levels. It reduces system data communication requirements significantly by taking advantage of a custom-designed processor and by providing each processor with its own input and output channel. After system initialization, almost 100 percent of the time is used for data processing, and data transfers occur concurrently with data processing. The functional-level simulation reveals that the system throughput can reach as high as one pixel per system cycle. With a system clock frequency of only 10 MHz, it can implement 2-D IIR filters of up to fourth order for video-rate data ($512\times512$ pixels per frame at 30 frames per second). If the system frequency is increased, the system can be used for the preprocessing and postprocessing of HDTV video signals.
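
The video-rate claim in the abstract above can be checked with simple arithmetic: at the peak throughput of one pixel per system cycle, a 10 MHz clock must cover $512\times512$ pixels at 30 frames per second. The sketch below only verifies that budget. The function and variable names are illustrative, and it assumes the peak rate of one pixel per cycle is sustained.

```python
def required_pixel_rate(width, height, frames_per_second):
    """Pixels per second that the filter must process for a given video format."""
    return width * height * frames_per_second


CLOCK_HZ = 10_000_000   # 10 MHz system clock, from the abstract
PIXELS_PER_CYCLE = 1    # peak throughput reported by the simulation

rate = required_pixel_rate(512, 512, 30)      # 7,864,320 pixels/s
capacity = CLOCK_HZ * PIXELS_PER_CYCLE        # 10,000,000 pixels/s
utilization = rate / capacity                 # fraction of cycles needed

print(rate, capacity, f"{utilization:.1%}")
```

Under these assumptions the video stream needs about 79 percent of the available cycles, which is consistent with the abstract's statement that a 10 MHz system suffices.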

Search Result 1,737, Processing Time 0.046 seconds