Search | Korea Science

Optimizing Skyline Query Processing Algorithms on CUDA Framework (CUDA 프레임워크 상에서 스카이라인 질의처리 알고리즘 최적화)

Min, Jun;Han, Hwan-Soo;Lee, Sang-Won
Journal of KIISE:Databases / v.37 no.5 / pp.275-284 / 2010
GPUs are multi-core stream processors that can process large volumes of data at high speed with high memory bandwidth. Furthermore, GPUs are less expensive than multi-core CPUs. Recently, the use of GPUs for general-purpose computing has become widespread. The CUDA architecture from Nvidia is one effort to help developers use GPUs in their application domains. In this paper, we propose techniques to parallelize a skyline algorithm that uses a simple nested-loop structure. To employ the CUDA programming model, we apply optimization techniques that fit our skyline algorithm to the performance restrictions of the CUDA architecture. According to our experimental results, our optimization techniques improve the original skyline algorithm by 80%.
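The nested-loop skyline computation this abstract refers to can be sketched as follows. This is a minimal sequential Python illustration, not the paper's CUDA code; the function names and the "smaller is better" convention are assumptions made for the example.

```python
def dominates(p, q):
    """Return True if p dominates q: p is no worse in every dimension
    and strictly better in at least one (assumption: smaller is better)."""
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))


def skyline_nested_loop(points):
    """Return the points that no other point dominates."""
    result = []
    for i, p in enumerate(points):          # outer loop: candidate point
        dominated = any(
            dominates(q, p)                  # inner loop: compare with all others
            for j, q in enumerate(points) if j != i
        )
        if not dominated:
            result.append(p)
    return result
```

Each outer-loop iteration is independent of the others, so one plausible GPU mapping (assumed here, not taken from the paper) is one thread per candidate point.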

A Parallel Algorithm for Large DOF Structural Analysis Problems (대규모 자유도 문제의 구조해석을 위한 병렬 알고리즘)

Kim, Min-Seok;Lee, Jee-Ho
Journal of the Computational Structural Engineering Institute of Korea / v.23 no.5 / pp.475-482 / 2010
In this paper, an efficient two-level parallel domain decomposition algorithm is proposed for solving large-DOF structural problems. Each subdomain is composed of a coarse problem and a local problem. In the coarse problem, displacements at the coarse nodes are computed by an iterative method that does not need to assemble a stiffness matrix for the whole coarse problem. Displacements at the local nodes are then computed by a multi-frontal sparse solver. A parallel version of the preconditioned conjugate gradient (PCG) method is developed to solve the coarse problem iteratively; it minimizes the amount of data communicated between processors, which increases the feasible problem DOF size while maintaining computational efficiency. The test results show that the proposed algorithm scales in computing performance and offers an efficient approach to solving large-DOF structural problems.
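The idea of solving the coarse problem iteratively without assembling a global stiffness matrix can be illustrated with a matrix-free PCG loop. The sketch below is a plain sequential Python version under stated assumptions: `matvec` and `precond` are hypothetical callables supplied by the caller, and nothing here reflects the paper's parallel implementation or its choice of preconditioner.

```python
def pcg(matvec, b, precond, tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradient for a symmetric positive definite system.

    matvec(v)  -> A @ v, so the matrix A never has to be assembled.
    precond(r) -> approximate solution of M z = r (the preconditioner).
    """
    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for x = 0
    z = precond(r)
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:   # converged
            break
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        z = precond(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x
```

In a parallel setting, the dot products are typically the steps that need communication between processors, while `matvec` can be evaluated subdomain by subdomain.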

Implementation of Active Noise Curtains for Long Distance Noise (원거리 소음 제거를 위한 능동방음막 구현)

Nam, Hyun-Do;Kwon Hyuk
Journal of the Korean Institute of Illuminating and Electrical Installation Engineers / v.18 no.1 / pp.154-160 / 2004
In this paper, an implementation of active noise curtains using multiple-channel adaptive filters is presented. To reduce the computational burden of the adaptive filter algorithms, as many single-channel LMS algorithms as there are control loudspeakers are used instead of one multi-channel LMS algorithm. A multi-channel LMS algorithm is usually used in active noise control systems, but it has much higher computational complexity. The single-channel control techniques require fewer DSP calculations than the multiple-channel control techniques. A stabilizing procedure for adaptive IIR filters is also proposed to improve the stability of recursive LMS algorithms. Experimental results for both control techniques on a TMS320VC33 digital signal processor show similar noise reduction, but the single-channel control techniques are more efficient in practical active noise curtain applications.
https://doi.org/10.5207/JIEIE.2004.18.1.154
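The single-channel LMS update that the abstract builds on can be sketched as follows. This is a textbook FIR LMS filter in Python, given only as an illustration: the function name, tap count, and step size are assumptions, and it is neither the filtered-x variant commonly used in active noise control nor the paper's DSP implementation.

```python
def lms_filter(x, d, num_taps=4, mu=0.05):
    """Adapt FIR weights w so that the filter output tracks d.

    x: reference input samples, d: desired signal samples.
    Returns the final weights and the error at each step.
    """
    w = [0.0] * num_taps
    buf = [0.0] * num_taps                 # most recent input first
    errors = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]              # shift the new sample in
        y = sum(wi * bi for wi, bi in zip(w, buf))
        e = dn - y                         # error between desired and output
        w = [wi + mu * e * bi for wi, bi in zip(w, buf)]   # LMS update
        errors.append(e)
    return w, errors
```

Running one such filter per loudspeaker costs a number of multiply-accumulates proportional to the tap count for each channel, which is the kind of saving the abstract attributes to the single-channel approach.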

MDA(Model Driven Architecture) based Design for Multitasking of Heterogeneous Embedded System (이종 임베디드 시스템의 멀티태스킹을 위한 MDA(Model Driven Architecture) 기반의 설계)

Son, Hyun-Seung;Kim, Woo-Yeol;Kim, R. Young-Chul
The KIPS Transactions:PartD / v.15D no.3 / pp.355-360 / 2008
Complex embedded systems that multitask require a real-time operating system (RTOS). In a heterogeneous development environment, each embedded system uses the OS and processor best suited to it. This paper proposes using UML profiles of the OS API and the processor configuration, instead of cross-compiling, to develop heterogeneous embedded systems. This reduces development time and cost by automatically generating source code from the profile information of each embedded system. We model two heterogeneous real-time operating systems (brickOS and uC/OS-II) and two processors (Hitachi H8 and Intel PXA255) with our proposed profile for heterogeneous embedded systems, and then generate and port the code.
https://doi.org/10.3745/KIPSTD.2008.15-D.3.355

The Design and Implementation of OSF/1 AD3 Based-Microkernel Initialization for SPAX (SPAX를 위한 OSF/1 AD3 기반의 마이크로 커널 초기화 설계 및 구현)

Kim, Jeong-Nyeo;Cho, Il-Yeon;Lee, Jae-Kyung;Kim, Hae-Jin
The Transactions of the Korea Information Processing Society / v.5 no.5 / pp.1333-1344 / 1998
Compared with a traditional monolithic kernel, a microkernel-based operating system is slower. However, a microkernel-based OS suits multi-computer systems because of its modularity and portability. Each unit and memory of a processor must be initialized using the boot information before the multi-computer OS can run the functions of the system. This paper describes the microkernel initialization of OSF/1 AD3 MISIX, which is based on OSF/1 AD3, for SPAX. It presents the initialization of the microkernel for SPAX, a high-speed parallel processing system, in terms of boot, initialization-related hardware, and construction of the memory address space. The paper also reports test results from the test environments. The microkernel was tested on a single-node system with four processors.

A Performance Evaluation on Classic Mutual Exclusion Algorithms for Exploring Feasibility of Practical Application (실제 적용 타당성 탐색을 위한 고전적 상호배제 알고리즘 성능 평가)

Lee, Hyung-Bong;Kwon, Ki-Hyeon
KIPS Transactions on Computer and Communication Systems / v.6 no.12 / pp.469-478 / 2017
Mutual exclusion originated in the theory of race-condition prevention in symmetric multiprocessor operating systems. Recently, however, with the spread of multi-core processors, its range of application has shifted rapidly to the parallel processing application domain. POSIX threads, WIN32 threads, and Java threads, which are typical development environments for parallel processing applications, each provide their own mutual exclusion mechanism. Applications that are very sensitive to performance in these environments may want to reduce the burden of mutual exclusion, even at some cost such as less convenient coding. In this study, we implement Dekker's and Peterson's algorithms in busy-wait and processor-yield forms on various platforms and compare their performance with the built-in mutual exclusion mechanisms to evaluate the usability of the classic algorithms. The analysis shows that the processor-yield form of Dekker's algorithm outperforms the built-in mechanisms in POSIX and WIN32 thread environments by a factor of at least 2 and up to 70, confirming that the algorithm is sufficiently practical.
https://doi.org/10.3745/KTCCS.2017.6.12.469
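A processor-yield form of Peterson's algorithm for two threads can be sketched as follows. This Python version is only an illustration of the control flow: the class and function names are hypothetical, `time.sleep(0)` stands in for a processor yield, and the code relies on CPython's interpreter lock for ordered memory access. An implementation in C on real hardware would additionally need memory fences, and the paper's own code is not reproduced here.

```python
import threading
import time


class PetersonLock:
    """Two-thread mutual exclusion (thread ids 0 and 1)."""

    def __init__(self):
        self.flag = [False, False]   # flag[i]: thread i wants to enter
        self.turn = 0                # which thread must wait on a tie

    def acquire(self, i):
        other = 1 - i
        self.flag[i] = True
        self.turn = other            # give priority to the other thread
        while self.flag[other] and self.turn == other:
            time.sleep(0)            # yield instead of pure busy-waiting

    def release(self, i):
        self.flag[i] = False


def run_counter_demo(iterations=1000):
    """Increment a shared counter from two threads under the lock."""
    lock = PetersonLock()
    counter = [0]

    def worker(i):
        for _ in range(iterations):
            lock.acquire(i)
            counter[0] += 1          # critical section
            lock.release(i)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]
```

Replacing `time.sleep(0)` with `pass` gives the busy-wait form that the abstract contrasts with the processor-yield form.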

Realization of the Pulse Doppler Radar Signal Processor with an Expandable Feature using the Multi-DSP Based Morocco-2 Board (다중 DSP 구조의 Morocco-2 보드를 이용한 확장성을 갖는 펄스 도플러 레이다 신호처리기 구현)

조명제;임중수
The Journal of Korean Institute of Electromagnetic Engineering and Science / v.12 no.7 / pp.1147-1156 / 2001
This paper proposes a new architecture for a real-time radar signal processor. It was designed and implemented to minimize inter-processor communication overhead and to maintain coherence in the Doppler pulse domain and in the range domain. Its structure can be easily reconfigured and reprogrammed when a function algorithm is added or the operational scenario is modified. Because the task configuration for parallel processing was designed from measurements of the computation time of the function algorithms and the transmission time of the signal processing results, data exchange between processors during execution of the function algorithms could be eliminated entirely. A Morocco-2 board equipped with the ADSP-21060 processor from Analog Devices Inc. and APEX-3.2, developed for SHARC DSPs, were used to construct the radar signal processor.

Semantic Depth Data Transmission Reduction Techniques using Frame-to-Frame Masking Method for Light-weighted LiDAR Signal Processing Platform (LiDAR 신호처리 플랫폼을 위한 프레임 간 마스킹 기법 기반 유효 데이터 전송량 경량화 기법)

Chong, Taewon;Park, Daejin
Journal of the Korea Institute of Information and Communication Engineering / v.25 no.12 / pp.1859-1867 / 2021
Multiple LiDAR sensors are being mounted on autonomous vehicles, and a system to process the data from multiple LiDAR sensors is required. When sensor data is transmitted to or processed by the main processor, the huge amount of data places a load on the transport network or on data processing. To minimize the load overhead on LiDAR sensor processors, only semantic data, identified by comparing data between frames of LiDAR data, is transmitted. Data from 4 LiDAR sensors was processed in a static environment without moving objects and in a dynamic environment in which a person moves within the sensors' field of view. In the static environment, the transmitted data was reduced by 89.5%, from 232,104 to 26,110 bytes. In the dynamic environment, the transmitted data was reduced by 88.1%, to 29,179 bytes.
https://doi.org/10.6109/jkiice.2021.25.12.1859
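Transmitting only what changed between consecutive frames can be sketched as follows. The frame layout (a flat list of distance readings), the threshold value, and both function names are assumptions made for this illustration; the abstract does not describe the paper's actual masking rule or data format.

```python
def changed_points(prev_frame, curr_frame, threshold=0.05):
    """Return (index, distance) pairs whose reading changed by more than threshold."""
    return [
        (i, curr)
        for i, (prev, curr) in enumerate(zip(prev_frame, curr_frame))
        if abs(curr - prev) > threshold
    ]


def apply_update(prev_frame, update):
    """Rebuild the current frame on the receiver from the previous frame and the update."""
    frame = list(prev_frame)
    for i, distance in update:
        frame[i] = distance
    return frame
```

For example, with `prev = [1.0, 2.0, 3.0, 4.0]` and `curr = [1.0, 2.5, 3.0, 4.01]`, only the pair `(1, 2.5)` is sent; the sub-threshold change at index 3 is masked out, so the receiver keeps the old value there.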

Parallel Implementations of Digital Focus Indices Based on Minimax Search Using Multi-Core Processors

HyungTae, Kim;Duk-Yeon, Lee;Dongwoon, Choi;Jaehyeon, Kang;Dong-Wook, Lee
KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.2 / pp.542-558 / 2023
A digital focus index (DFI) is a value used to determine image focus in scientific apparatus and smart devices. Automatic focus (AF) is an iterative and time-consuming procedure; however, its processing time can be reduced using a graphics processing unit (GPU) and a multi-core processor (MCP). In this study, parallel architectures of a minimax search algorithm (MSA) are applied to two DFIs: range algorithm (RA) and image contrast (CT). The DFIs are based on a histogram; however, parallel computation of a histogram is conventionally inefficient because of bank conflicts in shared memory. The parallel architectures of RA and CT are constructed using parallel reduction for the MSA, which rates pairs of image pixels relative to each other in parallel and halves the ratings at every step. The array size thus decreases to one, and the minimax is determined at the final reduction. Kernels for the architectures are constructed using open-source software to make them relatively platform independent. The kernels are tested on a hexa-core PC and an embedded device using Lenna images of various sizes based on the resolutions of industrial cameras. The performance of the kernels for the DFIs was investigated in terms of processing speed and computational acceleration; the maximum acceleration was 32.6× in the best case, and the MCP exhibited higher performance.
https://doi.org/10.3837/tiis.2023.02.014
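The pairwise reduction described in this abstract, where the working array is halved at every step until one minimum and one maximum remain, can be sketched as follows. This is a sequential Python illustration with a hypothetical function name; in a GPU or multi-core kernel each iteration of the inner loop would be handled by a separate thread, and the paper's kernels are not reproduced here.

```python
def minmax_reduce(values):
    """Find (min, max) by repeatedly comparing element pairs and halving the array."""
    if not values:
        raise ValueError("values must not be empty")
    lo = list(values)   # running minima
    hi = list(values)   # running maxima
    n = len(lo)
    while n > 1:
        half = (n + 1) // 2              # size of the array after this step
        for i in range(n // 2):          # independent pairs: parallelizable
            lo[i] = min(lo[i], lo[i + half])
            hi[i] = max(hi[i], hi[i + half])
        n = half                         # an unpaired middle element carries over
    return lo[0], hi[0]
```

The number of steps grows with the logarithm of the array size, and no histogram is built, which avoids the shared-memory contention the abstract mentions.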

Design of Low-complexity FFT Processor for Multi-mode Radar Signal Processing (멀티모드 레이다 신호처리를 위한 저복잡도 FFT 프로세서 설계)

Park, Yerim;Jung, Yongchul;Jung, Yunho
Journal of Advanced Navigation Technology / v.24 no.2 / pp.85-91 / 2020
Recently, a multi-mode radar system was designed for efficient operation of unmanned aerial vehicles (UAVs) in various environments; its advantage is that it can integrate and use the methods of both pulse Doppler (PD) radar and frequency modulated continuous wave (FMCW) radar. For the range detection part of the multi-mode radar signal processor (RSP), the hardware structure that uses the FFT processor and the IFFT processor must be designed to improve area efficiency. In addition, because the radar application environment requires a variety of distance resolutions, FFT processors need to support variable-length operations. In this paper, the FFT processor and IFFT processor for multi-mode RSP range estimation are designed and proposed as a single hardware FFT processor that supports variable-length operation of 16-1024 points. The proposed FFT processor is designed in a hardware description language (HDL) and can be implemented with 7,452 logic elements and 5,116 registers.
https://doi.org/10.12673/jant.2020.24.2.85
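How one FFT routine can serve both the forward and inverse transform at any power-of-two length can be sketched in software as follows. This is a recursive radix-2 Python illustration with assumed function names; the conjugation identity used for the inverse is a standard one, and whether the paper's hardware shares its datapath this way is not stated in the abstract.

```python
import cmath


def fft(x):
    """Radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]   # twiddle factor
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(spectrum):
    """Inverse FFT that reuses fft(): conjugate, transform, conjugate, scale."""
    n = len(spectrum)
    y = fft([complex(v).conjugate() for v in spectrum])
    return [v.conjugate() / n for v in y]
```

The same two functions work unchanged for 16, 32, up to 1024 points, which mirrors the variable-length requirement in the abstract.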
