• Title/Summary/Keyword: parallel system synchronization

Multiple Pipelined Hash Joins using Synchronization of Page Execution Time (페이지 실행시간 동기화를 이용한 다중 파이프라인 해쉬 결합)

  • Lee, Kyu-Ock;Weon, Young-Sun;Hong, Man-Pyo
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.27 no.7
    • /
    • pp.639-649
    • /
    • 2000
  • In relational database systems, the join operation is one of the most time-consuming query operations. Many parallel join algorithms have been developed to reduce its execution time. The multiple hash join algorithm using an allocation tree is one of the most efficient. However, processing at each node of the allocation tree may be delayed in the tuple-probing phase, because the time to read one page of the outer relation differs from the time to process a page that has already been read. In this paper, to solve the performance degradation caused by this delay, we develop a join algorithm for multiple hash joins based on the concept of 'synchronization of page execution time'. We reduce the processing time of each node in the allocation tree and improve overall system performance. In addition, we analyze the performance by building an analytical cost model and verify its validity through various performance comparisons with the previous method.

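
As a point of reference for the tuple-probing phase mentioned in the abstract above, the following is a minimal sketch of a basic in-memory hash join. It is not the paper's allocation-tree algorithm and does not model page-time synchronization; the function name `hash_join`, the row layout (lists of dicts), and the sample data are assumptions made for illustration.

```python
from collections import defaultdict

def hash_join(inner, outer, key):
    """Join two lists of dict rows on `key` (hypothetical helper, not from the paper)."""
    # Build phase: hash every tuple of the inner relation on the join key.
    table = defaultdict(list)
    for row in inner:
        table[row[key]].append(row)
    # Probe phase: look up each tuple of the outer relation in the hash table.
    result = []
    for row in outer:
        for match in table.get(row[key], []):
            result.append({**match, **row})
    return result

# Illustrative data only.
departments = [{"dept": 1, "name": "DB"}, {"dept": 2, "name": "OS"}]
employees = [{"dept": 1, "emp": "A"}, {"dept": 1, "emp": "B"}, {"dept": 3, "emp": "C"}]
joined = hash_join(departments, employees, "dept")
```

In a pipelined setting the probe loop would consume outer pages as they arrive, which is where a mismatch between page read time and page processing time could introduce the delay the abstract describes.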

A New Asynchronous Pipeline Architecture for CISC type Embedded Micro-Controller, A8051 (CISC 임베디드 컨트롤러를 위한 새로운 비동기 파이프라인 아키텍쳐, A8051)

  • 이제훈;조경록
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.40 no.4
    • /
    • pp.85-94
    • /
    • 2003
  • Asynchronous design methods have been shown to outperform synchronous ones in power consumption and execution speed, because they activate only the required modules without feeding a clock to the whole system. Although a CISC machine offers the advantage of variable addressing modes and instructions, its execution scheme is poorly suited to a synchronous pipeline architecture and incurs considerable overhead. This paper proposes a novel asynchronous pipeline architecture, A8051, whose instruction set is fully compatible with that of the Intel 80C51, an embedded microcontroller. We classify the instructions into groups that share the same execution scheme for the asynchronous pipeline and optimize it by eliminating the bubble stages that arise from the overhead of multi-cycle execution. New methods for handling branches and variable instruction lengths are suggested to minimize the number of states required for instruction execution and to increase parallelism. The proposed A8051 architecture is synthesized with a 0.35 $\mu\mathrm{m}$ CMOS standard cell library. The simulation results show higher speed, by 24 times, than the Intel 80C51 at 36 MHz and other asynchronous counterparts.

Performance Evaluation of VBR MPEG Video Storage and Retrieval Schemes in a VOD System (VOD 시스템에서의 가변 비트율 MPEG 비디오 저장 및 검색 기법의 성능 평가)

  • 전용희;박정숙
    • Journal of Korea Multimedia Society
    • /
    • v.4 no.1
    • /
    • pp.13-28
    • /
    • 2001
  • In a VOD (Video-On-Demand) system, video data are generally stored on a magnetic disk array. To meet the real-time requirements of data retrieval, video streams must be delivered to clients continuously, so that timely delivery of continuous media is guaranteed. Compared with the gains in processor and network performance, the performance of magnetic disk systems has improved only modestly. Disk array systems have been proposed and used to improve storage performance: an array improves I/O performance by placing disks in parallel and retrieving data concurrently. In this paper, two approaches to accessing video data in a VOD system are considered: the CTL (Constant Time Length) and CDL (Constant Data Length) access policies. Disk scheduling policies are also classified into two categories and compared in terms of the maximum number of allowable video streams under different degrees of disk array synchronization, in mixed environments in which both the data access policy and the disk scheduling policy are considered. Among the compared scheduling policies, LOOK showed the best performance. In terms of the degree of disk synchronization, a larger degree of synchronization yielded more gain. In the comparison of CTL and CDL, CTL performed slightly better in terms of the maximum number of allowable streams.

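
The abstract above names LOOK as the best-performing disk scheduling policy in that comparison. A minimal sketch of the textbook LOOK ordering follows; the function names, the cylinder numbers, and the seek-distance metric are illustrative assumptions and do not come from the paper's evaluation.

```python
def look_order(head, requests, moving_up=True):
    """Return the order in which LOOK serves the requested cylinders."""
    if moving_up:
        ahead = sorted(r for r in requests if r >= head)
        behind = sorted((r for r in requests if r < head), reverse=True)
    else:
        ahead = sorted((r for r in requests if r <= head), reverse=True)
        behind = sorted(r for r in requests if r > head)
    # Serve everything in the current direction, then reverse at the last
    # pending request instead of travelling to the end of the disk.
    return ahead + behind

def total_seek(head, order):
    """Sum of head movements (in cylinders) for a given service order."""
    distance = 0
    for cylinder in order:
        distance += abs(cylinder - head)
        head = cylinder
    return distance

# Hypothetical request queue with the head at cylinder 50, moving upward.
order = look_order(50, [82, 17, 43, 140, 24, 16, 190])
seek = total_seek(50, order)
```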

Serialized Multitasking Code Generation from Dataflow Specification (데이타 플로우 명세로부터 직렬화된 멀티태스킹 코드 생성)

  • Kwon, Seong-Nam;Ha, Soon-Hoi
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.35 no.9_10
    • /
    • pp.429-440
    • /
    • 2008
  • As embedded systems become more complex, software development becomes more important in the overall design process. Most embedded applications consist of multiple tasks that are executed in parallel. A dataflow model, which expresses concurrency naturally, is therefore preferred to a sequential programming language for developing multitask software. To execute multitasking code, an operating system (OS) is normally essential for scheduling the tasks and handling communication between them. However, multitasking code must run without an OS when the target hardware platform cannot run one, or when the target platforms are candidates in design space exploration (DSE), because porting an OS to every candidate platform is very costly. For this reason, we propose a technique for generating serialized multitasking code from a dataflow specification. In the proposed technique, a task is specified with a dataflow model and generated as C code. Code generation consists of two steps: first, each block in a task is generated as a separate function; second, the generated functions are scheduled by a multitasking scheduler that is also generated automatically. The data structure and information of each task are defined to make it easy to write a customized scheduler manually. A preliminary experiment with a DivX player confirms that the code generated by the proposed framework executes efficiently and correctly on the target system.
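
The two-step structure described in the abstract above (one function per block, plus a scheduler that calls those functions in a single thread) can be sketched as follows. The paper generates C code; this sketch uses Python only for brevity, and the task names, the block names, the task record layout, and the round-robin policy are all assumptions rather than the paper's generated scheduler.

```python
from collections import deque

def make_task(name, blocks, log):
    """Build a task record in which every dataflow block is a separate function."""
    funcs = [lambda b=b: log.append(f"{name}.{b}") for b in blocks]
    return {"name": name, "blocks": funcs, "next": 0}

def run_serialized(tasks):
    """Run one block of each task in turn, with no OS threads involved."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        task["blocks"][task["next"]]()   # execute the task's next block
        task["next"] += 1
        if task["next"] < len(task["blocks"]):
            ready.append(task)           # task still has blocks left

# Hypothetical tasks, each assumed to contain at least one block.
log = []
tasks = [
    make_task("video", ["read", "decode", "display"], log),
    make_task("audio", ["read", "play"], log),
]
run_serialized(tasks)
```

Because the per-task state is an explicit record, a hand-written scheduler could replace `run_serialized` without touching the block functions, which is in the spirit of the customization the abstract mentions.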

Efficient Workload Distribution of Photomosaic Using OpenCL into a Heterogeneous Computing Environment (이기종 컴퓨팅 환경에서 OpenCL을 사용한 포토모자이크 응용의 효율적인 작업부하 분배)

  • Kim, Heegon;Sa, Jaewon;Choi, Dongwhee;Kim, Haelyeon;Lee, Sungju;Chung, Yongwha;Park, Daihee
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.4 no.8
    • /
    • pp.245-252
    • /
    • 2015
  • Recently, parallel processing methods using accelerators have been introduced into high-performance computing and mobile computing. The photomosaic application can be parallelized by exploiting its inherent data parallelism on an accelerator. In this paper, we propose a way to distribute the workload of the photomosaic application across a heterogeneous CPU and GPU computing environment. That is, the photomosaic application is parallelized using both CPU and GPU resources with the asynchronous mode of OpenCL, and the optimal workload distribution rate is then estimated by measuring the execution time with CPU-only and GPU-only distribution rates. The proposed approach is simple but very effective, and can be applied to parallelize other applications in a heterogeneous CPU and GPU computing environment. Based on the experimental results, we confirm that, with the optimal workload distribution, performance in the heterogeneous computing environment improves by 141% compared with the GPU-only method.
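
One simple way to turn CPU-only and GPU-only timings into a distribution rate is to split the work in proportion to each device's measured throughput. The sketch below assumes that execution time scales linearly with the share of work and that transfer and launch overheads are negligible; the abstract does not state the paper's exact estimation formula, so the function names and the timings are hypothetical.

```python
def gpu_share(t_cpu_only, t_gpu_only):
    """Fraction of the workload to give the GPU so both devices finish together.

    Assumes linear scaling: a device given a fraction f of the work takes
    f times its full-workload time.
    """
    return t_cpu_only / (t_cpu_only + t_gpu_only)

def estimated_time(t_cpu_only, t_gpu_only, share):
    """Estimated wall time when CPU and GPU run their shares concurrently."""
    return max((1.0 - share) * t_cpu_only, share * t_gpu_only)

# Hypothetical measurements in seconds, not taken from the paper.
share = gpu_share(30.0, 10.0)
combined = estimated_time(30.0, 10.0, share)
```

With these made-up timings the GPU would receive three quarters of the work, and the estimated combined time falls below the GPU-only time.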

Design and Development of SMIL Processor for efficient Embedding (효율적 Embedding을 위한 SMIL Processor의 설계 및 개발)

  • 장동옥;강미연;정원호;이은철;김도완;김종대;김윤수
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 1999.10b
    • /
    • pp.265-267
    • /
    • 1999
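  • SMIL (Synchronized Multimedia Integration Language), which is designed in XML, is a markup language that can efficiently synchronize multimedia objects sequentially or in parallel. Because it can present web-based remote lectures, promotional material, and similar content more vividly and dynamically, its use is expected to grow. This paper proposes a design for a SMIL processor that can be easily embedded in a variety of web terminals. For web applications, the design focuses on a parser built from faster, system-independent functions and on an API suited to the application. It also aims to make these components reusable, with as few modifications as possible, for future XML parser functions and API designs.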

A Study on Parallel Operation of PWM Converter for Auxiliary Power Supply of High Speed Train (고속전철 보조전원장치용 PWM 컨버터의 병렬운전에 관한 연구)

  • Kim, Yeon-Chung;O, Geun-U;Won, Chung-Yeon;Choe, Jong-Muk;Gi, Sang-U
    • Journal of the Institute of Electronics Engineers of Korea SC
    • /
    • v.37 no.6
    • /
    • pp.64-72
    • /
    • 2000
  • This paper deals with the parallel operation of two PWM converters for the auxiliary block of a high-speed train. It proposes the parallel operation of AC/DC PWM converters controlled by a 3-level PWM switching method, which operates the switching devices so as to realize a high power factor and reduce current harmonics on the primary side of the transformer. The paper presents a phase-shift technique between the switching phases of the two converters, a solution for eliminating the coupling effects due to the transformer, and a zero-crossing detection method for synchronization between the source and the controller. Experimental results for a laboratory system with a TMS320C31 microprocessor and a 10 kVA PWM converter confirm the validity of the proposed algorithm.

An efficient acceleration algorithm of GPU ray tracing using CUDA (CUDA를 이용한 효과적인 GPU 광선추적 가속 알고리즘)

  • Ji, Joong-Hyun;Yun, Dong-Ho;Ko, Kwang-Hee
    • Proceedings of the HCI Society of Korea Conference (한국HCI학회:학술대회논문집)
    • /
    • 2009.02a
    • /
    • pp.469-474
    • /
    • 2009
  • This paper proposes a real-time ray tracing system using an optimized kd-tree traversal and a ray/triangle intersection algorithm. Previous kd-tree traversal algorithms search for upper nodes in a bottom-up manner. With that approach, we need to revisit an already visited parent node or use redundant memory after failing to find an intersected primitive in a leaf node, so ray tracing of relatively complex scenes becomes more difficult. The new algorithm uses stacks implemented in the GPU's local memory under the CUDA framework, which eliminates the problems of the previous algorithms. After traversing a node, we perform the latest CPU-based ray/triangle intersection algorithm, the 'Plücker coordinate test', which is further accelerated by massively parallel execution on CUDA. The Plücker test can drastically reduce computational cost, since it does not use barycentric coordinates but only a simple test based on the relations between a ray and the triangle edges. The entire system consists of a single ray kernel and is implemented simply, without complicated synchronization or ray packets. Our experiments show that the new algorithm is roughly twice as fast as the previous one.

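
The edge-relation idea behind the Plücker coordinate test mentioned in the abstract above can be sketched on the CPU as follows. This is the general textbook formulation, not the paper's CUDA kernel. It only decides whether the ray's supporting line passes through the triangle: it does not check the hit distance along the ray, and it treats the degenerate coplanar case as a hit. All function names are assumptions.

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def plucker(p, q):
    """Plücker coordinates (direction, moment) of the directed line p -> q."""
    return sub(q, p), cross(p, q)

def side(l1, l2):
    """Permuted inner product; its sign gives the relative orientation of two lines."""
    return dot(l1[0], l2[1]) + dot(l2[0], l1[1])

def line_hits_triangle(origin, direction, a, b, c):
    """True if the line through `origin` along `direction` passes through triangle abc."""
    ray = (direction, cross(origin, direction))
    signs = [side(ray, plucker(p, q)) for p, q in ((a, b), (b, c), (c, a))]
    # The line passes inside when it has the same orientation to all three edges.
    return all(s >= 0 for s in signs) or all(s <= 0 for s in signs)

# Hypothetical triangle in the z = 0 plane and two rays pointing along +z.
tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
inside = line_hits_triangle((0.25, 0.25, -1.0), (0.0, 0.0, 1.0), *tri)
outside = line_hits_triangle((2.0, 2.0, -1.0), (0.0, 0.0, 1.0), *tri)
```

Each test needs only three sign comparisons, with no barycentric coordinates computed, which matches the cost argument made in the abstract.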

Distributed Test Method using Logical Clock (Logical Clock을 이용한 분산 시험)

  • Choi, Young-Joon;Kim, Myeong-Chul;Seol, Soon-Uk
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.28 no.9
    • /
    • pp.469-478
    • /
    • 2001
  • It is difficult to test a distributed system because concurrent events must be controlled. Existing works do not propose a test sequence generation algorithm in a formal way, and the number of messages required for synchronization is large. In this paper, we propose a formal test sequence generation algorithm that uses a logical clock to control concurrent events. It solves the control-observation problem and makes test results reproducible. It also provides a generic solution, in that the algorithm can be used with any communication paradigm. In distributed testing, the number of channels among testers increases non-linearly with the number of distributed objects; we propose a new remote test architecture to solve this problem. An SDL tool is used to verify the correctness of the proposed algorithm, and as a case study the algorithm is applied to the message exchange for establishing a Q.2971 point-to-multipoint call/connection.

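
The abstract above does not spell out its test sequence generation algorithm, so the following only illustrates the underlying idea of a logical clock, using the standard Lamport scheme. The class name, the method names, and the two-tester scenario are assumptions made for illustration.

```python
class LamportClock:
    """Standard Lamport logical clock (an assumed stand-in, not the paper's algorithm)."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Advance the clock for a local event and return the new timestamp."""
        self.time += 1
        return self.time

    def send(self):
        """Timestamp to attach to an outgoing message."""
        return self.tick()

    def receive(self, timestamp):
        """Merge the sender's timestamp so the receive is ordered after the send."""
        self.time = max(self.time, timestamp) + 1
        return self.time

# Two hypothetical testers exchanging one coordination message.
tester_a, tester_b = LamportClock(), LamportClock()
tester_a.tick()                     # local test event at A
stamp = tester_a.send()             # A sends a message to B
received = tester_b.receive(stamp)  # B receives it
after = tester_b.tick()             # B's next test event
```

Ordering test events by such timestamps gives every run the same causal order, which is one way a reproducible test sequence could be obtained.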

Multiplexing of UHDTV Based on MPEG-2 TS (MPEG-2 TS 기반의 UHDTV 다중화)

  • Jang, Euy-Doc;Park, Dong-Il;Kim, Jae-Gon;Lee, Eung-Don;Cho, Suk-Hee;Choi, Jin-Soo
    • Journal of Broadcast Engineering
    • /
    • v.15 no.2
    • /
    • pp.205-216
    • /
    • 2010
  • In this paper, a method of MPEG-2 Transport Stream (TS) multiplexing for Ultra HDTV (UHDTV), and its design and implementation as a software tool, are described. In practice, UHD video may be divided into several HD videos, each of which is encoded in parallel. It is therefore necessary to synchronize and multiplex the multiple bitstreams that encode each HD video in order to transmit and store UHD video. In this paper, it is assumed that 4 HD videos spatially partitioning a UHD video are encoded with H.264/AVC and that two 5.0-channel audio streams are encoded with AC-3. Therefore, 4 H.264/AVC elementary streams (ESs) and 2 AC-3 ESs are mainly considered in the TS multiplexing of UHD. For the carriage of H.264/AVC and AC-3 over MPEG-2 TS, PES packetization and TS multiplexing are designed and implemented based on the extended specification of MPEG-2 Systems and on ATSC (Digital audio compression standard), respectively. The implemented UHD TS multiplexing tool emulates real-time hardware operation in time units corresponding to the duration of one TS packet transmission at a given TS rate. In particular, to satisfy the timing model, the buffers defined in the TS System Target Decoder (T-STD) are monitored and their status is considered in the scheduling of TS multiplexing. For UHD multiplexing, two kinds of multiplexing structures, UHD re-multiplexing and UHD program multiplexing, are implemented and their strengths and weaknesses are investigated. The developed UHD TS multiplexing tool is tested and verified in terms of syntax and semantics conformance and functionality, using a commercial analyzer and real-time presentation tools.
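
The time unit mentioned in the abstract above, the duration of one TS packet at a given TS rate, follows directly from the fixed 188-byte MPEG-2 TS packet size. The sketch below computes that unit and the start times of successive multiplexing slots; the 20 Mbit/s rate and the function names are hypothetical, and no T-STD buffer monitoring is modelled.

```python
TS_PACKET_BYTES = 188  # fixed MPEG-2 transport stream packet size

def packet_duration_us(ts_rate_bps):
    """Time to transmit one TS packet at the given TS rate, in microseconds."""
    return TS_PACKET_BYTES * 8 / ts_rate_bps * 1e6

def slot_times_us(ts_rate_bps, n_slots):
    """Start time of each multiplexing slot, one slot per TS packet."""
    step = packet_duration_us(ts_rate_bps)
    return [i * step for i in range(n_slots)]

# Hypothetical TS rate chosen only for round numbers.
unit = packet_duration_us(20_000_000)
slots = slot_times_us(20_000_000, 4)
```

A multiplexer emulated in software would decide, once per slot, which elementary stream (or a null packet) occupies that slot.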