• Title/Summary/Keyword: Multi-processors

Search Result 213, Processing Time 0.024 seconds

A Study of Air Pollution Monitoring System using Gossiping Route Protocol in wireless Sensor Network (Gossiping Route Protocol을 이용한 공기오염감지시스템에 관한 연구)

  • Park, Yong-Man;Kim, Hie-Sik;Kim, Gyu-Sik;Lee, Moon-Gyu;Ayurzana, Odgerel;Kwon, Jong-Won;Koo, Sang-Jun;Oh, Shi-Hwan;Kim, Dong-Ki;Jo, Ik-Kyun;Park, Jeong-Hun
    • Proceedings of the KIEE Conference
    • /
    • 2007.10a
    • /
    • pp.485-486
    • /
    • 2007
  • Wireless Sensor Networking is state of the art technology that has a wide range of potential applications. Sensor network generally consists of a large number of distributed nodes that organize themselves into a multi-hop wireless network. Each node has one or more sensors, embedded processors and low-power radios, and is normally battery operated because of small size. In this paper wireless sensor networking technology applies to the environment monitoring system in the underground. This system can monitor a pollution level of the underground in realtime for keeping up a comfortable environment.

  • PDF

A Parallel Processing System for Visual Media Applications (시각매체를 위한 병렬처리 시스템)

  • Lee, Hyung;Pakr, Jong-Won
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.27 no.1A
    • /
    • pp.80-88
    • /
    • 2002
  • Visual media(image, graphic, and video) processing poses challenge from several perpectives, specifically from the point of view of real-time implementation and scalability. There have been several approaches to obtain speedups to meet the computing demands in multimedia processing ranging from media processors to special purpose implementations. A variety of parallel processing strategies are adopted in these implementations in order to achieve the required speedups. We have investigated a parallel processing system for improving the processing speed o f visual media related applications. The parallel processing system we proposed is similar to a pipelined memory stystem(MAMS). The multi-access memory system is made up of m memory modules and a memory controller to perform parallel memory access with a variety of combinations of 1${\times}$pq, pq${\times}$1, and p${\times}$q subarray, which improves both cost and complexity of control. Facial recognition, Phong shading, and automatic segmentation of moving object in image sequences are some that have been applied to the parallel processing system and resulted in faithful processing speed. This paper describes the parallel processing systems for the speedup and its utilization to three time-consuming applications.

Study on the Implementation of a Virtual Switch using Intel DPDK (Intel DPDK를 이용한 가상스위치의 구현에 관한 연구)

  • Jeong, Gab-Joong;Choi, Kang-Il
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.10 no.2
    • /
    • pp.211-218
    • /
    • 2015
  • This paper describes the implementation of the accelerated virtual switch using Intel DPDK(Data Plane Development Kit), and evaluates the virtual network functions of the virtual switch which is one of the most important components to build a virtual network for cloud computing. Nowadays, new information service platforms are appeared from the interconnection of intelligent IT systems like IoT(Internet of Things). And many companies want to use the new service platform for their new application service. The companies can apply there new service early which needs small investment and responses adaptively to the fast change of consumer environment. Using cloud computing technology, the new business service can be introduced as a commercial IT service for the time to market. In this study, an implementation and investigation were performed for the accelerated virtual switch, called Intel DPDK virtual switch, which is using multi processors in network interface card for virtual network functions. It can be useful for Internet-oriented companies to leverage the new cloud service and businesses for its creativeness.

Design And Implementation of Linux Based Parallel Media Stream Server System (리눅스 기반의 고성능 병렬 미디어 스트림 서버 설계 및 구현)

  • 김서균;김경훈;류재상;남지승
    • The KIPS Transactions:PartA
    • /
    • v.8A no.4
    • /
    • pp.287-292
    • /
    • 2001
  • Multimedia service systems should have efficient capacity to serve the growing clients and new data. In the general streaming services, users can endure the small amount of time delay at the beginning of service. But they want to have good quality of service. A streaming server tries to transfer video files to clients from a repository of files in real time. The server must guarantee concurrent and uninterrupted delivery of each video stream requested from clients. To achieve its purpose, many stream servers adopt multi-processors, sufficient memory, and RAID or SAN in their systems. In this paper, we propose a Linux-based parallel media streaming server. It is superior to the other systems in the storing structure, fault-tolerance, and service capacity. Since this system supports the web interlace, users can operate easily through the www. This system uses unique striping policy to distribute multimedia files into the parallel storage nodes. If a service request occurs, each storage node transmits striped files concurrently to the client. Its performance is better than the single media streaming service because of the parallel architecture.

  • PDF

Improving Multi-DNN Computational Performance of Embedded Multicore Processors through a Global Queue (글로벌 큐를 통한 임베디드 멀티코어 프로세서의 멀티 DNN 연산 성능 향상)

  • Cho, Ho-jin;Kim, Myung-sun
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.24 no.6
    • /
    • pp.714-721
    • /
    • 2020
  • DNN is expanding its use in embedded systems such as robots and autonomous vehicles. For high recognition accuracy, computational complexity is greatly increased, and multiple DNNs are running aperiodically. Therefore, the ability processing multiple DNNs in embedded environments is a crucial issue. Accordingly, multicore based platforms are being released. However, most DNN models are operated in a batch process, and when multiple DNNs are operated in multicore together, the execution time deviation between each DNN may be large and the end-to-end execution time of the whole DNNs could be long depending on how they are allocated to the cores. In this paper, we solve these problems by providing a framework that decompose each DNN into individual layers and then distribute to multicores through a global queue. As a result of the experiment, the total DNN execution time was reduced by 31%, and when operating multiple identical DNNs, the deviation in execution time was reduced by up to 95.1%.

Performance Improvement of Single Chip Multiprocessor using Concurrent Branch Execution (분기 동시 수행을 이용한 단일 칩 멀티프로세서의 성능 개선)

  • Lee, Seung-Ryul;Kim, Jun-Shik;Choi, Jae-Hyeok;Choi, Sang-Bang
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.44 no.2
    • /
    • pp.61-71
    • /
    • 2007
  • The instruction level parallelism, which has been used to improve the performance of processors, expose its limit. The change of a control flow by a branch miss prediction is one of the obstacles that restrict the instruction level parallelism. The single chip multiprocessors have been developed to utilize the thread level parallelism. However, we could not use the maximum performance of the single chip multiprocessor in case of executing the coded programs without considering the multi-thread. In order to overcome the two performance degradation factors, in this paper, we suggest the concurrent branch execution method that applies to the multi-path execution method at a single chip multiprocessor. We executes all two flows of the conditional branch using the idle core processor. Through this, we can improve the processor's efficiency with blocking the control flow termination by the branch instruction and reducing the idle time. We analyze the effects of concurrent branch execution proposed in this paper through the simulation. As a result of that, concurrent branch execution reduces about 20% of idle time and improves the maximum 10% of the branch prediction accuracy. We show that our scheme improves the overall performance of maximum 39% compared to the normal single chip multiprocessor and maximum 27% compared to the superscalar processor.

Run-time Memory Optimization Algorithm for the DDMB Architecture (DDMB 구조에서의 런타임 메모리 최적화 알고리즘)

  • Cho, Jeong-Hun;Paek, Yun-Heung;Kwon, Soo-Hyun
    • The KIPS Transactions:PartA
    • /
    • v.13A no.5 s.102
    • /
    • pp.413-420
    • /
    • 2006
  • Most vendors of digital signal processors (DSPs) support a Harvard architecture, which has two or more memory buses, one for program and one or more for data and allow the processor to access multiple words of data from memory in a single instruction cycle. We already addressed how to efficiently assign data to multi-memory banks in our previous work. This paper reports on our recent attempt to optimize run-time memory. The run-time environment for dual data memory banks (DBMBs) requires two run-time stacks to control activation records located in two memory banks corresponding to calling procedures. However, activation records of two memory banks for a procedure are able to have different size. As a consequence, dual run-time stacks can be unbalanced whenever a procedure is called. This unbalance between two memory banks causes that usage of one memory bank can exceed the extent of on-chip memory area although there is free area in the other memory bank. We attempt balancing dual run-time slacks to enhance efficiently utilization of on-chip memory in this paper. The experimental results have revealed that although our algorithm is relatively quite simple, it still can utilize run-time memories efficiently; thus enabling our compiler to run extremely fast, yet minimizing the usage of un-time memory in the target code.

Hardware-Software Cosynthesis of Multitask Multicore SoC with Real-Time Constraints (실시간 제약조건을 갖는 다중태스크 다중코어 SoC의 하드웨어-소프트웨어 통합합성)

  • Lee Choon-Seung;Ha Soon-Hoi
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.33 no.9
    • /
    • pp.592-607
    • /
    • 2006
  • This paper proposes a technique to select processors and hardware IPs and to map the tasks into the selected processing elements, aming to achieve high performance with minimal system cost when multitask applications with real-time constraints are run on a multicore SoC. Such technique is called to 'Hardware-Software Cosynthesis Technique'. A cosynthesis technique was already presented in our early work [1] where we divide the complex cosynthesis problem into three subproblems and conquer each subproblem separately: selection of appropriate processing components, mapping and scheduling of function blocks to the selected processing component, and schedulability analysis. Despite good features, our previous technique has a serious limitation that a task monopolizes the entire system resource to get the minimum schedule length. But in general we may obtain higher performance in multitask multicore system if independent multiple tasks are running concurrently on different processor cores. In this paper, we present two mapping techniques, task mapping avoidance technique(TMA) and task mapping pinning technique(TMP), which are applicable for general cases with diverse operating policies in a multicore environment. We could obtain significant performance improvement for a multimedia real-time application, multi-channel Digital Video Recorder system and for randomly generated multitask graphs obtained from the related works.

Advanced Victim Cache with Processor Reuse Information (프로세서의 재사용 정보를 이용하는 개선된 고성능 희생 캐쉬)

  • Kwak Jong Wook;Lee Hyunbae;Jhang Seong Tae;Jhon Chu Shik
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.31 no.12
    • /
    • pp.704-715
    • /
    • 2004
  • Recently, a single or multi processor system uses the hierarchical memory structure to reduce the time gap between processor clock rate and memory access time. A cache memory system includes especially two or three levels of caches to reduce this time gap. Moreover, one of the most important things In the hierarchical memory system is the hit rate in level 1 cache, because level 1 cache interfaces directly with the processor. Therefore, the high hit rate in level 1 cache is critical for system performance. A victim cache, another high level cache, is also important to assist level 1 cache by reducing the conflict miss in high level cache. In this paper, we propose the advanced high level cache management scheme based on the processor reuse information. This technique is a kind of cache replacement policy which uses the frequency of processor's memory accesses and makes the higher frequency address of the cache location reside longer in cache than the lower one. With this scheme, we simulate our policy using Augmint, the event-driven simulator, and analyze the simulation results. The simulation results show that the modified processor reuse information scheme(LIVMR) outperforms the level 1 with the simple victim cache(LIV), 6.7% in maximum and 0.5% in average, and performance benefits become larger as the number of processors increases.

Parallel Range Query processing on R-tree with Graphics Processing Units (GPU를 이용한 R-tree에서의 범위 질의의 병렬 처리)

  • Yu, Bo-Seon;Kim, Hyun-Duk;Choi, Won-Ik;Kwon, Dong-Seop
    • Journal of Korea Multimedia Society
    • /
    • v.14 no.5
    • /
    • pp.669-680
    • /
    • 2011
  • R-trees are widely used in various areas such as geographical information systems, CAD systems and spatial databases in order to efficiently index multi-dimensional data. As data sets used in these areas grow in size and complexity, however, range query operations on R-tree are needed to be further faster to meet the area-specific constraints. To address this problem, there have been various research efforts to develop strategies for acceleration query processing on R-tree by using the buffer mechanism or parallelizing the query processing on R-tree through multiple disks and processors. As a part of the strategies, approaches which parallelize query processing on R-tree through Graphics Processor Units(GPUs) have been explored. The use of GPUs may guarantee improved performances resulting from faster calculations and reduced disk accesses but may cause additional overhead costs caused by high memory access latencies and low data exchange rate between GPUs and the CPU. In this paper, to address the overhead problems and to adapt GPUs efficiently, we propose a novel approach which uses a GPU as a buffer to parallelize query processing on R-tree. The use of buffer algorithm can give improved performance by reducing the number of disk access and maximizing coalesced memory access resulting in minimizing GPU memory access latencies. Through the extensive performance studies, we observed that the proposed approach achieved up to 5 times higher query performance than the original CPU-based R-trees.