• Title/Summary/Keyword: 병렬프로세서

Search Result 579, Processing Time 0.019 seconds

An SoC-based Context-Aware System Architecture (SoC 기반 상황인식 시스템 구조)

  • Sohn, Bong-Ki;Lee, Keon-Myong;Kim, Jong-Tae;Lee, Seung-Wook;Lee, Ji-Hyong;Jeon, Jae-Wook;Cho, Jun-Dong
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.14 no.4
    • /
    • pp.512-516
    • /
    • 2004
  • Context-aware computing has been attracting the attention as an approach to alleviating the inconvenience in human-computer interaction. This paper proposes a context-aware system architecture to be implemented on an SoC(System-on-a-Chip). The proposed architecture supports sensor abstraction, notification mechanism for context changes, modular development, easy service composition using if-then rules, and flexible context-aware service implementation. It consists of the communication unit, the processing unit, the blackboard, and the rule-based system unit, where the first three components reside in the microprocessor part of the SoC and the rule-based system unit is implemented in hardware. For the proposed architecture, an SoC system has been designed and tested in an SoC development platform called SystemC and the feasibility of the behavoir modules for the microprocessor part has been evaluated by implementing software modules on the conventional computer platform. This SoC-based context-aware system architecture has been developed to apply to mobile intelligent robots which would assist old people at home in a context-aware manner.

Memory Delay Comparison between 2D GPU and 3D GPU (2차원 구조 대비 3차원 구조 GPU의 메모리 접근 효율성 분석)

  • Jeon, Hyung-Gyu;Ahn, Jin-Woo;Kim, Jong-Myon;Kim, Cheol-Hong
    • Journal of the Korea Society of Computer and Information
    • /
    • v.17 no.7
    • /
    • pp.1-11
    • /
    • 2012
  • As process technology scales down, the number of cores integrated into a processor increases dramatically, leading to significant performance improvement. Especially, the GPU(Graphics Processing Unit) containing many cores can provide high computational performance by maximizing the parallelism. In the GPU architecture, the access latency to the main memory becomes one of the major reasons restricting the performance improvement. In this work, we analyze the performance improvement of the 3D GPU architecture compared to the 2D GPU architecture quantitatively and investigate the potential problems of the 3D GPU architecture. In general, memory instructions account for 30% of total instructions, and global/local memory instructions constitutes 60% of total memory instructions. Therefore, the performance of the 3D GPU is expected to be improved significantly compared to the 2D GPU by reducing the delay of memory instructions. However, according to our experimental results, the 3D architecture improves the GPU performance only by 2% compared to the 2D architecture due to the memory bottleneck, since the performance reduction due to memory bottleneck in the 3D GPU architecture increases by 245% compared to the 2D architecture. This paper provides the guideline for suitable memory design by analyzing the efficiency of the memory architecture in 3D GPU architecture.

Implementation of Wired Sensor Network Interface Systems (유선 센서 네트워크 인터페이스 시스템 구현)

  • Kim, Dong-Hyeok;Keum, Min-Ha;Oh, Se-Moon;Lee, Sang-Hoon;Islam, Mohammad Rakibul;Kim, Jin-Sang;Cho, Won-Kyung
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.45 no.10
    • /
    • pp.31-38
    • /
    • 2008
  • This paper describes sensor network system implementation for the IEEE 1451.2 standard which guarantees compatibilities between various wired sensors. The proposed system consists of the Network Capable Application Processor(NCAP) in the IEEE 1451.0, the Transducer Independent Interface(TII) in the IEEE 1451.2, the Transducer Electronic Data Sheet(TEDS) and sensors. The research goal of this study is to minimize and optimize system complexity for IC design. The NCAP is implemented using C language in personal computer environment. TII is used in the parallel port between PC and an FPGA application board. Transducer is implemented using Verilog on the FPGA application board. We verified the proposed system architecture based on the standards.

A Load Balancing Method using Partition Tuning for Pipelined Multi-way Hash Join (다중 해시 조인의 파이프라인 처리에서 분할 조율을 통한 부하 균형 유지 방법)

  • Mun, Jin-Gyu;Jin, Seong-Il;Jo, Seong-Hyeon
    • Journal of KIISE:Databases
    • /
    • v.29 no.3
    • /
    • pp.180-192
    • /
    • 2002
  • We investigate the effect of the data skew of join attributes on the performance of a pipelined multi-way hash join method, and propose two new harsh join methods in the shared-nothing multiprocessor environment. The first proposed method allocates buckets statically by round-robin fashion, and the second one allocates buckets dynamically via a frequency distribution. Using harsh-based joins, multiple joins can be pipelined to that the early results from a join, before the whole join is completed, are sent to the next join processing without staying in disks. Shared nothing multiprocessor architecture is known to be more scalable to support very large databases. However, this hardware structure is very sensitive to the data skew. Unless the pipelining execution of multiple hash joins includes some dynamic load balancing mechanism, the skew effect can severely deteriorate the system performance. In this parer, we derive an execution model of the pipeline segment and a cost model, and develop a simulator for the study. As shown by our simulation with a wide range of parameters, join selectivities and sizes of relations deteriorate the system performance as the degree of data skew is larger. But the proposed method using a large number of buckets and a tuning technique can offer substantial robustness against a wide range of skew conditions.

Efficient Multiple Joins using the Synchronization of Page Execution Time in Limited Processors Environments (한정된 프로세서 환경에서 체이지 실행시간 동기화를 이용한 효율적인 다중 결합)

  • Lee, Kyu-Ock;Weon, Young-Sun;Hong, Man-Pyo
    • Journal of KIISE:Databases
    • /
    • v.28 no.4
    • /
    • pp.732-741
    • /
    • 2001
  • In the relational database systems the join operation is one of the most time-consuming query operations. Many parallel join algorithms have been developed 개 reduce the execution time Multiple hash join algorithm using allocation tree is one of the most efficient ones. However, it may have some delay on the processing each node of allocation tree, which is occurred in tuple-probing phase by the difference between one page reading time of outer relation and the processing time of already read one. This delay problem was solved by using the concept of synchronization of page execution time with we had proposed In this paper the effects of the performance improvements in each node of the allocation tree are extended to the whole allocation tree and the performance evaluation about that is processed. In addition we propose an efficient algorithm for multiple hash joins in limited number of processor environments according to the relationship between the number of input relations in the allocation tree and the number of processors allocated to the tree. Finally. we analyze the performance by building the analytical cost model and verify the validity of it by various performance comparison with previous method.

  • PDF

A Study on the Improved Load Sharing rate in Paralleled Operated Lead Acid Battery by Using Microprocessor (마이크로 프로세서를 이용한 축전지의 병렬 운전 부하분담률 개선에 관한 연구)

  • 이정민
    • Proceedings of the KIPE Conference
    • /
    • 2000.07a
    • /
    • pp.493-497
    • /
    • 2000
  • A battery is the device that transforms the chemical energy into the direct-current electrical energy without a mechanical process. Unit cells are connected in series to obtain the required voltage while being connected in parallel to organize capacity for load current. Because the voltage drop down in one set of battery is faster than in two one it may result in the low efficiency of power converter with the voltage drop and cause the system shutdown. However when the system being shutdown. However when the system being driven in parallel a circular-current can be generated,. It is shown that as a result the new batteries are heated by over-charge and over-discharge and the over charge current increases rust of the positive grid and consequently shortens the lifetime of the new batteries. The difference between the new batteries and old ones is the amount of internal resistance. In this paper we can detect the unbalance current using the microprocessor and achieve the balance current by adjusting resistance of each set, The internal resistance of each set becomes constant and the current of charge and discharge comes to be balanced by inserting the external resistance into the system and calculating the change of internal resistance.

  • PDF

Optimistic Colescing Technique for Copy Elimination in ILP Instruction Scheduling (ILP 명령 스케쥴링에서의 복사 제거를 위한 낙관적 융합 기법)

  • Park, Jin-Pyo;Mun, Su-Muk
    • Journal of KIISE:Software and Applications
    • /
    • v.26 no.5
    • /
    • pp.692-701
    • /
    • 1999
  • 수퍼스칼라(superscalar)나 VLIW 와 같은 명령어 수준 병렬화(ILP) 프로세서의 성능을 극대화하는 과감한 명령어 스케쥴링은 소프트웨어 파이프라이닝과같은 스케쥴링 과정을 거치면서 일반적인 복사 명령어 제거 기법으로 없앨 수 없는 서로 간섭하는 복사 명령을 많이 만들어내는데 루프 내부에 생성된 이러한 복사명령은 적절한 루프 펼침을 수행하여 간섭관계를 없앰으로서 제거할 수 있다. 본 논문에서는 이와 같이 루프 펼침이 수행된 루프 내부의 복사명령을 제거하는 기법으로 그래프 컬러링 상에 구현한 낙관적 융합기법을 제안한다. 그래프 컬러링에서의 융합기법은 간선의 개수가 많은 노드를 만들어 낼수 있으므로 채색성에 부정적인 영향을 주는 것으로 알려져 왔으나 본 기법에서는 융합되는 노드에 동시에 간섭하는 노드의 간선의 수가 줄어드는 긍정적인 영향을 최대한 이용하여 채색성을 높이고 융합된 노드에 대한 실제 버림(spill)이 일어나는 경우 유효 범위 분절(live range splitting)을 통하여 버림의 부담을 최대한 줄이도록 하였으며 이를 VLIW 스케쥴링 된 SPEC 정수벤치마크 루프내부의 복사 명령 제거에 적용한 결과 제거 가능한 복사 명령의 99%를 제거하면서도 버림명령은 다른 융합 기법과 비교하여 가장 적게 발생하는 우수한 결과를 얻을수 있었다.

PDOCM : Fast Text Compression on MasPar Machine (PDOCM : MasPar머쉰상의 새로운 압축기법과 빠른 텍스트 축약)

  • Min, Yong-Sik
    • The Journal of the Acoustical Society of Korea
    • /
    • v.14 no.1
    • /
    • pp.40-47
    • /
    • 1995
  • Due to rapid progress in data communications, we are able to acquire the information we need with ease. One means of achieving this is a parallel machine such as the MasPar. Although the parallel machine makes it possible to receive/transmit enormous quantities of data, because of the increasing volume of information that must be processed, it is necessary to transmit only a minimal amount of data bits. This paper suggests a new coding method for the parallel machine, which compresses the data by reducing redundancy. Parallel Dynamic Octal Compact Mapping (PDOCM) compresses at least 1 byte per word, compared with other coding techniques, and achieves a 54.188-fold speedup with 64 processors to transmit 10 million characters.

  • PDF

A Study on the Development of Star Type LAN (Star형 근거리 통신망 개발에 관한 연구)

  • 유황빈;이대영
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.13 no.2
    • /
    • pp.160-170
    • /
    • 1988
  • This paper describes the outboard NIU(Network Interface Unit) using microprocessor and the hardware and software of concentrator for constructing star typed LAN(Local Area Network) based on token ring method. The NIU adapter can adapter up to four parallel and serial typed terminals. Because it has PAD function on input output data, any type of terminals can be adapted. Since the concentrator has logical switching circuit which enables data to bypass the faulted NIU adapter, this network prevents communication break. When user transmits and receives data, the concentrator constructs star typed LAN which connct both transmitting and receiving sides. A s result, this network eliminated ring latency time in other NIU exculding transmitting and receiving NIU. So the throughput of this LAN is increased. Because this LAN system consists of several modultes according to it's function, the expansion of function or the modification of method is easy.

  • PDF

A Parallel-Architecture Processor Design for the Fast Multiplication of Homogeneous Transformation Matrices (Homogeneous Transformation Matrix의 곱셈을 위한 병렬구조 프로세서의 설계)

  • Kwon Do-All;Chung Tae-Sang
    • The Transactions of the Korean Institute of Electrical Engineers D
    • /
    • v.54 no.12
    • /
    • pp.723-731
    • /
    • 2005
  • The $4{\times}4$ homogeneous transformation matrix is a compact representation of orientation and position of an object in robotics and computer graphics. A coordinate transformation is accomplished through the successive multiplications of homogeneous matrices, each of which represents the orientation and position of each corresponding link. Thus, for real time control applications in robotics or animation in computer graphics, the fast multiplication of homogeneous matrices is quite demanding. In this paper, a parallel-architecture vector processor is designed for this purpose. The processor has several key features. For the accuracy of computation for real application, the operands of the processors are floating point numbers based on the IEEE Standard 754. For the parallelism and reduction of hardware redundancy, the processor takes column vectors of homogeneous matrices as multiplication unit. To further improve the throughput, the processor structure and its control is based on a pipe-lined structure. Since the designed processor can be used as a special purpose coprocessor in robotics and computer graphics, additionally to special matrix/matrix or matrix/vector multiplication, several other useful instructions for various transformation algorithms are included for wide application of the new design. The suggested instruction set will serve as standard in future processor design for Robotics and Computer Graphics. The design is verified using FPGA implementation. Also a comparative performance improvement of the proposed design is studied compared to a uni-processor approach for possibilities of its real time application.