• Title/Summary/Keyword: distributed memory system

Efficient Implementation of an Extreme Eigenvalue Problem on Cray T3E (Cray T3E에서 극한 고유치문제의 효과적인 수행)

  • 김선경
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2000.11a / pp.480-483 / 2000
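  • Many engineering applications require the smallest or largest eigenvalues of large sparse matrices, and projection onto a Krylov subspace is a widely used method for this. The Lanczos method can be used for symmetric matrices, and the biorthogonal Lanczos method for nonsymmetric matrices. These existing algorithms are not effective on newly proposed parallel processing systems. Among parallel computers with many processors, distributed memory systems in particular require that the time needed for data communication between processors be reduced. In this paper, the existing Lanczos algorithm is modified to reduce its synchronization points and to increase the granularity available for parallelization, thereby reducing the time required for data communication on the Cray T3E, an MPP. When many processors are used, the modified algorithm shows better speedup than the existing algorithm.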
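
To make the synchronization issue concrete, the following is a minimal sketch of the textbook Lanczos recurrence for a symmetric matrix. It is not the modified algorithm from the paper, which the abstract does not spell out. The helper names (`matvec`, `dot`, `lanczos`) and the small test matrix are assumptions made for illustration. Each call to `dot` would be a global reduction on a distributed memory machine, so the standard recurrence has two synchronization points per iteration.

```python
import math


def matvec(A, v):
    """Dense matrix-vector product (stands in for a sparse, distributed one)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    """Inner product; on a distributed memory system this is a global reduction."""
    return sum(a * b for a, b in zip(u, v))


def lanczos(A, v0, m):
    """Run up to m steps of the textbook Lanczos recurrence on symmetric A.

    Returns (alphas, betas): the diagonal and off-diagonal entries of the
    tridiagonal matrix T whose extreme eigenvalues approximate those of A.
    """
    norm = math.sqrt(dot(v0, v0))
    v = [x / norm for x in v0]
    v_prev = [0.0] * len(v0)
    beta = 0.0
    alphas, betas = [], []
    for j in range(m):
        w = matvec(A, v)
        alpha = dot(w, v)                      # synchronization point 1
        w = [wi - alpha * vi - beta * pi
             for wi, vi, pi in zip(w, v, v_prev)]
        alphas.append(alpha)
        beta = math.sqrt(dot(w, w))            # synchronization point 2
        if j == m - 1 or beta < 1e-12:         # last step or invariant subspace
            break
        betas.append(beta)
        v_prev, v = v, [wi / beta for wi in w]
    return alphas, betas


# Hypothetical 3x3 symmetric test matrix (already tridiagonal).
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
alphas, betas = lanczos(A, [1.0, 0.0, 0.0], 3)
```

Because the second reduction depends on the result of the first, the two cannot be merged without restructuring the recurrence, which is the kind of change the abstract describes.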

Design of a Pipeline Processor for the Automated ECG Diagnosis in Real Time (실시간 심전도 자동진단을 위한 파이프라인 프로세서의 설계)

  • 이경중;윤형로;이명호
    • Journal of the Korean Institute of Telematics and Electronics / v.26 no.8 / pp.1217-1226 / 1989
  • This paper describes the design of a hardware system for real-time automatic diagnosis of ECG arrhythmia, based on a pipeline processor consisting of three microcomputers. ECG data are acquired by a 12-bit A/D converter with a hardware QRS-triggered detector. Four diagnostic parameters (heart rate, morphology, axis, and ST segment) are used for the classification and diagnosis of arrhythmia. The functions of the main CPU are distributed across and processed by the three microcomputers, so that effective data processing and real-time processing can be achieved with microcomputers. The interconnection structure, consisting of two common memory units, is designed to decrease the delay caused by data transfer between processors; with it, the delay time can be kept to 1% of one clock period.

A design of pipeline processor for real time ECG process (실시간 심전도 처리를 위한 파이프라인 프로세서의 설계)

  • Lee, Kyoung-Joong;Lee, Yoon-Sun;Yoon, Hyoung-Ro;Lee, Myoung-Ho
    • Proceedings of the KIEE Conference / 1988.07a / pp.731-733 / 1988
  • This paper describes the design of a hardware system for real-time automatic diagnosis of ECG arrhythmia, based on a pipeline processor consisting of three microcomputers. ECG data are acquired by a 12-bit A/D converter with a hardware QRS-triggered detector. Four diagnostic parameters (heart rate, morphology, axis, and ST segment) are used for the classification and diagnosis of arrhythmia. The functions of the main CPU are distributed across and processed by the three microcomputers, so that effective data processing and real-time processing can be achieved with microcomputers. The interconnection structure, consisting of two common memory units, is designed to decrease the delay caused by data transfer between processors; with it, the delay time can be kept to 1% of one clock period.

Self-Organized Distributed Networks as Identifier of Nonlinear Systems (비선형 시스템 식별기로서의 자율분산 신경망)

  • Choi, Jong-Soo;Kim, Hyong-Suk;Kim, Sung-Joong;Choi, Chang-Ho
    • Proceedings of the KIEE Conference / 1995.07b / pp.804-806 / 1995
  • This paper discusses Self-Organized Distributed Networks (SODN) as an identifier of nonlinear dynamical systems. The system identification structure employs a series-parallel model, and the identification procedure is based on a discrete-time formulation. Learning with the proposed SODN is fast and precise. These properties result from the local learning mechanism: each local network learns only the data in a subregion. The large memory requirements and low generalization capability for untrained regions, which are drawbacks of conventional local network learning, are overcome in the SODN. Extensive simulations show that the SODN is effective for identification of nonlinear dynamical systems.

Distributed Simulation for Cold rolling Control System in CEMTool (CEMTool환경에서의 분산시뮬레이션의 구현 및 냉간압연 제어시스템에서의 응용)

  • Lee, Tairi;Lee, Young-Sam;Lee, Kwan-Ho;Kwon, Wook-Hyun
    • Proceedings of the KIEE Conference / 2003.07d / pp.2591-2593 / 2003
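  • This paper proposes the architecture of a general-purpose distributed simulator that runs in the CEMTool environment. First, to enable distributed simulation, the whole system is partitioned in SIMTool into several subsystems in the form of parallel blocks. Then, in the CEMTool environment, the partitioned system goes through initialization, one-step-ahead simulation, and the distribute and ordering steps, after which independent C code and an executable file are generated for each subsystem. Multiple PCs each run their own subsystem independently while exchanging data with one another through reflective memory. The cold rolling system used as the test case in this paper is well suited to distributed processing because the amount of computation inside each subsystem far exceeds the amount of communication. The paper analyzes the distributed simulation results for the cold rolling system and shows that the proposed method yields a clear speedup.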

A Scalable OWL Horst Lite Ontology Reasoning Approach based on Distributed Cluster Memories (분산 클러스터 메모리 기반 대용량 OWL Horst Lite 온톨로지 추론 기법)

  • Kim, Je-Min;Park, Young-Tack
    • Journal of KIISE / v.42 no.3 / pp.307-319 / 2015
  • Current ontology studies use the Hadoop distributed storage framework to perform MapReduce algorithm-based reasoning for scalable ontologies. In this paper, however, we propose a novel approach for scalable Web Ontology Language (OWL) Horst Lite ontology reasoning, based on distributed cluster memories. Rule-based reasoning, which is frequently used for scalable ontologies, iteratively executes triple-format ontology rules until no new data is inferred. Therefore, when scalable ontology reasoning is performed on computer hard drives, the ontology reasoner suffers from performance limitations. In order to overcome this drawback, we propose an approach that loads the ontologies into distributed cluster memories, using Spark (a memory-based distributed computing framework), which executes the ontology reasoning. In order to implement an appropriate OWL Horst Lite ontology reasoning system on Spark, our method divides the scalable ontologies into blocks, loads each block into the cluster nodes, and subsequently handles the data in the distributed memories. We used the Lehigh University Benchmark (LUBM), which is used to evaluate ontology inference and search speed, to experimentally evaluate the methods suggested in this paper, which we applied to LUBM8000 (1.1 billion triples, 155 gigabytes). When compared with WebPIE, a representative MapReduce algorithm-based scalable ontology reasoner, the proposed approach showed a throughput improvement of 320% (62k/s) over WebPIE (19k/s).
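
The iterate-until-nothing-new loop mentioned in the abstract can be sketched in a few lines. This is a single-machine illustration only, assuming two RDFS-style rules (subclass transitivity and type inheritance) chosen for brevity; it is neither the OWL Horst Lite rule set nor the Spark implementation described in the paper, and the function name `infer` and the sample triples are hypothetical.

```python
def infer(triples):
    """Apply two illustrative rules repeatedly until no new triple is derived."""
    known = set(triples)
    while True:
        sub = [(s, o) for s, p, o in known if p == "subClassOf"]
        new = set()
        # Rule 1: (a subClassOf b), (b subClassOf d) => (a subClassOf d)
        for a, b in sub:
            for c, d in sub:
                if b == c:
                    new.add((a, "subClassOf", d))
        # Rule 2: (s type c), (c subClassOf d) => (s type d)
        for s, p, o in known:
            if p == "type":
                for c, d in sub:
                    if o == c:
                        new.add((s, "type", d))
        new -= known          # keep only triples not seen before
        if not new:           # fixed point reached: nothing left to infer
            return known
        known |= new


# Hypothetical sample data.
facts = {
    ("Student", "subClassOf", "Person"),
    ("Person", "subClassOf", "Agent"),
    ("alice", "type", "Student"),
}
closure = infer(facts)
```

Every pass re-reads the full triple set, which is why keeping the data in memory rather than on disk matters for this style of reasoning.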

Energy/Distance Estimation-based and Distributed Selection/Migration of Cluster Heads in Wireless Sensor Networks (센서 네트워크의 에너지 및 거리 추정 기반 분산 클러스터 헤드 선정과 이주 방법)

  • Kim, Dong-Woo;Park, Jong-Ho;Lee, Tae-Jin
    • Journal of the Institute of Electronics Engineers of Korea TC / v.44 no.3 s.357 / pp.18-25 / 2007
  • In sensor networks, sensor nodes have limited computational capacity, power, and memory. Thus energy efficiency is one of the most important requirements. How to extend the lifetime of wireless sensor networks has been widely discussed in recent years. One of the most effective approaches to power conservation, network scalability, and load balancing is the clustering technique. The function of a cluster head is to collect and route the messages of all the nodes within its cluster. Cluster heads must be changed periodically for low energy consumption and load distribution. In this paper, we propose an energy-aware cluster head selection algorithm and the Distance Estimation-based distributed Clustering Algorithm (DECA) for wireless sensor networks, which exchanges cluster heads for lower energy consumption by distance estimation. Our simulation results show that DECA can improve the system lifetime of sensor networks up to three times compared with the conventional scheme.

Speech Interactive Agent on Car Navigation System Using Embedded ASR/DSR/TTS

  • Lee, Heung-Kyu;Kwon, Oh-Il;Ko, Han-Seok
    • Speech Sciences / v.11 no.2 / pp.181-192 / 2004
  • This paper presents an efficient speech interactive agent rendering smooth car navigation and Telematics services, by employing embedded automatic speech recognition (ASR), distributed speech recognition (DSR), and text-to-speech (TTS) modules, all while enabling safe driving. A speech interactive agent is essentially a conversational tool providing command and control functions to drivers, such as enabling navigation tasks, audio/video manipulation, and E-commerce services through natural voice/response interactions between user and interface. While the benefits of automatic speech recognition and speech synthesizers have become well known, the hardware resources involved are often limited and internal communication protocols are too complex to achieve real-time responses. As a result, performance degradation always exists in the embedded H/W system. To implement a speech interactive agent that accommodates the demands of user commands in real time, we propose to optimize the hardware-dependent architectural codes for speed-up. In particular, we propose to provide a composite solution through memory reconfiguration and efficient arithmetic operation conversion, as well as invoking an effective out-of-vocabulary rejection algorithm, all made suitable for system operation under limited resources.

Utilizing Channel Bonding-based M-VIA and Interval Cache on a Distributed VOD Server (효율적인 분산 VOD 서버를 위한 Channel Bonding 기반 M-VIA 및 인터벌 캐쉬의 활용)

  • Chung, Sang-Hwa;Oh, Soo-Cheol;Yoon, Won-Ju;Kim, Hyun-Pil;Choi, Young-In
    • The KIPS Transactions:PartA / v.12A no.7 s.97 / pp.627-636 / 2005
  • This paper presents a PC cluster-based distributed video on demand (VOD) server that minimizes the load of the interconnection network by adopting channel bonding-based M-VIA and the interval cache algorithm. Video data is distributed to the disks of each server node of the distributed VOD server, and each server node receives the data through the interconnection network and sends it to clients. The load of the interconnection network increases because of the large volume of video data transferred. We adopt two techniques to reduce the load of the interconnection network. First, an M-VIA supporting the channel bonding technique is adopted for the interconnection network. M-VIA, which is a user-level communication protocol that reduces the overhead of the TCP/IP protocol in cluster systems, minimizes the time spent in communicating. We increase the bandwidth of the interconnection network using the channel bonding technique with M-VIA. The channel bonding technique expands the bandwidth by sending data concurrently through multiple network cards. Second, the interval cache reduces traffic on the interconnection network by caching the video data transferred from the remote disks in main memory. Experiments using the distributed VOD server of this paper showed a maximum performance improvement of 30% compared with a distributed VOD server without channel bonding-based M-VIA and the interval cache, when used with a four-node PC cluster.

Application of Correlation-Aided DSA (CDSA) Technique to Fast Cell Search in IMT-2000 W-CDMA Systems

  • Kim, Byoung-Hoon;Jeong, Byeong-Kook;Lee, Byeong-Gi
    • Journal of Communications and Networks / v.2 no.1 / pp.58-68 / 2000
  • In this paper we introduce the correlation-aided distributed sample acquisition (CDSA) scheme for fast cell search in the IMT-2000 W-CDMA cellular system. The proposed scheme incorporates the state symbol correlation process into the comparison-correction based synchronization process of the original DSA scheme to enable fast acquisition even under very poor channel environments. For its realization, each mobile station (MS) has to store in its memory a set of state sample sequences, which are determined by the long-period scrambling sequences used in the system and the sampling interval of the state samples. CDSA-based cell search is carried out in two stages. First, the MS acquires the slot timing by using the primary synch code (PSC) and then identifies the igniter code, which conveys the state samples of the current cell. Second, the MS identifies the scrambling code and frame timing by taking the comparison-correction based synchronization approach and, if the identification is not done satisfactorily within a preset time, it initiates the state symbol correlation process, which correlates the received symbol sequence with the pre-stored state sample sequences for a successful identification. As the state symbol SNR is relatively high, the state symbol correlation process enables reliable synchronization even in very low chip-SNR environments. Simulation results show that the proposed CDSA scheme outperforms the 3GPP 3-step approach, requiring about 7 dB less signal power to achieve the same acquisition time performance in low-SNR environments. Furthermore, it turns out to be very robust in the typical synchronization environment where a large frequency offset exists.