• Title/Summary/Keyword: Memory improvement

Efficient Schemes for Scaling Ring Bandwidth in Ring-based Multiprocessor System (링 구조 다중프로세서 시스템에서 링 대역폭 확장을 위한 효율적인 방안)

  • Jang, Byoung-Soon;Chung, Sung-Woo;Jhang, Seong-Tae;Jhon, Chu-Shik
    • Journal of KIISE: Computer Systems and Theory / v.27 no.2 / pp.177-187 / 2000
  • In recent years, many systems adopting a ring topology with high-speed unidirectional point-to-point links have emerged to overcome the limits of the bus as the interconnection network of clustered multiprocessor systems. However, the rapid increase in processor speed and the performance improvement of local buses and memory systems limit the scalability of systems built on point-to-point links of standard bandwidth, so extending the link bandwidth becomes necessary. In this paper, we adopt the PANDA system, a clustering-based multiprocessor system, as the base model. By simulating a model that adopts commercial processor and local bus specifications, we show that the point-to-point link is the bottleneck of system performance and that a bandwidth expansion of more than 200% is needed. Developing a new point-to-point link with doubled bandwidth requires excessive design cost and time. As an alternative to doubling the link bandwidth, we propose several ways to implement a dual ring - a simple dual ring, a transaction-separated dual ring, and a direction-separated dual ring - using off-the-shelf point-to-point links with the IEEE standard bandwidth. We analyze the pros and cons of each model compared with a doubled-bandwidth single ring by simulation.
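
To make the direction-separated dual ring idea above concrete, here is a minimal sketch comparing hop counts on a single unidirectional ring and a dual ring that routes each transaction in the shorter direction. The node count and the shortest-direction routing policy are illustrative assumptions; the paper's own evaluation is a bandwidth simulation of the PANDA model, not this hop-count toy.

```python
# Hedged sketch: hop counts on a single unidirectional ring versus a
# direction-separated dual ring. Node count and routing policy are assumptions.

def hops_single_ring(src: int, dst: int, n: int) -> int:
    """Unidirectional ring: always travel in one direction."""
    return (dst - src) % n

def hops_direction_separated(src: int, dst: int, n: int) -> int:
    """Dual ring with one ring per direction: pick the shorter direction."""
    forward = (dst - src) % n
    backward = (src - dst) % n
    return min(forward, backward)

if __name__ == "__main__":
    n = 8  # assumed number of cluster nodes on the ring
    for src, dst in [(0, 1), (0, 4), (0, 7)]:
        print(src, "->", dst,
              "single:", hops_single_ring(src, dst, n),
              "dual:", hops_direction_separated(src, dst, n))
```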

A Scalable OWL Horst Lite Ontology Reasoning Approach based on Distributed Cluster Memories (분산 클러스터 메모리 기반 대용량 OWL Horst Lite 온톨로지 추론 기법)

  • Kim, Je-Min;Park, Young-Tack
    • Journal of KIISE / v.42 no.3 / pp.307-319 / 2015
  • Current ontology studies use the Hadoop distributed storage framework to perform map-reduce algorithm-based reasoning over scalable ontologies. In this paper, however, we propose a novel approach for scalable Web Ontology Language (OWL) Horst Lite ontology reasoning based on distributed cluster memories. Rule-based reasoning, which is frequently used for scalable ontologies, iteratively executes triple-format ontology rules until no new triples are inferred. Therefore, when scalable ontology reasoning is performed on computer hard drives, the ontology reasoner suffers from performance limitations. In order to overcome this drawback, we propose an approach that loads the ontologies into distributed cluster memories using Spark (a memory-based distributed computing framework), which then executes the ontology reasoning. In order to implement an appropriate OWL Horst Lite ontology reasoning system on Spark, our method divides the scalable ontologies into blocks, loads each block into the cluster nodes, and subsequently handles the data in the distributed memories. We used the Lehigh University Benchmark (LUBM), which is used to evaluate ontology inference and search speed, to experimentally evaluate the methods suggested in this paper, applying them to LUBM8000 (1.1 billion triples, 155 gigabytes). When compared with WebPIE, a representative map-reduce algorithm-based scalable ontology reasoner, the proposed approach showed a throughput improvement of 320% (62k/s) over WebPIE (19k/s).
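
As a rough illustration of the in-memory, fixpoint-style rule reasoning described above, the sketch below applies a single RDFS-like subclass rule to a triple RDD with Spark until no new triples appear. The rule choice, input path, and prefix strings are assumptions for illustration; they are not the paper's OWL Horst Lite rule set or data layout.

```python
# Hedged sketch: fixpoint rule application over triples kept in Spark memory.
from pyspark import SparkContext

sc = SparkContext(appName="rule-fixpoint-sketch")

# triples: RDD of (subject, predicate, object); assumed whitespace-separated input
triples = sc.textFile("hdfs:///ontology/triples.nt").map(lambda l: tuple(l.split()[:3]))

def apply_subclass_rule(trs):
    """rdfs9-style rule: (x rdf:type C) and (C rdfs:subClassOf D) => (x rdf:type D)."""
    sub = trs.filter(lambda t: t[1] == "rdfs:subClassOf").map(lambda t: (t[0], t[2]))
    typed = trs.filter(lambda t: t[1] == "rdf:type").map(lambda t: (t[2], t[0]))
    return typed.join(sub).map(lambda kv: (kv[1][0], "rdf:type", kv[1][1]))

closure = triples.cache()
while True:
    new = apply_subclass_rule(closure).subtract(closure).cache()
    if new.isEmpty():          # stop when no new triples are inferred
        break
    closure = closure.union(new).distinct().cache()

print("triples after closure:", closure.count())
```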

Mouse Single Oral Dose Toxicity Test of Chongmyung-tang Aqueous Extracts (총명탕(聰明湯) 열수(熱水) 추출물의 마우스 단회 경구투여 독성 실험)

  • Hwang, Ha-Yeon;Jang, Woo-Seok;Baek, Kyung-Min
    • The Journal of Internal Korean Medicine / v.35 no.1 / pp.37-49 / 2014
  • Objectives & Methods : The objective of this study was to evaluate the single oral dose toxicity of Chongmyung-tang (CMT) in ICR mice. The Korean traditional herbal prescription CMT has traditionally been used as a neuroprotective agent for the treatment of learning disability and for memory improvement. Lyophilized aqueous extracts of CMT (yield = 9.7%) were administered to female and male mice at oral doses of 2,000, 1,000 and 500 mg/kg (body weight) according to the recommendation of the Korea Food and Drug Administration (KFDA) guidelines. Animals were monitored for mortality, changes in body weight, clinical signs, and gross observations for 14 days after administration; upon necropsy, the organ weight and histopathology of 14 principal organs were examined. Results : We could not find any CMT extract treatment-related mortalities, clinical signs, changes in body or organ weight, or gross and histopathological findings in the 14 principal organs up to 2,000 mg/kg in either female or male mice, except for some accidental sporadic findings which did not show any obvious dose relations and most of which were also observed in the female and male vehicle control mice in this experiment. Conclusions : Based on the results of this experiment, the 50% lethal dose ($LD_{50}$) and approximate lethal dose (ALD) of CMT extracts after a single oral treatment in female and male mice can be considered to be over 2,000 mg/kg, and CMT is likely to be safe in humans.

A Distributed VOD Server Based on Virtual Interface Architecture and Interval Cache (버추얼 인터페이스 아키텍처 및 인터벌 캐쉬에 기반한 분산 VOD 서버)

  • Oh, Soo-Cheol;Chung, Sang-Hwa
    • Journal of KIISE: Computer Systems and Theory / v.33 no.10 / pp.734-745 / 2006
  • This paper presents a PC cluster-based distributed VOD server that minimizes the load of the interconnection network by adopting the VIA communication protocol and the interval cache algorithm. Video data is distributed to the disks of the distributed VOD server, and each server node receives the data through the interconnection network and sends it to clients. The load of the interconnection network increases because of the large amount of video data transferred. This paper develops a distributed VOD file system based on VIA to minimize the cost of using the interconnection network when accessing remote disks. VIA is a user-level communication protocol that removes the overhead of TCP/IP. This paper also improves the performance of the interconnection network by expanding the maximum transfer size of VIA. In addition, the interval cache reduces traffic on the interconnection network by caching, in main memory, the video data transferred from the disks of remote server nodes. Experiments using the distributed VOD server of this paper showed a maximum performance improvement of 21.3% compared with a distributed VOD server without VIA and the interval cache, when used with a four-node PC cluster.
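
The interval cache mentioned above keeps, in main memory, video data already fetched for a leading stream so that a closely following stream of the same video avoids the remote disk. The sketch below simplifies this idea to an LRU block cache; the block granularity, the capacity, and the LRU eviction policy are assumptions, and a real interval cache tracks pairs of streams explicitly rather than individual blocks.

```python
# Hedged sketch: a simplified, LRU-style stand-in for the interval cache idea.
from collections import OrderedDict

class IntervalCacheSketch:
    def __init__(self, capacity_blocks: int):
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()   # (video_id, block_no) -> data, in LRU order

    def read_block(self, video_id, block_no, fetch_remote):
        key = (video_id, block_no)
        if key in self.blocks:                    # trailing stream hits main memory
            self.blocks.move_to_end(key)
            return self.blocks[key]
        data = fetch_remote(video_id, block_no)   # leading stream goes to the remote disk
        self.blocks[key] = data
        if len(self.blocks) > self.capacity:      # evict the least recently used block
            self.blocks.popitem(last=False)
        return data
```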

Implementation of Massive FDTD Simulation Computing Model Based on MPI Cluster for Semi-conductor Process (반도체 검증을 위한 MPI 기반 클러스터에서의 대용량 FDTD 시뮬레이션 연산환경 구축)

  • Lee, Seung-Il;Kim, Yeon-Il;Lee, Sang-Gil;Lee, Cheol-Hoon
    • The Journal of the Korea Contents Association / v.15 no.9 / pp.21-28 / 2015
  • In the semiconductor process, a simulation is performed to detect defects by analyzing the behavior of impurities through physical-quantity calculations of the inner elements. The Finite-Difference Time-Domain (FDTD) algorithm is used to perform this simulation. As semiconductor devices are built from ever-smaller nanoscale elements, the size of the simulation keeps growing. As a result, a single processor such as a CPU or GPU cannot perform the simulation because of the massive size of the matrices, and even a computer consisting of multiple processors may fail to handle a massive FDTD problem. Parallel and distributed computing studies address these problems, but in the past only a single type of processor was used: a GPU performs fast but has limited memory, while a CPU performs more slowly than a GPU. To solve this problem, we implemented a computing model that can handle an FDTD simulation of any size on a cluster consisting of heterogeneous processors. We tested the simulation on these processors using MPI libraries based on point-to-point communication and verified that it operates correctly regardless of the number and type of nodes. We also analyzed the performance by measuring the total execution time and specific times of the simulation in each test.
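
As a hedged illustration of splitting an FDTD grid across cluster nodes with point-to-point MPI communication, the sketch below runs a 1-D FDTD update with ghost-cell exchange via mpi4py. The grid size, step count, update coefficients, and source placement are illustrative assumptions, not the paper's semiconductor model or its heterogeneous CPU/GPU partitioning.

```python
# Hedged sketch: 1-D FDTD with the domain split across MPI ranks and ghost
# cells exchanged through point-to-point communication.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_local = 1000                          # cells owned by this rank (assumed)
ez = np.zeros(n_local + 2)              # +2 ghost cells
hy = np.zeros(n_local + 2)
left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

for step in range(500):
    # exchange the boundary H value with the left/right neighbours
    comm.Sendrecv(sendbuf=hy[n_local:n_local + 1], dest=right,
                  recvbuf=hy[0:1], source=left)
    ez[1:n_local + 1] += 0.5 * (hy[1:n_local + 1] - hy[0:n_local])

    comm.Sendrecv(sendbuf=ez[1:2], dest=left,
                  recvbuf=ez[n_local + 1:n_local + 2], source=right)
    hy[1:n_local + 1] += 0.5 * (ez[2:n_local + 2] - ez[1:n_local + 1])

    if rank == 0 and step == 0:
        ez[n_local // 2] = 1.0          # simple point source on rank 0

print(f"rank {rank}: max |Ez| = {np.abs(ez).max():.3f}")
```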

Properties of $RuO_2$ Thin Films for Bottom Electrode in Ferroelectric Memory by Using the RF Sputtering (RF Sputtering 법으로 제작한 강유전체 메모리의 하부전극용$RuO_2$ 박막의 특성에 관한 연구)

  • 강성준;정양희
    • Journal of the Korea Institute of Information and Communication Engineering / v.4 no.5 / pp.1127-1134 / 2000
  • $RuO_2$ thin films are prepared by RF magnetron reactive sputtering, and their crystallization, microstructure, surface roughness, and resistivity are studied for various O2/(Ar+O2) ratios and substrate temperatures. As the O2/(Ar+O2) ratio decreases and the substrate temperature increases, the preferred growth plane of the $RuO_2$ thin films changes from (110) to (101). With an increase of the O2/(Ar+O2) ratio from 20% to 50%, the surface roughness and the resistivity of the $RuO_2$ thin films increase from 2.38 nm to 7.81 nm and from $103.6\,\mu\Omega{\cdot}cm$ to $227\,\mu\Omega{\cdot}cm$, respectively, but the deposition rate decreases from 47 nm/min to 17 nm/min. On the other hand, as the substrate temperature increases from room temperature to $500^{\circ}C$, the resistivity decreases from $210.5\,\mu\Omega{\cdot}cm$ to $93.7\,\mu\Omega{\cdot}cm$. The $RuO_2$ thin film deposited at $300^{\circ}C$ shows an excellent surface roughness of 2.38 nm. As the annealing temperature increases in the range between $400^{\circ}C$ and $650^{\circ}C$, the resistivity decreases because of the improvement in crystallinity. We find that the $RuO_2$ thin film deposited at an O2/(Ar+O2) ratio of 20% and a substrate temperature of $300^{\circ}C$ shows an excellent combination of surface smoothness and low resistivity, so it is well qualified as a bottom electrode for ferroelectric thin films.

A Study on the Procedure for Constructing Linked Open Data of Records Information by Using Open Source Tool (오픈소스 도구를 이용한 기록정보 링크드 오픈 데이터 구축 절차 연구)

  • Ha, Seung Rok;Yim, Jin Hee;Rieh, Hae-young
    • Journal of the Korean Society for Information Management / v.34 no.1 / pp.341-371 / 2017
  • Recently, the web service environment has changed from document-centered to data-oriented, and Linked Open Data (LOD) sits at the core of this new environment. This paper examines specific procedures and methods for building LOD of records information in accordance with this trend. Considering the service sustainability of small-scale archives, the paper demonstrates an LOD building process using open-source software. To this end, a 5-step service framework for LOD construction was proposed and applied to a collection of diary records from the 'Human and Memory Archive'. A proof of concept (POC) using the open-source tools Protege and Apache Jena Fuseki was conducted according to the proposed 5-step framework. After establishing the LOD of records information with the open-source software, the connection with external LOD through interlinking and SPARQL search was successfully performed. In addition, considerations for archives constructing LOD, including improving the quality of content information and the role of the archivist, were suggested based on the understanding obtained through the LOD construction process for records information.
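
Once records information has been loaded into an Apache Jena Fuseki instance as described above, it can be queried over SPARQL. The sketch below shows such a query from Python; the endpoint URL, dataset name, and the dcterms:title property are assumptions for illustration, not the actual 'Human and Memory Archive' vocabulary.

```python
# Hedged sketch: querying a records-information LOD dataset served by a local
# Apache Jena Fuseki endpoint (assumed URL and dataset name).
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://localhost:3030/records/sparql")  # assumed Fuseki dataset
sparql.setQuery("""
    PREFIX dcterms: <http://purl.org/dc/terms/>
    SELECT ?record ?title WHERE {
        ?record dcterms:title ?title .
    } LIMIT 10
""")
sparql.setReturnFormat(JSON)

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["record"]["value"], "-", row["title"]["value"])
```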

Utilizing Channel Bonding-based M-VIA and Interval Cache on a Distributed VOD Server (효율적인 분산 VOD 서버를 위한 Channel Bonding 기반 M-VIA 및 인터벌 캐쉬의 활용)

  • Chung, Sang-Hwa;Oh, Soo-Cheol;Yoon, Won-Ju;Kim, Hyun-Pil;Choi, Young-In
    • The KIPS Transactions: Part A / v.12A no.7 s.97 / pp.627-636 / 2005
  • This paper presents a PC cluster-based distributed video-on-demand (VOD) server that minimizes the load of the interconnection network by adopting channel bonding-based M-VIA and the interval cache algorithm. Video data is distributed to the disks of each server node of the distributed VOD server, and each server node receives the data through the interconnection network and sends it to clients. The load of the interconnection network increases because of the large volume of video data transferred. We adopt two techniques to reduce the load of the interconnection network. First, a channel bonding technique supporting M-VIA is adopted for the interconnection network. M-VIA, a user-level communication protocol that reduces the overhead of the TCP/IP protocol in cluster systems, minimizes the time spent in communication. We increase the bandwidth of the interconnection network by using the channel bonding technique with M-VIA; the channel bonding technique expands the bandwidth by sending data concurrently through multiple network cards. Second, the interval cache reduces traffic on the interconnection network by caching, in main memory, the video data transferred from the remote disks. Experiments using the distributed VOD server of this paper showed a maximum performance improvement of 30% compared with a distributed VOD server without channel bonding-based M-VIA and the interval cache, when used with a four-node PC cluster.
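
To illustrate the channel bonding idea, i.e., expanding bandwidth by pushing one transfer through several network cards at once, the sketch below splits a buffer across multiple sockets and sends the chunks concurrently. It is only a rough TCP analogy; the paper bonds channels underneath M-VIA, a user-level protocol, and the peer addresses and framing here are assumptions.

```python
# Hedged sketch: one logical transfer split into chunks pushed concurrently
# over several sockets (one per NIC), illustrating channel bonding.
import socket
import threading

def _send_one(host, port, index, part):
    with socket.create_connection((host, port)) as s:
        # simple framing (assumed): chunk index + length, then the payload
        s.sendall(index.to_bytes(4, "big") + len(part).to_bytes(8, "big"))
        s.sendall(part)   # receiver reassembles chunks by index

def send_bonded(data: bytes, endpoints):
    """Split `data` into len(endpoints) chunks and send them in parallel."""
    chunk = (len(data) + len(endpoints) - 1) // len(endpoints)
    threads = []
    for i, (host, port) in enumerate(endpoints):
        part = data[i * chunk:(i + 1) * chunk]
        t = threading.Thread(target=_send_one, args=(host, port, i, part))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()

# Example (assumed addresses of two NICs on the peer node):
# send_bonded(video_block, [("10.0.0.2", 9000), ("10.0.1.2", 9000)])
```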

Parallel Cell-Connectivity Information Extraction Algorithm for Ray-casting on Unstructured Grid Data (비정렬 격자에 대한 광선 투사를 위한 셀 사이 연결정보 추출 병렬처리 알고리즘)

  • Lee, Jihun;Kim, Duksu
    • Journal of the Korea Computer Graphics Society / v.26 no.1 / pp.17-25 / 2020
  • We present a novel multi-core CPU based parallel algorithm for the cell-connectivity information extraction algorithm, which is one of the preprocessing steps for volume rendering of unstructured grid data. We first check the synchronization issues when parallelizing the prior serial algorithm naively. Then, we propose a 3-step parallel algorithm that achieves high parallelization efficiency by removing synchronization in each step. Also, our 3-step algorithm improves the cache utilization efficiency by increasing the spatial locality for the duplicated triangle test process, which is the core operation of building cell-connectivity information. We further improve the efficiency of our parallel algorithm by employing a memory pool for each thread. To check the benefit of our approach, we implemented our method on a system consisting of two octa-core CPUs and measured the performance. As a result, our method shows continuous performance improvement as we add threads. Also, it achieves up to 82.9 times higher performance compared with the prior serial algorithm when we use thirty-two threads (sixteen physical cores). These results demonstrate the high parallelization efficiency and high cache utilization efficiency of our method. Also, it validates the suitability of our algorithm for large-scale unstructured data.
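
For readers unfamiliar with cell-connectivity extraction, the sketch below shows a simplified, serial analogue of the idea for a tetrahedral grid: emit the triangular faces of every cell, match duplicated faces, and record the two owners of each shared face as neighbours. The paper's parallel 3-step scheme, its cache-locality ordering, and its per-thread memory pools are not reproduced here.

```python
# Hedged sketch: serial cell-connectivity extraction by shared-face matching.
from collections import defaultdict
from itertools import combinations

def cell_connectivity(cells):
    """cells: list of 4-tuples of vertex ids; returns neighbour cell ids per cell."""
    # Step 1: emit every triangular face together with the cell that owns it.
    face_owner = defaultdict(list)
    for cid, cell in enumerate(cells):
        for face in combinations(sorted(cell), 3):   # canonical face key
            face_owner[face].append(cid)

    # Steps 2-3: a face shared by exactly two cells makes them neighbours.
    neighbours = [[] for _ in cells]
    for owners in face_owner.values():
        if len(owners) == 2:
            a, b = owners
            neighbours[a].append(b)
            neighbours[b].append(a)
    return neighbours

# Example: two tetrahedra sharing the face (1, 2, 3)
print(cell_connectivity([(0, 1, 2, 3), (1, 2, 3, 4)]))   # -> [[1], [0]]
```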

Performance Evaluation of the GPU Architecture Executing Parallel Applications (병렬 응용프로그램 실행 시 GPU 구조에 따른 성능 분석)

  • Choi, Hong-Jun;Kim, Cheol-Hong
    • The Journal of the Korea Contents Association / v.12 no.5 / pp.10-21 / 2012
  • The role of the GPU has evolved from graphics-specific processing to general-purpose processing with the development of the unified shader core architecture. In particular, execution methods for general-purpose parallel applications using the GPU have been researched intensively, since the parallel hardware architecture can be utilized efficiently when such applications are executed. However, the current GPU architecture has limitations in executing general-purpose parallel applications, since the GPU is not yet specialized for general-purpose computing. To improve GPU performance when general-purpose parallel applications are executed, the GPU architecture should evolve. In this work, we analyze GPU performance as the architecture varies in the number of cores and the clock frequency. Our simulation results show that GPU performance improves by up to 125.8% and 16.2% as the number of cores and the clock frequency increase, respectively. However, the improvement in GPU performance saturates even as the number of cores and the clock frequency continue to increase, since data cannot be supplied to the GPU fast enough due to the limit of memory bandwidth. Consequently, to accomplish high performance on the GPU, computational resources and memory bandwidth must be considered together.
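
A small roofline-style calculation illustrates the saturation the abstract reports: once the compute roof (cores x clock x FLOPs per cycle) exceeds the memory roof (bandwidth x arithmetic intensity), adding cores or raising the clock no longer helps. All numbers below are illustrative assumptions, not the parameters of the simulated GPU.

```python
# Hedged sketch: roofline-style estimate of attainable throughput.
def attainable_gflops(cores, clock_ghz, flops_per_core_cycle=2,
                      bandwidth_gbs=150.0, arithmetic_intensity=2.0):
    """Attainable GFLOP/s = min(compute roof, memory bandwidth * FLOPs per byte)."""
    compute_roof = cores * clock_ghz * flops_per_core_cycle
    memory_roof = bandwidth_gbs * arithmetic_intensity
    return min(compute_roof, memory_roof)

for cores in (60, 120, 240, 480):
    print(cores, "cores ->", attainable_gflops(cores, 1.0), "GFLOP/s")
# Output saturates at 300 GFLOP/s once the compute roof exceeds the memory roof.
```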