• Title/Summary/Keyword: shared parallel systems


Study of an In-order SMT Architecture and Grouping Schemes

  • Moon, Byung-In;Kim, Moon-Gyung;Hong, In-Pyo;Kim, Ki-Chang;Lee, Yong-Surk
    • International Journal of Control, Automation, and Systems / v.1 no.3 / pp.339-350 / 2003
  • In this paper, we propose a simultaneous multithreading (SMT) architecture that improves instruction throughput by exploiting instruction level parallelism (ILP) and thread level parallelism (TLP). The proposed architecture issues and completes instructions belonging to the same thread in exact program order. The issue and completion policy greatly reduces the design complexity and hardware cost of our architecture, compared with others that employ out-of-order issue and completion. On the other hand, when the instructions belong to different threads, the issue and completion orders for those instructions may not necessarily be identical to the fetch order. The processor issues instructions simultaneously from multiple threads to functional units by exploiting ILP and TLP, and by dynamic resource sharing. That parallel execution notably improves performance and resource utilization with minimal additional hardware cost over the conventional superscalar processors. This paper proposes an SMT architecture with grouping as well as one without grouping. Without grouping, all threads dynamically and flexibly share most resources. On the other hand, in the SMT architecture with grouping, in which resources and threads are divided into several groups for design simplification, resources are shared only among threads belonging to the same group as those resources. Simulation results show that our processors with four and eight threads improve performance by three or more times over the conventional superscalar processor with comparable execution resources and policies, and that reasonable grouping reduces the design complexity of SMT processors with little negative effect on performance.
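The issue policy described in this abstract (strict program order within a thread, free interleaving across threads) can be illustrated with a small Python sketch. It is only a toy model under assumptions that are not from the paper: each instruction is a hypothetical `(name, ready_cycle)` pair, `issue_width` is an invented parameter, and threads are scanned in fixed priority order rather than with any fetch or grouping policy the authors may use.

```python
from collections import deque

def simulate_issue(threads, issue_width):
    """Toy cycle-by-cycle issue model (illustrative assumption, not the paper's design).

    threads: list of instruction lists in program order; each instruction is
             (name, ready_cycle), where ready_cycle is the first cycle in which
             the instruction could issue.
    Returns a list with one entry per cycle: the (thread_id, name) pairs issued.
    """
    queues = [deque(t) for t in threads]
    schedule = []
    cycle = 0
    while any(queues):
        issued = []
        for tid, queue in enumerate(queues):
            # In-order within a thread: only the oldest unissued instruction
            # may issue, so a stalled head blocks everything behind it.
            while queue and queue[0][1] <= cycle and len(issued) < issue_width:
                name, _ = queue.popleft()
                issued.append((tid, name))
        schedule.append(issued)
        cycle += 1
    return schedule

threads = [
    [("a0", 0), ("a1", 2), ("a2", 2)],  # thread 0 stalls until cycle 2
    [("b0", 0), ("b1", 0), ("b2", 1)],  # thread 1 is mostly ready
]
schedule = simulate_issue(threads, issue_width=2)
for cycle, issued in enumerate(schedule):
    print(cycle, issued)
```

In this example thread 1 fills the issue slots while thread 0 is stalled, so instructions from different threads issue out of fetch order even though each thread's own order is preserved.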

40Gb/s Forward Error Correction Architecture for Optical Communication System (광통신 시스템을 위한 40Gb/s Forward Error Correction 구조 설계)

  • Lee, Seung-Beom;Lee, Han-Ho
    • Journal of the Institute of Electronics Engineers of Korea SD / v.45 no.2 / pp.101-111 / 2008
  • This paper introduces a high-speed Reed-Solomon (RS) decoder with reduced hardware complexity and presents an RS-decoder-based FEC architecture for 40Gb/s optical communication systems. We introduce a new pipelined degree-computationless modified Euclidean (pDCME) algorithm architecture, which achieves high throughput with low hardware complexity. The proposed 16-channel RS FEC architecture consists of two 8-channel RS FEC architectures, each of which has 8 syndrome computation blocks and a single shared KES block. It reduces hardware complexity by about 30% compared to the conventional 16-channel 3-parallel FEC architecture, in which 4 syndrome computation blocks share a single KES block. The proposed RS FEC architecture has been designed and implemented in 0.18-μm CMOS technology with a supply voltage of 1.8 V. The results show that the total gate count is 250K and that the data processing rate is 5.1Gb/s at a clock frequency of 400MHz. The proposed area-efficient architecture can be readily applied to next-generation FEC devices for high-speed optical communications as well as wireless communications.

The Study of the Object Replication Management using Adaptive Duplication Object Algorithm (적응적 중복 객체 알고리즘을 이용한 객체 복제본 관리 연구)

  • 박종선;장용철;오수열
    • Journal of the Korea Society of Computer and Information / v.8 no.1 / pp.51-59 / 2003
  • In distributed object replication systems, it is effective to place an object on duplicate nodes, so that the object the nodes share has the same contents. Each node stores access information in its local cache as it accesses the system, and fetches and uses it when needed. Over time, however, coherence problems arise because the data can be updated by other nodes. To keep the system coherent, we therefore need a mechanism that manages the replicas so as to improve the performance and availability of the system effectively. In this paper, to keep coherence under shared memory conditions, we use the proposed adaptive duplication object (ADO) algorithm to maintain objects, which allows limited parallel performance with no additional cost other than the coherence cost. Also, to minimize the coherence maintenance cost, which is the biggest overhead in the duplication method, objects must be managed effectively with respect to the number and location of object replicas, which are the most important factors and determine the cost. We must also study the adaptive duplication object management mechanism, which will improve the overall run time.


Strains of abutment and bones on implant overdentures (임플란트 피개의치에서 지대주와 골의 변형률에 관한 연구)

  • Kim, Myung-Seok;Heo, Seong-Joo;Koak, Jai-Young;Kim, Sung-Kyun
    • The Journal of Korean Academy of Prosthodontics / v.47 no.2 / pp.222-231 / 2009
  • Statement of the problem: Over the past decades, conventional complete dentures have been used for various patients although their function is incomplete. Overdentures using dental implants could help improve denture function. Purpose: The purpose of this study was to compare the strains of the abutment and bone in implant overdentures between splinted and unsplinted types of prosthesis. Additionally, the strain values of a parallel placed implant model and a non-parallel placed implant model were compared. Material and methods: Two acrylic resin models were prepared and two implants were placed at the canine positions in each model. In the first model, the two implants were placed parallel. In the second model, the two implants were placed with 10 degree labiolingual divergence. Two types of abutment were connected to the fixtures alternately. One was a splinted type using a Hader bar; the other was an unsplinted type using ball abutments. Overdentures were fabricated with the corresponding attachment systems and seated on the abutments. Strains of the abutments and labial bone simulants were measured with electric resistance strain gauges when static loads from 100 N to 200 N were applied to the overdentures. Results: 1. Splinted overdentures using bar and clip showed higher absolute strain values, but the strain was compressive and the load was shared by the two implants (P<.05). 2. Unsplinted overdentures using ball and O-ring showed low absolute strain values (P<.05). 3. The labially inclined implant showed higher tensile strain values in the unsplinted type of prosthesis than in the splinted type. The lingually inclined implant showed rather low strain values under load (P<.05). 4. Overall, the non-parallel implant model showed higher absolute strain values than the parallel placed implant model (P<.05).

Cache Performance Analysis of Multiprocessor Systems for OLTP Applications based on a Memory-Resident DBMS (메모리 상주 DBMS 기반의 OLTP 응용을 위한 다중프로세서 시스템 캐쉬 성능 분석)

  • Chung, Yong-Wha;Hahn, Woo-Jong;Yoon, Suk-Han;Park, Jin-Won;Lee, Kang-Woo;Kim, Yang-Woo
    • Journal of KIISE:Computing Practices and Letters / v.6 no.4 / pp.383-392 / 2000
  • Currently, multiprocessors are evaluated almost exclusively with scientific applications. Commercial applications are rarely explored because it is difficult to obtain the source code of a commercial DBMS. Even when the source code is available, as for POSTGRES, understanding it well enough to perform detailed, meaningful performance evaluations is a daunting task for computer architects. To evaluate multiprocessors with commercial applications, we have developed our own DBMS, called EZDB. EZDB is a parallelized DBMS, loosely inspired by POSTGRES, that runs on top of a software architecture simulator. It is capable of executing parallel programs written in SQL. Unlike POSTGRES, EZDB is not intended as a prototype for a production-quality DBMS. Its purpose is to make it easy to run commercial applications on multiprocessor architectures and evaluate their performance. To illustrate the usefulness of EZDB, we present the cache performance data collected for the TPC-B benchmark on a shared-memory multiprocessor. The simulation results show that the data structures exhibit unique sharing characteristics and that their locality properties and working sets are very different from those in scientific applications.


Dynamic NAND Operation Scheduling for Flash Storage Controller Systems (플래시 저장장치 컨트롤러 시스템을 위한 동적 낸드 오퍼레이션 스케줄링)

  • Jeong, Jaehyeong;Song, Yong Ho
    • Journal of the Institute of Electronics and Information Engineers / v.50 no.6 / pp.188-198 / 2013
  • To increase its performance, NAND flash memory-based storage is composed of data buses that are each shared by a number of flash memories, and it uses a parallel technique that can carry out multiple flash memory operations simultaneously. Since storage performance is strongly influenced by the performance of each data bus, it is important to improve bus utilization by ensuring that the storage controller schedules operations effectively. However, this is difficult because the state of the buses changes dynamically: each operation has its own characteristics, with different timing, cost, and bus usage. Furthermore, a scheduling technique that increases bus utilization may cause unanticipated operation delay and waste of storage resources. In this study, we suggest various dynamic operation scheduling techniques that consider data bus performance and storage resource efficiency. The proposed techniques divide each operation into three different stages and schedule each stage depending on the characteristics of the operation and the dynamic status of the data bus. We applied the suggested techniques to the controller and verified them on an FPGA platform, and found that program operation decreased by 1.9% in comparison to that achieved by a static scheduling technique, and that bus utilization and throughput were approximately 4-7% and 4-19% higher, respectively.

Dynamic Virtual Ontology using Tags with Semantic Relationship on Social-web to Support Effective Search (효율적 자원 탐색을 위한 소셜 웹 태그들을 이용한 동적 가상 온톨로지 생성 연구)

  • Lee, Hyun Jung;Sohn, Mye
    • Journal of Intelligence and Information Systems / v.19 no.1 / pp.19-33 / 2013
  • In this research, the proposed Dynamic Virtual Ontology using Tags (DyVOT) supports dynamic search of resources according to user requirements, using tags from social-web-driven resources. Tags are generally defined as annotations, each a series of descriptive words, by social users who tag social information resources such as web pages, images, YouTube, videos, etc. Tags therefore characterize and mirror information resources, and as metadata they can be matched to resources. Consequently, we can extract semantic relationships between tags from the dependency of relationships between tags as representatives of resources. This is limited, however, because tags, which are usually marked by a series of words, contain allophonic synonyms and homonyms. Thus, research on folksonomies using tags has been applied to the classification of words by semantic-based allophonic synonyms. In addition, some research focuses on clustering and/or classification of resources by semantic-based relationships among tags. Nevertheless, this research is also limited because it focuses on semantic-based hyper/hypo relationships or clustering among tags without considering conceptual associative relationships between classified or clustered groups, which makes it difficult to search resources effectively according to user requirements. In this research, the proposed DyVOT uses tags to construct an ontology for effective search. We assume that tags are extracted from user requirements and are used to construct multiple sub-ontologies as combinations of some or all of the tags. In addition, the proposed DyVOT constructs an ontology based on hierarchical and associative relationships among tags for effective search of a solution. The ontology is composed of a static-ontology and a dynamic-ontology. 
The static-ontology defines semantic-based hierarchical hyper/hypo relationships among tags, as in (http://semanticcloud.sandra-siegel.de/), with a tree structure. From the static-ontology, the DyVOT extracts multiple sub-ontologies using multiple sub-tags, which are constructed from parts of the tags. The sub-ontologies are then constructed from the hierarchy paths that contain the sub-tags. To create the dynamic-ontology, it is necessary to define associative relationships among the multiple sub-ontologies extracted from the hierarchical relationships of the static-ontology. An associative relationship is defined by the resources shared between tags that are linked by the multiple sub-ontologies. The association is measured by the degree of shared resources allocated to the tags of the sub-ontologies. If the association value is larger than a threshold value, a new associative relationship among the tags is created. The associative relationships are used to merge the multiple sub-ontologies and construct a new hierarchy. To construct the dynamic-ontology, it is essential to define a new class that links two or more sub-ontologies; the class is generated by merging tags that are shown to be highly associative by their shared resources. The class is then applied to generate a new hierarchy with the extracted multiple sub-ontologies to create the dynamic-ontology. The newly created class is placed in the ontology and must belong to the dynamic-ontology, so it is used to form a new hyper/hypo hierarchy relationship between the class and the tags linked to the multiple sub-ontologies. Finally, the DyVOT is built from the newly defined associative relationships, which are extracted from the hierarchical relationships among tags. Resources are matched to the DyVOT, which narrows the search boundary and shortens the search paths. 
While a static data catalog (Dean and Ghemawat, 2004; 2008) searches resources statically according to user requirements, the proposed DyVOT searches resources dynamically using multiple sub-ontologies with parallel processing. In this light, the DyVOT improves the correctness and agility of search and decreases search effort by shortening the search path.
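The thresholded association step in this abstract can be sketched in Python as follows. The abstract does not give the formula for the "degree of shared resources", so this sketch assumes Jaccard overlap between the resource sets of two tags; the function names, the example tags, and the threshold value are all hypothetical.

```python
from itertools import combinations

def association(resources_a, resources_b):
    """Jaccard overlap of two resource sets (an assumed measure, not the paper's)."""
    union = resources_a | resources_b
    if not union:
        return 0.0
    return len(resources_a & resources_b) / len(union)

def associative_links(tag_resources, threshold):
    """Return (tag_a, tag_b, score) for every tag pair whose score exceeds threshold."""
    links = []
    for tag_a, tag_b in combinations(sorted(tag_resources), 2):
        score = association(tag_resources[tag_a], tag_resources[tag_b])
        if score > threshold:
            links.append((tag_a, tag_b, score))
    return links

# Hypothetical tags and the resources annotated with each of them.
tag_resources = {
    "jazz": {"r1", "r2", "r3"},
    "saxophone": {"r2", "r3", "r4"},
    "recipe": {"r5"},
}
links = associative_links(tag_resources, threshold=0.4)
print(links)
```

Here only the pair ("jazz", "saxophone") shares enough resources to create a new associative relationship; pairs with no shared resources score zero and stay unlinked.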

A Study on GPU-based Iterative ML-EM Reconstruction Algorithm for Emission Computed Tomographic Imaging Systems (방출단층촬영 시스템을 위한 GPU 기반 반복적 기댓값 최대화 재구성 알고리즘 연구)

  • Ha, Woo-Seok;Kim, Soo-Mee;Park, Min-Jae;Lee, Dong-Soo;Lee, Jae-Sung
    • Nuclear Medicine and Molecular Imaging / v.43 no.5 / pp.459-467 / 2009
  • Purpose: Maximum likelihood-expectation maximization (ML-EM) is a statistical reconstruction algorithm derived from a probabilistic model of the emission and detection processes. Although ML-EM has many advantages in accuracy and utility, its use is limited by the computational burden of iterative processing on a CPU (central processing unit). In this study, we developed a parallel computing technique on a GPU (graphics processing unit) for the ML-EM algorithm. Materials and Methods: Using a GeForce 9800 GTX+ graphics card and CUDA (compute unified device architecture), the projection and backprojection in the ML-EM algorithm were parallelized using NVIDIA's technology. The time spent in each iteration on computing the projection, the errors between measured and estimated data, and the backprojection was measured. The total time included the latency of data transmission between RAM and GPU memory. Results: The total computation times of the CPU- and GPU-based ML-EM with 32 iterations were 3.83 and 0.26 sec, respectively. In this case, the computing speed was improved about 15 times on the GPU. When the number of iterations increased to 1024, the CPU- and GPU-based computations took 18 min and 8 sec in total, respectively. The improvement was about 135 times and was caused by delay in the CPU-based computation after a certain number of iterations. On the other hand, the GPU-based computation showed very small variation in time per iteration owing to the use of shared memory. Conclusion: The GPU-based parallel computation for ML-EM significantly improved computing speed and stability. The developed GPU-based ML-EM algorithm could easily be modified for other imaging geometries.
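For orientation, the standard ML-EM update that the projection and backprojection steps belong to can be written as a short CPU-only Python sketch. This is not the authors' CUDA code: the tiny system matrix, the measured data, and the function name `mlem` are illustrative assumptions. The loops over detector bins and pixels are the parts a GPU implementation would typically run in parallel.

```python
def mlem(system_matrix, measured, n_iter):
    """Standard ML-EM update on a dense system matrix (illustrative sketch).

    system_matrix[i][j]: contribution of pixel j to detector bin i.
    measured[i]: measured counts in detector bin i.
    """
    n_bins = len(system_matrix)
    n_pixels = len(system_matrix[0])
    # Sensitivity of each pixel: column sums of the system matrix.
    sensitivity = [sum(system_matrix[i][j] for i in range(n_bins))
                   for j in range(n_pixels)]
    estimate = [1.0] * n_pixels
    for _ in range(n_iter):
        # Forward projection: estimated counts in every detector bin.
        projected = [sum(a * x for a, x in zip(row, estimate))
                     for row in system_matrix]
        # Ratio between measured and estimated data.
        ratio = [y / p if p > 0 else 0.0 for y, p in zip(measured, projected)]
        # Backprojection of the ratio into image space.
        back = [sum(system_matrix[i][j] * ratio[i] for i in range(n_bins))
                for j in range(n_pixels)]
        # Multiplicative update, normalized by the sensitivity.
        estimate = [x * b / s if s > 0 else 0.0
                    for x, b, s in zip(estimate, back, sensitivity)]
    return estimate

# Hypothetical 2-bin, 2-pixel system whose exact solution is [2, 3].
system_matrix = [[1.0, 1.0], [1.0, 0.0]]
measured = [5.0, 2.0]
estimate = mlem(system_matrix, measured, n_iter=200)
print(estimate)

# Speedups implied by the timings reported in the abstract.
speedup_32 = 3.83 / 0.26        # about 15 times
speedup_1024 = (18 * 60) / 8    # 18 min vs 8 sec, 135 times
print(round(speedup_32, 1), speedup_1024)
```

The last lines simply redo the arithmetic behind the reported speedups from the timings given in the abstract.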