• Title/Summary/Keyword: Data-intensive processing

Performance Analysis and Identifying Characteristics of Processing-in-Memory System with Polyhedral Benchmark Suite (프로세싱 인 메모리 시스템에서의 PolyBench 구동에 대한 동작 성능 및 특성 분석과 고찰)

  • Jeonggeun Kim
    • Journal of the Semiconductor & Display Technology / v.22 no.3 / pp.142-148 / 2023
  • In this paper, we identify performance issues in executing PolyBench compute kernels, the core computational units of data-intensive workloads such as deep learning, on Processing-in-Memory (PIM) devices. Using our in-house simulator, we measured and compared the performance metrics of these workloads on systems based on traditional out-of-order and in-order processors and on PIM-based systems. The PIM-based system improves performance over the other computing models for kernels with short-term data reuse. However, some kernels perform poorly on PIM-based systems, which lack a multi-layer cache hierarchy, because of their long-term data reuse. Our evaluation and analysis therefore suggest that further research should consider dynamic, workload-adaptive approaches to overcome the performance degradation caused by kernels with long-term data reuse and hidden data locality.
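
The distinction the abstract draws between short-term and long-term data reuse can be made concrete with a reuse-distance (LRU stack distance) measurement. The sketch below is a minimal illustration of that metric, not the paper's in-house simulator; the trace is a toy stand-in for a PolyBench kernel.

```python
from collections import OrderedDict

def reuse_distances(trace):
    """LRU stack (reuse) distance of every access in an address trace.

    Small distances indicate short-term reuse, which the abstract links
    to good PIM performance; large or infinite distances indicate
    long-term reuse that benefits from a multi-level cache hierarchy.
    """
    stack = OrderedDict()                    # keys ordered oldest -> newest
    distances = []
    for addr in trace:
        if addr in stack:
            # Number of distinct addresses touched since the last use.
            distances.append(list(reversed(stack)).index(addr))
            stack.move_to_end(addr)
        else:
            distances.append(float("inf"))   # cold miss
            stack[addr] = None
    return distances

# Toy trace sweeping an 8-element array twice, PolyBench-style:
print(reuse_distances(list(range(8)) * 2))
# First sweep -> all inf (cold); second sweep -> distance 7 everywhere,
# i.e. long-term reuse that a cache-less PIM system cannot exploit.
```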

A Memory Intensive Real-time 3x3 Neighborhood Processor for Image Processing (Memory Intensive 실시간 영상신호처리용 3 × 3 Neighborhood VLSI 처리기)

  • 김진홍; 남철우; 우성일; 김용태
    • Journal of the Korean Institute of Telematics and Electronics / v.27 no.6 / pp.963-971 / 1990
  • This paper proposes a memory-intensive VLSI architecture for realizing a real-time 3x3 neighborhood processor based on distributed arithmetic. The proposed architecture is characterized by bit-serial, multi-kernel parallel processing that exploits pixel- and kernel-level parallelism and concurrency. The chip implements eight neighborhood processing elements in parallel, together with efficient input and output modules that operate concurrently. Besides the architectural design of the neighborhood processor, a design methodology based on the module generator concept has been considered, and MOGOT (MOdule Generator Oriented VLSI design Tool) has been constructed on a workstation. Based on this design environment, it has been shown that the main parts of the suggested architecture can be designed efficiently using 2-μm double-metal CMOS technology, including the input delay and data conversion module, the look-up table for the inner-product operation, the carry-save accumulator, the output data converter and delay module, and the control module.
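
Distributed arithmetic, the technique this architecture is built on, replaces the nine multiplications of a 3x3 inner product with bit-serial look-ups into a precomputed table. A minimal sketch of that idea follows; the kernel coefficients and the 8-bit pixel width are illustrative assumptions, not the paper's design.

```python
KERNEL = [1, 2, 1, 2, 4, 2, 1, 2, 1]    # assumed 3x3 coefficients
BITS = 8                                 # assumed pixel width

# LUT[p] = sum of the kernel coefficients selected by 9-bit pattern p.
LUT = [sum(c for i, c in enumerate(KERNEL) if p >> i & 1) for p in range(512)]

def da_inner_product(pixels):
    """Bit-serial 3x3 inner product: one LUT access per bit plane."""
    acc = 0
    for b in range(BITS):                # one bit plane per "cycle"
        pattern = sum(((px >> b) & 1) << i for i, px in enumerate(pixels))
        acc += LUT[pattern] << b         # shift-accumulate (carry-save in HW)
    return acc

window = [10, 20, 30, 40, 50, 60, 70, 80, 90]
assert da_inner_product(window) == sum(c * p for c, p in zip(KERNEL, window))
```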

Range Segmentation of Dynamic Offloading (RSDO) Algorithm by Correlation for Edge Computing

  • Kang, Jieun; Kim, Svetlana; Kim, Jae-Ho; Sung, Nak-Myoung; Yoon, Yong-Ik
    • Journal of Information Processing Systems / v.17 no.5 / pp.905-917 / 2021
  • In recent years, edge computing technology, built from Internet of Things (IoT) devices with embedded sensors, has improved significantly for monitoring, detection, and management in environments where big data is commercialized. The main focus of edge computing is data optimization and task offloading, driven by the development of data- and task-intensive applications. However, existing offloading approaches do not consider the correlations and associations between data and tasks in edge computing. Offloading that is segmented without considering the interaction between data and tasks can lead to data loss and delays when work moves from edge to edge. This article proposes a range segmentation of dynamic offloading (RSDO) algorithm that isolates the offloading range and the collaborating edge nodes around each edge node to address this issue. The RSDO algorithm groups highly correlated data and tasks according to the cause of the overload and dynamically distributes offloading ranges according to the state of the cooperating nodes. The segmentation improves the overall performance of edge nodes, balances the edge computing load, and reduces data loss and average latency.
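
The grouping step described here, keeping highly correlated data and tasks in the same offloading range, can be sketched as follows. The correlation threshold and the greedy least-loaded assignment are illustrative assumptions, not the published RSDO algorithm.

```python
import numpy as np

def group_by_correlation(profiles, threshold=0.8):
    """Greedy grouping: a task joins a group if its data-usage profile
    correlates strongly with the group's first member."""
    groups = []
    for i, p in enumerate(profiles):
        for g in groups:
            if np.corrcoef(profiles[g[0]], p)[0, 1] >= threshold:
                g.append(i)
                break
        else:
            groups.append([i])
    return groups

def assign_ranges(groups, node_capacities):
    """Send each group, largest first, to the least-loaded cooperating node."""
    load = [0.0] * len(node_capacities)
    placement = {}
    for g in sorted(groups, key=len, reverse=True):
        n = min(range(len(load)), key=lambda k: load[k] / node_capacities[k])
        load[n] += len(g)
        placement[tuple(g)] = n
    return placement

profiles = np.random.default_rng(0).random((6, 20))   # 6 tasks, 20 data items
print(assign_ranges(group_by_correlation(profiles), [1.0, 2.0]))
```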

On the Performance of Oracle Grid Engine Queuing System for Computing Intensive Applications

  • Kolici, Vladi; Herrero, Albert; Xhafa, Fatos
    • Journal of Information Processing Systems / v.10 no.4 / pp.491-502 / 2014
  • In this paper, we present research results on computing-intensive applications using modern high-performance architectures, from the perspective of high computational needs. Computing-intensive applications are an important family of applications in the distributed computing domain. They have been studied using different distributed computing paradigms and infrastructures, and they are distinguished by their demanding need for CPU computation, independently of the amount of data associated with the problem instance. Among them are simulation-based applications, which aim to maximize system resources for processing large computations. In this work, we consider an application that simulates scheduling and resource allocation in a Grid computing system using genetic algorithms; a rather large number of simulation runs is needed to extract meaningful statistics about the behavior of the results. We study the performance of Oracle Grid Engine for this application running on a cluster with high computing capacity. Several scenarios were generated to measure response time and queuing time under different workloads and numbers of nodes in the cluster.
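
The response-time and queuing-time metrics measured in the paper can be mimicked with a tiny queue simulation. The M/M/c-style model below (Poisson arrivals, exponential service, c identical nodes) is an illustrative stand-in for the Oracle Grid Engine cluster, not the authors' actual workload.

```python
import random

def simulate(arrival_rate, service_rate, nodes, n_jobs=10_000, seed=1):
    """Mean queuing and response time of an M/M/c-style cluster queue."""
    rng = random.Random(seed)
    free_at = [0.0] * nodes                  # when each node becomes idle
    t = wait = resp = 0.0
    for _ in range(n_jobs):
        t += rng.expovariate(arrival_rate)   # Poisson arrivals
        idx = min(range(nodes), key=free_at.__getitem__)
        start = max(t, free_at[idx])         # job queues until a node frees
        service = rng.expovariate(service_rate)
        free_at[idx] = start + service
        wait += start - t                    # queuing time
        resp += start + service - t          # response time
    return wait / n_jobs, resp / n_jobs

for c in (4, 8, 16):
    w, r = simulate(arrival_rate=3.0, service_rate=1.0, nodes=c)
    print(f"{c} nodes: mean queuing {w:.2f}, mean response {r:.2f}")
```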

PhysioCover: Recovering the Missing Values in Physiological Data of Intensive Care Units

  • Kim, Sun-Hee; Yang, Hyung-Jeong; Kim, Soo-Hyung; Lee, Guee-Sang
    • International Journal of Contents / v.10 no.2 / pp.47-58 / 2014
  • Physiological signals provide important clues in the diagnosis and prediction of disease, so analyzing these signals is important in health and medicine. In particular, data preprocessing for physiological signal analysis is a vital issue, because missing values, noise, and outliers may degrade analysis performance. In this paper, we propose PhysioCover, a system that can recover missing values in physiological signals monitored in real time. PhysioCover integrates a gradual method with EM-based Principal Component Analysis (PCA). This approach can (1) recover long- and short-term missing data more readily than existing methods such as traditional EM-based PCA, linear interpolation, 5-average, and Missing Value Singular Value Decomposition (MSVD), (2) detect hidden variables more effectively than PCA and Independent Component Analysis (ICA), and (3) offer fast computation through real-time processing. Experimental results with physiological data from an intensive care unit show that the proposed method recovers missing values more accurately than previous methods.
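
The EM-based PCA component that PhysioCover builds on can be sketched compactly: missing entries are initialized, a low-rank model is fit, and the missing entries are re-estimated from the reconstruction until convergence. The rank, tolerance, and column-mean initialization below are assumptions for illustration; the published system combines this with a gradual recovery stage.

```python
import numpy as np

def em_pca_impute(X, rank=2, n_iter=100, tol=1e-8):
    """Fill NaNs by alternating a rank-k PCA fit with re-imputation (EM)."""
    X = X.astype(float).copy()
    miss = np.isnan(X)
    col_mean = np.nanmean(X, axis=0)
    X[miss] = np.take(col_mean, np.nonzero(miss)[1])       # initial guess
    for _ in range(n_iter):
        mu = X.mean(axis=0)
        U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
        recon = mu + (U[:, :rank] * s[:rank]) @ Vt[:rank]  # rank-k model
        delta = np.max(np.abs(X[miss] - recon[miss]))
        X[miss] = recon[miss]                              # re-impute
        if delta < tol:
            break
    return X

rng = np.random.default_rng(0)
truth = rng.random((100, 2)) @ rng.random((2, 8))        # rank-2 signals
X = truth.copy()
X[rng.random(X.shape) < 0.1] = np.nan                    # 10% missing
print(np.max(np.abs(em_pca_impute(X) - truth)))          # small residual
```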

A Light-weight, Adaptive, Reliable Processing Integrity Audit for e-Science Grid (e-Science 그리드를 위한 가볍고, 적응성있고, 신뢰성있는 처리 무결성 감사)

  • Jung, Im-Young; Jung, Eun-Jin
    • Journal of the Korea Institute of Information Security & Cryptology / v.18 no.5 / pp.181-188 / 2008
  • An e-Science Grid is designed to cope with computation-intensive tasks and to manage a huge volume of scientific data efficiently. However, certain tasks may demand more computation capability than one grid can offer, or impose long wait times on other tasks. Resource sharing among grids can solve this problem, given a proper processing-integrity check via auditing. Due to their computation-intensive nature, e-Science tasks tend to run for a long time, and this potentially long wait before an audit failure motivates auditing earlier, during execution, both to prevent resource waste and to detect problems quickly. In this paper, we propose LARA, a Light-weight, Adaptive, and Reliable Audit of processing integrity for e-Science applications. With the LARA scheme, researchers can verify their processing earlier and faster.
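
The core idea of auditing processing integrity earlier, during execution, can be illustrated with periodic checkpoint hashes that a verifier spot-checks by re-executing a sampled prefix of the computation. This is a generic sketch of in-progress integrity auditing, not the published LARA protocol; the hash chain and the sampling policy are assumptions.

```python
import hashlib, random

def run_with_checkpoints(steps, state):
    """Execute steps, recording a hash chain of intermediate states."""
    chain = []
    for step in steps:
        state = step(state)
        prev = chain[-1] if chain else b"genesis"
        chain.append(hashlib.sha256(prev + repr(state).encode()).digest())
    return state, chain

def spot_audit(steps, init_state, chain, rng=random):
    """Re-execute a random prefix and check its checkpoint hash.

    Corruption is caught mid-run instead of after the full job finishes.
    """
    k = rng.randrange(1, len(steps) + 1)
    _, replay = run_with_checkpoints(steps[:k], init_state)
    return replay[-1] == chain[k - 1]

steps = [lambda s, i=i: s + i for i in range(10)]     # toy computation
final, chain = run_with_checkpoints(steps, 0)
assert spot_audit(steps, 0, chain)
```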

Trends in Compute Express Link(CXL) Technology (CXL 인터커넥트 기술 연구개발 동향)

  • S.Y. Kim; H.Y. Ahn; Y.M. Park; W.J. Han
    • Electronics and Telecommunications Trends / v.38 no.5 / pp.23-33 / 2023
  • With the widespread demand from data-intensive tasks such as machine learning and large-scale databases, the amount of data processed in modern computing systems is increasing exponentially. Such data-intensive tasks require large amounts of memory to rapidly process and analyze massive data. However, existing computing system architectures face challenges when building large-scale memory owing to various structural issues such as CPU specifications. Moreover, large-scale memory may cause problems including memory overprovisioning. The Compute Express Link (CXL) allows computing nodes to use large amounts of memory while mitigating related problems. Hence, CXL is attracting great attention in industry and academia. We describe the overarching concepts underlying CXL and explore recent research trends in this technology.

Enhanced Regular Expression as a DGL for Generation of Synthetic Big Data

  • Kai, Cheng; Keisuke, Abe
    • Journal of Information Processing Systems / v.19 no.1 / pp.1-16 / 2023
  • Synthetic data generation is widely used for performance evaluation and functional testing in data-intensive applications, as well as in areas of data analytics such as privacy-preserving data publishing (PPDP) and statistical disclosure limitation/control. A significant amount of research has been conducted on tools and languages for data generation; however, existing tools and languages were developed for specific purposes and are unsuitable for other domains. In this article, we propose a regular-expression-based data generation language (DGL) for flexible big data generation. To achieve a general-purpose and powerful DGL, we enhance standard regular expressions to support data domains, type/format inference, sequential and random generation, probability distributions, and resource references. To implement the proposed language efficiently, we propose caching techniques for both intermediate results and database queries. We evaluated the proposed improvements experimentally.
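
The generative reading of a regular expression, sampling a string the pattern would accept instead of matching one, is the core of the proposed DGL. The mini-generator below covers only literals, character classes, and {n} quantifiers; it illustrates the concept and none of the paper's enhancements (domains, distributions, resource references).

```python
import random, re

CLASS = re.compile(r"\[([^\]]+)\](?:\{(\d+)\})?")

def expand_class(body):
    """Expand a character-class body like 'A-Z' into its member characters."""
    out, i = [], 0
    while i < len(body):
        if i + 2 < len(body) and body[i + 1] == "-":
            out += [chr(c) for c in range(ord(body[i]), ord(body[i + 2]) + 1)]
            i += 3
        else:
            out.append(body[i])
            i += 1
    return out

def generate(pattern, rng=random):
    """Sample a random string accepted by the (restricted) pattern."""
    result, pos = [], 0
    for m in CLASS.finditer(pattern):
        result.append(pattern[pos:m.start()])            # literal text
        chars = expand_class(m.group(1))
        n = int(m.group(2) or 1)
        result.append("".join(rng.choice(chars) for _ in range(n)))
        pos = m.end()
    return "".join(result) + pattern[pos:]

print(generate("[A-Z]{3}-[0-9]{4}"))    # e.g. "QXB-4821"
```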

Feasibility Study of a Distributed and Parallel Environment for Implementing the Standard Version of AAM Model

  • Naoui, Moulkheir; Mahmoudi, Said; Belalem, Ghalem
    • Journal of Information Processing Systems / v.12 no.1 / pp.149-168 / 2016
  • The Active Appearance Model (AAM) is a class of deformable models that, in the segmentation process, integrates a priori knowledge of the shape, texture, and deformation of the structures studied. In its sequential form, this model is computationally intensive and operates on large data sets. This paper presents another framework for implementing the standard version of the AAM model: we suggest a distributed and parallel approach, justified by the characteristics of the model and its potential. We introduce a schema for representing the overall model and study the operations that can be parallelized. This approach is intended to exploit the benefits developed in the area of advanced image processing.
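
The parallelization argument, that per-image AAM fitting tasks are independent and can be distributed, can be sketched with a process pool. Here `fit_aam_to_image` is a hypothetical placeholder for the real fitting routine and the image list is fabricated; only the distribution pattern is the point, not the AAM mathematics.

```python
from multiprocessing import Pool

def fit_aam_to_image(image):
    """Placeholder for one expensive, independent AAM fitting task."""
    shape_params = sum(image) / len(image)      # stand-in computation
    return shape_params

if __name__ == "__main__":
    images = [[i, i + 1, i + 2] for i in range(100)]    # assumed data set
    with Pool() as pool:
        # Each worker fits the model to a different image concurrently.
        results = pool.map(fit_aam_to_image, images)
    print(len(results), "images fitted in parallel")
```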

Delay and Energy Efficient Data Aggregation in Wireless Sensor Networks

  • Le, Huu Nghia; Choe, Junseong; Shon, Minhan; Choo, Hyunseung
    • Proceedings of the Korea Information Processing Society Conference / 2012.04a / pp.607-608 / 2012
  • Data aggregation is a fundamental problem in wireless sensor networks that has attracted great attention in recent years. Delay and energy efficiency are two crucial issues in designing a data aggregation scheme. In this paper, we propose a distributed, energy-efficient algorithm for collecting data from all sensor nodes with minimum latency, called the Delay-aware Power-efficient Data Aggregation (DPDA) algorithm. The DPDA algorithm minimizes the latency of the data collection process by building a time-efficient data aggregation network structure. It also saves sensor energy by decreasing node transmission distances, and energy consumption is balanced across sensors to achieve an acceptable network lifetime. In intensive experiments, the DPDA scheme significantly decreased data collection latency and achieved reasonable network lifetime compared with other approaches.
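
The two goals the DPDA abstract combines, a shallow aggregation tree for low collection latency and short per-hop distances for low transmit energy, can be sketched with a BFS layering in which each sensor attaches to the nearest already-connected node. The sensor positions and communication radius below are illustrative assumptions, not the paper's construction.

```python
import math, random

def build_aggregation_tree(positions, sink=0, radius=0.35):
    """Child -> parent map of a BFS-layered aggregation tree toward the sink."""
    dist = lambda a, b: math.dist(positions[a], positions[b])
    parent = {sink: None}
    frontier = [sink]
    while frontier:                       # BFS keeps the tree depth minimal
        nxt = []
        for v in range(len(positions)):
            if v in parent:
                continue
            reachable = [u for u in frontier if dist(u, v) <= radius]
            if reachable:
                # Nearest parent -> shortest radio hop -> least transmit energy.
                parent[v] = min(reachable, key=lambda u: dist(u, v))
                nxt.append(v)
        frontier = nxt
    return parent

rng = random.Random(42)
pts = [(0.5, 0.5)] + [(rng.random(), rng.random()) for _ in range(30)]
tree = build_aggregation_tree(pts)
print(f"{len(tree) - 1} of {len(pts) - 1} sensors connected")
```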