• Title/Summary/Keyword: 고성능 컴퓨팅

Search Result 289, Processing Time 0.028 seconds

A Study on Scalability of Profiling Method Based on Hardware Performance Counter for Optimal Execution of Supercomputer (슈퍼컴퓨터 최적 실행 지원을 위한 하드웨어 성능 카운터 기반 프로파일링 기법의 확장성 연구)

  • Choi, Jieun;Park, Guenchul;Rho, Seungwoo;Park, Chan-Yeol
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.9 no.10
    • /
    • pp.221-230
    • /
    • 2020
  • Supercomputer that shares limited resources to multiple users needs a way to optimize the execution of application. For this, it is useful for system administrators to get prior information and hint about the applications to be executed. In most high-performance computing system operations, system administrators strive to increase system productivity by receiving information about execution duration and resource requirements from users when executing tasks. They are also using profiling techniques that generates the necessary information using statistics such as system usage to increase system utilization. In a previous study, we have proposed a scheduling optimization technique by developing a hardware performance counter-based profiling technique that enables characterization of applications without further understanding of the source code. In this paper, we constructed a profiling testbed cluster to support optimal execution of the supercomputer and experimented with the scalability of the profiling method to analyze application characteristics in the built cluster environment. Also, we experimented that the profiling method can be utilized in actual scheduling optimization with scalability even if the application class is reduced or the number of nodes for profiling is minimized. Even though the number of nodes used for profiling was reduced to 1/4, the execution time of the application increased by 1.08% compared to profiling using all nodes, and the scheduling optimization performance improved by up to 37% compared to sequential execution. In addition, profiling by reducing the size of the problem resulted in a quarter of the cost of collecting profiling data and a performance improvement of up to 35%.

A Contextual Information and Physics-based Mobile Augmented Reality Contents Manipulation Method (맥락 정보와 물리적 속성 부여가 가능한 모바일 증강 현실 콘텐츠 조작 방법)

  • Hong, Dong-Pyo;Lee, Jeong-Gyu;Chae, Chang-Hun;Lee, Jong-Weon;Ko, Kwang-Hee;Woo, Woon-Taek
    • 한국HCI학회:학술대회논문집
    • /
    • 2009.02a
    • /
    • pp.526-530
    • /
    • 2009
  • In this paper, we propose a contextual information and physics-based contents manipulation method for a mobile augmented reality authoring system. Due to proliferation of ubiquitous computing in information technology(IT) and advances in sensor technology and mobile devices, AR systems that were only possible in PC can be now feasible on mobile devices. In addition, many AR systems have been proposed that utilize sensory data and reflect them into. Thus, the proposed method provides appropriate visual cues for 3D manipulations of the augmented contents. In addition, uses can manipulate the augmented contents with sensory information through the assignment of sensors to the contents. Moreover, it supports not only a physics-based contents loader that enables users to specify physics properties into the contents, but also the transform matrix between AR and physics engine coordinates. To show the feasibility of the proposed method, we implemented a mobile augmented reality authoring system. We believe that the proposed method can be a key factor for context-aware mobile AR authoring system.

  • PDF

FAST : A Log Buffer Scheme with Fully Associative Sector Translation for Efficient FTL in Flash Memory (FAST :플래시 메모리 FTL을 위한 완전연관섹터변환에 기반한 로그 버퍼 기법)

  • Park Dong-Joo;Choi Won-Kyung;Lee Sang-Won
    • The KIPS Transactions:PartA
    • /
    • v.12A no.3 s.93
    • /
    • pp.205-214
    • /
    • 2005
  • Flash memory is at high speed used as storage of personal information utilities, ubiquitous computing environments, mobile phones, electronic goods, etc. This is because flash memory has the characteristics of low electronic power, non-volatile storage, high performance, physical stability, portability, and so on. However, differently from hard disks, it has a weak point that overwrites on already written block of flash memory is impossible to be done. In order to make an overwrite possible, an erase operation on the written block should be performed before the overwrite, which lowers the performance of flash memory highly. In order to solve this problem the flash memory controller maintains a system software module called the flash translation layer(FTL). Of many proposed FTL schemes, the log block buffer scheme is best known so far. This scheme uses a small number of log blocks of flash memory as a write buffer, which reduces the number of erase operations by overwrites, leading to good performance. However, this scheme shows a weakness of low page usability of log blocks. In this paper, we propose an enhanced log block buffer scheme, FAST(Full Associative Sector Translation), which improves the page usability of each log block by fully associating sectors to be written by overwrites to the entire log blocks. We also show that our FAST scheme outperforms the log block buffer scheme.

An Integrated Access Control for Sharing of E-Science Grid Resources (유휴 멀티 e-Science 그리드 자원 공유를 위한 통합 자원 접근 제어)

  • Jung, Im-Y.;Jung, Eun-Jin;Yeom, Heon-Y.
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.35 no.9_10
    • /
    • pp.452-465
    • /
    • 2008
  • This paper proposes a light-weight, seamless integrated access control for global e-Science resource sharing. E-Science, based on Grid Computing, was designed to help scientists to remotely control and process the Grid resources such as high-end equipments and remote machines. As many researchers engage in the e-Science Grids, the researchers in a grid often have to wait for or give up use of the Grid resources, even when there are idle resources in other Grids. In this case, provided that proper compensation is given, Grid resource sharing is helpful both for the researchers and the Grids which provide their resources. But, sharing Grid resources globally is not simple, as each e-Science Grid is especially designed for resource sharing in its Virtual Organization(VO) and already has its unique access control policy for its resources. This paper proposes a new integrated access control for e-Science Grid resource sharing. The access control is light-weight without any priori service level agreement(SLA)s among the Grids which share their resources and seamless because the users can use the resources shared as the ones belonging to their Grids without their additional registration to the other Grids.

An Efficient Buffer Page Replacement Strategy for System Software on Flash Memory (플래시 메모리상에서 시스템 소프트웨어의 효율적인 버퍼 페이지 교체 기법)

  • Park, Jong-Min;Park, Dong-Joo
    • Journal of KIISE:Databases
    • /
    • v.34 no.2
    • /
    • pp.133-140
    • /
    • 2007
  • Flash memory has penetrated our life in various forms. For example, flash memory is important storage component of ubiquitous computing or mobile products such as cell phone, MP3 player, PDA, and portable storage kits. Behind of the wide acceptance as memory is many advantages of flash memory: for instances, low power consumption, nonvolatile, stability and portability. In addition to mentioned strengths, the recent development of gigabyte range capacity flash memory makes a careful prediction that the flash memory might replace some of storage area dominated by hard disks. In order to have overwriting function, one block must be erased before overwriting is performed. This difference results in the cost of reading, writing and erasing in flash memory[1][5][6]. Since this difference has not been considered in traditional buffer replacement technologies adopted in system software such as OS and DBMS, a new buffer replacement strategy becomes necessary. In this paper, a new buffer replacement strategy, reflecting difference I/O cost and applicable to flash memory, suggest and compares with other buffer replacement strategies using workloads as Zipfian distribution and real data.

Odysseus/m: a High-Performance ORDBMS Tightly-Coupled with IR Features (오디세우스/IR: 정보 검색 기능과 밀결합된 고성능 객체 관계형 DBMS)

  • Whang Kyu-Young;Lee Min-Jae;Lee Jae-Gil;Kim Min-Soo;Han Wook-Shin
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.11 no.3
    • /
    • pp.209-215
    • /
    • 2005
  • Conventional ORDBMS vendors provide extension mechanisms for adding user-defined types and functions to their own DBMSs. Here, the extension mechanisms are implemented using a high-level interface. We call this technique loose-coupling. The advantage of loose-coupling is that it is easy to implement. However, it is not preferable for implementing new data types and operations in large databases when high Performance is required. In this paper, we propose to use the notion of tight-coupling to satisfy this requirement. In tight-coupling, new data types and operations are integrated into the core of the DBMS engine. Thus, they are supported in a consistent manner with high performance. This tight-coupling architecture is being used to incorporate information retrieval(IR) features and spatial database features into the Odysseus/IR ORDBMS that has been under development at KAIST/AITrc. In this paper, we introduce Odysseus/IR and explain its tightly-coupled IR features (U.S. patented). We then demonstrate a web search engine that is capable of managing 20 million web pages in a non-parallel configuration using Odysseus/IR.

A Hardware Design Space Exploration toward Low-Area and High-Performance Architecture for the 128-bit Block Cipher Algorithm SEED (128-비트 블록 암호화 알고리즘 SEED의 저면적 고성능 하드웨어 구조를 위한 하드웨어 설계 공간 탐색)

  • Yi, Kang
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.13 no.4
    • /
    • pp.231-239
    • /
    • 2007
  • This paper presents the trade-off relationship between area and performance in the hardware design space exploration for the Korean national standard 128-bit block cipher algorithm SEED. In this paper, we compare the following four hardware design types of SEED algorithm : (1) Design 1 that is 16 round fully pipelining approach, (2) Design 2 that is a one round looping approach, (3) Design 3 that is a G function sharing and looping approach, and (4) Design 4 that is one round with internal 3 stage pipelining approach. The Design 1, Design 2, and Design 3 are the existing design approaches while the Design 4 is the newly proposed design in this paper. Our new design employs the pipeline between three G-functions and adders consisting of a F function, which results in the less area requirement than Design 2 and achieves the higher performance than Design 2 and Design 3 due to pipelining and module sharing techniques. We design and implement all the comparing four approaches with real hardware targeting FPGA for the purpose of exact performance and area analysis. The experimental results show that Design 4 has the highest performance except Design 1 which pursues very aggressive parallelism at the expanse of area. Our proposed design (Design 4) shows the best throughput/area ratio among all the alternatives by 2.8 times. Therefore, our new design for SEED is the most efficient design comparing with the existing designs.

Real-Time GPU Task Monitoring and Node List Management Techniques for Container Deployment in a Cluster-Based Container Environment (클러스터 기반 컨테이너 환경에서 실시간 GPU 작업 모니터링 및 컨테이너 배치를 위한 노드 리스트 관리기법)

  • Jihun, Kang;Joon-Min, Gil
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.11 no.11
    • /
    • pp.381-394
    • /
    • 2022
  • Recently, due to the personalization and customization of data, Internet-based services have increased requirements for real-time processing, such as real-time AI inference and data analysis, which must be handled immediately according to the user's situation or requirement. Real-time tasks have a set deadline from the start of each task to the return of the results, and the guarantee of the deadline is directly linked to the quality of the services. However, traditional container systems are limited in operating real-time tasks because they do not provide the ability to allocate and manage deadlines for tasks executed in containers. In addition, tasks such as AI inference and data analysis basically utilize graphical processing units (GPU), which typically have performance impacts on each other because performance isolation is not provided between containers. And the resource usage of the node alone cannot determine the deadline guarantee rate of each container or whether to deploy a new real-time container. In this paper, we propose a monitoring technique for tracking and managing the execution status of deadlines and real-time GPU tasks in containers to support real-time processing of GPU tasks running on containers, and a node list management technique for container placement on appropriate nodes to ensure deadlines. Furthermore, we demonstrate from experiments that the proposed technique has a very small impact on the system.

The Effect of Domain Specificity on the Performance of Domain-Specific Pre-Trained Language Models (도메인 특수성이 도메인 특화 사전학습 언어모델의 성능에 미치는 영향)

  • Han, Minah;Kim, Younha;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.4
    • /
    • pp.251-273
    • /
    • 2022
  • Recently, research on applying text analysis to deep learning has steadily continued. In particular, researches have been actively conducted to understand the meaning of words and perform tasks such as summarization and sentiment classification through a pre-trained language model that learns large datasets. However, existing pre-trained language models show limitations in that they do not understand specific domains well. Therefore, in recent years, the flow of research has shifted toward creating a language model specialized for a particular domain. Domain-specific pre-trained language models allow the model to understand the knowledge of a particular domain better and reveal performance improvements on various tasks in the field. However, domain-specific further pre-training is expensive to acquire corpus data of the target domain. Furthermore, many cases have reported that performance improvement after further pre-training is insignificant in some domains. As such, it is difficult to decide to develop a domain-specific pre-trained language model, while it is not clear whether the performance will be improved dramatically. In this paper, we present a way to proactively check the expected performance improvement by further pre-training in a domain before actually performing further pre-training. Specifically, after selecting three domains, we measured the increase in classification accuracy through further pre-training in each domain. We also developed and presented new indicators to estimate the specificity of the domain based on the normalized frequency of the keywords used in each domain. Finally, we conducted classification using a pre-trained language model and a domain-specific pre-trained language model of three domains. As a result, we confirmed that the higher the domain specificity index, the higher the performance improvement through further pre-training.