• Title/Summary/Keyword: unified memory

Performance Evaluation of the GPU Architecture Executing Parallel Applications (병렬 응용프로그램 실행 시 GPU 구조에 따른 성능 분석)

  • Choi, Hong-Jun;Kim, Cheol-Hong
    • The Journal of the Korea Contents Association / v.12 no.5 / pp.10-21 / 2012
  • The role of the GPU has evolved from graphics-specific processing to general-purpose processing with the development of the unified shader core architecture. In particular, methods for executing general-purpose parallel applications on GPUs have been researched intensively, since the GPU's parallel hardware can be used efficiently when parallel applications are executed. However, current GPU architectures have limitations in executing general-purpose parallel applications, because the GPU is not yet specialized for general-purpose computing. To improve GPU performance on general-purpose parallel applications, the GPU architecture must evolve. In this work, we analyze GPU performance while varying the number of cores and the clock frequency. Our simulation results show that GPU performance improves by up to 125.8% as the number of cores increases and by up to 16.2% as the clock frequency increases. However, the performance improvement saturates even as the number of cores and the clock frequency continue to increase, because the limited memory bandwidth cannot supply data to the GPU fast enough. Consequently, to achieve high performance on a GPU, computational resources must be configured more carefully, taking the available memory bandwidth into account.
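The saturation described above can be illustrated with a simple roofline-style bound: attainable throughput is the smaller of what the cores can compute and what memory bandwidth can feed them. This is a swapped-in analytical model, not the simulator used in the paper, and every parameter value in the usage loop is hypothetical.

```python
def attainable_gflops(cores: int, clock_ghz: float, flops_per_core_cycle: float,
                      mem_bw_gbs: float, flops_per_byte: float) -> float:
    """Roofline-style bound (assumed model): min(compute peak, memory ceiling)."""
    compute_peak = cores * clock_ghz * flops_per_core_cycle  # GFLOP/s the cores can issue
    memory_ceiling = mem_bw_gbs * flops_per_byte  # GFLOP/s that memory bandwidth can sustain
    return min(compute_peak, memory_ceiling)


if __name__ == "__main__":
    # Hypothetical GPU: 1 GHz, 2 FLOP/cycle/core, 100 GB/s, kernel doing 0.5 FLOP per byte
    for cores in (8, 16, 32, 64, 128):
        print(cores, attainable_gflops(cores, 1.0, 2.0, 100.0, 0.5))
```

With these made-up numbers, throughput doubles from 8 to 16 cores and then flattens at the 50 GFLOP/s memory ceiling from 32 cores onward, mirroring the qualitative saturation the abstract reports.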

Prediction of Dormant Customer in the Card Industry (카드산업에서 휴면 고객 예측)

  • DongKyu Lee;Minsoo Shin
    • Journal of Service Research and Studies / v.13 no.2 / pp.99-113 / 2023
  • In a customer-based industry, customer retention is a source of competitiveness, and improving retention improves a company's competitiveness. Accurately predicting and managing potential dormant customers is therefore paramount. In particular, the domestic card industry has numerous competitors, and the government is introducing an automatic closing system for dormant cards. As a result of these social changes, the card industry must focus on better predicting and managing potential dormant cards, and predicting dormant customers more accurately has emerged as an important challenge. In this study, a Recurrent Neural Network (RNN) methodology was used to predict potential dormant customers in the card industry; specifically, Long Short-Term Memory (LSTM) was used to learn efficiently from long sequences of data. In addition, the Unified Theory of Acceptance and Use of Technology (UTAUT), an integrated technology acceptance theory, was applied to redefine and group the variables needed to predict dormant customers. As a result, stable model accuracy and F1 scores were obtained, and the hit ratio showed that LSTM-based models produce stable results compared with other algorithms. It was also found that the moderating effect of demographic information that can arise in UTAUT, as pointed out in previous studies, was absent. Therefore, among the variable selection models using UTAUT, the LSTM-based dormant customer prediction models were shown to produce unbiased, stable results. This study suggests possible academic contributions to dormant customer prediction using LSTM algorithms, which learn well from time-series data that had not previously been used for this task.
In addition, because dormancy is predicted ahead of the actual dormancy event, the study is a good example showing that customers likely to become dormant can be managed preemptively, and it is expected to contribute greatly to the industry.
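The LSTM recurrence the study relies on can be sketched for a single scalar input per month (for example, a normalized usage amount) and a hidden size of 1. This is a minimal illustration under stated assumptions: the feature, the weight names, and the weight values are all hypothetical and untrained, whereas the study's model uses UTAUT-grouped variables and learned parameters.

```python
import math
from typing import Dict, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_dormancy_score(sequence: Sequence[float], w: Dict[str, float]) -> float:
    """Run a scalar LSTM cell over a monthly sequence; return a score in (0, 1)."""
    h, c = 0.0, 0.0  # hidden state and cell state
    for x in sequence:
        i = sigmoid(w["wi"] * x + w["ui"] * h + w["bi"])  # input gate
        f = sigmoid(w["wf"] * x + w["uf"] * h + w["bf"])  # forget gate
        o = sigmoid(w["wo"] * x + w["uo"] * h + w["bo"])  # output gate
        g = math.tanh(w["wg"] * x + w["ug"] * h + w["bg"])  # candidate cell value
        c = f * c + i * g  # cell state carries information across long sequences
        h = o * math.tanh(c)
    return sigmoid(w["w_out"] * h + w["b_out"])  # hypothetical dormancy output layer


# Hypothetical, untrained weights for illustration only
WEIGHTS = {k: 0.5 for k in ("wi", "ui", "bi", "wf", "uf", "bf",
                            "wo", "uo", "bo", "wg", "ug", "bg")}
WEIGHTS.update(w_out=-2.0, b_out=0.0)

if __name__ == "__main__":
    print(lstm_dormancy_score([0.9, 0.8, 0.2, 0.1, 0.0, 0.0], WEIGHTS))
```

The forget gate is what lets the cell state retain or discard older months, which is the property the abstract cites for learning from long time series.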

A Study of Classification in the Terms of "Biwiron(脾胃論)" (비위론(脾胃論)에 기재된 용어 분류체계에 관한 연구)

  • Chung, Du-Young;Lee, Byung-Wook;Eom, Dong-Myung;Kim, Eun-Ha
    • Journal of Korean Medical Classics / v.22 no.1 / pp.191-205 / 2009
  • Objective: Until now, the theories in medical classics have been too difficult to understand thoroughly, and conventional study methods struggle to handle their many layers of meaning because comprehension of the theories depends on one's memory. Comparing different medical books is an even more difficult matter. We therefore attempted to address this problem. Method: We studied the medical terms in the "Piweilun" using the following procedure. (1) Making a term list: we selected the constituents of sentences and built a term list based on the concept of each term. (2) Making a synonym list: we grouped terms expressing identical concepts into a synonym list; using the database's synonym tables, non-standard terms of medical theory can then be searched. (3) Making a classification system: using UMLS (Unified Medical Language System), MeSH (Medical Subject Headings), IST (International Standard Terminology), etc., we built a classification system for the oriental medicine terms in the "Piweilun" and analyzed the relations between terms. Result: The "Piweilun" contains more than 1,790 concepts. Some of them belong to UMLS Semantic Types, while others do not. The terms also include more predicates than the UMLS Semantic Relations cover.
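Steps (2) and (3) above can be sketched as two lookup tables: a synonym table that maps each surface form to a concept ID, and a concept table that records the UMLS Semantic Type (or none). All terms and concept IDs below are hypothetical placeholders, not entries from the study's database; only the semantic type name is a real UMLS label.

```python
from typing import Dict, List, Optional

# Hypothetical synonym table: surface form -> concept ID (placeholders, not real Piweilun terms)
SYNONYMS: Dict[str, str] = {
    "term-alpha": "C001",
    "term-alpha-variant": "C001",
    "term-beta": "C002",
}

# Hypothetical concept -> UMLS Semantic Type; None marks a concept with no matching type
SEMANTIC_TYPE: Dict[str, Optional[str]] = {
    "C001": "Body Part, Organ, or Organ Component",
    "C002": None,
}


def search(query: str, passages: List[str]) -> List[int]:
    """Return indices of passages that mention the query or any of its synonyms."""
    concept = SYNONYMS.get(query)
    if concept is None:
        forms = [query]  # unknown term: fall back to literal matching
    else:
        forms = [t for t, c in SYNONYMS.items() if c == concept]
    return [i for i, p in enumerate(passages) if any(f in p for f in forms)]


def unmapped_concepts() -> List[str]:
    """Concepts that do not belong to any UMLS Semantic Type."""
    return [c for c, st in SEMANTIC_TYPE.items() if st is None]
```

Searching for a standard term thus also finds passages that use only a non-standard variant, which is the retrieval benefit the synonym list is meant to provide.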

A Framework for Constructing Interactive Tiled Display Applications (인터랙티브 타일드 디스플레이 응용프로그램 개발을 위한 프레임워크)

  • Cho, Yong-Joo;Kim, Seok-Hwan
    • Journal of the Korea Institute of Information and Communication Engineering / v.13 no.1 / pp.37-44 / 2009
  • This paper describes a new tiled display framework called iTDF (Interactive Tiled Display Framework), designed to support rapid construction of interactive digital 3D content running on cluster-based tiled displays. The framework provides features such as synchronizing rendering slaves, sharing software state over the network, launching multiple applications on cluster-based computers, moving and resizing windows, distributed shared memory, and a unified input interface. This paper analyzes the requirements of the framework and describes its design and implementation. A couple of desktop-based applications were ported to iTDF to evaluate the usefulness and usability of the framework.
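Synchronizing rendering slaves is commonly done with a swap barrier: no slave presents frame k+1 until every slave has presented frame k. The sketch below uses Python threads and `threading.Barrier` as stand-ins for cluster nodes and a network barrier; this is an assumption for illustration, and the function name and log format are hypothetical rather than iTDF's API.

```python
import threading
from typing import List, Tuple


def run_synchronized_slaves(num_slaves: int, num_frames: int) -> List[Tuple[int, int]]:
    """Simulate rendering slaves that swap buffers in lockstep; return (frame, slave) swaps."""
    barrier = threading.Barrier(num_slaves)  # stand-in for a network swap barrier
    log: List[Tuple[int, int]] = []
    lock = threading.Lock()

    def slave(slave_id: int) -> None:
        for frame in range(num_frames):
            # ... render this slave's tile of the frame here (omitted) ...
            barrier.wait()  # wait until every slave has finished rendering this frame
            with lock:
                log.append((frame, slave_id))  # "swap buffers" for this frame

    threads = [threading.Thread(target=slave, args=(i,)) for i in range(num_slaves)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    print(run_synchronized_slaves(4, 3))
```

Because every slave must log frame k before any slave can pass the barrier for frame k+1, the frame numbers in the log never decrease, so no tile of the display runs ahead of the others.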

Implementation Strategy on Sharing Resources to Organizational Change (학술자원 공동 활용 기반구축사업 개선 방안 연구)

  • Kim, Young-Kee;Park, Sung-Ho;Lee, Soo-Sang
    • Journal of Korean Library and Information Science Society / v.40 no.2 / pp.287-310 / 2009
  • This paper strives to shed new light on the current academic resource sharing initiatives of both the Korea Research Foundation (KRF) and the Korea Science and Engineering Foundation (KOSEF), and, through comparison, scrutiny, and analysis of the strengths and weaknesses of the current projects, seeks measures for managing information resources effectively in the unified organization that will launch soon. It first attempts to draw outcomes and suggestions from the issues and implications identified through an exhaustive as-is analysis of the management, service, and infrastructure of the academic resource sharing initiatives carried out by each foundation. The unification of projects and information systems is discussed from two perspectives, each regarded as a significant measure: 1) an organizational perspective, developing a new academic and research information service by unifying the operation systems related to each foundation's information service projects; and 2) an initiative toward an integrated service by unifying the scattered individual unit systems in each foundation.

A Scheme for Push/Pull Buffer Management in the Multimedia Communication Environments (멀티미디어 통신 환경에서 Push/Pull 버퍼 관리 기법)

  • Jeong, Chan-Gyun;Lee, Seung-Ryong
    • The Transactions of the Korea Information Processing Society / v.7 no.2S / pp.721-732 / 2000
  • Multimedia communication systems require not only high-performance computer hardware and high-speed networks but also a buffer management mechanism to process large amounts of data efficiently. Two buffer handling methods, Push and Pull, are commonly used. In the Push method, a server controls the flow of data to a client, while in the Pull method, a client controls the flow of data from a server. These buffering schemes can be applied to the data transfer between the packet receiving buffer, which receives media data from a network server, and the media playout devices, which play the received media data. However, client-side buffer management mechanisms mainly support only one of the two methods, Push or Pull, which limits their ability to support various media playout devices. Furthermore, even those that support both methods are difficult to use because they do not provide a unified structure. To resolve these problems, this paper proposes an efficient and flexible client-side Push/Pull buffer management mechanism. The proposed scheme supports both the Push and Pull methods, so that it can serve various media playout devices, and provides a buffering function to absorb network jitter. It supports the various media playback devices using a single buffer space, which saves memory compared with a client that keeps two types of buffers. Moreover, it uses the single buffer to absorb network jitter effectively and efficiently. The proposed scheme has been implemented in an existing multimedia communication system called ISSA (Integrated Streaming Service Architecture), where it shows good performance compared with conventional buffering methods in multimedia communication environments.
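A single client-side buffer that serves both device styles can be sketched as one queue with two exits: a Pull-style device calls `pull()` itself, while for a Push-style device the buffer delivers packets to a registered sink on each playout-clock tick. A prebuffer threshold delays playout so that bursty arrivals are smoothed out. This is a minimal sketch under assumptions: the class, method names, and the drop-oldest overflow policy are hypothetical, not ISSA's design.

```python
from collections import deque
from typing import Callable, Deque, Optional


class PushPullBuffer:
    """Hypothetical single client-side buffer serving both Push- and Pull-style devices."""

    def __init__(self, capacity: int, prebuffer: int) -> None:
        self._q: Deque[bytes] = deque()
        self._capacity = capacity
        self._prebuffer = prebuffer  # packets to accumulate before playout (jitter absorption)
        self._playing = False
        self._sink: Optional[Callable[[bytes], None]] = None

    def on_packet(self, pkt: bytes) -> None:
        """Network side: store a received media packet."""
        if len(self._q) >= self._capacity:
            self._q.popleft()  # assumed overflow policy: drop the oldest packet
        self._q.append(pkt)
        if not self._playing and len(self._q) >= self._prebuffer:
            self._playing = True

    def pull(self) -> Optional[bytes]:
        """Pull mode: the playout device requests the next packet."""
        if not self._playing or not self._q:
            return None
        return self._q.popleft()

    def set_push_sink(self, sink: Callable[[bytes], None]) -> None:
        """Register a Push-mode device that the buffer feeds."""
        self._sink = sink

    def tick(self) -> None:
        """Push mode: on each playout-clock tick, deliver one packet to the device."""
        if self._sink is not None:
            pkt = self.pull()
            if pkt is not None:
                self._sink(pkt)
```

Both exits drain the same queue, which is where the memory saving over keeping two separate buffers comes from.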

Efficient GPU Framework for Adaptive and Continuous Signed Distance Field Construction, and Its Applications

  • Kim, Jong-Hyun
    • Journal of the Korea Society of Computer and Information / v.27 no.3 / pp.63-69 / 2022
  • In this paper, we propose a new GPU-based framework for quickly calculating adaptive and continuous signed distance fields (SDFs), and we examine rendering and collision-handling applications that use them. A quadtree constructed from the triangle mesh is transferred to GPU memory, and each thread uses it to compute, in parallel, the Euclidean distance to the triangles, finding the shortest distance without discontinuities across the adaptive grid. Experiments show that cut-away views of the adaptive distance field, distance queries at specific locations, real-time ray tracing, and collision handling can all be performed quickly and efficiently. With the proposed method, the adaptive signed distance field can be computed in about 1 second even for a high-polygon mesh, so the method is applicable not only to rigid bodies but also to deformable bodies. Various experimental results demonstrate the stability of the algorithm by showing that it accurately samples and represents distance values for a variety of models.
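The per-thread work described above reduces to a point-to-triangle distance query. The sketch below uses the standard closest-point-on-triangle test (Voronoi-region classification, as in Ericson's *Real-Time Collision Detection*) and takes the minimum over all triangles. It is a serial, unsigned stand-in under assumptions: on the GPU each query point would map to one thread and the quadtree would prune candidate triangles, and determining the sign needs a separate inside/outside test; neither is shown.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def lerp(a: Vec, b: Vec, t: float) -> Vec:
    return (a[0] + (b[0] - a[0]) * t, a[1] + (b[1] - a[1]) * t, a[2] + (b[2] - a[2]) * t)


def closest_point_on_triangle(p: Vec, a: Vec, b: Vec, c: Vec) -> Vec:
    ab, ac, ap = sub(b, a), sub(c, a), sub(p, a)
    d1, d2 = dot(ab, ap), dot(ac, ap)
    if d1 <= 0 and d2 <= 0:
        return a  # vertex region A
    bp = sub(p, b)
    d3, d4 = dot(ab, bp), dot(ac, bp)
    if d3 >= 0 and d4 <= d3:
        return b  # vertex region B
    vc = d1 * d4 - d3 * d2
    if vc <= 0 and d1 >= 0 and d3 <= 0:
        return lerp(a, b, d1 / (d1 - d3))  # edge region AB
    cp = sub(p, c)
    d5, d6 = dot(ab, cp), dot(ac, cp)
    if d6 >= 0 and d5 <= d6:
        return c  # vertex region C
    vb = d5 * d2 - d1 * d6
    if vb <= 0 and d2 >= 0 and d6 <= 0:
        return lerp(a, c, d2 / (d2 - d6))  # edge region AC
    va = d3 * d6 - d5 * d4
    if va <= 0 and (d4 - d3) >= 0 and (d5 - d6) >= 0:
        return lerp(b, c, (d4 - d3) / ((d4 - d3) + (d5 - d6)))  # edge region BC
    denom = 1.0 / (va + vb + vc)  # interior: barycentric coordinates
    v, w = vb * denom, vc * denom
    return (a[0] + ab[0] * v + ac[0] * w, a[1] + ab[1] * v + ac[1] * w, a[2] + ab[2] * v + ac[2] * w)


def unsigned_distance(p: Vec, triangles: List[Tuple[Vec, Vec, Vec]]) -> float:
    """Brute-force minimum distance from p to a triangle list (one GPU thread's job)."""
    return min(math.dist(p, closest_point_on_triangle(p, *tri)) for tri in triangles)
```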

Radix-4 Trellis Parallel Architecture and Trace Back Viterbi Decoder with Backward State Transition Control (Radix-4 트렐리스 병렬구조 및 역방향 상태천이의 제어에 의한 역추적 비터비 디코더)

  • 정차근
    • Journal of the Institute of Electronics Engineers of Korea SP / v.40 no.5 / pp.397-409 / 2003
  • This paper describes an implementation of a trace-back Viterbi decoder with a radix-4 parallel trellis architecture and backward state transition control, and presents the results of applying it to high-speed wireless LAN. A Viterbi decoder with a radix-4 parallel architecture not only improves throughput with a simple structure but also has a smaller processing delay and less overhead circuitry than one with an M-step trellis architecture. Based on these features, this paper presents a novel Viterbi decoder, implementing the radix-4 parallel trellis structure, that consists of branch metric computation, an ACS (add-compare-select) architecture, and trace-back decoding by sequential control of backward state transitions. With the proposed architecture, variable code rates obtained by puncturing the base code can easily be decoded by a single unified Viterbi decoder. Moreover, the proposed decoder architecture requires no additional circuitry or peripheral control logic. The trace-back decoding scheme with backward state transition control performs sequential decoding in step with the ACS cycle clock, without additional circuitry for survivor memory control. To evaluate its usefulness, the proposed method was applied to the channel CODEC of IEEE 802.11a high-speed wireless LAN, and HDL coding simulation results are presented.
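The radix-4 idea, processing two trellis stages per ACS cycle so that each state chooses among four predecessors, can be sketched in software. This sketch assumes a textbook rate-1/2, constraint-length-3 code with generators 7 and 5 (octal) and hard-decision Hamming branch metrics; it is not the K=7 code used by IEEE 802.11a, and a Python survivor list stands in for the paper's backward-state-transition hardware.

```python
from typing import List, Tuple

NUM_STATES = 4  # constraint length K = 3 -> 2 memory bits -> 4 states


def step(state: int, u: int) -> Tuple[int, Tuple[int, int]]:
    """One radix-2 trellis stage; state = (m1 << 1) | m2, generators 7 and 5 (octal)."""
    m1, m2 = state >> 1, state & 1
    return (u << 1) | m1, (u ^ m1 ^ m2, u ^ m2)


def encode(bits: List[int]) -> List[int]:
    state, coded = 0, []
    for u in bits:
        state, out = step(state, u)
        coded.extend(out)
    return coded


def viterbi_radix4(received: List[int], n_bits: int, terminated: bool = True) -> List[int]:
    assert n_bits % 2 == 0, "radix-4 consumes two trellis stages per ACS cycle"
    inf = float("inf")
    metric = [0.0] + [inf] * (NUM_STATES - 1)  # encoder starts in state 0
    survivors: List[List[int]] = []  # best predecessor per state, per radix-4 cycle
    for t in range(0, n_bits, 2):
        r = received[2 * t: 2 * t + 4]  # four coded bits = two trellis stages
        new_metric = [inf] * NUM_STATES
        pred = [0] * NUM_STATES
        for p in range(NUM_STATES):
            if metric[p] == inf:
                continue
            for u1 in (0, 1):
                mid, o1 = step(p, u1)
                for u2 in (0, 1):
                    nxt, o2 = step(mid, u2)
                    # 4-way add-compare-select with a two-stage branch metric
                    cand = metric[p] + sum(a != b for a, b in zip(o1 + o2, r))
                    if cand < new_metric[nxt]:
                        new_metric[nxt] = cand
                        pred[nxt] = p
        metric = new_metric
        survivors.append(pred)
    # trace back from state 0 if the encoder was flushed, else from the best state
    state = 0 if terminated else min(range(NUM_STATES), key=metric.__getitem__)
    decoded: List[int] = []
    for pred in reversed(survivors):
        # for K = 3, the state reached after inputs u1, u2 is (u2 << 1) | u1
        decoded.extend([state >> 1, state & 1])  # u2, u1 (reverse order)
        state = pred[state]
    decoded.reverse()
    return decoded
```

For example, `viterbi_radix4(encode([1, 0, 1, 1, 0, 1, 0, 0]), 8)` recovers the message; the two trailing zeros flush the encoder back to state 0 so trace-back can start there.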

Massive Fluid Simulation Using a Responsive Interaction Between Surface and Wave Foams (수면거품과 웨이브거품의 미세한 상호작용을 이용한 대규모 유체 시뮬레이션)

  • Kim, Jong-Hyun
    • Journal of the Korea Computer Graphics Society / v.23 no.2 / pp.29-39 / 2017
  • This paper presents a unified framework to efficiently and realistically simulate surface and wave foams. The framework first projects 3D water particles from an underlying water solver onto 2D screen space in order to reduce the computational complexity of determining where foam particles should be generated. Because foam effects are often created primarily in fast and complicated water flows, we analyze acceleration and curvature values to identify the areas exhibiting such flow patterns. Foam particles are emitted from the identified areas in 3D space, and each foam particle is advected according to its type, which is classified on the basis of velocity, thereby capturing the essential characteristics of foam wave motions. We improve the realism of the resulting foam by classifying it into two types: surface foam and wave foam. Wave foam is characterized by the sharp wave patterns of torrential flows, and surface foam is characterized by a cloudy foam shape even in water with reduced motion. Based on these features, we propose a technique to correct the velocity and position of a foam particle. In addition, we propose a kernel technique using the screen-space density to efficiently reduce redundant foam particles, improving overall memory efficiency without loss of visual detail in the foam effects. Experiments convincingly demonstrate that the proposed approach is efficient and easy to use while delivering high-quality results.
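Two steps of the pipeline above, velocity-based foam classification and screen-space thinning of redundant particles, can be sketched as follows. Assumptions: the speed threshold is a hypothetical parameter, and the paper's kernel-based screen-space density is replaced here by a simpler per-pixel-cell count cap, so this illustrates the idea rather than the paper's technique.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def classify_foam(velocity: Sequence[float], speed_threshold: float) -> str:
    """Fast particles become wave foam, slow ones surface foam (hypothetical threshold)."""
    speed = math.sqrt(sum(v * v for v in velocity))
    return "wave" if speed >= speed_threshold else "surface"


def cull_by_screen_density(screen_pos: List[Tuple[float, float]],
                           cell_size: float, max_per_cell: int) -> List[int]:
    """Keep at most max_per_cell particles per screen cell; return kept indices.

    Swapped-in simplification: a grid count stands in for the paper's density kernel.
    """
    counts: Dict[Tuple[int, int], int] = defaultdict(int)
    kept: List[int] = []
    for idx, (x, y) in enumerate(screen_pos):
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        if counts[cell] < max_per_cell:
            counts[cell] += 1
            kept.append(idx)
    return kept
```

Thinning in screen space rather than 3D is cheap because it only needs each particle's projected position, and particles hidden in dense clusters contribute little visible detail.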

A Study on GPU-based Iterative ML-EM Reconstruction Algorithm for Emission Computed Tomographic Imaging Systems (방출단층촬영 시스템을 위한 GPU 기반 반복적 기댓값 최대화 재구성 알고리즘 연구)

  • Ha, Woo-Seok;Kim, Soo-Mee;Park, Min-Jae;Lee, Dong-Soo;Lee, Jae-Sung
    • Nuclear Medicine and Molecular Imaging / v.43 no.5 / pp.459-467 / 2009
  • Purpose: Maximum likelihood-expectation maximization (ML-EM) is a statistical reconstruction algorithm derived from a probabilistic model of the emission and detection processes. Although ML-EM has many advantages in accuracy and utility, its use is limited by the computational burden of iterative processing on a CPU (central processing unit). In this study, we developed a parallel computing technique for the ML-EM algorithm on a GPU (graphics processing unit). Materials and Methods: Using a GeForce 9800 GTX+ graphics card and NVIDIA's CUDA (Compute Unified Device Architecture), the projection and backprojection steps of the ML-EM algorithm were parallelized. The computation times for projection, for the error between measured and estimated data, and for backprojection in each iteration were measured. The total time included the latency of data transfer between RAM and GPU memory. Results: The total computation times of the CPU- and GPU-based ML-EM with 32 iterations were 3.83 and 0.26 sec, respectively, so the GPU was about 15 times faster. When the number of iterations was increased to 1024, the CPU- and GPU-based computations took 18 min and 8 sec in total, respectively. The speedup was about 135 times, owing to delays in the CPU-based computation after a certain number of iterations. In contrast, the GPU-based computation showed very little variation in time per iteration owing to the use of shared memory. Conclusion: GPU-based parallel computation significantly improved the speed and stability of ML-EM. The developed GPU-based ML-EM algorithm could easily be modified for other imaging geometries.
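The standard ML-EM update multiplies each voxel value by the backprojected ratio of measured to estimated counts, normalized by the voxel's sensitivity: x_j ← (x_j / s_j) · Σ_i a_ij · y_i / (Σ_l a_il x_l), with s_j = Σ_i a_ij. The serial sketch below shows the three per-iteration steps whose timings the study measured (projection, measured/estimated ratio, backprojection). The dense system matrix, the uniform initial image, and the tiny test geometry are assumptions for illustration; on the GPU, the projection and backprojection loops are what CUDA threads would parallelize.

```python
from typing import List

Matrix = List[List[float]]


def ml_em(A: Matrix, y: List[float], n_iter: int) -> List[float]:
    """ML-EM with system matrix A (detector bins x voxels) and measured counts y.

    Assumes every voxel is seen by at least one detector bin (s_j > 0).
    """
    n_det, n_vox = len(A), len(A[0])
    sens = [sum(A[i][j] for i in range(n_det)) for j in range(n_vox)]  # s_j = sum_i a_ij
    x = [1.0] * n_vox  # uniform positive initial image
    for _ in range(n_iter):
        # projection: expected counts for the current image estimate
        proj = [sum(A[i][j] * x[j] for j in range(n_vox)) for i in range(n_det)]
        # ratio of measured to estimated data
        ratio = [y[i] / proj[i] if proj[i] > 0 else 0.0 for i in range(n_det)]
        # backprojection of the ratio, normalized by sensitivity
        x = [x[j] * sum(A[i][j] * ratio[i] for i in range(n_det)) / sens[j]
             for j in range(n_vox)]
    return x
```

With consistent data, for instance `A = [[1.0, 0.5], [0.5, 1.0], [1.0, 1.0]]` and `y = [3.5, 4.0, 5.0]`, the iterates approach the image `[2.0, 3.0]`, and each update preserves the total sensitivity-weighted count Σ_j s_j x_j = Σ_i y_i.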