• Title/Summary/Keyword: Efficient Memory


A Study on Efficient Natural Language Processing Method based on Transformer (트랜스포머 기반 효율적인 자연어 처리 방안 연구)

  • Seung-Cheol Lim;Sung-Gu Youn
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.23 no.4 / pp.115-119 / 2023
  • The natural language processing models used in current artificial intelligence are huge, which makes it difficult to process and analyze data in real time. To address this, we propose a method that improves processing efficiency by using less memory, and we evaluate the performance of the proposed model. The technique reduces the number of attention heads and the embedding size of the BERT[1] model, splits the large corpus into segments, and computes the result by averaging the output values of the forward passes. In this process, a random offset was assigned to the sentences at every epoch to provide diversity in the input data. The model was then fine-tuned for classification. We found that the split-processing model was about 12% less accurate than the unsplit model, but the number of parameters in the model was reduced by 56%.
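
The split-and-average step described in this abstract might look roughly like the sketch below. It is not the authors' code: `split_with_offset`, `average_outputs`, and the `forward` callable (a stand-in for the reduced BERT model) are hypothetical names, and dropping the first `offset` tokens is only one assumed way to apply a random offset.

```python
import random


def split_with_offset(tokens, chunk_len, rng):
    """Split a long token list into chunks, starting at a random offset.

    Assumption: the offset simply skips up to chunk_len - 1 leading tokens,
    so each epoch sees differently aligned chunks.
    """
    offset = rng.randrange(chunk_len) if len(tokens) > chunk_len else 0
    body = tokens[offset:]
    return [body[i:i + chunk_len] for i in range(0, len(body), chunk_len)]


def average_outputs(chunks, forward):
    """Run one forward pass per chunk and average the output vectors."""
    outputs = [forward(chunk) for chunk in chunks]
    dim = len(outputs[0])
    return [sum(out[d] for out in outputs) / len(outputs) for d in range(dim)]


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = list(range(10))
    # Toy "model": returns [sum of token ids, chunk length] as its output.
    toy_forward = lambda chunk: [float(sum(chunk)), float(len(chunk))]
    chunks = split_with_offset(tokens, 4, rng)
    print(chunks, average_outputs(chunks, toy_forward))
```

Because each chunk is processed independently, peak memory depends on `chunk_len` rather than on the full input length, which is the trade-off the abstract reports against accuracy.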

Booting Process Profiling Tool for Baseboard Management Controllers (베이스보드 매니지먼트 컨트롤러를 위한 부팅 과정 프로파일링 도구)

  • Jaeseop Kim;Minho Park;Jiman Hong
    • Smart Media Journal / v.11 no.11 / pp.84-91 / 2022
  • A Baseboard Management Controller (BMC) supports server monitoring, maintenance, and control functions using various communication interfaces. However, if an unexpected problem occurs during the device driver initialization process, the BMC may not operate normally. Therefore, a boot process profiling tool that accurately analyzes the device driver initialization process and lets developers check the analysis result is essential. Existing boot process profiling tools do not specifically report the device driver initialization process and results required for BMC boot process analysis, forcing developers to combine several tools to analyze the boot process in detail. In this paper, we propose an integrated profiling tool for the BMC booting process. The proposed tool provides device driver initialization process analysis, CPU and memory usage analysis, and kernel version management functions. Users can easily analyze the booting process using the proposed tool, and the analysis result can be used to shorten the booting time. The proposed tool is implemented on a Linux-based BMC, and it is shown to be more efficient than the existing profiling tool.
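
As a rough illustration of device driver initialization analysis on Linux, the sketch below parses the lines the kernel logs when booted with the `initcall_debug` parameter. This is an assumption about one possible data source, not a description of the proposed tool; the helper names and the sample log lines (including the driver names) are made up for illustration.

```python
import re

# Matches lines such as:
#   initcall example_init+0x0/0x58 returned 0 after 152 usecs
INITCALL_RE = re.compile(
    r"initcall (\S+?)(?:\+0x[0-9a-f]+/0x[0-9a-f]+)?(?: \[\w+\])?"
    r" returned (-?\d+) after (\d+) usecs"
)


def parse_initcalls(log_text):
    """Return (function, return_code, microseconds) for each initcall line."""
    results = []
    for line in log_text.splitlines():
        match = INITCALL_RE.search(line)
        if match:
            results.append((match.group(1), int(match.group(2)), int(match.group(3))))
    return results


def slowest(results, n=3):
    """Return the n initcalls that took the longest."""
    return sorted(results, key=lambda r: r[2], reverse=True)[:n]


# Illustrative sample only; these driver names are hypothetical.
SAMPLE_LOG = """\
[    0.412345] initcall example_i2c_init+0x0/0x58 returned 0 after 152 usecs
[    0.520001] initcall example_wdt_init+0x0/0x20 returned 0 after 9766 usecs
[    0.530000] initcall example_net_init+0x0/0x20 returned -19 after 31 usecs
"""

if __name__ == "__main__":
    calls = parse_initcalls(SAMPLE_LOG)
    print("slowest:", slowest(calls, 1))
    print("failed:", [c for c in calls if c[1] != 0])
```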

Federated Deep Reinforcement Learning Based on Privacy Preserving for Industrial Internet of Things (산업용 사물 인터넷을 위한 프라이버시 보존 연합학습 기반 심층 강화학습 모델)

  • Chae-Rim Han;Sun-Jin Lee;Il-Gu Lee
    • Journal of the Korea Institute of Information Security & Cryptology / v.33 no.6 / pp.1055-1065 / 2023
  • Recently, various studies using deep reinforcement learning (deep RL) have been conducted to solve complex problems using big data collected in the Industrial Internet of Things. Deep RL uses reinforcement learning's trial-and-error algorithms and cumulative reward functions to generate and learn from its own data and to quickly explore neural network structures and parameter decisions. However, studies so far have shown that the larger the training data, the higher the memory usage and search time and the lower the accuracy. In this study, model-agnostic learning for efficient federated deep RL was used to address privacy invasion, increasing robustness by 55.9% and achieving 97.8% accuracy, an improvement of 5.5% over the optimization-based meta-learning models used for comparison, while reducing the delay time by 28.9% on average.

MAGICal Synthesis: Memory-Efficient Approach for Generative Semiconductor Package Image Construction (MAGICal Synthesis: 반도체 패키지 이미지 생성을 위한 메모리 효율적 접근법)

  • Yunbin Chang;Wonyong Choi;Keejun Han
    • Journal of the Microelectronics and Packaging Society / v.30 no.4 / pp.69-78 / 2023
  • With the rapid growth of artificial intelligence, the demand for semiconductors is increasing enormously everywhere. To ensure manufacturing quality and quantity simultaneously, automatic defect detection during the packaging process has been revisited by adapting various deep learning-based methodologies to automatic packaging defect inspection. Deep learning (DL) models require a large amount of data for training, but because security is important in the semiconductor industry, sharing and labeling the relevant data is challenging, which makes model training difficult. In this study, we propose a new framework for securing sufficient data for DL models with fewer computing resources through a divide-and-conquer approach. The proposed method divides high-resolution images into pre-defined sub-regions and assigns conditional labels to each region, then trains on individual sub-regions and boundaries with a boundary loss that induces globally coherent and seamless images. Afterwards, the full-size image is reconstructed by combining the divided sub-regions. The experimental results show that the images obtained through this research have high efficiency, consistency, quality, and generality.
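
The divide-and-recombine step can be sketched as follows. This is only a minimal illustration using nested lists as images: `split_tiles` and `merge_tiles` are hypothetical names, the `(row, col)` key stands in for the conditional label of each sub-region, and the generative training and boundary loss are not modeled.

```python
def split_tiles(image, tile_h, tile_w):
    """Divide a 2-D image (list of rows) into tiles keyed by (row, col) index."""
    height, width = len(image), len(image[0])
    tiles = {}
    for top in range(0, height, tile_h):
        for left in range(0, width, tile_w):
            key = (top // tile_h, left // tile_w)
            tiles[key] = [row[left:left + tile_w] for row in image[top:top + tile_h]]
    return tiles


def merge_tiles(tiles):
    """Reassemble the full-size image from tiles keyed by (row, col) index."""
    n_rows = 1 + max(r for r, _ in tiles)
    n_cols = 1 + max(c for _, c in tiles)
    image = []
    for r in range(n_rows):
        for y in range(len(tiles[(r, 0)])):
            row = []
            for c in range(n_cols):
                row.extend(tiles[(r, c)][y])
            image.append(row)
    return image


if __name__ == "__main__":
    image = [[10 * y + x for x in range(6)] for y in range(4)]
    tiles = split_tiles(image, 2, 3)
    print(len(tiles), merge_tiles(tiles) == image)
```

Processing one tile at a time bounds memory by the tile size instead of the full image resolution.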

Cycle Detection of Discrete Logarithm using an Array (배열을 이용한 이산대수의 사이클 검출)

  • Sang-Un Lee
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.23 no.5 / pp.15-20 / 2023
  • Until now, Pollard's Rho algorithm has been known as the most efficient method for solving discrete logarithm problems to decrypt symmetric keys. However, research continues on how to further reduce its $O(\sqrt{p})$ complexity and on its disadvantage of having to store the giant stride $m=\lceil\sqrt{p}\rceil$ data. This paper proposes an array method for cycle detection in discrete logarithms. The proposed method reduces the number of stack memory updates by at least 73% by updating the array only when $(x_i < 0.5x_{i-1}) \cap (x_i < 0.5(p-1))$. The proposed array method performs the same number of modular calculations as the stack method, but significantly reduces the number of updates, and it reduces the execution time for the array through the use of binary search.
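
The update condition can be written as a small predicate, with binary search used to keep the stored values sorted. The sketch below only illustrates the filter and the sorted insertion; the abstract does not specify how the stored array is then used to detect the cycle, so the function names and the choice of test sequence (powers of 5 modulo 23) are assumptions.

```python
from bisect import bisect_left


def should_update(x_i, x_prev, p):
    """Condition from the abstract: x_i < 0.5*x_{i-1} and x_i < 0.5*(p-1)."""
    return x_i < 0.5 * x_prev and x_i < 0.5 * (p - 1)


def record(sequence, p):
    """Store only values passing the filter, in a sorted array.

    Returns the sorted array and the number of array updates performed.
    """
    stored = []
    updates = 0
    for x_prev, x_i in zip(sequence, sequence[1:]):
        if not should_update(x_i, x_prev, p):
            continue
        pos = bisect_left(stored, x_i)  # binary search for the slot
        if pos < len(stored) and stored[pos] == x_i:
            continue  # value already stored
        stored.insert(pos, x_i)
        updates += 1
    return stored, updates


if __name__ == "__main__":
    p = 23
    seq = [pow(5, i, p) for i in range(1, 24)]  # 5 is a primitive root mod 23
    stored, updates = record(seq, p)
    print(stored, updates, "of", len(seq) - 1, "steps")
```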

The Analysis of Efficient Disk Buffer Management Policies to Develop Undesignated Cultural Heritage Management and Real-time Theft Chase (실시간 비지정 문화재 관리 및 도난 추적 시스템 개발을 위한 효율적인 디스크 버퍼 관리 정책 분석)

  • Jun-Hyeong Choi;Sang-Ho Hwang;SeungMan Chun
    • The Journal of the Korea institute of electronic communication sciences / v.18 no.6 / pp.1299-1306 / 2023
  • In this paper, we present a system for undesignated cultural heritage management and real-time theft tracking that uses flash-based large-capacity storage. The proposed system is composed of three parts: a cultural heritage management device, a flash-based server, and a monitoring service for managing cultural heritage and tracking thefts using IoT technologies. However, flash-based storage has a limited lifespan that must be mitigated. Therefore, we present a system that uses a disk buffer in front of the flash-based storage to overcome this disadvantage, and we evaluate the system performance in various environments. In our experiments, the LRU policy reduces the number of direct writes to the flash-based storage by 10.7% on average compared with CLOCK and FCFS.
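
To show why the buffer replacement policy affects the number of flash writes, here is a minimal sketch of a write-back buffer under LRU and FCFS eviction. It assumes that a page is written to flash only when it is evicted or at a final flush; the function name, the trace, and the buffer size are illustrative and are not taken from the paper's experiments (CLOCK is omitted for brevity).

```python
from collections import OrderedDict


def flash_writes(trace, capacity, policy="lru"):
    """Count flash writes for a sequence of page writes through a buffer.

    policy="lru": a buffer hit moves the page to the most-recent position.
    policy="fcfs": pages are evicted in arrival order regardless of hits.
    """
    buffer = OrderedDict()
    writes = 0
    for page in trace:
        if page in buffer:
            if policy == "lru":
                buffer.move_to_end(page)
            continue  # write absorbed by the buffer
        if len(buffer) >= capacity:
            buffer.popitem(last=False)  # evict the oldest entry to flash
            writes += 1
        buffer[page] = True
    return writes + len(buffer)  # remaining pages are flushed at the end


if __name__ == "__main__":
    trace = [1, 2, 1, 3, 1, 4, 1, 5]  # page 1 is rewritten frequently
    print("LRU :", flash_writes(trace, 2, "lru"))
    print("FCFS:", flash_writes(trace, 2, "fcfs"))
```

On this toy trace LRU keeps the frequently rewritten page in the buffer, so fewer of its updates reach the flash device.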

Enhancing Alzheimer's Disease Classification using 3D Convolutional Neural Network and Multilayer Perceptron Model with Attention Network

  • Enoch A. Frimpong;Zhiguang Qin;Regina E. Turkson;Bernard M. Cobbinah;Edward Y. Baagyere;Edwin K. Tenagyei
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.11 / pp.2924-2944 / 2023
  • Alzheimer's disease (AD) is a neurological condition that is recognized as one of the primary causes of memory loss. AD currently has no cure. Therefore, it is essential to develop an efficient, high-precision model for timely detection of the disease. When AD is detected early, treatment is most likely to be successful. The most often utilized indicators for AD identification are the Mini-mental state examination (MMSE), and the clinical dementia. However, using these indicators as ground truth could be imprecise for AD detection. Researchers have proposed several computer-aided frameworks, and lately supervised models are the most widely used. In this study, we propose a novel 3D Convolutional Neural Network Multilayer Perceptron (3D CNN-MLP) based model for AD classification. The model uses an attention mechanism to automatically extract relevant features from Magnetic Resonance Images (MRI) and generate probability maps, which serve as input to the MLP classifier. Three MRI scan categories were considered, namely AD dementia patients, Mild Cognitive Impairment (MCI) patients, and Normal Control (NC) or healthy patients. The performance of the model is assessed by comparison with basic CNN, VGG16, and DenseNet models and other state-of-the-art works. The models were adjusted to fit the 3D images before the comparison was done. Our model exhibited excellent classification performance, with an accuracy of 91.27% for AD and NC, 80.85% for MCI and NC, and 87.34% for AD and MCI.

Strategic construction of mRNA vaccine derived from conserved and experimentally validated epitopes of avian influenza type A virus: a reverse vaccinology approach

  • Leana Rich Herrera-Ong
    • Clinical and Experimental Vaccine Research / v.12 no.2 / pp.156-171 / 2023
  • Purpose: The development of vaccines that confer protection against multiple avian influenza A (AIA) virus strains is necessary to prevent the emergence of highly infectious strains that may result in more severe outbreaks. Thus, this study applied a reverse vaccinology approach to strategically construct a messenger RNA (mRNA) vaccine against avian influenza A (mVAIA) that induces cross-protection while targeting diverse AIA virulence factors. Materials and Methods: Immunoinformatics tools and databases were utilized to identify conserved, experimentally validated AIA epitopes. CD8+ epitopes were docked with dominant chicken major histocompatibility complexes (MHCs) to evaluate complex formation. Conserved epitopes were adjoined in the optimized mVAIA sequence for efficient expression in Gallus gallus. A signal sequence for targeted secretory expression was included. Physicochemical properties, antigenicity, toxicity, and potential cross-reactivity were assessed. The tertiary structure of its protein sequence was modeled and validated in silico to investigate the accessibility of the adjoined B-cell epitope. Potential immune responses were also simulated in C-ImmSim. Results: Eighteen experimentally validated epitopes were found to be conserved (Shannon index <2.0) in the study. These include one B-cell (SLLTEVETPIRNEWGCR) and 17 CD8+ epitopes, adjoined in a single mRNA construct. The CD8+ epitopes docked favorably with the MHC peptide-binding groove, which was further supported by the acceptable ∆Gbind (-28.45 to -40.59 kJ/mol) and Kd (<1.00) values. The incorporated Sec/SPI (secretory/signal peptidase I) cleavage site was also recognized with a high probability (0.964814). The adjoined B-cell epitope was found within the disordered and accessible regions of the vaccine. Immune simulation results projected cytokine production, lymphocyte activation, and memory cell generation after the 1st dose of mVAIA. Conclusion: Results suggest that mVAIA possesses stability, safety, and immunogenicity. In vitro and in vivo confirmation in subsequent studies is anticipated.
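
The conservation criterion (Shannon index <2.0) can be illustrated with a per-position entropy calculation over aligned sequences. The sketch below assumes base-2 entropy computed column by column; the abstract does not state how the index was computed, and the function names and toy sequences are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(column):
    """Base-2 Shannon entropy of the residues observed at one position."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def conserved_positions(aligned, threshold=2.0):
    """Indices of alignment columns whose entropy is below the threshold."""
    return [
        i for i, column in enumerate(zip(*aligned))
        if shannon_entropy(column) < threshold
    ]


if __name__ == "__main__":
    # Toy alignment: position 0 is invariant, position 1 is fully variable.
    aligned = ["AA", "AC", "AG", "AT"]
    print(conserved_positions(aligned))
```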

An Efficient Matrix Multiplier Available in Multi-Head Attention and Feed-Forward Network of Transformer Algorithms (트랜스포머 알고리즘의 멀티 헤드 어텐션과 피드포워드 네트워크에서 활용 가능한 효율적인 행렬 곱셈기)

  • Seok-Woo Chang;Dong-Sun Kim
    • Journal of IKEEE / v.28 no.1 / pp.53-64 / 2024
  • With the advancement of natural language processing (NLP) models, conversational AI such as ChatGPT is becoming increasingly popular. To enhance processing speed and reduce power consumption, it is important to implement the Transformer algorithm, which forms the basis of the latest NLP models, in hardware. In particular, the multi-head attention and feed-forward network, which analyze the relationships between different words in a sentence through matrix multiplication, are the most computationally intensive core algorithms in the Transformer. In this paper, we propose a new systolic array whose configuration varies with the number of input words to speed up matrix multiplication. Quantization maintains Transformer accuracy while improving memory efficiency and speed. For evaluation, this paper verifies the clock cycles required by the multi-head attention and feed-forward network and compares the performance with other multipliers.
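
To show how clock cycles can be counted for a systolic matrix multiplier, the sketch below simulates a generic output-stationary systolic array with skewed inputs. It is a textbook-style model, not the variable array proposed in the paper; the function name and the timing model (one multiply-accumulate per processing element per cycle) are assumptions.

```python
def systolic_matmul(a, b):
    """Simulate C = A x B on an n-by-m output-stationary systolic array.

    A is n x k, B is k x m. With skewed inputs, PE(i, j) sees operand
    pair number s = t - i - j at cycle t, so the last PE finishes after
    n + m + k - 2 cycles.
    """
    n, k, m = len(a), len(b), len(b[0])
    acc = [[0] * m for _ in range(n)]
    cycles = n + m + k - 2
    for t in range(cycles):
        for i in range(n):
            for j in range(m):
                s = t - i - j
                if 0 <= s < k:  # this PE has valid operands in cycle t
                    acc[i][j] += a[i][s] * b[s][j]
    return acc, cycles


if __name__ == "__main__":
    # Rows of A could correspond to input words, so n grows with sentence length.
    product, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(product, cycles)
```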

An Efficient Array Algorithm for VLSI Implementation of Vector-radix 2-D Fast Discrete Cosine Transform (Vector-radix 2차원 고속 DCT의 VLSI 구현을 위한 효율적인 어레이 알고리듬)

  • 신경욱;전흥우;강용섬
    • The Journal of Korean Institute of Communications and Information Sciences / v.18 no.12 / pp.1970-1982 / 1993
  • This paper describes an efficient array algorithm for parallel computation of the vector-radix two-dimensional (2-D) fast discrete cosine transform (VR-FCT), and its VLSI implementation. By mapping the 2-D VR-FCT onto a 2-D array of processing elements (PEs), the butterfly structure of the VR-FCT can be efficiently implemented with high concurrency and local communication geometry. The proposed array algorithm features architectural modularity, regularity, and locality, so it is well suited to VLSI realization. Also, no transposition memory is required, which is inevitable in the conventional row-column decomposition approach. It has a time complexity of $O(N + N_{nzd} \cdot \log_2 N)$ for an $(N \times N)$ 2-D DCT, where $N_{nzd}$ is the number of non-zero digits in the canonic-signed digit (CSD) code. By adopting CSD arithmetic in the circuit design, the number of additions is reduced by about 30% compared with 2's complement arithmetic. A computational accuracy analysis for finite-wordlength processing is presented. From the simulation results, it is estimated that an $(8 \times 8)$ 2-D DCT (with $N_{nzd}=4$) can be computed in about 0.88 μs at a 50 MHz clock frequency, resulting in a throughput of about 72 megapixels per second.
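
The saving from CSD arithmetic comes from representing a constant with fewer non-zero digits, since each non-zero digit costs one add or subtract in a shift-and-add multiplier. The sketch below converts a non-negative integer to CSD form using the standard non-adjacent-form recoding; it is an illustration of the number system, not the paper's circuit, and the function names are hypothetical.

```python
def to_csd(n):
    """Return the CSD digits of a non-negative integer, least significant first.

    Each digit is -1, 0, or 1, and no two adjacent digits are both non-zero.
    """
    digits = []
    while n != 0:
        if n & 1:
            digit = 2 - (n % 4)  # +1 if n % 4 == 1, -1 if n % 4 == 3
            n -= digit
        else:
            digit = 0
        digits.append(digit)
        n //= 2
    return digits


def nonzero_digits(n):
    """Number of non-zero CSD digits, i.e. add/subtract terms needed."""
    return sum(1 for d in to_csd(n) if d != 0)


if __name__ == "__main__":
    # 7 = 8 - 1: two non-zero CSD digits versus three ones in binary.
    print(to_csd(7), nonzero_digits(7), bin(7).count("1"))
```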
