• Title/Summary/Keyword: 비병렬 데이터

Search Result 305, Processing Time 0.035 seconds

Bias and Gate-Length Dependent Data Extraction of Substrate Circuit Parameters for Deep Submicron MOSFETs (Deep Submicron MOSFET 기판회로 파라미터의 바이어스 및 게이트 길이 종속 데이터 추출)

  • Lee Yongtaek;Choi Munsung;Ku Janam;Lee Seonghearn
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.41 no.12
    • /
    • pp.27-34
    • /
    • 2004
  • The study on the RF substrate circuit is necessary to model RF output characteristics of deep submicron MOSFETs below 0.2$\mum$ gate length that have bun commercialized by the recent development of Si submicron process. In this paper, direct extraction methods are developed to apply for a simple substrate resistance model as well as another substrate model with connecting resistance and capacitance in parallel. Using these extraction methods, better agreement with measured Y22-parameter up to 30 GHz is achieved for 0.15$\mum$ CMOS device by using the parallel RC substrate model rather than the simple resistance one, demonstrating the RF accuracy of the parallel model and extraction technique. Using this model, bias and gate length dependent curves of substrate parameters in the RF region are obtained by increasing drain voltage of 0 to 1.2V at deep submicron devices with various gate lengths of 0.11 to 0.5㎛ These new extraction data will greatly contribute to developing a scalable RF nonlinear substrate model.

Efficient Collaboration Method Between CPU and GPU for Generating All Possible Cases in Combination (조합에서 모든 경우의 수를 만들기 위한 CPU와 GPU의 효율적 협업 방법)

  • Son, Ki-Bong;Son, Min-Young;Kim, Young-Hak
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.7 no.9
    • /
    • pp.219-226
    • /
    • 2018
  • One of the systematic ways to generate the number of all cases is a combination to construct a combination tree, and its time complexity is O($2^n$). A combination tree is used for various purposes such as the graph homogeneity problem, the initial model for calculating frequent item sets, and so on. However, algorithms that must search the number of all cases of a combination are difficult to use realistically due to high time complexity. Nevertheless, as the amount of data becomes large and various studies are being carried out to utilize the data, the number of cases of searching all cases is increasing. Recently, as the GPU environment becomes popular and can be easily accessed, various attempts have been made to reduce time by parallelizing algorithms having high time complexity in a serial environment. Because the method of generating the number of all cases in combination is sequential and the size of sub-task is biased, it is not suitable for parallel implementation. The efficiency of parallel algorithms can be maximized when all threads have tasks with similar size. In this paper, we propose a method to efficiently collaborate between CPU and GPU to parallelize the problem of finding the number of all cases. In order to evaluate the performance of the proposed algorithm, we analyze the time complexity in the theoretical aspect, and compare the experimental time of the proposed algorithm with other algorithms in CPU and GPU environment. Experimental results show that the proposed CPU and GPU collaboration algorithm maintains a balance between the execution time of the CPU and GPU compared to the previous algorithms, and the execution time is improved remarkable as the number of elements increases.

Image Pattern Classification and Recognition by Using the Associative Memory with Cellular Neural Networks (셀룰라 신경회로망의 연상메모리를 이용한 영상 패턴의 분류 및 인식방법)

  • Shin, Yoon-Cheol;Park, Yong-Hun;Kang, Hoon
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.13 no.2
    • /
    • pp.154-162
    • /
    • 2003
  • In this paper, Associative Memory with Cellular Neural Networks classifies and recognizes image patterns as an operator applied to image process. CNN processes nonlinear data in real-time like neural networks, and made by cell which communicates with each other directly through its neighbor cells as the Cellular Automata does. It is applied to the optimization problem, associative memory, pattern recognition, and computer vision. Image processing with CNN is appropriate to 2-D images, because each cell which corresponds to each pixel in the image is simultaneously processed in parallel. This paper shows the method for designing the structure of associative memory based on CNN and getting output image by choosing the most appropriate weight pattern among the whole learned weight pattern memories. Each template represents weight values between cells and updates them by learning. Hebbian rule is used for learning template weights and LMS algorithm is used for classification.

GLSL based Additional Learning Nearest Neighbor Algorithm suitable for Locating Unpaved Road (추가 학습이 빈번히 필요한 비포장도로에서 주행로 탐색에 적합한 GLSL 기반 ALNN Algorithm)

  • Ku, Bon Woo;Kim, Jun kyum;Rhee, Eun Joo
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.12 no.1
    • /
    • pp.29-36
    • /
    • 2019
  • Unmanned Autonomous Vehicle's driving road in the national defense includes not only paved roads, but also unpaved roads which have rough and unexpected changes. This Unmanned Autonomous Vehicles monitor and recon rugged or remote areas, and defend own position, they frequently encounter environments roads of various and unpredictable. Thus, they need additional learning to drive in this environment, we propose a Additional Learning Nearest Neighbor (ALNN) which is modified from Approximate Nearest Neighbor to allow for quick learning while avoiding the 'Forgetting' problem. In addition, since the Execution speed of the ALNN algorithm decreases as the learning data accumulates, we also propose a solution to this problem using GPU parallel processing based on OpenGL Shader Language. The ALNN based on GPU algorithm can be used in the field of national defense and other similar fields, which require frequent and quick application of additional learning in real-time without affecting the existing learning data.

Many-to-many voice conversion experiments using a Korean speech corpus (다수 화자 한국어 음성 변환 실험)

  • Yook, Dongsuk;Seo, HyungJin;Ko, Bonggu;Yoo, In-Chul
    • The Journal of the Acoustical Society of Korea
    • /
    • v.41 no.3
    • /
    • pp.351-358
    • /
    • 2022
  • Recently, Generative Adversarial Networks (GAN) and Variational AutoEncoders (VAE) have been applied to voice conversion that can make use of non-parallel training data. Especially, Conditional Cycle-Consistent Generative Adversarial Networks (CC-GAN) and Cycle-Consistent Variational AutoEncoders (CycleVAE) show promising results in many-to-many voice conversion among multiple speakers. However, the number of speakers has been relatively small in the conventional voice conversion studies using the CC-GANs and the CycleVAEs. In this paper, we extend the number of speakers to 100, and analyze the performances of the many-to-many voice conversion methods experimentally. It has been found through the experiments that the CC-GAN shows 4.5 % less Mel-Cepstral Distortion (MCD) for a small number of speakers, whereas the CycleVAE shows 12.7 % less MCD in a limited training time for a large number of speakers.

Analysis on the GPU Performance according to Hierarchical Memory Organization (계층적 메모리 구성에 따른 GPU 성능 분석)

  • Choi, Hongjun;Kim, Jongmyon;Kim, Cheolhong
    • The Journal of the Korea Contents Association
    • /
    • v.14 no.3
    • /
    • pp.22-32
    • /
    • 2014
  • Recently, GPGPU has been widely used for general-purpose processing as well as graphics processing by providing optimized hardware for parallel processing. Memory system has big effects on the performance of parallel processing units such as GPU. In the GPU, hierarchical memory architecture is implemented for high memory bandwidth. Moreover, both memory address coalescing and memory request merging techniques are widely used. This paper analyzes the GPU performance according to various memory organizations. According to our simulation results, GPU performance improves by 15.5%, 21.5%, 25.5%, 30.9% as adding 8KB L1, 16KB L1, 32KB L1, 64KB L1 cache, respectively, compared to case without L1 cache. However, experimental results show that some benchmarks decrease performance since memory transaction increases due to data dependency. Moreover, average memory access latency is increased as the depth of hierarchical cache level increases when cache miss occurs significantly.

Slotted Transmissions using Frame aggregation: A MAC protocol for Capacity Enhancement in Ad-hoc Wireless LANs (프레임 집합화를 이용한 애드-혹 무선 랜의 성능 향상을 위한 MAC 프로토콜)

  • Rahman, Md. Mustafizur;Hong, Choong-Seon
    • Journal of the Institute of Electronics Engineers of Korea TC
    • /
    • v.44 no.8
    • /
    • pp.33-41
    • /
    • 2007
  • The IEEE 802.11 DCF channel access function allows single transmission inside two-hop network in order to avoid collisions and eliminate the hidden and exposed terminal problems. Singular transmission capability causes data frames waiting for the entire roundtrip time in the transmitter neighborhood, and results in increased frame latency and lower network throughput. Real-time and pervasive applications are severely affected for the lower medium utilization; especially with high network traffic. This work proposes a new scheme with the help of Frame Aggregation technique in IEEE802.11n and overcomes the single transmission barrier maintaining the basic DCF functionality. Proposed scheme allows parallel transmissions in non-interfering synchronized slots. Parallel transmissions bypass the conventional physical carrier sense and random Backoff time for several cases and reduce the frame latency and increase the medium utilization and network capacity.

Implementation of Pedestrian Detection and Tracking with GPU at Night-time (GPU를 이용한 야간 보행자 검출과 추적 시스템 구현)

  • Choi, Beom-Joon;Yoon, Byung-Woo;Song, Jong-Kwan;Park, Jangsik
    • Journal of Broadcast Engineering
    • /
    • v.20 no.3
    • /
    • pp.421-429
    • /
    • 2015
  • This paper is about an approach for pedestrian detection and tracking with infrared imagery. We used the CUDA(Computer Unified Device Architecture) that is a parallel processing language in order to improve the speed of video-based pedestrian detection and tracking. The detection phase is performed by Adaboost algorithm based on Haar-like features. Adaboost classifier is trained with datasets generated from infrared images. After detecting the pedestrian with the Adaboost classifier, we proposed a particle filter tracking strategies on HSV histogram feature that exploit adaptively at the same time. The proposed approach is implemented on an NVIDIA Jetson TK1 developer board that is full-featured device ideal for software development within the Linux environment. In this paper, we presented the results of parallel processing with the NVIDIA GPU on the CUDA development environment for detection and tracking of pedestrians. We compared the object detection and tracking processing time for night-time images on both GPU and CPU. The result showed that the detection and tracking speed of the pedestrian with GPU is approximately 6 times faster than that for CPU.

New Parallel MDC FFT Processor for Low Computation Complexity (연산복잡도 감소를 위한 새로운 8-병렬 MDC FFT 프로세서)

  • Kim, Moon Gi;Sunwoo, Myung Hoon
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.52 no.3
    • /
    • pp.75-81
    • /
    • 2015
  • This paper proposed the new eight-parallel MDC FFT processor using the eight-parallel MDC architecture and the efficient scheduling scheme. The proposed FFT processor supports the 256-point FFT based on the modified radix-$2^6$ FFT algorithm. The proposed scheduling scheme can reduce the number of complex multipliers from eight to six without increasing delay buffers and computation cycles. Moreover, the proposed FFT processor can be used in OFDM systems required high throughput and low hardware complexity. The proposed FFT processor has been designed and implemented with a 90nm CMOS technology. The experimental result shows that the area of the proposed FFT processor is $0.27mm^2$. Furthermore, the proposed eight-parallel MDC FFT processor can achieve the throughput rate up to 2.7 GSample/s at 388MHz.

A Hardware Architecture of Hough Transform Using an Improved Voting Scheme (개선된 보팅 정책을 적용한 허프 변환 하드웨어 구조)

  • Lee, Jeong-Rok;Bae, Kyeong-Ryeol;Moon, Byungin
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.38A no.9
    • /
    • pp.773-781
    • /
    • 2013
  • The Hough transform for line detection is widely used in many machine vision applications due to its robustness against data loss and distortion. However, it is not appropriate for real-time embedded vision systems, because it has inefficient computation structure and demands a large number of memory accesses. Thus, this paper proposes an improved voting scheme of the Hough transform, and then applies this scheme to a Hough transform hardware architecture so that it can provide real-time performance with less hardware resource. The proposed voting scheme reduces computation overhead of the voting procedure using correlation between adjacent pixels, and improves computational efficiency by increasing reusability of vote values. The proposed hardware architecture, which adopts this improved scheme, maximizes its throughput by computing and storing vote values for many adjacent pixels in parallel. This parallelization for throughput improvement is accomplished with little hardware overhead compared with sequential computation.