• Title/Summary/Keyword: Pooling

Characteristics of Sputtered Ta films by Statistical Method (통계적 실험 방법에 의한 Ta 박막의 증착 특성 연구)

  • Seo, Yu-Seok; Park, Dae-Gyu; Jeong, Cheol-Mo; Kim, Sang-Beom; Son, Pyeong-Geun; Lee, Seung-Jin; Kim, Han-Min; Yang, Hong-Seon; Park, Jin-Won
    • Korean Journal of Materials Research / v.11 no.6 / pp.492-497 / 2001
  • We report the characteristics of sputter-deposited Ta films and their dependence on process parameters. Properties of the as-deposited Ta films, including deposition rate, resistivity, Rs uniformity, reflectivity, and stress, were investigated and analyzed as functions of the process parameters using a statistical experimental method. The functional relationships between the independent and dependent variables were predicted by response surface analysis. Based on resistivity and Rs uniformity, the optimal deposition condition for DC magnetron sputtered Ta films was a chamber pressure of 2 mTorr, a power density of 8 W/cm$^2$, and a substrate temperature of 20$^{\circ}$C. The goodness of fit of the quadratic model, evaluated by R-square, was 0.85 to 0.9 without pooling. The as-deposited Ta films exhibited a resistivity of ~180 $\mu\Omega\cdot$cm with an Rs uniformity of ~2%. Transmission electron microscopy and x-ray diffractometry identified the phase of the as-deposited film as $\beta$-Ta with a grain size of 100~200.

Optimal Band Selection Techniques for Hyperspectral Image Pixel Classification using Pooling Operations & PSNR (초분광 이미지 픽셀 분류를 위한 풀링 연산과 PSNR을 이용한 최적 밴드 선택 기법)

  • Chang, Duhyeuk; Jung, Byeonghyeon; Heo, Junyoung
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.21 no.5 / pp.141-147 / 2021
  • To make better use of the feature information in large hyperspectral data on embedded systems, this paper reduces the dimensionality of the neural network input, and thus the computational cost, by applying a band selection algorithm to each subset. Of the two families of techniques, feature extraction and feature selection, feature selection is used here; it aims to find the optimal number of bands for a dataset, regardless of wavelength range, while improving on the run time and performance of other algorithms. In the experiments, the time required was reduced by 1/3 to 1/9 compared with other band selection techniques, and performance with the k-nearest-neighbor classifier improved by more than 4%. Although real-time hyperspectral data analysis remains difficult for now, the results confirm the possibility of improvement.

Depth Map Estimation Model Using 3D Feature Volume (3차원 특징볼륨을 이용한 깊이영상 생성 모델)

  • Shin, Soo-Yeon; Kim, Dong-Myung; Suh, Jae-Won
    • The Journal of the Korea Contents Association / v.18 no.11 / pp.447-454 / 2018
  • This paper proposes an algorithm that generates a depth image from stereo images using a deep learning model built from CNNs (convolutional neural networks). The proposed algorithm consists of a feature extraction unit, which extracts the main features of each parallax image, and a depth learning unit, which learns the parallax information from the extracted features. First, the feature extraction unit extracts a feature map for each parallax image through the Xception module and the ASPP (atrous spatial pyramid pooling) module, which are composed of 2D CNN layers. Then the feature maps for each parallax are stacked into a 3D volume according to the disparity, and the depth image is estimated by passing this volume through the depth learning unit, which learns the depth estimation weights with a 3D CNN. The proposed algorithm estimates the depth of object regions more accurately than other algorithms.

COVID-19 Diagnosis from CXR images through pre-trained Deep Visual Embeddings

  • Khalid, Shahzaib; Syed, Muhammad Shehram Shah; Saba, Erum; Pirzada, Nasrullah
    • International Journal of Computer Science & Network Security / v.22 no.5 / pp.175-181 / 2022
  • COVID-19 is an acute respiratory syndrome that affects the host's breathing and respiratory system. The first case of the novel disease was reported in 2019; within months it had created a worldwide state of emergency and was declared a global pandemic. The disease also created elements of socioeconomic crisis globally. The emergency has made it imperative for professionals to take the necessary measures to diagnose the disease early. The conventional diagnosis for COVID-19 is Polymerase Chain Reaction (PCR) testing. However, in many rural communities these tests are unavailable or take a long time to return results. Hence, we propose a COVID-19 classification system based on machine learning and transfer learning models. The proposed approach uses Deep Visual Embeddings (DVE) to identify individuals with COVID-19 and distinguish them from healthy individuals. Five state-of-the-art models (VGG-19, ResNet50, Inceptionv3, MobileNetv3, and EfficientNetB7) were used in this study, along with five different pooling schemes, to perform deep feature extraction. In addition, the features are normalized using standard scaling, and 4-fold cross-validation is used to validate performance over multiple versions of the validation data. The best results of 88.86% UAR, 88.27% specificity, 89.44% sensitivity, 88.62% accuracy, 89.06% precision, and 87.52% F1-score were obtained using ResNet50 with average pooling and logistic regression with class weights as the classifier.
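
The metrics listed in this abstract can all be derived from a binary confusion matrix. The following is a minimal sketch, not the authors' evaluation code; the function name `binary_metrics` and the counts in the usage lines are hypothetical. It assumes UAR means unweighted average recall, i.e. the mean of sensitivity and specificity in the two-class case, which is consistent with the reported figures (the mean of 89.44% and 88.27% is about 88.86%).

```python
def binary_metrics(tp, fn, tn, fp):
    """Compute common binary-classification metrics from confusion-matrix counts.

    tp/fn are counts for the positive class, tn/fp for the negative class.
    """
    sensitivity = tp / (tp + fn)              # recall on the positive class
    specificity = tn / (tn + fp)              # recall on the negative class
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    uar = (sensitivity + specificity) / 2     # unweighted average recall
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
        "uar": uar,
    }


# Hypothetical counts, chosen only to illustrate the calculation.
metrics = binary_metrics(tp=90, fn=10, tn=80, fp=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because UAR weights both classes equally, it is less sensitive to class imbalance than plain accuracy.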

Lip and Voice Synchronization Using Visual Attention (시각적 어텐션을 활용한 입술과 목소리의 동기화 연구)

  • Dongryun Yoon; Hyeonjoong Cho
    • The Transactions of the Korea Information Processing Society / v.13 no.4 / pp.166-173 / 2024
  • This study explores lip-sync detection, which concerns the synchronization between lip movements and voices in videos. Lip-sync detection techniques typically crop the facial area of a given video and use the lower half of the cropped box as input to a visual encoder that extracts visual features. To place more emphasis on the articulatory region of the lips and so detect lip sync more accurately, we propose using a pre-trained visual attention-based encoder. The Visual Transformer Pooling (VTP) module, originally designed for the lip-reading task of predicting the script from visual information alone, without audio, is employed as the visual encoder. Our experimental results show that, despite having fewer learnable parameters, the proposed method outperforms the latest model, VocaList, on the LRS2 dataset, achieving a lip-sync detection accuracy of 94.5% with five context frames. Moreover, our approach outperforms VocaList by approximately 8% in lip-sync detection accuracy on Acappella, a dataset not used in training.

Deep Learning Architectures and Applications (딥러닝의 모형과 응용사례)

  • Ahn, SungMahn
    • Journal of Intelligence and Information Systems / v.22 no.2 / pp.127-142 / 2016
  • A deep learning model is a kind of neural network that allows multiple hidden layers. There are various deep learning architectures, such as convolutional neural networks, deep belief networks, and recurrent neural networks. These have been applied to fields such as computer vision, automatic speech recognition, natural language processing, audio recognition, and bioinformatics, where they have been shown to produce state-of-the-art results on various tasks. Among these architectures, convolutional neural networks and recurrent neural networks are classified as supervised learning models. In recent years, these supervised learning models have gained more popularity than unsupervised learning models such as deep belief networks, because they have produced prominent applications in the fields mentioned above. Deep learning models can be trained with the backpropagation algorithm. Backpropagation is an abbreviation for "backward propagation of errors" and is a common method of training artificial neural networks, used in conjunction with an optimization method such as gradient descent. The method calculates the gradient of an error function with respect to all the weights in the network. The gradient is fed to the optimization method, which in turn uses it to update the weights in an attempt to minimize the error function. Convolutional neural networks use a special architecture that is particularly well adapted to classifying images. Using this architecture makes convolutional networks fast to train. This, in turn, helps us train deep, multi-layer networks, which are very good at classifying images. These days, deep convolutional networks are used in most neural networks for image recognition. Convolutional neural networks use three basic ideas: local receptive fields, shared weights, and pooling.
By local receptive fields, we mean that each neuron in the first (or any) hidden layer is connected to a small region of the input (or previous layer's) neurons. Shared weights mean that we use the same weights and bias for each local receptive field. This means that all the neurons in the hidden layer detect exactly the same feature, just at different locations in the input image. In addition to the convolutional layers just described, convolutional neural networks also contain pooling layers. Pooling layers are usually used immediately after convolutional layers. What the pooling layers do is simplify the information in the output from the convolutional layer. Recent convolutional network architectures have 10 to 20 hidden layers and billions of connections between units. Several years ago, training deep learning networks took weeks, but thanks to progress in GPUs and algorithmic enhancements, training time has been reduced to several hours. Neural networks with time-varying behavior are known as recurrent neural networks, or RNNs. A recurrent neural network is a class of artificial neural network in which connections between units form a directed cycle. This creates an internal state of the network, which allows it to exhibit dynamic temporal behavior. Unlike feedforward neural networks, RNNs can use their internal memory to process arbitrary sequences of inputs. Early RNN models turned out to be very difficult to train, harder even than deep feedforward networks. The reason is the unstable gradient problem, in the form of vanishing and exploding gradients. The gradient can get smaller and smaller as it is propagated back through the layers, which makes learning in early layers extremely slow. The problem actually gets worse in RNNs, since gradients are propagated backward not just through layers but also through time. If the network runs for a long time, the gradient can become extremely unstable and hard to learn from.
It has been possible to incorporate an idea known as long short-term memory units (LSTMs) into RNNs. LSTMs make it much easier to get good results when training RNNs, and many recent papers make use of LSTMs or related ideas.
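
As an illustration of how a pooling layer "simplifies" the output of a convolutional layer, the following is a minimal pure-Python sketch of non-overlapping max pooling. It is not taken from the paper; the function name `max_pool_2d`, the window size of 2, and the sample feature map are assumptions made for the example.

```python
def max_pool_2d(feature_map, size=2):
    """Non-overlapping max pooling over size x size windows.

    feature_map is a 2D list (rows of equal length). Rows or columns that
    do not fill a complete window at the edge are dropped.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        pooled_row = []
        for c in range(0, cols - size + 1, size):
            # Collect the values of one window and keep only the largest.
            window = [
                feature_map[r + i][c + j]
                for i in range(size)
                for j in range(size)
            ]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


# Hypothetical 4 x 4 feature map produced by a convolutional layer.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 2, 1, 4],
]
print(max_pool_2d(feature_map))  # [[4, 5], [3, 7]]
```

Each 2 x 2 region is summarized by its strongest activation, so the 4 x 4 map shrinks to 2 x 2 while keeping the information about whether a feature was detected in each region.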

An Approach for Frequency Analysis of Multiyear Drought Magnitude and Severity (다년간 계속되는 갈수의 크기 및 심도에 관한 빈도분석 방안)

  • Lee, Kil Seong
    • KSCE Journal of Civil and Environmental Engineering Research / v.7 no.1 / pp.111-120 / 1987
  • A frequency analysis procedure for multi-year drought severity and magnitude is developed using observed duration-dependent deficit properties. The deficit is standardized with the decimated monthly deficit statistics, and a data pooling procedure is applied to identify the change in mean deficit. The reproductive properties of the Gamma family of distributions for the deficit are used to estimate the parameters of drought magnitude and severity. The compounding of these distributions with the duration distribution and the implications of the results for real-time forecasting are discussed.

3D CNN-Based Segmentation of Prostate MR images (3D CNN 기반 전립선 MRI 영상 분할 기술)

  • Mun, Juhyeok; Choi, Hwan; Lee, Se-Ho; Jang, Won-Dong; Kim, Chang-Su
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2017.06a / pp.145-146 / 2017
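  • This paper proposes an algorithm that segments the prostate from MRI images of the male lower body. First, to learn from 3D volumetric images, we propose a network based on 3D convolutional layers and 3D pooling layers. Next, we propose a noise layer that supports robust training of the fully connected layer at the very end of the network. By adding Gaussian noise to the network's learnable parameters or to its output images, the noise layer, like dropout, prevents overfitting to the training images and helps train a network that is robust on test images. Finally, experiments confirm that the proposed technique achieves better segmentation performance than existing techniques.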

"Pool-the-Maximum-Violators" Algorithm

  • Kikuo Yanagi; Akio Kudo; Park, Yong-Beom
    • Journal of the Korean Statistical Society / v.21 no.2 / pp.201-207 / 1992
  • We consider the algorithm for obtaining the isotonic regression in the simple tree order, the most basic and simplest model next to the simple order. We propose to call it the "Pool-the-Maximum-Violators" algorithm (PMVA), in conjunction with the "Pool-Adjacent-Violators" algorithm (PAVA) for the simple order. Our main concern is the dual problem of obtaining the isotonic regression in the simple tree order. An intuitively appealing relation between the primal and the dual problems is demonstrated. The interesting difference is that in the simple order the required number of poolings is at least the number of initial violating pairs and any path leads to the solution, whereas in the simple tree order it is at most the number of initial violators and there is only one advisable path, although there may be others leading to the same solution.
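
For readers unfamiliar with the pooling idea referred to here, the following is a minimal sketch of the standard Pool-Adjacent-Violators algorithm (PAVA) for the simple order, i.e. a non-decreasing least-squares fit. It is not the paper's PMVA for the simple tree order, whose details are not given in the abstract. The function name `pava` and the sample data are assumptions for illustration.

```python
def pava(y, w=None):
    """Weighted least-squares isotonic (non-decreasing) regression via PAVA.

    y is a sequence of observations in the order of the constraint
    mu_1 <= mu_2 <= ... <= mu_n; w holds optional positive weights.
    """
    if w is None:
        w = [1.0] * len(y)
    # Each block stores [weighted mean, total weight, number of points].
    blocks = []
    for value, weight in zip(y, w):
        blocks.append([float(value), float(weight), 1])
        # Pool the last two blocks while they violate the ordering.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, n1 + n2])
    # Expand the blocks back to one fitted value per observation.
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit


# Hypothetical data with two adjacent violating pairs: (3, 2) and (4, 3).
print(pava([1, 3, 2, 4, 3, 5]))  # [1.0, 2.5, 2.5, 3.5, 3.5, 5.0]
```

In this example each violating pair is pooled once into its average, and the resulting sequence is non-decreasing.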

Multiscale Spatial Position Coding under Locality Constraint for Action Recognition

  • Yang, Jiang-feng; Ma, Zheng; Xie, Mei
    • Journal of Electrical Engineering and Technology / v.10 no.4 / pp.1851-1863 / 2015
  • To address the problem that the traditional bag-of-features model ignores the spatial relationships among local features in human action recognition, we propose a Multiscale Spatial Position Coding under Locality Constraint method. Specifically, to describe these spatial relationships, we propose a mixed feature that combines motion features with a multi-spatial-scale configuration. To exploit temporal information between features, sub spatial-temporal volumes (sub-STVs) are built. Next, the pooled features of the sub-STVs are obtained by max pooling. In the classification stage, Locality-Constrained Group Sparse Representation is adopted to exploit the intrinsic group information of the sub-STV features. Experimental results on the KTH, Weizmann, and UCF sports datasets show that our action recognition system outperforms recently published recognition systems based on classical local ST features.
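
In bag-of-features pipelines, max pooling over a region usually means taking the element-wise maximum of the code vectors of all local features that fall inside it. The following is a minimal sketch of that general operation, assuming each local feature in a sub-STV has already been encoded as a fixed-length code vector. The function name `max_pool_codes` and the sample codes are hypothetical, and the paper's actual coding step is not reproduced here.

```python
def max_pool_codes(codes):
    """Element-wise maximum over a list of equal-length code vectors.

    Returns one pooled vector that summarizes all local features
    assigned to a single region (here, one sub-STV).
    """
    if not codes:
        raise ValueError("at least one code vector is required")
    # zip(*codes) groups the k-th entry of every code vector together.
    return [max(column) for column in zip(*codes)]


# Hypothetical codes for three local features over a 3-word codebook.
sub_stv_codes = [
    [0.0, 0.7, 0.1],
    [0.4, 0.2, 0.0],
    [0.1, 0.0, 0.9],
]
print(max_pool_codes(sub_stv_codes))  # [0.4, 0.7, 0.9]
```

The pooled vector has the same length as the codebook regardless of how many local features the sub-STV contains, which is what makes sub-STVs of different sizes comparable.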