• Title/Summary/Keyword: Receptive field

Dilated convolution and gated linear unit based sound event detection and tagging algorithm using weak label (약한 레이블을 이용한 확장 합성곱 신경망과 게이트 선형 유닛 기반 음향 이벤트 검출 및 태깅 알고리즘)

  • Park, Chungho;Kim, Donghyun;Ko, Hanseok
    • The Journal of the Acoustical Society of Korea / v.39 no.5 / pp.414-423 / 2020
  • In this paper, we propose a Dilated Convolution Gated Linear Unit (DCGLU) to mitigate the lack of sparsity and the small receptive field caused by the segmentation map extraction process in sound event detection with weak labels. With the advent of deep learning frameworks, segmentation map extraction approaches have shown improved performance in noisy environments. However, these methods must preserve the size of the feature map in order to extract the segmentation map, so the model is constructed without pooling operations. As a result, their performance deteriorates because of a lack of sparsity and a small receptive field. To mitigate these problems, we use GLUs to control the flow of information and Dilated Convolutional Neural Networks (DCNNs) to increase the receptive field without additional learnable parameters. For performance evaluation, we employ the URBAN-SED dataset and a self-organized bird sound dataset. The experiments show that the proposed DCGLU model outperforms the other baselines. In particular, our method is robust against natural sound noise at three Signal-to-Noise Ratio (SNR) levels (20 dB, 10 dB and 0 dB).
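The claim that dilation enlarges the receptive field without adding parameters can be checked with a short calculation. The sketch below is illustrative only: the kernel sizes and dilation rates (1, 2, 4, 8) are assumptions chosen for the example, not the configuration used in the paper, and `receptive_field` is a hypothetical helper name.

```python
def receptive_field(layers):
    """Receptive field (in input samples) of a stack of 1-D convolutions.

    layers: list of (kernel_size, dilation, stride) tuples, input side first.
    """
    rf = 1      # a single output sample sees one input sample before any layer
    jump = 1    # spacing, in input samples, between adjacent outputs of the stack so far
    for kernel_size, dilation, stride in layers:
        rf += (kernel_size - 1) * dilation * jump
        jump *= stride
    return rf


# Four 3-tap layers, no pooling (stride 1), as in a size-preserving model.
plain = [(3, 1, 1)] * 4
# Same layers with assumed dilation rates 1, 2, 4, 8.
dilated = [(3, 1, 1), (3, 2, 1), (3, 4, 1), (3, 8, 1)]

print(receptive_field(plain))    # 9
print(receptive_field(dilated))  # 31
```

Both stacks use the same number of weights per layer, since dilation only changes where the three taps are placed.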

A Multi-Stage Convolution Machine with Scaling and Dilation for Human Pose Estimation

  • Nie, Yali;Lee, Jaehwan;Yoon, Sook;Park, Dong Sun
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.6 / pp.3182-3198 / 2019
  • Vision-based human pose estimation is considered a challenging research subject because of problems such as confounding background clutter, the diversity of human appearances, and illumination changes in scenes. To tackle these problems, we propose a new multi-stage convolution machine for estimating human pose. To provide better heatmap predictions of body joints, the proposed machine repeatedly produces predictions stage by stage, with a receptive field large enough to learn long-range spatial relationships. The stages are composed of various modules according to their purposes. A pyramid stacking module and a dilation module are used to handle human pose at multiple scales. The multi-scale information from their different receptive fields is fused by concatenation, which captures more contextual information from different features. In addition, the spatial and channel information of a given input is converted to gating factors by squeezing each feature map to a single numeric value based on its importance, so that each network channel receives a different weight. In experiments on the standard LSP and MPII pose benchmarks, the proposed architecture achieved higher accuracy than other ConvNet-based architectures.

ASPPMVSNet: A high-receptive-field multiview stereo network for dense three-dimensional reconstruction

  • Saleh Saeed;Sungjun Lee;Yongju Cho;Unsang Park
    • ETRI Journal / v.44 no.6 / pp.1034-1046 / 2022
  • The learning-based multiview stereo (MVS) methods for three-dimensional (3D) reconstruction generally use 3D volumes for depth inference. The quality of the reconstructed depth maps and the corresponding point clouds is directly influenced by the spatial resolution of the 3D volume. Consequently, these methods produce point clouds with sparse local regions because of the lack of the memory required to encode a high volume of information. Here, we apply the atrous spatial pyramid pooling (ASPP) module in MVS methods to obtain dense feature maps with multiscale, long-range, contextual information using high receptive fields. For a given 3D volume with the same spatial resolution as that in the MVS methods, the dense feature maps from the ASPP module encoded with superior information can produce dense point clouds without a high memory footprint. Furthermore, we propose a 3D loss for training the MVS networks, which improves the predicted depth values by 24.44%. The ASPP module provides state-of-the-art qualitative results by constructing relatively dense point clouds, which improves the DTU MVS dataset benchmarks by 2.25% compared with those achieved in the previous MVS methods.

Pixel-Wise Polynomial Estimation Model for Low-Light Image Enhancement

  • Muhammad Tahir Rasheed;Daming Shi
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.9 / pp.2483-2504 / 2023
  • Most existing low-light enhancement algorithms either use a large number of training parameters or lack generalization to real-world scenarios. This paper presents a novel lightweight and robust pixel-wise polynomial approximation-based deep network for low-light image enhancement. For mapping the low-light image to the enhanced image, pixel-wise higher-order polynomials are employed. A deep convolution network is used to estimate the coefficients of these higher-order polynomials. The proposed network uses multiple branches to estimate pixel values based on different receptive fields. With a smaller receptive field, the first branch enhances local features, the second and third branches focus on medium-level features, and the last branch enhances global features. The low-light image is downsampled by a factor of $2^{b-1}$ (where $b$ is the branch number) and fed as input to each branch. After combining the outputs of each branch, the final enhanced image is obtained. A comprehensive evaluation of our proposed network on six publicly available no-reference test datasets shows that it outperforms state-of-the-art methods on both quantitative and qualitative measures.
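As a rough illustration of a pixel-wise polynomial mapping, the sketch below applies a separate polynomial to every pixel. It assumes the mapping has the form out = c_0 + c_1·x + ... + c_n·x^n with one coefficient list per pixel; the abstract does not give the exact form, and in the paper the coefficients come from a network, whereas here they are supplied by hand. The function name is hypothetical.

```python
def apply_pixelwise_polynomial(image, coeffs):
    """Map each pixel through its own polynomial.

    image:  2-D list of intensities in [0, 1].
    coeffs: coeffs[y][x] is the list [c_0, c_1, ..., c_n] for pixel (y, x)
            (assumed layout; a network would normally predict these).
    """
    enhanced = []
    for pixel_row, coeff_row in zip(image, coeffs):
        out_row = []
        for x, c in zip(pixel_row, coeff_row):
            # Horner's rule: evaluate c_0 + c_1*x + ... + c_n*x^n.
            value = 0.0
            for ck in reversed(c):
                value = value * x + ck
            # Clamp to the valid intensity range.
            out_row.append(min(1.0, max(0.0, value)))
        enhanced.append(out_row)
    return enhanced


# Example: the curve 2x - x^2 brightens dark pixels more than bright ones.
dark = [[0.2, 0.5]]
curve = [[[0.0, 2.0, -1.0], [0.0, 2.0, -1.0]]]
print(apply_pixelwise_polynomial(dark, curve))
```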

Activation of Lumbar Spinal Neurons by Forelimb Afferent Inputs in Cats (상지구심성 입력에 의한 요수팽대부 척수세포의 활성화)

  • Ku, Ja-Ran;Lee, Ae-Joo;Shin, Hong-Kee;Kim, Kee-Soon
    • The Korean Journal of Physiology / v.23 no.2 / pp.409-420 / 1989
  • Extracellular recordings were made from spinal neurons in the lumbar enlargement of 16 cats before and during electrical stimulation of the ipsilateral and contralateral radial nerve. Only neurons activated by remote nerve stimulation (RNS) were included in the sample. All classes of spinal neurons that received afferent messages from the skin and/or muscles were activated by RNS, except LT cells. Approximately three quarters of the cells activated by RNS had an inhibitory receptive field (RF) on the ipsilateral hindlimb, and two thirds of the RNS-activated neurons showed spontaneous activity. Most of these RNS-activated cells appeared to be located in the deep dorsal horn and in the ventral horn. Stimulation of the contralateral radial nerve activated spinal neurons to almost the same degree as ipsilateral nerve stimulation. The optimal radial nerve stimulation parameters for activating spinal cells were 5 Hz, 0.5 msec and 2 V, while the threshold stimulus for activation was approximately 0.18 V. Following close intra-arterial injection of $K^+$ ions, the excitability of RNS-activated neurons increased in 4 of 8 cells and decreased in 2 of 8 cells. The results indicate that some spinal neurons in the lumbar enlargement of cats can be activated by forelimb afferent ($A\beta$ and $A\delta$) inputs.

Comparative Study on the Nociceptive Responses Induced by Whole Bee Venom and Melittin

  • Shin, Hong-Kee;Lee, Kyung-Hee;Lee, Seo-Eun
    • The Korean Journal of Physiology and Pharmacology / v.8 no.5 / pp.281-288 / 2004
  • The present study was undertaken to confirm whether melittin, a major constituent of whole bee venom (WBV), can produce the same nociceptive responses as those induced by WBV. In the behavioral experiment, changes in mechanical threshold, flinching behaviors and paw thickness (edema) were measured after intraplantar (i.pl.) injection of WBV (0.1 mg and 0.3 mg/paw) and melittin (0.05 mg and 0.15 mg/paw), and after intrathecal (i.t.) injection of melittin (6 μg). Also studied were the effects of i.p. (2 mg and 4 mg/kg), i.t. (0.2 μg and 0.4 μg) or i.pl. (0.3 mg) administration of morphine on melittin-induced pain responses. I.pl. injection of melittin at half the dosage of WBV strongly reduced the mechanical threshold and increased flinching and paw thickness to an extent similar to that induced by WBV. Melittin- and WBV-induced flinching and changes in mechanical threshold were dose-dependent and had a rapid onset. Paw thickness increased maximally about 1 hr after melittin and WBV treatment. The time courses of the nociceptive responses induced by melittin and WBV were very similar. Melittin-induced decreases in mechanical threshold and flinching were suppressed by i.p., i.t. or i.pl. injection of morphine. I.t. administration of melittin (6 μg) reduced the mechanical threshold of the peripheral receptive field and induced flinching behaviors, but did not increase paw thickness. In the electrophysiological study, i.pl. injection of melittin increased the discharge rates only of dorsal horn neurons with C fiber inputs from the peripheral receptive field, and these increases were almost completely blocked by topical application of lidocaine to the sciatic nerve. These findings suggest that pain behaviors induced by WBV are mediated by melittin-induced activation of C afferent fibers, that the melittin-induced pain model is a very useful model for the study of pain, and that melittin-induced nociceptive responses are sensitive to the widely used analgesic morphine.

Implementation of Commercial IWB Interface using Image Processing (영상처리를 이용한 상업용 전자칠판의 인터페이스 구현)

  • Ko, Eunsang;Rhee, Yang Won;Lee, Chang Woo
    • Journal of Korea Society of Industrial Information Systems / v.17 no.6 / pp.19-24 / 2012
  • In this paper we introduce a commercial interactive whiteboard (IWB) system named ImSensorTouch by ImSensor Inc. With this interface system, a computer can be controlled through the interactive whiteboard screen simply by touching it with a finger or a pen. The interface interacts with the Windows operating system (OS) and adapts to changes in its surroundings, especially temperature and illumination conditions. The proposed system calculates the difference between a reference image and a current image captured by a camera in the optical receptive field. The position where the difference occurs is used to generate the corresponding position on the Windows screen, and a mouse event at that position is then sent to the Windows OS. We have implemented the system using a critical section (CS) with two threads for the reference frame update process, in which an adaptive thresholding technique is applied periodically to obtain reliable results. We expect the system to be competitive and to have a bright future in the IWB market.
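The reference-frame differencing step described above can be sketched as follows. This is a minimal sketch under stated assumptions: it uses a fixed threshold rather than the adaptive thresholding the abstract mentions, takes the centroid of the changed pixels as the touch position, and maps camera coordinates to screen coordinates by linear scaling. The function name and parameters are hypothetical and do not come from the ImSensorTouch system.

```python
def touch_position(reference, current, threshold, screen_size):
    """Estimate a screen position from the difference of two grayscale frames.

    reference, current: 2-D lists of equal size holding pixel intensities.
    threshold: minimum absolute difference that counts as a change (assumed fixed).
    screen_size: (width, height) of the target screen in pixels.
    Returns (x, y) on the screen, or None if no pixel changed enough.
    """
    height, width = len(reference), len(reference[0])
    sum_x = sum_y = count = 0
    for y in range(height):
        for x in range(width):
            if abs(current[y][x] - reference[y][x]) > threshold:
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None
    # Centroid of the changed region in camera coordinates.
    cx, cy = sum_x / count, sum_y / count
    # Linear scaling from camera coordinates to screen coordinates.
    screen_w, screen_h = screen_size
    return (round(cx * (screen_w - 1) / max(width - 1, 1)),
            round(cy * (screen_h - 1) / max(height - 1, 1)))


reference = [[0] * 4 for _ in range(4)]
current = [row[:] for row in reference]
current[0][3] = 255  # a bright spot in the top-right corner of the camera image
print(touch_position(reference, current, threshold=50, screen_size=(1920, 1080)))
```

The returned coordinates would then be passed to whatever mechanism posts the mouse event, which is outside the scope of this sketch.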

Study on the Implementation of Primitive Visual Cortex Model in Retina Using Gabor Wavelet (가버 웨이블릿을 이용한 원시 시각 피질 모델 구현에 관한 연구)

  • Lee, Youngseok
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.13 no.6 / pp.477-482 / 2020
  • The human visual cortex responds sensitively to stimuli with specific directional or temporal frequency changes, while it is insensitive to selective stimuli of spatial phase. In this paper we implement a model of the complex cell using an iterative image estimation algorithm based on the Gabor wavelet transform. The performance of the implemented model was evaluated by its consistency with the physiological experimental results reported in related papers. The implemented model is limited as a complete model of the receptive field in the retina, where simple cells and complex cells are distributed together. However, it expresses the response of complex cells from the point of view of corner and edge detection.
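For readers unfamiliar with Gabor functions, the sketch below builds a real-valued 2-D Gabor kernel in its common textbook form (a Gaussian envelope multiplied by a cosine carrier). It is not the paper's iterative estimation algorithm, and the parameter names and values are assumptions made for illustration.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma, gamma=1.0, psi=0.0):
    """Return a size x size real Gabor kernel as a 2-D list (size should be odd).

    wavelength: period of the cosine carrier in pixels.
    theta: orientation of the carrier in radians.
    sigma: standard deviation of the Gaussian envelope.
    gamma: spatial aspect ratio of the envelope.
    psi: phase offset of the carrier.
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the carrier varies along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


kernel = gabor_kernel(size=5, wavelength=4.0, theta=0.0, sigma=2.0)
for row in kernel:
    print(" ".join(f"{v:+.3f}" for v in row))
```

Varying `theta` gives the orientation selectivity, and varying `psi` the phase, that models of simple and complex cells are usually built from.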

ISFRNet: A Deep Three-stage Identity and Structure Feature Refinement Network for Facial Image Inpainting

  • Yan Wang;Jitae Shin
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.3 / pp.881-895 / 2023
  • Modern image inpainting techniques based on deep learning have achieved remarkable performance, and more and more researchers are working on repairing larger and more complex missing areas, although this remains challenging, especially for facial image inpainting. For a face image with a huge missing area, very few valid pixels are available; however, people are able to imagine the complete picture in their mind according to their subjective will. It is important to simulate this capability while maintaining the identity features of the face as much as possible. To achieve this goal, we propose a three-stage network model, which we refer to as the identity and structure feature refinement network (ISFRNet). ISFRNet is based on 1) a pre-trained pSp-styleGAN model that generates an extremely realistic face image with rich structural features; 2) a shallow structured network with a small receptive field; and 3) a modified U-net with two encoders and a decoder, which has a large receptive field. We choose peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), L1 loss and learned perceptual image patch similarity (LPIPS) to evaluate our model. When the missing region is 20%-40%, the four metric scores of our model are 28.12, 0.942, 0.015 and 0.090, respectively. When the missing region is between 40% and 60%, the metric scores are 23.31, 0.840, 0.053 and 0.177, respectively. Our inpainting network not only guarantees excellent face identity feature recovery but also exhibits state-of-the-art performance compared to other multi-stage refinement models.
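Two of the metrics named in this abstract, PSNR and L1 loss, have simple closed forms and can be sketched directly. The sketch below uses their standard definitions on images stored as 2-D lists with intensities in [0, 1]; the function names and the choice of `max_value` are assumptions for illustration, not the paper's evaluation code.

```python
import math


def l1_loss(a, b):
    """Mean absolute difference between two equally sized 2-D images."""
    n = len(a) * len(a[0])
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


def psnr(a, b, max_value=1.0):
    """Peak signal-to-noise ratio in dB; higher means the images are closer."""
    n = len(a) * len(a[0])
    mse = sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


original = [[0.0, 0.0], [0.5, 1.0]]
restored = [[0.1, 0.0], [0.5, 0.9]]
print(l1_loss(original, restored), psnr(original, restored))
```

PSNR is reported in decibels and is typically in the tens for natural images, whereas SSIM and LPIPS are bounded scores, which helps when matching a list of reported numbers to metric names.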

Attention-induced expansion in visual space (주의에 의한 시각 공간 확장)

  • 유명현;박정선;정찬섭
    • Korean Journal of Cognitive Science / v.10 no.3 / pp.51-66 / 1999
  • Selective attention induces perceptual distortions, ranging from repulsion of objects located near the attended area (Suzuki & Cavanagh, 1997) to magnification of the unattended objects (Tsal & Shalev, 1996). Two hypothetical mechanisms have been postulated: a shift of receptive fields' positions away from the locus of attention (receptive-field-recruitment hypothesis) or the enlargement of perceived space around the attended location (space-enlargement hypothesis). The present study distinguished between these hypotheses by investigating the spatial and temporal properties of attention-induced distortions. Perceptual judgements on vernier alignment, line tilt, and line length were used to measure attention-induced changes in perception. Attention was induced exogenously (by blinking a specific set of dots around the test stimuli) or endogenously (by instructing the subject to selectively attend to the dots). After inducing attention, the test stimuli were briefly flashed. A staircase method was used to measure the attentional effect. A vertical line was perceived as repelled from the locus of attention, and a line segment appeared longer when attention was given to its vicinity. The effects decreased as the distance from the locus of attention, or the time between the onset of attention and the stimulus presentation, increased. The results imply that the space-enlargement hypothesis provides a better explanation for the attention-induced changes in perception than the receptive-field-recruitment hypothesis.
