• Title/Abstract/Keywords: Receptive Field

Search results: 90 items (processing time 0.018 s)

Dilated convolution and gated linear unit based sound event detection and tagging algorithm using weak label

  • 박충호;김동현;고한석
    • The Journal of the Acoustical Society of Korea (한국음향학회지)
    • /
    • Vol. 39, No. 5
    • /
    • pp.414-423
    • /
    • 2020
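  • This paper proposes the Dilated Convolution Gated Linear Unit (DCGLU) to mitigate the sparsity and insufficient receptive field that arise in time-frequency segmentation map extraction models for weakly labeled sound event detection. In deep learning, segmentation-map-based methods for sound event detection perform well in noisy environments. However, because the feature map size must be preserved to extract the segmentation map, such models are built without pooling operations. As a result, their performance degrades because of sparsity and an insufficient receptive field. To mitigate these problems, this paper applies gated linear units, which can control the flow of information, and dilated convolutions, which can enlarge the receptive field without additional parameters. The experiments used the URBAN-SED dataset and a self-collected bird call dataset, and the proposed DCGLU model outperformed the existing baselines. In particular, the DCGLU model was confirmed to be robust at three signal-to-noise ratios (SNRs) of 20 dB, 10 dB and 0 dB in environments mixed with natural sounds.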
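The statement that dilation widens the receptive field without extra parameters can be checked with a little arithmetic. The sketch below is not the DCGLU model itself; it assumes 1-D, stride-1 convolutions with no pooling, and the helper names `receptive_field` and `tap_count` are illustrative only.

```python
def receptive_field(layers):
    """Receptive field (in input samples) of stacked stride-1 convolutions.

    `layers` is a list of (kernel_size, dilation) pairs. Each layer adds
    (kernel_size - 1) * dilation samples of context to the running total.
    """
    rf = 1
    for kernel_size, dilation in layers:
        rf += (kernel_size - 1) * dilation
    return rf


def tap_count(layers):
    """Kernel taps per input/output channel pair; dilation does not change it."""
    return sum(kernel_size for kernel_size, _ in layers)


plain = [(3, 1), (3, 1), (3, 1), (3, 1)]    # four ordinary 3-tap layers
dilated = [(3, 1), (3, 2), (3, 4), (3, 8)]  # same taps, growing dilation

print(receptive_field(plain), tap_count(plain))      # 9 12
print(receptive_field(dilated), tap_count(dilated))  # 31 12
```

With the same number of taps, the dilated stack sees 31 input samples instead of 9, which is the trade-off the abstract relies on when pooling is unavailable.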

A Multi-Stage Convolution Machine with Scaling and Dilation for Human Pose Estimation

  • Nie, Yali;Lee, Jaehwan;Yoon, Sook;Park, Dong Sun
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 13, No. 6
    • /
    • pp.3182-3198
    • /
    • 2019
  • Vision-based human pose estimation is a challenging research subject because of confounding background clutter, the diversity of human appearances, and illumination changes in scenes. To tackle these problems, we propose a new multi-stage convolution machine for estimating human pose. To provide better heatmap prediction of body joints, the proposed machine produces a prediction at each stage, with a receptive field large enough to learn long-range spatial relationships. The stages are composed of various modules according to their strategic purposes. A pyramid stacking module and a dilation module are used to handle human pose at multiple scales. Their multi-scale information from different receptive fields is fused by concatenation, which captures more contextual information from different features. The spatial and channel information of a given input is converted to gating factors by squeezing each feature map to a single numeric value based on its importance, so that each network channel receives a different weight. On the standard LSP and MPII pose benchmarks, the proposed architecture achieved higher accuracy than other ConvNet-based architectures.
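The gating step described above (squeeze each feature map to one number, then weight the channel by it) can be sketched as follows. This is a simplified illustration, not the paper's module: it assumes global average pooling as the squeeze and a plain sigmoid as the gate, and it omits any learned layers between the two. The name `channel_gate` is hypothetical.

```python
import math


def channel_gate(feature_maps):
    """Scale each channel by a gating factor derived from its own mean.

    `feature_maps` is a list of channels, each a 2-D list of floats.
    Assumption: squeeze = global average, gate = sigmoid(squeezed value).
    """
    gated = []
    for channel in feature_maps:
        values = [v for row in channel for v in row]
        squeezed = sum(values) / len(values)       # one number per channel
        gate = 1.0 / (1.0 + math.exp(-squeezed))   # gating factor in (0, 1)
        gated.append([[v * gate for v in row] for row in channel])
    return gated


maps = [
    [[1.0, -1.0], [1.0, -1.0]],  # mean 0, so the gate is 0.5
    [[4.0, 4.0], [4.0, 4.0]],    # mean 4, so the gate is close to 1
]
print(channel_gate(maps)[0])  # [[0.5, -0.5], [0.5, -0.5]]
```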

ASPPMVSNet: A high-receptive-field multiview stereo network for dense three-dimensional reconstruction

  • Saleh Saeed;Sungjun Lee;Yongju Cho;Unsang Park
    • ETRI Journal
    • /
    • Vol. 44, No. 6
    • /
    • pp.1034-1046
    • /
    • 2022
  • The learning-based multiview stereo (MVS) methods for three-dimensional (3D) reconstruction generally use 3D volumes for depth inference. The quality of the reconstructed depth maps and the corresponding point clouds is directly influenced by the spatial resolution of the 3D volume. Consequently, these methods produce point clouds with sparse local regions because of the lack of the memory required to encode a high volume of information. Here, we apply the atrous spatial pyramid pooling (ASPP) module in MVS methods to obtain dense feature maps with multiscale, long-range, contextual information using high receptive fields. For a given 3D volume with the same spatial resolution as that in the MVS methods, the dense feature maps from the ASPP module encoded with superior information can produce dense point clouds without a high memory footprint. Furthermore, we propose a 3D loss for training the MVS networks, which improves the predicted depth values by 24.44%. The ASPP module provides state-of-the-art qualitative results by constructing relatively dense point clouds, which improves the DTU MVS dataset benchmarks by 2.25% compared with those achieved in the previous MVS methods.
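As a rough illustration of how an ASPP-style module gathers multiscale context, the sketch below runs the same 3-tap kernel at several dilation rates in parallel and keeps one output per rate. It is a 1-D simplification under stated assumptions (zero padding, shared kernel, no learned fusion), not the ASPPMVSNet architecture; `dilated_conv1d`, `aspp_1d` and `effective_kernel_size` are hypothetical names.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Same-length 1-D dilated cross-correlation with zero padding."""
    half = (len(kernel) - 1) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - half) * dilation  # taps are spaced `dilation` apart
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out


def aspp_1d(signal, kernel, rates):
    """One branch per dilation rate; a real module would fuse the branches."""
    return [dilated_conv1d(signal, kernel, rate) for rate in rates]


def effective_kernel_size(kernel_size, dilation):
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


impulse = [0.0] * 4 + [1.0] + [0.0] * 4
for rate, branch in zip([1, 2, 4], aspp_1d(impulse, [1.0, 1.0, 1.0], [1, 2, 4])):
    reached = [i for i, v in enumerate(branch) if v != 0.0]
    print(rate, effective_kernel_size(3, rate), reached)
```

Each branch keeps the input resolution, so the wider context comes from the tap spacing rather than from downsampling.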

Pixel-Wise Polynomial Estimation Model for Low-Light Image Enhancement

  • Muhammad Tahir Rasheed;Daming Shi
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 17, No. 9
    • /
    • pp.2483-2504
    • /
    • 2023
  • Most existing low-light enhancement algorithms either use a large number of training parameters or lack generalization to real-world scenarios. This paper presents a novel lightweight and robust pixel-wise polynomial approximation-based deep network for low-light image enhancement. Pixel-wise higher-order polynomials are employed to map the low-light image to the enhanced image, and a deep convolution network is used to estimate the coefficients of these polynomials. The proposed network uses multiple branches to estimate pixel values based on different receptive fields. With a smaller receptive field, the first branch enhances local features, the second and third branches focus on medium-level features, and the last branch enhances global features. The low-light image is downsampled by a factor of $2^{b-1}$ (where $b$ is the branch number) and fed as input to each branch. The final enhanced image is obtained by combining the outputs of the branches. A comprehensive evaluation of the proposed network on six publicly available no-reference test datasets shows that it outperforms state-of-the-art methods on both quantitative and qualitative measures.
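The pixel-wise polynomial mapping can be illustrated with a short sketch. It assumes intensities in [0, 1], per-pixel coefficient lists `[c0, c1, ..., cn]` supplied by hand (in the paper they are estimated by the network), and clipping of the result to [0, 1]; the clipping and the name `apply_pixelwise_polynomial` are assumptions made for this example.

```python
def apply_pixelwise_polynomial(image, coeffs):
    """Evaluate y = c0 + c1*x + ... + cn*x**n independently at every pixel.

    `image` is a 2-D list of intensities; `coeffs` has the same layout but
    holds a coefficient list per pixel. Results are clipped to [0, 1].
    """
    out = []
    for img_row, coef_row in zip(image, coeffs):
        out_row = []
        for x, c in zip(img_row, coef_row):
            y = 0.0
            for ck in reversed(c):  # Horner's rule, highest order first
                y = y * x + ck
            out_row.append(min(1.0, max(0.0, y)))
        out.append(out_row)
    return out


dark = [[0.5, 0.25]]
coeffs = [[[0.0, 2.0, -1.0], [0.0, 1.0, 0.0]]]  # 2x - x^2 and the identity
print(apply_pixelwise_polynomial(dark, coeffs))  # [[0.75, 0.25]]
```

The curve 2x - x^2 brightens mid-tones while keeping 0 and 1 fixed, which is the kind of per-pixel tone curve a higher-order polynomial can express.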

Multi-scale context fusion network for melanoma segmentation

  • Zhenhua Li;Lei Zhang
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 18, No. 7
    • /
    • pp.1888-1906
    • /
    • 2024
  • To address the blurred edges of melanoma images, their low contrast with the background, and hair occlusion, all of which make accurate segmentation difficult, this paper proposes MSCNet, a melanoma segmentation model based on the U-net framework. First, a multi-scale pyramid fusion module is designed to reconstruct the skip connections and transmit global information to the decoder. Second, a contextual information conduction module is added to the top of the encoder. This module provides different receptive fields for the segmentation target by using dilated (atrous) convolutions with different dilation rates, so as to better fuse multi-scale contextual information. In addition, to suppress redundant information in the input image and focus on melanoma features, a global channel attention mechanism is introduced into the decoder. Finally, to address lesion class imbalance, a combined loss function is used. The algorithm is verified on the ISIC 2017 and ISIC 2018 public datasets. The experimental results indicate that the proposed algorithm segments melanoma more accurately than other CNN-based image segmentation algorithms.

Activation of Lumbar Spinal Neurons by Forelimb Afferent Inputs in Cats

  • 구자란;이애주;신홍기;김기순
    • The Korean Journal of Physiology
    • /
    • Vol. 23, No. 2
    • /
    • pp.409-420
    • /
    • 1989
  • Extracellular recordings were made from spinal neurons in the lumbar enlargement of 16 cats before and during ipsilateral and contralateral electrical stimulation of the radial nerve. Only neurons activated by remote nerve stimulation (RNS) were included in the sample. All classes of spinal neurons that received afferent input from the skin and/or muscles were activated by RNS, except LT cells. Approximately three quarters of the cells activated by RNS had an inhibitory receptive field (RF) on the ipsilateral hindlimb, and two thirds of RNS-activated neurons showed spontaneous activity. Most of these RNS-activated cells appeared to lie in the deep dorsal horn and in the ventral horn. Stimulation of the contralateral radial nerve activated spinal neurons to almost the same degree as ipsilateral nerve stimulation. The optimal radial nerve stimulation parameters for activating spinal cells were 5 Hz, 0.5 msec and 2 V, while the threshold stimulus for activation was approximately 0.18 V. Following close intra-arterial injection of $K^+$ ions, the excitability of RNS-activated neurons increased in 4 of 8 cells and decreased in 2 of 8 cells. The results indicate that some spinal neurons in the lumbar enlargement of cats can be activated by forelimb afferent ($A\beta$ & $A\delta$) inputs.

Comparative Study on the Nociceptive Responses Induced by Whole Bee Venom and Melittin

  • Shin, Hong-Kee;Lee, Kyung-Hee;Lee, Seo-Eun
    • The Korean Journal of Physiology and Pharmacology
    • /
    • Vol. 8, No. 5
    • /
    • pp.281-288
    • /
    • 2004
  • The present study was undertaken to confirm whether melittin, a major constituent of whole bee venom (WBV), had the ability to produce the same nociceptive responses as those induced by WBV. In the behavioral experiment, changes in mechanical threshold, flinching behaviors and paw thickness (edema) were measured after intraplantar (i.pl.) injection of WBV (0.1 mg & 0.3 mg/paw) and melittin (0.05 mg & 0.15 mg/paw), and intrathecal (i.t.) injection of melittin ($6\,\mu$g). Also studied were the effects of intraperitoneal (i.p.; 2 mg & 4 mg/kg), i.t. ($0.2\,\mu$g & $0.4\,\mu$g) or i.pl. (0.3 mg) administration of morphine on melittin-induced pain responses. I.pl. injection of melittin at half the dosage of WBV strongly reduced mechanical threshold, and increased flinching and paw thickness to a similar extent as WBV. Melittin- and WBV-induced flinching and changes in mechanical threshold were dose-dependent and had a rapid onset. Paw thickness increased maximally about 1 hr after melittin and WBV treatment. The time courses of the nociceptive responses induced by melittin and WBV were very similar. Melittin-induced decreases in mechanical threshold and flinching were suppressed by i.p., i.t. or i.pl. injection of morphine. I.t. administration of melittin ($6\,\mu$g) reduced the mechanical threshold of the peripheral receptive field and induced flinching behaviors, but did not cause any increase in paw thickness. In the electrophysiological study, i.pl. injection of melittin increased the discharge rates of dorsal horn neurons only with C fiber inputs from the peripheral receptive field, which were almost completely blocked by topical application of lidocaine to the sciatic nerve. These findings suggest that pain behaviors induced by WBV are mediated by melittin-induced activation of C afferent fibers, that the melittin-induced pain model is a very useful model for the study of pain, and that melittin-induced nociceptive responses are sensitive to the widely used analgesic morphine.

Implementation of Commercial IWB Interface using Image Processing

  • 고은상;이양원;이창우
    • Journal of Korea Society of Industrial Information Systems (한국산업정보학회논문지)
    • /
    • Vol. 17, No. 6
    • /
    • pp.19-24
    • /
    • 2012
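  • This paper introduces 아임센서터치, a commercial interactive whiteboard system (IWB). The system is an interface that lets a user operate a personal computer through a whiteboard screen that supports touch interaction with a finger or a pen. The proposed interface interacts with the Windows operating system and adapts to changes in temperature and illumination. The proposed system compares the image of the optical receptive field captured by the camera with a reference image, computes the difference, and uses that difference to calculate touch-screen coordinates. From the calculated coordinates it generates Windows mouse events and passes them to the Windows system. To update the reference image, we implement a critical section shared by two threads, and to keep the difference computation reliable, we update the reference image using an adaptive threshold. An interactive whiteboard system equipped with the proposed touch-screen interface is a promising product given the high expected growth of domestic and overseas markets, and we expect it to have good marketability.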
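The difference-against-reference step can be sketched in a few lines. This is only an illustration under assumptions that the abstract does not specify: grayscale frames as 2-D lists, a fixed threshold, the centroid of changed pixels as the touch coordinate, and a running-average reference update standing in for the paper's adaptive update. The names `touch_coordinate` and `update_reference` are hypothetical, and threading is omitted.

```python
def touch_coordinate(frame, reference, threshold):
    """Centroid (x, y) of pixels that differ from the reference, or None."""
    xs, ys = [], []
    for y, (frame_row, ref_row) in enumerate(zip(frame, reference)):
        for x, (f, r) in enumerate(zip(frame_row, ref_row)):
            if abs(f - r) > threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # nothing changed enough to count as a touch
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def update_reference(reference, frame, rate):
    """Blend the current frame into the reference (assumed running average)."""
    return [
        [(1.0 - rate) * r + rate * f for r, f in zip(ref_row, frame_row)]
        for ref_row, frame_row in zip(reference, frame)
    ]


reference = [[0.0] * 4 for _ in range(3)]
frame = [row[:] for row in reference]
frame[1][1] = 100.0
frame[1][2] = 100.0
print(touch_coordinate(frame, reference, 50.0))  # (1.5, 1.0)
```

In a real system the returned coordinate would still need to be mapped from camera space to screen space before a mouse event is generated.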

Study on the Implementation of Primitive Visual Cortex Model in Retina Using Gabor Wavelet

  • 이영석
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology (한국정보전자통신기술학회논문지)
    • /
    • Vol. 13, No. 6
    • /
    • pp.477-482
    • /
    • 2020
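  • Physiological experiments on the visual cortex of higher mammals have shown that the human visual cortex responds sensitively to stimuli with a particular orientation or with temporal frequency changes, but is insensitive to stimuli that are selective in spatial phase. This result stems from the physiological fact that position-sensitive simple cells are less widely distributed than complex cells. Of the simple and complex cells that make up the primitive visual cortex, this paper models the more widely distributed complex cells, using an iterative image estimation algorithm based on the Gabor wavelet transform. The implemented model was evaluated on edge and corner detection in images, and its results were confirmed to be consistent with previously published physiological experiments. The model is limited in that it cannot fully reproduce the retinal receptive field, in which simple and complex cells are distributed together, but because it implements the complex cells that account for part of the visual cortex from an algorithmic standpoint, it can serve as a basis for a more complete visual cortex model.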
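The contrast between phase-sensitive simple cells and phase-insensitive complex cells can be illustrated with the classic Gabor energy model. The sketch below is not the paper's iterative image estimation algorithm; it is a 1-D stand-in that assumes an even/odd Gabor pair and takes the complex-cell response as the magnitude of the two filter outputs. All function names and parameter values are illustrative.

```python
import math


def gabor_pair(size, wavelength, sigma):
    """Even (cosine) and odd (sine) 1-D Gabor kernels with a shared envelope."""
    half = size // 2
    even, odd = [], []
    for n in range(-half, half + 1):
        envelope = math.exp(-(n * n) / (2.0 * sigma * sigma))
        phase = 2.0 * math.pi * n / wavelength
        even.append(envelope * math.cos(phase))
        odd.append(envelope * math.sin(phase))
    return even, odd


def filter_response(signal, center, kernel):
    """Simple-cell-like response: one kernel applied around `center`."""
    half = len(kernel) // 2
    return sum(w * signal[center + k - half] for k, w in enumerate(kernel))


def complex_cell_response(signal, center, even, odd):
    """Energy model: magnitude of the even and odd filter responses."""
    return math.hypot(
        filter_response(signal, center, even),
        filter_response(signal, center, odd),
    )


def grating(length, wavelength, phase):
    """Sinusoidal test stimulus with a given spatial phase."""
    return [math.cos(2.0 * math.pi * n / wavelength + phase) for n in range(length)]


even, odd = gabor_pair(33, 8.0, 4.0)
for phase in (0.0, math.pi / 2):
    stimulus = grating(64, 8.0, phase)
    simple = filter_response(stimulus, 32, even)
    energy = complex_cell_response(stimulus, 32, even, odd)
    print(round(simple, 2), round(energy, 2))
```

Shifting the grating by a quarter cycle changes the single-filter response substantially while the energy response stays nearly constant, which mirrors the insensitivity to spatial phase described in the abstract.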

ISFRNet: A Deep Three-stage Identity and Structure Feature Refinement Network for Facial Image Inpainting

  • Yan Wang;Jitae Shin
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 17, No. 3
    • /
    • pp.881-895
    • /
    • 2023
  • Modern image inpainting techniques based on deep learning have achieved remarkable performance, and a growing body of work targets larger and more complex missing areas, although this remains challenging, especially for facial image inpainting. For a face image with a huge missing area, very few valid pixels are available; however, people are able to imagine the complete picture in their mind. It is important to simulate this capability while preserving the identity features of the face as much as possible. To achieve this goal, we propose a three-stage network model, which we refer to as the identity and structure feature refinement network (ISFRNet). ISFRNet is based on 1) a pre-trained pSp-styleGAN model that generates an extremely realistic face image with rich structural features; 2) a shallow structured network with a small receptive field; and 3) a modified U-net with two encoders and a decoder, which has a large receptive field. We choose peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), L1 loss and learned perceptual image patch similarity (LPIPS) to evaluate our model. When the missing region is 20%-40%, the scores of our model on these four metrics are 28.12, 0.942, 0.015 and 0.090, respectively. When the missing region is between 40% and 60%, the scores are 23.31, 0.840, 0.053 and 0.177, respectively. Our inpainting network not only guarantees excellent face identity feature recovery but also exhibits state-of-the-art performance compared to other multi-stage refinement models.
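Two of the four metrics, PSNR and L1 loss, have simple closed forms and can be sketched directly. The example assumes grayscale images stored as 2-D lists with intensities in [0, 1]; the function names and the `max_value` parameter are choices made for this illustration, and SSIM and LPIPS are left out because they need windowed statistics and a pretrained network respectively.

```python
import math


def l1_loss(a, b):
    """Mean absolute difference between two equally sized images."""
    diffs = [abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def psnr(a, b, max_value=1.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    squared = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    mse = sum(squared) / len(squared)
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(max_value * max_value / mse)


target = [[0.0, 0.0], [0.0, 0.0]]
restored = [[0.1, 0.1], [0.1, 0.1]]
print(round(l1_loss(target, restored), 3))  # 0.1
print(round(psnr(target, restored), 2))     # 20.0
```

Higher PSNR and lower L1 loss both indicate a closer match to the ground truth, which is why the two move in opposite directions in the reported scores.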