Search | Korea Science

Color-Image Guided Depth Map Super-Resolution Based on Iterative Depth Feature Enhancement

Lijun Zhao;Ke Wang;Jinjing Zhang;Jialong Zhang;Anhong Wang
KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.8 / pp.2068-2082 / 2023
With the rapid development of deep learning, Depth Map Super-Resolution (DMSR) methods have achieved increasingly strong performance. However, when the upsampling rate is very large, these DMSR methods struggle to capture the structural consistency between color features and depth features. Therefore, we propose a color-image guided DMSR method based on iterative depth feature enhancement. Considering the difference between high-quality color features and low-quality depth features, we decompose the depth features into High-Frequency (HF) and Low-Frequency (LF) components. Because the depth HF components and the HF color features are structurally homogeneous, only the HF color features are used to enhance the depth HF features; the LF color features are not used. Before each HF and LF depth feature decomposition, the LF component of the previous depth decomposition and the updated HF component are combined. After recursively decomposing and reorganizing the updated features, we combine all the depth LF features with the final updated depth HF features to obtain the enhanced depth features. Next, the enhanced depth features are fed into the multistage depth map fusion reconstruction block, in which a cross enhancement module fully mines the spatial correlation of the depth map by interleaving various features between different convolution groups. Experimental results show that the proposed method is superior to many recent DMSR methods on two objective metrics, root mean square error and mean absolute deviation.
https://doi.org/10.3837/tiis.2023.08.006
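The HF/LF split described in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the paper decomposes learned feature maps, whereas this example assumes a 1D depth profile and a simple box filter as the low-pass step. The names `box_blur` and `decompose` are hypothetical.

```python
def box_blur(signal, radius=1):
    """Low-pass filter: average each sample with its neighbours.

    The window is truncated at the borders rather than padded.
    """
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        window = signal[lo:hi]
        out.append(sum(window) / len(window))
    return out


def decompose(signal, radius=1):
    """Split a signal into low-frequency (LF) and high-frequency (HF) parts."""
    lf = box_blur(signal, radius)
    hf = [s - l for s, l in zip(signal, lf)]  # HF is the residual
    return lf, hf


# A depth profile with a sharp edge between two flat surfaces.
depth_row = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
lf, hf = decompose(depth_row)

# Adding the two components back together recovers the input.
reconstructed = [l + h for l, h in zip(lf, hf)]
```

In this toy case the HF component is zero on the flat regions and nonzero only around the depth edge, which is the kind of structure that color-image edges can plausibly help to sharpen.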

Lightweight Attention-Guided Network with Frequency Domain Reconstruction for High Dynamic Range Image Fusion

Park, Jae Hyun;Lee, Keuntek;Cho, Nam Ik
Proceedings of the Korean Society of Broadcast Engineers Conference / 2022.06a / pp.205-208 / 2022
Multi-exposure high dynamic range (HDR) image reconstruction, the task of reconstructing an HDR image from multiple low dynamic range (LDR) images of a dynamic scene, often produces ghosting artifacts caused by camera motion and moving objects, and it cannot handle regions washed out by over- or under-exposure. While there have been many deep-learning-based methods that use motion estimation to alleviate these problems, they remain limited for scenes with severe motion. They also require large parameter counts, especially state-of-the-art methods that employ attention modules. To address these issues, we propose a frequency domain approach based on the idea that transform domain coefficients inherently carry global information from all image pixels, which helps to cope with large motions. Specifically, we adopt Residual Fast Fourier Transform (RFFT) blocks, which allow for global interactions of pixels. Moreover, we employ Depthwise Overparametrized convolution (DO-conv) blocks, a convolution in which each input channel is convolved with its own 2D kernel, for faster convergence and performance gains. We call this LFFNet (Lightweight Frequency Fusion Network). Experiments on the benchmarks show reduced ghosting artifacts and an improvement of up to 0.6 dB in tonemapped PSNR over recent state-of-the-art methods. Our architecture also requires fewer parameters and converges faster in training.
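The claim that transform domain coefficients carry global information can be checked with a small sketch. It uses a naive 1D discrete Fourier transform rather than the RFFT blocks of the paper, and the pixel values are made up for illustration: changing one sample changes every coefficient.

```python
import cmath


def dft(signal):
    """Naive discrete Fourier transform of a 1D sequence."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


row = [0.2, 0.4, 0.1, 0.9, 0.5, 0.3, 0.7, 0.6]  # hypothetical pixel row
perturbed = list(row)
perturbed[3] += 0.5  # change a single pixel

before, after = dft(row), dft(perturbed)

# Magnitude of the change in each frequency coefficient.
changes = [abs(a - b) for a, b in zip(after, before)]
```

Every entry of `changes` equals the size of the perturbation, so a layer operating on these coefficients sees a contribution from every pixel, unlike a small spatial convolution kernel.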

Progressive occupancy network for 3D reconstruction (3차원 형상 복원을 위한 점진적 점유 예측 네트워크)

Kim, Yonggyu;Kim, Duksu
Journal of the Korea Computer Graphics Society / v.27 no.3 / pp.65-74 / 2021
3D reconstruction refers to recovering the 3D shape of an object from an image or a video. We propose a progressive occupancy network architecture that can recover not only the overall shape of the object but also its local details. Unlike the original occupancy network, which uses a single feature vector embedding information from the whole image, we extract and use different levels of image features depending on the receptive field size. We also propose a novel network architecture that applies the image features sequentially to the decoder blocks and progressively improves the quality of the reconstructed 3D shape. In addition, we design a novel decoder block structure that properly combines the different levels of image features and uses them to update the input point feature. We trained our progressive occupancy network on ShapeNet and compared its representation power with two prior methods: the original occupancy network (ONet) and a recent work (DISN) that, like ours, uses different levels of image features. In terms of evaluation metrics, our network outperforms ONet on all metrics and achieves slightly better or comparable scores relative to DISN. In the visualization results, our method successfully reconstructs local details that ONet misses. Also, compared with DISN, which fails to reconstruct thin or occluded parts of the object, our progressive occupancy network successfully captures those parts. These results validate the usefulness of the proposed network architecture.
https://doi.org/10.15701/kcgs.2021.27.3.65

3DentAI: U-Nets for 3D Oral Structure Reconstruction from Panoramic X-rays (3DentAI: 파노라마 X-ray로부터 3차원 구강구조 복원을 위한 U-Nets)

Anusree P.Sunilkumar;Seong Yong Moon;Wonsang You
The Transactions of the Korea Information Processing Society / v.13 no.7 / pp.326-334 / 2024
Extra-oral imaging techniques such as Panoramic X-rays (PXs) and Cone Beam Computed Tomography (CBCT) are the most preferred imaging modalities in dental clinics owing to their convenience for patients during imaging as well as their ability to visualize information about the entire set of teeth. PXs are preferred for routine clinical treatments and CBCTs for complex surgeries and implant treatments. However, PXs lack third-dimensional spatial information, whereas CBCTs expose patients to high radiation doses. When a PX is already available, it is beneficial to reconstruct the 3D oral structure from the PX to avoid further expense and radiation dose. In this paper, we propose 3DentAI, a U-Net based deep learning framework for 3D reconstruction of oral structure from a PX image. Our framework consists of three modules: a reconstruction module based on attention U-Net for estimating depth from a PX image; a realignment module for aligning the predicted flattened volume to the shape of the jaw using a predefined focal trough and ray data; and a refinement module based on 3D U-Net for interpolating the missing information to obtain a smooth representation of the oral cavity. Synthetic PXs obtained from CBCT by ray tracing and rendering were used to train the networks without the need for paired PX and CBCT datasets. Our method, trained and tested on a diverse dataset of 600 patients, achieved superior performance to GAN-based models even with low computational complexity.
https://doi.org/10.3745/TKIPS.2024.13.7.326

Constrained adversarial loss for generative adversarial network-based faithful image restoration

Kim, Dong-Wook;Chung, Jae-Ryun;Kim, Jongho;Lee, Dae Yeol;Jeong, Se Yoon;Jung, Seung-Won
ETRI Journal / v.41 no.4 / pp.415-425 / 2019
Generative adversarial networks (GANs) have been successfully used in many image restoration tasks, including image denoising, super-resolution, and compression artifact reduction. By fully exploiting their characteristics, state-of-the-art image restoration techniques can generate images with photorealistic details. However, many applications require faithful rather than visually appealing image reconstruction, such as medical imaging, surveillance, and video coding. We found that previous GAN-training methods that use a loss function in the form of a weighted sum of fidelity and adversarial loss fail to reduce the fidelity loss. This results in non-negligible degradation of objective image quality, including peak signal-to-noise ratio. Our approach is to alternate between fidelity and adversarial loss in such a way that minimizing the adversarial loss does not deteriorate the fidelity. Experimental results on compression-artifact reduction and super-resolution tasks show that the proposed method can perform faithful and photorealistic image restoration.
https://doi.org/10.4218/etrij.2018-0473
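Peak signal-to-noise ratio, the objective quality measure named in this abstract, follows the standard definition 10·log10(MAX² / MSE). The sketch below assumes 8-bit pixel values stored in flat lists; the sample values are invented for illustration and are not results from the paper.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if len(reference) != len(restored):
        raise ValueError("inputs must have the same length")
    # Mean squared error between the two images.
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


reference = [52, 55, 61, 59, 79, 61, 76, 61]  # hypothetical ground-truth pixels
restored = [54, 55, 60, 59, 77, 62, 76, 60]   # hypothetical restored pixels
score = psnr(reference, restored)
```

A restoration that drifts from the reference in order to look more realistic raises the MSE and therefore lowers this score, which is the fidelity degradation the abstract refers to.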

Face inpainting via Learnable Structure Knowledge of Fusion Network

Yang, You;Liu, Sixun;Xing, Bin;Li, Kesen
KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.3 / pp.877-893 / 2022
With the development of deep learning, face inpainting has improved significantly in the past few years. Although image inpainting frameworks integrated with generative adversarial networks or attention mechanisms have enhanced the semantic understanding among facial components, problems in reconstructing corrupted regions still deserve exploration, such as blurred edge structure, excessive smoothness, unreasonable semantic understanding, and visual artifacts. To address these issues, we propose a Learnable Structure Knowledge of Fusion Network (LSK-FNet), which learns prior knowledge through an edge generation network for image inpainting. The architecture involves two steps. First, structure information obtained by the edge generation network is used as the prior knowledge for the face inpainting network. Second, both the generated prior knowledge and the incomplete image are fed into the face inpainting network to obtain the fusion information. To improve the accuracy of inpainting, both gated convolution and region normalization are applied in our proposed model. We evaluate LSK-FNet qualitatively and quantitatively on the CelebA-HQ dataset. The experimental results demonstrate that LSK-FNet improves the edge structure and details of facial images. Our model surpasses the compared models on the L1, PSNR, and SSIM metrics. When the masked region is less than 20%, the L1 loss is reduced by more than 4.3%.
https://doi.org/10.3837/tiis.2022.03.007

Framework for Reconstructing 2D Data Imported from Mobile Devices into 3D Models

Shin, WooSung;Min, JaeEun;Han, WooRi;Kim, YoungSeop
Journal of the Semiconductor & Display Technology / v.20 no.4 / pp.6-9 / 2021
The 3D industry is drawing attention for its applications in various markets, including architecture, media, VR/AR, metaverse, and imperial broadcast. The key feature of the architecture we introduce is that 3D models can be created and modified more easily than with conventional approaches. Existing methods for generating 3D models mainly acquire data using specialized equipment such as RGB-D cameras and LiDAR cameras, from which 3D models are constructed. This requires purchasing equipment, and the generated 3D model is verified on a computer. In contrast, our framework lets users collect data more easily and cheaply with cell phone cameras instead of specialized equipment; the 2D data is used for 3D modeling on the server, and the result is output to the cell phone application screen. This gives users a more accessible environment. In addition, during the 3D modeling process, object classification is attempted through deep learning without user intervention, and a mesh and texture suitable for the object can be applied to obtain a lively 3D model. Users can also request modifications to the mesh and texture to obtain more sophisticated 3D models.

Super-Resolution Reconstruction of Humidity Fields based on Wasserstein Generative Adversarial Network with Gradient Penalty

Tao Li;Liang Wang;Lina Wang;Rui Han
KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.5 / pp.1141-1162 / 2024
Humidity is an important parameter in meteorology and is closely related to weather, human health, and the environment. Because of the limited number of observation stations and other factors, humidity data are often not as good as expected, so high-resolution humidity fields are of great interest and are sought after in both research and industry. This study presents a novel super-resolution algorithm for humidity fields based on the Wasserstein generative adversarial network (WGAN) framework, with the objective of enhancing the resolution of low-resolution humidity field information. WGAN is a more stable variant of generative adversarial networks (GANs) that uses the Wasserstein metric. To make training more stable and simple, weight clipping is replaced with a gradient penalty, and the network's feature representation is improved by sub-pixel convolution, residual blocks combined with the convolutional block attention module (CBAM), and other techniques. We evaluate the proposed algorithm using hourly ERA5 relative humidity data at a resolution of 0.25°×0.25°. Experimental results demonstrate that our approach outperforms not only conventional interpolation techniques but also the super-resolution generative adversarial network (SRGAN) algorithm.
https://doi.org/10.3837/tiis.2024.05.001
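The gradient penalty mentioned in this abstract adds a term λ·(‖∇D‖₂ − 1)² to the critic loss so that the critic's gradient norm stays close to 1. The sketch below is a toy illustration, not the paper's network: it assumes a linear critic D(x) = w·x, for which the gradient with respect to x is w everywhere, so no automatic differentiation or interpolated samples are needed. The function names and the value λ = 10 are assumptions chosen for the example.

```python
import math


def gradient_penalty(critic_weights, lam=10.0):
    """WGAN-GP style penalty for a linear critic D(x) = w . x.

    For a linear critic the gradient of D with respect to x is w at every
    point, so lam * (||grad||_2 - 1)^2 has a closed form.
    """
    grad_norm = math.sqrt(sum(w * w for w in critic_weights))
    return lam * (grad_norm - 1.0) ** 2


def critic_loss(critic_weights, real_batch, fake_batch, lam=10.0):
    """Critic objective: E[D(fake)] - E[D(real)] + gradient penalty."""
    def d(x):
        return sum(w * xi for w, xi in zip(critic_weights, x))

    mean_real = sum(d(x) for x in real_batch) / len(real_batch)
    mean_fake = sum(d(x) for x in fake_batch) / len(fake_batch)
    return mean_fake - mean_real + gradient_penalty(critic_weights, lam)


weights = [0.6, 0.8]             # unit norm, so the penalty vanishes
real = [[1.0, 2.0], [2.0, 1.0]]  # hypothetical real samples
fake = [[0.0, 0.5], [0.5, 0.0]]  # hypothetical generated samples
loss = critic_loss(weights, real, fake)
```

With a nonlinear critic the gradient differs from point to point, so practical implementations evaluate it at random interpolations between real and generated samples using an automatic differentiation framework.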

Single Low-Light Ghost-Free Image Enhancement via Deep Retinex Model

Liu, Yan;Lv, Bingxue;Wang, Jingwen;Huang, Wei;Qiu, Tiantian;Chen, Yunzhong
KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.5 / pp.1814-1828 / 2021
Low-light image enhancement is a key technique for overcoming the quality degradation of photos taken under scotopic illumination conditions. The degradation includes low brightness, low contrast, and pronounced noise, which seriously affect human visual recognition and subsequent image processing. In this paper, we propose an approach based on deep learning and Retinex theory to enhance low-light images, consisting of image decomposition, illumination prediction, image reconstruction, and image optimization. The first three parts reconstruct an enhanced image that suffers from low resolution. To reduce the noise of the enhanced image and improve its quality, a super-resolution algorithm based on the Laplacian pyramid network is introduced to optimize the image. The Laplacian pyramid network improves the resolution of the enhanced image through multiple feature extraction and deconvolution operations. Furthermore, a combined loss function is explored in the network training stage to improve the efficiency of the algorithm. Extensive experiments and comprehensive evaluations demonstrate the strength of the proposed method; the results are closer to the real-world scene in lightness, color, and details. Experiments also demonstrate that the proposed method, using a single low-light image, can achieve the same effect as multi-exposure image fusion algorithms without introducing ghosting.
https://doi.org/10.3837/tiis.2021.05.013
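Retinex theory models an observed image S as the element-wise product of reflectance R and illumination L. The sketch below shows one simple way such a decomposition can be used for enhancement, assuming the illumination map is already known and brightening it with a gamma curve. The paper instead predicts the decomposition with a deep network, so the function `retinex_enhance`, the `gamma` value, and the pixel values here are all hypothetical.

```python
def retinex_enhance(image, illumination, gamma=0.5, eps=1e-6):
    """Brighten an image under the Retinex model S = R * L (element-wise).

    Pixel values are assumed to lie in [0, 1].
    """
    enhanced = []
    for s, l in zip(image, illumination):
        safe_l = max(l, eps)          # avoid division by zero
        reflectance = s / safe_l      # R = S / L
        adjusted = safe_l ** gamma    # gamma < 1 lifts dark illumination
        enhanced.append(min(1.0, reflectance * adjusted))
    return enhanced


dark = [0.02, 0.04, 0.08, 0.01]   # hypothetical low-light pixels
illum = [0.10, 0.10, 0.16, 0.04]  # hypothetical illumination estimate
bright = retinex_enhance(dark, illum)
```

Only the illumination is modified, so the reflectance, which carries the scene's intrinsic detail, is preserved while the overall brightness rises.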

A study on speech disentanglement framework based on adversarial learning for speaker recognition (화자 인식을 위한 적대학습 기반 음성 분리 프레임워크에 대한 연구)

Kwon, Yoohwan;Chung, Soo-Whan;Kang, Hong-Goo
The Journal of the Acoustical Society of Korea / v.39 no.5 / pp.447-453 / 2020
In this paper, we propose a system that extracts effective speaker representations from a speech signal using a deep learning method. Because a speech signal contains identity-unrelated information such as text content, emotion, and background noise, we train the system so that the extracted features represent only speaker-related information and not speaker-unrelated information. Specifically, we propose an auto-encoder based disentanglement method that outputs both speaker-related and speaker-unrelated embeddings using effective loss functions. To further improve the reconstruction performance in the decoding process, we also introduce a discriminator of the kind commonly used in Generative Adversarial Network (GAN) structures. Since improving the decoding capability helps preserve speaker information and supports disentanglement, it improves speaker verification performance. Experimental results demonstrate the effectiveness of the proposed method through an improved Equal Error Rate (EER) on the VoxCeleb1 benchmark dataset.
https://doi.org/10.7776/ASK.2020.39.5.447
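The Equal Error Rate is the operating point at which the false acceptance rate equals the false rejection rate. The sketch below approximates it by sweeping a threshold over the observed similarity scores. The scores are made up for illustration, and the function name `equal_error_rate` is hypothetical rather than taken from any toolkit.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate EER by sweeping thresholds over all observed scores.

    A trial is accepted when its score is >= threshold.
    """
    best_gap, eer = float("inf"), None
    for threshold in sorted(set(genuine_scores) | set(impostor_scores)):
        # False rejection: genuine trials scored below the threshold.
        frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
        # False acceptance: impostor trials scored at or above the threshold.
        far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        # Keep the threshold where the two error rates are closest.
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


genuine = [0.9, 0.8, 0.75, 0.6, 0.3]   # same-speaker trial scores (made up)
impostor = [0.7, 0.4, 0.35, 0.2, 0.1]  # different-speaker trial scores (made up)
eer = equal_error_rate(genuine, impostor)
```

A lower EER means the speaker embeddings separate same-speaker and different-speaker trials more cleanly, which is why it serves as the verification metric here.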
