• Title/Summary/Keyword: Multiple images

Search Results: 1,390

Performance Improvement Method of Convolutional Neural Network Using Combined Parametric Activation Functions (결합된 파라메트릭 활성함수를 이용한 합성곱 신경망의 성능 향상)

  • Ko, Young Min;Li, Peng Hang;Ko, Sun Woo
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.11 no.9
    • /
    • pp.371-380
    • /
    • 2022
  • Convolutional neural networks are widely used to process data arranged in a grid, such as images. A general convolutional neural network consists of convolutional layers and fully connected layers, and each layer contains a nonlinear activation function. This paper proposes a combined parametric activation function to improve the performance of convolutional neural networks. The combined parametric activation function is created by summing parametric activation functions, each of which applies parameters that change the scale and location of the activation function. Various nonlinear intervals can be created according to the multiple scale and location parameters, and the parameters can be learned in the direction that minimizes the loss function computed from the given input data. Testing convolutional neural networks that use the combined parametric activation function on the MNIST, Fashion MNIST, CIFAR10, and CIFAR100 classification problems confirmed that they perform better than networks using other activation functions.
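
As a concrete illustration of the idea, the sketch below implements a combined parametric activation as a learnable sum of scaled and shifted base activations in PyTorch. It is a minimal reading of the abstract, assuming k ReLU components with per-component scale and location parameters; the paper's exact parameterization may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CombinedParametricActivation(nn.Module):
    """Learnable sum of k scaled and shifted base activations
    (a sketch of the combined parametric activation idea)."""

    def __init__(self, k: int = 2):
        super().__init__()
        # alpha: output scale, beta: input scale, gamma: input location
        self.alpha = nn.Parameter(torch.ones(k))
        self.beta = nn.Parameter(torch.ones(k))
        self.gamma = nn.Parameter(torch.zeros(k))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Each component reshapes the nonlinear interval; all parameters
        # are trained by backpropagation together with the network.
        out = torch.zeros_like(x)
        for a, b, c in zip(self.alpha, self.beta, self.gamma):
            out = out + a * F.relu(b * x + c)
        return out
```

Such a module can stand in for nn.ReLU after any convolutional or fully connected layer, so the shape of the nonlinearity itself is optimized against the training loss.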

COVID-19 Diagnosis from CXR images through pre-trained Deep Visual Embeddings

  • Khalid, Shahzaib;Syed, Muhammad Shehram Shah;Saba, Erum;Pirzada, Nasrullah
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.5
    • /
    • pp.175-181
    • /
    • 2022
  • COVID-19 is an acute respiratory syndrome that affects the host's breathing and respiratory system. The first case of the novel disease was reported in 2019; it created a state of emergency across the world and was declared a global pandemic within months of the first case. The disease also created a global socioeconomic crisis. The emergency has made it imperative for professionals to take the necessary measures to diagnose the disease early. The conventional diagnosis for COVID-19 is Polymerase Chain Reaction (PCR) testing. However, in many rural societies these tests are not available or take a long time to return results. Hence, we propose a COVID-19 classification system based on machine learning and transfer learning models. The proposed approach identifies individuals with COVID-19 and distinguishes them from healthy individuals with the help of Deep Visual Embeddings (DVE). Five state-of-the-art models (VGG-19, ResNet-50, Inceptionv3, MobileNetv3, and EfficientNetB7) were used in this study, along with five different pooling schemes, to perform deep feature extraction. In addition, the features are normalized using standard scaling, and 4-fold cross-validation is used to validate performance over multiple versions of the validation data. The best results of 88.86% UAR, 88.27% specificity, 89.44% sensitivity, 88.62% accuracy, 89.06% precision, and 87.52% F1-score were obtained using ResNet-50 with average pooling and logistic regression with class weighting as the classifier.
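
A minimal sketch of the described pipeline, assuming the best configuration reported above (ResNet-50 embeddings with average pooling, standard scaling, class-weighted logistic regression, 4-fold cross-validation); the image array and labels here are random stand-ins for the CXR data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet50 import preprocess_input

# Frozen ImageNet ResNet-50 with global average pooling as the deep
# visual embedder (one of the five backbone/pooling combinations).
embedder = ResNet50(weights="imagenet", include_top=False, pooling="avg")

images = np.random.rand(8, 224, 224, 3).astype("float32")  # stand-in CXR crops
y = np.array([0, 1] * 4)                                   # healthy / COVID-19

X = embedder.predict(preprocess_input(255 * images))       # (N, 2048) embeddings

# Standard scaling + class-weighted logistic regression, 4-fold CV.
clf = make_pipeline(StandardScaler(),
                    LogisticRegression(class_weight="balanced", max_iter=1000))
print(cross_val_score(clf, X, y, cv=4).mean())
```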

The efficacy of different implant surface decontamination methods using spectrophotometric analysis: an in vitro study

  • Roberto Giffi;Davide Pietropaoli;Leonardo Mancini;Francesco Tarallo;Philipp Sahrmann;Enrico Marchetti
    • Journal of Periodontal and Implant Science
    • /
    • v.53 no.4
    • /
    • pp.295-305
    • /
    • 2023
  • Purpose: Various methods have been proposed to achieve nearly complete decontamination of the surfaces of implants affected by peri-implantitis. We investigated the in vitro debridement efficiency of multiple decontamination methods (Gracey curettes [GC], glycine air-polishing [G-Air], erythritol air-polishing [E-Air] and titanium brushes [TiB]) using a novel spectrophotometric ink model in 3 different bone defect settings (30°, 60°, and 90°). Methods: Forty-five dental implants were stained with indelible ink and mounted in resin models, which simulated standardised peri-implantitis defects with different bone defect angulations (30°, 60°, and 90°). After each run of instrumentation, the implants were removed from the resin model, and the ink was dissolved in ethanol (97%). A spectrophotometric analysis was performed to detect colour remnants in order to measure the cumulative uncleaned surface area of the implants. Scanning electron microscopy images were taken to assess micromorphological surface changes. Results: Generally, the 60° bone defects were the easiest to debride, and the 30° defects were the most difficult (ink absorption peak: 0.26±0.04 for 60° defects; 0.32±0.06 for 30° defects; 0.27±0.04 for 90° defects). The most effective debridement method was TiB, independently of the bone defect type (TiB vs. GC: P<0.0001; TiB vs. G-Air: P=0.0017; TiB vs. E-Air: P=0.0007). E-Air appeared to be the least efficient method for biofilm debridement. Conclusions: Titanium brushes seem to be a promising decontamination method compared to the other techniques, whereas G-Air was less aggressive on the implant surface. The use of a spectrophotometric model was shown to be a novel but promising assessment method for in vitro ink studies.

3D Rigid Body Tracking Algorithm Using 2D Passive Marker Image (2D 패시브마커 영상을 이용한 3차원 리지드 바디 추적 알고리즘)

  • Park, Byung-Seo;Kim, Dong-Wook;Seo, Young-Ho
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2022.05a
    • /
    • pp.587-588
    • /
    • 2022
  • In this paper, we propose a method for tracking a rigid body in 3D space using 2D passive-marker images from multiple motion-capture cameras. First, a calibration process using a chessboard is performed to obtain the intrinsic parameters of each camera. In the second calibration process, a triangular structure carrying three markers is moved so that all cameras can observe it, and the data accumulated for each frame are used to correct and update the relative position information between the cameras. After that, the three-dimensional coordinates of the three markers are restored by converting each camera's coordinate system into the 3D world coordinate system; the distances between the markers are then calculated and compared with the actual distances. As a result, an average error within 2 mm was measured.
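
The core of this pipeline, intrinsic calibration followed by multi-view triangulation and a distance check against the known rig geometry, can be sketched with OpenCV. The projection matrices below are illustrative two-camera stand-ins for the calibrated multi-camera setup:

```python
import numpy as np
import cv2

# P = K [R | t] for two calibrated cameras: K from the chessboard step,
# (R, t) from the second, relative-pose calibration. Values are illustrative.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
R, _ = cv2.Rodrigues(np.array([[0.0], [0.2], [0.0]]))
P2 = K @ np.hstack([R, np.array([[-0.5], [0.0], [0.0]])])

def project(P, X):
    """3D points (N, 3) -> 2D marker centroids (2, N)."""
    x = P @ np.vstack([X.T, np.ones(len(X))])
    return x[:2] / x[2]

def triangulate(pts1, pts2):
    """2D marker centroids (2, N) in two views -> world points (N, 3)."""
    X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)  # homogeneous (4, N)
    return (X_h[:3] / X_h[3]).T

def pairwise_distances(X):
    return np.array([np.linalg.norm(X[i] - X[j])
                     for i in range(len(X)) for j in range(i + 1, len(X))])

# Synthetic check: reconstruct the three rig markers and compare the
# recovered inter-marker distances with the true rig geometry.
X_true = np.array([[0.0, 0.0, 5.0], [0.1, 0.0, 5.0], [0.0, 0.1, 5.0]])
X_rec = triangulate(project(P1, X_true), project(P2, X_true))
print(np.abs(pairwise_distances(X_rec) - pairwise_distances(X_true)).max())
```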

Soil moisture estimation of YongdamDam watershed using vegetation index from Sentinel-1 and -2 satellite images (Sentinel-1 및 Sentinel-2 위성영상기반 식생지수를 활용한 용담댐 유역의 토양수분 산정)

  • Son, Moobeen;Chung, Jeehun;Lee, Yonggwan;Woo, Soyoung;Kim, Seongjoon
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2021.06a
    • /
    • pp.161-161
    • /
    • 2021
  • In this study, soil moisture was estimated for the Yongdam Dam watershed (930.0 km²) in the upper Geum River basin using Sentinel-1 SAR (Synthetic Aperture Radar) and Sentinel-2 MultiSpectral Instrument (MSI) satellite imagery. The data used were the VV (Vertical transmit-Vertical receive) and VH (Vertical transmit-Horizontal receive) polarization data of the 10 m resolution Sentinel-1 IW (Interferometric Wide swath) mode GRD (Ground Range Detected) product and the Sentinel-2 Level-2A Bottom of Atmosphere (BOA) reflectance data, compiled for 2019 at 6-day and 5-day intervals, respectively. Image processing was carried out with SNAP (SentiNel Application Platform) to generate the backscattering coefficients for each Sentinel-1 polarization (VV, VH) and the red (Band-4) and near-infrared (Band-8) images of Sentinel-2. Soil moisture was estimated with a multiple linear regression model, with a separate model generated for the soil properties of each site. The model inputs were the polarization-specific backscattering coefficients from Sentinel-1, the RVI (Radar Vegetation Index) derived from Sentinel-1, and the NDVI (Normalized Difference Vegetation Index) derived from Sentinel-2, so as to reflect the influence of vegetation. To validate the simulated soil moisture, TDR (Time Domain Reflectometry)-based in situ soil moisture data were collected at six sites, and validation will be performed for the whole period and by season using the correlation coefficient (R), root mean square error (RMSE), and index of agreement (IOA).
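
The regression step amounts to fitting one multiple linear regression per soil type, with the VV/VH backscatter and the two vegetation indices as predictors, and scoring it with R, RMSE, and IOA. A sketch with placeholder arrays in place of the processed satellite and TDR data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))        # columns: VV, VH backscatter, RVI, NDVI
sm_obs = rng.uniform(0.1, 0.4, 100)  # TDR soil moisture [m3/m3] (placeholder)

model = LinearRegression().fit(X, sm_obs)  # one model per soil property class
sm_sim = model.predict(X)

r = np.corrcoef(sm_obs, sm_sim)[0, 1]               # correlation coefficient R
rmse = np.sqrt(mean_squared_error(sm_obs, sm_sim))  # root mean square error
ioa = 1 - np.sum((sm_obs - sm_sim) ** 2) / np.sum(  # index of agreement
    (np.abs(sm_sim - sm_obs.mean()) + np.abs(sm_obs - sm_obs.mean())) ** 2)
```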

Privacy Preserving Techniques for Deep Learning in Multi-Party System (멀티 파티 시스템에서 딥러닝을 위한 프라이버시 보존 기술)

  • Hye-Kyeong Ko
    • The Journal of the Convergence on Culture Technology
    • /
    • v.9 no.3
    • /
    • pp.647-654
    • /
    • 2023
  • Deep learning is a useful method for classifying and recognizing complex data such as images and text, and the accuracy of deep learning methods is the basis for making artificial-intelligence-based services on the Internet useful. However, the vast amounts of user data used for training in deep learning have led to privacy violation problems, and there is concern that companies that have collected personal and sensitive user data, such as photographs and voices, will own that data indefinitely. Users cannot delete their data and cannot limit the purposes for which it is used. For example, data owners such as medical institutions that want to apply deep learning to patients' medical records cannot share patient data because of privacy and confidentiality issues, making it difficult to benefit from deep learning technology. In this paper, we design a privacy-preserving deep learning technique that allows multiple workers in a multi-party system to train a neural network model jointly without sharing their input datasets. We propose a method that selectively shares small subsets of parameters using an optimization algorithm based on modified stochastic gradient descent, and we confirm that it facilitates training with increased learning accuracy while protecting private information.
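
The selective-sharing idea can be sketched as each worker uploading only the largest-magnitude fraction of its local gradient to a shared model, keeping the rest private. This is a generic top-fraction scheme under that reading of the abstract, not the paper's exact algorithm:

```python
import numpy as np

def select_top_fraction(grad: np.ndarray, frac: float = 0.1) -> np.ndarray:
    """Keep only the largest-magnitude fraction of gradient entries;
    everything else stays local to the worker."""
    k = max(1, int(frac * grad.size))
    thresh = np.partition(np.abs(grad).ravel(), -k)[-k]
    return np.where(np.abs(grad) >= thresh, grad, 0.0)

def aggregate(worker_grads, frac=0.1):
    """One multi-party round: average the selectively shared subsets."""
    return np.mean([select_top_fraction(g, frac) for g in worker_grads], axis=0)

grads = [np.random.randn(1000) for _ in range(3)]  # placeholder local gradients
update = aggregate(grads, frac=0.1)                # applied to the joint model
```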

Lightweight Attention-Guided Network with Frequency Domain Reconstruction for High Dynamic Range Image Fusion

  • Park, Jae Hyun;Lee, Keuntek;Cho, Nam Ik
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2022.06a
    • /
    • pp.205-208
    • /
    • 2022
  • Multi-exposure high dynamic range (HDR) image reconstruction, the task of reconstructing an HDR image from multiple low dynamic range (LDR) images of a dynamic scene, often produces ghosting artifacts caused by camera motion and moving objects, and it also cannot deal with regions washed out by over- or under-exposure. While there have been many deep-learning-based methods with motion estimation to alleviate these problems, they still have limitations for severely moving scenes. They also require large parameter counts, especially the state-of-the-art methods that employ attention modules. To address these issues, we propose a frequency-domain approach based on the idea that transform-domain coefficients inherently carry global information from all image pixels and can therefore cope with large motions. Specifically, we adopt Residual Fast Fourier Transform (RFFT) blocks, which allow global interactions between pixels. Moreover, we also employ Depthwise Overparametrized convolution (DO-conv) blocks, a convolution in which each input channel is convolved with its own 2D kernel, for faster convergence and performance gains. We call this LFFNet (Lightweight Frequency Fusion Network); experiments on the benchmarks show reduced ghosting artifacts and improvements of up to 0.6 dB in tonemapped PSNR compared to recent state-of-the-art methods. Our architecture also requires fewer parameters and converges faster in training.
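
The frequency branch can be sketched as a residual block that convolves the real and imaginary parts of the 2D real FFT, so every output pixel can draw on all input pixels. This is a minimal reading of the RFFT idea; channel widths and layer counts are illustrative:

```python
import torch
import torch.nn as nn

class ResidualFFTBlock(nn.Module):
    """Residual block with spatial and frequency-domain branches."""

    def __init__(self, channels: int):
        super().__init__()
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1))
        # 1x1 convs over the stacked real/imaginary FFT coefficients.
        self.freq = nn.Sequential(
            nn.Conv2d(2 * channels, 2 * channels, 1), nn.ReLU(inplace=True),
            nn.Conv2d(2 * channels, 2 * channels, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        f = torch.fft.rfft2(x, norm="ortho")       # complex (B, C, H, W//2+1)
        f = self.freq(torch.cat([f.real, f.imag], dim=1))
        re, im = f.chunk(2, dim=1)
        x_freq = torch.fft.irfft2(torch.complex(re, im), s=(h, w), norm="ortho")
        return x + self.spatial(x) + x_freq        # residual fusion of branches
```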

Generating Radiology Reports via Multi-feature Optimization Transformer

  • Rui Wang;Rong Hua
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.17 no.10
    • /
    • pp.2768-2787
    • /
    • 2023
  • As an important research direction in the application of computer science to the medical field, automatic radiology report generation has attracted wide attention in the academic community. Because the proportion of normal regions in radiology images is much larger than that of abnormal regions, words describing diseases are often masked by other words, resulting in significant feature loss during the calculation process, which affects the quality of generated reports. In addition, the huge difference between visual features and semantic features causes traditional multi-modal fusion methods to fail to generate the long narrative structures, consisting of multiple sentences, that medical reports require. To address these challenges, we propose a multi-feature optimization Transformer (MFOT) for generating radiology reports. In detail, a multi-dimensional mapping attention (MDMA) module is designed to encode the visual grid features from different dimensions to reduce the loss of primary features in the encoding process; a feature pre-fusion (FP) module is constructed to enhance the interaction between multi-modal features, so as to generate a reasonably structured radiology report; and a detail-enhanced attention (DEA) module is proposed to enhance the extraction and utilization of key features and reduce their loss. Finally, we evaluate our proposed model against prevailing mainstream models on the widely recognized radiology report datasets IU X-Ray and MIMIC-CXR. The experimental outcomes demonstrate that our model achieves SOTA performance on both datasets; compared with the base model, the average improvement across six key indicators is 19.9% and 18.0%, respectively. These findings substantiate the efficacy of our model in the domain of automated radiology report generation.
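
As a rough illustration of feature pre-fusion, the sketch below lets report-token embeddings attend to visual grid features before decoding. It is generic cross-attention, not the paper's MDMA, FP, or DEA modules:

```python
import torch
import torch.nn as nn

class CrossModalPreFusion(nn.Module):
    """Report tokens attend to visual grid features before decoding."""

    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text: torch.Tensor, visual: torch.Tensor) -> torch.Tensor:
        # text: (B, T, dim) token embeddings; visual: (B, HW, dim) grid features
        fused, _ = self.attn(query=text, key=visual, value=visual)
        return self.norm(text + fused)  # residual keeps the language stream intact
```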

Multi-Emotion Regression Model for Recognizing Inherent Emotions in Speech Data (음성 데이터의 내재된 감정인식을 위한 다중 감정 회귀 모델)

  • Moung Ho Yi;Myung Jin Lim;Ju Hyun Shin
    • Smart Media Journal
    • /
    • v.12 no.9
    • /
    • pp.81-88
    • /
    • 2023
  • Recently, online communication has increased with the spread of non-face-to-face services during COVID-19. In non-face-to-face situations, the other person's opinions and emotions are recognized through modalities such as text, speech, and images. Research on multimodal emotion recognition, which combines various modalities, is currently very active. Among these, emotion recognition from speech data is attracting attention as a means of understanding emotions through sound and language information, but in most cases emotions are recognized from a single speech feature value. Because a variety of emotions coexist in complex ways in a conversation, however, a method for recognizing multiple emotions is needed. In this paper, we therefore propose a multi-emotion regression model that preprocesses speech data, extracts feature vectors, and recognizes the complex, inherent emotions while taking the passage of time into account.
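
One plausible shape for such a model: pool frame-level speech features into an utterance vector and regress a continuous intensity for each emotion, so several emotions can be detected at once. The feature set and regressor below are assumptions for illustration, not the paper's exact design:

```python
import numpy as np
import librosa
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

def speech_features(path: str, sr: int = 16000) -> np.ndarray:
    """Mean/std-pooled MFCCs as a simple utterance-level feature vector."""
    y, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)  # (40, frames)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 80))  # stand-ins for pooled feature vectors
Y = rng.uniform(size=(50, 4))  # e.g. joy/sadness/anger/neutral intensities

# One regressor per emotion dimension -> multiple emotions per utterance.
model = MultiOutputRegressor(GradientBoostingRegressor()).fit(X, Y)
pred = model.predict(X[:1])    # four emotion intensities for one utterance
```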

Fashion Category Oversampling Automation System

  • Minsun Yeu;Do Hyeok Yoo;SuJin Bak
    • Journal of the Korea Society of Computer and Information
    • /
    • v.29 no.1
    • /
    • pp.31-40
    • /
    • 2024
  • In the domestic online fashion platform industry, the manual registration of product information by individual business owners leads to inconvenience and reliability issues, especially when numerous product groups are registered simultaneously. Moreover, bias is significantly heightened by the low quality of product images and an imbalance in data quantity. This study therefore proposes a ResNet50 model that minimizes data bias through oversampling techniques and performs multi-class classification over 13 fashion categories. Transfer learning is employed to optimize resource utilization and reduce long training times. The results show that, for classes with insufficient data, data augmentation improves discrimination by up to 33.4% compared to a basic convolutional neural network (CNN) model. The reliability of all outcomes is underscored by precision and affirmed by the recall curve. This study is expected to help advance the domestic online fashion platform industry.
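
The two main ingredients, oversampling the minority categories and transfer learning on ResNet50, can be sketched in PyTorch. The weighted sampler below draws each of the 13 classes with equal probability; the label array and the commented-out dataset are placeholders:

```python
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import WeightedRandomSampler
from torchvision import models

# Oversampling: rare fashion categories get proportionally higher weights,
# so every class is drawn equally often during training.
labels = np.random.randint(0, 13, size=1000)  # placeholder category labels
class_counts = np.bincount(labels, minlength=13)
weights = 1.0 / class_counts[labels]
sampler = WeightedRandomSampler(torch.as_tensor(weights, dtype=torch.double),
                                num_samples=len(labels), replacement=True)
# loader = DataLoader(dataset, batch_size=32, sampler=sampler)  # dataset omitted

# Transfer learning: freeze the ImageNet backbone, train a new 13-way head.
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
for p in model.parameters():
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 13)  # 13 fashion categories
```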