• Title/Summary/Keyword: Encoder Model


Variational Auto Encoder Distributed Restrictions for Image Generation (이미지 생성을 위한 변동 자동 인코더 분산 제약)

  • Yong-Gil Kim
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.23 no.3 / pp.91-97 / 2023
  • Recent research shows that latent directions can be used to manipulate images toward certain attributes. However, controlling the generation process of a generative model is very difficult. Although latent directions can steer generation toward certain attributes, restrictions on the latent vectors are required so that the attributes targeted by a given text prompt are enhanced while other attributes remain largely unaffected. This study presents a generative model that imposes such restrictions on the latent vectors for image generation and manipulation. The suggested method requires only a few minutes per manipulation, and extensive simulation results with a TensorFlow variational auto-encoder show the effectiveness of the suggested approach.
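
As a rough illustration only: the snippet below sketches the general idea of editing an image by shifting a variational auto-encoder's latent code along an attribute direction while restricting which latent dimensions may change. The tiny encoder/decoder, image size, and mask rule are all assumptions, not the paper's model.

```python
# Minimal sketch: latent-direction editing with a restriction on which
# latent dimensions may move. Placeholder networks stand in for a trained
# TensorFlow variational auto-encoder.
import numpy as np
import tensorflow as tf

latent_dim = 32

encoder = tf.keras.Sequential([               # placeholder "trained" encoder
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(latent_dim),
])
decoder = tf.keras.Sequential([               # placeholder "trained" decoder
    tf.keras.layers.Dense(64 * 64, activation="sigmoid"),
    tf.keras.layers.Reshape((64, 64, 1)),
])

def manipulate(image, direction, strength, editable_mask):
    """Shift the latent code along `direction`, but only on dimensions
    allowed by `editable_mask`; the other attributes stay untouched."""
    z = encoder(image[None, ...]).numpy()[0]
    step = strength * direction * editable_mask
    return decoder((z + step)[None, :]).numpy()[0]

image = np.random.rand(64, 64, 1).astype("float32")        # stand-in image
direction = np.random.randn(latent_dim).astype("float32")  # attribute direction
mask = np.zeros(latent_dim, dtype="float32")
mask[:8] = 1.0                                              # restrict edits to 8 dims
edited = manipulate(image, direction, strength=1.5, editable_mask=mask)
```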

A study on the auto encoder-based anomaly detection technique for pipeline inspection (관로 조사를 위한 오토 인코더 기반 이상 탐지기법에 관한 연구)

  • Gwantae Kim; Junewon Lee
    • Journal of Korean Society of Water and Wastewater / v.38 no.2 / pp.83-93 / 2024
  • In this study, we present a sewer pipe inspection technique that combines active sonar technology with deep learning algorithms. Pipes containing water are difficult to inspect with conventional CCTV methods, and their various limitations call for a new approach. In this paper, we introduce an inspection method using active sonar and apply an auto-encoder deep learning model to process the sonar data and distinguish between normal and abnormal pipelines. The model was trained on sonar data from a controlled environment under the assumption of normal pipeline conditions and utilizes anomaly detection techniques to identify deviations from established standards. This approach offers a new perspective on pipeline inspection, promising to reduce the time and resources required for sewer system management and to enhance the reliability of pipeline inspections.
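
A minimal sketch of the anomaly-detection recipe the abstract describes: train an auto-encoder on normal data only, then flag frames whose reconstruction error is unusually high. The frame length, network sizes, and threshold rule are assumptions; the paper's sonar preprocessing is not reproduced.

```python
# Minimal sketch: auto-encoder anomaly detection by reconstruction error.
import numpy as np
import tensorflow as tf

input_dim = 256                                    # assumed length of a flattened sonar frame
autoencoder = tf.keras.Sequential([
    tf.keras.Input(shape=(input_dim,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(16, activation="relu"),  # bottleneck
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(input_dim),
])
autoencoder.compile(optimizer="adam", loss="mse")

normal_frames = np.random.rand(1000, input_dim).astype("float32")  # stand-in "normal" data
autoencoder.fit(normal_frames, normal_frames, epochs=5, batch_size=32, verbose=0)

# Threshold chosen from the error distribution on normal data (assumed rule).
recon = autoencoder.predict(normal_frames, verbose=0)
errors = np.mean((normal_frames - recon) ** 2, axis=1)
threshold = errors.mean() + 3 * errors.std()

def is_anomalous(frame):
    """Flag a frame whose reconstruction error exceeds the normal-data threshold."""
    rec = autoencoder.predict(frame[None, :], verbose=0)[0]
    return float(np.mean((frame - rec) ** 2)) > threshold
```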

Accuracy Assessment of Forest Degradation Detection in Semantic Segmentation based Deep Learning Models with Time-series Satellite Imagery

  • Woo-Dam Sim; Jung-Soo Lee
    • Journal of Forest and Environmental Science / v.40 no.1 / pp.15-23 / 2024
  • This research aimed to assess the possibility of detecting forest degradation using time-series satellite imagery and three different deep learning-based change detection techniques. The dataset used for the deep learning models was composed of two sets: one based only on surface reflectance (SR) spectral information from satellite imagery, and the other combining SR with texture information (GLCM, gray-level co-occurrence matrix) and terrain information. The deep learning models employed for land cover change detection were image differencing with the Unet semantic segmentation model, a multi-encoder Unet model, and a multi-encoder Unet++ model. The study found no significant difference in accuracy between the deep learning models for forest degradation detection; training and validation accuracies were approximately 89% and 92%, respectively. Among the three deep learning models, the multi-encoder Unet model showed the most efficient analysis time with comparable accuracy. Moreover, models that incorporated texture and terrain information in addition to spectral information achieved higher classification accuracy than models that used only spectral information. Overall, the accuracy of forest degradation extraction was outstanding, reaching 98%.
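
To make the "multi-encoder Unet" idea concrete, here is a heavily simplified sketch assuming two input dates of 6-band imagery, a single downsampling level, and a binary change map; the actual models, band counts, and depths used in the study differ.

```python
# Simplified multi-encoder U-Net sketch: separate encoders per time point,
# features fused before a shared decoder that predicts per-pixel change.
import tensorflow as tf
from tensorflow.keras import layers

def encoder_branch(inp):
    c1 = layers.Conv2D(32, 3, padding="same", activation="relu")(inp)
    c2 = layers.Conv2D(64, 3, padding="same", activation="relu")(layers.MaxPooling2D()(c1))
    return c1, c2

t1 = layers.Input(shape=(128, 128, 6))   # earlier date (assumed SR + texture + terrain bands)
t2 = layers.Input(shape=(128, 128, 6))   # later date
s1a, deep_a = encoder_branch(t1)
s1b, deep_b = encoder_branch(t2)

merged = layers.Concatenate()([deep_a, deep_b])               # fuse the two encoders
up = layers.Conv2DTranspose(32, 2, strides=2, padding="same")(merged)
skip = layers.Concatenate()([up, s1a, s1b])                   # skips from both branches
dec = layers.Conv2D(32, 3, padding="same", activation="relu")(skip)
out = layers.Conv2D(1, 1, activation="sigmoid")(dec)          # forest-degradation change map

model = tf.keras.Model(inputs=[t1, t2], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy")
```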

System-level Function and Architecture Codesign for Optimization of MPEG Encoder

  • Choi, Jin-Ku; Togawa, Nozomu; Yanagisawa, Masao; Ohtsuki, Tatsuo
    • Proceedings of the IEEK Conference / 2002.07c / pp.1736-1739 / 2002
  • Advances in semiconductor, hardware, and software technologies enable the integration of more complex systems and increase design complexity. As system design complexity grows, system-level design based on the IP block and processor model is needed more than design at the RTL or lower levels. In this paper, we present a novel approach for system-level design that satisfies the various required constraints, together with an optimization method for an image encoder based on codesign of function, algorithm, and architecture. In addition, we show an MPEG-4 encoder as a design case study. The best tradeoffs between algorithm and architecture are necessary to deliver a design that satisfies performance and area constraints. The evaluations demonstrate effective optimization of motion estimation, which accounts for a large share of the performance of the MPEG-4 encoder module.


Deep Reference-based Dynamic Scene Deblurring

  • Cunzhe Liu; Zhen Hua; Jinjiang Li
    • KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.3 / pp.653-669 / 2024
  • Dynamic scene deblurring is a complex computer vision problem because it is difficult to model mathematically. In this paper, we present a novel approach to image deblurring that uses a sharp reference image to recover high-quality, high-frequency detail. To better exploit the clear reference image, we develop an encoder-decoder network with two novel modules designed to guide the network toward better image restoration. The proposed Reference Extraction and Aggregation Module effectively establishes the correspondence between the blurry image and the reference image and explores the most relevant features for better blur removal, while the proposed Spatial Feature Fusion Module enables the encoder to perceive blur information at different spatial scales. Finally, the multi-scale feature maps from the encoder and the cascaded Reference Extraction and Aggregation Modules are integrated into the decoder for global fusion and representation. Extensive quantitative and qualitative experimental results on different benchmarks show the effectiveness of the proposed method.

Implementation of Encoder/Decoder to Support SNN Model in an IoT Integrated Development Environment based on Neuromorphic Architecture (뉴로모픽 구조 기반 IoT 통합 개발환경에서 SNN 모델을 지원하기 위한 인코더/디코더 구현)

  • Kim, Hoinam; Yun, Young-Sun
    • Journal of Software Assessment and Valuation / v.17 no.2 / pp.47-57 / 2021
  • Neuromorphic technology is proposed to complement the shortcomings of existing artificial intelligence technology by mimicking the structure and computational processes of the human brain in hardware. NA-IDE has also been proposed for developing neuromorphic hardware-based IoT applications. To implement an SNN model in NA-IDE, commonly used input data must be transformed for use in the SNN model. In this paper, we implemented an encoder component based on a neural coding method that converts image data into a spike-train signal used as SNN input. A decoder component was implemented to convert the spike-train signal generated by the SNN model back into image data. If the decoder component uses the same parameters as the encoding process, it can generate static data similar to the original data. Using the proposed encoder and decoder, input data can be transformed and regenerated in fields such as image-to-image and speech-to-speech processing.
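
A minimal, assumption-laden sketch of the kind of neural (rate) coding the abstract describes: pixel intensities become Bernoulli spike trains, and decoding with the same parameters averages the spikes back into an image. This is illustrative only, not the NA-IDE component code.

```python
# Minimal rate-coding sketch: image -> spike train -> image.
import numpy as np

def encode_rate(image, time_steps=200, seed=0):
    """Encode intensities in [0, 1] as Bernoulli spike trains: brighter
    pixels spike more often across `time_steps`."""
    rng = np.random.default_rng(seed)
    probs = np.clip(image, 0.0, 1.0)
    return (rng.random((time_steps,) + image.shape) < probs).astype(np.uint8)

def decode_rate(spikes):
    """Decode by averaging spike counts over time back to intensities."""
    return spikes.mean(axis=0)

image = np.random.rand(28, 28)            # stand-in input image
spikes = encode_rate(image)               # shape: (time_steps, 28, 28)
reconstructed = decode_rate(spikes)       # approximates `image` for enough time steps
print(np.abs(image - reconstructed).mean())
```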

Ensemble UNet 3+ for Medical Image Segmentation

  • JongJin, Park
    • International Journal of Internet, Broadcasting and Communication / v.15 no.1 / pp.269-274 / 2023
  • In this paper, we propose a new UNet 3+ model for medical image segmentation. The proposed ensemble (E-)UNet 3+ model combines UNet 3+ networks of varying depths into one unified architecture. The UNet 3+ networks of varying depths share the same encoder but have their own decoders, which bridges the semantic gap between the encoder and decoder nodes of UNet 3+. Deep supervision was used for learning on a total of 8 nodes of the E-UNet 3+ to improve performance. The proposed E-UNet 3+ model shows better segmentation results than the UNet 3+. In the simulation, the E-UNet 3+ model using deep supervision performed best, with loss function values of 0.8904 and 0.8562 for the training and validation data. For the test data, the UNet 3+ model using deep supervision performed best, with a value of 0.7406. Qualitative comparison of the simulation results shows that the results of the proposed model are better than those of the existing UNet 3+.
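
The deep-supervision idea in this abstract can be illustrated with a toy two-head model: decoder heads of different effective depths share one encoder, and the segmentation loss is applied to every head. This is a schematic sketch with assumed sizes, not the eight-node E-UNet 3+ itself.

```python
# Toy deep-supervision sketch: one shared encoder, two decoder heads, one loss per head.
import tensorflow as tf
from tensorflow.keras import layers

inp = layers.Input(shape=(128, 128, 1))
e1 = layers.Conv2D(32, 3, padding="same", activation="relu")(inp)
e2 = layers.Conv2D(64, 3, padding="same", activation="relu")(layers.MaxPooling2D()(e1))

# Shallow head: segments directly from the first encoder stage.
head_shallow = layers.Conv2D(1, 1, activation="sigmoid", name="shallow")(e1)

# Deeper head: upsamples the second stage, merges the skip, then segments.
d = layers.Conv2DTranspose(32, 2, strides=2, padding="same")(e2)
d = layers.Conv2D(32, 3, padding="same", activation="relu")(layers.Concatenate()([d, e1]))
head_deep = layers.Conv2D(1, 1, activation="sigmoid", name="deep")(d)

model = tf.keras.Model(inp, [head_shallow, head_deep])
model.compile(optimizer="adam",
              loss={"shallow": "binary_crossentropy", "deep": "binary_crossentropy"},
              loss_weights={"shallow": 0.5, "deep": 1.0})   # supervise every head
```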

Zero-shot voice conversion with HuBERT

  • Hyelee Chung; Hosung Nam
    • Phonetics and Speech Sciences / v.15 no.3 / pp.69-74 / 2023
  • This study introduces an innovative model for zero-shot voice conversion that utilizes the capabilities of HuBERT. Zero-shot voice conversion models can transform the speech of one speaker to mimic that of another, even when the model has not been exposed to the target speaker's voice during the training phase. Comprising five main components (HuBERT, feature encoder, flow, speaker encoder, and vocoder), the model offers remarkable performance across a range of scenarios. Notably, it excels in the challenging unseen-to-unseen voice-conversion tasks. The effectiveness of the model was assessed based on the mean opinion scores and similarity scores, reflecting high voice quality and similarity to the target speakers. This model demonstrates considerable promise for a range of real-world applications demanding high-quality voice conversion. This study sets a precedent in the exploration of HuBERT-based models for voice conversion, and presents new directions for future research in this domain. Despite its complexities, the robust performance of this model underscores the viability of HuBERT in advancing voice conversion technology, making it a significant contributor to the field.
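
As a small, hedged illustration of the first stage such a system relies on, the snippet below pulls frame-level content features from raw audio with a pretrained HuBERT via the Hugging Face transformers library. The checkpoint choice is an assumption, and the paper's feature encoder, flow, speaker encoder, and vocoder are not reproduced.

```python
# Sketch: extracting HuBERT content features that a zero-shot VC pipeline
# would combine with a target-speaker embedding downstream.
import torch
from transformers import HubertModel, Wav2Vec2FeatureExtractor

ckpt = "facebook/hubert-base-ls960"             # assumed checkpoint, not the paper's
extractor = Wav2Vec2FeatureExtractor.from_pretrained(ckpt)
hubert = HubertModel.from_pretrained(ckpt).eval()

waveform = torch.zeros(16000)                   # stand-in: 1 s of 16 kHz audio
inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    content = hubert(**inputs).last_hidden_state  # (1, frames, 768) content features
print(content.shape)
```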

Design and Implementation of a Reusable and Extensible HL7 Encoding/Decoding Framework (재사용성과 확장성 있는 HL7 인코딩/디코딩 프레임워크의 설계 및 구현)

  • Kim, Jung-Sun; Park, Seung-Hun; Nah, Yun-Mook
    • Journal of KIISE: Computing Practices and Letters / v.8 no.1 / pp.96-106 / 2002
  • In this paper, we propose a flexible, reusable, and extensible HL7 encoding and decoding framework using a Message Object Model (MOM) and a Message Definition Repository (MDR). The MOM provides an abstract HL7 message form represented by a group of objects and their relationships. It reflects logical relationships among the standard HL7 message elements such as segments, fields, and components, while enforcing the key structural constraints imposed by the standard. Since the MOM completely eliminates the dependency of the HL7 encoder and decoder on platform-specific data formats, it makes it possible to build the encoder and decoder as reusable standalone software components, enabling the interconnection of arbitrary heterogeneous hospital information systems (HISs) with little effort. Moreover, the MDR, an external database of key definitions for HL7 messages, helps make the encoder and decoder as resilient as possible to future modifications of the standard HL7 message formats. It is also used by the encoder and decoder to perform a well-formedness check on their respective inputs (i.e., HL7 message objects expressed in the MOM and encoded HL7 message strings). Although we implemented a prototype version of the encoder and decoder in Java, they can easily be packaged and delivered as standalone components using standard component frameworks such as ActiveX, JavaBeans, or CORBA.
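
A toy illustration of the encoding direction such a framework automates: turning an object-model-like representation of segments, fields, and components into a pipe-delimited HL7 v2 string. The segment content is made up, and this is not the framework's MOM/MDR API.

```python
# Toy HL7 v2 encoder: components joined by '^', fields by '|', segments by CR.
FIELD_SEP, COMP_SEP, SEG_SEP = "|", "^", "\r"

def encode_segment(name, fields):
    """Encode one segment from its ordered field values."""
    encoded = [COMP_SEP.join(f) if isinstance(f, (list, tuple)) else str(f)
               for f in fields]
    return FIELD_SEP.join([name] + encoded)

def encode_message(segments):
    """Encode an ordered list of (segment_name, fields) pairs into one message."""
    return SEG_SEP.join(encode_segment(name, fields) for name, fields in segments)

message = [
    ("MSH", ["^~\\&", "SENDING_APP", "SENDING_FAC", "RECEIVING_APP"]),  # made-up header
    ("PID", ["1", "", "12345", "", ["DOE", "JOHN"]]),                   # made-up patient
]
print(encode_message(message))
```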

Arabic Stock News Sentiments Using the Bidirectional Encoder Representations from Transformers Model

  • Eman Alasmari; Mohamed Hamdy; Khaled H. Alyoubi; Fahd Saleh Alotaibi
    • International Journal of Computer Science & Network Security / v.24 no.2 / pp.113-123 / 2024
  • Stock market news sentiment analysis (SA) aims to identify the attitudes toward companies' stocks expressed in news published on official platforms. It supports sound investment decisions and analysts' evaluations. However, research on Arabic SA is limited compared to English SA because of the complexity and limited corpora of the Arabic language. This paper develops a sentiment classification model to predict the polarity of Arabic stock news in microblogs. It also aims to extract the reasons that lead to the polarity categorization, as the main economic causes or aspects, based on semantic units. Therefore, this paper presents an Arabic SA approach based on the logistic regression model and the Bidirectional Encoder Representations from Transformers (BERT) model. The proposed model is used to classify articles as positive, negative, or neutral. It was trained on data collected from an official Saudi stock market article platform, which was then preprocessed and labeled. Moreover, the economic reasons for the articles were investigated based on semantic units and divided into seven economic aspects to explain the polarity of the articles. The supervised BERT model obtained 88% article classification accuracy for SA, and the unsupervised mean Word2Vec encoder obtained 80% economic-aspect clustering accuracy. Predicting the polarity of Arabic stock market news and its economic reasons would provide valuable benefits to the stock SA field.
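
A minimal sketch of the two-stage recipe the abstract names: BERT representations feeding a logistic-regression polarity classifier. The multilingual checkpoint, mean pooling, and toy English stand-in headlines are assumptions replacing the authors' Arabic corpus and exact setup.

```python
# Sketch: BERT sentence embeddings + logistic regression for polarity.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

ckpt = "bert-base-multilingual-cased"          # assumed stand-in for an Arabic BERT
tokenizer = AutoTokenizer.from_pretrained(ckpt)
bert = AutoModel.from_pretrained(ckpt).eval()

def embed(texts):
    """Masked mean pooling of BERT token embeddings for a list of texts."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = bert(**batch).last_hidden_state        # (batch, tokens, 768)
    mask = batch["attention_mask"].unsqueeze(-1)
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy()

texts = ["earnings are rising", "shares fall sharply", "no notable change"]  # toy stand-ins
labels = ["positive", "negative", "neutral"]
clf = LogisticRegression(max_iter=1000).fit(embed(texts), labels)
print(clf.predict(embed(["the company reports record profits"])))
```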