• Title/Summary/Keyword: convolutional networks

Feature Extraction and Recognition of Myanmar Characters Based on Deep Learning (딥러닝 기반 미얀마 문자의 특징 추출 및 인식)

  • Ohnmar, Khin;Lee, Sung-Keun
    • The Journal of the Korea institute of electronic communication sciences, v.17 no.5, pp.977-984, 2022
  • Recently, with the economic development of Southeast Asia, the use of information devices has spread widely and the demand for application services using intelligent character recognition is increasing. This paper discusses deep learning-based feature extraction and recognition of the characters of Myanmar, one of the Southeast Asian countries. The Myanmar alphabet (33 letters) and Myanmar numerals (10 digits) are used for feature extraction. Nine features are extracted, of which more than three are newly proposed, and the extracted features of each character and numeral are presented with successful results. In the recognition stage, a convolutional neural network is used to assess performance on character discrimination. The algorithm is implemented and evaluated on captured image datasets; the precision of the model on the input dataset is 96%, using real-time input images.
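
A minimal sketch of the kind of CNN classifier the recognition stage describes, assuming 32x32 grayscale glyph images and 43 output classes (33 letters plus 10 numerals). The layer sizes and the MyanmarCharCNN name are illustrative assumptions, not the paper's exact architecture.

```python
# Sketch only: small CNN for 43-class Myanmar character/numeral recognition.
import torch
import torch.nn as nn

class MyanmarCharCNN(nn.Module):
    def __init__(self, num_classes: int = 43):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 32x32 -> 16x16
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 16x16 -> 8x8
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, 128), nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

if __name__ == "__main__":
    model = MyanmarCharCNN()
    dummy = torch.randn(4, 1, 32, 32)             # batch of 4 grayscale glyphs
    print(model(dummy).shape)                     # torch.Size([4, 43])
```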

Lightweight multiple scale-patch dehazing network for real-world hazy image

  • Wang, Juan;Ding, Chang;Wu, Minghu;Liu, Yuanyuan;Chen, Guanhai
    • KSII Transactions on Internet and Information Systems (TIIS), v.15 no.12, pp.4420-4438, 2021
  • Image dehazing is an ill-posed problem that is far from solved. Traditional image dehazing methods often yield mediocre results with substandard processing speed, while modern deep learning methods perform well only on certain datasets; their haze removal is unsatisfactory and their generalization fails to meet requirements. Moreover, because of limited processing speed, most dehazing algorithms cannot be deployed in industry. To alleviate these problems, a lightweight fast dehazing network based on a multiple scale-patch framework (MSP) is proposed in the present paper. First, the multi-scale structure is employed as the backbone network and the multi-patch structure as a supplementary network. Because dehazing through a single network causes problems such as loss of object detail and color in some image areas, the multi-patch structure is employed in MSP as an information supplement: in the image processing module, the image is segmented into upper and lower parts that are processed separately. Second, MSP produces a clear dehazing effect and significant robustness on real-world homogeneous and nonhomogeneous hazy images and on different datasets. Compared with existing dehazing methods, MSP demonstrates fast inference and the feasibility of real-time processing: the overall size and parameter count of the dehazing model are 20.75 M and 6.8 M, and the processing time for a single image is 0.026 s. Experiments on NTIRE 2018 and NTIRE 2020 demonstrate that MSP achieves performance superior to state-of-the-art methods on PSNR, SSIM, LPIPS, and individual subjective evaluation.
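
A toy sketch of the multi-patch idea described above: the hazy image is split into upper and lower halves, each half passes through a shared lightweight residual block, and the halves are recombined. The DehazeBlock module and all sizes are placeholders, not the MSP authors' network.

```python
# Sketch only: split-into-patches dehazing with a shared residual CNN.
import torch
import torch.nn as nn

class DehazeBlock(nn.Module):
    def __init__(self, channels: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, x):
        return x + self.net(x)            # residual haze-removal estimate

class MultiPatchDehaze(nn.Module):
    def __init__(self):
        super().__init__()
        self.patch_branch = DehazeBlock()

    def forward(self, hazy: torch.Tensor) -> torch.Tensor:
        h = hazy.shape[2]
        top, bottom = hazy[:, :, : h // 2], hazy[:, :, h // 2 :]
        # process each half separately, then stitch back along the height axis
        return torch.cat([self.patch_branch(top), self.patch_branch(bottom)], dim=2)

if __name__ == "__main__":
    x = torch.rand(1, 3, 256, 256)
    print(MultiPatchDehaze()(x).shape)    # torch.Size([1, 3, 256, 256])
```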

Korean Text Image Super-Resolution for Improving Text Recognition Accuracy (텍스트 인식률 개선을 위한 한글 텍스트 이미지 초해상화)

  • Junhyeong Kwon;Nam Ik Cho
    • Journal of Broadcast Engineering, v.28 no.2, pp.178-184, 2023
  • Finding text in general scene images and recognizing its contents is a very important task that can serve as a basis for robot vision, visual assistance, and so on. However, for low-resolution text images, degradations such as noise or blur are more noticeable, which leads to a severe drop in text recognition accuracy. In this paper, we propose a new Korean text image super-resolution method based on a Transformer model, which generally shows higher performance than convolutional neural networks. In the experiments, we show that text recognition accuracy for Korean text images is improved when our proposed text image super-resolution method is used. We also propose a new Korean text image dataset for training our model, which contains massive HR-LR Korean text image pairs.
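
As a rough illustration of how HR-LR training pairs such as those in the proposed dataset could be synthesized, the sketch below bicubically downscales a high-resolution text image and adds mild Gaussian noise. The x2 scale factor and noise level are assumptions for illustration only, not the paper's degradation pipeline.

```python
# Sketch only: synthesize a low-resolution copy of an HR text image batch.
import torch
import torch.nn.functional as F

def make_lr(hr: torch.Tensor, scale: int = 2, noise_std: float = 0.02) -> torch.Tensor:
    """hr: (N, C, H, W) in [0, 1]; returns a degraded low-resolution copy."""
    lr = F.interpolate(hr, scale_factor=1.0 / scale, mode="bicubic",
                       align_corners=False)
    return (lr + noise_std * torch.randn_like(lr)).clamp(0.0, 1.0)

if __name__ == "__main__":
    hr_batch = torch.rand(8, 3, 64, 256)        # e.g. crops of rendered Korean text
    lr_batch = make_lr(hr_batch)
    print(lr_batch.shape)                       # torch.Size([8, 3, 32, 128])
```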

Deep learning-based apical lesion segmentation from panoramic radiographs

  • Il-Seok, Song;Hak-Kyun, Shin;Ju-Hee, Kang;Jo-Eun, Kim;Kyung-Hoe, Huh;Won-Jin, Yi;Sam-Sun, Lee;Min-Suk, Heo
    • Imaging Science in Dentistry, v.52 no.4, pp.351-357, 2022
  • Purpose: Convolutional neural networks (CNNs) have rapidly emerged as one of the most promising artificial intelligence methods in medical and dental research. CNNs can provide an effective diagnostic methodology enabling the detection of early-stage diseases. Therefore, this study aimed to evaluate the performance of a deep CNN algorithm for apical lesion segmentation from panoramic radiographs. Materials and Methods: A total of 1000 panoramic images showing apical lesions were separated into training (n=800, 80%), validation (n=100, 10%), and test (n=100, 10%) datasets. The performance of identifying apical lesions was evaluated by calculating the precision, recall, and F1-score. Results: In the test group of 180 apical lesions, 147 lesions were segmented from panoramic radiographs at an intersection over union (IoU) threshold of 0.3. The F1-score values were 0.828, 0.815, and 0.742 at IoU thresholds of 0.3, 0.4, and 0.5, respectively. Conclusion: This study showed the potential utility of a deep learning-guided approach for the segmentation of apical lesions. The deep CNN algorithm using U-Net demonstrated considerably high performance in detecting apical lesions.
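
A small sketch of the lesion-level evaluation used in the Results: a predicted lesion counts as a true positive when its IoU with an unmatched ground-truth lesion exceeds the threshold, and precision, recall, and F1-score follow. The greedy one-to-one matching and axis-aligned boxes are simplifying assumptions, not necessarily the study's exact protocol.

```python
# Sketch only: precision / recall / F1 at a chosen IoU threshold.
def box_iou(a, b):
    """a, b: (x1, y1, x2, y2) axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def f1_at_iou(pred_boxes, gt_boxes, threshold=0.3, iou_fn=box_iou):
    matched_gt, tp = set(), 0
    for p in pred_boxes:                       # greedy one-to-one matching
        for i, g in enumerate(gt_boxes):
            if i not in matched_gt and iou_fn(p, g) >= threshold:
                matched_gt.add(i)
                tp += 1
                break
    fp, fn = len(pred_boxes) - tp, len(gt_boxes) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

if __name__ == "__main__":
    preds = [(10, 10, 50, 50), (60, 60, 90, 90)]
    gts = [(12, 12, 48, 52)]
    print(f1_at_iou(preds, gts, threshold=0.3))   # (0.5, 1.0, 0.666...)
```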

A novel framework for correcting satellite-based precipitation products in Mekong river basin with discontinuous observed data

  • Xuan-Hien Le;Giang V. Nguyen;Sungho Jung;Giha Lee
    • Proceedings of the Korea Water Resources Association Conference, 2023.05a, pp.173-173, 2023
  • The Mekong River Basin (MRB) is a crucial watershed in Asia, impacting over 60 million people across six developing nations. Accurate satellite-based precipitation products (SPPs) are essential for effective hydrological and watershed management in this region, but SPP performance has been variable and limited. The APHRODITE product, a unique gauge-based dataset for the MRB, is widely used but is only available until 2015. In this study, we present a novel deep learning framework (DLF) for correcting SPPs in the MRB that combines convolutional neural networks with an encoder-decoder architecture to address pixel-by-pixel bias and enhance accuracy. The DLF was applied to four widely used SPPs (TRMM, CMORPH, CHIRPS, and PERSIANN-CDR) in the MRB. Among the original SPPs, TRMM outperformed the others. Results revealed that the DLF effectively bridged the spatial-temporal gap between the SPPs and the gauge-based APHRODITE dataset. Among the four corrected products, ADJ-TRMM demonstrated the best performance, followed by ADJ-CDR, ADJ-CHIRPS, and ADJ-CMORPH. The DLF offers a robust and adaptable solution for bias correction in the MRB and beyond, capable of detecting intricate patterns and learning from data to make appropriate adjustments. With the discontinuation of the APHRODITE product, the DLF represents a promising way to generate a more current and reliable dataset for MRB research. This research showcases the potential of deep learning-based methods for improving the accuracy of SPPs, particularly in regions like the MRB where gauge-based datasets are limited or discontinued.
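
An illustrative sketch (not the authors' exact DLF) of a small convolutional encoder-decoder that maps a gridded satellite precipitation field to a corrected field of the same shape, the kind of pixel-by-pixel correction described above. The single input channel, 64x64 grid, and layer widths are assumptions.

```python
# Sketch only: convolutional encoder-decoder for gridded bias correction.
import torch
import torch.nn as nn

class PrecipCorrector(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),        # downsample
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),  # upsample
            nn.Conv2d(16, 1, 3, padding=1),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

if __name__ == "__main__":
    satellite_grid = torch.rand(2, 1, 64, 64)    # e.g. daily SPP rainfall tiles
    corrected = PrecipCorrector()(satellite_grid)
    print(corrected.shape)                       # torch.Size([2, 1, 64, 64])
```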

A Dual-Structured Self-Attention for improving the Performance of Vision Transformers (비전 트랜스포머 성능향상을 위한 이중 구조 셀프 어텐션)

  • Kwang-Yeob Lee;Hwang-Hee Moon;Tae-Ryong Park
    • Journal of IKEEE, v.27 no.3, pp.251-257, 2023
  • In this paper, we propose a dual-structured self-attention method that compensates for the weak local feature extraction of the vision transformer's self-attention. Vision Transformers, while more computationally efficient than convolutional neural networks in object classification, object segmentation, and video image recognition, are relatively weak at extracting local features. To address this problem, many studies build on window or shifted-window attention, but these methods weaken the advantages of self-attention-based transformers by increasing computational complexity through multiple levels of encoders. This paper proposes a dual-structure self-attention that combines self-attention with a neighborhood network to improve the locality inductive bias over existing methods. The neighborhood network, which extracts local context information, has much lower computational complexity than a window structure. CIFAR-10 and CIFAR-100 were used to compare the proposed dual-structure self-attention transformer with the existing transformer, and the experiments showed improvements of 0.63% and 1.57% in Top-1 accuracy, respectively.
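
A sketch of the dual-structure idea: a global self-attention branch is combined with a cheap neighborhood branch, here a depthwise convolution over the token grid, that supplies local context. This illustrates the concept only and is not the authors' exact block; all dimensions are assumptions.

```python
# Sketch only: global self-attention plus a local "neighborhood" branch.
import torch
import torch.nn as nn

class DualAttentionBlock(nn.Module):
    def __init__(self, dim: int = 64, heads: int = 4, grid: int = 8):
        super().__init__()
        self.grid = grid
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.local = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (N, grid*grid, dim)
        n, l, d = tokens.shape
        global_out, _ = self.attn(tokens, tokens, tokens)
        local_in = tokens.transpose(1, 2).reshape(n, d, self.grid, self.grid)
        local_out = self.local(local_in).flatten(2).transpose(1, 2)
        return self.norm(tokens + global_out + local_out)

if __name__ == "__main__":
    x = torch.randn(2, 64, 64)                 # 8x8 token grid, 64-dim embeddings
    print(DualAttentionBlock()(x).shape)       # torch.Size([2, 64, 64])
```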

A study on training DenseNet-Recurrent Neural Network for sound event detection (음향 이벤트 검출을 위한 DenseNet-Recurrent Neural Network 학습 방법에 관한 연구)

  • Hyeonjin Cha;Sangwook Park
    • The Journal of the Acoustical Society of Korea, v.42 no.5, pp.395-401, 2023
  • Sound Event Detection (SED) aims to identify not only the sound category but also the time interval of target sounds in an audio waveform. It is a critical technique for acoustic surveillance and monitoring systems. Recently, various models have been introduced through the Detection and Classification of Acoustic Scenes and Events (DCASE) Task 4. This paper explores how to choose optimal parameters for a DenseNet-based model, an architecture that has achieved outstanding performance in other recognition systems. In the experiments, DenseRNN, the SED model, consists of DenseNet-BC and bi-directional Gated Recurrent Units (GRU) and is trained with the Mean Teacher method. Using an event-based F-score under the DCASE Task 4 assessment protocol, performance is evaluated over parameters related to both model architecture and model training. Experimental results show that performance improves and then saturates near the best configuration, and that DenseRNN is trained more effectively without the dropout technique.
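
A sketch of a DenseRNN-style SED model: a small convolutional frontend (standing in for DenseNet-BC) over a mel spectrogram, followed by a bidirectional GRU and per-frame sigmoid outputs for each event class. Channel counts, 64 mel bins, and 10 classes are illustrative assumptions, not the paper's configuration.

```python
# Sketch only: CNN frontend + bidirectional GRU producing frame-level activity.
import torch
import torch.nn as nn

class SimpleDenseRNN(nn.Module):
    def __init__(self, n_mels: int = 64, n_classes: int = 10):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),                 # pool frequency, keep time
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),
        )
        self.gru = nn.GRU(64 * (n_mels // 4), 128, batch_first=True,
                          bidirectional=True)
        self.head = nn.Linear(256, n_classes)

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        # spec: (N, 1, n_mels, T) -> (N, T, n_classes) frame-level activity
        feat = self.cnn(spec)                     # (N, 64, n_mels/4, T)
        n, c, f, t = feat.shape
        feat = feat.permute(0, 3, 1, 2).reshape(n, t, c * f)
        out, _ = self.gru(feat)
        return torch.sigmoid(self.head(out))

if __name__ == "__main__":
    x = torch.randn(2, 1, 64, 500)               # 2 clips, 500 spectrogram frames
    print(SimpleDenseRNN()(x).shape)             # torch.Size([2, 500, 10])
```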

Using machine learning to forecast and assess the uncertainty in the response of a typical PWR undergoing a steam generator tube rupture accident

  • Tran Canh Hai Nguyen ;Aya Diab
    • Nuclear Engineering and Technology, v.55 no.9, pp.3423-3440, 2023
  • In this work, a multivariate time-series machine learning meta-model is developed to predict the transient response of a typical nuclear power plant (NPP) undergoing a steam generator tube rupture (SGTR). The model employs Recurrent Neural Networks (RNNs), including Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and a hybrid CNN-LSTM model. To address the uncertainty inherent in such predictions, a Bayesian Neural Network (BNN) was implemented. The models were trained using a database generated by the Best Estimate Plus Uncertainty (BEPU) methodology, coupling the thermal-hydraulics code RELAP5/SCDAP/MOD3.4 to the statistical tool DAKOTA to predict the variation in system response under various operational and phenomenological uncertainties. The RNN models successfully capture the underlying characteristics of the data with reasonable accuracy, and the BNN-LSTM approach offers an additional layer of insight into the level of uncertainty associated with the predictions. The results demonstrate that LSTM outperforms GRU, while the hybrid CNN-LSTM model is computationally the most efficient. This study aims to build a better understanding of the capabilities and limitations of machine learning models in the context of nuclear safety. By expanding the application of ML models to more severe accident scenarios, where operators are under extreme stress and prone to errors, ML models can provide valuable support and act as expert systems to assist in decision-making while minimizing the chances of human error.
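
A minimal sketch of a multivariate time-series meta-model like those compared above: an LSTM reads a window of past plant parameters and predicts the next values of the monitored variables. The feature count, window length, and layer sizes are assumptions, not the authors' configuration.

```python
# Sketch only: LSTM meta-model for multivariate transient prediction.
import torch
import torch.nn as nn

class TransientLSTM(nn.Module):
    def __init__(self, n_features: int = 12, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, n_features)

    def forward(self, window: torch.Tensor) -> torch.Tensor:
        # window: (N, T, n_features) -> prediction for the next time step
        out, _ = self.lstm(window)
        return self.head(out[:, -1])

if __name__ == "__main__":
    past = torch.randn(4, 50, 12)            # 4 samples, 50 time steps, 12 signals
    print(TransientLSTM()(past).shape)       # torch.Size([4, 12])
```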

Comparative Analysis of Machine Learning Techniques for IoT Anomaly Detection Using the NSL-KDD Dataset

  • Zaryn, Good;Waleed, Farag;Xin-Wen, Wu;Soundararajan, Ezekiel;Maria, Balega;Franklin, May;Alicia, Deak
    • International Journal of Computer Science & Network Security, v.23 no.1, pp.46-52, 2023
  • With billions of IoT (Internet of Things) devices populating various emerging applications across the world, detecting anomalies on these devices has become incredibly important. Advanced Intrusion Detection Systems (IDS) are trained to detect abnormal network traffic, and Machine Learning (ML) algorithms are used to create the detection models. In this paper, the NSL-KDD dataset was adopted to comparatively study the performance and efficiency of IoT anomaly detection models. The dataset was developed for various research purposes and is especially useful for anomaly detection. The data were used with typical machine learning algorithms, including eXtreme Gradient Boosting (XGBoost), Support Vector Machines (SVM), and Deep Convolutional Neural Networks (DCNN), to identify and classify any anomalies present within the IoT applications. Our results show that the XGBoost algorithm outperformed both the SVM and DCNN algorithms, achieving the highest accuracy. Each algorithm was assessed based on accuracy, precision, recall, and F1 score. We also obtained interesting results on the execution time of each algorithm when running anomaly detection: the XGBoost algorithm was 425.53% faster than the SVM algorithm and 2,075.49% faster than the DCNN algorithm. According to our experimental testing, XGBoost is the most accurate and efficient method.
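
A minimal sketch of the kind of comparison reported above: train XGBoost and an SVM on the same features and compare accuracy, precision, recall, F1-score, and wall-clock time. Synthetic data stands in for the preprocessed NSL-KDD records and hyperparameters are left at defaults, so the numbers will differ from the paper's.

```python
# Sketch only: side-by-side metric and timing comparison of two classifiers.
import time
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from xgboost import XGBClassifier

X, y = make_classification(n_samples=5000, n_features=40, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

for name, model in [("XGBoost", XGBClassifier()), ("SVM", SVC())]:
    start = time.perf_counter()
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    elapsed = time.perf_counter() - start
    acc = accuracy_score(y_te, pred)
    prec, rec, f1, _ = precision_recall_fscore_support(y_te, pred,
                                                       average="binary")
    print(f"{name}: acc={acc:.3f} P={prec:.3f} R={rec:.3f} "
          f"F1={f1:.3f} time={elapsed:.2f}s")
```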

Contextual Modeling in Context-Aware Conversation Systems

  • Quoc-Dai Luong Tran;Dinh-Hong Vu;Anh-Cuong Le;Ashwin Ittoo
    • KSII Transactions on Internet and Information Systems (TIIS), v.17 no.5, pp.1396-1412, 2023
  • Conversation modeling is an important and challenging task in the field of natural language processing because it is a key component in the development of automated human-machine conversation. Most recent research on conversation modeling focuses only on the current utterance (considered as the current question) to generate a response, and thus fails to capture the conversation's logic from its beginning. Some studies concatenate the current question with previous conversation sentences and use it as input for response generation. Another approach is to use an encoder to store all previous utterances; each time a new question is encountered, the encoder is updated and used to generate the response. Our approach in this paper differs from previous studies in that we explicitly separate the encoding of the question from the encoding of its context. This results in different encoding models for the question and the context, capturing the specificity of each, while the entire context remains available when generating the response. To this end, we propose a deep neural network-based model, called the Context Model, to encode previous utterances' information and combine it with the current question. This approach satisfies the need for context information while keeping the roles of the current question and its context distinct during response generation. We investigate two approaches for representing the context: long short-term memory and convolutional neural networks. Experiments show that our Context Model outperforms a baseline model on both the ConvAI2 dataset and a collected dataset of conversational English.
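
A sketch of the separation the Context Model is built around: the current question and the conversation context are encoded by different encoders (both LSTMs here) and the two vectors are combined before response generation. Vocabulary size, dimensions, and the merge step are illustrative assumptions, not the authors' exact model.

```python
# Sketch only: separate question and context encoders whose outputs are merged.
import torch
import torch.nn as nn

class ContextModelEncoder(nn.Module):
    def __init__(self, vocab: int = 10000, dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.question_enc = nn.LSTM(dim, dim, batch_first=True)
        self.context_enc = nn.LSTM(dim, dim, batch_first=True)
        self.merge = nn.Linear(2 * dim, dim)

    def forward(self, question_ids, context_ids):
        # question_ids: (N, Lq) token ids; context_ids: (N, Lc) token ids
        _, (q_h, _) = self.question_enc(self.embed(question_ids))
        _, (c_h, _) = self.context_enc(self.embed(context_ids))
        return torch.tanh(self.merge(torch.cat([q_h[-1], c_h[-1]], dim=-1)))

if __name__ == "__main__":
    q = torch.randint(0, 10000, (2, 12))       # 2 questions
    c = torch.randint(0, 10000, (2, 80))       # their flattened conversation histories
    print(ContextModelEncoder()(q, c).shape)   # torch.Size([2, 128])
```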