Recognition of Dog Breeds based on Deep Learning using a Random-Label and Web Image Mining (웹 이미지 마이닝과 랜덤 레이블을 이용한 딥러닝 기반 개 품종 인식)

  • Kang, Min-Seok;Hong, Kwang-Seok
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2018.10a
    • /
    • pp.201-202
    • /
    • 2018
  • In this paper, dog breed images from the existing ImageNet and Oxford-IIIT Pet datasets are combined with dog breed images obtained through web image mining, and a random label is added. The proposed method recognizes 122 classes of dog breeds plus 1 class for images that are not dog breeds. Using both the conventional DB and the collected DB improved the Top-1 recognition rate of dog breeds by 1.5% compared with using only the existing DB. For non-dog images, the recognition rate was 93% with a random DB of 10,000 images.

Deep Learning Based Object Recognition in Spherical Panoramic Image (구면 파노라마 영상에서의 딥러닝 기반 객체 인식)

  • Jung, Minsuk;Park, Jong-Seung
    • Journal of Korea Game Society
    • /
    • v.18 no.5
    • /
    • pp.5-14
    • /
    • 2018
  • Much research has been done on image recognition techniques for planar images, and their performance has improved. However, it is difficult to recognize objects in spherical panoramic images, or in other specially formed images captured in various environments, because spherical distortion differs from the distortion in the planar case. In this paper, we show that a neural network recognition approach can be used for object recognition in spherical images and suggest using a cubemap transform to increase recognition accuracy in spherical images.
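
The cubemap transform mentioned in this abstract can be illustrated with a minimal sketch. It maps a point on one cube face to pixel coordinates in an equirectangular panorama, which is the lookup needed to resample a spherical image into six nearly distortion-free planar faces. The function name, the face orientation convention, and the axis layout are assumptions made for illustration; the abstract does not specify the authors' implementation.

```python
import math

def cube_face_to_equirect(face, u, v, width, height):
    """Map a cube-face point to equirectangular pixel coordinates.

    face: "front", "right", "back", "left", "up", or "down"
    u, v: position on the face in [-1, 1] (u to the right, v downward)
    width, height: size of the equirectangular image in pixels
    Axis convention (assumed): x right, y up, z forward.
    """
    # Direction vector from the sphere center through the face point
    directions = {
        "front": (u, -v, 1.0),
        "right": (1.0, -v, -u),
        "back": (-u, -v, -1.0),
        "left": (-1.0, -v, u),
        "up": (u, 1.0, v),
        "down": (u, -1.0, -v),
    }
    x, y, z = directions[face]
    lon = math.atan2(x, z)                 # longitude, 0 at the front
    lat = math.atan2(y, math.hypot(x, z))  # latitude, +pi/2 at the top
    px = (lon / (2 * math.pi) + 0.5) * (width - 1)
    py = (0.5 - lat / math.pi) * (height - 1)
    return px, py

# The center of the front face lands in the middle of the panorama.
print(cube_face_to_equirect("front", 0.0, 0.0, 201, 101))
```

Sampling the panorama at these coordinates for every (u, v) on each face yields six planar images that a recognizer trained on ordinary images can process.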

Reinforced Feature of Dynamic Search Area for the Discriminative Model Prediction Tracker based on Multi-domain Dataset (다중 도메인 데이터 기반 구별적 모델 예측 트레커를 위한 동적 탐색 영역 특징 강화 기법)

  • Lee, Jun Ha;Won, Hong-In;Kim, Byeong Hak
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.16 no.6
    • /
    • pp.323-330
    • /
    • 2021
  • Visual object tracking is a challenging area of computer vision because of many difficult problems, including fast variation of target shape, occlusion, and arbitrary ground-truth object designation. In this paper, we focus on reinforcing the features of the dynamic search area to outperform conventional discriminative model prediction trackers when accuracy deteriorates because of low feature discrimination. We propose a reinforced input feature method that acts like a spotlight effect on the dynamic search area of the tracked target. This method can improve the performance of deep learning based discriminative model prediction trackers, as well as other types of trackers that infer the center of the target in visual object tracking. The proposed method improves tracking performance over the baseline trackers, achieving a relative gain of 38%, from an F-score of 0.433 to 0.601, in the visual object tracking evaluation.
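
The relative gain quoted in this abstract can be checked from the two F-scores it reports. The helper name below is hypothetical; the calculation is the usual (improved - baseline) / baseline ratio.

```python
def relative_gain(baseline, improved):
    """Relative improvement of `improved` over `baseline`, as a fraction."""
    return (improved - baseline) / baseline

# F-scores reported in the abstract
gain = relative_gain(0.433, 0.601)
print(f"{gain:.1%}")  # prints 38.8%
```

The result is consistent with the 38% stated in the abstract if that figure was truncated rather than rounded.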

Diagnosis of Parkinson's disease based on audio voice using wav2vec (Wav2vec을 이용한 오디오 음성 기반의 파킨슨병 진단)

  • Yoon, Hee-Jin
    • Journal of Digital Convergence
    • /
    • v.19 no.12
    • /
    • pp.353-358
    • /
    • 2021
  • Parkinson's disease is the second most common degenerative brain disease in old age after Alzheimer's disease. Its symptoms, such as shaking hands and slowed behavior and cognitive function, reduce the quality of daily life. Early diagnosis can slow the progression of the disease. To diagnose Parkinson's disease early, an algorithm was implemented that extracts features using wav2vec and diagnoses the presence or absence of Parkinson's disease with a deep learning model (ANN). In the experiment, the accuracy was 97.47%, which was better than the results of diagnosing Parkinson's disease with the existing neural network. Using audio voice files simplified the experimental process and yielded improved results.

Integral Regression Network for Facial Landmark Detection (얼굴 특징점 검출을 위한 적분 회귀 네트워크)

  • Kim, Do Yeop;Chang, Ju Yong
    • Journal of Broadcast Engineering
    • /
    • v.24 no.4
    • /
    • pp.564-572
    • /
    • 2019
  • With the development of deep learning, the performance of facial landmark detection methods has been greatly improved. The heat map regression method, which is a representative facial landmark detection method, is widely used as an efficient and robust method. However, the landmark coordinates cannot be directly obtained through a single network, and the accuracy is reduced in determining the landmark coordinates from the heat map. To solve these problems, we propose to combine integral regression with the existing heat map regression method. Through experiments using various datasets, we show that the proposed integral regression network significantly improves the performance of facial landmark detection.
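
Integral regression is commonly described as a soft-argmax: the heat map is normalized with a softmax and the landmark coordinate is taken as the expected position under that distribution, which is differentiable and gives sub-pixel output. The sketch below follows that general description; the function name is hypothetical and the network in the paper may differ in detail.

```python
import math

def soft_argmax_2d(heatmap):
    """Expected (x, y) position under a softmax of the heat map scores.

    heatmap: list of rows, each a list of raw scores.
    """
    peak = max(max(row) for row in heatmap)  # subtract for numerical stability
    weights = [[math.exp(s - peak) for s in row] for row in heatmap]
    total = sum(sum(row) for row in weights)
    x = sum(w * col for row in weights for col, w in enumerate(row)) / total
    y = sum(w * r for r, row in enumerate(weights) for w in row) / total
    return x, y

# Two equally strong neighboring cells: a hard argmax must pick column 1 or 2,
# while the integral gives the midpoint between them.
heatmap = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 5.0, 5.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
print(soft_argmax_2d(heatmap))
```

This shows the accuracy loss the abstract attributes to reading coordinates off the heat map: a hard argmax is limited to the grid resolution, whereas the expectation is not.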

A Comparative Study on OCR using Super-Resolution for Small Fonts

  • Cho, Wooyeong;Kwon, Juwon;Kwon, Soonchu;Yoo, Jisang
    • International journal of advanced smart convergence
    • /
    • v.8 no.3
    • /
    • pp.95-101
    • /
    • 2019
  • Recently, many issues have been reported with text recognition using Tesseract. One is that recognition accuracy is significantly lower for smaller fonts. Tesseract extracts text by creating a directed outline in the image. It then searches the Tesseract database and uses template matching against characters with similar feature points to select the character with the lowest error. Poor text extraction therefore lowers recognition accuracy. In this paper, we compared text recognition accuracy after applying various super-resolution methods to small text images and examined how recognition accuracy varies with image size. To recognize small Korean text images, we used deep learning based super-resolution algorithms such as SRCNN, ESRCNN, DSRCNN, and DCSCN. The dataset for training and testing consisted of Korean-based scanned images. The images were resized by factors from 0.5 to 0.8 with a 12pt font size. The experiment was performed on x0.5 resized images, and the results showed that DCSCN super-resolution was the most effective method, reducing the precision error rate by 7.8% and the recall error rate by 8.4%. The experimental results demonstrate that text recognition accuracy for smaller Korean fonts can be improved by adding super-resolution methods to the OCR preprocessing module.

Development of Exhibits Preference Analysis Method using Deep Learning for Science Museum (딥러닝을 활용한 과학관 전시품 선호도 분석 방법 개발)

  • Yu, Jun Sang;Kang, Bo-Yeong
    • Journal of Korea Multimedia Society
    • /
    • v.24 no.1
    • /
    • pp.40-50
    • /
    • 2021
  • Science museums deal with exhibits in changing fields of science and technology, and previous research suggested that exhibits should be replaced at least every 5 years. To replace exhibits efficiently within a limited budget, various studies have analyzed visitors' preferences for exhibits. Recent studies use various technologies to collect data on visitors' preferences automatically, but most of them depend heavily on the visitors; for example, visitors had to carry specific sub-devices in the museum for data gathering. To address these limitations, this study introduces an improved method that automatically collects and quantifies visitors' preferences for exhibits using TensorFlow, a deep learning technology. With the proposed analysis method, a total of 2,520 data points on visitors' experience of exhibits were collected. From the collected data, attraction power and holding power, which indicate visitors' preference for exhibits, were calculated. The results also confirmed the conclusion of earlier research that the attraction power and holding power of exhibits consisting of three-dimensional structures are higher than those of other exhibits. In conclusion, the proposed method provides a more convenient way to collect data for detecting visitors' preferences.
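
The abstract does not define attraction power or holding power. The sketch below assumes definitions often used in visitor studies: attraction power as the share of visitors who stop at an exhibit, and holding power as the mean stay time divided by the time needed to experience the exhibit fully. The function names, parameters, and sample numbers are all hypothetical.

```python
def attraction_power(visitors_stopped, visitors_total):
    """Share of passing visitors who stopped at the exhibit (assumed definition)."""
    return visitors_stopped / visitors_total

def holding_power(stay_seconds, required_seconds):
    """Mean stay time relative to the time the exhibit needs (assumed definition)."""
    mean_stay = sum(stay_seconds) / len(stay_seconds)
    return mean_stay / required_seconds

# Illustrative numbers only, not data from the study
print(attraction_power(30, 120))          # 0.25
print(holding_power([30, 60, 90], 120))   # 0.5
```

In an automated pipeline, the stop counts and stay times would come from a person detector run on camera footage rather than from devices carried by visitors.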

A Case Study of Creative Art Based on AI Generation Technology

  • Qianqian Jiang;Jeanhun Chung
    • International journal of advanced smart convergence
    • /
    • v.12 no.2
    • /
    • pp.84-89
    • /
    • 2023
  • In recent years, with breakthroughs in deep learning algorithms such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), AI generation technology has rapidly expanded into various sub-sectors of the art field. 2022 was an explosive year for AI-generated art; in AI-generated creative design in particular, many excellent works appeared, improving the work efficiency of art design. This study analyzes the application and design characteristics of AI generation technology in two subfields of artistic creative design, AI painting and AI animation production, and compares traditional painting with AI painting. The paper summarizes the advantages and problems in the process of AI creative design. Although AI art design is affected by technical limitations, flaws in the artworks, and practical problems such as copyright and income, it provides strong technical support for expanding the subdivisions of artistic innovation and technology integration, and has extremely high research value.

Deep-learning-based gestational sac detection in ultrasound images using modified YOLOv7-E6E model

  • Tae-kyeong Kim;Jin Soo Kim;Hyun-chong Cho
    • Journal of Animal Science and Technology
    • /
    • v.65 no.3
    • /
    • pp.627-637
    • /
    • 2023
  • As population and income levels rise, meat consumption increases steadily every year. However, the number of farms and farmers producing meat has decreased during the same period, reducing meat sufficiency. Information and Communications Technology (ICT) has begun to be applied to reduce the labor and production costs of livestock farms and improve productivity. This technology can be used for rapid pregnancy diagnosis of sows; the location and size of the gestational sacs of sows are directly related to the productivity of the farm. In this study, a system is proposed to determine the number of gestational sacs of sows from ultrasound images. The system used the YOLOv7-E6E model, changing the activation function from the sigmoid-weighted linear unit (SiLU) to a multi-activation function (SiLU + Mish). The upsampling method was also changed from nearest to bicubic to improve performance. The original model trained on the original data achieved a mean average precision of 86.3%. When the proposed multi-activation function, upsampling, and AutoAugment were applied individually, the performance improved by 0.3%, 0.9%, and 0.9%, respectively. When all three proposed methods were applied simultaneously, performance improved significantly, by 3.5% to 89.8%.
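
For reference, the two activation functions named in this abstract have standard definitions: SiLU(x) = x · sigmoid(x) and Mish(x) = x · tanh(softplus(x)). The sketch below implements only these definitions in plain Python. How the paper combines them into its multi-activation function is not stated in the abstract, so no combination is shown.

```python
import math

def silu(x):
    """Sigmoid-weighted linear unit: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))

def mish(x):
    """Mish: x * tanh(softplus(x)), with softplus(x) = ln(1 + e^x)."""
    return x * math.tanh(math.log1p(math.exp(x)))

# Both are smooth, pass through the origin, and dip slightly below zero
# for negative inputs, unlike ReLU.
for x in (-2.0, 0.0, 1.0):
    print(x, round(silu(x), 4), round(mish(x), 4))
```

Both functions approach the identity for large positive inputs, so swapping one for the other changes mainly the behavior near and below zero.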

Study on 2D Sprite Generation Using the Impersonator Network

  • Yongjun Choi;Beomjoo Seo;Shinjin Kang;Jongin Choi
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.17 no.7
    • /
    • pp.1794-1806
    • /
    • 2023
  • This study presents a method that takes photographs of users as input and converts them into 2D character animation sprites using a generative adversarial network-based artificial intelligence network. Traditionally, 2D character animations have been created by manually drawing an entire sequence of sprite images, which incurs high development costs. To address this issue, this study proposes a technique that combines motion videos and sample 2D images. In the 2D sprite generation process that uses the proposed technique, a sequence of images is extracted from real-life images captured by the user, and these are combined with character images from within the game. Our research aims to leverage cutting-edge deep learning-based image manipulation techniques, such as the GAN-based motion transfer network (impersonator) and background noise removal (U2-Net), to generate a sequence of animation sprites from a single image. The proposed technique enables the creation of diverse animations and motions from just one image. By utilizing these advancements, we focus on enhancing productivity in the game and animation industry through improved efficiency and streamlined production processes. By employing state-of-the-art techniques, our research enables the generation of 2D sprite images with various motions, offering significant potential for boosting productivity and creativity in the industry.