• Title/Summary/Keyword: Training Image


Revolutionizing Traffic Sign Recognition with YOLOv9 and CNNs

  • Muteb Alshammari;Aadil Alshammari
    • International Journal of Computer Science & Network Security
    • /
    • v.24 no.8
    • /
    • pp.14-20
    • /
    • 2024
  • Traffic sign recognition is an essential component of intelligent transportation systems and Advanced Driver Assistance Systems (ADAS), both of which are central to improving road safety and advancing autonomous driving. This research investigates the incorporation of the YOLOv9 model into traffic sign recognition systems, exploiting its newer components, Programmable Gradient Information (PGI) and the Generalized Efficient Layer Aggregation Network (GELAN), to address persistent difficulties in object detection. We employed a publicly accessible dataset obtained from Roboflow, consisting of 3,130 images in five classes: speed_40, speed_60, stop, green, and red. The dataset was methodically separated into training (68%), validation (21%), and testing (12%) subsets to ensure a thorough evaluation. Our experiments show that YOLOv9 attains a mean Average Precision (mAP@0.5) of 0.959, indicating excellent precision and recall for most traffic sign classes, although the red traffic sign class still leaves room for improvement. We also analyzed the distribution of instances across traffic sign categories and the variation in sign sizes within the dataset to gauge how well the model would perform in real-world conditions. The findings confirm that YOLOv9 substantially improves the precision and dependability of traffic sign identification, establishing it as a strong candidate for deployment in intelligent transportation systems and ADAS, with the potential to make roadways safer and more efficient.
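A minimal sketch of the 68/21/12 train/validation/test split described in the abstract; the helper name and the rounding rule (remainder to the last split) are assumptions for illustration, and the paper's exact per-split counts may differ slightly.

```python
# Allocate the 3,130 Roboflow images across the 68/21/12 split.
def split_counts(total, ratios):
    """Allocate `total` items across `ratios`, giving the rounding remainder to the last split."""
    counts = [int(total * r) for r in ratios[:-1]]
    counts.append(total - sum(counts))  # remainder keeps the splits summing to `total`
    return counts

train, val, test = split_counts(3130, (0.68, 0.21, 0.12))
print(train, val, test)  # the three subsets sum back to 3130
```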

Deep Learning-Based Lumen and Vessel Segmentation of Intravascular Ultrasound Images in Coronary Artery Disease

  • Gyu-Jun Jeong;Gaeun Lee;June-Goo Lee;Soo-Jin Kang
    • Korean Circulation Journal
    • /
    • v.54 no.1
    • /
    • pp.30-39
    • /
    • 2024
  • Background and Objectives: Intravascular ultrasound (IVUS) evaluation of coronary artery morphology is based on lumen and vessel segmentation. This study aimed to develop an automatic segmentation algorithm and validate its performance for measuring quantitative IVUS parameters. Methods: A total of 1,063 patients were randomly assigned, at a ratio of 4:1, to the training and test sets. An independent dataset of 111 IVUS pullbacks was obtained to assess vessel-level performance. The lumen and external elastic membrane (EEM) boundaries were labeled manually in every IVUS frame at a 0.2-mm interval. Efficient-UNet was utilized for the automatic segmentation of IVUS images. Results: At the frame level, Efficient-UNet showed a high dice similarity coefficient (DSC, 0.93±0.05) and Jaccard index (JI, 0.87±0.08) for lumen segmentation, and a high DSC (0.97±0.03) and JI (0.94±0.04) for EEM segmentation. At the vessel level, there were close correlations between model-derived and expert-measured IVUS parameters: minimal lumen image area (r=0.92), EEM area (r=0.88), lumen volume (r=0.99), and plaque volume (r=0.95). Agreement between model-derived and expert-measured minimal lumen area was comparable to the inter-expert agreement. Model-based lumen and EEM segmentation for a 20-mm lesion segment required 13.2 seconds, whereas manual segmentation at a 0.2-mm interval by an expert took 187.5 minutes on average. Conclusions: The deep learning models can accurately and quickly delineate vascular geometry. This artificial-intelligence-based methodology may support clinicians' decision-making through real-time application in the catheterization laboratory.
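A minimal sketch of the two frame-level metrics reported above, the Dice similarity coefficient (DSC) and Jaccard index (JI), computed here on flat 0/1 lists standing in for real IVUS segmentation masks.

```python
# DSC = 2*|A∩B| / (|A|+|B|); JI = |A∩B| / |A∪B| for binary masks.
def dice_jaccard(pred, ref):
    inter = sum(p and r for p, r in zip(pred, ref))  # overlapping foreground pixels
    psum, rsum = sum(pred), sum(ref)
    dsc = 2 * inter / (psum + rsum)
    ji = inter / (psum + rsum - inter)
    return dsc, ji

dsc, ji = dice_jaccard([1, 1, 1, 0, 0], [1, 1, 0, 0, 0])
print(round(dsc, 3), round(ji, 3))  # 0.8 0.667
```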

Research on damage detection and assessment of civil engineering structures based on DeepLabV3+ deep learning model

  • Chengyan Song
    • Structural Engineering and Mechanics
    • /
    • v.91 no.5
    • /
    • pp.443-457
    • /
    • 2024
  • At present, traditional concrete surface inspection based on manual visual assessment is costly and unsafe, while conventional computer vision methods rely on hand-crafted features that are sensitive to environmental changes and difficult to generalize. To address these problems, this paper applies deep learning from the field of computer vision to achieve automatic feature extraction of structural damage, with high detection speed and strong generalization ability. The main contents of this study are as follows: (1) A surface-damage detection method based on the DeepLabV3+ convolutional neural network model is proposed for post-earthquake structures, covering surface damage such as concrete cracks, spalling, and exposed steel bars. Key semantic information is extracted by different backbone networks, and datasets containing the various surface damage types are used for training, testing, and evaluation. Intersection-over-union scores of 54.4%, 44.2%, and 89.9% on the test set demonstrate the network's capability to accurately identify the different types of structural surface damage in pixel-level segmentation, highlighting its effectiveness across varied testing scenarios. (2) A semantic segmentation model based on the DeepLabV3+ convolutional neural network is proposed for the detection and evaluation of post-earthquake structural components. Using a dataset that includes building structural components and their damage degrees for training, testing, and evaluation, semantic segmentation detection accuracies of 98.5% and 56.9% were recorded. To provide a comprehensive assessment that considers both false positives and false negatives, the Mean Intersection over Union (Mean IoU) was employed as the primary evaluation metric; this choice ensures that the network's performance in detecting and evaluating pixel-level damage in post-earthquake structural components is evaluated uniformly across all experiments. By incorporating deep learning technology, this study not only offers an innovative solution for accurately identifying post-earthquake damage in civil engineering structures but also contributes to empirical research on automated detection and evaluation in structural health monitoring.
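A minimal sketch of the Mean IoU evaluation metric named above: per-class intersection over union derived from a pixel-level confusion matrix (rows = ground truth, columns = prediction). The 3x3 matrix here is invented for illustration, not taken from the paper.

```python
# Mean IoU over a square confusion matrix of pixel counts.
def mean_iou(cm):
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # ground-truth pixels of class c that were missed
        fp = sum(cm[r][c] for r in range(n)) - tp   # pixels wrongly assigned to class c
        ious.append(tp / (tp + fn + fp))
    return sum(ious) / n

cm = [[50, 5, 5], [10, 80, 10], [0, 10, 30]]
print(round(mean_iou(cm), 3))
```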

Research on depth information based object-tracking and stage size estimation for immersive audio panning (이머시브 오디오 패닝을 위한 깊이 정보 기반 객체 추적 및 무대 크기 예측에 관한 연구)

  • Kangeun Lee;Hongjun Park;Sungyoung Kim
    • The Journal of the Acoustical Society of Korea
    • /
    • v.43 no.5
    • /
    • pp.529-535
    • /
    • 2024
  • This paper presents our research on automatic audio panning for media content production. Previously, panning a sound source to follow a performer was done manually. With the advent of the immersive audio era, the need for an automatic audio panning system has grown, yet little substantial research has been conducted to date. We therefore propose a computer-vision-based human tracking and depth-feature processing system that derives a depth feature from 2-dimensional image coordinates and models a 3-dimensional view transformation for automatic audio panning, ensuring audiovisual congruence. The system also applies a stage-size estimation model that takes an image as input and infers stage width and depth in metres. Since the system estimates stage sizes and applies them directly in the view transformation, no additional depth-data training is required. To validate the proposed system, we conducted a pilot test with a Unity-based sample video. We expect the system to enable automated audio panning and to assist audio engineers.
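A hedged sketch of the general idea in the abstract: mapping a tracked performer's horizontal image position, together with an estimated stage width in metres, to an azimuth pan angle for a listener at a given distance. The mapping and all names (`x_norm`, `stage_width_m`, `listener_dist_m`) are assumptions for illustration, not the authors' actual view transformation.

```python
import math

def pan_angle_deg(x_norm, stage_width_m, listener_dist_m):
    """x_norm in [0, 1]: 0 = stage left, 1 = stage right, 0.5 = centre."""
    offset = (x_norm - 0.5) * stage_width_m           # lateral position in metres
    return math.degrees(math.atan2(offset, listener_dist_m))

# Performer at far stage right of a 10 m stage, listener 5 m away → 45° pan.
print(round(pan_angle_deg(1.0, 10.0, 5.0), 1))
```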

Deep Learning-Based Plant Health State Classification Using Image Data (영상 데이터를 이용한 딥러닝 기반 작물 건강 상태 분류 연구)

  • Ali Asgher Syed;Jaehawn Lee;Alvaro Fuentes;Sook Yoon;Dong Sun Park
    • Journal of Internet of Things and Convergence
    • /
    • v.10 no.4
    • /
    • pp.43-53
    • /
    • 2024
  • Tomatoes are rich in nutrients like lycopene, β-carotene, and vitamin C. However, they often suffer from biological and environmental stressors, resulting in significant yield losses. Traditional manual plant health assessments are error-prone and inefficient for large-scale production. To address this need, we collected a comprehensive dataset covering the entire life span of tomato plants, annotated across 5 health states from 1 to 5. Our study introduces an Attention-Enhanced DS-ResNet architecture with Channel-wise attention and Grouped convolution, refined with new training techniques. Our model achieved an overall accuracy of 80.2% using 5-fold cross-validation, showcasing its robustness in precisely classifying the health states of tomato plants.
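An illustrative sketch of channel-wise attention (squeeze-and-excitation style), one ingredient of the Attention-Enhanced DS-ResNet described above, in plain Python on a tiny channels × height × width tensor. The gating here uses a bare sigmoid on the pooled descriptor; in the real network the gate is produced by learned layers.

```python
import math

def channel_attention(x):
    # Squeeze: global average pool each channel to a single descriptor.
    desc = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excite: a sigmoid gate per channel (learned bottleneck layers omitted in this sketch).
    gates = [1 / (1 + math.exp(-d)) for d in desc]
    # Rescale: weight every value in a channel by that channel's gate.
    return [[[v * g for v in row] for row in ch] for ch, g in zip(x, gates)]

x = [[[1.0, 1.0], [1.0, 1.0]],   # channel 0, mean 1.0 → gate sigmoid(1)
     [[0.0, 0.0], [0.0, 0.0]]]   # channel 1, mean 0.0 → gate 0.5
y = channel_attention(x)
print(round(y[0][0][0], 4))
```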

Recognizing the Direction of Action using Generalized 4D Features (일반화된 4차원 특징을 이용한 행동 방향 인식)

  • Kim, Sun-Jung;Kim, Soo-Wan;Choi, Jin-Young
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.24 no.5
    • /
    • pp.518-528
    • /
    • 2014
  • In this paper, we propose a method to recognize the direction of human actions by developing 4D space-time (4D-ST, [x,y,z,t]) features. To this end, we propose 4D space-time interest points (4D-STIPs, [x,y,z,t]), extracted from 3D space (3D-S, [x,y,z]) volumes reconstructed from images of a finite number of different views. Since the proposed features are constructed from volumetric information, features for an arbitrary 2D space (2D-S, [x,y]) viewpoint can be generated by projecting the 3D-S volumes and 4D-STIPs onto the corresponding image planes in the training step. Because the training sets, which are projections of 3D-S volumes and 4D-STIPs onto various image planes, contain direction information, we can recognize the direction an actor faces in a test video. The recognition process has two steps: we first recognize the action class and then recognize the action direction using the direction information. For both action and direction recognition, we construct motion history images (MHIs) and non-motion history images (NMHIs) from the projected 3D-S volumes and 4D-STIPs; these encode the moving and non-moving parts of an action, respectively. For action recognition, features are trained with support vector data description (SVDD) per action class and recognized by support vector domain density description (SVDDD). For direction recognition, after the action is recognized, each action is trained with SVDD per direction class and then recognized by SVDDD. In experiments, we train the models on 3D-S volumes from the INRIA Xmas Motion Acquisition Sequences (IXMAS) dataset and evaluate action direction recognition on a new SNU dataset built for that purpose.
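A minimal sketch of the motion history image (MHI) idea used above: each pixel holds a value that is reset to a maximum wherever motion occurs and decays elsewhere, so recent motion appears brighter. The tiny binary motion masks and the decay rule (linear decay by 1 per frame, duration `tau`) are standard MHI conventions used here for illustration.

```python
# One MHI update step over 2x2 frames.
def update_mhi(mhi, motion_mask, tau=3):
    return [[tau if m else max(v - 1, 0)          # reset on motion, else decay toward 0
             for v, m in zip(row_v, row_m)]
            for row_v, row_m in zip(mhi, motion_mask)]

mhi = [[0, 0], [0, 0]]
mhi = update_mhi(mhi, [[1, 0], [0, 0]])   # motion at top-left
mhi = update_mhi(mhi, [[0, 1], [0, 0]])   # motion moves right; old trace decays
print(mhi)  # [[2, 3], [0, 0]]
```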

Comparison of Convolutional Neural Network (CNN) Models for Lettuce Leaf Width and Length Prediction (상추잎 너비와 길이 예측을 위한 합성곱 신경망 모델 비교)

  • Ji Su Song;Dong Suk Kim;Hyo Sung Kim;Eun Ji Jung;Hyun Jung Hwang;Jaesung Park
    • Journal of Bio-Environment Control
    • /
    • v.32 no.4
    • /
    • pp.434-441
    • /
    • 2023
  • Determining the size or area of a plant's leaves is an important factor in predicting plant growth and improving the productivity of indoor farms. In this study, we developed a convolutional neural network (CNN)-based model to accurately predict the length and width of lettuce leaves from photographs of the leaves. A callback function was applied to overcome data limitations and overfitting, and K-fold cross-validation was used to improve the generalization ability of the model. In addition, the ImageDataGenerator function was used to increase the diversity of the training data through augmentation. To compare model performance, we evaluated pre-trained models such as VGG16, Resnet152, and NASNetMobile. NASNetMobile showed the highest performance, particularly in width prediction, with an R-squared value of 0.9436 and an RMSE of 0.5659; for length prediction, the R-squared value was 0.9537 and the RMSE 0.8713. The optimized model adopted the NASNetMobile architecture, the RMSprop optimizer, the MSE loss function, and the ELU activation function. Training averaged 73 minutes per epoch, and the model took an average of 0.29 seconds to process a single lettuce leaf photo. The CNN-based model developed in this study to predict leaf length and width in indoor farms is expected to enable rapid and accurate assessment of plant growth status simply by taking images. It is also expected to contribute to increasing farm productivity and resource efficiency by enabling timely agricultural measures, such as adjusting the nutrient solution in real time.
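A minimal sketch of the two regression metrics reported above, R-squared and RMSE, computed in plain Python on toy predictions rather than real leaf measurements.

```python
# R² = 1 - SS_res/SS_tot; RMSE = sqrt(mean squared error).
def r2_rmse(y_true, y_pred):
    n = len(y_true)
    mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # total sum of squares
    return 1 - ss_res / ss_tot, (ss_res / n) ** 0.5

r2, rmse = r2_rmse([3.0, 5.0, 7.0], [2.5, 5.0, 7.5])
print(round(r2, 4), round(rmse, 4))
```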

Very short-term rainfall prediction based on radar image learning using deep neural network (심층신경망을 이용한 레이더 영상 학습 기반 초단시간 강우예측)

  • Yoon, Seongsim;Park, Heeseong;Shin, Hongjoon
    • Journal of Korea Water Resources Association
    • /
    • v.53 no.12
    • /
    • pp.1159-1172
    • /
    • 2020
  • This study applied deep convolutional neural networks based on U-Net and SegNet, trained on a long period of weather radar data, to very-short-term rainfall prediction, and compared the results with a translation model. For training and validation of the deep neural networks, Mt. Gwanak and Mt. Gwangdeoksan radar data from 2010 to 2016 were collected and converted to gray-scale image files in HDF5 format with 1-km spatial resolution. The deep neural network model was trained to predict precipitation 10 minutes ahead from four consecutive radar images, and forecasts were applied recursively to reach a lead time of 60 minutes with the pretrained model. To evaluate prediction performance, 24 rain cases in 2017 were forecast up to 60 minutes ahead. Evaluating the mean absolute error (MAE) and critical success index (CSI) at thresholds of 0.1, 1, and 5 mm/hr, the deep neural network model performed better in terms of MAE at the 0.1 and 1 mm/hr thresholds, and better than the translation model in terms of CSI up to a lead time of 50 minutes. In particular, although the deep neural network model generally outperformed the translation model for weak rainfall of 5 mm/hr or less, it had limitations in predicting distinct high-intensity precipitation at the 5 mm/hr threshold: as lead time lengthens, spatial smoothing increases, reducing the accuracy of the rainfall prediction. The translation model proved superior at predicting exceedance of higher intensity thresholds (> 5 mm/hr) because it preserves distinct precipitation features, but it tends to shift the rainfall position incorrectly. This study is expected to help improve radar rainfall prediction models using deep neural networks. In addition, the large weather radar dataset established in this study will be provided through open repositories for use in subsequent studies.
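A minimal sketch of the critical success index (CSI) used above to score rainfall forecasts at a threshold: CSI = hits / (hits + misses + false alarms), computed here over invented paired observed/forecast intensities in mm/hr.

```python
# CSI over paired observed/forecast rainfall intensities at a given threshold.
def csi(obs, fcst, threshold):
    hits = misses = false_alarms = 0
    for o, f in zip(obs, fcst):
        if o >= threshold and f >= threshold:
            hits += 1            # event observed and forecast
        elif o >= threshold:
            misses += 1          # event observed but not forecast
        elif f >= threshold:
            false_alarms += 1    # event forecast but not observed
    return hits / (hits + misses + false_alarms)

print(round(csi([0.0, 2.0, 6.0, 1.5], [0.5, 1.8, 5.5, 0.0], threshold=1.0), 3))
```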

A Control Method for designing Object Interactions in 3D Game (3차원 게임에서 객체들의 상호 작용을 디자인하기 위한 제어 기법)

  • 김기현;김상욱
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.9 no.3
    • /
    • pp.322-331
    • /
    • 2003
  • As the complexity of a 3D game increases with the many elements of its scenario, controlling the interrelations of the game objects becomes difficult. A game system therefore needs to coordinate the responses of the game objects and to control their animated behaviors in terms of the game scenario. To produce realistic game simulations, a system must include a structure for designing interactions among game objects. This paper presents a method for designing a dynamic control mechanism for the interaction of game objects in a game scenario. For this method, we suggest a game agent system as a framework based on intelligent agents that make decisions using specific rules. The game agent system is used to manage environment data, simulate the game objects, control interactions among them, and support a visual authoring interface that can define the various interrelations of the game objects. These techniques can handle the autonomy level of the game objects, the associated collision-avoidance method, and related behaviors, and they make coherent decision-making by the game objects about scene changes possible. We designed rule-based behavior control to guide the simulation of the game objects; the rules are pre-defined by the user through the visual interface for designing their interactions. The Agent State Decision Network, composed of visual elements, passes information and infers the current state of the game objects. All of these methods can monitor and check changes in motion state between game objects in real time. Finally, we present a validation of the control method together with a simple case-study example.
In this paper, we design and implement supervised classification systems for high-resolution satellite images. The systems support various interfaces and statistical data on training samples so that the most effective training data can be selected. In addition, new classification algorithms and satellite image formats can be added easily through the modularized systems. The classifiers take into account the characteristics of the spectral bands of the selected training data and provide various supervised classification algorithms, including Parallelepiped, Minimum distance, Mahalanobis distance, Maximum likelihood, and Fuzzy theory. We used IKONOS images as input and verified the systems for the classification of high-resolution satellite images.
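A minimal sketch of one of the supervised classifiers listed above, the minimum distance classifier: each pixel's spectral vector is assigned to the class whose training-sample mean it is closest to. The band values and class means here are invented for illustration.

```python
# Assign a pixel's spectral vector to the nearest class mean (squared Euclidean distance).
def min_distance_classify(pixel, class_means):
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(class_means, key=lambda c: dist2(pixel, class_means[c]))

class_means = {
    "water": [20, 30, 10],
    "vegetation": [40, 90, 50],
    "urban": [120, 110, 100],
}
print(min_distance_classify([45, 85, 55], class_means))  # vegetation
```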

A Comparative Study on the Effective Deep Learning for Fingerprint Recognition with Scar and Wrinkle (상처와 주름이 있는 지문 판별에 효율적인 심층 학습 비교연구)

  • Kim, JunSeob;Rim, BeanBonyka;Sung, Nak-Jun;Hong, Min
    • Journal of Internet Computing and Services
    • /
    • v.21 no.4
    • /
    • pp.17-23
    • /
    • 2020
  • Biometric information, measurements of human characteristics, has attracted great attention as a highly reliable security technology because it cannot be stolen or lost. Among biometric traits, fingerprints are mainly used in fields such as identity verification and identification. When a fingerprint image presents a problem, such as a wound, wrinkles, or moisture, that makes authentication difficult, a fingerprint expert can identify the problem directly in a preprocessing step and apply the image-processing algorithm appropriate to it. By implementing artificial-intelligence software that distinguishes fingerprint images with cuts and wrinkles, it becomes easy to check whether cuts or wrinkles are present, and the fingerprint image can then be improved by selecting an appropriate algorithm. In this study, we built a database of 17,080 fingerprints in total by acquiring all fingerprints of 1,010 students from the Royal University of Cambodia, 600 images from the Sokoto open dataset, and the fingerprints of 98 Korean students. Criteria were established to determine whether the collected fingerprints contained injuries or wrinkles, and the data were validated by experts. The training and test datasets consisted of the Cambodian and Sokoto data, split at a ratio of 8:2, and the data from the 98 Korean students served as the validation set. Using this dataset, five CNN-based architectures, Classic CNN, AlexNet, VGG-16, ResNet50, and YOLOv3, were implemented, and a study was conducted to find the model that performed best on the readings. Among the five architectures, ResNet50 showed the best performance, at 81.51%.
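A minimal sketch of the evaluation behind the figures above: plain classification accuracy of a two-class (scar/wrinkle present vs. clean fingerprint) model over labelled test images. The labels are invented for illustration and do not come from the paper's database.

```python
# Fraction of predictions matching the ground-truth labels.
def accuracy(y_true, y_pred):
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]   # 1 = scar/wrinkle present, 0 = clean
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]   # hypothetical model outputs
print(accuracy(y_true, y_pred))  # 0.8
```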