• Title/Summary/Keyword: Vision processing


Structure, Method, and Improved Performance Evaluation Function of SRCNN and VDSR (SRCNN과 VDSR의 구조와 방법 및 개선된 성능평가 함수)

  • Lee, Kwang-Chan;Wang, Guangxing;Shin, Seong-Yoon
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.4 / pp.543-548 / 2021
  • The higher the resolution of an image, the greater the satisfaction of its viewers, and super-resolution imaging has gained considerable research value within computer vision and image processing. This study uses deep learning super-resolution models to extract the main features of a low-resolution (LR) image, learns and reconstructs those features, and focuses on reconstruction-based algorithms that generate a high-resolution (HR) image. In this paper, we investigate SRCNN and VDSR as reconstruction-based super-resolution models. The structure and algorithmic process of the SRCNN and VDSR models are briefly introduced, the multi-channel and special forms of the improved performance evaluation function are examined, and the performance of each algorithm is assessed through experiments. In the experiments, the results of the SRCNN and VDSR models were compared using peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) so that the results can be judged easily.
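
A minimal sketch of the evaluation step described above is shown below: it compares a reconstructed image against its high-resolution reference using PSNR and SSIM. The file names and the use of OpenCV and scikit-image are assumptions for illustration, not details from the paper.

```python
# Minimal sketch: comparing a super-resolved output (e.g., from SRCNN or VDSR)
# against the ground-truth HR image with PSNR and SSIM.
# File names are placeholders; OpenCV and scikit-image are assumed available.
import cv2
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

hr = cv2.imread("ground_truth_hr.png", cv2.IMREAD_GRAYSCALE)   # reference HR image
sr = cv2.imread("model_output_sr.png", cv2.IMREAD_GRAYSCALE)   # reconstructed image

psnr = peak_signal_noise_ratio(hr, sr, data_range=255)
ssim = structural_similarity(hr, sr, data_range=255)
print(f"PSNR: {psnr:.2f} dB, SSIM: {ssim:.4f}")
```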

Defect Diagnosis and Classification of Machine Parts Based on Deep Learning

  • Kim, Hyun-Tae;Lee, Sang-Hyeop;Wesonga, Sheilla;Park, Jang-Sik
    • Journal of the Korean Society of Industry Convergence / v.25 no.2_1 / pp.177-184 / 2022
  • Automatic defect sorting of machine parts is being introduced as part of manufacturing-process automation. In the final stage of this automation, computer vision rather than human visual judgment must be applied to determine whether a part is defective. In this paper, we introduce deep learning methods, supported by experimental results, to improve classification performance for typical machine parts such as welded parts, galvanized round plugs, and electro-galvanized nuts. For defective welds, increasing the layer depth of the basic deep learning model was effective. For the round plug, data surrounding the defect target area degraded the result, which could be resolved through an appropriate pre-processing technique. Finally, for the zinc-plated nut, data are acquired from multiple cameras because of its three-dimensional structure, so the result is strongly affected by lighting and by the background image. To solve this problem, methods such as two-dimensional connectivity were applied in the object-segmentation pre-processing step. Although the experiments suggest that the proposed methods are effective, most of the provided good/defective image data sets are relatively small, which may cause a learning balance problem for the deep learning model, so we plan to secure more data in the future.
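
The abstract mentions applying two-dimensional connectivity during object-segmentation pre-processing; the sketch below illustrates that general idea with OpenCV by keeping the largest connected component of a thresholded part image. The file name and thresholding choice are assumptions, not values from the paper.

```python
# Minimal sketch of 2-D connectivity pre-processing: keep the largest connected
# component of a binarized part image and suppress background clutter.
# The file name and Otsu thresholding are placeholders, not the paper's settings.
import cv2
import numpy as np

gray = cv2.imread("nut_camera_view.png", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# 8-connected component labeling
num_labels, labels, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)

# Label 0 is the background; pick the largest remaining component as the part region.
largest = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])
mask = np.where(labels == largest, 255, 0).astype(np.uint8)

part_only = cv2.bitwise_and(gray, gray, mask=mask)  # part with background removed
cv2.imwrite("part_only.png", part_only)
```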

Manipulator with Camera for Mobile Robots (모바일 로봇을 위한 카메라 탑재 매니퓰레이터)

  • Lee Jun-Woo;Choe, Kyoung-Geun;Cho, Hun-Hee;Jeong, Seong-Kyun;Bong, Jae-Hwan
    • The Journal of the Korea institute of electronic communication sciences / v.17 no.3 / pp.507-514 / 2022
  • Mobile manipulators are in the limelight in the field of home automation because of their combined mobility and manipulation capabilities. In this paper, we developed a small manipulator system that can be mounted on a mobile robot as a preliminary study toward a mobile manipulator. The developed manipulator has four degrees of freedom. At the end-effector there are a camera and a gripper to recognize and manipulate objects. One of the four degrees of freedom is a linear vertical motion for better interaction with human hands, which are typically located higher than the mobile manipulator. The manipulator was designed with the four actuators placed close to the base to reduce its rotational inertia, which improves manipulation stability and reduces the risk of rollover. The developed manipulator repeatedly performed a pick-and-place task and successfully manipulated objects within its workspace.
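
The abstract does not give kinematic parameters, but as a purely illustrative sketch, a 4-DOF arm with one vertical prismatic joint and three revolute joints can be modeled with homogeneous transforms as below. The joint ordering and link lengths are assumptions, not the paper's actual design.

```python
# Illustrative sketch only: forward kinematics of a 4-DOF arm with one vertical
# prismatic joint (d) and three revolute joints (q1..q3). Joint layout and link
# lengths are assumptions, not taken from the paper.
import numpy as np

def rot_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])

def trans(x, y, z):
    T = np.eye(4)
    T[:3, 3] = [x, y, z]
    return T

def forward_kinematics(d, q1, q2, q3, l1=0.10, l2=0.12, l3=0.08):
    """Return the 4x4 pose of the end-effector (camera/gripper mount)."""
    T = trans(0, 0, d)                        # vertical prismatic joint
    T = T @ rot_z(q1) @ trans(l1, 0, 0)       # revolute joint 1 + link 1
    T = T @ rot_z(q2) @ trans(l2, 0, 0)       # revolute joint 2 + link 2
    T = T @ rot_z(q3) @ trans(l3, 0, 0)       # revolute joint 3 + link 3
    return T

pose = forward_kinematics(d=0.05, q1=0.3, q2=-0.4, q3=0.2)
print("End-effector position:", pose[:3, 3])
```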

A Study on Design and Interpretation of Pattern Laser Coordinate Tracking Method for Curved Screen Using Multiple Cameras (다중카메라를 이용한 곡면 스크린의 패턴 레이저 좌표 추적 방법 설계와 해석 연구)

  • Jo, Jinpyo;Kim, Jeongho;Jeong, Yongbae
    • Journal of Platform Technology / v.9 no.4 / pp.60-70 / 2021
  • This paper proposes a method that can stably track the coordinates of a patterned laser image in a curved-screen shooting system using two or more multi-camera channels. The method can track and acquire target points very effectively when applied to a multi-screen shooting setup that can replace the HMD-based shooting method. Images of severely deformed curved screens obtained from the individual cameras are corrected through image normalization, image binarization, and noise removal. The corrected image is converted into a Euclidean space map, built from matching points, in which the firing point is easy to track. In the experiments, the image coordinates of the pattern laser were stably extracted in the curved-screen shooting system, and the error between the real-world coordinate position of the target point and the broadband Euclidean map was minimized. The reliability of the proposed method was confirmed through these experiments.
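
A minimal OpenCV sketch of the per-camera correction steps named above (normalization, binarization, noise removal) is given below; the kernel size, threshold, and file name are assumptions, and the Euclidean map construction itself is not shown.

```python
# Minimal sketch of the correction steps named above for one camera channel:
# normalize, binarize, and denoise a pattern-laser frame, then estimate the
# laser image coordinate. Threshold and kernel size are placeholders.
import cv2

frame = cv2.imread("curved_screen_cam0.png", cv2.IMREAD_GRAYSCALE)

# 1) Normalize intensity to the full 0-255 range.
norm = cv2.normalize(frame, None, 0, 255, cv2.NORM_MINMAX)

# 2) Binarize to isolate the bright laser pattern.
_, binary = cv2.threshold(norm, 200, 255, cv2.THRESH_BINARY)

# 3) Remove small noise with a morphological opening.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
clean = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)

# Estimate the laser spot coordinate as the centroid of the remaining pixels.
m = cv2.moments(clean)
if m["m00"] > 0:
    cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]
    print(f"Laser image coordinate: ({cx:.1f}, {cy:.1f})")
```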

Assessment and Comparison of Three Dimensional Exoscopes for Near-Infrared Fluorescence-Guided Surgery Using Second-Window Indocyanine-Green

  • Cho, Steve S.;Teng, Clare W.;Ravin, Emma De;Singh, Yash B.;Lee, John Y.K.
    • Journal of Korean Neurosurgical Society / v.65 no.4 / pp.572-581 / 2022
  • Objective : Compared to microscopes, exoscopes have advantages in field-depth, ergonomics, and educational value. Exoscopes are especially well-poised for adaptation to fluorescence-guided surgery (FGS) due to their excitation source, light path, and image-processing capabilities. We evaluated the feasibility of near-infrared FGS using a 3-dimensional (3D), 4K exoscope with near-infrared fluorescence imaging capability. We then compared it to the most sensitive, commercially available near-infrared exoscope system (3D, 960p). In-vitro and intraoperative comparisons were performed. Methods : Serial dilutions of indocyanine-green (1-2000 ㎍/mL) were imaged with the 3D, 4K Olympus Orbeye (system 1) and the 3D, 960p VisionSense Iridium (system 2). Near-infrared sensitivity was calculated using signal-to-background ratios (SBRs). In addition, three patients with brain tumors were administered indocyanine-green and imaged with system 1, with two also imaged with system 2 for comparison. Results : Systems 1 and 2 detected near-infrared fluorescence from indocyanine-green concentrations of >250 ㎍/L and >31.3 ㎍/L, respectively. Intraoperatively, system 1 visualized strong near-infrared fluorescence from two strongly gadolinium-enhancing meningiomas (SBR=2.4, 1.7). The high-resolution, bright images were sufficient for the surgeon to appreciate the underlying anatomy in the near-infrared mode. However, system 1 was not able to visualize fluorescence from a weakly enhancing intraparenchymal metastasis. In contrast, system 2 successfully visualized both the meningioma and the metastasis but lacked high-resolution stereopsis. Conclusion : Three-dimensional exoscope systems provide an alternative visualization platform for both standard microsurgery and near-infrared fluorescence-guided surgery. However, when tumor fluorescence is weak (i.e., low fluorophore uptake, deep tumors), highly sensitive near-infrared visualization systems may be required.
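
The in-vitro comparison above rests on the signal-to-background ratio, which is a ratio of mean fluorescence intensities; a minimal sketch of that computation on a near-infrared frame follows, with the file names and region-of-interest masks as assumptions.

```python
# Minimal sketch of a signal-to-background ratio (SBR) computation on a
# near-infrared fluorescence frame. File names and ROI masks are placeholders.
import cv2
import numpy as np

nir = cv2.imread("nir_frame.png", cv2.IMREAD_GRAYSCALE).astype(np.float64)
signal_mask = cv2.imread("tumor_roi_mask.png", cv2.IMREAD_GRAYSCALE) > 0
background_mask = cv2.imread("background_roi_mask.png", cv2.IMREAD_GRAYSCALE) > 0

# SBR = mean intensity inside the signal ROI / mean intensity of the background ROI
sbr = nir[signal_mask].mean() / nir[background_mask].mean()
print(f"SBR = {sbr:.2f}")
```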

Development of an intelligent camera for multiple body temperature detection (다중 체온 감지용 지능형 카메라 개발)

  • Lee, Su-In;Kim, Yun-Su;Seok, Jong-Won
    • Journal of IKEEE / v.26 no.3 / pp.430-436 / 2022
  • In this paper, we propose an intelligent camera for multiple body-temperature detection. The proposed camera combines an optical sensor (4056×3040) and a thermal sensor (640×480) and detects abnormal symptoms by analyzing a person's facial expression and body temperature in the acquired images. The optical and thermal cameras operate simultaneously; a person is detected in the optical image, and the facial region and expression are analyzed from that detection. The coordinates of the facial region in the optical image are then mapped onto the thermal image, where the maximum temperature of the region is measured and displayed on the screen. Abnormal symptoms are determined from the three analyzed facial expressions (neutral, happy, sad) and the measured body temperature. To evaluate the performance of the proposed camera, the optical image-processing part was tested on the Caltech, WIDER FACE, and CK+ datasets for three algorithms (object detection, facial-region detection, and expression analysis), achieving accuracies of 91%, 91%, and 84%, respectively.
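
A minimal sketch of the coordinate hand-off described above (scaling a face box detected in the optical image onto the thermal image and reading the maximum temperature) is shown below. The resolution constants come from the abstract, while the detected box, the synthetic thermal frame, and the raw-to-Celsius conversion are assumptions.

```python
# Minimal sketch: map a face bounding box from the optical image (4056x3040)
# onto the thermal image (640x480) and read the maximum temperature in the region.
# The face box, the synthetic thermal data, and the centi-Kelvin conversion are placeholders.
import numpy as np

OPT_W, OPT_H = 4056, 3040
THM_W, THM_H = 640, 480

def optical_box_to_thermal(box):
    """Scale an (x1, y1, x2, y2) box from optical to thermal coordinates."""
    sx, sy = THM_W / OPT_W, THM_H / OPT_H
    x1, y1, x2, y2 = box
    return int(x1 * sx), int(y1 * sy), int(x2 * sx), int(y2 * sy)

face_box = (1800, 1200, 2300, 1800)                              # placeholder detection
thermal_raw = np.random.randint(30850, 31150, size=(THM_H, THM_W))  # placeholder sensor counts

tx1, ty1, tx2, ty2 = optical_box_to_thermal(face_box)
roi = thermal_raw[ty1:ty2, tx1:tx2]

# Assumed centi-Kelvin raw format: temperature [C] = counts / 100 - 273.15
max_temp_c = roi.max() / 100.0 - 273.15
print(f"Max face temperature: {max_temp_c:.1f} C")
```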

Single Shot Detector for Detecting Clickable Object in Mobile Device Screen (모바일 디바이스 화면의 클릭 가능한 객체 탐지를 위한 싱글 샷 디텍터)

  • Jo, Min-Seok;Chun, Hye-won;Han, Seong-Soo;Jeong, Chang-Sung
    • KIPS Transactions on Software and Data Engineering / v.11 no.1 / pp.29-34 / 2022
  • We propose a novel network architecture and build a dataset for recognizing clickable objects on mobile device screens. The data were collected from clickable objects on mobile screens of various resolutions, and a total of 24,937 annotations were subdivided into seven categories: text, edit text, image, button, region, status bar, and navigation bar. We use the Deconvolutional Single Shot Detector as a baseline, with a backbone network containing Squeeze-and-Excitation blocks, the Single Shot Detector layer structure for deriving inference results, and a Feature Pyramid Network structure. We also extract features efficiently by changing the network input from the existing 1:1 aspect ratio to a 1:2 ratio similar to a mobile device screen. In experiments on the dataset we built, the mean average precision improved by up to 101% compared with the baseline.
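
As background for the Squeeze-and-Excitation blocks mentioned above, a standard SE block in PyTorch is sketched below; this is the generic published block, not the authors' exact backbone, and the non-square example feature map only echoes the 1:2 input ratio idea.

```python
# Generic Squeeze-and-Excitation block (channel attention) as commonly added to
# a backbone. Standard formulation, not the paper's exact configuration.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)            # squeeze: global spatial average
        self.fc = nn.Sequential(                       # excitation: per-channel gating
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                   # reweight feature maps per channel

# Example with a non-square feature map (placeholder size, not the paper's input).
feat = torch.randn(1, 64, 64, 32)
print(SEBlock(64)(feat).shape)
```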

CNN-based Online Sign Language Translation Counseling System (CNN기반의 온라인 수어통역 상담 시스템에 관한 연구)

  • Park, Won-Cheol;Park, Koo-Rack
    • Journal of Convergence for Information Technology / v.11 no.5 / pp.17-22 / 2021
  • It is difficult for the hearing impaired to use counseling services without sign-language interpretation, and because of the shortage of sign-language interpreters, connecting to one takes a long time or is often not possible at all. In this paper, we therefore propose a system that captures sign language as video using OpenCV, recognizes the signing motion with a CNN (Convolutional Neural Network), converts the meaning of the signs into text, and provides the text to users. A counselor can then conduct counseling by reading the stored sign-language translation. Consultation becomes possible without a professional sign-language interpreter, reducing the burden of waiting for one. If the proposed system is applied to counseling services for the hearing impaired, it is expected to improve the effectiveness of counseling and to promote academic research on counseling for the hearing impaired.
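
A minimal sketch of the capture-and-classify loop described above is shown below: OpenCV reads camera frames and a placeholder CNN maps each frame to a sign label that is accumulated as text. The model file, input size, and label list are assumptions.

```python
# Minimal sketch of the pipeline described above: capture frames with OpenCV,
# classify each frame with a CNN, and accumulate the recognized signs as text.
# The TorchScript model file, input size, and vocabulary are placeholders.
import cv2
import numpy as np
import torch

labels = ["hello", "thanks", "help"]          # placeholder sign vocabulary
model = torch.jit.load("sign_cnn.pt").eval()  # placeholder trained CNN

cap = cv2.VideoCapture(0)
recognized = []
while len(recognized) < 10:                   # stop after 10 recognized signs
    ok, frame = cap.read()
    if not ok:
        break
    x = cv2.resize(frame, (224, 224)).astype(np.float32) / 255.0
    x = torch.from_numpy(x).permute(2, 0, 1).unsqueeze(0)   # NCHW tensor
    with torch.no_grad():
        pred = model(x).argmax(dim=1).item()
    recognized.append(labels[pred])
cap.release()

print("Translated text for the counselor:", " ".join(recognized))
```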

2-Stage Detection and Classification Network for Kiosk User Analysis (디스플레이형 자판기 사용자 분석을 위한 이중 단계 검출 및 분류 망)

  • Seo, Ji-Won;Kim, Mi-Kyung
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.5 / pp.668-674 / 2022
  • Machine learning techniques using visual data are highly usable in industry and service fields such as scene recognition, fault detection, security, and user analysis. Among these, user analysis from CCTV video is one of the most practical uses of vision data, and many studies on lightweight artificial neural networks have been published to improve usability in mobile and embedded environments. In this study, we propose a network that combines object detection and classification for a mobile graphics processing unit. The network detects pedestrians and faces and classifies age and gender from the detected faces. The proposed network is built on MobileNet, YOLOv2, and skip connections. The detection and classification models are trained individually and combined into a 2-stage structure, and an attention mechanism is used to improve detection and classification performance. An Nvidia Jetson Nano is used to run and evaluate the proposed system.
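
A minimal sketch of the 2-stage structure described above is given below: a first-stage detector produces face boxes and a second-stage classifier predicts age group and gender for each crop. Both models, their file names, and their output formats are placeholders standing in for the paper's MobileNet/YOLOv2-based networks.

```python
# Minimal sketch of a 2-stage pipeline: stage 1 detects faces, stage 2 classifies
# age group and gender from each detected crop. Models and I/O formats are placeholders.
import cv2
import torch

detector = torch.jit.load("face_detector.pt").eval()      # placeholder stage-1 model
classifier = torch.jit.load("age_gender_net.pt").eval()   # placeholder stage-2 model

frame = cv2.imread("kiosk_frame.png")

def to_tensor(img, size):
    x = cv2.resize(img, size).astype("float32") / 255.0
    return torch.from_numpy(x).permute(2, 0, 1).unsqueeze(0)

with torch.no_grad():
    # Assumed output: an (N, 4) tensor of (x1, y1, x2, y2) boxes in frame pixels.
    boxes = detector(to_tensor(frame, (416, 416)))
    for x1, y1, x2, y2 in boxes.int().tolist():
        crop = frame[y1:y2, x1:x2]
        age_logits, gender_logits = classifier(to_tensor(crop, (96, 96)))
        print("age group:", age_logits.argmax(dim=1).item(),
              "gender:", gender_logits.argmax(dim=1).item())
```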

Twin models for high-resolution visual inspections

  • Seyedomid Sajedi;Kareem A. Eltouny;Xiao Liang
    • Smart Structures and Systems / v.31 no.4 / pp.351-363 / 2023
  • Visual structural inspections are an inseparable part of post-earthquake damage assessments. With unmanned aerial vehicles (UAVs) establishing a new frontier in visual inspections, there are major computational challenges in processing the massive amounts of collected high-resolution visual data. We propose twin deep learning models that can efficiently provide accurate high-resolution structural-component and damage segmentation masks. The traditional approaches to coping with high memory demands are to either uniformly downsample the raw images, at the price of losing fine local details, or to crop smaller parts of the images, leading to a loss of global contextual information. Therefore, our twin models, comprising the Trainable Resizing for high-resolution Segmentation Network (TRS-Net) and DmgFormer, approach the global and local semantics from different perspectives. TRS-Net is a compound, high-resolution segmentation architecture equipped with learnable downsampler and upsampler modules to minimize information loss for optimal performance and efficiency. DmgFormer utilizes a transformer backbone and a convolutional decoder head with skip connections on a grid of crops, aiming for high-precision learning without downsizing. An augmented inference technique is used to boost performance further and reduce the possible loss of context due to grid cropping. Comprehensive experiments have been performed on the 3D physics-based graphics model (PBGM) synthetic environments in the QuakeCity dataset. The proposed framework is evaluated using several metrics on three segmentation tasks: component type, component damage state, and global damage (crack, rebar, spalling). The models were developed as part of the 2nd International Competition for Structural Health Monitoring.
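
As a rough illustration of the grid-of-crops strategy mentioned above, the sketch below tiles a high-resolution image, runs a placeholder segmentation model on each tile, and stitches the per-tile masks back into a full-resolution mask. The tile size and the stub model are assumptions, and the augmented-inference step is omitted.

```python
# Illustrative sketch of grid cropping for high-resolution segmentation:
# tile the image, segment each tile, and stitch the predicted masks back together.
# The tile size and the stub per-tile model are placeholders.
import numpy as np

def segment_tile(tile: np.ndarray) -> np.ndarray:
    """Placeholder per-tile model: returns a class-id mask with the tile's H x W."""
    return np.zeros(tile.shape[:2], dtype=np.uint8)

def segment_by_grid(image: np.ndarray, tile: int = 512) -> np.ndarray:
    h, w = image.shape[:2]
    full_mask = np.zeros((h, w), dtype=np.uint8)
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            patch = image[y:y + tile, x:x + tile]
            full_mask[y:y + patch.shape[0], x:x + patch.shape[1]] = segment_tile(patch)
    return full_mask

image = np.zeros((1080, 1920, 3), dtype=np.uint8)   # stand-in for a UAV frame
mask = segment_by_grid(image)
print(mask.shape)
```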