• Title/Summary/Keyword: Model keypoint

Search Result 23, Processing Time 0.025 seconds

Keypoint-based Deep Learning Approach for Building Footprint Extraction Using Aerial Images

  • Jeong, Doyoung;Kim, Yongil
    • Korean Journal of Remote Sensing
    • /
    • v.37 no.1
    • /
    • pp.111-122
    • /
    • 2021
  • Building footprint extraction is an active topic in the domain of remote sensing, since buildings are a fundamental unit of urban areas. Deep convolutional neural networks successfully perform footprint extraction from optical satellite images. However, semantic segmentation produces coarse results in the output, such as blurred and rounded boundaries, which are caused by the use of convolutional layers with large receptive fields and pooling layers. The objective of this study is to generate visually enhanced building objects by directly extracting the vertices of individual buildings by combining instance segmentation and keypoint detection. The target keypoints in building extraction are defined as points of interest based on the local image gradient direction, that is, the vertices of a building polygon. The proposed framework follows a two-stage, top-down approach that is divided into object detection and keypoint estimation. Keypoints between instances are distinguished by merging the rough segmentation masks and the local features of regions of interest. A building polygon is created by grouping the predicted keypoints through a simple geometric method. Our model achieved an F1-score of 0.650 with an mIoU of 62.6 for building footprint extraction using the OpenCitesAI dataset. The results demonstrated that the proposed framework using keypoint estimation exhibited better segmentation performance when compared with Mask R-CNN in terms of both qualitative and quantitative results.

A comparative study on keypoint detection for developmental dysplasia of hip diagnosis using deep learning models in X-ray and ultrasound images (X-ray 및 초음파 영상을 활용한 고관절 이형성증 진단을 위한 특징점 검출 딥러닝 모델 비교 연구)

  • Sung-Hyun Kim;Kyungsu Lee;Si-Wook Lee;Jin Ho Chang;Jae Youn Hwang;Jihun Kim
    • The Journal of the Acoustical Society of Korea
    • /
    • v.42 no.5
    • /
    • pp.460-468
    • /
    • 2023
  • Developmental Dysplasia of the Hip (DDH) is a pathological condition commonly occurring during the growth phase of infants. It acts as one of the factors that can disrupt an infant's growth and trigger potential complications. Therefore, it is critically important to detect and treat this condition early. The traditional diagnostic methods for DDH involve palpation techniques and diagnosis methods based on the detection of keypoints in the hip joint using X-ray or ultrasound imaging. However, there exist limitations in objectivity and productivity during keypoint detection in the hip joint. This study proposes a deep learning model-based keypoint detection method using X-ray and ultrasound imaging and analyzes the performance of keypoint detection using various deep learning models. Additionally, the study introduces and evaluates various data augmentation techniques to compensate the lack of medical data. This research demonstrated the highest keypoint detection performance when applying the residual network 152 (ResNet152) model with simple & complex augmentation techniques, with average Object Keypoint Similarity (OKS) of approximately 95.33 % and 81.21 % in X-ray and ultrasound images, respectively. These results demonstrate that the application of deep learning models to ultrasound and X-ray images to detect the keypoints in the hip joint could enhance the objectivity and productivity in DDH diagnosis.

A Method for Body Keypoint Localization based on Object Detection using the RGB-D information (RGB-D 정보를 이용한 객체 탐지 기반의 신체 키포인트 검출 방법)

  • Park, Seohee;Chun, Junchul
    • Journal of Internet Computing and Services
    • /
    • v.18 no.6
    • /
    • pp.85-92
    • /
    • 2017
  • Recently, in the field of video surveillance, a Deep Learning based learning method has been applied to a method of detecting a moving person in a video and analyzing the behavior of a detected person. The human activity recognition, which is one of the fields this intelligent image analysis technology, detects the object and goes through the process of detecting the body keypoint to recognize the behavior of the detected object. In this paper, we propose a method for Body Keypoint Localization based on Object Detection using RGB-D information. First, the moving object is segmented and detected from the background using color information and depth information generated by the two cameras. The input image generated by rescaling the detected object region using RGB-D information is applied to Convolutional Pose Machines for one person's pose estimation. CPM are used to generate Belief Maps for 14 body parts per person and to detect body keypoints based on Belief Maps. This method provides an accurate region for objects to detect keypoints an can be extended from single Body Keypoint Localization to multiple Body Keypoint Localization through the integration of individual Body Keypoint Localization. In the future, it is possible to generate a model for human pose estimation using the detected keypoints and contribute to the field of human activity recognition.

Implementation of a Deep Learning-based Keypoint Detection Model for Industrial Shape Quality Inspection Vision (산업용 형상 품질 검사 비전을 위한 딥러닝 기반 형상 키포인트 검출 모델 구현)

  • Sukchoo Kim;JoongJang Kwan
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2023.07a
    • /
    • pp.37-38
    • /
    • 2023
  • 본 논문에서는 딥러닝을 기반으로 하는 키포인트 인식 모델을 산업용 품질검사 머신비전에 응용하는 방법을 제안한다. 전이학습 방법을 이용하여 딥러닝 모델의 인식률을 높이는 방법을 제시하였고, 전이시킨 특성 추출 모델에 대해 추가로 데이터 세트에 대한 학습을 진행하는 것이 특성추출 모델의 초기 ImageNet 가중치를 동결시켜 학습하는 것보다 학습 속도나 정확도가 높다는 것을 보여준다. 실험을 통해 딥러닝을 응용하는 산업용 품질 검사 공정에는 특성추출 모델의 추가 학습이 중요하다는 점을 확인할 수 있었다.

  • PDF

Fixed-Point Modeling and Performance Analysis of a SIFT Keypoints Localization Algorithm for SoC Hardware Design (SoC 하드웨어 설계를 위한 SIFT 특징점 위치 결정 알고리즘의 고정 소수점 모델링 및 성능 분석)

  • Park, Chan-Ill;Lee, Su-Hyun;Jeong, Yong-Jin
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.45 no.6
    • /
    • pp.49-59
    • /
    • 2008
  • SIFT(Scale Invariant Feature Transform) is an algorithm to extract vectors at pixels around keypoints, in which the pixel colors are very different from neighbors, such as vortices and edges of an object. The SIFT algorithm is being actively researched for various image processing applications including 3-D image constructions, and its most computation-intensive stage is a keypoint localization. In this paper, we develope a fixed-point model of the keypoint localization and propose its efficient hardware architecture for embedded applications. The bit-length of key variables are determined based on two performance measures: localization accuracy and error rate. Comparing with the original algorithm (implemented in Matlab), the accuracy and error rate of the proposed fixed point model are 93.57% and 2.72% respectively. In addition, we found that most of missing keypoints appeared at the edges of an object which are not very important in the case of keypoints matching. We estimate that the hardware implementation will give processing speed of $10{\sim}15\;frame/sec$, while its fixed point implementation on Pentium Core2Duo (2.13 GHz) and ARM9 (400 MHz) takes 10 seconds and one hour each to process a frame.

Fall Detection Based on 2-Stacked Bi-LSTM and Human-Skeleton Keypoints of RGBD Camera (RGBD 카메라 기반의 Human-Skeleton Keypoints와 2-Stacked Bi-LSTM 모델을 이용한 낙상 탐지)

  • Shin, Byung Geun;Kim, Uung Ho;Lee, Sang Woo;Yang, Jae Young;Kim, Wongyum
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.10 no.11
    • /
    • pp.491-500
    • /
    • 2021
  • In this study, we propose a method for detecting fall behavior using MS Kinect v2 RGBD Camera-based Human-Skeleton Keypoints and a 2-Stacked Bi-LSTM model. In previous studies, skeletal information was extracted from RGB images using a deep learning model such as OpenPose, and then recognition was performed using a recurrent neural network model such as LSTM and GRU. The proposed method receives skeletal information directly from the camera, extracts 2 time-series features of acceleration and distance, and then recognizes the fall behavior using the 2-Stacked Bi-LSTM model. The central joint was obtained for the major skeletons such as the shoulder, spine, and pelvis, and the movement acceleration and distance from the floor were proposed as features of the central joint. The extracted features were compared with models such as Stacked LSTM and Bi-LSTM, and improved detection performance compared to existing studies such as GRU and LSTM was demonstrated through experiments.

Fast and Accurate Visual Place Recognition Using Street-View Images

  • Lee, Keundong;Lee, Seungjae;Jung, Won Jo;Kim, Kee Tae
    • ETRI Journal
    • /
    • v.39 no.1
    • /
    • pp.97-107
    • /
    • 2017
  • A fast and accurate building-level visual place recognition method built on an image-retrieval scheme using street-view images is proposed. Reference images generated from street-view images usually depict multiple buildings and confusing regions, such as roads, sky, and vehicles, which degrades retrieval accuracy and causes matching ambiguity. The proposed practical database refinement method uses informative reference image and keypoint selection. For database refinement, the method uses a spatial layout of the buildings in the reference image, specifically a building-identification mask image, which is obtained from a prebuilt three-dimensional model of the site. A global-positioning-system-aware retrieval structure is incorporated in it. To evaluate the method, we constructed a dataset over an area of $0.26km^2$. It was comprised of 38,700 reference images and corresponding building-identification mask images. The proposed method removed 25% of the database images using informative reference image selection. It achieved 85.6% recall of the top five candidates in 1.25 s of full processing. The method thus achieved high accuracy at a low computational complexity.

CNN3D-Based Bus Passenger Prediction Model Using Skeleton Keypoints (Skeleton Keypoints를 활용한 CNN3D 기반의 버스 승객 승하차 예측모델)

  • Jang, Jin;Kim, Soo Hyung
    • Smart Media Journal
    • /
    • v.11 no.3
    • /
    • pp.90-101
    • /
    • 2022
  • Buses are a popular means of transportation. As such, thorough preparation is needed for passenger safety management. However, the safety system is insufficient because there are accidents such as a death accident occurred when the bus departed without recognizing the elderly approaching to get on in 2018. There is a safety system that prevents pinching accidents through sensors on the back door stairs, but such a system does not prevent accidents that occur in the process of getting on and off like the above accident. If it is possible to predict the intention of bus passengers to get on and off, it will help to develop a safety system to prevent such accidents. However, studies predicting the intention of passengers to get on and off are insufficient. Therefore, in this paper, we propose a 1×1 CNN3D-based getting on and off intention prediction model using skeleton keypoints of passengers extracted from the camera image attached to the bus through UDP-Pose. The proposed model shows approximately 1~2% higher accuracy than the RNN and LSTM models in predicting passenger's getting on and off intentions.

도로 위 숫자 및 기호 인식을 위한 광각렌즈 기반 Camera Calibration 연구

  • Gang, Jin-Gyu;Hong, Hyeong-Gil;Hoang, Toan Minh;Vokhidov, Husan;Park, Gang-Ryeong;Jo, Hyeong-O
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2015.10a
    • /
    • pp.1406-1407
    • /
    • 2015
  • 본 논문에서는 도로 위 숫자 및 기호인식에 적합한 Calibration Model에 대하여 연구하였다. 기존에 제시된 Geometric Transform, Fisheye Projection, Caltech Toolbox 기반 방법으로 얻은 Calibration Model의 성능을 비교하였다. Geometric Transform은 Fisheye Distortion Correction에 부적합한 결과를 얻었고, Fisheye Projection은 성능은 좋으나 시스템에 사용할 Camera Lens의 Specification을 모르기 때문에 이를 예측해야 하는 단점이 있다. 마지막으로 Caltech Tool box 기반 방법은 Calibration을 위한 Keypoint를 수동으로 지정하다 보니까 이로 인한 오차가 존재하게 된다. Calibration을 시도 할 때마다 결과에 차이가 있었으며, Calibration 결과의 측면에서 Fisheye Projection이 가시적으로 가장 좋은 결과를 나타냈다.

Post Sender Recognition using SIFT (SIFT를 이용한 우편영상의 송신자 인식)

  • Kim, Young-Won;Jang, Seung-Ick;Lee, Sung-Jun
    • The Journal of the Korea Contents Association
    • /
    • v.10 no.11
    • /
    • pp.48-57
    • /
    • 2010
  • Previous post sender recognition study was focused on recognizing the address of receiver. Relatively, there was lack of study to recognize the information of sender's address. Post sender recognition study is necessary for the service and application using sender information such as returning. This paper did the experiment and suggested how to recognize post sender using SIFT. Although SIFT shows great recognition rate, SIFT had problems with time and mis-recognition. One is increased time to match keypoints in proportion as the number of registered model. The other is mis-recognition of many similar keypoints even though they are all different models due to the nature of post sender. To solve the problem, this paper suggested SIFT adding distance function and did the experiment to compare time and function. In addition, it is suggested how to register and classify models automatically without the manual process of registering models.