• Title/Abstract/Keywords: Image processing

Grasping a Target Object in Clutter with an Anthropomorphic Robot Hand via RGB-D Vision Intelligence, Target Path Planning and Deep Reinforcement Learning

  • Ryu, Ga Hyeon;Oh, Ji-Heon;Jeong, Jin Gyun;Jung, Hwanseok;Lee, Jin Hyuk;Lopez, Patricio Rivera;Kim, Tae-Seong
    • KIPS Transactions on Software and Data Engineering, Vol. 11, No. 9, pp. 363-370, 2022
  • Grasping a target object among cluttered objects without collision requires machine intelligence, which includes environment recognition, target and obstacle recognition, collision-free path planning, and the object-grasping intelligence of robot hands. In this work, we implement such a system in simulation and hardware to grasp a target object without collision. We use an RGB-D image sensor to recognize the environment and objects. Various path-finding algorithms have been implemented and tested to find collision-free paths. Finally, for an anthropomorphic robot hand, object-grasping intelligence is learned through deep reinforcement learning. In our simulation environment, grasping a target out of five clutter objects showed an average success rate of 78.8% and a collision rate of 34% without path planning, whereas our system combined with path planning showed an average success rate of 94% and an average collision rate of 20%. In our hardware environment, grasping a target out of three clutter objects showed an average success rate of 30% and a collision rate of 97% without path planning, whereas our system combined with path planning showed an average success rate of 90% and an average collision rate of 23%. Our results show that grasping a target object in clutter is feasible with vision intelligence, path planning, and deep RL.
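
The abstract above does not say which path-finding algorithms were tested. As a minimal sketch of collision-free path planning, assuming a 2D occupancy grid and A* search (both are illustrative assumptions, as are the names `astar` and `GRID`), the idea might look like this:

```python
import heapq


def astar(grid, start, goal):
    """Return a list of (row, col) cells from start to goal, or None.

    grid[r][c] == 1 marks an obstacle cell (hypothetical occupancy map).
    """
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance: admissible for 4-connected, unit-cost moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]
    came_from = {start: None}
    best_cost = {start: 0}
    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cost > best_cost[cell]:
            continue  # stale heap entry, a cheaper route was found already
        if cell == goal:
            # Walk the parent links back to the start.
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_cost = cost + 1
                if new_cost < best_cost.get(nxt, float("inf")):
                    best_cost[nxt] = new_cost
                    came_from[nxt] = cell
                    heapq.heappush(
                        open_heap, (new_cost + heuristic(nxt), new_cost, nxt)
                    )
    return None  # goal unreachable without crossing an obstacle


# Made-up 3x4 scene: the middle row is blocked except for one gap.
GRID = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]
PATH = astar(GRID, (0, 0), (2, 0))
```

Here `PATH` routes through the single free cell in the middle row; a real system would plan in the robot's workspace or joint space rather than on a toy grid.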

The Design of Smart Factory System using AI Edge Device

  • Han, Seong-Il;Lee, Dae-Sik;Han, Ji-Hwan;Shin, Han Jae
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology, Vol. 15, No. 4, pp. 257-270, 2022
  • In this paper, we design a smart factory risk improvement system and risk improvement method using AI edge devices. The system uses AI edge devices to collect and analyze data on workers' task performance in the smart factory, and to prevent and promptly respond to problems, so it can reduce the risks that may arise during work while lowering the defect rate when workers perform their jobs. In particular, abnormal risk conditions can be set based on worker image information, worker biometric information, equipment operation information, and quality information of manufactured products, and risks can be mitigated so that work is performed efficiently and accurately. In addition, all data collected from cameras and IoT sensors inside the smart factory are processed by the AI edge device rather than sent to the cloud, and only the necessary data are transmitted to the cloud, so processing is fast and security exposure is low. The use of AI edge devices also reduces data communication costs and the cost of acquiring transmission bandwidth, because less data is transmitted to the cloud.

Sign Language Dataset Built from S. Korean Government Briefing on COVID-19

  • Sim, Hohyun;Sung, Horyeol;Lee, Seungjae;Cho, Hyeonjoong
    • KIPS Transactions on Software and Data Engineering, Vol. 11, No. 8, pp. 325-330, 2022
  • This paper collects and experiments with datasets for deep learning research on Korean sign language, covering tasks such as sign language recognition, sign language translation, and sign language segmentation. Deep learning research on sign language faces two difficulties. First, sign languages are hard to recognize because they involve multiple modalities, including hand movements, hand directions, and facial expressions. Second, training data for deep learning research is lacking. Currently, the KETI dataset is the only known Korean sign language dataset for deep learning. Sign language datasets for deep learning research are classified into two categories: isolated sign language and continuous sign language. Although several foreign sign language datasets have been collected over time, they are also insufficient for deep learning research on sign language. Therefore, we attempted to collect a large-scale Korean sign language dataset and evaluate it using a baseline model named TSPNet, which achieves state-of-the-art (SOTA) performance in sign language translation. The collected dataset consists of a total of 11,402 image and text samples. Our experiment with the baseline model on this dataset yields a BLEU-4 score of 3.63, which can serve as a baseline performance for the Korean sign language dataset. We hope that our experience of collecting a Korean sign language dataset helps facilitate further research on Korean sign language.
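
To make the BLEU-4 figure above more concrete, here is a minimal sketch of how a BLEU-4 score is formed: the geometric mean of clipped 1-gram to 4-gram precisions, multiplied by a brevity penalty. It assumes a single reference, sentence-level scoring, and no smoothing; published scores are usually computed at corpus level with an established toolkit, and the abstract does not state its exact scoring setup. The function names and the example sentences are illustrative only.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, reference):
    """Sentence-level BLEU-4 on a 0-100 scale, one reference, no smoothing."""
    log_precisions = []
    for n in range(1, 5):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0  # any zero precision makes the geometric mean zero
        log_precisions.append(math.log(clipped / total))
    c, r = len(candidate), len(reference)
    # Brevity penalty: penalize candidates shorter than the reference.
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return 100.0 * brevity_penalty * math.exp(sum(log_precisions) / 4)


# Made-up sentences, not taken from the dataset.
REFERENCE = "the cat sat on a mat".split()
CANDIDATE = "the cat sat on the mat".split()
SCORE = bleu4(CANDIDATE, REFERENCE)
```

Because a single missing 4-gram match drives the unsmoothed score to zero, low corpus-level values such as the one reported are common for hard translation tasks.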

Dental Surgery Simulation Using Haptic Feedback Device

  • Yoon Sang Yeun;Sung Su Kyung;Shin Byeong Seok
    • KIPS Transactions on Software and Data Engineering, Vol. 12, No. 6, pp. 275-284, 2023
  • Virtual reality simulations are used for education and training in various fields, and have recently become especially common in the medical field. An education/training simulator consists of tactile/force feedback generation and image/sound output hardware that provides a sensation similar to a doctor treating a real patient with real surgical tools, and software that produces realistic images and tactile feedback. Existing simulators are complicated and expensive because they need various types of hardware to simulate the various surgical instruments used during surgery. In this paper, we propose a dental surgery simulation system using a force feedback device and a morphable haptic controller. The haptic hardware determines whether the surgical tool collides with the surgical site and provides a sense of resistance and vibration. In particular, haptic controllers that can be deformed, for example by changing length or bending, can convey the different sensations associated with the shapes of different surgical tools. When the user manipulates the haptic feedback device, events such as device movement or button clicks are delivered to the simulation system, which computes the interaction between the dental surgical tools and the intraoral model and returns haptic feedback to the device. Using these basic techniques, we provide a realistic training experience of impacted wisdom tooth extraction, a representative dental surgery, in a virtual environment represented by sophisticated three-dimensional models.

Deep Learning Braille Block Recognition Method for Embedded Devices

  • Hee-jin Kim;Jae-hyuk Yoon;Soon-kak Kwon
    • Journal of Korea Society of Industrial Information Systems, Vol. 28, No. 4, pp. 1-9, 2023
  • In this paper, we propose a method to recognize braille blocks in real time on embedded devices through deep learning. First, a deep learning model for braille block recognition is trained on a high-performance computer, and a model lightweighting tool is then applied so that the trained model can run on an embedded device. To recognize the walking information conveyed by the braille blocks, an algorithm determines the path using the distance from the braille block in the image. After braille blocks, bollards, and crosswalks are detected by the YOLOv8 model in video captured by the embedded device, the walking information is recognized through the braille block path discrimination algorithm. We apply the lightweighting tool to YOLOv8 to detect braille blocks in real time: the precision of the YOLOv8 model weights is lowered from the original 32 bits to 8 bits, and the model is optimized with the TensorRT optimization engine. Comparing the lightweight model produced by the proposed method with the original model, the path recognition accuracy is 99.05%, almost the same as the original model, while the recognition time is reduced by 59%, allowing about 15 frames per second to be processed.
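
The reduction from 32-bit to 8-bit weights mentioned above can be illustrated with a minimal sketch of symmetric per-tensor quantization. This is only the general idea, under the assumption of a single scale factor per weight list; it is not TensorRT's calibration procedure, and the function names and weight values are made up for illustration.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # One quantization step covers max_abs / 127 in the original units.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the 8-bit integers."""
    return [q * scale for q in quantized]


# Hypothetical 32-bit weights, not taken from any real model.
WEIGHTS = [0.42, -1.27, 0.003, 0.9, -0.31]
Q, SCALE = quantize_int8(WEIGHTS)
RESTORED = dequantize(Q, SCALE)
# Rounding error per weight is bounded by half a quantization step.
MAX_ERROR = max(abs(a - b) for a, b in zip(WEIGHTS, RESTORED))
```

Each stored weight shrinks from four bytes to one, at the cost of a rounding error of at most half a step, which is consistent with the abstract's observation that accuracy stays almost unchanged.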

Method of Biological Information Analysis Based-on Object Contextual

  • Kim, Kyung-jun;Kim, Ju-yeon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference, 2022 Spring Conference, pp. 41-43, 2022
  • To prevent and contain infectious diseases in the wake of the recent COVID-19 pandemic, non-contact biometric information acquisition and analysis technology is attracting attention. Invasive and attached acquisition methods have the advantage of measuring biometric information accurately, but the close contact they require risks spreading contagious diseases. To address this, non-contact methods that extract biometric information such as human fingerprints, faces, irises, veins, voices, and signatures with automated devices are being adopted in more industries as data processing speed and recognition accuracy increase. However, although the accuracy of non-contact biometric data acquisition has improved, the non-contact method is strongly influenced by the environment surrounding the measured object, which distorts the measurements and lowers accuracy. In this paper, we propose a context-based bio-signal modeling technique for interpreting personalized information (image, signal, etc.) in bio-information analysis. The context-based modeling technique presents a model that considers contextual and user information during biometric measurement in order to improve performance. The proposed model analyzes signal information based on the feature probability distribution through context-based signal analysis that can maximize the probability of the predicted value.

Threat Situation Determination System Through AWS-Based Behavior and Object Recognition

  • Ye-Young Kim;Su-Hyun Jeong;So-Hyun Park;Young-Ho Park
    • KIPS Transactions on Software and Data Engineering, Vol. 12, No. 4, pp. 189-198, 2023
  • As street crime occurs frequently, CCTV is becoming more widespread. However, because of the shortcomings of passively operated CCTV, the need for intelligent CCTV is attracting attention. Such intelligent CCTV systems are heavy and require high-performance devices, which makes replacing ordinary CCTV expensive. To solve this problem, an intelligent CCTV system that recognizes low-quality images and runs even on low-performance devices is required. Therefore, this paper proposes a Saying CCTV system that detects threats in real time by using the AWS cloud platform to lighten the system and by converting images into text. Based on data extracted with YOLO v4 and OpenPose, the system determines risk objects, threat behaviors, and threat situations, and calculates the risk using machine learning. As a result, the system can be operated anytime and anywhere as long as a network connection is available, and it can be used even with devices that have only the minimal performance needed for video capture and image upload. Furthermore, by analyzing the video and using the data stored as text, the system can automatically produce meaningful crime statistics, which helps prevent crime quickly.

BIM Mesh Optimization Algorithm Using K-Nearest Neighbors for Augmented Reality Visualization

  • Pa, Pa Win Aung;Lee, Donghwan;Park, Jooyoung;Cho, Mingeon;Park, Seunghee
    • KSCE Journal of Civil and Environmental Engineering Research, Vol. 42, No. 2, pp. 249-256, 2022
  • Various studies are actively being conducted to show that real-time visualization technology combining BIM (Building Information Modeling) and AR (Augmented Reality) helps improve construction management decision-making and processing efficiency. However, when large-capacity BIM data is projected into AR, there are various limitations, such as data transmission and connection problems and an image cut-off issue. To improve visualization efficiency, a mesh optimization algorithm based on the k-nearest neighbors (KNN) classification framework is proposed to reconstruct BIM data, in place of existing mesh optimization methods that are complicated and cannot adequately handle meshes with numerous boundaries in the 3D models. In the proposed algorithm, the target BIM model is optimized with Unity C# code based on triangle centroid concepts and classified using KNN. The algorithm can report the number of mesh vertices and triangles before and after optimization for the entire model and for each structure. In addition, it reduces the mesh vertices of the original model by approximately 56% and the triangles by about 42%. Moreover, compared to the original model, the optimized model shows no visual differences in the model elements and information, meaning that high-performance visualization can be expected when using AR devices.
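
The abstract names two building blocks, triangle centroids and k-nearest neighbors, without describing how they are combined. The sketch below shows only those two pieces in isolation, in Python rather than the Unity C# the authors used; the function names, the brute-force neighbor search, and the triangle coordinates are all assumptions made for illustration.

```python
import math


def centroid(triangle):
    """Average the three (x, y, z) vertices of a triangle."""
    return tuple(sum(vertex[i] for vertex in triangle) / 3.0 for i in range(3))


def k_nearest(centroids, query_index, k):
    """Return indices of the k centroids closest to the query centroid."""
    query = centroids[query_index]
    distances = [
        (math.dist(query, c), i)
        for i, c in enumerate(centroids)
        if i != query_index
    ]
    return [i for _, i in sorted(distances)[:k]]


# Made-up triangles: two adjacent, one far away, one on a parallel plane.
TRIANGLES = [
    ((0, 0, 0), (1, 0, 0), (0, 1, 0)),
    ((1, 0, 0), (1, 1, 0), (0, 1, 0)),
    ((10, 0, 0), (11, 0, 0), (10, 1, 0)),
    ((0, 0, 5), (1, 0, 5), (0, 1, 5)),
]
CENTROIDS = [centroid(t) for t in TRIANGLES]
NEIGHBORS = k_nearest(CENTROIDS, 0, 2)
```

Grouping triangles whose centroids are mutual near neighbors is one plausible way to find candidates for merging, but the actual classification and reduction rules are those of the paper, not of this sketch.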

Context-Dependent Video Data Augmentation for Human Instance Segmentation

  • HyunJin Chun;JongHun Lee;InCheol Kim
    • KIPS Transactions on Software and Data Engineering, Vol. 12, No. 5, pp. 217-228, 2023
  • Video instance segmentation is a highly complex visual intelligence task because it requires not only object instance segmentation in each image frame of a video but also accurate tracking of instances throughout the frame sequence. In particular, human instance segmentation in drama videos has the unique characteristic of requiring accurate tracking of several main characters interacting in various places and at various times. It is also characterized by a class imbalance problem, because main characters appear far more frequently than supporting or auxiliary characters in drama videos. In this paper, we introduce a new human instance dataset called MHIS, built from the drama Miseang, and propose a novel video data augmentation method, CDVA, to overcome the data imbalance between character classes. Unlike previous video data augmentation methods, the proposed CDVA generates more realistic augmented videos by choosing the optimal location within the background clip at which to insert a target human instance, taking into account the rich spatio-temporal context embedded in videos. The proposed augmentation method can therefore improve the performance of a deep neural network model for video instance segmentation. Through both quantitative and qualitative experiments using the MHIS dataset, we demonstrate the usefulness and effectiveness of the proposed video data augmentation method.

Detecting Vehicles That Are Illegally Driving on Road Shoulders Using Faster R-CNN

  • Go, MyungJin;Park, Minju;Yeo, Jiho
    • The Journal of The Korea Institute of Intelligent Transport Systems, Vol. 21, No. 1, pp. 105-122, 2022
  • According to statistics on fatal crashes on expressways over the last 5 years, deaths on road shoulders have been three times as high as other deaths on the expressways. This suggests that crashes on road shoulders tend to be fatal, and that it is important to prevent traffic crashes by cracking down on vehicles intruding onto the shoulders. Therefore, this study proposes a method to detect vehicles that violate the shoulder lane by using Faster R-CNN. Vehicles are detected with Faster R-CNN, and an additional reading module is configured to determine whether a shoulder violation has occurred. For experiments and evaluation, GTAV, a simulation game that can reproduce situations similar to the real world, was used. A total of 1,800 training images and 800 evaluation images were generated and processed, and performance was measured for ZFNet and VGG16 as the threshold value was varied. As a result, the detection rate of ZFNet was 99.2% at a threshold of 0.8 and that of VGG16 was 93.9% at a threshold of 0.7, and the average detection time was 0.0468 seconds for ZFNet and 0.16 seconds for VGG16, so the detection rate of ZFNet was about 7% higher and its speed about 3.4 times faster. These results show that even a relatively simple network can detect vehicles violating the shoulder lane at high speed without pre-processing the input image. This suggests that the algorithm can be used to detect violations of designated lanes if sufficient training datasets based on actual video data are obtained.
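
The abstract does not describe how its reading module decides that a detected vehicle is on the shoulder. One plausible approach, sketched here purely as an assumption, is to keep detections whose confidence score meets the threshold and then test whether the bottom-center of each bounding box falls inside a polygon marking the shoulder lane. The polygon, the boxes, the scores, and the function names are all hypothetical.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if the point lies inside the polygon."""
    x, y = point
    inside = False
    count = len(polygon)
    for i in range(count):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % count]
        # Only edges that straddle the horizontal line through y can cross.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def shoulder_violations(detections, shoulder_polygon, threshold):
    """Return boxes of confident detections standing on the shoulder."""
    violations = []
    for box, score in detections:
        if score < threshold:
            continue  # discard low-confidence detections
        x_min, y_min, x_max, y_max = box
        # Bottom-center approximates where the vehicle touches the road.
        ground_point = ((x_min + x_max) / 2.0, y_max)
        if point_in_polygon(ground_point, shoulder_polygon):
            violations.append(box)
    return violations


# Hypothetical shoulder region in image coordinates (pixels).
SHOULDER = [(500, 200), (640, 200), (640, 480), (560, 480)]
# Hypothetical detector output: (x_min, y_min, x_max, y_max), score.
DETECTIONS = [
    ((520, 300, 620, 400), 0.95),  # confident, on the shoulder
    ((100, 300, 220, 420), 0.97),  # confident, in a main lane
    ((540, 250, 630, 330), 0.40),  # on the shoulder but low confidence
]
VIOLATIONS = shoulder_violations(DETECTIONS, SHOULDER, threshold=0.8)
```

With a fixed camera the shoulder polygon can be drawn once per view, which keeps the check cheap compared with the detection step itself.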