• Title/Abstract/Keywords: Image Recognition Technology

Inexpensive Visual Motion Data Glove for Human-Computer Interface Via Hand Gesture Recognition (손 동작 인식을 통한 인간 - 컴퓨터 인터페이스용 저가형 비주얼 모션 데이터 글러브)

  • Han, Young-Mo
    • The KIPS Transactions:PartB / v.16B no.5 / pp.341-346 / 2009
  • The motion data glove is a representative human-computer interaction tool that inputs human hand gestures to a computer by measuring hand motion. It is essential equipment for new computer technologies, including home automation, virtual reality, biometrics, and motion capture. To encourage widespread use, this paper develops an inexpensive visual-type motion data glove that can be used without any special equipment. The key feature of the proposed approach is its low cost: it does not use the expensive motion-sensing fibers of conventional approaches, which makes it easy to produce and widely usable. Instead of a mechanical method based on motion-sensing fibers, it adopts a visual method obtained by improving conventional optical motion capture technology. Compared to conventional visual methods, the proposed method has the following advantages and original contributions. First, conventional visual methods use many cameras and much equipment to reconstruct 3D pose while eliminating occlusions, whereas the proposed method adopts a monocular vision approach that allows simple, low-cost equipment. Second, conventional monocular methods have difficulty reconstructing the 3D pose of parts occluded in the image, whereas the proposed approach can reconstruct occluded parts by using originally designed thin-bar-shaped optical indicators. Third, many conventional methods rely on nonlinear numerical image analysis algorithms, which are inconvenient in terms of initialization and computation time; the proposed method avoids these problems with a closed-form image analysis algorithm obtained from an original formulation.
Fourth, many conventional closed-form algorithms use approximations in their formulation, which leads to low accuracy and restricted applicability due to singularities. The proposed method overcomes these disadvantages with an original formulation in which the closed-form algorithm is derived using exponential-form twist coordinates instead of approximations or local parameterizations such as Euler angles.
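The abstract credits its closed-form solution to exponential-form twist coordinates. The following is a minimal sketch of that parameterization only. It is not the paper's pose-reconstruction algorithm, which the abstract does not give. The standard closed-form exponential of a unit-axis twist ξ = (v, ω) scaled by θ can be computed with Rodrigues' formula, which avoids Euler angles and their singularities. The function names are illustrative.

```python
import math


def hat(w):
    """Skew-symmetric matrix of w, so that hat(w) applied to u equals w x u."""
    wx, wy, wz = w
    return [[0.0, -wz, wy], [wz, 0.0, -wx], [-wy, wx, 0.0]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]


def twist_exp(v, w, theta):
    """Closed-form exponential of the twist (v, w) scaled by theta.

    w must be a unit vector. Returns (R, p), the rotation matrix and the
    translation of the resulting rigid-body motion.
    """
    W = hat(w)
    W2 = matmul(W, W)
    s, c = math.sin(theta), 1.0 - math.cos(theta)
    # Rodrigues' formula: R = I + sin(theta) W + (1 - cos(theta)) W^2
    R = [[(1.0 if i == j else 0.0) + s * W[i][j] + c * W2[i][j] for j in range(3)]
         for i in range(3)]
    # Translation: p = (I - R)(w x v) + w (w . v) theta
    wxv = cross(w, v)
    r_wxv = matvec(R, wxv)
    w_dot_v = sum(wi * vi for wi, vi in zip(w, v))
    p = [wxv[i] - r_wxv[i] + w[i] * w_dot_v * theta for i in range(3)]
    return R, p
```

For example, `twist_exp([0, -1, 0], [0, 0, 1], math.pi)` describes a half turn about a vertical axis through (1, 0, 0). It moves the origin to (2, 0, 0). The whole motion is expressed by six twist coordinates and one angle.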

Comparison on the recognition characteristic of the designer and consumer about the formative elements (디자이너와 소비자의 조형요소 인지특성 비교)

  • Min, Kyung-Taek;Heo, Seong-Cheol
    • Science of Emotion and Sensibility / v.12 no.1 / pp.97-108 / 2009
  • In product design, shaping is the process of giving a product a substantive form, and it ultimately produces the outcome. Shaping is generally led by the designer, who uses various formative elements to generate that outcome. The basic purposes of this research are to identify the differences in formative elements that arise from the differing viewpoints of consumers and designers during product shaping, and to characterize the affective responses caused by those differences. The study also examines how consumers can participate directly in shaping a consumer-participation product, and proposes feasible design guidelines through which consumers' needs can be reflected more efficiently in the shaping process. The results show that consumers and designers differ to a certain degree in their viewpoints on the formative elements of a shape. For designers, the difference stemmed from shared subjective conventions of design; for consumers, it stemmed from their immature visual understanding. A second experiment examined affective responses to product shape. I first established a vocabulary of sensibility image terms based on product shape, and then conducted the same experiments with consumers and designers using that vocabulary.

Welfare Interface using Multiple Facial Features Tracking (다중 얼굴 특징 추적을 이용한 복지형 인터페이스)

  • Ju, Jin-Sun;Shin, Yun-Hee;Kim, Eun-Yi
    • Journal of the Institute of Electronics Engineers of Korea SP / v.45 no.1 / pp.75-83 / 2008
  • We propose a welfare interface using multiple facial feature tracking that can efficiently implement various mouse operations. The proposed system consists of five modules: face detection, eye detection, mouth detection, facial feature tracking, and mouse control. The facial region is first obtained using a skin-color model and connected-component analysis (CCs). The eye regions are then localized using a neural network (NN)-based texture classifier that divides the facial region into eye and non-eye classes. The mouth region is localized using an edge detector. Once the eye and mouth regions are localized, they are continuously and accurately tracked by the mean-shift algorithm and template matching, respectively. Mouse operations such as movement and clicking are implemented based on the tracking results. To assess its validity, the proposed system was applied to a web browser interface and tested on a group of 25 users. The results show that the system achieves 99% accuracy and processes more than 21 frames/sec on a PC for 320×240 input images, so it can provide user-friendly and convenient real-time access to a computer.
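The mean-shift step used for eye tracking can be illustrated with a small sketch. It assumes a per-pixel weight map, such as an eye-likelihood or color back-projection image, has already been computed. The abstract does not say which weights the authors used, and `mean_shift` with its window and stopping parameters is hypothetical. Each iteration moves a fixed-size window to the weighted centroid of the pixels under it until the move becomes small.

```python
def mean_shift(weights, cx, cy, half_w, half_h, max_iter=20, eps=0.5):
    """Shift a (2*half_w+1) x (2*half_h+1) window toward the weighted centroid.

    weights: row-major 2D list of non-negative pixel weights.
    (cx, cy): initial window centre, e.g. the position in the previous frame.
    Stops when the centre moves less than eps pixels or after max_iter steps.
    """
    rows, cols = len(weights), len(weights[0])
    for _ in range(max_iter):
        total = sx = sy = 0.0
        icx, icy = int(round(cx)), int(round(cy))
        for y in range(max(0, icy - half_h), min(rows, icy + half_h + 1)):
            for x in range(max(0, icx - half_w), min(cols, icx + half_w + 1)):
                wgt = weights[y][x]
                total += wgt
                sx += wgt * x
                sy += wgt * y
        if total == 0:
            break  # nothing under the window: keep the last position
        nx, ny = sx / total, sy / total
        shift = ((nx - cx) ** 2 + (ny - cy) ** 2) ** 0.5
        cx, cy = nx, ny
        if shift < eps:
            break
    return cx, cy
```

Running the search once per frame from the previous position is cheap, because only the pixels inside the window are visited. This is consistent with the real-time frame rate reported above.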

A System of Audio Data Analysis and Masking Personal Information Using Audio Partitioning and Artificial Intelligence API (오디오 데이터 내 개인 신상 정보 검출과 마스킹을 위한 인공지능 API의 활용 및 음성 분할 방법의 연구)

  • Kim, TaeYoung;Hong, Ji Won;Kim, Do Hee;Kim, Hyung-Jong
    • Journal of the Korea Institute of Information Security & Cryptology / v.30 no.5 / pp.895-907 / 2020
  • With the growing influence of multimedia content beyond text-based content, services that help process the information in such content have become very convenient. Representative features of these services are searching for and masking sensitive data. Solutions that provide search and masking functions for text and images are easy to find. However, although the need to search and mask parts of audio data is recognized, such solutions are hard to find because the technology is difficult. In this study, we propose a web application that provides search and masking functions for audio data using an audio partitioning method. In pursuing this goal, we evaluated several speech-to-text conversion APIs to choose one suited to our purpose, and developed regular expressions for detecting sensitive information. Finally, we evaluated the accuracy of the developed search and masking features. The contribution of this work is the design and implementation of search and masking of sensitive information in audio data, validated through various functional experiments.
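Here is a rough sketch of the regex search and masking step. It assumes the speech-to-text API returns word-level timestamps as (text, start, end) tuples. The tuple format, the phone-number pattern, and the function names are illustrative assumptions, not the paper's API or expressions. Matched words are mapped back to sample ranges, which are then silenced.

```python
import re

# Hypothetical pattern for Korean mobile numbers such as 010-1234-5678 (hyphens optional).
PHONE = re.compile(r"01[016789]-?\d{3,4}-?\d{4}")


def find_sensitive_spans(words):
    """words: list of (text, start_sec, end_sec) tuples, assumed to come from a
    speech-to-text API with word-level timestamps.
    Returns the (start_sec, end_sec) of every word that fully matches PHONE."""
    return [(start, end) for text, start, end in words if PHONE.fullmatch(text)]


def mask_samples(samples, sample_rate, spans):
    """Return a copy of the audio samples with every span silenced (set to 0)."""
    out = list(samples)
    for start, end in spans:
        lo = max(0, int(start * sample_rate))
        hi = min(len(out), int(end * sample_rate))
        for i in range(lo, hi):
            out[i] = 0
    return out
```

Real transcripts often split a number across several tokens. A practical version would therefore match over the joined transcript and map character offsets back to word timestamps.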

A Method for 3D Human Pose Estimation based on 2D Keypoint Detection using RGB-D information (RGB-D 정보를 이용한 2차원 키포인트 탐지 기반 3차원 인간 자세 추정 방법)

  • Park, Seohee;Ji, Myunggeun;Chun, Junchul
    • Journal of Internet Computing and Services / v.19 no.6 / pp.41-51 / 2018
  • Recently, deep learning methods have been applied to intelligent video surveillance systems, allowing various events such as crime, fire, and abnormal phenomena to be detected robustly. However, projecting the 3D real world onto a 2D image loses 3D information, which causes occlusion. The occlusion problem must therefore be addressed to detect objects and estimate their pose accurately. In this paper, we detect moving objects while resolving the occlusion problem in the object detection process by adding depth information to the existing RGB information. Then, a convolutional neural network applied to the detected region predicts the positions of 14 keypoints at human joints. Finally, to solve the self-occlusion problem that arises in pose estimation, we describe a method for 3D human pose estimation. It extends the estimation to 3D space using the predicted 2D keypoints and a deep neural network. The 2D and 3D pose estimation results of this research can serve as basic data for future human behavior recognition and contribute to the development of industrial technology.
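The paper lifts 2D keypoints to 3D with a deep neural network. For comparison, the purely geometric way to use RGB-D depth at a detected keypoint is pinhole back-projection, sketched below. This is an alternative technique, not the authors' method. The intrinsics (fx, fy, cx, cy) and the `depth_lookup` callback are hypothetical. Joints without valid depth, as under self-occlusion, come back as None. That is exactly the gap a learned lifting model is meant to fill.

```python
from typing import Callable, Dict, Optional, Tuple

Point3 = Tuple[float, float, float]


def backproject(u: float, v: float, depth_m: float,
                fx: float, fy: float, cx: float, cy: float) -> Point3:
    """Pinhole back-projection of pixel (u, v) with metric depth Z into camera coordinates."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


def keypoints_to_3d(keypoints_2d: Dict[str, Tuple[float, float]],
                    depth_lookup: Callable[[float, float], Optional[float]],
                    intrinsics: Tuple[float, float, float, float]) -> Dict[str, Optional[Point3]]:
    """keypoints_2d: {joint_name: (u, v)}.
    depth_lookup(u, v) returns depth in metres, or None when the pixel has no
    valid depth (sensor hole, or the joint is hidden behind another body part)."""
    out: Dict[str, Optional[Point3]] = {}
    for name, (u, v) in keypoints_2d.items():
        z = depth_lookup(u, v)
        out[name] = None if z is None or z <= 0 else backproject(u, v, z, *intrinsics)
    return out
```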

A Quality Prediction Model for Ginseng Sprouts based on CNN (CNN을 활용한 새싹삼의 품질 예측 모델 개발)

  • Lee, Chung-Gu;Jeong, Seok-Bong
    • Journal of the Korea Society for Simulation / v.30 no.2 / pp.41-48 / 2021
  • As the rural population continues to decline and age, improving agricultural productivity is becoming more important. Early prediction of crop quality can play an important role in improving agricultural productivity and profitability. Many recent studies have used CNN-based deep learning and transfer learning to classify diseases and predict crop yield. However, few studies predict post-harvest crop quality early, at the planting stage. In this study, an early quality prediction model is proposed for ginseng sprouts, which are drawing attention as a health functional food. To this end, we photographed ginseng seedlings at the planting stage and grew them by hydroponic cultivation. After harvest, the quality of each ginseng sprout was classified and used as its label. With these data, we built early quality prediction models from several pre-trained CNN models through transfer learning. We then compared their prediction performance, including training time and accuracy. All proposed models achieved more than 80% prediction accuracy, and the ResNet152V2-based model achieved the highest accuracy. This study is expected to contribute to production and profitability by automating the existing seedling screening work, which relies primarily on manual labor.

Individual Ortho-rectification of Coast Guard Aerial Images for Oil Spill Monitoring (유출유 모니터링을 위한 해경 항공 영상의 개별정사보정)

  • Oh, Youngon;Bui, An Ngoc;Choi, Kyoungah;Lee, Impyeong
    • Korean Journal of Remote Sensing / v.38 no.6_1 / pp.1479-1488 / 2022
  • Oil spills occur intermittently in the ocean due to ship collisions and sinkings. To prepare prompt countermeasures when such an accident occurs, the current status of the spilled oil must be identified accurately. To this end, the Coast Guard patrols the target area with fixed-wing aircraft or helicopters and checks it with the naked eye or on video. However, it has been difficult to determine the area contaminated by spilled oil and its exact location on a map. Accordingly, this study develops a technology for direct ortho-rectification that automatically georeferences the Coast Guard's aerial images, without individual ground reference points, to identify the current status of spilled oil. First, the meta information required for georeferencing is extracted by optical character recognition (OCR) from the sensor information overlaid on the video screen. Based on the extracted information, the exterior orientation parameters of each image are determined, and each image is individually ortho-rectified using these parameters. The accuracy of the individual orthoimages generated in this way was evaluated at roughly several tens of meters, up to 100 m. No ground control points were used. Given this, and given the inherent errors of the position and attitude sensors and the inaccuracies in interior orientation parameters such as the camera focal length, this accuracy is reasonably acceptable. It is judged to be adequate for identifying the current status of oil-contaminated areas at sea. In the future, if images captured during flight can be transmitted in real time, the proposed individual ortho-rectification technology could generate orthoimages in real time. These could then be used to quickly identify the status of spilled-oil contamination and establish countermeasures.
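Direct georeferencing without ground control points amounts to intersecting each image ray with the ground. Because the target here is the sea surface, a minimal sketch can intersect the ray with a horizontal plane. It uses the exterior orientation (camera position and rotation) and the focal length. Several choices are assumptions for illustration: the rotation order in `rot_opk`, the flat-sea plane, and image coordinates already reduced to the principal point and expressed in the same units as the focal length. Photogrammetry software differs in these conventions.

```python
import math


def _matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def rot_opk(omega, phi, kappa):
    """Camera-to-ground rotation R = Rx(omega) Ry(phi) Rz(kappa), angles in radians.
    This order and sign convention is an assumption; packages differ."""
    co, so = math.cos(omega), math.sin(omega)
    cp, sp = math.cos(phi), math.sin(phi)
    ck, sk = math.cos(kappa), math.sin(kappa)
    rx = [[1.0, 0.0, 0.0], [0.0, co, -so], [0.0, so, co]]
    ry = [[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]]
    rz = [[ck, -sk, 0.0], [sk, ck, 0.0], [0.0, 0.0, 1.0]]
    return _matmul(_matmul(rx, ry), rz)


def image_to_sea(x, y, f, cam_pos, R, sea_level=0.0):
    """Project image point (x, y) onto the horizontal plane Z = sea_level.

    The camera looks along its -z axis, so the ray direction in camera
    coordinates is (x, y, -f); R rotates it into ground coordinates.
    """
    d = [R[i][0] * x + R[i][1] * y + R[i][2] * (-f) for i in range(3)]
    if d[2] >= 0:
        raise ValueError("ray does not reach the sea surface")
    t = (sea_level - cam_pos[2]) / d[2]  # scale factor along the ray
    return cam_pos[0] + t * d[0], cam_pos[1] + t * d[1]
```

With a nadir-looking camera 500 m above the sea and f = 0.05 m, an image offset of 0.01 m maps to 100 m on the ground. Small errors in attitude or focal length therefore grow into the tens-of-meters ground errors reported above.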

A Study on the Type and Sense of Place of the Lighting Design of Urban Public Space (도시 공공공간 조명디자인 유형과 장소성에 관한 연구)

  • Ma, Dong Qing;Yoon, Ji Young
    • Korea Science and Art Forum / v.27 / pp.101-114 / 2017
  • Based on the relationship among urban public space, urban lighting, and sense of place, this paper analyzes types of lighting environments that create a sense of place, and their characteristics. First, building on a theoretical review, it extracts six spatial factors of public-space lighting design and analyzes 12 relevant cases on that basis. It then divides the 12 cases into four types (Basic, Storytelling, Interactive, and Multi-Media) and analyzes the core design factors and characteristics of each type. The results are as follows. First, functionality, sustainability, and aesthetics are the basic factors for realizing lit places in urban public space. Second, the six "Storytelling" cases show that exploring the theme, or "story", of a specific area helps lighting design establish a clear and definite recognition of the environment. Third, for the "Interactive" and "Multi-Media" types, the introduction of new media technology and new lighting methods has greatly broadened both the connotation and the extension of urban lighting design. The results indicate that strengthening a city's sense of place through lighting design can improve the city's image. This provides a basis for developing urban public-space lighting design.

Feasibility of Deep Learning Algorithms for Binary Classification Problems (이진 분류문제에서의 딥러닝 알고리즘의 활용 가능성 평가)

  • Kim, Kitae;Lee, Bomi;Kim, Jong Woo
    • Journal of Intelligence and Information Systems / v.23 no.1 / pp.95-108 / 2017
  • Recently, AlphaGo, the Baduk (Go) artificial intelligence program by Google DeepMind, won a decisive victory against Lee Sedol. Many people thought that machines would not be able to beat humans at Go because, unlike in chess, the number of possible move sequences exceeds the number of atoms in the universe. The result was the opposite of what people predicted. After the match, artificial intelligence drew attention as a core technology of the Fourth Industrial Revolution across various application domains. In particular, deep learning attracted attention as the core artificial intelligence technique used in the AlphaGo algorithm. Deep learning is already being applied to many problems and performs especially well in image recognition. It also performs well on high-dimensional data such as voice, images, and natural language, where existing machine learning techniques struggled. In contrast, deep learning research on traditional business data and structured data analysis is hard to find. In this study, we examined whether existing deep learning techniques can be used not only for recognizing high-dimensional data but also for binary classification problems in traditional business data analysis. Examples include customer churn analysis, marketing response prediction, and default prediction. We also compared the performance of these techniques with that of traditional artificial neural network models. The experimental data is telemarketing response data from a bank in Portugal. It has input variables such as age, occupation, loan status, and the number of previous telemarketing contacts. Its binary target variable records whether the customer intends to open an account.
In this study, we evaluated the usefulness of deep learning algorithms and techniques for binary classification. To do so, we compared the performance of various models using CNN, LSTM, and dropout, which are widely used in deep learning, with that of MLP models, a traditional artificial neural network model. The nature of artificial neural networks makes it impossible to test every network design alternative. The experiment was therefore conducted with restricted settings for the number of hidden layers, the number of neurons per hidden layer, the number of outputs (filters), and the conditions for applying dropout. The models were evaluated with the F1 score rather than overall accuracy, to show how well they classify the class of interest. Each deep learning technique was applied as follows. The CNN algorithm reads the values adjacent to a given value to recognize features. In business data, however, the proximity of fields does not matter, because each field is usually independent. In this experiment, we therefore set the CNN filter size equal to the number of fields so that the whole characteristics of the data are learned at once. We also added a hidden layer to make decisions based on the additional features. For the model with two LSTM layers, the input direction of the second layer was reversed relative to the first layer to reduce the influence of field order. For dropout, neurons in each hidden layer were dropped with a probability of 0.5. The model with the highest F1 score was the CNN model with dropout, followed by the MLP model with two hidden layers and dropout.
This study yielded several findings. First, models using dropout make slightly more conservative predictions than those without it, and generally classify better. Second, CNN models classify better than MLP models. This is interesting because CNNs performed well not only in fields where their effectiveness is proven but also in binary classification problems, to which they have rarely been applied. Third, the LSTM algorithm seems unsuitable for binary classification problems because its training time is too long relative to the performance improvement. These results confirm that some deep learning algorithms can be applied to business binary classification problems.
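The evaluation metric and the dropout setting described above are simple to state in code. The sketch below computes F1 for the positive class. It also applies inverted dropout with p = 0.5, the form used by common frameworks. The abstract does not state whether the authors' framework scaled activations this way, and the function names are illustrative.

```python
import random


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def dropout(activations, p=0.5, training=True, rng=random):
    """Inverted dropout: zero each unit with probability p and scale survivors
    by 1 / (1 - p) so the expected activation is unchanged; identity at inference."""
    if not training or p == 0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() >= p else 0.0 for a in activations]
```

Take labels [1, 1, 0, 0, 1] and predictions [1, 0, 0, 1, 1]. Accuracy is 0.6, but F1 for the positive class is 2/3, because F1 ignores correctly rejected negatives. This makes it the more informative score when the class of interest (customers who open an account) is the minority.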

Video Analysis System for Action and Emotion Detection by Object with Hierarchical Clustering based Re-ID (계층적 군집화 기반 Re-ID를 활용한 객체별 행동 및 표정 검출용 영상 분석 시스템)

  • Lee, Sang-Hyun;Yang, Seong-Hun;Oh, Seung-Jin;Kang, Jinbeom
    • Journal of Intelligence and Information Systems / v.28 no.1 / pp.89-106 / 2022
  • Recently, the amount of video data collected from smartphones, CCTVs, black boxes, and high-definition cameras has increased rapidly, and with it the demand for video analysis and utilization. Because many industries lack skilled personnel to analyze videos, machine learning and artificial intelligence are actively used to assist them. As a result, demand for computer vision technologies such as object detection and tracking, action detection, emotion detection, and Re-ID has also increased rapidly. However, object detection and tracking face many difficulties that degrade performance, such as occlusion and objects re-appearing after leaving the recorded scene. Consequently, action and emotion detection models built on object detection and tracking also have difficulty extracting data for each object. In addition, deep learning architectures composed of multiple models suffer from performance degradation due to bottlenecks and lack of optimization. In this study, we propose a video analysis system with four components: a YOLOv5-based DeepSORT object tracking model, a SlowFast-based action recognition model, a Torchreid-based Re-ID model, and AWS Rekognition, an emotion recognition service. The proposed model uses single-linkage hierarchical clustering-based Re-ID and processing methods that maximize hardware throughput. It achieves higher accuracy than a re-identification model using simple metrics, with near-real-time processing. It also prevents tracking failures caused by occlusion and by objects leaving and re-entering the scene. By continuously linking each object's action and facial emotion detection results to the same object, videos can be analyzed efficiently.
For each frame, the re-identification model extracts a feature vector from the bounding box of each object detected by the object tracking model. It then applies single-linkage hierarchical clustering over past frames using these feature vectors to identify objects whose tracking failed. This makes it possible to re-track the same object after occlusion, or after it leaves the scene and re-appears. As a result, the action and facial emotion detection results of an object newly recognized after a tracking failure can be linked to those of the same object seen in the past. To improve processing performance, we introduce Bounding Box Queue by Object and Feature Queue methods that reduce RAM requirements while maximizing GPU memory throughput. We also introduce the IoF (Intersection over Face) algorithm, which links facial emotions recognized through AWS Rekognition with object tracking information. The academic significance of this study lies in the two-stage re-identification model's real-time performance. Through these processing techniques, it achieves real-time speed even in a computationally expensive setting that also performs action and facial emotion detection. It does so without sacrificing accuracy by resorting to simple metrics for speed. The practical implication concerns industrial fields that require action and facial emotion detection but struggle with object tracking failures: they can analyze videos effectively with the proposed model. With its high re-tracking accuracy and processing performance, the proposed model can be used in fields such as intelligent monitoring, observation services, and behavioral or psychological analysis services. In these fields, integrating tracking information with extracted metadata creates great industrial and business value.
In the future, to measure object tracking performance more precisely, experiments should be conducted on the MOT Challenge dataset, which is used by many international conferences. We will also investigate cases that the IoF algorithm cannot handle and develop a complementary algorithm. In addition, we plan further research to apply this model to datasets from various fields related to intelligent video analysis.
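The re-identification and face-linking steps can be sketched as follows. `reidentify` matches a lost track's features against past identities using the single-linkage distance, the closest pair of feature vectors, as the merge criterion. Cosine distance and the 0.3 threshold are assumptions. `iof` assumes that Intersection over Face means the overlap area divided by the face-box area; the name suggests this, but the abstract does not define it. All names are hypothetical. None of this is Torchreid's or AWS Rekognition's API.

```python
import math


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0  # treat an empty feature as uninformative
    return 1.0 - dot / (na * nb)


def single_linkage(feats_a, feats_b):
    """Single-linkage distance between two clusters: their closest pair of members."""
    return min(cosine_distance(a, b) for a in feats_a for b in feats_b)


def reidentify(new_feats, gallery, threshold=0.3):
    """gallery: {identity_id: [feature vectors from past frames]}.
    Returns the closest past identity if its single-linkage distance is below
    threshold, otherwise None (the track is treated as a new object)."""
    best_id, best_d = None, float("inf")
    for pid, feats in gallery.items():
        d = single_linkage(new_feats, feats)
        if d < best_d:
            best_id, best_d = pid, d
    return best_id if best_d < threshold else None


def iof(face, person):
    """Overlap area divided by the face box area; boxes are (x1, y1, x2, y2)."""
    ix = max(0.0, min(face[2], person[2]) - max(face[0], person[0]))
    iy = max(0.0, min(face[3], person[3]) - max(face[1], person[1]))
    face_area = (face[2] - face[0]) * (face[3] - face[1])
    return ix * iy / face_area if face_area > 0 else 0.0


def assign_face(face, people, min_iof=0.5):
    """people: {track_id: person_box}. Returns the track whose box best contains the face."""
    best = max(people, key=lambda tid: iof(face, people[tid]), default=None)
    return best if best is not None and iof(face, people[best]) >= min_iof else None
```

IoF is normalized by the face area rather than the union, as IoU would be. A small face box fully inside a large person box therefore scores 1.0 instead of a tiny IoU.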