• Title/Summary/Keyword: Feature vectors

Analysis of Skin Color Pigments from Camera RGB Signal Using Skin Pigment Absorption Spectrum (피부색소 흡수 스펙트럼을 이용한 카메라 RGB 신호의 피부색 성분 분석)

  • Kim, Jeong Yeop
    • KIPS Transactions on Software and Data Engineering / v.11 no.1 / pp.41-50 / 2022
  • This paper proposes a method for directly calculating the major components of skin color, such as melanin and hemoglobin, from the RGB signal of a camera. These components are typically obtained by measuring spectral reflectance with dedicated equipment and recombining the values at selected wavelengths of the measured light. Values calculated in this way include the melanin index and the erythema index, and they require special equipment such as a spectral reflectance measuring device or a multi-spectral camera. A direct calculation method for these components from a general digital camera is difficult to find; instead, a method that indirectly calculates the concentrations of melanin and hemoglobin using independent component analysis has been proposed. That method takes a region of an RGB image as input, extracts feature vectors for melanin and hemoglobin, and calculates the concentrations in a manner similar to principal component analysis. Its disadvantages are that per-pixel calculation is difficult because a group of pixels in a region is used as input, and that the extracted feature vectors, being obtained by optimization, tend to take different values on each run. The final result is produced as images representing the melanin and hemoglobin components by converting back to the RGB coordinate system, without using the feature vectors themselves. To address these disadvantages, the proposed method uses the feature vectors to calculate the melanin and hemoglobin component values in a feature space rather than in the RGB coordinate system, and calculates the spectral reflectance corresponding to the skin color using a general digital camera.
It also covers methods of calculating, from the spectral reflectance, the detailed components constituting skin pigments, such as melanin, oxygenated hemoglobin, deoxygenated hemoglobin, and carotenoid. The proposed method does not require special equipment such as a spectral reflectance measuring device or a multi-spectral camera, and unlike the existing method, it allows direct per-pixel calculation and yields the same result on repeated execution. The standard deviation of the melanin and hemoglobin densities for the proposed method was 15% of that of the conventional method, making it roughly six times more stable.

Video Analysis System for Action and Emotion Detection by Object with Hierarchical Clustering based Re-ID (계층적 군집화 기반 Re-ID를 활용한 객체별 행동 및 표정 검출용 영상 분석 시스템)

  • Lee, Sang-Hyun;Yang, Seong-Hun;Oh, Seung-Jin;Kang, Jinbeom
    • Journal of Intelligence and Information Systems / v.28 no.1 / pp.89-106 / 2022
  • Recently, the amount of video data collected from smartphones, CCTVs, dashboard cameras (black boxes), and high-definition cameras has increased rapidly. As video data grows, so do the requirements for its analysis and utilization. Because many industries lack skilled manpower to analyze videos, machine learning and artificial intelligence are actively used to assist. Consequently, demand for computer vision technologies such as object detection and tracking, action detection, emotion detection, and re-identification (Re-ID) has also increased rapidly. However, object detection and tracking face many difficulties that degrade performance, such as occlusion and the re-appearance of an object after it leaves the recording location. Accordingly, action and emotion detection models built on object detection and tracking models also have difficulty extracting data for each object. In addition, deep learning architectures composed of multiple models suffer from performance degradation due to bottlenecks and lack of optimization. In this study, we propose a video analysis system consisting of a YOLOv5-based DeepSORT object tracking model, a SlowFast-based action recognition model, a Torchreid-based Re-ID model, and AWS Rekognition, an emotion recognition service. The proposed model uses single-linkage hierarchical clustering based Re-ID and processing methods that maximize hardware throughput. It achieves higher accuracy than a re-identification model using simple metrics, offers near real-time processing performance, and prevents tracking failures caused by object departure and re-emergence, occlusion, and similar events. By continuously linking the action and facial emotion detection results of each object to the same object, videos can be analyzed efficiently.
The re-identification model extracts a feature vector from the bounding box of each object image detected by the object tracking model in each frame, and applies single-linkage hierarchical clustering to the feature vectors extracted from past frames to identify an object whose tracking has failed. Through this process, an object that was lost because of occlusion, or because it left the recording location and re-appeared, can be re-tracked. As a result, the action and facial emotion detection results of an object newly recognized after a tracking failure can be linked to those of the object that appeared in the past. To improve processing performance, we introduce the Bounding Box Queue by Object and Feature Queue methods, which reduce RAM requirements while maximizing GPU memory throughput. We also introduce the IoF (Intersection over Face) algorithm, which links the facial emotions recognized through AWS Rekognition with object tracking information. The academic significance of this study is that, thanks to these processing techniques, the two-stage re-identification model achieves real-time performance even in a high-cost environment that performs action and facial emotion detection, without sacrificing accuracy by resorting to simple metrics. The practical implication is that industrial fields that require action and facial emotion detection but struggle with object tracking failures can analyze videos effectively with the proposed model. With its high re-tracking accuracy and processing performance, the proposed model can be used in fields such as intelligent monitoring, observation services, and behavioral or psychological analysis services, where integrating tracking information with extracted metadata creates great industrial and business value.
In the future, to measure object tracking performance more precisely, an experiment using the MOT Challenge dataset, which is used by many international conferences, is needed. We will also investigate the cases that the IoF algorithm cannot handle in order to develop a complementary algorithm. In addition, we plan to conduct further research on applying this model to datasets from various fields related to intelligent video analysis.
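
The re-identification step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the cosine distance, the threshold value, and the toy two-dimensional feature vectors are all assumptions made for illustration. Cutting a single-linkage dendrogram at a distance threshold gives the same groups as the connected components of the graph that links every pair of vectors within that threshold, so a small union-find structure is enough.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def single_linkage_clusters(features, threshold):
    """Group feature vectors by single linkage, cut at `threshold`.

    Two vectors end up in the same cluster when a chain of pairwise
    distances, each no larger than the threshold, connects them.
    """
    parent = list(range(len(features)))

    def find(i):
        # Follow parent links to the cluster root (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            if cosine_distance(features[i], features[j]) <= threshold:
                parent[find(j)] = find(i)

    # Relabel cluster roots as 0, 1, 2, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels))
            for i in range(len(features))]


# Hypothetical feature vectors from five track fragments (toy values).
track_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.95, 0.05]]
print(single_linkage_clusters(track_features, threshold=0.05))  # [0, 0, 1, 1, 0]
```

Fragments that share a label would be treated as the same object, so their action and emotion results could be merged under one identity.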

Automatic gasometer reading system using selective optical character recognition (관심 문자열 인식 기술을 이용한 가스계량기 자동 검침 시스템)

  • Lee, Kyohyuk;Kim, Taeyeon;Kim, Wooju
    • Journal of Intelligence and Information Systems / v.26 no.2 / pp.1-25 / 2020
  • In this paper, we propose an application system architecture that provides accurate, fast, and efficient automatic gasometer reading. The system captures a gasometer image with a mobile device camera, transmits the image to a cloud server over a private LTE network, and analyzes the image to extract the device ID and gas usage amount by selective optical character recognition based on deep learning. In general, an image contains many types of characters, and optical character recognition extracts all of them. However, some applications need to ignore characters that are not of interest and focus only on specific types of characters. For example, an automatic gasometer reading system only needs to extract the device ID and gas usage amount from gasometer images in order to bill users. Character strings that are not of interest, such as device type, manufacturer, manufacturing date, and specification, are not valuable to the application. Thus, the application has to analyze only the region of interest and specific types of characters to extract the valuable information. We adopted CNN (Convolutional Neural Network) based object detection and CRNN (Convolutional Recurrent Neural Network) technology for selective optical character recognition, which analyzes only the region of interest for selective character information extraction. We built three neural networks for the application system.
The first is a convolutional neural network that detects the regions of interest containing the gas usage amount and device ID character strings; the second is another convolutional neural network that transforms the spatial information of the region of interest into spatially sequential feature vectors; and the third is a bi-directional long short-term memory network that converts the sequential information into character strings using time-series analysis, mapping feature vectors to character strings. In this research, the character strings of interest are the device ID and gas usage amount. The device ID consists of 12 Arabic numerals and the gas usage amount consists of 4 to 5 Arabic numerals. All system components are implemented in the Amazon Web Services cloud with an Intel Xeon E5-2686 v4 CPU and an NVIDIA Tesla V100 GPU. The system architecture adopts a master-slave processing structure for efficient and fast parallel processing, coping with about 700,000 requests per day. The mobile device captures a gasometer image and transmits it to the master process in the AWS cloud. The master process runs on the Intel Xeon CPU and pushes each reading request from the mobile device to an input queue with a FIFO (First In First Out) structure. The slave process consists of the three deep neural networks that perform character recognition and runs on the NVIDIA GPU module. The slave process continuously polls the input queue for recognition requests. When the input queue holds requests from the master process, the slave process converts the image into the device ID character string, the gas usage amount character string, and the position information of the strings, returns this information to the output queue, and switches back to idle mode to poll the input queue. The master process gets the final information from the output queue and delivers it to the mobile device. We used a total of 27,120 gasometer images for training, validation, and testing of the three deep neural networks.
Of these, 22,985 images were used for training and validation, and 4,135 images were used for testing. We randomly split the 22,985 images at an 8:2 ratio into training and validation sets for each training epoch. The 4,135 test images were categorized into 5 types (normal, noise, reflex, scale, and slant). Normal means clean image data, noise means images with noise signal, reflex means images with light reflection in the gasometer region, scale means images with a small object size due to long-distance capturing, and slant means images that are not horizontally level. The final character string recognition accuracies for device ID and gas usage amount on normal data are 0.960 and 0.864, respectively.
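
The queue-based flow between the master and slave processes described above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses threads in one Python process rather than separate CPU and GPU processes, the `recognize` function is a hypothetical stub standing in for the three neural networks, and the worker blocks on the queue instead of busy-polling.

```python
import queue
import threading


def recognize(image):
    """Hypothetical stand-in for the three-network recognition pipeline."""
    return {"image": image, "device_id": "<device id>", "usage": "<usage>"}


def worker(input_queue, output_queue):
    """Take requests from the input queue and push results to the output queue."""
    while True:
        image = input_queue.get()   # blocks until a request arrives
        if image is None:           # sentinel value: stop the worker
            break
        output_queue.put(recognize(image))


def run_master(images):
    """Push reading requests in FIFO order and collect one result per request."""
    input_queue = queue.Queue()     # queue.Queue is a FIFO queue
    output_queue = queue.Queue()
    thread = threading.Thread(target=worker, args=(input_queue, output_queue))
    thread.start()
    for image in images:
        input_queue.put(image)
    input_queue.put(None)           # tell the worker that no more requests follow
    thread.join()
    return [output_queue.get() for _ in images]


results = run_master(["meter_001.jpg", "meter_002.jpg", "meter_003.jpg"])
print([r["image"] for r in results])
```

With a single worker, results come back in the order the requests were queued; with several workers the order would no longer be guaranteed, so each result would need to carry a request identifier.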

Research on hybrid music recommendation system using metadata of music tracks and playlists (음악과 플레이리스트의 메타데이터를 활용한 하이브리드 음악 추천 시스템에 관한 연구)

  • Hyun Tae Lee;Gyoo Gun Lim
    • Journal of Intelligence and Information Systems / v.29 no.3 / pp.145-165 / 2023
  • Recommendation systems play a significant role in relieving the difficulty of selecting information from the rapidly increasing amount of information brought about by the development of the Internet, and in efficiently displaying information that fits individual interests. In particular, without the help of a recommendation system, e-commerce and OTT companies cannot overcome the long-tail phenomenon, in which only popular products are consumed, as the number of products and contents rapidly increases. Therefore, research on recommendation systems is being actively conducted to overcome this phenomenon and to provide information or contents aligned with users' individual interests, so as to induce customers to consume a variety of products or contents. Usually, collaborative filtering, which utilizes users' historical behavioral data, shows better performance than content-based filtering, which utilizes users' preferred contents. However, collaborative filtering can suffer from the cold-start problem, which occurs when users' historical behavioral data is lacking. In this paper, a hybrid music recommendation system that can solve the cold-start problem is proposed, based on the playlist data of the Melon music streaming service provided by Kakao Arena for its music playlist continuation competition. The goal of this research is to use the music tracks included in the playlists, together with the metadata of music tracks and playlists, to predict the remaining music tracks when half or all of the tracks are masked. Therefore, two different recommendation procedures were conducted for the two situations. When music tracks are included in the playlist, LightFM is used to exploit the track lists of the playlists and the metadata of each music track.
Then, the result of the Item2Vec model, which uses vector embeddings of music tracks, tags, and titles for recommendation, is combined with the result of the LightFM model to create the final recommendation list. When no music tracks are available in a playlist and only its tags and title are available, recommendations are made by finding similar playlists based on playlist vectors, which are built by aggregating the FastText pre-trained embedding vectors of the tags and title of each playlist. As a result, the system not only resolves the cold-start problem but also achieves better performance than ALS, BPR, and Item2Vec by using the metadata of both music tracks and playlists. In addition, the LightFM model that uses only artist information as an item feature shows the best performance among the LightFM models using other item features of music tracks.
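
The cold-start branch described above (finding similar playlists from tag and title embeddings) can be sketched as follows. The abstract only says the embeddings are aggregated, so averaging is an assumption here, as are cosine similarity as the comparison measure and the toy two-dimensional embedding table, which stands in for real FastText vectors.

```python
import math


def playlist_vector(tokens, embeddings):
    """Average the embedding vectors of the tokens that have one."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None  # no known token, so no vector can be built
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))


def most_similar_playlist(query_tokens, playlists, embeddings):
    """Return the ID of the playlist whose vector is closest to the query."""
    query = playlist_vector(query_tokens, embeddings)
    if query is None:
        return None
    best_id, best_score = None, -2.0  # cosine similarity is never below -1
    for playlist_id, tokens in playlists.items():
        vector = playlist_vector(tokens, embeddings)
        if vector is None:
            continue
        score = cosine_similarity(query, vector)
        if score > best_score:
            best_id, best_score = playlist_id, score
    return best_id


# Toy embedding table and playlists (hypothetical values, not FastText output).
toy_embeddings = {"rock": [1.0, 0.0], "metal": [0.9, 0.1],
                  "jazz": [0.0, 1.0], "piano": [0.1, 0.9]}
toy_playlists = {"p1": ["rock", "metal"], "p2": ["jazz", "piano"]}
print(most_similar_playlist(["metal"], toy_playlists, toy_embeddings))  # p1
```

Tracks from the most similar playlists could then be proposed for the query playlist even though it contains no tracks of its own.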

Segmentation and Visualization of Human Anatomy using Medical Imagery (의료영상을 이용한 인체장기의 분할 및 시각화)

  • Lee, Joon-Ku;Kim, Yang-Mo;Kim, Do-Yeon
    • The Journal of the Korea institute of electronic communication sciences / v.8 no.1 / pp.191-197 / 2013
  • Conventional CT and MRI scans produce cross-sectional slices of the body that are viewed sequentially by radiologists, who must imagine or extrapolate the three-dimensional anatomy from these views. With sophisticated algorithms and high-performance computing, these cross-sections can be rendered as direct 3D representations of human anatomy. 2D medical image analysis is forced to rely on time-consuming, subjective, and error-prone manual techniques, such as slice tracing and region painting, to extract regions of interest. To overcome these drawbacks, 3D visualization combined with medical image processing is essential for extracting anatomical structures and making measurements. We used gray-level thresholding, region growing, contour following, and a deformable model to segment human organs, and used feature vectors from texture analysis to detect cancer. We used perspective projection and the marching cubes algorithm to render surfaces from volumetric MR and CT image data. The 3D visualization of human anatomy and segmented organs provides valuable benefits for radiation treatment planning, surgical planning, surgery simulation, image-guided surgery, and interventional imaging applications.
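
Of the segmentation techniques listed above, region growing is the simplest to illustrate. The sketch below is a generic textbook version, not the authors' implementation: it assumes a 2D grayscale slice stored as a list of lists, 4-connectivity, and a fixed gray-level tolerance measured against the seed pixel.

```python
from collections import deque


def region_grow(image, seed, tolerance):
    """Return the set of (row, col) pixels connected to `seed` whose gray
    level differs from the seed's by at most `tolerance` (4-connectivity)."""
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    frontier = deque([seed])
    while frontier:
        r, c = frontier.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if (inside and (nr, nc) not in region
                    and abs(image[nr][nc] - seed_value) <= tolerance):
                region.add((nr, nc))
                frontier.append((nr, nc))
    return region


# Toy 3x3 slice: a dark organ-like region next to brighter tissue.
toy_slice = [[10, 12, 90],
             [11, 95, 92],
             [10, 11, 50]]
organ = region_grow(toy_slice, seed=(0, 0), tolerance=5)
print(sorted(organ))
```

Running the same procedure slice by slice yields a stack of masks, which is the kind of volumetric input a surface-rendering step would consume.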

A Study on Real-time Tracking Method of Horizontal Face Position for Optimal 3D T-DMB Content Service (지상파 DMB 단말에서의 3D 컨텐츠 최적 서비스를 위한 경계 정보 기반 실시간 얼굴 수평 위치 추적 방법에 관한 연구)

  • Kang, Seong-Goo;Lee, Sang-Seop;Yi, June-Ho;Kim, Jung-Kyu
    • Journal of the Institute of Electronics Engineers of Korea SP / v.48 no.6 / pp.88-95 / 2011
  • An embedded mobile device generally has less computing power than a general-purpose computer because of its lower system specifications. Consequently, conventional face tracking and face detection methods, which require complex algorithms to achieve higher recognition rates, are unsuitable for a mobile environment that aims at real-time detection. On the other hand, by applying a real-time tracking and detection algorithm, a two-way interactive multimedia service between a user and a mobile device becomes possible, providing a far better quality of service than a one-way service. It is therefore necessary to develop a real-time face and eye tracking technique optimized for a mobile environment. For this reason, in this paper we propose a method of tracking the horizontal face position of a user on a T-DMB device to enhance the quality of 3D DMB content. The proposed method uses the orientation of edges to estimate the left and right boundaries of the face, and then uses color edge information to determine the final horizontal position and size of the face. The Sobel gradient vector is projected vertically to select candidate face boundaries, and a smoothing method and a peak-detection method are proposed for a precise decision. Because general face detection algorithms use multi-scale feature vectors, their detection time is too long in a mobile environment. The proposed algorithm, which uses a single-scale detection method, can detect the face faster than conventional face detection methods.
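
The projection, smoothing, and peak-detection steps described above can be sketched roughly as follows. This is a simplified illustration, not the proposed method itself: it works on a grayscale image stored as a list of lists, uses only the horizontal Sobel response, ignores edge orientation and color edge information, and the moving-average smoothing and the peak rule are assumptions.

```python
def sobel_x_projection(image):
    """Sum the absolute horizontal Sobel response down each column."""
    rows, cols = len(image), len(image[0])
    projection = [0.0] * cols
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = ((image[r - 1][c + 1] - image[r - 1][c - 1])
                  + 2 * (image[r][c + 1] - image[r][c - 1])
                  + (image[r + 1][c + 1] - image[r + 1][c - 1]))
            projection[c] += abs(gx)
    return projection


def smooth(values, radius=1):
    """Moving average; the window is truncated at the borders."""
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - radius): i + radius + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def find_peaks(values):
    """Indices that rise from the left neighbor and do not rise to the right."""
    return [i for i in range(1, len(values) - 1)
            if values[i] > values[i - 1] and values[i] >= values[i + 1]]


def face_boundaries(image):
    """Return (left, right) columns of the two strongest projection peaks,
    or None when fewer than two peaks are found."""
    smoothed = smooth(sobel_x_projection(image))
    peaks = sorted(find_peaks(smoothed), key=lambda i: smoothed[i], reverse=True)
    if len(peaks) < 2:
        return None
    return tuple(sorted(peaks[:2]))


# Toy 5x12 image: a bright "face" band in columns 3..8 on a dark background.
toy_image = [[10 if 3 <= c <= 8 else 0 for c in range(12)] for _ in range(5)]
print(face_boundaries(toy_image))  # (2, 8)
```

Because the image is scanned at a single scale and reduced to one 1D profile, the cost grows linearly with the number of pixels, which is in line with the speed argument made in the abstract.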

Estimation and Weighting of Sub-band Reliability for Multi-band Speech Recognition (다중대역 음성인식을 위한 부대역 신뢰도의 추정 및 가중)

  • 조훈영;지상문;오영환
    • The Journal of the Acoustical Society of Korea / v.21 no.6 / pp.552-558 / 2002
  • Recently, multi-band speech recognition, based on Fletcher's human speech recognition (HSR) model, has been intensively studied by many researchers. As a new automatic speech recognition (ASR) technique, multi-band speech recognition splits the frequency domain into several sub-bands and recognizes each sub-band independently. The likelihood scores of the sub-bands are weighted according to the sub-band reliabilities and recombined to make a final decision. This approach is known to be robust in noisy environments. When the noise is stationary, a sub-band SNR can be estimated using the noise information in non-speech intervals. However, if the noise is non-stationary, it is not feasible to obtain the sub-band SNR. This paper proposes inverse sub-band distance (ISD) weighting, in which a distance for each sub-band is calculated by stochastic matching of the input feature vectors and hidden Markov models. The inverse distance is used as the sub-band weight. Experiments on 1500-1800 Hz band-limited white noise and classical guitar sound revealed that the proposed method could represent the sub-band reliability effectively and improve performance under both stationary and non-stationary band-limited noise environments.
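
The weighting idea described above can be illustrated with a small sketch. The abstract does not give the exact formula, so several points are assumptions: the sub-band distances are taken as given (the stochastic matching against the HMMs is not reproduced), the weights are normalized to sum to one, and the sub-band scores are combined as a weighted sum of log-likelihoods.

```python
def inverse_distance_weights(distances, eps=1e-9):
    """Weight each sub-band by the inverse of its distance, normalized to 1.

    A small `eps` guards against division by zero for a perfect match.
    """
    inverse = [1.0 / (d + eps) for d in distances]
    total = sum(inverse)
    return [w / total for w in inverse]


def combine_scores(log_likelihoods, distances):
    """Combine per-band scores into one score per candidate word.

    `log_likelihoods[b][k]` is the score of candidate word k in sub-band b.
    """
    weights = inverse_distance_weights(distances)
    n_words = len(log_likelihoods[0])
    return [sum(w * band[k] for w, band in zip(weights, log_likelihoods))
            for k in range(n_words)]


# Two sub-bands, two candidate words (hypothetical numbers).
band_scores = [[-10.0, -20.0],   # clean band: small distance, trusted more
               [-30.0, -5.0]]    # noisy band: large distance, trusted less
band_distances = [1.0, 4.0]
combined = combine_scores(band_scores, band_distances)
best_word = max(range(len(combined)), key=lambda k: combined[k])
print(best_word)
```

In this toy case the noisy band prefers the second word, but its low weight lets the clean band decide, which is the behavior the weighting is meant to produce.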

Positioning Analysis for Branding in Hanwoo (한우 브랜드의 포지셔닝 분석)

  • Kim, Yun Ho;Lee, Na Ra;Rhee, Sang Young;Hwang, Seong Won
    • Journal of Agricultural Extension & Community Development / v.20 no.4 / pp.1181-1216 / 2013
  • This study was conducted to enhance the brand value of hanwoo (Korean native cattle) and to develop a brand positioning strategy that appeals to customers. The research proceeded as follows. First, cluster analysis based on consumers' demographic characteristics was conducted, segmenting the market into three types. Market A consisted of well-educated, high-income, young consumers. Market B consisted of well-educated, high-income, middle-aged consumers. Market C consisted of low-income, middle-aged consumers. Second, consumers' positioning maps were analyzed based on perception data covering the functional, emotional, and self-expressive benefits that consumers associate with beef products. The relative advantage of each brand and the competitive structure of each market segment were analyzed by combining each brand's positioning map with the feature vector map. Based on the results of the analysis, a positioning strategy was devised for each brand. The results show that Hoengseong hanwoo is competitive in all market segments. We choose market A as the target for Hoengseong hanwoo, because the brand is rated as high quality owing to its high-class image in terms of premium price, quality, and brand image. To improve its management, the Hoengseong hanwoo brand needs to make efforts to promote consumer awareness of its safety and reliability.

Recognition of Partially Occluded Binary Objects using Elastic Deformation Energy Measure (탄성변형에너지 측도를 이용한 부분적으로 가려진 이진 객체의 인식)

  • Moon, Young-In;Koo, Ja-Young
    • Journal of the Korea Society of Computer and Information / v.19 no.10 / pp.63-70 / 2014
  • The process of recognizing objects in binary images consists of image segmentation and pattern matching. If the binary objects in the image are assumed to be separated, global features such as area, perimeter length, or the ratio of the two can be used to recognize them. However, if this assumption does not hold, global features cannot be used, and local features such as points or line segments should be used instead. In this paper, points with large curvature along the perimeter are chosen as feature points, and pairs of points selected from them are used as local features. The similarity of two local features is defined using the elastic deformation energy required to make their lengths and the angles between the gradient vectors at their end points the same. A neighbour support value is defined and used for robust recognition of partially occluded binary objects. An experiment on the Kimia-25 data showed that the proposed algorithm runs 4.5 times faster than the maximum clique algorithm at the same recognition rate.

A Study on the Lower Body Muscle Strengthening System Using Kinect Sensor (Kinect 센서를 활용하는 노인 하체 근력 강화 시스템 연구)

  • Lee, Won-hee;Kang, Bo-yun;Kim, Yoon-jung;Kim, Hyun-kyung;Park, Jung Kyu;Park, Su E
    • Journal of the Korea Institute of Information and Communication Engineering / v.21 no.11 / pp.2095-2102 / 2017
  • In this paper, we implement home training content for the elderly that provides an individual exercise prescription according to the user's athletic ability and a personalized program for each elderly user. Health promotion is essential for addressing the short healthy life expectancy of senior citizens in an aging population. In particular, lower body strengthening exercise is crucial for preventing falls and thereby reducing fall-related deaths among senior citizens. The game model aims at home training content that lets the elderly feel as if they are out for a walk and exercising in a natural environment. To achieve this, the system uses a specific bone model provided by the Kinect sensor to generate feature vectors and recognizes the user's movements and motions. A recognition test using the Kinect sensor showed a recognition rate of about 80 to 97%.