A deep learning-based approach for feeding behavior recognition of weanling pigs

  • Kim, MinJu (Centre for Nutrition and Food Sciences, Queensland Alliance for Agriculture and Food Innovation, The University of Queensland) ;
  • Choi, YoHan (Swine Division, National Institute of Animal Science, Rural Development Administration) ;
  • Lee, Jeong-nam (Interdisciplinary Graduate Program for BIT Medical Convergence, Kangwon National University) ;
  • Sa, SooJin (Swine Division, National Institute of Animal Science, Rural Development Administration) ;
  • Cho, Hyun-chong (Dept. of Electronics Engineering and Interdisciplinary Graduate Program for BIT Medical Convergence, Kangwon National University)
  • Received : 2021.11.11
  • Accepted : 2021.11.16
  • Published : 2021.11.30

Abstract

Feeding is the most important behavior that represents the health and welfare of weanling pigs. The early detection of feed refusal is crucial for the control of disease in the initial stages and the detection of empty feeders for adding feed in a timely manner. This paper proposes a real-time technique for the detection and recognition of small pigs using a deep-learning-based method. The proposed model focuses on detecting pigs on a feeder in a feeding position. Conventional methods detect pigs and then classify them into different behavior gestures. In contrast, in the proposed method, these two tasks are combined into a single process to detect only feeding behavior to increase the speed of detection. Considering the significant differences between pig behaviors at different sizes, adaptive adjustments are introduced into a you-only-look-once (YOLO) model, including an angle optimization strategy between the head and body for detecting a head in a feeder. According to experimental results, this method can detect the feeding behavior of pigs and screen non-feeding positions with 95.66%, 94.22%, and 96.56% average precision (AP) at an intersection over union (IoU) threshold of 0.5 for YOLOv3, YOLOv4, and YOLOv3 with an additional layer and the proposed activation function, respectively. Drinking behavior was detected with 86.86%, 89.16%, and 86.41% AP at a 0.5 IoU threshold for YOLOv3, YOLOv4, and YOLOv3 with the proposed activation function, respectively. In terms of detection and classification, the results of our study demonstrate that the proposed method yields higher precision and recall compared to conventional methods.

INTRODUCTION

Feeding behavior represents the welfare and health status of pigs, so it provides useful information for evaluating economic implications [1–3]. Several studies have reported that pig feeding behavior can be affected by diseases [4,5], environmental factors [1,3], and management systems [2,6]. Providing adequate water and feed increases the performance of farm animals and the frequency of use of feeders and drinkers. Additionally, the amounts of water and feed intake are determinant factors representing health status, environmental changes, and feed delivery interruptions [7]. For instance, a sudden decrease in water consumption (20% to 30%) is an indicator of swine influenza outbreaks [8]. Currently, onsite and offsite visual monitoring is the most common procedure for evaluating pig behavior. In terms of accuracy and practicality, manual observation is a simple way to analyze the behavior of animals on a small scale. However, manual detection is often time-consuming and laborious on a large scale, particularly when there are several behaviors to be detected. Therefore, there is a need to develop automatic detection methods capable of handling large numbers of animals.

Several researchers have previously investigated computer-based systems for monitoring animal behavior based on image analysis [9–11]. Image processing is a non-invasive and practical technique for evaluating pig behavior over a long period of time. The evaluation of feeding and drinking behaviors has mainly been studied for large or restricted animals such as sows, finishing pigs, and cattle [4,11,12] because the recognition of large animals is easier than that of small and active animals. Experiments targeting the behavior of pigs have used body-part-based identification [13,14] or whole-body-based identification [11,15]. It has been reported that in both body-part-based identification and whole-body-based identification, tracking algorithms for pigs begin by designing support maps to recognize pig segments in captured images and then construct a 5D Gaussian model to detect individual pigs in different positions [16]. Kashiha et al. [17] reported that a faster region-based convolutional neural network (CNN) pig detector is preferred for pig segmentation when pigs cluster together. Alameer et al. [18] used a GoogLeNet-based deep learning method to identify feeding pigs without relying on pig tracking, which can distinguish between feeding behavior and non-feeding behavior in pigs. Another study was conducted based on the CNN architecture Xception for targeting spatiotemporal features to detect the feeding positions of group-housed pigs [19]. Although several different machine learning systems have been tested for detecting behavioral factors in pigs, there is still a lack of reports regarding their accuracy for evaluating feeding frequency in group-housed pigs. In this study, we analyzed a pig image dataset from a real farm. Real farm image acquisition is influenced by parameters such as distance, picture resolution, and low-quality illumination. Therefore, the goal of this study was to develop a you-only-look-once (YOLO)-based method to classify pig image datasets to predict the frequency and duration of feeding behaviors using a suitable classifier for processing data.

MATERIALS AND METHODS

This study was approved by the IACUC of the Rural Development Administration (No. NIAS-2021-538). In the collected pig cage data, bounding-box categories were defined for pigs that drink water and pigs that eat feed. The labeled data were divided into training data and testing data, and a detection model was trained based on the YOLO algorithm. The results of the trained model were evaluated using classification performance indicators.

Data collection and the number of data

Videos were recorded on a JSK swine commercial farm (Busan, South Korea). Group-based weanling pigs were considered in this study. The average body weight of the pigs was 6.3 ± 1.4 kg. The weaned pigs were crossbred from Landrace × Yorkshire and Duroc composite male lines. The pigs were solid white. Each pen was 3.55 m × 2.44 m in size and contained two feeder types, namely a round feeder (54 cm diameter) and trough feeder (1.8 m length), as well as a nipple drinker. Fig. 1 presents the locations and sizes of the feed bins and the water supplies installed in the pig cages. A camera was installed at a height of 1.88 m from the bottom of the pig cage. The camera was a Sony HDR-AS50 with a resolution of 1920 × 1080 pixels at 30 fps. Four pig cages were monitored from 10 AM to 4 PM. Three of the four pig cages were considered as training data and the remaining cage was considered as testing data. The videos were converted into still images by keeping every 20th frame. A total of 139,040 images were obtained and the number of images labeled for drinking or feeding pigs was 9,880. There were 7,273 images in the training data and 2,607 images in the testing data. In the training data, there were 1,906 pigs that drank water and 20,847 pigs that ate feed. In the testing data, 1,064 pigs drank water and 9,536 pigs ate feed. The data are summarized in Table 1. As shown in Fig. 2, water supply facilities and feed barrels combined with pig heads define the bounding boxes used for training. The pigs have two water supply facilities and one feed container. In the water supply facilities, only one pig can be supplied at a time, whereas the feeder can supply up to 10 pigs at a time. Therefore, up to two pigs can drink water and up to 10 pigs can eat feed simultaneously.
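The frame-sampling step described above can be sketched in a few lines. This is only an illustration, assuming frames arrive as an ordinary Python iterable; the function name `keep_every_nth` and the stand-in frame indices are hypothetical and do not come from the study's pipeline.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def keep_every_nth(frames: Iterable[T], step: int = 20) -> Iterator[T]:
    """Return an iterator over frames 0, step, 2*step, ... of the input."""
    if step < 1:
        raise ValueError("step must be a positive integer")
    return islice(frames, 0, None, step)


# Hypothetical stand-in for decoded video frames: five seconds at 30 fps.
frame_indices = range(5 * 30)
kept = list(keep_every_nth(frame_indices, step=20))
print(kept)  # [0, 20, 40, 60, 80, 100, 120, 140]

# At 30 fps, keeping every 20th frame gives 30 / 20 = 1.5 images per second.
effective_rate = 30 / 20
print(effective_rate)  # 1.5
```

Sampling at a fixed stride keeps the still images evenly spaced in time, which reduces near-duplicate consecutive frames in the labeled set.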

Object detection algorithm

A YOLO-based detection algorithm that is advantageous for the real-time monitoring of pig behavior was adopted. Three different algorithms were tested: YOLOv4, YOLOv3, and YOLOv3 with an added detection layer and modified activation function.

Fig. 1. Haman piggy farm’s piggy cages.

Table 1. The number of pig’s data

Fig. 2. Pig data with labels.

Real time object detection YOLOv3

By using Darknet53 as a backbone network to extract features, continuous 3 × 3 convolutions, 1 × 1 convolutions, and shortcut layers can be used to construct deep networks and prevent overfitting. A feature map extracted by Darknet53 passes through a feature pyramid network (FPN). The FPN can learn from feature maps of three sizes using downsampling and upsampling. This is efficient because feature maps of various sizes can be used for learning one sample. Additionally, to reinforce the data lost during upsampling, each map can be combined with another feature map of the same size before downsampling.

The performance of YOLOv3 was analyzed based on image size. The image sizes (same width and height) were 320, 416, and 608 pixels, and the processing times were 22, 29, and 51 ms, respectively, resulting in mean average precision (mAP) values of 51.5%, 55.3%, and 57.9%, respectively. In the YOLOv3 paper, the processing time of the FPN-FRCN network, which achieved the highest mAP of 59.1%, was 172 ms. Compared to the slowest YOLOv3 network (YOLOv3-608), its mAP is higher by 1.2 percentage points, but its processing time is more than three times as long [20].
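The trade-off quoted above can be checked with simple arithmetic. The sketch below uses only the four figures cited from the YOLOv3 paper [20]; the variable names are illustrative.

```python
# Figures quoted above from the YOLOv3 paper [20].
yolov3_608 = {"map": 57.9, "ms": 51}
fpn_frcn = {"map": 59.1, "ms": 172}

# Accuracy gap in percentage points and processing-time ratio.
map_gain_points = fpn_frcn["map"] - yolov3_608["map"]
slowdown = fpn_frcn["ms"] / yolov3_608["ms"]

print(f"mAP gain: {map_gain_points:.1f} points")  # mAP gain: 1.2 points
print(f"slowdown: {slowdown:.2f}x")               # slowdown: 3.37x
```

A gain of about one point of mAP at more than three times the latency is why a YOLO-family detector is attractive for real-time monitoring.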

Faster and more accurate YOLOv4

Unlike the previous YOLOv3 network, YOLOv4 can be trained using a single GTX 1080TI GPU and has improved accuracy. Compared to YOLOv3, the YOLOv4 network structure improves performance by using bag of freebies (BoF) and bag of specials (BoS) components. When comparing the performances of YOLOv3 and YOLOv4, YOLOv4 improves the processing speed by 8 fps and the mAP by 12.4% [21]. BoF represents a group of methods that increase performance without increasing inference cost. The first is data augmentation using tools such as CutOut, which randomly sets the pixel values in a specific part of an image to zero, and CutMix, which mixes a specified part of an image with other random images. However, in this study, when this method was used, the loss did not converge but diverged. Therefore, the image augmentation and mosaic methods included in the default YOLOv4 model were not used. Additionally, as a strategy to prevent overfitting during learning, a method for randomly disconnecting layers or connecting the outputs of previous layers to subsequent layers was adopted during training. The methods used in this study were DropOut, DropPath, Spatial DropOut, and DropBlock. Additionally, a loss function is used to adjust predicted bounding boxes to be more similar to ground-truth bounding boxes. The loss functions used were generalized intersection over union (GIoU), complete IoU, and distance IoU (DIoU).
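To make the bounding-box regression losses named above more concrete, the following is a minimal sketch of GIoU only, assuming axis-aligned boxes given as `(x_min, y_min, x_max, y_max)`. The helper names and the example coordinates are hypothetical; this is not the Darknet implementation used in the study.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def area(box: Box) -> float:
    """Area of a box; degenerate or inverted boxes count as zero."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def giou(a: Box, b: Box) -> float:
    """Generalized IoU: IoU minus the share of the enclosing box not covered."""
    inter = area((max(a[0], b[0]), max(a[1], b[1]),
                  min(a[2], b[2]), min(a[3], b[3])))
    union = area(a) + area(b) - inter
    hull = area((min(a[0], b[0]), min(a[1], b[1]),
                 max(a[2], b[2]), max(a[3], b[3])))
    if union == 0.0 or hull == 0.0:
        return 0.0
    return inter / union - (hull - union) / hull


def giou_loss(pred: Box, truth: Box) -> float:
    """Loss is 0 for a perfect match and grows as boxes move apart."""
    return 1.0 - giou(pred, truth)


same = giou_loss((0, 0, 1, 1), (0, 0, 1, 1))
apart = giou_loss((0, 0, 1, 1), (2, 0, 3, 1))
print(same, round(apart, 4))  # 0.0 1.3333
```

Unlike a plain IoU loss, the GIoU term still changes when two boxes do not overlap at all, so a badly placed prediction continues to receive a useful training signal.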

BoS represents a group of methods that increase performance at the expense of a small increase in inference cost. BoS uses six techniques: enhancement of receptive fields, feature integration, activation functions, attention modules, normalization, and post-processing. To enhance the receptive field, spatial pyramid pooling (SPP) and atrous SPP were adopted. For feature integration, skip connections and an FPN were used. The rectified linear unit (ReLU) series, Swish, and Mish were used as activation functions. The attention module uses a squeeze-and-excitation module and a spatial attention module, which increases the inference cost slightly, but improves performance. For normalization, batch normalization, filter response normalization, and cross-iteration batch normalization are used to stabilize learning and prevent overfitting. Finally, for post-processing, non-maximum suppression (NMS), soft NMS, and DIoU NMS, which keep a single representative among the multiple overlapping bounding boxes predicted for one object, are applied [22].

Fig. 3 presents the learning structure of the YOLOv4 model. A feature map is extracted from the backbone using the CSPDarknet53 network proposed by Bochkovskiy et al. [22]. The neck plays the role of connecting the extracted feature map to the detection layer. Additionally, YOLOv4 is a one-stage detector: the head determines the location of an object and classifies it in a single pass [22].

Fig. 3. Learning structure of YOLOv4. YOLO, you-only-look-once.

Additional detection layers and changed activation function based on YOLOv3

The proposed algorithm changes the detection layer and activation function of YOLOv3. YOLOv3 learns and detects at three feature-map sizes through downsampling, but the proposed algorithm learns and detects at a total of four sizes by adding an additional downsampling layer. This is a more efficient learning method because feature maps can be extracted from diverse sizes through one learning process. The activation function was replaced with the Mish function. The original activation function was the leaky ReLU function, which leads to poor connectivity to the output because it is not smooth at the point where the input is zero. In contrast, the Mish function yields a smooth curve where the input is zero, so it is possible to deliver a stable value to the next layer input [23]. Fig. 4 summarizes the structures of YOLOv3, YOLOv4, and YOLOv3 with the proposed modifications.
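The difference between the two activation functions at zero can be illustrated numerically. The sketch below uses the standard definition Mish(x) = x · tanh(softplus(x)); the leaky ReLU slope of 0.1 and the helper names are assumptions made for illustration, not values taken from the study.

```python
import math


def leaky_relu(x: float, slope: float = 0.1) -> float:
    """Leaky ReLU; the negative-side slope of 0.1 is an assumed value."""
    return x if x > 0 else slope * x


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


def one_sided_slopes(f, h: float = 1e-4):
    """Finite-difference slopes just left and just right of zero."""
    left = (f(0.0) - f(-h)) / h
    right = (f(h) - f(0.0)) / h
    return left, right


relu_left, relu_right = one_sided_slopes(leaky_relu)
mish_left, mish_right = one_sided_slopes(mish)
print(f"leaky ReLU slopes around 0: {relu_left:.3f}, {relu_right:.3f}")
print(f"Mish slopes around 0:       {mish_left:.3f}, {mish_right:.3f}")
```

The leaky ReLU slope jumps from 0.1 to 1 at the origin, whereas the two Mish slopes agree (both close to 0.6), which is the smoothness property the text refers to.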

Fig. 4. Structures of (A) YOLOv3, (B) YOLOv4, and (C) YOLOv3 with an additional detection layer.

Evaluation criteria

To evaluate the results of pig behavior detection, classification performance indicators and mAP were adopted. The classification performance indicators are the precision, recall, and F1-Score, and mAP uses an IoU threshold value to determine results.

Precision represents the proportion of true positives (TP) among all positively predicted samples, that is, true positives plus false positives (FP), as shown in the following equation:

\(\text { Precision }=\frac{T P}{T P+F P}\)

Recall represents the proportion of true positives among all positive samples in the dataset, that is, true positives plus false negatives (FN), and is expressed by the following equation:

\(\text { Recall }=\frac{T P}{T P+F N}\)

The F1-Score is the harmonic mean of precision and recall, which gives more weight to the lower of the two values. This measure therefore indicates reduced performance when the difference between precision and recall is large. The IoU represents the extent to which the ground truth overlaps the predicted bounding box for object detection, as shown below.

\(\text { IoU }=\frac{\text { Area of Overlap }}{\text { Area of Union }}\)
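The four indicators defined above can be computed as in the following sketch. The counts and box coordinates are hypothetical values chosen for illustration, not results from this study, and boxes are assumed to be axis-aligned `(x_min, y_min, x_max, y_max)` tuples.

```python
def precision(tp: int, fp: int) -> float:
    """TP / (TP + FP); defined as 0 when nothing was predicted positive."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    """TP / (TP + FN); defined as 0 when there are no positives."""
    return tp / (tp + fn) if tp + fn else 0.0


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


def iou(a, b) -> float:
    """Area of overlap divided by area of union for two boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0


# Hypothetical counts: 90 correct detections, 5 false alarms, 15 misses.
p = precision(tp=90, fp=5)
r = recall(tp=90, fn=15)
f1 = f1_score(p, r)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
# precision=0.947 recall=0.857 F1=0.900

# Hypothetical boxes shifted by half their width: IoU is 1/3, below 0.5.
overlap = iou((0, 0, 10, 10), (5, 0, 15, 10))
print(f"IoU={overlap:.3f}")  # IoU=0.333
```

In this example the F1-Score (0.900) sits closer to the lower recall value than the arithmetic mean of the two would, which is the weighting effect described in the text.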

The mAP uses the IoU as a threshold: a predicted bounding box counts as a true positive only if its IoU with a ground-truth box is above the threshold. The predicted bounding boxes are sorted in descending order of confidence to draw a precision-recall curve. The area under the curve for one class is the AP, and the mean of the AP values over all classes is the mAP. mAP is an indicator of both identification and classification performance because the IoU, which represents location accuracy, and the precision-recall curve, which represents classification accuracy, are both considered.
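As a rough illustration of how an AP value is obtained from a precision-recall curve, the sketch below follows the common PASCAL VOC-style procedure: detections are ranked by confidence, precision is made non-increasing, and the area under the curve is summed. The exact interpolation used in this study is not stated, so this scheme, the function name, and the example detections are assumptions.

```python
from typing import List, Tuple


def average_precision(detections: List[Tuple[float, bool]],
                      num_ground_truth: int) -> float:
    """AP for one class.

    detections: (confidence, is_true_positive) pairs, where the flag is
    already decided by comparing IoU against the chosen threshold.
    """
    if num_ground_truth == 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)

    tp = fp = 0
    recalls, precisions = [], []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))

    # Replace each precision by the best precision at equal or higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum rectangle areas under the resulting step curve.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Hypothetical class with 4 ground-truth pigs and 4 detections.
example = [(0.9, True), (0.8, False), (0.7, True), (0.6, True)]
ap_feeding = average_precision(example, num_ground_truth=4)
print(ap_feeding)  # 0.625

# mAP is the mean of the per-class AP values (second value is made up).
ap_drinking = 0.5
mean_ap = (ap_feeding + ap_drinking) / 2
```

Raising the IoU threshold turns loosely localized detections from true positives into false positives, which lowers both precision and recall and therefore the AP.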

RESULTS AND DISCUSSION

Because the health status of pigs can be determined based on their intake of feed and water, it is important to observe pigs continuously and check these intake levels. However, because humans cannot watch animals around the clock, technology for evaluating pig behavior based on recorded video is required. In this study, pig behavior was evaluated using YOLO, which is an object detection algorithm. Two behaviors were detected: drinking water and eating feed. A detection was counted as the behavior of a given class if the predicted bounding box overlapped by more than 50% with the relevant ground-truth bounding box. The networks used in this study were YOLOv3, YOLOv4, and a network in which additional layers and the Mish function were applied to YOLOv3. YOLOv3 uses the smallest amount of computing resources among the three networks and requires the smallest amount of time to learn, but its performance is lower than that of the other two methods. YOLOv4 provides the best performance and fastest detection speed. However, it also uses the most computing resources. The modified YOLOv3 model incorporates an additional detection layer, so it has a longer detection time than the other networks, but it can detect pig locations better than YOLOv3 and requires fewer computing resources than YOLOv4. Additionally, the Mish function used in the modified network is a more complex activation function than the leaky ReLU function used in YOLOv3, so it uses more computing resources, but it also facilitates information flow inside the network and improves normalization performance to enhance feature extraction.

The considered IoU_threshold values were 0.5 and 0.6. In Table 2, when IoU_threshold is 0.5, the mAP values are greater than 90%. When IoU_threshold is 0.6, they fall to 73% to 77%. Additionally, the average IoU for each class is 0.72 for drinking water and 0.66 for eating feed. In Figs. 5 and 6, the actual number of pigs eating feed is large and the number of pigs drinking water is small. Additionally, the horizontal length of the water supply facility is similar to the size of a pig’s head and the boundaries of the water supply facility are clear, which facilitates a high IoU.

Table 2. Pig detection performance

IoU, intersection over union; YOLO, you-only-look-once; mAP, mean average precision.

Fig. 5. The predicted bounding box of the pig feeding (IoU_threshold=0.5). IoU, intersection over union.

Fig. 6. The predicted bounding box of the pig drinking (IoU_threshold=0.5).

Recall appears to be lower than precision in Table 2 because the large number of pigs eating feed makes overlap among feeding pigs more severe than among pigs drinking water. When pigs overlap, two or three feeding pigs are recognized as a single pig, which increases the false negative rate. Fig. 7 shows that most feed-eating pigs are detected, but one can see that the two pigs at the top of Figs. 7A and B are identified as a single pig. Fig. 7A presents pigs that overlap horizontally and Fig. 7B presents pigs that overlap vertically, as indicated by the red boxes. In contrast, for the drinking class, only one pig can drink water per water supply facility and its head is located in the facility, so it is clearly distinguished from pigs that are not drinking water. The cause of the increase in false negatives is the NMS used to remove duplicate detections of multiple bounding boxes on a single object. NMS keeps only the bounding box with the highest confidence among bounding boxes that overlap by more than 50% [22]. In Fig. 7A, the overlapping pigs are recognized as a single pig as a result of NMS. The lower the IoU of a predicted bounding box, the more it overlaps with surrounding pigs. Therefore, a high-IoU bounding box can reduce the chance of a false negative. However, as shown in Fig. 7B, if the pigs overlap vertically, then more than half of their bounding boxes will overlap, even if the IoU value is high.
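The merging effect described above can be reproduced with a minimal greedy NMS sketch. The box coordinates and scores are hypothetical, boxes are assumed to be axis-aligned `(x_min, y_min, x_max, y_max)` tuples, and the function is a simplified stand-in rather than the NMS code used by the trained networks.

```python
def iou(a, b) -> float:
    """Area of overlap divided by area of union for two boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0


def nms(detections, iou_threshold: float = 0.5):
    """Greedy NMS over (box, score) pairs.

    Boxes are visited from highest to lowest score; a box is dropped if it
    overlaps an already kept box by more than the threshold.
    """
    kept = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, kept_box) <= iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


# Two neighbouring pigs with loose boxes: IoU = 70 / 130, above 0.5.
loose = [((0, 0, 10, 10), 0.9), ((3, 0, 13, 10), 0.8)]
# The same two pigs with tight boxes: IoU = 10 / 110, well below 0.5.
tight = [((0, 0, 6, 10), 0.9), ((5, 0, 11, 10), 0.8)]

print(len(nms(loose)))  # 1  (second pig suppressed: a false negative)
print(len(nms(tight)))  # 2  (both pigs kept)
```

With loose boxes the second pig is removed as if it were a duplicate, whereas tighter boxes survive the same threshold. This matches the observation that horizontal overlap can be reduced by tighter localization, while strongly overlapping pigs remain a problem.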

Fig. 7. Overlapping-pigs detection (A) horizontally overlapped pig (B) vertically overlapped pig.

As shown in Table 2, YOLOv4 with an SPP structure yields the highest mAP of 91.69%. Overall, mAP drops sharply when the IoU_threshold is 0.6, but YOLOv4 exhibits the smallest drop. In Table 3, when the IoU_threshold is 0.5, all networks generally yield high performance, and when the IoU_threshold is 0.6, the feed-eating behavior mAP drops the most significantly for YOLOv3.

Table 3. Detection performance by pig behaviour

IoU, intersection over union; YOLO, you-only-look-once; AP, average precision.

Because YOLOv4 learns pig features by subdividing them using SPP, it predicts bounding boxes that fit the pigs more tightly, so its mAP drop for feed-eating behavior is smaller than those of the other algorithms [24]. In contrast, in the modified YOLOv3, many overlapping objects occur at the smallest feature size and the mAP drops sharply for feed-eating behavior. Pig behavior detection performs well if pigs do not overlap, but when overlap occurs, it is difficult to detect behaviors accurately because multiple pigs may be recognized as a single pig. Additional research is required to address this problem.

CONCLUSION

This study aimed to check and manage water and feed intake continuously to support pig health and weight gain. A decline in pig water and feed intake can be attributed to the sensory and organizational properties of feed, animal physiological conditions, breeding environment, and specification management. Therefore, it is possible to manage the health and weight of pigs by improving their environment to encourage or suppress intake through continuous monitoring. To detect pigs, YOLOv3, YOLOv4, and modified YOLOv3 models were adopted. When the IoU threshold was 0.5, the F1-Score and mAP were generally greater than 90%. Overall, YOLOv4 produced good results, but in terms of drinking water, the modified network that used an additional detection layer and the Mish function performed best. This indicates that pig detection performs best in an environment where pigs do not overlap. If a network adopts an SPP structure, horizontal overlap can be solved by predicting tight bounding boxes, but vertical overlap is difficult to solve. Therefore, if an additional detection layer is added to YOLOv3 to resolve overlapping pigs and instance segmentation is applied to a network with the Mish function, it is expected to yield high performance, even in pens containing many pigs. Because instance segmentation only extracts the pixels of objects inside the bounding boxes of detected objects, it is possible to learn from multiple objects. If we solve the failure to detect pig behaviors caused by overlapping pigs in the future, we will be able to confirm the exact amounts of water and feed intake of pigs. Accurate intake analysis can support efficient feed distribution, and both the need to improve the environment and signs of abnormal health conditions can be identified immediately. This will increase pig productivity and help combat future food shortages.

References

  1. Kim KH, Hosseindoust A, Ingale SL, Lee SH, Noh HS, Choi YH, et al. Effects of gestational housing on reproductive performance and behavior of sows with different backfat thickness. Asia-Australas J Anim Sci. 2016;29:142-8. https://doi.org/10.5713/ajas.14.0973
  2. Choi Y, Moturi J, Hosseindoust A, Kim M, Kim K, Lee J, et al. Night feeding in lactating sows is an essential management approach to decrease the detrimental impacts of heat stress. J Anim Sci Technol. 2019;61:333-9. https://doi.org/10.5187/jast.2019.61.6.333
  3. Hosseindoust AR, Lee SH, Kim JS, Choi YH, Kwon IK, Chae BJ. Productive performance of weanling piglets was improved by administration of a mixture of bacteriophages, targeted to control Coliforms and Clostridium spp. shedding in a challenging environment. J Anim Physiol Anim Nutr. 2017;101:e98-107. https://doi.org/10.1111/jpn.12567
  4. Miller AL, Dalton HA, Kanellos T, Kyriazakis I. How many pigs within a group need to be sick to lead to a diagnostic change in the group's behavior? J Anim Sci. 2019;97:1956-66. https://doi.org/10.1093/jas/skz083
  5. Hosseindoust AR, Lee SH, Kim JS, Choi YH, Noh HS, Lee JH, et al. Dietary bacteriophages as an alternative for zinc oxide or organic acids to control diarrhoea and improve the performance of weanling piglets. Vet Med. 2017;62:53-61. https://doi.org/10.17221/7/2016-VETMED
  6. Choi YH, Hosseindoust A, Kim JS, Lee SH, Kim MJ, Kumar A, et al. An overview of hourly rhythm of demand-feeding pattern by a controlled feeding system on productive performance of lactating sows during summer. Ital J Anim Sci. 2018;17:1001-9. https://doi.org/10.1080/1828051X.2018.1438214
  7. Nejad JG, Lohakare JD, West JW, Sung KI. Effects of water restriction after feeding during heat stress on nutrient digestibility, nitrogen balance, blood profile and characteristics in Corriedale ewes. Anim Feed Sci Technol. 2014;193:1-8. https://doi.org/10.1016/j.anifeedsci.2014.03.011
  8. Bernick K. Monitor water for health [Internet]. National hog farmer. 2007 [cited 2021 Aug 4]. https://www.nationalhogfarmer.com/health-diseases/monitor-water-health
  9. Chen C, Zhu W, Norton T. Behaviour recognition of pigs and cattle: journey from computer vision to deep learning. Comput Electron Agric. 2021;187:106255. https://doi.org/10.1016/j.compag.2021.106255
  10. Zhu W, Guo Y, Jiao P, Ma C, Chen C. Recognition and drinking behaviour analysis of individual pigs based on machine vision. Livest Sci. 2017;205:129-36. https://doi.org/10.1016/j.livsci.2017.09.003
  11. Yang Q, Xiao D, Lin S. Feeding behavior recognition for group-housed pigs with the Faster R-CNN. Comput Electron Agric. 2018;155:453-60. https://doi.org/10.1016/j.compag.2018.11.002
  12. Huang W, Zhu W, Ma C, Guo Y, Chen C. Identification of group-housed pigs based on gabor and local binary pattern features. Biosyst Eng. 2018;166:90-100. https://doi.org/10.1016/j.biosystemseng.2017.11.007
  13. Marsot M, Mei J, Shan X, Ye L, Feng P, Yan X, et al. An adaptive pig face recognition approach using convolutional neural networks. Comput Electron Agric. 2020;173:105386. https://doi.org/10.1016/j.compag.2020.105386
  14. Hansen MF, Smith ML, Smith LN, Salter MG, Baxter EM, Farish M, et al. Towards on-farm pig face recognition using convolutional neural networks. Comput Ind. 2018;98:145-52. https://doi.org/10.1016/j.compind.2018.02.016
  15. Nasirahmadi A, Sturm B, Olsson AC, Jeppsson KH, Muller S, Edwards S, et al. Automatic scoring of lateral and sternal lying posture in grouped pigs using image processing and support vector machine. Comput Electron Agric. 2019;156:475-81. https://doi.org/10.1016/j.compag.2018.12.009
  16. Ahrendt P, Gregersen T, Karstoft H. Development of a real-time computer vision system for tracking loose-housed pigs. Comput Electron Agric. 2011;76:169-74. https://doi.org/10.1016/j.compag.2011.01.011
  17. Kashiha M, Bahr C, Haredasht SA, Ott S, Moons CPH, Niewold TA, et al. The automatic monitoring of pigs water use by cameras. Comput Electron Agric. 2013;90:164-9. https://doi.org/10.1016/j.compag.2012.09.015
  18. Alameer A, Kyriazakis I, Dalton HA, Miller AL, Bacardit J. Automatic recognition of feeding and foraging behaviour in pigs using deep learning. Biosyst Eng. 2020;197:91-104. https://doi.org/10.1016/j.biosystemseng.2020.06.013
  19. Chen C, Zhu W, Steibel J, Siegford J, Han J, Norton T. Recognition of feeding behaviour of pigs and determination of feeding time of each pig by a video-based deep learning method. Comput Electron Agric. 2020;176:105642. https://doi.org/10.1016/j.compag.2020.105642
  20. Redmon J, Farhadi A. Yolov3: an incremental improvement [Internet]. 2018 [cited 2021 Aug 4]. https://arxiv.org/abs/1804.02767
  21. Jiang Z, Zhao L, Li S, Jia Y. Real-time object detection method based on improved YOLOv4-tiny [Internet]. 2020 [cited 2020 Nov 9]. https://arxiv.org/abs/2011.04244
  22. Bochkovskiy A, Wang CY, Liao HYM. Yolov4: optimal speed and accuracy of object detection. 2020 [cited 2020 Apr 23]. https://arxiv.org/abs/2004.10934
  23. Chae J, Cho H. Identifying the mating posture of cattle using deep learning-based object detection with networks of various settings. J Electr Eng Technol. 2021;16:1685-92. https://doi.org/10.1007/s42835-021-00701-z
  24. He K, Zhang X, Ren S, Sun J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell. 2015;37:1904-16. https://doi.org/10.1109/TPAMI.2015.2389824