Hierarchical Flow-Based Anomaly Detection Model for Motor Gearbox Defect Detection

  • Younghwa Lee (Department of Information Technology and Media Engineering, The Graduate School of Nano Design Fusion, Seoul National University of Science and Technology) ;
  • Il-Sik Chang (Department of Information Technology and Media Engineering, The Graduate School of Nano Design Fusion, Seoul National University of Science and Technology) ;
  • Suseong Oh (IT Media Engineering Program, Department of IT and Media Engineering, Seoul National University of Science and Technology) ;
  • Youngjin Nam (IT Media Engineering Program, Department of IT and Media Engineering, Seoul National University of Science and Technology) ;
  • Youngteuk Chae (Advanced R&D Department, SMR Automotive Modules Korea Ltd) ;
  • Geonyoung Choi (Advanced R&D Department, SMR Automotive Modules Korea Ltd) ;
  • Gooman Park (Department of IT and Media Engineering, Seoul National University of Science and Technology)
  • Received : 2022.10.12
  • Accepted : 2023.06.05
  • Published : 2023.06.30

Abstract

In this paper, a motor gearbox fault-detection system based on a hierarchical flow-based model is proposed. The proposed system is used for the anomaly detection of an actuator module based on its operation sound. The proposed flow-based model, which is a generative model, learns by directly modeling the data distribution. Because the objective function is the likelihood of the input data, which is maximized, training is stable and the model is simple to use for anomaly detection. The operation sound of a car's side-view mirror motor, consisting of a folding signal and an unfolding signal, is converted into Mel-spectrogram images and used as training data in this experiment. The proposed system is composed of an encoder and a decoder. In the encoder, features extracted from the intermediate layers of a pretrained feature extractor are used as the decoder input data. The decoder uses this information by performing an interlayer cross-scale convolution operation. The experimental results indicate that the context information of various dimensions extracted from the interlayer hierarchical data improves the defect detection accuracy. This paper is notable because it uses acoustic data and a normalizing flow model to detect outliers based on the features of experimental data.

1. Introduction

Vehicles are made up of many different parts. If any of these components fail, the vehicle quality and performance will degrade. Defects in small parts may result in part failures, which, in turn, lead to significant financial loss, vehicle accidents, and even loss of lives. Therefore, the detection of defective parts is highly important. By exploiting the advances in artificial intelligence technology, various deep-learning-based defect detection technologies, which provide accurate and uniform defect detection, have been investigated. However, abnormal samples occur far less frequently than normal samples due to the nature of the manufacturing process. Also, the types of defects are diverse, and it is difficult to obtain training data. As a result, the detection problem is difficult to approach as a classification problem based on supervised learning. Consequently, the manufacturing industry uses anomaly detection (AD). AD can efficiently learn from imbalanced data to distinguish between normal and abnormal parts [1].

AD (or outlier detection (OD)) is the problem of identifying patterns in data that do not conform to an expected behavior [2]. An AD model is built from training data and identifies data whose features differ from those of the existing data, as opposed to mere noise. OD has been applied to various fields, such as the internet of things (IoT) [3-4], defect detection of industrial equipment, medical diagnosis [5-8], and abnormal image analysis (CCTV image) [9].

There are several research methods for AD. These include a network training method for reconstructing the characteristics of normal data [10], a one-class classification method [11-12], a feature matching method for distinguishing anomalies based on the feature distance or the probability distribution [13], and a method for directly predicting the probability values of test data using normalizing flow [14].

Among them, the AD technique, which uses normalizing flow, employs a generative flow model. By inversely transforming the continuous probability distribution function, the distribution of data can be obtained, and data loss becomes minimal. Furthermore, since the maximum likelihood of input data is used as the objective function, it is simple to use these data for AD.

In this study, we use training data obtained from the sound of a small-actuator module installed in an automobile side-view mirror. Currently, during the manufacturing process, tests are conducted in a noise-blocking booth by a person who listens directly to the sound, and defects are classified in this way. However, the difference between normal and defective operation sounds obtained from the small-actuator module is very small. Consequently, inspectors, even experienced ones, may not be able to distinguish these sounds consistently. The training data used in this study therefore have the drawback that the ground truth is not clear, because humans differ in how they distinguish normal from abnormal signals and classify them. In this study, the features of the training data are examined, and an outlier identification model based on these features is proposed. The operation sound of the small-actuator module is converted into a Mel-spectrogram image and used as training data. A generative-model-based OD system using normalizing flow is proposed.

The proposed system is composed of an encoder and a decoder. Image data of various sizes are used as an input to the pretrained feature extractor in the encoder. Then, from the middle layer of the feature extractor, features of varying sizes are hierarchically extracted and used as input data in the decoder.

The contributions of this study are as follows:

⦁ The training data used in this study are the operation sound signals of a small-actuator module of an automobile side-view mirror.

⦁ The characteristics of the sound data collected for the experiment are examined and preprocessed. An AD method, which is based on analytical features, is also proposed. This method uses a novel flow-based generative model.

⦁ In the layering stage of the feature extractor, the AD performance can be improved using the hierarchically extracted features and the size variation of the input image.

This paper is organized as follows. In section 2, related studies, such as the change of variables theorem, the concept of normalizing flow, and the flow-based generative model, are presented. The structure of the proposed model, which consists of an encoder and a decoder, is described in section 3. The characteristics and preprocessing methods used in the experiment for the operation sound data of the side-view mirror motor are described in section 4. In section 5, the performance evaluation results of the proposed AD model are presented. The experimental results, limitations, and future challenges of the proposed AD model are summarized in section 6.

2. Related studies

2.1 Change of Variables Theorem

Flow-based generative models estimate probability distributions based on the normalizing flow technique (described in subsection 2.2), which is performed by employing the change of variables. The change of variables theorem is a method used to simplify problems, where the original variable is replaced by a function of another variable or multiple variables.

\(\begin{aligned}f: \mathbb{R}^{d} \rightarrow \mathbb{R}^{d}, \quad Y=f(X), \quad X=f^{-1}(Y)\end{aligned}\)       (1)

\(\begin{aligned}P_{Y}(y)=P_{X}\left(f^{-1}(y)\right)\left|\operatorname{det} \frac{d f^{-1}}{d y}\right|=P_{X}(x)\left|\operatorname{det}\left(\frac{d f}{d x}\right)^{-1}\right|=P_{X}(x)\left|\operatorname{det} \frac{d f}{d x}\right|^{-1}\end{aligned}\)       (2)

\(\begin{aligned}\log P_{Y}(y)=\log P_{X}(x)-\log \left|\operatorname{det} \frac{d f}{d x}\right|\end{aligned}\)       (3)

When an invertible function \(f\) is defined as in Equation (1), the probability distribution of the random variable \(Y\) can be expressed in terms of the probability distribution of the random variable \(X\), as in Equation (2).

Taking the logarithm of both sides of Equation (2) gives the log-density form of the change of variables shown in Equation (3).

\(f\) is a function from \(\mathbb{R}^{d}\) to \(\mathbb{R}^{d}\), where \(\mathbb{R}^{d}\) denotes the d-dimensional real space. \(P_{X}\) represents the probability distribution of the random variable \(X\), and \(P_{Y}\) represents the probability distribution of the random variable \(Y\).
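
As a numerical sanity check of Equations (2) and (3), the following minimal sketch applies a one-dimensional affine map y = a·x + b to a standard normal X and compares the result with the known closed form Y ~ N(b, a²). The map, the constants `a` and `b`, and the function names are illustrative assumptions and are not part of the proposed model.

```python
import math

def log_normal_pdf(v, mean=0.0, std=1.0):
    """Log-density of a univariate Gaussian N(mean, std^2) evaluated at v."""
    return -0.5 * math.log(2.0 * math.pi * std ** 2) - (v - mean) ** 2 / (2.0 * std ** 2)

def log_p_y(y, a, b):
    """Equation (3) for the toy map f(x) = a*x + b with X ~ N(0, 1)."""
    x = (y - b) / a                   # x = f^{-1}(y)
    log_abs_det = math.log(abs(a))    # log|det df/dx| for a scalar affine map
    return log_normal_pdf(x) - log_abs_det

a, b, y = 2.0, 1.0, 0.3
via_change_of_variables = log_p_y(y, a, b)
closed_form = log_normal_pdf(y, mean=b, std=abs(a))  # Y ~ N(b, a^2)
```

The two values agree up to floating-point error, which is what Equation (3) predicts for this simple case.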

2.2 Normalizing Flow

The flow-based generative model employs the change of variables theorem: it applies invertible functions \(f_i(\cdot)\) to a latent variable \(z\) to model the random variable of the given data \(x\). In other words, complex probability distributions can be modeled by starting from a simple, easy-to-evaluate distribution of \(z\) and computing the likelihood through the inverse functions of \(f\).

\(\begin{aligned}x=z_{K}=f_{K} \circ f_{K-1} \circ \cdots \circ f_{1}\left(z_{0}\right)\end{aligned}\)       (4)

\(\begin{aligned}\log P(x)=\log P_{K}\left(z_{K}\right)=\log P_{0}\left(z_{0}\right)-\sum_{i=1}^{K} \log \left|\operatorname{det} \frac{\partial f_{i}}{\partial z_{i-1}}\right|\end{aligned}\)       (5)

As shown in Equation (4), each transformation of the latent variable, \(z_{i}=f_{i}\left(z_{i-1}\right)\), is called a flow. Equation (5) gives the log-likelihood obtained from the entire chain of variable transformations used to model the data \(x\); collectively, this chain is called a normalizing flow. Training the flow-based generative model maximizes \(\log P(x)\) in Equation (5), which requires calculating the Jacobian determinant of each transformation function \(f_{i}(\cdot)\). If the Jacobian determinant is complex, the computation load increases, and the computation speed decreases. Therefore, the invertible function \(f_{i}(\cdot)\) is modeled in a form whose Jacobian determinant is easy to obtain.

\(z_{K}\) is the latent variable after \(K\) transformations, \(f_{k}(\cdot)\) is the invertible transformation function when i = k, and \(P(x)\) is the probability distribution of the data \(x\).
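
To make Equation (5) concrete, the following minimal sketch chains three one-dimensional toy flows and evaluates \(\log P(x)\) by inverting them in reverse order while accumulating the log-determinants. The three maps, the standard normal base distribution, and all names are assumptions chosen for illustration; a trained model would use learned transformations instead.

```python
import math

# Each toy flow is (forward f_i, inverse f_i^{-1}, log|det df_i/dz| evaluated at z_{i-1}).
flows = [
    (lambda z: 2.0 * z,     lambda x: x / 2.0,     lambda z: math.log(2.0)),
    (lambda z: z + 0.5,     lambda x: x - 0.5,     lambda z: 0.0),
    (lambda z: math.exp(z), lambda x: math.log(x), lambda z: z),  # d/dz exp(z) = exp(z)
]

def log_standard_normal(z):
    """Log-density of the assumed base distribution P_0 = N(0, 1)."""
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * z * z

def log_prob(x):
    """Equation (5): invert f_K, ..., f_1 and subtract the summed log-determinants."""
    total_log_det = 0.0
    z = x
    for _forward, inverse, log_abs_det in reversed(flows):
        z = inverse(z)                   # z now holds z_{i-1}
        total_log_det += log_abs_det(z)  # Jacobian term evaluated at z_{i-1}
    return log_standard_normal(z) - total_log_det
```

Because every Jacobian term here is a scalar with a closed form, the cost of evaluating the likelihood is linear in the number of flows, which is the property the paragraph above asks of \(f_{i}(\cdot)\).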

2.3 Flow-based generative model

A flow-based deep-learning generative model directly models the probability distribution 𝑃(𝑥). That is, the objective function of the model, 𝐿(𝐷) can be obtained as a negative log-likelihood for the training data D, as shown below:

\(\begin{aligned}L(D)=-\frac{1}{|D|} \sum_{x \in D} \log P(x)\end{aligned}\)       (6)

To calculate Equation (6), the transformation used to obtain P(x) must be invertible, and it must be formulated so that its Jacobian determinant is easy to obtain. For this purpose, the additive coupling layer, which is the most basic form of the bipartite flow family of generative models, was proposed [15]. The neural network is built by stacking layers of the form shown in Equation (7). The part transformed by \(m(\cdot)\) is alternated from layer to layer so that all the dimensions of the data X can be modeled, where \(m(\cdot)\) is an arbitrarily complex function, and \(x_{1}\) and \(x_{2}\) are the values obtained by splitting the input X.

\(\begin{aligned}\left\{\begin{array}{l}y_{1}=x_{1} \\ y_{2}=x_{2}+m\left(x_{1}\right)\end{array}\right. \quad \Leftrightarrow \quad\left\{\begin{array}{l}x_{1}=y_{1} \\ x_{2}=y_{2}-m\left(y_{1}\right)\end{array}\right.\end{aligned}\)       (7)

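In [16], an affine coupling layer that divides the D-dimensional data X into \(x_{1}\) and \(x_{2}\) and transforms them was presented.

\(\begin{aligned}x_{1} \in \mathbb{R}^{d}, \quad x_{2} \in \mathbb{R}^{D-d}, \quad s, t: \mathbb{R}^{d} \rightarrow \mathbb{R}^{D-d}\end{aligned}\)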

\(\begin{aligned}\left\{\begin{array}{c}y_{1}=x_{1} \\ y_{2}=x_{2} \odot \exp \left(s\left(x_{1}\right)\right)+t\left(x_{1}\right)\end{array}\right.\end{aligned}\)       (8)

\(\begin{aligned}\left\{\begin{array}{c}x_{1}=y_{1} \\ x_{2}=\left(y_{2}-t\left(y_{1}\right)\right) \odot \exp \left(-s\left(y_{1}\right)\right)\end{array}\right.\end{aligned}\)       (9)

\(\begin{aligned}J=\left[\begin{array}{cc}I_{d} & 0 \\ \frac{\partial y_{2}}{\partial x_{1}} & \operatorname{diag}\left(\exp \left(s\left(x_{1}\right)\right)\right)\end{array}\right]\end{aligned}\)       (10)

s and t are learned using an artificial neural network, and the transformation function is defined by Equation (8). The Jacobian in Equation (10) is therefore a lower triangular matrix, and its determinant can be easily calculated as the product of its diagonal entries, that is, \(\exp \left(\sum_{j} s\left(x_{1}\right)_{j}\right)\).

s and t stand for scale and translation, respectively, and are functions from \(\mathbb{R}^{d}\) to \(\mathbb{R}^{D-d}\) (d < D). ⊙ represents the element-wise product, and \(I_{d}\) is the d-dimensional identity matrix.
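
The affine coupling layer of Equations (8) to (10) can be sketched in a few lines. In the example below, `s` and `t` are fixed toy functions standing in for the learned networks, and d = 2, D = 4 are arbitrary choices; these are assumptions for illustration only.

```python
import math

def s(x1):
    """Toy scale function R^d -> R^(D-d); a stand-in for a learned network."""
    return [0.5 * math.tanh(sum(x1))] * 2

def t(x1):
    """Toy translation function R^d -> R^(D-d); a stand-in for a learned network."""
    return [sum(x1), -sum(x1)]

def coupling_forward(x1, x2):
    """Equation (8); also returns log|det J| = sum_j s(x1)_j, read off Equation (10)."""
    scale, shift = s(x1), t(x1)
    y2 = [x * math.exp(sc) + sh for x, sc, sh in zip(x2, scale, shift)]
    return list(x1), y2, sum(scale)

def coupling_inverse(y1, y2):
    """Equation (9): exact inverse, computed without inverting s or t."""
    scale, shift = s(y1), t(y1)
    x2 = [(y - sh) * math.exp(-sc) for y, sc, sh in zip(y2, scale, shift)]
    return list(y1), x2

x1, x2 = [0.2, -0.1], [1.0, 3.0]
y1, y2, log_det = coupling_forward(x1, x2)
x1_rec, x2_rec = coupling_inverse(y1, y2)
```

Note that the inverse only evaluates `s` and `t` in the forward direction, so these functions can be arbitrarily complex without affecting invertibility.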

3. Hierarchical Anomaly Detection Model

As shown in Fig. 1, the proposed AD model consists of an encoder, which extracts features from the input data, and a decoder, which is the normalizing flow part.

Fig. 1. Overview of the proposed hierarchical anomaly detection model

3.1 Encoder

In this study, the feature extractor is a convolutional neural network (CNN)-based model pretrained on ImageNet [17]. The receptive field is a useful property of a CNN-based encoder. Since abnormal data come in a variety of sizes and shapes that are not standardized, they must be processed using a receptive field, which makes the CNN-based feature extractor important in this process. Feature maps close to the input layer have high resolution and contain low-level features such as edges, curves, and straight lines. On the other hand, feature maps obtained from deeper layers have low resolution and contain high-level features, such as textures, patterns, or parts of an object, from which a class can be inferred. As shown in Fig. 1, in this paper, features of varying sizes are extracted from the intermediate layers and used as the decoder input data. The images used as encoder input data have the original size and 1/2 of the original size. Each of these two images is fed into the feature extractor, and two features of different sizes are hierarchically extracted from its intermediate layers for each image. Using channel concatenation, the second feature extracted from the original-size image and the first feature extracted from the half-size image are combined into one feature. The decoder thus receives three features of varying sizes as inputs.

3.2 Decoder

In this paper, the cross-scale flow is used to process feature maps of different sizes that interact with each other. The cross-scale flow performs an extended affine transformation by employing the real-valued non-volume preserving (Real NVP) architecture [16], which is based on the coupling layer introduced in subsection 2.3. Each block internally divides each input tensor \(y_{in}^{(i)}\) equally into \(y_{in,1}^{(i)}\) and \(y_{in,2}^{(i)}\). It then calculates the outputs \(y_{out,1}^{(i)}\) and \(y_{out,2}^{(i)}\) by regressing element-wise scale and shift parameters that are applied successively to each part. As shown in Fig. 2, the element-wise scale and shift parameters are estimated by the subnetworks \(r_{1}\) and \(r_{2}\) of the block, whose outputs are divided into \([s_{1}, t_{1}]\) and \([s_{2}, t_{2}]\). These can be expressed together, as shown in Equation (11), where \(s_{1}\) and \(s_{2}\) are scale parameters, \(t_{1}\) and \(t_{2}\) are shift parameters, and ⊙ represents the element-wise product.

Fig. 2. Architecture of one block inside the cross-scale flow

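\(\begin{aligned}y_{out, 2} &=y_{in, 2} \odot e^{r_{1} s_{1}\left(y_{in, 1}\right)}+r_{1} t_{1}\left(y_{in, 1}\right) \\ y_{out, 1} &=y_{in, 1} \odot e^{r_{2} s_{2}\left(y_{out, 2}\right)}+r_{2} t_{2}\left(y_{out, 2}\right)\end{aligned}\)       (11)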
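
The two-step update in Equation (11) can be sketched as follows, under one reading of the notation: `r1` and `r2` are treated as subnetworks that return the pairs (s1, t1) and (s2, t2), and any additional learnable weighting that the symbols may denote is omitted. The toy functions below are hypothetical stand-ins, and the cross-scale convolutions that let feature maps of different sizes interact are not modeled; the sketch only shows that the block stays invertible when the second step depends on the output of the first.

```python
import math

def r1(v):
    """Hypothetical subnetwork r1: returns (s1, t1) computed from y_in,1."""
    return [0.3 * math.tanh(a) for a in v], [0.5 * a for a in v]

def r2(v):
    """Hypothetical subnetwork r2: returns (s2, t2) computed from y_out,2."""
    return [0.2 * math.tanh(a) for a in v], [a - 1.0 for a in v]

def block_forward(y_in):
    """Two successive affine couplings following the structure of Equation (11)."""
    half = len(y_in) // 2
    y_in1, y_in2 = y_in[:half], y_in[half:]
    s1, t1 = r1(y_in1)
    y_out2 = [y * math.exp(s) + t for y, s, t in zip(y_in2, s1, t1)]
    s2, t2 = r2(y_out2)
    y_out1 = [y * math.exp(s) + t for y, s, t in zip(y_in1, s2, t2)]
    log_det = sum(s1) + sum(s2)  # both steps have triangular Jacobians
    return y_out1 + y_out2, log_det

def block_inverse(y_out):
    """Undo the two couplings in reverse order."""
    half = len(y_out) // 2
    y_out1, y_out2 = y_out[:half], y_out[half:]
    s2, t2 = r2(y_out2)
    y_in1 = [(y - t) * math.exp(-s) for y, s, t in zip(y_out1, s2, t2)]
    s1, t1 = r1(y_in1)
    y_in2 = [(y - t) * math.exp(-s) for y, s, t in zip(y_out2, s1, t1)]
    return y_in1 + y_in2
```

Updating both halves within one block means that every element of the input is transformed at each block, rather than only every second block as in a single coupling layer.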

4. Data analysis

In this study, the operation sound signals of a car's side-view mirror motor were used as the experimental data. External environmental sound may hinder the analysis of the operation sound. Therefore, the recording was carried out in a recording studio equipped with soundproofing facilities to obtain accurate experimental data. The motor was fixed using a jig made to record the operation sound of the side-view mirror motor, and the acquired audio signal was saved in a wave format using a self-developed Python program. The recorded motors comprised normal motors and three types of abnormal motors. The motor sound has a total operation time of 7 seconds, including folding (3 seconds), unfolding (3 seconds), and an operation standby (1 second). The distinction between normal and defective motor gearboxes used in the experiment was made by an experienced inspector working on the actual production line, who listened directly to the operation sound. As shown in Table 1, a defect identification number was assigned according to the type of defect.

Table 1. Fault numbers of different noise types

Mel spectrograms of normal data and three types of abnormal data are shown in Fig. 3. Fig. 3 (a) shows the normal motor operation sound. Shaft misalignment noise, classified as Fault 1, is an abnormal rotational friction noise caused by the rotation shaft of a gear not being precisely aligned. It is heard as a 'buzzing' noise, as illustrated in Fig. 3 (b).

Fig. 3. Waveform and Mel-spectrogram of car’s side view mirror operation sound, (a) Normal sound; (b) Fault 1 sound; (c) Fault 2 sound; (d) Fault 3 sound.

Tooth scratch noise, classified as Fault 2, is a periodic abnormal friction noise caused by scratches in the slope of the toothed gear during rotation. Usually, it only happens once, when folding or unfolding the mirror. However, it can happen twice, when there are dents or ruptures in the gear teeth. It can be identified as a periodic 'tick-tock' pattern, resulting in a regular noise form, as illustrated in Fig. 3 (c).

Shaft thrust noise, also known as Fault 7, is a thrusting sound produced when a motor shaft hits the motor wall. A single 'click' noise is generated at the start of folding or unfolding, and it has a different pattern than the normal sound, as shown in Fig. 3 (d). The operation sound was saved in a wave file using a 44.1-kHz sampling rate and converted into a Mel-spectrogram image [18] using the audio library of PyTorch (torchaudio). The training data were divided into folding and unfolding operation sounds. The torchaudio options were set to n_fft = 4096, win_length = 2048, hop_length = 250, and n_mels = 700 to create an image file size of 162 × 375 pixels. The generated image files comprised 327 normal and 100 abnormal samples: 228 normal samples were used for training, and 99 normal and 100 abnormal samples were used for testing.
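
For orientation, the size of the raw Mel-spectrogram array implied by these settings can be estimated with the short calculation below. It assumes that each folding or unfolding clip is exactly 3 seconds long and that the short-time Fourier transform uses centered, padded frames (the torchaudio default); the helper name is hypothetical. How this array is rendered into the 162 × 375 pixel image file is not stated here and is not modeled.

```python
def stft_num_frames(num_samples, n_fft, hop_length, center=True):
    """Number of STFT frames; with center=True the signal is padded by n_fft // 2 on both sides."""
    padded = num_samples + 2 * (n_fft // 2) if center else num_samples
    return 1 + (padded - n_fft) // hop_length

sample_rate = 44_100        # Hz, from the text
clip_seconds = 3            # assumed length of one folding or unfolding clip
n_fft, hop_length, n_mels = 4096, 250, 700

num_samples = sample_rate * clip_seconds
num_frames = stft_num_frames(num_samples, n_fft, hop_length)
num_freq_bins = n_fft // 2 + 1             # linear-frequency bins before the Mel filter bank
mel_shape = (n_mels, num_frames)           # (Mel bands, time frames)
```

Under these assumptions the Mel filter bank maps 2049 linear-frequency bins to 700 Mel bands over 530 time frames per clip.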

5. Experiments

5.1 Experimental Setup

The data used in the experiment were resized to 256 × 384 without cropping. The area under the receiver operating characteristic curve (AUROC) was used as the evaluation metric. The cross-scale flow included four blocks; the optimizer used was Adam [19], the learning rate was set to \(2 \times 10^{-4}\), the weight decay was set to \(10^{-5}\), the batch size was set to 64, and the number of epochs was set to 240.
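
The AUROC can be read as the probability that a randomly chosen abnormal sample receives a higher anomaly score than a randomly chosen normal sample. The following minimal sketch computes it by direct pairwise comparison; the scores and labels are made-up values, and the use of a higher score for "more anomalous" (for example, a negative log-likelihood) is an assumption of this example.

```python
def auroc(scores, labels):
    """AUROC via pairwise comparison; label 1 = abnormal, 0 = normal; ties count as 1/2."""
    abnormal = [s for s, y in zip(scores, labels) if y == 1]
    normal = [s for s, y in zip(scores, labels) if y == 0]
    if not abnormal or not normal:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for a in abnormal:
        for n in normal:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(abnormal) * len(normal))

# Made-up anomaly scores for two normal and two abnormal samples.
example = auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

The pairwise form costs O(n·m) comparisons, which is negligible for a test set of 99 normal and 100 abnormal samples.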

The pretrained feature extractors used were ResNet18 [20], WideResNet50 [21], MobileNetV3_Large [22], and MobileNetV3_Small [22]. The experiment was conducted three times, and the results were averaged.

5.2 Experimental Results

5.2.1 Results according to the input data type

To examine the effect of the hierarchical data extraction method on the performance of the defect detection system in the proposed model, three types of input data were used in the experiments.

These data types can be classified, following the CS-Flow [23] method, as follows: a multiscale type, which uses three different sizes of the input image; a multifeature type, which hierarchically extracts features of different sizes from the intermediate layers when a single-size input passes through a feature extractor; and a hybrid type, which is a combination of the multiscale type and the multifeature type. The sizes used for the multiscale model were set to 256 × 384 pixels, 128 × 192 pixels, and 64 × 96 pixels. The feature sizes used in the multifeature model were (32, 48), (16, 24), and (8, 12). For the hybrid type, features of sizes (16, 24), (8, 12), and (4, 6) were extracted from the 256 × 384 pixel and 128 × 192 pixel images.
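
The feature sizes quoted above are consistent with extracting feature maps at fixed downsampling factors of the backbone. The sketch below reproduces them under the assumption that the extracted stages have strides of 8, 16, and 32 (multifeature) and 16 and 32 (hybrid); these strides are inferred from the reported sizes and are not stated in the text.

```python
def feature_size(image_hw, stride):
    """Spatial size of a feature map at a given total downsampling factor."""
    height, width = image_hw
    return (height // stride, width // stride)

full, half = (256, 384), (128, 192)

# Multifeature type: one input image, three stages (assumed strides 8, 16, 32).
multifeature = [feature_size(full, s) for s in (8, 16, 32)]

# Hybrid type: two input sizes, two stages each (assumed strides 16, 32).
full_feats = [feature_size(full, s) for s in (16, 32)]
half_feats = [feature_size(half, s) for s in (16, 32)]

# The second full-size feature and the first half-size feature share a spatial
# size, which is what allows them to be concatenated along the channel axis.
assert full_feats[1] == half_feats[0]
hybrid = [full_feats[0], full_feats[1], half_feats[1]]
```

This also shows why the decoder receives three inputs in the hybrid case even though four feature maps are extracted: the two middle maps are merged into one.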

As shown in Table 2, the experimental results showed that the model that uses the hybrid type input data (by employing ResNet18 as a backbone) exhibits the highest performance. We observed that the performance of this model can be improved by employing the multifeature type rather than the multiscale type. However, by employing only the multifeature type, some performance degradation was observed compared to the hybrid type.

Table 2. AUROC of our dataset for the proposed model according to the input data type and feature extractor

A histogram illustrating the distribution of testing data according to the type of data used (multifeature, multiscale, and hybrid) using ResNet18 as a backbone is shown in Fig. 4.

Fig. 4. Distribution histogram according to types of input data

Fig. 4 (a) (multiscale type) shows that both the abnormal and normal data are widely spread, resulting in many overlapping parts. On the other hand, Fig. 4 (b) (multifeature type) shows that both the normal and abnormal data are well concentrated on one side. Fig. 4 (c) (hybrid type) shows that the distribution of normal data is narrowly concentrated, and some overlapping sections are observed. However, the normal and abnormal data are more widely separated than those in Fig. 4 (a), indicating that various outliers have been identified well.

Reducing the size of the input data is similar to the pooling (subsampling) process in a standard CNN. As the number of parameters is reduced, the representational capacity of the network is also reduced. Thus, overfitting can be suppressed. However, this may reduce the network confidence. Therefore, when only the multiscale type is employed, the amount of information in the data decreases, and the network representation also decreases. Consequently, the data distribution diverges because the characteristics cannot be clearly expressed in the generated data. In contrast, when the multifeature type is employed, relatively few pooling processes are required because features are hierarchically selected from the intermediate layers. Therefore, various levels of low-dimensional and high-dimensional information are obtained, so that it is possible to learn both local and global information about the data. Thus, the data tend to be distributed in a convergent direction. However, in our data, there was little difference between normal and abnormal data, and with the multifeature type, many overlapping parts between the normal distribution and the defective distribution can be observed. Therefore, a multiscale factor was added to separate the distributions of normal and abnormal data: the input data were provided at different sizes, and features were extracted from the intermediate layers of the feature extractor. The hierarchically extracted data have features of various dimensions, and this process improved the accuracy of defect detection.

The effect of the number of blocks on the performance was also investigated. As shown in Table 3, the experimental performance increases by employing up to four blocks. After that, it decreases.

Table 3. AUROC of our dataset of the proposed model according to the number of blocks inside the cross-scale flow

5.2.2 Comparison results using MVTec AD

Table 4 shows the results obtained when applying MVTec AD [24] data to the proposed model. The encoder backbone network was set to ResNet18, and the input image size was set to 256 × 384, which is the same size used for our data images.

Table 4. Comparison of AUROC result values according to input data types applying MVTec AD to the proposed model

MVTec AD is a dataset acquired through real industrial sensors for AD research. It contains 15 classes, 3629 training images, and 1725 testing images. Since these data were created for research purposes, they also include images produced in artificial environments, such as controlled lighting environments. The abnormal images are created based on real environment scenarios, and the ground truth is clear. In contrast, our dataset contains fewer training samples than MVTec AD, and there is little difference between normal and abnormal data. Also, since the ground truth was determined by the inspector, defective samples may be mixed into the normal training dataset itself, or normal samples may be mixed into the abnormal testing dataset. Consequently, the performance on MVTec AD, which has clear ground truth, was higher than that on our data. The reason for the performance difference is the specificity of the dataset.

However, it can be confirmed that the proposed model works well not only on our data but also on a public dataset, although its performance is not state-of-the-art (SOTA).

5.2.3 Comparison results applying our dataset to other AD models

We applied our dataset to other AD models. AD models that had performed well on research datasets, such as MVTec AD, performed poorly on our dataset. This indicates that the proposed AD model operates well, reflecting the unique characteristics of our dataset. Table 5 shows the results.

Table 5. AUROC comparison of anomaly detection models using MVTec AD data and our data

6. Conclusion

A motor gearbox defect detection system employing a hierarchical flow-based AD model was proposed. The proposed model has a network structure, in which the feature extractor extracts features of different sizes hierarchically from the layering stage, uses them as inputs to the decoder, and learns the data distribution through cross-scale flow. In this study, the operation sound of a small-actuator motor of a car’s side-view mirror was used as an experimental dataset. We analyzed the training data and proposed an AD model based on the data properties. This study can be used in the design of a model based on data generated in an actual manufacturing process.

Because the performance of SOTA models has been verified using research datasets, it was difficult for them to achieve high performance when applied to actual production data. In contrast, we studied AD using data from a car's side-view mirror motors as experimental data. Therefore, our results can be useful in designing fault-detection models for production lines of small motors.

In a future study, we will investigate the tradeoff relationship between input data resize and hierarchical features in the proposed AD model. If a tradeoff relationship between the two is identified, it is expected that a high-accuracy model can be built.

References

  1. Josh Patterson and Adam Gibson, "Deep Learning: A Practitioner's Approach," O'Reilly Media, Inc. 2017
  2. Varun Chandola, Arindam Banerjee, and Vipin Kumar, "Anomaly Detection: A Survey," ACM Computing Surveys, September 2009.
  3. W. Zhang, W. Guo, X. Liu, Y. Liu, J. Zhou, B. Li, Q. Lu and S. Yang, "LSTM-Based Analysis of Industrial IoT Equipment," IEEE Access, Vol.6, pp.23551-23560, 2018. https://doi.org/10.1109/ACCESS.2018.2825538
  4. A. Gaddam, T. Wilkin, M. Angelova and J. Gaddam, "Detecting Sensor Faults, Anomalies and Outliers in Internet of Things: A Survey on the Challenges and Solutions," Electronics, Vol.9 No.3, 2020.
  5. D. Y. Oh and I. D. Yun, "Residual Error Based Anomaly Detection Using Auto-Encoder in SMD Machine Sound," Sensors, 18, 1308, 2018.
  6. K. Suefusa, T. Nishida, H. Purohit, R. Tanabe, T. Endo, and Y. Kawaguchi, "Anomalous Sound Detection Based on Interpolation Deep Neural Network," in Proc. IEEE ICASSP, 271-275, 2020.
  7. R. Lang, R. Lu, C. Zhao, H. Qin, and G. Liu, "Graph based semi-supervised one class support vector machine for detecting abnormal lung sounds," Applied Mathematics and Computation, Vol. 364, 124487, 2020.
  8. R. Banerjee and A. Ghose, "A Semi-Supervised Approach for Identifying Abnormal Heart Sounds Using Variational Autoencoder," in Proc. IEEE ICASSP, 1249-1253, 2020.
  9. D. Li, D. Chen, J. Goh, S.K. Ng, "Anomaly Detection with Generative Adversarial Networks for Multivariate Time Series," arXiv preprint arXiv:1809.04758, 2018.
  10. Raghavendra Chalapathy and Sanjay Chawla, "Deep Learning for Anomaly Detection: A Survey," CoRR, (abs/1901.03407), 2019.
  11. G. Pang, C. Shen, L. Cao, and A. V. D. Hengel, "Deep Learning for Anomaly Detection: A Review," ACM Computing Surveys (CSUR), 54, 1-38, 2021. https://doi.org/10.1145/3439950
  12. L. Ruff, R. A. Vandermeulen, L. Deecke, S. A. Siddiqui, A. Binder, E. Muller, and M. Kloft, "Deep One-Class Classification," in Proc. of Int. Conf. on machine learning (PMLR), 4393-4402, 2018.
  13. Liron Bergman, Niv Cohen, Yedid Hoshen, "Deep Nearest Neighbor Anomaly Detection," arXiv:2002.10445, 2020.
  14. Rezende, Danilo Jimenez, and Shakir Mohamed, "Variational Inference with Normalizing Flows," arXiv preprint arXiv:1505.05770, 2015.
  15. Dinh, Laurent, David Krueger, and Yoshua Bengio, "Nice: Non-linear Independent Components Estimation," in Proc. of ICLR 2015, 2015.
  16. Dinh, Laurent, Jascha Sohl-Dickstein, and Samy Bengio, "Density estimation using Real NVP," arXiv preprint arXiv:1605.08803, 2016.
  17. Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei, "ImageNet: A large-scale hierarchical image database," in Proc. of 2009 IEEE conference on computer vision and pattern recognition, pp. 248-255, 2009.
  18. Wongeun Oh, "Comparison of environmental sound classification performance of convolutional neural networks according to audio preprocessing methods," The Journal of the Acoustical Society of Korea, pp. 143-149, 31 May 2020. https://doi.org/10.7776/ASK.2020.39.3.143
  19. Diederik P Kingma and Jimmy Ba, "Adam: A Method for Stochastic Optimization," in Proc. of International Conference on Learning Representations (ICLR), 2015.
  20. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, "Deep Residual Learning for Image Recognition," in Proc. of the IEEE conference on computer vision and pattern recognition, pp. 770-778, 2016.
  21. S Zagoruyko and N Komodakis, "Wide Residual Networks," in Proc. of the British Machine Vision Conference (BMVC), pp. 87.1-87.12, 2016.
  22. Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, Quoc V. Le, and Hartwig Adam, "Searching for MobileNetV3," in Proc. of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 1314-1324, 2019.
  23. Marco Rudolph, Tom Wehrbein, Bodo Rosenhahn, Bastian Wandt, "Fully Convolutional Cross-Scale-Flows for Image-based Defect Detection," arXiv:2110.02855, 2021.
  24. Paul Bergmann, Michael Fauser, David Sattlegger, and Carsten Steger, "Mvtec AD-A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection," in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9592-9600, 2019.
  25. Jiawei Yu, Ye Zheng, Xiang Wang, Wei Li, Yushuang Wu, Rui Zhao, Liwei Wu, "FastFlow: Unsupervised Anomaly Detection and Localization via 2D Normalizing Flows," arXiv:2111.07677, 2021.
  26. Gudovskiy, D., Ishizaka, S., and Kozuka, K, "CFLOW AD: Real-Time Unsupervised Anomaly Detection with Localization via Conditional Normalizing Flows," arXiv preprint arXiv:2107.12571, 2021.
  27. Marco Rudolph, Bastian Wandt, and Bodo Rosenhahn, "Same but DifferNet: Semi-Supervised Defect Detection With Normalizing Flows," in Proc. of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 1907-1916, 2021.
  28. Thomas Defard, Aleksandr Setkov, Angelique Loesch, and Romaric Audigier, "PaDim: A Patch Distribution Modeling Framework for Anomaly Detection and Localization," in Proc. of pattern Recognition, ICPR International Workshops and Challenges, 2020.
  29. Chun-Liang Li, Kihyuk Sohn, Jinsung Yoon, and Tomas Pfister, "CutPaste: Self-Supervised Learning for Anomaly Detection and Localization," in Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
  30. Karsten Roth, Latha Pemula, Joaquin Zepeda, Bernhard Schölkopf, Thomas Brox, and Peter Gehler, "Towards Total Recall in Industrial Anomaly Detection," in Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.