Gear Fault Diagnosis Using Deep Transfer Learning Based on Hilbert-Huang Spectrum

In the context of Industry 4.0, industrial equipment has become increasingly complex and intelligent. Gearboxes are indispensable components of mechanical systems and largely determine their operational reliability, with gear failures accounting for 60% of gearbox malfunctions. Gear fault diagnosis is therefore critical for improving equipment dependability and reducing production costs. Recent advances in artificial intelligence have made deep learning a prominent tool for mechanical fault diagnosis. However, deep learning models such as Convolutional Neural Networks (CNNs) tend to overfit when only a small number of training samples is available. This study addresses that problem by combining the Hilbert-Huang Transform (HHT) with a pre-trained VGG16 model, using transfer learning to achieve high-accuracy gear fault diagnosis from limited data.

Hilbert-Huang Transform for Feature Extraction

HHT is an adaptive signal processing technique that combines Empirical Mode Decomposition (EMD) with the Hilbert transform and requires no predefined basis functions. EMD decomposes a vibration signal into Intrinsic Mode Functions (IMFs) based on the signal's intrinsic time scales:

$$x(t) = \sum_{i=1}^{n} IMF_i(t) + r_n(t)$$

where $x(t)$ is the original signal, $IMF_i(t)$ are the IMFs, and $r_n(t)$ is the residual. The IMF with the highest correlation to the original signal is selected for Hilbert Transform to generate the Hilbert-Huang spectrum. The correlation coefficient $\rho_i$ is calculated as:

$$\rho_i = \frac{\sum_{j=1}^{L} (x_j - \bar{x})(R_{i,j} - \bar{R_i})}{\sqrt{\sum_{j=1}^{L} (x_j - \bar{x})^2 \sum_{j=1}^{L} (R_{i,j} - \bar{R_i})^2}}$$

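where $x_j$ is the $j$-th sample of the original signal, $R_{i,j}$ is the $j$-th sample of the $i$-th IMF, $L$ is the signal length, and $\bar{x}$ and $\bar{R_i}$ are the corresponding sample means. For the selected IMF $I(t)$, the Hilbert transform $y(t)$ is: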

$$y(t) = \frac{1}{\pi} P \int_{-\infty}^{\infty} \frac{I(\tau)}{t - \tau} d\tau$$

where $P$ is the Cauchy principal value. The analytic signal $z(t)$ is then constructed as:

$$z(t) = I(t) + jy(t) = a(t)e^{j\theta(t)}$$

with instantaneous amplitude $a(t)$ and phase $\theta(t)$:

$$a(t) = \sqrt{I^2(t) + y^2(t)}, \quad \theta(t) = \arctan\left(\frac{y(t)}{I(t)}\right)$$
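
The construction of $z(t)$, $a(t)$, and $\theta(t)$ can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it assumes NumPy, builds the analytic signal with the standard FFT method (zero the negative frequencies, double the positive ones), and uses a synthetic 200 Hz cosine as a stand-in for the selected IMF. The function name `analytic_signal` and the test signal are illustrative assumptions.

```python
import numpy as np


def analytic_signal(imf):
    """Return z = I + j*y for a real 1-D array, where y is the Hilbert transform of I."""
    imf = np.asarray(imf, dtype=float)
    n = imf.size
    spectrum = np.fft.fft(imf)

    # Keep DC (and Nyquist for even n), double positive frequencies,
    # and zero out negative frequencies.
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)


fs = 20480                              # sampling rate used in the experiments (Hz)
t = np.arange(2048) / fs
imf = np.cos(2 * np.pi * 200.0 * t)     # synthetic stand-in for the selected IMF I(t)

z = analytic_signal(imf)
amplitude = np.abs(z)                   # a(t) = sqrt(I^2 + y^2)
phase = np.unwrap(np.angle(z))          # theta(t), four-quadrant arctan, unwrapped
```

Using `np.angle` (a four-quadrant arctangent) followed by `np.unwrap` avoids the $\pm\pi$ jumps that a plain $\arctan(y/I)$ would introduce, which matters once the phase is differentiated.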

The instantaneous frequency $\omega(t)$ is derived as $\omega(t) = d\theta(t)/dt$, yielding the Hilbert-Huang spectrum $H(\omega, t)$:

$$H(\omega, t) = \text{Re} \sum a(t)e^{j \int \omega(t) dt}$$

This spectrum captures the joint time-frequency-energy distribution of the signal, providing rich features for gear fault characterization.
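
To make the step from $a(t)$ and $\theta(t)$ to a time-frequency image concrete, the sketch below differentiates the phase to obtain the instantaneous frequency in Hz and places each amplitude sample in the matching frequency bin. It is an illustration under stated assumptions: the amplitude and phase are generated synthetically (a constant-amplitude chirp from 500 Hz to 1500 Hz) rather than taken from a real IMF, the function name `hilbert_spectrum` is hypothetical, and the choice of 224 frequency bins merely echoes the image height used later. How the source rasterizes its spectra is not stated.

```python
import numpy as np


def hilbert_spectrum(amplitude, phase, fs, n_freq_bins, f_max):
    """Bin a(t) onto a (frequency, time) grid using f(t) = (1 / 2*pi) * d(theta)/dt."""
    inst_freq = np.gradient(phase) * fs / (2.0 * np.pi)   # instantaneous frequency in Hz
    spectrum = np.zeros((n_freq_bins, amplitude.size))

    rows = np.floor(inst_freq / f_max * n_freq_bins).astype(int)
    valid = (rows >= 0) & (rows < n_freq_bins)             # drop out-of-range frequencies
    cols = np.arange(amplitude.size)
    spectrum[rows[valid], cols[valid]] = amplitude[valid]
    return inst_freq, spectrum


fs = 20480
t = np.arange(4096) / fs

# Synthetic amplitude and phase: linear chirp from f0 to f1 with constant amplitude.
f0, f1 = 500.0, 1500.0
k = (f1 - f0) / t[-1]
phase = 2 * np.pi * (f0 * t + 0.5 * k * t ** 2)
amplitude = np.full(t.size, 0.8)

inst_freq, H = hilbert_spectrum(amplitude, phase, fs, n_freq_bins=224, f_max=fs / 2)
```

Each column of `H` holds the amplitude at the single frequency bin occupied at that instant, so the chirp appears as a rising ridge. Downsampling the time axis and mapping amplitude to pixel intensity would be further steps toward an image suitable for a CNN.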

Transfer Learning Framework with Optimized VGG16

Transfer learning mitigates data scarcity by reusing knowledge from a source domain (e.g., ImageNet) in a target domain (gear fault diagnosis). The VGG16 model, pre-trained on ImageNet's 1,000 classes and roughly 1 million images, is adapted as follows:

Model Architecture: VGG16 comprises 13 convolutional layers grouped into five blocks and three fully connected (FC) layers. To reduce complexity and overfitting risks, Global Average Pooling (GAP) replaces the first two FC layers (Fig. 1). GAP compresses spatial dimensions by averaging each feature map:

$$P_k = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} f_k(i, j)$$

where $f_k$ is the $k$-th feature map of size $H \times W$. This reduces parameters by 89% compared to FC layers.
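
The GAP formula and the size of the parameter saving can be checked with a short calculation. The sketch below is an illustration under assumptions the source does not spell out: the comparison is taken to be between the full original VGG16 (three FC layers ending in 1,000 classes) and the same convolutional base followed by GAP and a single 5-class softmax layer, with a 7×7×512 feature map coming out of block 5 for a 224×224 input. The helper names are hypothetical.

```python
import numpy as np


def global_average_pool(feature_maps):
    """Average each H x W feature map; shape (H, W, K) -> (K,)."""
    return feature_maps.mean(axis=(0, 1))


# Random stand-in for the block-5 output of VGG16 on a 224x224 input.
features = np.random.default_rng(0).random((7, 7, 512))
pooled = global_average_pool(features)          # one value P_k per feature map


def conv_params(c_in, c_out, k=3):
    """Weights plus biases of a k x k convolution."""
    return k * k * c_in * c_out + c_out


def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


# (input channels, output channels) of the 13 convolutional layers of VGG16.
channels = [(3, 64), (64, 64),
            (64, 128), (128, 128),
            (128, 256), (256, 256), (256, 256),
            (256, 512), (512, 512), (512, 512),
            (512, 512), (512, 512), (512, 512)]
conv_total = sum(conv_params(a, b) for a, b in channels)

original_head = (dense_params(7 * 7 * 512, 4096)
                 + dense_params(4096, 4096)
                 + dense_params(4096, 1000))
gap_head = dense_params(512, 5)                  # GAP itself has no parameters

original_total = conv_total + original_head
modified_total = conv_total + gap_head
reduction = 1 - modified_total / original_total
print(f"parameter reduction: {reduction:.2%}")
```

Under these assumptions the reduction comes out at roughly 89%, which is consistent with the figure quoted above. Almost all of the saving comes from removing the first FC layer, which alone holds over 100 million weights.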

Migration Strategy: Convolutional blocks (1-5) are frozen to retain pre-trained weights. The modified top layers (GAP + softmax) are trained on gear-specific data. Cross-entropy loss with L2 regularization optimizes the model:

$$\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{M} y_{ic} \ln(p_{ic}) + \frac{\lambda}{2} \|\mathbf{W}\|^2$$

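where $N$ is the number of training samples, $M$ is the number of fault classes, $y_{ic}$ is 1 if sample $i$ belongs to class $c$ and 0 otherwise, $p_{ic}$ is the predicted probability of class $c$ for sample $i$, $\lambda$ is the penalty coefficient, and $\mathbf{W}$ denotes the weights.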
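
As a worked instance of this loss, the sketch below evaluates the cross-entropy term and the L2 penalty with NumPy. The probabilities, labels, and weight matrix are made-up toy values for two samples and three classes, and the function name is hypothetical; the small `eps` is only a guard against $\ln(0)$.

```python
import numpy as np


def regularized_cross_entropy(probs, labels_onehot, weights, lam):
    """Mean cross-entropy over N samples plus (lam / 2) * ||W||^2."""
    eps = 1e-12                                   # guards against log(0)
    ce = -np.mean(np.sum(labels_onehot * np.log(probs + eps), axis=1))
    l2 = 0.5 * lam * np.sum(weights ** 2)
    return ce + l2


# Toy values: N = 2 samples, M = 3 classes.
probs = np.array([[0.90, 0.05, 0.05],
                  [0.20, 0.70, 0.10]])            # softmax outputs p_ic
labels = np.array([[1, 0, 0],
                   [0, 1, 0]])                    # one-hot labels y_ic
W = np.array([[1.0, -2.0],
              [0.5, 0.0]])                        # stand-in weight matrix

loss = regularized_cross_entropy(probs, labels, W, lam=1e-3)
```

Only the probability assigned to the true class of each sample contributes to the first term, so here it equals $-(\ln 0.9 + \ln 0.7)/2$; the penalty adds $\tfrac{10^{-3}}{2} \times 5.25$.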

Figure: Miter gears used in experimental validation.

Methodology

The proposed framework (Fig. 2) involves:

Data Acquisition: Vibration signals are collected under variable loads using triaxial accelerometers.
Signal Processing: Signals are segmented into 20,480-point samples. EMD decomposes each segment, and the IMF most correlated with the segment is Hilbert-transformed to generate a 224×224-pixel Hilbert-Huang spectrum.
Model Training: Spectra are split 7:3 into training and validation sets. The model is fine-tuned using SGD with momentum, with an initial learning rate of $10^{-4}$ that is multiplied by 0.7 every 4 epochs.
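
The segmentation and IMF-selection steps listed above can be sketched as follows. This is an illustration rather than the authors' pipeline: it assumes non-overlapping segments (the source does not say whether segments overlap), uses a synthetic signal, and replaces the output of an EMD routine with three hand-made stand-in components, since no specific EMD library is named in the source. The function names are hypothetical.

```python
import numpy as np

SEGMENT_LENGTH = 20480


def segment_signal(signal, length=SEGMENT_LENGTH):
    """Split a 1-D signal into non-overlapping segments, dropping any remainder."""
    n_segments = signal.size // length
    return signal[: n_segments * length].reshape(n_segments, length)


def select_imf(segment, imfs):
    """Return the index of the IMF with the highest Pearson correlation to the segment."""
    rho = np.array([np.corrcoef(segment, imf)[0, 1] for imf in imfs])
    return int(np.argmax(rho)), rho


rng = np.random.default_rng(0)
fs = 20480
t = np.arange(3 * SEGMENT_LENGTH + 100) / fs
signal = np.sin(2 * np.pi * 30 * t) + 0.2 * rng.standard_normal(t.size)

segments = segment_signal(signal)

# Hypothetical stand-ins for the IMFs an EMD routine would return for segments[0]:
# a noise-like component, a 30 Hz oscillation, and a slow trend.
t0 = t[:SEGMENT_LENGTH]
imfs = np.stack([
    0.2 * rng.standard_normal(SEGMENT_LENGTH),
    np.sin(2 * np.pi * 30 * t0),
    0.05 * t0,
])

best, rho = select_imf(segments[0], imfs)
```

Here the 30 Hz component is selected because it carries most of the segment's variance. With real vibration data, `imfs` would come from an EMD implementation applied to each segment.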

Experimental Validation

Data from the DPS fault diagnosis platform (Fig. 3) cover five gear conditions: healthy, wear, tooth fracture, root crack, and missing tooth. Vibration signals were acquired at a sampling rate of 20,480 Hz under six loads (0, 1, 3, 5, 7, and 9 N·m) with the motor running at 30 Hz.

Table 1: Mixed Dataset Distribution
Fault Type	Load (N·m)	Training Set	Validation Set
Healthy	0,1,3,5,7,9	134	58
Wear	0,1,3,5,7,9	134	58
Tooth Fracture	0,1,3,5,7,9	134	58
Root Crack	0,1,3,5,7,9	134	58
Missing Tooth	0,1,3,5,7,9	134	58

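Hyperparameter Optimization: The learning rate and the L2 penalty coefficient were tuned (Table 2). A learning rate of $10^{-4}$ gave the highest validation accuracy in the learning-rate sweep (97.24%), and $\lambda = 10^{-3}$ gave the highest in the penalty sweep (98.97%).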

Table 2: Hyperparameter Selection
Learning Rate	Validation Accuracy (learning-rate sweep)	$\lambda$	Validation Accuracy ($\lambda$ sweep)
$3 \times 10^{-4}$	92.76%	$1 \times 10^{-2}$	94.14%
$1 \times 10^{-4}$	97.24%	$5 \times 10^{-3}$	96.55%
$3 \times 10^{-5}$	94.48%	$1 \times 10^{-3}$	98.97%
$1 \times 10^{-5}$	93.10%	$5 \times 10^{-4}$	97.24%
–	–	$1 \times 10^{-4}$	95.52%
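
For reference, the step-decay schedule described in the Methodology (initial learning rate $10^{-4}$, decay factor 0.7 every 4 epochs) can be written as a one-line function. This is a sketch under the assumption that epochs are counted from 0 and that the decay is applied at epochs 4, 8, 12, and so on; the source does not state the indexing or the total number of epochs.

```python
def step_decay(epoch, initial_lr=1e-4, factor=0.7, step=4):
    """Learning rate for a 0-indexed epoch: multiply by `factor` every `step` epochs."""
    return initial_lr * factor ** (epoch // step)


# Learning rate for the first 12 epochs (three plateaus of four epochs each).
schedule = [step_decay(epoch) for epoch in range(12)]
```

The rate stays at $10^{-4}$ for epochs 0 to 3, drops to $7 \times 10^{-5}$ for epochs 4 to 7, and to $4.9 \times 10^{-5}$ for epochs 8 to 11.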

Model Efficacy: The GAP-optimized VGG16 achieved an average accuracy of 98.86% over 10 trials (Table 3), outperforming benchmarks such as TLCNN (90.10%) and Tran VGG-19 (96.67%). Training time was reduced by 24.19% relative to the unmodified VGG16 (29.15 min vs. 38.45 min).

Table 3: Performance Comparison of Diagnostic Methods
Method	Average Accuracy	Standard Deviation
TCNN	91.35%	1.06%
TLCNN	90.10%	1.34%
1D-CNN	91.86%	1.12%
Tran VGG-19	96.67%	0.91%
Proposed Method	98.86%	0.33%

The confusion matrix (Table 4) shows that the severe faults, tooth fracture and missing tooth, are identified with 100% accuracy, while wear is the hardest class at 97.3%.

Table 4: Confusion Matrix for Gear Fault Diagnosis (%)
Actual \ Predicted	Healthy	Wear	Tooth Fracture	Root Crack	Missing Tooth
Healthy	99.1	0.9	0	0	0
Wear	1.2	97.3	0.5	0.5	0.5
Tooth Fracture	0	0	100	0	0
Root Crack	0	0.8	0	98.7	0.5
Missing Tooth	0	0	0	0	100
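
The per-class figures in Table 4 can be summarized programmatically. The sketch below copies the table into a NumPy array, checks that each row sums to 100%, and extracts the per-class recall (the diagonal) and the largest single confusion. It assumes the rows are row-normalized percentages, as the table header suggests; the source does not state which trial or trials the matrix comes from, so its mean is not expected to equal the 10-trial average in Table 3.

```python
import numpy as np

classes = ["Healthy", "Wear", "Tooth Fracture", "Root Crack", "Missing Tooth"]

# Rows: actual class, columns: predicted class, values in percent (Table 4).
cm = np.array([
    [99.1,  0.9,   0.0,  0.0,   0.0],
    [ 1.2, 97.3,   0.5,  0.5,   0.5],
    [ 0.0,  0.0, 100.0,  0.0,   0.0],
    [ 0.0,  0.8,   0.0, 98.7,   0.5],
    [ 0.0,  0.0,   0.0,  0.0, 100.0],
])

row_sums = cm.sum(axis=1)                 # each row should total 100%
recall = np.diag(cm)                      # per-class recall in percent
mean_recall = recall.mean()               # equals overall accuracy when classes are balanced

# Largest off-diagonal entry: the most frequent single confusion.
off_diag = cm.copy()
np.fill_diagonal(off_diag, 0.0)
actual_idx, predicted_idx = np.unravel_index(np.argmax(off_diag), off_diag.shape)
worst_pair = (classes[actual_idx], classes[predicted_idx])
```

Because the validation set in Table 1 has the same number of samples per class, the mean of the diagonal also equals the overall accuracy for this matrix. The largest confusion is wear samples predicted as healthy, at 1.2%.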

Conclusion

This study presents a novel gear fault diagnosis method combining HHT-based feature extraction with deep transfer learning. Key innovations include:

Adapting HHT to convert vibration signals into discriminative Hilbert-Huang spectra, enhancing feature representation for gear fault diagnosis.
Optimizing VGG16 via Global Average Pooling, reducing parameters by 89% while improving accuracy and training efficiency.
Achieving 98.86% average diagnosis accuracy under variable loads with limited data, outperforming the compared methods (Table 3).

The approach shows significant potential for industrial deployment, particularly where fault samples are scarce. Future work will explore real-time embedded implementations and multi-sensor fusion to further improve gearbox reliability.