In the context of Industry 4.0, industrial equipment has become increasingly complex and intelligent. Gearboxes are indispensable components of mechanical systems and largely determine their operational reliability, with gear failures accounting for 60% of gearbox malfunctions. Gear fault diagnosis is therefore critical for improving equipment dependability and reducing production costs. Recent advances in artificial intelligence have made deep learning a prominent tool for mechanical fault diagnosis. However, deep learning models such as Convolutional Neural Networks (CNNs) tend to overfit when only a small number of training samples is available. This study addresses that limitation by combining the Hilbert-Huang Transform (HHT) with a pre-trained VGG16 model, using transfer learning to achieve high-accuracy gear fault diagnosis from limited data.
Hilbert-Huang Transform for Feature Extraction
HHT is an adaptive signal processing technique that combines Empirical Mode Decomposition (EMD) with the Hilbert Transform, so no predefined basis functions are needed. EMD decomposes a vibration signal into Intrinsic Mode Functions (IMFs) based on its intrinsic time scales:
$$x(t) = \sum_{i=1}^{n} IMF_i(t) + r_n(t)$$
where \(x(t)\) is the original signal, \(IMF_i(t)\) are the IMFs, and \(r_n(t)\) is the residual. The IMF with the highest correlation to the original signal is selected, and the Hilbert Transform is applied to it to generate the Hilbert-Huang spectrum. The correlation coefficient \(\rho_i\) is calculated as:
$$\rho_i = \frac{\sum_{j=1}^{L} (x_j - \bar{x})(R_{i,j} - \bar{R}_i)}{\sqrt{\sum_{j=1}^{L} (x_j - \bar{x})^2 \sum_{j=1}^{L} (R_{i,j} - \bar{R}_i)^2}}$$
where \(R_i\) denotes the \(i\)-th IMF with samples \(R_{i,j}\), \(L\) is the signal length, and \(\bar{x}\) and \(\bar{R}_i\) are the means of the signal and of the \(i\)-th IMF. For the selected IMF \(I(t)\), the Hilbert Transform \(y(t)\) is:
$$y(t) = \frac{1}{\pi} P \int_{-\infty}^{\infty} \frac{I(\tau)}{t - \tau} d\tau$$
where \(P\) is the Cauchy principal value. The analytic signal \(z(t)\) is then constructed as:
$$z(t) = I(t) + jy(t) = a(t)e^{j\theta(t)}$$
with instantaneous amplitude \(a(t)\) and phase \(\theta(t)\):
$$a(t) = \sqrt{I^2(t) + y^2(t)}, \quad \theta(t) = \arctan\left(\frac{y(t)}{I(t)}\right)$$
The instantaneous frequency \(\omega(t)\) is derived as \(\omega(t) = d\theta(t)/dt\), yielding the Hilbert-Huang spectrum \(H(\omega, t)\):
$$H(\omega, t) = \text{Re} \sum a(t)e^{j \int \omega(t) dt}$$
This spectrum captures the relationship between time, frequency, and energy, providing rich features for characterizing gear faults.
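The selection and Hilbert steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the study's implementation: EMD itself is not implemented here, so two synthetic tones stand in for real IMFs, and the function names (`pearson`, `analytic_signal`, `instantaneous_features`) are hypothetical. The phase uses the four-quadrant arctangent and is unwrapped before differentiation, which is the usual numerical form of \(\theta(t) = \arctan(y(t)/I(t))\).

```python
import numpy as np

def pearson(x, r):
    """Correlation coefficient rho_i between the signal x and one IMF r."""
    xc, rc = x - x.mean(), r - r.mean()
    return float(np.sum(xc * rc) / np.sqrt(np.sum(xc ** 2) * np.sum(rc ** 2)))

def analytic_signal(imf):
    """z(t) = I(t) + j*y(t), with y(t) the Hilbert transform, computed via the FFT."""
    n = len(imf)
    spectrum = np.fft.fft(imf)
    gain = np.zeros(n)
    gain[0] = 1.0                        # keep the DC bin
    if n % 2 == 0:
        gain[n // 2] = 1.0               # keep the Nyquist bin
        gain[1:n // 2] = 2.0             # double the positive frequencies
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * gain)  # negative frequencies are zeroed

def instantaneous_features(imf, fs):
    """Return a(t) and the instantaneous frequency in Hz."""
    z = analytic_signal(imf)
    amplitude = np.abs(z)
    phase = np.unwrap(np.angle(z))       # four-quadrant arctan, unwrapped
    frequency_hz = np.diff(phase) * fs / (2.0 * np.pi)
    return amplitude, frequency_hz

# Hypothetical stand-ins for EMD output: two pure tones instead of real IMFs.
fs = 1000
t = np.arange(fs) / fs
imfs = np.stack([np.sin(2 * np.pi * 50 * t), 0.3 * np.sin(2 * np.pi * 5 * t)])
x = imfs.sum(axis=0)

rho = [pearson(x, imf) for imf in imfs]
best = int(np.argmax(rho))               # index of the highest-correlation IMF
amplitude, frequency_hz = instantaneous_features(imfs[best], fs)
```

In this toy case the 50 Hz tone is selected, and plotting `amplitude` against time and `frequency_hz` would give a single horizontal ridge. A real vibration signal would produce the time-varying pattern that is rendered as the spectrum image.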
Transfer Learning Framework with Optimized VGG16
Transfer learning mitigates data scarcity by reusing knowledge from a source domain (e.g., ImageNet) in a target domain (gear fault diagnosis). The VGG16 model, pre-trained on ImageNet (1,000 classes, over 1 million images), is adapted as follows:
- Model Architecture: VGG16 comprises 13 convolutional layers grouped into five blocks and three fully connected (FC) layers. To reduce complexity and overfitting risks, Global Average Pooling (GAP) replaces the first two FC layers (Fig. 1). GAP compresses spatial dimensions by averaging each feature map:
$$P_k = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} f_k(i, j)$$
where \(f_k\) is the \(k\)-th feature map of size \(H \times W\). This reduces parameters by 89% compared to FC layers.
- Migration Strategy: Convolutional blocks (1-5) are frozen to retain pre-trained weights. The modified top layers (GAP + softmax) are trained on gear-specific data. Cross-entropy loss with L2 regularization optimizes the model:
$$\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{M} y_{ic} \ln(p_{ic}) + \frac{\lambda}{2} \|\mathbf{W}\|^2$$
where \(N\) is the number of samples, \(M\) is the number of classes, \(y_{ic}\) is the one-hot label, \(p_{ic}\) is the predicted probability of class \(c\) for sample \(i\), \(\lambda\) is the penalty coefficient, and \(\mathbf{W}\) denotes the weights.
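The GAP and loss formulas above, together with a rough parameter count, can be sketched in NumPy as follows. The source does not say how the 89% figure was obtained or what the classification head looks like. The count below assumes the standard VGG16 layer sizes and a head in which the 512-element GAP output feeds a 5-way softmax layer directly. All function names are hypothetical.

```python
import numpy as np

def global_average_pool(feature_maps):
    """(H, W, K) feature maps -> (K,) vector, P_k = mean of f_k over H and W."""
    return feature_maps.mean(axis=(0, 1))

def regularized_cross_entropy(probs, onehot, weights, lam):
    """Mean cross-entropy over N samples plus (lam / 2) * ||W||^2."""
    eps = 1e-12  # guards against log(0)
    data_term = -np.mean(np.sum(onehot * np.log(probs + eps), axis=1))
    return data_term + 0.5 * lam * np.sum(weights ** 2)

def conv_params(c_in, c_out, k=3):
    """Weights plus biases of a k x k convolution."""
    return k * k * c_in * c_out + c_out

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

# Standard VGG16 channel plan: 13 convolutional layers with 3x3 kernels.
channels = [(3, 64), (64, 64), (64, 128), (128, 128),
            (128, 256), (256, 256), (256, 256),
            (256, 512), (512, 512), (512, 512),
            (512, 512), (512, 512), (512, 512)]
conv_total = sum(conv_params(c_in, c_out) for c_in, c_out in channels)

# Original head: flatten (7*7*512) -> 4096 -> 4096 -> 1000.
original = (conv_total + dense_params(7 * 7 * 512, 4096)
            + dense_params(4096, 4096) + dense_params(4096, 1000))

# Assumed modified head: GAP (512) -> 5-way softmax.
modified = conv_total + dense_params(512, 5)
reduction = 1 - modified / original
print(f"{original:,} -> {modified:,} parameters ({reduction:.1%} fewer)")
```

Under these assumptions the count drops from about 138.4 million to about 14.7 million parameters, a reduction of roughly 89%, which is in line with the figure quoted above. Nearly all of the saving comes from removing the first FC layer, which alone holds over 100 million weights.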

Methodology
The proposed framework (Fig. 2) involves:
- Data Acquisition: Vibration signals are collected under variable loads using triaxial accelerometers.
- Signal Processing: Signals are segmented into 20,480-point samples. EMD decomposes each segment, and the Hilbert Transform is applied to the highest-correlation IMF to generate a 224×224-pixel Hilbert-Huang spectrum.
- Model Training: Spectra are split 7:3 into training/validation sets. The model is fine-tuned using SGD with momentum (initial learning rate: \(10^{-4}\), decay factor: 0.7 every 4 epochs).
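Two of the steps above reduce to simple arithmetic and can be sketched directly. The source does not say whether segments overlap or at which epochs the decay is applied. The sketch below assumes non-overlapping windows and a decay applied at epochs 4, 8, 12, and so on (counting from 0). The function names are hypothetical.

```python
def segment(signal, length=20480):
    """Split a sequence into non-overlapping windows, dropping any remainder."""
    count = len(signal) // length
    return [signal[i * length:(i + 1) * length] for i in range(count)]

def learning_rate(epoch, initial=1e-4, factor=0.7, step=4):
    """Step decay: multiply the rate by `factor` once every `step` epochs."""
    return initial * factor ** (epoch // step)

for epoch in (0, 3, 4, 8, 12):
    print(epoch, f"{learning_rate(epoch):.3e}")
```

At a sampling rate of 20,480 Hz, each 20,480-point segment covers exactly one second of vibration data.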
Experimental Validation
Data from the DPS fault diagnosis platform (Fig. 3) cover five gear conditions: healthy, wear, tooth fracture, root crack, and missing tooth. Vibration signals were acquired at 20,480 Hz under six loads (0–9 N·m) and a motor speed of 30 Hz. Table 1 lists the number of training and validation samples for each condition.
Fault Type | Load (N·m) | Training Set | Validation Set |
---|---|---|---|
Healthy | 0,1,3,5,7,9 | 134 | 58 |
Wear | 0,1,3,5,7,9 | 134 | 58 |
Tooth Fracture | 0,1,3,5,7,9 | 134 | 58 |
Root Crack | 0,1,3,5,7,9 | 134 | 58 |
Missing Tooth | 0,1,3,5,7,9 | 134 | 58 |
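The per-class counts in the table are what a 7:3 split of 192 samples per condition produces. A minimal sketch of such a split follows. It assumes a stratified random split, which the source does not confirm, and the function name and class identifiers are hypothetical.

```python
import random

def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_indices, val_indices), splitting each class separately."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train, val = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = int(len(indices) * train_fraction)  # 192 * 0.7 -> 134
        train.extend(indices[:cut])
        val.extend(indices[cut:])
    return train, val

classes = ["healthy", "wear", "tooth_fracture", "root_crack", "missing_tooth"]
labels = [name for name in classes for _ in range(192)]
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx))
```

With five conditions this gives 670 training and 290 validation spectra in total.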
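Hyperparameter Optimization: The learning rate and the L2 penalty coefficient \(\lambda\) were tuned (Table 2, which lists the two sweeps side by side). A learning rate of \(10^{-4}\) and an L2 penalty of \(\lambda = 10^{-3}\) gave the highest validation accuracy.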
Learning Rate | Validation Accuracy | \(\lambda\) | Validation Accuracy |
---|---|---|---|
\(3 \times 10^{-4}\) | 92.76% | \(1 \times 10^{-2}\) | 94.14% |
\(1 \times 10^{-4}\) | 97.24% | \(5 \times 10^{-3}\) | 96.55% |
\(3 \times 10^{-5}\) | 94.48% | \(1 \times 10^{-3}\) | 98.97% |
\(1 \times 10^{-5}\) | 93.10% | \(5 \times 10^{-4}\) | 97.24% |
– | – | \(1 \times 10^{-4}\) | 95.52% |
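As a quick consistency check, the accuracies in the table can be mapped back to whole numbers of correctly classified samples. The sketch assumes every accuracy was measured on the full validation set of 5 × 58 = 290 spectra. The function name is hypothetical.

```python
VALIDATION_SIZE = 5 * 58  # five conditions, 58 validation spectra each

# Accuracies from the table: learning-rate sweep first, then the lambda sweep.
reported = [92.76, 97.24, 94.48, 93.10, 94.14, 96.55, 98.97, 97.24, 95.52]

def implied_correct(accuracy_percent, total=VALIDATION_SIZE):
    """Nearest whole number of correct predictions for a given accuracy."""
    return round(accuracy_percent / 100 * total)

for acc in reported:
    hits = implied_correct(acc)
    print(f"{acc:.2f}% -> {hits}/{VALIDATION_SIZE} = {100 * hits / VALIDATION_SIZE:.2f}%")
```

Every reported value matches an integer count out of 290 to two decimal places. The best setting, for example, corresponds to 287 correct predictions. One misclassified sample therefore shifts the accuracy by about 0.34 percentage points, which is worth keeping in mind when comparing neighbouring rows.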
Model Efficacy: The GAP-optimized VGG16 achieved an average accuracy of 98.86% over 10 trials (Table 3), outperforming benchmarks such as TLCNN (90.10%) and Tran VGG-19 (96.67%). Training time fell by 24.19% relative to the unmodified VGG16 (29.15 min vs. 38.45 min).
Method | Average Accuracy | Standard Deviation |
---|---|---|
TCNN | 91.35% | 1.06% |
TLCNN | 90.10% | 1.34% |
1D-CNN | 91.86% | 1.12% |
Tran VGG-19 | 96.67% | 0.91% |
Proposed Method | 98.86% | 0.33% |
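The headline comparisons can be reproduced from the reported numbers alone. The short sketch below recomputes the training-time reduction and the accuracy margin over each benchmark. The variable names are illustrative.

```python
baseline_min, proposed_min = 38.45, 29.15  # training times in minutes
time_reduction = (baseline_min - proposed_min) / baseline_min * 100

benchmarks = {"TCNN": 91.35, "TLCNN": 90.10, "1D-CNN": 91.86, "Tran VGG-19": 96.67}
proposed = 98.86

# Accuracy margin of the proposed method, in percentage points.
margins = {name: round(proposed - acc, 2) for name, acc in benchmarks.items()}

print(f"training time reduction: {time_reduction:.2f}%")
for name, margin in margins.items():
    print(f"{name}: +{margin:.2f} points")
```

The margin over the closest benchmark, Tran VGG-19, is 2.19 percentage points, which is more than twice that benchmark's reported standard deviation of 0.91%.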
The confusion matrix (Table 4, entries in %, rows are actual classes) confirms robustness: the severe faults, tooth fracture and missing tooth, are both identified at 100%.
Actual \ Predicted | Healthy | Wear | Tooth Fracture | Root Crack | Missing Tooth |
---|---|---|---|---|---|
Healthy | 99.1 | 0.9 | 0 | 0 | 0 |
Wear | 1.2 | 97.3 | 0.5 | 0.5 | 0.5 |
Tooth Fracture | 0 | 0 | 100 | 0 | 0 |
Root Crack | 0 | 0.8 | 0 | 98.7 | 0.5 |
Missing Tooth | 0 | 0 | 0 | 0 | 100 |
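Per-class recall and precision can be read off the matrix with a few lines of NumPy. The sketch treats each row as percentages of one actual class. It also assumes equal class support (58 validation samples per condition), which is what makes the column-wise precision calculation valid on row-normalized values.

```python
import numpy as np

labels = ["Healthy", "Wear", "Tooth Fracture", "Root Crack", "Missing Tooth"]
# Rows: actual class, columns: predicted class, entries in percent.
cm = np.array([
    [99.1, 0.9, 0.0, 0.0, 0.0],
    [1.2, 97.3, 0.5, 0.5, 0.5],
    [0.0, 0.0, 100.0, 0.0, 0.0],
    [0.0, 0.8, 0.0, 98.7, 0.5],
    [0.0, 0.0, 0.0, 0.0, 100.0],
])

recall = np.diag(cm) / cm.sum(axis=1)     # share of each actual class found
precision = np.diag(cm) / cm.sum(axis=0)  # valid only for balanced classes
macro_recall = recall.mean()

for name, r, p in zip(labels, recall, precision):
    print(f"{name:15s} recall={r:.3f} precision={p:.3f}")
print(f"macro-average recall: {macro_recall:.4f}")
```

The macro-average recall is 99.02%. Missing tooth has 100% recall but a precision of about 99.0%, because a small share of wear and root crack samples is predicted as missing tooth.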
Conclusion
This study presents a novel gear fault diagnosis method combining HHT-based feature extraction with deep transfer learning. Key innovations include:
- Adapting HHT to convert vibration signals into discriminative Hilbert-Huang spectra, enhancing feature representation for gear fault diagnosis.
- Optimizing VGG16 via Global Average Pooling, reducing parameters by 89% while improving accuracy and training efficiency.
- Achieving 98.86% diagnosis accuracy under variable loads with minimal data, outperforming the four benchmark models in Table 3.
The approach demonstrates significant potential for industrial deployment, particularly where fault samples are scarce. Future work will explore real-time embedded implementations and multi-sensor fusion to further improve gearbox reliability.