In modern mechanical systems, bevel gears play a critical role in transmitting power between non-parallel shafts, offering high efficiency and torque capacity. However, operating under heavy loads and in enclosed environments, bevel gears are prone to failures that can lead to significant downtime and safety hazards. Traditional fault diagnosis methods for bevel gears often rely on manual feature extraction and extensive labeled datasets, which are time-consuming and impractical in real-world scenarios where data is scarce. The challenge is exacerbated under variable speed conditions, where fault characteristics become non-stationary and difficult to detect. To address these issues, we propose a deep learning-based approach that leverages one-dimensional convolutional neural networks (1D CNNs) and transfer learning to diagnose faults in bevel gears with limited samples. Our method focuses on extracting deep features from raw vibration signals and adapting pre-trained models to new operating conditions, ensuring robust performance even with small datasets.
The core of our approach lies in the ability of convolutional neural networks to automatically learn hierarchical features from input data. A typical CNN consists of multiple layers, including convolutional layers, pooling layers, fully connected layers, and an output layer. For one-dimensional signals like vibration data from bevel gears, the convolutional layer applies filters to extract local patterns. The operation at the i-th layer can be expressed as:
$$X_i = f(a_i \ast X_{i-1} + b_i)$$
where \(a_i\) is the weight (kernel) matrix of the i-th layer, \(\ast\) denotes the convolution operation, \(X_{i-1}\) is the input from the previous layer, \(b_i\) is the bias, and \(f\) is the activation function, commonly ReLU for its efficiency. Following convolution, pooling layers reduce dimensionality by retaining only the most prominent features. Max pooling, which outputs the maximum value in each pooling region, is defined as:
$$P^k_i(j) = \max_{(j-1)S+1 \leq t \leq jS} \{ q^k_i(t) \}$$
Here, \(P^k_i(j)\) is the pooled output for the j-th region of the k-th feature map, \(q^k_i(t)\) represents the values within that pooling region, and \(S\) is the pooling width (equal to the stride for non-overlapping pooling). The fully connected layer then integrates these features for classification, and the output layer applies Softmax to produce a probability distribution over the classes:
$$G = \operatorname{Softmax}(a x + b)$$
where \(x\) is the flattened feature vector, \(a\) and \(b\) are the layer's weights and bias, and \(G\) is the resulting vector of class probabilities.
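To make these operations concrete, the short sketch below runs a single convolution, max-pooling, and Softmax pass over a dummy vibration segment. PyTorch is our own choice of framework, and the layer sizes here (16 filters, a 64-point kernel, three output classes) are illustrative assumptions rather than values taken from the model described later.

```python
# Minimal sketch of the conv / pool / Softmax operations above (illustrative only).
import torch
import torch.nn as nn

x = torch.randn(1, 1, 2048)   # one raw vibration segment: (batch, channels, length)

conv = nn.Conv1d(in_channels=1, out_channels=16, kernel_size=64, stride=16, padding=24)
pool = nn.MaxPool1d(kernel_size=2, stride=2)    # non-overlapping regions of width S = 2
act  = nn.ReLU()                                # f(.) in X_i = f(a_i * X_{i-1} + b_i)

features = pool(act(conv(x)))                   # (1, 16, 64): local patterns, downsampled
flat     = features.flatten(1)                  # (1, 1024) flattened feature vector
logits   = nn.Linear(flat.shape[1], 3)(flat)    # fully connected layer, 3 illustrative classes
probs    = torch.softmax(logits, dim=1)         # G = Softmax(a x + b)
print(probs.sum().item())                       # ~1.0: a valid probability distribution
```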
Transfer learning enhances this framework by leveraging knowledge from a source domain (e.g., data from one speed condition) to improve learning in a target domain (e.g., a different speed condition) with limited data. Formally, a domain \(D = \{A, P(A)\}\) consists of a feature space \(A\) and a marginal probability distribution \(P(A)\). Given a source domain \(D_s\) with learning task \(T_s\) and a target domain \(D_t\) with task \(T_t\), where \(D_s \neq D_t\) or \(T_s \neq T_t\), transfer learning improves the target predictive function \(f(\cdot)\) in \(T_t\) by using the knowledge contained in \(D_s\) and \(T_s\).
We designed a specialized 1D CNN model, termed 1CNN, for fault diagnosis in bevel gears. The model comprises 13 principal layers: five convolutional layers, five pooling layers, two fully connected layers, and one output layer. The architecture alternates between convolution and pooling to progressively extract features from raw vibration signals. The first convolutional layer uses a large kernel (64×1) to capture broad patterns, while subsequent layers employ small kernels (3×1) for detailed feature extraction. Each convolutional layer is followed by batch normalization and a max-pooling layer with a 2×1 kernel to downsample the feature maps. Dropout layers are incorporated to prevent overfitting, and the output layer uses Softmax for classification. The model parameters are summarized in Table 1.
| Module | Layer | Type | Kernel Size / Neurons | Kernel Number | Stride | Padding |
|---|---|---|---|---|---|---|
| M1 | 1 | Conv | 64×1 | 16 | 16 | Yes |
| | 2 | BN | – | – | – | – |
| | 3 | Pool | 2×1 | 16 | 2 | – |
| | 4 | Dropout | – | – | – | No |
| M2 | 5 | Conv | 3×1 | 32 | 1 | Yes |
| | 6 | BN | – | – | – | – |
| | 7 | Pool | 2×1 | 32 | 2 | – |
| | 8 | Dropout | – | – | – | No |
| M3 | 9 | Conv | 3×1 | 64 | 1 | Yes |
| | 10 | BN | – | – | – | – |
| | 11 | Pool | 2×1 | 32 | 2 | – |
| | 12 | Dropout | – | – | – | No |
| M4 | 13 | Conv | 3×1 | 64 | 1 | Yes |
| | 14 | BN | – | – | – | – |
| | 15 | Pool | 2×1 | 32 | 2 | – |
| | 16 | Dropout | – | – | – | No |
| M5 | 17 | Conv | 3×1 | 64 | 1 | Yes |
| | 18 | BN | – | – | – | – |
| | 19 | Pool | 2×1 | 32 | 2 | – |
| | 20 | Dropout | – | – | – | No |
| M6 | 21 | FC | 1000 | – | – | – |
| | 22 | BN | – | – | – | – |
| | 23 | Dropout | – | – | – | – |
| | 24 | FC | 100 | – | – | – |
| | 25 | BN | – | – | – | – |
| | 26 | Dropout | – | – | – | – |
| | 27 | Softmax | 20 | – | – | – |
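The sketch below assembles the modules of Table 1 into a runnable model. PyTorch is again an assumption of ours, as are the dropout rate, the padding widths (the table only marks padding as present), the ReLU placement, and the input length of 2048 points; the number of output classes is left as a parameter, since Table 1 lists 20 Softmax units while Table 2 below uses 12 labels.

```python
# Sketch of the 1CNN architecture in Table 1 (framework and unspecified
# hyperparameters are illustrative assumptions, not the authors' code).
import torch
import torch.nn as nn

def conv_module(in_ch, out_ch, kernel, stride, pad):
    """Conv -> BN -> ReLU -> MaxPool(2x1) -> Dropout, as in modules M1-M5."""
    return nn.Sequential(
        nn.Conv1d(in_ch, out_ch, kernel_size=kernel, stride=stride, padding=pad),
        nn.BatchNorm1d(out_ch),
        nn.ReLU(),
        nn.MaxPool1d(kernel_size=2, stride=2),
        nn.Dropout(p=0.5),                                    # rate assumed
    )

class OneCNN(nn.Module):
    def __init__(self, num_classes=12):
        super().__init__()
        self.features = nn.Sequential(
            conv_module(1,  16, kernel=64, stride=16, pad=24),  # M1: wide first kernel
            conv_module(16, 32, kernel=3,  stride=1,  pad=1),   # M2
            conv_module(32, 64, kernel=3,  stride=1,  pad=1),   # M3
            conv_module(64, 64, kernel=3,  stride=1,  pad=1),   # M4
            conv_module(64, 64, kernel=3,  stride=1,  pad=1),   # M5
        )
        self.classifier = nn.Sequential(                        # M6
            nn.LazyLinear(1000), nn.BatchNorm1d(1000), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(1000, 100), nn.BatchNorm1d(100), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(100, num_classes),                        # Softmax applied in the loss
        )

    def forward(self, x):                      # x: (batch, 1, 2048)
        x = self.features(x)
        return self.classifier(x.flatten(1))

model = OneCNN(num_classes=12)
print(model(torch.randn(4, 1, 2048)).shape)    # torch.Size([4, 12])
```

Applying Softmax inside a cross-entropy loss rather than as an explicit layer is a common implementation choice and does not change the architecture described in Table 1.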
To implement transfer learning, we pre-train the 1CNN model on a source domain with ample data, then fine-tune it for a target domain with limited samples. Specifically, we freeze the parameters of modules M1 to M5 and adjust only the higher layers in M6 using a small learning rate. This approach allows the model to adapt to new conditions without overfitting, making it suitable for diagnosing faults in bevel gears under variable speeds.
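A minimal sketch of this freezing-and-fine-tuning step is shown below, reusing the hypothetical OneCNN class from the previous sketch; the checkpoint path and the reduced learning rate of 1e-4 are illustrative assumptions.

```python
# Freeze the pre-trained feature modules M1-M5 and fine-tune only M6.
import torch

model = OneCNN(num_classes=12)
state = torch.load("1cnn_source_domain.pt")   # hypothetical source-domain checkpoint
model.load_state_dict(state)

for p in model.features.parameters():         # M1-M5: keep source-domain features fixed
    p.requires_grad = False

# Small learning rate, applied only to the higher layers in M6.
optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-4)
```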
We validated our method through experiments on a mechanical fault simulation testbed designed for bevel gears. The setup included a drive system, a bevel gearbox, and sensors to capture vibration data. The bevel gears tested had 18 teeth on the driving gear and 27 teeth on the driven gear, simulating common industrial configurations. Faults were introduced via electrical discharge machining to create conditions such as broken teeth and missing teeth, alongside healthy gears. Vibration signals were acquired using IEPE accelerometers mounted on the gearbox, with a sampling frequency of 20 kHz. Tests were conducted under four speed conditions, controlled by varying the inverter frequency (10 Hz, 20 Hz, 30 Hz, and 40 Hz), with a constant load applied via a magnetic brake.

Data collection involved recording one-minute vibration sequences for each fault type under each speed condition, resulting in 12 datasets of 1,200,000 data points each. To address the challenge of limited samples, we employed a sliding window overlapping sampling technique to augment the dataset. This method generates multiple samples from a continuous signal by shifting a window with a specified stride. The relationship between the total data points \(S\), the number of samples \(N\), and the stride \(K\) is given by:
$$S = (N - 1)K + 2048$$
Here, each sample consists of 2048 data points, corresponding to approximately two revolutions of the bevel gear, ensuring that periodic fault characteristics are captured. By varying the stride \(K\), we created datasets with different sample sizes, enabling us to study the impact of data quantity on diagnosis accuracy. For instance, a stride of 100 yields roughly 11,980 overlapping samples per one-minute record, whereas a stride of 1600 yields only about 749. The datasets were split into training and testing sets, with details provided in Table 2.
| Condition (Frequency) | State | Training Samples | Testing Samples | Label |
|---|---|---|---|---|
| A (10 Hz) | Normal / Missing Tooth / Broken Tooth | 400 / 400 / 400 | 80 / 80 / 80 | 0 / 1 / 2 |
| B (20 Hz) | Normal / Missing Tooth / Broken Tooth | 400 / 400 / 400 | 80 / 80 / 80 | 3 / 4 / 5 |
| C (30 Hz) | Normal / Missing Tooth / Broken Tooth | 400 / 400 / 400 | 80 / 80 / 80 | 6 / 7 / 8 |
| D (40 Hz) | Normal / Missing Tooth / Broken Tooth | 400 / 400 / 400 | 80 / 80 / 80 | 9 / 10 / 11 |
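As a concrete illustration of the overlapping sampling described above, the sketch below slices a one-minute record into 2048-point windows. NumPy and the synthetic signal are our own choices; the window length and strides follow the text.

```python
# Sliding-window overlapping sampling implementing S = (N - 1)K + 2048.
import numpy as np

def sliding_windows(signal, window=2048, stride=100):
    """Split a 1-D vibration record into overlapping samples of `window` points."""
    n = (len(signal) - window) // stride + 1          # N from S = (N - 1)K + window
    return np.stack([signal[i * stride : i * stride + window] for i in range(n)])

record = np.random.randn(1_200_000)                   # one-minute record at 20 kHz (synthetic)
print(sliding_windows(record, stride=100).shape)      # (11980, 2048)
print(sliding_windows(record, stride=1600).shape)     # (749, 2048)
```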
We trained the 1CNN model on the source-domain data (e.g., condition B) and then transferred it to the target domain (e.g., condition A) for fault diagnosis. The model was optimized using the Adam algorithm with a batch size of 32, an initial learning rate of 0.001, and 100 epochs. To evaluate performance, we conducted 10 independent experiments and reported the average accuracy on the test sets. The results demonstrated that overlapping sampling significantly improved fault recognition rates compared with non-overlapping sampling: with a stride of 100, for example, accuracy increased by up to 12.85% under variable speed conditions, as shown in Figure 6. This improvement is attributed to the augmented samples, which expose the CNN to more diverse patterns and thereby improve generalization.
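A sketch of this training protocol is given below. The optimizer settings follow the text; the DataLoader construction, the placeholder tensors, and the use of cross-entropy loss (which applies Softmax internally) are assumptions for illustration, and OneCNN refers to the earlier sketch.

```python
# Source-domain training loop: Adam, batch size 32, lr 0.001, 100 epochs.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

x_train = torch.randn(1200, 1, 2048)                  # placeholder source-domain samples
y_train = torch.randint(0, 12, (1200,))               # placeholder labels
loader  = DataLoader(TensorDataset(x_train, y_train), batch_size=32, shuffle=True)

model     = OneCNN(num_classes=12)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn   = nn.CrossEntropyLoss()                     # cross-entropy over Softmax outputs

for epoch in range(100):
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()
```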
Further analysis involved visualizing the learned features using t-SNE dimensionality reduction on the fully connected layer outputs. The distributions showed clear separation between fault classes in both source and target domains, indicating that the model effectively captures discriminative features. Additionally, confusion matrices revealed high precision in classifying bevel gear faults, with broken tooth and missing tooth conditions being accurately identified. For instance, in the target domain, the model achieved an overall accuracy of 97.73%, with minimal misclassification between normal and faulty states.
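For reference, a minimal sketch of the t-SNE step: features are tapped from the second fully connected layer of the hypothetical OneCNN sketch via a forward hook and projected to two dimensions with scikit-learn's TSNE. The tooling, the hook mechanism, and the perplexity value are our assumptions, not specified by the study.

```python
# Project fully connected layer outputs to 2-D with t-SNE for visualization.
import torch
from sklearn.manifold import TSNE

feats = []
hook = model.classifier[4].register_forward_hook(        # FC(1000 -> 100) in the sketch above
    lambda module, inp, out: feats.append(out.detach()))

with torch.no_grad():
    model.eval()
    model(torch.randn(240, 1, 2048))                     # placeholder test samples
hook.remove()

embedded = TSNE(n_components=2, perplexity=30).fit_transform(torch.cat(feats).numpy())
print(embedded.shape)                                    # (240, 2)
```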
In conclusion, our study presents a robust framework for fault diagnosis in bevel gears under variable speed conditions with limited samples. The 1CNN model, combined with overlapping sampling and transfer learning, enables deep feature extraction and adaptive model tuning, addressing the challenges of data scarcity and non-stationary signals. Experimental results confirm that this approach enhances diagnosis accuracy without requiring extensive computational resources, making it suitable for real-world applications where bevel gears are critical components. Future work could explore optimizing the sliding stride for specific operational contexts and extending the method to other types of gear systems.
