In modern industrial systems, bevel gearboxes are critical components within the transmission assemblies of complex electromechanical equipment such as helicopters, wind turbines, and nuclear power units. They are frequently subjected to harsh operating conditions characterized by high temperatures and heavy loads. Over time, these conditions inevitably lead to various failures, including tooth surface damage, missing teeth, and spalling. A malfunction in a bevel gearbox can severely impact the operational state of the entire machinery, resulting in unplanned downtime, reduced efficiency, increased production costs, and potentially serious safety incidents. Therefore, developing effective fault diagnosis methods for bevel gearboxes is of paramount practical significance.
The rapid advancement of artificial intelligence has propelled data-driven machine learning methods to the forefront of fault diagnosis research. Numerous powerful pattern recognition techniques have been successfully applied in this field, such as Support Vector Machines (SVM), Extreme Learning Machines (ELM), and Geometric Model-based Classification (GMC) models. Among GMC models, convex hull-based classification has garnered particular attention due to its solid geometric and statistical foundations, good generalization capability, and relatively straightforward computation compared to other geometric models like affine hulls or hyperdisks. The core idea is to represent the distribution of samples from each class by their convex hull and then find a separating hyperplane with the maximum margin between these hulls.

However, the practical application of these models for diagnosing faults in bevel gearboxes faces two significant challenges. First, vibration signals collected from bevel gearboxes under real operating conditions are inevitably contaminated with noise and may contain outliers. Standard convex hull models are highly sensitive to such disturbances, as a single outlier can drastically stretch and distort the hull boundary, compromising the model’s robustness. Second, bevel gearboxes predominantly operate under healthy conditions, making fault samples scarce and difficult to obtain. This leads to class-imbalanced datasets where the majority class (healthy state) vastly outnumbers the minority classes (various fault states). Traditional convex hull classifiers tend to bias the decision hyperplane toward the minority class in such scenarios, degrading diagnostic performance.
To address these critical issues, we propose a Robustness Imbalanced Convex Hull-based Classification (RICHC) model for the intelligent fault diagnosis of bevel gearboxes. The proposed RICHC model introduces two key innovations to enhance the standard convex hull framework.
1. The Proposed RICHC Methodology
1.1 Robust Convex Hull with Confidence Weighting
To improve robustness against noise and outliers, we assign a confidence weight to each training sample based on its relative position within the class distribution. The intuition is that samples closer to the centroid of the class are more representative and reliable, while potential outliers lying far from the center should have less influence on shaping the convex hull. For a sample $\mathbf{z}_i$ belonging to a class, its confidence coefficient $\tau_i$ is defined by a sigmoid function of its distance to the class centroid $\mathbf{cent}$:
$$
\tau(\mathbf{z}_i) = \frac{1}{1 + \exp(\gamma \|\mathbf{z}_i - \mathbf{cent}\|^2)}, \quad \gamma \in (0,1]
$$
where $\gamma$ is a confidence factor, and $\mathbf{cent} = \frac{1}{n}\sum_{i=1}^{n} \mathbf{z}_i$ is the centroid of the $n$ samples in that class. This function maps the squared distance to a value in $(0, 0.5]$: a sample located at the centroid receives the maximum weight of 0.5, and the weight decays toward 0 as the distance grows. The weighted convex hull for a set of samples $\Theta = \{\mathbf{z}_i\}_{i=1}^{n}$ is then defined as:
$$
C(\Theta) = \left\{ \sum_{i=1}^{n} \tau_i \beta_i \mathbf{z}_i \ \middle|\ \sum_{i=1}^{n} \beta_i = 1,\ 0 \leq \beta_i \leq 1 \right\}
$$
This formulation reduces the contribution of noisy and outlying samples, leading to tighter and more accurate hull boundaries that better represent the true data distribution of the bevel gearbox health states.
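The confidence weighting above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the authors' implementation: the function names `centroid` and `confidence_weights` and the toy 2-D data are assumptions introduced here, and $\gamma = 0.2$ is used only because it is the value reported later in the experiments.

```python
import math

def centroid(samples):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(samples)
    return [sum(z[d] for z in samples) / n for d in range(len(samples[0]))]

def confidence_weights(samples, gamma=0.2):
    """tau_i = 1 / (1 + exp(gamma * ||z_i - cent||^2)) for the samples of one class."""
    cent = centroid(samples)
    weights = []
    for z in samples:
        sq_dist = sum((a - b) ** 2 for a, b in zip(z, cent))
        # math.exp overflows for very large arguments; the weight is then effectively 0.
        arg = gamma * sq_dist
        weights.append(0.0 if arg > 700.0 else 1.0 / (1.0 + math.exp(arg)))
    return weights

# Hypothetical 2-D class: four clustered points and one distant point acting as an outlier.
cluster = [[0.0, 0.0], [0.2, 0.1], [-0.1, 0.2], [0.1, -0.2], [6.0, 6.0]]
tau = confidence_weights(cluster, gamma=0.2)
```

In this toy example the distant point receives a weight several orders of magnitude smaller than the clustered points, so its term $\tau_i \beta_i \mathbf{z}_i$ contributes very little to any point of the weighted hull. Note that the centroid itself is still computed from all samples, including the outlier.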
1.2 Adaptive Scaling Strategy for Class Imbalance
To handle the class imbalance problem commonly encountered when diagnosing faults in bevel gearboxes, we introduce an adaptive scaling constraint on the convex hulls. The sum-to-one constraint on the combination coefficients $\beta_i$ is kept, while their upper bounds are modified. For the positive class (e.g., a specific fault) with $n_+$ samples and the negative class (e.g., healthy state) with $n_-$ samples, the constraints become:
For the positive class hull:
$$ 0 \leq \beta_i^+ \leq 1 - \mu \rho, \quad \sum_{i=1}^{n_+} \beta_i^+ = 1 $$
For the negative class hull:
$$ 0 \leq \beta_j^- \leq 1 + \mu \rho, \quad \sum_{j=1}^{n_-} \beta_j^- = 1 $$
Here, $\mu$ is a scaling coefficient, and $\rho$ is a dynamic imbalance factor defined as:
$$
\rho = \begin{cases}
\sqrt{n_- / n_+}, & \text{if } n_+ < n_- \quad \text{(Positive class is minority)} \\
-\sqrt{n_+ / n_-}, & \text{if } n_+ > n_- \quad \text{(Positive class is majority)}
\end{cases}
$$
This strategy adjusts the upper bounds of the coefficients according to the relative sizes of the classes. When the positive class is the minority, $\rho > 0$, so the bound on $\beta_i^+$ becomes $1 - \mu\rho$ and the bound on $\beta_j^-$ becomes $1 + \mu\rho$; when the positive class is the majority, $\rho < 0$ and the two bounds swap roles. The intent is to let the minority-class hull expand slightly while the majority-class hull is constrained to shrink. This counteracts the inherent bias of the maximum-margin hyperplane towards the minority class, pulling the decision boundary to a more equitable position.
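As a worked illustration of the imbalance factor and the resulting coefficient bounds, the sketch below evaluates $\rho$ and the two upper bounds for a hypothetical split of 5 positive and 50 negative samples with $\mu = 0.0625$. The function names are assumptions introduced here, and returning $\rho = 0$ for equally sized classes is also an assumption, since the definition above does not cover that case.

```python
import math

def imbalance_factor(n_pos, n_neg):
    """Dynamic imbalance factor rho from the piecewise definition above."""
    if n_pos < n_neg:          # positive class is the minority
        return math.sqrt(n_neg / n_pos)
    if n_pos > n_neg:          # positive class is the majority
        return -math.sqrt(n_pos / n_neg)
    return 0.0                 # assumption: balanced case is not defined in the text

def coefficient_upper_bounds(n_pos, n_neg, mu):
    """Upper bounds (1 - mu*rho, 1 + mu*rho) on beta^+ and beta^-."""
    rho = imbalance_factor(n_pos, n_neg)
    return 1.0 - mu * rho, 1.0 + mu * rho

rho = imbalance_factor(5, 50)                             # sqrt(10), about 3.1623
ub_pos, ub_neg = coefficient_upper_bounds(5, 50, 0.0625)  # about 0.8024 and 1.1976
```

One practical point when choosing $\mu$: because the coefficients of each class must still sum to one, the positive-class bound has to satisfy $n_+ (1 - \mu\rho) \geq 1$, otherwise the constraint set is empty.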
1.3 RICHC Model Formulation
For a binary classification problem with positive samples $\Theta^+ = \{\mathbf{z}_i^+\}_{i=1}^{n_+}$ and negative samples $\Theta^- = \{\mathbf{z}_j^-\}_{j=1}^{n_-}$, the RICHC model finds the closest points between the two adapted, weighted convex hulls. This is formulated as the following quadratic programming problem:
$$
\begin{aligned}
\min_{\boldsymbol{\beta}} & \quad \left\| \sum_{i=1}^{n_+} \tau_i^+ \beta_i^+ \mathbf{z}_i^+ - \sum_{j=1}^{n_-} \tau_j^- \beta_j^- \mathbf{z}_j^- \right\|^2 \\
\text{s.t.} & \quad \sum_{i=1}^{n_+} \beta_i^+ = 1, \quad 0 \leq \beta_i^+ \leq 1 - \mu \rho \\
& \quad \sum_{j=1}^{n_-} \beta_j^- = 1, \quad 0 \leq \beta_j^- \leq 1 + \mu \rho
\end{aligned}
$$
Solving this optimization yields the optimal combination coefficients $\boldsymbol{\beta}^*$. The normal vector $\mathbf{w}^*$ and bias $b^*$ of the separating hyperplane are then derived from these coefficients and the weighted support vectors. To further enhance robustness in decision-making, the model’s output is mapped to a posterior probability using a sigmoid function, rather than relying on a hard sign function. This probabilistic output is more stable in the presence of noise.
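To make the optimization concrete, here is a minimal sketch that solves the nearest-point problem for the linear (non-kernel) case with a Frank-Wolfe iteration. Several things are assumptions made for illustration and are not taken from the text: the choice of Frank-Wolfe as solver, the hyperplane taken as the perpendicular bisector of the two nearest points ($\mathbf{w} = \mathbf{p} - \mathbf{q}$, $b = -\tfrac{1}{2}\mathbf{w}^\top(\mathbf{p} + \mathbf{q})$), the unscaled logistic used for the posterior, and all function names.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def combine(points, coeffs):
    """Linear combination sum_i coeffs[i] * points[i]."""
    return [sum(c * p[k] for c, p in zip(coeffs, points)) for k in range(len(points[0]))]

def capped_simplex_vertex(scores, cap):
    """Minimise sum(scores[i] * beta[i]) s.t. sum(beta) = 1 and 0 <= beta[i] <= cap."""
    if cap * len(scores) < 1.0:
        raise ValueError("upper bound too small for the sum-to-one constraint")
    beta, remaining = [0.0] * len(scores), 1.0
    for i in sorted(range(len(scores)), key=lambda i: scores[i]):
        beta[i] = min(cap, remaining)   # fill the cheapest coordinates first
        remaining -= beta[i]
        if remaining <= 0.0:
            break
    return beta

def nearest_points(pos, neg, tau_pos, tau_neg, cap_pos=1.0, cap_neg=1.0, iters=1000):
    """Frank-Wolfe on || sum tau+ beta+ z+ - sum tau- beta- z- ||^2 (linear case only)."""
    wp = [[t * x for x in z] for t, z in zip(tau_pos, pos)]   # tau_i * z_i
    wn = [[t * x for x in z] for t, z in zip(tau_neg, neg)]
    bp = capped_simplex_vertex([0.0] * len(wp), cap_pos)      # any feasible start
    bn = capped_simplex_vertex([0.0] * len(wn), cap_neg)
    for _ in range(iters):
        p, q = combine(wp, bp), combine(wn, bn)
        d = [a - b for a, b in zip(p, q)]
        # Linear minimisation step: the gradient is proportional to +w.d (pos) and -w.d (neg).
        sp = capped_simplex_vertex([dot(w, d) for w in wp], cap_pos)
        sn = capped_simplex_vertex([-dot(w, d) for w in wn], cap_neg)
        ps, qs = combine(wp, sp), combine(wn, sn)
        u = [(a - b) - (c - e) for a, b, c, e in zip(ps, p, qs, q)]
        uu = dot(u, u)
        if uu < 1e-18:
            break
        step = max(0.0, min(1.0, -dot(d, u) / uu))            # exact line search on [0, 1]
        if step == 0.0:
            break
        bp = [b + step * (s - b) for b, s in zip(bp, sp)]
        bn = [b + step * (s - b) for b, s in zip(bn, sn)]
    return combine(wp, bp), combine(wn, bn), bp, bn

def hyperplane(p, q):
    """Assumed bisector hyperplane: w = p - q, passing through the midpoint of p and q."""
    w = [a - b for a, b in zip(p, q)]
    return w, -0.5 * dot(w, [a + c for a, c in zip(p, q)])

def posterior(x, w, b):
    """Assumed unscaled logistic mapping of the decision value to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(dot(w, x) + b)))

# Hypothetical, linearly separable toy data with all confidence weights set to 1.
pos = [[3.0, 0.0], [2.0, 1.0], [2.0, -1.0]]
neg = [[-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
p, q, beta_pos, beta_neg = nearest_points(pos, neg, [1.0] * 3, [1.0] * 3)
w, b = hyperplane(p, q)
```

For this toy data the two hulls are separated by the strip $0 < x_1 < 2$, so the sketch returns a hyperplane at $x_1 = 1$. A general-purpose QP solver would be the more usual choice in practice; Frank-Wolfe is used here only because its linear subproblem over the box-constrained simplex has a simple greedy solution.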
For multi-class fault diagnosis of bevel gearboxes, the standard “one-vs-one” strategy is employed to extend the binary RICHC classifier.
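The one-vs-one extension can be sketched as follows: one binary classifier is trained for every pair of classes, giving $c(c-1)/2$ classifiers for $c$ classes, and a test sample is assigned to the class that collects the most pairwise votes. The sketch is generic; `centroid_bisector` is a hypothetical stand-in for a trained binary RICHC classifier, and the class names and data are invented for illustration.

```python
from collections import Counter
from itertools import combinations

def train_one_vs_one(data_by_class, train_binary):
    """data_by_class maps label -> list of feature vectors.
    train_binary(pos, neg) must return score(x), positive when x resembles pos."""
    return {(a, b): train_binary(data_by_class[a], data_by_class[b])
            for a, b in combinations(sorted(data_by_class), 2)}

def predict_one_vs_one(models, x):
    """Majority vote over all pairwise classifiers (ties resolved by insertion order)."""
    votes = Counter(a if score(x) > 0 else b for (a, b), score in models.items())
    return votes.most_common(1)[0][0]

def centroid_bisector(pos, neg):
    """Hypothetical stand-in binary learner: bisector of the two class centroids."""
    dim = len(pos[0])
    cp = [sum(z[k] for z in pos) / len(pos) for k in range(dim)]
    cn = [sum(z[k] for z in neg) / len(neg) for k in range(dim)]
    w = [a - b for a, b in zip(cp, cn)]
    b = -sum(wi * (a + c) / 2.0 for wi, a, c in zip(w, cp, cn))
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) + b

# Invented 2-D data for three health states.
data = {
    "normal": [[0.0, 0.0], [0.1, 0.1]],
    "crack": [[5.0, 0.0], [5.1, 0.1]],
    "pitting": [[0.0, 5.0], [0.1, 5.1]],
}
models = train_one_vs_one(data, centroid_bisector)
label = predict_one_vs_one(models, [4.8, 0.2])
```

Replacing `centroid_bisector` with a function that trains a binary RICHC model and returns its decision function would give the multi-class scheme described above.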
2. Fault Diagnosis Framework for Bevel Gears
The overall framework for applying RICHC to diagnose faults in bevel gearboxes consists of the following steps, which are designed to handle the specific challenges of vibration analysis from these components.
- Signal Acquisition: Collect vibration signals from the bevel gearbox under different health states (healthy and various fault conditions).
- Feature Extraction: From each vibration signal sample, extract a comprehensive set of statistical features that characterize changes in the time and frequency domains induced by faults in the bevel gears. This typically includes features like mean, variance, kurtosis, skewness in the time domain, and spectral mean, spectral kurtosis, etc., in the frequency domain. A set of 22 such features is commonly used.
- Feature Selection: To reduce redundancy and improve computational efficiency, select the most discriminative features using a criterion like Fisher Score (FS). The FS for the $m$-th feature is calculated as:
$$ F(m) = \frac{\sum_{k=1}^{c} n_k (v_k^m - v^m)^2}{\sum_{k=1}^{c} n_k (s_k^m)^2} $$
where $c$ is the number of classes (fault states), $n_k$ is the number of samples in class $k$, $v_k^m$ and $s_k^m$ are the mean and standard deviation of the $m$-th feature in class $k$, and $v^m$ is the overall mean. Features with higher $F(m)$ are more sensitive to the different states of the bevel gearbox.
- Model Training: Train the RICHC model using the selected features from the training dataset. The optimal hyperparameters (confidence factor $\gamma$, scaling coefficient $\mu$, and kernel parameter if using a nonlinear kernel) are determined via cross-validation.
- Testing and Diagnosis: Evaluate the trained RICHC model on the testing dataset to diagnose the health state of the bevel gearbox.
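
The feature-selection step in the list above can be illustrated with a short Fisher Score sketch. It follows the formula for $F(m)$ directly; the use of the population standard deviation for $s_k^m$, the function names, and the toy data are assumptions made for illustration.

```python
def fisher_score(features_by_class, m):
    """Fisher Score of feature index m; features_by_class maps label -> feature vectors."""
    all_vals = [z[m] for rows in features_by_class.values() for z in rows]
    overall_mean = sum(all_vals) / len(all_vals)
    between = within = 0.0
    for rows in features_by_class.values():
        vals = [z[m] for z in rows]
        n_k = len(vals)
        mean_k = sum(vals) / n_k
        var_k = sum((v - mean_k) ** 2 for v in vals) / n_k   # assumption: population variance
        between += n_k * (mean_k - overall_mean) ** 2
        within += n_k * var_k
    return between / within if within > 0 else float("inf")

def top_features(features_by_class, count):
    """Indices of the `count` features with the highest Fisher Score."""
    dim = len(next(iter(features_by_class.values()))[0])
    ranked = sorted(range(dim), key=lambda m: fisher_score(features_by_class, m), reverse=True)
    return ranked[:count]

# Toy data: feature 0 separates the classes, feature 1 does not.
toy = {
    "normal": [[1.0, 0.3], [1.2, 0.9], [0.8, 0.5]],
    "fault": [[3.0, 0.4], [3.2, 0.8], [2.8, 0.6]],
}
best = top_features(toy, 1)
```

On the toy data the first feature scores 37.5 while the second scores well below 1, so only the first would be retained.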
3. Experimental Validation and Analysis
3.1 Bevel Gearbox Dataset and Feature Selection
The proposed method was validated using vibration data from a bevel gearbox test rig. The gearbox had a 12-tooth input pinion and a 24-tooth output bevel gear. Faults were introduced on the input support bearing and on the bevel gears themselves, resulting in 7 health states: Normal, Bearing Inner Race Fault (0.4 mm width), Bearing Outer Race Fault (0.2 mm width), Bevel Gear Missing Tooth, Bevel Gear Crack (1 mm depth), Bevel Gear Pitting (1.25 mm radius), and Bevel Gear Chipping. For each state, 100 samples were collected, each with 2048 data points at a sampling frequency of 10,240 Hz.
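A few derived quantities follow from the figures quoted above. They are simple arithmetic on the stated values rather than results reported in the study; the symbols $f_s$, $\Delta f$, $T$, $z_{\text{gear}}$ and $z_{\text{pinion}}$ are notation introduced here, and $\Delta f$ assumes the full 2048-point record is transformed without zero-padding:
$$
\Delta f = \frac{f_s}{N} = \frac{10240\ \text{Hz}}{2048} = 5\ \text{Hz}, \qquad T = \frac{N}{f_s} = \frac{2048}{10240\ \text{Hz}} = 0.2\ \text{s}, \qquad \frac{z_{\text{gear}}}{z_{\text{pinion}}} = \frac{24}{12} = 2
$$
Each sample therefore covers 0.2 s of vibration, the output gear turns at half the speed of the input pinion, and the full dataset contains $7 \times 100 = 700$ samples.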
After extracting the initial 22 time and frequency domain features, Fisher Score was applied. The diagnostic accuracy of RICHC was tested with different numbers of top-ranked features. The highest accuracy was achieved using the top 10 features. The selected features are listed in the table below.
| Index | Feature Description (Formula) |
|---|---|
| F1 | Mean: $\frac{1}{N}\sum_{i=1}^{N} s_i$ |
| F2 | Square Mean Root: $\left( \frac{1}{N}\sum_{i=1}^{N} \sqrt{\lvert s_i \rvert} \right)^2$ |
| F3 | Root Mean Square: $\sqrt{ \frac{1}{N}\sum_{i=1}^{N} s_i^2 }$ |
| F4 | Variance: $\frac{1}{N-1}\sum_{i=1}^{N} (s_i - F1)^2$ |
| F7 | Skewness: $\frac{1}{N}\sum_{i=1}^{N} \left( \frac{s_i - F1}{\text{std}(s)} \right)^3$ |
| F14 | Frequency Center: $\frac{1}{M}\sum_{k=1}^{M} p_k$ |
| F15 | Mean Square Frequency: $\frac{1}{F14}\sum_{k=1}^{M} (p_k - F14)^2$ |
| F19 | Frequency Variance: $\frac{\sum_{k=1}^{M} (f_k - F20)^2 p_k}{\sum_{k=1}^{M} p_k}$ |
| F20 | Root Mean Square Frequency: $\sqrt{ \frac{\sum_{k=1}^{M} f_k^2 p_k}{\sum_{k=1}^{M} p_k} }$ |
| F22 | Spectral Kurtosis: $\frac{\sum_{k=1}^{M} (f_k - F20)^4 p_k}{M \cdot (F19)^2}$ |
Where $s_i$ is the time-domain signal, $N$ is its length, $p_k$ is the $k$-th spectral line magnitude, $f_k$ is its corresponding frequency, and $M$ is the number of spectral lines.
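The time-domain entries of the table (F1, F2, F3, F4 and F7) translate directly into code. The sketch below is a plain-Python illustration; taking $\text{std}(s)$ as the square root of F4 is an assumption, since the table does not say which normalization is used, and the function name is hypothetical.

```python
import math

def time_domain_features(s):
    """Time-domain features F1, F2, F3, F4 and F7 of a signal s (list of floats)."""
    n = len(s)
    f1 = sum(s) / n                                        # mean
    f2 = (sum(math.sqrt(abs(x)) for x in s) / n) ** 2      # square mean root
    f3 = math.sqrt(sum(x * x for x in s) / n)              # root mean square
    f4 = sum((x - f1) ** 2 for x in s) / (n - 1)           # variance (N - 1 denominator)
    std = math.sqrt(f4)                                    # assumption: std(s) = sqrt(F4)
    f7 = sum(((x - f1) / std) ** 3 for x in s) / n if std > 0 else 0.0  # skewness
    return {"F1": f1, "F2": f2, "F3": f3, "F4": f4, "F7": f7}

# Small hand-checkable signal: F1 = 1, F2 = 0.25, F3 = 2, F4 = 4, F7 = 0.75.
feats = time_domain_features([0.0, 0.0, 0.0, 4.0])
```

The frequency-domain features would be computed in the same way from the spectral magnitudes $p_k$ and frequencies $f_k$ once a spectrum of the sample is available.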
3.2 Comparative Analysis with State-of-the-Art Methods
The performance of RICHC was compared against several established and recent methods: Support Vector Machine (SVM), Extreme Learning Machine (ELM), Maximum Margin Classification based on Convex Hulls (MMCCH), Extensible and Displaceable Hyperdisk (EDHD), and Minimal Error Convex Hull Approximation (MECHA). For all models, hyperparameters were optimized via 5-fold cross-validation. The optimal parameters found for RICHC were: confidence factor $\gamma=0.2$, scaling coefficient $\mu=0.0625$, and Gaussian kernel parameter $\varepsilon=1$.
In each trial, 50 samples per state were randomly selected for training, and the remaining 50 for testing. The experiment was repeated 10 times independently. The average diagnostic accuracy, standard deviation, and training time are summarized below.
| Model | Average Accuracy (%) | Std. Deviation (%) | Avg. Training Time (s) |
|---|---|---|---|
| SVM | 96.70 | 1.01 | 0.7105 |
| ELM | 97.10 | 0.49 | 0.1563 |
| MMCCH | 97.85 | 0.72 | 0.9265 |
| EDHD | 97.74 | 0.55 | 2.2857 |
| MECHA | 97.47 | 0.81 | 3.1740 |
| RICHC | 99.20 | 0.41 | 0.8816 |
The results show that RICHC achieves the highest average diagnosis accuracy (99.20%) and the lowest standard deviation (0.41%), indicating superior and more stable performance for bevel gearbox fault diagnosis. Its training time (0.8816 s) is comparable to SVM and MMCCH and well below EDHD and MECHA, although ELM trains fastest; this makes RICHC suitable for practical applications.
3.3 Robustness Test Against Noise and Outliers
To evaluate robustness, Gaussian white noise was added to the raw vibration signals to produce signal-to-noise ratios (SNR) from 0 dB to 8 dB in 2 dB steps. The average accuracies over 10 trials are shown below.
Diagnosis accuracy (%) at different noise levels (SNR):
| Model | 0 dB | 2 dB | 4 dB | 6 dB | 8 dB |
|---|---|---|---|---|---|
| SVM | 86.14 | 91.15 | 91.35 | 92.56 | 94.69 |
| ELM | 87.53 | 88.03 | 89.24 | 92.21 | 93.60 |
| MMCCH | 88.96 | 91.14 | 90.95 | 93.11 | 95.00 |
| EDHD | 86.44 | 89.63 | 90.23 | 92.33 | 92.54 |
| MECHA | 89.11 | 90.25 | 91.75 | 92.47 | 95.11 |
| RICHC | 92.53 | 94.65 | 94.90 | 96.90 | 97.80 |
RICHC consistently outperforms all other models at every noise level. Crucially, at the severe 0 dB noise condition, only RICHC maintains an accuracy above 90%, demonstrating its exceptional noise immunity due to the confidence weighting mechanism.
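The noise-injection step used in this test can be sketched as follows. The text does not state how signal power was measured, so the sketch assumes the mean square of the clean record; the 160 Hz test tone, the fixed random seed, and the function names are likewise illustrative assumptions.

```python
import math
import random

def mean_power(x):
    """Mean-square value of a signal."""
    return sum(v * v for v in x) / len(x)

def add_white_noise(signal, snr_db, rng):
    """Add zero-mean Gaussian white noise so that the expected SNR equals snr_db."""
    noise_power = mean_power(signal) / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [v + rng.gauss(0.0, sigma) for v in signal]

def measured_snr_db(clean, noisy):
    """Empirical SNR of a noisy record relative to its clean version."""
    noise = [b - a for a, b in zip(clean, noisy)]
    return 10.0 * math.log10(mean_power(clean) / mean_power(noise))

fs, n = 10240, 2048                      # sampling rate and record length from the dataset
clean = [math.sin(2.0 * math.pi * 160.0 * i / fs) for i in range(n)]  # hypothetical tone
rng = random.Random(0)
noisy = {snr: add_white_noise(clean, snr, rng) for snr in (0, 2, 4, 6, 8)}
```

At 0 dB the noise power equals the signal power, which is why that condition is the hardest one in the table.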
Robustness against outliers was tested by intentionally mislabeling a number $l$ of training samples from one class into another. The results for varying $l$ are summarized below.
Diagnosis accuracy (%) for different numbers $l$ of label outliers per class:
| Model | $l=2$ | $l=4$ | $l=6$ | $l=8$ | $l=10$ |
|---|---|---|---|---|---|
| SVM | 97.29 | 94.66 | 93.00 | 91.19 | 90.90 |
| ELM | 93.41 | 91.86 | 92.89 | 89.47 | 88.11 |
| MMCCH | 96.14 | 93.48 | 92.73 | 92.64 | 90.38 |
| EDHD | 97.42 | 95.75 | 93.52 | 93.05 | 91.44 |
| MECHA | 96.08 | 95.42 | 92.93 | 92.73 | 90.05 |
| RICHC | 98.43 | 97.56 | 97.05 | 95.86 | 94.74 |
Again, RICHC shows superior resilience. Even with 10 mislabeled samples per class (20% of the 50 training samples per class), it retains an accuracy of 94.74%, whereas the accuracies of the other models fall to between 88.11% and 91.44%.
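One way to generate such label outliers is sketched below. The exact protocol is not described in the text, so the choice of reassigning each selected sample to a uniformly random other class is an assumption, as are the function name and the placeholder class names.

```python
import random

def inject_label_outliers(labels, l, rng):
    """Return a copy of labels in which l samples of every class get a different class label."""
    classes = sorted(set(labels))
    noisy = list(labels)
    for c in classes:
        members = [i for i, y in enumerate(labels) if y == c]
        for i in rng.sample(members, l):          # l distinct samples of class c
            noisy[i] = rng.choice([o for o in classes if o != c])  # assumed: random other class
    return noisy

# Seven placeholder states with 50 training samples each, as in the experimental setup.
labels = [state for state in ("s1", "s2", "s3", "s4", "s5", "s6", "s7") for _ in range(50)]
noisy_labels = inject_label_outliers(labels, 10, random.Random(0))
flipped = sum(a != b for a, b in zip(labels, noisy_labels))   # 70 of 350 labels, i.e. 20%
```

Only the training labels are altered; the feature vectors and the test set stay untouched, so the drop in accuracy isolates the effect of mislabeled training samples.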
3.4 Performance on Imbalanced Datasets
To simulate the realistic scenario of scarce fault data for bevel gearboxes, several imbalanced datasets were constructed. The normal state was treated as the majority class with 50 samples. Different fault states were used as the minority class with a small number of samples, defining the imbalance ratio $k = N_{majority} / N_{minority}$. The performance was evaluated using standard imbalanced classification metrics: Accuracy, G-mean ($\sqrt{TPR \times TNR}$), and F-value ($\frac{2 \times TPR \times PPV}{TPR + PPV}$), where TPR is True Positive Rate, TNR is True Negative Rate, and PPV is Positive Predictive Value.
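The metrics defined above can be computed from predicted and true labels as in the following sketch. Treating the minority fault class as the positive class is an assumption consistent with the earlier convention, and the function name and toy predictions are invented for illustration.

```python
import math

def imbalance_metrics(y_true, y_pred, positive):
    """Accuracy, G-mean and F-value with `positive` as the positive (minority) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    tpr = tp / (tp + fn) if tp + fn else 0.0    # true positive rate (recall)
    tnr = tn / (tn + fp) if tn + fp else 0.0    # true negative rate
    ppv = tp / (tp + fp) if tp + fp else 0.0    # positive predictive value (precision)
    f_value = 2.0 * tpr * ppv / (tpr + ppv) if tpr + ppv else 0.0
    return {"accuracy": (tp + tn) / len(pairs),
            "g_mean": math.sqrt(tpr * tnr),
            "f_value": f_value}

# Toy case with imbalance ratio 10: 5 fault and 50 normal samples,
# one missed fault and one false alarm.
y_true = ["fault"] * 5 + ["normal"] * 50
y_pred = ["fault"] * 4 + ["normal"] + ["fault"] + ["normal"] * 49
metrics = imbalance_metrics(y_true, y_pred, positive="fault")
```

In this toy case accuracy is about 96.4% even though one in five faults is missed, while G-mean (about 88.5%) and F-value (80%) expose the weaker minority-class performance, which is why these two metrics are reported for the imbalanced datasets.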
| Dataset | Minority Fault (Bevel Gear / Bearing) | Minority Samples | Imbalance Ratio (k) |
|---|---|---|---|
| A | Missing Tooth | 5 | 10.00 |
| B | Chipping | 10 | 5.00 |
| C | Crack | 15 | 3.33 |
| D | Inner Race Fault | 20 | 2.50 |
| E | Outer Race Fault | 25 | 2.00 |
The average G-mean and F-value over 10 trials for each model are presented below.
G-mean (%) on the imbalanced datasets:
| Model | A ($k=10$) | B ($k=5$) | C ($k=3.33$) | D ($k=2.5$) | E ($k=2.0$) |
|---|---|---|---|---|---|
| SVM | 91.41 | 92.23 | 97.54 | 88.77 | 93.17 |
| ELM | 92.88 | 94.57 | 96.16 | 98.29 | 97.89 |
| MMCCH | 92.18 | 95.96 | 96.58 | 98.99 | 97.98 |
| EDHD | 93.78 | 95.88 | 97.00 | 98.58 | 98.18 |
| MECHA | 92.52 | 92.75 | 87.19 | 92.96 | 91.32 |
| RICHC | 96.78 | 98.69 | 99.07 | 99.30 | 99.80 |

F-value (%) on the imbalanced datasets:
| Model | A ($k=10$) | B ($k=5$) | C ($k=3.33$) | D ($k=2.5$) | E ($k=2.0$) |
|---|---|---|---|---|---|
| SVM | 92.20 | 91.76 | 91.77 | 98.00 | 88.90 |
| ELM | 91.50 | 92.92 | 93.70 | 97.60 | 98.28 |
| MMCCH | 92.70 | 92.23 | 94.53 | 98.20 | 99.01 |
| EDHD | 93.70 | 93.76 | 96.41 | 98.01 | 98.59 |
| MECHA | 92.50 | 92.59 | 93.07 | 88.35 | 92.98 |
| RICHC | 96.80 | 96.79 | 98.90 | 99.27 | 99.30 |
RICHC achieves the highest G-mean and F-value across all imbalance ratios. Its performance remains exceptionally high even at the most challenging imbalance ratio of k=10, demonstrating the effectiveness of the adaptive scaling strategy in handling the class imbalance problem inherent in fault diagnosis for bevel gearboxes.
4. Conclusion
This work presented a Robustness Imbalanced Convex Hull-based Classification (RICHC) model specifically designed to address the key practical challenges in intelligent fault diagnosis for bevel gearboxes. The two core innovations are the integration of a confidence weighting function and an adaptive hull scaling strategy.
The confidence weighting mechanism significantly enhances model robustness. By assigning lower weights to samples that are potential outliers or heavily corrupted by noise, based on their distance from the class centroid, the RICHC model constructs tighter and more accurate convex hull boundaries. Experimental results on bevel gearbox data under strong noise (0 dB SNR) and significant label contamination confirm its superior anti-interference capability compared to state-of-the-art methods.
The adaptive scaling strategy effectively mitigates the classifier bias caused by imbalanced training data, a common situation when fault samples for bevel gears are scarce. By dynamically adjusting the expansion and shrinkage of the convex hulls based on the relative sample sizes of the classes, the model finds a more equitable and accurate separating hyperplane. Validation on datasets with high imbalance ratios shows that RICHC maintains excellent diagnostic performance where other models degrade.
In summary, the proposed RICHC model offers a comprehensive solution that combines high diagnostic accuracy, strong robustness against real-world signal impurities, and effective handling of class imbalance. This makes it a highly suitable and reliable approach for the intelligent fault diagnosis of critical components like bevel gearboxes in industrial applications.
