If we are most interested in the employees from Level 1, the first model might be a good choice! It makes a few confident predictions with high probabilities. For example, at the 0.6 threshold, it has only one false positive in this group.
If we want to predict resignations in Level 3, the second model looks much better.
If we want our model to work for all levels, we would probably pick the second model again. On average, it has acceptable performance for Levels 1, 2, and 3.
But what is also interesting is how both models perform on Levels 4 and 5. For all predictions made for employees in these groups, the probabilities are visibly lower than 0.5. At a 0.5 decision threshold, both models always assign a "negative" label.
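To make this kind of per-segment check concrete, here is a minimal sketch that counts true and false positives and negatives for each job level at a chosen threshold, and also records the highest predicted probability in each group. The function name `confusion_by_segment` and the toy rows are made up for illustration; they are not the data or the code behind the plots discussed here.

```python
from collections import defaultdict


def confusion_by_segment(rows, threshold=0.5):
    """Count TP/FP/FN/TN per segment at a given decision threshold.

    rows: iterable of (segment, true_label, predicted_probability).
    """
    counts = defaultdict(
        lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0, "max_proba": 0.0}
    )
    for segment, y_true, proba in rows:
        predicted_positive = proba >= threshold
        if predicted_positive and y_true:
            key = "tp"
        elif predicted_positive:
            key = "fp"
        elif y_true:
            key = "fn"
        else:
            key = "tn"
        counts[segment][key] += 1
        # Track the most confident prediction seen in this segment.
        counts[segment]["max_proba"] = max(counts[segment]["max_proba"], proba)
    return dict(counts)


# Hypothetical toy data: (job level, resigned?, predicted probability).
toy_rows = [
    ("Level 1", 1, 0.92),
    ("Level 1", 0, 0.71),
    ("Level 1", 0, 0.12),
    ("Level 4", 1, 0.31),
    ("Level 4", 0, 0.08),
    ("Level 5", 0, 0.22),
]

result = confusion_by_segment(toy_rows, threshold=0.6)
for segment in sorted(result):
    print(segment, result[segment])
```

If `max_proba` for a segment sits below the threshold, the model never predicts a positive label there, which is the situation described for Levels 4 and 5.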
If we look at the distribution of the true labels, we can see that the absolute number of resignations is pretty low in these job levels. The same was likely true of the training data, so the models did not pick up any useful patterns for these segments.
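A quick way to check the label distribution is to count the positives in each segment. The sketch below assumes the labels are available as simple `(segment, true_label)` pairs; the helper `positives_by_segment` and the toy labels are hypothetical and only illustrate the idea.

```python
from collections import Counter


def positives_by_segment(rows):
    """Return {segment: (positives, total, positive_share)}.

    rows: iterable of (segment, true_label) pairs with labels 0 or 1.
    """
    totals = Counter()
    positives = Counter()
    for segment, y_true in rows:
        totals[segment] += 1
        positives[segment] += int(y_true)
    return {
        segment: (positives[segment], totals[segment],
                  positives[segment] / totals[segment])
        for segment in totals
    }


# Hypothetical toy labels: (job level, resigned?).
toy_labels = [
    ("Level 1", 1), ("Level 1", 1), ("Level 1", 0), ("Level 1", 0),
    ("Level 4", 1), ("Level 4", 0), ("Level 4", 0), ("Level 4", 0),
    ("Level 5", 0), ("Level 5", 0),
]

summary = positives_by_segment(toy_labels)
for segment in sorted(summary):
    n_pos, n_total, share = summary[segment]
    print(f"{segment}: {n_pos} of {n_total} resigned ({share:.0%})")
```

Looking at the absolute count next to the share helps here: a segment with only a handful of positive examples gives the model very little to learn from, whatever the percentage is.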