Train a High-Quality Model

Industrial quality inspections usually have strict limits on false negative and false positive rates.

Therefore, the quality of the defect segmentation model is critical.

This section introduces the factors that most affect model quality and explains how to train a high-quality defect segmentation model.

Ensure Labeling Quality

Labeling quality is one of the most significant factors affecting model performance. In actual projects, poor labeling is the cause of more than 90% of cases of poor model performance. Therefore, if a model is not performing well, address labeling quality issues first.

Labeling quality involves four aspects: consistency, completeness, accuracy, and certainty.

  1. Consistency: Label defects consistently, and do not use different labeling methods for the same type of defect.

    • Left image, bad example: Label defects of the same type in different ways.

    • Right image, good example: Label defects of the same type in a consistent way.

    ../../../_images/label_consistency.png

    The defects here are missing parts in the welding areas. Both labeling methods are correct, but once you choose one, use it for every defect of this type.

  2. Completeness: Select every region that counts as a defect under the user-defined standard, and make sure none are missed.

    • Left image, bad example: Omit the regions that should be labeled.

    • Right image, good example: Label all necessary regions.

    ../../../_images/label_completeness.png

    Some bubbles in the example on the left were omitted.

  3. Accuracy: Make each selection as precise as possible so that its contour fits the actual defect contour, rather than covering the defect with a large, coarse selection.

    • Left image, bad example: Cover defects with a large, coarse selection.

    • Right image, good example: Make the contour of the selection fit the defect’s contour.

    ../../../_images/label_accuracy.png

    The contour of the selection should match the contour of the actual defect.

  4. Certainty: For ambiguous defects, where it is impossible to judge whether the defect criteria are met, use the mask polygon tool to cover the defect regions.

    • Left image, good example: Mask out regions containing ambiguous defects.

    • Right image, mediocre example: Leave regions where it is hard to tell whether defects are present unprocessed, so they remain exposed to the model.

    ../../../_images/certainty.png

    You can use the mask polygon tool to mask out the regions containing ambiguous defects.

Attention

If an image contains multiple defects and you cannot judge whether each of them meets the defect criteria, you can delete the image so that it does not degrade model training.
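To see why the accuracy requirement (item 3) matters, the toy sketch below counts how many labeled pixels actually belong to a defect. It is plain Python and not part of the labeling software; the scratch shape and the `rect_pixels` and `label_precision` helpers are hypothetical, and pixel sets stand in for real masks.

```python
# Toy illustration only: sets of (row, col) pixel coordinates stand in for masks.

def rect_pixels(top, left, bottom, right):
    """Return every (row, col) pixel inside an inclusive rectangle."""
    return {(r, c) for r in range(top, bottom + 1) for c in range(left, right + 1)}


def label_precision(label, defect):
    """Fraction of labeled pixels that really belong to the defect."""
    if not label:
        return 0.0
    return len(label & defect) / len(label)


# Hypothetical defect: a thin diagonal scratch, 10 pixels long and 1 pixel wide.
defect = {(i, i) for i in range(10)}

fine_label = set(defect)                # selection follows the scratch contour
coarse_label = rect_pixels(0, 0, 9, 9)  # one large box drawn around the scratch

print(f"fine:   {label_precision(fine_label, defect):.2f}")    # fine:   1.00
print(f"coarse: {label_precision(coarse_label, defect):.2f}")  # coarse: 0.10
```

With the coarse box, 90 of the 100 labeled pixels are background, so the model would be taught that this background is defective.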

Set the Proper Region of Interest (ROI)

Setting an ROI effectively eliminates background interference. Place the ROI boundary as close to the outer contours of the objects as possible.

../../../_images/roi1.png

Hint

The same ROI setting is applied to all images, so make sure the objects in every image lie within the ROI, especially when object positions or sizes vary.
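The hint above amounts to a simple geometric check. The sketch below assumes you already know an axis-aligned bounding box for the object in each image (for example, measured by hand); the `tight_roi` and `boxes_outside_roi` helpers and the `margin` parameter are hypothetical and not part of the software. It computes one ROI that encloses the object in every image and flags images whose object would fall outside a given ROI.

```python
from typing import List, Tuple

# (x_min, y_min, x_max, y_max) in pixel coordinates, inclusive.
Box = Tuple[int, int, int, int]


def tight_roi(boxes: List[Box], margin: int, width: int, height: int) -> Box:
    """Smallest box enclosing every object box, padded by `margin`, clipped to the image."""
    x0 = min(b[0] for b in boxes) - margin
    y0 = min(b[1] for b in boxes) - margin
    x1 = max(b[2] for b in boxes) + margin
    y1 = max(b[3] for b in boxes) + margin
    return (max(x0, 0), max(y0, 0), min(x1, width - 1), min(y1, height - 1))


def boxes_outside_roi(boxes: List[Box], roi: Box) -> List[int]:
    """Indices of object boxes that are not fully inside the ROI."""
    rx0, ry0, rx1, ry1 = roi
    return [
        i
        for i, (x0, y0, x1, y1) in enumerate(boxes)
        if x0 < rx0 or y0 < ry0 or x1 > rx1 or y1 > ry1
    ]


# Hypothetical object boxes from three 640x480 images; the object shifts slightly.
object_boxes = [(120, 80, 400, 300), (130, 90, 410, 310), (110, 85, 395, 305)]

# ROI fitted to the first image only, with no margin: misses the other two objects.
first_only = tight_roi(object_boxes[:1], margin=0, width=640, height=480)
print(first_only, boxes_outside_roi(object_boxes, first_only))  # (120, 80, 400, 300) [1, 2]

# ROI fitted to all images, with a small margin: every object fits.
roi = tight_roi(object_boxes, margin=10, width=640, height=480)
print(roi, boxes_outside_roi(object_boxes, roi))  # (100, 70, 420, 320) []
```

The margin is a trade-off: a larger one tolerates more variation in object position, while a smaller one excludes more background.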

Select the Right Dataset

  1. Control the number of dataset images

    When building a model with the Classification module for the first time, capturing 20 to 30 images is recommended.

    More images are not always better. Adding a large number of unsuitable images early on hinders later model improvement and lengthens training time.

  2. Collect representative data

    The dataset should contain NG images that cover every defect type and the full range of defect features, such as shape, background, color, and size. When OK images vary little from one another, relatively few OK images are needed.

  3. Balance data proportions

    The proportion of images for each condition or object class in the dataset should match the actual project; otherwise, training results will suffer.

  4. Dataset images should be consistent with those from the application site

    The factors that need to be consistent include lighting conditions, object features, background, field of view, etc.
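The first three guidelines above can be checked before training with a short script. The sketch below is a hypothetical helper, not a feature of the software: it assumes each image is either OK or NG with a single defect-type name, and that you supply the defect types the project must cover and the approximate share of each category expected on site. The 20 to 30 image range comes from item 1; the 10-percentage-point tolerance is an arbitrary choice. Item 4 concerns how images are captured and is not checked here.

```python
from collections import Counter
from typing import Dict, List, Optional


def audit_dataset(
    labels: List[Optional[str]],
    required_defect_types: List[str],
    expected_share: Dict[str, float],
    tolerance: float = 0.10,
) -> List[str]:
    """Return warnings about dataset size, defect coverage, and balance.

    labels: one entry per image; None marks an OK image, a string names the
    defect type in an NG image (hypothetical one-type-per-image simplification).
    """
    n = len(labels)
    if n == 0:
        return ["dataset is empty"]
    warnings = []
    counts = Counter("OK" if lab is None else lab for lab in labels)

    # Quantity: item 1 suggests about 20 to 30 images for a first model.
    if not 20 <= n <= 30:
        warnings.append(f"{n} images; about 20 to 30 are suggested for a first model")

    # Representativeness: every defect type should appear in at least one NG image.
    for defect in required_defect_types:
        if counts[defect] == 0:
            warnings.append(f"no NG images for defect type '{defect}'")

    # Balance: compare each category's share with the share expected on site.
    for name, expected in expected_share.items():
        actual = counts[name] / n
        if abs(actual - expected) > tolerance:
            warnings.append(f"'{name}' is {actual:.0%} of images, expected about {expected:.0%}")
    return warnings


# Hypothetical dataset: 18 OK images, 4 with bubbles, 2 with scratches.
labels = [None] * 18 + ["bubble"] * 4 + ["scratch"] * 2
for w in audit_dataset(labels, ["bubble", "scratch", "missing_weld"],
                       {"OK": 0.5, "bubble": 0.3, "scratch": 0.2}):
    print(w)
```

For this hypothetical dataset the script reports the missing `missing_weld` defect type and flags all three categories as out of proportion.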