Train a High-Quality Model
Industrial quality inspections usually have strict limits on false negative and false positive rates.
Therefore, the quality of the “Defect Segmentation” model is very important.
This section introduces the factors that most affect model quality and explains how to train high-quality “Defect Segmentation” models.
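To make the two error rates concrete, the following minimal Python sketch computes them from a list of inspection results. The function name `inspection_error_rates` and the sample data are hypothetical and are not part of the software. The sketch assumes that a false negative is a defective (NG) part that passed inspection and a false positive is a good (OK) part that was rejected.

```python
def inspection_error_rates(results):
    """Return (false_negative_rate, false_positive_rate).

    results: iterable of (is_defective, predicted_defective) boolean pairs,
    one pair per inspected part. Hypothetical helper for illustration only.
    """
    results = list(results)
    ng_total = sum(1 for actual, _ in results if actual)
    ok_total = sum(1 for actual, _ in results if not actual)
    # False negative: a defective part that the model let through.
    missed = sum(1 for actual, pred in results if actual and not pred)
    # False positive: a good part that the model rejected.
    false_alarms = sum(1 for actual, pred in results if not actual and pred)
    fn_rate = missed / ng_total if ng_total else 0.0
    fp_rate = false_alarms / ok_total if ok_total else 0.0
    return fn_rate, fp_rate


# Made-up sample: 4 NG parts (1 missed) and 5 OK parts (1 false alarm).
results = [
    (True, True), (True, True), (True, True), (True, False),
    (False, False), (False, False), (False, False), (False, False),
    (False, True),
]
fn_rate, fp_rate = inspection_error_rates(results)
print(f"False negative rate: {fn_rate:.2%}")  # 25.00%
print(f"False positive rate: {fp_rate:.2%}")  # 20.00%
```

Comparing these two numbers against the limits set for the project is one simple way to decide whether a trained model is good enough or needs further iteration.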
Ensure Labeling Quality
Labeling quality is the most significant factor affecting model performance. In actual projects, low labeling quality accounts for more than 90% of poor model performance cases. Therefore, if the model is not performing well, fix labeling quality issues first.
Labeling quality involves consistency, completeness, accuracy, and certainty:
Consistency: Label defects of the same type in the same way, and avoid using different labeling methods for the same type of defects.
Left image, bad example: Label defects of the same type in different ways.
Right image, good example: Label defects of the same type in a consistent way.
Completeness: Ensure that all regions that should be considered as defect regions according to the user-defined standard are selected, and avoid any missed selections.
Left image, bad example: Omit the regions that should be labeled.
Right image, good example: Label all necessary regions.
Accuracy: Make each selection as fine as possible so that its contour fits the contour of the actual defect, and avoid covering defects with coarse, oversized selections.
Left image, bad example: Cover defects with a coarse large selection.
Right image, good example: Make the contour of the selection fit the defect’s contour.
Certainty: For an ambiguous defect, that is, one for which it is impossible to judge whether the defect judgment criteria are met, use the mask polygon tool to cover the defect area.
Left image, good example: Mask out regions containing ambiguous defects.
Right image, mediocre example: Leave regions with ambiguous defects unprocessed and exposed to the model.
Attention
When an image contains multiple defects and it is impossible to judge whether each of them meets the defect judgment criteria, you can delete the image so that it does not degrade model training.
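The accuracy criterion above can be illustrated numerically. The following Python sketch compares a coarse rectangular selection and a tight selection against a synthetic diagonal scratch, using intersection over union (IoU) as the measure of fit. All names and pixel data are hypothetical. In a real project the true defect contour is not known in advance, so this only shows why a coarse selection labels many background pixels as defect.

```python
def rect_pixels(top, left, bottom, right):
    """All (row, col) pixels inside an inclusive rectangle."""
    return {(r, c) for r in range(top, bottom + 1)
            for c in range(left, right + 1)}


def iou(label, defect):
    """Intersection over union of two pixel sets (1.0 means a perfect fit)."""
    union = label | defect
    return len(label & defect) / len(union) if union else 1.0


# Synthetic defect: a 10-pixel diagonal scratch.
defect = {(i, i) for i in range(10)}

# Coarse label: one large rectangle covering the whole scratch (100 pixels).
coarse_label = rect_pixels(0, 0, 9, 9)

# Fine label: the scratch plus a one-pixel band along it (19 pixels).
fine_label = defect | {(i, i + 1) for i in range(9)}

print(f"Coarse selection IoU: {iou(coarse_label, defect):.2f}")  # 0.10
print(f"Fine selection IoU:   {iou(fine_label, defect):.2f}")    # 0.53
```

In this example, 90 of the 100 pixels in the coarse selection are background, so a model trained on it would be told that background is defect.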
Set the Proper Region of Interest (ROI)
Setting the ROI effectively eliminates interference from the background. The ROI boundary should be as close to the outer contours of the objects as possible.
Hint
The same ROI setting will be applied to all images, so it is necessary to ensure that objects in all images are located within the ROI, especially in scenarios where the object positions/sizes are not fixed.
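Because one ROI is shared by all images, it can help to check a candidate ROI against the object positions in the dataset before training. The sketch below is a minimal illustration, assuming each object is described by a hypothetical bounding box `(left, top, right, bottom)` in pixels. The file names, box values, and helper functions are made up for this example and are not part of the software.

```python
def contains(roi, box):
    """True if box lies entirely inside roi; both are (left, top, right, bottom)."""
    r_left, r_top, r_right, r_bottom = roi
    b_left, b_top, b_right, b_bottom = box
    return (r_left <= b_left and r_top <= b_top
            and b_right <= r_right and b_bottom <= r_bottom)


def tightest_common_roi(boxes, margin=0):
    """Smallest rectangle that contains every box, grown by an optional margin.

    The result is not clamped to the image size.
    """
    lefts, tops, rights, bottoms = zip(*boxes)
    return (min(lefts) - margin, min(tops) - margin,
            max(rights) + margin, max(bottoms) + margin)


# Hypothetical object bounding boxes, one per image.
object_boxes = {
    "img_001.png": (120, 80, 520, 400),
    "img_002.png": (140, 95, 560, 420),
    "img_003.png": (100, 60, 500, 390),
}

candidate_roi = (110, 70, 550, 410)
outside = sorted(name for name, box in object_boxes.items()
                 if not contains(candidate_roi, box))
print("Images with objects outside the ROI:", outside)
print("Tightest ROI covering all objects:",
      tightest_common_roi(object_boxes.values()))
```

Here the candidate ROI cuts off the objects in two of the three images, while the tightest common ROI `(100, 60, 560, 420)` keeps every object inside and stays close to the outer contours.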
Select the Right Dataset
Control dataset image quantities
For the first-time model building of the “Defect Segmentation” module, capturing 20 to 30 images is recommended.
More images are not necessarily better. Adding a large number of inadequate images in the early stage makes later model improvement harder and lengthens the training time.
Collect representative data
The datasets should contain NG images covering all the defect types with all defect features, in terms of shape, background, color, size, etc. When the features in OK images do not differ across images, the number of OK images can be relatively small.
Balance data proportion
The number of images of different conditions/object classes in the datasets should be proportioned according to the actual project; otherwise, the training effect will be affected.
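One way to check the data proportion is to count the images per condition or class and compare each share with the share expected in the actual project. The sketch below assumes hypothetical class names and target proportions. Both depend on the project and are made up here for illustration.

```python
from collections import Counter


def proportion_gaps(labels, target):
    """Return, per class, the actual share minus the target share.

    labels: one class name per image in the dataset.
    target: mapping from class name to its expected share (summing to 1).
    A positive gap means the class is over-represented.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("The dataset contains no images.")
    return {cls: counts.get(cls, 0) / total - share
            for cls, share in target.items()}


# Hypothetical dataset of 20 images and hypothetical project proportions.
labels = ["OK"] * 10 + ["scratch"] * 6 + ["dent"] * 4
target = {"OK": 0.50, "scratch": 0.25, "dent": 0.25}

for cls, gap in proportion_gaps(labels, target).items():
    print(f"{cls}: {gap:+.0%} relative to the target share")
```

In this example, scratch images are over-represented by 5 percentage points and dent images are under-represented by the same amount, which suggests collecting more dent images or removing some scratch images.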
Dataset images should be consistent with those from the application site
The factors that need to be consistent include lighting conditions, object features, background, field of view, etc.