Train a High-Quality Model¶
This section introduces the factors that most affect model quality and explains how to train a high-quality instance segmentation model.
Ensure Image Quality¶
Avoid overexposure, underexposure, color distortion, blur, occlusion, etc. These conditions cause the loss of features that the deep learning model relies on, which degrades model training.
Left image, bad example: Overexposure.
Right image, good example: Adequate exposure.
Left image, bad example: Dim image.
Right image, good example: Adequate exposure.
Left image, bad example: Color distortion.
Right image, good example: Normal color.
Left image, bad example: Blur.
Right image, good example: Clear.
Left image, bad example: Occluded by the robot arm.
Right image, bad example: Occluded by a human.
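The exposure and blur problems above can be screened for automatically before training. The following is a minimal sketch, not part of the software: the function names and all thresholds are assumptions chosen for illustration, and the image is assumed to be available as rows of 8-bit grayscale values. In practice a library such as NumPy or OpenCV would typically be used to load and process real images.

```python
from statistics import fmean, pvariance


def image_quality_metrics(gray):
    """Compute rough exposure and sharpness indicators.

    `gray` is a list of rows of 8-bit grayscale values (0-255) and must be
    at least 3 x 3 pixels so that interior pixels exist.
    """
    height, width = len(gray), len(gray[0])
    pixels = [p for row in gray for p in row]
    # 4-neighbour Laplacian on interior pixels; a low variance suggests blur.
    laplacian = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, height - 1)
        for x in range(1, width - 1)
    ]
    return {
        "mean_brightness": fmean(pixels),
        "saturated_fraction": sum(p >= 250 for p in pixels) / len(pixels),
        "laplacian_variance": pvariance(laplacian),
    }


def quality_warnings(metrics, max_saturated=0.05, min_brightness=40.0,
                     min_sharpness=50.0):
    """Return warnings based on illustrative, untuned thresholds."""
    warnings = []
    if metrics["saturated_fraction"] > max_saturated:
        warnings.append("possible overexposure")
    if metrics["mean_brightness"] < min_brightness:
        warnings.append("possibly too dim")
    if metrics["laplacian_variance"] < min_sharpness:
        warnings.append("possible blur")
    return warnings


# A uniformly white 16 x 16 image is flagged as overexposed (and featureless).
overexposed = [[255] * 16 for _ in range(16)]
print(quality_warnings(image_quality_metrics(overexposed)))
```

The thresholds would need to be tuned on images from the actual site. Occlusion and color distortion are not covered by this sketch and still require visual inspection.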
Ensure that the background, perspective, and camera height used during image capturing are consistent with the actual application. Any inconsistency will reduce the performance of deep learning in practical applications. In severe cases, data must be re-collected. Please confirm the conditions of the actual application in advance.
Bad example: The background in the training data (left) is different from the background in the actual application (right).
Bad example: The field of view and perspective in the training data (left) are different from those in the actual application (right).
Bad example: The camera height in the training data (left) is different from that in the actual application (right).
Ensure Dataset Quality¶
An instance segmentation model is trained by learning the features of the objects in the images, and it then applies what it has learned in the actual application.
Therefore, to train a high-quality model, the conditions of the collected and selected dataset must be consistent with those of the actual applications.
Collect Datasets¶
The dataset should cover the various placement conditions in appropriate proportions. For example, if materials arrive both horizontally and vertically in actual production, but only data of horizontal materials are collected for training, the recognition of vertical materials cannot be guaranteed.
Therefore, when collecting data, it is necessary to consider the various conditions of the actual application, including the features that appear under different object placement orientations, positions, and positional relationships between objects.
Attention
If some situations are missing from the datasets, the deep learning model will not adequately learn the corresponding features, and it will be unable to recognize objects effectively under such conditions. In this case, data on such conditions must be collected and added to reduce the errors.
Orientations
Positions
Positional relationships between objects
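One way to spot missing situations is to tag each collected image with its conditions and compare the tags against the conditions expected on site. The sketch below assumes such tags are recorded by hand; the function name, the tag keys, and the example values are hypothetical and only illustrate the idea.

```python
def uncovered_conditions(image_tags, required):
    """List required condition values that no collected image covers.

    `image_tags` is a list of dicts, one per image, mapping a factor name
    (for example "orientation") to the value seen in that image.
    `required` maps each factor name to the values expected on site.
    """
    missing = {}
    for factor, values in required.items():
        seen = {tags.get(factor) for tags in image_tags}
        gap = sorted(set(values) - seen)
        if gap:
            missing[factor] = gap
    return missing


# Hypothetical tags for two collected images.
collected = [
    {"orientation": "lying down", "position": "center", "relation": "parallel"},
    {"orientation": "lying down", "position": "edge", "relation": "overlapping"},
]
expected_on_site = {
    "orientation": {"lying down", "standing on the side"},
    "position": {"center", "edge", "corner"},
    "relation": {"overlapping", "parallel"},
}
print(uncovered_conditions(collected, expected_on_site))
# {'orientation': ['standing on the side'], 'position': ['corner']}
```

Any factor reported here indicates conditions for which more images should be collected.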
Data Collection Examples¶
A metal piece project
Single object class.
50 images were collected.
Object placement conditions of lying down and standing on the side need to be considered.
Object positions at the bin center, edges, corners, and at different heights need to be considered.
Object positional relationships of overlapping and parallel arrangement need to be considered.
Samples of the collected images are as follows.
A grocery project
7 classes of objects are mixed and classification is required.
The following situations need to be considered to fully capture object features.
Situation #1: objects of one class placed in different orientations
Situation #2: mixing objects of multiple classes
Number of images for situation #1: 5 * number of object classes.
Number of images for situation #2: 20 * number of object classes.
The objects may come lying flat, standing on their sides, or leaning, so images showing all faces of the objects are needed.
The objects may be in the center, on the edges, and in the corners of the bins.
The objects may be placed in parallel or fitted together.
Samples of the collected images are as follows:
Objects of one class placed alone
Objects of multiple classes mixed together
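The image counts for the grocery project follow directly from the two multipliers given above (5 and 20 images per object class). A small sketch of the arithmetic, with a hypothetical function name:

```python
def planned_image_count(num_classes, per_class_single=5, per_class_mixed=20):
    """Apply the per-class multipliers from the grocery project example."""
    single = per_class_single * num_classes
    mixed = per_class_mixed * num_classes
    return {"single_class": single, "mixed": mixed, "total": single + mixed}


print(planned_image_count(7))
# {'single_class': 35, 'mixed': 140, 'total': 175}
```

With 7 object classes this gives 35 images for situation #1 and 140 images for situation #2. The multipliers are those of this example project and are not a general rule.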
A track shoe project
The track shoes come in many models.
The number of images captured was 30 * number of models.
The track shoes only face up, so only the facing-up condition needs to be considered.
The track shoes may be at different heights under the camera.
The track shoes are arranged regularly together, so the situation of closely fitting together needs to be considered.
Samples of the collected images are as follows:
A metal piece project
Metal pieces are presented in one layer only, so 50 images were captured.
The metal pieces only face up.
The metal pieces may be in the center, on the edges, and in the corners of the bin.
The metal pieces may be fitted closely together.
Samples of the collected images are as follows:
A metal piece project
Metal pieces are neatly placed in multiple layers.
30 images were collected.
The metal pieces only face up.
The metal pieces may be in the center, on the edges, and in the corners of the bin, and at different heights under the camera.
The metal pieces may be fitted closely together.
Samples of the collected images are as follows:
Select the Right Dataset¶
Control dataset image quantities
When building a model with the “Instance Segmentation” module for the first time, capturing 30 images is recommended.
More images are not always better. Adding a large number of unsuitable images in the early stage does not help later model improvement and makes training take longer.
Collect representative data
The captured images should cover all the conditions of the objects to be recognized in terms of illumination, color, size, etc.
Lighting: Project sites usually have environmental lighting changes, and the datasets should contain images with different lighting conditions.
Color: Objects may come in different colors, and the datasets should contain images of objects of all the colors.
Size: Objects may come in different sizes, and the datasets should contain images of objects of all existing sizes.
Attention
If the on-site objects may appear rotated, scaled, etc. in the images, and such images cannot be collected, the datasets can be supplemented by adjusting the data augmentation training parameters so that all on-site conditions are covered.
Balance data proportion
The number of images of different conditions/object classes in the datasets should be proportioned according to the actual project; otherwise, the training effect will be affected.
Dataset images should be consistent with those from the application site
The factors that need to be consistent include lighting conditions, object features, background, field of view, etc.
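For the point on balancing data proportions, the share of each condition or object class in the dataset can be compared with the share expected in the actual project. This is a minimal sketch under stated assumptions: each image carries one hand-assigned condition label, the expected shares are estimated by the project team, and the function name and the 10% tolerance are hypothetical.

```python
from collections import Counter


def proportion_gaps(labels, expected_share, tolerance=0.10):
    """Return conditions whose dataset share deviates from the expected share.

    `labels` holds one condition or class name per image. `expected_share`
    maps each name to its expected fraction on site. The returned values
    are (actual - expected), rounded to three decimals.
    """
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = len(labels)
    gaps = {}
    for name, expected in expected_share.items():
        actual = counts.get(name, 0) / total
        if abs(actual - expected) > tolerance:
            gaps[name] = round(actual - expected, 3)
    return gaps


# Hypothetical dataset: 45 horizontal images and 5 vertical images,
# while both orientations are expected equally often on site.
labels = ["horizontal"] * 45 + ["vertical"] * 5
print(proportion_gaps(labels, {"horizontal": 0.5, "vertical": 0.5}))
# {'horizontal': 0.4, 'vertical': -0.4}
```

A negative value marks a condition that is under-represented and for which more images may be needed.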
Ensure Labeling Quality¶
Determine the Labeling Method¶
Label the upper surfaces’ contours: This method is suitable for regular objects that are laid flat, such as cartons, medicine boxes, rectangular workpieces, etc. The pick points are calculated on the upper surface contour, and the user only needs to make rectangular selections on the images.
Left image, bad example: The entire box is selected when only the top surface is needed.
Right image, good example: Only the top surface is selected.
Label the entire objects’ contours: This method is suitable for sacks, various types of workpieces, etc., for which labeling the whole object contour is the general method.
Good examples
Special cases: In some projects, the recognition result needs to match how the gripper works.
The suction cup must fit the tip of the bottle to pick completely (high precision is required), so only the bottle tip contours need to be labeled.
Good example: Label bottle tips.
The task of rotor picking involves recognizing rotor orientations. Only the middle parts, whose orientations are clear, should be labeled, and the thin rods at both ends should not be labeled.
Good example: Label middle parts of rotors.
The suction cup must pick the metal pieces at their middle parts, so only the middle parts of the metal pieces are labeled, and the ends do not need to be labeled.
Good example: Label middle parts.
Check Labeling Quality¶
Labeling quality should be ensured in terms of completeness, correctness, consistency, and accuracy.
Completeness: Label all objects that meet the rules, and avoid missing any objects or object parts.
Left image, bad example: Some objects are omitted.
Right image, good example: Label all objects.
Correctness: Make sure that each object corresponds correctly to the label it belongs to, and avoid situations where the object does not match the label.
Left image, bad example: Wrong label. A Mentos was labeled as a Yida.
Right image, good example: Correct labels.
Consistency: All data should follow the same labeling rules. For example, if a labeling rule stipulates that only objects that are over 85% exposed in the images be labeled, then all objects that meet the rule should be labeled. Please avoid situations where one object is labeled but another similar object is not.
Bad example: An object that should be labeled is labeled in one image but not labeled in another.
Accuracy: Make the region selections as fine as possible so that their contours fit the actual object contours. Avoid covering objects with coarse, oversized selections, and avoid omitting object parts.
Left image, bad example: Object parts are omitted.
Middle image, good example: Correct labeling selection.
Right image, bad example: Parts of other objects are included in an object’s selection.
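The consistency rule mentioned above (label only objects that are over 85% exposed) can be checked mechanically if a reviewer records, for each object, an estimated visible fraction and whether it was labeled. The record layout and function name below are assumptions made for this sketch and are not a format produced by the software.

```python
def consistency_violations(objects, min_visible=0.85):
    """Find objects whose labeling state contradicts the exposure rule.

    Each entry in `objects` is a dict with the hypothetical keys "image",
    "object_id", "visible_fraction" (0.0 to 1.0, estimated by a reviewer),
    and "labeled" (True or False).
    """
    violations = []
    for obj in objects:
        should_be_labeled = obj["visible_fraction"] > min_visible
        if should_be_labeled and not obj["labeled"]:
            violations.append((obj["image"], obj["object_id"], "missing label"))
        elif obj["labeled"] and not should_be_labeled:
            violations.append((obj["image"], obj["object_id"], "label breaks rule"))
    return violations


# Hypothetical review records for two images.
records = [
    {"image": "img_01", "object_id": 1, "visible_fraction": 0.95, "labeled": True},
    {"image": "img_01", "object_id": 2, "visible_fraction": 0.90, "labeled": False},
    {"image": "img_02", "object_id": 1, "visible_fraction": 0.60, "labeled": True},
]
print(consistency_violations(records))
# [('img_01', 2, 'missing label'), ('img_02', 1, 'label breaks rule')]
```

Such a check only covers completeness and consistency with respect to one rule. Correctness of class names and contour accuracy still need to be reviewed visually.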