Support for multiple objects in single image #40

@dereklukacs

Description

FasterRCNN supports training with multiple objects per image (according to the following):

From: https://pytorch.org/docs/stable/torchvision/models.html#object-detection-instance-segmentation-and-person-keypoint-detection

During training, the model expects both the input tensors, as well as targets (a list of dictionaries), containing:

  • boxes (FloatTensor[N, 4]): the ground-truth boxes in [x1, y1, x2, y2] format, with values of x between 0 and W and values of y between 0 and H
  • labels (Int64Tensor[N]): the class label for each ground-truth box
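For example, an image with two objects would be paired with a target dict like the following (a minimal sketch of torchvision's documented target format; the box coordinates and labels are made up):

```python
import torch

# Two ground-truth objects in one image: boxes in [x1, y1, x2, y2]
# format and one integer class label per box.
boxes = torch.tensor([[10.0, 20.0, 100.0, 150.0],
                      [30.0, 40.0, 80.0, 120.0]], dtype=torch.float32)
labels = torch.tensor([1, 2], dtype=torch.int64)
target = {"boxes": boxes, "labels": labels}

# FasterRCNN expects one such dict per image in the batch.
targets = [target]
```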

Is this supported by detecto? I have only been able to pass in {"boxes": Tensor(1, 4), "labels": str}

Would you be open to a pull request fixing this? It looks like the only place where there is an issue is in core.Model

class Model:
...
    # Converts all string labels in a list of target dicts to
    # their corresponding int mappings
    def _convert_to_int_labels(self, targets):
        for target in targets:
            # Convert string labels to integer mapping
            if _is_iterable(target["labels"]):
                # Build a list first: torch.tensor() does not accept a generator
                target["labels"] = torch.tensor([self._int_mapping[label] for label in target["labels"]])
            else:
                target["labels"] = torch.tensor(self._int_mapping[target["labels"]]).view(1)

This would now accept target["labels"] = "one_object" or target["labels"] = ["obj1_class", "obj2_class"] and would still return a tensor of length equal to the number of objects.
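A standalone sketch of the proposed conversion (using a hypothetical int_mapping dict in place of the model's self._int_mapping, and an explicit isinstance check in place of _is_iterable):

```python
import torch

def convert_to_int_labels(targets, int_mapping):
    # Map string labels to integers in place; each target may carry
    # either a single string label or a list of string labels.
    for target in targets:
        labels = target["labels"]
        if isinstance(labels, str):
            # Single object: produce a length-1 tensor
            target["labels"] = torch.tensor(int_mapping[labels]).view(1)
        else:
            # Multiple objects: one int per label
            target["labels"] = torch.tensor([int_mapping[l] for l in labels])

int_mapping = {"obj1_class": 0, "obj2_class": 1, "one_object": 2}
targets = [{"labels": ["obj1_class", "obj2_class"]},
           {"labels": "one_object"}]
convert_to_int_labels(targets, int_mapping)
# targets[0]["labels"] is a length-2 tensor, targets[1]["labels"] a length-1 tensor
```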
