computer vision pipeline so the robot can "see" what is in front of it. maybe a way to "promptize" the image (turn what the camera sees into text) if it needs to be fed to GPT. object detection? tracking objects and identifying them would be cool
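
rough sketch of what the "promptize" step might look like, assuming some detector (not chosen yet) already gives back labels, confidences, and pixel bounding boxes. Everything named here (`Detection`, `horizontal_zone`, `detections_to_prompt`, the 0.5 confidence cutoff, the left/center/right split) is made up for illustration and not tied to any particular library:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Detection:
    """One detected object (hypothetical structure, not from a real library)."""
    label: str
    confidence: float                          # 0.0 to 1.0
    bbox: Tuple[float, float, float, float]    # (x, y, width, height) in pixels
    track_id: Optional[int] = None             # set only if a tracker assigned an ID


def horizontal_zone(det: Detection, image_width: float) -> str:
    """Place the box center in the left, center, or right third of the frame."""
    x, _, w, _ = det.bbox
    center_x = x + w / 2
    if center_x < image_width / 3:
        return "left"
    if center_x < 2 * image_width / 3:
        return "center"
    return "right"


def detections_to_prompt(detections: List[Detection], image_width: float,
                         min_confidence: float = 0.5) -> str:
    """Turn detections into a short text description suitable for a language model."""
    # Drop low-confidence detections (the threshold is an assumed default).
    kept = [d for d in detections if d.confidence >= min_confidence]
    if not kept:
        return "The camera sees no recognizable objects in front of the robot."

    # Most confident objects first.
    kept.sort(key=lambda d: d.confidence, reverse=True)

    lines = ["The camera sees the following objects in front of the robot:"]
    for d in kept:
        ident = f" (track #{d.track_id})" if d.track_id is not None else ""
        zone = horizontal_zone(d, image_width)
        lines.append(f"- {d.label}{ident}, {zone} of frame, confidence {d.confidence:.2f}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        Detection("cup", 0.91, (500, 200, 80, 100), track_id=3),
        Detection("chair", 0.75, (280, 100, 80, 200)),
    ]
    print(detections_to_prompt(sample, image_width=640))
```

the text output could be pasted into a GPT prompt as-is. Keeping a `track_id` in the description is one possible way to let the model refer to the same object across frames, if tracking gets added later.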