Automatically grabbing objects with Stretch3

Hello,

I’ve been working on programming Stretch 3 to detect specific objects, like bottles. I’ve explored stretch_visual_servoing but couldn’t find any configuration options for specifying the object model name. Could someone guide me on how to set this up? Any assistance would be greatly appreciated.

Thank you!

Hey @allen, the Visual Servoing demo in stretch_visual_servoing should be able to find any of the object classes in the YOLOv8 segmentation model. Currently, the demo is hard-coded to grasp only the apple and sports ball classes, but you can add other objects to grasp by modifying line 83 of yolo_servo_perception.py.
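Rather than editing the hard-coded list each time, the class check could read its targets from one place. Below is a minimal sketch of that idea. It assumes that each detection is a dict with hypothetical 'name' and 'confidence' keys, and the demo's real data structures may differ. TARGET_CLASSES, select_targets, and the 0.25 confidence threshold are illustrative names and values, not part of stretch_visual_servoing.

```python
# Hypothetical: classes the demo should try to grasp (detector class names, exact spelling).
TARGET_CLASSES = {"apple", "sports ball", "bottle"}


def select_targets(detections, targets=TARGET_CLASSES, min_confidence=0.25):
    """Keep detections whose class is a grasp target and whose confidence is high enough.

    `detections` is assumed to be a list of dicts with 'name' and 'confidence' keys.
    """
    return [
        d for d in detections
        if d["name"] in targets and d["confidence"] >= min_confidence
    ]


detections = [
    {"name": "bottle", "confidence": 0.81},
    {"name": "cup", "confidence": 0.90},     # detected, but not a grasp target
    {"name": "apple", "confidence": 0.12},   # a target, but below the threshold
]
print([d["name"] for d in select_targets(detections)])  # ['bottle']
```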


Hi @Mohamed_Fazil,

Thank you for your response. I followed your approach and added ‘bottle’ to the class name list: if class_name in ['apple', 'sports ball', 'bottle']:

However, when I ran python3 visual_servoing_demo.py again, the camera still did not detect the bottle. I also tested this with a tennis ball, but it didn’t work either.

Hi Allen,

I just freshly cloned the repository and gave the visual servoing demo a try. Here are my results. I added an orange, a fork, and a toothbrush to the YOLO object list. Let me know if following these steps solves your troubles.

1. Add objects to YOLO detector

I changed line 83 in yolo_servo_perception.py to:

if class_name in ['apple', 'sports ball', 'orange', 'fork', 'toothbrush']:

You could add bottle to this list. I found a full list of YOLO classes here.
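The names in that list must match the detector's class labels exactly. This sketch assumes that the demo uses a YOLOv8 model trained on the standard 80-class COCO dataset, which is what Ultralytics' pretrained weights use. Under that assumption, there is no 'tennis ball' class, and a tennis ball is reported as 'sports ball'. The sketch checks a target list against a hand-picked subset of COCO names, so extend the set if you need other classes. The helper is illustrative only. With the ultralytics package installed, the full id-to-name mapping of a loaded `YOLO` model is available as its `names` attribute.

```python
# A subset of the 80 COCO class names used by COCO-pretrained YOLOv8 models.
# Note the exact spellings, e.g. "sports ball" (not "tennis ball") and "cell phone".
COCO_SUBSET = {
    "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl",
    "banana", "apple", "orange", "sports ball", "toothbrush", "cell phone",
    "scissors", "teddy bear", "book", "remote", "mouse",
}


def unknown_classes(targets, known=COCO_SUBSET):
    """Return target names that are not in `known` (sorted for stable output)."""
    return sorted(set(targets) - set(known))


print(unknown_classes(["apple", "sports ball", "bottle", "tennis ball"]))  # ['tennis ball']
```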

2. Test streaming from camera and running YOLO

In two separate terminals, I ran:

  • python3 send_d405_images.py
  • python3 recv_and_yolo_d405_images.py

Then, I e-stopped the robot and moved its arm over the table. This resulted in the following visualizations. Note how the toothbrush is not recognized - YOLO had a tough time with this object. If you can’t see any objects detected at this stage, there may be other issues with YOLO.


Above: orange


Above: fork


Above: unrecognized toothbrush

3. Run visual servoing

To run the visual servoing with YOLO, the following three processes should be running simultaneously in separate terminals:

  • python3 send_d405_images.py
  • python3 recv_and_yolo_d405_images.py
  • python3 visual_servoing_demo.py -y (the -y flag enables YOLO)
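If juggling three terminals is inconvenient, a small launcher can start the same three processes together. This is only a sketch built on the standard `subprocess` module. It is not a script that ships with stretch_visual_servoing, and the helper names are hypothetical. It assumes that it is run from the directory containing the three scripts. It merges their output into one terminal, which is harder to read than separate windows. The robot may start moving as soon as the demo starts, so stay near the E-Stop.

```python
import subprocess
import sys
import time


def build_commands():
    """Return one argv list per process that must run at the same time (YOLO mode)."""
    return [
        [sys.executable, "send_d405_images.py"],
        [sys.executable, "recv_and_yolo_d405_images.py"],
        [sys.executable, "visual_servoing_demo.py", "-y"],  # -y enables YOLO
    ]


def launch(commands, poll_s=0.5):
    """Start every process, then stop all of them once any one exits (or on Ctrl+C)."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    try:
        # Wait until some process exits, e.g. because the YOLO receiver crashed.
        while all(p.poll() is None for p in procs):
            time.sleep(poll_s)
    finally:
        for p in procs:
            if p.poll() is None:
                p.terminate()
        for p in procs:
            p.wait()


# Usage (from the repository directory): launch(build_commands())
```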

After running the code, I e-stopped the robot, pointed the wrist at the table, and turned the e-stop off. Then, I placed each object on the table. Below is what that looked like for me. Note how the servoing recovers after bumping the orange backwards.

Above: visual servoing in action!

I hope that this helps you debug your issues!


Hi @hello-lamsey !

Thank you so much for your response! It was very helpful, and now Stretch 3 is able to detect my bottle.
However, the issue I am encountering now is that the robot does not calculate the distance between the object and the gripper correctly. As a result, it grabs the air in front of the object, as in the image below:


Could you point me in the right direction on how to fix this issue?

Thanks!

Hi @allen,

I’m glad that you got the code running!

The first thing to check is whether the visual servoing code is calculating distances correctly from the camera feed. The coordinate system for the image (and object position) is shown below. The [x,y,z] position (cm) of the center of the object is printed over the object.


Above: image coordinate system (position approximate). Positive Z (blue) is pointing into the screen (or directly out of the camera).
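For intuition about how a pixel and a depth reading become an [x, y, z] position in this frame, here is the standard pinhole-camera deprojection. It uses the usual RealSense convention (+x right, +y down, +z out of the camera) and ignores lens distortion. librealsense's `rs2_deproject_pixel_to_point` is based on the same model. This is a sketch only: the demo may compute positions differently, and the intrinsics below are made-up placeholder values, not the D405's calibration.

```python
def deproject(u, v, depth_m, fx, fy, cx, cy):
    """Map pixel (u, v) with depth Z (meters) to camera-frame (x, y, z) in meters.

    Pinhole model without distortion: +x right, +y down, +z out of the camera.
    """
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


# Placeholder intrinsics for a 640x480 image (not real D405 values).
fx = fy = 400.0
cx, cy = 320.0, 240.0
x, y, z = deproject(320.0, 280.0, 0.17, fx, fy, cx, cy)
print([round(100 * c, 1) for c in (x, y, z)])  # centimeters: [0.0, 1.7, 17.0]
```

Because x and y both scale with Z, an error in the depth reading shifts the whole estimated position, not just its z component.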

To achieve grasping of the tennis ball, my system reports a tennis ball position of approximately [x,y,z] = [0, 4, 17] centimeters, which matches the default grasp position defined in the demo.


Above: ball in grasp location.

Can you share a screenshot / screen recording of the YOLO render window to debug measurements? It is possible that the z distance is not being measured correctly.

Also note: the target grasp position between the robot’s fingertips is also calculated online, so the target grasp position may vary between robots.
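One way to compare the printed object position with the grasp target is to look at the error vector between them. A visual-servoing loop typically drives that error toward zero, for example with velocity commands proportional to it. The sketch below is a generic proportional controller, not the demo's actual control law. The gain, speed limit, and tolerance are hypothetical values. Positions are in meters in the camera frame, and the returned velocity is for the camera/gripper in that same frame. Mapping it onto Stretch's joints is out of scope here.

```python
import math


def servo_step(object_xyz, target_xyz, gain=1.0, max_speed=0.05, tol=0.01):
    """Return (velocity, done) for one generic proportional visual-servoing step.

    velocity = gain * (object - target), scaled down so its norm is at most max_speed.
    done is True once the object is within tol meters of the grasp target.
    """
    error = [o - t for o, t in zip(object_xyz, target_xyz)]
    if math.sqrt(sum(e * e for e in error)) <= tol:
        return [0.0, 0.0, 0.0], True
    vel = [gain * e for e in error]
    speed = math.sqrt(sum(v * v for v in vel))
    if speed > max_speed:
        vel = [v * max_speed / speed for v in vel]
    return vel, False


# Target near the default [0, 4, 17] cm; object reported about 4 cm closer than that.
vel, done = servo_step([0.016, 0.042, 0.129], [0.0, 0.04, 0.17])
print(done, [round(v, 3) for v in vel])  # negative z velocity: back away from the object
```

In this scheme, an object reported as too close yields a command to back off, and the grasp only triggers once the error drops below the tolerance. A wrong depth reading therefore makes the grasp trigger at the wrong place.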

Hi @hello-lamsey

Thanks again for your reply. After testing, the robot does not seem to be able to navigate to the proper position for grasping. As shown below, the robot started grasping at position [1.6, 4.2, 12.9]. The estimated positions of the gripper fingertips also seem to be different from yours. Furthermore, when the robot moves its gripper to this position, it stops, starts grasping, and ends up grabbing nothing.


Hi @allen,

It looks like the depth readings for the bottle are off. The model thinks that the bottle is ~3cm wide and close to the camera, which does not appear to match reality. I tried with a transparent bottle on my end, and while YOLO did not always see the bottle, the depth readings appeared correct. Do you observe erroneous distance / size readings when looking at an opaque bottle as well?
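Depth errors and size errors tend to go together. In the example output further down this thread, left_side_xyz and right_side_xyz share the same z (estimated_z_m). width_m equals their x-distance: 0.02881413 - (-0.01747896) ≈ 0.0463 m. This suggests that both edges are deprojected at a single depth. If so, the estimated width scales linearly with that depth (pixel width × Z / fx), so a depth that is too small also makes the object look too narrow. The sketch below uses this to derive the depth implied by a known object width. The helper names and the 6.5 cm bottle width are hypothetical.

```python
def width_from_pixels(pixel_width, depth_m, fx):
    """Pinhole model: metric width of an object spanning pixel_width pixels at depth Z."""
    return pixel_width * depth_m / fx


def implied_depth(known_width_m, estimated_width_m, estimated_depth_m):
    """If width scales linearly with depth, rescale the depth by the width ratio."""
    return estimated_depth_m * known_width_m / estimated_width_m


# Hypothetical: a bottle known to be ~6.5 cm wide, reported as 3 cm wide at z = 12.9 cm.
print(round(implied_depth(0.065, 0.03, 0.129), 2))  # about 0.28 m
```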

I see from an earlier post in the thread that you were having trouble with the tennis ball as well. Have you tested whether the visual servoing demo works for the tennis ball and / or the ArUco cube (running without YOLO)? If so, what were the results?

Also, it is odd that the grasp begins when the object’s z=12.9. While running visual servoing, I see values such as 'grasp_center_xyz': array([0.00646052, 0.06986317, 0.17505255]) in the YOLO results printout in the visual servoing terminal. What value are you seeing for 'grasp_center_xyz' in your terminal output?

Example YOLO results output:
yolo_results = {
  'fingertips': {
    'right': {'pos': array([0.09063274, 0.03583154, 0.1768219 ]),
              'x_axis': array([-0.68398136, 0.03269413, -0.72876649]),
              'y_axis': array([ 0.08904867, -0.98778254, -0.12789051]),
              'z_axis': array([-0.72404409, -0.15237041, 0.67271347])},
    'left': {'pos': array([-0.07698445, 0.03379969, 0.17573048]),
             'x_axis': array([ 0.69878856, 0.10996119, -0.70682607]),
             'y_axis': array([-0.0210888 , 0.99085159, 0.13329814]),
             'z_axis': array([ 0.71501735, -0.0782411 , 0.6947147 ])}},
  'yolo': [{'name': 'orange',
            'confidence': 0.29331216,
            'width_m': 0.04629309170714137,
            'estimated_z_m': 0.15356746551650566,
            'grasp_center_xyz': array([0.00646052, 0.06986317, 0.17505255]),
            'left_side_xyz': array([-0.01747896, 0.06128851, 0.15356747]),
            'right_side_xyz': array([0.02881413, 0.06128851, 0.15356747])}]}
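To compare runs without scanning the whole dictionary, the relevant numbers can be pulled out and printed in centimeters, the same units as the on-screen overlay. This is a minimal sketch that assumes yolo_results has the structure shown above. Plain lists stand in for the NumPy arrays so that the snippet runs without NumPy. summarize_yolo is a hypothetical helper name.

```python
def summarize_yolo(yolo_results):
    """Return one line per detection: name, confidence, grasp center and width in cm."""
    lines = []
    for det in yolo_results.get("yolo", []):
        x, y, z = (100.0 * c for c in det["grasp_center_xyz"])
        lines.append(
            f"{det['name']}: conf={det['confidence']:.2f} "
            f"grasp_center_cm=[{x:.1f}, {y:.1f}, {z:.1f}] "
            f"width_cm={100.0 * det['width_m']:.1f}"
        )
    return lines


# Values copied from the example output above (lists instead of NumPy arrays).
example = {"yolo": [{"name": "orange", "confidence": 0.29331216,
                     "width_m": 0.04629309170714137,
                     "grasp_center_xyz": [0.00646052, 0.06986317, 0.17505255]}]}
for line in summarize_yolo(example):
    print(line)  # orange: conf=0.29 grasp_center_cm=[0.6, 7.0, 17.5] width_cm=4.6
```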

Hi @hello-lamsey ,

I have included the links to the recordings of the results for grabbing the transparent and opaque bottles. It seems like the robot struggles a little bit but eventually manages to grab the opaque bottle, whereas it still cannot grab the transparent bottle. Although I don’t have a video for grabbing the ArUco cube, it works perfectly fine.

Transparent bottle video

Opaque bottle video

For the ‘grasp_center_xyz’ terminal output when grabbing the transparent bottle, I am seeing values such as:

  • 'grasp_center_xyz': array([ 0.01307648, -0.03386912, 0.23353366])
  • 'grasp_center_xyz': array([ 0.01090819, -0.0208785 , 0.23958624])
  • 'grasp_center_xyz': array([0.01349765, 0.0044698 , 0.26781633])
  • 'grasp_center_xyz': array([0.01890173, 0.06022887, 0.23040281])
  • 'grasp_center_xyz': array([0.01301666, 0.05514316, 0.22045396])
  • 'grasp_center_xyz': array([0.01462979, 0.03248335, 0.17890907])

These are quite different from the values you are seeing.

Best regards,
Allen

Hi @allen,

Glad to hear that grabbing the ArUco cube works well! Also, those grasp centers do seem to vary a decent amount. It might be within tolerances for the code to successfully grasp some objects, but if performance is poor, then factors such as the lighting conditions may be negatively affecting the estimate of the fingertip ArUco markers.
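One way to tell measurement noise apart from genuine motion is to hold the arm still (for example, while E-stopped). Then log several samples of a position such as grasp_center_xyz or a fingertip 'pos', and look at the per-axis spread. Below is a standard-library sketch of this check. The 5 mm threshold is an arbitrary placeholder, and the samples are made up.

```python
import statistics


def per_axis_spread(samples):
    """Population standard deviation of each axis over a list of (x, y, z) samples."""
    return [statistics.pstdev(axis) for axis in zip(*samples)]


def is_jittery(samples, threshold_m=0.005):
    """Flag the estimate as noisy if any axis varies by more than threshold_m."""
    return any(s > threshold_m for s in per_axis_spread(samples))


# Made-up samples (meters) logged while the robot was not moving.
samples = [(0.010, 0.040, 0.170), (0.010, 0.040, 0.190),
           (0.010, 0.040, 0.170), (0.010, 0.040, 0.190)]
print(per_axis_spread(samples), is_jittery(samples))
```

Samples logged while the robot is servoing mix real motion with noise, so they overstate the jitter.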

Regarding the transparent bottle, the D405 wrist camera uses RGB stereo image pairs to compute depth, which relies on matching visible texture between the two views. It may struggle with transparent objects such as the water bottle that you are using, because the cameras largely see the background through them. I recommend using opaque objects when possible.
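To check whether the D405 is producing usable depth on a transparent object, one option is to look at the raw depth values inside the object's segmentation mask. In librealsense, a depth value of 0 means that no depth was computed for that pixel. On transparent surfaces, the valid pixels can instead report the background behind the object. The sketch below assumes that the masked depth values have already been extracted as a list in meters. The 0.5 m cutoff and the helper name are hypothetical.

```python
import statistics


def depth_quality(masked_depths_m, max_range_m=0.5):
    """Summarize depth readings inside an object mask.

    Zeros are treated as missing depth (librealsense's convention); readings beyond
    max_range_m are treated as likely background seen through the object.
    Returns (valid_fraction, median_depth_m or None).
    """
    if not masked_depths_m:
        return 0.0, None
    valid = [d for d in masked_depths_m if 0.0 < d <= max_range_m]
    fraction = len(valid) / len(masked_depths_m)
    median = statistics.median(valid) if valid else None
    return fraction, median


# Made-up mask: half the pixels have no depth, one sees the table far behind the bottle.
print(depth_quality([0.0, 0.0, 0.0, 0.0, 0.21, 0.22, 0.23, 0.9]))  # (0.375, 0.22)
```

A low valid fraction on the transparent bottle, compared with the opaque one, would point to the camera rather than the servoing code.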

The original visual servoing code was tuned to work with the cube and the tennis ball, so grasping other objects reliably may take some tweaking of parameters. If you are interested in improving performance while grasping the opaque bottle, here are some code bits that could be adjusted:

  • Change the grasp_depth in yolo_servo_perception.py: see YoloServoPerception.apply()

  • Change (reduce) the speeds for the arm and lift in the retract state in visual_servoing_demo.py: see the behavior == 'retract' block inside main() and adjust the cmd dictionary.

Since this code uses velocity control for the robot’s joints, be careful not to turn the speeds up too high! I also recommend staying near the E-Stop in case the parameters that you adjust cause the robot to move in an undesirable way.
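When lowering the retract speeds, it can help to scale every velocity in the command dictionary by one factor and cap each at a hard limit, so that a typo cannot produce a fast motion. This is a sketch only. The joint names and values below are hypothetical placeholders, not the keys used in visual_servoing_demo.py, and the limit is arbitrary.

```python
def scale_and_clamp(cmd, factor=0.5, max_abs=0.05):
    """Return a copy of a {joint_name: velocity} dict, scaled by factor and clipped to ±max_abs."""
    if not 0.0 <= factor <= 1.0:
        raise ValueError("factor should only slow the robot down (0 <= factor <= 1)")
    return {
        joint: max(-max_abs, min(max_abs, factor * vel))
        for joint, vel in cmd.items()
    }


# Hypothetical retract command (joint names and velocity values are placeholders).
retract_cmd = {"arm": -0.08, "lift": 0.06}
print(scale_and_clamp(retract_cmd))  # {'arm': -0.04, 'lift': 0.03}
```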

Last, if you are interested in an alternative approach to visual servoing, consider checking out the stretch_forcesight repository. This is an experimental deep learning-based approach to planning force and position targets for Stretch’s end effector to achieve while performing semantically labeled tasks, such as picking up a cup. Note that the deep model in stretch_forcesight was trained using an experimental camera mount on the end effector, as well as older models of Stretch, so it will likely not perform as well as described in the original paper out-of-the-box.


Hi @hello-lamsey!

Thanks for the additional information. I’m also curious if there is a ROS 2 package with similar functionality to stretch_visual_servoing or stretch_forcesight. Specifically, I’m looking for a ROS 2 package or tool that can provide a message indicating whether the robot has successfully grabbed an object or not.

Do you know of any existing packages or solutions that offer this capability?

Thanks in advance for your help!

Hi @allen,

Currently, there is not an official ROS2 package for visual servoing (or forcesight) with Stretch. I have opened an issue on GitHub that requests the creation of a ROS2 package for visual servoing.

One current limitation is that the ROS2 stretch_driver does not directly expose a velocity controller for the robot’s joints yet. Velocity control is used in the pythonic demo and is a key enabler of fast, dexterous tracking. Running velocity control over ROS2 may present issues related to factors like communication latency, so this could be a bit tricky to implement safely as a prerequisite for a visual servoing package.
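A common way to contain the latency risk is a command timeout. If no fresh velocity command arrives within a short window, the controller commands zero velocity instead of continuing the last one. Below is a framework-agnostic sketch of this pattern. It does not use rclpy or any stretch_driver API. The 0.2 s timeout is a placeholder, and the clock is injectable so that the behavior can be tested.

```python
import time


class VelocityWatchdog:
    """Hold the latest velocity command and zero it once it becomes stale."""

    def __init__(self, timeout_s=0.2, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self._cmd = {}
        self._stamp = None

    def update(self, cmd):
        """Store a new {joint_name: velocity} command and the time it arrived."""
        self._cmd = dict(cmd)
        self._stamp = self.clock()

    def current(self):
        """Return the command to execute now: the latest one, or zeros if it is stale."""
        if self._stamp is None or self.clock() - self._stamp > self.timeout_s:
            return {joint: 0.0 for joint in self._cmd}
        return dict(self._cmd)


# Usage with a fake clock so the timing is deterministic.
now = [0.0]
wd = VelocityWatchdog(timeout_s=0.2, clock=lambda: now[0])
wd.update({"lift": 0.03})
now[0] = 0.1
print(wd.current())  # {'lift': 0.03}
now[0] = 0.5
print(wd.current())  # {'lift': 0.0}
```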

If there is general interest in the creation of a visual servoing ROS2 package, let us know!