Hi y’all!
I’m Matt Lamsey, and I’m a robotics PhD student at the Georgia Institute of Technology. This summer, I have been interning with Hello Robot, where I have had the opportunity to work on several interesting projects with Stretch. These include:
- I explored optimizing whole-body robot trajectories in simulation, as detailed in this forum post. This built on top of excellent prior work by Michal Ciebielski!
- In collaboration with @hello-amal, @hello-vinitha, and @vynguyen91, I developed ROS2 actions that enable Stretch to show its tablet end effector to humans. @hello-amal described how these actions were integrated into Stretch Web Teleop in his recent forum post.
- In collaboration with @cpaxton and @bshah, I contributed to dexterous teleoperation and visual servoing modules for Stretch.
This forum post details my work on these three projects, and includes links to relevant code. I hope that this serves as a useful reference for others to build on!
1. Trajectory Optimization
Trajectory optimization code link here!
Figure 1. Example simultaneous position and orientation tracking for a simulated Stretch RE3.
A more comprehensive forum post about trajectory optimization can be found here, which builds on prior work by Michal Ciebielski. We leverage the OpTaS robotic optimization framework to perform trajectory optimizations for a simulated Stretch RE3.
We specify 6DOF trajectories for the robot’s end effector to track, and non-holonomic motion constraints are enforced for the robot’s base. My original post on this topic contains results from optimizing several types of trajectories - check it out!
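To make the non-holonomic constraint concrete: a differential-drive base cannot slide sideways, so its velocity must satisfy xdot*sin(theta) - ydot*cos(theta) = 0. The snippet below is a minimal sketch of that condition on a discretized trajectory. It is plain Python, not the OpTaS formulation used in the linked code, and the function names are hypothetical.

```python
import math

def unicycle_step(x, y, theta, v, omega, dt):
    """Integrate a differential-drive (unicycle) base forward by one time step."""
    return (x + v * math.cos(theta) * dt,
            y + v * math.sin(theta) * dt,
            theta + omega * dt)

def lateral_velocity(p0, p1, dt):
    """Sideways velocity in the base frame between two poses (x, y, theta).

    A non-holonomic base requires xdot*sin(theta) - ydot*cos(theta) = 0.
    """
    xdot = (p1[0] - p0[0]) / dt
    ydot = (p1[1] - p0[1]) / dt
    return xdot * math.sin(p0[2]) - ydot * math.cos(p0[2])

# Roll out a gentle arc; every step should satisfy the constraint.
dt = 0.1
poses = [(0.0, 0.0, 0.0)]
for _ in range(20):
    poses.append(unicycle_step(*poses[-1], v=0.2, omega=0.3, dt=dt))
residuals = [lateral_velocity(a, b, dt) for a, b in zip(poses, poses[1:])]

# A pure sideways slide violates the constraint (non-zero residual).
sideways = lateral_velocity((0.0, 0.0, 0.0), (0.0, 0.05, 0.0), dt)
```

In an optimizer, a residual like this would typically be imposed as an equality constraint at every time step of the base trajectory.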
2. Showing a Tablet to a Human
Tablet showing code link here! This work was first mentioned in @hello-amal’s forum post last week.
Figure 2. Example tablet viewing poses relative to a human.
Past piloting of Stretch with older adults and individuals with motor impairments highlighted the usefulness of attaching a tablet to the end of Stretch’s arm. Some applications of a tablet end effector include facilitating video calls and showing pictures / videos to someone. Therefore, I spent some time this summer exploring how Stretch can best show someone a tablet. The objective was to create a planning and motion framework that can consistently position Stretch’s tablet at a comfortable viewing pose relative to a human.
The tablet showing routine has been tested with older adults at Our Place Social Center in Hillsboro, Oregon, and it is also currently being tested as part of a larger study at Clark Lindsey Village senior living community in Illinois, in collaboration with Prof. Wendy Rogers from the University of Illinois Urbana-Champaign. Keep an eye out for the future results of that study!
Try it out!
Installation instructions for this ROS2 package are found here. Steps for configuring your robot to use the tablet end effector as its tool are found here.
Web Teleoperation
The best way to try this code is through the stretch_web_teleop package. With the robot configured to use a tablet tool, the pose estimator and tablet showing UI elements should be enabled under the RealSense camera view. See Figure 6 below for an example.
Command Line Demo
We also provide a command line tool to test tablet showing. In two terminals, run the following commands:
Terminal one: ros2 launch stretch_show_tablet demo_show_tablet.launch.py
Terminal two: ros2 run stretch_show_tablet demo_show_tablet
The demo_show_tablet executable contains a command line menu system for testing the tablet showing routine. Enter commands to enable the pose estimator and jog the robot’s camera to point at a human. You can then send a ShowTablet action request to the show_tablet_server action server. The action server will estimate the human’s pose and plan a tablet location for easy viewing.
Perception
Our system leverages the human pose estimation module from the stretch_deep_perception ROS2 package to estimate keypoints on a human’s body. An example of the keypoint estimates obtained using this package is shown in Figure 3. We present a planning framework that uses this pose estimate to generate an ergonomic placement for the tablet relative to the human. The robot then moves its end effector to this pose for the duration of the interaction between the human and the tablet.
Figure 3. Example human keypoint estimate visualized in the Stretch Web Teleoperation interface. Left: fisheye camera view of human. Right: keypoint estimate (magenta) overlaid on the RealSense camera feed.
Planning and Motion Framework
To show a human something, it is helpful to explicitly define the transforms between the human, the robot, and the object of interest. For showing a tablet, we formulated the coordinate system for the human using pose estimate keypoints for the human’s shoulders and nose, as shown in Figure 4. This procedure creates a coordinate system positioned at the human’s nose, with the X axis perpendicular to the human’s shoulders and the Z axis pointing straight upwards.
Figure 4. Definition of head-centric human coordinate system for showing a tablet.
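As a rough illustration of the construction described above, the sketch below builds a head-centric frame from three keypoints expressed in a Z-up frame such as the robot’s base frame. It assumes the X axis points forward out of the face and the Y axis points toward the human’s left shoulder. The function names and example values are hypothetical and are not taken from the stretch_show_tablet package.

```python
import math

def normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def cross(a, b):
    """Right-handed cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def head_frame(nose, left_shoulder, right_shoulder):
    """Return (origin, x_axis, y_axis, z_axis) of the head-centric frame."""
    z_axis = (0.0, 0.0, 1.0)  # straight up
    # Shoulder line (right -> left), projected onto the horizontal plane.
    y_axis = normalize((left_shoulder[0] - right_shoulder[0],
                        left_shoulder[1] - right_shoulder[1],
                        0.0))
    # Perpendicular to the shoulders, pointing forward (assumed convention).
    x_axis = cross(y_axis, z_axis)
    return nose, x_axis, y_axis, z_axis

# Example: a seated human 1.5 m in front of the robot, facing back toward it (-X).
origin, x_axis, y_axis, z_axis = head_frame(
    nose=(1.5, 0.0, 1.2),
    left_shoulder=(1.55, -0.2, 1.0),
    right_shoulder=(1.55, 0.2, 1.0),
)
```

Projecting the shoulder line onto the horizontal plane keeps the frame level even when the person’s shoulders are tilted.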
Using the human’s head-centric coordinate system, we can then define an ergonomic position for the tablet relative to the human’s head. Our process for determining an ergonomic tablet placement is shown in Figure 5. Based on ergonomics literature [1], we defined an ergonomic placement of the tablet as approximately 40 cm from the human’s head with the top of the tablet slightly below eye level, parallel to the human’s shoulders, and centered on the human’s sagittal plane.
Figure 5. Definition of tablet coordinate system and placement relative to a human’s head.
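The placement rule can be sketched as a simple offset in the head frame. This is only an illustration: the 0.40 m viewing distance comes from the description above, while the vertical drop value, the function name, and the choice to return a screen normal are assumptions made for the example.

```python
def tablet_target(head_origin, head_x, viewing_distance=0.40, vertical_drop=0.15):
    """Place the tablet in front of the head and slightly below eye level.

    head_origin: nose position in a Z-up frame.
    head_x: unit vector pointing forward out of the face (horizontal).
    vertical_drop: hypothetical offset below the nose, in metres.
    """
    position = (
        head_origin[0] + viewing_distance * head_x[0],
        head_origin[1] + viewing_distance * head_x[1],
        head_origin[2] - vertical_drop,
    )
    # The screen faces back along the head's X axis, which keeps it parallel
    # to the shoulders and centered on the sagittal plane (no sideways offset).
    screen_normal = (-head_x[0], -head_x[1], -head_x[2])
    return position, screen_normal

# Example: human at x = 1.5 m, facing the robot (head X axis along -X).
position, screen_normal = tablet_target((1.5, 0.0, 1.2), (-1.0, 0.0, 0.0))
```

The resulting pose is what would then be handed to the IK solver as the end effector target.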
The human pose estimation module in stretch_deep_perception generates human pose estimates in the robot’s head camera frame, which can be easily transformed into the robot’s base frame using the tf package. This connects the nodes in the kinematic tree between the robot’s base, the human’s coordinate system, and the desired tablet placement location. We then use the Pinocchio Inverse Kinematics (IK) solver from the Gepetto project, which is integrated with Stretch’s URDF in the stretch_web_teleop package here, to find the inverse kinematic solution for the tablet placement.
Sometimes, the desired tablet placement will be unreachable by the robot, such as when the human is far away from the robot, or when the human is standing. If the IK solver fails to converge solely due to the robot’s arm extension limits, then we change the kinematic solution to use the maximum arm extension and to then point the tablet at the person’s head from the end effector’s new position. If the IK solver fails due to the robot’s upper lift limit, then we change the kinematic solution to use the maximum lift height and to then tilt the tablet slightly upwards.
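The fallback behavior can be pictured with a little geometry: clamp the joint that hit its limit, then re-aim the tablet at the head from wherever the end effector ends up. The sketch below is not the package’s IK fallback. The joint limit values are placeholders and the function names are hypothetical.

```python
import math

def clamp_to_limits(lift, extension, max_lift=1.1, max_extension=0.5):
    """Clamp lift and arm extension; the limit values here are placeholders.

    Returns (lift, extension, lift_limited, extension_limited).
    """
    return (min(lift, max_lift), min(extension, max_extension),
            lift > max_lift, extension > max_extension)

def aim_at_head(tablet_pos, head_pos):
    """Yaw and pitch (radians) that point the tablet's screen normal at the head."""
    dx = head_pos[0] - tablet_pos[0]
    dy = head_pos[1] - tablet_pos[1]
    dz = head_pos[2] - tablet_pos[2]
    yaw = math.atan2(dy, dx)
    pitch = math.atan2(dz, math.hypot(dx, dy))  # positive means tilted upwards
    return yaw, pitch

# Example: a standing human whose ideal tablet height exceeds the lift limit.
lift, extension, lift_limited, extension_limited = clamp_to_limits(1.45, 0.3)
yaw, pitch = aim_at_head(tablet_pos=(0.5, 0.0, lift), head_pos=(1.0, 0.0, 1.6))
```

Here the lift is clamped to its maximum and the positive pitch tilts the tablet up toward the standing person’s face.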
ROS2 Action and Web Teleoperation Integration
We integrated the tablet showing planning and motion framework as a ROS2 Action into the Stretch Web Teleop application. The UI elements for estimating a human’s pose and executing the tablet showing action are highlighted in Figure 6.
Figure 6. Tablet showing integration into Stretch Web Teleop, borrowed from @hello-amal’s forum post.
The flow of information and actions inside the ROS2 action server for showing a tablet is shown in Figure 7. For safety, we cleanly handle action cancel requests made by the web teleop GUI at every stage of the tablet showing routine.
Figure 7. ROS2 Action integration flow inside the Stretch Web Teleoperation application.
Future Work: Optimizing the Robot’s Base Placement
The current implementation of the tablet showing routine does not include methods to autonomously plan and navigate the robot’s base to a “good” location from which it can show the tablet. Below is some preliminary work on modeling what good locations for the robot’s base could be, based on sampling and optimizing Inverse Kinematics (IK) solutions for showing a tablet. These optimizations have not been integrated into the stretch_show_tablet actions, but the modeling code is in the repository for future reference.
First, we evaluated which base placements allow the robot to show the tablet at many different locations, as shown in Figure 8. A human’s pose estimate keypoints are shown as black dots and example tablet locations are shown as coordinate frames. Points on the ground are colored based on how many of the example tablet locations the robot could reach if its base were positioned at that point. We found that the robot’s wrist yaw joint limits constrained the robot’s valid base placements to be mostly to the right of the human.
Figure 8. Sampling robot base placements to show a tablet at many locations relative to a human.
For a single tablet placement, we also evaluated optimizing the robot’s base placement against a kinematic cost function. We used the “distance from mechanical joint limits” cost function, presented in Chapter 3 of [2], which is shown in Figure 9. Here, w(q) is the cost function of a robot configuration q, q_i is an individual joint value, q_i bar is the center value of the joint’s range, q_iM is the maximum joint limit, and q_im is the minimum joint limit. In this optimization, we only include joint values from the robot’s lift through its wrist (i.e. ignoring the base motions).
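For readers without the figure at hand, here is a minimal sketch of this measure using the symbols defined above. The textbook states it with a negative sign as an objective to maximize. The version below drops the sign so that smaller values are better, which is an assumption about the convention used here. The function name is hypothetical.

```python
def joint_limit_cost(q, q_min, q_max):
    """w(q) = 1/(2n) * sum_i ((q_i - q_bar_i) / (q_iM - q_im))^2.

    q: joint values; q_min / q_max: per-joint lower and upper limits.
    The cost is zero when every joint sits at the center of its range.
    """
    n = len(q)
    total = 0.0
    for q_i, q_im, q_iM in zip(q, q_min, q_max):
        q_bar = 0.5 * (q_im + q_iM)  # center of the joint's range
        total += ((q_i - q_bar) / (q_iM - q_im)) ** 2
    return total / (2.0 * n)

# Two joints: the first at its upper limit, the second centered.
cost_at_limit = joint_limit_cost([1.0, 0.0], [0.0, -1.0], [1.0, 1.0])
cost_centered = joint_limit_cost([0.5, 0.0], [0.0, -1.0], [1.0, 1.0])
```

Dividing by each joint’s range makes the terms dimensionless, so prismatic joints (in metres) and revolute joints (in radians) contribute on a comparable scale.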
We visualized the cost surface for the robot’s IK solutions, as shown below. Because this cost surface is convex, a greedy optimizer can be used to solve the optimization problem. In preliminary testing, scipy.optimize.minimize was able to optimize the robot’s base placement in under 0.1 s on a Stretch RE3.
Figure 9. Base placement optimization to minimize the robot’s distance from mechanical joint limits while showing a tablet.
3. Dexterous Teleoperation and Visual Servoing for Grasping
Code for dexterous teleoperation and visual servoing coming soon! Email Chris Paxton (@cpaxton, cpaxton@hello-robot.com) for more information about the release of this code.
The final project that I contributed to was the development of some new modules for Stretch: 1) dexterous teleoperation for data collection and 2) visual servoing to grasp an object. Stay tuned for more updates from @hello-yiche and @cpaxton regarding these modules!
Dexterous Teleoperation
Dexterous teleoperation of Stretch using specialized ArUco tongs was recently released alongside the Stretch RE3. The initial implementation presents several limitations, such as a limited teleoperation workspace inside the field of view of the camera on the ground and a lack of a way to pause tracking of the tongs to avoid unwanted motion.
To address these limitations, I added a “clutch” to the dexterous teleoperation framework. The clutch is activated by placing your free hand over the camera on the ground, as shown in Figure 10. Low latency hand tracking is achieved using Google’s mediapipe hand tracking package. Activating the clutch allows users to reposition the tongs over the camera without moving the robot. This opens the door for commanding the robot over longer distances using dexterous teleoperation.
Figure 10. Teleoperating Stretch over long distances using the new dexterous teleoperation “clutch” feature.
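One common way to implement a clutch like this is to freeze the robot’s target while the clutch is active and re-anchor the tongs-to-robot offset when it is released. The sketch below shows that idea with positions only. It is an assumption about the mechanism rather than the released implementation, and the class and method names are hypothetical.

```python
class ClutchedTeleop:
    """Maps tracked tongs positions to robot targets, with a pause 'clutch'."""

    def __init__(self, initial_target):
        self.target = tuple(initial_target)
        self.offset = None  # robot target minus tongs position

    def update(self, tongs_pos, clutch_active):
        if clutch_active:
            # Forget the mapping; the robot holds its last target.
            self.offset = None
            return self.target
        if self.offset is None:
            # Re-anchor so the tongs' current position maps to the held target.
            self.offset = tuple(t - p for t, p in zip(self.target, tongs_pos))
        self.target = tuple(p + o for p, o in zip(tongs_pos, self.offset))
        return self.target

teleop = ClutchedTeleop((0.0, 0.0, 0.0))
teleop.update((0.0, 0.0, 0.0), clutch_active=False)
moved = teleop.update((0.2, 0.0, 0.0), clutch_active=False)    # robot follows
held = teleop.update((0.0, 0.0, 0.0), clutch_active=True)      # tongs reset, robot holds
resumed = teleop.update((0.0, 0.0, 0.0), clutch_active=False)  # re-anchored, no jump
final = teleop.update((0.2, 0.0, 0.0), clutch_active=False)    # robot moves further
```

Repeating the move, clutch, reposition cycle lets a small camera workspace cover an arbitrarily long robot motion, much like lifting and replacing a computer mouse.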
Visual Servoing for Grasping
We have also been working on adding an autonomous object grasping module to Stretch. Preliminary grasping work by @cpaxton identifies objects in the robot’s camera feeds using the Detic [3] deep perception model and uses a proportional visual servoing controller to drive the robot’s gripper to the object. I expanded on this work by adding a temporal filter for object state predictions and by tuning the visual servoing controller.
The temporal filter keeps track of all observations of an object within a time window of n seconds. This improves the stability of the visual servoing loop in instances where the object is misclassified in the wrist camera’s images (such as when the object is partially obscured by the robot’s gripper, or when the object is partially or fully outside of the camera’s frame). Examples of Stretch servoing to and grasping a mug, a towel, and a screwdriver are shown in Figure 11, and examples of the visual servoing input from the wrist camera are shown in Figure 12.
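A sliding-window filter of this kind might look like the sketch below. It stores timestamped mask centers, drops those older than the window, and averages what remains. How the actual module combines the stored observations is not described here, so the averaging step is an assumption and the class name is hypothetical.

```python
from collections import deque

class TemporalFilter:
    """Keeps object observations from the last `window_s` seconds."""

    def __init__(self, window_s):
        self.window_s = window_s
        self.observations = deque()  # (timestamp, (u, v)) pairs, oldest first

    def add(self, timestamp, center):
        """Record a frame; `center` is None when the object was not detected."""
        if center is not None:
            self.observations.append((timestamp, center))
        # Drop observations that have fallen out of the time window.
        while self.observations and timestamp - self.observations[0][0] > self.window_s:
            self.observations.popleft()

    def estimate(self):
        """Mean object center over the window, or None if nothing was seen."""
        if not self.observations:
            return None
        n = len(self.observations)
        u = sum(c[0] for _, c in self.observations) / n
        v = sum(c[1] for _, c in self.observations) / n
        return (u, v)

f = TemporalFilter(window_s=1.0)
f.add(0.0, (100.0, 200.0))
f.add(0.5, (110.0, 210.0))
f.add(0.6, None)            # missed detection: the estimate survives
during_dropout = f.estimate()
f.add(2.0, None)            # all observations are now stale
after_timeout = f.estimate()
```

A brief missed detection leaves the estimate intact, while a long dropout empties the window so the controller can stop rather than chase stale data.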
Figure 11. Visual servoing to grasp objects identified by Detic [3].
Figure 12. Example eye-in-hand object segmentation masks used during visual servoing. Current camera view center represented by the green dot, center of the object mask represented by the blue dot.
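The two dots in Figure 12 are the inputs to a proportional controller: the error is the offset between the mask center and the image center, and the command is that error times a gain. The sketch below illustrates the idea. The gain, the saturation limit, and the function name are hypothetical, and how each image axis maps onto Stretch’s joints is left out.

```python
def p_servo_command(mask_center, image_size, gain=0.5, max_cmd=0.1):
    """Proportional command that drives the mask center toward the image center.

    mask_center: (u, v) pixel position of the object mask center (blue dot).
    image_size: (width, height) in pixels; the image center is the green dot.
    Returns (horizontal_cmd, vertical_cmd), each clipped to +/- max_cmd.
    """
    width, height = image_size
    # Normalized error in [-1, 1]; positive means right of / below center.
    error_u = (mask_center[0] - width / 2) / (width / 2)
    error_v = (mask_center[1] - height / 2) / (height / 2)

    def clip(value):
        return max(-max_cmd, min(max_cmd, value))

    return clip(gain * error_u), clip(gain * error_v)

centered = p_servo_command((320, 240), (640, 480))   # no motion needed
small = p_servo_command((352, 240), (640, 480))      # proportional response
saturated = p_servo_command((480, 240), (640, 480))  # clipped to max_cmd
```

Tuning such a controller mostly means picking the gain and saturation so the gripper converges quickly without oscillating around the object.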
Acknowledgement
I want to thank the entire Hello Robot team for supporting me and my work this summer. Binit Shah, Chris Paxton, and Vinitha Ranganeni served as great mentors, and I enjoyed learning from my fellow interns, Amal Nanavati and Yi-Che Huang. I also want to thank Julian Mehu for his assistance with equipment. I am grateful to have had the opportunity to work alongside excellent engineers, learn a lot about robots, and push Stretch’s capabilities forward.
References
[1] Tilley, Alvin R. The measure of man and woman: human factors in design. John Wiley & Sons, 2001.
[2] Siciliano, Bruno, et al. Robotics: Modelling, Planning and Control. Advanced Textbooks in Control and Signal Processing. Springer, 2011.
[3] Zhou, Xingyi, et al. “Detecting twenty-thousand classes using image-level supervision.” European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2022.