Exciting New Features for the Stretch Web Interface!

Hi Stretch Community!

My name is Amal, and I’m a PhD Candidate at the University of Washington. I spent the summer interning at Hello Robot, where I did frontend and backend development for web interface features that enable operators to more effectively control Stretch. This includes:

  1. Features to enable operators to more easily manipulate objects with Stretch: (a) click-to-pregrasp; (b) gripper camera depth overlay; and (c) expanded gripper camera view;

  2. Features to enable operators to more easily use Stretch beyond line-of-sight: (a) real-time robot-to-operator audio streaming; and (b) operator-to-robot text-to-speech;

  3. Features to enable operators to use Stretch with a tablet as the end effector.

Along the way, I isolated and helped resolve bugs in the ROS2 Stretch driver and the web interface, which should make your experience working with these tools more reliable. If you notice more bugs, please raise them as GitHub Issues (stretch_ros2 issues, stretch_web_teleop issues) and we’d be happy to address them!

1. Improving Operators’ Manipulation Experience

Click-to-Pregrasp

Operators have told us that the process of aligning Stretch’s gripper with an object they want to manipulate can be tiresome and time-consuming. To address this, I developed a click-to-pregrasp feature. Inspired by this award-winning HRI paper, this feature enables users to click a point in the head RealSense’s camera feed and have the robot automatically rotate its base, adjust its arm’s lift and length, and rotate its wrist to align with the clicked object. We call the final position “pre-grasp” because the gripper stops a few centimeters short of the object.
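
To make the idea concrete, below is a rough sketch (not the actual stretch_web_teleop implementation) of how a clicked pixel plus its RealSense depth can be turned into a pre-grasp target. The camera intrinsics, the camera-to-base transform, and the 5 cm standoff are all illustrative assumptions.

```python
# Rough sketch (not the actual stretch_web_teleop implementation) of converting a
# clicked pixel into a pre-grasp target. fx, fy, cx, cy are the head camera's
# intrinsics and cam_to_base is a 4x4 camera-to-base homogeneous transform,
# both assumed to be available from elsewhere.
import numpy as np

def pixel_to_pregrasp(u, v, depth_m, fx, fy, cx, cy, cam_to_base, standoff_m=0.05):
    """Deproject a clicked pixel into the base frame and stop a few cm short of it."""
    # Pinhole deprojection of pixel (u, v) at the measured depth into the camera frame.
    p_cam = np.array([(u - cx) * depth_m / fx,
                      (v - cy) * depth_m / fy,
                      depth_m,
                      1.0])
    # Express the point in the robot's base frame.
    x, y, z, _ = cam_to_base @ p_cam
    # Rotate the base so the telescoping arm points at the object, set the lift to the
    # object's height, and extend the arm while keeping a small standoff.
    base_yaw = np.arctan2(y, x) - np.pi / 2  # assumes the arm extends 90 deg from the drive direction
    reach = np.hypot(x, y) - standoff_m
    return base_yaw, z, reach                # base rotation (rad), lift (m), arm extension (m)
```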

(Video: click-to-pregrasp demo)

Gripper Depth Overlay

Operators have also told us that it can be hard to know when they have moved the gripper far enough to grasp an object. To address this, I developed a gripper depth overlay that highlights all the points that lie between the gripper’s two fingers.
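
As a rough illustration of the idea (not the interface’s actual rendering code), the sketch below tints the color pixels whose depth falls within an assumed near/far band between the fingers; the band limits, tint color, and blending weight are placeholders.

```python
# Illustrative depth-overlay sketch: highlight gripper-camera pixels whose depth
# falls inside an assumed "between the fingers" band. Not the production code.
import cv2
import numpy as np

def highlight_graspable(color_bgr, depth_m, near_m=0.03, far_m=0.10,
                        tint=(0, 255, 255), alpha=0.5):
    """Tint pixels whose depth lies within the band spanned by the gripper fingers."""
    mask = (depth_m > near_m) & (depth_m < far_m)   # candidate graspable points
    overlay = color_bgr.copy()
    overlay[mask] = tint                             # solid tint where graspable
    # Blend the tinted layer with the original image for a translucent highlight.
    return cv2.addWeighted(overlay, alpha, color_bgr, 1 - alpha, 0)
```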

Expanded Gripper Camera View

Finally, operators have told us that after grasping an object, the entire gripper camera view in the web interface can get occluded, making it hard to see the context beyond the object. To address this, I added an expanded gripper view, which shows operators a wider field of view than the default gripper view.

Using These Features: Sample Workflow

A sample workflow for manipulating objects using these features is as follows:

  1. An operator navigates the base until their target object is within the graspable region (as indicated by the depth overlay on the head RealSense).

  2. They have the robot automatically align its gripper with the object, using the new click-to-pregrasp feature.

  3. They move the arm towards the object until it is within the gripper fingers, using the new gripper depth overlay feature.

  4. They move the arm around their environment, using the new expanded gripper camera view feature for more context regarding the object’s surroundings.

Example 1: Picking up Coffee from a Table

The video below shows an example of using a horizontal pre-grasp to pick up coffee from a table.

Example 2: Picking up Trash from the Floor

The video below shows an example of using a vertical pre-grasp to pick up trash from the floor and throw it in a trash can.

2. Facilitating Beyond Line-of-Sight Operation

Although operators are able to control Stretch beyond line-of-sight using the web interface’s camera streams (in fact, we have had people in Europe control a robot in the US!), the visual modality does not always capture enough environmental context. To help with beyond line-of-sight operation, I added bi-directional audio capabilities to the web interface.

Robot-to-Operator Audio Streaming

Operators have told us that hearing what is around Stretch can be useful for knowing when they have successfully put an object down. Thus, I implemented real-time robot-to-operator audio streaming, using WebRTC, so the operator can hear through Stretch’s microphone when they desire.
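
For those curious how robot-to-operator audio can be streamed, the sketch below uses aiortc (one Python WebRTC library) to expose the robot’s microphone as an audio track. The capture device name, the PulseAudio backend, and the signaling step are assumptions; the production setup may be structured differently.

```python
# Minimal aiortc-based sketch of exposing the robot's microphone as a WebRTC audio
# track. The device name / audio backend ("default" via PulseAudio) is an assumption.
from aiortc import RTCPeerConnection
from aiortc.contrib.media import MediaPlayer

def add_robot_microphone(pc: RTCPeerConnection) -> None:
    # Open the robot's capture device through ffmpeg/PyAV and attach its audio track.
    mic = MediaPlayer("default", format="pulse")
    pc.addTrack(mic.audio)
    # The SDP offer/answer exchange with the operator's browser (signaling) happens elsewhere.
```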

Operator-to-Robot Text-to-Speech

Although the above enables operators to hear what others are saying to Stretch, it does not let them have Stretch respond verbally. Thus, I implemented text-to-speech in the web interface using gTTS, with options to switch to other engines like pyttsx3.
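
As a minimal illustration of the two engines named above (independent of the web interface’s ROS2 plumbing), something like the following works; the output path and the playback step are placeholders.

```python
# Minimal sketch of the two TTS engines mentioned above; the web interface's
# actual plumbing around them is not shown.
from gtts import gTTS   # online engine
import pyttsx3          # offline fallback

def speak_gtts(text: str, out_path: str = "/tmp/tts.mp3") -> None:
    gTTS(text=text, lang="en").save(out_path)  # synthesize speech to an mp3 file
    # Play out_path through the robot's speaker with your preferred audio player.

def speak_pyttsx3(text: str) -> None:
    engine = pyttsx3.init()
    engine.say(text)        # queue the utterance
    engine.runAndWait()     # block until playback finishes
```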

I also developed a text-to-speech command-line interface (CLI) for operators who prefer Terminal-based interfaces (the CLI stores history and supports tab-autocomplete).

(Video: text-to-speech CLI demo)
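
For reference, a terminal prompt with persistent history and tab completion can be built with Python’s standard readline module. The sketch below is illustrative only (the phrase list, history file, and the stubbed-out TTS call are placeholders), and the shipped CLI may be organized differently.

```python
# Sketch of a TTS prompt with persistent history and tab completion using the
# standard-library readline module. Illustrative only, not the shipped CLI.
import atexit
import os
import readline

HISTORY_FILE = os.path.expanduser("~/.tts_history")
COMMON_PHRASES = ["hello", "thank you", "excuse me", "goodbye"]  # placeholder phrases

def completer(text, state):
    matches = [p for p in COMMON_PHRASES if p.startswith(text)]
    return matches[state] if state < len(matches) else None

if os.path.exists(HISTORY_FILE):
    readline.read_history_file(HISTORY_FILE)              # restore past utterances
atexit.register(readline.write_history_file, HISTORY_FILE)
readline.set_completer(completer)
readline.parse_and_bind("tab: complete")

while True:
    try:
        utterance = input("say> ")                        # up/down arrows recall history
    except EOFError:
        break                                             # Ctrl-D exits
    if utterance.strip():
        print(f"(would speak) {utterance}")               # hook your TTS call in here
```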

Using These Features

Using these features requires some configuration of the robot’s audio settings, which can be done automatically by running colcon_cd stretch_web_teleop; ./configure_audio.sh. You have to re-run this script every time you turn on the robot or change the audio configuration.

Depending on the use case, it may be beneficial to add an external speaker or microphone to Stretch. We had success with the following external audio devices:

  1. Microphones: The Zealsound USB Condenser Mic can capture voices as far as ~15 ft away, but it has a large form factor. The SoundProfessionals Miniature USB Mic provides similar (but not as crisp) audio quality, with a much more minimalistic form factor.

  2. Speaker: The JBL Flip Speaker, connected to Stretch via Bluetooth, provides very loud and clear audio. The Anker Powerconf S330 USB Speakerphone also provides clear (but not as loud) audio, but its mic does not capture far-away voices as well as the mics above.

To configure an external microphone and/or speaker, run colcon_cd stretch_web_teleop; ./configure_audio.sh -s <name_of_speaker> -m <name_of_microphone>. To get the name of the speaker/microphone, follow the instructions in the comment at the top of configure_audio.sh.

3. Using Stretch with a Tablet as an End-Effector

Finally, our past pilots with people with motor impairments and older adults have shown the promise of using Stretch with a tablet as an end-effector (e.g., to video call with loved ones, or to watch a video). Thus, I developed features to make it easier for operators to control the robot when a tablet is used as the end-effector. This includes:

  1. Allowing the operator to toggle between portrait and landscape mode;

  2. Allowing the operator to have the robot automatically detect a person and ergonomically place the tablet in front of the person.

@hello-lamsey will write about the technical details of the latter feature in his end-of-internship post; my primary contribution was wrapping his code into a ROS2 action and integrating it into the web interface.
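
For readers unfamiliar with ROS2 actions, the sketch below shows the general shape of calling one from rclpy; the ShowTablet action type, its interface package, and the "show_tablet" action name are hypothetical stand-ins, not the real interface.

```python
# General shape of calling a ROS2 action from rclpy. The action type, package, and
# name below are hypothetical stand-ins for illustration.
import rclpy
from rclpy.action import ActionClient
from rclpy.node import Node
from show_tablet_interfaces.action import ShowTablet  # hypothetical action definition

class TabletClient(Node):
    """Toy client illustrating the pattern; not the web interface's actual node."""

    def __init__(self):
        super().__init__("tablet_client")
        self._client = ActionClient(self, ShowTablet, "show_tablet")  # hypothetical name

    def send_goal(self):
        self._client.wait_for_server()
        goal = ShowTablet.Goal()
        # Send the goal asynchronously; the caller watches feedback/result while the
        # robot detects the person and positions the tablet.
        return self._client.send_goal_async(goal)

def main():
    rclpy.init()
    node = TabletClient()
    future = node.send_goal()
    rclpy.spin_until_future_complete(node, future)
    rclpy.shutdown()
```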

Using These Features

These features can work with any 12” tablet. Open-source designs for the tablet mount and D405 mount will be released shortly, along with a guide on swapping the gripper for the tablet end-effector. The tablet and mounts will also be available for purchase; if interested, please inquire by emailing sales@hello-robot.com.

Once you’ve attached the tablet and run the script to configure the tool (stretch_configure_tool.py), you just need to launch the web interface! The screenshot below shows where you will find the buttons for the new tablet features.

Testing

We iteratively developed and tested these features through three (pilot) studies:

  1. A week-long deployment with Henry Evans, a long-time friend and supporter of Hello Robot who is non-speaking and paralyzed from the neck down;

  2. Two days of piloting with senior citizens at Our Place Social Center in Hillsboro, OR;

  3. A multi-week study at the Clark Lindsey Village senior living community in Urbana, IL, in collaboration with Wendy Rogers’ lab at the University of Illinois Urbana-Champaign.

Stay tuned for a future research paper that documents our findings in detail :slightly_smiling_face:. In the meantime, here are some of the ways the studies led to iterative improvements on the above features:

  1. Henry loved click-to-pregrasp, and used it to pick up painting tools that he used to make art with his granddaughter. Unfortunately, the gripper depth overlay added too much lag to be usable. This led me to take a deep dive into reducing the lag, CPU usage, and memory usage of the backend web interface code. The results are documented in this PR (highlight: we reduced the worst-case CPU usage of the top process from 904% to 172%).

  2. The senior citizens who came to Our Place enjoyed and benefited from bi-directional voice interaction with the robot. This was particularly important for the robot to speak its intentions and to prompt them. However, this experience revealed that controlling both robot motion and text-to-speech imposes a heavy cognitive workload on a single operator, which led me to create the aforementioned CLI so a separate operator can focus solely on text-to-speech.

  3. The staff at Clark Lindsey shared that the robot’s speaker was likely too soft and muffled for their residents to be able to understand Stretch. This prompted us to investigate the aforementioned external speakers to provide the necessary quality for this context.

Upgrade Guide

These features are available to anyone who has a Stretch 3 or a Stretch 2 with the upgraded Dex Wrist. To access the new features, follow these instructions to update your ROS2 workspace. If you are receiving a Stretch 3 after the date of this post, your Stretch will arrive with these new features pre-installed.

Conclusion

We’re excited to see what you do with these new features! If you have any questions, comments, or cool results please share them on this thread!

And finally, I’d like to extend a huge thank you to my colleagues @hello-vinitha, @vynguyen91, @hello-lamsey, @bshah, @bmatulevich, and the entire Hello Robot team for making this internship so memorable and enjoyable!
