Stretch_body Python SDK and Python Multithreading

I noticed that the stretch_body SDK heavily relies on Python threads.

For example, take the following snippet:

robot.base.translate_by(0.1)
robot.push_command()
time.sleep(0.05)
robot.base.translate_by(0.1)
robot.push_command()

The time.sleep is necessary here to have the main thread yield to the controller thread correctly.
Without the sleep, whether the first translate_by actually executes or not is left to chance.
Additionally, the amount of time to sleep also seems to matter in executing some command correctly (maybe the controller thread is yielding back too quickly to the main thread before it’s logic is finished running).

This is a known issue when one relies on Python threads, because they don’t actually run parallely with each other, because of the GIL.

I wonder if you have considered using Python multiprocessing, which alleviates these issues.

In droidlet for example, we built up a small utility called BackgroundTask, which uses the Python-inbuilt multiprocessing package to truly run parallel work. This process runs as a daemon process, so that when the parent process exits, this process is automatically collected. Additionally, things such as exception-propagation are also handled similar to threads. Lastly, you can register an __atexit__ handler to do process cleanup as well, i.e. if someone presses ctrl+c or does a kill -9, the __atexit__ handler can close connections to the hardware interfaces and such.

If you have considered or are considering using Python multiprocessing instead of multithreading, I am happy to send some patches to help with this direction.

Thanks for the discussion topic – threading with Python and the GIL can certainly be tricky.

It might help if I first clarify how Stretch Body is arranged. It is split into Status (data pulled from the control boards) and Command (data pushed to the control boards).

Status data is polled in the background using helper threads. This allows the robot.status dictionary to always have up-to-date state information.

The Command API is not managed via a thread. So in your example, robot.push_command blocks until the command has successfully been pushed down to the motor controllers.

Note, however, that it returns almost immediately, before the motion has completed. Your program can poll for completion (eg, robot.base.left_wheel.status['is_moving']). Or it can send a new command down before motion has completed, (as in your example).

Also, I should point out that the translate_by and move_by commands in Stretch Body move relative to the current encoder reading. Therefore the expected behavior in your example is to move by 0.1m (presuming no motion has commenced in the .05s).

Regarding multi-processing - it is something we have considered and I agree it may be a better solution than the current threading. As you note here, thread starvation between IO Bound and CPU bound threads can be problematic (fix is under testing for this issue btw). Multi-processing might be a good option in this regard. Any additional pointers / patches would be welcome.

Another challenge is coordinating comms across the multiple USB devices while maintaining loop rates in a non-RT system. In this regard, we have an Asyncio implementation of Stretch Body currently under testing for Python3.

Thanks for the larger context. This is super helpful.
I’ll open some proposals when I get time to move things to multiprocessing (or I’ll create the robot object in a separate process).

Relatedly, the is_moving stuff is completely broken, I think.
I opened an issue: base wheel `is_moving` is broken · Issue #47 · hello-robot/stretch_body · GitHub

Regarding is_moving:

The value should be correct if you add the following to your user YAML ($HELLO_FLEET_PATH/$HELLO_FLEET_ID/stretch_re1_user_params.yaml):

hello-motor-arm:
  gains:
    vel_near_setpoint_d: 3.5
hello-motor-lift:
  gains:
    vel_near_setpoint_d: 3.5
hello-motor-right-wheel:
  gains:
    vel_near_setpoint_d: 3.5
hello-motor-left-wheel:
  gains:
    vel_near_setpoint_d: 3.5

The is_moving flag compares the current encoder velocity to this value. The velocity signal can be noisy and non-zero even when there is no motion (artifact of hall effect encoders). This change to your YAML overrides the factory settings and gives a bit more margin to allow for greater encoder noise. As the encoder noise can vary per robot, it can be necessary to adjust the factory settings to be less strict.

You can also use robot.base.left_wheel.motor.status['near_pos_setpoint'] to check if a commanded joint position has been accomplished. This compares the current trajectory error against the YAML value pos_near_setpoint_d. As it uses position and not velocity it will be less subject to encoder noise.

1 Like

Just for closure (and for future readers), setting vel_near_setpoint_d to 3.5 didn’t work, not entirely sure why, because looking at some raw values, and looking at the firmware code, it should have.

However, what did end up working was this:

robot.base.translate_by(dist)
robot.push_command()

# wait for the velocity of both wheels to be under reasonable limits
time.sleep(0.2)
while(abs(robot.base.left_wheel.status['vel']) >= 0.1 or abs(robot.base.right_wheel.status['vel']) >= 0.1):
    time.sleep(0.05)

That’s strange - as you likely saw in the code it is a simple threshold. We tested it on a robot here as well. The only thing I could imagine is that the YAML defaults didn’t actually get overwritten as expected.

If you’ve a recent (v0.1.x) version of Stretch Body installed you can double check that your params are actually set as expected:

stretch_params.py | grep vel_near_setpoint_d

Hi @smth, I was seeing some flickering in the is_moving flag as well. Tuning vel_near_setpoint_d to 3.5 and higher reduces the noise, but doesn’t eliminate it entirely. I’ve added a filter on the is_moving flag called is_moving_filtered (e.g. robot.base.left_wheel.status['is_moving_filtered']). This behaves more reliably.

There’s an pre-release out with this fix. Let me know if it works for your case.

$ pip3 install -U --pre hello-robot-stretch-body
1 Like

I just want to report that is_moving_filtered, while an improvement, is still fairly unreliable.

I’ve found it most unreliable on base rotations.

For other readers, here’s code I’m using

is_moving = True
while is_moving:
  time.sleep(0.1)
  robot.pull_status()
  left_wheel_moving = robot.base.left_wheel.status['is_moving_filtered']
  right_wheel_moving = robot.base.right_wheel.status['is_moving_filtered']
  is_moving = left_wheel_moving or right_wheel_moving

The robot starts rotating, but before it finishes rotating, is_moving returns False

1 Like

I think I found something a bit more reliable:

is_moving = True
while is_moving:
  time.sleep(0.1)
  robot.pull_status()
  left_wheel_moving = robot.base.left_wheel.status['is_moving_filtered'] or robot.base.left_wheel.status['is_moving']
  right_wheel_moving = robot.base.right_wheel.status['is_moving_filtered'] or robot.base.right_wheel.status['is_moving']
  is_moving = left_wheel_moving or right_wheel_moving

1 Like

You might try using the is_mg_moving flag. This flag indicates if the underlying trapezoidal motion generator is commanding motion to the PID controller. When the trapezoidal motion generators arrives at its setpoint this flag will be zero.

This flag can be preferable as it isn’t dependent on the motor encoder which inherently has sensor noise. The PID controller tracks the commanded motion accurately so it serves as a reasonable approximation of the true physical motion. The exception being that an exception to the motion (say, a disturbance causes the wheel to stall) will not be caught.

The following diagram helps illustrate:

Here’s a code snippet to test this use case:

#!/usr/bin/env python
import stretch_body.robot

def wait_until_not_moving(robot):
    import time
    # Note, this is necessary to give time for the trajectory to begin executing before polling for completion
    time.sleep(0.1)
    while robot.base.left_wheel.status['is_mg_moving'] or robot.base.right_wheel.status['is_mg_moving']:
        time.sleep(0.1)

robot = stretch_body.robot.Robot()
robot.startup()


robot.base.translate_by(.005)
robot.push_command()
wait_until_not_moving(robot)

robot.base.translate_by(-.005)
robot.push_command()
wait_until_not_moving(robot)

robot.stop()

This will cause the robot to move forward 5mm, then back 5mm.

1 Like

thanks so much, will give this a try!