Repeated RealSense sensor failures

Hi,

I’m constantly getting this error whenever I try to access the RealSense stream through ROS (for Hector SLAM):

ERROR: The device has been disconnected!
...
An exception has been thrown: xioctl(VIDIOC_S_FMT) failed, errno=5 Last Error: Input/output error
...
Hardware Notification:IR stream start failure, 1.67847e+12,Error,Hardware Error

It used to be a rare issue, but now it happens every time I start ROS.

I have a Stretch RE1 running ROS Noetic on Ubuntu 20.04. Happy to share more details as needed.

Note: I tried the realsense-viewer firmware update mentioned here, but the error still occurs after that update.

Hi @srama2512, thank you for letting us know. We’ve seen a few reports of the “The device has been disconnected!” error in the past 2 weeks, and @Mohamed_Fazil is actively investigating this issue. We don’t have a solution yet, but it would help to get some more details about the dropout issue on your robot. You could also try a potential solution that has worked in our testing:

  • Does the dropout issue occur when you’re not using Hector SLAM? E.g. just running roslaunch stretch_core d435i_high_resolution.launch.
  • How long until the dropout occurs? Is it after a few minutes or hours?
  • After the dropout, can you restart the program to work with the Realsense again?
  • What’s your robot’s serial number?

We suspect the dropout issue is related to the CPU load on Stretch’s onboard computer. In our early testing, raising the limit on memory the kernel allows for USB (usbfs) transfer buffers solved the issue on one robot, at the cost of some memory. If you’d like to try this, power cycle the robot, run the following command to raise the limit to 1024 MB, and then run your program as normal a few times.

sudo sh -c 'echo 1024 > /sys/module/usbcore/parameters/usbfs_memory_mb'
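The value written above resets on every reboot, so it can help to write it and read it back in one step before launching ROS. This is a minimal sketch: `set_usbfs_mb` is a hypothetical helper name, and its optional second argument exists only so it can be tried against an ordinary file instead of the real sysfs parameter.

```shell
#!/bin/sh
# Hypothetical helper: write a usbfs buffer limit (in MB) and confirm it took effect.
# $1 = limit in MB; $2 = parameter file (defaults to the real sysfs path).
set_usbfs_mb() {
  mb="$1"
  param="${2:-/sys/module/usbcore/parameters/usbfs_memory_mb}"
  # Writing to the real sysfs file requires root.
  echo "$mb" > "$param" || return 1
  # Read the value back; succeed only if it matches what was written.
  [ "$(cat "$param")" = "$mb" ]
}
```

Usage, assuming the function is saved as set_usbfs_mb.sh:

sudo sh -c '. ./set_usbfs_mb.sh && set_usbfs_mb 1024 && echo "usbfs limit set"'

To keep the value across reboots, a general Linux approach (not one tested on Stretch in this thread) is to add usbcore.usbfs_memory_mb=1024 to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and run sudo update-grub.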

@bshah - Thanks for your prompt response. Since this issue is not related to the other one, I’ll continue responding here.

  • Does the dropout issue occur when you’re not using Hector SLAM? E.g. just running roslaunch stretch_core d435i_high_resolution.launch.

I’ve noticed this issue happens only when I’m using Hector SLAM and, in parallel, launch another navigation script that gets data from Hector SLAM. It does not happen when I run Hector SLAM in isolation, but I do get a warning when I start Hector SLAM:

[INFO] [1678489656.722987]: For use with S T R E T C H (TM) RESEARCH EDITION from Hello Robot Inc.
[INFO] [1678489656.724066]: /stretch_driver started
[ WARN] [1678489657.127573464]: Hardware Notification:Depth stream start failure,1.67849e+12,Error,Hardware Error
[ INFO] [1678489657.574300573]: Finished waiting for tf, waited 2.007778 seconds
[INFO] [1678489658.572809]: mode = navigation
[INFO] [1678489658.574015]: /stretch_driver: Changed to mode = navigation

  • How long until dropout issue occurs? Is it after a few minutes/hours?

It happens immediately after I run the navigation script. Here it is for your reference.

  • After the dropout, can you restart the program to work with the Realsense again?

Yes. After I terminate the navigation program (on a server) and Hector SLAM (on Stretch), I am able to run realsense-viewer and see the depth map and RGB image.

  • What’s your robot’s serial number?

1097

I used htop to monitor CPU and memory usage when the error occurs. Neither was heavily used, and swap usage was close to 0.

@srama2512 Thank you for providing more information. Does the issue still persist after raising the kernel’s usbfs memory limit to 1024 MB?

@Mohamed_Fazil - I shut down Stretch, turned off the power switch, and restarted it. Afterwards, I set the USB (usbfs) memory limit to 1024 MB. I’m still facing the issue when I run the script.

Thanks. Since it only happens when you’re running Hector SLAM in parallel with that navigation script, I suspect it’s similar to the issue Mohamed has seen, which is related to higher CPU usage. The difference is that in your case it drops out immediately after the script starts.

I’d suggest that we jump on a support call to dig deeper into this issue. I think it’s likely that we can find either a solution or a workaround that lets you keep using the RealSense. Could you send us an email at support@hello-robot.com to schedule a call?

@srama2512 In the meantime, could you also try reinstalling the librealsense2 packages and then rebuilding the realsense-ros packages?

Remove all existing Librealsense2 packages:

dpkg -l | grep "realsense" | cut -d " " -f 3 | xargs sudo dpkg --purge
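Because grep "realsense" matches any line containing that word, it can be worth previewing exactly which packages will be purged. A minimal sketch, assuming the standard dpkg -l column layout (status, name, version, ...); list_realsense_pkgs is a hypothetical helper that keeps only packages in the installed (ii) or removed-but-configured (rc) states. Like the original grep, it will also match apt-installed ROS packages such as ros-noetic-realsense2-camera if present.

```shell
#!/bin/sh
# Hypothetical helper: read `dpkg -l` output on stdin and print the names of
# packages whose name contains "realsense" and whose status is ii or rc.
list_realsense_pkgs() {
  awk '$1 ~ /^(ii|rc)$/ && $2 ~ /realsense/ { print $2 }'
}
```

Usage, from a shell where the function has been sourced (preview first, then purge):

dpkg -l | list_realsense_pkgs
dpkg -l | list_realsense_pkgs | xargs -r sudo dpkg --purge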

Install Librealsense2 packages:

sudo apt install librealsense2 librealsense2-dkms librealsense2-udev-rules librealsense2-utils librealsense2-dev librealsense2-dbg 

Rebuild the realsense-ros v2.3.2 packages that already exist in ~/catkin_ws/src/realsense-ros:

cd ~/catkin_ws
rm -rf build/realsense-ros
catkin_make --pkg realsense2_camera --force-cmake
catkin_make --pkg realsense2_description --force-cmake

Let us know how it goes.

I’ve sent an email. Thank you.

@Mohamed_Fazil - So close. Reinstalling the librealsense2 packages seemed to do the trick for about 10 minutes, but I’m seeing the same errors again.

@srama2512
Another solution you can try is downgrading the librealsense SDK to v2.50.0 and the D435i firmware to v13.00.50. A RealSense forum post suggested that these specific versions might provide increased stability with the ROS1 version of realsense-ros.

I created shell scripts that you can execute to perform these version specific downgrades.

Downgrade librealsense2 to v2.50.0 and rebuild the realsense-ros packages using librealsense_2_50_install.sh:

wget https://gist.githubusercontent.com/hello-fazil/196201e2cede9e4240bc4f5d4e65a088/raw/533bab5974548f662217c50b81b27ac19e1c3f69/librealsense_2_50_install.sh
sh librealsense_2_50_install.sh

Downgrade the D435i firmware to v13.00.50 using rs_fw_13050.sh:

wget https://gist.githubusercontent.com/hello-fazil/fb8a4063203d50214ff4e7556560fa21/raw/4e135235f952c7acb127299662dfe3050f762502/rs_fw_13050.sh
sh rs_fw_13050.sh
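After the downgrade script finishes, one way to confirm the SDK version actually changed is to compare the installed package version with the target. A minimal sketch: dpkg-query -W -f='${Version}' is a standard dpkg query, while version_is is a hypothetical helper that treats 2.50.0 as matching Debian versions such as 2.50.0 or 2.50.0-<revision>. It assumes the SDK package is named librealsense2, as in the install command earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: succeed if Debian version $1 is upstream version $2,
# with or without a "-<revision>" suffix (e.g. 2.50.0-0~realsense0.1234).
version_is() {
  case "$1" in
    "$2" | "$2"-*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage, from a shell where the function has been sourced:

installed=$(dpkg-query -W -f='${Version}' librealsense2)
version_is "$installed" 2.50.0 && echo "librealsense2 is 2.50.0" || echo "found $installed"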

Let us know if you run into any difficulties.
If this does not solve the issue, you can revert to the latest librealsense2 packages by re-running the commands in the previous reply, and then update the D435i firmware to the latest version using realsense-viewer.

Thanks @Mohamed_Fazil ! I’m trying this now. Will keep you posted.

I’m still getting the same errors. Whenever I make one of the suggested changes, things look fine for 10 to 15 minutes, but then the same errors start popping up again. Could this be related to overheating? In case it helps, here are some additional logs from when the error happens.

[ WARN] [1678518557.564925826]: Param '/camera/rgb_camera/power_line_frequency' has value 3 that is not in the enum { {50Hz: 1} {60Hz: 2} {Disabled: 0} }. Removing this parameter from dynamic reconfigure options.
[ INFO] [1678518557.572376629]: Done Setting Dynamic reconfig parameters.
[ INFO] [1678518557.572857186]: depth stream is enabled - width: 640, height: 480, fps: 30, Format: Z16
[ INFO] [1678518557.573068861]: infra1 stream is enabled - width: 640, height: 480, fps: 30, Format: Y8
[ INFO] [1678518557.573290959]: infra2 stream is enabled - width: 640, height: 480, fps: 30, Format: Y8
[ INFO] [1678518557.573880019]: color stream is enabled - width: 640, height: 480, fps: 30, Format: RGB8
[ INFO] [1678518557.576516572]: gyro stream is enabled - fps: 400
[ WARN] [1678518557.576572706]: No mathcing profile found for accel with fps=250
[ WARN] [1678518557.576595802]: Using default profile instead.
[ INFO] [1678518557.576622030]: accel stream is enabled - fps: 100
[ INFO] [1678518557.576648613]: setupPublishers...
[ INFO] [1678518557.578685385]: Expected frequency for depth = 30.00000
[ INFO] [1678518557.611326235]: Expected frequency for infra1 = 30.00000
[ INFO] [1678518557.635472302]: Expected frequency for infra2 = 30.00000
[ INFO] [1678518557.659097598]: Expected frequency for color = 30.00000
[ INFO] [1678518557.679221807]: Expected frequency for aligned_depth_to_color = 30.00000
[ INFO] [1678518557.703143500]: setupStreams...
[ERROR] [1678518557.809640330]: An exception has been thrown: xioctl(VIDIOC_S_FMT) failed Last Error: Input/output error
[ERROR] [1678518557.809725945]: Exception: xioctl(VIDIOC_S_FMT) failed Last Error: Input/output error
Warning: Rate of calls to Pimu:trigger_motor_sync rate of 69.161580 above maximum frequency of 20.00 Hz. Motor commands dropped: 289
[ WARN] [1678518559.573465601]: Hardware Notification:Depth stream start failure,1.67852e+12,Error,Hardware Error
[ WARN] [1678518561.574355841]: Hardware Notification:IR stream start failure,1.67852e+12,Error,Hardware Error
[ WARN] [1678518563.575310247]: Hardware Notification:Depth stream start failure,1.67852e+12,Error,Hardware Error
Warning: Rate of calls to Pimu:trigger_motor_sync rate of 20.956850 above maximum frequency of 20.00 Hz. Motor commands dropped: 323
[ WARN] [1678518564.575810974]: Hardware Notification:IR stream start failure,1.67852e+12,Error,Hardware Error
Warning: Rate of calls to Pimu:trigger_motor_sync rate of 28.174273 above maximum frequency of 20.00 Hz. Motor commands dropped: 363
Warning: Rate of calls to Pimu:trigger_motor_sync rate of 24.335262 above maximum frequency of 20.00 Hz. Motor commands dropped: 389
[INFO] [1678518574.666178]: /stretch_driver joint_traj action: New trajectory received with joint_names = ['joint_lift', 'joint_arm_l0', 'joint_arm_l1', 'joint_arm_l2', 'joint_arm_l3', 'joint_gripper_finger_left', 'joint_wrist_roll', 'joint_wrist_pitch', 'joint_wrist_yaw', 'joint_head_pan', 'joint_head_tilt']
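On the overheating question, the onboard computer’s own temperature sensors can be read from sysfs while the error is being reproduced. A minimal sketch, assuming the kernel exposes /sys/class/thermal/thermal_zone*/type and temp (in millidegrees Celsius); this covers the NUC only, not the camera. print_thermal_zones is a hypothetical helper, and its optional directory argument exists only for testing.

```shell
#!/bin/sh
# Hypothetical helper: print each thermal zone's type and temperature in C.
# Reads <dir>/thermal_zone*/type and temp; temp is in millidegrees Celsius.
print_thermal_zones() {
  base="${1:-/sys/class/thermal}"
  for zone in "$base"/thermal_zone*; do
    [ -r "$zone/temp" ] || continue
    kind=$(cat "$zone/type" 2>/dev/null || echo unknown)
    # Some zones fail on read; skip them instead of aborting.
    milli=$(cat "$zone/temp" 2>/dev/null) || continue
    # Integer division keeps this POSIX-sh compatible (no floating point).
    echo "$kind: $((milli / 1000)) C"
  done
}
```

Running print_thermal_zones every few seconds (for example in a loop with sleep 5) while the navigation script runs would show whether temperatures climb before the errors start.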

@srama2512 Thank you for trying out the debugging steps and sharing the output. It looks like xioctl(VIDIOC_S_FMT) is a call the RealSense node makes to the video (V4L2) driver, and it is failing with an Input/output error, which could also indicate a hardware issue. I’m not sure whether thermals are a factor, but to rule that out you could try running with the camera shell removed. Looking forward to the findings from your support call with @bshah.
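To check whether the camera is physically dropping off the USB bus (which would point toward hardware or cabling rather than the ROS stack), one option is to look for kernel USB disconnect and reset messages around the time of the error. A minimal sketch: it assumes the usual kernel wording, such as "usb 2-2: USB disconnect, device number 4" and "reset SuperSpeed USB device number 5 using xhci_hcd"; exact messages vary by kernel version, and count_usb_events is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print how many
# USB disconnect and USB reset messages it contains.
count_usb_events() {
  awk '
    /USB disconnect/     { disconnects++ }
    /reset .*USB device/ { resets++ }
    END { printf "disconnects=%d resets=%d\n", disconnects, resets }
  '
}
```

Usage, from a shell where the function has been sourced (or watch the log live with sudo dmesg -w):

sudo dmesg | count_usb_events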

@Mohamed_Fazil Thanks for your response. My NUC has stopped working now. I’m not sure what the issue is. I’ll respond on this thread once this issue is resolved.