Triggering Demos with Voice Commands

Hello!

I am trying to figure out how to control Stretch with voice commands. I figured a good starting place would be to trigger one of the demos (like grasp_object) by saying a certain word or phrase rather than using a keyboard input. I have already played around with the Stretch demos and have gotten the ReSpeaker speech-to-text functions working. I am new to Hello Robot and ROS, and I'm not sure I quite understand where to start in putting this all together. Has anyone tried, or had any success, communicating with Stretch through voice prompts yet?

Thanks,
Julia

Hello! I’ve had success using voice prompts on Stretch + ROS. Here is a link to a ROS node that records a snippet of audio using Stretch’s internal mic and counts how many unique colors were spoken in the snippet. For your application, you could create logic that executes different actions based on the strings detected in the recording. You could use the roslaunch API to launch a launch file from stretch_demos, but make sure you only run one Stretch driver at a time.

We used the SpeechRecognition Python package. The PyPI docs had everything we needed to set it up, such as this example.

See this launch file for how we included the node in our application.

Snippet:

import speech_recognition as sr

recording_length = 10.  # seconds

r = sr.Recognizer()
with sr.Microphone() as source:
    # record
    audio_clip = r.record(source, duration=recording_length)

    # recognize
    text_string = ""  # default so the check below is safe if recognition fails
    try:
        text_string = r.recognize_google(audio_clip)  # see PyPI docs for other options
    except sr.UnknownValueError:
        print("Speech recognizer could not understand audio")
    except sr.RequestError as e:
        print("Speech recognition error; {0}".format(e))

    # execute
    if "grasp" in text_string:
        roslaunch_grasp_demo()  # implement this elsewhere using the roslaunch package
    else:
        print("No command found in string")

Hi @jgangemi, welcome to the forum! It’s a great question, and I really like @lamsey’s reply. I would approach it pretty much the way that he has described.

I wanted to highlight a few other implementations I’ve seen from members of this forum; hopefully they will be useful for you.

  • @asanchez made a tutorial, available on our docs, called Voice Teleoperation of Base. Phrases like “forward” trigger motions of the mobile base, and there is a code explanation section that breaks down what each part of the code is doing.
  • @hello-garv created a node for the Stretch Web Interface, called speech_commands.py, which builds on Alan’s tutorial. It enables voice teleop for Stretch’s other joints and also triggers ROS services to replay saved poses. Her post has more details.
  • @FergusKidd posted about example code that uses Azure speech-to-text and LUIS (a cloud service that infers intent from phrases) to control a Stretch. Their project went even further, using text-to-speech and Q&A services so that Stretch could give answers back.

Hey @jgangemi,

To trigger the grasp object demo using speech, you could build on @lamsey’s response while using the keyboard_teleop node as a reference. That node illustrates how the ‘/grasp_object/trigger_grasp_object’ service can be triggered using rospy.ServiceProxy, like here. Adding to @lamsey’s response, here’s a snippet that should work:

import rospy
import speech_recognition as sr
from std_srvs.srv import Trigger, TriggerRequest

recording_length = 10.  # seconds

rospy.init_node('voice_grasp_trigger')

r = sr.Recognizer()

# connect to the grasp_object demo's trigger service
rospy.wait_for_service('/grasp_object/trigger_grasp_object')
rospy.loginfo('Node connected to /grasp_object/trigger_grasp_object.')
trigger_grasp_object_service = rospy.ServiceProxy('/grasp_object/trigger_grasp_object', Trigger)

with sr.Microphone() as source:
    # record
    audio_clip = r.record(source, duration=recording_length)

    # recognize
    text_string = ""  # default so the check below is safe if recognition fails
    try:
        text_string = r.recognize_google(audio_clip)  # see PyPI docs for other options
    except sr.UnknownValueError:
        print("Speech recognizer could not understand audio")
    except sr.RequestError as e:
        print("Speech recognition error; {0}".format(e))

    # execute
    if "grasp" in text_string:
        trigger_request = TriggerRequest()
        trigger_result = trigger_grasp_object_service(trigger_request)
        print('trigger_result = {0}'.format(trigger_result))
    else:
        print("No command found in string")

Once you wrap this in a ROS node and launch it alongside the grasp_object node, as in this launch file, you’d have a working grasp object demo that runs on voice commands.
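
If it helps, a minimal launch file for that might look something like the sketch below. The path to grasp_object.launch inside stretch_demos is assumed, and my_voice_pkg / voice_grasp_trigger.py are hypothetical names for wherever you save the snippet above, so adapt them to your setup:

<launch>
  <!-- the grasp_object demo (path assumed; check your stretch_demos install) -->
  <include file="$(find stretch_demos)/launch/grasp_object.launch" />

  <!-- the voice trigger node, assuming the snippet above is saved as
       voice_grasp_trigger.py in a (hypothetical) package named my_voice_pkg -->
  <node name="voice_grasp_trigger" pkg="my_voice_pkg"
        type="voice_grasp_trigger.py" output="screen" />
</launch>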

Let me know if you need further assistance with this.

Best,
Chintan


Thank you all so much!! These are all very helpful examples, and I will be trying them all out today. I will let you know how it goes 🙂 Thank you all again!!