Using another microphone instead of Respeaker

Hello!

I was looking to use another microphone instead of Respeaker for the Voice Teleop demo and in general. We have the Rode Wireless Pro and were wondering if there is a way to use that instead? I’m pretty sure these mics do not support ROS/ROS2, so I was thinking of making recordings, using a speech-to-text model to transcribe them, and then feeding the text into the demo.

Thanks!

Hi @dsquid,

Good question! Some previous work has used the SpeechRecognition python package; see this forum post for more explanation of working with that package.

Since that forum post is a bit old, and any ROS code in the old post is ROS1, here’s an isolated tutorial for using a different USB microphone in ROS2; I hope that it helps with your integration!
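If you haven’t installed the package yet, it’s on PyPI. Note that `sr.Microphone` depends on PyAudio, which in turn builds against the system PortAudio library, so on Ubuntu you may need a couple of apt packages first (a sketch, assuming a stock Ubuntu install):

```shell
# PortAudio headers and prebuilt PyAudio (needed for sr.Microphone)
sudo apt-get install portaudio19-dev python3-pyaudio

# The SpeechRecognition package itself
pip install SpeechRecognition pyaudio
```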

Set up USB Microphone

To test, I plugged my Samson Condenser Microphone into a USB port on the base of a Stretch RE3. You can change the input device in Ubuntu’s sound settings; you may have to adjust the gain to get suitable performance with your Rode mic.
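If the robot is headless, you can also sanity-check the mic from a terminal with ALSA’s tools. The card/device numbers below are placeholders; substitute whatever `arecord -l` reports on your robot:

```shell
# List capture devices to confirm the USB mic was detected
arecord -l

# Record a 3-second test clip from card 1, device 0 (placeholders)
arecord -D hw:1,0 -d 3 -f cd test.wav

# Play it back through the default output to check levels
aplay test.wav
```

If the mic only supports mono, `-f cd` (16-bit 44.1 kHz stereo) will fail; try `-f S16_LE -c 1 -r 48000` instead.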

ROS2 Node

I whipped up a ROS2 node that is a simplified adaptation of the “color listening” ROS1 node linked in the previous forum post (link to that code here). This ROS2 node uses your system’s default microphone and the SpeechRecognition package to transcribe a short audio clip.

Hopefully this code can serve as a useful reference, or as a starting point for building your application.

Example Usage

  1. Copy this node into your ROS2 project and add it to your setup.py file.

  2. In a terminal, run ros2 run your_package node_name.

  3. Say something into your mic, and the output should be something like:

[INFO] [1722017398.853921312] [transcriber_node]: Processing Audio...
[INFO] [1722017399.414619283] [transcriber_node]: recognized text: what's up squid
[INFO] [1722017399.418863376] [transcriber_node]: Transcriber done.
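For step 1, the setup.py entry looks something like the excerpt below; `your_package`, `transcriber`, and the module path are placeholder names to match whatever you call the package and script:

```python
# setup.py (excerpt) -- hypothetical package/module names
from setuptools import setup

setup(
    name="your_package",
    # ... other arguments (version, packages, data_files, etc.) ...
    entry_points={
        "console_scripts": [
            # "node_name = package.module:function"
            "transcriber = your_package.transcriber:main",
        ],
    },
)
```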

Node Code

import threading
from typing import Optional

import rclpy
from rclpy.executors import MultiThreadedExecutor
from rclpy.node import Node

# Speech Recognition
import speech_recognition as sr
from speech_recognition.audio import AudioData

class Transcriber(Node):
    def __init__(self):
        super().__init__("transcriber_node")

        # Initialize speech recognizer
        self.recognizer = sr.Recognizer()

    def _predict_text(self, audio_clip: AudioData) -> Optional[str]:
        """
        Predicts text contained in an audio snippet.

        Parameters
        ----------
        audio_clip : AudioData
            Audio data output from the SpeechRecognition recognizer object

        Returns
        -------
        Optional[str]
            English text contained in the audio data, or None if recognition fails
        """

        self.get_logger().info("Processing Audio...")
        try:
            return self.recognizer.recognize_google(audio_clip)  # swap in another recognizer here, e.g. recognize_sphinx for offline use
        except sr.UnknownValueError:
            self.get_logger().info("Speech recognizer could not understand audio")
            return None
        except sr.RequestError as e:
            self.get_logger().info("Speech recognition error; {0}".format(e))
            return None

    def start_recording(self, recording_length_s: float = 3.0) -> Optional[str]:
        """
        Triggers an audio recording and returns text contained in the recording.

        Parameters
        ----------
        recording_length_s : float
            Number of seconds to record for

        Returns
        -------
        Optional[str]
            English text contained in the recording, or None if recognition fails
        """

        with sr.Microphone() as source:
            audio_clip = self.recognizer.record(source, duration=recording_length_s)
            text_string = self._predict_text(audio_clip)
            self.get_logger().info("recognized text: {}".format(text_string))
            return text_string

    def run(self):
        """
        Main method for node.
        """

        # you could put a loop here
        self.start_recording()
        self.get_logger().info("Transcriber done.")

def main():
    rclpy.init()
    node = Transcriber()
    executor = MultiThreadedExecutor(num_threads=4)

    # Spin in the background since recording audio will block the main thread
    spin_thread = threading.Thread(
        target=rclpy.spin,
        args=(node,),
        kwargs={"executor": executor},
        daemon=True,
    )
    spin_thread.start()

    # Run node
    try:
        node.run()
    except KeyboardInterrupt:
        pass

    # Terminate this node
    node.destroy_node()
    rclpy.shutdown()

    # Join the background spin thread before exiting
    spin_thread.join()


if __name__ == '__main__':
    main()
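If you instead go the recording-based route mentioned in the question, SpeechRecognition can also read audio from a WAV file via sr.AudioFile rather than a live mic. Here’s a minimal sketch; the synthesized test tone and the `clip.wav` filename are made up for illustration, and in practice the WAV would be a clip exported from the Rode receiver:

```python
# Sketch: transcribing a pre-recorded clip instead of a live mic.
# "clip.wav" is a hypothetical filename.
import math
import struct
import wave

# For illustration, synthesize a 1-second 440 Hz test tone as clip.wav;
# in practice this would be a speech recording from the Rode.
sample_rate = 16000
with wave.open("clip.wav", "wb") as wav:
    wav.setnchannels(1)           # mono
    wav.setsampwidth(2)           # 16-bit samples
    wav.setframerate(sample_rate)
    for i in range(sample_rate):
        sample = int(20000 * math.sin(2 * math.pi * 440 * i / sample_rate))
        wav.writeframes(struct.pack("<h", sample))

try:
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile("clip.wav") as source:
        audio_clip = recognizer.record(source)  # read the whole file
    # recognize_google() needs network access, and a pure tone won't
    # transcribe, but a real speech clip would return a text string here:
    # text = recognizer.recognize_google(audio_clip)
except ImportError:
    pass  # SpeechRecognition not installed; the WAV above is still written
```

From there, the transcribed text could be published on a topic and fed into the Voice Teleop demo the same way the node above does.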


Hi @dsquid,

Adding on to @hello-lamsey’s great response, here are some additional pointers:

  1. We have investigated external mics on Stretch. We’ve gotten the best “room-level coverage” (e.g., voices audible from 10-15 ft away) with either this condenser mic mounted on the base or the head, or this USB-dongle mic mounted on the head.
  2. You have to correctly configure which microphone your system uses, as well as the microphone’s gain. If you connect your robot to a monitor, this can easily be done through Ubuntu’s System Settings, as @hello-lamsey pointed out. If you’d prefer a terminal-based solution, this configure_audio.sh script should do it for you: it unmutes the mic and speaker, sets the speaker to the one you specify (defaulting to the robot’s built-in speaker), and sets the mic to the one you specify (if only one external mic is plugged in, it picks that mic even without you specifying it).
  3. In the above applications, we used JavaScript to access the mic (this was for our web teleop application), so we didn’t need ROS2 to process the audio. If you are interested in the other direction though, we did create a ROS2 node for text-to-speech here, and a ROS2 CLI to easily send text-to-speech commands here.
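For a rough idea of what a terminal-based configuration like configure_audio.sh boils down to, the PulseAudio CLI equivalents look something like this (a sketch, assuming pactl; the source/sink names are placeholders, so list yours first):

```shell
# List available mics (sources) and speakers (sinks)
pactl list short sources
pactl list short sinks

# Unmute the default mic and speaker
pactl set-source-mute @DEFAULT_SOURCE@ 0
pactl set-sink-mute @DEFAULT_SINK@ 0

# Route audio: the names below are placeholders from the list commands above
pactl set-default-source alsa_input.usb-XXXX.analog-stereo
pactl set-default-sink alsa_output.pci-XXXX.analog-stereo

# Optionally adjust the mic gain
pactl set-source-volume @DEFAULT_SOURCE@ 80%
```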