Hi @rishi, welcome to the forum! Stretch has a popular sensor in its head, the Intel RealSense D435i, that would likely work best for interpreting hand or body gestures. The sensor returns both regular color images and depth images.
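To give you a feel for those two streams, here's a minimal sketch using Intel's pyrealsense2 library (`pip install pyrealsense2`); the resolutions and frame rate are just illustrative defaults, not Stretch-specific settings:

```python
import numpy as np
import pyrealsense2 as rs

# Configure color + depth streams (640x480 @ 30 fps as an example)
pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
pipeline.start(config)

try:
    frames = pipeline.wait_for_frames()
    color = np.asanyarray(frames.get_color_frame().get_data())  # HxWx3 uint8
    depth = np.asanyarray(frames.get_depth_frame().get_data())  # HxW uint16 (depth units)
    print(color.shape, depth.shape)
finally:
    pipeline.stop()
```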
Before your robot arrives, you might explore gesture design in terms of these images. For color images alone, you can get started with just a webcam. I've seen researchers use color image sequences as input to deep recurrent neural networks (RNNs) trained to identify gestures. For depth images, there are a number of depth cameras available: the Intel RealSense D435i (which we use), the Microsoft Kinect V1/V2, and many more. At Georgia Tech, I was part of a research project called CopyCat that interpreted American Sign Language using a Microsoft Kinect within a Unity game.
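Here's a rough, untrained sketch of that webcam-to-RNN idea in PyTorch: OpenCV grabs a short clip, each frame is downsampled and flattened into a feature vector, and an LSTM classifies the sequence. The layer sizes, 16-frame clip length, and 5-gesture output are made-up placeholders; a real system would use learned visual features and trained weights.

```python
import cv2
import torch
import torch.nn as nn

class GestureRNN(nn.Module):
    def __init__(self, feat_dim, hidden=128, n_gestures=5):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_gestures)

    def forward(self, x):            # x: (batch, time, feat_dim)
        _, (h, _) = self.lstm(x)
        return self.head(h[-1])      # logits over gesture classes

cap = cv2.VideoCapture(0)            # default webcam
frames = []
for _ in range(16):                  # a 16-frame clip
    ok, frame = cap.read()
    if not ok:
        break
    small = cv2.resize(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), (32, 32))
    frames.append(torch.tensor(small, dtype=torch.float32).flatten() / 255.0)
cap.release()

if frames:
    clip = torch.stack(frames).unsqueeze(0)      # (1, T, 1024)
    logits = GestureRNN(feat_dim=32 * 32)(clip)  # random weights: demo only
    print("predicted gesture id:", logits.argmax(dim=1).item())
```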
After your robot arrives, you can run the navigation/calibration packages within stretch_ros and visualize the results in RViz, a tool that comes with ROS. It's common to use C# wrappers to visualize/use the data in Unity if you need that. These packages in stretch_ros do require a robot to run. Also within stretch_ros, we include a few deep learning models for perception; one of them can identify body landmarks (e.g. elbow, wrist) in a depth image, which may be useful for body gestures.
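If you want to pull the depth stream into your own perception code, here's a small ROS 1 subscriber sketch. The topic name below is a common RealSense default and an assumption on my part; run `rostopic list` on your robot to see the actual topics published by stretch_ros.

```python
import rospy
from sensor_msgs.msg import Image
from cv_bridge import CvBridge

bridge = CvBridge()

def on_depth(msg):
    # 16-bit depth image; convert the ROS message to a NumPy array
    depth = bridge.imgmsg_to_cv2(msg, desired_encoding="passthrough")
    rospy.loginfo("depth frame %dx%d", depth.shape[1], depth.shape[0])

rospy.init_node("depth_listener")
# Topic name is an assumption; verify with `rostopic list`
rospy.Subscriber("/camera/depth/image_rect_raw", Image, on_depth)
rospy.spin()
```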
Finally, here are links to stretch_robot_home.py and update_uncalibrated_urdf.sh. Hope this answers your questions and gives you some helpful info. Let me know if you have any follow-up questions.