Gemma 4 VLA Demo on Jetson Orin Nano Super
Talk to Gemma 4, and she’ll decide on her own if she needs to look through the webcam to answer you. All running locally on a Jetson Orin Nano Super.
You speak → Parakeet STT → Gemma 4 → [Webcam if needed] → Kokoro TTS → Speaker
Press SPACE to record, then SPACE again to stop. This is a simple VLA (vision-language-action) setup: the model decides on its own whether to act, based on the context of what you asked. There are no keyword triggers and no hardcoded logic. If your question needs Gemma to open her eyes, she’ll decide to take a photo, interpret it, and answer you with that context in mind. She’s not describing the picture; she’s answering your actual question using what she saw.
And honestly? It’s pretty impressive that this runs on a Jetson Orin Nano Super. 🙂
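To make the "she decides on her own" part concrete, here is a minimal sketch of that control flow in Python. It is not the demo's actual script: `ModelReply`, `answer`, `make_fake_model`, and `make_fake_camera` are hypothetical names made up for illustration, and the fake model is a stand-in for Gemma 4 so the snippet runs without any hardware. The only point is that the calling code never inspects the question; it simply honors the model's request for a photo when one comes back.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ModelReply:
    """One model turn (hypothetical structure, not the demo's real format)."""
    text: Optional[str] = None   # final answer to speak, if the model has one
    wants_photo: bool = False    # True when the model asks to use the webcam


ModelFn = Callable[[str, Optional[bytes]], ModelReply]


def answer(question: str, ask_model: ModelFn, take_photo: Callable[[], bytes]) -> str:
    """Ask the model; capture a frame only if the model itself requests it."""
    reply = ask_model(question, None)
    if reply.wants_photo:
        # The model chose to look, so send the same question again with the frame.
        reply = ask_model(question, take_photo())
    return reply.text or ""


def make_fake_model(needs_vision: bool) -> ModelFn:
    """Stand-in for Gemma 4. The real decision is made inside the model."""
    def fake_model(question: str, image: Optional[bytes]) -> ModelReply:
        if needs_vision and image is None:
            return ModelReply(wants_photo=True)
        source = "camera frame" if image is not None else "text only"
        return ModelReply(text=f"answer to {question!r} ({source})")
    return fake_model


def make_fake_camera(log: List[str]) -> Callable[[], bytes]:
    """Stand-in for the webcam; records each capture in `log`."""
    def take_photo() -> bytes:
        log.append("photo")
        return b"fake-jpeg-bytes"
    return take_photo


if __name__ == "__main__":
    shots: List[str] = []
    camera = make_fake_camera(shots)
    print(answer("What is the capital of France?", make_fake_model(False), camera))
    print(answer("What am I holding?", make_fake_model(True), camera))
    print(f"photos taken: {len(shots)}")
```

In a real pipeline, `ask_model` would wrap the call to Gemma 4 and `take_photo` would grab a webcam frame; both are left abstract here because how the script wires them up is not shown in this section.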
Get the code
The full script