
Jarvis AI Voice Assistant Project

1. Project Objective

Build a fully local AI voice assistant that can:

  • Listen through a microphone
  • Convert speech to text using Whisper
  • Process requests using a local LLM (Gemma4 via Ollama)
  • Respond using text-to-speech
  • Run continuously in a loop

Goal: create a privacy-focused voice assistant that runs entirely on local hardware, without relying on cloud AI services.

Stage | Component | Function
Input | Microphone | Captures user voice input in real time
Speech-to-Text | Faster-Whisper | Converts spoken audio into text
AI Processing | Ollama + Gemma4 | Generates intelligent responses from user input
Text-to-Speech | pyttsx3 | Converts AI response into spoken audio
Output | User | Receives spoken response from Jarvis
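
The stages in the table above can be sketched as one turn of the assistant loop. This is only an outline under stated assumptions: `run_turn` and the four callables passed to it are hypothetical names chosen for illustration, and the lambdas stand in for the real microphone, Whisper, Ollama, and pyttsx3 calls.

```python
from typing import Callable, Optional


def run_turn(
    record: Callable[[], bytes],
    transcribe: Callable[[bytes], str],
    ask_llm: Callable[[str], str],
    speak: Callable[[str], None],
) -> Optional[str]:
    """Run one listen -> transcribe -> think -> speak cycle.

    Returns the reply that was spoken, or None if no speech was recognised.
    """
    audio = record()                   # Input: capture microphone audio
    text = transcribe(audio).strip()   # Speech-to-Text
    if not text:
        return None                    # nothing recognised, skip this turn
    reply = ask_llm(text)              # AI Processing
    speak(reply)                       # Text-to-Speech / Output
    return reply


# Stand-in stages (hypothetical) so the flow can be tried without hardware.
spoken = []
reply = run_turn(
    record=lambda: b"fake-audio",
    transcribe=lambda audio: " Hello Jarvis ",
    ask_llm=lambda text: f"You said: {text}",
    speak=spoken.append,
)
print(reply)
```

Passing the stages in as functions keeps the control flow separate from the hardware, so each stage can be validated on its own before being wired together.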

2. Environment Setup

Create the project directory:

mkdir ~/jarvis
cd ~/jarvis

Create the Python virtual environment:

python3 -m venv venv
source venv/bin/activate

Verify virtual environment:

which python3
which pip

Both commands should print a path under:

~/jarvis/venv/bin/

Step | Command | Purpose
Create Project Directory | mkdir ~/jarvis | Creates the main project folder
Move into Directory | cd ~/jarvis | Navigates into the project folder
Create Virtual Environment | python3 -m venv venv | Creates an isolated Python environment
Activate Environment | source venv/bin/activate | Activates the virtual environment
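
The same check can be made from inside Python. The sketch below assumes the environment was created with `python3 -m venv`, as above; `in_virtualenv` is a hypothetical helper name.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # In a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtualenv())
```

If the second line prints False, the environment has not been activated in the current shell.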

3. Python Dependencies

Install required packages:

pip install faster-whisper
pip install ollama
pip install pyttsx3
pip install sounddevice
pip install scipy

Or install all together:

pip install faster-whisper ollama pyttsx3 sounddevice scipy

Step | Command | Purpose
Install Faster-Whisper | pip install faster-whisper | Speech-to-text (Whisper model)
Install Ollama Python client | pip install ollama | Connects Python to the local LLM (Gemma4); the Ollama application itself is installed separately
Install pyttsx3 | pip install pyttsx3 | Text-to-Speech engine (offline voice output)
Install sounddevice | pip install sounddevice | Captures audio from microphone
Install SciPy | pip install scipy | Saves audio as WAV files
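
A quick way to confirm the installs landed in the active environment is to ask Python whether each module can be found. This is a small sketch; `REQUIRED` and `missing_packages` are hypothetical names. Note that the pip package `faster-whisper` is imported as `faster_whisper`.

```python
from importlib.util import find_spec

# pip package name -> name used in `import` statements.
REQUIRED = {
    "faster-whisper": "faster_whisper",
    "ollama": "ollama",
    "pyttsx3": "pyttsx3",
    "sounddevice": "sounddevice",
    "scipy": "scipy",
}


def missing_packages(required: dict) -> list:
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if find_spec(module) is None
    ]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing, install with: pip install " + " ".join(missing))
else:
    print("All dependencies are importable.")
```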

4. Ollama Setup

Verify Ollama installation:

ollama --version

List available models:

ollama list

Installed models:

gemma4:latest
gemma4:e4b
gemma3:12b
llama3.1:8b

Test Gemma4 manually:

ollama run gemma4:latest

Example:

>>> What is cybersecurity?

The model successfully generated responses.
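
The installed models can also be checked from Python before the assistant starts. The sketch below assumes the Ollama server is running at its default local address (`http://localhost:11434`) and uses its `/api/tags` endpoint, which lists local models; `model_names` and `fetch_installed_models` are hypothetical helper names.

```python
import json
from urllib.request import urlopen

OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"  # assumed default address


def model_names(payload: dict) -> list:
    """Extract model names from an /api/tags response body."""
    return [entry["name"] for entry in payload.get("models", [])]


def fetch_installed_models(url: str = OLLAMA_TAGS_URL) -> list:
    """Ask the running Ollama server which models are installed."""
    with urlopen(url, timeout=5) as resp:
        return model_names(json.load(resp))


# Parsing a sample payload shaped like the server's response:
sample = {"models": [{"name": "gemma4:latest"}, {"name": "llama3.1:8b"}]}
print(model_names(sample))

# Against the real server (requires Ollama to be running):
#   names = fetch_installed_models()
#   print("gemma4:latest" in names)
```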

5. Faster-Whisper Validation

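from faster_whisper import WhisperModel

# Load the small "base" model, quantized to int8 for CPU use.
model = WhisperModel(
    "base",
    device="cpu",
    compute_type="int8"
)

# vad_filter skips non-speech audio before transcription.
segments, info = model.transcribe(
    "test.wav",
    vad_filter=True,
    beam_size=5
)

print("Language:", info.language)

text = ""

# segments is a generator: transcription runs as it is iterated.
for segment in segments:
    print(segment.text)
    text += segment.text

print("\nFULL:", text)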

 

Step | Code | Purpose
Import model | from faster_whisper import WhisperModel | Loads the Faster-Whisper library
Initialize model | WhisperModel("base", device="cpu", compute_type="int8") | Loads lightweight Whisper model optimized for CPU
Transcribe audio | model.transcribe("test.wav", vad_filter=True, beam_size=5) | Converts speech in audio file to text
Detect language | print("Language:", info.language) | Displays detected language
Process segments | for segment in segments: | Iterates through transcription output
Combine text | text += segment.text | Builds full sentence from segments
Final output | print("\nFULL:", text) | Displays complete transcription
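
Each segment also carries `start` and `end` times in seconds, which helps when checking where the transcription drifts. The sketch below formats them; `format_segments` and `FakeSegment` are hypothetical names, and the sample data is invented so the snippet runs without a model or audio file.

```python
from collections import namedtuple


def format_segments(segments) -> str:
    """Render segments as '[start -> end] text' lines."""
    lines = []
    for seg in segments:
        lines.append(f"[{seg.start:.2f}s -> {seg.end:.2f}s] {seg.text.strip()}")
    return "\n".join(lines)


# Stand-in for faster-whisper segments (invented sample data).
FakeSegment = namedtuple("FakeSegment", "start end text")
sample = [
    FakeSegment(0.0, 1.8, " Hello Jarvis."),
    FakeSegment(1.8, 4.25, " What is cybersecurity?"),
]
print(format_segments(sample))
```

With a real model, the `segments` returned by `model.transcribe(...)` can be passed in the same way.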

6. Text-to-Speech Validation

import pyttsx3

engine = pyttsx3.init()
engine.say("Jarvis is online and working")
engine.runAndWait()

Syntax | Description
import pyttsx3 | Imports the offline Text-to-Speech (TTS) library.
pyttsx3.init() | Initializes the speech engine and creates a TTS object.
engine.say() | Queues the specified text to be spoken.
engine.runAndWait() | Processes the speech queue and plays the audio through the speakers.
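
The default voice can sound rushed. pyttsx3 exposes `rate` (words per minute) and `volume` (0.0 to 1.0) through `setProperty`. The sketch below wraps both in a hypothetical `configure_voice` helper; the values are arbitrary examples, and `FakeEngine` is a stand-in so the helper can be tried without an audio backend.

```python
def configure_voice(engine, rate: int = 170, volume: float = 0.9) -> None:
    """Apply speaking rate (words per minute) and volume (0.0 to 1.0)."""
    volume = max(0.0, min(1.0, volume))  # clamp to the valid range
    engine.setProperty("rate", rate)
    engine.setProperty("volume", volume)


class FakeEngine:
    """Stand-in that records properties, so no speakers are needed."""

    def __init__(self):
        self.props = {}

    def setProperty(self, name, value):
        self.props[name] = value


fake = FakeEngine()
configure_voice(fake, rate=150, volume=1.5)
print(fake.props)

# With the real engine (requires pyttsx3 and an audio backend):
#   import pyttsx3
#   engine = pyttsx3.init()
#   configure_voice(engine, rate=150)
#   engine.say("Jarvis is online and working")
#   engine.runAndWait()
```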

7. Initial AI Integration Test

Created file:

nano jarvis_test.py

Contents:

import ollama
import pyttsx3

engine = pyttsx3.init()

user_input = "What is cybersecurity?"

response = ollama.chat(
    model="gemma4:latest",
    messages=[
        {"role": "user", "content": user_input}
    ]
)

reply = response["message"]["content"]

print("Jarvis:", reply)

engine.say(reply)
engine.runAndWait()

Execute:

python3 jarvis_test.py

 

Code Section | Purpose
import ollama | Imports the Ollama library to communicate with the local Gemma4 AI model.
import pyttsx3 | Imports the Text-to-Speech library for offline voice output.
engine = pyttsx3.init() | Initializes the Text-to-Speech engine.
user_input = "What is cybersecurity?" | Defines the user prompt to be sent to the AI model.
ollama.chat(...) | Sends the prompt to Gemma4 and receives an AI-generated response.
reply = response["message"]["content"] | Extracts the response text from the Ollama output.
print("Jarvis:", reply) | Displays the AI response in the terminal.
engine.say(reply) | Sends the response text to the Text-to-Speech engine.
engine.runAndWait() | Processes and plays the spoken response through the speakers.
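
The request and reply handling can be pulled into one function so the message format is written once. This is a sketch, not part of the original script: `ask` is a hypothetical helper that receives the chat function as an argument, which lets it run here against a stand-in instead of a live model.

```python
def ask(chat_fn, prompt: str, model: str = "gemma4:latest", system: str = "") -> str:
    """Send one prompt through chat_fn and return the reply text."""
    messages = []
    if system:
        # An optional system message steers tone and length.
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": prompt})
    response = chat_fn(model=model, messages=messages)
    return response["message"]["content"].strip()


def fake_chat(model, messages):
    """Stand-in that mimics the shape of an Ollama chat response."""
    last = messages[-1]["content"]
    return {"message": {"role": "assistant", "content": f" Echo: {last} "}}


print(ask(fake_chat, "What is cybersecurity?"))

# With the real client (requires the Ollama server and model):
#   import ollama
#   print(ask(ollama.chat, "What is cybersecurity?"))
```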

8. Real-Time Jarvis Assistant

Created:

nano jarvis_realtime.py

import sounddevice as sd
from scipy.io.wavfile import write
from faster_whisper import WhisperModel
import ollama
import pyttsx3
import time

samplerate = 16000  # 16 kHz sample rate
duration = 5        # seconds recorded per turn

model = WhisperModel(
    "base",
    device="cpu",
    compute_type="int8"
)

engine = pyttsx3.init()

def record_audio():
    print("🎤 Listening...")

    recording = sd.rec(
        int(duration * samplerate),
        samplerate=samplerate,
        channels=1,
        dtype="int16"
    )

    sd.wait()  # block until the recording is finished

    return recording

try:
    while True:

        audio = record_audio()

        # Save the recording so Whisper can read it from disk.
        write(
            "temp.wav",
            samplerate,
            audio
        )

        segments, info = model.transcribe(
            "temp.wav"
        )

        text = " ".join(
            [seg.text for seg in segments]
        ).strip()

        # Nothing recognised: go straight back to listening.
        if not text:
            continue

        print("\n🧑 You said:", text)

        response = ollama.chat(
            model="gemma4:latest",
            messages=[
                {
                    "role": "system",
                    "content": (
                        "You are Jarvis, a concise AI assistant. "
                        "Always reply in 1 or 2 short sentences. "
                        "Avoid long explanations."
                    )
                },
                {
                    "role": "user",
                    "content": text
                }
            ]
        )

        reply = response["message"]["content"]

        print("🤖 Jarvis:", reply)

        engine.say(reply)
        engine.runAndWait()

        time.sleep(0.5)

except KeyboardInterrupt:
    print("\n👋 Jarvis stopped.")

Code Section | Description
import sounddevice as sd | Imports the library used to capture audio from the microphone.
from scipy.io.wavfile import write | Imports the function used to save recorded audio as a WAV file.
from faster_whisper import WhisperModel | Imports the Faster-Whisper speech-to-text model.
import ollama | Imports the Ollama library to communicate with the local Gemma4 model.
import pyttsx3 | Imports the Text-to-Speech library for voice responses.
import time | Imports time-related functions for controlling execution delays.
samplerate = 16000 | Sets the audio sample rate to 16 kHz, optimized for Whisper.
duration = 5 | Sets the recording duration to 5 seconds per interaction.
WhisperModel(...) | Loads the Whisper model using CPU and INT8 optimization.
pyttsx3.init() | Initializes the Text-to-Speech engine.
record_audio() | Function that records audio from the microphone.
sd.rec() | Captures audio from the microphone and stores it in memory.
sd.wait() | Waits until recording is complete.
write("temp.wav", ...) | Saves the recorded audio to a temporary WAV file.
model.transcribe("temp.wav") | Converts speech from the audio file into text.
" ".join([seg.text for seg in segments]) | Combines all transcribed segments into a single text string.
if not text: continue | Skips processing if no speech was detected.
print("You said:", text) | Displays the user's spoken input in the terminal.
ollama.chat(...) | Sends the transcribed text to Gemma4 and requests a response.
"system" message | Instructs Jarvis to provide short and concise responses.
reply = response["message"]["content"] | Extracts the AI-generated response from Ollama.
print("Jarvis:", reply) | Displays Jarvis's response in the terminal.
engine.say(reply) | Sends the response text to the Text-to-Speech engine.
engine.runAndWait() | Speaks the response through the speakers.
time.sleep(0.5) | Adds a short delay before the next listening cycle.
while True: | Creates a continuous conversation loop.
KeyboardInterrupt | Allows the user to stop Jarvis gracefully using Ctrl + C.
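
One possible refinement is to skip the temporary WAV file. As far as I understand the library, `model.transcribe` also accepts a NumPy waveform (mono, float32, 16 kHz) in place of a file path, so the int16 recording only needs rescaling. The sketch below shows that conversion plus a rough silence check; `to_whisper_waveform`, `is_silent`, and the 0.01 threshold are hypothetical choices, not part of the script above.

```python
import numpy as np


def to_whisper_waveform(recording: np.ndarray) -> np.ndarray:
    """Convert an int16 (frames, 1) recording to mono float32 in [-1, 1)."""
    mono = recording.reshape(-1)              # drop the channel axis
    return mono.astype(np.float32) / 32768.0  # scale int16 range to floats


def is_silent(waveform: np.ndarray, threshold: float = 0.01) -> bool:
    """Rough silence check using the RMS level; the threshold is a guess."""
    if waveform.size == 0:
        return True
    rms = float(np.sqrt(np.mean(np.square(waveform))))
    return rms < threshold


# Invented three-sample recording, shaped like sd.rec(..., channels=1).
fake = np.array([[0], [16384], [-32768]], dtype=np.int16)
wave = to_whisper_waveform(fake)
print(wave, is_silent(wave))

# In the loop, this would replace the write(...) step:
#   audio = record_audio()
#   segments, info = model.transcribe(to_whisper_waveform(audio))
```

Checking `is_silent` before transcribing would avoid running Whisper and the LLM on an empty recording, at the cost of tuning the threshold for the microphone in use.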

9. Running Jarvis

Activate environment:

cd ~/jarvis
source venv/bin/activate

Start assistant:

python3 jarvis_realtime.py

Example output:

🎤 Listening...

🧑 You said: Hello Jarvis

🤖 Jarvis: Hello! How can I help you today?

10. Final Status Checklist

✅ Python Virtual Environment: Complete
✅ Faster-Whisper Installation: Complete
✅ Whisper Model Download: Complete
✅ Speech-to-Text (STT) Functionality: Complete
✅ Ollama Installation: Complete
✅ Gemma4 Model Setup: Complete
✅ Text-to-Speech (TTS) Functionality: Complete
✅ AI Response Generation: Complete
✅ Real-Time Voice Assistant Integration: Complete

Conclusion

This project successfully built a fully local AI voice assistant capable of listening, understanding speech, processing requests with a local LLM, and responding through voice output. Integrating Faster-Whisper, Ollama (Gemma4), and pyttsx3 produced a complete end-to-end conversational system.

While Version 1 is fully functional, there is still plenty of room for improvement and new features to explore. This serves as a solid foundation for future enhancements and experimentation.

Hi, I'm Ron