{"id":2449,"date":"2026-06-12T10:48:24","date_gmt":"2026-06-12T10:48:24","guid":{"rendered":"https:\/\/hackmybox.com\/?p=2449"},"modified":"2026-06-12T15:47:03","modified_gmt":"2026-06-12T15:47:03","slug":"jarvis-ai-voice-assistant-project-documentation","status":"publish","type":"post","link":"https:\/\/hackmybox.com\/index.php\/2026\/06\/12\/jarvis-ai-voice-assistant-project-documentation\/","title":{"rendered":"Jarvis AI Voice Assistant Project"},"content":{"rendered":"<div class=\"vce-row-container\" data-vce-boxed-width=\"true\"><div class=\"vce-row vce-row--col-gap-30 vce-row-equal-height vce-row-content--top\" id=\"el-8ff235fd\" data-vce-do-apply=\"all el-8ff235fd\"><div class=\"vce-row-content\" data-vce-element-content=\"true\"><div class=\"vce-col vce-col--md-auto vce-col--xs-1 vce-col--xs-last vce-col--xs-first vce-col--sm-last vce-col--sm-first vce-col--md-last vce-col--lg-last vce-col--xl-last vce-col--md-first vce-col--lg-first vce-col--xl-first\" id=\"el-14b75cca\"><div class=\"vce-col-inner\" data-vce-do-apply=\"border margin background  el-14b75cca\"><div class=\"vce-col-content\" data-vce-element-content=\"true\" data-vce-do-apply=\"padding el-14b75cca\"><div class=\"vce-text-block\"><div class=\"vce-text-block-wrapper vce\" id=\"el-7985b4fa\" data-vce-do-apply=\"all el-7985b4fa\"><p><span style=\"font-size: 14pt;\"><strong>1. 
Project Objective<\/strong><\/span><\/p><p>Build a fully local AI voice assistant that can:<\/p><ul><li>Listen through a microphone<\/li><li>Convert speech to text using Whisper (via Faster-Whisper)<\/li><li>Process requests using a local LLM (Gemma4 via Ollama)<\/li><li>Respond using text-to-speech<\/li><li>Run continuously in a loop<\/li><\/ul><p>Goal: create a privacy-focused voice assistant that runs entirely on local hardware, without relying on cloud AI services.<\/p><\/div><\/div><div class=\"vce-text-block\"><div class=\"vce-text-block-wrapper vce\" id=\"el-17862895\" data-vce-do-apply=\"all el-17862895\"><table><thead><tr><th>Stage<\/th><th>Component<\/th><th>Function<\/th><\/tr><\/thead><tbody><tr><td>Input<\/td><td>Microphone<\/td><td>Captures user voice input in real time<\/td><\/tr><tr><td>Speech-to-Text<\/td><td>Faster-Whisper<\/td><td>Converts spoken audio into text<\/td><\/tr><tr><td>AI Processing<\/td><td>Ollama + Gemma4<\/td><td>Generates responses from the transcribed user input<\/td><\/tr><tr><td>Text-to-Speech<\/td><td>pyttsx3<\/td><td>Converts the AI response into spoken audio<\/td><\/tr><tr><td>Output<\/td><td>User<\/td><td>Receives the spoken response from Jarvis<\/td><\/tr><\/tbody><\/table><\/div><\/div><div class=\"vce-text-block\"><div class=\"vce-text-block-wrapper vce\" id=\"el-7d84374f\" data-vce-do-apply=\"all el-7d84374f\"><\/div><\/div><div class=\"vce-text-block\"><div class=\"vce-text-block-wrapper vce\" id=\"el-fb1a0fad\" data-vce-do-apply=\"all el-fb1a0fad\"><p><span style=\"font-size: 16pt;\"><strong>2. 
Environment Setup<\/strong><\/span><\/p><p>Create the project directory:<\/p><p>mkdir ~\/jarvis<br>cd ~\/jarvis<\/p><p>Create and activate a Python virtual environment:<\/p><p>python3 -m venv venv<br>source venv\/bin\/activate<\/p><p>Verify that the virtual environment is active:<\/p><p>which python3<br>which pip<\/p><p>Both paths should point into:<\/p><p>~\/jarvis\/venv\/bin\/<\/p><\/div><\/div><div class=\"vce-text-block\"><div class=\"vce-text-block-wrapper vce\" id=\"el-b4e4bdee\" data-vce-do-apply=\"all el-b4e4bdee\"><table><thead><tr><th>Step<\/th><th>Command<\/th><th>Purpose<\/th><\/tr><\/thead><tbody><tr><td>Create Project Directory<\/td><td><code>mkdir ~\/jarvis<\/code><\/td><td>Creates the main project folder<\/td><\/tr><tr><td>Move into Directory<\/td><td><code>cd ~\/jarvis<\/code><\/td><td>Navigates into the project folder<\/td><\/tr><tr><td>Create Virtual Environment<\/td><td><code>python3 -m venv venv<\/code><\/td><td>Creates an isolated Python environment<\/td><\/tr><tr><td>Activate Environment<\/td><td><code>source venv\/bin\/activate<\/code><\/td><td>Activates the virtual environment in the current shell<\/td><\/tr><\/tbody><\/table><\/div><\/div><div class=\"vce-text-block\"><div class=\"vce-text-block-wrapper vce\" id=\"el-fe0d77e5\" data-vce-do-apply=\"all el-fe0d77e5\"><p><span style=\"font-size: 16pt;\">3. 
Python Dependencies<\/span><\/p><p>Install the required packages:<\/p><p>pip install faster-whisper<br>pip install ollama<br>pip install pyttsx3<br>pip install sounddevice<br>pip install scipy<\/p><p>Or install them all with one command:<\/p><p>pip install faster-whisper ollama pyttsx3 sounddevice scipy<\/p><\/div><\/div><div class=\"vce-text-block\"><div class=\"vce-text-block-wrapper vce\" id=\"el-09cf32bc\" data-vce-do-apply=\"all el-09cf32bc\"><table><thead><tr><th>Step<\/th><th>Command<\/th><th>Purpose<\/th><\/tr><\/thead><tbody><tr><td>Install Faster-Whisper<\/td><td><code>pip install faster-whisper<\/code><\/td><td>Speech-to-text (Whisper model)<\/td><\/tr><tr><td>Install Ollama Python client<\/td><td><code>pip install ollama<\/code><\/td><td>Python client that connects to the local Ollama server (Gemma4)<\/td><\/tr><tr><td>Install pyttsx3<\/td><td><code>pip install pyttsx3<\/code><\/td><td>Text-to-Speech engine (offline voice output)<\/td><\/tr><tr><td>Install sounddevice<\/td><td><code>pip install sounddevice<\/code><\/td><td>Captures audio from the microphone<\/td><\/tr><tr><td>Install SciPy<\/td><td><code>pip install scipy<\/code><\/td><td>Saves recorded audio as WAV files<\/td><\/tr><\/tbody><\/table><\/div><\/div><div class=\"vce-single-image-container vce-single-image--align-left\"><div class=\"vce vce-single-image-wrapper\" id=\"el-01db357d\" data-vce-do-apply=\"all el-01db357d\"><figure><div class=\"vce-single-image-figure-inner\" style=\"width: 1024px;\"><div class=\"vce-single-image-inner vce-single-image--absolute\" style=\"width: 100%; padding-bottom: 45.9961%;\"><img loading=\"lazy\" decoding=\"async\" class=\"vce-single-image\"  width=\"1024\" height=\"471\" srcset=\"https:\/\/hackmybox.com\/wp-content\/uploads\/2026\/06\/scipy-e1781185424732-1024x471.png 1024w, https:\/\/hackmybox.com\/wp-content\/uploads\/2026\/06\/scipy-e1781185424732-320x147.png 320w, https:\/\/hackmybox.com\/wp-content\/uploads\/2026\/06\/scipy-e1781185424732-480x221.png 480w, 
https:\/\/hackmybox.com\/wp-content\/uploads\/2026\/06\/scipy-e1781185424732-800x368.png 800w\" src=\"https:\/\/hackmybox.com\/wp-content\/uploads\/2026\/06\/scipy-e1781185424732-1024x471.png\" data-img-src=\"https:\/\/hackmybox.com\/wp-content\/uploads\/2026\/06\/scipy-e1781185424732.png\" data-attachment-id=\"2467\"  alt=\"\" title=\"scipy\" \/><\/div><\/div><figcaption hidden=\"\"><\/figcaption><\/figure><\/div><\/div><div class=\"vce-text-block\"><div class=\"vce-text-block-wrapper vce\" id=\"el-8338100a\" data-vce-do-apply=\"all el-8338100a\"><p><span style=\"font-size: 16pt;\"><strong>4. Ollama Setup<\/strong><\/span><\/p><p>Verify Ollama installation:<\/p><p>ollama --version<\/p><p>List available models:<\/p><p>ollama list<\/p><p>Installed models:<\/p><p>gemma4:latest<br>gemma4:e4b<br>gemma3:12b<br>llama3.1:8b<\/p><p>Test Gemma4 manually:<\/p><p>ollama run gemma4:latest<\/p><p>Example:<\/p><p>&gt;&gt;&gt; What is cybersecurity?<\/p><p>Model successfully generated responses.<\/p><\/div><\/div><div class=\"vce-single-image-container vce-single-image--align-left\"><div class=\"vce vce-single-image-wrapper\" id=\"el-d7fedabf\" data-vce-do-apply=\"all el-d7fedabf\"><figure><div class=\"vce-single-image-figure-inner\" style=\"width: 1024px;\"><div class=\"vce-single-image-inner vce-single-image--absolute\" style=\"width: 100%; padding-bottom: 45.8984%;\"><img loading=\"lazy\" decoding=\"async\" class=\"vce-single-image\"  width=\"1024\" height=\"470\" srcset=\"https:\/\/hackmybox.com\/wp-content\/uploads\/2026\/06\/Screenshot-from-2026-06-11-17-46-02-e1781185657348-1024x470.png 1024w, https:\/\/hackmybox.com\/wp-content\/uploads\/2026\/06\/Screenshot-from-2026-06-11-17-46-02-e1781185657348-320x147.png 320w, https:\/\/hackmybox.com\/wp-content\/uploads\/2026\/06\/Screenshot-from-2026-06-11-17-46-02-e1781185657348-480x220.png 480w, https:\/\/hackmybox.com\/wp-content\/uploads\/2026\/06\/Screenshot-from-2026-06-11-17-46-02-e1781185657348-800x367.png 800w\" 
src=\"https:\/\/hackmybox.com\/wp-content\/uploads\/2026\/06\/Screenshot-from-2026-06-11-17-46-02-e1781185657348-1024x470.png\" data-img-src=\"https:\/\/hackmybox.com\/wp-content\/uploads\/2026\/06\/Screenshot-from-2026-06-11-17-46-02-e1781185657348.png\" data-attachment-id=\"2469\"  alt=\"\" title=\"Screenshot from 2026-06-11 17-46-02\" \/><\/div><\/div><figcaption hidden=\"\"><\/figcaption><\/figure><\/div><\/div><div class=\"vce-single-image-container vce-single-image--align-left\"><div class=\"vce vce-single-image-wrapper\" id=\"el-bad0d3b2\" data-vce-do-apply=\"all el-bad0d3b2\"><figure><div class=\"vce-single-image-figure-inner\" style=\"width: 1024px;\"><div class=\"vce-single-image-inner vce-single-image--absolute\" style=\"width: 100%; padding-bottom: 56.4453%;\"><img loading=\"lazy\" decoding=\"async\" class=\"vce-single-image\"  width=\"1024\" height=\"578\" srcset=\"https:\/\/hackmybox.com\/wp-content\/uploads\/2026\/06\/Screenshot-from-2026-06-11-17-48-33-1024x579.png 1024w, https:\/\/hackmybox.com\/wp-content\/uploads\/2026\/06\/Screenshot-from-2026-06-11-17-48-33-320x181.png 320w, https:\/\/hackmybox.com\/wp-content\/uploads\/2026\/06\/Screenshot-from-2026-06-11-17-48-33-480x271.png 480w, https:\/\/hackmybox.com\/wp-content\/uploads\/2026\/06\/Screenshot-from-2026-06-11-17-48-33-800x452.png 800w\" src=\"https:\/\/hackmybox.com\/wp-content\/uploads\/2026\/06\/Screenshot-from-2026-06-11-17-48-33-1024x579.png\" data-img-src=\"https:\/\/hackmybox.com\/wp-content\/uploads\/2026\/06\/Screenshot-from-2026-06-11-17-48-33.png\" data-attachment-id=\"2470\"  alt=\"\" title=\"Screenshot from 2026-06-11 17-48-33\" \/><\/div><\/div><figcaption hidden=\"\"><\/figcaption><\/figure><\/div><\/div><div class=\"vce-text-block\"><div class=\"vce-text-block-wrapper vce\" id=\"el-a8c003bd\" data-vce-do-apply=\"all el-a8c003bd\"><p><strong><span style=\"font-size: 16pt;\">5. 
Faster-Whisper Validation<\/span><\/strong><\/p><p>from faster_whisper import WhisperModel<\/p><p>model = WhisperModel(<br>&nbsp;&nbsp;&nbsp;&nbsp;\"base\",<br>&nbsp;&nbsp;&nbsp;&nbsp;device=\"cpu\",<br>&nbsp;&nbsp;&nbsp;&nbsp;compute_type=\"int8\"<br>)<\/p><p>segments, info = model.transcribe(<br>&nbsp;&nbsp;&nbsp;&nbsp;\"test.wav\",<br>&nbsp;&nbsp;&nbsp;&nbsp;vad_filter=True,<br>&nbsp;&nbsp;&nbsp;&nbsp;beam_size=5<br>)<\/p><p>print(\"Language:\", info.language)<\/p><p>text = \"\"<\/p><p>for segment in segments:<br>&nbsp;&nbsp;&nbsp;&nbsp;print(segment.text)<br>&nbsp;&nbsp;&nbsp;&nbsp;text += segment.text<\/p><p>print(\"\\nFULL:\", text)<\/p><p>&nbsp;<\/p><\/div><\/div><div class=\"vce-single-image-container vce-single-image--align-left\"><div class=\"vce vce-single-image-wrapper\" id=\"el-2a6757a4\" data-vce-do-apply=\"all el-2a6757a4\"><figure><div class=\"vce-single-image-figure-inner\" style=\"width: 1024px;\"><div class=\"vce-single-image-inner vce-single-image--absolute\" style=\"width: 100%; padding-bottom: 53.7109%;\"><img loading=\"lazy\" decoding=\"async\" class=\"vce-single-image\"  width=\"1024\" height=\"550\" srcset=\"https:\/\/hackmybox.com\/wp-content\/uploads\/2026\/06\/Screenshot-from-2026-06-12-08-53-55-e1781240198819-1024x550.png 1024w, https:\/\/hackmybox.com\/wp-content\/uploads\/2026\/06\/Screenshot-from-2026-06-12-08-53-55-e1781240198819-320x172.png 320w, https:\/\/hackmybox.com\/wp-content\/uploads\/2026\/06\/Screenshot-from-2026-06-12-08-53-55-e1781240198819-480x258.png 480w, https:\/\/hackmybox.com\/wp-content\/uploads\/2026\/06\/Screenshot-from-2026-06-12-08-53-55-e1781240198819-800x430.png 800w\" src=\"https:\/\/hackmybox.com\/wp-content\/uploads\/2026\/06\/Screenshot-from-2026-06-12-08-53-55-e1781240198819-1024x550.png\" data-img-src=\"https:\/\/hackmybox.com\/wp-content\/uploads\/2026\/06\/Screenshot-from-2026-06-12-08-53-55-e1781240198819.png\" data-attachment-id=\"2472\"  alt=\"\" title=\"Screenshot from 2026-06-12 08-53-55\" \/><\/div><\/div><figcaption hidden=\"\"><\/figcaption><\/figure><\/div><\/div><div class=\"vce-text-block\"><div class=\"vce-text-block-wrapper vce\" id=\"el-76cd4bb8\" data-vce-do-apply=\"all 
el-76cd4bb8\"><table><thead><tr><th>Step<\/th><th>Code<\/th><th>Purpose<\/th><\/tr><\/thead><tbody><tr><td>Import model<\/td><td><code>from faster_whisper import WhisperModel<\/code><\/td><td>Loads the Faster-Whisper library<\/td><\/tr><tr><td>Initialize model<\/td><td><code>WhisperModel(\"base\", device=\"cpu\", compute_type=\"int8\")<\/code><\/td><td>Loads lightweight Whisper model optimized for CPU<\/td><\/tr><tr><td>Transcribe audio<\/td><td><code>model.transcribe(\"test.wav\", vad_filter=True, beam_size=5)<\/code><\/td><td>Converts speech in audio file to text<\/td><\/tr><tr><td>Detect language<\/td><td><code>print(\"Language:\", info.language)<\/code><\/td><td>Displays detected language<\/td><\/tr><tr><td>Process segments<\/td><td><code>for segment in segments:<\/code><\/td><td>Iterates through transcription output<\/td><\/tr><tr><td>Combine text<\/td><td><code>text += segment.text<\/code><\/td><td>Builds full sentence from segments<\/td><\/tr><tr><td>Final output<\/td><td><code>print(\"\\nFULL:\", text)<\/code><\/td><td>Displays complete transcription<\/td><\/tr><\/tbody><\/table><\/div><\/div><div class=\"vce-text-block\"><div class=\"vce-text-block-wrapper vce\" id=\"el-56cd8309\" data-vce-do-apply=\"all el-56cd8309\"><p><span style=\"font-size: 16pt;\">6. 
Text-to-Speech Validation<\/span><\/p><p>import pyttsx3<\/p><p>engine = pyttsx3.init()<br>engine.say(\"Jarvis is online and working\")<br>engine.runAndWait()<\/p><\/div><\/div><div class=\"vce-single-image-container vce-single-image--align-left\"><div class=\"vce vce-single-image-wrapper\" id=\"el-48016e24\" data-vce-do-apply=\"all el-48016e24\"><figure><div class=\"vce-single-image-figure-inner\" style=\"width: 1024px;\"><div class=\"vce-single-image-inner vce-single-image--absolute\" style=\"width: 100%; padding-bottom: 15.1367%;\"><img loading=\"lazy\" decoding=\"async\" class=\"vce-single-image\"  width=\"1024\" height=\"155\" srcset=\"https:\/\/hackmybox.com\/wp-content\/uploads\/2026\/06\/Screenshot-from-2026-06-12-09-00-11-e1781240478624-1024x155.png 1024w, https:\/\/hackmybox.com\/wp-content\/uploads\/2026\/06\/Screenshot-from-2026-06-12-09-00-11-e1781240478624-320x49.png 320w, https:\/\/hackmybox.com\/wp-content\/uploads\/2026\/06\/Screenshot-from-2026-06-12-09-00-11-e1781240478624-480x73.png 480w, https:\/\/hackmybox.com\/wp-content\/uploads\/2026\/06\/Screenshot-from-2026-06-12-09-00-11-e1781240478624-800x121.png 800w\" src=\"https:\/\/hackmybox.com\/wp-content\/uploads\/2026\/06\/Screenshot-from-2026-06-12-09-00-11-e1781240478624-1024x155.png\" data-img-src=\"https:\/\/hackmybox.com\/wp-content\/uploads\/2026\/06\/Screenshot-from-2026-06-12-09-00-11-e1781240478624.png\" data-attachment-id=\"2475\"  alt=\"\" title=\"Screenshot from 2026-06-12 09-00-11\" \/><\/div><\/div><figcaption hidden=\"\"><\/figcaption><\/figure><\/div><\/div><div class=\"vce-text-block\"><div class=\"vce-text-block-wrapper vce\" id=\"el-afb4b242\" data-vce-do-apply=\"all el-afb4b242\"><table><thead><tr><th>Syntax<\/th><th>Description<\/th><\/tr><\/thead><tbody><tr><td><code>import pyttsx3<\/code><\/td><td>Imports the offline Text-to-Speech (TTS) library.<\/td><\/tr><tr><td><code>pyttsx3.init()<\/code><\/td><td>Initializes the speech engine and creates a TTS 
object.<\/td><\/tr><tr><td><code>engine.say()<\/code><\/td><td>Queues the specified text to be spoken.<\/td><\/tr><tr><td><code>engine.runAndWait()<\/code><\/td><td>Processes the speech queue and plays the audio through the speakers.<\/td><\/tr><\/tbody><\/table><\/div><\/div><div class=\"vce-text-block\"><div class=\"vce-text-block-wrapper vce\" id=\"el-c134620b\" data-vce-do-apply=\"all el-c134620b\"><p>7<span style=\"font-size: 16pt;\">. Initial AI Integration Test<\/span><\/p><p>Created file:<\/p><p>nano jarvis_test.py<\/p><p>Contents:<\/p><p>import ollama<br>import pyttsx3<\/p><p>engine = pyttsx3.init()<\/p><p>user_input = \"What is cybersecurity?\"<\/p><p>response = ollama.chat(<br>model=\"gemma4:latest\",<br>messages=[<br>{\"role\": \"user\", \"content\": user_input}<br>]<br>)<\/p><p>reply = response[\"message\"][\"content\"]<\/p><p>print(\"Jarvis:\", reply)<\/p><p>engine.say(reply)<br>engine.runAndWait()<\/p><p>Execute:<\/p><p>python3 jarvis_test.py<\/p><p>&nbsp;<\/p><\/div><\/div><div class=\"vce-single-image-container vce-single-image--align-left\"><div class=\"vce vce-single-image-wrapper\" id=\"el-82a5d4bd\" data-vce-do-apply=\"all el-82a5d4bd\"><figure><div class=\"vce-single-image-figure-inner\" style=\"width: 1024px;\"><div class=\"vce-single-image-inner vce-single-image--absolute\" style=\"width: 100%; padding-bottom: 50.3906%;\"><img loading=\"lazy\" decoding=\"async\" class=\"vce-single-image\"  width=\"1024\" height=\"516\" srcset=\"https:\/\/hackmybox.com\/wp-content\/uploads\/2026\/06\/Screenshot-from-2026-06-12-09-17-52-e1781241543369-1024x517.png 1024w, https:\/\/hackmybox.com\/wp-content\/uploads\/2026\/06\/Screenshot-from-2026-06-12-09-17-52-e1781241543369-320x162.png 320w, https:\/\/hackmybox.com\/wp-content\/uploads\/2026\/06\/Screenshot-from-2026-06-12-09-17-52-e1781241543369-480x242.png 480w, https:\/\/hackmybox.com\/wp-content\/uploads\/2026\/06\/Screenshot-from-2026-06-12-09-17-52-e1781241543369-800x404.png 800w\" 
src=\"https:\/\/hackmybox.com\/wp-content\/uploads\/2026\/06\/Screenshot-from-2026-06-12-09-17-52-e1781241543369-1024x517.png\" data-img-src=\"https:\/\/hackmybox.com\/wp-content\/uploads\/2026\/06\/Screenshot-from-2026-06-12-09-17-52-e1781241543369.png\" data-attachment-id=\"2481\"  alt=\"\" title=\"Screenshot from 2026-06-12 09-17-52\" \/><\/div><\/div><figcaption hidden=\"\"><\/figcaption><\/figure><\/div><\/div><div class=\"vce-text-block\"><div class=\"vce-text-block-wrapper vce\" id=\"el-a712c7b2\" data-vce-do-apply=\"all el-a712c7b2\"><table><thead><tr><th>Code Section<\/th><th>Purpose<\/th><\/tr><\/thead><tbody><tr><td><code>import ollama<\/code><\/td><td>Imports the Ollama library to communicate with the local Gemma4 AI model.<\/td><\/tr><tr><td><code>import pyttsx3<\/code><\/td><td>Imports the Text-to-Speech library for offline voice output.<\/td><\/tr><tr><td><code>engine = pyttsx3.init()<\/code><\/td><td>Initializes the Text-to-Speech engine.<\/td><\/tr><tr><td><code>user_input = \"What is cybersecurity?\"<\/code><\/td><td>Defines the user prompt to be sent to the AI model.<\/td><\/tr><tr><td><code>ollama.chat(...)<\/code><\/td><td>Sends the prompt to Gemma4 and receives an AI-generated response.<\/td><\/tr><tr><td><code>reply = response[\"message\"][\"content\"]<\/code><\/td><td>Extracts the response text from the Ollama output.<\/td><\/tr><tr><td><code>print(\"Jarvis:\", reply)<\/code><\/td><td>Displays the AI response in the terminal.<\/td><\/tr><tr><td><code>engine.say(reply)<\/code><\/td><td>Sends the response text to the Text-to-Speech engine.<\/td><\/tr><tr><td><code>engine.runAndWait()<\/code><\/td><td>Processes and plays the spoken response through the speakers.<\/td><\/tr><\/tbody><\/table><\/div><\/div><div class=\"vce-text-block\"><div class=\"vce-text-block-wrapper vce\" id=\"el-703d71f5\" data-vce-do-apply=\"all el-703d71f5\"><p>8<span style=\"font-size: 16pt;\">. 
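Real-Time Jarvis Assistant<\/span><\/p><p>Created:<\/p><p>nano jarvis_realtime.py<\/p><p>import sounddevice as sd<br>from scipy.io.wavfile import write<br>from faster_whisper import WhisperModel<br>import ollama<br>import pyttsx3<br>import time<\/p><p>samplerate = 16000<br>duration = 5<\/p><p>model = WhisperModel(<br>&nbsp;&nbsp;&nbsp;&nbsp;\"base\",<br>&nbsp;&nbsp;&nbsp;&nbsp;device=\"cpu\",<br>&nbsp;&nbsp;&nbsp;&nbsp;compute_type=\"int8\"<br>)<\/p><p>engine = pyttsx3.init()<\/p><p>def record_audio():<br>&nbsp;&nbsp;&nbsp;&nbsp;print(\"\ud83c\udfa4 Listening...\")<\/p><p>&nbsp;&nbsp;&nbsp;&nbsp;recording = sd.rec(<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;int(duration * samplerate),<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;samplerate=samplerate,<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;channels=1,<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;dtype=\"int16\"<br>&nbsp;&nbsp;&nbsp;&nbsp;)<\/p><p>&nbsp;&nbsp;&nbsp;&nbsp;sd.wait()<\/p><p>&nbsp;&nbsp;&nbsp;&nbsp;return recording<\/p><p>try:<br>&nbsp;&nbsp;&nbsp;&nbsp;while True:<\/p><p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;audio = record_audio()<\/p><p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;write(<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\"temp.wav\",<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;samplerate,<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;audio<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;)<\/p><p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;segments, info = model.transcribe(<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\"temp.wav\"<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;)<\/p><p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;text = \" \".join(<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[seg.text for seg in segments]<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;).strip()<\/p><p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if not text:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;continue<\/p><p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;print(\"\\n\ud83e\uddd1 You said:\", text)<\/p><p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;response = ollama.chat(<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;model=\"gemma4:latest\",<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;messages=[<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\"role\": \"system\",<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\"content\": (<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\"You are Jarvis, a concise AI assistant. \"<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\"Always reply in 1 or 2 short sentences. \"<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\"Avoid long explanations.\"<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;)<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;},<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\"role\": \"user\",<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\"content\": text<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;]<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;)<\/p><p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;reply = response[\"message\"][\"content\"]<\/p><p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;print(\"\ud83e\udd16 Jarvis:\", reply)<\/p><p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;engine.say(reply)<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;engine.runAndWait()<\/p><p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;time.sleep(0.5)<\/p><p>except KeyboardInterrupt:<br>&nbsp;&nbsp;&nbsp;&nbsp;print(\"\\n\ud83d\udc4b Jarvis stopped.\")<\/p><\/div><\/div><div class=\"vce-text-block\"><div class=\"vce-text-block-wrapper vce\" id=\"el-403b2603\" data-vce-do-apply=\"all el-403b2603\"><table><thead><tr><th>Code Section<\/th><th>Description<\/th><\/tr><\/thead><tbody><tr><td><code>import sounddevice as sd<\/code><\/td><td>Imports the library used to capture audio from the microphone.<\/td><\/tr><tr><td><code>from scipy.io.wavfile import write<\/code><\/td><td>Imports the function used to save recorded audio as a WAV file.<\/td><\/tr><tr><td><code>from faster_whisper import WhisperModel<\/code><\/td><td>Imports the Faster-Whisper speech-to-text model class.<\/td><\/tr><tr><td><code>import ollama<\/code><\/td><td>Imports the Ollama library to communicate with the local Gemma4 model.<\/td><\/tr><tr><td><code>import pyttsx3<\/code><\/td><td>Imports the Text-to-Speech library for voice responses.<\/td><\/tr><tr><td><code>import time<\/code><\/td><td>Imports time-related functions for controlling execution delays.<\/td><\/tr><tr><td><code>samplerate = 16000<\/code><\/td><td>Sets the audio sample rate to 16 kHz, the rate Whisper models operate on.<\/td><\/tr><tr><td><code>duration = 5<\/code><\/td><td>Sets the recording duration to 5 seconds per interaction.<\/td><\/tr><tr><td><code>WhisperModel(...)<\/code><\/td><td>Loads the Whisper model on the CPU with INT8 quantization.<\/td><\/tr><tr><td><code>pyttsx3.init()<\/code><\/td><td>Initializes the Text-to-Speech engine.<\/td><\/tr><tr><td><code>record_audio()<\/code><\/td><td>Function that records audio from the microphone.<\/td><\/tr><tr><td><code>sd.rec()<\/code><\/td><td>Captures audio from the microphone and stores it in 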
memory.<\/td><\/tr><tr><td><code>sd.wait()<\/code><\/td><td>Waits until recording is complete.<\/td><\/tr><tr><td><code>write(\"temp.wav\", ...)<\/code><\/td><td>Saves the recorded audio to a temporary WAV file.<\/td><\/tr><tr><td><code>model.transcribe(\"temp.wav\")<\/code><\/td><td>Converts speech from the audio file into text.<\/td><\/tr><tr><td><code>\" \".join([seg.text for seg in segments])<\/code><\/td><td>Combines all transcribed segments into a single text string.<\/td><\/tr><tr><td><code>if not text: continue<\/code><\/td><td>Skips processing if no speech was detected.<\/td><\/tr><tr><td><code>print(\"You said:\", text)<\/code><\/td><td>Displays the user's spoken input in the terminal.<\/td><\/tr><tr><td><code>ollama.chat(...)<\/code><\/td><td>Sends the transcribed text to Gemma4 and requests a response.<\/td><\/tr><tr><td><code>\"system\"<\/code> message<\/td><td>Instructs Jarvis to provide short and concise responses.<\/td><\/tr><tr><td><code>reply = response[\"message\"][\"content\"]<\/code><\/td><td>Extracts the AI-generated response from Ollama.<\/td><\/tr><tr><td><code>print(\"Jarvis:\", reply)<\/code><\/td><td>Displays Jarvis's response in the terminal.<\/td><\/tr><tr><td><code>engine.say(reply)<\/code><\/td><td>Sends the response text to the Text-to-Speech engine.<\/td><\/tr><tr><td><code>engine.runAndWait()<\/code><\/td><td>Speaks the response through the speakers.<\/td><\/tr><tr><td><code>time.sleep(0.5)<\/code><\/td><td>Adds a short delay before the next listening cycle.<\/td><\/tr><tr><td><code>while True:<\/code><\/td><td>Creates a continuous conversation loop.<\/td><\/tr><tr><td><code>KeyboardInterrupt<\/code><\/td><td>Allows the user to stop Jarvis gracefully using <code>Ctrl + C<\/code>.<\/td><\/tr><\/tbody><\/table><\/div><\/div><div class=\"vce-text-block\"><div class=\"vce-text-block-wrapper vce\" id=\"el-b9f725ac\" data-vce-do-apply=\"all el-b9f725ac\"><p><span style=\"font-size: 16pt;\">9. 
Running Jarvis<\/span><\/p><p>Activate environment:<\/p><p>cd ~\/jarvis<br>source venv\/bin\/activate<\/p><p>Start assistant:<\/p><p>python3 jarvis_realtime.py<\/p><p>Example output:<\/p><p>\ud83c\udfa4 Listening...<\/p><p>\ud83e\uddd1 You said: Hello Jarvis<\/p><p>\ud83e\udd16 Jarvis: Hello! How can I help you today?<\/p><\/div><\/div><div class=\"vce-text-block\"><div class=\"vce-text-block-wrapper vce\" id=\"el-901b096e\" data-vce-do-apply=\"all el-901b096e\"><p><span style=\"font-size: 16pt;\">10. Final Status Checklist<\/span><\/p><p>\u2705 Python Virtual Environment \u2014 Complete<\/p><p>\u2705 Faster-Whisper Installation \u2014 Complete<\/p><p>\u2705 Whisper Model Download \u2014 Complete<\/p><p>\u2705 Speech-to-Text (STT) Functionality \u2014 Complete<\/p><p>\u2705 Ollama Installation \u2014 Complete<\/p><p>\u2705 Gemma4 Model Setup \u2014 Complete<\/p><p>\u2705 Text-to-Speech (TTS) Functionality \u2014 Complete<\/p><p>\u2705 AI Response Generation \u2014 Complete<\/p><p>\u2705 Real-Time Voice Assistant Integration \u2014 Complete<\/p><\/div><\/div><div class=\"vce-single-image-container vce-single-image--align-left\"><div class=\"vce vce-single-image-wrapper\" id=\"el-af2897d9\" data-vce-do-apply=\"all el-af2897d9\"><figure><div class=\"vce-single-image-figure-inner\" style=\"width: 1024px;\"><div class=\"vce-single-image-inner vce-single-image--absolute\" style=\"width: 100%; padding-bottom: 71.7773%;\"><img loading=\"lazy\" decoding=\"async\" class=\"vce-single-image\"  width=\"1024\" height=\"735\" srcset=\"https:\/\/hackmybox.com\/wp-content\/uploads\/2026\/06\/jarvis1-1024x735.jpeg 1024w, https:\/\/hackmybox.com\/wp-content\/uploads\/2026\/06\/jarvis1-2048x1471.jpeg 2x, https:\/\/hackmybox.com\/wp-content\/uploads\/2026\/06\/jarvis1-320x230.jpeg 320w, https:\/\/hackmybox.com\/wp-content\/uploads\/2026\/06\/jarvis1-480x345.jpeg 480w, https:\/\/hackmybox.com\/wp-content\/uploads\/2026\/06\/jarvis1-800x574.jpeg 800w\" 
src=\"https:\/\/hackmybox.com\/wp-content\/uploads\/2026\/06\/jarvis1-1024x735.jpeg\" data-img-src=\"https:\/\/hackmybox.com\/wp-content\/uploads\/2026\/06\/jarvis1-scaled.jpeg\" data-attachment-id=\"2498\"  alt=\"\" title=\"jarvis1\" \/><\/div><\/div><figcaption hidden=\"\"><\/figcaption><\/figure><\/div><\/div><div class=\"vce-text-block\"><div class=\"vce-text-block-wrapper vce\" id=\"el-c0442bce\" data-vce-do-apply=\"all el-c0442bce\"><p><span style=\"font-size: 16pt;\">Conclusion<\/span><\/p><p>This project successfully built a fully local AI voice assistant capable of listening, understanding speech, processing requests with a local LLM, and responding through voice output. By integrating Faster-Whisper, Ollama (Gemma4), and pyttsx3, a complete end-to-end conversational system was achieved.<\/p><p>While Version 1 is fully functional, there is still plenty of room for improvement and new features to explore. This serves as a solid foundation for future enhancements and experimentation.<\/p><\/div><\/div><\/div><\/div><\/div><\/div><\/div><\/div>\n","protected":false},"excerpt":{"rendered":"<p>1. 
Project Objective: Build a fully local AI voice assistant that can listen through a microphone, convert speech to text using Whisper, process requests using a local LLM (Gemma4 via Ollama), respond using text-to-speech, and run continuously in a loop. Goal: create a privacy-focused voice assistant that runs entirely on local hardware without relying on cloud AI services. Pipeline (Stage \/ Component \/ Function): Input \/ Microphone \/ Captures user voice input [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":2490,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"site-container-style":"default","site-container-layout":"default","site-sidebar-layout":"default","site-transparent-header":"default","prose-style":"enable","disable-article-header":"default","disable-site-header":"default","disable-site-footer":"default","disable-content-area-spacing":"default","footnotes":""},"categories":[12],"tags":[],"class_list":["post-2449","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-labs"],"_links":{"self":[{"href":"https:\/\/hackmybox.com\/index.php\/wp-json\/wp\/v2\/posts\/2449","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hackmybox.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hackmybox.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hackmybox.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/hackmybox.com\/index.php\/wp-json\/wp\/v2\/comments?post=2449"}],"version-history":[{"count":42,"href":"https:\/\/hackmybox.com\/index.php\/wp-json\/wp\/v2\/posts\/2449\/revisions"}],"predecessor-version":[{"id":2499,"href":"https:\/\/hackmybox.com\/index.php\/wp-json\/wp\/v2\/posts\/2449\/revisions\/2499"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/hackmybox.com\/index.php\/wp-json\/wp\/v2\/media\/2490"}],"wp:attachment":[{"href":"https:\/\/hackmybox.com\/index.php\/wp-json\/wp\/v2\/media?parent=2449"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hackmybox.com\/index.php\/wp-json\/wp\/v2\/categories?post=2449"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hackmybox.com\/index.php\/wp-json\/wp\/v2\/tags?post=2449"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}