Talk to Your Datastar Chat: Voice Input with the Web Speech API

Typing prompts wore thin. The chat should feel immediate: tokens already stream back as the model answers, and now a mic button adds voice input. No backend changes and no transcription service. The browser's SpeechRecognition API writes into the same chat input.

First things to know

Browser support is uneven. Chrome, Edge, and Safari support SpeechRecognition, in practice through the prefixed webkitSpeechRecognition constructor. Firefox keeps it behind a flag that is off by default, so most Firefox users will not have it.
This feature is optional. If the browser lacks support, the mic button does not render. The text input remains usable.
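
To make the support check concrete, here is a minimal sketch of the feature detection, assuming its result is what ends up in a flag like $speechSupported. The helper name findSpeechRecognition and its scope parameter are hypothetical, made up for this example so the check can run without a real browser; in the page you would pass window.

```typescript
// Hypothetical helper (not part of any API): returns the SpeechRecognition
// constructor, or null when the given global object does not provide one.
type RecognitionCtor = new () => unknown;

function findSpeechRecognition(scope: Record<string, unknown>): RecognitionCtor | null {
  // Prefer the standard name, then fall back to the webkit-prefixed one.
  const candidate = scope["SpeechRecognition"] ?? scope["webkitSpeechRecognition"];
  return typeof candidate === "function" ? (candidate as RecognitionCtor) : null;
}

// Stand-in globals instead of a real `window`, for illustration only.
class FakeRecognition {}
const chromeLike: Record<string, unknown> = { webkitSpeechRecognition: FakeRecognition };
const firefoxLike: Record<string, unknown> = {};

// In the page: findSpeechRecognition(window as unknown as Record<string, unknown>)
const speechSupported = findSpeechRecognition(chromeLike) !== null;
const speechMissing = findSpeechRecognition(firefoxLike) === null;
```

Returning the constructor rather than a boolean means the same call can both decide whether to show the button and create the recognizer later.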

How it plugs in

The chat input is bound to a Datastar signal named message. The mic button writes into that same signal, so spoken text arrives exactly the way typed text does.

Code example (simplified):

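// Chromium browsers and Safari expose the constructor as webkitSpeechRecognition.
// This assumes support was already detected; otherwise the constructor is undefined.
const SpeechRecognition =
  (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition;
const recognition = new SpeechRecognition();
recognition.continuous = true;     // do not stop after the first phrase
recognition.interimResults = true; // report partial results while speaking

recognition.onresult = (event: any) => {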
  let finalTranscript = "";
  let interimTranscript = "";

  for (let i = event.resultIndex; i < event.results.length; i++) {
    const transcript = event.results[i][0].transcript;
    if (event.results[i].isFinal) {
      finalTranscript += transcript;
    } else {
      interimTranscript += transcript;
    }
  }

  // Hand the text to the page; a listener elsewhere writes it into the chat input
  document.dispatchEvent(new CustomEvent("speech-result", {
    detail: { text: finalTranscript || interimTranscript },
  }));
};
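
One detail worth spelling out: the loop above starts at event.resultIndex, so finalTranscript only holds the phrases that were finalized in that particular event. If you want the input to keep earlier phrases during a continuous session, one option is to rebuild the text from the whole results list each time. The sketch below does that; collectTranscript and the small ...Like types are hypothetical names that mirror only the fields the handler reads.

```typescript
// Minimal structural types covering only the fields read below.
interface AlternativeLike { transcript: string }
interface ResultLike { isFinal: boolean; 0: AlternativeLike }
interface ResultListLike { length: number; [index: number]: ResultLike }

// Hypothetical helper: walks every result of the session, not just the
// ones that changed, and splits the text into finalized and interim parts.
function collectTranscript(results: ResultListLike): { final: string; interim: string } {
  let final = "";
  let interim = "";
  for (let i = 0; i < results.length; i++) {
    const result = results[i];
    if (!result) continue; // defensive: skip holes in the list
    const text = result[0].transcript; // first alternative only
    if (result.isFinal) {
      final += text;
    } else {
      interim += text;
    }
  }
  return { final, interim };
}

// Example input shaped like event.results after two finished phrases
// and one phrase still in progress.
const sampleResults: ResultListLike = [
  { isFinal: true, 0: { transcript: "hello " } },
  { isFinal: true, 0: { transcript: "world" } },
  { isFinal: false, 0: { transcript: " how are" } },
];
const collected = collectTranscript(sampleResults);
```

Inside onresult you could then dispatch collected.final + collected.interim as the text, so the input shows everything said so far plus the phrase in progress.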

Wiring into the chat

In the UI, an event listener patches the message signal whenever a speech-result event fires, so the input updates as you speak. Everything else in the chat stays unchanged.

<input type="text" data-bind="message" placeholder="Ask something..." />

<button
  type="button"
  data-show="$speechSupported"
  data-on:click="$listening = !$listening"
  data-attr:aria-pressed="$listening"
>
  🎤
</button>

The mic button renders only when $speechSupported is true, that is, when the browser exposes SpeechRecognition. The click never leaves the browser: it flips $listening, and client-side script starts or stops recognition to match. When speech ends, $listening flips back to false.
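
For the listener itself, here is one way it could look. This is a sketch, not the widget's actual code: it assumes the bound input picks up changes when an input event fires on it, which is how two-way bindings commonly work but is an assumption here. The names writeTranscript, forwardSpeechResults, and BoundInput are hypothetical.

```typescript
// Anything with a `value` that can dispatch events, such as an HTMLInputElement.
interface BoundInput extends EventTarget { value: string }

// Hypothetical helper: writes recognized text into the input and fires an
// "input" event, on the assumption that the binding listens for it.
function writeTranscript(input: BoundInput, text: string): void {
  input.value = text;
  input.dispatchEvent(new Event("input", { bubbles: true }));
}

// Forwards every "speech-result" event from `source` into the input.
function forwardSpeechResults(source: EventTarget, input: BoundInput): void {
  source.addEventListener("speech-result", (event) => {
    // The event is expected to carry detail.text, as dispatched in onresult.
    const text = (event as Event & { detail?: { text?: string } }).detail?.text;
    if (typeof text === "string") {
      writeTranscript(input, text);
    }
  });
}

// In the page, something like:
//   forwardSpeechResults(document, document.querySelector("input")!);
```

Going through the input element keeps the speech path identical to the typing path, so nothing downstream has to know where the text came from.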

I went with toggle instead of push-to-talk — click once to start listening, click again to stop. Typing doesn't require holding anything down either, so a toggle felt like the closer match to how the rest of the widget already behaves.
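
The toggle can be sketched in a few lines. This is an illustration under assumptions, not the widget's real code: createMicToggle and RecognitionLike are hypothetical names, and setListening stands in for whatever writes the $listening flag in your setup.

```typescript
// Only the parts of a recognizer that this sketch touches.
interface RecognitionLike {
  start(): void;
  stop(): void;
  onend: (() => void) | null;
}

// Hypothetical controller: click once to start, click again to stop.
function createMicToggle(
  recognition: RecognitionLike,
  setListening: (on: boolean) => void, // stand-in for writing $listening
) {
  let listening = false;

  // Recognition can end on its own, so the flag is cleared here rather
  // than in toggle(); stop() eventually leads to this handler too.
  recognition.onend = () => {
    listening = false;
    setListening(false);
  };

  return {
    toggle(): void {
      if (listening) {
        recognition.stop();
      } else {
        recognition.start();
        listening = true;
        setListening(true);
      }
    },
    isListening: (): boolean => listening,
  };
}
```

Clearing the flag only in onend means the button state follows what the recognizer is actually doing, whether the session ended because of a click or on its own.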

Why bother

None of this required a new dependency, a new API key, or a line on an invoice. It's the same trade I keep making with this site: reach for what the browser already gives you before reaching for a service that bills per request. The chat widget streams tokens back because that felt like the honest way to show an LLM thinking. Now it can listen the same way — no new infrastructure, just one more signal getting written into from a different direction.

It's turned into something I actually reach for when I'm thinking out loud about a prompt rather than composing it carefully — for quick, exploratory questions it's faster than typing. For anything that needs precise wording, I still default back to the keyboard.

The edges that'll actually bite you

A few things I hit building this that aren't obvious from the API docs:

Permissions. The first click prompts for microphone access, same as any getUserMedia call. If the user denies it, onerror fires with not-allowed — handle it by falling back silently to typing rather than showing a dead mic icon.
No-speech timeouts. Recognition sessions can end on their own after a period of silence, even with continuous: true. Listen for onend and decide whether to restart automatically or just flip $listening back to false.
Stop it when you submit. If the user hits enter mid-sentence, call recognition.stop() so it's not still capturing audio into a message that's already been sent.
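
Here is how those three cases might fit together in one place. Treat it as a sketch under assumptions: guardEdges, EdgeRecognition, and the option names are hypothetical, the two setters stand in for writing $listening and $speechSupported, and hiding the mic after a denial is just one way to avoid the dead icon.

```typescript
// Only the recognizer members this sketch uses.
interface EdgeRecognition {
  start(): void;
  stop(): void;
  onerror: ((event: { error: string }) => void) | null;
  onend: (() => void) | null;
}

interface EdgeOptions {
  autoRestart: boolean;             // restart after a silence timeout?
  setListening(on: boolean): void;  // stand-in for writing $listening
  setSupported(on: boolean): void;  // stand-in for writing $speechSupported
}

function guardEdges(recognition: EdgeRecognition, options: EdgeOptions) {
  let wanted = false; // true while the user wants the mic on

  recognition.onerror = (event) => {
    // Permission denied: give up and hide the mic so typing is the only path.
    if (event.error === "not-allowed" || event.error === "service-not-allowed") {
      wanted = false;
      options.setSupported(false);
    }
  };

  recognition.onend = () => {
    // Session ended by itself (for example after silence) while still wanted.
    if (wanted && options.autoRestart) {
      recognition.start();
      return;
    }
    wanted = false;
    options.setListening(false);
  };

  return {
    start(): void {
      wanted = true;
      recognition.start();
      options.setListening(true);
    },
    // Call this from the submit handler so no audio lands in a sent message.
    stopForSubmit(): void {
      wanted = false;
      recognition.stop();
    },
  };
}
```

The wanted flag is what separates a session the user stopped from one that timed out, which is the distinction the restart decision depends on.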