2026-05-20 | 📺 Word Meter Picture-in-Picture — Lessons from a Failed Experiment 🏳️

🏳️ This post documents a feature exploration that didn’t ship. The goal was to add Picture-in-Picture support to the Word Meter so the word count would stay visible when switching apps. It worked on desktop Chrome, but the mobile story turned out to be a dead end.
🧭 The Ask
🙋 The original question: when I switch apps on my phone, can the Word Meter automatically pop into a Picture-in-Picture overlay and keep counting?
📱 This is a compelling idea. The Word Meter counts words spoken aloud using the Web Speech API. It only works while the page is visible, so the natural failure mode is forgetting to tap Stop before checking a notification. A floating overlay would solve that.
🔬 What I Tried
Attempt 1: Document Picture-in-Picture (Desktop Only)
🪟 The Document Picture-in-Picture API (window.documentPictureInPicture.requestWindow()) lets you pop arbitrary HTML into a floating overlay window. I built a full implementation:
- A thin FFI module wrapping the four browser calls: availability check, request window, attach close listener, and close.
- A PureScript capability class so the PiP window lifecycle was testable.
- A “Pop out count” button in the Word Meter UI that mirrored the live word count and listening status into the floating window.
- Reducer actions for
pipOpenandpipStatusstate. - Three Playwright tests covering the idle label, unsupported-API diagnostics, and the full open/close toggle.
✅ This worked great on desktop Chrome. The floating window stayed on top across app switches, and SpeechRecognition kept running because the source page was technically still in the foreground.
🚫 But then we tested on mobile Chrome: “picture-in-picture not supported on this browser.”
Root Cause Analysis: Why Mobile Chrome Says “Not Supported”
🔍 I did a full 5-Whys investigation:
- Why does the button show “not supported”? Because
window.documentPictureInPictureisundefinedon mobile Chrome. - Why is the API undefined on mobile? Because Document Picture-in-Picture is intentionally desktop-only in Chromium (Windows, macOS, Linux, ChromeOS).
- Why is it desktop-only? The spec authors designed it for desktop windowing systems. Mobile platforms have their own PiP surfaces managed at the OS level.
- Why doesn’t the Android PiP permission help? The per-app PiP toggle in Android settings governs
HTMLVideoElement.requestPictureInPicture()— a completely separate, older API for floating video players. - Why can’t we use a Permissions-Policy or flag? There is no flag. The API simply does not exist in mobile Chromium builds.
📊 Browser support as of May 2026:
| Browser | Platform | Document PiP | Video PiP |
|---|---|---|---|
| Chrome | Desktop (Win/Mac/Linux/ChromeOS) | Yes (v116+) | Yes |
| Chrome | Android | No | Yes |
| Safari | macOS | No | Yes |
| Safari | iOS | No | Yes |
| Firefox | All | No | Yes |
| Samsung Internet | Android | No | Yes |
In prose: Desktop Chrome (and other desktop Chromium browsers like Edge and Opera) support Document Picture-in-Picture starting at version 116. All mobile browsers — Android Chrome, iOS Safari, Samsung Internet — lack Document PiP entirely but do provide the older Video PiP API. Desktop Safari and all Firefox builds also lack Document PiP and only support Video PiP.
Attempt 2: Video PiP Fallback for Mobile
🎬 Since the legacy video PiP API (video.requestPictureInPicture()) is available on mobile, I built a fallback:
- Paint the live word count onto an offscreen 320x220
<canvas>. - Capture the canvas via
canvas.captureStream(2)into aMediaStream. - Attach the stream to a hidden
<video>element. - Call
video.requestPictureInPicture()from the button’s click handler. - The opaque
PipWindowhandle carried akind: "document" | "video"tag internally, so PureScript code didn’t need to know which path was active.
🎉 This technically worked — you could get a floating window showing the word count on mobile Chrome.
😞 But here’s the showstopper: SpeechRecognition suspends the moment the source page is fully backgrounded, regardless of any video PiP window floating on top. The PiP window would freeze on the last count the meter saw before the tab went into the background. The count only updated while the meter tab was in the foreground (split-screen, or briefly checking another app and coming back).
🏳️ Why We Stopped
📉 The whole point was to keep counting while switching apps. On desktop, Document PiP achieves this because the source page stays in the foreground. On mobile, neither PiP path solves the underlying problem: SpeechRecognition gets killed by the OS when the page hides.
🤷 You could argue the frozen-count floating window has some value — it shows you the count without navigating back to the tab. But it adds significant complexity (two PiP code paths, canvas rendering, stream capture, lifecycle management, ~26 cascading constraint changes through the PureScript codebase) for a feature that doesn’t do the one thing users actually want on mobile.
💡 Bryan made the right call: this isn’t useful enough to justify the complexity. We’re reverting all the code.
📝 Lessons Learned
-
Document PiP is desktop-only by design. If your feature targets mobile, do not assume
window.documentPictureInPictureexists. Check the spec — it was designed for desktop windowing systems. -
The Android PiP toggle is for video elements only. When a user says “I have PiP enabled in my app settings,” they’re talking about
HTMLVideoElement.requestPictureInPicture(), not Document PiP. These are completely separate APIs. -
SpeechRecognitiondoes not survive page backgrounding. Chromium suspends the Web Speech API when the source page is hidden, regardless of video PiP windows, service workers, or any other trick. This is the real blocker for a mobile word counter. -
requestWindow()requires a trusted user gesture. You cannot auto-enter Document PiP onvisibilitychange, timers, or any other non-gesture trigger. There is no Permissions-Policy that grants ambient Document PiP. TheautoPictureInPictureentitlement applies only to video elements. -
Prototype on all target platforms early. I shipped a working desktop implementation before discovering the mobile API gap. A quick
typeof window.documentPictureInPicturecheck on a phone would have surfaced the problem in minutes. -
Complexity cost matters. The Document PiP capability added a constraint that cascaded through ~26 function signatures in the PureScript codebase. Even if the feature worked perfectly, that’s a real maintenance burden for a niche UI affordance.
🔮 Ideas for Future Attempts
If someone wants to revisit this, here are the paths worth exploring:
-
Replace Web Speech with
getUserMedia+ custom STT. The fundamental blocker is thatSpeechRecognitiondies on page hide. A raw microphone stream viagetUserMediacan survive in the background (Android Chrome sustains it while a PiP video window is active). Pipe the audio to an on-device or cloud speech-to-text engine and the count could keep ticking. This is a major undertaking — it means replacing the entire recognition pipeline (on-device pre-flight, cloud fallback, dedup/restart machinery). -
Service Worker with audio worklet. Process audio in a service worker that stays alive independently of the page. Combined with
getUserMedia, this could keep recognition running. But service workers have their own lifecycle constraints, and bridging audio data between contexts adds complexity. -
Native app wrapper. A thin native Android/iOS wrapper (Capacitor, React Native, or a custom WebView) could keep the web page “visible” from the browser engine’s perspective, or use platform-native speech recognition that doesn’t depend on page visibility.
-
Wait for platform changes. Document PiP is still evolving. If Chromium ever ships it on Android, or if a future spec revision adds an
autoPictureInPictureentitlement for documents, the desktop implementation from this PR could be resurrected with minimal changes. -
Desktop-only feature. Accept that PiP is desktop-only and ship it as such. The desktop implementation worked well. Whether it’s worth the ~26-constraint cascade for a desktop-only affordance is a judgment call.
📎 Artifacts
- This blog post is all that remains of the PiP experiment. All code, specs, and tests were reverted.
- The PR discussion thread has the full technical conversation, including the 5-Whys RCA and per-browser support table.
📚 Book Recommendations
📖 Similar
- 💺🚪💡🤔 The Design of Everyday Things by Don Norman is relevant because the core failure here — an API that looks like it should work on mobile but silently does not — is a classic affordance mismatch. Norman’s framework for mapping user expectations to system capabilities explains exactly why the Android PiP permission toggle misleads users into thinking Document PiP is available.
- Release It! by Michael T. Nygard is relevant because it covers the principle of knowing your failure modes before you ship. The mobile dead-end could have been caught earlier with the same discipline Nygard applies to production systems: test at the boundaries, not just in the happy path.
↔️ Contrasting
- The Pragmatic Programmer by David Thomas and Andrew Hunt advises finishing what you start and shipping working software incrementally. This PR takes the opposite path — abandon and document rather than ship a half-working feature. That tension is real, and sometimes the right call is to stop.
🔗 Related
- High Performance Browser Networking by Ilya Grigorik is related because understanding why
SpeechRecognitiondies on page hide requires understanding how browsers manage background page lifecycle, process throttling, and media permissions — all topics covered in depth in that book. - Domain Modeling Made Functional by Scott Wlaschin is related because the PureScript implementation used opaque newtypes and typed error ADTs throughout (DocumentPipError, the opaque PipWindow handle with its kind tag) — a direct application of the functional domain-modeling style the book advocates.