A Simple Package Update Ended My Months of Debugging

For months I was plagued with a maddening issue. Every time I used the Ollama API through OpenWebUI (OWUI) behind Traefik, the response streams would get interrupted mid-flight, breaking the JSON parsing on the receiving end. It worked fine going direct, but behind my ingress? Broken. Every. Single. Time.

What followed was months of troubleshooting that took me through every layer of my stack, only to be saved by the most mundane fix imaginable.

The Symptom

The problem was specific and consistent: streaming responses from Ollama would start fine, chug along for a bit, and then… nothing. The stream would terminate prematurely, leaving the client hanging, waiting for the rest of the response that would never arrive. It wasn’t a timeout issue - the stream just stopped, mid-chunk, like something was cutting the connection.
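A minimal sketch of how one could confirm this kind of truncation, assuming the raw Ollama streaming format: newline-delimited JSON objects, with the last one carrying "done": true. The function name check_ndjson_stream is made up for illustration, and a stream relayed through OWUI may be framed differently, so treat this as a starting point and not a diagnosis tool.

```python
import json


def check_ndjson_stream(lines):
    """Inspect an iterable of NDJSON lines (str or bytes).

    Returns (complete, chunks_parsed, error_message).
    Assumption: a healthy stream ends with an object whose "done" is true.
    """
    parsed = 0
    last = None
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8", errors="replace")
        raw = raw.strip()
        if not raw:
            continue  # ignore blank keep-alive lines
        try:
            last = json.loads(raw)
        except json.JSONDecodeError as exc:
            # A chunk cut off mid-object lands here.
            return False, parsed, f"chunk {parsed + 1} is not valid JSON: {exc.msg}"
        parsed += 1
    if not isinstance(last, dict) or last.get("done") is not True:
        # Every chunk parsed, but the final done=true object never arrived.
        return False, parsed, "stream ended without a final done=true object"
    return True, parsed, None
```

Feeding it the response lines from a direct request and from one routed through the ingress would show whether the stream is being cut mid-object or simply ending before the final chunk.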

Interruption from the future

My setup: Traefik (in my K3s cluster) → OWUI → Ollama. All running on my home lab infrastructure, which I wrote about here and here.

Okay back to the story

The Traefik Rabbit Hole

I started, logically, at the edge. Traefik. I must have read every documentation page on their site related to:

  • Buffer sizes
  • Streaming configurations
  • passHostHeader
  • serversTransport
  • responseForwarding
  • forwardingTimeouts
  • WebSocket configs

I learned more about Traefik than I ever planned to, good stuff but… oh my gawd was it boring. I created custom middlewares, tweaked timeouts at every layer, tried websocket-specific configurations. Nothing.

Then I moved to MTU settings. Maybe the packets were being fragmented? I adjusted, I tuned, I broke things, I fixed them. Still broken.

The OWUI Deep Dive

Okay, so not Traefik. Must be OWUI. I dove into the OWUI documentation, searched issues, found threads. And that’s when I found it - other people reporting the exact same streaming issues. Posts with titles like “Streaming broken behind reverse proxy” and not a single helpful response.

Until… “Oh, never mind, it started working!” or “Fixed it somehow!” or the dreaded “Found the issue, nevermind it wasn’t that.”

No one ever posted what actually fixed it. I began to suspect this was some kind of cosmic conspiracy.

The Kubernetes Nuclear Option

Fine. Not Traefik, not OWUI. Maybe K3s itself? Maybe something weird with the cluster?

I tore down the entire K3s cluster. All four nodes, everything. Rebuilt it from scratch using k3sup, re-applied all my ArgoCD applications, restored the GitOps state. Same issue.

Maybe it’s the OS? I wiped one of the nodes, reinstalled Debian, joined it back to the cluster. Same issue.

At this point I was ready to nuke everything from orbit. “I’m just going to wipe all the nodes and upgrade to Debian Trixie,” I told myself. “Start fresh.”

The Fix That Wasn’t Supposed to Work

Right next to my destroy script was update_nodes.sh. Hmm. “Screw it,” I thought, “let’s update before I turn around and wipe the nodes.”

So I ran update_nodes.sh. It’s just a fancy script that runs a bunch of updates on all the nodes…

TLDR:

apt-get update && apt-get upgrade

Rebooted. Fired everything back up.

The streaming responses were perfect. No interruptions. No broken JSON. Just clean, complete streams.

I sat there staring at the terminal for a long moment.

The Irony

Months. Months of troubleshooting. Tweaking Traefik configs at 2 AM (seriously, GitOps from my phone). Reading source code for headers that might be wrong. Destroying and rebuilding my entire cluster twice. Contemplating a full OS reinstall.

And it was fixed by a routine apt-get update && apt-get upgrade that I ran because the script was right there.

I still don’t know what package was broken. I didn’t bother checking; by the time I realized it was working, the upgrade had already applied and I was too afraid to look and break the magic.
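For anyone braver than me: on Debian, apt normally records each run in /var/log/apt/history.log, including an "Upgrade:" line listing every package with its old and new version. Here is a rough sketch of pulling that list out, assuming the default log location and the usual "name:arch (old, new)" entry format. The helper name upgraded_packages is invented for this example, and older runs may sit in rotated, gzipped copies of the log.

```python
import re

# Matches entries such as "libfoo:amd64 (1.0-1, 1.0-2)" on an Upgrade: line.
_ENTRY = re.compile(r"([^\s,:]+):([^\s,()]+) \(([^,()]+), ([^()]+)\)")


def upgraded_packages(log_text):
    """Return a list of (start_date, package, old_version, new_version)."""
    rows = []
    start = None
    for line in log_text.splitlines():
        if line.startswith("Start-Date:"):
            # Collapse repeated spaces so the timestamp is easy to compare.
            start = " ".join(line.split(":", 1)[1].split())
        elif line.startswith("Upgrade:"):
            body = line.split(":", 1)[1]
            for name, _arch, old, new in _ENTRY.findall(body):
                rows.append((start, name, old, new))
    return rows
```

Reading the file with open("/var/log/apt/history.log").read() and passing the text in gives one row per upgraded package, which narrows the list of suspects without touching the running system.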

Lessons

The lesson here isn’t that you should always run package updates (though you should). It’s that sometimes the simplest explanations are right, and sometimes the most mundane fixes are the real ones. I spent months assuming it was some complex interaction between Traefik and streaming protocols, when really some package somewhere in my base OS had a bug that got patched in some routine update.

The fix that took me months to find took 30 seconds to apply.