Self-Host a Teams Alternative with Private AI: UK Build Guide

A few months ago I wrote an internal proposal to move off Microsoft Teams. The trigger was Copilot — specifically the fact that Microsoft’s published EU Data Boundary commitments explicitly exclude the Anthropic models surfaced through Copilot, which means client meeting content can be processed outside the EU when AI features are enabled. That’s a Data Protection Impact Assessment problem on its own, never mind the broader principle that an AI you don’t control shouldn’t be reading your clients’ meetings. I never shipped that proposal. But the research it produced is the cleanest answer I’ve come across for any UK team that wants modern meeting tooling without surrendering data sovereignty to a US hyperscaler.

The case for self-hosting, in one breath

Three things have shifted in the last 18 months. First, Microsoft has accelerated Copilot’s reach across Teams — transcription, summarisation, action items, suggested replies — backed by models that process meeting audio and text in cloud regions you don’t fully control. Second, the EU has tightened expectations around what counts as “EU data residency” for AI processing, and Microsoft’s own documentation is now explicit about which models are excluded from the Boundary. Third, the open-source AI stack has caught up. Real-time transcription via Whisper, meeting summaries via local 7–8B-class LLMs, and self-hosted speech-to-text routing through Skynet are all production-grade in 2026.

Put together, the trade-off has flipped. A year or two ago, self-hosting meant giving up the AI features people now expect. Today you can run the same features on your own boxes, and the only thing you give up is the convenience of someone else’s bill.

The stack at a glance

Every block of this is open source, MIT or Apache-licensed, and runs in Docker. None of it requires a third-party account or a phone-home.

Component	Project	What it does
Video conferencing	Jitsi Meet	The user-facing meeting app — web-first, no install needed, native mobile apps available
Media server	Jitsi Videobridge (JVB)	The WebRTC SFU that routes media between participants
Recording	Jibri	Records meetings to MP4 (also drives live-streaming)
Transcription gateway	Jigasi	Dials into a meeting as a SIP participant and pipes audio to your STT
AI services	Skynet	Jitsi’s AI services framework — wraps Whisper, the LLM, and translation behind one API
Speech-to-text	Whisper (or Vosk for low-resource)	Local, GPU-friendly transcription
LLM	Ollama + any open-weights model	Local summarisation and action-item extraction
Translation	LibreTranslate	Self-hosted multilingual support
Orchestration	Docker Compose	Deploys the lot from a single `docker compose up -d`

Skynet is the piece that turns this from a video conferencing tool into a genuine Teams replacement. It exposes the AI services to Jitsi the same way Microsoft exposes Copilot to Teams — the user experience inside the meeting is “click record, get a transcript and summary in the dashboard afterwards”. The difference is the model behind it is yours.

Three paths for the AI tier

This is where the build decision lives. Jitsi itself runs comfortably on a modest VPS. The AI services are what need horsepower, and how much depends on which path you take.

Path 1: CPU-only baseline

You can run all of this without a GPU. Whisper’s smaller models (base, small) transcribe acceptably on 8–16 CPU cores, and 7–8B-class LLMs run via Ollama with CPU inference, just slowly. Real-time transcription is off the table — you’re looking at post-meeting batch processing instead — but for plenty of UK SMB use cases that’s perfectly fine.

Realistic spec: 16 GB RAM, 8 cores, 100 GB NVMe. Roughly £35–60/mo on a UK VPS. Hostinger’s KVM 2 or KVM 4 fit this comfortably — KVM 4 is what Hostinger’s own plan card lists as the entry point for self-hosted LLM workloads. Hetzner, OVH, and Mythic Beasts UK all offer comparable tiers.

What you give up: live captions, real-time AI suggestions during the call. What you keep: full meeting recording, accurate post-meeting transcripts (overnight is fine), AI-generated summaries, action items — all self-hosted.

Path 2: Rent a GPU box

The fastest path to real-time Whisper Large is to rent an entry-level GPU instance. Hetzner’s GEX series, OVH’s RISE GPU line, and a handful of UK-based GPU specialists all offer entry boxes for £100–300/mo with NVIDIA T4-class or RTX 4000-class GPUs and enough VRAM (12–16 GB) for both Whisper Large and a quantised 8B LLM at the same time.

The win is operational: it’s somebody else’s hardware, it’s in a UK or EU data centre by your pick, and you can spin it down for the months when AI features are quiet. The catch is the run rate — at £150/mo over three years you’ve spent £5,400, which is the price of a serious DIY rig several times over.

Pick this path if you want AI features running this month with no procurement, and you’d rather a predictable monthly bill than a capex spike.

Path 3: Build your own rig

If you have wall space, a corner of an office, or a quiet cupboard with airflow, the cheapest long-term option is to build the box yourself. The canonical worked example here is Ingo Eichhorst’s wall-mounted ML rig — €628 total, used Tesla P100 (16 GB VRAM), 18-core Xeon, 128 GB DDR3, custom aluminium wall mount instead of a case, ~500 W draw.

The trick is the used enterprise GPU market. P100s, P40s, used 3090s, and used A4000s show up regularly on eBay UK and on European used-server marketplaces. Translated to GBP, expect £500–1,500 one-off for a working rig that does what an entry rented GPU box does — and keeps doing it for the next three years without a monthly invoice.

What you trade away: cable management is, in the wall-mounted form, a project. You’re responsible for thermals (the room becomes the heat sink), noise (server fans are loud — water cooling helps), and electricity (around £15/mo at a 500 W average UK domestic tariff). Used GPUs come with no warranty.

Honest take: this path is the right one if you’ll keep using AI meeting features for 18+ months and you’ve got a non-living-room location for the box. Otherwise rent.

Picking models without pinning versions

Whichever GPU path you take, model choice is the second-most important call. I’m deliberately not naming specific versions here because the open-weights landscape moves every few months. The categories that matter:

Speech-to-text: the current Whisper Large family if you have the VRAM, Whisper Medium if you’re tight on either VRAM or CPU. Vosk is the lighter-weight fallback if you need something that runs on a Raspberry Pi-class device.
LLM for summarisation: any current 7–8B-class open-weights model — pick from the Llama, Mistral, Qwen, or Gemma families depending on which licence terms you need and which benchmarks you trust this quarter. The 4-bit quantised variants run comfortably on 8 GB VRAM.
Translation: LibreTranslate’s bundled models are good enough for most European languages. For high-stakes legal or medical translation, do not lean on any open-weights translator without human review.

Stay flexible. Skynet is configured to call whatever model is running behind a standard OpenAI-compatible API endpoint, which means swapping in next year’s better model is a config change, not a re-architecture.

What deployment actually looks like

I’m not going to walk through docker-compose.yml line by line here — that’s a future post once I’ve shipped a PoC. The shape is:

One Docker Compose file containing Jitsi Meet, JVB, Jicofo, Prosody (XMPP), Jigasi, Skynet, and your model runner (Ollama or vLLM)
Caddy or Traefik in front to terminate TLS with Let’s Encrypt
A reverse-proxy rule directing /skynet/* to the AI services container and everything else to Jitsi
A small .env file with the JWT secret Skynet uses to authenticate Jitsi → AI requests
A separate volume for recording storage; assume around 1 GB per hour for HD video

Jitsi’s official Docker examples cover the meeting half. Skynet’s own documentation covers the AI half. The gap to close is the integration layer between them and the model-runner setup — which is where the PoC time goes.

Costs in plain English

All figures GBP, UK pricing, ex-VAT, verified June 2026. These are illustrative; your usage pattern will move them.

Path	One-off	Monthly	Best for
CPU-only	£0	£35–60	Teams up to ~10, batch transcripts overnight, low video volume
Rented GPU	£0	£100–300	Teams up to ~30, real-time captions, predictable opex, no on-prem space
DIY rig	£500–1,500	~£15 (electricity)	Long-term build, regular AI use, somewhere to put the box

Against Microsoft 365 with the Copilot add-on — currently roughly £30–70/user/month depending on tier and Copilot pricing on the day — the self-hosted stack pays itself back inside a year for any team of 10+ and inside about three months at 30+. The catch isn’t the money; it’s the time you’ll spend running it.

When NOT to do this

I’m allergic to “self-host everything” zealotry. Here’s where I’d stay on Teams (or Google Meet, or Zoom):

/ pros

You handle regulated client data (legal, healthcare, financial) and need a defensible answer for "where does meeting content go?"
You run frequent multi-party meetings and want AI features without a per-seat licence
You have at least one Linux-comfortable person who can babysit Docker for 2–4 hours a month
You're past 10 seats — the maths starts winning hard

/ cons

Your team is small (fewer than 5 people) and meetings are infrequent — the maintenance burden swamps the savings
Nobody on your team is comfortable with Docker, Linux, or cert management
You've no GDPR exposure and your clients don't care where their data goes
You depend on Microsoft 365 deeply for everything else and the friction of leaving Teams cancels the benefit

The last one is real. If your whole workflow runs on SharePoint, Outlook, and Teams chat, ripping out only the video piece is more friction than it’s worth. The self-hosted path makes sense when you’re already partway out of the Microsoft tent, or when the data-control argument is load-bearing for client work.

What I’d verify in the PoC

Since I haven’t deployed this yet, here’s the honest punch list I’d work through before recommending it to anyone irreversibly:

Call quality at concurrency. Five participants is easy; thirty on the same JVB is the test.
Real-time Whisper latency on rented vs DIY GPU. Anything over ~3 seconds end-to-end starts feeling laggy in live captions.
Transcript storage workflow. Where transcripts live, who can read them, and how they get into the existing knowledge base.
Failover behaviour. What happens when the AI tier is down. Meetings need to keep working with degraded features, not break entirely.
The Microsoft compatibility tail. Calendar invites, SSO, dial-in numbers — every Teams workflow your team relies on needs a replacement or a deliberate “we don’t do that any more”.

I’ll write that follow-up when I’ve actually shipped one of these stacks. If you’ve already done the build and want to compare notes, I’d genuinely love to hear from you — the case for self-hosting only gets stronger with shared receipts.