Our approach

How engineering
actually gets done
around here.

Most MSP automation lives in Power Automate flows someone built once and everyone's been afraid to touch since. We work the way an engineering team should: stack review first, target architecture before code, two-week build increments with version control and tests, then a real run-state with on-call and SLAs. Boring on purpose.

01 / Stack review — Audit, baseline, written PoV
02 / Architect — Target state, sequencing, SLOs
03 / Build — Two-week increments, in production
04 / Operate — SRE-grade run-state, optional handover
01 / Stack review

Find where the toil is actually hiding.

What happens in this phase

Two weeks. We sit with your operations lead and your senior tech, walk the stack the way it's actually used — ConnectWise queues, NinjaOne policies, the M365 tenants that drift, the shared credential store everyone forgot about — and write down what we see.

We map the round-trip: where a ticket comes in, every handoff it survives, every system it touches, and which of those handoffs is a Power Automate flow held together with hope. We pull a real sample of P1 and P3 tickets and trace them.

No slide deck. We leave with a written point of view: what's automatable now, what's a baseline problem first, and which 6-week sprint pays back fastest.

Out of this phase

  • D1 · Stack inventory across PSA, RMM, tenant, security, distribution, documentation
  • D2 · Toil map — the 10 highest-frequency manual handoffs, ranked by hours/month
  • D3 · Drift baseline — where current automation has decayed since it was last shipped
  • D4 · Written PoV — what to build first, what to fix first, what to leave alone
  • D5 · Recommended first sprint (or "you don't need a sprint, here's a 4-hour fix")
  • D6 · Risk register — what breaks if the wrong thing ships first
// Read the queue

The ticket queue is the operating model. We start there, not on the integrations roadmap your last vendor left.

// Drift is a defect

If your conditional access policy doesn't match across tenants, that's a bug. We surface drift before we propose a build.
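What that check looks like in practice, as a minimal sketch — the tenant names and policy fields below are hypothetical, and a real audit would pull live policies from each tenant (for example via Microsoft Graph) rather than hard-code them:

```python
# Sketch: flag conditional-access drift across tenants against a baseline.
# Tenant data and policy fields are illustrative stand-ins; a real check
# would fetch each tenant's policies from the Graph API instead.

BASELINE = {"mfa_required": True, "block_legacy_auth": True, "session_hours": 8}

tenants = {
    "contoso":  {"mfa_required": True, "block_legacy_auth": True,  "session_hours": 8},
    "fabrikam": {"mfa_required": True, "block_legacy_auth": False, "session_hours": 24},
}

def find_drift(tenants, baseline):
    """Return {tenant: {setting: (expected, actual)}} for every mismatch."""
    drift = {}
    for name, policy in tenants.items():
        diffs = {
            key: (expected, policy.get(key))
            for key, expected in baseline.items()
            if policy.get(key) != expected
        }
        if diffs:
            drift[name] = diffs  # each entry is filed as a defect, not a shrug
    return drift

print(find_drift(tenants, BASELINE))
# fabrikam drifted on block_legacy_auth and session_hours
```

Every non-empty entry in the result is a bug to be logged and fixed, which is exactly the posture described above.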

// Toil over tools

We don't recommend tools first. We measure where the human minutes are going, then choose the tool that removes the most.

02 / Architect

A target you can actually ship to.

What happens in this phase

One week of design. We draw the target state — data flow, system of record per object (ticket, asset, user, agreement), idempotency keys, retry semantics, where the queue lives, what the on-call surface looks like.

We pressure-test against your existing security posture (Entra app registrations, scopes, secret rotation), your compliance reality (SOC2, HIPAA where it touches), and the boring stuff that kills projects in week five: rate limits, webhook delivery guarantees, what happens when ConnectWise is down.

Output is a short architecture doc your senior tech can review and a sequencing plan your ops lead can defend to the team.

Out of this phase

  • D1 · Target architecture — systems of record, data flow, source of truth per object
  • D2 · Idempotency + retry contract — how the same event stays safe under retry
  • D3 · Auth + secret model — Entra app registrations, Graph scopes, rotation cadence
  • D4 · SLOs — latency, freshness, error budget per integration
  • D5 · Sequencing plan — what ships in week 2, week 4, week 6, with exit criteria
  • D6 · Failure-mode register — what breaks, how we know, what the runbook says
// Idempotent by default

Every webhook, every retry, every replay must converge to the same state. If it doesn't, we redesign it before code.
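A minimal sketch of that convergence property — the event shape and the in-memory dicts are stand-ins for whatever queue and database the real integration uses:

```python
# Sketch: an idempotent event handler. Applying the same event once,
# twice, or on replay converges to the same state. The in-memory dicts
# stand in for a real database and dedup store.

state = {}       # ticket_id -> status (the system of record)
applied = set()  # idempotency keys already processed

def handle(event):
    """Apply a ticket-status event exactly once, keyed by event_id."""
    key = event["event_id"]
    if key in applied:
        return state[event["ticket_id"]]  # replay: no-op, same result
    state[event["ticket_id"]] = event["status"]
    applied.add(key)
    return state[event["ticket_id"]]

evt = {"event_id": "evt-123", "ticket_id": "T-9", "status": "closed"}
handle(evt)
handle(evt)  # retry delivers the same event again
handle(evt)  # and a replay delivers it once more
# state converges to {"T-9": "closed"}; the event was applied exactly once
```

If a handler can't be written this way — because the downstream API isn't safe under duplicate delivery — that's the contract we redesign before any code ships.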

// One source of truth per object

Tickets live in PSA. Assets live in RMM. Agreements live in PSA. We sync, we don't fork. Forking is how you get drift.

// SLOs before features

We agree on freshness, latency and error budget before the first commit. Without that, "done" is a vibe, not a target.
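Agreeing the numbers up front makes "done" arithmetic. As an illustration (the 99.5% target and 7-day window are examples, not a recommendation), an SLO converts directly into an error budget:

```python
# Sketch: turn an SLO into a concrete error budget for a window.
# The 99.5% target and 7-day window are illustrative numbers.

def error_budget_minutes(slo: float, window_days: int) -> float:
    """Minutes per window the integration is allowed to be out of SLO."""
    return (1 - slo) * window_days * 24 * 60

budget = error_budget_minutes(0.995, 7)
print(f"{budget:.1f} minutes of stale data allowed per week")  # 50.4
```

Once that number is written down, "is it done?" becomes a measurement, not a vibe.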

03 / Build

Working flows in your tenant by week three.

What happens in this phase

Two-week increments. Production-quality from day one — code in your git or ours, tests against the real PSA and RMM APIs, structured logs, metrics, alerts wired to a real on-call surface (PagerDuty, Opsgenie, whatever you use).

We pair on the first integration with your senior tech so they own it on day one rather than day ninety. Everything is reviewable, every change is deployable, every flow has a runbook checked into the same repo as the code.

By the end of week 3 of a 6-week sprint, something real is running in your environment under load. Not a demo, not a sandbox. The actual integration.

Out of this phase

  • D1 · Source code in version control — yours or ours, your choice
  • D2 · Test suite — contract tests against PSA, RMM, Graph, EDR APIs
  • D3 · Observability — structured logs, metrics, traces, queue depth dashboards
  • D4 · CI/CD — every merge deploys, every deploy is reversible
  • D5 · Runbooks shipped alongside code — same repo, same review
  • D6 · Pair-engineered with your senior tech so handover is the default state
// Production from commit one

No "we'll harden it later." First merge runs against the real APIs with the real auth and the real observability.

// Runbooks ship with code

The on-call doc lives in the repo. PR doesn't merge without it. If the flow is unrunnable at 3am, it isn't done.

// Strangler-fig over rewrites

We route traffic gradually from the old Power Automate mess to the new integration. Cutover happens when the error budget says it's safe.
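The gradual cutover can be sketched as deterministic hash-based routing — the rollout percentage ramps up as the error budget allows, and the handler names below are placeholders:

```python
# Sketch: strangler-fig routing. A deterministic hash sends a fixed
# fraction of tickets down the new path; the rest stay on the legacy
# flow. Ramp rollout_pct up as the error budget proves the new path safe.

import zlib

def route(ticket_id: str, rollout_pct: int) -> str:
    """Stable per-ticket routing: the same ticket always takes the same path."""
    bucket = zlib.crc32(ticket_id.encode()) % 100
    return "new_pipeline" if bucket < rollout_pct else "legacy_flow"

# At 10% rollout, roughly one ticket in ten exercises the new path,
# and a given ticket never flip-flops between paths mid-rollout.
print(route("T-1042", 10))
```

The stable hash matters: a ticket that starts on the new path stays there, so partially-migrated state never straddles both systems.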

04 / Operate

Boring is a compliment.

What happens in this phase

Real run-state. We watch the integration with you for the first 30 days — queue depth, error rate, latency, drift, the boring numbers that tell you whether the thing actually works under your load.

When there's a P1, we're on the call. We pair on the first incident with your team so the muscle memory transfers. Change control is real — every config change goes through PR, every secret rotation has a runbook, every dependency upgrade has a deploy window.

When the system is boring enough that you stop looking at it, you have two choices: keep us on retainer for the next thing, or take it home. Either way, we leave the lights on.

Out of this phase

  • D1 · On-call rotation — Xentek primary, your team shadow, then flip
  • D2 · SLO dashboard — freshness, error rate, queue depth, on a screen your COO can read
  • D3 · Incident response — runbooks, post-incident reviews, tracked actions
  • D4 · Change control — versioned config, secret rotation cadence, deploy windows
  • D5 · Quarterly drift audit — config, policies, baselines, against the original SLOs
  • D6 · Handover package — or retainer continuation if you'd rather we kept it
// Pair on the first incident

Your senior tech is on the bridge with us. Muscle memory transfers in real time, not in a Notion handover doc.

// MTTR over MTBF

Things break. We optimise for how fast you find out and how fast it's fixed — not for the fantasy that nothing fails.

// Boring on purpose

The goal of operate is that you stop thinking about it. If you're checking the dashboard daily by month two, we did it wrong.

Engagement model

Three named sprints
and one retainer.

The four-step method runs inside one of these shapes. Sprint 01 is the most common starting point — it's where the toil compounds fastest in a 20–200-person MSP.

Sprint 01 · 6 weeks

PSA · RMM Stitch

Two-way sync between your PSA and primary RMM — tickets, assets, time entries, contracts. The integration you've been promised by every vendor.

  • Bi-directional ticket lifecycle
  • Asset + agent sync, idempotent
  • Time entries from RMM events
Sprint 02 · 4 weeks

Microsoft Tenant Engineering

M365 and Azure tenant operations at fleet scale — onboarding/offboarding, license rightsizing, conditional access, baseline drift detection.

  • Onboard/offboard flows per tenant
  • Conditional access templates
  • Baseline drift detection across tenants
Sprint 03 · 3 weeks

Security Alert Pipeline

EDR and MDR alerts (Huntress, SentinelOne, Defender) routed into PSA with enrichment, dedup, and severity-aware queueing.

  • Alert ingestion + normalisation
  • Host/user enrichment from PSA + RMM
  • Severity routing + on-call paging
Engagement 04 · Ongoing

MSP Engineering Retainer

Senior engineering capacity on demand — for the work that doesn't fit a sprint shape. Custom portals, dashboards, AI ticket triage, the integrations no one else will build.

  • Dedicated engineer hours / month
  • Two-week iteration cadence
  • Pause or scale at month boundary
Outcome-tied pricing on first engagement.

A real document

What a Sprint 01 SOW
actually looks like.

Not a slide deck. Not “a few weeks of integration.” A short SOW with line items, sequencing, and exit criteria you can hold us to.

Sprint 01 · sample

PSA · RMM Stitch — ConnectWise + NinjaOne

Six weeks. Five line items. Each ships behind the previous one's exit criteria. No “phase 2” that never lands.

  • Line 01
    Bi-directional ticket sync. ConnectWise ↔ NinjaOne. Status, owner, notes, attachments.
    Exit: 50 round-trips, 0 drift, 0 duplicate tickets in production sample.
  • Line 02
    Asset + agent reconciliation. Single source of truth on the agent record; PSA reflects within 60s.
    Exit: 99.5% freshness over a 7-day measurement window.
  • Line 03
    Automatic time entry from RMM events. Remote sessions and policy actions land as PSA time entries against the right ticket.
    Exit: spot audit of 100 sessions, <2% manual correction rate.
  • Line 04
    Observability + on-call. Queue depth, error rate, lag, alert wiring to your existing on-call surface.
    Exit: alerting tested via injected failure, paged within 60s.
  • Line 05
    Runbooks + handover. Operator docs in repo, two pair-debugging sessions with your senior tech, sign-off review.
    Exit: your tech runs a recovery scenario unassisted.
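The Line 02 exit criterion — 99.5% freshness over a 7-day window — is measurable from sync logs. A minimal sketch, assuming each sync record carries the RMM-side change time and the PSA-side reflect time (the sample data below is illustrative):

```python
# Sketch: measure the freshness exit criterion from sync records.
# Each record is (changed_at_rmm, reflected_at_psa) in epoch seconds;
# the sample data is illustrative, not real measurement output.

def freshness_ratio(records, threshold_s=60):
    """Fraction of updates reflected in the PSA within threshold_s."""
    fresh = sum(1 for changed, reflected in records
                if reflected - changed <= threshold_s)
    return fresh / len(records)

records = [(0, 12), (100, 145), (200, 290), (300, 310)]  # one lands at 90s lag
ratio = freshness_ratio(records)
print(f"{ratio:.1%} reflected within 60s")  # 75.0% — below the 99.5% exit bar
```

The exit check is then a single comparison against the agreed SLO, run over the full 7-day window rather than four samples.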
Engineering principles

The rules we won't
negotiate on.

These show up in every sprint we run. If a constraint forces us to break one, that's a conversation — not a quiet exception.

// Idempotent by default

The same event applied twice yields the same state.

Every webhook handler, every retry, every replay. If the contract isn't idempotent, we redesign the contract.

// Drift is a defect

Config, policy, baseline — drift gets a bug, not a shrug.

If your conditional access drifts across tenants, that's logged, owned, and fixed like any other defect.

// Runbooks ship with code

If it's unrunnable at 3am, it isn't done.

Operator docs live in the same repo, reviewed in the same PR. Build and operate are one workflow.

// Observability before optimisation

You can't tune what you can't see.

Structured logs, metrics, traces and queue dashboards land before performance work. No premature speedups.

// Strangler-fig over rewrites

Route traffic gradually from old to new.

Cutover when error budget says it's safe. Big-bang rewrites of someone's Power Automate mess fail every time.

// Pair on the first incident

Handover happens on the bridge.

Your senior tech is on the call with us when it first breaks in production. Muscle memory transfers in real time.

Two weeks. One written
point of view on your stack.