Shipping Reports Without Shipping Software: Building a Data-Driven Reporting Platform

The Brief: Reporting Where Logic Ships Without Deployments

In October 2025, Bossie Hurley engaged me to design and build, from the ground up, a cloud-native reporting platform for one of their global enterprise SaaS clients. The system generates scheduled PDF reports of monitoring and alert data, delivered to the client’s own customers on daily and weekly cadences.

Bossie Hurley describe the shape of their client engagements this way:

Each phase of your project is supported, from brainstorming and architecture design to seamless implementation, efficient operations, and ongoing maintenance.

What follows covers the design-through-build portion of that arc.

The brief came with an unusual constraint. A reporting system built the conventional way would put every business change behind an engineering deploy: a new metric, a tweaked filter, a reshaped grouping, each one another ticket, another release, another window for the change to drift away from the request that prompted it. Bossie Hurley’s brief explicitly ruled that out. The system had to let new report types and changes to existing ones move at the speed of configuration, not deployment.

That constraint shaped almost every architectural decision that followed. It moved business logic out of compiled code and into structured configuration. It forced a stricter separation between orchestration, rendering, and storage than I would otherwise have designed. And it pushed the testing strategy hard, because once logic lives as data, the compiler can no longer catch what used to be a type error.

What follows is how I built it, the decisions that proved worth making, and where I would tell another team to start if they were trying to do the same thing.

Configuration Over Code: A Data-Driven Aggregation Pipeline

The central architectural idea is that a report’s definition does not live in code. Every part of it (the filters, the groupings, the enrichments from external systems, the calculated metrics, even the template it renders through) lives as JSON in a central Cosmos DB store. The reporting service fetches the relevant definition at runtime and evaluates it through an expression engine, building each report from configuration rather than from any class hierarchy compiled into the service itself.

A new report type starts its life as a JSON document submitted to the configuration store. The reporting service picks it up on the next scheduled run. There is no code change, no pull request, no deployment window. The cadence of business change has been decoupled from the cadence of software release, which, for any system that mostly does business arithmetic on top of stored data, is exactly where the value lies.

The expression engine that does this evaluation is NCalc, an established mathematical and logical expression evaluator for .NET. Every aggregation stage and every calculated metric is expressed as an NCalc expression in the configuration. Square-bracketed references such as [Data.IsFail] evaluate against the current pipeline item, aggregate functions such as Sum([Data.FailureCount]) evaluate across the items in a group, and the system supports the operators and conditionals you would expect.
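NCalc treats the text inside square brackets as a parameter name, so something has to supply a value for each reference before an expression is evaluated. The article does not say how the platform binds them. The Python sketch below (Python for brevity; the platform itself is .NET) shows one plausible approach, assuming each reference is a dotted path into a nested item. All function names are hypothetical.

```python
import re
from typing import Any, Dict, List, Mapping

# A reference is anything inside square brackets, e.g. [Data.IsFail].
_REFERENCE = re.compile(r"\[([^\]]+)\]")


def find_references(expression: str) -> List[str]:
    """Return every bracketed reference in an expression, in order of appearance."""
    return _REFERENCE.findall(expression)


def resolve(item: Mapping[str, Any], path: str) -> Any:
    """Walk a dotted path such as 'Data.Severity' through nested mappings."""
    value: Any = item
    for part in path.split("."):
        if not isinstance(value, Mapping) or part not in value:
            # An unknown reference is a configuration error: fail loudly.
            raise KeyError(f"Unknown reference [{path}]")
        value = value[part]
    return value


def bind_parameters(expression: str, item: Mapping[str, Any]) -> Dict[str, Any]:
    """Map each reference name to its value for the current pipeline item."""
    return {path: resolve(item, path) for path in find_references(expression)}
```

The resulting name-to-value map is what an expression engine would then evaluate against; failing on an unknown path turns a typo in the configuration into an immediate error rather than a silently wrong report.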

ℹ️ Note

NCalc is a small, focused library that takes a string like [Data.IsFail] == true && [Data.Severity] == 'Critical' and returns a boolean. It has been around for years, adds no overhead beyond parsing and evaluating each expression, and pairs cleanly with JSON because expressions are themselves strings. The alternative would have been to design a domain-specific language and write a parser for it. NCalc let us skip that step and ship behaviour that business stakeholders can reason about without learning a new syntax.

This pattern (configuration that drives behaviour, evaluated declaratively at runtime) is not new. Feature flagging applies the same instinct to a single boolean per behaviour. What is different here is the scope: not a flag, but the whole behaviour. Not “turn this on or off,” but “here is the entire report, expressed as data.”

A Vocabulary of Typed, Composable Stages

A report’s pipeline is a chain of typed aggregator stages. Each stage takes the output of the previous one, applies a specific transformation, and hands the result on. The vocabulary the platform offers includes:

  • Filter removes items whose boolean expression evaluates to false.
  • GroupBy groups items by a key expression, optionally sorts and limits the items within each group, and computes group-level metrics.
  • GroupByWithNestedItems does the same grouping work, but preserves the constituent items inside each group so that templates can iterate over them.
  • FlattenArray explodes an item containing an array property into one item per element, the configuration-driven equivalent of SQL’s UNNEST or pandas’ explode.
  • Calculate adds computed metrics to every item without filtering or grouping.
  • Transform rewrites field values via expression, producing a new immutable item rather than mutating the original in place.
  • Sort orders items globally by an expression.
  • Limit truncates output to the first N items.
  • ReplaceIds is an asynchronous stage that reaches into an external service to substitute opaque identifiers with human-readable names.
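To make the chaining concrete, here is a minimal Python sketch of a few of these stages as plain functions over lists of items, composed with functools.reduce. It uses Python callables in place of NCalc expressions and hypothetical names throughout; it illustrates the shape of typed, composable stages, not the platform's implementation.

```python
from functools import reduce
from typing import Any, Callable, Dict, Iterable, List

Item = Dict[str, Any]
Stage = Callable[[List[Item]], List[Item]]


def filter_stage(predicate: Callable[[Item], bool]) -> Stage:
    """Filter: keep only items for which the predicate is true."""
    return lambda items: [item for item in items if predicate(item)]


def flatten_array(field: str) -> Stage:
    """FlattenArray: emit one item per element of the array held in `field`."""
    def run(items: List[Item]) -> List[Item]:
        return [{**item, field: element} for item in items for element in item[field]]
    return run


def transform(field: str, fn: Callable[[Item], Any]) -> Stage:
    """Transform: rewrite a field by building a new dict instead of mutating the input."""
    return lambda items: [{**item, field: fn(item)} for item in items]


def sort_stage(key: Callable[[Item], Any], descending: bool = False) -> Stage:
    """Sort: order all items globally by a key."""
    return lambda items: sorted(items, key=key, reverse=descending)


def limit(n: int) -> Stage:
    """Limit: keep the first n items."""
    return lambda items: items[:n]


def run_pipeline(stages: Iterable[Stage], items: List[Item]) -> List[Item]:
    """Feed each stage the previous stage's output."""
    return reduce(lambda acc, stage: stage(acc), stages, items)
```

Because every stage has the same input and output shape, a new stage type is one more function of that shape, and existing stages are untouched.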

Because the stages are typed and composable, the same vocabulary serves wildly different reports across the client’s monitoring estate. The same Filter and GroupBy building blocks produce a per-host failure breakdown for database health, a most-recent-run summary for scheduled jobs, and an exploded view of error messages from an event log. The reports look nothing like each other; the pipeline primitives are identical.
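The most-recent-run summary is a good example of GroupBy doing more than aggregation: group by job, sort each group newest first, keep one item per group. Here is a Python sketch under stated assumptions: hypothetical field names, callables instead of NCalc expressions, and metrics computed over the whole group before the per-group limit is applied (the article does not say which order the platform uses).

```python
from itertools import groupby
from typing import Any, Callable, Dict, List, Optional

Item = Dict[str, Any]


def group_by(
    items: List[Item],
    key: Callable[[Item], Any],
    sort_key: Optional[Callable[[Item], Any]] = None,
    descending: bool = False,
    limit: Optional[int] = None,
    metrics: Optional[Dict[str, Callable[[List[Item]], Any]]] = None,
) -> List[Dict[str, Any]]:
    """GroupBy with optional per-group sort, per-group limit and group metrics."""
    groups = []
    # itertools.groupby only merges adjacent items, so sort by the key first.
    for group_key, members_iter in groupby(sorted(items, key=key), key=key):
        members = list(members_iter)
        # Metrics see the whole group, before any per-group limit (an assumption).
        computed = {name: fn(members) for name, fn in (metrics or {}).items()}
        if sort_key is not None:
            members.sort(key=sort_key, reverse=descending)
        if limit is not None:
            members = members[:limit]
        groups.append({"Key": group_key, "Items": members, "Metrics": computed})
    return groups
```

Keeping the surviving items inside each group corresponds roughly to GroupByWithNestedItems; a plain GroupBy would emit only the key and metrics.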

Here is a small worked example. The configuration below describes a report that filters to high-value orders, groups them by region, and sorts the regions by total revenue. None of the field names appear anywhere in the real client’s data; this is purely illustrative.

[
  {
    "Type": "Filter",
    "Name": "highValueOnly",
    "FilterExpression": "[Data.OrderTotal] > 500"
  },
  {
    "Type": "GroupBy",
    "Name": "byRegion",
    "GroupByExpression": "[Data.Region]",
    "Metrics": [
      { "Name": "regionName",   "Expression": "[Data.Region]" },
      { "Name": "orderCount",   "Expression": "Count()" },
      { "Name": "totalRevenue", "Expression": "Sum([Data.OrderTotal])" }
    ]
  },
  {
    "Type": "Sort",
    "Name": "byRevenue",
    "SortByExpression": "[Metrics.totalRevenue]",
    "SortDirection": "Descending"
  }
]

Adding a new region-by-region report is, in operational terms, three JSON objects in a configuration document. Raising the threshold from 500 to 1,000 is a single integer change. Neither requires a compile, a deploy, or even a service restart.
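To show that a configuration like the one above really is executable as data, here is a deliberately tiny Python interpreter for it. It is a stand-in for NCalc that understands only the expression shapes this example uses (a bare reference, a greater-than comparison, Count() and Sum(...)); the function names and the {"Metrics": ...} output shape are assumptions, not the platform's.

```python
import json
import re
from typing import Any, Dict, List

# Only the expression shapes used by the worked example are understood here.
_REF = re.compile(r"^\[([^\]]+)\]$")
_GREATER = re.compile(r"^\[([^\]]+)\]\s*>\s*(-?\d+(?:\.\d+)?)$")
_SUM = re.compile(r"^Sum\(\[([^\]]+)\]\)$")

CONFIG = json.loads("""
[
  {"Type": "Filter", "Name": "highValueOnly", "FilterExpression": "[Data.OrderTotal] > 500"},
  {"Type": "GroupBy", "Name": "byRegion", "GroupByExpression": "[Data.Region]",
   "Metrics": [
     {"Name": "regionName",   "Expression": "[Data.Region]"},
     {"Name": "orderCount",   "Expression": "Count()"},
     {"Name": "totalRevenue", "Expression": "Sum([Data.OrderTotal])"}]},
  {"Type": "Sort", "Name": "byRevenue", "SortByExpression": "[Metrics.totalRevenue]",
   "SortDirection": "Descending"}
]
""")


def lookup(item: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'Data.OrderTotal' into nested dicts."""
    value: Any = item
    for part in path.split("."):
        value = value[part]
    return value


def eval_item(expr: str, item: Dict[str, Any]) -> Any:
    """Per-item expressions: a greater-than comparison or a bare reference."""
    match = _GREATER.match(expr)
    if match:
        return lookup(item, match.group(1)) > float(match.group(2))
    match = _REF.match(expr)
    if match:
        return lookup(item, match.group(1))
    raise ValueError(f"Unsupported expression: {expr}")


def eval_group(expr: str, members: List[Dict[str, Any]]) -> Any:
    """Group metrics: Count(), Sum([...]), or a reference read from the first member."""
    if expr == "Count()":
        return len(members)
    match = _SUM.match(expr)
    if match:
        return sum(lookup(m, match.group(1)) for m in members)
    return eval_item(expr, members[0])


def run(config: List[Dict[str, Any]], items: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Apply each configured stage, in order, to the previous stage's output."""
    for stage in config:
        kind = stage["Type"]
        if kind == "Filter":
            items = [i for i in items if eval_item(stage["FilterExpression"], i)]
        elif kind == "GroupBy":
            groups: Dict[Any, List[Dict[str, Any]]] = {}
            for i in items:
                groups.setdefault(eval_item(stage["GroupByExpression"], i), []).append(i)
            items = [
                {"Metrics": {m["Name"]: eval_group(m["Expression"], members)
                             for m in stage["Metrics"]}}
                for members in groups.values()
            ]
        elif kind == "Sort":
            expr = stage["SortByExpression"]
            items = sorted(items, key=lambda i: eval_item(expr, i),
                           reverse=stage.get("SortDirection") == "Descending")
        else:
            raise ValueError(f"Unknown stage type: {kind}")
    return items
```

Raising the threshold to 1,000 means editing one string in CONFIG and rerunning; the interpreter itself does not change.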

ℹ️ Note

Hardcoded values can also be replaced with per-client threshold references resolved at runtime. Swapping "[Data.OrderTotal] > 500" for "[Data.OrderTotal] > [TH:HighValueOrderLimit]" tells the system to look up HighValueOrderLimit in a client-specific Thresholds table and substitute its value before evaluating the expression.
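That substitution step can be sketched in Python as follows, assuming thresholds arrive as a simple name-to-number mapping for the current client. The [TH:Name] syntax comes from the note above; the function name, the numeric-only restriction and the choice to fail loudly on a missing threshold are assumptions.

```python
import re
from typing import Mapping, Union

Number = Union[int, float]

# [TH:Name] marks a per-client threshold; the name must be a plain identifier.
_THRESHOLD = re.compile(r"\[TH:([A-Za-z_][A-Za-z0-9_]*)\]")


def substitute_thresholds(expression: str, thresholds: Mapping[str, Number]) -> str:
    """Replace every [TH:Name] with the client's numeric threshold value."""

    def replace(match) -> str:
        name = match.group(1)
        if name not in thresholds:
            # Fail loudly: evaluating with a missing threshold would silently misreport.
            raise KeyError(f"No threshold '{name}' configured for this client")
        value = thresholds[name]
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"Threshold '{name}' must be numeric, got {value!r}")
        return repr(value)

    return _THRESHOLD.sub(replace, expression)
```

The substituted string is an ordinary expression again, so nothing downstream needs to know that thresholds exist; ordinary references such as [Data.OrderTotal] pass through untouched.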

A useful side benefit emerged later: because the leading filter stages of each pipeline run before anything reshapes the data, they can be translated automatically into server-side Cosmos DB WHERE clauses, so the database returns only the rows the pipeline will keep. The reporting service does less work, the database returns less data, and the round trip is shorter. The optimisation needed no per-report configuration; it follows from the stage type itself.
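The pushdown can be sketched in Python under some loud assumptions: only filters of the simple form [Path] <op> literal are translated, pipeline paths are assumed to map directly onto document properties (so [Data.OrderTotal] becomes c.Data.OrderTotal), and translation stops at the first stage it cannot handle, because filters after a GroupBy or Transform see reshaped data rather than stored documents. Values go into query parameters in name/value form instead of being spliced into the SQL text. Function names are hypothetical.

```python
import re
from typing import Any, Dict, List, Tuple

# Only "[Path] <op> literal" filters are translated; anything else stays in memory.
_SIMPLE = re.compile(
    r"^\[([A-Za-z_][\w.]*)\]\s*(==|!=|>=|<=|>|<)\s*('[^']*'|-?\d+(?:\.\d+)?|true|false)$"
)


def _literal(text: str) -> Any:
    """Convert an expression literal into a Python value for a query parameter."""
    if text.startswith("'"):
        return text[1:-1]
    if text in ("true", "false"):
        return text == "true"
    return float(text) if "." in text else int(text)


def push_down_filters(
    stages: List[Dict[str, Any]]
) -> Tuple[str, List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Return (query, parameters, remaining_stages) for a pipeline."""
    clauses: List[str] = []
    params: List[Dict[str, Any]] = []
    for stage in stages:
        if stage.get("Type") != "Filter":
            break  # later filters would see reshaped data, not stored documents
        match = _SIMPLE.match(stage["FilterExpression"].strip())
        if match is None:
            break  # unsupported shape: this filter and everything after run in memory
        path, op, literal = match.groups()
        name = f"@p{len(params)}"
        # Cosmos DB SQL uses '=' for equality; the other operators carry over.
        clauses.append(f"c.{path} {'=' if op == '==' else op} {name}")
        params.append({"name": name, "value": _literal(literal)})
    query = "SELECT * FROM c"
    if clauses:
        query += " WHERE " + " AND ".join(clauses)
    return query, params, stages[len(clauses):]
```

A compound expression such as [Data.IsFail] == true && [Data.Severity] == 'Critical' is simply left for in-memory evaluation here; a fuller translator could split on && first.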

Templates and PDF Rendering: Scriban Plus Playwright

Templates live in configuration too. The same central data store that holds report definitions also holds the HTML templates that render them. Editing a template is the same operation as editing a report: a configuration change, picked up at the next run.

The template engine is Scriban, chosen over Razor for two reasons. The first is dependency weight. Razor would have meant taking a full ASP.NET Core dependency on what needed to stay a light, fast service. That mattered doubly because the first cut of the system was Azure Functions, where cold-start time and package size both bite hard. Scriban is around 200KB, designed from the start for loading templates dynamically from strings, and pairs naturally with a configuration store because templates are themselves strings. The second reason is licensing. Scriban ships under BSD-2-Clause, with no copyleft and no runtime fees.

PDF generation is handled by Playwright, chosen over IronPDF and Gotenberg. The fidelity argument is the obvious one: Playwright renders modern HTML and CSS the way a real browser does, because it is a real browser. Lighter PDF engines often handle the easy cases and then quietly fail on the things designers actually use, like flexbox or grid or modern font features. The licensing argument runs deeper for enterprise customers: Playwright’s core is Apache 2.0, its .NET bindings are MIT, and Microsoft publishes official Docker images with Chromium baked in. For a SaaS being delivered to enterprises that run their own supply-chain reviews, having no copyleft anywhere in the stack is a deliberate design choice.

ℹ️ Note

Every library in the reporting platform’s stack ships under a permissive licence: BSD-2-Clause for Scriban, MIT and Apache 2.0 for Playwright, MIT for NCalc, and the same pattern across every other dependency I added. Total licensing cost is zero, and there is no copyleft anywhere in the system to complicate the supply-chain conversation with the end client’s procurement function.

Two Containers, One System

The deployed system runs as two services on Azure Container Apps. The Web API handles incoming report requests: it fetches the relevant configuration, executes the aggregation pipeline, renders the HTML template, calls the PDF service, and writes the returned PDF to Azure Blob Storage. The PDF service does nothing but accept rendered HTML and return a PDF, with a singleton headless Chromium browser instance kept warm inside it. The two services scale independently. Most of the load is on the orchestration side; the PDF service spends most of its time idle but needs to spin up fast when a batch lands.
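The Web API's sequence of steps can be sketched in Python with each external dependency injected as a callable, which keeps the flow testable without Cosmos DB, Blob Storage or a browser. The ReportServices type, the Pipeline and Template keys and the blob naming scheme are all hypothetical; the correlation identifier is passed to the PDF call and used in the blob name in the spirit of the end-to-end tracing the platform uses.

```python
import uuid
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class ReportServices:
    """Hypothetical seams for each external dependency of the orchestrator."""
    load_definition: Callable[[str], Dict[str, Any]]           # e.g. a Cosmos DB read
    load_items: Callable[[Dict[str, Any]], List[Any]]          # source monitoring data
    run_pipeline: Callable[[List[Any], List[Any]], List[Any]]  # aggregator stages
    render_html: Callable[[str, List[Any]], str]               # template engine
    render_pdf: Callable[[str, str], bytes]                    # call to the PDF service
    save_blob: Callable[[str, bytes], str]                     # blob upload


def generate_report(
    services: ReportServices,
    report_type: str,
    tenant: str,
    correlation_id: Optional[str] = None,
) -> str:
    """Fetch config, aggregate, render HTML, render PDF, store it; return the blob name."""
    correlation_id = correlation_id or str(uuid.uuid4())
    definition = services.load_definition(report_type)  # fetched at runtime, not compiled in
    items = services.load_items(definition)
    rows = services.run_pipeline(definition["Pipeline"], items)
    html = services.render_html(definition["Template"], rows)
    pdf = services.render_pdf(html, correlation_id)  # the same ID travels with the call
    blob_name = f"{tenant}/{report_type}/{correlation_id}.pdf"
    return services.save_blob(blob_name, pdf)
```

Because the PDF renderer is just another injected callable, swapping the in-process stub for an HTTP client to the separate PDF container changes nothing in the orchestration code.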

The split was not the original design. The October 2025 plan had us running on Azure Functions for orchestration, with the PDF service as a single Container App. That hybrid made sense when the surface area looked small. Then the aggregator pipeline arrived and added real complexity to the orchestration layer, the Web API started doing materially more than the original brief anticipated, and the Functions budget stopped being a fit. We migrated orchestration into its own Container App, paired with the existing PDF container, and the two have scaled independently with demand ever since.

The rest of the production-readiness story is the unremarkable kind. Cross-service calls go through Polly, configured with retry and circuit-breaker policies. Correlation identifiers thread through the entire request lifecycle, so a failure in the PDF service can be traced back to the report request that triggered it. Errors surface as RFC 7807 problem responses, not stack traces. Liveness and readiness probes were in both services from the first commit. PDFs land in Azure Blob Storage via managed identity; there are no static secrets in the runtime.
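Polly is a .NET library, so the following is not its API. It is a minimal Python illustration of the two policies named above, assuming a retry with exponential backoff wrapped around a consecutive-failure circuit breaker; the clock and sleep functions are injectable so the behaviour can be exercised without real waiting. Class and function names are hypothetical.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the dependency while the breaker is open."""


class CircuitBreaker:
    """Opens after `failure_threshold` consecutive failures; allows a trial after `reset_after` s."""

    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], Any]) -> Any:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; not calling the dependency")
            self.opened_at = None  # half-open: let one trial call through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip, or re-trip after a failed trial
            raise
        self.failures = 0  # any success closes the circuit fully
        return result


def retry(fn: Callable[[], Any], attempts: int = 3, base_delay: float = 0.2,
          sleep: Callable[[float], None] = time.sleep) -> Any:
    """Call fn up to `attempts` times with exponential backoff between tries."""
    for attempt in range(attempts):
        try:
            return fn()
        except CircuitOpenError:
            raise  # retrying against an open circuit defeats its purpose
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

Wrapping a call to the PDF service as retry(lambda: breaker.call(render)) gives transient failures a few chances, while a persistently failing service trips the breaker and callers fail fast instead of piling up.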

How a Configuration-Driven System Stays Trustworthy

A configuration-driven system has a sharp downside that nobody mentions in the design phase: when business logic moves out of code, the compiler stops catching the things the compiler used to catch. A typo in a JSON expression does not fail at build time. A pipeline that drops every item silently looks healthy until the resulting PDF is empty. A new aggregator type that misbehaves on edge-case data degrades quietly over weeks before anyone notices.
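One way to claw back some of that lost compile-time checking is a validation pass that could run whenever a definition is submitted. The Python sketch below is an illustration under assumptions: it knows only the three stage types whose fields appear in the worked example (Filter, GroupBy, Sort), and it checks for unknown types, missing fields, duplicate stage names and unbalanced brackets, not full expression syntax. Names are hypothetical.

```python
from typing import Any, Dict, List

# Required fields for the stage types whose shape appears in the worked example.
# Other stage types (FlattenArray, Limit, ...) would need entries of their own.
REQUIRED_FIELDS = {
    "Filter": ("Name", "FilterExpression"),
    "GroupBy": ("Name", "GroupByExpression"),
    "Sort": ("Name", "SortByExpression"),
}


def brackets_balanced(expression: str) -> bool:
    """Check () and [] nesting, ignoring anything inside '...' string literals."""
    closers = {")": "(", "]": "["}
    stack: List[str] = []
    in_string = False
    for ch in expression:
        if ch == "'":
            in_string = not in_string
        elif in_string:
            continue
        elif ch in "([":
            stack.append(ch)
        elif ch in closers:
            if not stack or stack.pop() != closers[ch]:
                return False
    return not stack and not in_string


def validate_pipeline(stages: List[Dict[str, Any]]) -> List[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    errors: List[str] = []
    seen_names = set()
    for position, stage in enumerate(stages):
        label = f"stage {position} ({stage.get('Name', '?')})"
        kind = stage.get("Type")
        if kind not in REQUIRED_FIELDS:
            errors.append(f"{label}: unknown stage type {kind!r}")
            continue
        for field in REQUIRED_FIELDS[kind]:
            if not stage.get(field):
                errors.append(f"{label}: missing {field}")
        name = stage.get("Name")
        if name and name in seen_names:
            errors.append(f"{label}: duplicate stage name")
        seen_names.add(name)
        expressions = [v for k, v in stage.items()
                       if k.endswith("Expression") and isinstance(v, str)]
        for metric in stage.get("Metrics", []):
            if not metric.get("Name") or not metric.get("Expression"):
                errors.append(f"{label}: metric needs both Name and Expression")
            else:
                expressions.append(metric["Expression"])
        for expr in expressions:
            if not brackets_balanced(expr):
                errors.append(f"{label}: unbalanced brackets in {expr!r}")
    return errors
```

Collecting every problem rather than stopping at the first gives whoever submitted the configuration a complete list to fix in one pass.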

The test suite has to be the safety net the compiler is no longer providing. So I drove the build test-first throughout, across four suite types, kept above the agreed coverage target on every commit:

  • Unit tests cover individual aggregators, expression evaluation, template rendering, and the dozens of small helper functions that hold the system together.
  • Integration tests exercise the aggregator pipeline end to end against Azure Blob Storage, emulated locally with Azurite. They run on every commit and catch the wide class of bugs that unit tests miss: the ones that only show up when components actually talk to each other.
  • Contract tests verify the boundary between the Web API and the PDF service. If either service ships a breaking change in isolation, the contract suite catches it before anything reaches production.
  • Regression tests lock in fixes for past incidents. When a bug gets fixed, it gets a regression test, so the same bug cannot return unannounced.
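As an illustration of what a contract suite pins down, here is a minimal Python sketch of checks on both sides of the Web API to PDF service boundary. The request shape (html and correlationId fields) is hypothetical, since the article does not describe the real schema; the response checks cover only what any successful PDF response must satisfy: a 200 status, an application/pdf content type, and a body beginning with the %PDF- header that every PDF file starts with.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical request contract; the real field names are not given in the article.
REQUEST_FIELDS = {"html": str, "correlationId": str}


@dataclass
class PdfResponse:
    status: int
    headers: Dict[str, str] = field(default_factory=dict)
    body: bytes = b""


def request_violations(payload: Dict[str, object]) -> List[str]:
    """What the PDF service (provider) expects every request to contain."""
    problems = []
    for name, expected in REQUEST_FIELDS.items():
        if name not in payload:
            problems.append(f"request missing '{name}'")
        elif not isinstance(payload[name], expected):
            problems.append(f"request field '{name}' should be {expected.__name__}")
    return problems


def response_violations(response: PdfResponse) -> List[str]:
    """What the Web API (consumer) relies on in every successful response."""
    problems = []
    if response.status != 200:
        problems.append(f"expected status 200, got {response.status}")
    headers = {k.lower(): v for k, v in response.headers.items()}
    content_type = headers.get("content-type", "")
    if not content_type.startswith("application/pdf"):
        problems.append(f"expected application/pdf, got {content_type!r}")
    if not response.body.startswith(b"%PDF-"):
        problems.append("body is not a PDF document")
    return problems
```

In a contract suite, the Web API's tests could run request_violations against every payload it builds, and the PDF service's tests could run response_violations against real renders, so a breaking change on either side fails a test before deployment.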

Both services run together under Docker Compose for local end-to-end work, with Azurite emulating Blob Storage and WireMock.Net standing in for external HTTP dependencies. A developer working on a new aggregator runs the full stack on their laptop, including a real Chromium-driven PDF render, in seconds.

❓ Why does a configuration-driven system need more test investment, not less?

A traditional system has compile-time checks pulling in the same direction as its test suite. Both are catching mistakes; both are cheap to maintain. A configuration-driven system loses the compile-time half of that pair. Every assertion the compiler used to make for free now has to be made by a test. The test suite is not an additional safety net; it is the only safety net.

What Shipped: V1

V1 went live in May 2026, roughly seven months from the initial engagement. It serves multiple client tenants receiving daily reports, plus weekly variants for the same population, with the same Web API and the same PDF service handling both flows.

The original October 2025 design budgeted for around $0.60 per year of infrastructure spend, on the basis that scale-to-zero on the PDF container and Consumption-plan pricing on the orchestration side would push the running cost as close to zero as Azure pricing meaningfully allows. V1 came in within that envelope. For a SaaS feature being delivered into an enterprise context, the cost-to-serve sits at the bottom of the noise floor of any sensible accounting line.

The operational payoff of the data-driven design is harder to put a single number on, but the shape of it is straightforward. New report types reach production without going through a code review or a release. Edits to existing reports are configuration deltas reviewed in the same workflow as any other content change. The engineering team is no longer the bottleneck on every business question that needs a report, which was the whole point.

What I would call out as the real test of the architecture is the rate at which the team has been extending it since shipping. Composability has held up under the kind of pressure that exposes brittle abstractions: new aggregator types added without breaking existing ones, new report templates built on top of the existing stage vocabulary, new external enrichments slotted into pipelines without disturbing the rest. The system is doing what it was designed to do.

Why ‘Configuration Over Code’ Matters

Reporting systems share a particular kind of pain. The logic they encode is high-volatility (filters change, groupings change, calculations change, often in response to a single stakeholder request) while the code they live inside is low-volatility (release cycles measured in days, code review queues, compliance windows). Marrying the two means every business change incurs an engineering cost completely out of proportion to the actual work involved. A request for “the same report, but only for failed jobs” should not take a week.

The same mismatch shows up in pricing engines, eligibility rules, scoring systems, content gating, almost anywhere a business stakeholder owns the logic and an engineering team owns the deployment. Configuration-driven architectures decouple the timescales of the two. They are not free; they trade compile-time safety for runtime safety, and they push more of the design budget into the test suite to compensate. But when business-logic volatility is the dominant force in the system, that trade is straightforwardly worth making.

If you are building a system where the same engineering team keeps shipping the same kind of business-logic change month after month, the question worth asking is whether the logic belongs inside the code at all.

