Operationalizing Full-Census Data to Optimize Quality of Experience


Now that every business is becoming a digital business, customers have much higher expectations.

Customers expect to get what they want, when they want it, at the lowest price. They expect the freedom to choose between thousands of products and configurations, while at the same time receiving suggestions and content intuitively tailored to their needs. And if a problem arises, they have the agency to quickly take their business elsewhere.

Market-leading digital businesses like Amazon and Netflix made dramatic market share gains by moving faster and orienting each new software feature and deployment around achieving customer outcomes at scale.

Unfortunately, most enterprises lack the insight to get a handle on their existing technology stack, let alone bend it toward customer needs. They can try to correlate every metric and log emanating from their application estate in real time to make incremental performance improvements, but they’ll still be missing the most important part: how to establish causation between so much rich telemetry data and the key indicators of customer experience.

This new customer-centric reality sets the stage for digital transformation, where enterprises must align every aspect of their business operations around technology and data, to focus on efficiently meeting customer needs.

Observability is Disconnected From Customer Experience

As my colleague Jason Bloomberg points out in the previous article in this series, most observability platforms are not customer-centric. Instead of focusing on things like conversions and repeat business, they take a system-centric approach to data—gathering the ‘golden signals’ of latency, traffic, errors, and saturation—by looking at server logs, network packets, cloud usage metrics and the like.

This ‘inward-out’ approach to monitoring and analytics is useful for benchmarking the back-end impact of increased usage, or the network impact of regional cloud outages, or unanticipated changes to applications, or lost connections to downstream data services.

DevOps teams could even run programs that replay or synthetically simulate traffic, whether captured as browser request logs or RUM (real user monitoring), in order to test the scalability of clusters, database robustness, and API response times. All of these data are staples of optimizing IT operations, but they are tied to system events rather than specific customer experiences.

Digital transformation requires an ‘outward-in’ approach that regroups observability data toward optimizing customer experience at every significant touchpoint, all the way down to small groups and individual users. Client-side data is captured at the diffuse endpoints customers are using: whether on a smartphone, digital TV, kiosk, home sensory network, or a field technician’s tablet.

“[Customer] experience occurs in fine-grained user cohorts. Checkout may be slow only for Android operating system users running version 2.4 of a retail store’s app. There might be hundreds or thousands of users in an impacted cohort within a company’s user base.

“Finding these cohorts is a computational and AI problem that requires analysis across thousands of dimensions, such as all combinations of device models, app versions, geographical locations and other factors.”
– Aditya Ganjam, CPO & Co-founder, Conviva, in Forbes
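The cohort analysis Ganjam describes can be sketched in a few lines: group session records by their dimension tuple (device, OS, app version, geography) and flag groups whose experience metric degrades past a threshold. This is an illustrative toy, not Conviva's implementation; the field names, threshold, and sample data are all assumptions.

```python
from collections import defaultdict

# Hypothetical session records: each carries the dimensions that define a
# cohort, plus an experience metric (checkout latency in ms).
sessions = [
    {"device": "Pixel 6", "os": "Android 14", "app": "2.4", "geo": "US", "latency_ms": 4100},
    {"device": "Pixel 6", "os": "Android 14", "app": "2.4", "geo": "US", "latency_ms": 3900},
    {"device": "iPhone 15", "os": "iOS 17", "app": "2.4", "geo": "US", "latency_ms": 900},
    {"device": "iPhone 15", "os": "iOS 17", "app": "2.3", "geo": "EU", "latency_ms": 850},
]

def impacted_cohorts(sessions, threshold_ms=2500, min_sessions=2):
    """Group sessions by dimension tuple and flag cohorts whose average
    latency exceeds the experience threshold."""
    groups = defaultdict(list)
    for s in sessions:
        groups[(s["device"], s["os"], s["app"], s["geo"])].append(s["latency_ms"])
    return {
        key: sum(vals) / len(vals)
        for key, vals in groups.items()
        if len(vals) >= min_sessions and sum(vals) / len(vals) > threshold_ms
    }

print(impacted_cohorts(sessions))
# Flags only the ("Pixel 6", "Android 14", "2.4", "US") cohort at 4000.0 ms
```

In production this grouping must run across thousands of dimension combinations, which is exactly why Ganjam frames it as a computational and AI problem rather than a dashboard query.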

The Need for Full-Census Comprehensive Telemetry Data

If system-centric observability demands full-stack telemetry data, then customer-centric observability demands full-census, comprehensive telemetry data.

Full-census telemetry ingests real-time event data from every user endpoint, from web browsers to Android and iPhone devices of every supported OS and hardware variety. Then, to be comprehensive, every action, every click, every switch between apps, developer tags in front-end code, and all of the relevant performance data of responses to each of those actions should be collected.
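A comprehensive client-side event of the kind described above might be modeled like this. The schema is purely illustrative (not a real Conviva or vendor format): it just shows that each event carries a unique ID, a timestamp, the user action, developer tags from front-end code, and the observed response time.

```python
from dataclasses import dataclass, field
import time
import uuid

@dataclass
class ClientEvent:
    """One client-side telemetry event. Field names are illustrative,
    not an actual platform schema."""
    event_type: str              # e.g. "click", "app_switch", "api_response"
    target: str                  # UI element or endpoint the action touched
    duration_ms: float = 0.0     # response time observed on the client
    tags: dict = field(default_factory=dict)   # developer tags from front-end code
    event_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    timestamp: float = field(default_factory=time.time)

# One captured action out of the full census:
ev = ClientEvent("click", "checkout_button", duration_ms=120.0,
                 tags={"flow": "checkout", "app_version": "2.4"})
print(ev.event_type, ev.duration_ms)
```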

In many cases, ingesting such comprehensive end-to-end telemetry will not only drive up cloud data costs; client-side agents churning away could even bog down the responsiveness of a customer’s UI.

To make things even harder, merely sampling incoming data to save costs or capacity cannot provide full-census, comprehensive telemetry. The critical moments that impact customer experience happen in the blink of an eye, irrespective of the intervals between samples. At the same time, the full stream is far too much data for any human to sort through.

Transforming Telemetry into Actionable Insights

Once full-census client data is in hand, it’s time to start cleansing it. As in the broader data analytics world, this means metatagging each event with attributes and relevance beyond its unique ID, source, and timestamp, then filtering it so useful insights can be gleaned.

Then this data can be further correlated with system-level data, whether derived from incoming client-side data or from any number of leading observability and SIEM platforms, or their related historical data lakes.
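The two steps above — enriching each event with metadata beyond its ID/source/timestamp, then correlating it with system-level signals — can be sketched as follows. The lookup catalog, field names, and time-window join are assumptions for illustration, not a real platform API.

```python
def enrich(event, catalog):
    """Metatag an event beyond its ID, source, and timestamp, using a
    lookup catalog (e.g. device ID -> OS family, app version)."""
    return {**event, **catalog.get(event["source"], {})}

def correlate(client_events, system_events, window_s=5.0):
    """Pair each client-side event with system-level events (e.g. exported
    from an observability platform) that fall inside a shared time window."""
    paired = []
    for ce in client_events:
        matches = [se for se in system_events
                   if abs(se["ts"] - ce["ts"]) <= window_s]
        paired.append({**ce, "system_events": matches})
    return paired

catalog = {"device-123": {"os": "Android 14", "app_version": "2.4"}}
client = [{"id": "e1", "source": "device-123", "ts": 100.0, "action": "checkout"}]
system = [{"ts": 102.0, "signal": "api_5xx"}, {"ts": 300.0, "signal": "deploy"}]

paired = correlate([enrich(e, catalog) for e in client], system)
# Only the nearby api_5xx signal is attached to the checkout event.
```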

The Conviva Operational Data Platform provides several capabilities for ingesting data, mapping it with metadata, and then filtering and aligning metrics to quality of experience for the enterprise.

For instance, a developer using a mobile toolkit from Adobe may insert that platform’s instrumentation code on every client where it is deployed, which may generate similar ‘pings’ or flapping alerts everywhere—fine for checking status, but not always relevant to a developer’s debugging process.

If a set of app users experiences a crash, the developer would want to match the crash event to the specific devices whose status alerts went missing, rather than dig through every client instance the Adobe alerting code was deployed to.
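That narrowing step is simple set logic once the data is tagged: intersect the crash reports with the devices whose expected status ping never arrived. A minimal sketch, with hypothetical record shapes:

```python
def crashed_and_silent(alerts, crashes):
    """Return device IDs that reported a crash AND whose expected status
    ping is missing, so debugging focuses on those clients only."""
    pinged = {a["device_id"] for a in alerts}
    return [c["device_id"] for c in crashes if c["device_id"] not in pinged]

alerts = [{"device_id": "d1"}, {"device_id": "d2"}]     # pings received
crashes = [{"device_id": "d2"}, {"device_id": "d3"}]    # crash reports

print(crashed_and_silent(alerts, crashes))  # ['d3']
```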

Once we’ve reduced the signal to noise ratio, there’s one critical question we need to ask of our data: “What was the customer trying to do at the time when an issue occurred?”

Mapping State Data to Critical User Flows

Critical user flows map to the highest-value customer journeys enacted through an application. These flows are largely unique to each enterprise and its vertical industry. For a pharmacy app, it could be a prescription refill; for a media provider, it could be the viewing of a purchased movie.

These critical user flows may happen within a React app UI a developer built for a device, but behind the scenes of each session, each step must maintain statefulness across several domains, from on-device functions such as camera inputs and storage, to third-party APIs, transaction providers, and data sources.

Conviva offers an AI-assisted approach to this problem: real-time computation of stateful experience metrics from multiple data sources within a user flow, semantically mapped to customer outcomes and preferences. These mappings can be drawn from an extensive library of horizontal and vertical user flow templates, or customized for the needs of specific cohorts of customers.
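The stateful part of a user flow metric can be illustrated with a tiny step-tracker: a flow template is an ordered list of steps, and a session only "progresses" when events arrive in order. The flow name and step names below are assumptions for illustration, not a Conviva template.

```python
# Hypothetical critical user flow template for a retail checkout.
CHECKOUT_FLOW = ["view_cart", "payment_load", "payment_submit", "confirmation"]

def flow_progress(session_events, flow=CHECKOUT_FLOW):
    """Walk a session's event stream against a flow template, advancing
    only when the next expected step appears (order is state)."""
    reached = -1
    for ev in session_events:
        if reached + 1 < len(flow) and ev == flow[reached + 1]:
            reached += 1
    return flow[reached] if reached >= 0 else None

# This session stalls after submitting payment: confirmation never arrives.
print(flow_progress(["view_cart", "payment_load", "ad_click", "payment_submit"]))
# -> payment_submit
```

Aggregating the last step reached across a cohort is what turns raw events into a completion metric for that flow.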

With this in hand, you can do new things, like defining a performance budget for specific high-value user actions, such as a shopping cart checkout.

Let’s say that you have a target of 2.5 seconds or less for a checkout payment interface to load, because current QoE indicators show that certain customers are abandoning carts if they have to wait 3 seconds or longer, and a specific cohort of Android 14.0 users are waiting much longer and occasionally timing out.

Looking at the performance data, developers can break down how many milliseconds each process within that checkout user flow takes. The API call to retrieve shipping info takes 1 second, but that third-party service isn’t under the developer’s control. A price calculator takes 250ms, which is also under budget.

But for this Android 14.0 cohort, cookie writes have been disabled—causing transactions to hang for more than 2 seconds as they lose local state data. Time for developers to move to a modern session token method or secrets management!
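The budget arithmetic in this walkthrough is worth making concrete: sum the measured step timings for the cohort, compare against the 2.5-second target, and surface the biggest contributor. The per-step numbers echo the example above; the cookie-write figure is an illustrative stand-in for "more than 2 seconds."

```python
BUDGET_MS = 2500  # target: checkout payment interface ready in 2.5 s

# Measured per-step timings for the affected Android 14.0 cohort
# (illustrative numbers following the example in the text).
steps = {
    "shipping_api": 1000,   # third-party call, outside the developer's control
    "price_calc": 250,      # under budget
    "cookie_write": 2100,   # hangs because cookie writes are disabled
}

def check_budget(steps, budget_ms=BUDGET_MS):
    """Return total flow time and the largest contributor when the flow
    blows its budget, else (total, None)."""
    total = sum(steps.values())
    return (total, max(steps, key=steps.get)) if total > budget_ms else (total, None)

total, culprit = check_budget(steps)
print(total, culprit)  # 3350 cookie_write
```

Pinning the overrun on one step is what turns a vague "checkout is slow for some Android users" into a specific engineering ticket.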

Going forward, development and operations teams can operationalize full-census user data to bring new insights to the table when making critical decisions about which applications need improvement, and which new features to introduce, based on their impact on the quality of experience of individual cohorts of users.

The Intellyx Take

“The customer is always right. If it isn’t right, we’ll make it right.”

Such maxims were often quoted by business leaders over the last 100-or-so years to signify how committed they were to customer service. Historically, if employees took these mission statements to heart and provided better service and quality products, their organizations would naturally benefit from more loyal customers.

But stating platitudes about customer focus or simply being data-driven won’t create digital transformation.

Organizations that wait until customers are complaining to start distilling full-census, comprehensive client-side telemetry data into actionable, stateful quality-of-experience indicators are already losing customers to churn, and failing to convert some portion of prospective buyers due to a lack of insight.

Fortunately, if you are still in business, it’s not too late to change the way you think about data in a customer-centric light. Even if you are already doing all the right things with data from a conventional observability and system availability point of view, just imagine the upside of being able to pinpoint what matters most to customers.

Republished from Intellyx