The more you understand your customers, the greater the chances that you will offer them value they truly appreciate. This means that it's not enough to simply conduct surveys and hand out questionnaires to know what your customers are thinking and what they want; it's essential to study their behaviors and turn that data into actionable information. The retail industry has been quietly observing the movements and behaviors of store visitors for many years. The most famous example is Sam Walton and his team, who used learnings from early observations to experiment with store layouts, product placements, and flash sale timings to maximize revenue and customer satisfaction. E-commerce companies do this as well, tracking customers on their portals and conducting A/B tests on website layouts, sales timings, and product recommendations to increase sales and profitability. In technology parlance, I would call these features traceability and observability.
End-to-end Traceability
Any SaaS platform worth its salt must have the capability of logging and observing customer activities and events. This is not only helpful in debugging any issues that might arise but also in providing feedback that helps to improve the platform's reliability and observability.
Traceability is enabled by setting up and reviewing Observability at the right places.
Observability
Observability helps DevOps and developers understand what is happening at different layers of the architecture. At the server-infrastructural level, it helps understand why response times are slow, which servers are not responding, and what should be done as a remedial action. At the service-infrastructural level, it helps to know which functional service (API, Micro-service, etc.) is working, understand performance metrics, and analyze request/response values at various stages.
It is important for any insurance tech platform to provide Observability at all levels, backed by tools and tutorials that are efficient to use. By making systems observable, the goal should be that any decision-maker within the technology and operations teams can easily identify cause and effect in a production system.
Enabling Observability in an organization
Observability must be role-based. Not everyone needs access to all information. Too many unnecessary attributes, a problem usually described as high Cardinality, lead to noisy and misleading monitoring.
Severity 1 issues, such as outages and server or service downtime, are sent as alerts to the C-level team, the decision-makers, and the communication team. An escalation matrix is created as a layered mechanism where information is quickly flashed to the most relevant layer.
Severity 2 issues, such as performance issues and aborted customer transactions, are available via the event logging tools, which provide all levels of information through the tools described below.
Severity 3 and 4 issues like one-off customer issues related to discarded payments and non-functioning OTPs are available in logs that are visible in the logger dashboards, available to all members of the production support team.
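The severity levels above can be read as a small routing table. The following is a minimal sketch of such an escalation matrix in Python, not a description of any actual platform. The channel and audience names are hypothetical labels chosen for illustration, and the audience for Severity 2 is an assumption, since the text does not name one.

```python
from dataclasses import dataclass

# Hypothetical escalation matrix: severity -> delivery channel and audiences.
# All names are illustrative; the Severity 2 audience is an assumption.
ESCALATION_MATRIX = {
    1: {"channel": "alert",
        "audiences": ["c_level", "decision_makers", "communications"]},
    2: {"channel": "event_logging_tools",
        "audiences": ["technology", "operations"]},
    3: {"channel": "logger_dashboard",
        "audiences": ["production_support"]},
    4: {"channel": "logger_dashboard",
        "audiences": ["production_support"]},
}


@dataclass(frozen=True)
class Incident:
    severity: int
    summary: str


def route(incident):
    """Return (channel, audiences) for an incident based on its severity."""
    try:
        rule = ESCALATION_MATRIX[incident.severity]
    except KeyError:
        raise ValueError(f"unknown severity: {incident.severity}")
    # Copy the list so callers cannot mutate the shared matrix.
    return rule["channel"], list(rule["audiences"])
```

For example, `route(Incident(1, "payment service down"))` returns the alert channel together with the three Severity 1 audiences, while an unknown severity raises a `ValueError` instead of being silently dropped.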
Top tools and techniques operated at various layers
API Request and Response Logs: These show the configured attributes and related performance metrics. Logging is set up per B2B customer, with each customer's logs isolated from those of other B2B customers.
Application Layer Logs: These include application-related logs for the front-end and backend tech stacks. Database logs are pre-set, as are front-end console logs that include errors, warnings, and timestamps. If the application service is down, the ping service, which pings every minute, flashes an alert to the required escalation layer.
Application Server Logs: Syslogs and server logs are set to Debug mode or Info mode as per requirements. Fatal errors and particular patterns that are known to cause high severity issues are set up as a quick view in the dashboard and set as alerts.
Business Dashboard: Shows metrics at the highest level in order to provide business insights.
In addition, cloud CDN and firewall logs, customer behavior logs, and marketing campaign-based logs add to the depth of what's possible with observability.
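To make the API request and response logging from the list above more concrete, here is a minimal Python sketch under stated assumptions. The function name `log_api_call`, the in-memory `store` dictionary keyed by customer, and the `logged_fields` parameter are all hypothetical; a production system would more likely write to separate log streams or indices per B2B customer. The sketch records only the configured attributes, the outcome, and the call duration.

```python
import json
import time


def log_api_call(store, customer_id, endpoint, handler, request, logged_fields):
    """Run handler(request) and append one log record for this customer only.

    store is a dict mapping customer_id -> list of JSON log lines, so one
    customer's records never mix with another's (an illustrative stand-in
    for isolated log storage).
    """
    start = time.perf_counter()
    status = "ok"
    try:
        return handler(request)
    except Exception:
        status = "error"
        raise
    finally:
        record = {
            "endpoint": endpoint,
            "status": status,
            "duration_ms": round((time.perf_counter() - start) * 1000, 3),
            # Log only the attributes that were configured for logging.
            "request": {k: request[k] for k in logged_fields if k in request},
        }
        store.setdefault(customer_id, []).append(json.dumps(record))
```

Restricting the record to `logged_fields` keeps sensitive request values out of the logs by default, and writing the record in the `finally` clause means failed calls are logged as well as successful ones.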
To end, it is important to stress that a good platform is a responsible one that operates on the highest level of Observability. Customer operations and systems are traced end-to-end for quick turnarounds and incident management.
The article is authored by Vidya Sridharan, Co-founder, Riskcovry, and Rahul Rao, Strategy Lead, Riskcovry.