Konstantin Peselev

Totango · Senior Product Manager · 2023 – 2024

Campaigns Feature

Turning tech debt into enterprise revenue

Platform · Tech Debt · Enterprise · Stakeholder Alignment

About Totango

Totango is a Customer Success Platform (CSP) focused primarily on Enterprise use cases (complex data hierarchies, vast amounts of data). The platform’s main purpose is to nurture existing customer relationships, drive expansion and renewals, and surface churn signals in advance.

Summary

I led a zero-downtime architecture overhaul of the Totango Campaigns feature, resolving critical stability issues and unlocking new capabilities that directly secured a major expansion deal. Accumulating technical debt was causing unpredictable failures, silently eroding customer trust and blocking key enterprise use cases.

My mandate was to justify and execute a massive backend modernization while simultaneously delivering marketable, revenue-generating features to secure and maintain cross-functional buy-in.

  1. I quantified the hidden costs of system instability by cross-referencing support tickets with implementation blockers, successfully translating engineering constraints into a compelling business case for Sales and Tech.
  2. I guided the engineering team through a three-phase decoupling of the legacy monolith into a scalable, event-driven architecture, and directed the zero-downtime rollout.
  3. I leveraged the new infrastructure to scope and launch multi-channel chatbot integrations and advanced dynamic data components, unlocking valuable enterprise use-cases.

The modernized platform successfully processed bursts that far exceeded the projected peak workload within a 15-minute SLA, cleared 33 lingering architecture-related bugs, and directly influenced the closure of two top-tier enterprise deals.

The Context & Problem Space

Totango Campaigns is a feature that lets Customer Success teams run email campaigns from within the platform they own, so they are not blocked by Sales or Marketing teams. They can also use Customer Success-specific metrics and attributes for more nuanced segmentation and messaging orchestration.

The feature is adopted by virtually every established customer and is a source of platform stickiness that noticeably reduces Totango’s churn rate. It’s worth noting that Campaigns by itself didn’t generate new revenue, nor was it a key reason any enterprise customer subscribed to Totango.

Customers routinely requested new capabilities or improvements to existing ones, as they felt Campaigns was incomplete and unreliable. Many of the issues, as it turned out, stemmed from architectural limitations: Campaigns had reached its maximum workload.

Since Campaigns rarely surfaced as the reason for lost deals or churn, it was incredibly difficult to justify pulling engineering resources away from high-visibility features to address this hidden technical debt.

Approach

To build a compelling business case, I cross-referenced Support tickets with Implementation blockers to quantify the real costs of our technical debt. By translating Engineering’s architectural concerns into tangible business risks for Sales and CS, I aligned all stakeholders around three critical issues:

| Issue | Manifestation | Impact |
| --- | --- | --- |
| Outdated architecture and 10+ years of accumulated tech debt | The feature became unstable for any campaign over a certain size, producing unreliable data with no clear pattern or trigger | Users are anxious whenever they need to launch a campaign of a certain size; they can’t trust the data and reach out for verification; there is a steady flow of bug reports about automations that were triggered, or not triggered, unexpectedly |
| Data silos | Dynamic data (using {attribute} in the message) is allowed only from the directly targeted object | Multiple powerful enterprise use cases are impossible, reducing the value of communication |
| Single channel: emails | Campaigns worked only with email as a message delivery channel | Multi-channel messaging orchestration is impossible due to architecture constraints |

The narrative that helped us move forward with this initiative was the following:

Campaigns runs on an architecture designed over 10 years ago, and it’s evident that workload spikes choke the server, leading to unreliable results and unexpected behavior in downstream features. The issue is not widespread yet, but it will get worse as Totango’s business grows. A compromised customer outreach engine gradually erodes trust in Totango as a CS platform. The erosion is so subtle that it’s almost undetectable, but it accumulates.

Given the pressing need to build innovative features that empower CS teams, we propose a new architecture design that addresses the workload capacity issue and enables two major paths to deliver extra value to enterprise customers: breaking data silos and unlocking multi-channel integration.

After getting the approval, I came up with the following plan:

  1. Build a new backend process to eliminate the existing bottlenecks and other technical debt aspects
  2. After customers are migrated to the new system, deliver Multi-channel and Improved Dynamic Data features.

Execution

New Architecture

Partnering with engineers, I developed a three-phase approach to the architecture overhaul.

Details about the new architecture and Data Diagrams

Old Flow (current)

old flow

The legacy architecture relied entirely on a single centralized service called “Ironman”:

  • Ironman was responsible for everything: validating services and campaigns, fetching users, filtering duplicates, filtering by frequency settings, building email content, and updating campaign attributes and statistics.
  • Because Ironman handled so many tasks, it was difficult to maintain, to extend with new features, and to monitor for the root causes of problems.
  • This monolithic design was not scalable, which led to the workload capacity issues and unexpected downstream behavior.

Phase 1

phase 1

The first phase initiates the decoupling process to address the workload spikes:

  • The primary goal of Phase 1 is to move the user-fetching mechanism out of Ironman.
  • Two new services are introduced to handle this: the Trigger Campaign Service and the Segmentation Trigger Service.
  • The architecture begins transitioning to an event-driven model using Kafka to pass trigger and scheduling events between the old and new services.
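
The Phase 1 hand-off can be pictured with a minimal sketch. It is not Totango’s code: an in-memory `queue.Queue` stands in for each Kafka topic, and every name in it (`TriggerEvent`, `SegmentReady`, `trigger_campaign`, `segmentation_worker`, `legacy_consumer`, the `SEGMENTS` lookup) is hypothetical. It only illustrates the idea that publishing a trigger event lets user fetching happen outside the legacy service.

```python
import queue
from dataclasses import dataclass


@dataclass(frozen=True)
class TriggerEvent:
    campaign_id: str
    segment_id: str


@dataclass(frozen=True)
class SegmentReady:
    campaign_id: str
    user_ids: tuple


# Assumption: in-memory queues stand in for two Kafka topics.
trigger_topic = queue.Queue()
segment_topic = queue.Queue()

# Hypothetical segment store; the real lookup is not described in the text.
SEGMENTS = {"seg-1": ("u1", "u2", "u3")}


def trigger_campaign(campaign_id, segment_id):
    """New trigger service: publish an event instead of calling the monolith."""
    trigger_topic.put(TriggerEvent(campaign_id, segment_id))


def segmentation_worker():
    """New segmentation service: consume triggers, fetch users, publish the result."""
    while not trigger_topic.empty():
        event = trigger_topic.get()
        users = SEGMENTS.get(event.segment_id, ())
        segment_topic.put(SegmentReady(event.campaign_id, users))


def legacy_consumer():
    """Legacy service: receives already-fetched users and no longer fetches them itself."""
    handled = []
    while not segment_topic.empty():
        handled.append(segment_topic.get())
    return handled


trigger_campaign("camp-42", "seg-1")
segmentation_worker()
delivered = legacy_consumer()
print(delivered)
```

Because the producer and the consumer only share the event shape, either side can be scaled or replaced without touching the other, which is the property the later phases build on.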

Phase 2

phase 2

Phase 2 further alleviates the processing burden on the legacy infrastructure:

  • The mechanism for filtering duplicate users and enforcing frequency settings is moved from Ironman to the new services.
  • The initial scheduling flow now bypasses Ironman entirely, routing directly through the new Trigger and Segmentation services via Kafka before passing the qualified segments back to Ironman for final processing.
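
The two filters moved in Phase 2 can be sketched as a single pass over the candidate list. This is an illustration under assumptions, not the production logic: the function name `filter_recipients`, the `last_sent` lookup, and the idea of expressing the frequency setting as a minimum gap between messages are all hypothetical.

```python
from datetime import datetime, timedelta


def filter_recipients(candidates, last_sent, now, min_gap):
    """Return unique recipients who were not messaged within `min_gap`.

    candidates: iterable of user ids, possibly containing repeats
    last_sent:  dict mapping user id -> datetime of the previous message
    """
    seen = set()
    qualified = []
    for user_id in candidates:
        if user_id in seen:
            continue  # duplicate: the user already appeared in this run
        seen.add(user_id)
        previous = last_sent.get(user_id)
        if previous is not None and now - previous < min_gap:
            continue  # frequency setting: messaged too recently
        qualified.append(user_id)
    return qualified


now = datetime(2024, 1, 10, 12, 0)
last_sent = {
    "ann": now - timedelta(days=1),   # too recent for a 7-day gap
    "bob": now - timedelta(days=10),  # old enough
}
print(filter_recipients(["ann", "bob", "bob", "cy"], last_sent, now, timedelta(days=7)))
# prints ['bob', 'cy']
```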

Phase 3 (final)

phase 3

The final phase achieves the goal of a fully modernized architecture:

  • Ironman is replaced completely.
  • A newly introduced Email Campaign Service takes over the final assembly and orchestration.
  • The mailing service loop is closed by a webhooks listener service that captures campaign email statistics and routes them to an Account-updater-input for data storage.
  • The service is decoupled from the Email channel: the Mailing and webhooks listener services are ready to be integrated with other messaging protocols.

Risk-Mitigated Rollout. After each phase was completed, QA performed robust testing on every step. Customers were migrated in cohorts selected on multiple factors (historical usage patterns, last-7-days activity, and others).

I took advantage of the engineering team’s location (Israel, where Sunday is a regular working day) by scheduling the migrations for Sundays. This ensured the deployments ran on the platform’s lowest-workload day, providing a natural safety net.

Multi-channel Campaigns

This was a low-effort, high-impact initiative (“low-effort” if you exclude the architecture overhaul):

  • I designed the feature around the existing, well-known UI/UX, which reduced engineering lift, shortened time-to-market, and almost eliminated the learning curve.
  • The use case was ready, with some customers prepared to be first adopters.

The scope of this initiative could be split into two uneven parts:

  1. Audit the existing code for hidden assumptions that “a campaign is always an email” and fix them.
  2. Build an integration with a selected provider.
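
One way to picture the first part of that scope is a small channel abstraction, where the delivery code looks up a channel by name instead of assuming email. The sketch below is illustrative only: `Message`, `Channel`, `EmailChannel`, `ChatChannel`, and `deliver` are hypothetical names, and the chat channel records messages in a list rather than calling any real provider API.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass(frozen=True)
class Message:
    recipient_id: str
    body: str


class Channel(Protocol):
    """Anything that can deliver a message, regardless of transport."""
    name: str

    def send(self, message: Message) -> None: ...


@dataclass
class EmailChannel:
    name: str = "email"
    outbox: list = field(default_factory=list)

    def send(self, message: Message) -> None:
        self.outbox.append(("email", message.recipient_id, message.body))


@dataclass
class ChatChannel:
    name: str = "chat"
    outbox: list = field(default_factory=list)

    def send(self, message: Message) -> None:
        # A real integration would call the chat provider here; this stub only records.
        self.outbox.append(("chat", message.recipient_id, message.body))


def deliver(campaign_channel: str, messages: list, channels: dict) -> int:
    """Send every message through the campaign's channel; no email default."""
    channel = channels[campaign_channel]  # raises KeyError for an unknown channel
    for message in messages:
        channel.send(message)
    return len(messages)


email, chat = EmailChannel(), ChatChannel()
channels = {c.name: c for c in (email, chat)}
sent = deliver("chat", [Message("u1", "Your renewal is coming up")], channels)
print(sent, chat.outbox, email.outbox)
```

With this shape, adding a provider means adding one more class that satisfies `Channel`, while the delivery loop stays unchanged.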

The first integration was Sunshine Conversations (a communication engine acquired by Zendesk), which powers chatbots for some of Totango’s existing customers.

I partnered with one Enterprise customer to validate the use case, the happy path, and the differences and limitations of Sunshine Conversations compared with email.

User flow

Company “SaaSPro”, a Totango subscriber, has Zendesk chat deployed on its website (saaspro.net, a placeholder domain used purely for illustration). They send email campaigns and want to add another channel to their messaging orchestration: messages delivered to their website visitors through the chatbot.

Admin designs a chatbot campaign in Totango

1. Admin designs a chatbot campaign in Totango the same way they design a traditional email campaign

Content editor for chatbot campaign

2. The content editor is the same WYSIWYG editor used in email campaigns (with the features that Sunshine Conversations does not support disabled)

After the Chat Campaign is activated, the work within Totango is done. From the recipient’s perspective, they get a personal message in the chatbot once they visit the saaspro.net website:

Chatbot recipient view

The top conversation was created automatically by a Totango Campaign targeting this specific recipient when they visited the website (this is a demo showroom, not a real customer).

Advanced Dynamic Data

Campaigns allow the use of dynamic data: company or user attributes such as {First Name} that are replaced with actual data as the message is processed. Companies are linked to Users in a one-to-many relationship.

The major limitation was data silos: it was only possible to use data from the Targeted User object or from the Company object linked to the Targeted User. With most enterprise customers having multi-level data hierarchies, this was a major obstacle to capitalizing on the data Totango contains.
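
A minimal sketch of this kind of substitution is shown below. It is an assumption-laden illustration rather than Totango’s renderer: the `render` function is hypothetical, and leaving unknown placeholders untouched is a choice made for the example, not documented behavior.

```python
import re

# Matches {Attribute Name}; nested braces are not supported in this sketch.
PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def render(template: str, attributes: dict) -> str:
    """Replace each {Attribute} with its value; unknown placeholders are left as-is."""
    def substitute(match):
        key = match.group(1).strip()
        return str(attributes.get(key, match.group(0)))

    return PLACEHOLDER.sub(substitute, template)


# Hypothetical attribute sets: user values override company values on a name clash.
company = {"Company Name": "SaaSPro"}
user = {"First Name": "Sally"}
attributes = {**company, **user}

print(render("Hi {First Name}, welcome from {Company Name}. Plan: {Plan}", attributes))
# prints: Hi Sally, welcome from SaaSPro. Plan: {Plan}
```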

My research with the engineering team revealed that building true cross-object data access was not technically feasible. Rather than abandoning the feature, I partnered with the Design team and identified a high-ROI trade-off: a solution that restricted data retrieval to the user’s immediate hierarchy tree (‘siblings’, ‘parents’, and ‘children’) with a predefined object type. This pivot turned a massive engineering lift into a manageable task and enabled the most critical enterprise use cases within our existing tech constraints.
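
The restriction can be illustrated with a small tree lookup. Everything here is hypothetical (the `Node` class, the `related` function, the object names and types), and the sketch assumes ‘parents’ means the direct parent only; it simply shows how limiting retrieval to one step in the hierarchy, filtered by a predefined object type, keeps the query bounded.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity comparison avoids recursing through parent/child links
class Node:
    name: str
    object_type: str
    parent: Optional["Node"] = None
    children: list = field(default_factory=list)

    def add_child(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def related(node: Node, relation: str, object_type: str) -> list:
    """Names of objects one step away from `node`, limited to a single object type."""
    if relation == "parents":
        candidates = [node.parent] if node.parent else []
    elif relation == "children":
        candidates = list(node.children)
    elif relation == "siblings":
        candidates = (
            [c for c in node.parent.children if c is not node] if node.parent else []
        )
    else:
        raise ValueError(f"unsupported relation: {relation}")
    return [c.name for c in candidates if c.object_type == object_type]


# Made-up hierarchy for illustration only.
root = Node("Acme", "Account")
emea = root.add_child(Node("Acme EMEA", "Region"))
project_a = emea.add_child(Node("Project A", "Project"))
project_b = emea.add_child(Node("Project B", "Project"))
team = emea.add_child(Node("EMEA Team", "Team"))

print(related(project_a, "siblings", "Project"))  # prints ['Project B']
print(related(emea, "children", "Project"))       # prints ['Project A', 'Project B']
print(related(project_a, "parents", "Region"))    # prints ['Acme EMEA']
```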

User flow

After validating the solution with several customers, I scoped the Segment Table Component and directed the team to build it:

Segment Table editor

The user adds the component; the entire table is dynamic data, and its content is determined at the moment the message is created.

Segment Table output example

Here is an example of a single email notifying a recipient (Sally) about upcoming Renewals. Each line in the table is a separate Company object in Totango (type Project) that satisfies two conditions: it belongs to the same branch in the hierarchy tree as the targeted user (Sally), and it is nearing its renewal date.

Impact & Outcomes

All initiatives were successfully deployed and outperformed the goals the team had set for them:

  • 500K recipients in burst test: processed within 15-min SLA
  • 71.7% of architecture bugs closed: no longer reproducible after migration
  • +33% ARR from one expansion: influenced by chatbot capability

Architecture Overhaul. Phase 3 was deployed according to the original timeline. Workload tests passed: several campaigns targeting 500,000 recipients were activated simultaneously, and the server correctly increased its bandwidth to process them on time (within the 15-minute requirement). Within the first 4 weeks after deployment, all top 10 customers reported to their respective CSMs that they saw a noticeable improvement in the stability of the service. Four major customers sent personal messages noting that they no longer needed to split large campaigns into multiple clones. The Support team triaged all bugs that had been put on hold and linked to the architecture bottlenecks: 71.7% were closed because they could no longer be reproduced.

Multi-Channel Campaigns. The main design partner adopted the feature immediately and launched multiple chatbot campaigns within the first 4 weeks after deployment. The Totango CS team capitalized on the positive sentiment and secured a major expansion deal (+33% of ARR). Two prospective deals (potentially top 12 and top 9 in ARR) were successfully closed thanks to the demonstrated capability of integrating with their chatbots.

Advanced Dynamic Data. A top-3 customer adopted the solution one week before it was released for General Availability, immediately moved from Churn Risk to a secured renewal, and entered expansion discussions within three weeks. Within the first four weeks after the release, 16 of the top 20 customers had designed and launched campaigns that used Advanced Dynamic Data. The Solution and Implementation team received two personal emails from critical customers praising the newly released feature and the use cases it unlocked.

Key Takeaways

The project originally started when I was digging for the root causes of some elusive bugs and unexpected behavior.

What stands out is that although the project was not an original roadmap item and required a significant undertaking, the company was able to quickly assess the reality and adjust its plans.

I’m reminded again and again that the gap between business and tech stakeholders is not as profound as it is sometimes portrayed: every stakeholder wants the company to succeed, and no one is deliberately promoting recklessness. The misalignment can be eliminated by changing the optics:

Yes, it’s true we couldn’t afford to dedicate our top engineers to infrastructure work alone and have no meaningful new features at the end of the quarter.

It’s also true that we could have left the system as is, hoping that customers would keep tolerating unstable processing and that no perfect storm would lead to a catastrophic failure. Fear-mongering rarely works as a motivator to spend scarce resources.

However, what worked in the end was uncovering real risks and showing them in a quantifiable manner, and getting maximum ROI from the new architecture: not just “do the same better”, but “do the same and something new”.

It’s a major part of the PM role to find creative ways to deliver value to customers.