One thing we’ve noticed with earlier-stage companies vs. the more mature companies we work with is the way in which they talk about integration maintenance.
Whenever we talk to mid-market customers, maintenance always comes up as one of the top challenges that they encountered when building in-house.
In contrast, early-stage startups are almost never concerned about the maintenance costs that come with building integrations in-house - their top priority is always how quickly they can build and deploy an integration.
The reason is simple - these maintenance challenges are hard to anticipate unless you’ve experienced them first hand.
As a result of working with dozens of engineering teams on their native integration implementations, and my own experience as a senior software engineer who was responsible for maintaining services that relied on third-party APIs, I wanted to share some of the most common challenges I see teams run into.
At the most basic level, maintenance work can be triggered by:
- 3rd party APIs releasing breaking changes
- Updates to integration code when changes occur on your application
- New edge cases from customers' usage
- Issues with event listeners and webhooks
- Issues with authentication expiring
- Supporting customers whenever technical integration issues come up
Let's see how these manifest in practice.
3rd Party API Breaking Changes
This is the maintenance challenge that most teams are aware of when it comes to integrations. As much as we all hope that all the 3rd party APIs we work with are designed to be backward compatible, the reality is that these breaking changes are inevitable, poorly communicated, and relatively frequent.
Major breaking changes
Here are just a few examples of major breaking changes that we’ve dealt with on our customers’ behalf:
- HubSpot sunsetting their Contacts scopes after releasing more granular scopes
- Airtable deprecating API keys, forcing a transition to OAuth
- Facebook Ads deprecating v14.0 of their Marketing API 2 months after releasing v16.0
While these are major changes that may occur, there are also many instances where you suddenly begin to get malformed responses.
Minor breaking changes
Your app may expect the response from the third-party app’s API to have certain fields, but providers can change or remove them. For example, Jira removed the name and key fields and replaced them with a single accountId field for endpoints that return an Issue object. This change would’ve broken your Jira integration if your logic expected name or key in the response.
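One way to catch this kind of change early is to validate the fields your integration depends on before using them. The sketch below is illustrative only: `REQUIRED_USER_FIELDS`, `parse_assignee`, and the payload shape are assumptions made for the example, not Jira's actual response schema.

```python
from typing import Any

# Hypothetical: the fields this integration reads from a user object.
REQUIRED_USER_FIELDS = ("accountId", "displayName")


def missing_fields(obj: dict[str, Any], required: tuple[str, ...]) -> list[str]:
    """Return the required fields that are absent (or null) in a response object."""
    return [name for name in required if obj.get(name) is None]


def parse_assignee(issue: dict[str, Any]) -> dict[str, Any]:
    """Extract the assignee, failing loudly if the response shape has drifted."""
    assignee = issue.get("fields", {}).get("assignee") or {}
    missing = missing_fields(assignee, REQUIRED_USER_FIELDS)
    if missing:
        # A descriptive error lets monitoring flag a schema change instead of
        # a KeyError surfacing somewhere deep inside business logic.
        raise ValueError(f"assignee is missing expected fields: {missing}")
    return {"id": assignee["accountId"], "name": assignee["displayName"]}
```

Failing at the boundary like this does not prevent the breakage, but it turns a silent data problem into an error you can alert on.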
Lack of communication
If you’re lucky, the 3rd party application providers will provide ample notice before a breaking change is released so you can make the updates before it impacts your users, but we’ve seen first hand times when that has been unreliable. Here are just a few common scenarios:
- The 3rd party API provider doesn’t provide sufficient notice for the breaking change (or any notice at all)
- The notice goes to a developer who is no longer at the company
- The team maintaining the integration didn’t realize the breaking change would affect their integration
As you scale your integration roadmap, the volume of these breaking changes will begin to compound and will derail your roadmap as the changes are always ‘urgent’.
This hits close to home for us because our own integrations engineering team spends about a sprint every month or two dealing with these 3rd party API breaking changes as they come up, so our customers don’t have to worry about maintaining those references.
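Because notices are unreliable, it can help to look for deprecation signals in the API responses themselves. Some providers send a `Sunset` response header (defined in RFC 8594) with the date an endpoint will stop working; many do not, so treat this as one signal among several. The function names below are assumptions made for this sketch.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def sunset_date(headers: Mapping[str, str]) -> Optional[datetime]:
    """Return the date from a `Sunset` response header, if present and parseable."""
    # Header names are case-insensitive, so normalize before looking up.
    lowered = {key.lower(): value for key, value in headers.items()}
    value = lowered.get("sunset")
    if value is None:
        return None
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None  # Unparseable value: treat it as no signal.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def days_until_sunset(headers: Mapping[str, str], now: datetime) -> Optional[int]:
    """Whole days left before the advertised sunset, or None if not advertised."""
    sunset = sunset_date(headers)
    if sunset is None:
        return None
    return (sunset - now).days
```

Logging or alerting on a small `days_until_sunset` value routes the warning to whoever owns the integration today, rather than to an inbox nobody reads.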
Breaking Changes in Your Application
We can’t forget that every native integration has two dependencies - one on your application, and one on the 3rd party application that you’re integrated with. As your product evolves and your team ships new features or changes existing ones, the integrations that they interact with will also need to be updated.
This occurs most frequently in SaaS startups because they need to ship and iterate much more quickly in order to go from 0 to 1 and reach product-market fit, which often results in changes that will break the existing integrations.
For later-stage companies, breaking changes may be less frequent, but their surface area is often much larger because of the tech debt that has accumulated as the product evolved.
While you can’t avoid dealing with breaking changes in your application even if you use Paragon, having your integration logic isolated from the rest of your codebase makes it much easier to maintain.
For example, if you were maintaining multiple CRM integrations that require you to sync contacts created in your application, and the way in which contacts are created in your app changes, all you would need to update is where/how the `paragon.event("Contact Created",payload)` App Event is triggered. Once you make that single change, it will update all your CRM integrations at once.
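The same isolation principle applies if you build in-house. The sketch below is a generic, in-process illustration and not Paragon's implementation: `subscribe`, `emit`, and `contact_created_payload` are hypothetical names, and the contact fields are assumed.

```python
from typing import Any, Callable

Handler = Callable[[dict[str, Any]], None]

# Hypothetical in-process registry: event name -> integration handlers.
_handlers: dict[str, list[Handler]] = {}


def subscribe(event_name: str, handler: Handler) -> None:
    """Register an integration's handler for an app event."""
    _handlers.setdefault(event_name, []).append(handler)


def emit(event_name: str, payload: dict[str, Any]) -> int:
    """Fan one app event out to every subscribed integration; return how many ran."""
    handlers = _handlers.get(event_name, [])
    for handler in handlers:
        handler(payload)
    return len(handlers)


def contact_created_payload(contact: dict[str, Any]) -> dict[str, Any]:
    """The single place that maps the app's contact model to the event payload."""
    return {
        "email": contact["email"],
        "name": f'{contact["first_name"]} {contact["last_name"]}',
    }
```

If the contact model changes, only `contact_created_payload` needs to change; the CRM handlers keep receiving the same payload shape.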
Handling Edge Cases
It’s one thing to build out the integration and make a few calls to the third-party app’s API. But in production, your integration has to be compatible with the unique data (and volume of data) that each of your customers brings. This leads us to the maintenance challenges around API edge cases.
Pagination
Modern APIs that return large amounts of data usually do so through pagination. Providers use different pagination types, such as offset, keyset, and cursor paging, and your application must seamlessly support whichever strategy each provider uses.
But things can occasionally go wrong when dealing with pagination. One example is the issue of page synchronization. For environments where data can change rapidly on the third-party app, maintaining synchronization between pages can get difficult - as your application is iterating through paginated data, several changes can occur, leading to data inconsistency or duplication.
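A minimal sketch of guarding against the duplication side of this problem, assuming a cursor-paginated endpoint whose pages look like `{"items": [...], "next_cursor": ...}` and whose records carry a stable `id`. Both the page shape and the `fetch_page` callable are assumptions for the example.

```python
from typing import Any, Callable, Iterator, Optional

# Hypothetical page shape: {"items": [...], "next_cursor": "<token>" or None}
FetchPage = Callable[[Optional[str]], dict[str, Any]]


def iter_unique_records(
    fetch_page: FetchPage, max_pages: int = 1000
) -> Iterator[dict[str, Any]]:
    """Walk a cursor-paginated endpoint, skipping records already seen."""
    seen_ids: set[str] = set()
    cursor: Optional[str] = None
    for _ in range(max_pages):  # Guard against a cursor that never terminates.
        page = fetch_page(cursor)
        for record in page["items"]:
            if record["id"] in seen_ids:
                continue  # The record shifted onto a later page mid-iteration.
            seen_ids.add(record["id"])
            yield record
        cursor = page.get("next_cursor")
        if cursor is None:
            return
    raise RuntimeError("pagination did not terminate within max_pages")
```

Deduplicating by ID handles records that appear twice, but not records that are skipped when data shifts between pages; catching those usually requires a follow-up sync based on an updated-at timestamp.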
While page synchronization is a problem we’ve solved for all our API abstractions, a lot of our customers have asked for pagination support for all API requests, which is what our upcoming Request Step Pagination feature is designed to provide.
Rate limits
If your app is sending large volumes of requests to a third-party app’s API, you will inevitably run into rate limits. If incorrectly handled, rate limits will hurt your app’s functionality and reliability by introducing downtime, data loss and inconsistencies, and increased latency.
To handle rate limits effectively, you need a good understanding of your API usage patterns so you can identify potential spikes and smooth them out by batching requests or spreading them over time. That lessens the likelihood of running into limits in the first place. Retry strategies such as exponential backoff then help your application recover gracefully when it does hit one.
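As a rough illustration of exponential backoff, the sketch below retries a call that signals a rate limit, doubling the wait each time and adding jitter so that many clients do not retry in lockstep. `RateLimited` is a hypothetical exception standing in for however your HTTP client reports a 429, and the default delays are arbitrary.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error a client raises on HTTP 429."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("rate limited")
        self.retry_after = retry_after  # Seconds, if the provider sent Retry-After.


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Exponential delay for a 0-based attempt number, capped at `cap` seconds."""
    return min(cap, base * (2 ** attempt))


def call_with_retries(
    call: Callable[[], T],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
    jitter: Callable[[], float] = random.random,
) -> T:
    """Run `call`, backing off and retrying whenever it raises RateLimited."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited as err:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error to the caller.
            if err.retry_after is not None:
                delay = err.retry_after  # Prefer the provider's own hint.
            else:
                delay = backoff_delay(attempt) * jitter()
            sleep(delay)
    raise AssertionError("unreachable")
```

Injecting `sleep` and `jitter` keeps the function easy to test and lets a job queue substitute its own scheduling instead of blocking a worker.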
Large payload sizes
When initially building out an integration and testing locally, data payload sizes from third-party APIs are usually small and requests are sent at a low volume. In production, your application must be configured to function correctly at scale. If not properly handled, large data sizes can cripple your infrastructure, clog up your job queues, and lead to sluggish response times and failed requests.
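One common mitigation is to process large responses in fixed-size batches rather than loading and enqueueing everything at once. The sketch below assumes records arrive as an iterable (for example, from a streaming or paginated read) and that `enqueue` is whatever function hands work to your job queue; both are placeholders.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def batched(records: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield lists of at most `size` records without materializing the input."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def enqueue_in_batches(
    records: Iterable[T], enqueue: Callable[[list[T]], None], size: int = 500
) -> int:
    """Send records to a job queue one bounded batch at a time; return the job count."""
    jobs = 0
    for batch in batched(records, size):
        enqueue(batch)
        jobs += 1
    return jobs
```

Bounded batches keep memory use and individual job duration predictable, which makes it less likely that one large customer clogs the queue for everyone else.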
Triggers/Webhooks Not Firing
Integration developers want to be notified when changes happen on third-party apps and this is oftentimes done through webhooks. For example, suppose your product has a Stripe integration and whenever an invoice is fulfilled, your app needs to ingest the new invoice data in real-time. Setting this up looks simple enough - you create a URL for your app’s endpoint to receive webhooks and provide that to the third-party app. But in practice, maintaining webhooks is challenging as they’re highly prone to failure. Here are just a few examples of issues that we’ve seen (and had to address) occur with webhooks:
- Webhook subscriptions can expire and may need to be refreshed (e.g. Google Drive webhooks expire after 7 days of inactivity)
- Webhooks not being fired at all (a silent error that is hard to monitor)
- The third-party app introduces breaking changes to their webhooks
- Increased latency between changes on the third-party app and webhooks firing
To detect failures and webhooks not firing, you need to have an effective monitoring infrastructure to continually refresh and test that they are firing as expected. We had to learn this the hard way.
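A minimal sketch of the two checks such monitoring needs: whether a subscription is close to expiring, and whether it has gone suspiciously quiet. The `Subscription` record and the thresholds are assumptions for the example; real values depend on each provider and on how often a given customer's data normally changes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Subscription:
    """Hypothetical record your app stores for each webhook subscription."""

    id: str
    expires_at: Optional[datetime]  # None if the provider's subscriptions never expire.
    last_event_at: Optional[datetime]  # None if no webhook has been received yet.


def needs_renewal(
    sub: Subscription, now: datetime, margin: timedelta = timedelta(hours=12)
) -> bool:
    """True if the subscription expires within `margin` (or already has)."""
    return sub.expires_at is not None and sub.expires_at - now <= margin


def looks_silent(
    sub: Subscription, now: datetime, max_quiet: timedelta = timedelta(hours=24)
) -> bool:
    """True if no webhook has arrived for longer than `max_quiet`."""
    if sub.last_event_at is None:
        # Note: this also flags brand-new subscriptions that have not fired yet.
        return True
    return now - sub.last_event_at > max_quiet
```

A scheduled job can run both checks across all subscriptions, renewing the ones that are about to expire and raising an alert (or triggering a test event) for the ones that look silent.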
In fact, there have been multiple instances where we had to build cron-based triggers to address limitations or issues with the native webhooks that the 3rd party services provide.
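The general shape of a polling trigger like this is a job that asks the provider for records changed since a stored high-water mark. The sketch below is not our implementation: `fetch_updated_since` is a placeholder for a provider query, and it assumes records carry an `updated_at` timestamp as an ISO 8601 UTC string in a consistent format.

```python
from typing import Any, Callable, Iterable


def poll_for_changes(
    fetch_updated_since: Callable[[str], Iterable[dict[str, Any]]],
    last_seen: str,
    handle: Callable[[dict[str, Any]], None],
) -> str:
    """Run one polling cycle and return the new high-water mark."""
    newest = last_seen
    for record in fetch_updated_since(last_seen):
        handle(record)
        # ISO 8601 UTC timestamps in one consistent format sort correctly as strings.
        newest = max(newest, record["updated_at"])
    return newest
```

A scheduler such as cron runs this on an interval and persists the returned mark between runs, so each cycle only picks up what changed since the last one.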
Authentication Issues
We wrote about the challenges with auth in much more detail here if you want a more comprehensive take.
The tl;dr is, working with OAuth for integrations is incredibly challenging, primarily due to all the 3rd party service specific implementations of the so-called standard.
Yet having auth work properly is the prerequisite to any of your integrations functioning properly. From differing refresh policies to forced de-authorization scenarios and race conditions, there are many edge cases with OAuth that are almost impossible to QA for when you first ship the integration. While we've solved auth for pretty much any OAuth/API Key based integration through our authentication layer, every once in a while we still encounter new edge cases that arise from the thousands of users that use the integrations built by our 100+ customers.
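To make the race condition concrete: if two requests notice an expired token at the same time and both refresh, providers that rotate refresh tokens may invalidate one of them. The sketch below serializes refreshes within a single process. `TokenManager` and the `refresh` callable (assumed to return an access token and its expiry as epoch seconds) are hypothetical, and a multi-process deployment would need a shared lock instead.

```python
import threading
import time
from typing import Callable, Optional


class TokenManager:
    """Caches an access token and allows only one refresh at a time."""

    def __init__(
        self,
        refresh: Callable[[], tuple[str, float]],  # Returns (token, expires_at_epoch).
        skew_seconds: float = 60.0,
        clock: Callable[[], float] = time.time,
    ):
        self._refresh = refresh
        self._skew = skew_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get_token(self) -> str:
        with self._lock:
            # Checking expiry inside the lock means concurrent callers wait for
            # one refresh instead of each spending the refresh token.
            expired = self._clock() >= self._expires_at - self._skew
            if self._token is None or expired:
                self._token, self._expires_at = self._refresh()
            return self._token
```

The `skew_seconds` margin refreshes slightly early so a token does not expire while a request using it is still in flight.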
Debugging Customer Issues
A final piece that doesn’t get talked about enough is the amount of time support takes up. Given the two-sided dependency with integrations, errors are twice as likely to come up. The worst part is that the 3rd party services often don’t provide clear error messages.
That makes debugging near impossible for non-technical support teams - as a result, everything gets escalated to your engineering team, even if the fix is as simple as telling a user to re-authenticate.
Unless you’re able to build a robust monitoring/support dashboard for your customer support team to see every user’s integration activity and errors, things will keep getting escalated to your engineering team.
Additionally, without a monitoring/user-management infrastructure that allows you to reliably and safely modify customers’ integration states when supporting/debugging customer issues, your engineers may have to write one-off scripts that introduce risk.
To avoid this, make sure your support and engineering teams have tools that enable them to easily identify the root cause of errors, and manage customers’ integrations on their behalf.
In our customers’ case, they usually assign their support team a Support role, which allows them to go into the Connected Users Dashboard and look into every execution for any customer.
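For teams building this tooling themselves, even a simple mapping from raw provider errors to support-facing guidance can cut down on escalations. The categories and wording below are illustrative assumptions, keyed only on the HTTP status code; a real mapping would also look at provider-specific error bodies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Triage:
    category: str
    support_action: str
    escalate: bool  # True if engineering needs to look at it.


def triage_error(status: Optional[int]) -> Triage:
    """Map an HTTP status from a third-party API to guidance for support."""
    if status == 401:
        return Triage("auth_expired", "Ask the user to re-authenticate the integration.", False)
    if status == 403:
        return Triage("insufficient_permissions", "Ask the user to check their permissions in the third-party app.", False)
    if status == 429:
        return Triage("rate_limited", "Usually temporary; ask the user to try again later.", False)
    if status is not None and 500 <= status < 600:
        return Triage("provider_outage", "Check the provider's status page before escalating.", False)
    return Triage("unknown", "Escalate to engineering with the execution details.", True)
```

Surfacing the `support_action` text next to each failed execution gives a non-technical support team a first response for the common cases.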
This was just a quick brain dump of some of the common issues that we’ve had to deal with when building Paragon (to save our customers from having to encounter them). Maintenance is a never-ending challenge when it comes to integrations, and what’s frustrating for many engineering teams is that a lot of these errors are hard to anticipate until you ship to prod and customers start using them. Not only that, once you gain adoption for the integration, customers will start asking you to support additional use cases, which compounds the number of integration features your team needs to maintain.
The kicker is, we haven't even covered the challenges with maintaining/refactoring your integration infrastructure as the number of requests it needs to execute scales, but that’s for us to dive into in a separate article.
If you’re interested in exploring platforms to help you offload maintenance and accelerate integration development, here's a build vs. buy guide you should check out.