Platform

Use Cases

Developers

Resources

Pricing

For AI

Start for free

Book a demo

Platform

Use Cases

Developers

Resources

Pricing

For AI

Start for free

Book a demo

MVP to Production AI

How to Build Production-Ready RAG

Learn how to build a full stack enterprise knowledge AI application with data ingestion and permissions for RAG

View the GitHub repository for this tutorial

View the GitHub repo for this tutorial

Watch Video Tutorial

It's easier than ever to build interoperability into your AI product. There’s RAG, tools, MCP servers, and agent frameworks. But how do you make sure your integrated AI product is not just function, but also production-ready?

This tutorial series will help you build for production-readiness, starting with the data ingestion layer for RAG. We’ll go into the table stake features your production-ready AI application needs to have for RAG, including:

Seamless and secure access to 3rd-party data
Robust permissions & access control on 3rd-party data
Up-to-date data

In this chapter of our tutorial series, we’ll discuss what goes into each of these features for production-ready RAG and walk through step-by-step how to build them.

Production-grade RAG

By now, you've probably seen integration-enabled RAG (Retrieval Augmented Generation) in AI applications, where LLMs can be granted access to external data in the form of documents users give them or 3rd-party integrations users enable. A good example is our customer You.com, an enterprise AI knowledge agent product that enables users to upload files as context from OneDrive and Google Drive via file pickers.

Products like You.com’s agents aren’t just LLMs with local file uploading. Production-grade RAG requires the following:

Seamless and secure access for 3rd-party data ingestion

Your users’ data lives in other SaaS applications, whether it be file storage sources like OneDrive, CRMs, or other 3rd-party platforms that your users' context lives in.

Seamless and secure access to 3rd-party data requires the following:

Seamless: End-users should be able to authenticate into these 3rd-party SaaS platforms from within your application
Secure: End-users should be able to authorize your AI application to access their data, have your application request access/refresh tokens on their behalf, and keep those tokens securely

Robust permissions

Using your customers’ data from 3rd-party platforms also means respecting the 3rd-party permissions on that data. If an intern at one of your enterprise customers can’t access the folder of their colleague's performance reviews from the company's Google Drive, they definitely shouldn’t be able to access them from your RAG application.

With simple use cases and smaller applications, you might be able to get away with having each user isolated to their own set of documents or workspace for RAG. However, when building a product for larger user bases, teams, or enterprises, you’ll need to be able to handle different authorization patterns, such as propagating permissions across teams or admins authorizing access for an entire organization.

Up-to-date data

Your customers’ business context is constantly changing, especially in enterprises with large teams. Files are constantly being updated; new leads are constantly being added; new call recordings are constantly being updated.

As such, your AI product needs to keep up with your customers’ data, making sure that your AI can respond and decision on the most up-to-date context. Data permissions are similarly dynamic with events such as new employees being added to existing organizations, files being shared with additional colleagues, or employees leaving a team and therefore needing their access revoked.

Production-grade RAG needs to have mechanisms in place to change their underlying knowledge bases and permissions to reflect underlying changes in integrated data sources.

If those requirements seem daunting at first, that’s alright! We’ll walk you through the step-by-step process for building a production-ready AI application together.

Watch the video tutorial where we demo what we build live and walk through the entire implementation.

Tutorial Overview

YourApp.ai is an enterprise knowledge AI application that we built to represent your AI product. YourApp.ai is a multi-tenant application that can ingest its users’ external CRM and file data into its knowledge base and answer questions with that knowledge.

In addition to chat functionality, our application also has a dashboard view of data synced for RAG (as seen in the second screenshot). The dashboard view in YourApp.ai shows what data has been ingested and indexed for RAG retrieval (shown in the Objects Synced table) as well as when data was last retrieved and updated (shown in the Sync Events table).

In this tutorial, we separated the development process for YourApp.ai into three different layers: the UI, the integrations infrastructure, and the RAG/AI backend.

As we build out each layer, we’ll reference back to our production-ready requirements and show how each requirement fits into the YourApp.ai application.

Let’s get started building your AI app!

Step 1: The UI Layer

The UI layer (or client layer) is the end-user-facing part of our application. This layer encompasses the user authentication and AI chat features in the YourApp.ai application.

Quick Note: YourApp.ai is an AI chat product; however, if your AI product is not chat-based, the first section (User Authentication) is still relevant, but feel free to skim/skip the second section (AI Chat).

Step 1a: User Authentication

Seamless and secure access to 3rd-party data starts with secure authorization to our application. We first needed secure authentication for YourApp.ai before we could address accessing customer data from their 3rd-party SaaS platforms.

Our enterprise application needed to support different authentication practices ****- user/password, SSO, OAuth, MFA, email verification, bot detection. Rather than hand-roll our own authentication, we went with WorkOS’s AuthKit to make this process simpler.

AuthKit works like OAuth, where WorkOS authenticates users with redirects and authorization codes. But rather than implementing the logic in our application ourselves, AuthKit (WorkOS’s node.js library) was used with our frontend framework to abstract that logic.

For our Next.js implementation, all we needed to do to implement production-ready authentication was:

Set up our login and redirect URLs in the WorkOS dashboard
Add middleware (code run before each request in our app to enforce auth)

import { authkitMiddleware } from '@workos-inc/authkit-nextjs';

//we can specify what routes we allow unauthenticated
export default authkitMiddleware({
  middlewareAuth: {
    enabled: true,
    unauthenticatedPaths: ["/api/sync", "/api/sync/(.*)"],
  },
});

Set up two routes (one for the redirect callback, another for logging in) using AuthKit

//callback route
import { handleAuth } from '@workos-inc/authkit-nextjs';

export const GET = handleAuth();

/////////////////////////////////////////////////////////////
//login route
import { getSignInUrl } from '@workos-inc/authkit-nextjs';
import { redirect } from 'next/navigation';

export const GET = async () => {
	const signInUrl = await getSignInUrl();

	return redirect(signInUrl);
};

That’s actually the full code to set up production-ready authentication using WorkOS. For the full picture on AuthKit, check out their AuthKit documentation.

Step 1b: AI Chat

The chat UI for YourApp.ai was built using the AI SDK by Vercel, a node.js library for building AI apps. Using their useChat React hook, we could track messages, prompts, and chat state easily in our chat UI component.

import { useChat } from "@ai-sdk/react";
import { MarkdownMessage } from "./markdown-message";

export const Chat = () => {
  const { messages, input, handleInputChange, handleSubmit } = useChat({
    maxSteps: 3
  });
	///...
  return (
    <div className="flex flex-col max-w-screen w-[700px] mx-auto h-full">
      <div className="space-y-4 flex-1 overflow-y-auto pr-2">
        {messages.map(m => (
          <MarkdownMessage m={m} key={m.id} />
        ))}
      </div>
      <form onSubmit={handleSubmit} className="flex-shrink-0 py-4">
        <textarea id="chatTextArea"
          rows={3}
          className="bg-muted w-full p-2 border border-gray-300 rounded shadow-xl"
          value={input}
          placeholder="Say something..."
          onChange={handleInputChange}
        />
      </form>
    </div>
  )
}

As you can see, the messages, user input, and submissions were all handled by the useChat hook, making it easy to implement smooth chat UIs with streaming, message history, tool calling, and more.

Step 2: The Integration Infrastructure Layer

This layer encompasses all the integration logic and infrastructure for working with 3rd-party providers like Google Drive, Box, Salesforce, etc. For RAG, this encompasses:

syncing all the data in these 3rd-party data sources into our vector database*
enforcing the native permissions from these 3rd-party data sources
keeping all of this synced data and permissions up-to-date.

*Note: there are RAG implementations that use 3rd-party API calls at query-time instead of ingesting a copy of users' data into a vector database. While 3rd-party API tool calling will be covered in the next chapter of our tutorial series, our evaluations show that vector search proves to be more accurate for RAG. Check out the full study we did on querying with tools vs vector search.

You’ll notice this touches all 3 of our production requirements: seamless and secure access to 3rd-party data, robust permissions, and up-to-date data.

Challenges with integrations for production-grade RAG

I’ll use an example to help contextualize integration logic and infrastructure. Imagine you need to ingest all of your users’ Salesforce data for RAG.

Assuming you've already set up the Salesforce OAuth flow and refresh mechanisms, you’d need to research the Salesforce API endpoint to retrieve our users’ Contacts, Leads, and Accounts data and store it in our vector database
Next, you would need to research how Salesforce handles permissions - users can access Contacts data but not Leads, users can be part of a group that has access to Accounts, etc. - and store those permissions in an ACL, table, or graph database
We also need webhooks or CRON-based services to pull any data or permissions changes
Lastly, you need to put all of this together with a RAG pipeline that retrieves Salesforce data from your vector database and consults your database of permissions to authorize context to your LLM, before returning the LLM response to users

If this seems like a lot to handle, we can tell you firsthand that it is - especially when you bring in other 3rd-party data sources into the fold. In fact, we’ve actually put together previous tutorials outlining this exact process.

Having gone through this complex implementation ourselves, in this updated tutorial we’ll be using:

Paragon’s platform and SDK for 3rd-party authentication
Managed Sync - Paragon’s API product built for RAG and permissions - to greatly reduce the complexity we described above.

Let’s start with the 3rd-party authentication.

Step 2a: Seamless and Secure Access to 3rd-Party Data

Starting with the first production-ready requirement, we’re using Paragon to handle the end-user authentication to each 3rd-party data source. Whereas WorkOS handles all the facets of authenticating to our app, Paragon handles all the nuances with enabling users to authenticate and sync 3rd-party data.

This means that users can authenticate into Google Drive or any other 3rd-party provider directly in YourApp.ai and authorize our app to sync their data to our database for RAG.

With Paragon, a fully white-labeled authentication portal can be embedded in our application, allowing end users to log in to their 3rd-party platform (Salesforce in this example) in our application. Similar to WorkOS, Paragon has a node.js SDK that works with any frontend framework. A few lines of code is all it takes to render our 3rd-party authentication portal.

<IntegrationTile
  integration={integration}
  onConnect={() => paragon!.connect(integration.type, {})}
  key={integration.type}
/>

Step 2b: A Managed Approach to Data Ingestion and Refreshes

Managed Sync is a Paragon product that makes it easy to ship production-ready integrations for data ingestion. There are 2 components of Managed Sync - the Sync API for performant data ingestion for RAG and the Permissions API for native permissions on ingested data.

Starting with the Sync API, we used the Sync API to create data pipelines managed by Paragon’s integration infrastructure for YourApp.ai. This means we didn’t need to worry about what 3rd-party APIs to use, scaling microservices, keeping queues for retries and failures, monitoring, etc. We do still need to handle the vector database indexing and RAG pipeline, but Managed Sync “manages” the integration infrastructure we need to get 3rd-party data to the vector database indexing step.

Here’s how it works:

Essentially, a single API call was used to trigger a background job - Sync pipeline - that pulls all file data (or record data for CRMs) from our integration providers.

//triggers a sync pipeline
curl --request POST <https://managed-sync.useparagon.com/sync> \\
  --header 'Authorization: Bearer <token>' \\
  --header 'Content-Type: application/json' \\
  --data '{
  "integration": "googledrive",
  "pipeline": "files",
  "configuration": {}
}'

After triggering a Sync pipeline, YourApp.ai would then listen to Managed Sync’s webhooks to be notified whenever a sync is completed (can be configured to sync data every day, every hour, or even every minute). These webhooks ensured that our application keeps up-to-date data in our vector database via file_updated webhooks in the Google Drive example.

When our app received the webhook that a sync had completed and the data was ready to be pulled, we triggered the following methods to pull the contents of each synced file or record and index it to our Pinecone vector database.

The pullSyncedRecords method was used to call the Sync API to get a paginated data set of file/record data.

export const pullSyncedRecords = async (user: string, syncId: string, headers: Headers, cursor?: string): Promise<Array<string>> => {
  let erroredRecords: Array<string> = []

  const recordRequest = await fetch(process.env.MANAGED_SYNC_API + `/sync/${syncId}/records?pageSize=100&${cursor ? `cursor=${cursor}` : ""}`, {
    method: "GET",
    headers: headers,
  });
  const recordResponse: SyncedRecords = await recordRequest.json();
  for (const data of recordResponse.data) {
    const indexResponse = await indexRecordContent(user, syncId, headers, data.id);
    if (!indexResponse.success && indexResponse.erroredRecord) {
      erroredRecords.push(indexResponse.erroredRecord);
    }
  }
  if (recordResponse.paging.remaining_records > 0) {
    let newErroredRecords: Array<string> = await pullSyncedRecords(user, syncId, headers, recordResponse.paging.cursor);
    erroredRecords = erroredRecords.concat(newErroredRecords);
  }
  return erroredRecords;
}

For CRMs, pulling the record data is enough. However, for file storage providers, the record data returns the filename, update times, and links to the original file. For the actual file contents, we needed an additional step, where the pullSyncedRecords method was used to call the indexRecordContent method. This method was used to call an additional Sync API endpoint to pull the file/record text contents and index the text to our Pinecone vector database.

const indexRecordContent = async (user: string, syncId: string, headers: Headers, recordId: string): Promise<{ success: boolean, erroredRecord?: string }> => {
  const contentRequest = await fetch(process.env.MANAGED_SYNC_API + `/sync/${syncId}/records/${recordId}/content`,
    {
      method: "GET",
      headers: headers,
    });
  const metadata: Array<SyncedObject> = await getSyncedObjectById({ id: recordId });
  const contentResponse = await contentRequest.json();

  try {
    if (metadata.length > 0) {
      const numUpserted = pineconeService.upsertText({
        text: contentResponse as string,
        namespaceName: user,
        metadata: {
          url: metadata[0].data.url,
          record_name: metadata[0].data.name,
          source: metadata[0].source
        }
      });
      return { success: true }
    }
    return { success: false, erroredRecord: recordId };
  } catch (err) {
    console.error(`[INDEX] unable to index record ${recordId}: ${err}`)
    return { success: false, erroredRecord: recordId };
  }
}

In the indexRecordContent method, it’s important to note that the /sync/${syncId}/records/${recordId}/content endpoint returns data in a normalized format for each object type. All File Storage objects (Google Drive, Dropbox, Box files) will have the exact same normalized schema, meaning we don’t need integration-specific logic for each 3rd-party. The Sync API likewise has normalized formats for CRM records and Documents.

In YourApp.ai, we wanted to give users a view of data we synced from the Sync API - first in Object Synced to view all files/records ready for RAG retrieval and then in the Sync Events to see up-to-date data changes.

Now that we’ve walked through how data is synced to our vector database for RAG, we’ll discuss how to use Managed Sync’s Permissions API, designed to be used alongside the Sync API to enforce Robust Permissions for production-grade RAG.

Step 2c: A Managed Approach to Permissions

Permissions can be one of the largest challenges for RAG pipelines that handle 3rd-party data. If you recall our Salesforce example:

Next, you need to research how Salesforce handles permissions - users can access Contacts data but not Leads, users can be part of a group that has access to Accounts, etc. - and store those permissions in an ACL, table, or graph database

Trying to mirror those native permissions can be a difficult task that you cannot get wrong in production. Similar to syncing data, you would need to store the 3rd-party permissions data in an ACL or graph database. Building across multiple 3rd-party integrations, your permissions graph could look something like this.

If you’re interested in learning about the nuances of permissions with multiple integrations, read our tutorial on what it’s like to build a permissions system from scratch rather than a managed approach.

Rather than try to model the native permissions behavior from those 3rd-parties, Managed Sync’s Permissions API manages the permissions modeling of the 3rd-party, allowing us to just check access with a simple API request.

curl --request POST <https://managed-sync.useparagon.com/permissions/check> \\
  --header 'Authorization: Bearer <token>' \\
  --header 'Content-Type: application/json' \\
  --data '{
  "object": {},
  "user": {},
  "role": "<string>"
}'

We’ll be using this simple Permissions API check in the RAG & AI layer to ensure file/record permissions are enforced on RAG context.

Step 3: RAG & AI Layer

Our last layer is the backend AI implementation that takes the data & permissions from the integration infrastructure layer and puts it to use in our production-grade RAG pipeline. We’ll go over how we:

Indexed and searched the synced file/record data with Pinecone
Implemented a permissions step with Permissions API

Step 3a: Indexing to Pinecone

There were a few reasons we chose Pinecone for YourApp.ai’s vector database.

We were looking for a managed vector database that supported key features like namespaces and metadata (more on this later)
Pinecone’s integrated inference gives their users access to a suite of embedding and reranking models that can be used directly with their API

With Pinecone’s upsertRecords API, we were easily able to implement a function that took chunked text data and file/record metadata (such as file names and source URLs) and indexed them to Pinecone.

public async upsertText({ text, metadata, namespaceName }: { text: string, metadata: any, namespaceName: string }): Promise<any> {
  const chunkSize = 512;
  const overlap = 20;
  const data = [];

  let end = 0;
  let beg = 0;
  while (end < text.length) {
    end = beg + chunkSize;
    data.push({
      ...metadata,
      id: v4(),
      chunk_text: end >= text.length ? text.slice(beg) : text.slice(beg, end),
    });
    beg = end - overlap;
  }

  const namespace = this._pc.index(process.env.PINECONE_INDEX!).namespace(namespaceName);
  const numUpserted = await namespace.upsertRecords(data);
  return numUpserted;
}

On the RAG vector retrieval side, we used the Pinecone searchRecords API to not only search for semantically similar text, but also rerank results for better RAG performance (if you’re interested in how to fine-tune RAG performance, read this study our team conducted).

public async retrieveContext({ query, namespaceName }: { query: string, namespaceName: string }): Promise<any> {
  const namespace = this._pc.index(process.env.PINECONE_INDEX!).namespace(namespaceName);
  const searchWithText = await namespace.searchRecords({
    query: {
      topK: 5,
      inputs: { text: query },
    },
    fields: ['chunk_text', 'url', 'record_name', 'source'],
    rerank: {
      model: 'bge-reranker-v2-m3',
      rankFields: ['chunk_text'],
      topN: 3,
    },
  });

  return searchWithText;
}

Notice that in neither method did we need to implement a method for transforming text to vector. It’s thanks to Pinecone Integrated Inference that we mentioned earlier! While Pinecone of course supports custom embeddings and search, their managed solution is a good choice to use their tried-and-true models.

Step 3b: Permissions as Part of the RAG Pipeline

As Robust Permissions is part of our product-ready requirements, it’s an essential part of YourApp.ai’s RAG pipeline.

Managed Sync, despite the name, doesn’t just manage data syncs; it also manages permissions, as we’ve gone over in step 2c. For YourApp.ai, Managed Sync has ingested and indexed a database that models the native 3rd-party permissions where our indexed vector database was sourced from - all on Paragon’s infrastructure. This means we do NOT need to index our own permissions database. Instead, we can consult the Permissions API right in our RAG pipeline.

After context is retrieved from Pinecone, YourApp.ai has a subsequent step that checks each source with the Permissions API (this is why it’s important that vector databases support metadata; also notice that in our application screenshot below, we rendered text previews, file names, and the link to the original URL using metadata).

const enforcePermissionsOnContext = async (contexts: Array<any>, userId: string): Promise<Array<any>> => {
  const permRequest = await fetch(process.env.MANAGED_SYNC_URL + "/permissions/list-objects", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${session.paragonUserToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      user: {
        id: userId
      },
      role: 'reader',
    }),
  });
  const permResponse = await permRequest.json();
  const permObjects: Array<any> = permResponse.objects;
  const permSet = new Set(permObjects.map((obj) => {
    return obj.id;
  }));
  const allowedContext = contexts.filter((context) => permSet.has(context.filename.split("/").at(-1)));
  return allowedContext;
}

This method ensures only allowed context gets to the LLM for answer synthesis and generation.

Wrapping Up

In this tutorial, we covered requirements necessary for RAG-enabled SaaS applications to go from MVP to production - seamless and secure access to data, robust permissions, and up-to-date data.

Building for these 3 requirements can certainly be a lot for product and engineering teams. Products like WorkOS’s Authkit, Pinecone’s Integrated Inference, and Paragon’s Managed Sync make shipping production-ready AI features easier and faster.

In the next chapter, we’ll be talking about tool calling for production-ready AI. Subscribe to Inference - a monthly newsletter by the team at Paragon to help you take your AI features from MVP to production - to be notified when the newest tutorial drops.

If you’re interested in Paragon, Managed Sync, and how we can help with the data ingestion and permissions components of your RAG pipeline, reach out to talk with our team and get a personalized demo. See you next time!

CHAPTERS

Table of contents will appear here.

Access GitHub repository

Jack Mu

,

Developer Advocate

mins to read

Ship native integrations 7x faster with Paragon

Explore the platform

Permissions for RAG Deep Dive

Ready to get started?

Join hundreds of SaaS companies that are scaling their integration roadmaps with Paragon

Book a demo

Start free trial

Ready to get started?

Join hundreds of SaaS companies that are scaling their integration roadmaps with Paragon

Book a demo

Start free trial

Ready to get started?

Join hundreds of SaaS companies that are scaling their integration roadmaps with Paragon

Book a demo

Ready to get started?

Join hundreds of SaaS companies that are scaling their integration roadmaps with Paragon

Book a demo

Start free trial

The embedded integration platform for developers

Platform

Authentication

Connect Portal

All integrations

Custom integration builder

Monitoring

Cloud & Self-Hosting

Use Cases

Ingestion for RAG

Integrations for AI Agents

Real-time Sync

Pre-built Automations

Workflow Builder Actions

Products

Workflows

ActionKit

Developers

Documentation

API Reference

Changelog

Status page

Support

Get Started

Connect SDK

Displaying the Connect Portal

Building Workflows

Sample App

Popular Tutorials

RAG & AI Agents

Salesforce integration

HubSpot integration

Slack integration

Popular Categories

CRM

File Storage

Project Management

Documents

Resources

Security

Blog

Customer stories

Careers

Product tour

The embedded integration platform for developers

Platform

Authentication

Connect Portal

All integrations

Custom integration builder

Monitoring

Cloud & Self-Hosting

Use Cases

Ingestion for RAG

Integrations for AI Agents

Real-time Sync

Pre-built Automations

Workflow Builder Actions

Products

Workflows

ActionKit

Developers

Documentation

API Reference

Changelog

Status page

Support

Get Started

Connect SDK

Displaying the Connect Portal

Building Workflows

Sample App

Popular Tutorials

RAG & AI Agents

Salesforce integration

HubSpot integration

Slack integration

Popular Categories

CRM

File Storage

Project Management

Documents

Resources

Security

Blog

Customer stories

Careers

Product tour

The embedded integration platform for developers

Platform

Authentication

Connect Portal

All integrations

Custom integration builder

Monitoring

Cloud & Self-Hosting

Use Cases

Ingestion for RAG

Integrations for AI Agents

Real-time Sync

Pre-built Automations

Workflow Builder Actions

Products

Workflows

ActionKit

Developers

Documentation

API Reference

Changelog

Status page

Support

Get Started

Connect SDK

Displaying the Connect Portal

Building Workflows

Sample App

Popular Tutorials

RAG & AI Agents

Salesforce integration

HubSpot integration

Slack integration

Popular Categories

CRM

File Storage

Project Management

Documents

Resources

Security

Blog

Customer stories

Careers

Product tour

The embedded integration platform for developers

Platform

Authentication

Connect Portal

All integrations

Custom integration builder

Monitoring

Cloud & Self-Hosting

Use Cases

Ingestion for RAG

Integrations for AI Agents

Real-time Sync

Pre-built Automations

Workflow Builder Actions

Products

Workflows

ActionKit

Developers

Documentation

API Reference

Changelog

Status page

Support

Get Started

Connect SDK

Displaying the Connect Portal

Building Workflows

Sample App

Popular Tutorials

RAG & AI Agents

Salesforce integration

HubSpot integration

Slack integration

Popular Categories

CRM

File Storage

Project Management

Documents

Resources

Security

Blog

Customer stories

Careers

Product tour

How to Build Production-Ready RAG

Production-grade RAG

Tutorial Overview

Step 1: The UI Layer

Step 1a: User Authentication

Step 1b: AI Chat

Step 2: The Integration Infrastructure Layer

Step 2a: Seamless and Secure Access to 3rd-Party Data

Step 2b: A Managed Approach to Data Ingestion and Refreshes

Step 2c: A Managed Approach to Permissions

Step 3: RAG & AI Layer

Step 3a: Indexing to Pinecone

Step 3b: Permissions as Part of the RAG Pipeline

Wrapping Up

CHAPTERS

TABLE OF CONTENTS

Jack Mu

,

Ship native integrations 7x faster with Paragon

Ready to get started?

Ready to get started?

Ready to get started?

Ready to get started?