Tutorials

March 25, 2026

How to Build a QA Process for Your Intercom Support Team (From Scratch)

Varun Arora

Intercom is built for fast, personal customer support. But here's something most Intercom teams discover the hard way: Intercom tells you how fast your team responds. It doesn't tell you how well they respond.

CSAT surveys show you how customers felt. Conversation ratings give you volume trends. But neither tells you whether your agents followed the right process, used the right tone, or actually resolved the problem correctly. That's the gap Quality Assurance fills.

If you're running a support team on Intercom and don't have a QA process in place, this guide will walk you through building one from scratch. No fluff, no theory. Just the steps that actually work, based on how high-performing support teams operate.

Why Intercom Teams Need a Separate QA Process

Before we get into the how, it's worth understanding why Intercom's built-in tools aren't enough on their own.

Intercom's CX Score

CX Score is Intercom's newest quality metric. It uses AI to rate every conversation across resolution, sentiment, and service quality. It's a step forward from CSAT, which depends on customers voluntarily filling out surveys. But Intercom's own documentation is clear: CX Score is not intended for assessing individual teammate performance. It measures customer experience, not whether your agents followed your internal standards.

Intercom's Monitors

Monitors is Intercom's closest offering to native QA. It lets you set up scorecards and review Fin AI conversations against defined criteria. But as of early 2026, Monitors is in closed beta and currently focused on Fin conversations. Human agent QA is listed as an upcoming feature, not a current one.

Conversation ratings (CSAT)

These are useful but structurally limited. Response rates are typically low (often under 20%), the feedback is biased toward extremes (very happy or very frustrated customers), and a good CSAT score doesn't mean the agent followed your process. An agent could give a technically incorrect answer in a friendly tone and still get a smiley face. Intercom's own blog acknowledges this limitation: as they put it in their quality benchmark report, relying on any single metric gives you an incomplete picture of your team's performance.

This means if you want to know whether your agents are consistently delivering support that meets your internal standards, you need a process that lives outside of what Intercom provides natively.

Step 1: Define What 'Good' Looks Like for Your Team

Every QA process starts with this question: What does a great support conversation look like at your company?

Don't overthink this. Start with 3-5 things you care about most. For most Intercom teams, these fall into three pillars:

Soft Skills

This covers tone, empathy, personalisation, and professionalism. Did the agent greet the customer by name? Did they acknowledge the frustration before jumping into a solution? Did they sound like a human or a template?

Intercom themselves use an internal framework called PREACH: Proud, Responsible, Empathetic, Articulate, Concise, and Human. You don't have to copy it, but it's a useful reference for what "good tone" looks like in a messaging-first support environment.

Issue Resolution

Did the agent actually solve the problem? Did they solve it correctly? Did they confirm resolution with the customer before closing the conversation? This is where you catch agents who close tickets prematurely or give technically inaccurate answers that happen to satisfy the customer in the moment.

Process Adherence

Did the agent follow your internal procedures? This includes things like: using the correct tags and attributes, escalating correctly when needed, adding internal notes for context, following up when promised, and using macros appropriately rather than sending copy-pasted responses that miss the context of the conversation.

Pro tip:

Write these standards down in a shared document. Your agents need to know what they're being evaluated on before you start evaluating them. Surprise QA breeds resentment. Transparent QA builds trust.

Step 2: Build Your QA Scorecard

A scorecard turns your quality standards into something measurable. It's the rubric your reviewers will use to evaluate every conversation.

Here's a simple scorecard structure that works well for Intercom teams:

| Category | What to evaluate | Suggested weight |
| --- | --- | --- |
| Tone & Empathy | Friendly, personalised, empathetic. No robotic language. | 20% |
| Accuracy | Correct information. No misleading or wrong answers. | 25% |
| Resolution | Problem fully solved. Customer confirmed or clearly satisfied. | 25% |
| Process | Correct tags, notes, escalation paths, macros used properly. | 15% |
| Efficiency | No unnecessary back-and-forth. Clear, concise communication. | 15% |

The weights above are starting points. Adjust them based on what matters most to your team. If you're in a regulated industry, process adherence might carry 40%. If you're a consumer brand, tone and empathy might dominate. The key is that your scorecard reflects your definition of quality, not a generic one.
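To make the weighting concrete, here's a minimal sketch of how a single conversation's score could be computed from the table above. The category names and weights mirror the example scorecard; the function and variable names are just illustrative.

```python
# Weighted QA score for one reviewed conversation, using the example
# categories and weights from the scorecard table above.
WEIGHTS = {
    "tone_empathy": 0.20,
    "accuracy": 0.25,
    "resolution": 0.25,
    "process": 0.15,
    "efficiency": 0.15,
}

def conversation_score(ratings: dict[str, float]) -> float:
    """ratings maps each category to the reviewer's 0-100 score."""
    return sum(WEIGHTS[category] * ratings[category] for category in WEIGHTS)

# Example: strong tone and resolution, weak process adherence.
print(conversation_score({
    "tone_empathy": 90,
    "accuracy": 80,
    "resolution": 85,
    "process": 50,
    "efficiency": 75,
}))  # -> 78.0
```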

If you want a deeper dive on scorecard design, Help Scout’s guide to support QA and Front’s scorecard template are both excellent references.

Step 3: Decide What to Review (and How Often)

You can't review every conversation manually. The industry standard for manual QA is somewhere between 2% and 5% of total conversations. The question is which 2-5% to pick.

There are four practical sampling strategies for Intercom teams:

  1. Random sampling gives you an unbiased baseline. Pull a random set of conversations each week across all agents. This is the best way to get a true picture of your team's quality when you're starting out. (A sketch of automating this weekly pull appears at the end of this step.)

  2. Targeted sampling focuses on high-impact conversations. Filter by low CSAT ratings, conversations with many replies (which often indicate confusion or back-and-forth), escalated conversations, or conversations involving VIP customers. Intercom's reporting lets you filter and export conversations by these attributes.

  3. New hire reviews should be a default. Every new agent's first 2-4 weeks of conversations should be reviewed at a much higher rate, ideally 50-100%. This is where you catch training gaps before they become habits.

  4. Complaint-triggered reviews catch the conversations that matter most. When a customer complains, escalates, or churns, reviewing the conversation that preceded it gives you the clearest signal of what went wrong.

It takes roughly 10-15 minutes per conversation review once you have a scorecard in place, so a manager sampling 30-40 conversations a week is investing around 7-8 hours of review time.
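If you'd rather automate the weekly pull than export CSVs, here's a rough sketch of random sampling via Intercom's conversation search endpoint. It assumes an access token with read permissions, only fetches the first page of results, and leaves pagination and error handling out; treat it as a starting point to check against the current API reference, not a finished script.

```python
import random
import time

import requests

INTERCOM_TOKEN = "YOUR_ACCESS_TOKEN"  # placeholder
SAMPLE_RATE = 0.03                    # review roughly 3% of last week's volume

# Search for conversations created in the last 7 days (first page only).
resp = requests.post(
    "https://api.intercom.io/conversations/search",
    headers={
        "Authorization": f"Bearer {INTERCOM_TOKEN}",
        "Accept": "application/json",
        "Content-Type": "application/json",
    },
    json={
        "query": {
            "field": "created_at",
            "operator": ">",
            "value": int(time.time()) - 7 * 24 * 3600,
        }
    },
    timeout=30,
)
resp.raise_for_status()
conversations = resp.json().get("conversations", [])

# Pick an unbiased sample for this week's manual reviews.
sample_size = max(1, int(len(conversations) * SAMPLE_RATE))
for convo in random.sample(conversations, min(sample_size, len(conversations))):
    print(convo["id"])
```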

Step 4: Set Up Your Review Workflow in Intercom

Here’s where it gets practical. Intercom doesn’t have a native QA workflow, so you need to build one outside of Intercom.

Why Spreadsheets and Manual QA Don’t Work

The default instinct is to start with a spreadsheet. Export conversations from Intercom, open them one by one, score them in Google Sheets, and share feedback over Slack or in 1:1s. Most teams try this. Most teams also abandon it within a month. Here’s why:

Intercom’s CSV export gives you conversation metadata, but not the actual message content. So you’re either clicking into every conversation manually in the Inbox, or building API scripts to pull full transcripts. Before you’ve scored a single conversation, you’ve already spent hours on data plumbing.
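To make the "data plumbing" concrete, here's roughly what pulling one transcript through Intercom's REST API looks like. The token and conversation ID are placeholders, and message bodies come back as HTML fragments you'd still need to clean up:

```python
import requests

INTERCOM_TOKEN = "YOUR_ACCESS_TOKEN"  # placeholder
CONVERSATION_ID = "1234567890"        # placeholder

# Retrieve a single conversation, including its parts (the actual messages).
resp = requests.get(
    f"https://api.intercom.io/conversations/{CONVERSATION_ID}",
    headers={
        "Authorization": f"Bearer {INTERCOM_TOKEN}",
        "Accept": "application/json",
    },
    timeout=30,
)
resp.raise_for_status()
conversation = resp.json()

# Stitch the opening message and each reply into a plain-text transcript.
transcript = [conversation["source"].get("body") or ""]
for part in conversation["conversation_parts"]["conversation_parts"]:
    if part.get("body"):  # skip assignment/close events that carry no text
        author = part["author"].get("name") or part["author"]["type"]
        transcript.append(f"{author}: {part['body']}")

print("\n".join(transcript))
```

Multiply that by a weekly sample across every agent and the overhead adds up quickly, which is exactly the point.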

Then there’s the scoring itself. Spreadsheets don’t enforce consistency. Two reviewers scoring the same conversation will drift without calibration. There’s no audit trail, no way to track whether feedback was delivered, and no dashboards showing agent quality trends over time. You end up with a spreadsheet that’s two weeks out of date and a QA process that exists in theory but not in practice.

The biggest problem with manual QA isn't that it's slow. It's that it caps your coverage at 2-3% of conversations, which means 97% or more of your customer interactions go completely unreviewed. You're making decisions about agent quality based on a tiny, possibly unrepresentative sample. That's not quality assurance. That's guesswork.

What a Proper QA Workflow Looks Like

A dedicated QA tool that integrates directly with Intercom eliminates the logistics and lets you focus on what actually moves the needle: reviewing conversations and coaching your team. Here’s what the right tool should give you:

  • Automatic conversation ingestion. Conversations flow in from Intercom automatically. No CSV exports, no API scripts, no manual copying of links. You open the tool and your conversations are there, ready for review.

  • AI-powered scoring at scale. Instead of manually reviewing 2-3% of conversations, AI evaluates 100% of your volume against your scorecard criteria. Your team reviews the ones that are flagged, not random picks from a haystack. (A bare-bones sketch of this idea follows the list.)

  • Custom scorecards that match your standards. The scorecard you built in Step 2 should live inside your QA tool, not in a separate spreadsheet. This keeps scoring consistent across reviewers and ties directly to agent performance dashboards.

  • Agent-level dashboards and trends. You should be able to see, at a glance, how each agent is trending on accuracy, tone, process adherence, and resolution quality over weeks and months. This is what turns QA from a one-off audit into a continuous improvement system.

  • Built-in coaching workflows. Feedback should be tied directly to the reviewed conversation, visible to the agent, and trackable. If your review scores live in one place and your coaching conversations happen somewhere else, the loop never closes.
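To show what "AI evaluates 100% of your volume" looks like in practice, here's a bare-bones sketch of scoring one transcript against the Step 2 categories with an LLM. The SDK, model name, and prompt are illustrative assumptions rather than a recommendation of any provider, and a real QA tool layers calibration, flagging, and human review on top of this.

```python
# Bare-bones sketch of LLM-based scoring against scorecard categories.
# Assumes the OpenAI Python SDK with an API key in OPENAI_API_KEY;
# any LLM provider works the same way in principle.
import json

from openai import OpenAI

client = OpenAI()

SCORECARD = ["tone_empathy", "accuracy", "resolution", "process", "efficiency"]

def ai_score(transcript: str) -> dict:
    """Ask the model to rate one transcript 0-100 on each category."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        response_format={"type": "json_object"},
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a support QA reviewer. Rate the conversation 0-100 "
                    f"on each of: {', '.join(SCORECARD)}. "
                    "Reply with a JSON object keyed by category."
                ),
            },
            {"role": "user", "content": transcript},
        ],
    )
    return json.loads(response.choices[0].message.content)
```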

There are several QA tools that integrate with Intercom: Zendesk QA (formerly Klaus), Score AI (which we built specifically for this use case), MaestroQA, and Scorebuddy, among others. The right choice depends on your team size, budget, and whether you need QA for just Intercom or across multiple helpdesks. But the principle is the same regardless of which tool you pick: your QA process should not depend on your manager's discipline to maintain a spreadsheet. It should run on a system that makes quality visible, consistent, and actionable by default.

Step 5: Close the Loop with Coaching

This is where most QA programs fail. Teams invest in scorecards and reviews, then the scores sit in a spreadsheet and nothing changes. QA without coaching is just surveillance.

Here's how to make QA actually improve your team's performance:

Share feedback within 48 hours.

The longer you wait, the less the agent remembers about the conversation. Fast feedback is specific feedback.

Lead with what went well.

Every review should highlight at least one thing the agent did right. This isn't about being nice. It's about reinforcing the behaviours you want to see more of.

Be specific, not vague.

"Your tone needs work" is useless. "In this conversation, the customer expressed frustration in their second message, and your reply jumped straight to troubleshooting without acknowledging it. Try leading with something like 'I understand this is frustrating' before moving into the fix" is actionable.

Track improvements over time.

An agent who scored 65% on accuracy in Week 1 and 82% in Week 4 is making progress. Make that visible. Celebrate the improvement. If scores stay flat or decline, that's a signal for a deeper conversation about training gaps or role fit.

Step 6: Measure and Iterate

Once your QA process is running, you need to know if it's working. Track these metrics:

Quality Score (QS):

This is your core QA metric, sometimes called an Internal Quality Score (IQS). It's the average score across all reviewed conversations, expressed as a percentage. According to the 2022 Customer Service Quality Benchmark Report from Intercom, Klaus, and Support Driven, the average IQS across companies that track it rose from 81% to 89%. But what matters more than the absolute number is the trend. Are scores improving month over month?

CSAT correlation:

Compare your QA Score with your CSAT scores. If your QA Score is high but CSAT is low, your internal standards might not align with what customers actually value. If CSAT is high but QA Score is low, your customers might be easy to please, but your process has gaps that will bite you later.
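If you track both numbers per week or per agent, the comparison doesn't need anything fancier than a correlation coefficient. A minimal sketch with made-up data (statistics.correlation requires Python 3.10+):

```python
from statistics import correlation  # Python 3.10+

# Weekly averages over the same period: internal QA score vs. CSAT (%).
qa_scores = [78, 81, 84, 83, 87, 90]
csat_scores = [88, 86, 89, 91, 90, 93]

r = correlation(qa_scores, csat_scores)
print(f"QA/CSAT correlation: {r:.2f}")
# Close to +1: your scorecard tracks what customers value.
# Near zero or negative: your internal standards and customer perception
# are measuring different things, which is worth digging into.
```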

Review coverage:

What percentage of total conversations are you actually reviewing? If you're at 2% and want to get to 10%, that's a capacity decision. If you're at 2% and can't get above it without burning out your managers, that's a tool decision.

Coaching completion rate:

Are the feedback sessions actually happening? A QA score without a follow-up conversation is just a number.

Revisit your scorecard once a month:

Customer expectations evolve, your product changes, and your team's skill level grows. The QA process should evolve with them.

What About QA for Fin AI Agent?

If you're using Fin, Intercom's AI agent, you have an additional QA challenge. Fin is handling a growing share of your conversations, and its responses need quality oversight too.

Intercom's CX Score does cover Fin conversations, and their Monitors feature (currently in beta) is specifically designed for evaluating Fin's responses against scorecards. But if you don't have Monitors access yet, you can apply the same QA framework outlined above to Fin conversations (a small sampling sketch follows this list):

  • Sample Fin-handled conversations weekly and score them on accuracy, helpfulness, and whether they correctly escalated when needed.

  • Pay special attention to conversations where Fin resolved without a human handoff. These are the highest-risk for quality issues, because no human saw the conversation.

  • Track Fin's QA scores separately from your human agents. This gives you a clear picture of where Fin is performing well and where it needs tuning.
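If you want to pull Fin-handled conversations into their own review queue without Monitors, here's a rough sketch that filters a page of recent closed conversations client-side. The `ai_agent_participated` flag is an assumption to verify against the current Intercom API reference for your API version.

```python
import time

import requests

INTERCOM_TOKEN = "YOUR_ACCESS_TOKEN"  # placeholder

# Fetch last week's closed conversations (first page only for brevity).
resp = requests.post(
    "https://api.intercom.io/conversations/search",
    headers={
        "Authorization": f"Bearer {INTERCOM_TOKEN}",
        "Accept": "application/json",
        "Content-Type": "application/json",
    },
    json={
        "query": {
            "operator": "AND",
            "value": [
                {"field": "created_at", "operator": ">",
                 "value": int(time.time()) - 7 * 24 * 3600},
                {"field": "state", "operator": "=", "value": "closed"},
            ],
        }
    },
    timeout=30,
)
resp.raise_for_status()

# Keep only conversations Fin participated in, for a separate review queue.
# NOTE: 'ai_agent_participated' is assumed here -- confirm the field name
# against the Intercom API reference for your API version.
fin_conversations = [
    c for c in resp.json().get("conversations", [])
    if c.get("ai_agent_participated")
]
print(f"{len(fin_conversations)} Fin-handled conversations to sample from")
```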

As AI handles more of your volume, QA for AI agent responses becomes just as important as QA for human agents. Perhaps more so, because a bad AI response can be sent to thousands of customers before anyone notices.

Getting Started: Your First Two Weeks

Week 1

Define your quality standards. Build a simple scorecard. Pick 5 conversations per agent and review them. Score in a spreadsheet. Share feedback in your next 1:1.

Week 2

Repeat the reviews. Compare Week 1 and Week 2 scores. Run one group calibration session where two reviewers score the same conversation and compare notes. Adjust your scorecard based on what you learned.

That's it. You now have a functioning QA process. It's not perfect. It doesn't have to be. The goal in the first month isn't to build a perfect system. It's to build the habit of reviewing conversations, giving feedback, and tracking quality over time.

Once that habit is in place, you can decide whether to scale with a spreadsheet or bring in a tool. Either way, you'll be making that decision from a position of clarity rather than guesswork.