Calibrations: Advanced Features

Team consensus and analytics reporting

Group Calibration Sessions

Group calibrations allow multiple calibrators to review conversations together and reach a consensus, creating a "gold standard" against which the AI's evaluations can be measured.
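
Conceptually, each criterion in a group calibration pairs the AI's value with each calibrator's value and the consensus the team selects. The minimal Python sketch below models that comparison; the field names are assumptions for illustration, not the product's actual schema:

    # Hypothetical model of one criterion inside a group calibration.
    # Field names are illustrative, not the product's actual schema.
    criterion_review = {
        "criterion": "Agent verified the customer's identity",
        "ai_value": "Pass",
        "calibrator_values": {"Alice": "Pass", "Ben": "Fail"},
        "consensus_value": None,  # set when the team agrees
    }

    # The consensus becomes the "gold standard" the AI is measured against.
    criterion_review["consensus_value"] = "Fail"
    ai_agrees = criterion_review["ai_value"] == criterion_review["consensus_value"]
    print(ai_agrees)  # False -- this criterion counts toward AI variance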

Starting a Group Calibration

  1. From a calibration report, click Start Group Calibration

  2. Select the conversations you want to calibrate as a group using the checkboxes

  3. Click Done Selecting

Note: You can change the selected conversations later if needed by clicking "Change Conversations".

Completing a Group Calibration

Once started, the Group Calibration section appears at the top of the report.

Review Process

For each selected conversation:

  1. Expand the conversation in the accordion

  2. Review each playbook and criterion

  3. Select the correct evaluation from the available options:

    • AI's evaluation

    • Each calibrator's evaluation

Consensus Selection

Radio buttons display:

  • "AI: [value]" - The AI's evaluation

  • "[Calibrator Name]: [value]" - Each calibrator's evaluation

Each option also displays any remarks provided by the AI or calibrator.

Select the option that the team agrees represents the correct evaluation.

Progress Tracking

Each conversation shows a checkmark icon when all criteria have been reviewed and consensus values selected.
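
That completeness rule is easy to state precisely: a conversation is done once every criterion has a consensus value selected. A minimal sketch, again with a hypothetical record shape:

    # Hypothetical shape: one conversation with several criterion reviews.
    conversation = {
        "id": "conv-123",
        "reviews": [
            {"criterion": "Greeting", "consensus_value": "Pass"},
            {"criterion": "Identity verification", "consensus_value": None},
        ],
    }

    def is_complete(conversation):
        """True once every criterion has a consensus value selected."""
        return all(r["consensus_value"] is not None for r in conversation["reviews"])

    print(is_complete(conversation))  # False -- one criterion is unresolved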

Saving and Completing

  • Save Progress: Saves your current selections without completing the calibration (you can return later)

  • Complete Calibration: Finalizes the group calibration (only enabled when all conversations are complete)

Group Calibration Report View

After completing a group calibration, the report switches to a tabbed view:

Group Calibration Tab

Displays a table comparing AI evaluations vs. consensus values:

  • AI Column: The AI's original evaluation

  • Consensus Column: The group's agreed-upon correct evaluation

  • AI Variance Row: Shows the percentage of criteria where the AI's evaluation differed from the consensus

Highlighted cells (red/pink background) indicate where the AI differed from the consensus.
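
In effect, the AI variance row reports the share of criteria where the AI's value differs from the consensus. A small sketch of that calculation (the report computes this for you; the data shape here is assumed):

    # (ai_value, consensus_value) pairs for each criterion in the sample.
    evaluations = [
        ("Pass", "Pass"),
        ("Fail", "Pass"),  # disagreement -> a highlighted cell
        ("Pass", "Pass"),
        ("Fail", "Fail"),
    ]

    disagreements = sum(1 for ai, consensus in evaluations if ai != consensus)
    variance_pct = 100 * disagreements / len(evaluations)
    print(f"AI variance: {variance_pct:.0f}%")  # AI variance: 25%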

Individual Calibration Tab

Shows the original individual calibration table with all calibrators' evaluations.

Using Group Calibration Insights

The completed group calibration provides valuable data.

High AI variance on specific criteria may indicate:

  • The playbook criterion needs clearer definition

  • The AI needs retraining on edge cases

  • The criterion might be too subjective for consistent evaluation

Use these insights to:

  1. Update playbook descriptions and examples

  2. Create test cases from consensus evaluations (see the sketch after this list)

  3. Train the AI on the consensus values

  4. Adjust scoring weights if needed
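
Step 2 above amounts to freezing each consensus value as an expected result that future AI evaluations can be checked against. A hedged sketch of that transformation, using the same hypothetical record shape as the earlier examples:

    # Turn consensus evaluations into regression test cases (hypothetical shape).
    consensus_reviews = [
        {"conversation_id": "conv-123", "criterion": "Greeting",
         "consensus_value": "Pass"},
        {"conversation_id": "conv-123", "criterion": "Identity verification",
         "consensus_value": "Fail"},
    ]

    test_cases = [
        {
            "conversation_id": r["conversation_id"],
            "criterion": r["criterion"],
            "expected": r["consensus_value"],  # the gold standard
        }
        for r in consensus_reviews
    ]

    # Re-running the AI on conv-123 after a playbook or model change and
    # comparing against "expected" catches regressions on these criteria.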

Analytics & Performance Tracking

Monitor your calibration program's effectiveness through the Analytics dashboard.

Accessing Calibration Analytics

Navigate to Analytics > QA Performance > Calibrations

Filtering Data

Use the Completed Date filter to analyze specific time periods:

  • Last 7 days

  • Last 30 days

  • Custom date ranges

Interpreting the Data

Healthy calibration programs typically show:

  • Consistent weekly or daily completion rates

  • Gradual decrease in variance over time (as playbooks improve)

  • High completion rates relative to scheduled sessions

Warning signs to watch for:

  • Multiple missed or overdue sessions

  • Increasing variance over time

  • Single calibrator completing all reviews (should be distributed)
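
Both the healthy signs and the warning signs reduce to two simple series: variance per period and completions versus scheduled sessions. A minimal sketch of checking them, with illustrative numbers (the dashboard surfaces these figures for you):

    # Weekly AI variance percentages, oldest to newest (illustrative).
    weekly_variance = [22.0, 18.5, 15.0, 16.0, 12.5]
    completed, scheduled = 18, 20  # sessions in the same period

    def is_trending_down(series, tolerance=2.0):
        """Roughly downward: each point at most `tolerance` above the previous."""
        return all(b <= a + tolerance for a, b in zip(series, series[1:]))

    print("Variance trending down:", is_trending_down(weekly_variance))  # True
    print(f"Completion rate: {100 * completed / scheduled:.0f}%")        # 90%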

Best Practices

Starting Your Calibration Program

Week 1-2: Pilot Phase

  1. Add 2-3 calibrators initially

  2. Sample 5-10 conversations per session

  3. Run daily sessions to build momentum

  4. Focus on reviewing results together as a team

Week 3-4: Establish Rhythm

  1. Adjust sample size based on calibrator capacity

  2. Fine-tune frequency (daily vs. weekly)

  3. Begin identifying common variance patterns

  4. Start making playbook improvements based on findings

Month 2+: Optimize

  1. Run regular group calibrations for complex cases

  2. Build test case library from calibrated conversations

  3. Track variance trends over time

  4. Expand calibrator team if needed

Calibrator Selection

Good calibrator candidates have:

  • A deep understanding of your quality standards

  • Consistent, reliable judgment

  • Time available to complete reviews on schedule

  • The ability to provide constructive feedback

Tips:

  • Start with 2-3 calibrators to keep sessions manageable

  • Rotate calibrators periodically to avoid bias

  • Ensure calibrators represent different teams/perspectives

Sampling Strategy

Balanced sampling considers:

  • Conversation scores: Include both high- and low-scoring conversations

  • Time of day: Morning, afternoon, and evening interactions

  • Channels: Calls, emails, chats in proportion to your volume

  • Representatives: All active representatives unless excluded

The system automatically applies these weights, but you can adjust the sample size to focus more deeply on specific patterns.
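
One plausible way to read "the system automatically applies these weights" is stratified sampling: group conversations by a dimension such as channel, then draw from each group in proportion to its volume. The sketch below is an assumption about the general technique, not the product's actual algorithm:

    import random
    from collections import defaultdict

    def stratified_sample(conversations, key, sample_size, seed=None):
        """Sample in proportion to each group's share of total volume."""
        rng = random.Random(seed)
        groups = defaultdict(list)
        for conv in conversations:
            groups[conv[key]].append(conv)

        total = len(conversations)
        sample = []
        for members in groups.values():
            quota = max(1, round(sample_size * len(members) / total))
            sample.extend(rng.sample(members, min(quota, len(members))))
        return sample[:sample_size]

    conversations = (
        [{"id": f"call-{i}", "channel": "call"} for i in range(60)]
        + [{"id": f"chat-{i}", "channel": "chat"} for i in range(30)]
        + [{"id": f"email-{i}", "channel": "email"} for i in range(10)]
    )
    picked = stratified_sample(conversations, key="channel", sample_size=10, seed=1)
    # Roughly 6 calls, 3 chats, 1 email -- proportional to volume.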

Responding to High Variance

When you see high variance between AI and calibrators:

1. Review the Specific Disagreements

  • Click into cells to read remarks

  • Look for patterns (same criterion, same type of conversation, etc.)

2. Diagnose the Root Cause

  • Is the criterion description unclear?

  • Is this an edge case not covered in the playbook?

  • Do calibrators need training on this criterion?

  • Is the AI missing context?

3. Take Action

  • Unclear criteria: Rewrite playbook descriptions with specific examples

  • Edge cases: Document them in the playbook or add to test cases

  • Training needs: Share calibration results in team meetings

  • AI issues: Create test cases to train the AI on correct evaluations

4. Measure Improvement

  • Track variance percentage over time (see the sketch after this list)

  • Expect gradual decrease as playbooks improve

  • Celebrate wins when variance drops below 10%
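
Checking variance against the 10% target is simple once you have per-session figures; smoothing with a rolling average keeps one noisy session from triggering a premature celebration. All numbers below are illustrative:

    # Variance per calibration session, oldest to newest (illustrative).
    variance_history = [24.0, 19.0, 14.5, 11.0, 9.0, 8.5]

    def rolling_mean(series, window=3):
        return [
            sum(series[i - window + 1 : i + 1]) / window
            for i in range(window - 1, len(series))
        ]

    smoothed = rolling_mean(variance_history)
    print(smoothed[-1])                               # 9.5 -- recent average
    print("Below 10% target:", smoothed[-1] < 10.0)   # True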

Group Calibration Best Practices

When to use group calibration:

  • Complex edge cases that need discussion

  • High-variance criteria needing clarification

  • Training new calibrators

  • Quarterly deep-dives into playbook accuracy

How to run effective sessions:

  1. Review individually first: Have calibrators complete individual reviews before the group session

  2. Focus discussion: Only discuss items with disagreement

  3. Document decisions: Use remarks to explain why consensus was reached

  4. Update playbooks: Immediately update criteria based on group discussions

  5. Create tests: Add consensus conversations to test suites

Frequency recommendations:

  • Monthly group calibrations for mature programs

  • Weekly during playbook development or major changes

  • Ad-hoc for specific training needs

Maintaining Momentum

Keep calibrators engaged:

  • Share insights from calibration data in team meetings

  • Recognize calibrators who complete reviews on time

  • Demonstrate playbook improvements driven by their feedback

  • Rotate focus areas (different playbooks, criteria) to maintain interest

Avoid calibration fatigue:

  • Don't over-sample (10-20 conversations per session is usually sufficient)

  • Vary the conversation types in samples

  • Provide adequate time for completion (3-7 days)

  • Automate reporting and recognition where possible

Continuous Improvement Cycle

  1. Calibrate: Run regular sessions per your schedule

  2. Analyze: Review variance metrics and specific disagreements

  3. Improve: Update playbooks, train calibrators, adjust AI

  4. Validate: Create test cases to prevent regression

  5. Repeat: Continue the cycle with decreasing variance over time

Success metrics to track:

  • Variance trends (decreasing over time)

  • Calibrator completion rates (consistently high)

  • Test case pass rates (improving over time)

  • QA team confidence scores (via surveys)