Playbook Testing

Take playbooks live with confidence that the AI QA system is aligned with your process

Taking playbooks live can come with a lot of uncertainty about whether the AI QA system will understand the new or updated scorecards. Playbook tests are intended to give you the confidence to take playbooks live.

To set up and run tests for any one playbook, click the "Test Playbook" button inside that Playbook.

Test Suites: Conversations

Your test suites are a collection of conversations that have entered Score AI from one of your enabled integrations. Conversations that cover relevant customer scenarios and representative responses should be maintained as part of test suites to ensure effective coverage of your customer support process.

Clicking the ▶︎ button on an individual conversation, or the "▶︎ Run all" button, runs the tests. Each test is run against the entire playbook, the same way our AI QA system would evaluate the conversation in practice. Once a test run is completed, you will be presented with evaluation results for each criterion within the table.
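
As a rough mental model, the sketch below shows the shape of what a test run produces: one evaluation per criterion for every conversation in the suite. This is an illustrative Python sketch only, not Score AI's actual data model; the names CriterionResult and run_suite are hypothetical.

```python
# Hypothetical sketch of a test run's output, not Score AI's actual data model:
# every conversation in the suite is evaluated against every criterion in the
# playbook, producing one result per (conversation, criterion) pair.
from dataclasses import dataclass


@dataclass
class CriterionResult:
    criterion: str    # e.g. "Agent acknowledged the customer's issue"
    evaluation: str   # the AI QA verdict for this criterion, e.g. "Yes" / "No" / "N/A"
    remarks: str      # the AI's explanation for the verdict


def run_suite(conversations: list[str], criteria: list[str]) -> dict[str, list[CriterionResult]]:
    """Run every conversation in the test suite against the full playbook."""
    results: dict[str, list[CriterionResult]] = {}
    for conversation in conversations:
        # Placeholder verdicts; in practice the AI QA system evaluates the
        # conversation transcript against each criterion's instructions.
        results[conversation] = [
            CriterionResult(criterion=c, evaluation="Yes", remarks="(AI remarks)")
            for c in criteria
        ]
    return results


suite_results = run_suite(["Conversation #4821"], ["Greeting", "Resolution offered"])
print(suite_results["Conversation #4821"][0].criterion)  # "Greeting"
```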

Adding Conversations

Conversations can be added to test suites from 3 places:

  1. From a calibration report using the "Add as test case" button inside the 3 dot menu beside each conversation name.

  2. From inside a conversation using the "Add as test case" button.

Test Results

When test runs are successfully completed, you will be presented with the results, along with "Pass" or "Fail" labels assigned to each evaluation. Each criterion's evaluation is compared to an expected result that is taken from the source of the test case (a conceptual sketch of this comparison follows the list below).

  • When adding conversations to your tests from a calibration report, you will be prompted to pick a calibrator whose audits should be taken as the expected answers.

  • When adding the conversation from the conversations page, the audits already recorded against that playbook will be taken as the expected answers. If you wish to use different audits, we recommend using the "Edit" functionality on the criteria you wish to change and then proceeding to add the test case.
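
Conceptually, each Pass/Fail label comes from comparing the AI's evaluation of a criterion with the expected answer captured when the test case was added. The following is a hypothetical illustration of that comparison; the criterion names and the grade_test helper are invented for this example and are not part of Score AI.

```python
def grade_test(expected: dict[str, str], actual: dict[str, str]) -> dict[str, str]:
    """Compare the AI's evaluation of each criterion to the expected result."""
    return {
        criterion: "Pass" if actual.get(criterion) == expected_value else "Fail"
        for criterion, expected_value in expected.items()
    }


# Expected answers come from the test case's source (a calibrator's audits or
# the playbook's existing audits); actual answers come from the new test run.
expected = {"Greeting": "Yes", "Correct resolution offered": "Yes", "Empathetic tone": "No"}
actual = {"Greeting": "Yes", "Correct resolution offered": "No", "Empathetic tone": "No"}
print(grade_test(expected, actual))
# {'Greeting': 'Pass', 'Correct resolution offered': 'Fail', 'Empathetic tone': 'Pass'}
```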

You can click on any cell to review the complete test result, including remarks and knowledge base citations, and to view up to 10 previous test results.

Editing Playbook Criteria

We have provided an edit option so that you can adjust playbook evaluation criteria on the spot as you work through your failing tests.

Note: Tests can only be run once playbook changes are saved (i.e., a new version is created). We encourage you to work through all your failing tests, editing the related evaluation criteria as you go. Once you arrive at an updated playbook, save your changes and run the tests again to review whether the changes have worked.

Best Practices for Playbook Testing

  1. Only add effective conversations as test cases: Don't pick the last 20 conversations audited by the AI and call that a test suite. The goal should be to build a collection of conversations that cover both common and edge cases that occur in your company's support interactions.

  2. Maintain your test cases:

    1. Adding conversations should not be a one-time activity but an ongoing one: your team should actively conduct calibration sessions and occasional manual audits to identify new conversations that the AI QA system is not handling well.

    2. Clean up old test cases that are no longer a concern for your process - there's a delete button for a reason!

  3. Run tests before you go live: Ensure that after you make any edits to your playbook, you visit the tests page and hit the "▶︎ Run all" button. This makes sure that an update as small as removing a bullet point from a criterion's elaboration doesn't break an older (and likely forgotten) customer scenario.

  4. Test as part of your workflow: When aligning our AI QA to your process during onboarding, or when deploying process changes, it is imperative that you incorporate playbook testing into your workflow so that Score AI works for you. We have prepared a diagram below that suggests a workflow that might be ideal for some teams.