
User Testing for CUI

User Testing

  • Special considerations for VUIs
  • Does your VUI understand the way people actually talk to it?
  • Is the dialog management effective?

Why test with real users?

  • Part and parcel of human-centered/user-centered design
  • Find problems early on in the process and fix them
  • Most technology is designed to be used by, and be useful for, people, so it should be tested with them too
  • People draw on prior experience when interacting with technology, which can help or hinder
  • Improves your product
  • People can have expert/local knowledge

Designing a user study

  • What is the goal?
  • What tasks are you asking the user to do?
  • What sort of participants do you need?
  • What data collection and analysis are you doing?

Task definition

  • Designed to exercise the parts of the system you want to test
  • Focus on the primary dialog paths
    • High risk areas
    • Address the major goals
  • Write the task definitions carefully to avoid biasing the participant
  • Describe the goal of the task

Task ordering

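  • To avoid order effects, randomise task order if possible, or use a Latin square design so that each task appears in every position across participants
  • If comparing conditions, counterbalance their order (e.g. half the participants get A then B, the other half B then A)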
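A Latin square can be generated rather than written by hand. The sketch below builds a balanced Latin square (a Williams design): every task appears equally often in every position and, for an even number of tasks, every task immediately precedes every other task exactly once; for an odd number it appends the mirrored rows. The function names `balanced_latin_square` and `assign_orders` and the task labels are hypothetical, chosen for illustration only.

```python
def balanced_latin_square(n):
    """Return task orders (rows) for n tasks numbered 0..n-1.

    Every task appears equally often in every position. For even n this is a
    Williams design: each task also immediately precedes every other task
    exactly once. For odd n the mirrored rows are appended (2n rows) so the
    same first-order balance holds.
    """
    # First row follows the pattern 0, 1, n-1, 2, n-2, 3, ...
    first = [0]
    for k in range(1, n):
        first.append((k + 1) // 2 if k % 2 == 1 else n - k // 2)
    # Each further row shifts every task index by one (mod n).
    rows = [[(task + i) % n for task in first] for i in range(n)]
    if n % 2 == 1:
        rows += [list(reversed(row)) for row in rows]
    return rows


def assign_orders(participants, tasks):
    """Give the i-th participant row i mod (number of rows), as task names."""
    rows = balanced_latin_square(len(tasks))
    return {
        p: [tasks[t] for t in rows[i % len(rows)]]
        for i, p in enumerate(participants)
    }


# Hypothetical tasks and participant IDs.
tasks = ["book table", "change booking", "cancel booking", "ask opening hours"]
for participant, order in assign_orders(["P1", "P2", "P3", "P4"], tasks).items():
    print(participant, order)
```

The same function covers counterbalancing conditions: with two conditions it yields the orders A-B and B-A. Recruiting a multiple of the number of rows keeps the design fully balanced.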

Data Analysis

  • Myriad approaches
  • Quantitative
    • Hypothesis testing
    • Descriptive and inferential statistics
    • Algorithmic/mathematical
  • Qualitative
    • Thematic analysis (themes)
    • Conversation analysis
    • Interaction analysis
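For the quantitative side, a within-subjects comparison (each participant tries both conditions) is often summarised with descriptive statistics and then tested on the paired differences. The sketch below, using only the Python standard library, computes means/medians/SDs, a paired t statistic, and an exact sign-flip permutation p-value. The completion times are invented for illustration, and the function names are hypothetical; in practice a library routine such as SciPy's `scipy.stats.ttest_rel` would give the p-value for the paired t-test.

```python
import itertools
import math
import statistics


def describe(values):
    """Descriptive statistics for one condition."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample SD, needs n >= 2
    }


def paired_t(a, b):
    """Paired t statistic: mean difference divided by its standard error.

    Raises ZeroDivisionError if every difference is identical.
    """
    diffs = [x - y for x, y in zip(a, b)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se


def sign_flip_p_value(a, b):
    """Exact two-sided permutation test for paired data.

    Under the null hypothesis each participant's difference is equally
    likely to be positive or negative, so every sign pattern is enumerated
    (2**n of them; feasible for the small n typical of usability studies).
    """
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    patterns = list(itertools.product((1, -1), repeat=len(diffs)))
    extreme = sum(
        1
        for signs in patterns
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-9
    )
    return extreme / len(patterns)


# Hypothetical task-completion times (seconds) for the same six participants.
voice = [41, 38, 52, 47, 35, 44]
touch = [48, 40, 55, 53, 37, 50]
print(describe(voice), describe(touch))
print(paired_t(voice, touch), sign_flip_p_value(voice, touch))
```

The permutation test makes no normality assumption, which can matter with the small samples common in user testing.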

Early-Stage and Usability Testing

Early-stage testing

  • Testing concepts
  • Table reads with sample dialogs
  • Initial reactions to mock-ups
  • Wizard of Oz testing
    • A human behind the curtain simulates a fully working system
    • Feels realistic to participants but is much cheaper/quicker than building the real system
    • Can serve as an elicitation study to learn what language/terms people use
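Once Wizard of Oz sessions have been transcribed, a simple word count can show which terms participants actually used and which of them the design's vocabulary does not yet cover. This is a minimal sketch assuming the utterances are available as plain strings and the design's vocabulary is a set of words; `term_frequencies`, `uncovered_terms`, and the sample data are hypothetical.

```python
import re
from collections import Counter


def term_frequencies(utterances, stopwords=frozenset()):
    """Count the words participants used across all logged utterances."""
    counts = Counter()
    for utterance in utterances:
        # Lower-case and keep simple word tokens (letters and apostrophes).
        for token in re.findall(r"[a-z']+", utterance.lower()):
            if token not in stopwords:
                counts[token] += 1
    return counts


def uncovered_terms(counts, vocabulary):
    """Terms people actually said that the design's vocabulary lacks,
    most frequent first."""
    return {term: n for term, n in counts.most_common() if term not in vocabulary}


# Hypothetical utterances logged during a Wizard of Oz session.
log = ["Book a table for two", "Reserve a table tonight", "Can I reserve for four"]
counts = term_frequencies(log, stopwords={"a", "for", "can", "i"})
print(uncovered_terms(counts, vocabulary={"book", "table", "two"}))
```

Single-word counts are only a starting point; multi-word phrasings and synonyms still need a human read of the transcripts.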

Pre-release and Pilot Testing

  • Dialog traversal testing
    • Purpose is to make sure that the system accurately implements the dialog specification in complete detail
    • Test all transactions
  • Load testing
    • Verifying that the system will perform under the stress of many concurrent user sessions
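If the dialog specification can be written as a finite-state table (state, user input, next state), traversal tests can be generated so that every transition is exercised at least once. The sketch below uses breadth-first search to find the shortest input sequence reaching each state, emits one test per transition, and replays the tests against an implementation's `step` function. It assumes user inputs arrive as already-recognised intents; `SPEC`, `step`, and the state names are hypothetical.

```python
from collections import deque

# Hypothetical dialog specification: state -> {user input: next state}.
SPEC = {
    "start": {"book": "ask_size", "cancel": "goodbye"},
    "ask_size": {"two": "confirm", "help": "ask_size"},
    "confirm": {"yes": "goodbye", "no": "ask_size"},
    "goodbye": {},
}


def shortest_prefixes(spec, start):
    """BFS: shortest input sequence that reaches each reachable state."""
    prefixes = {start: []}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for user_input, nxt in spec[state].items():
            if nxt not in prefixes:
                prefixes[nxt] = prefixes[state] + [user_input]
                queue.append(nxt)
    return prefixes


def transition_tests(spec, start):
    """One test per transition: reach its source state, then take it."""
    prefixes = shortest_prefixes(spec, start)
    tests = []
    for state, edges in spec.items():
        if state not in prefixes:
            continue  # unreachable state: a specification problem in itself
        for user_input, expected in edges.items():
            tests.append((prefixes[state] + [user_input], expected))
    return tests


def run_tests(step, start, tests):
    """Replay each input sequence through `step`; collect mismatches."""
    failures = []
    for inputs, expected in tests:
        state = start
        for user_input in inputs:
            state = step(state, user_input)
        if state != expected:
            failures.append((inputs, expected, state))
    return failures


tests = transition_tests(SPEC, "start")
print(run_tests(lambda state, user_input: SPEC[state][user_input], "start", tests))
```

Transition coverage is one reasonable reading of "test all transactions"; a real VUI also needs tests for recognition failures, timeouts, and reprompts, which this table does not model.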