Strategies for Testing AI-Based Systems

Learn practical techniques, tools, and strategies to test AI and agentic AI systems for quality, reliability, security, and safety.

Description

The rapid proliferation of Artificial Intelligence (AI) and autonomous agentic systems presents unique and complex challenges for traditional software testing practices. Testers must evolve their skills to evaluate the quality, reliability, security, and ethical behavior of these intelligent systems.

This hands-on course equips testers with essential techniques and strategies for effectively testing AI and agentic AI systems. Participants learn how to apply specialized testing methods, tools, and workflows to validate model and solution behavior, from planning and execution through automation and reporting.

Key takeaways from this class include:

  • Understanding the key differences between traditional software testing and AI systems testing
  • Defining quality metrics and test strategies for ML models and AI data pipelines
  • Testing agentic AI behavior, including goals, plans, tool use, and emergent outcomes
  • Performing adversarial and robustness testing for AI components
  • Analyzing and reporting ethical and safety aspects of AI performance
  • Leveraging tools for data quality analysis, explainability (XAI), and continuous validation

Who Should Attend

This course is ideal for software testers, quality assurance engineers, and test managers who validate systems that include AI, machine learning, or agentic components. A foundational understanding of software testing principles and high-level AI/ML concepts is recommended.

Laptop and RDP Required

This class includes hands-on activities using sample software. Each attendee should bring a laptop with a remote desktop protocol (RDP) client pre-installed. Connection details and credentials are provided during class. Please coordinate with your IT administrator beforehand to confirm your RDP client can access a virtual machine in an AWS environment.

Course Duration and Schedule

Two-Day Format

8:30 AM - 4:30 PM each day, with a one-hour lunch break plus morning and afternoon breaks.

Three-Day Format

11:30 AM - 5:00 PM each day with afternoon breaks.

Upcoming Training

✓ Guaranteed to Run

Course                                   Date                   Location                     Price
Strategies for Testing AI-Based Systems  Jul 20 - Jul 22, 2026  Virtual Classroom            $1,495
Strategies for Testing AI-Based Systems  Aug 11 - Aug 13, 2026  Virtual Classroom            $1,495
Strategies for Testing AI-Based Systems  Sep 20 - Sep 21, 2026  STARWEST 2026 - Anaheim, CA  $1,595
Strategies for Testing AI-Based Systems  Oct 6 - Oct 8, 2026    Virtual Classroom            $1,495

Course Outline

Session 1: Foundations of AI QA

  • Introduction to the QA shift from deterministic to probabilistic validation
  • The oracle problem in AI testing and why expected vs. actual is not enough
  • Defining the system under test for GenAI and agentic AI
  • Inference testing focus vs. model training focus
  • Exercise #1: Identifying AI testing challenges

Session 2: Golden Dataset and Regression Testing

  • Building a curated golden dataset as a baseline for quality
  • Detecting regressions from prompt or model changes
  • Data sourcing strategies: production data vs. synthetic data
  • Labeling expected pass/fail outcomes
  • Exercise #2: Building a golden test set
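
To give a flavor of the golden-dataset idea, here is a minimal sketch of a regression check between two prompt or model versions. It is an illustration only, not course material: the `GoldenCase` structure, the `find_regressions` helper, the stub models, and the keyword-based pass/fail rule are all hypothetical, and a real golden set would likely use richer expected-outcome labels.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class GoldenCase:
    case_id: str
    prompt: str
    must_contain: str  # simplistic expected-outcome label for this sketch


def passes(case: GoldenCase, response: str) -> bool:
    """A case passes when the response contains the required phrase (case-insensitive)."""
    return case.must_contain.lower() in response.lower()


def find_regressions(
    cases: List[GoldenCase],
    baseline: Callable[[str], str],
    candidate: Callable[[str], str],
) -> List[str]:
    """Return IDs of cases that pass with the baseline but fail with the candidate."""
    regressions = []
    for case in cases:
        if passes(case, baseline(case.prompt)) and not passes(case, candidate(case.prompt)):
            regressions.append(case.case_id)
    return regressions


# Stub "models" standing in for two prompt or model versions (hypothetical).
def baseline_model(prompt: str) -> str:
    answers = {
        "refund policy?": "Refunds are available within 30 days.",
        "support hours?": "Support is open 9 to 5.",
    }
    return answers.get(prompt, "")


def candidate_model(prompt: str) -> str:
    answers = {
        "refund policy?": "Please contact us for details.",
        "support hours?": "Support is open 9 to 5 on weekdays.",
    }
    return answers.get(prompt, "")


GOLDEN_SET = [
    GoldenCase("refund-01", "refund policy?", "30 days"),
    GoldenCase("hours-01", "support hours?", "9 to 5"),
]

regressed = find_regressions(GOLDEN_SET, baseline_model, candidate_model)
print(regressed)
```

Only cases that previously passed are reported, so a case that fails under both versions is treated as a known gap rather than a new regression.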

Session 3: Metrics and Evaluation Methodologies

  • Key quality metrics: faithfulness, relevance, coherence, and toxicity
  • Heuristic evaluation techniques and semantic similarity checks
  • LLM-as-a-judge approaches for response grading
  • Choosing metrics based on risk and business context
  • Exercise #3: Evaluating outputs with multiple methods

Session 4: Testing RAG and Prompt Behavior

  • Retrieval testing: context precision and context recall
  • Verifying retrieved document chunks and IDs
  • Prompt testing: zero-shot vs. few-shot templates
  • Hallucination detection and groundedness checks
  • Exercise #4: Testing retrieval and faithfulness
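
As a rough illustration of the retrieval metrics named above, the sketch below computes simple set-based context precision and recall from chunk IDs. The function name and the chunk ID format are assumptions made for this example; some evaluation frameworks define context precision in a rank-aware way, which this sketch does not attempt.

```python
from typing import Iterable, Tuple


def context_precision_recall(
    retrieved_ids: Iterable[str], relevant_ids: Iterable[str]
) -> Tuple[float, float]:
    """Set-based precision and recall over retrieved chunk IDs.

    precision = relevant chunks retrieved / all chunks retrieved
    recall    = relevant chunks retrieved / all relevant chunks
    """
    retrieved = set(retrieved_ids)
    relevant = set(relevant_ids)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical chunk IDs in "document#chunk" form.
precision, recall = context_precision_recall(
    retrieved_ids=["doc-1#3", "doc-2#1", "doc-7#4", "doc-9#2"],
    relevant_ids=["doc-1#3", "doc-2#1", "doc-5#0"],
)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the four retrieved chunks are relevant, and two of the three relevant chunks were retrieved.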

Session 5: Testing Agentic AI Behavior

  • Unique characteristics of agentic systems: memory, state, and planning
  • Testing tool/API selection correctness
  • Verifying extracted parameters and action quality
  • Validating Thought -> Plan -> Action -> Observation flows
  • Exercise #5: Agent behavior validation
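
One way to picture the tool-selection and parameter checks in this session is a small comparison between the tool call an agent emitted and the call a tester expected. The dictionary shape (`tool`, `args`), the `check_tool_call` helper, and the `get_weather` tool are hypothetical and chosen only for illustration; real agent frameworks each have their own call formats.

```python
from typing import Any, Dict, List


def check_tool_call(actual: Dict[str, Any], expected: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the call matches."""
    problems: List[str] = []
    if actual.get("tool") != expected["tool"]:
        problems.append(
            f"wrong tool: expected {expected['tool']!r}, got {actual.get('tool')!r}"
        )
        return problems  # argument comparison is meaningless for the wrong tool

    actual_args = actual.get("args", {})
    for name, want in expected["args"].items():
        if name not in actual_args:
            problems.append(f"missing argument: {name}")
        elif actual_args[name] != want:
            problems.append(
                f"argument {name}: expected {want!r}, got {actual_args[name]!r}"
            )
    return problems


expected_call = {"tool": "get_weather", "args": {"city": "Anaheim", "unit": "F"}}
agent_call = {"tool": "get_weather", "args": {"city": "Anaheim", "unit": "C"}}

problems = check_tool_call(agent_call, expected_call)
print(problems)
```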

Session 6: Trajectory and Multi-Agent Testing

  • Evaluating trajectory quality, not just final answers
  • Identifying inefficient loops and redundant actions
  • Testing agent handoffs and routing behavior
  • Preventing instability and infinite conversation loops
  • Exercise #6: Trajectory analysis using execution logs
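
To illustrate what identifying redundant actions in an execution log might look like, the following sketch flags any identical step that appears more often than a threshold. The log format (a list of tool name and argument string pairs), the `find_repeated_actions` helper, and the default threshold of two are assumptions for this example only.

```python
from collections import Counter
from typing import Dict, List, Tuple

Step = Tuple[str, str]  # (tool name, serialized arguments)


def find_repeated_actions(trajectory: List[Step], max_repeats: int = 2) -> Dict[Step, int]:
    """Return steps that occur more than max_repeats times, with their counts."""
    counts = Counter(trajectory)
    return {step: n for step, n in counts.items() if n > max_repeats}


# Hypothetical execution log for a single agent run.
log = [
    ("search", "q=flight AA100"),
    ("search", "q=flight AA100"),
    ("lookup", "id=42"),
    ("search", "q=flight AA100"),
    ("answer", "status=on time"),
]

repeats = find_repeated_actions(log)
print(repeats)
```

A count-based check like this catches repeated identical calls even when they are not consecutive, but it would not notice loops where the arguments change slightly on each pass.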

Session 7: Robustness and Non-Functional Requirements

  • Robustness testing for API failures and degraded dependencies
  • Boundary and out-of-distribution input testing
  • Latency testing: time to first token vs. total generation time
  • Cost and token consumption monitoring for sustainability
  • Exercise #7: NFR testing and error handling scenarios
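
The distinction between time to first token and total generation time can be sketched with a streaming response. In the example below, `fake_stream` is a hypothetical stand-in for a streaming model client, and the sleep durations are arbitrary; only the measurement pattern is the point.

```python
import time
from typing import Iterable, Iterator, Tuple


def measure_stream(tokens: Iterable[str]) -> Tuple[float, float, int]:
    """Consume a token stream and return (time_to_first_token, total_time, token_count)."""
    start = time.perf_counter()
    first_token_at = None
    count = 0
    for _ in tokens:
        if first_token_at is None:
            first_token_at = time.perf_counter()
        count += 1
    end = time.perf_counter()
    ttft = (first_token_at - start) if first_token_at is not None else float("nan")
    return ttft, end - start, count


def fake_stream() -> Iterator[str]:
    """Hypothetical streaming response: a startup delay, then tokens at intervals."""
    time.sleep(0.05)  # simulated queueing and prompt processing
    for token in ["Hello", ",", " world"]:
        yield token
        time.sleep(0.01)  # simulated per-token decode time


ttft, total, n_tokens = measure_stream(fake_stream())
print(f"TTFT={ttft:.3f}s total={total:.3f}s tokens={n_tokens}")
```

Time to first token reflects how quickly a user sees the response begin, while total time also depends on response length, so the two are usually tracked against separate thresholds.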

Session 8: Security, Fairness, and Red Teaming

  • Prompt injection testing and defensive validation
  • PII leakage checks and sensitive data protections
  • Fairness and bias testing across response categories
  • Safety guardrails and output filtering strategies
  • Exercise #8: Red-team attack and mitigation assessment
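
As a small illustration of a PII leakage check, the sketch below scans model output for email addresses and US Social Security number patterns using regular expressions. The `PII_PATTERNS` table and `find_pii` helper are hypothetical, and the patterns are deliberately simplistic: they will miss many PII forms and can produce false positives, so they should be read as a starting point rather than a complete protection.

```python
import re
from typing import Dict, List

# Deliberately simple patterns for illustration; not exhaustive.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii(text: str) -> Dict[str, List[str]]:
    """Return matches per PII category; an empty dict means nothing was detected."""
    found: Dict[str, List[str]] = {}
    for label, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[label] = matches
    return found


# Hypothetical model responses.
leaky = "Sure, the account owner is jane.doe@example.com, SSN 123-45-6789."
clean = "I can't share personal details about account owners."

print(find_pii(leaky))
print(find_pii(clean))
```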

Session 9: Automation and MLOps for AI Testing

  • Running continuous AI tests in CI/CD pipelines
  • Automating golden set execution for regression detection
  • Tooling overview for AI quality evaluation and observability
  • Shift-right monitoring for drift and user feedback signals
  • Exercise #9: CI/CD automation design for AI QA

Session 10: Wrap-up and Future of AI Testing

  • Consolidating practical strategies and testing patterns
  • Reviewing governance, ethics, and production quality controls
  • Emerging trends: self-healing systems and formal verification
  • Final retrospective and action planning