Strategies for Testing AI-Based Systems

Learn practical techniques, tools, and strategies to test AI and agentic AI systems for quality, reliability, security, and safety.

Description

The rapid proliferation of Artificial Intelligence (AI) and autonomous agentic systems presents unique and complex challenges for traditional software testing practices. Testers must evolve their skills to evaluate the quality, reliability, security, and ethical behavior of these intelligent systems.

This hands-on course equips testers with essential techniques and strategies for effectively testing AI and agentic AI systems. Participants learn how to apply specialized testing methods, tools, and workflows to validate model and solution behavior, from planning and execution through automation and reporting.

Key takeaways from this class include:

Understanding the key differences between traditional software testing and AI systems testing
Defining quality metrics and test strategies for ML models and AI data pipelines
Testing agentic AI behavior, including goals, plans, tool use, and emergent outcomes
Performing adversarial and robustness testing for AI components
Analyzing and reporting ethical and safety aspects of AI performance
Leveraging tools for data quality analysis, explainability (XAI), and continuous validation

Who Should Attend

This course is ideal for software testers, quality assurance engineers, and test managers who validate systems that include AI, machine learning, or agentic components. A foundational understanding of software testing principles and high-level AI/ML concepts is recommended.

Laptop and RDP Required

This class includes hands-on activities using sample software. Each attendee should bring a laptop with a Remote Desktop Protocol (RDP) client pre-installed. Connection details and credentials are provided during class. Before class, please confirm with your IT administrator that your RDP client can reach a virtual machine hosted in an AWS environment.

Course Duration and Schedule

Two-Day Format

8:30 AM - 4:30 PM each day, with a one-hour lunch break plus morning and afternoon breaks.

Three-Day Format

11:30 AM - 5:00 PM each day with afternoon breaks.

Upcoming Training

✓ Guaranteed to Run

	Course	Date	Location	Price	Register
✓	Strategies for Testing AI-Based Systems	Jul 20 - Jul 22, 2026	Virtual Classroom	$1,495	Register
	Strategies for Testing AI-Based Systems	Aug 11 - Aug 13, 2026	Virtual Classroom	$1,495	Register
	Strategies for Testing AI-Based Systems	Sep 20 - Sep 21, 2026	STARWEST 2026 - Anaheim, CA	$1,595	Register
	Strategies for Testing AI-Based Systems	Oct 6 - Oct 8, 2026	Virtual Classroom	$1,495	Register

Course Outline

Session 1: Foundations of AI QA

Introduction to the QA shift from deterministic to probabilistic validation
The oracle problem in AI testing and why comparing expected vs. actual results is not enough
Defining the system under test for GenAI and agentic AI
Inference testing focus vs. model training focus
Exercise #1: Identifying AI testing challenges

Session 2: Golden Dataset and Regression Testing

Building a curated golden dataset as a baseline for quality
Detecting regressions from prompt or model changes
Data sourcing strategies: production data vs. synthetic data
Labeling expected pass/fail outcomes
Exercise #2: Building a golden test set
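
As a rough illustration of the golden-set idea in this session, the sketch below runs a stand-in system under test against a small curated set and compares the pass rate with a stored baseline. The names GoldenCase, run_golden_set, is_regression, and fake_model, the substring check, and the 5% tolerance are all assumptions made for illustration; they are not taken from the course material.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class GoldenCase:
    prompt: str
    must_contain: str  # assumed labeling style: output passes if it contains this text


def run_golden_set(model: Callable[[str], str], cases: List[GoldenCase]) -> float:
    """Return the fraction of golden cases whose output contains the expected text."""
    if not cases:
        raise ValueError("golden set is empty")
    passed = sum(
        1 for case in cases
        if case.must_contain.lower() in model(case.prompt).lower()
    )
    return passed / len(cases)


def is_regression(current: float, baseline: float, tolerance: float = 0.05) -> bool:
    """Flag a regression when the pass rate drops by more than the tolerance."""
    return (baseline - current) > tolerance


def fake_model(prompt: str) -> str:
    # Hypothetical stand-in for the real system under test.
    answers = {"Capital of France?": "The capital is Paris."}
    return answers.get(prompt, "I am not sure.")


GOLDEN = [
    GoldenCase("Capital of France?", "Paris"),
    GoldenCase("Capital of Japan?", "Tokyo"),
]

pass_rate = run_golden_set(fake_model, GOLDEN)
```

In practice the baseline pass rate would be recorded from a known-good prompt or model version, and is_regression would be evaluated after each change.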

Session 3: Metrics and Evaluation Methodologies

Key quality metrics: faithfulness, relevance, coherence, and toxicity
Heuristic evaluation techniques and semantic similarity checks
LLM-as-a-judge approaches for response grading
Choosing metrics based on risk and business context
Exercise #3: Evaluating outputs with multiple methods
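
One simple heuristic check that could sit alongside the methods listed above is token-level F1 between a response and a reference answer. This is a lexical overlap measure, not true semantic similarity (which would typically rely on embeddings), and the function name token_f1 is an assumption for this sketch.

```python
from collections import Counter


def token_f1(candidate: str, reference: str) -> float:
    """Token-overlap F1 between a candidate response and a reference answer."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

A heuristic like this is cheap and deterministic, which makes it a reasonable first filter before slower approaches such as LLM-as-a-judge grading.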

Session 4: Testing RAG and Prompt Behavior

Retrieval testing: context precision and context recall
Verifying retrieved document chunks and IDs
Prompt testing: zero-shot vs. few-shot templates
Hallucination detection and groundedness checks
Exercise #4: Testing retrieval and faithfulness
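
A minimal sketch of the retrieval metrics named above, assuming a simplified set-based definition: precision is the share of retrieved chunks that are relevant, and recall is the share of relevant chunks that were retrieved. Some evaluation frameworks define these metrics differently (for example, with rank-aware weighting), and the function name and chunk IDs here are hypothetical.

```python
from typing import Iterable, Tuple


def context_precision_recall(
    retrieved_ids: Iterable[str], relevant_ids: Iterable[str]
) -> Tuple[float, float]:
    """Return (precision, recall) over chunk IDs using set overlap."""
    retrieved = set(retrieved_ids)
    relevant = set(relevant_ids)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical chunk IDs: four retrieved, three labeled relevant, two in common.
precision, recall = context_precision_recall(
    ["d1", "d2", "d3", "d4"], ["d1", "d3", "d9"]
)
```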

Session 5: Testing Agentic AI Behavior

Unique characteristics of agentic systems: memory, state, and planning
Testing tool/API selection correctness
Verifying extracted parameters and action quality
Validating Thought -> Plan -> Action -> Observation flows
Exercise #5: Agent behavior validation
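
Tool selection and parameter extraction can be checked by comparing a recorded agent action with an expected one. The sketch below assumes each action is logged as a dictionary with "tool" and "args" keys; that log shape, the function name check_tool_call, and the example tool names are illustrative assumptions.

```python
from typing import Any, Dict, List


def check_tool_call(actual: Dict[str, Any], expected: Dict[str, Any]) -> List[str]:
    """Return human-readable mismatches; an empty list means the call matches."""
    problems: List[str] = []
    if actual.get("tool") != expected["tool"]:
        problems.append(
            f"wrong tool: expected {expected['tool']!r}, got {actual.get('tool')!r}"
        )
    actual_args = actual.get("args", {})
    # Only expected parameters are checked; extra arguments are ignored in this sketch.
    for name, value in expected.get("args", {}).items():
        if name not in actual_args:
            problems.append(f"missing parameter: {name}")
        elif actual_args[name] != value:
            problems.append(
                f"parameter {name}: expected {value!r}, got {actual_args[name]!r}"
            )
    return problems


expected_call = {"tool": "get_weather", "args": {"city": "Paris"}}
ok = check_tool_call({"tool": "get_weather", "args": {"city": "Paris"}}, expected_call)
bad = check_tool_call({"tool": "search", "args": {}}, expected_call)
```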

Session 6: Trajectory and Multi-Agent Testing

Evaluating trajectory quality, not just final answers
Identifying inefficient loops and redundant actions
Testing agent handoffs and routing behavior
Preventing instability and infinite conversation loops
Exercise #6: Trajectory analysis using execution logs
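
Redundant actions in a trajectory can often be spotted by counting identical steps in the execution log. The sketch below assumes each step is logged as a (tool name, serialized arguments) pair; that shape, the function name find_redundant_steps, and the default threshold are assumptions for illustration.

```python
from collections import Counter
from typing import List, Tuple

Step = Tuple[str, str]  # (tool name, serialized arguments), an assumed log shape


def find_redundant_steps(trajectory: List[Step], max_repeats: int = 1) -> List[Step]:
    """Return steps that occur more than max_repeats times, in first-seen order."""
    counts = Counter(trajectory)
    return [step for step, n in counts.items() if n > max_repeats]


log = [
    ("search", "q=a"),
    ("search", "q=a"),
    ("fetch", "id=1"),
    ("search", "q=a"),
]
repeated = find_redundant_steps(log)
```

Exact-match counting is a deliberately simple choice; near-duplicate actions with slightly different arguments would need a looser comparison.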

Session 7: Robustness and Non-Functional Requirements

Robustness testing for API failures and degraded dependencies
Boundary and out-of-distribution input testing
Latency testing: time to first token vs. total generation time
Cost and token consumption monitoring for sustainability
Exercise #7: NFR testing and error handling scenarios
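
The difference between time to first token and total generation time can be measured by timing a streamed response. This sketch assumes the response arrives as a Python iterable of tokens; fake_stream is a hypothetical stand-in for a real streaming client, and measure_stream is an illustrative name.

```python
import time
from typing import Iterable, Iterator, Tuple


def measure_stream(tokens: Iterable[str]) -> Tuple[float, float, int]:
    """Return (time_to_first_token, total_time, token_count), times in seconds."""
    start = time.perf_counter()
    first = None
    count = 0
    for _ in tokens:
        if first is None:
            first = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    # If the stream is empty, report total time as the time to first token.
    return (first if first is not None else total), total, count


def fake_stream() -> Iterator[str]:
    # Hypothetical stand-in for a streaming model response.
    for token in ["Hello", ",", " world"]:
        time.sleep(0.01)
        yield token


ttft, total, n = measure_stream(fake_stream())
```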

Session 8: Security, Fairness, and Red Teaming

Prompt injection testing and defensive validation
PII leakage checks and sensitive data protections
Fairness and bias testing across response categories
Safety guardrails and output filtering strategies
Exercise #8: Red-team attack and mitigation assessment
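
A basic PII leakage check can scan model output for patterns that look like sensitive data. The two regular expressions below (email addresses and US Social Security number formats) are simplified assumptions chosen for illustration; real checks usually need broader patterns and context-aware detection.

```python
import re
from typing import Dict, List

# Simplified, illustrative patterns; not an exhaustive PII definition.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii(text: str) -> Dict[str, List[str]]:
    """Return matches per PII category; an empty dict means nothing was found."""
    found: Dict[str, List[str]] = {}
    for label, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[label] = matches
    return found


leaks = find_pii("Contact jane@example.com, SSN 123-45-6789")
```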

Session 9: Automation and MLOps for AI Testing

Running continuous AI tests in CI/CD pipelines
Automating golden set execution for regression detection
Tooling overview for AI quality evaluation and observability
Shift-right monitoring for drift and user feedback signals
Exercise #9: CI/CD automation design for AI QA

Session 10: Wrap-up and Future of AI Testing

Consolidating practical strategies and testing patterns
Reviewing governance, ethics, and production quality controls
Emerging trends: self-healing systems and formal verification
Final retrospective and action planning