Testing Your Agents

AI agent code presents unique testing challenges: LLM responses are non-deterministic, tool execution has side effects, and multi-agent workflows involve complex async interactions. This guide provides practical patterns for testing each layer of an Acton AI application.


Testing strategies

Testing AI agent systems effectively requires a layered approach:

| Layer | What to test | How |
|---|---|---|
| Configuration | Agent configs, tool selection, builder setup | Unit tests with direct struct construction |
| Tool execution | Individual tool behavior, input validation | Unit tests with mock inputs |
| Error handling | Error classification, retry logic, error propagation | Unit tests with constructed errors |
| Delegation | Task tracking, state transitions, cleanup | Unit tests with DelegationTracker |
| Conversation flow | History management, system prompts | Integration tests with a running LLM |
| End-to-end | Full prompt-to-response pipeline | Integration tests with Ollama or a mock server |
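
Unit tests for the first four layers can sit next to the code they exercise, while LLM-dependent tests live in the tests/ directory and are gated behind #[ignore]. A minimal layout sketch (the file paths and test names here are illustrative, not prescribed by Acton AI):

// src/agent/config.rs — unit tests compile against crate internals
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn default_config_has_no_tools() {
        assert!(AgentConfig::default().tools.is_empty());
    }
}

// tests/llm_integration.rs — integration tests see only the public API;
// anything that needs a live model is marked #[ignore].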

Unit testing agent configuration

AgentConfig is a plain data struct that supports serialization. Test it without any async runtime or LLM:

use acton_ai::agent::AgentConfig;

#[test]
fn file_reader_agent_has_correct_tools() {
    let config = AgentConfig::new(
        "You are a file reader assistant.",
    )
    .with_tools(&["read_file", "glob"])
    .with_name("FileReader");

    assert_eq!(config.tools.len(), 2);
    assert!(config.tools.contains(&"read_file".to_string()));
    assert!(config.tools.contains(&"glob".to_string()));
    assert_eq!(config.name, Some("FileReader".to_string()));
}

#[test]
fn power_agent_has_all_builtins() {
    let config = AgentConfig::new("Power user")
        .with_all_builtins();

    // Should have all available builtin tools
    assert!(config.tools.len() >= 9);
    assert!(config.tools.contains(&"bash".to_string()));
    assert!(config.tools.contains(&"read_file".to_string()));
    assert!(config.tools.contains(&"calculate".to_string()));
}

#[test]
fn agent_config_serialization_roundtrip() {
    let config = AgentConfig::new("Test agent")
        .with_name("TestBot")
        .with_tools(&["read_file", "bash"])
        .with_max_conversation_length(50);

    let json = serde_json::to_string(&config).unwrap();
    let deserialized: AgentConfig = serde_json::from_str(&json).unwrap();

    assert_eq!(config, deserialized);
}

#[test]
fn default_agent_has_no_tools() {
    let config = AgentConfig::default();
    assert!(config.tools.is_empty());
    assert_eq!(config.max_conversation_length, 100);
    assert!(config.enable_streaming);
}

Testing agent state transitions

AgentState is a simple enum. Test its transition logic directly:

use acton_ai::agent::AgentState;

#[test]
fn idle_agent_can_accept_prompts() {
    assert!(AgentState::Idle.can_accept_prompt());
    assert!(AgentState::Completed.can_accept_prompt());
}

#[test]
fn active_agents_reject_prompts() {
    assert!(!AgentState::Thinking.can_accept_prompt());
    assert!(!AgentState::Executing.can_accept_prompt());
    assert!(!AgentState::Waiting.can_accept_prompt());
    assert!(!AgentState::Stopping.can_accept_prompt());
}

#[test]
fn only_stopping_is_terminal() {
    assert!(!AgentState::Idle.is_terminal());
    assert!(!AgentState::Completed.is_terminal());
    assert!(AgentState::Stopping.is_terminal());
}

#[test]
fn active_states_are_identified() {
    assert!(AgentState::Thinking.is_active());
    assert!(AgentState::Executing.is_active());
    assert!(AgentState::Waiting.is_active());
    assert!(!AgentState::Idle.is_active());
}

Testing delegation tracking

The DelegationTracker manages task state without any async dependencies, making it straightforward to test:

use acton_ai::agent::delegation::{
    DelegatedTask, DelegatedTaskState, DelegationTracker,
};
use acton_ai::types::{AgentId, TaskId};
use std::time::Duration;

#[test]
fn track_and_complete_outgoing_task() {
    let mut tracker = DelegationTracker::new();
    let task_id = TaskId::new();
    let agent_id = AgentId::new();

    let task = DelegatedTask::new(
        task_id.clone(),
        agent_id,
        "code_review".to_string(),
    );
    tracker.track_outgoing(task);

    assert_eq!(tracker.pending_outgoing_count(), 1);

    // Complete the task; check it while the mutable borrow is still live,
    // then release the borrow before querying the tracker again
    let task = tracker.get_outgoing_mut(&task_id).unwrap();
    task.complete(serde_json::json!({"approved": true}));
    assert!(task.is_terminal());

    assert_eq!(tracker.pending_outgoing_count(), 0);
}

#[test]
fn track_incoming_task_acceptance() {
    let mut tracker = DelegationTracker::new();
    let task_id = TaskId::new();
    let from_agent = AgentId::new();

    tracker.track_incoming(
        task_id.clone(),
        from_agent,
        "analysis".to_string(),
    );

    assert_eq!(tracker.pending_incoming_count(), 1);
    assert!(tracker.accept_incoming(&task_id));
    assert_eq!(tracker.pending_incoming_count(), 0);
}

#[test]
fn overdue_detection() {
    let task_id = TaskId::new();
    let agent_id = AgentId::new();
    let task = DelegatedTask::new(task_id, agent_id, "test".to_string())
        .with_deadline(Duration::from_millis(1));

    std::thread::sleep(Duration::from_millis(5));
    assert!(task.is_overdue());
}

#[test]
fn cleanup_removes_only_terminal_tasks() {
    let mut tracker = DelegationTracker::new();
    let agent_id = AgentId::new();

    let task1_id = TaskId::new();
    let task2_id = TaskId::new();

    let task1 = DelegatedTask::new(
        task1_id.clone(), agent_id.clone(), "done".to_string()
    );
    let task2 = DelegatedTask::new(
        task2_id.clone(), agent_id, "pending".to_string()
    );

    tracker.track_outgoing(task1);
    tracker.track_outgoing(task2);

    // Complete task1
    tracker.get_outgoing_mut(&task1_id).unwrap()
        .complete(serde_json::json!({}));

    tracker.cleanup_completed();

    assert!(tracker.get_outgoing(&task1_id).is_none()); // Removed
    assert!(tracker.get_outgoing(&task2_id).is_some()); // Still tracked
}

Testing error handling

All error types are Clone + PartialEq, making them easy to construct and assert against:

use acton_ai::error::{ActonAIError, KernelError};
use acton_ai::llm::error::LLMError;
use acton_ai::tools::error::ToolError;
use std::time::Duration;

#[test]
fn acton_ai_error_classification() {
    let config_err = ActonAIError::configuration("app_name", "cannot be empty");
    assert!(config_err.is_configuration());
    assert!(!config_err.is_runtime_shutdown());

    let shutdown_err = ActonAIError::runtime_shutdown();
    assert!(shutdown_err.is_runtime_shutdown());
}

#[test]
fn llm_error_retry_classification() {
    // Retriable errors
    assert!(LLMError::network("timeout").is_retriable());
    assert!(LLMError::rate_limited(Duration::from_secs(10)).is_retriable());
    assert!(LLMError::model_overloaded("gpt-4").is_retriable());
    assert!(LLMError::timeout(Duration::from_secs(30)).is_retriable());
    assert!(LLMError::api_error(500, "internal error", None).is_retriable());
    assert!(LLMError::api_error(503, "unavailable", None).is_retriable());

    // Non-retriable errors
    assert!(!LLMError::authentication_failed("bad key").is_retriable());
    assert!(!LLMError::api_error(400, "bad request", None).is_retriable());
    assert!(!LLMError::invalid_request("missing field").is_retriable());
}

#[test]
fn rate_limit_provides_retry_after() {
    let err = LLMError::rate_limited(Duration::from_secs(60));
    assert_eq!(err.retry_after(), Some(Duration::from_secs(60)));

    let other = LLMError::network("timeout");
    assert_eq!(other.retry_after(), None);
}
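
These predicates are what a production retry loop should branch on, and the loop itself is unit-testable by injecting failures. A minimal sketch, assuming only the LLMError constructors and predicates shown above (retry_llm_call is a hypothetical helper, not part of acton_ai):

use acton_ai::llm::error::LLMError;
use std::time::Duration;

/// Hypothetical helper: retries only errors the library marks retriable,
/// honoring a server-provided retry-after hint when one exists.
fn retry_llm_call<T>(
    mut call: impl FnMut() -> Result<T, LLMError>,
    max_attempts: usize,
) -> Result<T, LLMError> {
    let mut attempt = 0;
    loop {
        match call() {
            Ok(value) => return Ok(value),
            Err(err) if err.is_retriable() && attempt + 1 < max_attempts => {
                attempt += 1;
                // Back off using the rate-limit hint, else a fixed delay.
                let delay = err.retry_after().unwrap_or(Duration::from_millis(100));
                std::thread::sleep(delay);
            }
            Err(err) => return Err(err),
        }
    }
}

#[test]
fn retry_helper_gives_up_on_non_retriable_errors() {
    let mut calls = 0;
    let result: Result<(), _> = retry_llm_call(
        || {
            calls += 1;
            Err(LLMError::authentication_failed("bad key"))
        },
        3,
    );
    assert!(result.is_err());
    assert_eq!(calls, 1); // auth failures are never retried
}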

#[test]
fn tool_error_retry_classification() {
    assert!(ToolError::timeout("slow", Duration::from_secs(30)).is_retriable());
    assert!(ToolError::sandbox_error("transient").is_retriable());
    assert!(!ToolError::not_found("missing").is_retriable());
    assert!(!ToolError::validation_failed("tool", "bad args").is_retriable());
}

#[test]
fn errors_support_equality_comparison() {
    let err1 = KernelError::shutting_down();
    let err2 = KernelError::shutting_down();
    assert_eq!(err1, err2);

    let err3 = KernelError::spawn_failed("reason");
    assert_ne!(err1, err3);
}

#[test]
fn error_display_messages_are_actionable() {
    let err = ToolError::not_found("calculator");
    let msg = err.to_string();
    assert!(msg.contains("calculator"));
    assert!(msg.contains("not found"));
    assert!(msg.contains("verify"));  // Actionable guidance
}

Testing tool execution

Testing tool definitions

Verify tool definitions are correctly structured:

use acton_ai::tools::builtins::get_tool_definition;

#[test]
fn bash_tool_definition_is_valid() {
    let def = get_tool_definition("bash").unwrap();
    assert_eq!(def.name, "bash");
    assert!(!def.description.is_empty());

    // Verify the schema has the expected structure
    let schema = &def.input_schema;
    assert_eq!(schema["type"], "object");
    assert!(schema["properties"]["command"].is_object());
}

#[test]
fn all_builtin_tools_have_valid_definitions() {
    use acton_ai::tools::builtins::BuiltinTools;

    for tool_name in BuiltinTools::available() {
        let def = get_tool_definition(tool_name);
        assert!(def.is_ok(), "Tool '{}' has no definition", tool_name);

        let def = def.unwrap();
        assert!(!def.name.is_empty());
        assert!(!def.description.is_empty());
    }
}
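
Beyond these structural checks, you can verify that each schema is itself well-formed JSON Schema. A sketch assuming the jsonschema crate as a dev-dependency (validator_for is that crate's API, not Acton AI's):

use acton_ai::tools::builtins::{get_tool_definition, BuiltinTools};

#[test]
fn builtin_schemas_are_valid_json_schema() {
    for tool_name in BuiltinTools::available() {
        let def = get_tool_definition(tool_name).unwrap();
        // Compiling the schema fails if it is not valid JSON Schema.
        jsonschema::validator_for(&def.input_schema)
            .unwrap_or_else(|e| panic!("'{}' schema is invalid: {}", tool_name, e));
    }
}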

Testing path validation

Path validation is pure logic with no async dependencies:

use acton_ai::tools::security::{PathValidator, PathValidationError};
use std::path::Path;
use tempfile::TempDir;

#[test]
fn validates_paths_within_allowed_root() {
    let dir = TempDir::new().unwrap();
    let file = dir.path().join("test.txt");
    std::fs::write(&file, "content").unwrap();

    let validator = PathValidator::new()
        .clear_allowed_roots()
        .with_allowed_root(dir.path().to_path_buf());

    assert!(validator.validate(&file).is_ok());
}

#[test]
fn rejects_path_traversal() {
    let validator = PathValidator::new();
    let result = validator.validate(Path::new("/some/../../../etc/passwd"));
    assert!(matches!(
        result,
        Err(PathValidationError::DeniedPattern { pattern, .. }) if pattern == ".."
    ));
}

#[test]
fn rejects_git_directory_access() {
    let dir = TempDir::new().unwrap();
    let git_dir = dir.path().join(".git");
    std::fs::create_dir(&git_dir).unwrap();

    let validator = PathValidator::new()
        .clear_allowed_roots()
        .with_allowed_root(dir.path().to_path_buf());

    let result = validator.validate(&git_dir);
    assert!(matches!(
        result,
        Err(PathValidationError::DeniedPattern { pattern, .. }) if pattern == ".git"
    ));
}

#[test]
fn rejects_paths_outside_allowed_roots() {
    let allowed = TempDir::new().unwrap();
    let outside = TempDir::new().unwrap();
    let file = outside.path().join("secret.txt");
    std::fs::write(&file, "secret").unwrap();

    let validator = PathValidator::new()
        .clear_allowed_roots()
        .with_allowed_root(allowed.path().to_path_buf());

    assert!(matches!(
        validator.validate(&file),
        Err(PathValidationError::OutsideAllowedRoots { .. })
    ));
}

#[test]
fn validates_parent_for_new_files() {
    let dir = TempDir::new().unwrap();
    let new_file = dir.path().join("new_file.txt");

    let validator = PathValidator::new()
        .clear_allowed_roots()
        .with_allowed_root(dir.path().to_path_buf());

    // File doesn't exist, but parent does and is allowed
    assert!(validator.validate_parent(&new_file).is_ok());
}

Using StubSandbox for tests

The StubSandbox and StubSandboxFactory are test-only implementations that do not require a hypervisor. They return placeholder responses instead of executing code.

Test-only

StubSandbox is only available in #[cfg(test)] builds. It does NOT sandbox code and must never be used in production.

#[cfg(test)]
mod sandbox_tests {
    use acton_ai::tools::sandbox::{
        Sandbox, SandboxFactory,
        StubSandbox, StubSandboxFactory,
    };

    #[tokio::test]
    async fn stub_sandbox_returns_placeholder() {
        let sandbox = StubSandbox::new();
        let result = sandbox
            .execute("echo hello", serde_json::json!({}))
            .await;

        assert!(result.is_ok());
        let value = result.unwrap();
        assert_eq!(value["status"], "stub");
    }

    #[tokio::test]
    async fn destroyed_sandbox_rejects_execution() {
        let mut sandbox = StubSandbox::new();
        assert!(sandbox.is_alive());

        sandbox.destroy();
        assert!(!sandbox.is_alive());

        let result = sandbox.execute("code", serde_json::json!({})).await;
        assert!(result.is_err());
    }

    #[tokio::test]
    async fn stub_factory_creates_sandboxes() {
        let factory = StubSandboxFactory::new();
        assert!(factory.is_available());

        let sandbox = factory.create().await.unwrap();
        assert!(sandbox.is_alive());
    }

    #[test]
    fn stub_sandbox_sync_execution() {
        let sandbox = StubSandbox::new();
        let result = sandbox.execute_sync(
            "some code",
            serde_json::json!({"arg": 1}),
        );
        assert!(result.is_ok());
        assert_eq!(result.unwrap()["status"], "stub");
    }
}

Testing sandbox configuration

ProcessSandboxConfig has a validate() method that can be tested without spawning a child process:

use acton_ai::tools::sandbox::{HardeningMode, ProcessSandboxConfig};
use std::time::Duration;

#[test]
fn valid_sandbox_config_passes_validation() {
    let config = ProcessSandboxConfig::new()
        .with_timeout(Duration::from_secs(60))
        .with_memory_limit(Some(128 * 1024 * 1024))
        .with_hardening(HardeningMode::Enforce);

    assert!(config.validate().is_ok());
}

#[test]
fn rejects_zero_timeout() {
    let config = ProcessSandboxConfig::new().with_timeout(Duration::ZERO);
    assert!(config.validate().is_err());
}

#[test]
fn rejects_zero_memory_limit() {
    let config = ProcessSandboxConfig::new().with_memory_limit(Some(0));
    assert!(config.validate().is_err());
}

#[test]
fn unlimited_memory_is_valid() {
    let config = ProcessSandboxConfig::new().with_memory_limit(None);
    assert!(config.validate().is_ok());
}

#[test]
fn rejects_empty_env_allowlist() {
    let config = ProcessSandboxConfig::new().with_env_allowlist(Vec::<String>::new());
    assert!(config.validate().is_err());
}

Integration testing with a running LLM

For end-to-end tests that verify actual LLM interaction, use a local Ollama instance:

use acton_ai::prelude::*;

/// Integration test requiring Ollama running locally.
/// Run with: cargo test -- --ignored
#[tokio::test]
#[ignore = "requires running Ollama instance"]
async fn basic_prompt_returns_response() {
    let runtime = ActonAI::builder()
        .app_name("integration-test")
        .ollama("qwen2.5:7b")
        .launch()
        .await
        .expect("Failed to launch runtime");

    let response = runtime
        .prompt("What is 2 + 2? Answer with just the number.")
        .collect()
        .await
        .expect("Prompt failed");

    assert!(!response.text.is_empty());
    assert!(response.text.contains("4"));
    assert!(response.token_count > 0);

    runtime.shutdown().await.unwrap();
}

#[tokio::test]
#[ignore = "requires running Ollama instance"]
async fn conversation_maintains_context() {
    let runtime = ActonAI::builder()
        .app_name("conv-test")
        .ollama("qwen2.5:7b")
        .launch()
        .await
        .unwrap();

    let conv = runtime.conversation()
        .system("You are a math tutor. Be concise.")
        .build()
        .await;

    let r1 = conv.send("What is 5 + 3?").await.unwrap();
    assert!(!r1.text.is_empty());

    // Verify history is maintained
    assert_eq!(conv.len(), 2); // user + assistant

    let r2 = conv.send("Now multiply that by 2").await.unwrap();
    assert!(!r2.text.is_empty());
    assert_eq!(conv.len(), 4); // 2 user + 2 assistant

    runtime.shutdown().await.unwrap();
}

#[tokio::test]
#[ignore = "requires running Ollama instance"]
async fn tool_execution_works() {
    let runtime = ActonAI::builder()
        .app_name("tool-test")
        .ollama("qwen2.5:7b")
        .with_builtin_tools(&["calculate"])
        .launch()
        .await
        .unwrap();

    let response = runtime
        .prompt("Use the calculate tool to compute 42 * 17")
        .use_builtins()
        .collect()
        .await
        .unwrap();

    // The LLM should have called the calculate tool
    assert!(!response.tool_calls.is_empty());
    assert!(response.tool_calls.iter().any(|tc| tc.name == "calculate"));

    runtime.shutdown().await.unwrap();
}

Running integration tests

Integration tests that require Ollama are marked with #[ignore]. Run them explicitly with:

cargo test -- --ignored

Make sure Ollama is running locally on port 11434 with the qwen2.5:7b model pulled.
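
If the model is not yet available locally, pull it first, and use a name filter to run a single ignored test:

ollama pull qwen2.5:7b
cargo test basic_prompt_returns_response -- --ignored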


Testing conversation history

Test history management without an LLM by verifying the builder and structural properties:

use acton_ai::messages::Message;

#[test]
fn message_construction() {
    let user_msg = Message::user("Hello");
    let assistant_msg = Message::assistant("Hi there!");

    // Messages can be serialized for persistence testing
    let json = serde_json::to_string(&user_msg).unwrap();
    let deserialized: Message = serde_json::from_str(&json).unwrap();
    assert_eq!(user_msg.content, deserialized.content);
}

#[test]
fn chat_config_defaults() {
    use acton_ai::conversation::ChatConfig;

    let config = ChatConfig::new();
    // Check defaults match documentation
    let debug = format!("{:?}", config);
    assert!(debug.contains("You: "));
    assert!(debug.contains("Assistant: "));
}

#[test]
fn chat_config_input_mapper() {
    use acton_ai::conversation::ChatConfig;

    let config = ChatConfig::new()
        .map_input(|s| format!("[admin] {}", s));

    // The mapper is stored as Option<Box<dyn FnMut>>
    let debug = format!("{:?}", config);
    assert!(debug.contains("has_input_mapper"));
}
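
Because ChatConfig only stores the closure, the mapping logic itself is easiest to verify as a plain function before handing it over (the closure here mirrors the one above):

#[test]
fn input_mapper_logic_prefixes_admin_tag() {
    // Exercise the mapping behavior directly; no ChatConfig needed.
    let map = |s: &str| format!("[admin] {}", s);
    assert_eq!(map("restart server"), "[admin] restart server");
}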

Testing patterns summary

| What you are testing | Async? | LLM needed? | Pattern |
|---|---|---|---|
| AgentConfig construction | No | No | Direct struct creation and assertion |
| AgentState transitions | No | No | Call methods and assert results |
| DelegationTracker | No | No | Track/complete tasks and check counts |
| Error types and classification | No | No | Construct errors and test predicates |
| PathValidator | No | No | Create temp dirs and validate paths |
| ProcessSandboxConfig | No | No | Builder methods and validate() |
| StubSandbox execution | Yes | No | #[tokio::test] with stub |
| Tool definitions | No | No | get_tool_definition() and schema checks |
| Prompt execution | Yes | Yes | #[ignore] with running Ollama |
| Conversation flow | Yes | Yes | #[ignore] with running Ollama |
| Tool invocation | Yes | Yes | #[ignore] with running Ollama |
