Testing Your Agents

AI agent code presents unique testing challenges: LLM responses are non-deterministic, tool execution has side effects, and multi-agent workflows involve complex async interactions. This guide provides practical patterns for testing each layer of an Acton AI application.


Testing strategies

Testing AI agent systems effectively requires a layered approach:

| Layer | What to test | How |
|---|---|---|
| Configuration | Agent configs, tool selection, builder setup | Unit tests with direct struct construction |
| Tool execution | Individual tool behavior, input validation | Unit tests with mock inputs |
| Error handling | Error classification, retry logic, error propagation | Unit tests with constructed errors |
| Delegation | Task tracking, state transitions, cleanup | Unit tests with DelegationTracker |
| Conversation flow | History management, system prompts | Integration tests with a running LLM |
| End-to-end | Full prompt-to-response pipeline | Integration tests with Ollama or a mock server |
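
Unit tests for the first four layers can sit next to the code they exercise, while LLM-dependent tests live in the tests/ directory and are gated behind #[ignore]. A minimal layout sketch (the file paths and test names here are illustrative, not prescribed by Acton AI):

// src/agent/config.rs — unit tests compile against crate internals
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn default_config_has_no_tools() {
        assert!(AgentConfig::default().tools.is_empty());
    }
}

// tests/llm_integration.rs — integration tests see only the public API;
// anything that needs a live model is marked #[ignore].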

Unit testing agent configuration

AgentConfig is a plain data struct that supports serialization. Test it without any async runtime or LLM:

use acton_ai::agent::AgentConfig;

#[test]
fn file_reader_agent_has_correct_tools() {
    let config = AgentConfig::new(
        "You are a file reader assistant.",
    )
    .with_tools(&["read_file", "glob"])
    .with_name("FileReader");

    assert_eq!(config.tools.len(), 2);
    assert!(config.tools.contains(&"read_file".to_string()));
    assert!(config.tools.contains(&"glob".to_string()));
    assert_eq!(config.name, Some("FileReader".to_string()));
}

#[test]
fn power_agent_has_all_builtins() {
    let config = AgentConfig::new("Power user")
        .with_all_builtins();

    // Should have all available builtin tools
    assert!(config.tools.len() >= 9);
    assert!(config.tools.contains(&"bash".to_string()));
    assert!(config.tools.contains(&"read_file".to_string()));
    assert!(config.tools.contains(&"calculate".to_string()));
}

#[test]
fn agent_config_serialization_roundtrip() {
    let config = AgentConfig::new("Test agent")
        .with_name("TestBot")
        .with_tools(&["read_file", "bash"])
        .with_max_conversation_length(50);

    let json = serde_json::to_string(&config).unwrap();
    let deserialized: AgentConfig = serde_json::from_str(&json).unwrap();

    assert_eq!(config, deserialized);
}

#[test]
fn default_agent_has_no_tools() {
    let config = AgentConfig::default();
    assert!(config.tools.is_empty());
    assert_eq!(config.max_conversation_length, 100);
    assert!(config.enable_streaming);
}

Testing agent state transitions

AgentState is a simple enum. Test its transition logic directly:

use acton_ai::agent::AgentState;

#[test]
fn idle_agent_can_accept_prompts() {
    assert!(AgentState::Idle.can_accept_prompt());
    assert!(AgentState::Completed.can_accept_prompt());
}

#[test]
fn active_agents_reject_prompts() {
    assert!(!AgentState::Thinking.can_accept_prompt());
    assert!(!AgentState::Executing.can_accept_prompt());
    assert!(!AgentState::Waiting.can_accept_prompt());
    assert!(!AgentState::Stopping.can_accept_prompt());
}

#[test]
fn only_stopping_is_terminal() {
    assert!(!AgentState::Idle.is_terminal());
    assert!(!AgentState::Completed.is_terminal());
    assert!(AgentState::Stopping.is_terminal());
}

#[test]
fn active_states_are_identified() {
    assert!(AgentState::Thinking.is_active());
    assert!(AgentState::Executing.is_active());
    assert!(AgentState::Waiting.is_active());
    assert!(!AgentState::Idle.is_active());
}

Testing delegation tracking

The DelegationTracker manages task state without any async dependencies, making it straightforward to test:

use acton_ai::agent::delegation::{
    DelegatedTask, DelegatedTaskState, DelegationTracker,
};
use acton_ai::types::{AgentId, TaskId};
use std::time::Duration;

#[test]
fn track_and_complete_outgoing_task() {
    let mut tracker = DelegationTracker::new();
    let task_id = TaskId::new();
    let agent_id = AgentId::new();

    let task = DelegatedTask::new(
        task_id.clone(),
        agent_id,
        "code_review".to_string(),
    );
    tracker.track_outgoing(task);

    assert_eq!(tracker.pending_outgoing_count(), 1);

    // Complete the task; check it while the mutable borrow is still live,
    // then release the borrow before querying the tracker again
    let task = tracker.get_outgoing_mut(&task_id).unwrap();
    task.complete(serde_json::json!({"approved": true}));
    assert!(task.is_terminal());

    assert_eq!(tracker.pending_outgoing_count(), 0);
}

#[test]
fn track_incoming_task_acceptance() {
    let mut tracker = DelegationTracker::new();
    let task_id = TaskId::new();
    let from_agent = AgentId::new();

    tracker.track_incoming(
        task_id.clone(),
        from_agent,
        "analysis".to_string(),
    );

    assert_eq!(tracker.pending_incoming_count(), 1);
    assert!(tracker.accept_incoming(&task_id));
    assert_eq!(tracker.pending_incoming_count(), 0);
}

#[test]
fn overdue_detection() {
    let task_id = TaskId::new();
    let agent_id = AgentId::new();
    let task = DelegatedTask::new(task_id, agent_id, "test".to_string())
        .with_deadline(Duration::from_millis(1));

    std::thread::sleep(Duration::from_millis(5));
    assert!(task.is_overdue());
}

#[test]
fn cleanup_removes_only_terminal_tasks() {
    let mut tracker = DelegationTracker::new();
    let agent_id = AgentId::new();

    let task1_id = TaskId::new();
    let task2_id = TaskId::new();

    let task1 = DelegatedTask::new(
        task1_id.clone(), agent_id.clone(), "done".to_string()
    );
    let task2 = DelegatedTask::new(
        task2_id.clone(), agent_id, "pending".to_string()
    );

    tracker.track_outgoing(task1);
    tracker.track_outgoing(task2);

    // Complete task1
    tracker.get_outgoing_mut(&task1_id).unwrap()
        .complete(serde_json::json!({}));

    tracker.cleanup_completed();

    assert!(tracker.get_outgoing(&task1_id).is_none()); // Removed
    assert!(tracker.get_outgoing(&task2_id).is_some()); // Still tracked
}

Testing error handling

All error types are Clone + PartialEq, making them easy to construct and assert against:

use acton_ai::error::{ActonAIError, KernelError};
use acton_ai::llm::error::LLMError;
use acton_ai::tools::error::ToolError;
use std::time::Duration;

#[test]
fn acton_ai_error_classification() {
    let config_err = ActonAIError::configuration("app_name", "cannot be empty");
    assert!(config_err.is_configuration());
    assert!(!config_err.is_runtime_shutdown());

    let shutdown_err = ActonAIError::runtime_shutdown();
    assert!(shutdown_err.is_runtime_shutdown());
}

#[test]
fn llm_error_retry_classification() {
    // Retriable errors
    assert!(LLMError::network("timeout").is_retriable());
    assert!(LLMError::rate_limited(Duration::from_secs(10)).is_retriable());
    assert!(LLMError::model_overloaded("gpt-4").is_retriable());
    assert!(LLMError::timeout(Duration::from_secs(30)).is_retriable());
    assert!(LLMError::api_error(500, "internal error", None).is_retriable());
    assert!(LLMError::api_error(503, "unavailable", None).is_retriable());

    // Non-retriable errors
    assert!(!LLMError::authentication_failed("bad key").is_retriable());
    assert!(!LLMError::api_error(400, "bad request", None).is_retriable());
    assert!(!LLMError::invalid_request("missing field").is_retriable());
}

#[test]
fn rate_limit_provides_retry_after() {
    let err = LLMError::rate_limited(Duration::from_secs(60));
    assert_eq!(err.retry_after(), Some(Duration::from_secs(60)));

    let other = LLMError::network("timeout");
    assert_eq!(other.retry_after(), None);
}
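
These predicates are what a production retry loop should branch on, and the loop itself is unit-testable by injecting failures. A minimal sketch, assuming only the LLMError constructors and predicates shown above (retry_llm_call is a hypothetical helper, not part of acton_ai):

use acton_ai::llm::error::LLMError;
use std::time::Duration;

/// Hypothetical helper: retries only errors the library marks retriable,
/// honoring a server-provided retry-after hint when one exists.
fn retry_llm_call<T>(
    mut call: impl FnMut() -> Result<T, LLMError>,
    max_attempts: usize,
) -> Result<T, LLMError> {
    let mut attempt = 0;
    loop {
        match call() {
            Ok(value) => return Ok(value),
            Err(err) if err.is_retriable() && attempt + 1 < max_attempts => {
                attempt += 1;
                // Back off using the rate-limit hint, else a fixed delay.
                let delay = err.retry_after().unwrap_or(Duration::from_millis(100));
                std::thread::sleep(delay);
            }
            Err(err) => return Err(err),
        }
    }
}

#[test]
fn retry_helper_gives_up_on_non_retriable_errors() {
    let mut calls = 0;
    let result: Result<(), _> = retry_llm_call(
        || {
            calls += 1;
            Err(LLMError::authentication_failed("bad key"))
        },
        3,
    );
    assert!(result.is_err());
    assert_eq!(calls, 1); // auth failures are never retried
}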

#[test]
fn tool_error_retry_classification() {
    assert!(ToolError::timeout("slow", Duration::from_secs(30)).is_retriable());
    assert!(ToolError::sandbox_error("transient").is_retriable());
    assert!(!ToolError::not_found("missing").is_retriable());
    assert!(!ToolError::validation_failed("tool", "bad args").is_retriable());
}

#[test]
fn errors_support_equality_comparison() {
    let err1 = KernelError::shutting_down();
    let err2 = KernelError::shutting_down();
    assert_eq!(err1, err2);

    let err3 = KernelError::spawn_failed("reason");
    assert_ne!(err1, err3);
}

#[test]
fn error_display_messages_are_actionable() {
    let err = ToolError::not_found("calculator");
    let msg = err.to_string();
    assert!(msg.contains("calculator"));
    assert!(msg.contains("not found"));
    assert!(msg.contains("verify"));  // Actionable guidance
}

Testing tool execution

Testing tool definitions

Verify tool definitions are correctly structured:

use acton_ai::tools::builtins::get_tool_definition;

#[test]
fn bash_tool_definition_is_valid() {
    let def = get_tool_definition("bash").unwrap();
    assert_eq!(def.name, "bash");
    assert!(!def.description.is_empty());

    // Verify the schema has the expected structure
    let schema = &def.input_schema;
    assert_eq!(schema["type"], "object");
    assert!(schema["properties"]["command"].is_object());
}

#[test]
fn all_builtin_tools_have_valid_definitions() {
    use acton_ai::tools::builtins::BuiltinTools;

    for tool_name in BuiltinTools::available() {
        let def = get_tool_definition(tool_name);
        assert!(def.is_ok(), "Tool '{}' has no definition", tool_name);

        let def = def.unwrap();
        assert!(!def.name.is_empty());
        assert!(!def.description.is_empty());
    }
}
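
Beyond these structural checks, you can verify that each schema is itself well-formed JSON Schema. A sketch assuming the jsonschema crate as a dev-dependency (validator_for is that crate's API, not Acton AI's):

use acton_ai::tools::builtins::{get_tool_definition, BuiltinTools};

#[test]
fn builtin_schemas_are_valid_json_schema() {
    for tool_name in BuiltinTools::available() {
        let def = get_tool_definition(tool_name).unwrap();
        // Compiling the schema fails if it is not valid JSON Schema.
        jsonschema::validator_for(&def.input_schema)
            .unwrap_or_else(|e| panic!("'{}' schema is invalid: {}", tool_name, e));
    }
}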

Testing path validation

Path validation is pure logic with no async dependencies:

use acton_ai::tools::security::{PathValidator, PathValidationError};
use std::path::Path;
use tempfile::TempDir;

#[test]
fn validates_paths_within_allowed_root() {
    let dir = TempDir::new().unwrap();
    let file = dir.path().join("test.txt");
    std::fs::write(&file, "content").unwrap();

    let validator = PathValidator::new()
        .clear_allowed_roots()
        .with_allowed_root(dir.path().to_path_buf());

    assert!(validator.validate(&file).is_ok());
}

#[test]
fn rejects_path_traversal() {
    let validator = PathValidator::new();
    let result = validator.validate(Path::new("/some/../../../etc/passwd"));
    assert!(matches!(
        result,
        Err(PathValidationError::DeniedPattern { pattern, .. }) if pattern == ".."
    ));
}

#[test]
fn rejects_git_directory_access() {
    let dir = TempDir::new().unwrap();
    let git_dir = dir.path().join(".git");
    std::fs::create_dir(&git_dir).unwrap();

    let validator = PathValidator::new()
        .clear_allowed_roots()
        .with_allowed_root(dir.path().to_path_buf());

    let result = validator.validate(&git_dir);
    assert!(matches!(
        result,
        Err(PathValidationError::DeniedPattern { pattern, .. }) if pattern == ".git"
    ));
}

#[test]
fn rejects_paths_outside_allowed_roots() {
    let allowed = TempDir::new().unwrap();
    let outside = TempDir::new().unwrap();
    let file = outside.path().join("secret.txt");
    std::fs::write(&file, "secret").unwrap();

    let validator = PathValidator::new()
        .clear_allowed_roots()
        .with_allowed_root(allowed.path().to_path_buf());

    assert!(matches!(
        validator.validate(&file),
        Err(PathValidationError::OutsideAllowedRoots { .. })
    ));
}

#[test]
fn validates_parent_for_new_files() {
    let dir = TempDir::new().unwrap();
    let new_file = dir.path().join("new_file.txt");

    let validator = PathValidator::new()
        .clear_allowed_roots()
        .with_allowed_root(dir.path().to_path_buf());

    // File doesn't exist, but parent does and is allowed
    assert!(validator.validate_parent(&new_file).is_ok());
}

Using StubSandbox for tests

The StubSandbox and StubSandboxFactory are test-only implementations that do not require a hypervisor. They return placeholder responses instead of executing code.

Test-only

StubSandbox is only available in #[cfg(test)] builds. It does NOT sandbox code and must never be used in production.

#[cfg(test)]
mod sandbox_tests {
    use acton_ai::tools::sandbox::{
        Sandbox, SandboxFactory,
        StubSandbox, StubSandboxFactory,
    };

    #[tokio::test]
    async fn stub_sandbox_returns_placeholder() {
        let sandbox = StubSandbox::new();
        let result = sandbox
            .execute("echo hello", serde_json::json!({}))
            .await;

        assert!(result.is_ok());
        let value = result.unwrap();
        assert_eq!(value["status"], "stub");
    }

    #[tokio::test]
    async fn destroyed_sandbox_rejects_execution() {
        let mut sandbox = StubSandbox::new();
        assert!(sandbox.is_alive());

        sandbox.destroy();
        assert!(!sandbox.is_alive());

        let result = sandbox.execute("code", serde_json::json!({})).await;
        assert!(result.is_err());
    }

    #[tokio::test]
    async fn stub_factory_creates_sandboxes() {
        let factory = StubSandboxFactory::new();
        assert!(factory.is_available());

        let sandbox = factory.create().await.unwrap();
        assert!(sandbox.is_alive());
    }

    #[test]
    fn stub_sandbox_sync_execution() {
        let sandbox = StubSandbox::new();
        let result = sandbox.execute_sync(
            "some code",
            serde_json::json!({"arg": 1}),
        );
        assert!(result.is_ok());
        assert_eq!(result.unwrap()["status"], "stub");
    }
}

Testing sandbox configuration

ProcessSandboxConfig has a validate() method that can be tested without spawning a child process:

use acton_ai::tools::sandbox::{HardeningMode, ProcessSandboxConfig};
use std::time::Duration;

#[test]
fn valid_sandbox_config_passes_validation() {
    let config = ProcessSandboxConfig::new()
        .with_timeout(Duration::from_secs(60))
        .with_memory_limit(Some(128 * 1024 * 1024))
        .with_hardening(HardeningMode::Enforce);

    assert!(config.validate().is_ok());
}

#[test]
fn rejects_zero_timeout() {
    let config = ProcessSandboxConfig::new().with_timeout(Duration::ZERO);
    assert!(config.validate().is_err());
}

#[test]
fn rejects_zero_memory_limit() {
    let config = ProcessSandboxConfig::new().with_memory_limit(Some(0));
    assert!(config.validate().is_err());
}

#[test]
fn unlimited_memory_is_valid() {
    let config = ProcessSandboxConfig::new().with_memory_limit(None);
    assert!(config.validate().is_ok());
}

#[test]
fn rejects_empty_env_allowlist() {
    let config = ProcessSandboxConfig::new().with_env_allowlist(Vec::<String>::new());
    assert!(config.validate().is_err());
}

Integration testing with a running LLM

For end-to-end tests that verify actual LLM interaction, use a local Ollama instance:

use acton_ai::prelude::*;

/// Integration test requiring Ollama running locally.
/// Run with: cargo test -- --ignored
#[tokio::test]
#[ignore = "requires running Ollama instance"]
async fn basic_prompt_returns_response() {
    let runtime = ActonAI::builder()
        .app_name("integration-test")
        .ollama("qwen2.5:7b")
        .launch()
        .await
        .expect("Failed to launch runtime");

    let response = runtime
        .prompt("What is 2 + 2? Answer with just the number.")
        .collect()
        .await
        .expect("Prompt failed");

    assert!(!response.text.is_empty());
    assert!(response.text.contains("4"));
    assert!(response.token_count > 0);

    runtime.shutdown().await.unwrap();
}

#[tokio::test]
#[ignore = "requires running Ollama instance"]
async fn conversation_maintains_context() {
    let runtime = ActonAI::builder()
        .app_name("conv-test")
        .ollama("qwen2.5:7b")
        .launch()
        .await
        .unwrap();

    let conv = runtime.conversation()
        .system("You are a math tutor. Be concise.")
        .build()
        .await;

    let r1 = conv.send("What is 5 + 3?").await.unwrap();
    assert!(!r1.text.is_empty());

    // Verify history is maintained
    assert_eq!(conv.len(), 2); // user + assistant

    let r2 = conv.send("Now multiply that by 2").await.unwrap();
    assert!(!r2.text.is_empty());
    assert_eq!(conv.len(), 4); // 2 user + 2 assistant

    runtime.shutdown().await.unwrap();
}

#[tokio::test]
#[ignore = "requires running Ollama instance"]
async fn tool_execution_works() {
    let runtime = ActonAI::builder()
        .app_name("tool-test")
        .ollama("qwen2.5:7b")
        .with_builtin_tools(&["calculate"])
        .launch()
        .await
        .unwrap();

    let response = runtime
        .prompt("Use the calculate tool to compute 42 * 17")
        .use_builtins()
        .collect()
        .await
        .unwrap();

    // The LLM should have called the calculate tool
    assert!(!response.tool_calls.is_empty());
    assert!(response.tool_calls.iter().any(|tc| tc.name == "calculate"));

    runtime.shutdown().await.unwrap();
}

Running integration tests

Integration tests that require Ollama are marked with #[ignore]. Run them explicitly with:

cargo test -- --ignored

Make sure Ollama is running locally on port 11434 with the qwen2.5:7b model pulled.
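
If the model is not yet available locally, pull it first, and use a name filter to run a single ignored test:

ollama pull qwen2.5:7b
cargo test basic_prompt_returns_response -- --ignored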


Testing conversation history

Test history management without an LLM by verifying the builder and structural properties:

use acton_ai::messages::Message;

#[test]
fn message_construction() {
    let user_msg = Message::user("Hello");
    let assistant_msg = Message::assistant("Hi there!");

    // Messages can be serialized for persistence testing
    let json = serde_json::to_string(&user_msg).unwrap();
    let deserialized: Message = serde_json::from_str(&json).unwrap();
    assert_eq!(user_msg.content, deserialized.content);
}

#[test]
fn chat_config_defaults() {
    use acton_ai::conversation::ChatConfig;

    let config = ChatConfig::new();
    // Check defaults match documentation
    let debug = format!("{:?}", config);
    assert!(debug.contains("You: "));
    assert!(debug.contains("Assistant: "));
}

#[test]
fn chat_config_input_mapper() {
    use acton_ai::conversation::ChatConfig;

    let config = ChatConfig::new()
        .map_input(|s| format!("[admin] {}", s));

    // The mapper is stored as Option<Box<dyn FnMut>>
    let debug = format!("{:?}", config);
    assert!(debug.contains("has_input_mapper"));
}
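
Because ChatConfig only stores the closure, the mapping logic itself is easiest to verify as a plain function before handing it over (the closure here mirrors the one above):

#[test]
fn input_mapper_logic_prefixes_admin_tag() {
    // Exercise the mapping behavior directly; no ChatConfig needed.
    let map = |s: &str| format!("[admin] {}", s);
    assert_eq!(map("restart server"), "[admin] restart server");
}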

Testing patterns summary

| What you are testing | Async? | LLM needed? | Pattern |
|---|---|---|---|
| AgentConfig construction | No | No | Direct struct creation and assertion |
| AgentState transitions | No | No | Call methods and assert results |
| DelegationTracker | No | No | Track/complete tasks and check counts |
| Error types and classification | No | No | Construct errors and test predicates |
| PathValidator | No | No | Create temp dirs and validate paths |
| ProcessSandboxConfig | No | No | Builder methods and validate() |
| StubSandbox execution | Yes | No | #[tokio::test] with stub |
| Tool definitions | No | No | get_tool_definition() and schema checks |
| Prompt execution | Yes | Yes | #[ignore] with running Ollama |
| Conversation flow | Yes | Yes | #[ignore] with running Ollama |
| Tool invocation | Yes | Yes | #[ignore] with running Ollama |
