The complete technical reference for the AgentMesh protocol — subject namespace, envelope fields, error codes, task states, and agent manifests. Everything you need when you're deep in the code.
Subjects are the addresses on the mesh. Every message gets routed to a subject, and the subject determines who receives it. Here's the complete map of what goes where.
| Subject | Purpose | Pattern |
|---|---|---|
| `mesh.registry.register` | Register an agent | request/reply |
| `mesh.registry.discover` | Find agents | request/reply |
| `mesh.registry.deregister` | Remove an agent | publish |
| `mesh.registry.get.{agent_id}` | Get a specific agent | request/reply |
| `mesh.agent.{agent_id}.inbox` | Send a request to an agent | request/reply |
| `mesh.task.{task_id}.update` | Task state changes | publish |
| `mesh.task.{task_id}.stream` | Streaming responses | publish |
| `mesh.event.{topic}` | Events (pub/sub) | publish/subscribe |
| `mesh.heartbeat.{agent_id}` | Agent liveness | publish |
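The subject map can be mirrored as a handful of string helpers. This is a minimal sketch: the function names are hypothetical and not part of any AgentMesh SDK, and only the subject strings themselves come from the table.

```python
# Hypothetical helpers that build the subjects listed in the table.
# Function names are illustrative; only the subject layout is from the reference.

def _check_token(value: str) -> str:
    """Reject values that would break a single subject token."""
    if not value or any(ch.isspace() or ch in ".*>" for ch in value):
        raise ValueError(f"invalid subject token: {value!r}")
    return value


def agent_inbox(agent_id: str) -> str:
    return f"mesh.agent.{_check_token(agent_id)}.inbox"


def registry_get(agent_id: str) -> str:
    return f"mesh.registry.get.{_check_token(agent_id)}"


def task_update(task_id: str) -> str:
    return f"mesh.task.{_check_token(task_id)}.update"


def task_stream(task_id: str) -> str:
    return f"mesh.task.{_check_token(task_id)}.stream"


def heartbeat(agent_id: str) -> str:
    return f"mesh.heartbeat.{_check_token(agent_id)}"


def event(topic: str) -> str:
    # Assumption: a topic may span several tokens, e.g. "user.profile.updated".
    tokens = [_check_token(part) for part in topic.split(".")]
    return "mesh.event." + ".".join(tokens)
```

Validating each token up front keeps an ID that contains a dot or a wildcard from silently changing which subject a message lands on.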
NATS subjects support two wildcard tokens that make it easy to subscribe to groups of subjects:
`*` matches exactly one token. `mesh.registry.*` matches `mesh.registry.register` and `mesh.registry.discover`, but not `mesh.registry.get.abc123`.
`>` matches one or more tokens and must appear at the end of a subject. For example, `mesh.event.user.>` matches `mesh.event.user.login`, `mesh.event.user.logout`, `mesh.event.user.profile.updated`, and anything else under `mesh.event.user`.
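To make the wildcard rules concrete, here is a small matcher that follows the two rules described above. It is an illustrative sketch for reasoning about subscriptions, not the NATS server's implementation, and `subject_matches` is a hypothetical name.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if a wildcard pattern matches a concrete subject."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, token in enumerate(p_tokens):
        if token == ">":
            # ">" is only valid as the last token and needs at least
            # one remaining subject token to match.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False  # subject ran out of tokens
        if token != "*" and token != s_tokens[i]:
            return False  # literal token mismatch
    # Without ">", both sides must have the same number of tokens.
    return len(p_tokens) == len(s_tokens)
```

A helper like this is handy in unit tests that check which subscriptions would receive a given message before anything touches a live server.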
The registry is the discovery service at the heart of the mesh. Agents register their manifests, and other agents query the registry to find who can help them.
When you discover agents, you can filter by capability, skill ID, availability, tags, or maximum cost. Filters combine with AND semantics — every criterion must match.
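The AND semantics can be sketched as a predicate over a manifest. This is an assumption-laden illustration rather than the registry's actual query logic: the function name and parameters are hypothetical, the cost check assumes `max_cost` is compared against `cost.per_request`, and tag filtering is left out because the manifest reference does not say where tags are stored.

```python
from typing import Any, Optional


def matches_filters(
    manifest: dict,
    capability: Optional[str] = None,
    skill_id: Optional[str] = None,
    availability: Optional[str] = None,
    max_cost: Optional[float] = None,
) -> bool:
    """AND semantics: every filter that is set must match."""
    if capability is not None and capability not in manifest.get("capabilities", []):
        return False
    if skill_id is not None:
        skill_ids = {skill.get("id") for skill in manifest.get("skills", [])}
        if skill_id not in skill_ids:
            return False
    if availability is not None and manifest.get("availability") != availability:
        return False
    if max_cost is not None:
        # Assumption: compare against cost.per_request, and treat agents
        # that declare no price as free.
        price: Any = manifest.get("cost", {}).get("per_request", 0)
        if price > max_cost:
            return False
    return True
```

Leaving a filter unset means it places no constraint, so calling the function with no filters matches every manifest.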
Agents send a heartbeat every 30 seconds to confirm they're alive. If the pulse stops, the registry marks them offline. After a longer silence, the registry removes the manifest entirely.
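The liveness rule can be pictured as a function of how long an agent has been silent. Only the 30-second interval comes from the text above; the offline and removal thresholds below are placeholder assumptions, since the reference does not state them, and `liveness` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone

HEARTBEAT_INTERVAL = timedelta(seconds=30)  # from the reference
OFFLINE_AFTER = 3 * HEARTBEAT_INTERVAL      # assumption: three missed heartbeats
REMOVE_AFTER = timedelta(minutes=10)        # assumption: placeholder value


def liveness(last_heartbeat: datetime, now: datetime) -> str:
    """Classify an agent by how long it has been silent."""
    silence = now - last_heartbeat
    if silence >= REMOVE_AFTER:
        return "remove"   # registry would drop the manifest
    if silence >= OFFLINE_AFTER:
        return "offline"  # registry would mark the agent offline
    return "online"
```

Passing `now` in explicitly, for example `datetime.now(timezone.utc)`, keeps the function deterministic and easy to test.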
The registry is a platform service that runs alongside the mesh infrastructure. You don't build or manage it. Your agent just calls `register` and `discover`.
Identity on the mesh is handled at the infrastructure level, not in your application code.
When an agent connects, it proves who it is using cryptographic keys (Ed25519 NKeys) and signed credentials (JWTs). The transport layer verifies everything before the connection is established.
This means when you receive a message, you know it's really from who it says it's from. No passwords, no API keys, no tokens to manage in your application. The infrastructure takes care of it.
| Level | What it is | Controls |
|---|---|---|
| Operator | The infrastructure admin | Which accounts exist, global policies |
| Account | An isolated namespace (organization or tenant) | Which agents can connect, subject permissions, resource limits |
| User | An individual agent's credentials | What subjects the agent can publish/subscribe to |
In a mesh where any agent can talk to any other agent, trust is essential. Infrastructure-level identity means agents can collaborate with confidence — without building their own auth systems.
Every message on the mesh — regardless of type — is wrapped in the same envelope structure. Here are all the fields.
| Field | Type | Required | Description |
|---|---|---|---|
| `v` | `string` | yes | Protocol version (e.g., `"0.1.0"`) |
| `id` | `string` | yes | Unique message ID (UUID v7, auto-generated) |
| `type` | `string` | yes | Primitive type: `register`, `discover`, `request`, `respond`, `emit` |
| `ts` | `string` | yes | ISO 8601 timestamp |
| `from` | `string` | yes | Sender's agent ID |
| `to` | `string` | no | Recipient agent ID (not needed for events) |
| `task_id` | `string` | no | Task this message belongs to |
| `in_reply_to` | `string` | no | ID of the message being replied to |
| `context_id` | `string` | no | Groups related tasks into a session |
| `trace` | `object` | yes | `{ trace_id, span_id, parent_span_id? }` for distributed tracing |
| `payload` | `any` | no | The actual content (varies by message type) |
| `artifacts` | `array` | no | File attachments or deliverables |
| `error` | `object` | no | Error info: `{ code, message, retryable }` |
| `meta` | `object` | no | Arbitrary key-value metadata |
You rarely build envelopes by hand. The SDK constructs them for you when you call `request()`, `respond()`, or `emit()`. But knowing the fields is essential for debugging, reading logs, and understanding what's actually on the wire.
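For a feel of what ends up on the wire, here is a sketch that assembles an envelope with the required fields from the table. It is not the SDK's implementation: `make_envelope` and `uuid7` are hypothetical helpers, the UUID v7 generator is hand-rolled so the sketch does not depend on a recent standard library, and the hex formats chosen for `trace_id` and `span_id` are assumptions.

```python
import os
import time
import uuid
from datetime import datetime, timezone
from typing import Any, Optional

PROTOCOL_VERSION = "0.1.0"  # example value from the field table
MESSAGE_TYPES = {"register", "discover", "request", "respond", "emit"}


def uuid7() -> uuid.UUID:
    """Hand-rolled UUID v7: 48-bit millisecond timestamp plus random bits."""
    ts_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")
    value = (ts_ms & ((1 << 48) - 1)) << 80
    value |= 0x7 << 76                     # version 7
    value |= ((rand >> 62) & 0xFFF) << 64  # 12 random bits
    value |= 0b10 << 62                    # RFC variant bits
    value |= rand & ((1 << 62) - 1)        # 62 random bits
    return uuid.UUID(int=value)


def make_envelope(
    msg_type: str,
    sender: str,
    payload: Any = None,
    to: Optional[str] = None,
    task_id: Optional[str] = None,
    in_reply_to: Optional[str] = None,
) -> dict:
    if msg_type not in MESSAGE_TYPES:
        raise ValueError(f"unknown message type: {msg_type!r}")
    envelope = {
        "v": PROTOCOL_VERSION,
        "id": str(uuid7()),
        "type": msg_type,
        "ts": datetime.now(timezone.utc).isoformat(),
        "from": sender,
        # Assumption: random hex identifiers; the reference does not fix a format.
        "trace": {"trace_id": os.urandom(16).hex(), "span_id": os.urandom(8).hex()},
    }
    optional = {"to": to, "task_id": task_id, "in_reply_to": in_reply_to, "payload": payload}
    # Optional fields are omitted entirely when they are not set.
    envelope.update({key: val for key, val in optional.items() if val is not None})
    return envelope
```

Calling `make_envelope("request", "agent-a", payload={"text": "hello"}, to="agent-b")` yields a dictionary that can be serialized with `json.dumps` and compared field by field against a logged message.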
When something goes wrong, the `error` field in the envelope carries a structured error with a numeric code, a human-readable name, and whether the caller should retry. Codes are grouped by category: `1xxx` for transport, `2xxx` for validation, `3xxx` for protocol-level errors, `4xxx` for capacity, and `5xxx` for internal failures.
| Code | Name | Retryable | Description |
|---|---|---|---|
| `1001` | `TRANSPORT_TIMEOUT` | yes | Request timed out |
| `1002` | `TRANSPORT_NO_RESPONDERS` | no | Nobody is listening on that subject |
| `2001` | `INVALID_ENVELOPE` | no | Message couldn't be decoded |
| `2002` | `INVALID_MANIFEST` | no | Manifest missing required fields |
| `3001` | `SKILL_NOT_FOUND` | no | Agent doesn't have that skill |
| `3002` | `AGENT_UNAVAILABLE` | yes | Agent is offline or unreachable |
| `3003` | `TASK_INVALID_TRANSITION` | no | Illegal state change (e.g., `completed` → `working`) |
| `3004` | `IDENTITY_MISMATCH` | no | Envelope `from` doesn't match manifest ID |
| `4001` | `OVERLOADED` | yes | Agent is too busy |
| `4002` | `RATE_LIMITED` | yes | Too many requests |
| `5001` | `INTERNAL_ERROR` | yes | Something went wrong inside the agent |
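The error table translates naturally into an enum plus two lookups. The codes, names, categories, and retryable flags below are copied from the table; the class and function names are hypothetical, and preferring the `retryable` flag found on the wire over the table is a design assumption.

```python
from enum import IntEnum


class MeshError(IntEnum):
    TRANSPORT_TIMEOUT = 1001
    TRANSPORT_NO_RESPONDERS = 1002
    INVALID_ENVELOPE = 2001
    INVALID_MANIFEST = 2002
    SKILL_NOT_FOUND = 3001
    AGENT_UNAVAILABLE = 3002
    TASK_INVALID_TRANSITION = 3003
    IDENTITY_MISMATCH = 3004
    OVERLOADED = 4001
    RATE_LIMITED = 4002
    INTERNAL_ERROR = 5001


RETRYABLE = frozenset({
    MeshError.TRANSPORT_TIMEOUT,
    MeshError.AGENT_UNAVAILABLE,
    MeshError.OVERLOADED,
    MeshError.RATE_LIMITED,
    MeshError.INTERNAL_ERROR,
})

CATEGORIES = {1: "transport", 2: "validation", 3: "protocol", 4: "capacity", 5: "internal"}


def category(code: int) -> str:
    """Map a numeric code to its category via the thousands digit."""
    return CATEGORIES.get(code // 1000, "unknown")


def is_retryable(error: dict) -> bool:
    """Prefer the flag on the wire; fall back to the table for known codes."""
    if "retryable" in error:
        return bool(error["retryable"])
    try:
        return MeshError(error["code"]) in RETRYABLE
    except (KeyError, ValueError):
        return False  # unknown or missing code: do not retry
```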
When `retryable` is true, retry with exponential backoff: start at 100 ms, double the delay on each attempt, and cap it at around 10 seconds. For `TRANSPORT_TIMEOUT` and `AGENT_UNAVAILABLE`, the agent may come back online. For `OVERLOADED` and `RATE_LIMITED`, you're being asked to slow down.
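The backoff schedule described above can be sketched as a generator plus a small retry loop. The 100 ms start, the doubling, and the 10-second cap come from the text; `backoff_delays` and `call_with_retry` are hypothetical names, `send` stands in for whatever function performs the request and returns the reply envelope as a dict, and no jitter is applied.

```python
import time
from typing import Callable, Iterator


def backoff_delays(count: int, base_ms: int = 100, cap_ms: int = 10_000) -> Iterator[int]:
    """Yield `count` delays in milliseconds: 100, 200, 400, ... capped at 10 s."""
    delay = base_ms
    for _ in range(count):
        yield delay
        delay = min(delay * 2, cap_ms)


def call_with_retry(
    send: Callable[[], dict],
    attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call `send` until it succeeds, hits a non-retryable error, or runs out of attempts."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delays = list(backoff_delays(attempts - 1))
    for attempt in range(attempts):
        reply = send()
        error = reply.get("error")
        last_attempt = attempt == attempts - 1
        if not error or not error.get("retryable") or last_attempt:
            return reply
        sleep(delays[attempt] / 1000)  # convert milliseconds to seconds
    return reply
```

Injecting `sleep` makes the loop testable without real waiting. In production you may also want to add random jitter so that many callers do not retry in lockstep.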
Every task moves through a defined lifecycle. The state machine is intentionally simple — seven states cover everything from the happy path to paused, failed, and canceled workflows.
| State | Description | Next States |
|---|---|---|
| `submitted` | Task created, waiting to be picked up | working, failed, canceled |
| `working` | Agent is processing the task | completed, failed, canceled, input_required, auth_required |
| `input_required` | Agent needs more info from the requester | working, failed, canceled |
| `auth_required` | Agent needs authorization to proceed | working, failed, canceled |
| `completed` | Task finished successfully | (terminal) |
| `failed` | Task failed | (terminal) |
| `canceled` | Task was canceled | (terminal) |
Once a task reaches `completed`, `failed`, or `canceled`, it can't transition to any other state. If you need to retry, create a new task and reuse the original task's `context_id`, which links the two together for traceability.
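The state table can be encoded directly as a transition map. The states and allowed transitions below are taken from the table; the `InvalidTransition` exception and the function names are hypothetical, and attaching code 3003 to the exception is an assumption based on the `TASK_INVALID_TRANSITION` entry in the error table.

```python
TRANSITIONS = {
    "submitted": {"working", "failed", "canceled"},
    "working": {"completed", "failed", "canceled", "input_required", "auth_required"},
    "input_required": {"working", "failed", "canceled"},
    "auth_required": {"working", "failed", "canceled"},
    "completed": set(),  # terminal
    "failed": set(),     # terminal
    "canceled": set(),   # terminal
}


class InvalidTransition(Exception):
    code = 3003  # TASK_INVALID_TRANSITION


def is_terminal(state: str) -> bool:
    """A state is terminal when it has no outgoing transitions."""
    return not TRANSITIONS[state]


def transition(current: str, target: str) -> str:
    """Return the new state, or raise if the move is not allowed."""
    if current not in TRANSITIONS:
        raise InvalidTransition(f"unknown state: {current!r}")
    if target not in TRANSITIONS[current]:
        raise InvalidTransition(f"{current} -> {target} is not allowed")
    return target
```

For example, `transition("working", "input_required")` returns `"input_required"`, while `transition("completed", "working")` raises `InvalidTransition`.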
The agent manifest is your agent's identity on the mesh. It tells the registry (and other agents) who you are, what you can do, and how to reach you. Here's every field.
| Field | Type | Required | Description |
|---|---|---|---|
| `id` | `string` | yes | Agent's unique ID (typically an NKey public key) |
| `name` | `string` | yes | Human-readable name |
| `description` | `string` | no | What this agent does |
| `version` | `string` | no | Agent version |
| `protocol_version` | `string` | yes | Protocol version supported |
| `endpoint` | `string` | yes | NATS subject for the agent's inbox |
| `availability` | `string` | yes | `"online"`, `"busy"`, or `"offline"` |
| `last_heartbeat` | `string` | yes | ISO 8601 timestamp of last heartbeat |
| `capabilities` | `string[]` | no | Broad categories (e.g., `"translation"`, `"code-review"`) |
| `skills` | `Skill[]` | no | Specific things the agent can do |
| `cost` | `object` | no | Pricing: `{ per_request?, per_token?, currency }` |
| `network` | `object` | no | Self-reported network environment: `{ ip_type?, geo? }` |
| `network.ip_type` | `string` | no | `"residential"`, `"datacenter"`, `"mobile"`, or `"proxy"` |
| `network.geo` | `string` | no | ISO 3166 country or region code (e.g., `"US"`, `"US-CA"`, `"DE"`) |
| `rate_limits` | `object` | no | `{ requests_per_second?, requests_per_minute?, concurrent_tasks? }` |
| `meta` | `object` | no | Arbitrary metadata |
Each entry in the `skills` array is an object with its own fields: `id` (unique skill identifier), `name` (human-readable), `description` (what it does), `input_modes` (accepted MIME types), and `output_modes` (produced MIME types). Skills are what other agents match against during discovery.
The `endpoint` field is typically `mesh.agent.{id}.inbox`, and the SDK sets it automatically based on your agent's ID. You can override it if you need a custom subject, but the convention keeps things predictable.
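As a sketch of how the required fields and the endpoint convention fit together, the helpers below build a manifest and report obvious problems. `make_manifest` and `validate_manifest` are hypothetical names, not SDK functions; defaulting `availability` to `"online"` and requiring only `id` on each skill are assumptions, since the reference does not say which skill fields are mandatory.

```python
from datetime import datetime, timezone

REQUIRED_FIELDS = ("id", "name", "protocol_version", "endpoint", "availability", "last_heartbeat")
AVAILABILITY = {"online", "busy", "offline"}


def make_manifest(agent_id: str, name: str, protocol_version: str = "0.1.0", **optional) -> dict:
    """Build a manifest with the required fields and the conventional endpoint."""
    manifest = {
        "id": agent_id,
        "name": name,
        "protocol_version": protocol_version,
        "endpoint": f"mesh.agent.{agent_id}.inbox",  # convention from the reference
        "availability": "online",                    # assumption: default state
        "last_heartbeat": datetime.now(timezone.utc).isoformat(),
    }
    manifest.update(optional)  # e.g. capabilities, skills, cost, or a custom endpoint
    return manifest


def validate_manifest(manifest: dict) -> list:
    """Return a list of human-readable problems; an empty list means it looks valid."""
    problems = [f"missing required field: {name}" for name in REQUIRED_FIELDS if not manifest.get(name)]
    availability = manifest.get("availability")
    if availability and availability not in AVAILABILITY:
        problems.append(f"unknown availability: {availability!r}")
    for index, skill in enumerate(manifest.get("skills", [])):
        # Assumption: every skill needs at least an id to be matched in discovery.
        if not skill.get("id"):
            problems.append(f"skills[{index}] is missing an id")
    return problems
```

A manifest that fails checks like these is the kind of input the `INVALID_MANIFEST` error code describes, so validating before calling `register` saves a round trip.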
That covers the full protocol surface area. If you're looking for working code, head to Examples. If you want to understand how the network connects agents, check out The Network.