v0.3.0 live on PyPI — pip install cortexops

Stop shipping agents
you can't trust

Evaluate, observe, and gate LangGraph and CrewAI agents before they reach production. Built by a Senior AI Engineer at PayPal.

View on GitHub

Python

from cortexops import CortexTracer, EvalSuite

tracer = CortexTracer(project="payments-agent")  # key auto-loaded
agent  = tracer.wrap(your_langgraph_app)   # zero refactoring

results = EvalSuite.run(
    dataset="golden_v1.yaml",
    agent=agent,
    fail_on="task_completion < 0.90",  # CI gate
)
print(results.summary())

Live observability

See inside every agent run.
In real time.

Click any trace row to see the node waterfall — exactly which step took how long, which tools were called, and what the output was. Debug a 2am incident in 30 seconds.

CortexOps Observability

project payments-agent

Live · 5s

Task completion: 94.2% (↑ 2.1%)
Error rate: 2.3% (↓ 0.8%)
Avg latency: 487ms
P95 latency: 1,240ms
Total traces: 1,847

Trace     Case                Latency  Failure     Time
4398c8e8  refund_approved     342ms    -           09:24:11
b21fa3c2  balance_check       198ms    -           09:24:08
9f3e1d77  dispute_escalation  3,240ms  TIMEOUT     09:23:55
c84ab910  refund_approved     287ms    -           09:23:41
e12cd456  kyc_verification    743ms    -           09:23:30
a77bf219  fraud_detection     1,820ms  HALLUCI...  09:23:18

Health
Success: 97.7%
Eval gate: Passing
Regressions: 0

Failures
TIMEOUT: 2
HALLUCI...: 1

Latency
<200ms: 312
200–500: 498
>1s: 23

Open live dashboard →

How it works

From prototype to
trusted production

Four steps. No refactoring. Works with any LangGraph or CrewAI agent.

Instrument

Wrap any agent in one line. CortexTracer auto-detects LangGraph and CrewAI.

tracer = CortexTracer(project="my-agent")
agent = tracer.wrap(your_app)
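
To make the idea of one-line wrapping concrete, here is a minimal sketch of how a tracing wrapper around an arbitrary callable might work. It is not CortexOps' implementation: `SketchTracer`, its `traces` list, and the record fields are hypothetical names used only for illustration.

```python
import time
from typing import Any, Callable


class SketchTracer:
    """Hypothetical stand-in for a tracer; not the CortexOps API."""

    def __init__(self, project: str) -> None:
        self.project = project
        self.traces = []  # one dict per recorded call

    def wrap(self, fn: Callable[..., Any]) -> Callable[..., Any]:
        """Return a callable that behaves like fn but records each call."""

        def traced(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            error = None
            try:
                return fn(*args, **kwargs)
            except Exception as exc:
                # Remember the failure type, then let the caller see the error.
                error = type(exc).__name__
                raise
            finally:
                latency_ms = (time.perf_counter() - start) * 1000
                self.traces.append(
                    {"project": self.project, "latency_ms": latency_ms, "error": error}
                )

        return traced


def toy_agent(query: str) -> str:
    # Stand-in for a real agent; any callable works.
    return f"refund approved for: {query}"


tracer = SketchTracer(project="my-agent")
agent = tracer.wrap(toy_agent)
print(agent("order 123"))
print(len(tracer.traces))
```

The wrapped callable keeps the original call signature, which is why no changes to the calling code are needed in this sketch.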

Define

Write golden datasets in YAML. Expected keywords, tool calls, latency budgets.

expected_output_contains:
  - refund
  - approved
max_latency_ms: 3000
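
As an illustration of what such a case could check, the sketch below evaluates one agent output against the two fields shown above (`expected_output_contains` and `max_latency_ms`). The `check_case` helper and the case-insensitive keyword matching are assumptions, not documented CortexOps behavior.

```python
def check_case(case: dict, output: str, latency_ms: float) -> list:
    """Return a list of failure reasons; an empty list means the case passed."""
    failures = []
    text = output.lower()
    # Every expected keyword must appear somewhere in the output.
    for keyword in case.get("expected_output_contains", []):
        if keyword.lower() not in text:
            failures.append(f"missing keyword: {keyword}")
    # The latency budget is optional; skip the check when it is absent.
    budget = case.get("max_latency_ms")
    if budget is not None and latency_ms > budget:
        failures.append(f"latency {latency_ms:.0f}ms exceeds budget {budget}ms")
    return failures


# The YAML case from the step above, written as the dict it would parse to.
case = {"expected_output_contains": ["refund", "approved"], "max_latency_ms": 3000}

print(check_case(case, "Refund approved for order 123", 342))
print(check_case(case, "Request escalated to a human", 3240))
```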

Gate

One fail_on expression. PRs are blocked when quality drops — automatically.

EvalSuite.run(fail_on="task_completion < 0.90")
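
A gate expression like `task_completion < 0.90` can be read as "fail when this metric is below this threshold". The sketch below shows one way such an expression might be evaluated. The `gate_fails` helper, the supported operators, and the metrics dict are hypothetical; the page does not document how CortexOps parses `fail_on`.

```python
import operator
import re

# Two-character operators come first so "<=" is not read as "<".
_OPS = {
    "<=": operator.le,
    ">=": operator.ge,
    "<": operator.lt,
    ">": operator.gt,
}

_PATTERN = re.compile(r"^\s*(\w+)\s*(<=|>=|<|>)\s*([0-9]*\.?[0-9]+)\s*$")


def gate_fails(fail_on: str, metrics: dict) -> bool:
    """Return True when the fail_on condition holds for the given metrics."""
    match = _PATTERN.match(fail_on)
    if match is None:
        raise ValueError(f"unsupported fail_on expression: {fail_on!r}")
    name, op, threshold = match.groups()
    if name not in metrics:
        raise KeyError(f"unknown metric: {name}")
    return _OPS[op](metrics[name], float(threshold))


print(gate_fails("task_completion < 0.90", {"task_completion": 0.942}))
print(gate_fails("task_completion < 0.90", {"task_completion": 0.85}))
```

In a CI job, a `True` result would typically be turned into a nonzero exit code so the pipeline marks the run as failed.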

Observe

Live traces, Slack alerts, waterfall debug view. Root cause in 30 seconds.

GET /v1/traces?project=payments → node waterfall
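
For readers who want to script against the endpoint, the sketch below only builds the request URL from the path and `project` query parameter shown above. The base URL is a placeholder and `traces_url` is a hypothetical helper; authentication and the response schema are not described on this page, so they are left out.

```python
from urllib.parse import urlencode, urljoin


def traces_url(base_url: str, project: str) -> str:
    """Build the GET /v1/traces URL for one project."""
    query = urlencode({"project": project})
    return urljoin(base_url, "/v1/traces") + "?" + query


# Placeholder host: substitute the API base URL from the official docs.
print(traces_url("https://api.example.com", "payments"))
```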

Features

Everything your agent needs
to ship safely

Zero-config instrumentation

Wraps LangGraph, CrewAI, or any callable with one line. No decorators, no config files, no refactoring.

Free

Golden dataset evals

YAML-based test cases with expected outputs, tool calls, and latency budgets. Run locally or in CI.

Free

CI eval gate

Block PRs with a single fail_on expression. Works with GitHub Actions, GitLab CI, any CI system.

Free

Hosted trace storage

90-day retention. Node waterfall, tool calls, latency breakdown. Live dashboard at app.getcortexops.com.

Pro

Slack + webhook alerts

Get paged when production regresses — before your users notice. Configurable thresholds per project.

Pro

LLM-as-judge scoring

GPT-4o evaluates open-ended outputs against your criteria. Heuristic fallback always included.

Pro

vs LangSmith

Know your bill
before you ship

LangSmith charges $39/seat plus $2.50–$5.00 per 1,000 traces. At 50k traces/month, one seat costs at least $164 ($39 + 50 × $2.50). CortexOps is $49/seat. Flat.
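
The arithmetic behind that comparison can be checked with a few lines. This sketch uses only the prices quoted on this page and assumes trace fees are billed once per account rather than per seat; `seat_plus_trace_cost` is a hypothetical helper.

```python
def seat_plus_trace_cost(
    seat_price: float, traces: int, price_per_1k: float, seats: int = 1
) -> float:
    """Monthly cost for a per-seat price plus a per-1,000-trace fee."""
    return seats * seat_price + (traces / 1000) * price_per_1k


traces_per_month = 50_000

# Low and high ends of the quoted $2.50 to $5.00 per 1,000 traces range.
low = seat_plus_trace_cost(39, traces_per_month, 2.50)
high = seat_plus_trace_cost(39, traces_per_month, 5.00)
flat = 49  # flat per-seat price quoted above

print(f"Seat + trace fees: ${low:.2f} to ${high:.2f}")
print(f"Flat per seat: ${flat:.2f}")
```

The $164 figure corresponds to the low end of the range; at the $5.00 rate the same volume comes to $289.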

Capability         CortexOps            LangSmith
Pricing model      $49/seat flat        $39/seat + trace fees
Trace cost         Unlimited, included  $2.50–$5.00 / 1k
CI eval gate       1-line fail_on       Manual setup required
Framework lock-in  None (any agent)     Best with LangChain
Payments domain    Built-in templates   Not available
Free local evals   Unlimited            5k traces/month
Open source        MIT licensed         Proprietary

Pricing

Start free.
Scale with your agents.

No credit card required for the free tier. Cancel Pro anytime.

Free

Full SDK, unlimited local evals, CI gate. Forever free.

✓ pip install cortexops
✓ Unlimited local eval runs
✓ GitHub Actions CI gate
✓ Golden dataset YAML format
✓ CLI tool

Complete docs at
docs.getcortexops.com

18 pages covering installation, golden datasets, CI gate, LangGraph, CrewAI, API reference, and more. No GitHub redirect.

Quickstart

Install the SDK and run your first eval in under 2 minutes.

# Install
pip install cortexops

from cortexops import CortexTracer, EvalSuite

tracer = CortexTracer(project="my-agent")
agent  = tracer.wrap(my_agent)

results = EvalSuite.run(
    dataset="golden_v1.yaml",
    agent=agent,
    verbose=True,
)

Read full docs →
