11 months active inside a Tier-1 European carrier tenant (under NDA). ~2B spans/day. 1,986 hosts. 20TB bizevents bucket recalibrated. Now booking fixed-scope 2-week audits at EUR 9,500, with a 3x savings guarantee or you pay only the deposit.
Doctrine refined inside central banking, supranational finance, sovereign treasuries, and carrier-scale aviation. The kind of tenants where every config change is reviewed and every retention day costs.
Every Veristack DPS audit runs through this checklist. Findings above are all instances of one or more of these. We publish it because no Big-4 deliverable will: this is the kind of doctrine that comes from operating tenants at carrier scale, not from a deck.
Detect: fetch dt.system.buckets | sort estimated_uncompressed_bytes desc, cross-referenced against the query audit log.
Typical impact: 30-75% of that bucket's bill.
Detect: fetch logs | filter loglevel=="DEBUG" | summarize count(), by:{k8s.namespace.name, log.source}
Typical impact: 20-40% of default_logs volume.
Detect: Buckets matching *_pre_*, *_int_*, *_uat_* with retention >7d.
Typical impact: 80% of those bucket lines if reduced to 7d.
Detect: Smartscape host inventory grouped by monitoringMode, cross-checked with active service detection.
Typical impact: 10-35% of host-hour bill.
Detect: fetch logs | filter isNull(dt.entity.service) | summarize count()
Typical impact: Blocks every other policy. Always fix first.
Detect: Inspect OpenPipeline processor configs and dashboard tile queries.
Typical impact: Direct DPS cost on every dashboard query.
Detect: fetch metrics | summarize cardinality() per dimension.
Typical impact: Cost explosion + dashboard slowness.
Detect: Join spans across services on trace.id; near-zero matches reveal a propagation gap.
Typical impact: End-to-end debugging impossible. MTTR up by orders of magnitude.
All 7 detection queries are validated on a real carrier-scale tenant. Read the DQL Starter Pack →
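Once bucket metadata and query lookbacks are exported, the bucket-retention and non-prod checks above reduce to simple arithmetic. Below is a minimal Python sketch. It assumes you have already pulled each bucket's name and retention (for example from fetch dt.system.buckets) plus a sample of how many days back each logged query reached. The Bucket type, the lookback field, and the nearest-rank p95 are illustrative choices, not Veristack's tooling. The 7-day non-prod limit comes from the checklist, and the 4× flag is the lower end of the 4-12× band quoted in the retention package.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field
from fnmatch import fnmatchcase

# Thresholds taken from the page text; tune per tenant.
NON_PROD_PATTERNS = ("*_pre_*", "*_int_*", "*_uat_*")
NON_PROD_MAX_DAYS = 7
OVERSIZED_RATIO = 4.0  # lower end of the 4-12x band


@dataclass
class Bucket:
    name: str
    retention_days: int
    # Hypothetical input: how many days back each sampled query reached,
    # e.g. derived from the query audit log.
    lookback_days: list[float] = field(default_factory=list)


def percentile(values: list[float], q: float) -> float:
    """Nearest-rank percentile, q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def review(bucket: Bucket) -> list[str]:
    findings: list[str] = []
    is_non_prod = any(fnmatchcase(bucket.name, p) for p in NON_PROD_PATTERNS)
    if is_non_prod and bucket.retention_days > NON_PROD_MAX_DAYS:
        findings.append(
            f"non-prod retention {bucket.retention_days}d > {NON_PROD_MAX_DAYS}d"
        )
    if not bucket.lookback_days:
        findings.append("not queried in the sampled period")
        return findings
    p95 = percentile(bucket.lookback_days, 95)
    # Floor the denominator at 1 day so sub-day lookbacks don't inflate the ratio.
    ratio = bucket.retention_days / max(p95, 1.0)
    if ratio >= OVERSIZED_RATIO:
        findings.append(
            f"retention {bucket.retention_days}d is {ratio:.1f}x "
            f"the p95 lookback ({p95:g}d)"
        )
    return findings


if __name__ == "__main__":
    sample = [
        Bucket("logs_uat_payments", 35, [0.5, 1, 2, 2, 3]),
        Bucket("default_logs", 35, [30, 33, 34, 35, 35]),
        Bucket("bizevents_archive", 365),
    ]
    for b in sample:
        print(b.name, review(b))
```

The bucket names and lookback samples in the usage block are invented for illustration. On real data, each flagged bucket becomes a candidate for the per-bucket recalibration plan rather than an automatic change.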
Unfiltered queries, metric cardinality explosions, and logs in hot storage for 35 days when they're only useful for 48 hours. You're overspending 20-35% without realizing it.
Davis AI fires hundreds of non-actionable alerts. Your on-call team ignores them all. When a real incident hits, it drowns in the noise.
Classic dashboards, RBAC policies, and pre-Grail configurations. Your team knows they need to migrate but can't carve out the engineering time.
No retainer bloat. No discovery calls. No hourly overages. Select the outcome your environment needs.
Full audit across all 8 anti-patterns, including bucket retention, DEBUG in prod, host modes, attribution, taxonomy, and trace boundary. PDF report, DQL library, remediation backlog. 3x savings guarantee or you pay only the deposit.
Sample-based percentile band analysis on real query telemetry. Identify buckets whose retention is 4-12× longer than what queries actually consume. Concrete recalibration plan per bucket. Typical recovery: 30-75% on the targeted line.
OpenPipeline ingest filters, DEBUG drops, taxonomy enforcement. Service attribution gaps closed first (otherwise no other policy works). Typical 20-40% cut on default_logs without dashboard impact.
Up to 20 dashboards. Legacy filters translated to optimized DQL. Heavy tiles refactored to pre-aggregated metrics via OpenPipeline (zero Grail consumption per refresh).
Davis AI sensitivity tuning, profile consolidation, maintenance windows. Routing redesign aligned with ownership, not severity. Concrete reduction target signed before kickoff.
Production-ready OTel Collector architecture. Smartscape topology mapping. Exporter batch tuning (exporting one item per request, batch=1, costs you 40% in bandwidth). Unified OneAgent + OTel dashboards.
4-window multi-burn-rate alerting framework (99.99% availability, p95 latency targets). Tag canonicalization to unblock production-scoped SLOs. Fast burn + slow burn tuned per service.
Historical consumption analysis. Predictive forecasting. Shelfware identification cross-referenced against query audit log. Data-driven battle-cards for the renewal conversation.
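The 4-window alerting framework in the SLO package pairs a fast-burn rule with a slow-burn rule, and checks each over a long and a short window. Below is a minimal Python sketch. It uses the window lengths and thresholds from the multiwindow, multi-burn-rate example in Google's SRE Workbook (1h/5m at 14.4×, 6h/30m at 6×, sized for a 30-day SLO period), not the per-service values tuned during an engagement. The error ratio for each window is assumed to come from your own SLI queries. Burn rate is the observed error ratio divided by the error budget, so at a 99.99% target a 0.2% error ratio spends budget 20× faster than is sustainable.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BurnRateRule:
    name: str
    long_window: str   # key into the error-ratio map
    short_window: str
    threshold: float   # burn-rate multiple that must be exceeded in both windows


# Assumed defaults from the Google SRE Workbook (30-day SLO period):
# two rules x two windows = four windows in total.
RULES = (
    BurnRateRule("fast burn", "1h", "5m", 14.4),
    BurnRateRule("slow burn", "6h", "30m", 6.0),
)


def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Multiple of the sustainable error-budget spend rate (1.0 = exactly on budget)."""
    return error_ratio / (1.0 - slo_target)


def firing_rules(error_ratios: dict[str, float], slo_target: float) -> list[str]:
    fired = []
    for rule in RULES:
        long_br = burn_rate(error_ratios[rule.long_window], slo_target)
        short_br = burn_rate(error_ratios[rule.short_window], slo_target)
        # Long window: the burn is significant. Short window: it is still happening,
        # so the alert clears quickly once the incident is over.
        if long_br >= rule.threshold and short_br >= rule.threshold:
            fired.append(rule.name)
    return fired


if __name__ == "__main__":
    slo = 0.9999  # 99.99% availability target
    ratios = {"1h": 0.002, "5m": 0.003, "6h": 0.0005, "30m": 0.001}
    print(firing_rules(ratios, slo))
```

The key design choice is requiring both windows to exceed the threshold. The long window keeps brief blips from paging anyone, and the short window lets the alert clear soon after the burn stops.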
For teams that want ongoing senior expertise without hiring an FTE and without the meeting tax. Same async model: written briefs, written replies, no Zoom.
60+ recipes. PDF + private GitHub repo. DQL queries validated on a real carrier-scale tenant: cost attribution, anomaly detection, retention review, DDU waste, hot/cold tier rebalance, OpenPipeline processors. Updates included for 12 months.
One curated drop per quarter: new DQL templates, OpenPipeline patterns from real audits, Davis AI tuning notes, bucket retention case studies. Delivered as PDF + repo update. For SRE teams up to 8 people. Zero meetings.
Slack + email channel with senior response under 24h on business days. One written deep-dive per month on the topic of your choice. Zero Zoom, zero standups. Cancel after 6 months with 30-day notice.
Full playbook license + 4 quarterly Q&A rounds. Submit your team's questions in batch by email or Slack; receive consolidated written answers within 5 business days. No live calls, no scheduling. Replaces "office hours" with something you can actually read later.
We run our leg at maximum speed, hand you the baton, and let you run yours. No calendar Tetris. No forced context-switching.
Pick a fixed-price package. We send a secure onboarding form. You provide a scoped API token. No sales calls required.
You receive a detailed execution plan and milestone tracker via email. All communication is asynchronous, written, and on your schedule.
Our architects work in the background. Twice-weekly status memos keep you informed. Questions arrive via email. You respond when it fits.
Architecture diagrams, DQL scripts, configuration exports, and a written summary of every change made. Permanent documentation, not a meeting recording.
"Jon Ander is a true expert in modern Observability. Beyond his deep technical knowledge of Dynatrace (Grail) and OpenTelemetry, he knows how to turn data into real value. He helped us implement Monitoring as Code and drastically improved our incident response times. If you need a Senior SRE who can take your platform's reliability to the next level, I highly recommend him."
"Jon Ander was key in our digital acceleration project. He works perfectly in remote environments and his English communication is flawless. He delivered high-quality code ahead of deadlines, significantly improving our observability stack. Highly recommended for international squads."
A temporary API token with read-only scope. We never request admin access, and all tokens are revoked at engagement end. For implementations, we request scoped write access to specific configuration APIs only.
We operate under a Zero Trust model. All access is scoped, time-limited, and logged. We never extract raw telemetry data. Audit logs are immutable and shared with you at engagement end. We're happy to sign your NDA before onboarding.
You pay only the 50% deposit. We back the audit with a 3x savings guarantee because the 8 anti-patterns we hunt are present in nearly every enterprise tenant we have audited. If your environment is genuinely clean, you will know in week one, because we will tell you.
Our core packages are fixed-scope: no retainers, no recurring fees. If you need follow-up work after delivery, you purchase another package or opt into one of the ongoing async plans. Most clients start with the DPS Cost Optimization Audit and then select targeted packages based on its findings.
Meetings evaporate when they end. Our deliverables persist: documented configurations, DQL repositories, architecture diagrams, written analysis. Your team can revisit them months later. Plus, your engineers keep their flow state intact.
50% upfront, 50% on delivery acceptance. Or 100% upfront with a 5% discount. We invoice via Stripe. No surprises, no overages, no scope creep charges.
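Applied to the EUR 9,500 audit price, the two payment options are either two invoices of EUR 4,750 or a single EUR 9,025 payment. The EUR 4,750 deposit is also what you pay if the audit's 3x savings guarantee is not met. Below is a plain arithmetic sketch of that split. The function and constant names are illustrative and not part of any invoicing system. Decimal keeps the euro amounts exact.

```python
from decimal import Decimal

AUDIT_PRICE_EUR = Decimal("9500")  # fixed-scope audit price quoted on this page
UPFRONT_DISCOUNT = Decimal("0.05")


def payment_schedule(price: Decimal, pay_in_full: bool) -> list[tuple[str, Decimal]]:
    """Split a fixed package price into invoice milestones."""
    if pay_in_full:
        # 100% upfront, 5% off the list price.
        return [("upfront", price * (1 - UPFRONT_DISCOUNT))]
    deposit = price / 2
    # The remainder is computed by subtraction so the two invoices always sum to the price.
    return [("deposit", deposit), ("on delivery acceptance", price - deposit)]


if __name__ == "__main__":
    for full in (False, True):
        print(payment_schedule(AUDIT_PRICE_EUR, full))
```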
Submit your requirements and we'll initiate the async onboarding protocol within 24 hours.
Have a complex requirement outside our standard catalog? We accept custom async engagements for enterprise Dynatrace environments.
Send a detailed technical brief. We'll respond with a written analysis of our proposed approach and a fixed-price quote.
No "let's jump on a quick call" required.