
VCF Operations 9.0 — The Control Plane of VMware Cloud Foundation

10 Jun 2026 • VMware Cloud Foundation • 12 min read

If you have designed VMware estates before, you probably know vRealize / Aria Operations as the place you went to look at dashboards. In VMware Cloud Foundation 9.0 that tool has been re-architected into something far bigger: VCF Operations is now the control plane of the entire private cloud. It does not just watch the fleet — it builds, licenses, secures, and lifecycle-manages it.

This article is written for VCF architects: what VCF Operations actually is in 9.0, how it is built, the deployment models you must choose between, and the design decisions that matter.

Think of VCF Operations as the cockpit of an aircraft. Older tools were like a single altimeter on the dashboard — useful, but isolated. The 9.0 cockpit unifies every instrument (metrics, logs, flows, capacity, cost, security) and the controls (licensing, identity, certificates, upgrades) into one seat.


From Monitoring Tool to Converged Console

Broadcom describes VCF Operations (formerly VMware Aria Operations) as the platform that helps you "build, manage, operate and secure your private cloud infrastructure by deploying and maintaining its fleet-level components." The key shift for architects is convergence: metrics, logs, and network flows now live in a single console, removing the "swivel-chair" effect of jumping between a metrics tool and a separate logging tool.

In VCF 9.0 there is one VCF Operations instance per VCF Fleet. It governs every VCF Instance, workload domain, vCenter, and NSX Manager in that fleet — this is what makes "fleet-wide" operations possible.
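
A minimal sketch of that scope, assuming a simplified hierarchy: the Python below models a fleet as nested dataclasses and collects every vCenter and NSX Manager that the single VCF Operations instance would govern. All class, field and host names are hypothetical illustrations, not a VCF API.

```python
from dataclasses import dataclass, field


@dataclass
class WorkloadDomain:
    # Hypothetical model: each domain points at its vCenter and NSX Manager.
    name: str
    vcenter: str
    nsx_manager: str


@dataclass
class VcfInstance:
    name: str
    domains: list[WorkloadDomain] = field(default_factory=list)


@dataclass
class VcfFleet:
    # One VCF Operations instance governs everything in the fleet.
    operations_instance: str
    instances: list[VcfInstance] = field(default_factory=list)

    def managed_endpoints(self) -> list[str]:
        """Every vCenter and NSX Manager governed by the fleet's VCF Operations instance."""
        endpoints: list[str] = []
        for inst in self.instances:
            for dom in inst.domains:
                endpoints.extend([dom.vcenter, dom.nsx_manager])
        # Deduplicate while keeping order: several domains may share one NSX Manager.
        return list(dict.fromkeys(endpoints))


fleet = VcfFleet(
    operations_instance="ops.example.com",  # hypothetical FQDN
    instances=[
        VcfInstance("site-a", [
            WorkloadDomain("mgmt", "vc-mgmt-a", "nsx-mgmt-a"),
            WorkloadDomain("wld01", "vc-wld01-a", "nsx-wld-a"),
            WorkloadDomain("wld02", "vc-wld02-a", "nsx-wld-a"),  # shares an NSX Manager
        ]),
        VcfInstance("site-b", [WorkloadDomain("mgmt", "vc-mgmt-b", "nsx-mgmt-b")]),
    ],
)
print(len(fleet.managed_endpoints()))  # 7
```

The point of the model is the shape, not the classes: every component in every VCF Instance rolls up to one operations instance, which is why that instance sees the whole fleet.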


The Functional Areas of VCF Operations

The official overview groups VCF Operations into several functional areas. As an architect, this is the menu of capabilities you can design around:

Functional Area        | What it delivers
-----------------------|---------------------------------------------------------------------
Fleet Management       | Operational consistency and centralized management of VCF components at scale
Operations Management  | Performance, cost and capacity monitoring with faster troubleshooting
Workload Operations    | Keeps critical apps healthy; native integration with vSphere Supervisor
Performance Monitoring | AI-driven troubleshooting and remediation, app metrics from VMs
FinOps & Capacity      | Cost analysis, capacity forecasting and utilization optimization
Workload Mobility      | VCF Operations HCX: migrate and interconnect workloads across the private cloud
Security Management    | Security operations, event auditing and compliance checks

Architecture & Appliances

VCF Operations is not a single VM. It is a family of appliances deployed (and lifecycle-managed) as part of the fleet:

  • VCF Operations (analytics) node — the core engine that ingests metrics, runs analytics, dashboards and alerting. Can scale from a single node to a cluster.
  • VCF Operations fleet management appliance — the management plane for licensing, identity, certificates, passwords and lifecycle.
  • VCF Operations Collector — collects data from targets and forwards it to the analytics cluster; deployed in groups for scale and for remote sites.
  • VCF Operations for Logs — integrated log management (Day-2 deployment) with content packs, log-based alerts and dashboards.
  • VCF Operations for Networks — network and flow visibility (Day-2 deployment).

+----------------------------------------------------------+
|                      VCF Operations                      |
|  +----------------+  +----------------+  +------------+  |
|  | Analytics node |  | Fleet mgmt     |  | Collector  |  |
|  | (metrics/UI)   |  | (license/cert/ |  | group      |  |
|  |                |  |  identity/LCM) |  |            |  |
|  +----------------+  +----------------+  +------------+  |
|     + (Day-2) VCF Operations for Logs / for Networks     |
+----------------------------------------------------------+
            |  collects from / acts on
            v
   vCenter • ESX • vSAN • NSX • VCF Automation (the Fleet)

Deployment Models: Simple vs HA vs Continuous Availability

This is the most important design decision for an architect. VCF Operations supports three models, trading footprint against resilience.

1. Simple (single node)

  • Smallest footprint; can scale up and scale out later
  • One analytics node, plus the fleet management appliance and a collector
  • Relies on vSphere HA to restart the node after a host failure
  • Trade-off: slower recovery and a possible interruption of monitoring, alerting and fleet management during a failure

2. High Availability (3-node cluster)

  • A three-node analytics cluster: primary, replica, and data node
  • Data is duplicated across the cluster (each piece of data has a copy on a second node), so the loss of a single node does not lose data
  • Rapid recovery from a single-node failure; an optional external load balancer requires an extra IP address and FQDN, plus a certificate update so the Subject Alternative Names (SAN) include that FQDN
  • Recommended for most production fleets
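
The single-node tolerance of the HA model comes from keeping two copies of each piece of data on different nodes. The Python sketch below illustrates that idea with a simplified round-robin placement and then checks which failures every shard survives. The shard model and function names are hypothetical; this is not how VCF Operations actually distributes its data internally.

```python
NODES = ["primary", "replica", "data"]  # the three HA analytics nodes


def place_shards(num_shards: int, nodes: list[str]) -> dict[int, tuple[str, str]]:
    """Give each shard two copies on two different nodes (simplified round-robin)."""
    placement = {}
    for shard in range(num_shards):
        first = nodes[shard % len(nodes)]
        second = nodes[(shard + 1) % len(nodes)]  # next node, so never the same one
        placement[shard] = (first, second)
    return placement


def survives(failed: set[str], placement: dict[int, tuple[str, str]]) -> bool:
    """True if every shard still has at least one copy on a node that is not in `failed`."""
    return all(any(node not in failed for node in copies) for copies in placement.values())


placement = place_shards(6, NODES)
for node in NODES:
    print(node, survives({node}, placement))  # every single-node failure is survivable
print(survives({"primary", "replica"}, placement))  # a double failure loses data
```

The double-failure case is the useful takeaway: two copies buy tolerance for exactly one node loss, which is why the model is described as protecting against a single-node failure.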

3. Continuous Availability (across two zones)

  • Nodes are paired across two fault domains / availability zones, with a witness node in a third
  • Survives the loss of an entire availability zone with no service interruption
  • Requires an equal node count in both fault domains and a network latency of ≤ 10 ms between zones
  • Involves manual operations after the initial install — plan for it

Continuous Availability is not "HA with extra nodes." It assumes a stretched, two-zone topology with a witness and a strict latency budget. Validate the 10 ms inter-zone latency before committing to this model in a design.
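
The prerequisites above lend themselves to a simple design-review check. The sketch below is a hypothetical Python precheck (the `CaDesign` fields and `ca_precheck` name are illustrative, not a VCF tool) that tests equal node counts, the presence of a witness, and the 10 ms inter-zone budget. It compares the worst measured sample rather than the average, on the assumption that a budget should hold at peak.

```python
from dataclasses import dataclass

MAX_INTER_ZONE_MS = 10.0  # inter-zone latency budget for Continuous Availability


@dataclass
class CaDesign:
    # Hypothetical design inputs for a Continuous Availability proposal.
    nodes_zone_a: int
    nodes_zone_b: int
    has_witness: bool
    inter_zone_latency_ms: list[float]  # measured latency samples between the two zones


def ca_precheck(design: CaDesign) -> list[str]:
    """Return the problems found; an empty list means these checks pass."""
    problems = []
    if not design.has_witness:
        problems.append("no witness node in a third location")
    if design.nodes_zone_a != design.nodes_zone_b:
        problems.append("node count differs between the two fault domains")
    if not design.inter_zone_latency_ms:
        problems.append("no inter-zone latency measurements collected")
    elif max(design.inter_zone_latency_ms) > MAX_INTER_ZONE_MS:
        worst = max(design.inter_zone_latency_ms)
        problems.append(f"worst inter-zone latency {worst:.1f} ms exceeds {MAX_INTER_ZONE_MS} ms")
    return problems


print(ca_precheck(CaDesign(2, 2, True, [3.1, 4.8, 6.2])))  # []
print(ca_precheck(CaDesign(2, 1, True, [12.5])))           # two problems
```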

Choosing a model

Model                   | Nodes                           | Best for
------------------------|---------------------------------|-------------------------------------------------------------
Simple                  | 1                               | Labs, small estates, where short monitoring gaps are acceptable
High Availability       | 3 (primary + replica + data)    | Most production fleets in a single site
Continuous Availability | Paired across 2 zones + witness | Mission-critical, multi-AZ designs needing zero-interruption operations
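
The table can be read as a small decision rule. The Python function below is one hypothetical encoding of it (the `Model` enum and `choose_model` name are illustrative); real designs will weigh more inputs, such as budget and operational maturity.

```python
from enum import Enum


class Model(Enum):
    SIMPLE = "Simple (1 node)"
    HA = "High Availability (3 nodes: primary + replica + data)"
    CA = "Continuous Availability (nodes paired across 2 zones + witness)"


def choose_model(production: bool, multi_az: bool, needs_zero_interruption: bool) -> Model:
    """Map the requirements in the table to a deployment model."""
    if multi_az and needs_zero_interruption:
        return Model.CA  # mission-critical, multi-AZ, no tolerance for interruption
    if production:
        return Model.HA  # default for production fleets in a single site
    return Model.SIMPLE  # labs and small estates


print(choose_model(production=True, multi_az=False, needs_zero_interruption=False).value)
```

Note that multi-AZ alone does not select Continuous Availability here: a fleet spread across zones that can tolerate a brief interruption still lands on HA, which matches the advice to reserve Continuous Availability for genuine zero-interruption requirements.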

Scaling: Collectors and Remote Sites

VCF Operations scales in two dimensions — the analytics platform and the collectors. Collector groups offload data ingestion and let you reach remote sites without stretching the analytics cluster. For the network side, VCF Operations for Networks must move from a single node to a clustered deployment once you exceed roughly 10,000 VMs or 4 million active flows, and clustering requires large-size appliances.
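
The VCF Operations for Networks threshold can be captured as a quick first-pass check. The Python below uses the approximate figures quoted above (roughly 10,000 VMs or 4 million active flows); the function name is hypothetical, and borderline estates should still be checked against the official sizing guidance.

```python
VM_THRESHOLD = 10_000        # approximate VM count before clustering is needed
FLOW_THRESHOLD = 4_000_000   # approximate active-flow count before clustering is needed


def networks_deployment(vm_count: int, active_flows: int) -> str:
    """Either limit on its own is enough to push the design to a cluster."""
    if vm_count > VM_THRESHOLD or active_flows > FLOW_THRESHOLD:
        return "cluster (large-size appliances)"
    return "single node"


print(networks_deployment(8_000, 1_000_000))   # single node
print(networks_deployment(5_000, 5_000_000))   # flows alone force a cluster
```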

Size the analytics cluster for the total steady-state metric volume, but place and size collectors according to where the data is produced. Remote or edge locations are a collector-placement problem, not a reason to stretch the cluster.
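
To illustrate the "collectors follow the data" rule, the sketch below groups monitored targets by site and sizes one collector group per site. The `OBJECTS_PER_COLLECTOR` capacity, target names and function name are all hypothetical placeholders; real per-collector limits come from the official sizing guidance.

```python
import math
from collections import defaultdict

# Hypothetical per-collector capacity, for illustration only.
OBJECTS_PER_COLLECTOR = 10_000


def plan_collector_groups(targets: list[tuple[str, str, int]],
                          objects_per_collector: int = OBJECTS_PER_COLLECTOR) -> dict[str, int]:
    """Return {site: collectors needed}, one collector group per site.

    `targets` holds (target_name, site, object_count) tuples.
    """
    objects_by_site: dict[str, int] = defaultdict(int)
    for _name, site, objects in targets:
        objects_by_site[site] += objects
    # Every site gets at least one local collector, however small it is.
    return {site: max(1, math.ceil(total / objects_per_collector))
            for site, total in objects_by_site.items()}


targets = [
    ("vc-mgmt-a", "dc-main", 6_000),
    ("vc-wld01-a", "dc-main", 9_000),
    ("vc-edge-01", "edge-branch", 400),
]
print(plan_collector_groups(targets))  # {'dc-main': 2, 'edge-branch': 1}
```

The small edge site still gets its own collector group: the analytics cluster stays in the main data center and only the collection point moves.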


Fleet Management: The Day-2 Superpower

Fleet management is what turns VCF Operations from an observability tool into a control plane. It consolidates the administrative tasks that used to be scattered across many products:

Capability           | What it means for the architect
---------------------|---------------------------------------------------------------------
Licensing            | VCF Operations is the License Manager: a single license file per instance, tracking cores, vSAN TiB and advanced services (e.g. Private AI Foundation with NVIDIA)
Lifecycle            | Download bundles, run prechecks, and upgrade management and workload components (vCenter, ESX, VCF Automation) from one place
Identity & SSO       | Single Sign-On across the fleet via the VCF Identity Broker, with federation (AD FS, Entra ID, Okta, Ping, OAuth 2.0)
Certificates         | Unified, non-disruptive TLS management with auto-renewal across multiple Certificate Authorities and external import
Passwords            | Alerting on expiring admin/root credentials and centralized rotation ("break-glass" passwords)
Configuration & Tags | Scheduled drift detection (with Git integration) and unified tag management across components
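
The certificate and password rows both boil down to tracking expiry dates and warning early. The Python below is a generic sketch of that logic; the item names and the 30-day warning window are hypothetical choices, and it does not call any VCF Operations API.

```python
from datetime import date


def expiring_soon(items: dict[str, date], today: date,
                  warn_days: int = 30) -> list[tuple[str, int]]:
    """Return (name, days_left) for items expiring within `warn_days`, soonest first.

    A negative days_left means the item has already expired.
    """
    result = []
    for name, expires in items.items():
        days_left = (expires - today).days
        if days_left <= warn_days:
            result.append((name, days_left))
    return sorted(result, key=lambda pair: pair[1])


today = date(2026, 6, 10)
items = {
    "vcenter-tls-cert": date(2026, 6, 25),
    "nsx-root-password": date(2026, 6, 5),
    "esx-tls-cert": date(2027, 1, 1),
}
print(expiring_soon(items, today))  # [('nsx-root-password', -5), ('vcenter-tls-cert', 15)]
```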

Observability & Extensibility

Beyond the built-in metrics, VCF Operations gives architects several levers:

  • Converged data — metrics, logs and flows in one console; vSAN health/IOPS/latency, NSX network state, and audit events all surface in the same UI.
  • Diagnostic Findings — real-time visibility into active findings such as VMSA/CVE exposure and security vulnerabilities across the fleet.
  • Management Packs & Marketplace — find, download and install integrations (the Solutions Catalog) to extend monitoring to third-party systems; a Management Pack Builder exists for custom packs.
  • VCF Operations for Logs — content packs bundle pre-built dashboards, alerts and saved queries; you can create log-based alerts and explore logs directly inside VCF Operations.
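
A log-based alert is, at its core, a pattern match plus a count over a time window. The Python below sketches that generic technique with a sliding window; it is not the VCF Operations for Logs query language or alert engine, and the function name, pattern and thresholds are hypothetical.

```python
import re
from collections import deque
from datetime import datetime, timedelta


def should_alert(events: list[tuple[datetime, str]], pattern: str,
                 threshold: int, window: timedelta) -> bool:
    """True if `pattern` matches at least `threshold` events within any sliding `window`.

    `events` must be sorted by timestamp.
    """
    regex = re.compile(pattern)
    recent = deque()  # timestamps of matches inside the current window
    for ts, message in events:
        if not regex.search(message):
            continue
        recent.append(ts)
        # Drop matches that fall outside the window ending at `ts`.
        while ts - recent[0] > window:
            recent.popleft()
        if len(recent) >= threshold:
            return True
    return False


base = datetime(2026, 6, 10, 12, 0)
events = [(base + timedelta(seconds=20 * i), "ssh login failed for root") for i in range(5)]
print(should_alert(events, r"login failed", threshold=5, window=timedelta(minutes=2)))  # True
print(should_alert(events, r"login failed", threshold=5, window=timedelta(minutes=1)))  # False
```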

FinOps, Capacity & Compliance

Two areas matter especially for design conversations with the business:

  • FinOps & Capacity — cost analysis and predictive capacity forecasting let you right-size before you run out of headroom, and justify (or defer) hardware spend with data.
  • Security & Compliance — built-in compliance checks and host hardening rules, with the option of automated compliance remediation through orchestrator workflows.
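
To show what "run out of headroom" means numerically, the sketch below fits a least-squares line to daily usage and extrapolates to capacity. This simple linear trend is a swapped-in illustration, not the forecasting method VCF Operations uses; the function name and sample figures are hypothetical.

```python
from typing import Optional


def days_until_full(daily_used: list[float], capacity: float) -> Optional[float]:
    """Days after the last sample until a linear trend reaches `capacity`.

    Returns None when usage is flat or shrinking (no exhaustion date).
    """
    n = len(daily_used)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(daily_used) / n
    # Ordinary least-squares slope and intercept.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_used))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    day_full = (capacity - intercept) / slope
    return max(0.0, day_full - (n - 1))


# Used vSAN capacity in TiB, growing 2 TiB per day, against 100 TiB of capacity.
print(days_until_full([50, 52, 54, 56, 58], 100))  # 21.0
```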

Design Best Practices for VCF Architects

  • Default to the HA model for production; keep Simple for labs and reserve Continuous Availability for genuine multi-AZ requirements.
  • Treat VCF Operations as a Tier-0 service — if it is down you lose fleet licensing, identity and lifecycle, not just dashboards. Protect it accordingly.
  • Plan collector placement early, especially for remote/edge sites and large flow volumes.
  • Design identity first — the Identity Broker and SSO underpin access to the whole fleet; integrate your IdP up front.
  • Use the single license file model to your advantage for consumption tracking and reporting.
  • Validate inter-zone latency (≤ 10 ms) before proposing Continuous Availability.

What's New in 9.1 (Worth Knowing)

If you are designing for the future, VCF 9.1 pushes VCF Operations further:

  • Automated licensing — in connected mode, license files download automatically every 24 hours, removing the manual re-acknowledgement that 9.0 required (every 180 days or less).
  • Bulk certificate operations — execute imports and renewals across all components at once.
  • Greater scale — a single instance can manage up to 5,000 ESX hosts (2×), with up to 256 clusters upgraded in parallel (4×).

Summary

  • VCF Operations is the control plane of VCF 9.0 — one instance per fleet.
  • It converges metrics, logs and flows and adds fleet management (licensing, identity, certificates, passwords, lifecycle).
  • Architects choose between Simple, High Availability (3 nodes), and Continuous Availability (two zones + witness).
  • Scale the analytics cluster and the collectors independently.
  • Because it owns licensing, identity and lifecycle, treat it as a Tier-0 service in every design.

Sources: VMware Cloud Foundation 9.0 / 9.1 official documentation (techdocs.broadcom.com) and the VMware Cloud Foundation blog (blogs.vmware.com/cloud-foundation).