Yazamco Private Cloud Design

Yazamco Private Cloud — Network & Storage Design (N+1 with 4 HV / 3 Storage)

The platform is Proxmox VE with per-tenant VLANs grouped into VRF tiers, ZFS-based NVMe storage with a DRBD-replicated hot tier (RF=2), optional immutable backups, and a dual-ISP edge behind HA firewalls. It runs four hypervisors (3 active + 1 HA spare) and three storage nodes (2 active + 1 HA spare), giving N+1 in each failure domain.

Network Diagram (High-Level)

%%{init: {'theme': 'base', 'themeVariables': {'fontFamily': 'Inter, sans-serif', 'fontSize': '14px'}}}%%
flowchart TB
  %% === STYLES ===
  classDef internet fill:#166534,stroke:#22c55e,stroke-width:2px,color:#fff,border-radius:12px
  classDef edge fill:#991b1b,stroke:#f87171,stroke-width:3px,color:#fff,border-radius:16px,font-weight:bold
  classDef core fill:#5b21b6,stroke:#c4b5fd,stroke-width:2px,color:#fff,border-radius:12px
  classDef oob fill:#1f2937,stroke:#6b7280,stroke-width:1.5px,color:#e5e7eb,border-radius:10px
  classDef network fill:#1e3a8a,stroke:#93c5fd,stroke-width:2px,color:#fff,border-radius:10px
  classDef compute fill:#1e40af,stroke:#60a5fa,stroke-width:2px,color:#fff,border-radius:12px
  classDef storage fill:#1e293b,stroke:#475569,stroke-width:2px,color:#e2e8f0,border-radius:12px
  %% === INTERNET ===
  subgraph Internet["Dual ISPs (2×1 Gbps)"]
    ISP1[(ISP #1\neBGP)]
    ISP2[(ISP #2\neBGP)]
  end
  class ISP1,ISP2 internet
  %% === EDGE SECURITY ===
  subgraph Edge["NGFW HA + WAF/DDoS"]
    direction LR
    FW_A["Firewall A\n**Active**"]
    FW_B["Firewall B\n**Standby**"]
    WAF["Fortinet WAF\nPer-Tenant"]
    DDoS["ISP DDoS Scrub"]
  end
  class FW_A,FW_B,WAF,DDoS edge
  ISP1 -->|"eBGP / Static"| FW_A
  ISP2 -->|"eBGP / Static"| FW_B
  FW_A <-->|"HA Sync\n< 3s Failover"| FW_B
  FW_A --- WAF
  ISP1 --- DDoS
  ISP2 --- DDoS
  %% === CORE / ToR ===
  subgraph Core["Collapsed Core / ToR\nMLAG + LACP"]
    direction LR
    ToR_A["ToR/Core A\n25G Ports"]
    ToR_B["ToR/Core B\n25G Ports"]
    ToR_A <== "2×100G ISL\nMLAG" ==> ToR_B
  end
  class ToR_A,ToR_B core
  FW_A -->|"VLAN Trunks"| ToR_A
  FW_B -->|"VLAN Trunks"| ToR_B
  %% === OOB ===
  subgraph OOB["Out-of-Band Management"]
    OOB_SW["1G OOB Switch"]
    LTE["LTE Failover"]
    BASTION["Bastion Host\nMFA + JIT"]
  end
  class OOB_SW,LTE,BASTION oob
  OOB_SW --- LTE
  OOB_SW --- BASTION
  %% === VRFs / VLANs ===
  subgraph Networks["VRFs & VLANs"]
    VRF_MGMT[["MGMT\n10.10.0.0/16\nVLAN 10"]]
    VRF_STORE[["STORAGE\n10.20.0.0/16\nVLAN 30\nMTU 9000"]]
    VRF_CLUSTER[["CLUSTER\n10.30.0.0/16\nVLAN 40"]]
    VRF_SERV[["SERVICES\n10.40.0.0/16\nVLAN 50"]]
    VRF_SILVER[["TENANT-SILVER\nVLAN 1000–1199"]]
    VRF_GOLD[["TENANT-GOLD\nVLAN 2000–2199"]]
  end
  class VRF_MGMT,VRF_STORE,VRF_CLUSTER,VRF_SERV,VRF_SILVER,VRF_GOLD network
  ToR_A -.-> VRF_MGMT & VRF_STORE & VRF_CLUSTER & VRF_SERV & VRF_SILVER & VRF_GOLD
  ToR_B -.-> VRF_MGMT & VRF_STORE & VRF_CLUSTER & VRF_SERV & VRF_SILVER & VRF_GOLD
  %% === PROXMOX HVs ===
  subgraph Compute["Proxmox VE\n4 HVs: 3+1 HA"]
    direction TB
    HV1["HV-1\n4×10G (2/ToR)\n+1G OOB"]
    HV2["HV-2\n4×10G (2/ToR)\n+1G OOB"]
    HV3["HV-3\n4×10G (2/ToR)\n+1G OOB"]
    HV4["HV-4\n4×10G (2/ToR)\n+1G OOB\n**Spare**"]
  end
  class HV1,HV2,HV3,HV4 compute
  HV1 -->|"bond0: Data\nbond1: Storage"| ToR_A
  HV1 --> ToR_B
  HV2 --> ToR_A & ToR_B
  HV3 --> ToR_A & ToR_B
  HV4 --> ToR_A & ToR_B
  %% === ZFS STORAGE ===
  subgraph Storage["ZFS Storage\n3 Nodes: 2+1 HA"]
    ST1["STOR-1\n24× NVMe Gen4"]
    ST2["STOR-2\n24× NVMe Gen4"]
    ST3["STOR-3\n24× NVMe Gen4\n**Spare**"]
  end
  class ST1,ST2,ST3 storage
  ST1 -->|"25G MLAG"| ToR_A
  ST1 --> ToR_B
  ST2 --> ToR_A & ToR_B
  ST3 --> ToR_A & ToR_B
  BASTION -->|"Secure Access"| VRF_MGMT
  %% === ACTIVE & WORKING LINKS ===
  click FW_B "https://docs.fortinet.com/document/fortigate/7.4.1/administration-guide/573688/ha-active-passive-cluster" "FortiGate HA Failover"
  click ToR_A "https://www.arista.com/en/support/advisory/mlag" "Arista MLAG Guide"
  click HV4 "https://pve.proxmox.com/wiki/High_Availability" "Proxmox HA Auto-Migration"
  click ST3 "https://openzfs.github.io/openzfs-docs/man/8/zpool-replace.8.html" "ZFS Auto-Rebuild from Spare"

Storage Fabric (ZFS)

%%{init: {'theme': 'base', 'themeVariables': {'fontFamily': 'Inter, sans-serif', 'fontSize': '14px'}}}%%
flowchart TB
  %% === STYLES ===
  classDef storage fill:#1e293b,stroke:#475569,stroke-width:2px,color:#e2e8f0,border-radius:12px
  classDef hot fill:#c53030,stroke:#e53e3e,stroke-width:3px,color:#fff,border-radius:16px,font-weight:bold
  classDef warm fill:#dd6b20,stroke:#f6ad55,stroke-width:2px,color:#fff,border-radius:12px
  classDef compute fill:#1e40af,stroke:#60a5fa,stroke-width:2px,color:#fff,border-radius:12px
  classDef backup fill:#0d9488,stroke:#5eead4,stroke-width:2px,color:#fff,border-radius:12px
  classDef note fill:#f8fafc,stroke:#94a3b8,stroke-dasharray:8 4,stroke-width:1.5px,color:#475569,border-radius:8px,font-style:italic
  %% === ZFS STORAGE CLUSTER ===
  subgraph ZFS_Cluster["ZFS Storage Cluster\n3 Nodes: 2 Active + 1 HA"]
    direction LR
    ST1["STOR-1\n24× NVMe Gen4"]
    ST2["STOR-2\n24× NVMe Gen4"]
    ST3["STOR-3\n24× NVMe Gen4\n**Spare Node**"]
  end
  class ST1,ST2,ST3 storage
  %% === HOT TIER ===
  subgraph HOT["HOT Tier: DRBD\n**RF=2 Replication**"]
    LVs["Replicated VM Disks\nUltra-Low Latency\niSCSI / NVMe-oF"]
  end
  class LVs hot
  %% === WARM TIER ===
  subgraph WARM["WARM/COLD Tier: ZFS RAIDZ2"]
    RAIDZ2["Per-Node RAIDZ2\n4×6 NVMe vdevs\nHigh Capacity"]
  end
  class RAIDZ2 warm
  %% === CONNECTIONS ===
  ST1 -->|ZFS Pool| LVs
  ST2 -->|ZFS Pool| LVs
  ST3 -->|ZFS Pool| LVs
  ST1 -->|Local vdevs| RAIDZ2
  ST2 -->|Local vdevs| RAIDZ2
  ST3 -->|Local vdevs| RAIDZ2
  %% === PROXMOX COMPUTE ===
  subgraph PVE["Proxmox VE Cluster\n4 HVs: 3+1 HA"]
    direction TB
    VMs["**VM Disks** → HOT Tier\nTemplates → SERVICES VRF"]
  end
  class VMs compute
  PVE <== "25G Storage Network\nLACP + MTU 9000" ==> LVs
  %% === BACKUP & DR ===
  subgraph BACKUP["Backup & Disaster Recovery"]
    direction TB
    PBS["Proxmox Backup Server\nEncrypted Landing Zone"]
    S3["Immutable S3 (Optional)\n14–30d WORM Lock"]
    DR["Warm DR Site (Optional)\nGold VMs Only"]
  end
  class PBS,S3,DR backup
  PVE -->|"VSS-Aware Backups"| PBS
  PBS -->|"GFS: 30d/12m/7y"| S3
  PVE -.->|"Async Replication"| DR
  %% === RESILIENCY NOTE ===
  note["**Resiliency**\n1 Node Down → HOT Tier **UP**\nRebuild to ST3 → Full HA"]
  note --> LVs
  note --> RAIDZ2
  class note note
  %% === ACTIVE & WORKING LINKS ===
  click ST3 "https://linbit.com/drbd/" "Spare node auto-joins on failure"
  click LVs "https://linbit.com/linstor/" "DRBD ensures zero RPO"
  click DR "https://pve.proxmox.com/wiki/High_Availability" "Proxmox HA & Replication Docs"

IP Plan (RFC1918)

Core Infrastructure IP Subnets
| Scope | Subnet | Gateway (ToR VRRP) | Notes |
|---|---|---|---|
| OOB Mgmt | 10.10.0.0/24 | 10.10.0.1 | iDRAC/iLO, OOB switch, bastion |
| Proxmox Mgmt | 10.10.10.0/24 | 10.10.10.1 | PVE APIs, SSH (MFA via VPN) |
| Storage (ZFS) | 10.20.20.0/24 | 10.20.20.1 | MTU 9000, isolated VRF |
| Cluster/Live Migration/Sync | 10.30.10.0/24 | 10.30.10.1 | Corosync, live migration |
| Services | 10.40.10.0/24 | 10.40.10.1 | AD/DNS/DHCP, KMS, PBS |
| Tenants (Silver) | 10.100.0.0/16 | per-VLAN SVI | VLANs 1000–1199 (/24 each) |
| Tenants (Gold) | 10.101.0.0/16 | per-VLAN SVI | VLANs 2000–2199 (/24 each) |
| S2S VPN Hubs | 10.40.20.0/24 | 10.40.20.1 | IPsec hubs/loopbacks |
| Mgmt loopbacks | 10.10.250.0/24 | n/a | Device /32s for routing |
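
The plan above can be sanity-checked mechanically. The following is a minimal sketch using Python's standard `ipaddress` module: it confirms that the listed subnets do not overlap and derives a /24 for a given tenant VLAN. The VLAN-to-subnet rule (the n-th VLAN of a tier takes the n-th /24 of that tier's /16) is an assumption for illustration, since the table only states that each VLAN gets a /24. The function and variable names are hypothetical.

```python
import ipaddress

# Subnets copied from the IP plan table.
INFRA_SUBNETS = {
    "OOB Mgmt": "10.10.0.0/24",
    "Proxmox Mgmt": "10.10.10.0/24",
    "Storage (ZFS)": "10.20.20.0/24",
    "Cluster/Sync": "10.30.10.0/24",
    "Services": "10.40.10.0/24",
    "S2S VPN Hubs": "10.40.20.0/24",
    "Mgmt loopbacks": "10.10.250.0/24",
}

# Tenant supernet plus first and last VLAN ID per tier, from the same table.
TENANT_TIERS = {
    "silver": ("10.100.0.0/16", 1000, 1199),
    "gold": ("10.101.0.0/16", 2000, 2199),
}


def find_overlaps(named_cidrs):
    """Return every pair of names whose subnets overlap."""
    nets = [(name, ipaddress.ip_network(cidr)) for name, cidr in named_cidrs.items()]
    overlaps = []
    for i, (name_a, net_a) in enumerate(nets):
        for name_b, net_b in nets[i + 1:]:
            if net_a.overlaps(net_b):
                overlaps.append((name_a, name_b))
    return overlaps


def tenant_subnet(tier, vlan_id):
    """Map a tenant VLAN to a /24 inside its tier supernet."""
    cidr, first_vlan, last_vlan = TENANT_TIERS[tier]
    if not first_vlan <= vlan_id <= last_vlan:
        raise ValueError(f"VLAN {vlan_id} is outside the {tier} range")
    # ASSUMPTION: the VLAN's offset within the tier selects the /24 index.
    slices = list(ipaddress.ip_network(cidr).subnets(new_prefix=24))
    return slices[vlan_id - first_vlan]


# Check infrastructure subnets and tenant supernets together.
all_subnets = dict(INFRA_SUBNETS)
all_subnets.update({f"tenant-{tier}": cidr for tier, (cidr, _, _) in TENANT_TIERS.items()})
print("overlaps:", find_overlaps(all_subnets))
print("gold VLAN 2005:", tenant_subnet("gold", 2005))
```

Each tier uses 200 of the 256 available /24 slices, so the same rule leaves 56 slices per tier unallocated.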

VLAN / VRF Matrix

VLAN Segmentation and VRF-Lite
| VLAN ID | Name | VRF | Purpose | ACL Notes |
|---|---|---|---|---|
| 10 | MGMT | MGMT | Proxmox mgmt, infra mgmt | Restricted to bastion/VPN; no tenant reachability |
| 20 | OOB | MGMT | OOB iDRAC/iLO | OOB only; no east–west |
| 30 | STORAGE | STORAGE | ZFS traffic | Hosts↔Storage only; MTU 9000 |
| 40 | CLUSTER | CLUSTER | Corosync, live migration | Hosts↔Hosts; deny to tenants |
| 50 | SERVICES | SERVICES | AD/DNS/DHCP, KMS, PBS | Egress to Internet via FW; no inbound from tenants |
| 1000–1199 | TENANT-SILVER-* | TENANT-SILVER | Per-tenant L2/L3 | Deny inter-tenant; allow to WAF/required services |
| 2000–2199 | TENANT-GOLD-* | TENANT-GOLD | Per-tenant L2/L3 (high sensitivity) | Separate VRF; stricter egress; optional encryption |
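
As an illustration of how the matrix could be consumed by tooling, the sketch below encodes the VLAN-to-VRF rows and answers two questions: which VRF a VLAN belongs to, and whether traffic between two VLANs would have to cross a VRF boundary. The data is taken from the matrix; the function names are hypothetical and the ACL column is deliberately not modelled.

```python
# (first VLAN, last VLAN, VRF) rows copied from the VLAN / VRF matrix.
VLAN_VRF_ROWS = [
    (10, 10, "MGMT"),
    (20, 20, "MGMT"),  # the OOB VLAN sits in the MGMT VRF
    (30, 30, "STORAGE"),
    (40, 40, "CLUSTER"),
    (50, 50, "SERVICES"),
    (1000, 1199, "TENANT-SILVER"),
    (2000, 2199, "TENANT-GOLD"),
]


def vrf_for_vlan(vlan_id):
    """Return the VRF name for a VLAN ID, or None if the VLAN is not in the plan."""
    for first, last, vrf in VLAN_VRF_ROWS:
        if first <= vlan_id <= last:
            return vrf
    return None


def crosses_vrf(src_vlan, dst_vlan):
    """True when the two VLANs live in different VRFs."""
    src, dst = vrf_for_vlan(src_vlan), vrf_for_vlan(dst_vlan)
    if src is None or dst is None:
        raise ValueError("VLAN not present in the matrix")
    return src != dst


print(vrf_for_vlan(1042))       # a Silver tenant VLAN
print(crosses_vrf(1042, 2042))  # Silver to Gold
print(crosses_vrf(1042, 1043))  # two Silver tenants
```

Sharing a VRF does not imply reachability: two Silver tenant VLANs are in the same VRF, yet the matrix still denies inter-tenant traffic by ACL.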

Switch Port Map (per ToR)

Example Port Allocation per ToR Switch
| Consumer | Ports per ToR | Speed | LAG/MLAG | Count |
|---|---|---|---|---|
| Inter-Switch Link (PeerLink) | 2 | 100G | MLAG peerlink | 1 bundle |
| Hypervisors (4×, each 2×10G per ToR) | 8 | 10G SFP+ | LACP bond0/bond1 | 4 bundles |
| Storage Nodes (3×, each 1×25G per ToR) | 3 | 25G SFP28 | LACP per node | 3 |
| Firewalls (A/B) uplinks | 2 | 10G | LACP per FW | 2 |
| OOB uplink (mgmt VLAN access) | 1 | 1G | Access | 1 |
| Spare / growth | ≥12 | 10/25G | | future |
| Minimum ToR spec | 48×SFP28 + 8×QSFP28 | 10G compatible | | |
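
The port budget can be checked against the minimum ToR spec with a few lines of arithmetic. This is a sketch only: the port counts come from the table, while the mapping of each consumer to a cage type (100G on QSFP28, everything else on SFP28) is an assumption, as are the names used.

```python
from collections import Counter

# (consumer, ports per ToR, cage type). Counts are from the port map;
# the cage type per consumer is an ASSUMPTION for illustration.
ALLOCATION = [
    ("Inter-switch link", 2, "QSFP28"),
    ("Hypervisors", 8, "SFP28"),
    ("Storage nodes", 3, "SFP28"),
    ("Firewall uplinks", 2, "SFP28"),
    ("OOB uplink", 1, "SFP28"),
]

# Minimum ToR spec from the table: 48×SFP28 + 8×QSFP28.
CAGES = {"SFP28": 48, "QSFP28": 8}

# The table asks for at least 12 spare 10/25G ports.
SPARE_TARGET = 12


def free_ports(allocation, cages):
    """Return the number of unallocated ports per cage type."""
    used = Counter()
    for _consumer, ports, cage in allocation:
        used[cage] += ports
    return {cage: total - used[cage] for cage, total in cages.items()}


free = free_ports(ALLOCATION, CAGES)
meets_spare_target = free["SFP28"] >= SPARE_TARGET
print(free, meets_spare_target)
```

Under these assumptions the day-1 allocation uses 14 SFP28 and 2 QSFP28 ports per switch, which leaves room well beyond the spare target.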

ZFS Layout (Primary & Warm/Cold)

ZFS Storage Configuration
| Item | Value | Notes |
|---|---|---|
| Storage nodes | 3 | N+1 at node level via RF=2 (DRBD) |
| NVMe per node | 24 × 3.84 TB | PCIe Gen4, enterprise NVMe |
| HOT tier (VM disks) | RF=2 over striped ZFS | Replica count 2 across nodes (service survives single storage-node loss) |
| HOT usable (est.) | ~124 TB | 3 nodes × 24 × 3.84 TB raw × 0.5 RF × 0.9 reserve |
| IOPS/latency (cluster) | p95 < 2 ms; ~600–800k 4k read / 150–250k 4k write | Conservative, depends on NVMe model & CPU |
| WARM/COLD tier | ZFS RAIDZ2 vdevs (per node) | 4 vdev groups of 6 drives each |
| WARM usable (est.) | ~165 TB (non-shared) | For backups, archives, templates; not active/active shared |
| Network | 2×25G per storage node (MLAG) | Storage VLAN, MTU 9000 |
| Filesystems | Block for VM disks; NFS for ISO/templates | Exported to Proxmox as needed |
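
The two usable-capacity estimates can be reproduced as follows. The HOT formula is the one given in the table. The WARM formula is an assumption that happens to land on the quoted figure: four RAIDZ2 vdevs of six drives per node (two parity drives each) with the same 10% reserve. Both calculations ignore ZFS metadata and padding overhead, and both count all 24 drives per node, because the table does not say how drives are split between the tiers.

```python
NODES = 3
DRIVES_PER_NODE = 24
DRIVE_TB = 3.84
RESERVE_FACTOR = 0.9  # keep ~10% free, as in the table


def hot_usable_tb(replicas=2):
    """HOT tier: raw capacity divided by the replica count, minus the reserve."""
    raw = NODES * DRIVES_PER_NODE * DRIVE_TB
    return raw / replicas * RESERVE_FACTOR


def warm_usable_tb(vdevs_per_node=4, drives_per_vdev=6, parity_drives=2):
    """WARM tier: RAIDZ2 data drives only. ASSUMPTION: same 10% reserve as HOT."""
    data_drives = vdevs_per_node * (drives_per_vdev - parity_drives)
    return NODES * data_drives * DRIVE_TB * RESERVE_FACTOR


print(f"HOT usable:  {hot_usable_tb():.1f} TB")
print(f"WARM usable: {warm_usable_tb():.1f} TB")
```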

Capacity & HA Summary (4 HV / 3 Storage)

N+1 Failure Scenario (1 HV or 1 Storage Node loss)
| Resource | Current Need | With 20% Burst | Proposed Day-1 | After 1 Node Failure | Meets SLO? |
|---|---|---|---|---|---|
| vCPU | 876 | 1,051 | 4 HV nodes, 2×48-core CPU each ⇒ 384 phys cores @ 4:1 = 1,536 vCPU | 3 nodes ⇒ 288 cores @ 4:1 = 1,152 vCPU (~9.6% headroom over 1,051) | |
| RAM (GB) | 2,628 | 3,154 | 4×1,024 = 4,096 phys; @1.1:1 ⇒ 4,506 vRAM | 3×1,024 = 3,072 phys; @1.1:1 ⇒ 3,379 vRAM (~7% headroom over 3,154) | ⚠️ tight |
| Storage HOT usable | 73 TB | + headroom | ~124 TB usable (RF=2 over 3 nodes, ~10% reserve) | Service continues on 2 nodes; rebuild restores RF=2 | |
| Network | | | Hosts: 4×10G; Storage: 2×25G/node; ToR: 25G-capable | MLAG/VPC; MTU 9000 on storage | |

Risks

  • ZFS shared storage complexity (RF=2 via DRBD): validate DRBD placement & fencing; run node-loss drills.
  • Host uplinks 4×10G while ToR is 25G-capable: uplift busiest hosts to 2×25G if east–west >60%.
  • Public IPv4 per-tenant pressure (/29 per ISP initially): pre-arrange additional PA blocks; use WAF + SNI consolidation where feasible.
  • RAM margin after 1 HV failure is tight (~7% over the 20% burst figure): enable ballooning/KSM; consider 1.25–1.5 TB RAM per HV or plan an early gate for adding HV #5.
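
The N+1 headroom figures in the capacity summary, and the RAM options named in the last risk, can be re-derived from the stated inputs. The sketch below is one way to do so, assuming headroom is defined as spare capacity divided by burst demand, and taking 1.25 TB and 1.5 TB as 1,280 GB and 1,536 GB. The function names are hypothetical.

```python
def burst_demand(current, burst=0.20):
    """Current need plus the 20% burst allowance used in the summary."""
    return current * (1 + burst)


def capacity(nodes, per_node, overcommit):
    """Schedulable capacity for a node count and overcommit ratio."""
    return nodes * per_node * overcommit


def headroom(cap, demand):
    """Spare capacity as a fraction of demand (ASSUMED definition)."""
    return (cap - demand) / demand


CORES_PER_HV = 2 * 48   # two 48-core CPUs per hypervisor
RAM_GB_PER_HV = 1024
SURVIVING_HVS = 3       # four hypervisors minus one failure

vcpu_demand = burst_demand(876)
ram_demand = burst_demand(2628)

vcpu_n1 = headroom(capacity(SURVIVING_HVS, CORES_PER_HV, 4.0), vcpu_demand)
ram_n1 = headroom(capacity(SURVIVING_HVS, RAM_GB_PER_HV, 1.1), ram_demand)
print(f"vCPU headroom after 1 HV loss: {vcpu_n1:.1%}")
print(f"RAM headroom after 1 HV loss:  {ram_n1:.1%}")

# RAM upgrade options from the Risks list, evaluated with the same formula.
ram_options = {
    ram_gb: headroom(capacity(SURVIVING_HVS, ram_gb, 1.1), ram_demand)
    for ram_gb in (1280, 1536)
}
for ram_gb, margin in ram_options.items():
    print(f"{ram_gb} GB per HV: {margin:.1%} RAM headroom after 1 HV loss")
```

Raising the per-host RAM is the cheaper lever to model here, because CPU headroom is already larger than RAM headroom at the stated overcommit ratios.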