Build a hardened ESXi 6.5 image for HPE hardware

As part of the ESXi 6.5 series, this post should give you an idea of how to handle an ESXi image build in detail. No long introduction. Let’s start:

Preparation

  1. Get the latest ESXi 6.5 offline bundle available from the
    VMware Patch Repository (My VMware login required)

  2. Get the required drivers and agents from HPE. First, check the recipe for the right firmware and driver combinations; this may require you to update firmware on your boxes:
    HPE ProLiant server and option firmware and driver support recipe

    Download the required drivers from the latest folder containing the esxi-650-* hierarchy:
    http://vibsdepot.hpe.com/hpe/nov2016/
    (Alternatively you could connect to this depot online, but I had to build the image without a proper internet connection.)

    The “esxi-650-devicedrivers” folder contains the right offline bundles for the drivers. Pick the ones you need for your hardware. If you have no idea how to find out which drivers are required, play around a little with the “esxcfg-*” commands in the ESXi Shell: list your network and storage adapters on an existing ESXi host, ideally one installed with the vendor image, and note down which drivers are in use.

    The “esxi-650-bundles” folder contains all additional agents and tooling. Just download the hpe-esxi6.5uX-bundle-* file, as it contains the hpe-smx-provider CIM provider integration you need for proper hardware monitoring. Some of the drivers are double-zipped: extract the first layer so you have the offline bundle. The second zip file should not contain any further *.zip files, but *.vib files or a vib20 folder.

  3. Set up a PowerCLI 6.5 environment on a compatible Windows machine (a quick check of the Image Builder cmdlets is sketched right after this list)
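
If you are on a current PowerCLI release, a quick sanity check that the Image Builder cmdlets are really available might look like the following. This is only a sketch: newer PowerCLI versions can be installed from the PowerShell Gallery, while the classic PowerCLI 6.5 installer registers the modules itself.

# Newer PowerCLI releases can be installed from the PowerShell Gallery
Install-Module -Name VMware.PowerCLI -Scope CurrentUser

# Verify that the Image Builder cmdlets are available
Import-Module VMware.ImageBuilder
Get-Command -Module VMware.ImageBuilder | Select-Object Name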

Build

  1. Load the VMware ESXi vanilla image
    Add-EsxSoftwareDepot .\ESXi650-201703002.zip
  2. Clone it for further modification
    PS> Get-EsxImageProfile | Select Name
    
    Name
    ----
    ESXi-6.5.0-20170304101-standard
    ESXi-6.5.0-20170304101-no-tools
    Select the standard image. For other patch offline depots you might see four image profiles; pick the standard one without an “s” behind the build number.
    New-EsxImageProfile -CloneProfile ESXi-6.5.0-20170304101-standard -Name "ESXi-650-custom-hpe-hardened"  -Vendor "schoen computing"
    The Acceptance Level gets automatically inherited from the source image. You don’t need to explicitly specify the parameter.

  3. Remove packages
    Remove-EsxSoftwarePackage -ImageProfile ESXi-650-custom-hpe-hardened -SoftwarePackage xhci-xhci
    I removed these packages for my use case:
    sata-ata-piix
    net-usbnet
    sata-sata-sil
    lsi-msgpt2
    scsi-megaraid2
    scsi-mptspi
    ata-pata-hpt3x2n
    shim-libata-9-2-1-0
    net-forcedeth
    scsi-mpt2sas
    ata-pata-pdc2027x
    scsi-megaraid-mbox
    lsu-hp-hpsa-plugin
    ata-pata-cmd64x
    ata-pata-serverworks
    lsu-lsi-mpt2sas-plugin
    ata-pata-atiixp
    shim-iscsi-linux-9-2-2-0
    mtip32xx-native
    scsi-adp94xx
    nmlx4-en
    lsu-lsi-lsi-msgpt3-plugin
    misc-cnic-register
    scsi-bnx2i
    net-bnx2x
    net-fcoe
    scsi-aacraid
    scsi-qla4xxx
    scsi-megaraid-sas
    ata-pata-sil680
    scsi-iscsi-linux-92
    scsi-aic79xx
    net-nx-nic
    shim-libfc-9-2-1-0
    net-bnx2
    ata-pata-via
    lsu-lsi-megaraid-sas-plugin
    i40en
    net-cnic
    net-vmxnet3
    emulex-esx-elxnetcli
    shim-libfc-9-2-2-0
    nvme
    scsi-ips
    qfle3
    net-enic
    scsi-bnx2fc
    net-mlx4-en
    shim-libfcoe-9-2-2-0
    nvmxnet3
    sata-sata-nv
    vmware-esx-esxcli-nvme-plugin
    lsu-lsi-lsi-mr3-plugin
    net-cdc-ether
    usb-storage-usb-storage
    sata-sata-promise
    scsi-mptsas
    scsi-hpsa
    nmlx4-rdma
    ata-pata-amd
    pvscsi
    net-mlx4-core
    shim-libfcoe-9-2-1-0
    lsi-mr3
    nhpsa
    shim-vmklinux-9-2-1-0
    block-cciss
    scsi-fnic
    lsi-msgpt3
    nmlx4-core
    qlnativefc
    shim-libata-9-2-2-0
    lpfc
    nmlx5-core
    sata-sata-svw
    ima-qla4xxx
    nenic
    elxnet
    qedentv
    sata-sata-sil24
    shim-iscsi-linux-9-2-1-0
    This list also contains drivers that are removed now but added back later from the HPE depot in a newer version.

    Attention: Not all packages can be removed in the listed order, as there are dependencies between them. If the CLI does not allow you to remove a package because it is required by another package, just remove the other one first and try again (a scripted approach is sketched after this list).

  4. Export the stripped image
    Export-EsxImageProfile -ImageProfile ESXi-650-custom-hpe-hardened -ExportToBundle -FilePath .\ESXi-650-custom-hpe-hardened.zip
    This image now contains only the remaining packages. I prefer to close the PowerCLI session at this point and load the exported image in a new session, as in step 1.

  5. Add HPE offline depots
    Add-EsxSoftwareDepot .\<hpe driver/bundle>.zip
    Add all downloaded and extracted ZIPs this way.

  6. Add the HPE packages to the image
    Add-EsxSoftwarePackage -ImageProfile ESXi-650-custom-hpe-hardened -SoftwarePackage <package name>
    The package names can be taken from the offline depot ZIP files: each contains a folder per package name inside the vib20 folder. For my use case these packages were added (the sketch after this list shows how to add them in a loop):
    net-bnx2x
    hpe-esxi-fc-enablement
    net-bnx2
    misc-cnic-register
    qfle3
    scsi-mpt2sas
    lpfc
    hpe-smx-provider
    net-cnic
    nhpsa
    scsi-bnx2fc
  7. Export the final image
    Export-EsxImageProfile -ImageProfile ESXi-650-custom-hpe-hardened -ExportToBundle -FilePath .\ESXi-650-custom-hpe-hardened.zip
    
    Export-EsxImageProfile -ImageProfile ESXi-650-custom-hpe-hardened -ExportToIso -FilePath .\ESXi-650-custom-hpe-hardened.iso

     Keep the ZIP stored somewhere, as you can use it later for updating and extending the image.
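
For reference, the remove/add steps can also be scripted instead of running every cmdlet by hand. The following is only a rough sketch, not the exact commands used in this post: it assumes the package names from steps 3 and 6 are kept in plain text files (remove-packages.txt and hpe-packages.txt are made-up names) and simply retries removals until the dependency order sorts itself out.

$profileName = "ESXi-650-custom-hpe-hardened"

# Package names to strip, one per line (list from step 3)
$toRemove = @(Get-Content .\remove-packages.txt)

# Retry removals until nothing fails anymore or no further progress
# is made, so the dependencies resolve themselves over the passes
do {
    $failed = @()
    foreach ($pkg in $toRemove) {
        try {
            Remove-EsxSoftwarePackage -ImageProfile $profileName `
                -SoftwarePackage $pkg -ErrorAction Stop | Out-Null
        } catch {
            # Still required by another package, retry in the next pass
            $failed += $pkg
        }
    }
    $progress = $toRemove.Count - $failed.Count
    $toRemove = $failed
} while ($failed.Count -gt 0 -and $progress -gt 0)

# Add the HPE packages from the attached depots (list from step 6)
foreach ($pkg in Get-Content .\hpe-packages.txt) {
    Add-EsxSoftwarePackage -ImageProfile $profileName -SoftwarePackage $pkg | Out-Null
}

The exports from step 7 work unchanged afterwards.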

ESXi security/hardening - ESXi image

First part of the series. As mentioned in the overview, VMware provides the newly renamed “Security Configuration Guide”, but it does not really cover this first, hands-on part of elaborating a hardened hypervisor approach. Everything starts with the image we pick: it is the foundation of security. Just imagine you are designing a bank vault for storing all the money. The holy grail, the money, is stored in the basement, and the entrance of the building above ground is highly secured by policemen standing at the doors and windows, but the basement has several unsecured holes for cooling, wastewater, etc. That is not what we want for the hypervisor. So what are possible holes in our ESXi image?

  • ESXi Web Services
  • ESXi UI
  • CIM Server
  • OEM management agents
  • OEM tooling
  • several other services (just check the firewall list on the Security configuration tab in ESXi)

These are services listening on the ESXi host to provide data to vCenter or other management services. Some may be wanted, others not. Okay, fair enough, nice to know, but how does this relate to the ESXi image? My main focus is to strip down the ESXi image as far as possible: guarantee the required functionality, but do not offer a large attack surface. If we remove unneeded services listening on any port, we reduce the attack surface and the attacker has fewer chances to find a weakness in the system. But before removing anything, we need something to remove things from. Picking the right base image is key. So what choices do we have for a base image?

  • VMware ESXi vanilla image This is offered on the VMware website only. It has no relation to a specific hardware vendor. The integrated driver set is capable of supporting most hardware on the HCL. It does not contain any OEM agents or services.

  • OEM ESXi image For most vendors this is also offered on the VMware website and is marked as a vendor-specific image. It is built on top of the VMware vanilla image: additional vendor-specific agents, drivers and tools are added to support all the hardware the vendor has certified for that hypervisor version, to remotely manage the hardware with vendor management tools, and to run firmware updates for the underlying hardware from the hypervisor.

It should now be very clear what the candidates for removal are:

  • OEM management agents Don’t trust any of these agents. Many of them caused PSODs for my customers, and they often expose badly secured services to the outside. But be aware that a lot of these agents are bundled with the CIM integrations provided by the vendor. CIM provider integrations are something we want to keep in the image so we do not lose track of hardware outages; the vendor integrations are usually much more powerful than what VMware provides via its generic interfaces.

  • Drivers in general (optional) Drivers, no matter whether they are provided by VMware or the OEM, are not really a security concern, as they are only used if a matching device is present. I like to remove the unneeded ones anyway to keep the image as clean as possible. Most customers have a static bill of materials for their hardware, so it is very easy to pick the required drivers and strip out the remaining ones.

  • OEM tooling A lot of hardware vendors provide extra tooling, for example for running firmware upgrades from the ESXi Shell or for reading configuration out of the BMC boards or BIOS. This is nice, but really unwanted. Just as I don’t want to provide capabilities that bridge the isolation between hypervisor and VMs, I don’t want to do the same between hypervisor and hardware.

  • Unwanted functionality This is the most complicated part of the hardening. Default functionality that is not built into the kernel can be removed, if chosen carefully. Good candidates are GUIs, like the HTML5 host UI, or the USB 3 drivers.

This should be all for now. A good question comes up: how do I remove all these drivers/agents/tools/functionality from my image? I prefer the VMware Image Builder CLI based on PowerCLI. With 6.5 you also have the option to use a Web Client GUI for it, as part of the Auto Deploy feature.
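
To get a feeling for what a given image profile actually ships before you start removing anything, a look at its VIB list helps when picking candidates. A minimal sketch with Image Builder, assuming the vanilla offline bundle has already been downloaded (the file name is just an example):

# Load the vanilla depot and list all VIBs of the standard profile,
# sorted by name, to identify candidates for removal
Add-EsxSoftwareDepot .\ESXi650-201703002.zip
(Get-EsxImageProfile ESXi-6.5.0-20170304101-standard).VibList |
    Sort-Object Name |
    Select-Object Name, Vendor, Version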

However you alter your image, please do yourself a favor and document it!

To get an idea of what the specific steps look like in reality, please check the example for HPE hardware linked below:

Build a hardened ESXi 6.5 image for HPE hardware    

ESXi 6.5 security/hardening

ESXi hardening: loved and hated. What do we already have in this space? VMware provides a hardening guide for all the latest versions of ESXi; it looks like with 6.5 there was a change in naming to “Security Configuration Guide”. I think it is pretty well known, but in my day-to-day work it is only one part of building a hardened hypervisor. Especially if you work with large and security-sensitive customers, you should put more brain work into this topic. As this is a growing topic, it is best to set up a series for it:
  1. ESXi image
  2. SSL/TLS
  3. DMZ
  4. Monitoring
Stay tuned!

VMware Validated Designs does not exclude Stretched Cluster in general

Working as an architect in the VMware space, you will sooner or later come across the VMware Validated Designs (VVD). Just a few weeks ago the latest version 4.0 was released, making adjustments for vSphere 6.5. It can be found here:

VMware Validated Designs Documentation

The designs are a great source for building your own architectures or building architectures for customers. The incorporated component architectures are natively built for availability, reliability and scalability, which are exactly the main goals I try to put into the designs I create for customers. The VVDs present a good practice for a detailed setup that can be used for several use cases like private cloud or VDI deployments. VMware Cloud Foundation also makes use of the VVDs for its implementations.

But apart from this, I also like to treat them as a framework that gives me the chance to keep the setup supported by VMware while adjusting it to the customer's needs, so it fits like a second skin based on the customer's requirements.

Across their history they have mainly relied on a two-region concept with one primary and one fail-over region. This is a quite common architecture for U.S. setups. In the European space, and especially in Germany, customers often stick to their existing architectures based on a two-datacenter setup working as an active/active pair. Whether you see this as a two-region setup as well, or aggregate it into one region like I do, is up to you. I prefer one region: the datacenters are a short distance apart because of their synchronous replication/mirroring, and their active/active style makes them one logical domain. This is why I split the region into two physical availability zones (an AWS term) and one virtual one across the two datacenters. This does not need to be understood now; it will get clearer in later chapters.

In my understanding the VVD framework needs some extension with regard to Stretched Clusters, which is why I would like to set up a series that guides through a forked version of the VVDs I personally use for customer designs:

  1. General thoughts
  2. Additions/Changes to physical architecture
  3. Additions/Changes to virtual architecture
  4. Additions/Changes to cloud management architecture
  5. Additions/Changes to operations management architecture
  6. Additions/Changes to business continuity architecture

Stay tuned!

Set up vCenter HA in advanced mode with different heartbeat networks

Setting up vCenter HA in basic mode is pretty straightforward.

Doing the same in advanced mode is a little bit trickier. But first, why would you need to go with the advanced vCenter HA mode?

  • the VCSA is managed by another vCenter (for example if the setup is according to the VMware Validated Design 3.0)
  • the heartbeat vNICs are not in the same IP subnet

Luckily, I had to cater for both in a project.

I don’t want to create content by simply copying from the VMware documentation, so I would only like to give some hints, describe some pitfalls you might run into when setting this up and, last but not least, describe what VMware missed in their official documentation:

https://pubs.vmware.com/vsphere-65/index.jsp#com.vmware.vsphere.avail.doc/GUID-13E2EEB3-4DD7-43CC-99C8-73A98676C15E.html

Preparation pitfalls

DNS reverse lookup

Basics, basics, basics. Ensure that you have proper DNS reverse lookup implemented. Also take care of the exact spelling of the host names: don’t maintain a hostname entry in the DNS system with capital letters and then use a lower-case or mixed-case one in your VCSA. This simply does not work, because the self-lookup fails. The GUI won’t tell you that, but if you run the preparation with the prepare-vcha command line interface, it will show some more info.
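
A quick way to verify both directions and the exact spelling before you start, for example from the Windows machine you manage the environment from (hostname and IP below are placeholders):

# Forward lookup: must return the IP the VCSA is configured with
Resolve-DnsName vcsa01.example.local -Type A

# Reverse lookup: the returned PTR name must match the FQDN
# configured in the VCSA, including its exact spelling
Resolve-DnsName 10.0.10.15 -Type PTR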

 

Procedure pitfalls

When you have clicked through the first step in the GUI, the next step is to clone the VCSA twice, once for the passive node and once for the witness node. It is very important that you don’t clone before the first step has finished; otherwise the communication between the appliances won’t work because they use the wrong SSH keys. The same applies if you try to reuse the cloned appliances to set this up a second time: it won’t work. You need to throw them away and clone new ones at the right point in time.

Customization

The official documentation does not describe the customization part in much detail. Maybe that is not necessary, but if something is not totally clear to you, here are some hints:

Passive node: the documentation is clear, but yes, you really need to configure the first NIC (NIC 0) with the same IP config as the primary vNIC of your original VCSA. The adapter gets shut down automatically during first boot, so there is no IP conflict in your network, or rather only for a short moment. Set NIC 1 to the heartbeat network info you have prepared. Ensure that you really don’t set a gateway on this adapter.

Witness node: the documentation states that the gateway should also be left out for the required heartbeat NIC. I have not done this: I set the first adapter to unused or shut it down, set the second one to the heartbeat IP and did use a gateway. The gotcha was that the witness had an IP in a totally different subnet than the active and passive VCSAs. Static routes might also do the job, but I had some trouble with that.

First start

Start up the appliances. The IP conflict resolves itself as described above. Before you click the next step in the GUI, ensure that you do the following:

Go to both active and passive VCSA and add a static route so they can reach the witness via the heartbeat interface:

Edit /etc/systemd/network/10-eth1.network and add the following lines at the end:

[Route]
Destination=10.xxx.yyy.zzz/32
Gateway=10.sss.ttt.1

The destination can be limited to the heartbeat IP of the witness VM. The gateway must be the gateway of the IP subnet used for the heartbeat on the local VCSA appliance. Repeat this on the passive appliance (it is best to SSH to it via the active one).

Now and only now hit “Finish” in the GUI and all should go well. You can check the log /var/log/vmware/vcha/vcha.log for further info.

Creating Docker Adapter Instance in vROps with vRO

The new Docker management pack for vROps makes it possible to pull Docker metrics into monitoring. The catch is that a separate adapter instance has to be created for every single Docker host. A perfect opportunity to put vRealize Orchestrator to work again. vROps offers a REST API, and nice documentation for it can be found at https://<vrops host>/suite-api/docs/rest/index.html. Download: vCO Workflow: CreateVropsDockerAdapterInstance
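
The workflow essentially wraps a single REST call against that API. Just to illustrate the idea outside of vRO, roughly the same call could look like this in PowerShell; the adapter kind key and the resource identifier names below are assumptions and have to be taken from the management pack respectively the suite-api documentation mentioned above:

# vROps endpoint and credentials (placeholders)
$vrops = "vrops.example.local"
$cred  = Get-Credential

# Payload for a new adapter instance; adapterKindKey and the identifier
# names are assumptions - check GET /suite-api/api/adapterkinds or the docs
$body = @{
    name                = "Docker - dockerhost01"
    collectorId         = "1"
    adapterKindKey      = "DockerAdapter"
    resourceIdentifiers = @(
        @{ name = "HOST"; value = "dockerhost01.example.local" }
        @{ name = "PORT"; value = "2375" }
    )
} | ConvertTo-Json -Depth 5

Invoke-RestMethod -Method Post -Uri "https://$vrops/suite-api/api/adapters" `
    -Credential $cred -ContentType "application/json" `
    -Headers @{ Accept = "application/json" } -Body $body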

Docker remote management with vRealize Orchestrator

Docker, or application and container virtualization, keeps gaining popularity across all industries and company sizes. If you plan to use Docker at a larger scale, e.g. in the enterprise (which can of course be debated at length; see the comment on TheNewStack), orchestrating the resulting management tasks quickly pays off and ultimately becomes essential. On the open-source market there are several products for cluster management, most notably Docker Swarm, Mesos and Kubernetes, or all of them combined in the Photon Controller. If you want to run your container hosts somewhat more freely, or for example integrate them into the deployment process of vRealize Automation, vRealize Orchestrator is a good fit. If a Docker container host runs as a VM, e.g. with Photon, there are basically three ways to execute Docker commands against this container host:
  • SSH: authentication via SSH, actions executed through the docker CLI. Pros: Linux native, no additional tools required. Cons: SSH left open (especially problematic in a DMZ), return values have to be parsed.
  • VMware Tools (VMCI): authentication via the host-guest interface, actions executed through the docker CLI. Pros: no network connection required. Cons: the VMCI module in the VMware Tools is a security risk, return values have to be parsed.
  • Docker Remote API: authentication via the Docker daemon (note: it is disabled by default), REST API calls. Pros: standardized API (REST), no additional tools required. Cons: exposed port.
The most elegant way is probably the Docker Remote API. This article describes how to expose the Remote API on Photon: http://the-virtualizer.com/2016/05/expose-docker-remote-api-on-photon/ Now to the actual orchestration. If you want to provision a Docker container via a vRA request, vRO is the tool of choice to talk to the Docker host and instantiate a container. If you would like to use Docker Swarm to distribute the tasks across multiple hosts of a cluster, the steps look very similar, as the Swarm API is modeled on the Docker API. The workflow available for download below instantiates and starts a container from a given image via the Docker API and accepts a small set of input parameters in this variant. It can be extended as needed, e.g. with port and/or volume mappings; for that, the payload of the Create Docker Container workflow has to be extended accordingly. The possible parameters are documented here: https://docs.docker.com/engine/reference/api/docker_remote_api_v1.23/#start-a-container Download: vCO Workflow: Run Docker Container (Remote API)
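
Outside of vRO, the two Remote API calls the workflow performs boil down to the following. A minimal sketch against the API version referenced above; host, port and image are placeholders:

# Docker host with the exposed Remote API (see the Photon article above)
$docker = "http://192.168.5.20:2375"

# 1. Create a container from a given image
$container = Invoke-RestMethod -Method Post -Uri "$docker/containers/create" `
    -ContentType "application/json" `
    -Body (@{ Image = "192.168.5.20:5000/corp/centos7base:7.2" } | ConvertTo-Json)

# 2. Start the freshly created container by its ID
Invoke-RestMethod -Method Post -Uri "$docker/containers/$($container.Id)/start"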

Expose Docker Remote API on Photon

It can make sense to make the Docker Remote API reachable not only from localhost. Typical cases are:
  • orchestration of the Docker runtime from the outside (e.g. by vCenter Orchestrator / vRealize Orchestrator)
  • monitoring of the Docker runtime by vRealize Operations with the Docker management pack
For Docker this means editing the systemd configuration for the Docker daemon:
vi /etc/systemd/system/multi-user.target.wants/docker.service
ExecStart=/usr/bin/docker daemon \
          --containerd /run/containerd.sock --insecure-registry=192.168.5.20:5000 -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock
-H determines where the daemon listens. tcp://0.0.0.0:2375 exposes the API on port 2375/tcp to the outside; unix:///var/run/docker.sock keeps the local Unix socket running, so the CLI commands can be used as usual. From the outside you can now issue requests against the REST API:
GET http://192.168.5.20:2375/containers/json
POST http://192.168.5.20:2375/containers/create
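
For a quick test from a remote machine, listing the running containers is enough, for example (IP taken from the configuration above):

# List running containers via the freshly exposed Remote API
Invoke-RestMethod -Uri "http://192.168.5.20:2375/containers/json"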
 

VMware Photon and a private Docker registry

In a dev/test environment it is quite common that the Docker registry is used only via HTTP and not via HTTPS. By default the Docker runtime assumes that the connection is SSL-encrypted and that the issuing CA certificate is available locally. If this is not the case, you get the following error message:

root@photon01 [ ~ ]# docker run -it 192.168.5.20:5000/corp/centos7base:7.2
Unable to find image '192.168.5.20:5000/corp/centos7base:7.2' locally
docker: Error response from daemon: Get https://192.168.5.20:5000/v1/_ping: tls: oversized record received with length 20527.
See 'docker run --help'.

To ignore the missing SSL encryption for a registry, the Docker daemon knows the --insecure-registry parameter. It can be passed manually when starting the daemon or set as a fixed parameter for automatic startup. You also have to distinguish whether systemd is in use or not; Photon uses systemd, so the following approach works:

systemctl enable docker
vi /etc/systemd/system/multi-user.target.wants/docker.service
ExecStart=/usr/bin/docker daemon \
          --containerd /run/containerd.sock --insecure-registry=192.168.5.20:5000

The --insecure-registry parameter specifies the address plus port of the private Docker registry. If several registries are to be configured, the parameter is repeated in the form shown above, once per server.

After the change has been made, the systemd configuration must be re-read from disk:

systemctl daemon-reload
systemctl restart docker

NOTE: The systemd unit file could be overwritten again by an update.

Create a CentOS base image for Docker

The public Docker Hub image repository offers several ready-made CentOS base images, including the ones maintained by the CentOS project. If you distrust them, or simply want to create your own base image, you have several options. Following the hint for CentOS on docker.com leads to a script that does the work for you. Sometimes, however, it makes sense to understand what is being done there, so it is worth doing it by hand. Creating a Docker base image is comparable to creating a chroot environment, and that is exactly how we start (here using a CentOS 7 installation as an example):
  1. First, create a temporary folder for the chroot environment.
  2. Then build an RPM database for the later package installation inside the chroot environment.
  3. Now tell the local RPM which OS is to be installed. This is done via the centos-release package, which contains the necessary meta information, e.g. for the repositories.
  4. The packages for the chroot environment are still missing; YUM and RPM pull all required dependencies into the chroot environment.
  5. You should also verify your work.
  6. Finally, import the image into the private Docker registry.
The procedure works analogously for CentOS 6 or the corresponding RHEL versions; just adjust the repository paths and you are done.