16 January 2014

CentOS 6 under VMware ESXi 3.5...

CentOS, a "genericized" version of Red Hat Enterprise Linux, has been my server OS of choice for several years now. It has all the acceptance and compatibility of Red Hat Enterprise, without the heavy cost for updates. It has a long support lifespan, and follows the policy of backporting patches, so updates should not affect the configuration of running services.

My employer is releasing a brand new version of their core software offering, and I have been working to put together a new template image from which we can deploy new systems. With the relatively recent release of Red Hat Enterprise Linux 6 (and thus CentOS 6) and the major updates to the software that came with that, it seemed the obvious choice from which to assemble this new template.

In our hosting environment, we are running VMware Infrastructure 3i, using ESXi 3.5. I have plans to upgrade the hosts to a much more current version of vSphere, but the window for that has not yet presented itself! All of our production VM guests use the vmxnet ethernet adapter.

Building the template in this environment proved easier said than done. The OS would install just fine in the template VM (I went with a "minimal" install, so as to keep the template as lightweight as possible), and the VMware Tools and associated drivers (when downloaded from a 3.5 update 6a host) would build with no trouble. But once I tried to deploy a second system from the template, I could not get the network to come up.

After several false starts and ample searching, I found that the behavior of device detection had changed between RHEL 5 and RHEL 6, and that this change interacts badly with what happens to a VMware network interface when one deploys a cloned VM from a template.

Whenever you deploy a clone from a template, you have a VM that is identical in *most* regards to the template, except for a few minor identifiers and a new MAC address on the cloned system's network adapters. The new MAC address leads udev to treat the adapter as a new device, and to treat the adapter with the template's MAC address as simply not present, even though its configuration remains. As a result, you have an eth0 that will not start, and an eth1 that is not configured.
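To make the renaming concrete, here is a small Python sketch of the behavior described above. It is a simplified model, not udev's actual implementation: the function name, the dictionary standing in for 70-persistent-net.rules, and both MAC addresses are made up for illustration.

```python
def assign_name(persistent_rules, mac):
    """Return the interface name for `mac`, recording a new rule if unseen.

    `persistent_rules` maps a lowercase MAC address to an interface name.
    It is a stand-in for the entries in 70-persistent-net.rules.
    """
    mac = mac.lower()
    if mac in persistent_rules:
        return persistent_rules[mac]
    # Unknown MAC: take the lowest ethN not already claimed by a saved rule.
    used = set(persistent_rules.values())
    index = 0
    while f"eth{index}" in used:
        index += 1
    persistent_rules[mac] = f"eth{index}"
    return persistent_rules[mac]


# Rules as saved inside the template: the template's MAC owns eth0.
rules = {"00:0c:29:aa:aa:aa": "eth0"}

# The clone boots with a different MAC, so eth0 is already taken.
clone_name = assign_name(rules, "00:0C:29:BB:BB:BB")
print(clone_name)  # eth1
```

In this model the clone's only adapter comes up as eth1, while the configuration on disk still describes eth0.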

Simple solution, based on findings at Aaron Walrath's blog:

On any template from which you will be deploying CentOS 6 machines:
Step 1: Remove /etc/udev/rules.d/70-persistent-net.rules
Step 2: Remove the HWADDR entry from /etc/sysconfig/network-scripts/ifcfg-eth0 (left in default DHCP configuration)
Step 3: Shut down the system and convert it to a template

You should now be able to deploy CentOS 6 (or presumably RHEL 6) without network issues.
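Steps 1 and 2 can also be scripted. What follows is a minimal Python sketch, not a tested production tool: the two paths come from the steps above, while the helper names and the `root` parameter (handy for trying it against a scratch directory first) are my own assumptions.

```python
import os
import re

# Paths from steps 1 and 2, written relative to the filesystem root.
UDEV_RULES = "etc/udev/rules.d/70-persistent-net.rules"
IFCFG_ETH0 = "etc/sysconfig/network-scripts/ifcfg-eth0"


def strip_hwaddr(ifcfg_text):
    """Return the ifcfg contents with any HWADDR= line removed."""
    kept = [line for line in ifcfg_text.splitlines()
            if not re.match(r"\s*HWADDR\s*=", line)]
    return "\n".join(kept) + "\n"


def prepare_template(root="/"):
    """Apply steps 1 and 2 under `root` (hypothetical helper)."""
    # Step 1: remove the persistent net rules so udev regenerates them.
    rules = os.path.join(root, UDEV_RULES)
    if os.path.exists(rules):
        os.remove(rules)

    # Step 2: drop the HWADDR entry, leaving the rest of ifcfg-eth0 intact.
    ifcfg = os.path.join(root, IFCFG_ETH0)
    with open(ifcfg) as fh:
        cleaned = strip_hwaddr(fh.read())
    with open(ifcfg, "w") as fh:
        fh.write(cleaned)
```

On the template itself this would be run as root with `prepare_template()` just before shutting down for step 3.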

Display driver frustration

I use an older (4.5 years old) Dell Precision M4400 laptop running Windows 7 64-bit at work, and while it is certainly getting a bit long in the tooth, it has worked well for me. My demands are relatively light: I use most of its 8GB of RAM for web applications and browsing, plus a few other applications. I don't put the graphics processor to the test much, aside from having a second Dell display connected via DisplayPort. Both displays were operating just fine at 1920x1200.

As time has gone on, however, the newer drivers for my graphics chipset have started causing a very annoying issue. Power saving behavior incorporated into the driver (called "Powermizer") is not compatible with the firmware Dell has made available for my laptop. As a result, if a fairly current version of the nVidia Quadro FX 770M graphics driver is installed, the laptop will blue screen (with a STOP 0x00000116 error). Booting into Safe mode avoided the crash, but it also prevented use of a number of system features. Simply removing the driver was insufficient: Windows 7 would install the "current" driver from Windows Update, and the blue screen would happen on the next normal reboot.

Solution: I uninstalled all nVidia driver components (in Control Panel) and installed only the following parts of the driver package:
nVidia Graphics Driver 275.33
nVidia nView 135.85

I also downloaded a piece of freeware called Powermizer Switch which (via registry settings) disables the Powermizer feature that causes the blue screen. I downloaded it from here.

Before, the system would blue screen on login or would simply say "Unable to save Display Settings" when I attempted to enable the external display. Now it works just fine, as it has for 4.5 years.

Conclusion: If you are happy with how something works, resist making changes, applying updates, etc. You never know what might break, and further, you don't know if your "update" will be reversible!