Server 2008 R2 Core Memory Usage with vSphere

A few weeks back, while testing a Server 2008 R2 Core install, I was curious what the resource usage impact was compared to some other editions of Windows Server.


Without VMware Tools installed, 2008 R2 Core uses 282MB of memory; with VMware Tools installed, it hovers at 401MB. Lastly, a full 2008 R2 Standard install with VMware Tools installed sits at 461MB of memory usage. For comparison, I also included a base Server 2003 R2 install, which shows even lighter resource usage.

Less is definitely more, and it goes to show that there is a 60MB savings in memory usage by using a Core install over the full install. The bigger surprise was finding out that VMware Tools alone used 119MB on the test system.
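Both of those numbers fall straight out of the three measurements. A trivial sketch of the arithmetic, using only the figures from the test above:

```python
# Memory usage figures (MB) as measured in the test above
core_no_tools = 282
core_with_tools = 401
full_with_tools = 461

# Savings from choosing Core over a full install (both with VMware Tools)
core_savings = full_with_tools - core_with_tools
# Footprint attributable to VMware Tools on the Core install
tools_overhead = core_with_tools - core_no_tools

print(f"Core vs. full install savings: {core_savings}MB")  # 60MB
print(f"VMware Tools overhead: {tools_overhead}MB")         # 119MB
```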


Website performance tuning

When developing websites, never forget that page render time is critical. Far too often I’ve seen production code that simply performs poorly. The blame often goes to the servers or the connection to the servers, yet the server reports and error logs will show what issues, if any, are present. If routine automated maintenance is performed on the disks, the database back end, and memory recycling, then the issue lies elsewhere. More often than not, the problem is with the code being rendered to the client. Even when cross-platform browser testing is conducted to verify the rendered formatting, it does nothing to actually benchmark the performance of the site.

Test-Driven Development

Over a number of years I was taught the concept of test-driven development, which for websites can be done with something like SimpleTest for PHP development or NUnit for ASP.NET development. The problem I find with this methodology is that code written to pass the tests is only as good as the developer writing the tests. The larger problem comes when moving into the web development arena: unless the developer spends all of their time working on a single internally developed website framework, developing a full complement of tests in-house is a waste of time and development resources.

What a waste

So then what? Often, outside of performing some simple render tests against a couple of different browsers and walking through some of the website forms, nothing. ABSOLUTELY NOTHING. This method of testing is often considered good enough. But I question: how many visitors to a site do you lose when your homepage is over a megabyte? Developers typically have a 100Mbps or faster connection to the websites over a LAN with latency under 2ms. The problem is that real-world users don’t have that kind of pipe available. Websites should load quickly for whatever the lowest common denominator of the user base is. If it’s an internal-only site, what about users coming in via VPN from home? Now try performing your browser testing on a 1Mbps or slower link with at least 100ms of latency, and you’ll start to see what I’m getting at: the site no longer performs like it used to.
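To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It assumes the homepage is a single 1MB transfer and counts only bandwidth plus one round trip; TCP slow start, DNS lookups, and server processing are ignored, so real-world times will be worse.

```python
def estimate_load_seconds(page_bytes, bandwidth_mbps, latency_ms, round_trips=1):
    """Rough lower bound on load time: serialization time plus round-trip latency.

    Ignores TCP slow start, DNS lookups, and server think time, so real
    numbers will be worse than this.
    """
    transfer_s = (page_bytes * 8) / (bandwidth_mbps * 1_000_000)
    latency_s = round_trips * latency_ms / 1000
    return transfer_s + latency_s

ONE_MB = 1_000_000  # decimal megabyte, for round numbers

lan = estimate_load_seconds(ONE_MB, bandwidth_mbps=100, latency_ms=2)
slow = estimate_load_seconds(ONE_MB, bandwidth_mbps=1, latency_ms=100)
print(f"LAN:       {lan:.3f}s")   # 0.082s
print(f"Slow link: {slow:.3f}s")  # 8.100s
```

On the LAN the page arrives in under a tenth of a second; on the slow link the same page needs over 8 seconds before any extra requests are even counted.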

Make Time

As everyone’s time is limited, you may begin to wonder where you are going to find the time to perform all this testing. Well, I hate to burst your bubble, but regardless of what type of testing is conducted, it always takes some time. The good news is that there are tools freely available (as I know we all work without a budget) to assist with this testing. The first issue that may arise is a corporate policy dictating that we only use *browser XYZ* and that no other browsers are allowed or should be supported. If you are in this scenario, RUN!... *just kidding*. Talk with your manager about loading Firefox on the development workstations to assist with testing your code.

W3C Validation

The most basic testing would be with an add-on called HTML Validator. This add-on lets you validate against the W3C HTML standards; while not strictly a performance test, it helps to minimize cross-browser rendering issues. Below is a screenshot of the window presented when single-clicking the icon within Firefox.
By default this add-on runs on every page, so all you need to look for is the red and white “X” in the corner. Double-clicking the icon presents a window, as seen below, stating the line of rendered code where the problem can be found, what the problem is, and some documentation about the error being presented.
HTMLValidator - source window
I occasionally get false positives with this add-on, since it is designed to test the W3C compliance of generated HTML. Anything beyond HTML may throw warnings or errors (inline CSS comes to mind).

Now for the meat and potatoes.

When looking at website performance, the real test is time. There are basically:

3 things to check

  1. The number of files being requested
  2. Code complexity
  3. Total page weight

Together, these three variables determine the time it takes for the page to be fully presented to the user.

The many requests

The number of files being requested is a latency-centric issue. As a page loads, it references images, CSS files, JS files, and so forth. Each file needs to be requested separately, which causes two issues: overall latency build-up (a variation on the “rubber band effect”) and browser limitations (browsers only open a limited number of simultaneous connections to each server). In general, the rubber band effect is when sending or requesting information means bouncing back and forth between points:

  • Let’s start at A
  • then go to B
  • then go to A
  • then go to B
  • then go to C
  • then go to A
  • then go to B
  • then go to C
  • … rinse, wash, repeat until you’re blue in the face

More specifically for our case:

  • Request: somepage.html
  • Response: here’s page somepage.html
  • Request: my.css
  • Request: mysecond.css
  • Response: my.css
  • Response: mysecond.css
  • Request: image_xyz
  • and so on and so forth

The problem here is a lot of general page request I/O, which results in latency build-up. If your average latency to the server is 100ms and it is a very simple page with just one small image, there will be two separate, non-concurrent requests, resulting in 200ms of render time on top of transmission time. A CMS-based site makes anywhere from 10 to well over 30 requests, which is why components can sometimes take seconds to load even on a broadband link. If there are multiple JavaScript or CSS files, could they be combined (pipelined) on the back end and pulled down in a single request? Instead of using image slices with a number of different pieces, could parts be rendered with CSS instead?
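The latency build-up can be modeled roughly as one round trip for the HTML page plus one round trip per “wave” of follow-up requests, where the wave size is the number of connections the browser will open to the server at once. This is a simplified sketch: it assumes every file is tiny (so transfer time is ignored) and that connections are already open. The 2-connection figure is the old HTTP/1.1 (RFC 2616) recommendation; newer browsers allow more.

```python
import math

def request_latency_ms(num_requests, rtt_ms, parallel_connections=1):
    """Latency-only cost of fetching a page and its files, ignoring transfer time.

    The HTML page costs one round trip before any of its references are known;
    the remaining files are fetched in waves limited by how many connections
    the browser keeps open to the server at once.
    """
    if num_requests < 1:
        return 0
    remaining = num_requests - 1
    waves = math.ceil(remaining / parallel_connections)
    return rtt_ms + waves * rtt_ms

print(request_latency_ms(2, rtt_ms=100))                           # 200: page + one image
print(request_latency_ms(30, rtt_ms=100, parallel_connections=2))  # 1600: busy CMS page
print(request_latency_ms(3, rtt_ms=100, parallel_connections=2))   # 200: page + combined CSS + combined JS
```

The first call reproduces the 200ms example above; the last shows why combining files pays off, since the same page with its CSS and JavaScript merged into one file each costs only a single wave after the HTML.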

Code Smell

The second problem is code complexity or inefficiency. Beyond combing through the code line by line for “Code Smell”, take a look with Firebug or HttpWatch at the time taken for each request. If a few outliers take 100ms or more longer than the rest, that is typically a sign of a problem with the given file. When running the rendering tests, run them a few times to establish a baseline and rule out any abnormalities.
Here you can see a test against my LinkedIn homepage, as I was noticing some slowness today. The sluggishness of the site appears to be due to the RSS-style feed of my connection updates.
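The “run it a few times, then look for outliers” routine can be sketched like this. It assumes you have copied the per-file times from Firebug or HttpWatch into one dictionary per run (the file names and timings below are hypothetical, and every run must list the same files). Taking the median per file across runs smooths out one-off hiccups, and any file more than 100ms above the page’s typical file gets flagged.

```python
from statistics import median

def find_slow_files(runs, threshold_ms=100):
    """Flag files whose typical load time exceeds the page's typical file by threshold_ms.

    runs: list of dicts mapping file name -> load time in ms, one dict per test run.
    """
    per_file = {name: median(run[name] for run in runs) for name in runs[0]}
    baseline = median(per_file.values())
    return sorted(name for name, t in per_file.items() if t - baseline >= threshold_ms)

# Hypothetical timings (ms) from three test runs of the same page
runs = [
    {"index.html": 120, "site.css": 40, "logo.gif": 35, "feed.js": 410},
    {"index.html": 115, "site.css": 45, "logo.gif": 30, "feed.js": 380},
    {"index.html": 300, "site.css": 42, "logo.gif": 33, "feed.js": 395},
]
print(find_slow_files(runs))  # ['feed.js']
```

Note that the single 300ms spike for index.html in the third run is treated as an abnormality and ignored, while feed.js is consistently slow in every run.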

Trim the fat

The last issue is the total page weight. This is where YSlow really shines; take a look specifically at the Statistics and Components tabs for the rendered page.

Uncompressed or poorly compressed images are the largest offenders on most sites I’ve seen. Art received from a graphic arts department typically comes over as PNG files or barely compressed JPEGs at a resolution many times what is required. Resize the images down, then save them as either JPEGs or GIFs with appropriate compression. PNG files can occasionally produce smaller files, but unless you are targeting only browsers that support PNG, leave them for development purposes only.
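Oversized artwork hurts more than its dimensions suggest, because pixel count grows with the square of the width and height. A quick sketch, assuming 24-bit RGB and hypothetical source and display sizes; compressed file sizes won’t scale exactly the same way, but the raw pixel data the browser must download and decode does.

```python
def raw_image_bytes(width, height, bytes_per_pixel=3):
    """Uncompressed size of a 24-bit RGB image."""
    return width * height * bytes_per_pixel

source = raw_image_bytes(3000, 2000)  # hypothetical artwork from the design team
needed = raw_image_bytes(600, 400)    # hypothetical on-page display size
print(f"{source / needed:.0f}x more pixel data than the page needs")  # 25x
```

An image that is only 5 times too wide and 5 times too tall carries 25 times the pixel data.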

Next it’s time to look at the code. Assuming you have already looked for and corrected any “code smell”, are your files gzip’d? Files can be compressed in advance, but on modern servers this very rarely shows any advantage over run-time compression. Compressing the output of all your PHP, ASP, ASPX, CSS, JS, and XML files can save a large amount on file size. You may ask: what about the increased CPU load on the server and client from all that compression? Truthfully, the load increase is minimal compared to the time saved, as most of the time the slowest link is the user’s Internet or network connection.
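To get a feel for how much text-based content shrinks, here is a small sketch using Python’s standard gzip module on a hypothetical, fairly repetitive stylesheet. Real savings depend on the file; markup, CSS, and JavaScript tend to compress well because they repeat so many tokens.

```python
import gzip

# Hypothetical stylesheet: repetitive text like CSS compresses very well
css = ("body { margin: 0; padding: 0; font-family: Arial, sans-serif; }\n"
       ".nav li { display: inline; padding: 4px 8px; }\n") * 200
raw = css.encode("utf-8")
compressed = gzip.compress(raw, compresslevel=6)  # a middle-of-the-road level

# The browser reverses this transparently when it sees Content-Encoding: gzip
assert gzip.decompress(compressed) == raw

print(f"original: {len(raw):,} bytes")
print(f"gzip'd:   {len(compressed):,} bytes")
print(f"saved:    {100 * (1 - len(compressed) / len(raw)):.1f}%")
```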


Below are links for some of the tools listed above, along with a few extras for test automation and SQL injection testing.

Firefox –
Firebug – – there is also FlashFirebug and FirePHP
YSlow –
HTML Validator –
Live Headers –
SQL Inject Me –
HttpWatch basic edition – – for IE performance testing, though it can be used for Firefox as well. There is also a more featured commercial version available.
iMacros – – Rendered test-driven performance automation, a bit more involved but can be useful for automated testing in certain scenarios.

Windows Unicast NLB Performance Tuning

Windows 2000 and later server editions offer a clustering option known as Windows Network Load Balancing (NLB for short). This technology allows for a very cost-effective clustering solution. When working with the lowest common denominator of a switched network architecture, NLB is limited to unicast operation only. As this technology has been out for nearly a decade, I hesitate to bring it up, but in production environments I keep running across performance impacts from its use. Per Microsoft:

In its default unicast mode of operation, Network Load Balancing reassigns the station address (“MAC” address) of the network adapter for which it is enabled (called the cluster adapter), and all cluster hosts are assigned the same MAC address. Incoming packets are thereby received by all cluster hosts and passed up to the Network Load Balancing driver for filtering.

Network Load Balancing’s unicast mode induces switch flooding in order to simultaneously deliver incoming network traffic to all cluster hosts.
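The shared MAC address is what causes the flooding. As I understand Microsoft’s documentation, unicast-mode NLB builds the cluster MAC from the prefix 02-BF followed by the four octets of the cluster’s primary IP address; by default, outgoing frames also have their source MAC masked, so the switch never learns which port the cluster MAC lives on and floods inbound frames out every port. A small sketch of that derivation (the IP below is hypothetical):

```python
import ipaddress

def nlb_unicast_mac(cluster_ip):
    """Derive the cluster MAC that unicast-mode NLB assigns to every host's adapter.

    Assumes the 02-BF-W-X-Y-Z scheme, where W.X.Y.Z is the cluster's primary IP.
    """
    octets = ipaddress.IPv4Address(cluster_ip).packed
    return "-".join(f"{b:02X}" for b in (0x02, 0xBF, *octets))

print(nlb_unicast_mac("192.168.10.50"))  # 02-BF-C0-A8-0A-32
```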

I see or hear of small-to-medium-sized organizations introducing multiple vlans on their networks to improve performance and control broadcast storms, which is a great first step. Where it often stops is at the data center, where a single vlan carries all server traffic. Microsoft mentions that a “port-flooding” condition may occur, but at what level? For example, let’s introduce a pair of IIS NLB clusters into a single vlan with gigabit connectivity, along with fewer than 100 other servers, with traffic in the neighborhood of 200 simultaneous connections or less. Everything still works; performance may seem a little sluggish, but nothing too noticeable.

Now let’s scale the traffic up, either by raising the connections to 1,000+ or by pointing a backup solution at a DNS entry or one of the IPs on the network card that is part of the NLB cluster. You will start to see switch ports lit up like a Christmas tree and periodic dropped packets on the given vlan.

With NLB in unicast mode, every packet received for the cluster is flooded to every port on the vlan, much like a broadcast. Separately, this can introduce security issues for the environment, since any host on the vlan can capture traffic intended only for the cluster.

Graphically, what does this look like? Below you will see an RRD graph of the amount of traffic being sent to a monitoring port on the network; the baseline is 30 to 35 kbps. In this scenario there is one NLB cluster offering up IIS under Server 2003 and a second NLB cluster offering up Microsoft Exchange 2007 CAS/front-end services. Each cluster introduces approximately 15 kbps of traffic to every node on the vlan. You will also notice that, by design, unicast NLB introduces this problem on the receive side only; packet transmission from the cluster does not flood the vlan.

NLB Effect
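A rough model shows why the flooding scales so badly: every port on the vlan receives the sum of all unicast clusters’ inbound traffic, and the switch spends that bandwidth on every port outside each cluster. This is a simplification that assumes one server per port and ignores the ingress port; the two 15 kbps clusters come from the scenario above, while the 2-node clusters and 100-port vlan are hypothetical.

```python
def flooded_kbps_per_port(cluster_inbound_kbps):
    """Rate every port on the vlan receives: the sum of all unicast NLB clusters' inbound traffic."""
    return sum(cluster_inbound_kbps)

def wasted_egress_kbps(clusters, vlan_ports):
    """Switch egress spent delivering cluster traffic to ports outside each cluster.

    clusters: list of (inbound_kbps, member_ports) tuples, one per NLB cluster.
    """
    return sum(rate * (vlan_ports - members) for rate, members in clusters)

clusters = [(15, 2), (15, 2)]  # two clusters at ~15 kbps each, 2 nodes each (hypothetical)
print(flooded_kbps_per_port(rate for rate, _ in clusters))  # 30
print(wasted_egress_kbps(clusters, vlan_ports=100))          # 2940
```

The 30 kbps per port lines up with the graph’s baseline. Swap the 15 kbps for a backup job pushing hundreds of megabits at one cluster, and every port on the vlan has to absorb that same rate, which is consistent with the lit-up ports and dropped packets described above.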

Correcting this design issue is fairly straightforward. By design, each Windows NLB cluster should be isolated in its own vlan to prevent port flooding. If vlan isolation is not an option for weeks or months, for whatever reason, you might be able to reduce the scope of the flooding by adjusting the “Port Rules” option as shown below. For vlan sizing, I would take the node count from your current plans or end-game ideas for the cluster (whichever is larger), double it, add the number of routing virtual IPs from the networking side, and add one for troubleshooting. For smaller clusters, a /28 is sufficient to meet these requirements, allowing for future expansion, cluster node upgrades or replacement, and a spare IP for troubleshooting should a problem arise.
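The sizing rule of thumb above translates directly into a search for the smallest subnet that fits. A sketch, assuming the usual two reserved addresses (network and broadcast) per subnet; the node and router counts in the example are hypothetical.

```python
def vlan_prefix_for_nlb(current_nodes, end_game_nodes, router_vips, spare_ips=1):
    """Smallest IPv4 subnet (largest prefix length) that fits the NLB sizing rule of thumb.

    Rule: take the larger of the current and end-game node counts, double it,
    add the network side's routing virtual IPs, then add spare IP(s) for troubleshooting.
    """
    needed = max(current_nodes, end_game_nodes) * 2 + router_vips + spare_ips
    for prefix in range(30, 0, -1):
        usable = 2 ** (32 - prefix) - 2  # minus network and broadcast addresses
        if usable >= needed:
            return prefix
    raise ValueError("address requirement exceeds IPv4")

# Hypothetical: 2 nodes today, 4 planned, and 3 router addresses
# (for example, two routers plus a shared HSRP/VRRP gateway IP)
print(f"/{vlan_prefix_for_nlb(2, 4, router_vips=3)}")  # /28
```

Here the rule asks for 4 × 2 + 3 + 1 = 12 addresses, which fits in a /28 with its 14 usable hosts.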

Microsoft reference: