IIS website performance tuning


I recently had some free time to upgrade the underlying hardware running this site, along with a few other things. The drive upgrades in particular helped processing time a fair amount, but going back and remembering to configure output caching in IIS was a bigger help. In any event, the site should load significantly faster for everyone. IIS output caching is not new by any means; below are some links covering the feature.
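For reference, the caching profiles live under system.webServer in web.config (or can be set through IIS Manager). A minimal sketch; the extension and duration below are placeholder values rather than this site's actual settings:

```xml
<configuration>
  <system.webServer>
    <caching>
      <profiles>
        <!-- Cache dynamic output for 30 seconds, in both user and kernel mode -->
        <add extension=".php"
             policy="CacheForTimePeriod"
             kernelCachePolicy="CacheForTimePeriod"
             duration="00:00:30" />
      </profiles>
    </caching>
  </system.webServer>
</configuration>
```

Kernel-mode caching (the Technet link above) serves hits from HTTP.sys without ever touching the worker process, though it comes with restrictions around features like authentication and query-string variation.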

IIS.Net – Configuring IIS 7 Output Caching

IIS.Net – Dynamic Content Caching

Technet – Kernel-Mode Caching

While output caching has been available for years, many IIS websites still haven’t taken advantage of it.

July 2, 2013 at 7:18 pm Comments (0)

Slow snapshots with VMware

Within vSphere, one of the common features available is the ability to take snapshots. For several years now, taking a snapshot has included an option called “Snapshot the virtual machine’s memory,” which captures a perfect runtime-state snapshot of the target VM.

This feature comes at a price: time. Recently I’ve been going after some of the larger servers in my environment, in this particular test case some new Exchange CAS/HUB servers. Taking a snapshot normally completes within 5 seconds. However, with the given VMs running 4 vCPUs and 8GB of RAM each, taking a snapshot of the VM’s memory was taking over 21 minutes. The issue only shows itself during creation; when merging snapshots back together there is no unusual delay.

Now there is a way to improve the performance; however, it requires editing the vmx config file by hand, via PowerCLI, or via the vSphere Client.

Here’s how with the vSphere client.

With the given VM powered off, edit it and select the options tab.

Select Configuration Parameters and then “Add Row” twice then insert the following:

Name: mainMem.ioBlockPages
Value: 2048

Name: mainMem.iowait
Value: 2

Then select “OK” twice and power the VM back on to test it again.

With the same VM, the second attempt took just under 2 minutes and 20 seconds, saving almost 20 minutes per snapshot.

As with anything, please test this yourself; I would assume your mileage will vary depending upon your configuration. When looking to implement this at a scale of dozens or hundreds of VMs, you would need to leverage PowerCLI to shut down, edit, and power back on each VM.
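The vmx change itself is just two key/value lines, so at fleet scale the edit can also be scripted. Below is a rough sketch of the text manipulation in Python (a hypothetical helper of mine, not a VMware tool; you would still need PowerCLI or similar to power-cycle each VM and push the file back):

```python
def add_snapshot_tuning(vmx_text):
    """Return .vmx content with the two memory-snapshot tuning keys set."""
    settings = {"mainMem.ioBlockPages": "2048", "mainMem.iowait": "2"}
    # Drop any stale copies of the keys before appending the new values
    lines = [line for line in vmx_text.splitlines()
             if line.split("=")[0].strip() not in settings]
    for key, value in settings.items():
        lines.append('{} = "{}"'.format(key, value))
    return "\n".join(lines) + "\n"
```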

July 27, 2012 at 3:13 pm Comments (0)

Bad Fragmentation

Shortly before moving, I ran across the worst-fragmented system (by percentage) I’ve ever seen. To give some background, the system was an IBM x336 running Windows 2000 Server (yes, 2000 Server) and it had been up and in production for 5+ years. As you can see, the 10.6GB C-drive partition is reporting 124% fragmentation!


The screenshot above was taken with an old version of Defraggler (version 1.21); the defrag took about 12 hours.

Why is it that no-one ever does any maintenance on their systems?

November 14, 2011 at 10:06 pm Comments (0)

SSD Notebook Performance

The Lenovo (IBM) T410s notebook ships with an optional 128GB SSD. The particular drive supplied seems to vary between Toshiba and Samsung. When testing read performance, there was no problem cracking 3,800+ IOPS on random seeks, with a sustained read rate of over 128MB/s.


April 5, 2011 at 9:38 pm Comments (0)

Website performance tuning

When developing websites, never forget that page render time is critical. I’ve seen far too often that production code simply performs poorly. Often the blame goes to the servers or the connection to them. Reviewing the error logs will show what issues, if any, are present. If routine automated maintenance is performed on the disks, the database back-end, and memory recycling, then the issue lies elsewhere. More often than not the problem is with the code being rendered to the client. Even if cross-platform browser testing is conducted to verify the rendered formatting, it does nothing to actually benchmark the performance of the site.

Test-Driven Development

Over a number of years I was taught the concept of test-driven development, which for websites can be done with something like SimpleTest for PHP development or NUnit for ASP.NET development. The problem I find with this methodology is that the code generated as a result of the tests is only as good as the developer writing them. The larger problem comes when moving into the web development arena: unless the developer spends all their time working on a single internally developed website framework, developing a full complement of tests internally is a waste of time and development resources.

What a waste

So then what? Often, outside of performing some simple render tests against a couple of different browsers and walking through some of the website forms: nothing, ABSOLUTELY NOTHING. Often this method of testing is considered good enough. But I have to ask: how many visitors to a site do you lose when your homepage is over a megabyte? Typically the developers have a 100Mbit or faster LAN connection to the websites with latency under 2ms. The problem is real-world users don’t have that kind of pipe available. Websites should load quickly for whatever the lowest common user base is. If it’s an internal-only site, what about users coming in via VPN from home? Now try performing your browser testing on a 1Mbit or slower link with at least 100ms of latency and you’ll start to see what I’m getting at. The site no longer performs like it used to.

Make Time

As everyone’s time is limited, you may begin to wonder: where am I going to find the time to perform all this testing? Well, I hate to burst your bubble, but regardless of what type of testing is conducted, some time is always spent. The good news is there are tools freely available (as I know we all work without a budget) to assist with this testing. The first issue which may arise would be corporate policies dictating we only use *browser XYZ*; no other browsers are allowed or should be supported. If you are in this scenario, RUN!….. *just kidding*, talk with your manager about loading Firefox on the development workstations to assist with testing your code.

W3C Validation

The most basic testing comes from an add-on called HTML Validator. This add-on allows you to validate against W3C HTML standards; while not strictly about performance, it helps to minimize cross-browser rendering issues. Below is a screenshot of the window presented when single-clicking the icon within Firefox.
By default this add-on will run on every page, so all you need to look for is the red-and-white “X” in the corner. When double-clicking the icon, it presents a window, as seen below, stating the line of rendered code where the problem can be found, what the problem is, and some documentation about the error being presented.
HTMLValidator - source window
I will occasionally get false positives with this add-on: it is designed to test W3C compliance of generated HTML, so anything beyond HTML may throw warnings or errors (inline CSS comes to mind).

Now for the meat and potatoes.

When looking at website performance, the real test is time. There are basically three things to check:

  1. The number of files being requested
  2. Code complexity
  3. Total page weight

Together, these three variables determine the time it takes for the page to be fully presented to the user.
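Checks 1 and 3 lend themselves to quick scripting even before reaching for the browser tools discussed below. A minimal sketch using only Python’s standard library (deliberately naive: it ignores fonts, iframes, CSS url() references, and so on):

```python
from html.parser import HTMLParser

class PageWeightAudit(HTMLParser):
    """Rough first-pass audit: count the extra files a page will request."""
    def __init__(self):
        super().__init__()
        self.requests = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # img/script pull from src; stylesheets come in via link href
        if tag in ("img", "script") and attrs.get("src"):
            self.requests.append(attrs["src"])
        elif tag == "link" and attrs.get("rel") == "stylesheet" and attrs.get("href"):
            self.requests.append(attrs["href"])

def audit(html):
    parser = PageWeightAudit()
    parser.feed(html)
    return {"files_requested": len(parser.requests),
            "page_weight_bytes": len(html.encode("utf-8"))}
```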

The many requests

The number of files being requested is a latency-centric issue. As a specific page loads, it will reference images, CSS files, JS files, and so forth. Each file must be requested separately, which causes two problems: overall latency build-up (a variation on the “rubber band effect”) and browser connection limits. The rubber band effect in general: when sending or requesting information, I need to bounce between points:

  • Lets start at A
  • then go to B
  • then go to A
  • then go to B
  • then go to C
  • then go to A
  • then go to B
  • then go to C
  • … rinse, wash, repeat until you’re blue in the face

More specifically for our case:

  • Request: somepage.html
  • Response: here’s page somepage.html
  • Request: my.css
  • Request: mysecond.css
  • Response: my.css
  • Response: mysecond.css
  • Request: image_xyz
  • and so on and so forth

The problem here is a lot of general page-request I/O, which results in latency build-up. If your average latency to the server is 100ms and the page is very simple with just one small image, there will be two separate non-concurrent requests, resulting in 200ms of render time before any transmission time. On a CMS-based site there are anywhere from 10 to well over 30 requests, which is why it can take seconds even on a broadband link to load all the components. If there are multiple JavaScript or CSS files, could they be combined on the back-end and pulled down in a single request? Instead of using image slices with a number of different pieces, could parts be rendered with CSS instead?
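That arithmetic generalizes into a back-of-the-envelope model. The “parallel waves” simplification below is mine (real browsers overlap requests less cleanly), but it shows why request count multiplied by latency dominates on a slow link:

```python
import math

def load_time_ms(rtt_ms, asset_count, max_parallel=1):
    """Round trips to fetch a page plus its assets, ignoring transfer time."""
    # One RTT for the page itself, then the assets in parallel "waves"
    waves = math.ceil(asset_count / max_parallel) if asset_count else 0
    return rtt_ms * (1 + waves)
```

At a 100ms round trip, the simple one-image page costs 200ms, a 30-asset CMS page costs over 3 seconds fetched serially, and it still takes a noticeable hit with six connections in flight.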

Code Smell

The second problem is code complexity or inefficiency. Beyond combing through the code line by line for “Code Smell“, take a look with Firebug or HTTPWatch at the time taken to serve each request. If a few outliers take 100ms or more longer than the rest, that is typically a sign of a problem with the given file. When running the rendering tests, make sure to run them a few times to establish a baseline and rule out any abnormalities.
Here you can see a test against my LinkedIn homepage, as I was noticing some slowness today. The sluggishness of the site appears to be due to the RSS-styled feed of my connection updates.

Trim the fat

The last issue is total page weight. Here is where YSlow really shines; take a look specifically at the Statistics and Components tabs for the rendered page.

Uncompressed or poorly compressed images are the largest offenders on most sites I’ve seen. Typically, art received from a graphic arts department comes over as PNG files or uncompressed JPEGs at a resolution many times what is required. Resize the images down, then save them as JPEGs or GIFs with appropriate compression. PNG files can on occasion produce smaller files, but unless you are targeting only browsers which support PNG, leave them for development purposes only.

Next it’s time to look at the code. Assuming you have already looked for and corrected issues with “code smell,” are your files gzipped? Files can be compressed in advance, but on modern servers that very rarely shows any advantage over run-time compression. Compressing all your PHP, ASP, ASPX, CSS, JS, and XML output can save a large amount of file size. You may ask: what about the increased CPU load on the server and client from performing all that compression? Truthfully, the load increase is minimal compared to the time saved, as the slowest link is almost always the user’s internet or network connection.
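To see why run-time compression is worth the CPU, gzip some repetitive text. CSS and markup compress extremely well; the repetition below exaggerates the ratio, but real stylesheets commonly shrink by well over half:

```python
import gzip

# Repetitive rules stand in for a typical stylesheet
css = ("body { margin: 0; padding: 0; font-family: sans-serif; }\n"
       ".header { background: #336699; color: #ffffff; }\n") * 50
raw = css.encode("utf-8")
compressed = gzip.compress(raw)

print("raw:", len(raw), "bytes; gzipped:", len(compressed), "bytes")
```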


Below are some links for the tools listed above, along with a few extras for test automation and SQL injection testing.

Firefox –
Firebug – – there is also FlashFirebug and FirePHP
YSlow –
HTML Validator –
Live Headers –
SQL Inject Me –
HttpWatch basic edition – – for IE performance testing, though it can be used for Firefox as well. There is also a more featured commercial version available.
iMacros – – Rendered test-driven performance automation, a bit more involved but can be useful for automated testing in certain scenarios.

February 7, 2011 at 10:12 pm Comment (1)

Windows Unicast NLB Performance Tuning

Windows Server 2000 and later offer a clustering option known as Windows Network Load Balancing (NLB for short). This technology allows for a very cost-effective clustering solution. When working with the lowest common denominator of a switched network architecture, NLB is limited to unicast operation only. As this technology has been out for nearly a decade, I pause to even bring it up, but in production environments I keep running across performance impacts from its use. Per Microsoft:

In its default unicast mode of operation, Network Load Balancing reassigns the station address (“MAC” address) of the network adapter for which it is enabled (called the cluster adapter), and all cluster hosts are assigned the same MAC address. Incoming packets are thereby received by all cluster hosts and passed up to the Network Load Balancing driver for filtering.

Network Load Balancing’s unicast mode induces switch flooding in order to simultaneously deliver incoming network traffic to all cluster hosts.

I see or hear of small-to-medium-sized organizations introducing multiple vlans on their networks to control broadcast storms, which is a great starting step. Where it often stops is at the data center, where a single vlan carries all server traffic. Microsoft mentions that a “port-flooding” condition may occur, but at what level? For example, let’s introduce a pair of IIS NLB clusters into a single vlan with gigabit connectivity, alongside fewer than 100 other servers and traffic in the neighborhood of 200 simultaneous connections or less. Everything still works; performance may seem a little sluggish, but nothing too noticeable.

Now let’s scale the traffic up, either by upping the connections to 1,000+ or by having a backup solution point to a DNS entry or one of the IPs on the network card which is part of the NLB cluster. You will start to see switch ports light up like a Christmas tree and periodic dropped packets on the given vlan.

With NLB in unicast mode, every packet the cluster receives is flooded to every member of the vlan. Separately, this can introduce security issues for the environment.

Graphically, what does this look like? Below is an RRD graph of the traffic being sent to a monitoring port on the network; the baseline is 30 to 35 kbps. In this scenario there is one NLB cluster offering up IIS under Server 2003 and a second NLB cluster offering up Microsoft Exchange 2007 CAS/Frontend services. Each cluster introduces approximately 15 kbps of traffic to every node on the vlan. You will also notice that, by design, the unicast NLB method introduces this problem on the receive side only; packet transmission from the cluster does not flood the vlan.

NLB Effect

Correction of this design issue is fairly straightforward: each Windows NLB cluster should, by design, sit in an isolated vlan to contain the port-flooding. If vlan isolation is not an option for weeks or months for whatever reason, you might be able to reduce the scope of the flooding by adjusting the “Port Rules” option. For vlan sizing, take whatever your current plans or end-game ideas for the cluster are (whichever is larger), double it, add the number of routing virtual IPs from the networking side, and add one for troubleshooting. For smaller clusters a /28 is sufficient to meet these requirements, allowing for future expansion, cluster node upgrades/replacement, and a spare IP in case a problem should arise.
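The sizing rule can be sanity-checked in a few lines; the four-node cluster and two routing VIPs below are example numbers of mine, not a recommendation:

```python
import ipaddress

def nlb_vlan_ips_needed(planned_nodes, routing_vips):
    # Rule of thumb from above: double the planned node count, add the
    # routing virtual IPs, then one spare address for troubleshooting
    return planned_nodes * 2 + routing_vips + 1

subnet = ipaddress.ip_network("192.0.2.0/28")   # documentation prefix as an example
usable = subnet.num_addresses - 2               # minus network and broadcast
needed = nlb_vlan_ips_needed(planned_nodes=4, routing_vips=2)
print(needed, "IPs needed,", usable, "usable in a /28")
```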

Microsoft reference:

February 2, 2011 at 1:42 pm Comments (0)