How NFS Behavior Changes in a Virtualized Environment

We have a guest blog post today from Mark Gritter, Tintri co-founder and lead architect. Tintri offers a storage solution designed exclusively for virtual machines. The storage appliance can report on bottlenecks from the guest operating system layer down to the storage layer. It can also measure latency for an individual virtual machine or vDisk at every layer of the infrastructure, which lets it pinpoint the source of performance issues.

We are going to have them out to one of our VMUG events in the new year, so we can get a deep dive on their solution.

In the recent Infosmack podcast on “Designing for VMware,” I mentioned that the Network File System protocol (NFS) behaves differently in virtualized environments than it does in conventional use. File-based access typically generates as many metadata operations as data operations. But when NFS is serving virtual disks, read and write operations dominate. This is a good example of how poorly conventional storage matches the virtualized environment, and differences like this one are exactly where Tintri’s focus on VM-aware storage pays off.

Using the Tintri VMstore’s autosupport features, I examined NFS operation counts over the course of a day from several systems, some used internally and some at customer sites. The read/write mix varies a lot across workloads, but reads and writes together account for 99 percent or more of the NFS operations.
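As a rough illustration of the bookkeeping behind this kind of analysis, the sketch below turns per-procedure NFS operation counts into read, write, and metadata fractions. It assumes the counts arrive as a plain dictionary keyed by NFSv3 procedure name (READ, WRITE, GETATTR, and so on); that input format, the function name `op_mix`, and the sample counts are hypothetical, not the VMstore autosupport format or real customer data. Every procedure other than READ and WRITE is lumped into "metadata," which is a simplification.

```python
# Minimal sketch: split NFS calls into data (READ/WRITE) and everything else.
# Assumption: `counts` is a hypothetical dict of NFSv3 procedure name -> call
# count; this is NOT the Tintri autosupport format.


def op_mix(counts: dict[str, int]) -> dict[str, float]:
    """Return the fraction of all calls that were READ, WRITE, or anything else."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no NFS operations were counted")
    reads = counts.get("READ", 0)
    writes = counts.get("WRITE", 0)
    # Simplification: every non-data procedure (GETATTR, LOOKUP, ACCESS, ...)
    # is counted as metadata.
    metadata = total - reads - writes
    return {
        "read": reads / total,
        "write": writes / total,
        "metadata": metadata / total,
    }


if __name__ == "__main__":
    # Illustrative counts only, not measured data.
    sample = {"READ": 600, "WRITE": 395, "GETATTR": 3, "LOOKUP": 1, "ACCESS": 1}
    mix = op_mix(sample)
    print(f"data share: {mix['read'] + mix['write']:.1%}")
```

For a VM datastore, the observations above suggest the combined read and write share would come out at 99 percent or more.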

All of these examples are authentic load data, not benchmarks. The two internal systems are used for build and continuous integration, as well as developer and desktop VMs. The customer systems contain databases, test and development VMs, Web servers, and other application servers.

In contrast, the NFS server Tintri uses internally for user home directories, archives of test results, and other general-purpose file storage shows a proportion of metadata operations two orders of magnitude higher. That system saw a mix of 14 percent reads and 32 percent writes, with metadata operations making up the other 54 percent. Among those metadata operations, the biggest contributors are GETATTR at 36 percent, LOOKUP at 5 percent, and ACCESS at 4 percent. COMMIT came next at 3 percent; the vSphere NFS client does not use this call at all.

This difference in behavior is very typical, and it is reflected in NFS benchmarks as well. The SPEC SFS benchmark mix is 18 percent reads and 9 to 10 percent writes (depending on the version). That leaves an even larger share, about 72 percent of NFS calls, for metadata operations. In SPECsfs2008, the load is dominated by GETATTR and LOOKUP (26 percent and 24 percent, respectively), with ACCESS (11 percent) and SETATTR (4 percent) the other major metadata contributors.
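The metadata figures quoted above are remainders: whatever is not a read or a write. The short sketch below redoes that arithmetic from the percentages in the text; the function and variable names are illustrative choices, and the only inputs are the numbers already quoted.

```python
# All values are percentages of NFS calls, taken from the figures quoted above.


def metadata_share(read_pct: float, write_pct: float) -> float:
    """Percentage of calls left for metadata once reads and writes are removed."""
    return 100.0 - read_pct - write_pct


internal_filer = metadata_share(14, 32)  # Tintri's internal filer: 54.0
spec_sfs = metadata_share(18, 10)        # SPEC SFS with 10 percent writes: 72.0
spec_sfs_alt = metadata_share(18, 9)     # SPEC SFS with 9 percent writes: 73.0

# The four major SPECsfs2008 metadata contributors named in the text.
SPECSFS2008_METADATA = {"GETATTR": 26, "LOOKUP": 24, "ACCESS": 11, "SETATTR": 4}
named = sum(SPECSFS2008_METADATA.values())  # 65

print(f"internal filer metadata: {internal_filer:.0f}%")
print(f"SPEC SFS metadata: {spec_sfs:.0f}% to {spec_sfs_alt:.0f}%")
print(f"SPECsfs2008 named metadata calls: {named}%")
```

The four named calls cover 65 of the roughly 72 to 73 percentage points of metadata, so the remaining metadata procedures each contribute only a few percent.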

If we average the five examples in the graph above and compare the result with both our internal file server and the SPEC SFS benchmark, the balance of operations the storage system must handle differs dramatically.

A conventional filer must be prepared to accept this varied operation mix and perform well on it. Engineering effort spent on all these different code paths may come at the expense of VM performance. For example, a traditional file system might devote substantial system resources to file lookup and access control (such as a large in-memory cache or a dedicated thread pool) that provide no benefit for the VM workload and only add overhead.

The Tintri VMstore, in contrast, was designed from the ground up to work well in the virtualized environment, where data operations dominate the load.