Storage, Storage Networks, & Adapters
Thursday, November 17, 2011
"CLOUD" Infrastructure as a Service (IaaS) and FCoE VN2VN
Friday, June 25, 2010
FCoE Direct End-to-End (aka FCoE VN2VN)
At the latest meeting of the T11.3 standards organization (FC-BB-6 Ad Hoc Working Group) the concept of an FCoE Direct End-to-End protocol was accepted for input into the Workgroup's next standard. It is also known as FCoE VN_Port to VN_Port (FCoE VN2VN). This new function permits FCoE adapters, which are interconnected within the same Layer 2 Lossless Ethernet network, to discover and connect to compatible FCoE adapters -- which have the appropriate Virtual N_Ports -- and then transmit Fibre Channel commands and data via the standard FCoE protocol.
This is all done on a Lossless Ethernet Network without any assistance from a Fibre Channel Switch or an FCoE Switch (called an FC Forwarder -- FCF). All that is needed are the appropriate VN2VN FCoE Adapters and a Lossless Ethernet Layer 2 Network.
Some Open Source FCoE software also exists today that requires only a normal Ethernet NIC to operate standard FCoE protocols (a special Converged Network Adapter -- CNA -- is not required). It is expected that this Open Source software will be updated to also support the new VN2VN function.
The VN2VN (direct End-to-End) function will support 2 types of direct connections:
1. Connections through Lossless Ethernet switches
2. One to One Connection via a single cable (point to point)
The FCoE protocol is made up of 2 types of Ethernet frames (each with its own unique Ethertype):
1. The FCoE Initialization Protocol (FIP) frames
2. The Fibre Channel over Ethernet (FCoE) frames
The FIP packets are only used as part of discovery and connection setup whereas the FCoE packets carry the actual FC commands and data. The new VN2VN functions have only added additional FIP packets, and have left the rest of the protocol unchanged. The new VN2VN FIP packets were needed since in this mode there is no FCF to provide connection services.
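As a rough illustration of how the two frame types are told apart on the wire, here is a minimal sketch that classifies a raw Ethernet frame by its Ethertype. The values 0x8914 (FIP) and 0x8906 (FCoE) are the registered Ethertypes; the function name and the handling of a single optional 802.1Q VLAN tag are assumptions made for this example and are not taken from any particular driver or library.

```python
import struct

ETHERTYPE_FCOE = 0x8906  # frames carrying the encapsulated FC commands and data
ETHERTYPE_FIP = 0x8914   # FCoE Initialization Protocol: discovery and connection setup
ETHERTYPE_VLAN = 0x8100  # 802.1Q tag (assumed here to appear at most once)


def classify_frame(frame: bytes) -> str:
    """Return 'FIP', 'FCoE', or 'other' for a raw Ethernet frame (hypothetical helper)."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    # Bytes 0-5: destination MAC, 6-11: source MAC, 12-13: Ethertype.
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype == ETHERTYPE_VLAN:
        # Skip the 4-byte VLAN tag; the real Ethertype follows it.
        if len(frame) < 18:
            raise ValueError("truncated 802.1Q tag")
        (ethertype,) = struct.unpack_from("!H", frame, 16)
    if ethertype == ETHERTYPE_FIP:
        return "FIP"
    if ethertype == ETHERTYPE_FCOE:
        return "FCoE"
    return "other"
```

In a VN2VN setting only the set of FIP operations grows; a classifier like this would not change, because the Ethertypes themselves stay the same.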
The transfer of FC data and commands via the FCoE protocol -- which was developed in the T11.3 standards organization (FC-BB-5 Ad Hoc Working Group) -- continues to operate as currently specified and will continue unchanged in this new VN2VN environment.
The upper levels of the protocol remain FC, and that means that there continues to be complete compatibility with existing FC & FCoE Device Drivers etc. The vendors are, of course, adding additional management capabilities to exploit the additional capabilities of FCoE, but the command and data protocol do not require any modifications. Likewise, as adapters are updated to support VN2VN mode, the upper layers will retain their current FC compatibility even as additional management capabilities are added to permit ease (and flexibility) of use.
This new VN2VN capability will permit FC protocol to go "Down Market" to entry and Mid Range environments. Yet, as the installation grows it will be able to install FCF switches and thereby obtain the additional functions of a FC network without having to change the server or storage connections.
The new VN2VN capability will be competitive with iSCSI within a Data Center Environment. And I fully expect the Lossless Ethernet Standards, which were focused on 10Gb/s Ethernet, to be offered by various vendors on 1Gb/s networks and switches. This will mean that FCoE VN2VN will operate very well with the Open Source FCoE code and 1Gb/s NICs without the overhead of TCP/IP. This should make the FCoE VN2VN capability very performance competitive with iSCSI.
Stay tuned to this Blog to see how the capability unfolds.
Wednesday, January 13, 2010
FCoE Adapter Based Shortcuts
I posted a previous Blog called "New Extensions for Fibre Channel over Ethernet (FCoE)" which gave folks an advance look at this new direction.
Since that posting there has been a great deal of movement on this front. The author has made a number of proposals (to the T11 FC-BB-6 Ad Hoc Working Group), as have others. At this moment the actions have been coalescing around two related but distinct approaches (both proposed by this author). One approach (nicknamed "Adapter Based Shortcuts" -- ABS), which was described in the previous Blog ("New Extensions …."), involves an FCoE Switch (aka FCF) but only for connection setup etc., and then permits direct Adapter-to-Adapter data and message transfer. The other approach (nicknamed "direct-mode Adapter Based Shortcuts" -- dABS) enables FCoE Adapters (CNAs) to send messages and data to each other without any involvement of an FCoE Switch (FCF).
The value of both of these proposals is that Fibre Channel technology can go "down market" to installations that do not have any significant training on Fibre Channel. The dABS proposal can be used in small installations that need a SAN (Storage Area Network) and only need to have the appropriate Converged Network Adapters -- which support both IP and FCP on Ethernet -- and the new Lossless Ethernet Switches. The ABS proposal requires (in addition) at least a small (maybe 4 ports) FCF Switch for managing the connection process, Zoning protection, etc. (but some FCFs include the Lossless Ethernet Switches as part of the FCF and this holds down the cost and simplifies the overall installation).
Since FCFs are designed to interconnect with normal Fibre Channel switches, or perform all the Fibre Channel functions on Lossless Ethernet, one can see how it might be possible for a small installation to begin with the dABS approach and over time move to an ABS approach and, as the installation continues to grow, ultimately embrace a full FCoE or FC Fabric with all the services of Fibre Channel (including Hard Zoning, Virtualization of Storage Controllers, Encryption at Rest, etc.).
However, even if these approaches are compatible with each other and together permit an easy growth path, the T11 Standardization Committee might not want to pursue both approaches. In that case either the dABS or the ABS approach will probably be accepted.
Stay tuned to this Blog for the results as the smoke begins to clear.
Thursday, September 3, 2009
New Extensions for Fibre Channel over Ethernet (FCoE)
There will still be a requirement for FC Services, such as the Login, log-off, Name Server, and Zoning, etc. but the ULP messages (e.g. SCSI) will be able to travel from End Node to End Node without having to pass through a Fibre Channel Forwarder (FCF) or any other kind of a FC Switch. It might even be possible for existing FCFs to be used for the above mentioned FC services. A single FCF (plus a backup) might be able to support a very large configuration of Hosts and Storage Devices since the data will not be flowing through the FCF.
This new FCoE capability will permit new players into the Data Center Fabric, since the major consideration will be their ability to handle the Lossless Ethernet (10Gig) Switching. There will, of course, be some additional requirements (such as dynamic inspection and ACL building) but it will be primarily about the Lossless Ethernet Switch.
It is also possible that there may be additional features that the new FCoE Converged Network Adapter (CNA) will need to implement in order to permit this type of operation, but this will probably NOT be a show stopper.
Stay tuned here as the various proposals come together to define this exciting new (perhaps game changing) technology.
Thursday, July 9, 2009
RDMA (Remote Direct Memory Access) for the Data Center Ethernet
Now that the T11 Technical Committee has completed its Standards Work on FCoE (Fibre Channel over Ethernet) it is probably time to look at additional technologies that will be able to complement FCoE.
But first, a bit of FCoE history: the T11 Ad Hoc Working Group known as FC-BB-5 has been working since 2007 to define a way that the Fibre Channel (FC) protocol can be carried on an Ethernet infrastructure. As part of that effort they managed to get some complementary work going within the IEEE 802.1 committee. This committee defined what has been called Converged Enhanced Ethernet (CEE), aka Data Center Ethernet or Data Center Bridging, which includes Priority Based Flow Control and a Discovery protocol, among others, that will permit vendors to build what T11 calls a “Lossless Ethernet”.
This “Lossless Ethernet” (CEE) is defined to operate only within a single subnet (no IP routing). When messages are required to be sent beyond a single CEE subnet, one of two things must be true:
- The message is NOT an FCoE message and may therefore transit a router as a normal IP message, where losses may occur.
- Or the message is an FCoE message and may transit what is called an FCF (Fibre Channel Forwarder), which acts like a Router for FC messages. The FC messages may be carried onto another subnet via FCoE on a CEE link or on a Physical FC link.
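The rule above can be written down as a tiny decision function. This is only a sketch of the reasoning in this post; the enum and function names are made up for illustration and do not correspond to any switch or adapter API.

```python
from enum import Enum


class NextHop(Enum):
    CEE_SWITCH = "lossless layer 2 switching inside the subnet"
    IP_ROUTER = "IP router (losses may occur)"
    FCF = "Fibre Channel Forwarder"


def next_hop(is_fcoe: bool, same_subnet: bool) -> NextHop:
    """Pick the kind of device a message crosses next (hypothetical model)."""
    if same_subnet:
        # Inside one CEE subnet everything stays on lossless layer 2 links.
        return NextHop.CEE_SWITCH
    # Leaving the subnet: FCoE has no IP header, so only an FCF can forward it.
    return NextHop.FCF if is_fcoe else NextHop.IP_ROUTER
```

For example, `next_hop(is_fcoe=True, same_subnet=False)` gives `NextHop.FCF`, while ordinary IP traffic leaving the subnet gives `NextHop.IP_ROUTER`.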
The advantage of the FCoE Network is that it’s made up of Lossless Links and Switches, and is primarily being defined for 10Gbps CEE fabrics.
The value of this CEE Network is that it can be used by normal Ethernet packets as well as FCoE packets. This means that it is possible to share the same physical network for all networking requirements. This includes not only the normal Client/Server messaging, but also the Server to Server messaging as well as the Storage Input/Output (I/O).
Because one of the keys to Lossless Ethernet is operating only in a single subnet and not passing through an IP router, it will have a very limited distance capability. However, this limited distance matches the current major Server to Server messaging environments.
One can see examples of Server to Server messaging in the general business environment with the Front-end to Back-end messaging requirements as well as Cluster messaging requirements. However, the most demanding of all Server to Server messaging requirements are found within the environments known as High Performance Computing (HPC) where high performance and low latency is most highly prized. In all these environments you will normally find the Server configurations to be within a single Subnet with as few Switches between the Servers as possible. This is another reason that the CEE environment seems to be very compatible with Enterprise and HPC Server to Server messaging.
Now every vendor’s equipment will, of course, be better than every other vendor’s equipment; however, the goal is clearly to have the total send/receive latency less than 2-4 microseconds so that Server to Server messaging can fully exploit it. As part of the needed infrastructure, some vendors will provide 10GE CEE switches that will operate in the sub microsecond range. With these types of goals and equipment many vendors believe that the latency of the egress/ingress path is the remaining problem to be solved. They believe that the Host side Adapters or Host Network Stacks need to shed the TCP/IP overhead so that these low latencies can be achieved. However, without TCP/IP, Server to Server messaging is only practical if all the connections are built with CEE components, and stay within the same CEE subnet. Whether or not TCP/IP needs to be removed will be covered later, but it is safe to say that a CEE Subnet will eliminate many of the retry and error scenarios so that even TCP/IP will operate well in a CEE environment.
This entire discussion means that since the FCoE protocol is based on the use of a CEE fabric, and since the CEE network includes Priority Based Flow Control, the needs of the Server to Server messaging and the Storage I/O seem to be very compatible.
Taking all of the above in mind the question then comes down to whether the Host adapters in a CEE environment can be shared between the Ethernet based messaging and the Storage I/O. The answer to this seems to be “YES” since a number of vendors are producing such devices which are called Converged Network Adapters (CNAs). These CNA devices are an evolution of FC host adapters that were called Host Bus Adapters (HBAs) but which now also provide the normal Ethernet Network Interface Controller (NIC) functions, and manage to share that NIC with a Fibre Channel (FC) function called FCoE.
The early versions of these CNAs were made up of a NIC chip, an FC chip, and an FCoE encapsulation chip which interfaced the FC function to the NIC chip. Since then most CNA vendors have integrated those functions together into a single chip (aka ASIC). In any event, the same physical port can be used for Normal NIC functions (which might include normal IP and TCP/IP messaging) as well as Storage I/O, all operating at 10Gbps.
The Internet Engineering Task Force (IETF) standards group defined an RDMA protocol that can be used in a general Ethernet environment; this protocol is called iWARP. The “i” stands for Internet and “WARP” is just a cool name (indicating fast, i.e. WARP Drive from Star Trek) and has no acronym meaning. This iWARP standard included techniques and protocols for operating on a normal IP network; this included Ethernet and any other network type that would handle IP protocols. To accomplish this it was necessary to use TCP/IP as the Transport.
In general CNAs, today, do not have the capability of built-in RDMA functions via iWARP. As a result, those installations that wish to have RDMA functions between their servers and also have FC based Storage I/O are not able to consolidate onto the same CNAs. In an Ethernet environment it generally requires an iWARP adapter and a separate CNA/HBA for Storage I/O. It was the search for a complete CEE CNA that caused folks to consider whether it was possible to combine the RDMA functions along with the other capabilities of CNAs.
(It should be noted that it is possible to have a Convergence of RDMA messaging (via iWARP) and Storage I/O if the installation uses iSCSI for their Storage I/O. However, the enterprise business opportunity is usually found in a large physical installation that has FC based Storage devices. Except for HPC environments, the integration of iWARP and iSCSI has not really happened, and it is the large business enterprises where the large profitable opportunity exists.)
TCP/IP has all the necessary things built into its protocol to operate on a lossy and error prone network. As a result, some folks have felt that it has too much overhead for a network of servers which may be located on a single subnet. However, until the creation of the CEE capability as part of the FCoE protocol there was no practical alternative. Now that a CEE fabric can be created and since the key most strenuous low latency requirement seems to be within a single subnet, there are thoughts about how best to place the RDMA protocol on the CEE network. There are currently two proposals that will be discussed here. The first is to just use iWARP as is, and the second is to create a dWARP (the “d” stands for Data Center). The proposals can be summarized as follows:
1. Use iWARP – We already have iWARP defined, and TCP/IP will work very well on a CEE network. The fact that there are no message drops, and the error rate is low, means that TCP/IP is not entering its error path, and the TCP/IP Slow Start and other such capabilities are not used, so they are not impacting the performance of iWARP. Further, the predominant providers of iWARP NICs are offloading the TCP/IP into a TOE (TCP/IP Offload Engine) so the latency is kept to a minimum.
2. Use dWARP – Regardless of whether CEE reduces the Error conditions, there is additional path length that is needed when TCP/IP is used, and that will affect latency in a negative manner. Also there are always fights between the Server OS’s native TCP/IP implementation and the adapter vendor’s TCP/IP, so eliminating TCP/IP will reduce this needless conflict. Further, some vendors believe that including TOE capabilities requires a lot of state maintenance, so that any ASIC implementation will get very large, require much more electrical power, and in general cost more. Hence there arose a wish to create a CEE based RDMA function that did not need a TOE. There are at least two approaches to this (which also keep the RDMA host Interfaces/APIs the same as iWARP) and they are:
   a. Encapsulate the functions either directly onto Ethernet packets or onto a packet with IP headers. In one case you would create your own headers (from scratch) and in the other the headers would be “IP like” headers even if the Ethertype prevented them from being treated and Routed like IP headers. Therefore, one could possibly build a specialty dWARP Router sometime in the future and not have to reinvent the things that have been learned about IP Routing, and perhaps use the same code for many functions if only an IP like header is used.
   b. Exploit the capabilities of FCoE by placing the RDMA functions into FC protocols. In this case it could ride along with all the capabilities being built into Data Center Ethernet and even be able to be forwarded (Routed) to other subnets via an FCF, if that function was ever required.
Either of these dWARP proposals might result in the smallest CNA ASIC chip, since the addition of a TOE would not be required.
At this moment there is work in the T11 Standards group to define dWARP, and as of this writing, it looks like it will take the form of 2b. This means that any installation that wants to use FCoE for its Storage I/O will be able, with little if any additional cost, to have a Low Latency RDMA protocol that can be used on its CEE fabric.
On the other hand, if the installation desires to use RDMA across a routable non CEE network, then iWARP is currently the only game in town. An example of the usefulness of this capability can be seen in Client/Server messaging in which Clients are almost always located outside the Data Center. Unfortunately there are very few Client/Server installations that use iWARP, because:
- The cost of the adapters is relatively high for a Client system.
- At this time, the predominant desktop OS manufacturer (Microsoft) has decided NOT to implement iWARP in software, like they did for iSCSI (Internet Small Computer Systems Interface), so the potential reductions in cost for each Client system (which often has CPU cycles to spare) have not been possible. (This is regrettable since the true value of RDMA in a Client/Server environment is the reduction of overhead in the Server, which could bear the additional cost of a physical iWARP adapter.)
- iWARP client software outside of the Microsoft environment is still very embryonic and is not seeing traction in Enterprise environments.
As a result the business for iWARP outside of a Single Subnet has been very small.
On the other hand, if a software client were commercially available -- for desktop systems -- that might also foster the development of Bridges/Proxies that could sit on the edge of the CEE network and map dWARP server packets into iWARP client packets (and vice versa).
Summary
Without a software implementation for Clients being widely available the primary place where iWARP will be found is within a Data Center and on a single Subnet where Servers send messages to other Servers. That being the case, there is a strong motivation to exploit the capabilities of CEE and integrate these RDMA functions with the current CNAs and permit the complete convergence of the Data Center Ethernet Fabric (CEE) using dWARP enabled CNAs (Converged Network Adapters without a TOE).
Whether or not a CNA using dWARP is a significantly better performer, and is cheaper, than a CNA using iWARP on CEE is yet to be shown; however, this is where the new RDMA messaging battle will be fought.
Tuesday, April 14, 2009
iSCSI vs. NAS
The discussion seems to center around looking at the iSCSI and NAS technologies as if they were interchangeable. It is true that both technologies can be used for reading and writing storage and it is also true that NAS filers (or storage controllers) can do everything that an iSCSI storage controller can do, plus more. However, they are fundamentally different in their structure and as a result are significantly different in what hardware processing capabilities (CPU, Memory, etc.) are required to support their capabilities.
The iSCSI structure is based on the SCSI Block protocol, which is created as a result of application file system calls for Reads or Writes. The NAS (NFS/CIFS) structure is based on special “Client-Server” protocols which are also created as a result of application file system calls.
In the case of NAS the file system work is not really done in the client system, but via the NAS (NFS/CIFS) protocol which invokes various functions in the NAS server’s File System. The file system in the NAS server must then convert these file system functions into a SCSI Block protocol that will in turn access the actual storage device. In other words NAS moves the function of the physical file system from the client into the NAS server appliance. The same physical file system work needs to be done whether it is done in the client or in the NAS appliance.
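To make the "same physical file system work" point concrete, here is a toy sketch of the translation a file system performs when it turns a byte-range read into a block read. With iSCSI this step runs in the client, which then sends the resulting SCSI block request over the network; with NAS the client sends the file-level request and the NAS server runs the same step. The function name, the 512-byte block size, and the assumption that the file occupies one contiguous run of blocks are all simplifications made for this example; real file systems track extents and metadata.

```python
BLOCK_SIZE = 512  # assumed logical block size in bytes


def file_read_to_blocks(start_lba: int, offset: int, length: int,
                        block_size: int = BLOCK_SIZE) -> tuple[int, int]:
    """Map a byte-range read of a contiguous file to (first_lba, block_count)."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    first = start_lba + offset // block_size
    # The last byte touched is offset + length - 1.
    last = start_lba + (offset + length - 1) // block_size
    return first, last - first + 1
```

Either way the mapping has to be computed by somebody; the difference between the two technologies is which box spends the CPU and memory doing it.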

Thursday, April 2, 2009
iSCSI vs. FCoE
I continue to be amused by the people that try to position iSCSI (Internet Small Computer Systems Interface) and FCoE (Fibre Channel over Ethernet) by placing them in conflict with each other. One group might say iSCSI is better than FCoE because …… Another group will say FCoE is better than iSCSI because …. In truth they are both wrong and both right. The appropriate answer depends entirely on the circumstances in which the customer finds themselves.
If a customer has an IT shop which has a small number of servers and a minimum amount of external storage, they should very definitely consider iSCSI and define a SAN (Storage Area Network) with normal Ethernet. An iSCSI network is easily setup and will often be all that is needed for this type of environment. And the cost is usually significantly less than would be the case with a FC fabric.
In my opinion, if the customer has not had a SAN before, they should consider it; especially if they would like to have easy failover, or use some of the new consolidation capabilities of the various Server Virtualization products. In server virtualization environments, the movement of applications (Virtual Machines) quickly and dynamically between physical servers is very valuable, but requires a SAN that connects the physical servers with external storage controllers. Many customers that desire to have this type of consolidation environment are not familiar with Storage Networking -- and iSCSI operating on a 1 Gigabit Ethernet network is not only simple to set up and use, but is usually all that is needed and meets their requirements very well. There is a caution here, and that is regarding the total bandwidth that might be needed after the consolidation of multiple systems/applications into a single physical server. In some cases the consolidation will require more storage bandwidth than can be handled by a simple 1GE network. That means that one will need to multiply the number of 1GE attachments, and increase the bandwidth capability to/from the physical servers. Depending on the approach, this will cause a significant increase either in the processor cycles (in the case of a software iSCSI driver), or in the number or capabilities of the iSCSI Adapters (which will drive up the cost). So it is possible that with the virtualization of servers, one could find that the cost of an iSCSI solution, in terms of processor cycles or adapter cost, will approach that of a FC or FCoE solution. But if the installation is not familiar with storage networking, then only if the installation sees dramatic growth in its future should anything other than iSCSI be seen as the right initial solution.
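The bandwidth caution can be checked with simple arithmetic before consolidating. The sketch below estimates how many 1GE attachments a physical server would need; the function name and the per-VM figures in the usage note are hypothetical numbers chosen only to show the calculation, and the model ignores protocol overhead and load balancing inefficiency.

```python
import math


def links_needed(required_gbps: float, link_gbps: float = 1.0) -> int:
    """Smallest number of links whose combined speed covers the requirement."""
    if required_gbps < 0 or link_gbps <= 0:
        raise ValueError("required_gbps must be >= 0 and link_gbps must be > 0")
    # Always keep at least one attachment, even for a tiny requirement.
    return max(1, math.ceil(required_gbps / link_gbps))
```

For instance, eight consolidated Virtual Machines that each average 0.3Gb/s of storage traffic (an assumed figure) need `links_needed(8 * 0.3)` attachments, which works out to three 1GE links, where a single link would have served any one of them on its own.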
Customers that already have a large server and storage network have probably already established a Fibre Channel (FC) network and are committed to the high bandwidth and low latency that FC provides. These types of IT organizations often have an in-depth knowledge of FC configurations and all that comes with a FC Fabric. It is also not unusual to find FC networks that contain storage functions within the fabric itself (such as Storage Virtualization and Encryption at Rest, etc). That said, many of these organizations still find value in the idea that they might be able to save money by having a common network which includes not only storage access but also the IP messaging that occurs between their servers and clients whether transported across the data center or across the Intranet or Internet. FC over Ethernet (FCoE) is the type of protocol that permits FC to flow over a new type of Ethernet (a Lossless Ethernet within the Data Center), and which also permits the use of other protocols such as TCP/IP etc. The goal of this type of connection is to permit FC protocols and procedures to work with other network protocols. Of course this only makes sense in a FC environment if the speed of the new (lossless) Ethernet fabric is fast enough to carry the required storage bandwidth plus the interactive messaging bandwidth associated with the installation’s IP network. This means that since much of FC is operating at a 4Gb/s (or 8Gb/s) speed, the addition of the IP network will often require an Ethernet Fabric with speeds of 10Gb/s (or more). Hence the FCoE Lossless Ethernet has been initially assumed to be a 10Gb/s fabric.
I expect many FC installations to continue to use normal FC and keep their storage and IP networks separate; however, I also expect a large number of installations to move toward FCoE. Even though most of these FC to FCoE installations will first only convert the “Server Edge” (Server Side connection to the network), some may (over time) extend the Lossless Ethernet throughout their Data Center for both IP and Storage Networks. But whether or not they continue to evolve their FC fabric to an FCoE fabric, the point is that they are quite a different community of customers than those that would be operating an iSCSI network. These customers see FCoE as the simplest way to evolve to an Ethernet based Fabric while keeping the speed and sophistication of their current FC storage network.
So you see it is not iSCSI vs. FCoE; each protocol meets the needs of a different community of customers. Yes, they can both do similar things, but until iSCSI is shown to perform cost effectively at the high speeds and with the low latency of FCoE, in very complex configurations -- which might also have storage related functions within the Fabric -- iSCSI will not quickly move (if ever) into the high end environment. Likewise, FCoE will not move into the low-mid size environment to displace iSCSI unless it can be shown to be as easy to set up and use while maintaining a low cost profile at least equivalent to iSCSI.
So the bottom line is: iSCSI and FCoE are two different tools that can be used to connect and manage external storage, depending on the customer needs. One tool does not meet all needs, so let’s not even go to the question of which is better, iSCSI or FCoE, since it depends on the environment of the IT organization.
…………. John L. Hufferd