Storage, Storage Networks, & Adapters: FCoE


Thursday, November 17, 2011

"CLOUD" Infrastructure as a Service (IaaS) and FCoE VN2VN

When the new FCoE (Fibre Channel over Ethernet) VN2VN (aka Direct End Node to End Node) was defined in the T11.3 FC-BB-6 Ad Hoc Working Group, it was assumed that it would find a niche in the small to medium IT organizations that wanted to have compatibility with Fibre Channel (FC). Though that is still valid, it looks as though it may also be important to some of the new "Cloud" services that provide Infrastructure as a Service (IaaS).

FCoE VN2VN is an additional FCoE protocol which permits FCoE End Nodes such as Servers acting as "Initiators" and FCoE End Nodes such as Storage Controllers acting as "Targets" to either directly attach to each other or attach with only lossless Ethernet switches between them (perhaps as few as one switch between the End Nodes). This form of FCoE does not require any FC/FCoE networking equipment.

FCoE VN2VN permits the IaaS organization to enable their installation to provide storage interconnectivity with FC and/or FCoE. FCoE VN2VN capability can be used to give a customer an FCoE VN2VN connection between the servers and the storage that are supplied by the IaaS provider. This VN2VN interconnect can provide the fastest end-to-end connection with the fewest possible "hops". That is, the data path between the server and the storage unit can pass through perhaps as few as one Lossless Ethernet switch. No FCF (Fibre Channel Forwarder) is required, which means that no additional FC switching processes and overhead are involved in the data path. In addition, the lossless Ethernet switch can be provided by a great number of vendors, thus permitting the lowest possible cost data path. This means that the IaaS provider can give a customer the fastest interconnect at the lowest possible cost.

To enable this type of capability there are certain implications for the configuration of the "Cloud" installation. For example, if the customer would like to purchase infrastructure where the required servers and storage can fit into a single rack (or even a 2-3 rack side-to-side configuration), they are candidates for FCoE VN2VN interconnection. In such a configuration a lossless Ethernet switch can be placed at the top of the Rack (or Rack set) and Ethernet connections run from the servers to the Switch and then to the storage units. For total installation flexibility the Top-of-Rack (ToR) switches may also be physically interconnected to an End-of-Row (EoR) Director class FCoE switch that may have full FCF capabilities. However, the EoR Director would have no direct involvement with the data path for this IaaS rack-set. It is also possible to have a ToR switch at the top of each rack and have them interconnected with each other. In this case, the data path may go through two ToR switches but would still not need to go through the EoR FCoE Director.
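
The hop-count argument can be made concrete with a small sketch. The topology below is purely hypothetical (the node names, the two-rack layout, and the helper functions are my own illustration, not any vendor's tooling): it models a server and storage behind ToR switches that are also cabled to an EoR director, and then lists the switches on the shortest data path.

```python
from collections import deque

# Hypothetical two-rack layout; every node name here is illustrative only.
LINKS = {
    "server-a": ["tor-1"],
    "storage-a": ["tor-1"],
    "storage-b": ["tor-2"],
    "tor-1": ["server-a", "storage-a", "tor-2", "eor-fcf"],
    "tor-2": ["storage-b", "tor-1", "eor-fcf"],
    "eor-fcf": ["tor-1", "tor-2"],
}
SWITCHES = {"tor-1", "tor-2", "eor-fcf"}


def shortest_path(src, dst, links=LINKS):
    """Breadth-first search; returns one shortest path as a list of nodes."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        for nxt in links[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def switch_hops(src, dst):
    """List the switches that sit on the shortest data path."""
    path = shortest_path(src, dst)
    if path is None:
        raise ValueError(f"no path from {src} to {dst}")
    return [node for node in path if node in SWITCHES]


if __name__ == "__main__":
    print(switch_hops("server-a", "storage-a"))  # same rack
    print(switch_hops("server-a", "storage-b"))  # neighbouring rack
```

In this sketch the same-rack path crosses only one ToR switch and the two-rack path crosses both ToR switches, while the EoR director stays cabled in but off the data path.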

So depending on the needs of the customer, and the physical configuration required by the provider, it is possible to obtain the minimum switch/"hop" count and lowest latency interconnect. This means that the provider of IaaS services can "carve-out" a rack or set of racks that can be dedicated to a specific IaaS customer, and give them isolated service. Yet when that customer grows and has a much larger requirement, or leaves the IaaS provider's installation, the provider can easily re-task the servers and storage, or expand to other racks of servers and storage, without needing to physically re-cable the network configuration.

In this example, the IaaS systems and storage are given their own VLANs that can be used by the FCoE VN2VN to permit "direct" connection between the IaaS customer's servers and storage without involvement of other systems within the IaaS providing installation. It should be noted that when the customer either leaves the installation or expands, the provider can re-task the equipment and remove the VLAN specification, and in the case of expansion utilize a regular FCoE interconnect (via the EoR director FCoE switches).
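
As a rough illustration of the "carve-out" bookkeeping, the sketch below tracks which VLAN is dedicated to which customer and frees it again when the customer leaves. The class name, the customer and port names, and the 100..199 ID range are all assumptions made for the example; a real installation would drive its switch management interface rather than a Python dictionary.

```python
class VlanPool:
    """Tracks which VLAN ID is dedicated to which IaaS customer (illustrative)."""

    def __init__(self, first=100, last=199):
        # 802.1Q VLAN IDs are 12 bits wide; 0 and 4095 are reserved.
        if not (1 <= first <= last <= 4094):
            raise ValueError("VLAN range must lie within 1..4094")
        self._free = list(range(first, last + 1))
        self._by_customer = {}

    def carve_out(self, customer, ports):
        """Dedicate the next free VLAN to a customer's server/storage ports."""
        if customer in self._by_customer:
            raise ValueError(f"{customer} already has a VLAN")
        if not self._free:
            raise RuntimeError("no free VLANs left in the pool")
        vlan = self._free.pop(0)
        self._by_customer[customer] = (vlan, frozenset(ports))
        return vlan

    def release(self, customer):
        """Remove the VLAN when the customer leaves; return ports to re-task."""
        vlan, ports = self._by_customer.pop(customer)
        self._free.append(vlan)
        return ports

    def same_vlan(self, port_a, port_b):
        """True if both ports belong to the same customer's VLAN."""
        return any(
            port_a in ports and port_b in ports
            for _, ports in self._by_customer.values()
        )


if __name__ == "__main__":
    pool = VlanPool()
    pool.carve_out("customer-1", ["server-1", "storage-1"])
    pool.carve_out("customer-2", ["server-2", "storage-2"])
    print(pool.same_vlan("server-1", "storage-1"))
    print(pool.same_vlan("server-1", "storage-2"))
```

The point of the sketch is only that isolation and re-tasking are table updates; no cables move when a customer arrives or leaves.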

Likewise, a company often needs to provide IaaS-like services to internal departments which, for various technical or "political" reasons, must be given dedicated server and storage rack(s) that function as isolated environments for their departments and projects. This then becomes an internal IaaS "Cloud" environment in which FCoE VN2VN can often be an appropriate solution.

But independent of the internal or external "Cloud" IaaS environments FCoE VN2VN is still appropriate for the smaller computing environments such as "Big Box" stores, "Disaster Recovery Trailers" and small to medium IT installations.

In smaller organizations such as local "Big Box" stores, they can have their whole data center located in a single rack which has the appropriate servers and storage all inclusive. In this type of configuration the various Server vendors can be asked to bid on the "total rack" that includes FCoE VN2VN, and often obtain a "total solution" at a minimum cost. I was once associated with an organization that wanted to sell such configurations to the big box stores but was deterred because of the cost of the Fibre Channel Connections and switches. That concern is no longer relevant when FCoE and VN2VN connections, within the rack, are utilized.

I also understand that various "disaster recovery trailers" can utilize such configurations in their trailers when they are used to provide temporary IT service to big box stores (and others) after various disasters.

And, of course, when it comes to small to medium IT installations (ones that fit within a single Rack or a few Racks), FCoE VN2VN configurations seem to offer a high performing, low cost storage interconnect solution that is compatible with future growth into a full FCoE or FC installation. These types of installations may also be seen as a valuable asset that can easily be integrated during a merger or buy-out with larger organizations that probably have an FC and/or FCoE installation.

Friday, June 25, 2010

FCoE Direct End-to-End (aka FCoE VN2VN)

Blog on June 25, 2010

At the latest meeting of the T11.3 standards organization (FC-BB-6 Ad Hoc Working Group) the concept of an FCoE Direct End-to-End protocol was accepted for input into the Workgroup's next standard. It is also known as FCoE VN_Port to VN_Port (FCoE VN2VN). This new function permits FCoE adapters, which are interconnected within the same Layer 2 Lossless Ethernet network, to discover and connect to compatible FCoE adapters -- which have the appropriate Virtual N_Ports -- and then transmit Fibre Channel commands and data via the standard FCoE protocol.

This is all done on a Lossless Ethernet Network without any assistance from a Fibre Channel Switch or an FCoE Switch (called an FC Forwarder -- FCF). All that is needed is the appropriate VN2VN FCoE Adapters and a Lossless Ethernet layer 2 Network.

There also exists, today, some Open Source FCoE software that only requires a normal Ethernet NIC to operate standard FCoE protocols (a special Converged Network Adapter -- CNA -- is not required). It is expected that this Open Source software will be updated to also support the new VN2VN function.

The VN2VN (direct End-to-End) function will support 2 types of direct connections:
1. Connections through Lossless Ethernet switches
2. One to One Connection via a single cable (point to point)

The FCoE protocol is made up of 2 types of Ethernet frames (which have their own unique Ethertypes):
1. The FCoE Initialization Protocol (FIP) frames
2. The Fibre Channel over Ethernet (FCoE) frames

The FIP packets are only used as part of discovery and connection setup whereas the FCoE packets carry the actual FC commands and data. The new VN2VN functions have only added additional FIP packets, and have left the rest of the protocol unchanged. The new VN2VN FIP packets were needed since in this mode there is no FCF to provide connection services.
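
Because the two frame types carry their own Ethertypes (0x8914 for FIP and 0x8906 for FCoE), a receiver can separate discovery traffic from data traffic by looking at a single header field. The following is a minimal sketch of that check; the function names and the all-zero MAC addresses in the test frames are my own illustration, and it only handles an untagged frame or a single 802.1Q tag.

```python
import struct
from typing import Optional

ETHERTYPE_VLAN = 0x8100  # IEEE 802.1Q tag
ETHERTYPE_FCOE = 0x8906  # FCoE frames (FC commands and data)
ETHERTYPE_FIP = 0x8914   # FCoE Initialization Protocol frames


def classify(frame: bytes) -> str:
    """Return 'FIP', 'FCoE', or 'other' for a raw Ethernet frame."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype == ETHERTYPE_VLAN:
        # Skip the 4-byte 802.1Q tag to reach the inner Ethertype.
        if len(frame) < 18:
            raise ValueError("truncated 802.1Q tag")
        (ethertype,) = struct.unpack_from("!H", frame, 16)
    if ethertype == ETHERTYPE_FIP:
        return "FIP"
    if ethertype == ETHERTYPE_FCOE:
        return "FCoE"
    return "other"


def make_frame(ethertype: int, vlan: Optional[int] = None,
               payload: bytes = b"") -> bytes:
    """Build a toy frame with zeroed MAC addresses, for demonstration only."""
    header = bytes(6) + bytes(6)
    if vlan is not None:
        header += struct.pack("!HH", ETHERTYPE_VLAN, vlan & 0x0FFF)
    return header + struct.pack("!H", ethertype) + payload


if __name__ == "__main__":
    print(classify(make_frame(ETHERTYPE_FIP)))
    print(classify(make_frame(ETHERTYPE_FCOE, vlan=100)))
    print(classify(make_frame(0x0800)))
```

Since VN2VN only adds new FIP operations, a check like this would not need to change for VN2VN traffic; the new behavior lives inside the FIP payload, which this sketch does not parse.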

The transfer of FC data and commands via the FCoE protocol -- which was developed in the T11.3 standards organization (FC-BB-5 Ad Hoc Working Group) -- continues to operate as currently specified and will continue unchanged in this new VN2VN environment.

The upper levels of the protocol remain FC, and that means that there continues to be complete compatibility with existing FC & FCoE Device Drivers etc. The vendors are, of course, adding additional management capabilities to exploit the additional capabilities of FCoE, but the command and data protocol do not require any modifications. Likewise, as adapters are updated to support VN2VN mode, the upper layers will retain their current FC compatibility even as additional management capabilities are added to permit ease (and flexibility) of use.

This new VN2VN capability will permit the FC protocol to go "Down Market" to entry and Mid Range environments. Yet, as the installation grows it will be able to install FCF switches and thereby obtain the additional functions of an FC network without having to change the server or storage connections.

The new VN2VN capability will be competitive with iSCSI within a Data Center Environment. And I fully expect the Lossless Ethernet Standards, which were focused on 10Gb/s Ethernet, to be offered by various vendors on 1Gb/s networks and switches. This will mean that FCoE VN2VN will operate very well with the Open Source FCoE code and 1Gb/s NICs without the overhead of TCP/IP. This should make the FCoE VN2VN capability very performance competitive with iSCSI.

Stay tuned to this Blog to see how the capability unfolds.

Thursday, July 9, 2009

RDMA (Remote Direct Memory Access) for the Data Center Ethernet

Now that the T11 Technical Committee has completed its Standards Work on FCoE (Fibre Channel over Ethernet) it is probably time to look at additional technologies that will be able to complement FCoE.

But first, a bit of FCoE history: the T11 Ad Hoc Working Group known as FC-BB-5 has been working since 2007 to define a way that the Fibre Channel (FC) protocol can be carried on an Ethernet infrastructure. As part of that effort they managed to get some complementary work going within the IEEE 802.1 committee. This committee defined what has been called Converged Enhanced Ethernet (CEE), aka Data Center Ethernet or Data Center Bridging, which includes Priority Based Flow Control and a Discovery protocol, among others, that will permit vendors to build what T11 calls a “Lossless Ethernet”.

This “Lossless Ethernet” (CEE) is defined to operate only within a single subnet (no IP routing). When messages are required to be sent beyond a single CEE subnet, one of two things must be true:

1. The message is NOT an FCoE message and may therefore transit a router as a normal IP message, where losses may occur.
2. The message is an FCoE message and may transit what is called an FCF (Fibre Channel Forwarder), which acts like a Router for FC messages. The FC messages may be carried onto another subnet via FCoE on a CEE link or on a Physical FC link.
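
The rule above is small enough to write down as a function. This is only a sketch of the decision as I have described it; the function name and the returned labels are made up for the example.

```python
def forwarding_element(is_fcoe: bool, same_cee_subnet: bool) -> str:
    """Name the kind of device that carries a message toward its destination."""
    if same_cee_subnet:
        # Inside one CEE subnet everything rides the lossless Layer 2 switches.
        return "lossless Ethernet switch"
    # Leaving the subnet: FC traffic needs an FCF, everything else an IP router.
    return "FCF" if is_fcoe else "IP router"


if __name__ == "__main__":
    for fcoe in (True, False):
        for local in (True, False):
            print(fcoe, local, forwarding_element(fcoe, local))
```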

The advantage of the FCoE Network is that it’s made up of Lossless Links and Switches, and is primarily being defined for 10Gbps CEE fabrics.

The value of this CEE Network is that it can be used by normal Ethernet packets as well as FCoE packets. This means that it is possible to share the same physical network for all networking requirements. This includes not only the normal Client/Server messaging, but also the Server to Server messaging as well as the Storage Input/Output (I/O).

Because one of the keys to Lossless Ethernet is operating only in a single subnet and not passing through an IP router, it will have a very limited distance capability. However, this limited distance matches the current major Server to Server messaging environments.

One can see examples of Server to Server messaging in the general business environment with the Front-end to Back-end messaging requirements as well as Cluster messaging requirements. However, the most demanding of all Server to Server messaging requirements are found within the environments known as High Performance Computing (HPC), where high performance and low latency are most highly prized. In all these environments you will normally find the Server configurations to be within a single Subnet with as few Switches between the Servers as possible. This is another reason that the CEE environment seems to be very compatible with Enterprise and HPC Server to Server messaging.

Now every vendor’s equipment will, of course, be better than every other vendor’s equipment; however, the goal is clearly to have the total send/receive latency less than 2-4 microseconds so that Server to Server messaging can fully exploit it. As part of the needed infrastructure, some vendors will provide 10GE CEE switches that will operate in the sub-microsecond range. With these types of goals and equipment many vendors believe that the latency of the egress/ingress path is the remaining problem to be solved. They believe that the Host side Adapters or Host Network Stacks need to shed the TCP/IP overhead so that these low latencies can be achieved. However, without TCP/IP, Server to Server messaging is only practical if all the connections are built with CEE components, and stay within the same CEE subnet. Whether or not TCP/IP needs to be removed will be covered later, but it is safe to say that a CEE Subnet will eliminate many of the retry and error scenarios so that even TCP/IP will operate well in a CEE environment.

This entire discussion means that since the FCoE protocol is based on the use of a CEE fabric, and since the CEE network includes Priority Based Flow Control, the needs of the Server to Server messaging and the Storage I/O seem to be very compatible.

With all of the above in mind, the question then comes down to whether the Host adapters in a CEE environment can be shared between the Ethernet based messaging and the Storage I/O. The answer to this seems to be “YES” since a number of vendors are producing such devices, which are called Converged Network Adapters (CNAs). These CNA devices are an evolution of FC host adapters that were called Host Bus Adapters (HBAs) but which now also provide the normal Ethernet Network Interface Controller (NIC) functions, and manage to share that NIC with a Fibre Channel (FC) function called FCoE.

The early versions of these CNAs were made up of a NIC chip, an FC chip, and an FCoE encapsulation chip which interfaced the FC function to the NIC chip. Since then most CNA vendors have integrated those functions together into a single chip (aka ASIC). In any event, the same physical port can be used for Normal NIC functions (which might include normal IP and TCP/IP messaging) as well as Storage I/O, all operating at 10Gbps.

The Internet Engineering Task Force (IETF) standards group defined an RDMA protocol that can be used in a general Ethernet environment; this protocol is called iWARP. The “i” stands for Internet and “WARP” is just a cool name (indicating fast, i.e. Warp Drive from Star Trek) and has no acronym meaning. This iWARP standard included techniques and protocols for operating on a normal IP network; this included Ethernet and any other network type that would handle IP protocols. To accomplish this it was necessary to use TCP/IP as the Transport.

In general CNAs, today, do not have the capability of built-in RDMA functions via iWARP. As a result, those installations that wish to have RDMA functions between their servers and also have FC based Storage I/O are not able to consolidate onto the same CNAs. In an Ethernet environment it generally requires an iWARP adapter and a separate CNA/HBA for Storage I/O. It was the search for a complete CEE CNA that caused folks to consider whether it was possible to combine the RDMA functions along with the other capabilities of CNAs.

(It should be noted that it is possible to have a Convergence of RDMA messaging (via iWARP) and Storage I/O if the installation uses iSCSI for their Storage I/O. However, the enterprise business opportunity is usually found in a large physical installation that has FC based Storage devices. Except for HPC environments, the integration of iWARP and iSCSI has not really happened, and it is the large business enterprises where the large profitable opportunity exists.)

TCP/IP has all the necessary things built into its protocol to operate on a lossy and error prone network. As a result, some folks have felt that it has too much overhead for a network of servers which may be located on a single subnet. However, until the creation of the CEE capability as part of the FCoE protocol there was no practical alternative. Now that a CEE fabric can be created, and since the most strenuous low latency requirement seems to be within a single subnet, there are thoughts about how best to place the RDMA protocol on the CEE network. There are currently two proposals that will be discussed here. The first is to just use iWARP as is, and the second is to create a dWARP (the “d” stands for Data Center). The proposals can be summarized as follows:

1. Use iWARP – We already have iWARP defined, and TCP/IP will work very well on a CEE network. Because there are no message drops and the error rate is low, TCP/IP does not enter its error path, and TCP/IP Slow Start and other such capabilities are not used, so they do not impact the performance of iWARP. Further, the predominant providers of iWARP NICs are offloading the TCP/IP into a TOE (TCP/IP Offload Engine) so the latency is kept to a minimum.

2. Use dWARP – Regardless of whether CEE reduces the Error conditions, there is additional path length that is needed when TCP/IP is used, and that will affect latency in a negative manner. Also, there are always fights between the Server OS’s native TCP/IP implementation and the adapter Vendor’s TCP/IP, so eliminating TCP/IP will remove this needless conflict. Further, some vendors believe that including TOE capabilities requires a lot of state maintenance, so any ASIC implementation will get very large, require much more electrical power, and in general cost more. Hence there arose a wish to create a CEE based RDMA function that did not need a TOE. There are at least two approaches to this (both of which keep the RDMA host Interfaces/APIs the same as iWARP):
   a. Encapsulate the functions either directly onto Ethernet packets or onto a packet with IP headers. In one case you would create your own headers (from scratch) and in the other the headers would be “IP like” headers even if the Ethertype prevented them from being treated and Routed like IP headers. Therefore, one could possibly build a specialty dWARP Router sometime in the future and not have to reinvent the things that have been learned about IP Routing, and perhaps use the same code for many functions if only an IP like header is used.
   b. Exploit the capabilities of FCoE by placing the RDMA functions into FC protocols. In this case it could ride along with all the capabilities being built into Data Center Ethernet and even be able to be forwarded (Routed) to other subnets via an FCF, if that function was ever required.
   Either of these dWARP proposals might result in the smallest CNA ASIC chip, since the addition of a TOE would not be required.

At this moment there is work in the T11 Standards group to define dWARP, and as of this writing, it looks like it will take the form of 2b (the second dWARP approach above, placing the RDMA functions into FC protocols). This means that any installation that wants to use FCoE for its Storage I/O will be able, with little if any additional cost, to have a Low Latency RDMA protocol that can be used on its CEE fabric.

On the other hand, if the installation desires to use RDMA across a routable non CEE network, then iWARP is currently the only game in town. An example of the usefulness of this capability can be seen in Client/Server messaging in which Clients are almost always located outside the Data Center. Unfortunately there are very few Client/Server installations that use iWARP, for the following reasons:

- The cost of the adapters is relatively high for a Client system.
- At this time, the predominant desktop OS manufacturer (Microsoft) has decided NOT to implement iWARP in software, as it did for iSCSI (Internet Small Computer Systems Interface), so the potential reduction in cost for each Client system (which often has CPU cycles to spare) has not been possible. (This is regrettable since the true value of RDMA in a Client/Server environment is the reduction of overhead in the Server, which could bear the additional cost of a physical iWARP adapter.)
- iWARP client software outside of the Microsoft environment is still very embryonic and is not seeing traction in Enterprise environments.

As a result, the business for iWARP outside of a Single Subnet has been very small.

On the other hand, if a software client were commercially available -- for desktop systems -- that might also foster the development of Bridges/Proxies that could sit on the edge of the CEE network and map dWARP server packets into iWARP client packets (and vice versa).

Summary

Without a software implementation for Clients being widely available the primary place where iWARP will be found is within a Data Center and on a single Subnet where Servers send messages to other Servers. That being the case, there is a strong motivation to exploit the capabilities of CEE and integrate these RDMA functions with the current CNAs and permit the complete convergence of the Data Center Ethernet Fabric (CEE) using dWARP enabled CNAs (Converged Network Adapters without a TOE).

Whether or not a CNA using dWARP is a significantly better performer, and is cheaper than a CNA using iWARP on CEE, is yet to be shown. However, this is where the new RDMA messaging battle will be fought.

Thursday, April 2, 2009

iSCSI vs FCoE

Blog on -- 2 April 2009

I continue to be amused by the people that try to position iSCSI (Internet Small Computer Systems Interface) and FCoE (Fibre Channel over Ethernet) by placing them in conflict with each other. One group might say iSCSI is better than FCoE because ... Another group will say FCoE is better than iSCSI because ... In truth they are both wrong and both right. The truth depends entirely on the circumstances in which the customer finds themselves.
If a customer has an IT shop which has a small number of servers and a minimum amount of external storage, they should very definitely consider iSCSI and define a SAN (Storage Area Network) with normal Ethernet. An iSCSI network is easily set up and will often be all that is needed for this type of environment. And the cost is usually significantly less than would be the case with an FC fabric.
In my opinion, if the customer has not had a SAN before, they should consider it, especially if they would like to have easy failover, or use some of the new consolidation capabilities of the various Server Virtualization products. In server virtualization environments, the movement of applications (Virtual Machines) quickly and dynamically between physical servers is very valuable, but requires a SAN that connects the physical servers with external storage controllers. Many customers that desire to have this type of consolidation environment are not familiar with Storage Networking -- and iSCSI operating on a 1 Gigabit Ethernet network is not only simple to set up and use, but is usually all that is needed and meets their requirements very well. There is a caution here, and that is in regard to the total bandwidth that might be needed after the consolidation of multiple systems/applications into a single physical server. In some cases the consolidation will require more storage bandwidth than can be handled by a simple 1GE network. That means that one will need to multiply the number of 1GE attachments, and increase the bandwidth capability to/from the physical servers. Depending on the approach, this will require either a significant increase in processor cycles (in the case of a software iSCSI driver), or an increase in the number or capabilities of the iSCSI Adapters (which will drive up the cost). So it is possible that with the virtualization of servers, one could find that the cost of an iSCSI solution, in terms of processor cycles or adapter cost, will approach that of an FC or FCoE solution. But if the installation is not familiar with storage networking, then only if it sees dramatic growth in its future should anything other than iSCSI be seen as the right initial solution.
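The bandwidth caution can be checked with simple arithmetic before consolidating. The sketch below adds up the storage demand of the Virtual Machines headed for one physical server and rounds up to a number of 1GE attachments. The function name, the per-VM figures, and the assumption that only about 800 Mb/s of each 1GE link is usable for storage are all hypothetical; substitute measured numbers for a real installation.

```python
import math


def links_needed(vm_demands_mbps, usable_mbps_per_link=800):
    """Number of 1GE attachments needed for the combined storage demand.

    usable_mbps_per_link is an assumed planning figure, not a measured one.
    """
    if usable_mbps_per_link <= 0:
        raise ValueError("usable bandwidth per link must be positive")
    total = sum(vm_demands_mbps)
    # Always keep at least one attachment, even for a very light load.
    return max(1, math.ceil(total / usable_mbps_per_link))


if __name__ == "__main__":
    # Hypothetical storage demand (in Mb/s) of four consolidated VMs.
    print(links_needed([300, 400, 500, 600]))
```

With these made-up figures the four Virtual Machines together need 1800 Mb/s, which no longer fits a single 1GE attachment; that is exactly the point at which the extra adapter cost or processor cycles start to appear.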
Customers that already have a large server and storage network have probably already established a Fibre Channel (FC) network and are committed to the high bandwidth and low latency that FC provides. These types of IT organizations often have an in-depth knowledge of FC configurations and all that comes with an FC Fabric. It is also not unusual to find FC networks that contain storage functions within the fabric itself (such as Storage Virtualization and Encryption at Rest, etc). That said, many of these organizations still find value in the idea that they might be able to save money by having a common network which includes not only storage access but also the IP messaging that occurs between their servers and clients, whether transported across the data center or across the Intranet or Internet. FC over Ethernet (FCoE) is the type of protocol that permits FC to flow over a new type of Ethernet (a Lossless Ethernet within the Data Center), and which also permits the use of other protocols such as TCP/IP etc. The goal of this type of connection is to permit FC protocols and procedures to work with other network protocols. Of course this only makes sense in an FC environment if the speed of the new (lossless) Ethernet fabric is fast enough to carry the required storage bandwidth plus the interactive messaging bandwidth associated with the installation’s IP network. This means that since much of FC is operating at a 4Gb/s (or 8Gb/s) speed, the addition of the IP network will often require an Ethernet Fabric with speeds of 10Gb/s (or more). Hence the FCoE Lossless Ethernet has been initially assumed to be a 10Gb/s fabric.
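The sizing argument for the converged fabric is again just addition. As a hedged sketch, the function below picks the smallest Ethernet speed that can carry the FC storage bandwidth plus the IP messaging bandwidth on one link; the function name, the list of candidate speeds, and the example loads are assumptions for illustration, and protocol overhead and headroom are ignored.

```python
# Candidate Ethernet link speeds in Gb/s (an assumed list for this example).
ETHERNET_SPEEDS_GBPS = (1, 10, 40, 100)


def converged_link_speed(storage_gbps, ip_gbps, speeds=ETHERNET_SPEEDS_GBPS):
    """Smallest listed speed that carries storage plus IP traffic together."""
    required = storage_gbps + ip_gbps
    for speed in sorted(speeds):
        if speed >= required:
            return speed
    raise ValueError(f"no listed speed can carry {required} Gb/s")


if __name__ == "__main__":
    print(converged_link_speed(storage_gbps=4, ip_gbps=1))
    print(converged_link_speed(storage_gbps=8, ip_gbps=1))
```

Either FC speed plus even a modest IP load lands above 1Gb/s and within 10Gb/s, which is why the converged fabric was assumed to start at 10Gb/s.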
I expect many FC installations to continue to use normal FC and keep their storage and IP networks separate; however, I also expect a large number of installations to move toward FCoE. Even though most of these FC to FCoE installations will first only convert the “Server Edge” (Server Side connection to the network), some may (over time) extend the Lossless Ethernet throughout their Data Center for both IP and Storage Networks. But whether or not they continue to evolve their FC fabric to an FCoE fabric, the point is they are quite a different community of customers than those that would be operating an iSCSI network. These customers see FCoE as the simplest way to evolve to an Ethernet based Fabric while keeping the speed and sophistication of their current FC storage network.
So you see, it is not iSCSI vs FCoE; each protocol meets the needs of a different community of customers. Yes, they can both do similar things, but until iSCSI is shown to perform cost effectively at the high speeds and with the low latency of FCoE, in very complex configurations -- which might also have storage related functions within the Fabric -- iSCSI will not quickly move (if ever) into the high end environment. Likewise, FCoE will not move into the low-mid size environment to displace iSCSI unless it can be shown to be as easy to set up and use while maintaining a low cost profile at least equivalent to iSCSI.
So the bottom line is: iSCSI and FCoE are two different tools that can be used to connect and manage external storage, depending on the customer needs. One tool does not meet all needs, so let’s not even go to the question of which is better iSCSI or FCoE since it depends on the environment of the IT organization.
…………. John L. Hufferd
