.. This work is licensed under a Creative Commons Attribution 3.0 Unported License. http://creativecommons.org/licenses/by/3.0/legalcode ================================================= Distributor for Active-Active, N+1 Amphorae Setup ================================================= .. attention:: Please review the active-active topology blueprint first ( :doc:`active-active-topology` ) https://blueprints.launchpad.net/octavia/+spec/active-active-topology Problem description =================== This blueprint describes how Octavia implements a *Distributor* to support the *active-active* loadbalancer (LB) solution, as described in the blueprint linked above. It presents the high-level Distributor design and suggests high-level code changes to the current code base to realize this design. In a nutshell, in an *active-active* topology, an *Amphora Cluster* of two or more active Amphorae collectively provide the loadbalancing service. It is designed as a 2-step loadbalancing process; first, a lightweight *distribution* of VIP traffic over an Amphora Cluster; then, full-featured loadbalancing of traffic over the back-end members. Since a single loadbalancing service, which is addressable by a single VIP address, is served by several Amphorae at the same time, there is a need to distribute incoming requests among these Amphorae -- that is the role of the *Distributor*. This blueprint uses terminology defined in the Octavia glossary when available, and defines new terms to describe new components and features as necessary. .. _P2: **Note:** Items marked with [P2]_ refer to lower priority features to be designed / implemented only after initial release. Proposed change =============== * Octavia shall implement a Distributor to support the active-active topology. * The operator should be able to select and configure the Distributor (e.g., through an Octavia configuration file or [P2]_ through a flavor framework). * Octavia shall support a pluggable design for the Distributor, allowing different implementations. In particular, the Distributor shall be abstracted through a *driver*, similarly to the current support of Amphora implementations. * Octavia shall support different provisioning types for the Distributor; including VM-based (the default, similar to current Amphorae), [P2]_ container-based, and [P2]_ external (vendor-specific) hardware. * The operator shall be able to configure the distribution policies, including affinity and availability (see below for details). Architecture ------------ High-level Topology Description ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ * The following diagram illustrates the Distributor's role in an active-active topology: :: Front-End Back-End Internet Networks Networks (world) (tenants) (tenants) ║ A B C A B C ┌──╨───┐floating IP ║ ║ ║ ┌────────┬──────────┬────┐ ║ ║ ║ │ ├─ to VIP ──►╢◄──────║───────║──┤f.e. IPs│ Amphorae │b.e.├►╜ ║ ║ │ │ LB A ║ ║ ║ └──┬─────┤ of │ IPs│ ║ ║ │ │ ║ ║ ║ │VIP A│ Tenant A ├────┘ ║ ║ │ GW │ ║ ║ ║ └─────┴──────────┘ ║ ║ │Router│floating IP ║ ║ ║ ┌────────┬──────────┬────┐ ║ ║ │ ├─ to VIP ───║──────►╟◄──────║──┤f.e. IPs│ Amphorae │b.e.├──►╜ ║ │ │ LB B ║ ║ ║ └──┬─────┤ of │ IPs│ ║ │ │ ║ ║ ║ │VIP B│ Tenant B ├────┘ ║ │ │ ║ ║ ║ └─────┴──────────┘ ║ │ │floating IP ║ ║ ║ ┌────────┬──────────┬────┐ ║ │ ├─ to VIP ───║───────║──────►╢◄─┤f.e. IPs│ Amphorae │b.e.├────►╜ └──────┘ LB C ║ ║ ║ └──┬─────┤ of │ IPs│ ║ ║ ║ │VIP C│ Tenant C ├────┘ arp─►╢ arp─►╢ arp─►╢ └─────┴──────────┘ ┌─┴─┐ ║┌─┴─┐ ║┌─┴─┐ ║ │VIP│┌►╜│VIP│┌►╜│VIP│┌►╜ ├───┴┴┐ ├───┴┴┐ ├───┴┴┐ │IP A │ │IP B │ │IP C │ ┌┴─────┴─┴─────┴─┴─────┴┐ │ │ │ Distributor │ │ (multi-tenant) │ └───────────────────────┘ * In the above diagram, several tenants (A, B, C, ...) share the Distributor, yet the Amphorae, and the front- and back-end (tenant) networks are not shared between tenants. (See also "Distributor Sharing" below.) Note that in the initial code implementing the distributor, the distributor will not be shared between tenants, until tests verifying the security of a shared distributor can be implemented. * The Distributor acts as a (one-legged) router, listening on each load balancer's VIP and forwarding to one of its Amphorae. * Each load balancer's VIP is advertised and answered by the Distributor. An ``arp`` request for any of the VIP addresses is answered by the Distributor, hence any traffic sent for each VIP is received by the Distributor (and forwarded to an appropriate Amphora). * ARP is disabled on all the Amphorae for the VIP interface. * The Distributor distributes the traffic of each VIP to an Amphora in the corresponding load balancer Cluster. * An example of high-level data flow: 1. Internet clients access a tenant service through an externally visible floating-IP (IPv4 or IPv6). 2. The GW router maps the floating IP into a loadbalancer's internal VIP on the tenant's front-end network. 3. (1st packet to VIP only) the GW send an ``arp`` request on VIP (tenant front-end) network. The Distributor answers the ``arp`` request with its own MAC address on this network (all the Amphorae on the network can serve the VIP, but do not answer the ``arp``). 4. The GW router forwards the client request to the Distributor. 5. The Distributor forwards the packet to one of the Amphorae on the tenant's front-end network (distributed according some policy, as described below), without changing the destination IP (i.e., still using the VIP). 6. The Amphora accepts the packet and continues the flow on the tenant's back-end network as for other Octavia loadbalancer topologies (non active-active). 7. The outgoing response packets from the Amphora are forwarded directly to the GW router (that is, it does not pass through the Distributor). Affinity of Flows to Amphorae ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - Affinity is required to make sure related packets are forwarded to the same Amphora. At minimum, since TCP connections are terminated at the Amphora, all packets that belong to the same flow must be sent to the same Amphora. Enhanced affinity levels can be used to make sure that flows with similar attributes are always sent to the same Amphora; this may be desired to achieve better performance (see discussion below). - [P2]_ The Distributor shall support different modes of client-to-Amphora affinity. The operator should be able to select and configure the desired affinity level. - Since the Distributor is L3 and the "heavy lifting" is expected to be done by the Amphorae, this specification proposes implementing two practical affinity alternatives. Other affinity alternatives may be implemented at a later time. *Source IP and source port* In this mode, the Distributor must always send packets from the same combination of Source IP and Source port to the same Amphora. Since the Target IP and Target Port are fixed per Listener, this mode implies that all packets from the same TCP flow are sent to the same Amphora. This is the minimal affinity mode, as without it TCP connections will break. *Note*: related flows (e.g., parallel client calls from the same HTML page) will typically be distributed to different Amphorae; however, these should still be routed to the same back-end. This could be guaranteed by using cookies and/or by synchronizing the stick-tables. Also, the Amphorae in the Cluster could be configured to use the same hashing parameters (avoid any random seed) to ensure all make similar decisions. *Source IP* (default) In this mode, the Distributor must always send packets from the same source IP to the same Amphora, regardless of port. This mode allows TLS session reuse (e.g., through session ids), where an abbreviated handshake can be used to improve latency and computation time. The main disadvantage of sending all traffic from the same source IP to the same Amphora is that it might lead to poor load distribution for large workloads that have the same source IP (e.g., workload behind a single nat or proxy). **Note on TLS implications**: In some (typical) TLS sessions, the additional load incurred for each new session is significantly larger than the load incurred for each new request or connection on the same session; namely, the total load on each Amphora will be more affected by the number of different source IPs it serves than by the number of connections. Moreover, since the total load on the Cluster incurred by all the connections depends on the level of session reuse, spreading a single source IP over multiple Amphorae *increases* the overall load on the Cluster. Thus, a Distributor that uniformly spreads traffic without affinity per source IP (e.g., uses per-flow affinity only) might cause an increase in overall load on the Cluster that is proportional to the number of Amphorae. For example, in a scale-out scenario (where a new Amphora is spawned to share the total load), moving some flows to the new Amphora might increase the overall Cluster load, negating the benefit of scaling-out. Session reuse helps with the certificate exchange phase. Improvements in performance with the certificate exchange depend on the type of keys used, and is greatest with RSA. Session reuse may be less important with other schemes; shared TLS session tickets are another mechanism that may circumvent the problem; also, upcoming versions of HA-Proxy may be able to obviate this problem by synchronizing TLS state between Amphorae (similar to stick-table protocol). - Per the agreement at the Mitaka mid-cycle, the default affinity shall be based on source-IP only and a consistent hashing function (see below) shall be used to distribute flows in a predictable manner; however, abstraction will be used to allow other implementations at a later time. Forwarding with OVS and OpenFlow Rules ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ * The reference implementation of the Distributor shall use OVS for forwarding and configure the Distributor through OpenFlow rules. - OpenFlow rules can be implemented by a software switch (e.g., OVS) that can run on a VM. Thus, can be created and managed by Octavia similarly to creation and management of Amphora VMs. - OpenFlow rules are supported by several HW switches, so the same control plane can be used for both SW and HW implementations. * Outline of Rules - A ``group`` with the ``select`` method is used to distribute IP traffic over multiple Amphorae. There is one ``bucket`` per Amphora -- adding an Amphora adds a new ``bucket`` and deleting and Amphora removes the corresponding ``bucket``. - The ``select`` method supports (OpenFlow v1.5) hashed-based selection of the ``bucket``. The hash can be set up to use different fields, including by source IP only (default) and by source IP and source port. - All buckets route traffic back on the in-port (i.e., no forwarding between ports). This ensures that the same front-end network is used (i.e., the Distributor does not route between front-end networks; therefore, does not mix traffic of different tenants). - The ``bucket`` actions do a re-write of the outgoing packets. It supports re-write of the destination MAC to that of the specific Amphora and re-write of the source MAC to that of the Distributor interface (together these MAC re-writes provide L3 routing functionality). *Note:* alternative re-write rules can be used to support other forwarding mechanisms. - OpenFlow rules are also used to answer ``arp`` requests on the VIP. ``arp`` requests for each VIP are captured, re-written as ``arp`` replies with the MAC address of the particular front-end interface and sent back on the in-port. Again, there is no routing between interfaces. * Handling Amphora failure - Initial implementation will assume a fixed size for each cluster (no elasticity). The hashing will be "consistent" by virtue of never changing the number of ``buckets``. If the cluster size is changed on the fly (there should not be an API to do so) then there are no guarantees on shuffling. - If an Amphora fails then remapping cannot be avoided -- all flows of the failed Amphora must be remapped to a different one. Rather than mapping these flows to other active Amphorae in the cluster, the reference implementation will map all flows to the cluster's *standby* Amphora (i.e. the "+1" Amphora in this "N+1" cluster). This ensures that the cluster size does not change. The only change in the OpenFlow rules would be to replace the MAC of the failed Amphora with that of the standby Amphora. - This implementation is very similar to Active-Standby fail-over. There will be a standby Amphora that can serve traffic in case of failure. The differences from Active-Standby is that a single Amphora acts as a standby for multiple ones; fail-over re-routing is handled through the Distributor (rather than by VRRP); and a whole cluster of Amphorae is active concurrently, to enable support of large workloads. - Health Manager will trigger re-creation of a failed Amphora. Once the Amphora is ready it becomes the new *standby* (no changes to OpenFlow rules). - [P2]_ Handle concurrent failure of more than a single Amphora * Handling Distributor failover - To handle the event of a Distributor failover caused by a catastrophic failure of a Distributor, and in order to preserve the client to Amphora affinity when the Distributor is replaced, the Amphora registration process with the Distributor should preserve positional information. This should ensure that when a new Distributor is created, Amphorae will be assigned to the same buckets to which they were previously assigned. - In the reference implementation, we propose making the Distributor API return the complete list of Amphorae MAC addresses with positional information each time an Amphora is registered or unregistered. Specific proposed changes ------------------------- **Note:** These are changes on top of the changes described in the "Active-Active, N+1 Amphorae Setup" blueprint, (see https://blueprints.launchpad.net/octavia/+spec/active-active-topology) * Create flow for the creation of an Amphora cluster with N active Amphora and one extra standby Amphora. Set-up the Amphora roles accordingly. * Support the creation, connection, and configuration of the various networks and interfaces as described in `high-level topology` diagram. The Distributor shall have a separate interface for each loadbalancer and shall not allow any routing between different ports. In particular, when a loadbalancer is created the Distributor should: - Attach the Distributor to the loadbalancer's front-end network by adding a VIP port to the Distributor (the LB VIP Neutron port). - Configure OpenFlow rules: create a group with the desired cluster size and with the given Amphora MACs; create rules to answer ``arp`` requests for the VIP address. **Notes:** [P2]_ It is desirable that the Distributor be considered as a router by Neutron (to handle port security, network forwarding without ``arp`` spoofing, etc.). This may require changes to Neutron and may also mean that Octavia will be a privileged user of Neutron. Distributor needs to support IPv6 NDP [P2_] If the Distributor is implemented as a container then hot-plugging a port for each VIP might not be possible. If DVR is used then routing rules must be used to forward external traffic to the Distributor rather than rely on ``arp``. In particular, DVR messes-up ``noarp`` settings. * Support Amphora failure recovery - Modify the HM and failure recovery flows to add tasks to notify the ACM when ACTIVE-ACTIVE topology is in use. If an active Amphora fails then it needs to be decommissioned on the Distributor and replaced with the standby. - Failed Amphorae should be recreated as a standby (in the new IN_CLUSTER_STANDBY role). The standby Amphora should also be monitored and recovered on failure. * Distributor driver and Distributor image - The Distributor should be supported similarly to an amphora; namely, have its own abstract driver. - Distributor image (for reference implementation) should include OVS with a recent version (>1.5) that supports hash-based bucket selection. As is done for Amphorae, Distributor image should be installed with public keys to allow secure configuration by the Octavia controller. - Reference implementation shall spawn a new Distributor VM as needed. It shall monitor its health and handle recovery using heartbeats sent to the health monitor in a similar fashion to how this is done presently with Amphorae. [P2]_ Spawn a new Distributor if the number VIPs exceeds a given limit (to limit the number of Neutron ports attached to one Distributor). [P2]_ Add configuration options and/or Operator API to allow operator to request a dedicated Distributor for a VIP (or per tenant). * Define a REST API for Distributor configuration (no SSH API). See below for details. * Create data-model for Distributor. Alternatives ------------ TBD Data model impact ----------------- Add table ``distributor`` with the following columns: * id ``(sa.String(36) , nullable=False)`` ID of Distributor instance. * compute_id ``(sa.String(36), nullable=True)`` ID of compute node running the Distributor. * lb_network_ip ``(sa.String(64), nullable=True)`` IP of Distributor on management network. * status ``(sa.String(36), nullable=True)`` Provisioning status * vip_port_ids (list of ``sa.String(36)``) List of Neutron port IDs. New VIFs may be plugged into the Distributor when a new LB is created. We may need to store the Neutron port IDs in order to support fail-over from one Distributor instance to another. Add table ``distributor_health`` with the following columns: * distributor_id ``(sa.String(36) , nullable=False)`` ID of Distributor instance. * last_update ``(sa.DateTime, nullable=False)`` Last time distributor heartbeat was received by a health monitor. * busy ``(sa.Boolean, nullable=False)`` Field indicating a create / delete or other action is being conducted on the distributor instance (ie. to prevent a race condition when multiple health managers are in use). Add table ``amphora_registration`` with the following columns. This describes which Amphorae are registered with which Distributors and in which order: * lb_id ``(sa.String(36) , nullable=False)`` ID of load balancer. * distributor_id ``(sa.String(36) , nullable=False)`` ID of Distributor instance. * amphora_id ``(sa.String(36) , nullable=False)`` ID of Amphora instance. * position ``(sa.Integer, nullable=True)`` Order in which Amphorae are registered with the Distributor. REST API impact --------------- Distributor will be running its own rest API server. This API will be secured using two-way SSL authentication, and use certificate rotation in the same way this is done with Amphorae today. Following API calls will be addressed. 1. Post VIP Plug Adding a VIP network interface to the Distributor involves tasks which run outside the Distributor itself. Once these are complete, the Distributor must be configured to use the new interface. This is a REST call, similar to what is currently done for Amphorae when connecting to a new member network. `lb_id` An identifier for the particular loadbalancer/VIP. Used for subsequent register/unregister of Amphorae. `vip_address` The IP of the VIP (for which IP to answer ``arp`` requests) `subnet_cidr` Netmask for the VIP's subnet. `gateway` Gateway outbound packets from the VIP ip address should use. `mac_address` MAC address of the new interface corresponding to the VIP. `vrrp_ip` In the case of HA Distributor, this contains the IP address that will be used in setting up the allowed address pairs relationship. (See Amphora VIP plugging under the ACTIVE-STANDBY topology for an example of how this is used.) `host_routes` List of routes that should be added when the VIP is plugged. `alg_extras` Extra arguments related to the algorithm that will be used to distribute requests to Amphorae part of this load balancer configuration. This consists of an algorithm name and affinity type. In the initial release of ACTIVE-ACTIVE, the only valid algorithm will be *hash*, and the affinity type may be ``Source_IP`` or [P2]_ ``Source_IP_AND_port``. 2. Pre VIP unplug Removing a VIP network interface will involve several tasks on the Distributor to gracefully roll-back OVS configuration and other details that were set-up when the VIP was plugged in. `lb_id` ID of the VIP's loadbalancer that will be unplugged. 3. Register Amphorae This adds Amphorae to the configuration for a given load balancer. The Distributor should respond with a new list of all Amphorae registered with the Distributor with positional information. `lb_id` ID of the loadbalancer with which the Amphora will be registered `amphorae` List of Amphorae MAC addresses and (optional) position argument in which they should be registered. 4. Unregister Amphorae This removes Amphorae from the configuration for a given load balancer. The Distributor should respond with a new list of all Amphorae registered with the Distributor with positional information. `lb_id` ID of the loadbalancer with which the Amphora will be registered `amphorae` List of Amphorae MAC addresses that should be unregistered with the Distributor. Security impact --------------- The Distributor is designed to be multi-tenant by default. (Note that the first reference implementation will not be multi-tenant until tests can be developed to verify the security of a multi-tenant reference distributor.) Although each tenant has its own front-end network, the Distributor is connected to all, which might allow leaks between these networks. The rationale is two fold: First, the Distributor should be considered as a trusted infrastructure component. Second, all traffic is external traffic before it reaches the Amphora. Note that the GW router has exactly the same attributes; in other words, logically, we can consider the Distributor to be an extension to the GW (or even use the GW HW to implement the Distributor). This approach might not be considered secure enough for some cases, such as, if LBaaS is used for internal tier-to-tier communication inside a tenant network. Some tenants may want their loadbalancer's VIP to remain private and their front-end network to be isolated. In these cases, in order to accomplish active-active for this tenant we would need separate dedicated Distributor instance(s). Notifications impact -------------------- Other end user impact --------------------- Performance Impact ------------------ Other deployer impact --------------------- Developer impact ---------------- Further Discussion ------------------ .. Note:: This section captures some background, ideas, concerns, and remarks that were raised by various people. Some of the items here can be considered for future/alternative design and some will hopefully make their way into, yet to be written, related blueprints (e.g., auto-scaled topology). [P2]_ Handling changes in Cluster size (manual or auto-scaled) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - The Distributor shall support different mechanism for preserving affinity of flows to Amphorae following a *change in the size* of the Amphorae Cluster. - The goal is to minimize shuffling of client-to-Amphora mapping during cluster size changes: * When an Amphora is removed from the Cluster (e.g., due to failure or scale-down action), all its flows are broken; however, flows to other Amphorae should not be affected. Also, if a drain method is used to empty the Amphora of client flows (in the case of a graceful removal), this should prevent disruption. * When an Amphora is *added* to the Cluster (e.g., recovery of a failed Amphora), some new flows should be distributed to the new Amphora; however, most flows should still go to the same Amphora they were distributed to before the new Amphora was added. For example, if the affinity of flows to Amphorae is per Source IP and a new Amphora was just added then the Distributor should forward packets from this IP only one of only two Amphorae: either the same Amphora as before or the Amphora that was added. Using a simple hash to maintain affinity does not meet this goal. For example, suppose we maintain affinity (for a fixed cluster size) using a hash (for randomizing key distribution) as `chosen_amphora_id = hash(sourceIP # port) mod number_of_amphorae`. When a new Amphora is added or remove the number of Amphorae changes; thus, a different Amphora will be chosen for most flows. - Below are the couple of ways to tackle this shuffling problem. *Consistent Hashing* Consistent hashing is a hashing mechanism (regardless if key is based on IP or IP/port) that preserves most hash mappings during changes in the size of the Amphorae Cluster. In particular, for a cluster with N Amphorae that grows to N+1 Amphorae, a consistent hashing function ensures that, with high probability, only 1/N of inputs flows will be re-hashed (more precisely, K/N keys will be rehashed). Note that, even with consistent hashing, some flows will be remapped and there is only a statistical bound on the number of remapped flows. The "classic" consistent hashing algorithm maps both server IDs and keys to hash values and selects for each key the server with the closest hash value to the key hash value. Lookup generally requires O(log N) to search for the "closest" server. Achieving good distribution requires multiple hashes per server (~10s) -- although these can be pre-computed there is an ~10s*N memory footprint. Other algorithms (e.g., MSFT's Magleb) have better performance, but provide weaker guarantees. There are several consistent hashing libraries available. None are supported in OVS. * Ketama https://github.com/RJ/ketama * Openstack swift http://docs.openstack.org/developer/swift/ring.html * Amazon dynamo http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf We should also strongly consider making any consistent hashing algorithm we develop available to all OpenStack components by making it part of an Oslo library. *Rendezvous hashing* This method provides similar properties to Consistent Hashing (i.e., a hashing function that remaps only 1/N of keys when a cluster with N Amphorae grows to N+1 Amphorae. For each server ID, the algorithm concatenates the key and server ID and computes a hash. The server with the largest hash is chosen. This approach requires O(N) for each lookup, but is much simpler to implement and has virtually no memory footprint. Through search-tree encoding of the server IDs it is possible to achieve O(log N) lookup, but implementation is harder and distribution is not as good. Another feature is that more than one server can be chosen (e.g., two largest values) to handle larger loads -- not directly useful for the Distributor use case. *Hybrid, Permutation-based approach* This is an alternative implementation of consistent hashing that may be simpler to implement. Keys are hashed to a set of buckets; each bucket is pre-mapped to a random permutation of the server IDs. Lookup is by computing a hash of the key to obtain a bucket and then going over the permutation selecting the first server. If a server is marked as "down" the next server in the list is chosen. This approach is similar to Rendezvous hashing if each key is directly pre-mapped to a random permutation (and like it allows more than one server selection). If the number of failed servers is small then lookup is about O(1); memory is O(N * #buckets), where the granularity of distribution is improved by increasing the number of buckets. The permutation-based approach is useful to support clusters of fixed size that need to handle a few nodes going down and then coming back up. If there is an assumption on the number of failures then memory can be reduced to O( max_failures * #buckets). This approach seems to suit the Distributor Active-Active use-case for non-elastic workloads. - Flow tracking is required, even with the above hash functions, to handle the (relatively few) remapped flows. If an existing flow is remapped, its TCP connection would break. This is acceptable when an Amphora goes down and it flows are mapped to a new one. On the other hand, it may be unacceptable when an Amphora is added to the cluster and 1/N of existing flows are remapped. The Distributor may support different modes, as follows. *None / Stateless* In this mode, the Distributor applies its most recent forwarding rules, regardless of previous state. Some existing flows might be remapped to a different Amphora and would be broken. The client would have to recover and establish a connection with the new Amphora (it would still be mapped to the same back-end, if possible). Combined with consistent (or similar) hashing, this may be good enough for many web applications that are built for failure anyway, and can restore their state upon reconnect. *Full flow Tracking* In this mode, the Distributor tracks existing flows to provide full affinity, i.e., only new flows can be remapped to different Amphorae. The Linux connection tracking may be used (e.g., through IPTables or through OpenFlow); however, this might not scale well. Alternatively, the Distributor can use an independent mechanism similar to HA-Proxy sticky-tables to track the flows. Note that the Distributor only needs to track the mapping per source IP and source port (unlike Linux connection tracking which follows the TCP state and related connections). *Use Ryu* Ryu is a well supported and tested python binding for issuing OpenFlow commands. Especially since Neutron recently moved to using this for many of the things it does, using this in the Distributor might make sense for Octavia as well. Forwarding Data-path Implementation Alternatives ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The current design uses L2 forwarding based only on L3 parameters and uses Direct Return routing (one-legged). The rational behind this approach is to keep the Distributor as light as possible and have the Amphorae do the bulk of the work. This allows one (or a few) Distributor instance(s) to serve all traffic even for very large workloads. Other approaches are possible. 2-legged Router _______________ - Distributor acts as router, being in-path on both directions. - New network between Distributor and Amphorae -- Only Distributor on VIP subnet. - No need to use MAC forwarding -- use routing rules LVS ___ Use LVS for Distributor. DNS ___ Use DNS for the Distributor. - Use DNS to map to particular Amphorae. Distribution will be of domain name rather than VIP. - No problem with per-flow affinity, as client will use same IP for entire TCP connection. - Need a different public IP for each Amphora (no VIP) Pure SDN ________ - Implement the OpenFlow rules directly in the network, without a Distributor instance. - If the network infrastructure supports this then the Distributor can become more robust and very lightweight, making it practical to have a dedicated Distributor per VIP (only the rules will be dedicated as the network and SDN controller are shared resources) Distributor Sharing ^^^^^^^^^^^^^^^^^^^ - The initial implementation of the Distributor will not be shared between tenants until tests can be written to verify the security of this solution. - The implementation should support different Distributor sharing and cardinality configurations. This includes single-shared Distributor, multiple-dedicated Distributors, and multiple-shared Distributors. In particular, an abstraction layer should be used and the data-model should include an association between the load balancer and Distributor. - A shared Distributor uses the least amount of resources, but may not meet isolation requirements (performance and/or security) or might become a bottleneck. Distributor High-Availability ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - The Distributor should be highly-available (as this is one of the motivations for the active-active topology). Once the initial active-active functionality is delivered, developing a highly available distributor should take a high priority. - A mechanism similar to the VRRP used on ACTIVE-STANDBY topology Amphorae can be used. - Since the Distributor is stateless (for fixed cluster sizes and if no connection tracking is used) it is possible to set up an active-active configuration and advertise more than one Distributor (e.g, for ECMP). - As a first step, the initial implementation will use a single Distributor instance (i.e., will not be highly-available). Health Manager will monitor the Distributor health and initiate recovery if needed. - The implementation should support plugging-in a hardware-based implementation of the Distributor that may have its own high-availability support. - In order to preserve client to Amphora affinity in the case of a failover, a VRRP-like HA Distributor has several options. We could potentially push Amphora registrations to the standby Distributor with the position arguments specified, in order to guarantee the active and standby Distributor always have the same configuration. Or, we could invent and utilize a synchronization protocol between the active and standby Distributors. This will be explored and decided when an HA Distributor specification is written and approved. Implementation ============== Assignee(s) ----------- Work Items ---------- Dependencies ============ Testing ======= * Unit tests with tox. * Function tests with tox. Documentation Impact ==================== References ========== .. [1] https://blueprints.launchpad.net/octavia/+spec/base-image .. [2] https://blueprints.launchpad.net/octavia/+spec/controller-worker .. [3] https://blueprints.launchpad.net/octavia/+spec/amphora-driver-interface .. [4] https://blueprints.launchpad.net/octavia/+spec/controller .. [5] https://blueprints.launchpad.net/octavia/+spec/operator-api .. [6] :doc:`../../api/haproxy-amphora-api` .. [7] https://blueprints.launchpad.net/octavia/+spec/active-active-topology