ORCA uses what we might call the "coordinator" approach, similar to what
you describe. But rather than defining a separate stitching manager
service to function as the coordinator, the Slice Manager (SM)
representing the slice acts as the coordinator. The SM is a piece of
software representing "the researcher" in your description.
This arrangement does not preclude various peering AMs from coordinating
directly, as in the "peer to peer" approaches ProtoGENI and OMF are
pursuing. But we wanted to avoid requiring neighboring AMs to
pre-coordinate bindings to talk to each other. And we wanted to allow
for scenarios involving "logical" stitches where there is no physical
adjacency, like nodes attaching to a storage server in some other domain.
The authorization questions for stitching should also be on the table.
They are a bit subtle. In your proposal, does the stitch manager
service run with any special privilege? Do the AMs trust it? Does the
CH trust it? Presumably the "researcher" (or its SM or experiment
control tools) must trust it. What is the basis of this trust?
In general, each stitch involves agreement on a common label, e.g., a
VLAN tag from one AM that is to be spliced into a VLAN from another AM
at some adjacency point. One side of the stitch produces the label, and
the other side consumes it. How does the consuming AM know that the
coordinator reports the label correctly? I think this is the nub of the
authorization question for stitching.
In ORCA, the producing AM signs the label, and the coordinator presents
the signed label to the consuming AM. The public keys of the AMs are
endorsed by some mutually trusted third party, such as a CH (or a
certifying authority for a federation). In ORCA, tickets issued by a
broker/CH can serve as the endorsement, since they contain the public
key of the AM named in the ticket, and are signed by the broker. (I
gloss over the multi-broker case.) This allows the SM to coordinate
stitching without any special privilege.
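The signed-label handoff can be sketched in a few lines. This is a toy model only: it substitutes an HMAC keyed by a broker-distributed secret for the public-key signature and ticket endorsement described above, and every name and value in it is illustrative.

```python
import hashlib
import hmac

# Stand-in for the CH/broker endorsement. In ORCA the producing AM's
# public key is endorsed via a broker-signed ticket; here a shared
# secret plays that role so the sketch stays self-contained.
BROKER_KEY = b"secret-endorsed-by-broker"

def produce_label(vlan_tag):
    """Producing AM emits the label together with a signature over it."""
    sig = hmac.new(BROKER_KEY, vlan_tag.encode(), hashlib.sha256).hexdigest()
    return vlan_tag, sig

def consume_label(vlan_tag, sig):
    """Consuming AM accepts the label only if the signature checks out,
    so it need not trust the coordinator (SM) that relayed it."""
    expected = hmac.new(BROKER_KEY, vlan_tag.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)

# The SM merely relays (label, sig); it holds no special privilege.
label, sig = produce_label("vlan-3201")
assert consume_label(label, sig)
assert not consume_label("vlan-9999", sig)  # a tampered label is rejected
```

The point of the sketch is the trust structure, not the crypto: the coordinator is a plain relay, and the consuming AM's check depends only on the endorsement chain back to the broker/CH.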
The ORCA protocol does not require any AM to know anything about what
the slice looks like in other domains, except for the labels imported at
the stitching points. Only the SM/coordinator knows the entire
structure of the slice. This is important because we want to avoid
assuming that slices are static entities, or that there is some central
controlling authority for the entire "testbed".
Aaron Falk wrote:
> This is a note motivated by topics raised at the GEC6 Control
> Framework WG meeting. Comments, criticism, feedback, and corrections
> are welcome.
> The issue of slice stitching has come up periodically and in the
> interest of making some progress on it, I wanted to propose a
> mechanism for stitching together aggregates using VLANs. What is
> slice stitching? Slice stitching refers to the process of
> interconnecting slivers in different GENI aggregates. In the
> near term, GENI needs to be able to create Ethernet VLANs that connect
> aggregates (although over the longer term more diverse
> interconnections will be desired).
> Jeff Chase, in his slides at the GEC6 CFWG meeting, catalogs a
> very good list of questions on stitching:
> * How to join slivers/slices across different aggregates end-to-end?
> * Do we require common labels at junction points?
> * How to connect slivers?
> * Do aggregates negotiate with each other (peer-to-peer), or does
> a clearinghouse or a service such as a slice manager coordinate?
> * What about isolation for performance or security?
> The mechanism I propose below addresses these questions for the narrow
> application of establishing end-to-end Ethernet VLANs. Rather than
> try to solve the general problem, my goal is to establish a
> straightforward way to do stitching so that GENI aggregates and tools
> under development will understand what functions are required.
> Let me illustrate by an example.
> +---------->| Stitching Manager Service (S) |<----------+
> | +--------------------------------+ |
> | | |
> V V V
> +--------------+ +--------------+ +--------------+
> ||AM| | | |AM| | | |AM||
> |+--+ +--| |--+ +--+ +--| |--+ +--+|
> | | |........| | | |........| | |
> | |SW|........|SW| |SW|........|SW| |
> | | |........| | | |........| | |
> | +--| usable |--+ +--| usable |--+ |
> |ProtoGENI (PG)| VLANs |Internet2 (I2)| VLANs |OpenFlow (OF) |
> +--------------+ +--------------+ +--------------+
> 1. A ProtoGENI cluster (PG), Internet2 (I2) and a campus OpenFlow
> network containing several hosts (OF) are configured to function
> as GENI aggregates.
> 2. The ProtoGENI cluster administrator provisions connectivity to
> an Internet2 PoP through a regional network. I2, PG, and the
> regional network administrators engineer the network,
> provisioning a set of VLANs for GENI use between the PG site and
> the I2 PoP. The regional network can be thought of as a 'wire'. The
> engineering of the network is worked out between the
> participants and is a 'local' matter. The result is a set of
> VLANs known to the PG and I2 admins and pre-allocated for GENI use.
> 3. A similar process occurs between Internet2 and the campus
> OpenFlow network.
> 4. A researcher now wishes to create a slice containing resources
> from PG and OF using I2 to provide network connectivity between
> them. The researcher has acquired slice credentials allocated
> by a slice authority recognized by all three aggregates.
> 5. The researcher (via the GENI Aggregate Manager API) requests a
> sliver containing hosts connected by a topology on the ProtoGENI
> cluster. The AM allocates the topology and hosts but does not
> yet connect them to the outside world.
> 6. The above step is applied to the campus OpenFlow network.
> 7. The researcher now requests an I2 sliver providing Ethernet
> connectivity between the ProtoGENI cluster and the OpenFlow
> network. The I2 AM allocates the topology but does not yet
> connect it to the outside world. At this point, three
> disconnected slivers have been established.
> 8. The researcher now provides his slice credentials to a stitching
> manager service, S, with two requests: stitch his PG and I2
> slivers and stitch his I2 and OF slivers. S, using a
> pre-established rule, determines the sort order for stitching is
> PG, OF, I2, meaning that for the PG-I2 VLAN, PG is contacted
> first and for the I2-OF VLAN, OF is contacted first.
> 9. S contacts the ProtoGENI AM, forwarding the slice credentials
> and the request to connect the sliver to Internet2. The PG AM,
> using local policy determined by the ProtoGENI administrator,
> assigns a VLAN connecting the ProtoGENI cluster to Internet2 to
> this slice. The PG-I2 VLAN identifying information is provided
> to S. Even though the mapping has been determined, the PG
> switch is configured to drop traffic on the allocated VLANs
> until there is confirmation that all the stitching required by
> the slice is complete. This is to avoid the possibility of
> traffic being injected into a partially configured network.
> 10. S now contacts the I2 AM providing the slice credentials and the
> PG-I2 VLAN identifying information. The I2 AM prepares the
> mapping between the I2 internal network and the PG-I2 VLAN.
> However, as within PG, the I2 switch is configured to drop
> traffic on the allocated VLANs until there is confirmation that
> all the stitching required by the slice is complete.
> 11. The previous two steps are repeated with OF and I2, starting
> with OF (as stated in step 8). At this point, S knows the
> identifying information for all the stitching VLANs assigned to
> this slice. This information is stored for operations and
> forensic use. S also has confirmation that the stitching has
> been completed.
> 12. S sends an indication to PG, OF, and I2 that the end-to-end
> network is configured. Now the rules to drop traffic on the
> assigned VLANs are removed and each switch is configured to
> translate VLAN traffic between the assigned stitching VLAN and
> the internal network. Each network sends a confirmation back to S.
> 13. S tells the researcher the end-to-end network is in place.
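The flow in steps 8-13 can be sketched roughly as follows. The class names, the VLAN numbering, and the in-memory "switch state" are illustrative stand-ins, not a proposed API; the sketch only shows the ordering rule, the drop-until-confirmed discipline, and the final enable.

```python
# The pre-established sort-order rule from step 8: for any pair, the
# earlier aggregate in this list is contacted first and picks the VLAN.
SORT_ORDER = ["PG", "OF", "I2"]

class AggregateAM:
    def __init__(self, name):
        self.name = name
        self.stitches = {}   # (slice_id, peer_name) -> {"vlan", "state"}
        self._next_vlan = 100

    def assign_stitch_vlan(self, slice_id, peer, vlan=None):
        # Assign (or accept) a stitching VLAN. Traffic is dropped until
        # S confirms that every stitch in the slice is complete (step 9).
        if vlan is None:
            vlan, self._next_vlan = self._next_vlan, self._next_vlan + 1
        self.stitches[(slice_id, peer)] = {"vlan": vlan, "state": "drop"}
        return vlan

    def enable(self, slice_id, peer):
        # Step 12: remove the drop rule; the switch now translates
        # between the stitching VLAN and the internal network.
        self.stitches[(slice_id, peer)]["state"] = "forward"

class StitchingManager:
    def __init__(self):
        self.records = []  # kept for operations and forensic use (step 11)

    def stitch(self, slice_id, am_a, am_b):
        # Contact the first aggregate (per SORT_ORDER) to pick the VLAN,
        # then hand the same VLAN to the second aggregate (steps 9-10).
        first, second = sorted((am_a, am_b),
                               key=lambda am: SORT_ORDER.index(am.name))
        vlan = first.assign_stitch_vlan(slice_id, second.name)
        second.assign_stitch_vlan(slice_id, first.name, vlan=vlan)
        self.records.append((slice_id, first, second, vlan))

    def finalize(self, slice_id):
        # Step 12: announce that the end-to-end network is configured.
        for sid, a, b, _ in self.records:
            if sid == slice_id:
                a.enable(slice_id, b.name)
                b.enable(slice_id, a.name)

pg, i2, of = AggregateAM("PG"), AggregateAM("I2"), AggregateAM("OF")
s = StitchingManager()
s.stitch("slice-1", pg, i2)   # PG is contacted first for the PG-I2 VLAN
s.stitch("slice-1", i2, of)   # OF is contacted first for the I2-OF VLAN
s.finalize("slice-1")         # drop rules removed everywhere at once
```

Keeping the VLAN choice with the first aggregate in a fixed order is what avoids the race in the assumptions below, where two sides might otherwise hand out the same VLAN to different slices concurrently.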
> Some of the assumptions here:
> * We assume both switches must be able to do VLAN translation.
> There are other techniques for stitching together VLANs but
> requiring translation at every switch will add minimal
> constraints on how stitching might be established. In other
> words, VLANs available for inter-aggregate connectivity won't be
> constrained by VLAN IDs in use within either aggregate. VLAN
> translation won't be uniformly available throughout GENI for
> several years, and things will be more complex in the
> intervening period.
> * VLANs are assumed to be pre-established before the stitching
> process is started. Inter-aggregate VLANs will have some
> isolation and performance characteristics assigned to them when
> created, i.e., there may be some performance guarantees that can
> be made for some VLANs but this model isn't intended to support
> on-demand per-slice QoS negotiation on the stitching VLANs.
> * The policy by which an internal VLAN is mapped to a stitching
> VLAN is entirely local and under the control of the aggregate.
> * We assume all networks can be represented as either an aggregate
> or a static VLAN (a 'wire'). I believe the implication here is
> that the backbones behave like aggregates.
> * We assume that S picks the first aggregate in an unambiguous and
> repeatable way (e.g., in some sort order) to avoid race
> conditions where both A and B give out the same VLAN to
> different users.
> * We assume that the aggregate manager can configure the switch to
> connect the assigned VLAN to the network resources allocated to
> the slice.
> * Once a VLAN has been assigned to a slice for stitching, it has
> to be reported and recorded by the GENI clearinghouse for
> operations and forensic use. Therefore, the VLAN identifying
> information needs to be in a standard form.
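The "standard form" requirement above could be met by something as simple as a fixed JSON record. A sketch follows; the field names and the URN/PoP values are purely illustrative, not a proposed GENI schema.

```python
import json
from dataclasses import asdict, dataclass

# One possible standard form for the VLAN identifying information that
# gets reported to the clearinghouse for operations and forensic use.
@dataclass
class StitchRecord:
    slice_urn: str     # slice the stitching VLAN is bound to
    producer_am: str   # aggregate that produced (chose) the VLAN
    consumer_am: str   # aggregate that consumed (accepted) it
    vlan_id: int       # the agreed stitching VLAN tag
    adjacency: str     # where the splice happens, e.g. an I2 PoP

record = StitchRecord(
    slice_urn="urn:publicid:IDN+geni+slice+demo",  # illustrative URN
    producer_am="PG",
    consumer_am="I2",
    vlan_id=3201,
    adjacency="I2-PoP",  # illustrative placeholder
)
wire_form = json.dumps(asdict(record), sort_keys=True)
```

A flat record like this keeps the clearinghouse's storage and lookup trivial while still answering the forensic question "which slice held VLAN X between aggregates A and B, and where".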
> Questions I have:
> * The AM is required since the binding is between resources
> allocated to a slice and the VLAN. Will an extension to the
> aggregate API be required to support the protocol above?
> Perhaps not: this sort of looks like a 'revise an existing
> slice' operation.
> * It seems like the 'usable VLANs' could be multiplexed over
> 802.1Q-in-Q (QinQ), GRE, OpenVPN, or (G)MPLS tunnels. E.g., if QinQ or
> even an IP tunnel is used, the tunnel should be established but
> VLAN IDs should still be passed to permit demultiplexing at the
> edges. What are the implications?
> Let me conclude by saying I'm sending this to the Control Framework
> and Experimenter Workflow and Services working groups because I
> believe there are implications for both groups embedded in this
> proposal. This is an important experimenter service that is not part
> of the control plane (and thus in scope for the services-wg).
> Additionally, aggregate managers and RSpecs would need to support this
> protocol, making it relevant to the control-wg. If we can get
> rough agreement that something like this would work, I'd like to
> hear what would be needed to prototype it.
> http://groups.geni.net/geni/attachment/wiki/GEC6CFWGAgenda/gec6-cf-chase.ppt