Global Lambda Integrated Facility

Subject Re: [GLIF controlplane] RE: Network Control Architecture
From Gigi Karmous-Edwards <gigi@xxxxxxxx>
Date Sun, 06 May 2007 09:18:59 -0400

All,

I forgot to mention one more thing: as was discussed at the February meeting, both strategies can co-exist. We drew this up on the whiteboard the first day and then decided not to include it in the initial architecture. Those who were present may remember that we drew two separate network domain clouds, Domain Network Resource Manager (DNRM) A and DNRM-B. We then discussed that if the two domains had an agreement with each other, such as the "inter-domain Dragon" testbed, we could have another DNRM-AB (one cloud that encapsulates the two smaller ones) for advertising, and therefore for configuring. In this case, if a user request comes in that requires a lightpath across domains A and B, the RB can, on behalf of the user, make a single request to DNRM-AB. Let me know what the community's thoughts are ....

Kind regards,
Gigi

--------------------------------------------

Gigi Karmous-Edwards
Principal Scientist
Advanced Technology Group
http://www.mcnc.org
MCNC RTP, NC, USA
+1 919-248-4121
gigi@xxxxxxxx
--------------------------------------------



Gigi Karmous-Edwards wrote:
Hi Jerry and All,

Ok Jerry, I stuck with you on your insightful email (I started it a couple of weeks ago and just finished it this morning :-) ). If I can summarize your assertions: when an interdomain lightpath is requested, the resource broker (RB), which is a servant of a user rather than of a domain, talks only to the first domain's NRM (network resource manager); that NRM then talks to the second NRM, and so on until the destination. This requires each domain to have established some sort of agreement with all adjacent domains. In your second scenario, it seems the user requests a source RM that is not in the RB's "domain", and the RB will have to forward the request to the right RM, after which the process above repeats.

I think what you described is the ultimate goal of the community. However, due to the complexity of the current infrastructures that must interoperate (NRENs, research testbeds, global government networks, etc.), it seems we first need to take small "baby steps". Existing infrastructures include a variety of technologies, different management tools (TL1, SNMP, CLI, etc.), and control plane tools (very few deployments of GMPLS) for configuration and fault management; in addition, current procedures for information exchange between network domains range from protocols to phone calls and emails. These complexities, and other policy-related challenges, force us to break the problem up into smaller functional blocks. I think the framework presented will give us a path forward, based on "baby steps", toward finally reaching the scenario you describe.

I see the problem as having three key challenges:
1) Information dissemination: where is what resource? What are its characteristics? What are its policies for use?
2) Capability to request reservations on resources globally once discovered: standard interfaces to query resource managers, with no restrictions on how each resource manager accommodates each request, and reuse of existing implementations.
3) Scalability: division of labor among functional components and responsibilities per domain.


The assumption in the framework sent out has been that an RB takes requests from a particular domain's users/applications but behaves as a servant of the domain, not of a single user. In this case there will be several RBs worldwide, but not one for each user; rather, one or two per domain. It is assumed that knowledge of the different resources globally will be published per domain in a very distributed fashion (each RB will publish the resources and their characteristics, hopefully using the schema from the OGF Network Markup Language working group). A query from one RB to the "distributed GLIF resources" will use a type of crawl mechanism to match the requested resources against the "published" resource information that each domain RB publishes on behalf of its RMs. The assumption is that the information published by the RBs is not static and will be updated by each RB when necessary. This email is already getting too long; I suggest that we have a conference call and use a web-based slide-sharing application to go through some scenarios. Any interest?
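The publish-and-crawl idea above can be sketched in a few lines. This is purely illustrative (the names `published` and `crawl_query`, the attribute schema, and the in-memory dictionary are all assumptions for the sketch; a real system would use the OGF NML schema and a distributed store):

```python
# Hypothetical sketch: each domain RB publishes its RMs' resources with
# their characteristics; a querying RB "crawls" the published set and
# matches it against the requested service characteristics.

published = {
    "domain-A": [{"type": "lightpath", "capacity_gbps": 10, "endpoint": "A"}],
    "domain-B": [{"type": "lightpath", "capacity_gbps": 40, "endpoint": "B"}],
}

def crawl_query(published, rtype, min_gbps):
    """Return (domain, resource) pairs matching the requested type
    and minimum capacity."""
    return [(domain, res)
            for domain, resources in published.items()
            for res in resources
            if res["type"] == rtype and res["capacity_gbps"] >= min_gbps]
```

A 20 Gb/s lightpath request would match only domain-B's published resource here; updating an entry in `published` is the stand-in for an RB refreshing its advertisement.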


To summarize, the strategy in your email will be the goal of the community, but it will take a while. I think that, as a community, we can start to develop standard interfaces for the various RMs, such as the Generic Network Interface (GNI); this will help us toward interoperability in today's environment.

Please let me know whether we should have a GLIF control plane conference call in the next few weeks.

Kind regards,
Gigi

--------------------------------------------

Gigi Karmous-Edwards
Principal Scientist
Advanced Technology Group
http://www.mcnc.org
MCNC RTP, NC, USA
+1 919-248-4121
gigi@xxxxxxxx
--------------------------------------------



Jerry Sobieski wrote:
Good comments both Steve and Bert...let me chime in: (this is a bit long, but I think it is relevant)

I too think the reservation phase in each domain must be atomic - there are effective ways to do this. The overall process, though, becomes two-phase: HOLD a resource for some finite holding time and provide an ACK to the requester. At some later time the RM will receive a CONFIRM from the requester, or a RELEASE. If the hold time expires, the resource is released unilaterally. On a macro basis, the reservation of the entire end-to-end lightpath must also be held in the HOLD state while the rest of the application resources are reserved, as there may be a dependency between the availability of non-network resources and the reserved lightpath. As Steve suggests, this atomic two-phase mechanism is used in many other similar reservation systems.
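The HOLD/CONFIRM/RELEASE cycle above can be sketched as follows. This is a minimal illustration, not a GLIF or DRAGON API; the class name, ticket format, and timer mechanism are assumptions of the sketch:

```python
import time
import uuid

class ResourceManager:
    """Minimal sketch of the two-phase HOLD/CONFIRM reservation:
    phase 1 holds a resource for a finite time and returns a ticket;
    phase 2 either confirms (locking it in) or releases it. An expired
    hold is treated as released unilaterally."""

    def __init__(self, hold_seconds=30.0):
        self.hold_seconds = hold_seconds
        self.holds = {}        # ticket -> hold expiry time
        self.confirmed = set() # locked-in reservations

    def hold(self, resource):
        """Phase 1: atomically hold a resource, return an ACK ticket."""
        ticket = str(uuid.uuid4())
        self.holds[ticket] = time.monotonic() + self.hold_seconds
        return ticket

    def confirm(self, ticket):
        """Phase 2: lock in the reservation before the hold expires."""
        expiry = self.holds.pop(ticket, None)
        if expiry is None or time.monotonic() > expiry:
            return False       # unknown or expired hold: already released
        self.confirmed.add(ticket)
        return True

    def release(self, ticket):
        """Requester abandons the hold explicitly."""
        self.holds.pop(ticket, None)
```

The end-to-end case is the same pattern one level up: every segment's ticket stays in HOLD until all dependent (including non-network) resources are held, and only then is CONFIRM sent to each.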

The issue I am concerned about is the roles of the RB and RM. I think the RBs will be numerous - possibly one for every user. I believe we must assume that all networks will default to a stringent "self secure" stance and will only allow access to their RM from known and trusted peers. It doesn't scale for every network to "know" about every other RB in the world (RBs are agents of the user - not of the network). Therefore, for scalability and security reasons, these resource reservation requests must be made between directly peering networks, and each network is responsible for recursively reserving the resources forward toward the destination. This is still a two-stage commit as described above, but it solves two problems: a) it scales much better, as each network only needs to expect queries from its direct peers (and customers), and b) it allows each network to negotiate aggregation policies with its peers for services (enabling economies of scale and global reach). This is not unlike how we place a phone call to anywhere in the world - we don't go asking each network if we can use it, we ask our service provider to do so, they ask theirs, and so on, and so on,...
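Here is a toy sketch of that recursive, peer-to-peer reservation chain, with the rollback behavior a two-stage commit implies. The `DomainRM` class and its method names are invented for illustration only:

```python
class DomainRM:
    """Hypothetical sketch: each domain's RM talks only to its directly
    peering downstream RM, reserving its local segment and recursively
    asking the next hop to reserve forward toward the destination."""

    def __init__(self, name, next_hop=None):
        self.name = name
        self.next_hop = next_hop   # directly peering RM, or None at the edge
        self.local_reserved = []   # segments currently held in this domain

    def reserve(self, dst_domain, request):
        # Hold the local segment atomically first.
        self.local_reserved.append(request)
        if self.name == dst_domain:
            return [self.name]             # destination domain reached
        if self.next_hop is None:
            self.local_reserved.pop()      # no route forward: release local hold
            return None
        tail = self.next_hop.reserve(dst_domain, request)
        if tail is None:
            self.local_reserved.pop()      # downstream failed: release local hold
            return None
        return [self.name] + tail

# Usage: the RB asks only the first domain's RM for an A->C lightpath;
# domains B and C are reached through peering, not by the RB directly.
rm_c = DomainRM("C")
rm_b = DomainRM("B", next_hop=rm_c)
rm_a = DomainRM("A", next_hop=rm_b)
```

Note that the RB never sees `rm_b` or `rm_c`; a failure anywhere downstream unwinds the holds back toward the source, matching the unilateral-release semantics above.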

The above scenario assumes the RB poses the service request to the RM serving the source end of a path. There is a [common?] case where the RB is not at the endpoint(s) and does not know of any RMs at the endpoint (or in the middle, for that matter). This brings us to another assumption I think we must make: an RB only knows its *local* network RM. An appropriately designed algorithm should/could forward the request to the source-address RM using the same forwarding process as the reservation (but crossgrain, toward the source), and then the request can be serviced forward normally as described above. (This is the "third party" provisioning scenario.) An alternative model assumes a "minion" agent at the path endpoints that is owned by the end user and knows of its local RM; the minion agent acts as proxy for the RB and makes the reservation request to the minion's RM. (got that? :-) I think we *can* assume that the RB knows of these minions, since they reside at the endpoints (source or destination) at a well-known port.

It is important to note that this process relies on each network RM (not the RB) knowing constrained reachability of all endpoints - not unlike current interdomain routing protocols. This allows the RM to postulate which "nexthop" network will provide the best path and try that first. If the RM knows more than just reachability - i.e. if it knows topology - then the RM can select a more specific candidate path and, via authorized recursive queries, can reserve the resource. Only the RM responsible for a network knows the state and availability details associated with the internal network resources, and therefore only the local RM can authoritatively and atomically reserve the resources in that network.

The beauty of this process is that, from the RB perspective, the RB need only ask one RM for the entire end-to-end network path. The RM will either return a ticket indicating a path was successfully reserved that meets the requested service characteristics, or a NACK indicating that the resource was not available for some reason. The user must change the requested service parameters somehow before trying again - i.e. change the source or destination addr, the start time, the capacity, etc.

As Gigi states, once all application resources are reserved in the HOLD state, then all must be CONFIRM'ed, which will lock in the reservation. At some delta-t later (which could be 0) there is a separate process that causes the reconfiguration of the network elements to make the reserved resources available for actual use (i.e. the provisioning or signaling process). This process must be correlated to a previous reservation, and so the provisioning request (separate from the reservation request) must contain some indicator that is trusted by the network and indicates which reservation is being placed into service (see Leon's work on AAA).
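The split between reservation and provisioning can be shown in a few lines. This is an assumed, simplified model (the ticket format, reservation states, and `provision` function are illustrative; the trust/AAA check is deliberately omitted):

```python
# Hypothetical sketch: provisioning is a separate request that carries a
# ticket issued at reservation time. The network reconfigures elements
# only for a reservation it can correlate to a CONFIRMed ticket.

confirmed_reservations = {
    "tkt-42": {"path": ["A", "B", "C"], "state": "CONFIRMED"},
}

def provision(ticket):
    """Place a previously confirmed reservation into service."""
    resv = confirmed_reservations.get(ticket)
    if resv is None or resv["state"] != "CONFIRMED":
        return False              # unknown or unconfirmed ticket: refuse
    resv["state"] = "IN_SERVICE"  # stand-in for reconfiguring network elements
    return True
```

In a real deployment the ticket would be a cryptographically verifiable token (per the AAA work referenced above), not a bare string.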

Note that none of the above is predicated on any particular routing or signaling protocol... That being said (:-), DRAGON has implemented much of this functionality using GMPLS protocols.

- The DRAGON Network Aware Resource Broker (NARB) is analogous to the network RM and performs the path computation, recursively reserving the resources along the way. It returns a path reservation in the form of an Explicit Route Object (ERO) to the source requester. This loose-hop ERO specifies a path consisting of ingress and egress points at each network boundary.
- RSVP then uses this ERO to provision the multi-domain end-to-end path.
- The DRAGON Application Specific Topology "Master" is an agent analogous to the RB mentioned above. The AST Master queries all the various resource managers (compute nodes, storage, instruments, network, etc.) to reserve groups of dependent resources. There is a significant protocol exchange defined for ASTs to construct a workable physical resource grid for the application.

What DRAGON has not yet implemented: We have implemented scheduling and policy constraints in the traffic engineering database, but we have not yet implemented the path computation to use those constraints (this will be coming soon). We have atomic reservations, but have not implemented the two-phase commit - though we have long recognized it as critical to the bookahead capability and a robust integrated resource scheduling process.

Thanks for sticking with me on this ...:-)
Jerry