AI-based Dynamic Schedule Calculation in Time Sensitive Networks using GCN-TD3 ††thanks: This work has been funded by the Federal Ministry of Digital and Transport Germany under grant number 19OI22015F. The authors are responsible for the content of this (2024)

Syed Tasnimul Islam1TU Chemnitz
Chemnitz, Germany
syed-tasnimul.islam@etit.tu-chemnitz.de
  Anas Bin MuslimUniversity of Applied Sciences Osnabrück
Osnabrück, Germany
a.bin-muslim@hs-osnabrueck.de

Abstract

Offline scheduling in Time Sensitive Networking(TSN) utilizing the Time Aware Shaper (TAS) facilitates optimaldeterministic latency and jitter-bounds calculation for Time-Triggered (TT) flows. However, the dynamic nature of trafficin industrial settings necessitates a strategy for adaptivelyscheduling flows without interrupting existing schedules. Our research identifies critical gaps in current dynamic scheduling methods for TSN and introduces the novel GCN-TD3 approach. This novel approach utilizes a Graph Convolutional Network (GCN) for representing the various relationswithin different components of TSN and employs the TwinDelayed Deep Deterministic Policy Gradient (TD3) algorithmto dynamically schedule any incoming flow. Additionally, anInteger Linear Programming (ILP) based offline scheduler isused both to initiate the simulation and serve as a fallbackmechanism. This mechanism is triggered to recalculate the entireschedule when the predefined threshold of Gate Control List(GCL) length exceeds.Comparative analyses demonstrate that GCN-TD3 outperformsexisting methods like Deep Double Q-Network (DDQN) and DeepDeterministic Policy Gradient (DDPG), exhibiting convergencewithin 4000 epochs with a 90% dynamic TT flow admission ratewhile maintaining deadlines and reducing jitter to as low as 2us.Finally, two modules were developed for the OMNeT++ simulator,facilitating dynamic simulation to evaluate the methodology.

Index Terms:

Time Sensitive Network, TD3, Graph Convolutional Network.

\AddToShipoutPictureBG

*\AtPageUpperLeft

I Introduction and Related Work

Time-Sensitive Networking (TSN) is a Layer 2 technology, a collection of standards orchestrated by the IEEE 802.1 working group’s TSN task group. Originating from the enhancements to Audio Video Bridging (AVB) technologies, TSN standards are designed to enable precise time-synchronization, ensure bounded low latency, guarantee high reliability, and facilitate effective resource management. Among the versatile toolkits of TSN, central to this paper is the exploration of IEEE 802.1Qbv, which plays a pivotal role in TSN. This standard employs GCL at the egress ports of switches, a crucial feature that enables the synchronized execution of distributed tasks across the network, thus meeting stringent real-time operational demands.

Current scheduling models in TSN predominantly rely on static methods, utilizing tools like Network Calculus, Satisfiability Modulo Theory (SMT) solvers and Integer Linear Programming (ILP) for ensuring bandwidth and latency guarantees [1]. Although these approaches are effective in small-scale scenarios, they often struggle to deliver timely solutions in dynamic environments, within a reasonable time in a more dynamic context, such as manufacturing pipelines when the network and flow set size grows [2]. Literature review in [1, Tab. 2] indicates the research gap of schedule calculation with a fixed routing in the domain of runtime reconfiguration of flows. Further literature review, summarized in Tab. I reveals gaps in dynamic scheduling solutions, with studies either leaning towards non-scalable exact methods [3] or heuristic approaches [4], [5], [6], [7]. Despite their attempt to address the computational challenge of NP-hard scheduling problems, these methods lack the balance between efficiency and adaptability.

Recent advancements in Reinforcement Learning (RL) have introduced scalable and adaptable solutions for industrial automation and dynamic task rescheduling in TSN, employing algorithms like DDQN [8], [9], Policy Gradient[10], and Asynchronous Advantage Actor-Critic methods[11]. However, the efficacy of these solutions is contingent upon the strategic selection of algorithms, failing to do so several limitations persist across these studies. Such as the inefficient segmentation of the time domain into fixed slots, i.e. [10] sized relative to the Maximum Transmission Unit (MTU). Nevertheless, they disregard both the varied nature of time-critical scheduled flow size in industrial use cases[12, Tab. 12] and tick granularity (10ns) guideline from IEEE 802.3 working group hence waste network resources. Moreover, the reliance of DRL methods on Convolutional Neural Networks or Multi-Layer Perceptrons introduces challenges with data in non-Euclidean space, necessitating retraining for each new flow or device addition.

The interaction between an ideal RL agent and its environment, crucial for dynamic decision-making, is often hindered by the absence of appropriate simulation tools and cannot solely rely on synthetic data. Although various studies have utilized tools like Omnet++ [9], [13] and Networkx [5], [14], [3], [8], the lack of dynamic simulation capabilities or clear methodologies for integrating such features highlights a significant gap in current research approaches.

[7][15][11][6][3][8][16][14][13][4][5][10][9][2]
Offline Initial Schedule--------
Mixed Traffic--------
GCL synthesis---------
Dynamic Simulation-------
Flow to Queue Mapping-------
DynamicQueueAssignment-------------
Reschedule Trigger----------
Reinforcement Learning---------
GNN------------
  • : Feature Provided. : Feature partially provided. (-): Feature not provided.

We now turn our attention to GCL length, queue usage, and flow-to-queue mapping in commercial TSN switches, notably those like Innoroute Trustnode [17], limited by a GCL length threshold (ω𝜔\omegaitalic_ω) of 128 entries per egress port. Craciunas et al. [18] thoroughly discussed the challenges of egress interleaving and flow isolation as per the 802.1Qbv standards, emphasizing the need for flow to queue assignment and queue management mechanism. The lack of verification against simulated switch models further exempts researchers from addressing these critical considerations. Finally, reconfiguration algorithms aim to integrate flows with minimal, deterministic response time but fall short of the efficiency of offline exact methods. This efficiency gap underscores an unavoidable reliance on the hyperperiod (HP)—the maximum length of the time domain available for accommodating all flows—prompting the necessity for offline recalculations and system restarts to ensure optimal flow handling. Yet, few studies [7], [16], [14], [4] address this need for a fallback mechanism by halting runtime reconfiguration algorithms to recalculate and reintegrate all flows, highlighting a significant gap in dynamic TSN scheduling research.Motivated by the identified research gaps, this paper introduces several key contributions:

II System Model

In this study, we define Time Sensitive Networking (TSN) using standard terminology, incorporating End Systems (ES) equipped with network interface cards (NIC) and TSN switches (or bridges, terms used interchangeably), interconnected by physical links and fully synchronized using the IEEE 802.1 AS protocol. Following IEEE 802.1Qcc, ESs are categorized as talkers (TK) and listeners (LR), with the assumption of deterministic traffic dispatch, potentially via tools like Intel’s DPDK. Our model assumes a store-and-forward mechanism in switches, simplifying the focus solely on transmission delays by omitting processing and propagation delays and inherently eliminating queuing delay due to our model’s no-wait feature.

The network topology is depicted as a directed graph, with vertices representing ESs, switch queues and edges symbolizing full-duplex links. Detailed topology modelling is discussed in Section IV-B. We adapt the configuration 3 from [18] with a novel twist: we limit 4 total queues, three for static scheduling scenario and one reserved for dynamically arriving critical flows. Our strategy designates one queue for critical traffic, another for Audio Video Bridging (AVB) traffic, specifically audio, and a third for Best Effort (BE) traffic, as depicted in Fig. 1(a). The intent to reserve an extra queue for dynamic scheduling scenario is to avoid egress interleaving problem [18]. Additionally to clarify further the leftover unallocated timeslot for the 3 Scheduled Queues Qs(QReservedQScheduledQAVB)subscript𝑄𝑠subscript𝑄𝑅𝑒𝑠𝑒𝑟𝑣𝑒𝑑subscript𝑄𝑆𝑐𝑒𝑑𝑢𝑙𝑒𝑑subscript𝑄𝐴𝑉𝐵Q_{s}\in(Q_{Reserved}\cap Q_{Scheduled}\cap Q_{AVB})italic_Q start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT ∈ ( italic_Q start_POSTSUBSCRIPT italic_R italic_e italic_s italic_e italic_r italic_v italic_e italic_d end_POSTSUBSCRIPT ∩ italic_Q start_POSTSUBSCRIPT italic_S italic_c italic_h italic_e italic_d italic_u italic_l italic_e italic_d end_POSTSUBSCRIPT ∩ italic_Q start_POSTSUBSCRIPT italic_A italic_V italic_B end_POSTSUBSCRIPT ) is allocated for the BE Queue (Qbsubscript𝑄𝑏Q_{b}italic_Q start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT). Staying consistent with industrial use-cases scenario [12], we assume all flow sizes are below the Maximum Transmission Unit (MTU), including AVB (audio traffic), as defined by IEEE working groups[20]. Flow routing calculations fall outside this paper’s scope. In our model, each flow fiFsubscript𝑓𝑖𝐹f_{i}\in Fitalic_f start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∈ italic_F is defined by the tuple (Di,Li,Pi)subscript𝐷𝑖subscript𝐿𝑖subscript𝑃𝑖{(D_{i},L_{i},P_{i})}( italic_D start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_L start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_P start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ), representing the deadline, duration (akin to size) and periodicity, respectively. For a given flow i𝑖iitalic_i associated with bridge j𝑗jitalic_j, the allocated queue is denoted as BRj.Qiformulae-sequence𝐵subscript𝑅𝑗subscript𝑄𝑖BR_{j}.Q_{i}italic_B italic_R start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT . italic_Q start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT.Adopting the term datapath from Pop. et al.[21], denoted as dpj𝑑subscript𝑝𝑗dp_{j}italic_d italic_p start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT, which represents an ordered sequence of links from a talker TKjTK𝑇subscript𝐾𝑗𝑇𝐾TK_{j}\in TKitalic_T italic_K start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ∈ italic_T italic_K to a listener LRkLR𝐿subscript𝑅𝑘𝐿𝑅LR_{k}\in LRitalic_L italic_R start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∈ italic_L italic_R, we extend this concept to define the Network Datapath (NDP). An NDP represents a subset of a datapath that requires a unique, non-overlapping schedule for each flow. Unlike traditional graph formulations that consider bridges merely as nodes, our approach uniquely identifies each bridge’s distinct queues as separate nodes. For example, in Fig. 1(b) a scenario comprising two (TT) and two Audio (AVB) flows the formulation of NDP𝑁𝐷𝑃NDPitalic_N italic_D italic_P for one TT and one AVB flows is as follows:

dpBrown(TT)=[[TK1,BR1],[BR1.Q3,BR2],[BR2.Q3,LR2]]\displaystyle dp_{\text{Brown}}(TT)=[[TK_{1},BR_{1}],[BR_{1}.Q_{3},BR_{2}],[BR%_{2}.Q_{3},LR_{2}]]italic_d italic_p start_POSTSUBSCRIPT Brown end_POSTSUBSCRIPT ( italic_T italic_T ) = [ [ italic_T italic_K start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_B italic_R start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ] , [ italic_B italic_R start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT . italic_Q start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT , italic_B italic_R start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ] , [ italic_B italic_R start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT . italic_Q start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT , italic_L italic_R start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ] ]
=[NDP1,NDP3,NDP6].absent𝑁𝐷subscript𝑃1𝑁𝐷subscript𝑃3𝑁𝐷subscript𝑃6\displaystyle=[NDP_{1},NDP_{3},NDP_{6}].= [ italic_N italic_D italic_P start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_N italic_D italic_P start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT , italic_N italic_D italic_P start_POSTSUBSCRIPT 6 end_POSTSUBSCRIPT ] .
dpGreen(AVB)=[[TK2,BR1],[BR1.Q2,BR2],[BR2.Q2,LR1]]\displaystyle dp_{\text{Green}}(AVB)=[[TK_{2},BR_{1}],[BR_{1}.Q_{2},BR_{2}],[%BR_{2}.Q_{2},LR_{1}]]italic_d italic_p start_POSTSUBSCRIPT Green end_POSTSUBSCRIPT ( italic_A italic_V italic_B ) = [ [ italic_T italic_K start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_B italic_R start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ] , [ italic_B italic_R start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT . italic_Q start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_B italic_R start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ] , [ italic_B italic_R start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT . italic_Q start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_L italic_R start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ] ]
=[NDP2,NDP4,NDP5]absent𝑁𝐷subscript𝑃2𝑁𝐷subscript𝑃4𝑁𝐷subscript𝑃5\displaystyle=[NDP_{2},NDP_{4},NDP_{5}]= [ italic_N italic_D italic_P start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_N italic_D italic_P start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT , italic_N italic_D italic_P start_POSTSUBSCRIPT 5 end_POSTSUBSCRIPT ]
AI-based Dynamic Schedule Calculation in Time Sensitive Networks using GCN-TD3††thanks: This work has been funded by the Federal Ministry of Digital and Transport Germany under grant number 19OI22015F. The authors are responsible for the content of this paper. (1)

III Problem Formulation

In this study, we adopt the novel concept of applying the No-wait Job Shop Scheduling (NW-JSSP) methodology to TSN scheduling adapting the name as No-wait Flow Shop Scheduling problem (NW-FSSP), drawing inspiration from [22].We describe the dynamic NW-FSSP with random flow arrival using symbols defined as follows: There are n𝑛nitalic_n successively arriving flows F={f1,f2,,fn}𝐹subscript𝑓1subscript𝑓2subscript𝑓𝑛F=\left\{f_{1},f_{2},...,f_{n}\right\}italic_F = { italic_f start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_f start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , … , italic_f start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT } which should be processed on k𝑘kitalic_k NDPs. Each flow Fisubscript𝐹𝑖F_{i}italic_F start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT consists of a predetermined sequence of hisubscript𝑖h_{i}italic_h start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT operations. Oifsubscript𝑂𝑖𝑓O_{if}italic_O start_POSTSUBSCRIPT italic_i italic_f end_POSTSUBSCRIPT is the fthsuperscript𝑓𝑡f^{th}italic_f start_POSTSUPERSCRIPT italic_t italic_h end_POSTSUPERSCRIPT operation of flow Fisubscript𝐹𝑖F_{i}italic_F start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, which can be processed on assigned NDPs. The processing time of Oifsubscript𝑂𝑖𝑓O_{if}italic_O start_POSTSUBSCRIPT italic_i italic_f end_POSTSUBSCRIPT on NDPk𝑁𝐷subscript𝑃𝑘NDP_{k}italic_N italic_D italic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT is denoted by Tiksubscript𝑇𝑖𝑘T_{ik}italic_T start_POSTSUBSCRIPT italic_i italic_k end_POSTSUBSCRIPT.As, the minimal expectation from TAS is to have all the scheduled flows reach their destination within the deadline requirement, the objective here is to obtain flow offset ϕisubscriptitalic-ϕ𝑖\phi_{i}italic_ϕ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT of the dynamically arriving flow such that end-to-end latency of all scheduled flows are minimized.

z=min(i=0F(max(0,EijDi)wi))𝑧superscriptsubscript𝑖0𝐹0subscript𝐸𝑖𝑗subscript𝐷𝑖subscript𝑤𝑖z=\min(\sum_{i=0}^{F}(\max(0,E_{ij}-D_{i})\cdot w_{i}))italic_z = roman_min ( ∑ start_POSTSUBSCRIPT italic_i = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_F end_POSTSUPERSCRIPT ( roman_max ( 0 , italic_E start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT - italic_D start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) ⋅ italic_w start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) )(1)

Here, Eijsubscript𝐸𝑖𝑗E_{ij}italic_E start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT denotes the latency observed by flow i𝑖iitalic_i in datapath j𝑗jitalic_j and weight of the flow is equivalent to the Priority Code Point (PCP) value of the flow. The constraints that are considered both for ILP formulation and RL’s action and reward design are as follows:

  1. 1.

    A flow operation in a subsequent NDP will only commence after the completion of the previous flow operation if the flow traverses multiple NDPs.

  2. 2.

    The end-to-end delay of a flow must not exceed the specified deadline requirement.

  3. 3.

    The offset of a scheduled flow must be non-negative, and the entire transmission window must be accommodated within the flow’s periodicity.

  4. 4.

    Flows within the same NDP must not overlap temporally.

IV GCN-TD3 FrameWork Overview

In Fig.2, we present our novel GCN-TD3 framework, designed as a runtime scheduling mechanism for dynamic flow handling in TSN. Our experimentation utilizes the INET Framework of discrete event simulator Omnet++ and two custom modules to facilitate dynamic simulation. The simulation initiates (Step 1) with the importation of a predefined network topology and flow properties from a CSV file. These are initially processed by the GCN and ILP generator modules. Subsequently, in Step 2, the ILP module constructs the initial Gate Control List (GCL), which is then forwarded to Omnet++ to kickstart the simulation, while the GCN module establishes the state space necessary for the TD3 algorithm, as described in Section IV-B. In Step 3, to simulate the arrival of dynamic flows, flow properties are randomly fetched from the CSV file and processed by the GCN to generate state updates for TD3.

AI-based Dynamic Schedule Calculation in Time Sensitive Networks using GCN-TD3††thanks: This work has been funded by the Federal Ministry of Digital and Transport Germany under grant number 19OI22015F. The authors are responsible for the content of this paper. (2)

In Step 4, guided by Algorithm 1 and incorporating runtime system data, TD3 decides on the integration of new flows. By Step 5, the custom AppController, module begins its role as the first dynamic module, inheriting from the cSimpleModule class of Omnet++. It periodically retrieves data from the Deterministic Dispatcher module, such as node name, application index, and initial offset and adjusts application parameters within the INET hierarchy using the getAncestorPar() function. Concurrently, the DynamicPeriodicGate module, extending INET’s PeriodicGate module, reads CSV file generated by GCL Conversion module to update the GCL for specified transmission gates, seamlessly integrating dynamic flow changes into the simulation. TD3 also calculates a reward function based on real-time simulation data in Step 6. Should TD3 reject a flow, Step 7 is initiated to evaluate the need for a complete rescheduling through the static ILP scheduler, triggering a fallback mechanism for an offline recalibration with ILP and a process restart if deemed necessary.

IV-A Generate Static Schedule with ILP

The detailed ILP formulation, tailored to the constraints outlined in our problem definition, is thoroughly documented in works of Pop et al.[21] and Craciunas et al. [18].We recommend readers consult these sources for a comprehensive understanding. Given our paper’s emphasis on dynamic flow scheduling, we have adapted the mathematical models from these references, applying them both for initial static scheduling and as a fallback recalculation strategy in scenarios where dynamic scheduling encounters challenges.

IV-B Graph representation learning/embedding with GCN

In this study, we overcome the challenge posed by variable-sized node features in a TSN topology by employing a GCN model. The topology is modelled as a graph G(𝐀)=(V,E)𝐺𝐀𝑉𝐸G(\mathbf{A})=(V,E)italic_G ( bold_A ) = ( italic_V , italic_E ) where nodes V𝑉Vitalic_V represent entities like talkers, listeners, scheduled queues, and BE queues in switches. Each node type possesses distinct and variable features. The graph structure is defined by an adjacency matrix 𝐀N×N𝐀superscript𝑁𝑁\mathbf{A}\in\mathbb{R}^{N\times N}bold_A ∈ blackboard_R start_POSTSUPERSCRIPT italic_N × italic_N end_POSTSUPERSCRIPT, maps the interconnections among N𝑁Nitalic_N nodes with directed edges eijEsubscript𝑒𝑖𝑗𝐸e_{ij}\in Eitalic_e start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT ∈ italic_E denoting link from node i𝑖iitalic_i to node j𝑗jitalic_j. Drawing inspiration from [23], our GCN model is specifically designed to manage these variable-sized features by initially passing them through specialized input layers designed for each node type. As shown in Fig.3, the architecture comprises two graph convolutional layers with 64 (Conv1) and 32 ( Conv2) neurons respectively, using ReLU activation functions.

AI-based Dynamic Schedule Calculation in Time Sensitive Networks using GCN-TD3††thanks: This work has been funded by the Federal Ministry of Digital and Transport Germany under grant number 19OI22015F. The authors are responsible for the content of this paper. (3)

These layers aggregate feature information from neighbouring nodes, encapsulating each node’s hidden representation. Following this aggregation, a non-linear transformation refines the outputs and a Global Average Pooling layer (FC) reduces these node embeddings into a singular, unified graph representation. The training of our model is conducted using the ADAM optimizer, and a Mean Squared Error (MSE) loss function to fine-tune the model focusing on minimizing the error between the predicted and actual graph representations. Table II represents features considered for each node type. Additionally, a summarized table presented at the bottom of Fig.3 offers a generalized overview of the feature size variations across different node types. This environment manages n𝑛nitalic_n flows, containing both Scheduled Queues (TT and AVB) and Best Effort (BE) queues. Each of these queues is characterized by s𝑠sitalic_s and b𝑏bitalic_b pairs of gate events respectively, which denote the specified gate opening and closing times within a GCL at each egress port. The decision to categorize TT and AVB queues together under scheduled queues, despite their different properties, stems from our dynamic scheduler’s strategy to integrate any new TT flow into the HP𝐻𝑃HPitalic_H italic_P seamlessly, ensuring there is no disruption to the existing schedule of both TT and AVB flows. In contrast, BE flows remain susceptible to preemption.

IV-C TD3 algorithm for dynamic scheduling

In this section, we reframe the problem outlined in Section III as a Markov Decision Process (MDP). The MDP is defined by the quartet ( 𝒮𝒮\mathcal{S}caligraphic_S,𝒜𝒜\mathcal{A}caligraphic_A,𝒮𝒮\mathcal{S’}caligraphic_S ’,\mathcal{R}caligraphic_R), representing state, action, subsequent state, and reward, respectively.

IV-C1 State Space

:The state space 𝒮𝒮\mathcal{S}caligraphic_S is a 22-dimensional vector, where each state si𝒮subscript𝑠𝑖𝒮s_{i}\in\mathcal{S}italic_s start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∈ caligraphic_S encapsulates the embedded graph representation of the network topology along with features as shown in Fig.3. Noteworthy features extracted from Omnet++ runtime are Eisubscript𝐸𝑖E_{i}italic_E start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, number of contained (Fcsubscript𝐹𝑐F_{c}italic_F start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT) and dropped flows (Fdsubscript𝐹𝑑F_{d}italic_F start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT) by individual queues highlighted in blue in Table II.

IV-C2 Action Space

The action at𝒜subscript𝑎𝑡𝒜a_{t}\in\mathcal{A}italic_a start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ∈ caligraphic_A at time t𝑡titalic_t is defined as:at=[a1(t),a2(t),a3(t),..,a3+(mn1)(t),a3+mn(t)]a_{t}=[a_{1}(t),a_{2}(t),a_{3}(t),..,a_{3+(mn-1)}(t),a_{3+mn}(t)]italic_a start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = [ italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( italic_t ) , italic_a start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_t ) , italic_a start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT ( italic_t ) , . . , italic_a start_POSTSUBSCRIPT 3 + ( italic_m italic_n - 1 ) end_POSTSUBSCRIPT ( italic_t ) , italic_a start_POSTSUBSCRIPT 3 + italic_m italic_n end_POSTSUBSCRIPT ( italic_t ) ]; where m𝑚mitalic_m = no. of switches and n=HP/max(Pi)𝑛𝐻𝑃𝑚𝑎𝑥subscript𝑃𝑖n=HP/max(P_{i})italic_n = italic_H italic_P / italic_m italic_a italic_x ( italic_P start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) phase of a flow or sub-cycle and Inter Frame Gap (IFG𝐼𝐹𝐺IFGitalic_I italic_F italic_G) is 12 B.

  • a1{0,1}subscript𝑎101a_{1}\in\{0,1\}italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ∈ { 0 , 1 }: Reject or accept the incoming flow.

  • a2{0,1}subscript𝑎201a_{2}\in\{0,1\}italic_a start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ∈ { 0 , 1 }: Forward to scheduled or reserved Queue.

  • a3[0,(HP/n)(m+1)×Lim×IFG]subscript𝑎30𝐻𝑃𝑛𝑚1subscript𝐿𝑖𝑚𝐼𝐹𝐺a_{3}\in[0,(HP/n)-(m+1)\times L_{i}-m\times IFG]italic_a start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT ∈ [ 0 , ( italic_H italic_P / italic_n ) - ( italic_m + 1 ) × italic_L start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - italic_m × italic_I italic_F italic_G ] : Deterministic dispatch time from talker.

  • a3+(mn1)[a3+(n1)×Pi+Li+IFG,n×Pi(m×Li)]subscript𝑎3𝑚𝑛1subscript𝑎3𝑛1subscript𝑃𝑖subscript𝐿𝑖𝐼𝐹𝐺𝑛subscript𝑃𝑖𝑚subscript𝐿𝑖a_{3+(mn-1)}\in[a_{3}+(n-1)\times P_{i}+L_{i}+IFG,n\times P_{i}-(m\times L_{i})]italic_a start_POSTSUBSCRIPT 3 + ( italic_m italic_n - 1 ) end_POSTSUBSCRIPT ∈ [ italic_a start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT + ( italic_n - 1 ) × italic_P start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT + italic_L start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT + italic_I italic_F italic_G , italic_n × italic_P start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - ( italic_m × italic_L start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) ]: Gate opening time of the first bridge on the path.

  • a3+mn[a3+(mn1)+Li+IFG,n×PiLi]subscript𝑎3𝑚𝑛subscript𝑎3𝑚𝑛1subscript𝐿𝑖𝐼𝐹𝐺𝑛subscript𝑃𝑖subscript𝐿𝑖a_{3+mn}\in[a_{3+(mn-1)}+L_{i}+IFG,n\times P_{i}-L_{i}]italic_a start_POSTSUBSCRIPT 3 + italic_m italic_n end_POSTSUBSCRIPT ∈ [ italic_a start_POSTSUBSCRIPT 3 + ( italic_m italic_n - 1 ) end_POSTSUBSCRIPT + italic_L start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT + italic_I italic_F italic_G , italic_n × italic_P start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - italic_L start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ]:Gate opening time of the second bridge.

TalkerListener
Scheduled
Queue
BE
Queue
ID
No. of FlowsPCP---
Flow ID--
Size---
Period---
Latency
Bound
---
Destination---
Source---
Start Time--
End Time--
Bridge
Delay
---
Observed
Latency
---
Flows Contained--
Flows Dropped--
Switch ID--
Queue ID--
Preemptivity--
Gate EventCountOpen Time--
Close Time--
  • : Feature considered. (-): Feature not considered.

IV-C3 Reward Function

The immediate reward formula:

r(st,at)=r(O).r(S).r(B).a1βωFdFd+FcJσJμformulae-sequence𝑟subscript𝑠𝑡subscript𝑎𝑡𝑟𝑂𝑟𝑆𝑟𝐵subscript𝑎1𝛽𝜔subscript𝐹𝑑subscript𝐹𝑑subscript𝐹𝑐subscript𝐽𝜎subscript𝐽𝜇r(s_{t},a_{t})=r(O).r(S).r(B).a_{1}-\frac{\beta}{\omega}-\frac{F_{d}}{F_{d}+F_%{c}}-\frac{J_{\sigma}}{J_{\mu}}italic_r ( italic_s start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , italic_a start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) = italic_r ( italic_O ) . italic_r ( italic_S ) . italic_r ( italic_B ) . italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT - divide start_ARG italic_β end_ARG start_ARG italic_ω end_ARG - divide start_ARG italic_F start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT end_ARG start_ARG italic_F start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT + italic_F start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT end_ARG - divide start_ARG italic_J start_POSTSUBSCRIPT italic_σ end_POSTSUBSCRIPT end_ARG start_ARG italic_J start_POSTSUBSCRIPT italic_μ end_POSTSUBSCRIPT end_ARG(2)
r(O)={0,iffiFsuch thatEiϕi>Di1,otherwise𝑟𝑂cases0ifsubscript𝑓𝑖𝐹such thatsubscript𝐸𝑖subscriptitalic-ϕ𝑖subscript𝐷𝑖1otherwiser(O)=\begin{cases}0,&\textbf{if }\exists f_{i}\in F\text{ such that }E_{i}-%\phi_{i}>D_{i}\\1,&\textbf{otherwise}\end{cases}italic_r ( italic_O ) = { start_ROW start_CELL 0 , end_CELL start_CELL if ∃ italic_f start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∈ italic_F such that italic_E start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - italic_ϕ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT > italic_D start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL 1 , end_CELL start_CELL otherwise end_CELL end_ROW(3)
r(S)={0,iffrF,qsQs,(ose,cse)Eqssuch that(ϕrj<cse)(ϕrj+Lr>ose)1,otherwise𝑟𝑆cases0formulae-sequenceifsubscript𝑓𝑟𝐹formulae-sequencesubscript𝑞𝑠subscript𝑄𝑠subscript𝑜𝑠𝑒subscript𝑐𝑠𝑒subscript𝐸subscript𝑞𝑠missing-subexpressionsuch thatsubscriptitalic-ϕ𝑟𝑗subscript𝑐𝑠𝑒subscriptitalic-ϕ𝑟𝑗subscript𝐿𝑟subscript𝑜𝑠𝑒missing-subexpression1otherwiser(S)=\begin{cases}0,&\begin{array}[]{@{}l@{\quad}l}\textbf{if }\exists f_{r}%\in F,\exists q_{s}\in Q_{s},\exists(o_{se},c_{se})\in E_{q_{s}}\\\text{such that }(\phi_{rj}<c_{se})\land(\phi_{rj}+L_{r}>o_{se})\end{array}\\1,&\textbf{otherwise}\end{cases}italic_r ( italic_S ) = { start_ROW start_CELL 0 , end_CELL start_CELL start_ARRAY start_ROW start_CELL if ∃ italic_f start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT ∈ italic_F , ∃ italic_q start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT ∈ italic_Q start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT , ∃ ( italic_o start_POSTSUBSCRIPT italic_s italic_e end_POSTSUBSCRIPT , italic_c start_POSTSUBSCRIPT italic_s italic_e end_POSTSUBSCRIPT ) ∈ italic_E start_POSTSUBSCRIPT italic_q start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT end_POSTSUBSCRIPT end_CELL start_CELL end_CELL end_ROW start_ROW start_CELL such that ( italic_ϕ start_POSTSUBSCRIPT italic_r italic_j end_POSTSUBSCRIPT < italic_c start_POSTSUBSCRIPT italic_s italic_e end_POSTSUBSCRIPT ) ∧ ( italic_ϕ start_POSTSUBSCRIPT italic_r italic_j end_POSTSUBSCRIPT + italic_L start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT > italic_o start_POSTSUBSCRIPT italic_s italic_e end_POSTSUBSCRIPT ) end_CELL start_CELL end_CELL end_ROW end_ARRAY end_CELL end_ROW start_ROW start_CELL 1 , end_CELL start_CELL otherwise end_CELL end_ROW(4)
r(B)={1,iffrF,qbQb,(obe,cbe)Eqbsuch that(ϕrj<cbeLg)(ϕrj+Lr>obe+Lg)0,otherwise𝑟𝐵cases1formulae-sequenceifsubscript𝑓𝑟𝐹formulae-sequencesubscript𝑞𝑏subscript𝑄𝑏subscript𝑜𝑏𝑒subscript𝑐𝑏𝑒subscript𝐸subscript𝑞𝑏missing-subexpressionsuch thatmissing-subexpressionsubscriptitalic-ϕ𝑟𝑗subscript𝑐𝑏𝑒subscript𝐿𝑔subscriptitalic-ϕ𝑟𝑗subscript𝐿𝑟subscript𝑜𝑏𝑒subscript𝐿𝑔missing-subexpression0otherwiser(B)=\begin{cases}1,&\begin{array}[]{@{}l@{\quad}l}\textbf{if }\exists f_{r}%\in F,\exists q_{b}\in Q_{b},\exists(o_{be},c_{be})\in E_{q_{b}}\\\text{such that }\\(\phi_{rj}<c_{be}-L_{g})\land(\phi_{rj}+L_{r}>o_{be}+L_{g})\end{array}\\0,&\textbf{otherwise}\end{cases}italic_r ( italic_B ) = { start_ROW start_CELL 1 , end_CELL start_CELL start_ARRAY start_ROW start_CELL if ∃ italic_f start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT ∈ italic_F , ∃ italic_q start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT ∈ italic_Q start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT , ∃ ( italic_o start_POSTSUBSCRIPT italic_b italic_e end_POSTSUBSCRIPT , italic_c start_POSTSUBSCRIPT italic_b italic_e end_POSTSUBSCRIPT ) ∈ italic_E start_POSTSUBSCRIPT italic_q start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT end_POSTSUBSCRIPT end_CELL start_CELL end_CELL end_ROW start_ROW start_CELL such that end_CELL start_CELL end_CELL end_ROW start_ROW start_CELL ( italic_ϕ start_POSTSUBSCRIPT italic_r italic_j end_POSTSUBSCRIPT < italic_c start_POSTSUBSCRIPT italic_b italic_e end_POSTSUBSCRIPT - italic_L start_POSTSUBSCRIPT italic_g end_POSTSUBSCRIPT ) ∧ ( italic_ϕ start_POSTSUBSCRIPT italic_r italic_j end_POSTSUBSCRIPT + italic_L start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT > italic_o start_POSTSUBSCRIPT italic_b italic_e end_POSTSUBSCRIPT + italic_L start_POSTSUBSCRIPT italic_g end_POSTSUBSCRIPT ) end_CELL start_CELL end_CELL end_ROW end_ARRAY end_CELL end_ROW start_ROW start_CELL 0 , end_CELL start_CELL otherwise end_CELL end_ROW(5)

The leftmost part of the reward function in (2) ensures compliance with several critical constraints: it verifies upon acceptance of the new flow r𝑟ritalic_r whether all flows fiFsubscript𝑓𝑖𝐹f_{i}\in Fitalic_f start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∈ italic_F reach within deadline (3), prevents r𝑟ritalic_r from being scheduled in switch j𝑗jitalic_j during the time slots allocated to existing flows (4), and ensures adherence to the preemption guard duration Lgsubscript𝐿𝑔L_{g}italic_L start_POSTSUBSCRIPT italic_g end_POSTSUBSCRIPT requirements of IEEE 802.1 Qbu (5) when occupying time slots left for BE flows. The variables ose,obesubscript𝑜𝑠𝑒subscript𝑜𝑏𝑒o_{se},o_{be}italic_o start_POSTSUBSCRIPT italic_s italic_e end_POSTSUBSCRIPT , italic_o start_POSTSUBSCRIPT italic_b italic_e end_POSTSUBSCRIPT and cse,cbesubscript𝑐𝑠𝑒subscript𝑐𝑏𝑒c_{se},c_{be}italic_c start_POSTSUBSCRIPT italic_s italic_e end_POSTSUBSCRIPT , italic_c start_POSTSUBSCRIPT italic_b italic_e end_POSTSUBSCRIPT denote the opening and closing times for gate events of the scheduled queues Eqssubscript𝐸𝑞𝑠E_{qs}italic_E start_POSTSUBSCRIPT italic_q italic_s end_POSTSUBSCRIPT and BE queue Eqbsubscript𝐸𝑞𝑏E_{qb}italic_E start_POSTSUBSCRIPT italic_q italic_b end_POSTSUBSCRIPT respectively.Penalties are applied to the agent if there’s an increase in the GCL length counter β𝛽\betaitalic_β, the number of dropped flows Fdsubscript𝐹𝑑F_{d}italic_F start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT across queues and the coefficient of variation of jitter JσJμsubscript𝐽𝜎subscript𝐽𝜇\frac{J_{\sigma}}{J_{\mu}}divide start_ARG italic_J start_POSTSUBSCRIPT italic_σ end_POSTSUBSCRIPT end_ARG start_ARG italic_J start_POSTSUBSCRIPT italic_μ end_POSTSUBSCRIPT end_ARG.

IV-C4 Policy Optimization with TD3 algorithm

As outlined in Algorithm 1, first the policy parameter ϕitalic-ϕ\phiitalic_ϕ and the Q-function parameters θ1subscript𝜃1\theta_{1}italic_θ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and θ2subscript𝜃2\theta_{2}italic_θ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT are initialized. Additionally, discounted reward factor γ𝛾\gammaitalic_γ, the frequency of delayed policy updates η𝜂\etaitalic_η, the proportion factor of target network updates τ𝜏\tauitalic_τ and β𝛽\betaitalic_β are also initialized.

For each iteration t𝑡titalic_t, the agent explores the environment by selecting actions based on its current policy, adding some noise (as stated in line11) for exploration purposes. During the gradient step, a batch of experiences is randomly sampled from the stored replay buffer B𝐵Bitalic_B and used to update the Q-function, which evaluates the quality of actions taken by the agent. As stated in line13, the Bellman equation[24] is used to calculate a target value where the smaller of the two outputs of target critic networks is fed into the equation to avoid overestimating the Q-value. The policy is then updated in a direction that maximizes the Q-function, indicating better action choices. The parameters of the target networks are not updated consecutively like those of the actor and critic networks to avoid overestimation during the training process. These parameters are updated after timestep η𝜂\etaitalic_η and according to hyperparameter τ𝜏\tauitalic_τ as shown in line20 and line21. The algorithm continually refines its policy through iteration, aiming to pinpoint the optimal action parameters for the agent within its environment until β=ω𝛽𝜔\beta=\omegaitalic_β = italic_ω. Then the Reschedule Trigger Generator module halts the algorithm and falls back to the offline scheduler.

1:Initialize:ϕitalic-ϕ\phiitalic_ϕ, θ1subscript𝜃1\theta_{1}italic_θ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, θ2subscript𝜃2\theta_{2}italic_θ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, \mathcal{B}caligraphic_B, η𝜂\etaitalic_η, γ𝛾\gammaitalic_γ,τ𝜏\tauitalic_τ and β𝛽\betaitalic_β

2:where η,γ,τ𝜂𝛾𝜏absent\eta,\gamma,\tau\initalic_η , italic_γ , italic_τ ∈ [0,1].

3:Set θ1θ1subscriptsuperscript𝜃1subscript𝜃1\theta^{\prime}_{1}\leftarrow\theta_{1}italic_θ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ← italic_θ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, θ2θ2subscriptsuperscript𝜃2subscript𝜃2\theta^{\prime}_{2}\leftarrow\theta_{2}italic_θ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ← italic_θ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, ϕϕsuperscriptitalic-ϕitalic-ϕ\phi^{\prime}\leftarrow\phiitalic_ϕ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ← italic_ϕ.

4:repeat

5:Observe s𝑠sitalic_s and select aπϕ(st)+ϵsimilar-to𝑎subscript𝜋italic-ϕsubscript𝑠𝑡italic-ϵa\sim\pi_{\phi}(s_{t})+\epsilonitalic_a ∼ italic_π start_POSTSUBSCRIPT italic_ϕ end_POSTSUBSCRIPT ( italic_s start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) + italic_ϵ, ϵ𝒩(0,σ)similar-toitalic-ϵ𝒩0𝜎\epsilon\sim\mathcal{N}(0,\sigma)italic_ϵ ∼ caligraphic_N ( 0 , italic_σ )

6:Execute a𝑎aitalic_a and observe next state ssuperscript𝑠s^{\prime}italic_s start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT and reward r𝑟ritalic_r

7:Store transition (s,a,r,s)𝑠𝑎𝑟superscript𝑠(s,a,r,s^{\prime})( italic_s , italic_a , italic_r , italic_s start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) in \mathcal{B}caligraphic_B

8:fort=1,,χ𝑡1𝜒t=1,\dots,\chiitalic_t = 1 , … , italic_χdo

9:Sample mini-batch of N𝑁Nitalic_N transitions from \mathcal{B}caligraphic_B

10:Compute target actions:

11:a~πϕ(s)+clip(𝒩(0,σ~),c,c)~𝑎subscript𝜋superscriptitalic-ϕsuperscript𝑠clip𝒩0~𝜎𝑐𝑐\tilde{a}\leftarrow\pi_{\phi^{\prime}}(s^{\prime})+\text{clip}(\mathcal{N}(0,%\tilde{\sigma}),-c,c)over~ start_ARG italic_a end_ARG ← italic_π start_POSTSUBSCRIPT italic_ϕ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_s start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) + clip ( caligraphic_N ( 0 , over~ start_ARG italic_σ end_ARG ) , - italic_c , italic_c )

12:Compute targets:

13:yr+γ(1d)mini=1,2Qθi(s,a~)𝑦𝑟𝛾1𝑑subscript𝑖12subscript𝑄superscriptsubscript𝜃𝑖superscript𝑠~𝑎y\leftarrow r+\gamma(1-d)\min_{i=1,2}Q_{\theta_{i}^{\prime}}(s^{\prime},\tilde%{a})italic_y ← italic_r + italic_γ ( 1 - italic_d ) roman_min start_POSTSUBSCRIPT italic_i = 1 , 2 end_POSTSUBSCRIPT italic_Q start_POSTSUBSCRIPT italic_θ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_s start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , over~ start_ARG italic_a end_ARG )

14:Update critics:

15:θiargminθiN1(yQθi(s,a))2subscript𝜃𝑖subscriptargminsubscript𝜃𝑖superscript𝑁1superscript𝑦subscript𝑄subscript𝜃𝑖𝑠𝑎2\theta_{i}\leftarrow\text{argmin}_{\theta_{i}}N^{-1}\sum(y-Q_{\theta_{i}}(s,a)%)^{2}italic_θ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ← argmin start_POSTSUBSCRIPT italic_θ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_N start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ∑ ( italic_y - italic_Q start_POSTSUBSCRIPT italic_θ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_s , italic_a ) ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT

16:iftmodη=0modulo𝑡𝜂0t\mod\eta=0italic_t roman_mod italic_η = 0then

17:Update policy: ϕJ(ϕ)subscriptitalic-ϕ𝐽italic-ϕ\nabla_{\phi}J(\phi)∇ start_POSTSUBSCRIPT italic_ϕ end_POSTSUBSCRIPT italic_J ( italic_ϕ )

18:=N1aQθ1(s,a)|a=πϕ(s)ϕπϕ(s)absentevaluated-atsuperscript𝑁1subscript𝑎subscript𝑄subscript𝜃1𝑠𝑎𝑎subscript𝜋italic-ϕ𝑠subscriptitalic-ϕsubscript𝜋italic-ϕ𝑠=N^{-1}\sum\nabla_{a}Q_{\theta_{1}}(s,a)|_{a=\pi_{\phi}(s)}\nabla_{\phi}\pi_{%\phi}(s)= italic_N start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ∑ ∇ start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT italic_Q start_POSTSUBSCRIPT italic_θ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_s , italic_a ) | start_POSTSUBSCRIPT italic_a = italic_π start_POSTSUBSCRIPT italic_ϕ end_POSTSUBSCRIPT ( italic_s ) end_POSTSUBSCRIPT ∇ start_POSTSUBSCRIPT italic_ϕ end_POSTSUBSCRIPT italic_π start_POSTSUBSCRIPT italic_ϕ end_POSTSUBSCRIPT ( italic_s )

19:Update target networks:

20:θiτθi+(1τ)θisuperscriptsubscript𝜃𝑖𝜏subscript𝜃𝑖1𝜏superscriptsubscript𝜃𝑖\theta_{i}^{\prime}\leftarrow\tau\theta_{i}+(1-\tau)\theta_{i}^{\prime}italic_θ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ← italic_τ italic_θ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT + ( 1 - italic_τ ) italic_θ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT

121:ϕτϕ+(1τ)ϕsuperscriptitalic-ϕ𝜏italic-ϕ1𝜏superscriptitalic-ϕ\phi^{\prime}\leftarrow\tau\phi+(1-\tau)\phi^{\prime}italic_ϕ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ← italic_τ italic_ϕ + ( 1 - italic_τ ) italic_ϕ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT

22:endif

23:endfor

24:untilβ=ω𝛽𝜔\beta=\omegaitalic_β = italic_ω

V Experimental Evaluation and Result Analysis

We trained the GCN-TD3 algorithm on a robust setup: a virtual machine powered by an Intel Xeon CPU, 32 GB of RAM, Ubuntu 22.04 LTS, and an Nvidia Quadro RTX GPU, running TensorFlow 2.0.

AI-based Dynamic Schedule Calculation in Time Sensitive Networks using GCN-TD3††thanks: This work has been funded by the Federal Ministry of Digital and Transport Germany under grant number 19OI22015F. The authors are responsible for the content of this paper. (4)
AI-based Dynamic Schedule Calculation in Time Sensitive Networks using GCN-TD3††thanks: This work has been funded by the Federal Ministry of Digital and Transport Germany under grant number 19OI22015F. The authors are responsible for the content of this paper. (5)

For solving integer linear programming (ILP), we turned to the academic version of the CPLEX solver. The training configuration for the TD3 agent involved statically defined source and destination nodes for each flow, with TT flows randomly selected based on the tuple (Period,Size)=((50B,250us),(100B,500us),(500B,1ms))𝑃𝑒𝑟𝑖𝑜𝑑𝑆𝑖𝑧𝑒50𝐵250𝑢𝑠100𝐵500𝑢𝑠500𝐵1𝑚𝑠(Period,Size)=((50B,250us),(100B,500us),(500B,1ms))( italic_P italic_e italic_r italic_i italic_o italic_d , italic_S italic_i italic_z italic_e ) = ( ( 50 italic_B , 250 italic_u italic_s ) , ( 100 italic_B , 500 italic_u italic_s ) , ( 500 italic_B , 1 italic_m italic_s ) ), each having a 1 ms deadline. Two AVB flows of size 1000B, 1200B and 4 ms period and 2 ms deadline, were scheduled statically alongside 2 TT flows using ILP. TD3’s hyperparameters were aligned with those recommended in [24, Tab. 3 ], with ω𝜔\omegaitalic_ω set to 128. Over 5000 epochs, the algorithm was trained, with each epoch determined by β=ω𝛽𝜔\beta=\omegaitalic_β = italic_ω. Performance metrics tracked per epoch included the total and requested number of flows, standard deviation of jitter, average end-to-end latency and flow admission rate.

Our study adopts a fully centralized configuration model as outlined by IEEE 802.1 Qcc, utilizing powerful servers in Centralized Network Configuration to run scheduling algorithms. These schedules are then communicated to TSN bridges via standard NETCONF protocol. The primary hardware limitation considered is the GCL length, which significantly influences our evaluation of scheduling algorithms to demonstrate the feasibility of the algorithms on commercial switches. To benchmark the effectiveness of GCN-TD3, it was compared against GCN-DDQN, which employs a time domain segmentation strategy typical in dynamic scheduling RL research [2], [10], and DDPG, favoured by several studies for its policy gradient approach [11]. The performance comparison, depicted in Fig.4, highlights GCN-TD3’s superior and stable flow accommodation capability within the constraints of GCL length. Although GCN-DDPG initially outperforms GCN-TD3, it fails to converge within the allotted time.

AI-based Dynamic Schedule Calculation in Time Sensitive Networks using GCN-TD3††thanks: This work has been funded by the Federal Ministry of Digital and Transport Germany under grant number 19OI22015F. The authors are responsible for the content of this paper. (6)

It is important to consider metrics beyond just flow numbers for comprehensive evaluation, as shown in Fig.5, where GCN-TD3 nearly achieves a 90% admission rate. Even though GCN-DDQN had impressive admission rates, its total flow capacity was lower due to the inherent limitations of time domain segmentation. A key objective of our study is to minimize jitter while adhering to deadlines. As depicted in Fig.6, although all algorithms initially struggled to balance jitter and latency, GCN-TD3 eventually achieved the lowest jitter (nearly 2us) while maintaining low latency and accommodating the most flows among the algorithms compared.It is noteworthy that our initial intention was to evaluate the models using real network data in addition to doing simulation. However, as highlighted in [25], there is a lack of publicly available datasets for TSN that can be used for evaluating or benchmarking algorithms. Consequently, we plan to test the algorithm in future in our own TSN testbed once the setup is completed.

VI Conclusion and Future Work

In this paper, we introduce the GCN-TD3 framework, a cutting-edge AI-based solution designed for dynamic flow scheduling. This robust framework is carefully designed to align with the stringent requirements of deployable hardware and the intricate dynamics of network flows. Our extensive testing demonstrates that GCN-TD3 outperforms current state-of-the-art Reinforcement Learning methods, scheduling 30% more TT flows and reducing jitter by 3%.

However, the GCN-TD3 framework does have its limitations. It currently does not support dynamic path determination for flows and necessitates retraining with any change in the number of switches. Future research will aim to address these challenges by developing capabilities for dynamic routing and increasing reliability through the incorporation of redundant paths. Additionally, we plan to validate our findings within a physical testbed to facilitate the implementation of a digital twin system, further proving the effectiveness and applicability of our framework in real-world scenarios.

References

  • [1]T.Stüber, L.Osswald, S.Lindner, and M.Menth, “A survey of scheduling in time-sensitive networking (tsn),” ArXiv, 2022.
  • [2]J.Min, Y.Kim, M.Kim, J.Paek, and R.Govindan, “Reinforcement learning based routing for time-aware shaper scheduling in time-sensitive networks,” Computer Networks, 2023.
  • [3]N.Nayak, F.Dürr, and K.Rothermel, “Incremental flow scheduling and routing in time-sensitive software-defined networks,” IEEE Transactions on Industrial Informatics, 2018.
  • [4]M.Raagaard, P.Pop, M.Gutiérrez, and W.Steiner, “Runtime reconfiguration of time-sensitive networking (tsn) schedules for fog computing,” Fog World Congress, 2017.
  • [5]J.Li, H.Xiong, Q.Li, F.Xiong, and J.Feng, “Run-time reconfiguration strategy and implementation of time-triggered networks,” Electronics, 2022.
  • [6]A.A. Syed, S.Ayaz, T.Leinmüller, and M.Chandra, “Dynamic scheduling and routing for tsn based in-vehicle networks,” 2021.
  • [7]N.Wang, Q.Yu, H.Wan, X.Song, and X.Zhao, “Adaptive scheduling for multicluster time-triggered train communication networks,” IEEE Transactions on Industrial Informatics, 2019.
  • [8]L.Yang, Y.Wei, F.Yu, and Z.Han, “Joint routing and scheduling optimization in time-sensitive networks using graph-convolutional-network-based deep reinforcement learning,” IEEE Internet of Things Journal, 2022.
  • [9]C.Zhong, H.Jia, H.Wan, and X.Zhao, “Drls: A deep reinforcement learning based scheduler for time-triggered ethernet,” 2021.
  • [10]H.Jia, Y.Jiang, C.Zhong, H.Wan, and X.Zhao, “Ttdeep: Time-triggered scheduling for real-time ethernet via deep reinforcement learning,” Global Communications Conference, 2021.
  • [11]X.Wang, H.Yao, T.Mai, T.Nie, L.Zhu, and Y.Liu, “Deep reinforcement learning aided no-wait flow scheduling in time-sensitive networks,” IEEE Wireless Communications and Networking Conference, 2022.
  • [12]Industrial Internet Consortium, “Time sensitive networks for flexible manufacturing testbed characterization and mapping of converged traffic types,” Available online: https://www.iiconsortium.org/pdf, (accessed on 21 March 2024).
  • [13]A.Nasrallah, V.Balasubramanian, A.S. Thyagaturu, M.Reisslein, and H.Elbakoury, “Reconfiguration algorithms for high precision communications in time sensitive networks,” 2019.
  • [14]Q.Yu, H.Wan, X.Zhao, Y.Gao, and M.Gu, “Online scheduling for dynamic vm migration in multicast time-sensitive networks,” IEEE Transactions on Industrial Informatics, 2020.
  • [15]K.Polachan, C.Singh, and V.PrabhakarT., “Decentralized dynamic gate scheduling of ieee 802.1qbv time aware shaper and a tsn simulator for tactile cyber-physical systems,” IM, 2021.
  • [16]C.Gärtner, A.Rizk, B.Koldehofe, R.Hark, R.Guillaume, and R.Steinmetz, “Leveraging flexibility of time-sensitive networks for dynamic reconfigurability,” 2021 IFIP Networking Conference (IFIP Networking), 2021.
  • [17]InnoRoute. Trustnode router. Accessed: 21 Mar. 2024. [Online]. Available: https://innoroute.com/
  • [18]S.S. Craciunas, R.S. Oliver, M.Chmelík, and W.Steiner, “Scheduling real-time communication in ieee 802.1qbv time sensitive networks,” in Proceedings of the 24th International Conference on Real-Time Networks and Systems - RTNS ’16.Brest, France: ACM Press, 2016, pp. 183–192.
  • [19]F.Chen, Y.-C. Wang, B.Wang, and C.-C.J. Kuo, “Graph representation learning: a survey,” APSIPA Transactions on Signal and Information Processing, vol.9, p. e15, 2020.
  • [20]IEEE 802.1 Working Group, “Iec/ieee 60802 tsn profile for industrial automation,” https://www.ieee802.org/1/files/public/docs2018, 2021, accessed: 26 Mar. 2024.
  • [21]P.Pop, M.L. Raagaard, S.S. Craciunas, and W.Steiner, “Design optimisation of cyber-physical distributed systems using ieee time-sensitive networks,” IET Cyber-Physical Systems: Theory and Applications, vol.1, no.1, pp. 86–94, 2016.
  • [22]F.Dürr and N.G. Nayak, “No-wait packet scheduling for ieee time-sensitive networks (tsn),” in Proceedings of the 24th International Conference on Real-Time Networks and Systems.New York, NY, USA: Association for Computing Machinery, 2016.
  • [23]T.N. Kipf and M.Welling, “Semi-supervised classification with graph convolutional networks,” in 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, 2017.
  • [24]S.Fujimoto, H.van Hoof, and D.Meger, “Addressing function approximation error in actor-critic methods,” 2018.
  • [25]D.Ergenç, N.S. Bülbül, L.Maile, A.Arestova, and M.Fischer, “Towards synthesizing datasets for ieee 802.1 time-sensitive networking,” 2023.
AI-based Dynamic Schedule Calculation in Time Sensitive Networks using GCN-TD3
††thanks: This work has been funded by the Federal Ministry of Digital and Transport Germany under grant number 19OI22015F. The authors are responsible for the content of this  (2024)

References

Top Articles
Latest Posts
Article information

Author: Cheryll Lueilwitz

Last Updated:

Views: 6391

Rating: 4.3 / 5 (54 voted)

Reviews: 85% of readers found this page helpful

Author information

Name: Cheryll Lueilwitz

Birthday: 1997-12-23

Address: 4653 O'Kon Hill, Lake Juanstad, AR 65469

Phone: +494124489301

Job: Marketing Representative

Hobby: Reading, Ice skating, Foraging, BASE jumping, Hiking, Skateboarding, Kayaking

Introduction: My name is Cheryll Lueilwitz, I am a sparkling, clean, super, lucky, joyous, outstanding, lucky person who loves writing and wants to share my knowledge and understanding with you.