5. Conclusion and future work
We introduced a policy- and learning-based paradigm for emergency networking in conditionally auctioned licensed spectrum. Central to this paradigm is the concept of mission policies, which specify the Quality of Service (QoS) for both emergency and incumbent network traffic. Our approach marks a shift from the established primary–secondary model, which relies on fixed priorities, and instead enables graceful degradation of incumbent-network QoS according to the mission-policy specifications. To realize this paradigm, we developed RescueNet, a Multi-Agent Reinforcement Learning (MARL)-based communication framework. The proposed solution extends beyond the emergency scenario and has the potential to enable cognitive ad hoc network operation in any frequency band, licensed or unlicensed. We verified the convergence and policy conformance of RescueNet through simulations in ns-3.

Our future work includes learning the reward function via Inverse Reinforcement Learning (IRL). Because the scalar reward function does not provide optimal performance in a dynamically changing environment, we will study the IRL problem to optimize the reward function when a priori knowledge becomes available on the fly. The IRL problem consists of finding a reward function that explains observed behavior. We will focus initially on the setting in which complete prior knowledge and the mission policy are given; we will then develop methods to choose among optimal reward functions, since multiple candidate reward functions may exist.
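The ambiguity among reward functions can be made concrete with a standard formulation of IRL for finite Markov decision processes (this sketch follows the classical characterization by Ng and Russell and is not part of this work). A reward vector $\mathbf{R}$ is consistent with an observed policy that takes action $a_1$ in every state if and only if

```latex
(\mathbf{P}_{a_1} - \mathbf{P}_{a})\,(\mathbf{I} - \gamma \mathbf{P}_{a_1})^{-1}\,\mathbf{R} \succeq \mathbf{0}
\qquad \forall\, a \in \mathcal{A} \setminus \{a_1\},
```

where $\mathbf{P}_a$ is the state-transition matrix under action $a$ and $\gamma$ is the discount factor. Because $\mathbf{R} = \mathbf{0}$ (and many degenerate rewards) trivially satisfy this constraint, the observed behavior never pins down a unique reward function, which is precisely why a selection criterion among candidate reward functions is needed.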