gymwipe.envs.counter_traffic module
A simple Gym environment using the simple network devices (SimpleNetworkDevice) for demonstration purposes.
class CounterTrafficEnv
Bases: gymwipe.envs.core.BaseEnv

An environment for testing reinforcement learning with three devices:
- Two network devices that send a configurable amount of data to each other
- A simple radio resource manager (RRM) operating an interpreter for that use case
Ideally, a learning agent learns to fit the length of the frequency band assignment intervals to the amount of data sent by the devices.
class SenderDevice(name, xPos, yPos, frequencyBand, packetMultiplicity)
Bases: gymwipe.networking.devices.SimpleNetworkDevice

A device sending packets with increasing COUNTER_BYTE_LENGTH-byte integers. Every COUNTER_INTERVAL seconds, a packet containing the current integer is sent packetMultiplicity times.
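The traffic pattern described above can be sketched as a pure function that lists which payloads a sender emits and when. This is a minimal illustration, not the SenderDevice implementation: the values of COUNTER_BYTE_LENGTH and COUNTER_INTERVAL, the counter starting at 1, the first packet being sent after one full interval, and the big-endian encoding are all assumptions, and the helper name counter_packets is hypothetical.

```python
from typing import List, Tuple

# Hypothetical values for illustration only; the real constants live in
# gymwipe.envs.counter_traffic and may differ.
COUNTER_BYTE_LENGTH = 2
COUNTER_INTERVAL = 0.001  # seconds


def counter_packets(num_intervals: int,
                    packet_multiplicity: int,
                    interval: float = COUNTER_INTERVAL,
                    byte_length: int = COUNTER_BYTE_LENGTH) -> List[Tuple[float, bytes]]:
    """Return (send time, payload) pairs for the first `num_intervals` intervals.

    Assumptions: the counter starts at 1, the first send happens after one
    full interval, and integers are encoded big-endian.
    """
    packets: List[Tuple[float, bytes]] = []
    for tick in range(num_intervals):
        counter = tick + 1
        # int.to_bytes raises OverflowError once the counter no longer fits
        # into byte_length bytes; the real device may handle this differently.
        payload = counter.to_bytes(byte_length, byteorder="big")
        send_time = (tick + 1) * interval
        # The same packet is queued packet_multiplicity times per interval.
        packets.extend([(send_time, payload)] * packet_multiplicity)
    return packets
```

With packet_multiplicity=2, every counter value appears twice in a row. A sender therefore produces packet_multiplicity * COUNTER_BYTE_LENGTH payload bytes per interval, which is the quantity the learning agent has to match with its assignment lengths.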
class CounterTrafficInterpreter(env)
Bases: gymwipe.envs.core.Interpreter
reset()
This method is invoked when the environment is reset. Override it with your initialization tasks if needed.
onPacketReceived(senderIndex, receiverIndex, payload)
Is invoked whenever the RRM receives a packet that is not addressed to it.
Parameters:
- senderIndex (int) – The device index of the received packet's sender (as in the gym environment's action space)
- receiverIndex (int) – The device index of the received packet's receiver (as in the gym environment's action space)
- payload (Transmittable) – The received packet's payload
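Because the reward compares the values received from both devices, an interpreter needs to remember the latest counter value per sender. The class below is a hypothetical stand-in (CounterTracker is not part of gymwipe) that shows one way to do this. It assumes the payload is the raw counter bytes in big-endian order; the real callback receives a gymwipe Transmittable, not plain bytes.

```python
from typing import Dict, Optional


class CounterTracker:
    """Hypothetical helper: records the latest counter value per sender."""

    def __init__(self) -> None:
        self.latest: Dict[int, int] = {}

    def reset(self) -> None:
        # Forget all values, e.g. when the environment is reset.
        self.latest.clear()

    def on_packet_received(self, sender_index: int, receiver_index: int,
                           payload: bytes) -> None:
        # Assumption: payload is the raw big-endian counter (plain bytes,
        # not a Transmittable). receiver_index is unused in this sketch.
        self.latest[sender_index] = int.from_bytes(payload, byteorder="big")

    def difference(self) -> Optional[int]:
        # Absolute difference between the two senders' latest values, or
        # None until exactly two senders have been heard from.
        if len(self.latest) != 2:
            return None
        first, second = self.latest.values()
        return abs(first - second)
```

Storing only the latest value per sender keeps the state small. Repeated copies of the same packet, caused by packetMultiplicity, simply overwrite the entry with an identical value.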
onFrequencyBandAssignment(deviceIndex, duration)
Is invoked whenever the RRM assigns the frequency band.
Parameters:
- deviceIndex (int) – The index (as in the gym environment's action space) of the device that the frequency band is assigned to.
- duration (int) – The duration of the assignment in multiples of TIME_SLOT_LENGTH
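Since duration counts time slots, the assigned airtime in seconds is duration * TIME_SLOT_LENGTH. The sketch below accumulates that airtime per device. The class name AssignmentLog and the TIME_SLOT_LENGTH value are hypothetical placeholders, not gymwipe definitions.

```python
from collections import defaultdict
from typing import DefaultDict

# Hypothetical value for illustration; use the constant gymwipe defines.
TIME_SLOT_LENGTH = 1e-6  # seconds per time slot


class AssignmentLog:
    """Hypothetical helper: total frequency band airtime per device index."""

    def __init__(self, time_slot_length: float = TIME_SLOT_LENGTH) -> None:
        self.time_slot_length = time_slot_length
        self.airtime: DefaultDict[int, float] = defaultdict(float)

    def on_frequency_band_assignment(self, device_index: int, duration: int) -> None:
        # duration is given in time slots, so convert it to seconds.
        self.airtime[device_index] += duration * self.time_slot_length
```

Two assignments of 4 and 2 slots to the same device, with a slot length of 0.5 s, add up to 3.0 s of airtime for that device.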
getReward()
The reward depends on how the difference between the values received from the two devices changes. If the difference became smaller, the reward is the (positive) amount by which it shrank, capped at 10. Otherwise, the reward is the (negative or zero) change, bounded below by -10. This rule is the result of trial and error and is most likely far from optimal.
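The rule above can be written as a clipped difference. This sketch follows the description rather than the gymwipe source. The function name counter_reward and its arguments are hypothetical, and treating an unchanged difference as a reward of 0 is an assumption.

```python
def counter_reward(previous_difference: int, current_difference: int,
                   limit: float = 10.0) -> float:
    """Hypothetical reward: positive if the difference shrank, clipped to [-limit, limit]."""
    # Positive when the gap between the two devices' values became smaller.
    change = previous_difference - current_difference
    if change > 0:
        return float(min(change, limit))
    # The gap grew or stayed the same (assumption: unchanged yields 0).
    return float(max(change, -limit))
```

For example, a gap shrinking from 20 to 15 yields 5.0, while a gap growing from 0 to 50 is clipped to -10.0.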
step(action)
Run one timestep of the environment's dynamics. When the end of an episode is reached, you are responsible for calling reset() to reset this environment's state.
Accepts an action and returns a tuple (observation, reward, done, info).
Parameters: action (object) – an action provided by the agent
Returns:
- observation (object) – the agent's observation of the current environment
- reward (float) – the amount of reward returned after the previous action
- done (bool) – whether the episode has ended, in which case further step() calls will return undefined results
- info (dict) – auxiliary diagnostic information (helpful for debugging, and sometimes learning)
Return type: tuple
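The contract above implies the usual agent loop: reset once, then call step() until done is True, and call reset() again before starting another episode. To keep the sketch self-contained, ToyEnv is a hypothetical stand-in that only mimics the (observation, reward, done, info) return shape. With the real environment, env would be a CounterTrafficEnv instance, whose constructor arguments are not documented in this section.

```python
from typing import Any, Callable, Dict, Tuple


class ToyEnv:
    """Hypothetical stand-in following the same reset()/step() contract."""

    def __init__(self, max_steps: int = 5) -> None:
        self.max_steps = max_steps
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t  # initial observation

    def step(self, action: Any) -> Tuple[int, float, bool, Dict[str, Any]]:
        self.t += 1
        reward = 1.0 if action == 0 else 0.0  # arbitrary toy reward
        done = self.t >= self.max_steps
        return self.t, reward, done, {"t": self.t}


def run_episode(env: Any, policy: Callable[[Any], Any]) -> float:
    """Run one episode and return the summed reward."""
    observation = env.reset()
    total = 0.0
    done = False
    while not done:
        # Stepping after done=True would give undefined results, so stop here.
        observation, reward, done, info = env.step(policy(observation))
        total += reward
    return total
```

The loop stops as soon as done is True, so it never calls step() on a finished episode.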