gymwipe.envs.counter_traffic module
A simple Gym environment using the Simple network devices for demonstration purposes
class CounterTrafficEnv
Bases: gymwipe.envs.core.BaseEnv
An environment for testing reinforcement learning with three devices:
- Two network devices that send a configurable amount of data to each other
- A simple RRM operating an interpreter for that use case
Optimally, a learning agent will fit the length of the assignment intervals to the amount of data sent by the devices.
class SenderDevice(name, xPos, yPos, frequencyBand, packetMultiplicity)
Bases: gymwipe.networking.devices.SimpleNetworkDevice

A device that sends packets containing increasing COUNTER_BYTE_LENGTH-byte integers. Every COUNTER_INTERVAL seconds, a packet with the current integer is sent packetMultiplicity times.
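The traffic pattern described above can be sketched in plain Python. This is an illustration only, not the real device (which runs inside gymwipe's simulation); the constant values and the big-endian serialization of the counter are assumptions made for the example:

```python
# Sketch of SenderDevice's traffic pattern (illustration only).
COUNTER_BYTE_LENGTH = 2   # assumed value for illustration
PACKET_MULTIPLICITY = 3   # stands in for the packetMultiplicity parameter

def counter_payloads(n_intervals):
    """Yield the payload bytes sent during the first n_intervals intervals."""
    counter = 1
    for _ in range(n_intervals):
        # Assumed big-endian serialization of the current counter value
        payload = counter.to_bytes(COUNTER_BYTE_LENGTH, "big")
        # The same packet is sent packetMultiplicity times per interval
        for _ in range(PACKET_MULTIPLICITY):
            yield payload
        counter += 1
```

Each interval thus produces `PACKET_MULTIPLICITY` identical packets before the counter increments.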
class CounterTrafficInterpreter(env)
Bases: gymwipe.envs.core.Interpreter
reset()
This method is invoked when the environment is reset. Override it with your initialization tasks if needed.
onPacketReceived(senderIndex, receiverIndex, payload)
Is invoked whenever the RRM receives a packet that is not addressed to it.

Parameters:
- senderIndex (int) – The device index of the received packet’s sender (as in the gym environment’s action space)
- receiverIndex (int) – The device index of the received packet’s receiver (as in the gym environment’s action space)
- payload (Transmittable) – The received packet’s payload
onFrequencyBandAssignment(deviceIndex, duration)
Is invoked whenever the RRM assigns the frequency band.

Parameters:
- deviceIndex (int) – The index (as in the gym environment’s action space) of the device that the frequency band is assigned to
- duration (int) – The duration of the assignment in multiples of TIME_SLOT_LENGTH
getReward()
The reward depends on the change in the difference between the counter values received from the two devices: if the difference became smaller, the reward is the (positive) decrease, limited to 10; otherwise, it is the (negative) increase, limited to -10. This scheme is a result of trial and error and is most likely far from perfect.
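The clamping rule just described can be written as a small standalone function. This is a sketch; the function name and parameters are hypothetical, and only the ±10 limits come from the docstring:

```python
def counter_difference_reward(prev_diff, new_diff, limit=10):
    """Reward for the change in the difference between the two devices'
    latest counter values, clipped to [-limit, limit] (hypothetical helper)."""
    change = prev_diff - new_diff  # positive if the difference shrank
    # Clip the reward to the [-limit, limit] range
    return max(-limit, min(limit, change))
```

For example, a difference shrinking from 5 to 2 yields a reward of 3, while a difference growing from 2 to 20 is clipped to -10.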
step(action)
Run one timestep of the environment’s dynamics. When the end of an episode is reached, you are responsible for calling reset() to reset the environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).
Parameters: action (object) – an action provided by the agent

Returns:
- observation (object): agent’s observation of the current environment
- reward (float): amount of reward returned after the previous action
- done (bool): whether the episode has ended, in which case further step() calls will return undefined results
- info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)