gymwipe.envs.counter_traffic module

A simple Gym environment that uses the simple network devices from gymwipe.networking.devices for demonstration purposes.

class CounterTrafficEnv[source]

Bases: gymwipe.envs.core.BaseEnv

An environment for testing reinforcement learning with three devices:

  • Two network devices that send a configurable amount of data to each other
  • A simple RRM operating an interpreter for that use case

Ideally, a learning agent will adapt the length of the assignment intervals to the amount of data sent by the devices.
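
A minimal construction sketch (an illustration only; it assumes CounterTrafficEnv takes no constructor arguments, which this page does not state explicitly). A fuller interaction loop is sketched at the end of this section.

    from gymwipe.envs.counter_traffic import CounterTrafficEnv

    env = CounterTrafficEnv()  # assumption: no constructor arguments
    observation = env.reset()  # returns an initial observation of the system's state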

COUNTER_INTERVAL = 0.001[source]
COUNTER_BYTE_LENGTH = 2[source]
COUNTER_BOUND = 65536[source]
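
The counter bound matches the counter's byte length: a COUNTER_BYTE_LENGTH-byte unsigned integer can represent 2**(8 * COUNTER_BYTE_LENGTH) = 65536 distinct values. As a plain-Python sanity check:

    COUNTER_BYTE_LENGTH = 2
    COUNTER_BOUND = 65536

    # A 2-byte unsigned integer can represent 2**16 = 65536 distinct values.
    assert COUNTER_BOUND == 2 ** (8 * COUNTER_BYTE_LENGTH)
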
class SenderDevice(name, xPos, yPos, frequencyBand, packetMultiplicity)[source]

Bases: gymwipe.networking.devices.SimpleNetworkDevice

A device sending packets with increasing COUNTER_BYTE_LENGTH-byte integers. Every COUNTER_INTERVAL seconds, a packet with the current integer is sent packetMultiplicity times.

senderProcess()[source]
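
A rough, self-contained sketch of the timing described above, written with plain SimPy rather than the gymwipe device classes. The starting value, the wrap-around at COUNTER_BOUND, and the send_packet stub are illustrative assumptions, not the actual senderProcess implementation.

    import simpy

    COUNTER_INTERVAL = 0.001
    COUNTER_BOUND = 65536

    def send_packet(value):
        # Stand-in for the packet transmission performed by SenderDevice.
        print(f"sending counter value {value}")

    def sender_process(sim, multiplicity):
        counter = 0                                  # assumption: counting starts at zero
        while True:
            yield sim.timeout(COUNTER_INTERVAL)      # wait COUNTER_INTERVAL seconds
            for _ in range(multiplicity):            # send the value packetMultiplicity times
                send_packet(counter)
            counter = (counter + 1) % COUNTER_BOUND  # assumption: wrap around at COUNTER_BOUND

    sim = simpy.Environment()
    sim.process(sender_process(sim, multiplicity=3))
    sim.run(until=5 * COUNTER_INTERVAL)
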
class CounterTrafficInterpreter(env)[source]

Bases: gymwipe.envs.core.Interpreter

reset()[source]

This method is invoked when the environment is reset. Override it with any initialization tasks your interpreter requires.

onPacketReceived(senderIndex, receiverIndex, payload)[source]

Is invoked whenever the RRM receives a packet that is not addressed to it.

Parameters:
  • senderIndex (int) – The device index of the received packet’s sender (as in the gym environment’s action space)
  • receiverIndex (int) – The device index of the received packet’s receiver (as in the gym environment’s action space)
  • payload (Transmittable) – The received packet’s payload
onFrequencyBandAssignment(deviceIndex, duration)[source]

Is invoked whenever the RRM assigns the frequency band.

Parameters:
  • deviceIndex (int) – The index (as in the gym environment’s action space) of the device that the frequency band is assigned to.
  • duration (int) – The duration of the assignment in multiples of TIME_SLOT_LENGTH
getReward()[source]

The reward depends on the change in the difference between the counter values received from the two devices: if the difference has become smaller, the reward is that (positive) change, capped at 10; otherwise, it is the (negative) change, capped at -10. This scheme is the result of trial and error and is most likely far from perfect.
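
One way to express the described scheme in code (a hedged sketch; the variable names are illustrative and this is not the actual implementation):

    def compute_reward(previous_difference, current_difference):
        # Positive if the difference between the two received counter values
        # shrank, negative if it grew; clipped to the range [-10, 10].
        change = previous_difference - current_difference
        return max(-10, min(10, change))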

getObservation()[source]

Returns an observation of the system’s state.

getDone()[source]

Returns whether an episode has ended.

Note

Reinforcement learning problems do not have to be split into episodes. If yours is not, you do not have to override the default implementation, as it always returns False.

getInfo()[source]

Returns a dict providing additional information on the environment’s state that may be useful for debugging but must not be used by a learning agent.
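
The methods above form the Interpreter interface that CounterTrafficInterpreter implements. A bare-bones custom interpreter following the same signatures might look like the sketch below; the observation and reward logic is placeholder code, not the CounterTrafficInterpreter implementation.

    from gymwipe.envs.core import Interpreter

    class MyInterpreter(Interpreter):

        def reset(self):
            # Initialization performed whenever the environment is reset.
            self._lastSender = None

        def onPacketReceived(self, senderIndex, receiverIndex, payload):
            # Invoked whenever the RRM receives a packet not addressed to it.
            self._lastSender = senderIndex

        def onFrequencyBandAssignment(self, deviceIndex, duration):
            # Invoked whenever the RRM assigns the frequency band.
            pass

        def getReward(self):
            return 0.0  # placeholder reward

        def getObservation(self):
            return self._lastSender  # placeholder observation

        def getDone(self):
            return False  # this placeholder never ends an episode

        def getInfo(self):
            return {}  # no additional debugging information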

reset()[source]

Resets the state of the environment and returns an initial observation.

step(action)[source]

Run one timestep of the environment’s dynamics. When the end of an episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Parameters:
  • action (object) – an action provided by the agent
Returns:
  • observation (object) – agent’s observation of the current environment
  • reward (float) – amount of reward returned after the previous action
  • done (bool) – whether the episode has ended, in which case further step() calls will return undefined results
  • info (dict) – auxiliary diagnostic information (helpful for debugging, and sometimes learning)
Return type: tuple
render(mode='human', close=False)[source]

Renders the environment to stdout.
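
Putting the pieces together, a hedged end-to-end interaction sketch (same assumptions as before: a no-argument constructor and the standard Gym action_space attribute; a learning agent would replace the random action):

    from gymwipe.envs.counter_traffic import CounterTrafficEnv

    env = CounterTrafficEnv()                   # assumption: no constructor arguments
    observation = env.reset()
    for _ in range(100):                        # run a fixed number of timesteps
        action = env.action_space.sample()      # assumption: standard Gym action_space
        observation, reward, done, info = env.step(action)
        if done:                                # the caller must reset once an episode ends
            observation = env.reset()
    env.render()                                # prints the environment's state to stdout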