gymwipe.envs.inverted_pendulum module

A Gym environment for assigning frequency bands to a sensor and a controller in the wireless networked control of an inverted pendulum.

class InvertedPendulumInterpreter(env)[source]

Bases: gymwipe.envs.core.Interpreter

onPacketReceived(senderIndex, receiverIndex, payload)[source]

No action is taken for received packets, because sensor angles are read directly from the plant object.

onFrequencyBandAssignment(deviceIndex, duration)[source]

Invoked whenever the RRM assigns the frequency band.

Parameters:
  • deviceIndex (int) – The index (as in the gym environment’s action space) of the device that the frequency band is assigned to.
  • duration (int) – The duration of the assignment in multiples of TIME_SLOT_LENGTH.
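
Because duration is expressed in time slots rather than in absolute time, converting it is a single multiplication. The following is a minimal sketch, not gymwipe code: the value given to TIME_SLOT_LENGTH here is an assumption chosen for illustration (the real constant is defined by the library), and assignment_duration is a hypothetical helper.

```python
# Assumed value for illustration only; the real TIME_SLOT_LENGTH is defined by gymwipe.
TIME_SLOT_LENGTH = 1e-6


def assignment_duration(duration: int, slot_length: float = TIME_SLOT_LENGTH) -> float:
    """Convert an assignment duration given in time slots to simulated time.

    Hypothetical helper: `duration` is the integer passed to
    onFrequencyBandAssignment, `slot_length` is the length of one slot.
    """
    if duration < 0:
        raise ValueError("duration must be non-negative")
    return duration * slot_length
```

With the assumed slot length, an assignment of 50 slots lasts 50 times that length; a duration of 0 yields 0.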
getReward()[source]

Reward is \(\lvert 180 - \alpha \rvert\) with \(\alpha\) being the pendulum angle.
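
The formula can be evaluated directly. This sketch assumes that the angle is given in degrees (suggested by the constant 180, but not stated here); pendulum_reward is a hypothetical name and not part of the module.

```python
def pendulum_reward(alpha: float) -> float:
    """Evaluate |180 - alpha| for a pendulum angle alpha.

    Assumption: alpha is measured in degrees.
    """
    return abs(180.0 - alpha)
```

Under this assumption, an angle of 180 gives a reward of 0, and the reward grows linearly with the angular distance from 180.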

getObservation()[source]

Returns an observation of the system’s state.

getDone()[source]

Returns whether an episode has ended.

Note

Reinforcement learning problems do not have to be split into episodes. In this case, you do not have to override the default implementation as it always returns False.
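
For an episodic task, getDone() would be overridden. The sketch below uses a stand-in base class instead of the real gymwipe.envs.core.Interpreter so that it runs on its own; InterpreterSketch, StepLimitedInterpreter, maxSteps, and countStep are all hypothetical names introduced for illustration.

```python
class InterpreterSketch:
    """Stand-in for gymwipe.envs.core.Interpreter (not the real class)."""

    def getDone(self) -> bool:
        # Mirrors the documented default: the task is never "done".
        return False


class StepLimitedInterpreter(InterpreterSketch):
    """Hypothetical interpreter that ends an episode after maxSteps steps."""

    def __init__(self, maxSteps: int):
        self.maxSteps = maxSteps
        self.stepCount = 0

    def countStep(self) -> None:
        # Would be called once per environment step in this sketch.
        self.stepCount += 1

    def getDone(self) -> bool:
        return self.stepCount >= self.maxSteps
```

A non-episodic interpreter simply inherits the default and never signals the end of an episode.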

getInfo()[source]

Returns a dict providing additional information on the environment’s state that may be useful for debugging but must not be used by a learning agent.

class InvertedPendulumEnv[source]

Bases: gymwipe.envs.core.BaseEnv

An environment that allows an agent to assign a frequency band to a sliding pendulum’s AngleSensor and an InvertedPendulumPidController.

Note

This environment has not been tested yet!

reset()[source]

Resets the state of the environment and returns an initial observation.

step(action)[source]

Runs one timestep of the environment’s dynamics. When the end of an episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Parameters:
  • action (object) – An action provided by the agent.
Returns:
  • observation (object) – The agent’s observation of the current environment.
  • reward (float) – The amount of reward returned after the previous action.
  • done (bool) – Whether the episode has ended, in which case further step() calls will return undefined results.
  • info (dict) – Auxiliary diagnostic information (helpful for debugging, and sometimes learning).
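
The reset()/step() contract described above leads to the usual agent loop. The sketch below runs against a stub environment rather than InvertedPendulumEnv itself, since that environment needs gymwipe and is marked as untested; StubEnv, run_episode, and the constant policy are hypothetical and only mimic the documented return values.

```python
class StubEnv:
    """Hypothetical stand-in that follows the documented reset()/step() contract."""

    def __init__(self, episode_length: int = 5):
        self._episode_length = episode_length
        self._t = 0

    def reset(self):
        self._t = 0
        return 0.0  # initial observation

    def step(self, action):
        self._t += 1
        observation = float(self._t)
        reward = 1.0
        done = self._t >= self._episode_length
        info = {"t": self._t}  # diagnostics only, not for the agent
        return observation, reward, done, info


def run_episode(env, policy) -> float:
    """Run one episode and return the accumulated reward."""
    observation = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = policy(observation)
        observation, reward, done, info = env.step(action)
        total_reward += reward
    return total_reward


# A trivial policy that always picks device index 0 (an assumed action encoding).
total = run_episode(StubEnv(episode_length=5), lambda observation: 0)
```

After done is returned as True, the loop stops calling step(), and reset() must be called before the next episode starts.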
render(mode='human', close=False)[source]

Renders the environment to stdout.