xpag.wrappers.goalenv_wrapper.CumulRewardWrapper

class CumulRewardWrapper(env, normalization_factor=1.0)

Bases: VectorEnvWrapper

An environment wrapper that adds the cumulative reward to observations. It assumes that the environment is not a goal-based environment, and that observations are 1D arrays (with .single_observation_space of type spaces.Box).
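As a rough illustration of the idea, the sketch below appends a running sum of rewards to a batch of 1D observations. It does not reproduce the xpag implementation: the helper augment_observation is hypothetical, and scaling the cumulative reward by multiplying it with normalization_factor is an assumption of this sketch.

```python
import numpy as np


def augment_observation(observation, cumulative_reward, normalization_factor=1.0):
    """Append the scaled cumulative reward as one extra column.

    observation: array of shape (num_envs, obs_dim)
    cumulative_reward: array of shape (num_envs, 1)
    Multiplying by normalization_factor is an assumption of this sketch.
    """
    return np.hstack((observation, cumulative_reward * normalization_factor))


num_envs, obs_dim = 3, 4
observation = np.zeros((num_envs, obs_dim))
cumulative_reward = np.zeros((num_envs, 1))  # assumed to restart from zero on reset

# Two steps with a reward of 1.0 in every sub-environment.
for _ in range(2):
    reward = np.ones((num_envs, 1))
    cumulative_reward += reward

augmented = augment_observation(observation, cumulative_reward, normalization_factor=0.5)
```

The augmented batch has one more column than the original observations, which is why the wrapped observation space would need one extra dimension.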

Base class for vectorized environments.

Parameters:
  • num_envs – Number of environments in the vectorized environment.

  • observation_space – Observation space of a single environment.

  • action_space – Action space of a single environment.

Methods

call

Call a method, or get a property, from each parallel environment.

call_async

Calls a method name for each parallel environment asynchronously.

call_wait

After calling a method in call_async(), this function collects the results.

close

Close all parallel environments and release resources.

close_extras

Clean up extra resources, e.g. those beyond what's in this base class.

get_attr

Get a property from each parallel environment.

get_wrapper_attr

Gets the attribute name from the environment.

render

Compute the render frames as specified by render_mode during the initialization of the environment.

reset

Reset all parallel environments and return a batch of initial observations and info.

reset_async

Reset the sub-environments asynchronously.

reset_done

reset_wait

Retrieves the results of a reset_async() call.

set_attr

Set a property in each sub-environment.

step

Take an action for each parallel environment.

step_async

Asynchronously performs steps in the sub-environments.

step_wait

Retrieves the results of a step_async() call.

Attributes

metadata

np_random

Returns the environment's internal _np_random, which is initialised with a random seed if it has not been set.

render_mode

reward_range

spec

unwrapped

Returns the base non-wrapped environment.

action_space

observation_space

call(name, *args, **kwargs)

Call a method, or get a property, from each parallel environment.

Parameters:
  • name (str) – Name of the method or property to call.

  • *args – Arguments to apply to the method call.

  • **kwargs – Keyword arguments to apply to the method call.

Returns:

List of the results of the individual calls to the method or property for each environment.
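The behaviour described above (one result per environment, whether the name refers to a method or to a plain property) can be sketched without any library. The Counter class and the call_each helper are hypothetical stand-ins for illustration, not part of gymnasium or xpag.

```python
class Counter:
    """Hypothetical stand-in for a sub-environment."""

    def __init__(self, start):
        self.value = start

    def bump(self, amount=1):
        self.value += amount
        return self.value


def call_each(envs, name, *args, **kwargs):
    """Return one result per environment, in order."""
    results = []
    for env in envs:
        attr = getattr(env, name)
        # Methods are called with the given arguments; properties are returned as-is.
        results.append(attr(*args, **kwargs) if callable(attr) else attr)
    return results


envs = [Counter(0), Counter(10)]
values = call_each(envs, "value")           # property access
bumped = call_each(envs, "bump", amount=5)  # method call with a keyword argument
```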

call_async(name, *args, **kwargs)

Calls a method name for each parallel environment asynchronously.

call_wait(**kwargs)

After calling a method in call_async(), this function collects the results.

Return type:

List[Any]

close(**kwargs)

Close all parallel environments and release resources.

It also closes all existing image viewers, then calls close_extras() and sets closed to True.

Warning

This function itself does not close the environments; that should be handled in close_extras(). This applies to both synchronous and asynchronous vectorized environments.

Note

This is called automatically when the object is garbage collected or the program exits.

Parameters:

**kwargs – Keyword arguments passed to close_extras()

close_extras(**kwargs)

Clean up extra resources, e.g. those beyond what’s in this base class.
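The division of labour between close() and close_extras() can be sketched as a small template-method pattern. The class below is a hypothetical toy, not the gymnasium implementation; treating a second close() as a no-op is an assumption of this sketch.

```python
class ClosableVectorEnv:
    """Toy class showing how close() delegates to close_extras()."""

    def __init__(self):
        self.closed = False
        self.released = False

    def close_extras(self, **kwargs):
        # A subclass would release its own resources here
        # (worker processes, pipes, sub-environments, ...).
        self.released = True

    def close(self, **kwargs):
        if self.closed:
            return  # assumption: closing twice does nothing
        self.close_extras(**kwargs)
        self.closed = True


env = ClosableVectorEnv()
env.close()
```

With this split, a subclass only overrides close_extras() and inherits the bookkeeping done by close().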

get_attr(name)

Get a property from each parallel environment.

Parameters:

name (str) – Name of the property to get from each individual environment.

Returns:

The property with name

get_wrapper_attr(name)

Gets the attribute name from the environment.

Return type:

Any

property np_random: Generator

Returns the environment’s internal _np_random, which is initialised with a random seed if it has not been set.

Returns:

An instance of np.random.Generator

render()

Compute the render frames as specified by render_mode during the initialization of the environment.

The environment’s metadata render modes (env.metadata["render_modes"]) should contain the possible ways to implement the render modes. In addition, list versions of most render modes are available through gymnasium.make, which automatically applies a wrapper to collect rendered frames.

Return type:

Union[TypeVar(RenderFrame), list[TypeVar(RenderFrame)], None]

Note

As the render_mode is known during __init__, the objects used to render the environment state should be initialised in __init__.

By convention, if the render_mode is:

  • None (default): no render is computed.

  • “human”: The environment is continuously rendered in the current display or terminal, usually for human consumption. This rendering should occur during step() and render() doesn’t need to be called. Returns None.

  • “rgb_array”: Return a single frame representing the current state of the environment. A frame is a np.ndarray with shape (x, y, 3) representing RGB values for an x-by-y pixel image.

  • “ansi”: Return a string (str) or StringIO.StringIO containing a terminal-style text representation for each time step. The text can include newlines and ANSI escape sequences (e.g. for colors).

  • “rgb_array_list” and “ansi_list”: List-based versions of the render modes (except “human”) are available through the wrapper gymnasium.wrappers.RenderCollection, which is automatically applied during gymnasium.make(..., render_mode="rgb_array_list"). The collected frames are popped after render() or reset() is called.

Note

Make sure that your class’s metadata "render_modes" key includes the list of supported modes.

Changed in version 0.25.0: The render function no longer accepts parameters; they should instead be specified when the environment is initialised, e.g., gymnasium.make("CartPole-v1", render_mode="human")

reset(**kwargs)

Reset all parallel environments and return a batch of initial observations and info.

Parameters:
  • seed – The environment reset seeds

  • options – Whether to return the options

Returns:

A batch of observations and info from the vectorized environment.

Example

>>> import gymnasium as gym
>>> envs = gym.vector.make("CartPole-v1", num_envs=3)
>>> envs.reset(seed=42)
(array([[ 0.0273956 , -0.00611216,  0.03585979,  0.0197368 ],
       [ 0.01522993, -0.04562247, -0.04799704,  0.03392126],
       [-0.03774345, -0.02418869, -0.00942293,  0.0469184 ]],
      dtype=float32), {})

reset_async(**kwargs)

Reset the sub-environments asynchronously.

This method will return None. A call to reset_async() should be followed by a call to reset_wait() to retrieve the results.

Parameters:
  • seed – The reset seed

  • options – Reset options

reset_wait(**kwargs)

Retrieves the results of a reset_async() call.

A call to this method must always be preceded by a call to reset_async().

Parameters:
  • seed – The reset seed

  • options – Reset options

Returns:

The results from reset_async()

Raises:

NotImplementedError – VectorEnv does not implement function

set_attr(name, values)

Set a property in each sub-environment.

Parameters:
  • name (str) – Name of the property to be set in each individual environment.

  • values (list, tuple, or object) – Values of the property to be set to. If values is a list or tuple, then it corresponds to the values for each individual environment, otherwise a single value is set for all environments.
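The rule for values (a list or tuple gives one value per environment, anything else is shared by all) can be sketched as below. The expand_values helper is hypothetical, and raising ValueError on a length mismatch is an assumption of this sketch.

```python
def expand_values(values, num_envs):
    """Return a list holding one value per sub-environment."""
    if isinstance(values, (list, tuple)):
        if len(values) != num_envs:
            # Assumption: a per-environment sequence must match num_envs exactly.
            raise ValueError(
                f"expected {num_envs} values, got {len(values)}"
            )
        return list(values)
    # A single object is shared by every sub-environment.
    return [values] * num_envs


shared = expand_values(9.8, num_envs=3)
per_env = expand_values((1.0, 2.0, 3.0), num_envs=3)
```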

step(action)

Take an action for each parallel environment.

Parameters:

actions (element of action_space) – Batch of actions.

Returns:

Batch of (observations, rewards, terminations, truncations, infos)

Note

Because vector environments autoreset sub-environments that terminate or truncate, the returned observation and info are not those of the final step; the final step’s values are instead stored in info as “final_observation” and “final_info”.
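When storing transitions, this means the true last observation of a finished episode has to be read from the info dictionary. The sketch below shows one way to do that. The helper true_next_observations is hypothetical, and it assumes that "final_observation" holds one entry per sub-environment, with None for sub-environments that did not finish.

```python
def true_next_observations(observations, terminations, truncations, infos):
    """Replace autoreset observations with the stored final observations."""
    next_obs = list(observations)
    final_obs = infos.get("final_observation")
    if final_obs is None:
        return next_obs  # no sub-environment finished on this step
    for i, (terminated, truncated) in enumerate(zip(terminations, truncations)):
        if (terminated or truncated) and final_obs[i] is not None:
            next_obs[i] = final_obs[i]
    return next_obs


# Sub-environment 0 terminated and was reset, sub-environment 1 is still running.
observations = [[0.0, 0.0], [5.0, 5.0]]
terminations = [True, False]
truncations = [False, False]
infos = {"final_observation": [[9.0, 9.0], None]}

next_obs = true_next_observations(observations, terminations, truncations, infos)
```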

Example

>>> import gymnasium as gym
>>> import numpy as np
>>> envs = gym.vector.make("CartPole-v1", num_envs=3)
>>> _ = envs.reset(seed=42)
>>> actions = np.array([1, 0, 1])
>>> observations, rewards, termination, truncation, infos = envs.step(actions)
>>> observations
array([[ 0.02727336,  0.18847767,  0.03625453, -0.26141977],
       [ 0.01431748, -0.24002443, -0.04731862,  0.3110827 ],
       [-0.03822722,  0.1710671 , -0.00848456, -0.2487226 ]],
      dtype=float32)
>>> rewards
array([1., 1., 1.])
>>> termination
array([False, False, False])
>>> truncation
array([False, False, False])
>>> infos
{}

step_async(actions)

Asynchronously performs steps in the sub-environments.

The results can be retrieved via a call to step_wait().

Parameters:

actions – The actions to take asynchronously

step_wait()

Retrieves the results of a step_async() call.

A call to this method must always be preceded by a call to step_async().

Parameters:

**kwargs – Additional keywords for vector implementation

Returns:

The results from the step_async() call
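The two-phase protocol can be sketched with a toy class in which step() is simply step_async() followed by step_wait(). Everything here is hypothetical: the class name, the placeholder dynamics, and the choice to raise RuntimeError when step_wait() is called without a pending step_async().

```python
class TwoPhaseVectorEnv:
    """Toy class illustrating the step_async() / step_wait() ordering."""

    def __init__(self, num_envs):
        self.num_envs = num_envs
        self._pending_actions = None

    def step_async(self, actions):
        # A real asynchronous backend would dispatch the actions to workers here.
        self._pending_actions = list(actions)

    def step_wait(self):
        if self._pending_actions is None:
            # Assumption: misuse is reported with a RuntimeError.
            raise RuntimeError("step_wait() called without a prior step_async()")
        results = [action * 2 for action in self._pending_actions]  # placeholder dynamics
        self._pending_actions = None
        return results

    def step(self, actions):
        self.step_async(actions)
        return self.step_wait()


env = TwoPhaseVectorEnv(num_envs=3)
results = env.step([1, 2, 3])
```

Splitting the call this way lets a caller do other work between step_async() and step_wait() while the sub-environments advance.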

property unwrapped

Returns the base non-wrapped environment.

Returns:

The base non-wrapped gymnasium.Env instance

Return type:

Env