xpag.wrappers.goalenv_wrapper.CumulRewardWrapper

class CumulRewardWrapper(env, normalization_factor=1.0)

Bases: VectorEnvWrapper

An environment wrapper that adds the cumulative reward to observations. It assumes that the environment is not a goal-based environment, and that observations are 1D arrays (with .single_observation_space of type spaces.Box).
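A minimal sketch of the idea behind this wrapper, using NumPy only. It rests on assumptions that this page does not confirm: the cumulative reward is tracked per sub-environment, multiplied by normalization_factor, appended to each observation as one extra feature, and restarted when a sub-environment resets. CumulRewardSketch and its after_reset/after_step methods are hypothetical names, not part of xpag.

```python
from typing import Optional

import numpy as np


class CumulRewardSketch:
    """Hypothetical stand-in illustrating the idea behind CumulRewardWrapper.

    Assumptions (not taken from xpag): the running reward is tracked per
    sub-environment, multiplied by normalization_factor, appended to the
    observation as one extra feature, and restarted when a sub-environment resets.
    """

    def __init__(self, num_envs: int, normalization_factor: float = 1.0):
        self.normalization_factor = normalization_factor
        self.cumul_reward = np.zeros((num_envs, 1))

    def augment(self, obs: np.ndarray) -> np.ndarray:
        # (num_envs, obs_dim) -> (num_envs, obs_dim + 1)
        return np.hstack((obs, self.normalization_factor * self.cumul_reward))

    def after_step(self, obs: np.ndarray, reward) -> np.ndarray:
        # Add this step's reward to each sub-environment's running total.
        self.cumul_reward += np.asarray(reward, dtype=float).reshape(-1, 1)
        return self.augment(obs)

    def after_reset(self, obs: np.ndarray, mask: Optional[np.ndarray] = None) -> np.ndarray:
        # Restart the running total for all sub-environments, or only the masked ones.
        if mask is None:
            self.cumul_reward[:] = 0.0
        else:
            self.cumul_reward[np.asarray(mask, dtype=bool)] = 0.0
        return self.augment(obs)


sketch = CumulRewardSketch(num_envs=2, normalization_factor=0.5)
obs = np.zeros((2, 3))
first = sketch.after_reset(obs)                      # last column: [0.0, 0.0]
second = sketch.after_step(obs, reward=[1.0, 2.0])   # last column: [0.5, 1.0]
third = sketch.after_step(obs, reward=[1.0, 2.0])    # last column: [1.0, 2.0]
```

Because the extra feature widens each observation by one element, a wrapper built this way would presumably also need to enlarge .single_observation_space accordingly; the sketch leaves that out.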

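Base class for vectorized environments. The parameters below come from that base class's documentation and do not correspond to the CumulRewardWrapper(env, normalization_factor) signature.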

Parameters:

num_envs – Number of environments in the vectorized environment.
observation_space – Observation space of a single environment.
action_space – Action space of a single environment.

Methods

`call`	Call a method, or get a property, from each parallel environment.
`call_async`	Calls a method name for each parallel environment asynchronously.
`call_wait`	After calling a method in `call_async()`, this function collects the results.
`close`	Close all parallel environments and release resources.
`close_extras`	Clean up extra resources, e.g. those beyond what this base class manages.
`get_attr`	Get a property from each parallel environment.
`get_wrapper_attr`	Gets the attribute name from the environment.
`render`	Compute the render frames as specified by `render_mode` during the initialization of the environment.
`reset`	Reset all parallel environments and return a batch of initial observations and info.
`reset_async`	Reset the sub-environments asynchronously.
`reset_done`
`reset_wait`	Retrieves the results of a `reset_async()` call.
`set_attr`	Set a property in each sub-environment.
`step`	Take an action for each parallel environment.
`step_async`	Asynchronously performs steps in the sub-environments.
`step_wait`	Retrieves the results of a `step_async()` call.

Attributes

`metadata`
`np_random`	Returns the environment's internal `_np_random`, which is initialised with a random seed if it has not been set.
`render_mode`
`reward_range`
`spec`
`unwrapped`	Returns the base non-wrapped environment.
`action_space`
`observation_space`

call(name, *args, **kwargs)

Call a method, or get a property, from each parallel environment.

Parameters:

name (str) – Name of the method or property to call.
*args – Arguments to apply to the method call.
**kwargs – Keyword arguments to apply to the method call.

Returns:

List of the results of the individual calls to the method or property for each environment.
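The dispatch rule described for call() (invoke a method, or read a property) can be sketched in plain Python. call_each and ToyEnv are hypothetical names; a real vectorized environment may also differ in the container it returns (for example, a tuple instead of a list).

```python
from typing import Any, List


def call_each(envs: List[Any], name: str, *args: Any, **kwargs: Any) -> List[Any]:
    """Hypothetical helper mirroring the documented behaviour of call()."""
    results = []
    for env in envs:
        attr = getattr(env, name)
        # Methods are invoked with the given arguments; plain properties are returned as-is.
        results.append(attr(*args, **kwargs) if callable(attr) else attr)
    return results


class ToyEnv:
    def __init__(self, idx: int):
        self.idx = idx

    def scaled(self, factor: int) -> int:
        return self.idx * factor


envs = [ToyEnv(i) for i in range(3)]
print(call_each(envs, "scaled", 10))  # [0, 10, 20]
print(call_each(envs, "idx"))         # [0, 1, 2]
```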

call_async(name, *args, **kwargs)

Call the method given by name on each parallel environment asynchronously; collect the results with call_wait().

call_wait(**kwargs)

After calling a method in call_async(), this function collects the results.

Return type: List[Any]
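One way to picture the call_async()/call_wait() split is with a thread pool from the standard library: the first half submits work and returns immediately, the second half blocks until every result is in. This is a sketch under stated assumptions: AsyncCaller is hypothetical, and gymnasium's AsyncVectorEnv uses worker processes and pipes rather than threads.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, List, Optional


class AsyncCaller:
    """Hypothetical sketch of the call_async()/call_wait() split using threads."""

    def __init__(self, envs: List[Any]):
        self.envs = envs
        self._executor = ThreadPoolExecutor(max_workers=max(1, len(envs)))
        self._pending: Optional[List[Future]] = None

    def call_async(self, name: str, *args: Any, **kwargs: Any) -> None:
        if self._pending is not None:
            raise RuntimeError("call_wait() must be called before another call_async()")

        def run(env: Any) -> Any:
            attr = getattr(env, name)
            return attr(*args, **kwargs) if callable(attr) else attr

        # Submit one job per sub-environment and return immediately.
        self._pending = [self._executor.submit(run, env) for env in self.envs]

    def call_wait(self) -> List[Any]:
        if self._pending is None:
            raise RuntimeError("call_async() must be called before call_wait()")
        # Block until every job is done; results keep the environment order.
        results = [future.result() for future in self._pending]
        self._pending = None
        return results

    def close(self) -> None:
        self._executor.shutdown(wait=True)


caller = AsyncCaller(["a", "bb", "ccc"])
caller.call_async("upper")
print(caller.call_wait())  # ['A', 'BB', 'CCC']
caller.close()
```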

close(**kwargs)

Close all parallel environments and release resources.

It also closes any open image viewers, then calls close_extras() and sets closed to True.

Warning

This function does not itself close the environments; that must be handled in close_extras(). This applies to both synchronous and asynchronous vectorized environments.

Note

This method is called automatically when the object is garbage-collected or when the program exits.

Parameters: **kwargs – Keyword arguments passed to close_extras()

close_extras(**kwargs)

Clean up extra resources, e.g. those beyond what this base class manages.
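The division of labour between close() and close_extras() follows the template-method pattern: the base class handles the bookkeeping, and each subclass releases its own resources. A minimal sketch, with VectorEnvSketch and PooledEnv as hypothetical classes; the early return on a second close() call is an assumption added here to make closing idempotent.

```python
from typing import Any, List


class VectorEnvSketch:
    """Hypothetical base class reproducing the close()/close_extras() split."""

    def __init__(self) -> None:
        self.closed = False

    def close_extras(self, **kwargs: Any) -> None:
        # Subclasses release their own resources here.
        pass

    def close(self, **kwargs: Any) -> None:
        if self.closed:  # closing twice is a no-op
            return
        self.close_extras(**kwargs)
        self.closed = True


class PooledEnv(VectorEnvSketch):
    def __init__(self, num_workers: int) -> None:
        super().__init__()
        self.workers: List[str] = [f"worker-{i}" for i in range(num_workers)]
        self.log: List[str] = []

    def close_extras(self, timeout: float = 1.0, **kwargs: Any) -> None:
        # The sub-environments themselves are shut down here, not in close().
        for worker in self.workers:
            self.log.append(f"stop {worker} (timeout={timeout})")
        self.workers.clear()


env = PooledEnv(num_workers=2)
env.close(timeout=0.5)  # kwargs are forwarded to close_extras()
env.close()             # second call does nothing
print(env.log)  # ['stop worker-0 (timeout=0.5)', 'stop worker-1 (timeout=0.5)']
```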

get_attr(name)

Get a property from each parallel environment.

Parameters: name (str) – Name of the property to get from each individual environment.
Returns: The values of the property name from each environment.

get_wrapper_attr(name)

Gets the attribute given by name from the environment.

Return type: Any

property np_random: Generator

Returns the environment’s internal _np_random, which is initialised with a random seed if it has not been set.

Returns: An instance of np.random.Generator
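The lazy initialisation described for np_random can be sketched as a property that creates a NumPy Generator on first access. LazyRngSketch is hypothetical, and np.random.default_rng() stands in for whatever seeding helper the real environment uses.

```python
from typing import Optional

import numpy as np


class LazyRngSketch:
    """Hypothetical class showing a lazily seeded np_random property."""

    def __init__(self) -> None:
        self._np_random: Optional[np.random.Generator] = None

    @property
    def np_random(self) -> np.random.Generator:
        # Create a randomly seeded generator on first access only.
        if self._np_random is None:
            self._np_random = np.random.default_rng()
        return self._np_random

    @np_random.setter
    def np_random(self, value: np.random.Generator) -> None:
        self._np_random = value


env = LazyRngSketch()
rng = env.np_random                       # created here, on first access
assert env.np_random is rng               # the same generator is reused afterwards
env.np_random = np.random.default_rng(42)  # seed explicitly for reproducibility
```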

render()

Compute the render frames as specified by render_mode during the initialization of the environment.

The environment’s metadata (env.metadata["render_modes"]) should list the supported render modes. In addition, list versions of most render modes are provided by gymnasium.make, which automatically applies a wrapper to collect rendered frames.

Return type: Union[TypeVar(RenderFrame), list[TypeVar(RenderFrame)], None]

Note

As the render_mode is known during __init__, the objects used to render the environment state should be initialised in __init__.

By convention, if the render_mode is:

None (default): no render is computed.
“human”: The environment is continuously rendered in the current display or terminal, usually for human consumption. This rendering should occur during step() and render() doesn’t need to be called. Returns None.
“rgb_array”: Return a single frame representing the current state of the environment. A frame is a np.ndarray with shape (x, y, 3) representing RGB values for an x-by-y pixel image.
“ansi”: Return a string (str) or io.StringIO object containing a terminal-style text representation for each time step. The text can include newlines and ANSI escape sequences (e.g. for colors).
“rgb_array_list” and “ansi_list”: List-based versions of the render modes (except “human”) are available through the gymnasium.wrappers.RenderCollection wrapper, which is applied automatically by gymnasium.make(..., render_mode="rgb_array_list"). The collected frames are popped when render() or reset() is called.

Note

Make sure that your class’s metadata "render_modes" key includes the list of supported modes.

Changed in version 0.25.0: render() no longer accepts parameters; instead, they should be specified when the environment is initialised, e.g., gymnasium.make("CartPole-v1", render_mode="human").
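A toy environment illustrating these conventions: render_mode is fixed and validated against metadata["render_modes"] in __init__, render() takes no arguments, "rgb_array" returns an (x, y, 3) uint8 frame, and "ansi" returns a string. ToyRenderEnv is hypothetical and omits "human" mode, which would need a display.

```python
from typing import Optional

import numpy as np


class ToyRenderEnv:
    """Hypothetical environment following the render_mode conventions above."""

    metadata = {"render_modes": ["rgb_array", "ansi"]}

    def __init__(self, render_mode: Optional[str] = None, size: int = 4):
        # render_mode is fixed at construction time, not passed to render().
        if render_mode is not None and render_mode not in self.metadata["render_modes"]:
            raise ValueError(f"unsupported render_mode: {render_mode!r}")
        self.render_mode = render_mode
        self.size = size
        self.position = 0

    def render(self):
        if self.render_mode is None:
            return None  # default: nothing is rendered
        if self.render_mode == "rgb_array":
            frame = np.zeros((self.size, self.size, 3), dtype=np.uint8)
            frame[self.position, self.position] = (255, 0, 0)  # agent drawn in red
            return frame
        # "ansi": one text row per grid row, agent marked with "A"
        rows = ["A" if r == self.position else "." for r in range(self.size)]
        return "\n".join(rows)
```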

reset(**kwargs)

Reset all parallel environments and return a batch of initial observations and info.

Parameters:

seed – The environment reset seeds
options – Reset options

Returns:

A batch of observations and info from the vectorized environment.

Example

>>> import gymnasium as gym
>>> envs = gym.vector.make("CartPole-v1", num_envs=3)
>>> envs.reset(seed=42)
(array([[ 0.0273956 , -0.00611216,  0.03585979,  0.0197368 ],
       [ 0.01522993, -0.04562247, -0.04799704,  0.03392126],
       [-0.03774345, -0.02418869, -0.00942293,  0.0469184 ]],
      dtype=float32), {})

reset_async(**kwargs)

Reset the sub-environments asynchronously.

This method will return None. A call to reset_async() should be followed by a call to reset_wait() to retrieve the results.

Parameters:

seed – The reset seed
options – Reset options

reset_wait(**kwargs)

Retrieves the results of a reset_async() call.

A call to this method must always be preceded by a call to reset_async().

Parameters:

seed – The reset seed
options – Reset options

Returns:

The results from reset_async()

Raises:

NotImplementedError – The base VectorEnv does not implement this method.

set_attr(name, values)

Set a property in each sub-environment.

Parameters:

name (str) – Name of the property to be set in each individual environment.
values (list, tuple, or object) – Value(s) to set. If values is a list or tuple, its elements are assigned to the individual environments in order; otherwise, the same value is set for all environments.
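The broadcasting rule for values can be sketched as follows. set_attr_each is a hypothetical helper, and raising ValueError on a length mismatch is an assumption about how an implementation might guard against mistakes.

```python
from types import SimpleNamespace
from typing import Any, List


def set_attr_each(envs: List[Any], name: str, values: Any) -> None:
    """Hypothetical helper following the documented set_attr() semantics."""
    if not isinstance(values, (list, tuple)):
        # A single value is broadcast to every sub-environment.
        values = [values] * len(envs)
    if len(values) != len(envs):
        raise ValueError(f"expected {len(envs)} values, got {len(values)}")
    for env, value in zip(envs, values):
        setattr(env, name, value)


envs = [SimpleNamespace() for _ in range(3)]
set_attr_each(envs, "difficulty", 2)         # same value everywhere
set_attr_each(envs, "seed", [10, 11, 12])    # one value per environment
print([(e.difficulty, e.seed) for e in envs])  # [(2, 10), (2, 11), (2, 12)]
```

One consequence of this rule is that a list is never broadcast as a single value: to give every sub-environment the same list, pass one copy per environment, e.g. [[1, 2]] * num_envs.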

step(action)

Take an action for each parallel environment.

Parameters: action – A batch of actions, i.e. an element of action_space.
Returns: Batch of (observations, rewards, terminations, truncations, infos)

Note

Because vector environments automatically reset sub-environments that terminate or truncate, the observation and info returned for those sub-environments are not those of the final step; the final step's observation and info are instead stored in infos under “final_observation” and “final_info”.

Example

>>> import gymnasium as gym
>>> import numpy as np
>>> envs = gym.vector.make("CartPole-v1", num_envs=3)
>>> _ = envs.reset(seed=42)
>>> actions = np.array([1, 0, 1])
>>> observations, rewards, termination, truncation, infos = envs.step(actions)
>>> observations
array([[ 0.02727336,  0.18847767,  0.03625453, -0.26141977],
       [ 0.01431748, -0.24002443, -0.04731862,  0.3110827 ],
       [-0.03822722,  0.1710671 , -0.00848456, -0.2487226 ]],
      dtype=float32)
>>> rewards
array([1., 1., 1.])
>>> termination
array([False, False, False])
>>> truncation
array([False, False, False])
>>> infos
{}
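To recover the observation actually reached by a step (for example, when storing transitions in a replay buffer), the rows of finished sub-environments can be replaced with the stored final observations. A minimal sketch, assuming infos["final_observation"] is an object array holding an observation for each finished sub-environment and None elsewhere; true_next_observations is a hypothetical helper, and the arrays below are toy values.

```python
import numpy as np


def true_next_observations(observations, terminations, truncations, infos):
    """Hypothetical helper: undo autoreset so each row is the obs reached by the step."""
    next_obs = np.array(observations, copy=True)  # leave the input untouched
    done = np.logical_or(terminations, truncations)
    for i in np.flatnonzero(done):
        # For finished sub-environments, the real last observation lives in infos.
        next_obs[i] = infos["final_observation"][i]
    return next_obs


observations = np.array([[0.0, 0.0], [5.0, 5.0]])  # row 1 is already a reset observation
terminations = np.array([False, True])
truncations = np.array([False, False])
final_obs = np.empty(2, dtype=object)  # filled with None
final_obs[1] = np.array([9.0, 9.0])
infos = {"final_observation": final_obs}

next_obs = true_next_observations(observations, terminations, truncations, infos)
# row 1 of next_obs now holds the final observation [9.0, 9.0]
```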

step_async(actions)

Asynchronously performs steps in the sub-environments.

The results can be retrieved via a call to step_wait().

Parameters: actions – The actions to take asynchronously

step_wait()

Retrieves the results of a step_async() call.

A call to this method must always be preceded by a call to step_async().

Parameters: **kwargs – Additional keywords for vector implementation
Returns: The results from the step_async() call
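The relationship between step() and its asynchronous halves can be sketched with a synchronous toy implementation in which step() simply calls step_async() and then step_wait(). CounterEnv and SyncVectorSketch are hypothetical; autoreset is not modelled, and infos are returned as a list rather than a dict for simplicity.

```python
from typing import Any, List, Optional, Tuple


class CounterEnv:
    """Hypothetical single environment: reward equals the action, episode ends after 3 steps."""

    def __init__(self) -> None:
        self.t = 0

    def step(self, action: int) -> Tuple[int, float, bool, bool, dict]:
        self.t += 1
        return self.t, float(action), self.t >= 3, False, {}


class SyncVectorSketch:
    """Hypothetical vector env where step() is step_async() followed by step_wait()."""

    def __init__(self, envs: List[CounterEnv]) -> None:
        self.envs = envs
        self._actions: Optional[List[Any]] = None

    def step_async(self, actions: List[Any]) -> None:
        # Only store the actions; a process-based version would send them to workers.
        self._actions = list(actions)

    def step_wait(self):
        if self._actions is None:
            raise RuntimeError("step_async() must be called before step_wait()")
        results = [env.step(a) for env, a in zip(self.envs, self._actions)]
        self._actions = None
        # Transpose per-env tuples into batched (obs, rewards, terminations, truncations, infos).
        obs, rewards, terms, truncs, infos = zip(*results)
        return list(obs), list(rewards), list(terms), list(truncs), list(infos)

    def step(self, actions: List[Any]):
        self.step_async(actions)
        return self.step_wait()


vec = SyncVectorSketch([CounterEnv(), CounterEnv()])
obs, rewards, terminations, truncations, infos = vec.step([1, 2])
```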

property unwrapped

Returns the base non-wrapped environment.

Returns: The base non-wrapped gymnasium.Env instance
Return type: Env
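A minimal sketch of how unwrapped can be resolved through a stack of wrappers: each layer delegates to the environment it wraps until the base environment returns itself. BaseEnv and Wrapper are hypothetical classes, not the gymnasium ones.

```python
class BaseEnv:
    """Hypothetical innermost environment."""

    @property
    def unwrapped(self) -> "BaseEnv":
        return self


class Wrapper:
    """Hypothetical wrapper that delegates unwrapped to the environment it wraps."""

    def __init__(self, env) -> None:
        self.env = env

    @property
    def unwrapped(self):
        # Each layer asks the next one down, so the chain ends at the base env.
        return self.env.unwrapped


base = BaseEnv()
stacked = Wrapper(Wrapper(base))
assert stacked.unwrapped is base
```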