xpag.wrappers.goalenv_wrapper.CumulRewardWrapper#
- class CumulRewardWrapper(env, normalization_factor=1.0)#
Bases: VectorEnvWrapper
An environment wrapper that adds the cumulative reward to observations. It assumes that the environment is not a goal-based environment, and that observations are 1D arrays (with .single_observation_space of type spaces.Box).
Base class for vectorized environments.
- Parameters:
num_envs – Number of environments in the vectorized environment.
observation_space – Observation space of a single environment.
action_space – Action space of a single environment.
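As a rough illustration of what adding the cumulative reward to 1D observations could look like, the sketch below uses plain Python lists in place of a real vectorized environment. The helper name `augment_observations` is hypothetical, and multiplying the running sum by `normalization_factor` is an assumption: this page does not state how the factor is applied.

```python
from typing import List


def augment_observations(
    observations: List[List[float]],
    cumulative_rewards: List[float],
    normalization_factor: float = 1.0,
) -> List[List[float]]:
    """Append each environment's scaled cumulative reward to its 1D observation.

    Hypothetical helper: scaling by multiplication is an assumption.
    """
    return [
        list(obs) + [normalization_factor * cumul]
        for obs, cumul in zip(observations, cumulative_rewards)
    ]


# Two parallel environments with 2-dimensional observations.
observations = [[0.1, 0.2], [0.3, 0.4]]
cumulative_rewards = [0.0, 0.0]

# After a step, each running sum grows by the reward just received.
rewards = [1.0, 2.0]
cumulative_rewards = [c + r for c, r in zip(cumulative_rewards, rewards)]

augmented = augment_observations(observations, cumulative_rewards, 0.5)
```

Each augmented observation is one element longer than the original, so any policy consuming it would need an input size of the original dimension plus one.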
Methods
call(name, *args, **kwargs) – Call a method, or get a property, from each parallel environment.
call_async(name, *args, **kwargs) – Calls a method name for each parallel environment asynchronously.
call_wait(**kwargs) – After calling a method in call_async(), this function collects the results.
close(**kwargs) – Close all parallel environments and release resources.
close_extras(**kwargs) – Clean up the extra resources e.g. beyond what's in this base class.
get_attr(name) – Get a property from each parallel environment.
get_wrapper_attr(name) – Gets the attribute name from the environment.
render() – Compute the render frames as specified by render_mode during the initialization of the environment.
reset(**kwargs) – Reset all parallel environments and return a batch of initial observations and info.
reset_async(**kwargs) – Reset the sub-environments asynchronously.
reset_done
reset_wait(**kwargs) – Retrieves the results of a reset_async() call.
set_attr(name, values) – Set a property in each sub-environment.
step(action) – Take an action for each parallel environment.
step_async(actions) – Asynchronously performs steps in the sub-environments.
step_wait() – Retrieves the results of a step_async() call.
Attributes
metadata
np_random – Returns the environment's internal _np_random that if not set will initialise with a random seed.
render_mode
reward_range
spec
unwrapped – Returns the base non-wrapped environment.
action_space
observation_space
- call(name, *args, **kwargs)#
Call a method, or get a property, from each parallel environment.
- Parameters:
name (str) – Name of the method or property to call.
*args – Arguments to apply to the method call.
**kwargs – Keyword arguments to apply to the method call.
- Returns:
List of the results of the individual calls to the method or property for each environment.
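The "method or property" behaviour described above can be sketched with a small stand-in. `ToyEnv` and the free function `call` below are hypothetical and only mimic the documented contract; they are not the wrapper's actual implementation.

```python
from typing import Any, List


class ToyEnv:
    """Hypothetical stand-in for a single sub-environment."""

    def __init__(self, gravity: float) -> None:
        self.gravity = gravity

    def scaled_gravity(self, factor: float = 1.0) -> float:
        return self.gravity * factor


def call(envs: List[ToyEnv], name: str, *args: Any, **kwargs: Any) -> List[Any]:
    """Call a method, or read a property, named `name` on every environment."""
    results = []
    for env in envs:
        attr = getattr(env, name)
        # Methods are invoked with the given arguments; properties are returned as-is.
        results.append(attr(*args, **kwargs) if callable(attr) else attr)
    return results


envs = [ToyEnv(9.8), ToyEnv(1.6)]
gravities = call(envs, "gravity")                    # property lookup
doubled = call(envs, "scaled_gravity", factor=2.0)   # method call with kwargs
```

In both cases the result has one entry per environment, in the same order as the environments.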
- call_async(name, *args, **kwargs)#
Calls a method name for each parallel environment asynchronously.
- call_wait(**kwargs)#
After calling a method in call_async(), this function collects the results.
- Return type:
List[Any]
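The pairing of call_async() and call_wait() follows a schedule-then-collect pattern. The sketch below imitates it with a thread pool from the standard library; `ToyAsyncCaller` is a hypothetical class, and real vectorized environments may use worker processes or run synchronously instead.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, List, Optional


class ToyAsyncCaller:
    """Hypothetical illustration of the call_async() / call_wait() pairing."""

    def __init__(self, envs: List[Any]) -> None:
        self.envs = envs
        self._pool = ThreadPoolExecutor(max_workers=len(envs))
        self._pending: Optional[List[Future]] = None

    def call_async(self, name: str, *args: Any, **kwargs: Any) -> None:
        # Schedule the call on every environment and return immediately.
        self._pending = [
            self._pool.submit(getattr(env, name), *args, **kwargs)
            for env in self.envs
        ]

    def call_wait(self) -> List[Any]:
        if self._pending is None:
            raise RuntimeError("call_wait() called without a pending call_async()")
        # Block until every scheduled call has finished, preserving order.
        results = [future.result() for future in self._pending]
        self._pending = None
        return results

    def close(self) -> None:
        self._pool.shutdown()


caller = ToyAsyncCaller(["alpha", "beta"])  # strings stand in for environments
caller.call_async("upper")
results = caller.call_wait()
caller.close()
```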
- close(**kwargs)#
Close all parallel environments and release resources.
It also closes all the existing image viewers, then calls close_extras() and sets closed to True.
Warning
This function itself does not close the environments; that should be handled in close_extras(). This is generic for both synchronous and asynchronous vectorized environments.
Note
This will be automatically called when the object is garbage collected or the program exits.
- Parameters:
**kwargs – Keyword arguments passed to close_extras()
- close_extras(**kwargs)#
Clean up the extra resources e.g. beyond what’s in this base class.
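The division of labour between close() and close_extras() is a template-method pattern: the base method handles bookkeeping and delegates the actual cleanup. A minimal sketch under that reading follows; `ToyVectorEnv` and the `terminate` keyword are hypothetical and used only for illustration.

```python
from typing import Any, List


class ToyVectorEnv:
    """Hypothetical vectorized environment showing the close() template."""

    def __init__(self) -> None:
        self.closed = False
        self.log: List[str] = []

    def close_extras(self, **kwargs: Any) -> None:
        # Subclasses release their own resources here (workers, pipes, ...).
        self.log.append(f"close_extras({sorted(kwargs)})")

    def close(self, **kwargs: Any) -> None:
        if self.closed:
            return  # closing twice is harmless
        self.close_extras(**kwargs)
        self.closed = True


env = ToyVectorEnv()
env.close(terminate=True)  # keyword arguments are forwarded to close_extras()
env.close()                # no-op: already closed
```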
- get_attr(name)#
Get a property from each parallel environment.
- Parameters:
name (str) – Name of the property to get from each individual environment.
- Returns:
The property with the given name, from each environment.
- get_wrapper_attr(name)#
Gets the attribute name from the environment.
- Return type:
Any
- property np_random: Generator#
Returns the environment’s internal _np_random, which is initialised with a random seed if it has not been set.
- Returns:
Instances of np.random.Generator
- render()#
Compute the render frames as specified by render_mode during the initialization of the environment.
The environment’s metadata render modes (env.metadata["render_modes"]) should contain the possible ways to implement the render modes. In addition, list versions of most render modes are achieved through gymnasium.make, which automatically applies a wrapper to collect rendered frames.
- Return type:
Union[TypeVar(RenderFrame), list[TypeVar(RenderFrame)], None]
Note
As the render_mode is known during __init__, the objects used to render the environment state should be initialised in __init__.
By convention, if the render_mode is:
None (default): no render is computed.
“human”: The environment is continuously rendered in the current display or terminal, usually for human consumption. This rendering should occur during step(), and render() doesn’t need to be called. Returns None.
“rgb_array”: Return a single frame representing the current state of the environment. A frame is a np.ndarray with shape (x, y, 3) representing RGB values for an x-by-y pixel image.
“ansi”: Return a string (str) or StringIO.StringIO containing a terminal-style text representation for each time step. The text can include newlines and ANSI escape sequences (e.g. for colors).
“rgb_array_list” and “ansi_list”: List-based versions of render modes are possible (except “human”) through the wrapper gymnasium.wrappers.RenderCollection, which is automatically applied during gymnasium.make(..., render_mode="rgb_array_list"). The collected frames are popped after render() or reset() is called.
Note
Make sure that your class’s metadata "render_modes" key includes the list of supported modes.
Changed in version 0.25.0: The render function was changed to no longer accept parameters; these parameters should instead be specified when the environment is initialised, i.e., gymnasium.make("CartPole-v1", render_mode="human")
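The render_mode conventions listed above can be illustrated with a toy environment that fixes its mode at construction time and dispatches on it in render(). `ToyGridEnv` is hypothetical, and nested lists stand in for the np.ndarray that a real "rgb_array" mode would return.

```python
from typing import List, Optional, Union


class ToyGridEnv:
    """Hypothetical environment whose rendering depends on render_mode."""

    metadata = {"render_modes": ["ansi", "rgb_array"]}

    def __init__(self, render_mode: Optional[str] = None) -> None:
        if render_mode is not None and render_mode not in self.metadata["render_modes"]:
            raise ValueError(f"Unsupported render_mode: {render_mode!r}")
        self.render_mode = render_mode
        self.agent_position = 1  # column of the agent in a 1x3 grid

    def render(self) -> Union[str, List[List[List[int]]], None]:
        if self.render_mode is None:
            return None  # default: no render is computed
        if self.render_mode == "ansi":
            return "".join("A" if i == self.agent_position else "." for i in range(3))
        # "rgb_array": nested lists stand in for an array of shape (1, 3, 3).
        return [
            [[255, 255, 255] if i == self.agent_position else [0, 0, 0] for i in range(3)]
        ]


text_frame = ToyGridEnv(render_mode="ansi").render()
pixel_frame = ToyGridEnv(render_mode="rgb_array").render()
no_frame = ToyGridEnv().render()
```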
- reset(**kwargs)#
Reset all parallel environments and return a batch of initial observations and info.
- Parameters:
seed – The environment reset seeds
options – If to return the options
- Returns:
A batch of observations and info from the vectorized environment.
Example
>>> import gymnasium as gym
>>> envs = gym.vector.make("CartPole-v1", num_envs=3)
>>> envs.reset(seed=42)
(array([[ 0.0273956 , -0.00611216,  0.03585979,  0.0197368 ],
       [ 0.01522993, -0.04562247, -0.04799704,  0.03392126],
       [-0.03774345, -0.02418869, -0.00942293,  0.0469184 ]],
      dtype=float32), {})
- reset_async(**kwargs)#
Reset the sub-environments asynchronously.
This method will return None. A call to reset_async() should be followed by a call to reset_wait() to retrieve the results.
- Parameters:
seed – The reset seed
options – Reset options
- reset_wait(**kwargs)#
Retrieves the results of a reset_async() call.
A call to this method must always be preceded by a call to reset_async().
- Parameters:
seed – The reset seed
options – Reset options
- Returns:
The results from reset_async()
- Raises:
NotImplementedError – VectorEnv does not implement function
- set_attr(name, values)#
Set a property in each sub-environment.
- Parameters:
name (str) – Name of the property to be set in each individual environment.
values (list, tuple, or object) – Values of the property to be set to. If values is a list or tuple, then it corresponds to the values for each individual environment, otherwise a single value is set for all environments.
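The two accepted shapes for values (one entry per environment, or a single value applied to all) can be sketched as follows. The free function `set_attr` and the attribute names are hypothetical; `SimpleNamespace` objects stand in for sub-environments.

```python
from types import SimpleNamespace
from typing import Any, List


def set_attr(envs: List[Any], name: str, values: Any) -> None:
    """Set attribute `name` on each environment, broadcasting a single value."""
    if not isinstance(values, (list, tuple)):
        # A single object is applied to every environment.
        values = [values for _ in envs]
    if len(values) != len(envs):
        raise ValueError(
            f"Expected {len(envs)} values, one per environment, got {len(values)}."
        )
    for env, value in zip(envs, values):
        setattr(env, name, value)


envs = [SimpleNamespace(), SimpleNamespace(), SimpleNamespace()]
set_attr(envs, "max_steps", 100)     # same value everywhere
set_attr(envs, "seed", [1, 2, 3])    # one value per environment
```

Note that under this reading a value that is itself a list or tuple cannot be broadcast directly; it would have to be repeated once per environment.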
- step(action)#
Take an action for each parallel environment.
- Parameters:
actions – Batch of actions, an element of action_space.
- Returns:
Batch of (observations, rewards, terminations, truncations, infos)
Note
Because vector environments automatically reset sub-environments that terminate or truncate, the returned observation and info are not those of the final step; the final step’s values are instead stored in info under “final_observation” and “final_info”.
Example
>>> import gymnasium as gym
>>> import numpy as np
>>> envs = gym.vector.make("CartPole-v1", num_envs=3)
>>> _ = envs.reset(seed=42)
>>> actions = np.array([1, 0, 1])
>>> observations, rewards, termination, truncation, infos = envs.step(actions)
>>> observations
array([[ 0.02727336,  0.18847767,  0.03625453, -0.26141977],
       [ 0.01431748, -0.24002443, -0.04731862,  0.3110827 ],
       [-0.03822722,  0.1710671 , -0.00848456, -0.2487226 ]],
      dtype=float32)
>>> rewards
array([1., 1., 1.])
>>> termination
array([False, False, False])
>>> truncation
array([False, False, False])
>>> infos
{}
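The autoreset behaviour mentioned in the note can be made concrete with a toy vectorized step. Everything below (`CountdownEnv`, `vector_step`, the simplified return tuple) is a hypothetical sketch of the idea, not the library's implementation: when a sub-environment finishes, its true last observation is stored under "final_observation" and the observation from the fresh reset is returned in its place.

```python
from typing import Any, Dict, List, Tuple


class CountdownEnv:
    """Hypothetical sub-environment that terminates after `horizon` steps."""

    def __init__(self, horizon: int) -> None:
        self.horizon = horizon
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self) -> Tuple[int, bool]:
        self.t += 1
        return self.t, self.t >= self.horizon


def vector_step(envs: List[CountdownEnv]) -> Tuple[List[int], List[bool], Dict[str, Any]]:
    """Step every sub-environment, auto-resetting the ones that finish."""
    observations, terminations = [], []
    final_observations: List[Any] = [None] * len(envs)
    for i, env in enumerate(envs):
        obs, terminated = env.step()
        if terminated:
            # Keep the true last observation, then return the reset one instead.
            final_observations[i] = obs
            obs = env.reset()
        observations.append(obs)
        terminations.append(terminated)
    infos: Dict[str, Any] = {}
    if any(terminations):
        infos["final_observation"] = final_observations
    return observations, terminations, infos


envs = [CountdownEnv(horizon=1), CountdownEnv(horizon=3)]
observations, terminations, infos = vector_step(envs)
```

Code that bootstraps value estimates from the next observation would read the entry in "final_observation" for finished sub-environments rather than the returned observation, which already belongs to the next episode.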
- step_async(actions)#
Asynchronously performs steps in the sub-environments.
The results can be retrieved via a call to step_wait().
- Parameters:
actions – The actions to take asynchronously
- step_wait()#
Retrieves the results of a step_async() call.
A call to this method must always be preceded by a call to step_async().
- Parameters:
**kwargs – Additional keywords for vector implementation
- Returns:
The results from the step_async() call
- property unwrapped#
Returns the base non-wrapped environment.
- Returns:
The base non-wrapped gymnasium.Env instance
- Return type:
Env