finish updating isd->b0, link to CommonRLInterface

zsunberg · web-flow · commit 46982ecbdbf2 · 2021-10-25T09:55:39.000-06:00
diff --git a/docs/src/simulation.md b/docs/src/simulation.md
@@ -1,6 +1,6 @@
 # Simulation Standard
 
-Important note: In most cases, **users need not implement their own simulators**. Several simulators that are compatible with the standard in this document are implemented in the [POMDPSimulators package](https://github.com/JuliaPOMDP/POMDPSimulators.jl) and allow [interaction from a variety of perspectives](https://juliapomdp.github.io/POMDPSimulators.jl/latest/which/). Moreover [RLInterface.jl](https://github.com/JuliaPOMDP/RLInterface.jl) provides an OpenAI Gym style environment interface to interact with environments that is more flexible in some cases.
+Important note: In most cases, **users need not implement their own simulators**. Several simulators that are compatible with the standard in this document are implemented in the [POMDPSimulators package](https://github.com/JuliaPOMDP/POMDPSimulators.jl) and allow [interaction from a variety of perspectives](https://juliapomdp.github.io/POMDPSimulators.jl/latest/which/). Moreover, [CommonRLInterface.jl](https://github.com/JuliaReinforcementLearning/CommonRLInterface.jl) provides an OpenAI Gym-style environment interface to interact with environments that is more flexible in some cases.
 
 In order to maintain consistency across the POMDPs.jl ecosystem, this page defines a standard for how simulations should be conducted. All simulators should be consistent with this page, and, if solvers are attempting to find an optimal POMDP policy, they should optimize the expected value of `r_total` below. In particular, this page should be consulted when questions about how less-obvious concepts like terminal states are handled.
 
@@ -13,7 +13,7 @@ In general, POMDP simulations take up to 5 inputs (see also the [`simulate`](@re
 - `pomdp::POMDP`: pomdp model object (see [POMDPs and MDPs](@ref))
 - `policy::Policy`: policy (see [Solvers and Policies](@ref))
 - `up::Updater`: belief updater (see [Beliefs and Updaters](@ref))
-- `b0`: initial belief (this may be )
+- `b0`: initial belief (this may be updater-specific, such as an observation if the updater just returns the previous observation)
 - `s`: initial state
 
 The last three of these inputs are optional. If they are not explicitly provided, they should be inferred using the following POMDPs.jl functions: