* moving toward distribution-centric
* check in before going to two stage tutorial
* before adding PO-Mountaincar
* defined pomc
* mostly done with pomdp tutorial
* finished updates
docs/src/concepts.md: 9 additions & 14 deletions
@@ -24,31 +24,26 @@ The code components of the POMDPs.jl ecosystem relevant to problems and solvers
 An MDP is a mathematical framework for sequential decision making under
 uncertainty, and where all of the uncertainty arises from outcomes that
 are partially random and partially under the control of a decision
-maker. Mathematically, an MDP is a tuple (S,A,T,R), where S is the state
-space, A is the action space, T is a transition function defining the
+maker. Mathematically, an MDP is a tuple ``(S,A,T,R,\gamma)``, where ``S`` is the state
+space, ``A`` is the action space, ``T`` is a transition function defining the
 probability of transitioning to each state given the state and action at
-the previous time, and R is a reward function mapping every possible
-transition (s,a,s') to a real reward value. For more information see a
+the previous time, and ``R`` is a reward function mapping every possible
+transition ``(s,a,s')`` to a real reward value. Finally, ``\gamma`` is a discount factor that defines the relative weighting of current and future rewards.
+For more information see a
 textbook such as \[1\]. In POMDPs.jl an MDP is represented by a concrete
 subtype of the [`MDP`](@ref) abstract type and a set of methods that
-define each of its components. S and A are defined by implementing
-[`states`](@ref) and [`actions`](@ref) for your specific [`MDP`](@ref)
-subtype. R is by implementing [`reward`](@ref), and T is defined by implementing [`transition`](@ref) if the [*explicit*](@ref defining_pomdps) interface is used or [`gen`](@ref) if the [*generative*](@ref defining_pomdps) interface is used.
+define each of its components as described in the [problem definition section](@ref defining_pomdps).
 
 A POMDP is a more general sequential decision making problem in which
 the agent is not sure what state they are in. The state is only
 partially observable by the decision making agent. Mathematically, a
-POMDP is a tuple (S,A,T,R,O,Z) where S, A, T, and R are the same as with
-MDPs, Z is the agent's observation space, and O defines the probability
+POMDP is a tuple ``(S,A,T,R,O,Z,\gamma)`` where ``S``, ``A``, ``T``, ``R``, and ``\gamma`` have the same meaning as in an MDP, ``Z`` is the agent's observation space, and ``O`` defines the probability
 of receiving each observation at a transition. In POMDPs.jl, a POMDP is
 represented by a concrete subtype of the [`POMDP`](@ref) abstract type,
-`Z` may be defined by the [`observations`](@ref) function (though an
-explicit definition is often not required), and `O` is defined by
-implementing [`observation`](@ref) if the [*explicit*](@ref defining_pomdps) interface is used or [`gen`](@ref) if the [*generative*](@ref defining_pomdps) interface is used.
+and the methods described in the [problem definition section](@ref defining_pomdps).
 
 POMDPs.jl contains additional functions for defining optional problem behavior
-such as a [discount factor](@ref Discount-Factor) or a set of [terminal states](@ref Terminal-States).
-
+such as an [initial state distribution](@ref Initial-state-distribution) or [terminal states](@ref Terminal-states).
 More information can be found in the [Defining POMDPs](@ref defining_pomdps) section.
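
To make the interface described in the updated text concrete, here is a minimal sketch of a two-state MDP. It is illustrative only: the problem name and dynamics are made up, and it assumes POMDPs.jl is used together with the POMDPTools package, which provides the `Deterministic` and `SparseCat` distributions. The definition subtypes `MDP` and implements one function for each component of ``(S,A,T,R,\gamma)``.

```julia
using POMDPs
using POMDPTools: Deterministic, SparseCat

# Hypothetical two-state problem; state and action types are both Symbol
struct LeftRightMDP <: MDP{Symbol, Symbol} end

POMDPs.states(m::LeftRightMDP)  = [:left, :right]   # S
POMDPs.actions(m::LeftRightMDP) = [:stay, :move]    # A
POMDPs.discount(m::LeftRightMDP) = 0.95             # γ

# T: a distribution over next states given the current state and action
function POMDPs.transition(m::LeftRightMDP, s::Symbol, a::Symbol)
    if a == :move
        other = s == :left ? :right : :left
        return SparseCat([other, s], [0.9, 0.1])     # the move succeeds 90% of the time
    else
        return Deterministic(s)                      # :stay keeps the current state
    end
end

# R: a real-valued reward for every transition (s, a, s')
POMDPs.reward(m::LeftRightMDP, s::Symbol, a::Symbol, sp::Symbol) = sp == :right ? 1.0 : 0.0
```

Most solvers additionally require indexing functions such as [`stateindex`](@ref) and [`actionindex`](@ref); the [problem definition section](@ref defining_pomdps) covers those and the generative ([`gen`](@ref)) alternative.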
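A corresponding POMDP sketch, extending the same hypothetical problem with an observation space ``Z`` and observation distribution ``O`` (again assuming POMDPTools, which also provides `Uniform`), and showing the optional initial state distribution and terminal-state behavior mentioned in the last paragraph of the diff:

```julia
using POMDPs
using POMDPTools: Deterministic, SparseCat, Uniform

# Hypothetical problem; the third type parameter is the observation type
struct LeftRightPOMDP <: POMDP{Symbol, Symbol, Symbol} end

POMDPs.states(m::LeftRightPOMDP)       = [:left, :right]
POMDPs.actions(m::LeftRightPOMDP)      = [:stay, :move]
POMDPs.observations(m::LeftRightPOMDP) = [:heard_left, :heard_right]  # Z
POMDPs.discount(m::LeftRightPOMDP)     = 0.95

# T and R take the same form as in the MDP sketch above
function POMDPs.transition(m::LeftRightPOMDP, s::Symbol, a::Symbol)
    if a == :move
        other = s == :left ? :right : :left
        return SparseCat([other, s], [0.9, 0.1])
    else
        return Deterministic(s)
    end
end
POMDPs.reward(m::LeftRightPOMDP, s::Symbol, a::Symbol, sp::Symbol) = sp == :right ? 1.0 : 0.0

# O: a distribution over observations received after reaching state sp via action a;
# here a noisy sensor reports the true state 85% of the time
function POMDPs.observation(m::LeftRightPOMDP, a::Symbol, sp::Symbol)
    correct = sp == :left ? :heard_left : :heard_right
    wrong   = sp == :left ? :heard_right : :heard_left
    return SparseCat([correct, wrong], [0.85, 0.15])
end

# Optional problem behavior: initial state distribution and terminal states
POMDPs.initialstate(m::LeftRightPOMDP) = Uniform([:left, :right])
POMDPs.isterminal(m::LeftRightPOMDP, s::Symbol) = false   # this toy problem never terminates
```

If writing explicit distributions is awkward, the generative [`gen`](@ref) interface can define ``T`` and ``O`` through a simulator instead.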