
Commit 7ae4b5e

Docs twostage (#375)
* moving toward distribution-centric
* check in before going to two stage tutorial
* before adding PO-Mountaincar
* defined pomc
* mostly done with pomdp tutorial
* finished updates
1 parent 46982ec commit 7ae4b5e

File tree

7 files changed: +437 −170 lines changed


docs/make.jl

Lines changed: 0 additions & 3 deletions
```diff
@@ -19,12 +19,9 @@ makedocs(
 
     "Defining (PO)MDP Models" => [
         "def_pomdp.md",
-        "static.md",
         "interfaces.md",
-        "dynamics.md",
     ],
 
-
     "Writing Solvers" => [
         "def_solver.md",
         "offline_solver.md",
```
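As a sketch, the `pages` structure this hunk leaves behind in the Documenter.jl `makedocs` call would look like the following (the nesting is reconstructed for illustration; the surrounding `makedocs` keywords are not part of this diff):

```julia
# Illustrative reconstruction of the `pages` entry after this commit removed
# static.md and dynamics.md: Documenter.jl accepts a vector of
# "Section Title" => [pages...] pairs.
pages = [
    "Defining (PO)MDP Models" => [
        "def_pomdp.md",
        "interfaces.md",
    ],
    "Writing Solvers" => [
        "def_solver.md",
        "offline_solver.md",
    ],
]
```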

docs/src/api.md

Lines changed: 8 additions & 18 deletions
````diff
@@ -59,6 +59,14 @@ convert_a
 convert_o
 ```
 
+### Type Inference
+
+```@docs
+statetype
+actiontype
+obstype
+```
+
 ### Distributions and Spaces
 
 ```@docs
@@ -93,21 +101,3 @@ value
 Simulator
 simulate
 ```
-
-## Other
-
-The following functions are not part of the API for specifying and solving POMDPs, but are included in the package.
-
-### Type Inference
-
-```@docs
-statetype
-actiontype
-obstype
-```
-
-### Utility Tools
-
-```@docs
-add_registry
-```
````
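The `statetype`, `actiontype`, and `obstype` functions promoted into the main API section here recover the type parameters of an `MDP{S,A}` or `POMDP{S,A,O}` subtype. A minimal sketch of how they behave (the `GridWorld` type is hypothetical, invented for illustration):

```julia
using POMDPs

# Hypothetical problem type: states are (x, y) coordinates, actions are Symbols.
struct GridWorld <: MDP{Tuple{Int,Int}, Symbol} end

# The type-inference functions read the parameters back off the problem type:
statetype(GridWorld)   # Tuple{Int, Int}
actiontype(GridWorld)  # Symbol
```

They also accept problem instances, e.g. `statetype(GridWorld())`; `obstype` applies only to `POMDP` subtypes, which carry a third type parameter for observations.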

docs/src/concepts.md

Lines changed: 9 additions & 14 deletions
```diff
@@ -24,31 +24,26 @@ The code components of the POMDPs.jl ecosystem relevant to problems and solvers
 An MDP is a mathematical framework for sequential decision making under
 uncertainty, and where all of the uncertainty arises from outcomes that
 are partially random and partially under the control of a decision
-maker. Mathematically, an MDP is a tuple (S,A,T,R), where S is the state
-space, A is the action space, T is a transition function defining the
+maker. Mathematically, an MDP is a tuple ``(S,A,T,R,\gamma)``, where ``S`` is the state
+space, ``A`` is the action space, ``T`` is a transition function defining the
 probability of transitioning to each state given the state and action at
-the previous time, and R is a reward function mapping every possible
-transition (s,a,s') to a real reward value. For more information see a
+the previous time, and ``R`` is a reward function mapping every possible
+transition ``(s,a,s')`` to a real reward value. Finally, ``\gamma`` is a discount factor that defines the relative weighting of current and future rewards.
+For more information see a
 textbook such as \[1\]. In POMDPs.jl an MDP is represented by a concrete
 subtype of the [`MDP`](@ref) abstract type and a set of methods that
-define each of its components. S and A are defined by implementing
-[`states`](@ref) and [`actions`](@ref) for your specific [`MDP`](@ref)
-subtype. R is by implementing [`reward`](@ref), and T is defined by implementing [`transition`](@ref) if the [*explicit*](@ref defining_pomdps) interface is used or [`gen`](@ref) if the [*generative*](@ref defining_pomdps) interface is used.
+define each of its components as described in the [problem definition section](@ref defining_pomdps).
 
 A POMDP is a more general sequential decision making problem in which
 the agent is not sure what state they are in. The state is only
 partially observable by the decision making agent. Mathematically, a
-POMDP is a tuple (S,A,T,R,O,Z) where S, A, T, and R are the same as with
-MDPs, Z is the agent's observation space, and O defines the probability
+POMDP is a tuple ``(S,A,T,R,O,Z,\gamma)`` where ``S``, ``A``, ``T``, ``R``, and ``\gamma`` have the same meaning as in an MDP, ``Z`` is the agent's observation space, and ``O`` defines the probability
 of receiving each observation at a transition. In POMDPs.jl, a POMDP is
 represented by a concrete subtype of the [`POMDP`](@ref) abstract type,
-`Z` may be defined by the [`observations`](@ref) function (though an
-explicit definition is often not required), and `O` is defined by
-implementing [`observation`](@ref) if the [*explicit*](@ref defining_pomdps) interface is used or [`gen`](@ref) if the [*generative*](@ref defining_pomdps) interface is used.
+and the methods described in the [problem definition section](@ref defining_pomdps).
 
 POMDPs.jl contains additional functions for defining optional problem behavior
-such as a [discount factor](@ref Discount-Factor) or a set of [terminal states](@ref Terminal-States).
-
+such as an [initial state distribution](@ref Initial-state-distribution) or [terminal states](@ref Terminal-states).
 More information can be found in the [Defining POMDPs](@ref defining_pomdps) section.
 
 ## Beliefs and Updaters
```
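The tuple components described in this section map directly onto methods of an `MDP` subtype. A minimal sketch of how each element of ``(S,A,T,R,\gamma)`` might be implemented (this toy chain problem is invented for illustration, and `Deterministic` comes from a companion package — POMDPModelTools at the time of this commit — not from POMDPs.jl itself):

```julia
using POMDPs
using POMDPModelTools  # provides Deterministic; in later releases, POMDPTools

# Toy deterministic chain: walk left/right over states 1..5, reward on reaching 5.
struct Chain <: MDP{Int, Symbol} end

POMDPs.states(::Chain)  = 1:5                              # S
POMDPs.actions(::Chain) = (:left, :right)                  # A
POMDPs.transition(::Chain, s, a) =                         # T
    Deterministic(clamp(a == :right ? s + 1 : s - 1, 1, 5))
POMDPs.reward(::Chain, s, a, sp) = sp == 5 ? 1.0 : 0.0     # R
POMDPs.discount(::Chain) = 0.95                            # γ
```

A `POMDP` subtype adds the observation space ``Z`` and observation distribution ``O`` in the same style, via `observations` and `observation`.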
