Flatland Environment
====================

The goal in Flatland is simple:

> **We seek to minimize the time it takes to bring all the agents to their respective target.**

This raises a number of questions:

- [**Observations:**](#observations) what can each agent "see"?
- [**Rewards:**](#rewards) what is the metric used to evaluate the agents?

🗺️ Environment
---

Flatland is a 2D rectangular grid environment of arbitrary width and height, where the most primitive unit is a cell. Each cell has the capacity to hold a single agent (train).

An agent in a cell can have a discrete orientation that represents the cardinal direction the agent is pointing to. An agent can move to a subset of adjacent cells. The subset of adjacent cells that an agent is allowed to transition to is defined by a 4-bit transition map representing possible transitions in 4 different directions.

*10 basic cells modulo rotation enable us to implement any real-world railway network in the flatland env*

This gives a set of 30 valid transitions in total (see `#` for the number of rotations).

Agents can only travel in the direction they are currently facing. Hence, the permitted transitions for any given agent depend both on its position and on its direction. Transition maps define the railway network in the flatland world. One can implement any real-world railway network within the Flatland environment by manipulating the transition maps of cells.

For more information on transition maps, check out [environment information](../environment/environment_information)!
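
To make this concrete, here is a minimal sketch of querying a cell's transition map, assuming the `flatland-rl` package; the grid size, generator parameters, and cell coordinates are illustrative assumptions, not canonical values:

```python
from flatland.envs.rail_env import RailEnv
from flatland.envs.rail_generators import sparse_rail_generator
from flatland.envs.line_generators import sparse_line_generator

# Build a small random railway network (all parameter values are illustrative).
env = RailEnv(
    width=25,
    height=25,
    rail_generator=sparse_rail_generator(max_num_cities=2),
    line_generator=sparse_line_generator(),
    number_of_agents=1,
)
env.reset()

# For a cell and an incoming orientation (0=N, 1=E, 2=S, 3=W), the transition
# map answers: which of the four outgoing directions are allowed from here?
allowed = env.rail.get_transitions(12, 12, 1)
print(allowed)  # 4-bit tuple, e.g. (0, 1, 0, 0) -> only "continue East" is valid
```
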
↔️ Actions
---

The trains in Flatland have strongly limited movements, as you would expect from a railway simulation. This means that only a few actions are valid in most cases.

Here are the possible actions:

- **`DO_NOTHING`**: If the agent is already moving, it continues moving. If it is stopped, it stays stopped. Special case: if the agent is at a dead-end, this action will result in the train turning around.
- **`MOVE_LEFT`**: This action is only valid at cells where the agent can change direction towards the left. If chosen, the left transition and a rotation of the agent orientation to the left are executed. If the agent is stopped, this action will cause it to start moving in any cell where forward or left is allowed!
- **`MOVE_FORWARD`**: The agent will move forward. This action will start the agent when stopped. At switches, this will choose the forward direction.
- **`MOVE_RIGHT`**: The same as `MOVE_LEFT`, but for right turns.
- **`STOP_MOVING`**: This action causes the agent to stop.

Flatland is a discrete time simulation, i.e. it performs all actions with a constant time step. A single simulation step synchronously moves time forward by a constant increment, thus enacting exactly one action per agent per timestep.

```{admonition} Code reference
The actions are defined in [flatland.envs.rail_env.RailEnvActions](https://gitlab.aicrowd.com/flatland/flatland/blob/master/flatland/envs/rail_env.py#L69).

You can refer to the actions in your code using e.g. `RailEnvActions.MOVE_FORWARD`, `RailEnvActions.MOVE_RIGHT`...
```
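
As a sketch of how actions are supplied at each timestep (assuming `env` is a `RailEnv` built as in the earlier snippet), each agent receives exactly one action per step via a dictionary keyed by agent handle:

```python
from flatland.envs.rail_env import RailEnvActions

# One action per agent per timestep; here every train simply drives forward.
action_dict = {agent.handle: RailEnvActions.MOVE_FORWARD for agent in env.agents}

# step() enacts all actions synchronously and advances time by one increment.
obs, rewards, dones, info = env.step(action_dict)

# dones['__all__'] flips to True once the episode has finished.
print(rewards, dones['__all__'])
```
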
The following diagram shows the interplay of agent position/direction and actions.

The agent (red triangle) is in a left switch cell with direction `W`. The left neighbor cell is a left switch, too.
Upon entering the new cell, the `MOVE_LEFT` action will update the agent's direction to `S`, and the `MOVE_FORWARD` action will keep the agent's direction at `W`.

> **current position and direction** determine **next cell**
>
> **action** determines **next direction**

### 💥 Agent Malfunctions

Malfunctions are implemented to simulate delays by stopping agents at random times for random durations. Trains that malfunction can't move for a random, but known, number of steps. They of course block the trains following them 😬.

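A hedged sketch of how malfunctions can be configured when building the environment; the rate and duration values below are illustrative assumptions:

```python
from flatland.envs.malfunction_generators import MalfunctionParameters, ParamMalfunctionGen
from flatland.envs.rail_env import RailEnv
from flatland.envs.rail_generators import sparse_rail_generator
from flatland.envs.line_generators import sparse_line_generator

# On average one malfunction every 100 steps per train, each lasting
# between 2 and 10 steps (values are illustrative).
stochastic_data = MalfunctionParameters(
    malfunction_rate=1 / 100,
    min_duration=2,
    max_duration=10,
)

env = RailEnv(
    width=25,
    height=25,
    rail_generator=sparse_rail_generator(max_num_cities=2),
    line_generator=sparse_line_generator(),
    number_of_agents=1,
    malfunction_generator=ParamMalfunctionGen(stochastic_data),
)
```
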
👀 Observations
---

In Flatland, you have full control over the observations that your agents will work with. Three observations are provided as a starting point. However, you are encouraged to implement your own.


The three provided observations are:

- Global grid observation
- Local grid observation
- Tree observation

```{admonition} Code reference
The provided observations are defined in [envs/observations.py](https://gitlab.aicrowd.com/flatland/flatland/blob/master/flatland/envs/observations.py)
```
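
As a sketch of plugging one of the provided observations into the environment (the depth and predictor settings are illustrative assumptions):

```python
from flatland.envs.observations import TreeObsForRailEnv
from flatland.envs.predictions import ShortestPathPredictorForRailEnv
from flatland.envs.rail_env import RailEnv
from flatland.envs.rail_generators import sparse_rail_generator
from flatland.envs.line_generators import sparse_line_generator

# Tree observation that expands the rail graph at switches up to depth 2,
# using a shortest-path predictor to anticipate other trains.
tree_obs = TreeObsForRailEnv(max_depth=2, predictor=ShortestPathPredictorForRailEnv(30))

env = RailEnv(
    width=25,
    height=25,
    rail_generator=sparse_rail_generator(max_num_cities=2),
    line_generator=sparse_line_generator(),
    number_of_agents=1,
    obs_builder_object=tree_obs,
)
obs, info = env.reset()  # obs is a dict keyed by agent handle
```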

Each of the provided observations has its strengths and weaknesses. However, it is unlikely that you will be able to solve the problem by using any single one of them directly. Instead you will need to design your own observation, which can be a combination of the existing ones or which could be radically different.

**[🔗 Create your own observations](../environment/custom_observations)**
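
As a minimal sketch of the `ObservationBuilder` interface (the observation content here is a toy example, not a recommended design):

```python
import numpy as np
from flatland.core.env_observation_builder import ObservationBuilder


class SimplePositionObs(ObservationBuilder):
    """Toy observation: each agent sees its own position and direction."""

    def reset(self):
        # Called whenever the environment is reset; nothing to precompute here.
        pass

    def get(self, handle: int = 0):
        agent = self.env.agents[handle]
        # Before departing, an agent has no position on the grid yet.
        row, col = agent.position if agent.position is not None else agent.initial_position
        direction = agent.direction if agent.direction is not None else agent.initial_direction
        return np.array([row, col, direction])


# Pass an instance to RailEnv via obs_builder_object=SimplePositionObs().
```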

🎯 Rewards
---

In **Flat**land 3, rewards are only provided at the end of an episode by default, making it a sparse reward setting.

The episodes finish when all the trains have reached their target, or when the maximum number of time steps is reached.

The actual reward structure has the following cases:

- **Train has arrived at its target**: The agent will be given a reward of 0 for arriving on time or before the expected time. For arriving at the target later than the specified time, the agent is given a negative reward proportional to the delay: `min(latest_arrival - actual_arrival, 0)`

- **The train did not reach its target yet**: The reward is negative and equal to the estimated amount of time needed by the agent to reach its target from its current position, if it travels on the shortest path to the target, while accounting for its latest arrival time: `agent.get_current_delay()` *(refer to it in detail [here](../environment/timetables))*. The value returned will be positive if the expected arrival time is projected before the latest arrival and negative if it is projected after the latest arrival. Since it is called at the end of the episode, the agent is already past its deadline and so the value will always be negative.

- **The train never departed**: If the agent hasn't departed (i.e. its status is `READY_TO_DEPART`) at the end of the episode, it is considered to be cancelled and the following reward is provided.

```{admonition} Code reference
The reward is calculated in [envs/rail_env.py](https://gitlab.aicrowd.com/flatland/flatland/blob/master/flatland/envs/rail_env.py)
```
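
A schematic sketch of the three cases, purely for intuition; the names mirror the text above, this is not the library's actual implementation, and the exact cancellation penalty (elided here) is left as an input:

```python
def end_of_episode_reward(arrived: bool, departed: bool,
                          actual_arrival: int, latest_arrival: int,
                          current_delay: int, cancellation_penalty: int) -> int:
    """Reproduce the three reward cases described above."""
    if arrived:
        # On time or early -> 0; late -> negative, proportional to the delay.
        return min(latest_arrival - actual_arrival, 0)
    if departed:
        # Still en route: the (negative) delay estimate based on the remaining
        # shortest path, i.e. what agent.get_current_delay() reports.
        return current_delay
    # Never departed: the train counts as cancelled.
    return cancellation_penalty
```
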
🚉 Other concepts
-----------------

### Stochasticity

An important aspect of these levels will be their **stochasticity**, which means how often and for how long trains will malfunction. Malfunctions force the agents to reconsider their plans, which can be costly.

### Speed profiles

Finally, trains in real railway networks don't all move at the same speed. A freight train will for example be slower than a passenger train. This is an important consideration, as you want to avoid scheduling a fast train behind a slow train!
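
A sketch of configuring mixed speed profiles (the ratios and their distribution are illustrative assumptions):

```python
from flatland.envs.line_generators import sparse_line_generator

# Fraction of trains per speed; a speed of 0.5 means a train needs two
# timesteps to advance one cell (all values are illustrative).
speed_ratio_map = {
    1.0: 0.25,   # fast passenger trains
    0.5: 0.50,   # slower commuter trains
    0.25: 0.25,  # freight trains
}
line_gen = sparse_line_generator(speed_ratio_map)  # pass as line_generator=...
```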