The car has two actuators: the steering angle, and the desired speed (and not acceleration, as one might expect from a “normal” car).
The state of the vehicle comprises 5 numbers: longitudinal and lateral speeds, angular speed (about the Z-axis), and the current actuator values (steering angle and speed). Note that there’s neither position nor yaw. This is because we’re already operating in a coordinate system that’s centered on the car (its position is always (0, 0)), with the X-axis shooting forward in the direction in which the vehicle is pointing (so yaw is always 0).
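To make the car-centered frame concrete, here is a minimal sketch of how world-frame points (centerline points, for example) could be mapped into it. The helper name `to_vehicle_frame` and its argument layout are my assumptions for illustration, not code from the project.

```python
import math

def to_vehicle_frame(points, car_x, car_y, car_yaw):
    """Map world-frame (x, y) points into the car-centered frame.

    Hypothetical helper: translate by the car position, then rotate by -yaw,
    so the car sits at (0, 0) and its heading becomes the +X axis.
    """
    cos_y, sin_y = math.cos(car_yaw), math.sin(car_yaw)
    local = []
    for x, y in points:
        dx, dy = x - car_x, y - car_y
        # Rotation by -yaw applied to the translated point.
        local.append((cos_y * dx + sin_y * dy, -sin_y * dx + cos_y * dy))
    return local
```

In this frame, a point straight ahead of the car has a positive X and a Y of zero, no matter where the car is on the track or which way it faces.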
The cent stands for centerline, which is the path we’re following. As the name suggests, this is the path that lies exactly in the middle between the race track’s bounds. (Clearly, this path is not the fastest, but we’ll worry about that more in the next blog post.)
And finally, the params are the controller parameters (lookahead distance, speed setpoint, and tire force max) that were set while the data was recorded. The idea is to look for better params while driving via an online gradient ascent procedure such that we constantly adapt to the changing conditions.
The trajectory consists of 150 2D points representing future positions of the vehicle, and the act is a vector containing future actuator values: 9 steering angles and 9 speeds.
We need the trajectory to compute the score later on, whereas the actuators are, well, needed to drive the vehicle. Although I should say: not all of them. I only take the first steering angle and speed and throw away the rest (as you would normally do in a controller like Model Predictive Control, which I used in a previous blog post).
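As a small sketch of that last step: assuming the 18-element act vector stores the 9 steering angles first and the 9 speeds after them (the actual layout isn’t stated here, so treat this ordering as an assumption), picking the command to apply could look like this. `first_command` is a hypothetical helper name.

```python
HORIZON = 9  # number of future steps contained in `act`

def first_command(act):
    """Return the (steering, speed) pair to apply right now.

    Assumes act = [steer_0..steer_8, speed_0..speed_8]; this layout is an
    assumption made for illustration.
    """
    if len(act) != 2 * HORIZON:
        raise ValueError(f"expected {2 * HORIZON} values, got {len(act)}")
    steering, speeds = act[:HORIZON], act[HORIZON:]
    # Only the first step is applied; the rest of the horizon is discarded.
    return steering[0], speeds[0]
```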
The colorful boxes in the graph depict modules containing one or more fully connected layers, nothing fancy.
During inference, the state and centerline are given; we can’t change them. But the idea is to probe the model for alternative controller parameters that might yield a more promising trajectory.
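One way to picture this probing is gradient ascent over the params while the state and centerline stay fixed. The sketch below is a toy, not the real setup: `toy_score` is a made-up stand-in for "run the model and score the predicted trajectory", the gradient is estimated by central finite differences rather than by backpropagating through the network, and the learning rate, step count, and all numeric values are arbitrary.

```python
def toy_score(params):
    """Made-up stand-in for scoring the trajectory predicted for `params`.

    A concave bowl with its maximum at (2.0, 10.0, 1.5); the real score
    would come from the model's predicted trajectory.
    """
    lookahead, speed_setpoint, tire_force_max = params
    return -((lookahead - 2.0) ** 2
             + 0.5 * (speed_setpoint - 10.0) ** 2
             + (tire_force_max - 1.5) ** 2)

def finite_diff_grad(score, params, eps=1e-4):
    """Central-difference estimate of d(score)/d(params)."""
    grad = []
    for i in range(len(params)):
        up = list(params)
        up[i] += eps
        down = list(params)
        down[i] -= eps
        grad.append((score(up) - score(down)) / (2 * eps))
    return grad

def ascend(score, params, lr=0.1, steps=200):
    """Plain gradient ascent: repeatedly move params uphill on the score."""
    params = list(params)
    for _ in range(steps):
        grad = finite_diff_grad(score, params)
        params = [p + lr * g for p, g in zip(params, grad)]
    return params
```

Calling `ascend(toy_score, [1.0, 8.0, 1.0])` moves the three parameters toward the toy optimum. In an online setting you would presumably take only a few such steps per control cycle, starting from the previous cycle’s params.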