
This skill enables finite-horizon LQR-based MPC control by computing backward Riccati gains and applying the first optimal input.

npx playbooks add skill benchflow-ai/skillsbench --skill finite-horizon-lqr

Review the files below or copy the command above to add this skill to your agents.

Files (1)
SKILL.md
1.1 KB
---
name: finite-horizon-lqr
description: Solving finite-horizon LQR via dynamic programming for MPC.
---

# Finite-Horizon LQR for MPC

## Problem Formulation

Minimize cost over horizon N:
```
J = Σ(k=0 to N-1) [x_k' Q x_k + u_k' R u_k] + x_N' P_N x_N
```

## Backward Riccati Recursion

Initialize: P_N = Q, or the infinite-horizon Riccati solution when closed-loop stability matters.

For k = N-1 down to 0:
```python
K_k = inv(R + B.T @ P_{k+1} @ B) @ (B.T @ P_{k+1} @ A)
P_k = Q + A.T @ P_{k+1} @ (A - B @ K_k)
```

## Forward Simulation

Starting from x_0:
```python
u_k = -K_k @ x_k
x_{k+1} = A @ x_k + B @ u_k
```

## Python Implementation

```python
import numpy as np

def finite_horizon_lqr(A, B, Q, R, N, x0):
    """Solve the finite-horizon LQR and return the first optimal control."""
    nx, nu = A.shape[0], B.shape[1]
    K = np.zeros((nu, nx, N))
    P = Q.copy()  # terminal cost P_N = Q

    # Backward pass: Riccati recursion from k = N-1 down to 0
    for k in range(N - 1, -1, -1):
        K[:, :, k] = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K[:, :, k])

    # Return the first control (receding-horizon / MPC use)
    return -K[:, :, 0] @ x0
```
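A quick sanity check on an assumed double-integrator model (the dt, Q, and R values are illustrative; the function is repeated so the snippet runs standalone):

```python
import numpy as np

def finite_horizon_lqr(A, B, Q, R, N, x0):
    # finite_horizon_lqr as defined above
    nx, nu = A.shape[0], B.shape[1]
    K = np.zeros((nu, nx, N))
    P = Q.copy()
    for k in range(N - 1, -1, -1):
        K[:, :, k] = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K[:, :, k])
    return -K[:, :, 0] @ x0

# Double integrator discretized at dt = 0.1: state = [position, velocity]
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
Q = np.eye(2)
R = np.array([[0.01]])
x0 = np.array([1.0, 0.0])  # 1 m from the origin, at rest

u0 = finite_horizon_lqr(A, B, Q, R, N=20, x0=x0)
print(u0)  # a negative acceleration, pushing the position back toward zero
```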

## MPC Application

At each timestep:
1. Measure current state x
2. Solve finite-horizon LQR from x
3. Apply first control u_0
4. Repeat next timestep
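The receding-horizon loop above can be sketched as follows (the double-integrator plant and cost weights are illustrative assumptions):

```python
import numpy as np

def lqr_first_control(A, B, Q, R, N, x):
    """Backward Riccati pass; return the time-0 control for state x."""
    P = Q.copy()
    K0 = None
    for k in range(N - 1, -1, -1):
        K0 = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K0)
    return -K0 @ x

# Illustrative double-integrator plant
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
Q = np.eye(2)
R = np.array([[0.01]])

x = np.array([1.0, 0.0])
for t in range(50):
    u = lqr_first_control(A, B, Q, R, N=20, x=x)  # steps 1-2: measure, solve
    x = A @ x + B @ u                             # step 3: apply first control
    # step 4: the loop repeats with the new measured state

print(np.linalg.norm(x))  # state driven close to the origin
```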

Overview

This skill implements a finite-horizon Linear Quadratic Regulator (LQR) solver using dynamic programming for Model Predictive Control (MPC). It computes time-varying feedback gains over a specified horizon and returns the optimal first control action for the current state. The implementation is lightweight and suitable for online MPC loops where a quadratic cost and linear dynamics are assumed.

How this skill works

The solver runs a backward Riccati recursion to compute a sequence of cost-to-go matrices and corresponding feedback gains from the horizon end to the present. After the backward pass, it applies the time-0 feedback law u_0 = -K_0 x_0 to the current state. In MPC use, this process is repeated each control step: measure the state, solve the finite-horizon LQR, apply the first control, and shift the horizon.

When to use it

  • You have linear time-invariant or time-varying system dynamics and quadratic stage/final costs.
  • You need an online MPC controller that is computationally cheap and deterministic.
  • You can re-solve an optimal control problem at each control step with moderate horizon length.
  • You want time-varying feedback gains that account for finite horizon effects.
  • You prefer an exact dynamic-programming solution under the LQR assumptions.

Best practices

  • Choose horizon N to balance performance and computation; longer horizons improve performance but cost more.
  • Ensure R is positive definite (add a small regularization term if needed) for numerical stability during matrix solves.
  • Initialize terminal cost P_N to a meaningful value (e.g., solution of infinite-horizon Riccati or Q) to improve terminal behavior.
  • Use numerically robust linear solvers for (R + B' P B) and check conditioning.
  • Recompute gains online at each timestep for true MPC behavior rather than using open-loop sequences.
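A hypothetical helper illustrating the regularization and conditioning checks above (the function name, `reg`, and `cond_limit` thresholds are illustrative, not part of the skill):

```python
import numpy as np

def robust_gain(R, B, P, A, reg=1e-9, cond_limit=1e12):
    """Regularized, condition-checked solve for the LQR gain."""
    S = R + B.T @ P @ B
    S = S + reg * np.eye(S.shape[0])  # nudge toward positive definiteness
    if np.linalg.cond(S) > cond_limit:
        raise np.linalg.LinAlgError("R + B'PB is ill-conditioned")
    return np.linalg.solve(S, B.T @ P @ A)

# Example solve with double-integrator matrices
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
K = robust_gain(np.array([[0.01]]), B, np.eye(2), A)
```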

Example use cases

  • Real-time MPC for small robots or drones with linearized dynamics.
  • Trajectory tracking where quadratic penalties on state deviation and control effort are appropriate.
  • Fast approximate optimal control for systems with tight CPU budgets and moderate horizon lengths.
  • Warm-starting more complex nonlinear MPC by linearizing around the current trajectory.
  • Teaching and prototyping LQR/MPC concepts in control courses or labs.

FAQ

What should I pick for the terminal cost P_N?

Use the infinite-horizon Riccati solution if stability is critical; otherwise, P_N = Q is a simple default.
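Assuming SciPy is available, the infinite-horizon terminal cost can be computed with `scipy.linalg.solve_discrete_are` (the plant matrices below are illustrative):

```python
import numpy as np
from scipy.linalg import solve_discrete_are

dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
Q = np.eye(2)
R = np.array([[0.01]])

# Infinite-horizon cost-to-go: use as the terminal cost P_N
P_inf = solve_discrete_are(A, B, Q, R)
```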

Can this handle constraints on states or inputs?

No—finite-horizon LQR assumes unconstrained linear dynamics with quadratic costs. Use constrained MPC solvers for hard limits.

How often must I re-solve the finite-horizon problem?

Re-solve at every control timestep for receding-horizon MPC. If the dynamics and environment are unchanged, less frequent updates are possible, but they reduce reactivity.