
This skill enables finite-horizon LQR-based MPC control by computing backward Riccati gains and applying the first optimal input.

npx playbooks add skill benchflow-ai/skillsbench --skill finite-horizon-lqr

Review the files below or copy the command above to add this skill to your agents.

Files (1)
SKILL.md
1.1 KB
---
name: finite-horizon-lqr
description: Solving finite-horizon LQR via dynamic programming for MPC.
---

# Finite-Horizon LQR for MPC

## Problem Formulation

Minimize cost over horizon N:
```
J = Σ(k=0 to N-1) [x_k' Q x_k + u_k' R u_k] + x_N' P_N x_N
```

## Backward Riccati Recursion

Initialize: P_N = Q, or the infinite-horizon Riccati solution when closed-loop stability matters.

For k = N-1 down to 0:
```python
K_k = inv(R + B.T @ P_{k+1} @ B) @ (B.T @ P_{k+1} @ A)
P_k = Q + A.T @ P_{k+1} @ (A - B @ K_k)
```

## Forward Simulation

Starting from x_0:
```python
u_k = -K_k @ x_k
x_{k+1} = A @ x_k + B @ u_k
```

## Python Implementation

```python
import numpy as np

def finite_horizon_lqr(A, B, Q, R, N, x0):
    """Solve the finite-horizon LQR and return the first optimal control."""
    nx, nu = A.shape[0], B.shape[1]
    K = np.zeros((nu, nx, N))
    P = Q.copy()  # terminal cost P_N = Q

    # Backward pass: Riccati recursion from k = N-1 down to 0
    for k in range(N - 1, -1, -1):
        K[:, :, k] = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K[:, :, k])

    # Return the first control (receding-horizon / MPC use)
    return -K[:, :, 0] @ x0
```
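A quick sanity check on an assumed double-integrator model (the dt, Q, and R values are illustrative; the function is repeated so the snippet runs standalone):

```python
import numpy as np

def finite_horizon_lqr(A, B, Q, R, N, x0):
    # finite_horizon_lqr as defined above
    nx, nu = A.shape[0], B.shape[1]
    K = np.zeros((nu, nx, N))
    P = Q.copy()
    for k in range(N - 1, -1, -1):
        K[:, :, k] = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K[:, :, k])
    return -K[:, :, 0] @ x0

# Double integrator discretized at dt = 0.1: state = [position, velocity]
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
Q = np.eye(2)
R = np.array([[0.01]])
x0 = np.array([1.0, 0.0])  # 1 m from the origin, at rest

u0 = finite_horizon_lqr(A, B, Q, R, N=20, x0=x0)
print(u0)  # a negative acceleration, pushing the position back toward zero
```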

## MPC Application

At each timestep:
1. Measure current state x
2. Solve finite-horizon LQR from x
3. Apply first control u_0
4. Repeat next timestep
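The receding-horizon loop above can be sketched as follows (the double-integrator plant and cost weights are illustrative assumptions):

```python
import numpy as np

def lqr_first_control(A, B, Q, R, N, x):
    """Backward Riccati pass; return the time-0 control for state x."""
    P = Q.copy()
    K0 = None
    for k in range(N - 1, -1, -1):
        K0 = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K0)
    return -K0 @ x

# Illustrative double-integrator plant
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
Q = np.eye(2)
R = np.array([[0.01]])

x = np.array([1.0, 0.0])
for t in range(50):
    u = lqr_first_control(A, B, Q, R, N=20, x=x)  # steps 1-2: measure, solve
    x = A @ x + B @ u                             # step 3: apply first control
    # step 4: the loop repeats with the new measured state

print(np.linalg.norm(x))  # state driven close to the origin
```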

Overview

This skill implements a finite-horizon Linear Quadratic Regulator (LQR) solver using dynamic programming for Model Predictive Control (MPC). It computes time-varying feedback gains over a specified horizon and returns the optimal first control action for the current state. The implementation is lightweight and suitable for online MPC loops where a quadratic cost and linear dynamics are assumed.

How this skill works

The solver runs a backward Riccati recursion to compute a sequence of cost-to-go matrices and corresponding feedback gains from the horizon end to the present. After the backward pass, it applies the time-0 feedback law u_0 = -K_0 x_0 to the current state. In MPC use, this process is repeated each control step: measure the state, solve the finite-horizon LQR, apply the first control, and shift the horizon.

When to use it

  • You have linear time-invariant or time-varying system dynamics and quadratic stage/final costs.
  • You need an online MPC controller that is computationally cheap and deterministic.
  • You can re-solve an optimal control problem at each control step with moderate horizon length.
  • You want time-varying feedback gains that account for finite horizon effects.
  • You prefer an exact dynamic-programming solution under the LQR assumptions.

Best practices

  • Choose horizon N to balance performance and computation; longer horizons improve performance but cost more.
  • Ensure R is positive definite (add a small regularization term if needed) for numerical stability during matrix solves.
  • Initialize terminal cost P_N to a meaningful value (e.g., solution of infinite-horizon Riccati or Q) to improve terminal behavior.
  • Use numerically robust linear solvers for (R + B' P B) and check conditioning.
  • Recompute gains online at each timestep for true MPC behavior rather than using open-loop sequences.
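A hypothetical helper illustrating the regularization and conditioning checks above (the function name, `reg`, and `cond_limit` thresholds are illustrative, not part of the skill):

```python
import numpy as np

def robust_gain(R, B, P, A, reg=1e-9, cond_limit=1e12):
    """Regularized, condition-checked solve for the LQR gain."""
    S = R + B.T @ P @ B
    S = S + reg * np.eye(S.shape[0])  # nudge toward positive definiteness
    if np.linalg.cond(S) > cond_limit:
        raise np.linalg.LinAlgError("R + B'PB is ill-conditioned")
    return np.linalg.solve(S, B.T @ P @ A)

# Example solve with double-integrator matrices
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
K = robust_gain(np.array([[0.01]]), B, np.eye(2), A)
```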

Example use cases

  • Real-time MPC for small robots or drones with linearized dynamics.
  • Trajectory tracking where quadratic penalties on state deviation and control effort are appropriate.
  • Fast approximate optimal control for systems with tight CPU budgets and moderate horizon lengths.
  • Warm-starting more complex nonlinear MPC by linearizing around the current trajectory.
  • Teaching and prototyping LQR/MPC concepts in control courses or labs.

FAQ

What should I pick for the terminal cost P_N?

Use the infinite-horizon Riccati solution if stability is critical; otherwise, P_N = Q is a simple default.
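Assuming SciPy is available, the infinite-horizon terminal cost can be computed with `scipy.linalg.solve_discrete_are` (the plant matrices below are illustrative):

```python
import numpy as np
from scipy.linalg import solve_discrete_are

dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
Q = np.eye(2)
R = np.array([[0.01]])

# Infinite-horizon cost-to-go: use as the terminal cost P_N
P_inf = solve_discrete_are(A, B, Q, R)
```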

Can this handle constraints on states or inputs?

No—finite-horizon LQR assumes unconstrained linear dynamics with quadratic costs. Use constrained MPC solvers for hard limits.

How often must I re-solve the finite-horizon problem?

Re-solve at every control timestep for receding-horizon MPC. If the dynamics and environment are unchanged, less frequent updates are possible, but they reduce reactivity.