
Forest Navigation UAV



Learning-Based Navigation in Dense Forests

Thodoris Evangelakos
Autonomous Agents - INF412
27-02-2026

The Problem

Autonomous UAV navigation in dense forests.

Goal:

Particular challenges:

Success Criteria

We define success strictly:

Speed is desirable, but safety is non-negotiable.

System Architecture

Closed-loop pipeline:

Pipeline

Key idea: Learning handles complexity. Shield enforces safety.
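The shield idea above can be sketched as a wrapper that scales the policy's velocity command by proximity to the nearest obstacle. This is a minimal illustration; the distance thresholds and linear ramp are assumptions, not the project's actual shield logic.

```python
import numpy as np

def shield(action, min_ray_dist, d_stop=0.5, d_slow=2.0):
    """Scale the policy's velocity command by nearest-obstacle distance.

    Illustrative sketch: d_stop/d_slow are assumed thresholds,
    not the values used in the project.
    """
    if min_ray_dist <= d_stop:
        return np.zeros_like(action)       # too close: full stop
    if min_ray_dist < d_slow:
        scale = (min_ray_dist - d_stop) / (d_slow - d_stop)
        return action * scale              # in the buffer zone: slow down
    return action                          # clear: pass the action through

# Policy commands full speed forward while an obstacle is 1.25 m away:
cmd = shield(np.array([2.0, 0.0, 0.0]), min_ray_dist=1.25)
```

The learned policy stays unconstrained during training; the shield only overrides it at deployment when the range readings demand it.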

Why This Approach?

Why Learning?

Why SAC?

Why a Safety Shield?

Training Strategy

Train in a custom, fast in-memory simulator (fastsim).

Why?

Then validate in:
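The appeal of an in-memory simulator is that world queries reduce to array math. A sketch of the style, assuming trees are modeled as 2D circles (the function and parameter names are illustrative, not fastsim's API):

```python
import numpy as np

def collided(pos, tree_xy, tree_r, uav_radius=0.3):
    """Vectorized collision check of a point UAV against all tree trunks.

    No physics engine: one NumPy distance computation per step.
    uav_radius is an assumed safety radius.
    """
    d = np.linalg.norm(tree_xy - pos, axis=1)
    return bool(np.any(d < tree_r + uav_radius))

trees = np.array([[2.0, 0.0], [5.0, 1.0]])
radii = np.array([0.4, 0.4])
collided(np.array([0.0, 0.0]), trees, radii)   # far from both trunks: False
```

Steps like this run orders of magnitude faster than a full robotics simulator, which is what makes millions of RL environment steps feasible.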

Observations and Actions

Observation (96D vector)

Action
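One plausible layout for a 96-D observation is a block of range readings plus goal-relative state and current velocity; the exact split below (90 rays + 3 + 3) is an assumption for illustration, not the project's documented layout.

```python
import numpy as np

N_RAYS = 90   # assumed ray count; 90 + 3 + 3 = 96

def make_obs(ranges, goal_rel, vel):
    """Flatten sensor rays, goal-relative vector, and velocity into one vector."""
    obs = np.concatenate([ranges, goal_rel, vel])
    assert obs.shape == (96,)
    return obs

obs = make_obs(np.ones(N_RAYS), np.zeros(3), np.zeros(3))
```

Keeping the observation a flat fixed-size vector is what lets a standard MLP policy (as in off-the-shelf SAC implementations) consume it directly.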

Reward Design (High-Level)

Reward encourages:

+ Progress toward goal
+ Speed aligned with goal
- Proximity to obstacles
- Per-step penalty
- Large collision penalty
- Stalling or episode truncation
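The terms listed above can be combined into a single step reward. All coefficients below are placeholder assumptions for illustration, not the tuned values from the project:

```python
import numpy as np

def reward(dist_prev, dist_now, vel, goal_dir, min_obstacle_dist,
           collided, stalled):
    """One-step reward combining the shaping terms from the slide.

    All weights are illustrative placeholders.
    """
    r = 0.0
    r += 1.0 * (dist_prev - dist_now)          # + progress toward goal
    r += 0.1 * float(np.dot(vel, goal_dir))    # + speed aligned with goal
    if min_obstacle_dist < 1.0:                # - proximity to obstacles
        r -= 0.5 * (1.0 - min_obstacle_dist)
    r -= 0.01                                  # - per-step penalty
    if collided:
        r -= 10.0                              # - large collision penalty
    if stalled:
        r -= 1.0                               # - stalling penalty
    return r
```

The relative magnitudes are the point: the collision penalty must dominate any speed bonus the agent could accumulate, or safety loses the trade-off.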

Key lesson: reward shaping matters; otherwise the UAV exploits degenerate behaviors.

Results

Final Performance

Learning Curves

Reward curve and success/collision trends:


(Show only 1-2 most important plots during presentation.)

Demo

Demo video is linked from the main showcase page.

Key observations:

Engineering Challenges

1) Reward Hacking

The agent farmed speed rewards by circling the map.

Fix:

Result: no more circling.
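A standard mitigation for this failure mode (illustrative; the slide does not spell out the exact fix used) is to reward the decrease in distance to the goal rather than raw speed, since that term telescopes to zero over any closed loop:

```python
import numpy as np

def progress_reward(prev_pos, pos, goal):
    """Reward signed progress toward the goal instead of speed magnitude.

    Over a closed loop the terms telescope, so circling earns zero net reward.
    """
    d_prev = np.linalg.norm(goal - prev_pos)
    d_now = np.linalg.norm(goal - pos)
    return d_prev - d_now

# A loop that returns to its start accumulates zero total progress reward:
loop = [np.array(p, float) for p in [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]]
goal = np.array([10.0, 0.0])
total = sum(progress_reward(a, b, goal) for a, b in zip(loop, loop[1:]))
```

Speed-aligned terms can still be kept, but only as a small bonus that cannot outweigh progress.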

Engineering Challenges

2) Sim-to-Sim Transfer Gap

The policy trained in fastsim failed in Gazebo.

Cause:

Fix:

Engineering Challenges

3) Computational Bottleneck

Training was too slow due to CPU-bound raycasting and collision checks.

Fix:

Result: reasonable training times.
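The usual cure for CPU-bound raycasting is to batch all (ray, tree) intersection tests into one NumPy computation instead of looping in Python. A sketch of batched ray-vs-circle intersection (illustrative, not the project's exact code):

```python
import numpy as np

def ray_distances(origin, dirs, centers, radius, max_range=10.0):
    """Intersect K unit-direction rays with M circular trunks in one shot.

    Returns per-ray hit distance, clipped to max_range when nothing is hit.
    """
    rel = centers - origin                  # (M, 2) tree offsets from UAV
    t = dirs @ rel.T                        # (K, M) projection along each ray
    perp2 = (rel ** 2).sum(axis=1) - t ** 2  # (K, M) squared closest approach
    disc = radius ** 2 - perp2
    hit = (disc >= 0) & (t >= 0)            # ray actually reaches the circle
    d = np.where(hit, t - np.sqrt(np.clip(disc, 0.0, None)), max_range)
    return np.minimum(d.min(axis=1), max_range)

# One ray toward a trunk at (5, 0) with radius 1, one ray away from it:
d = ray_distances(np.zeros(2), np.array([[1.0, 0.0], [-1.0, 0.0]]),
                  np.array([[5.0, 0.0]]), 1.0)
```

The inner loop disappears entirely, which is typically where the order-of-magnitude speedup comes from.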

Engineering Challenges

4) Unrealistic Dynamics

Velocity commands caused “flying saucer” behavior.

Fix:

Result: improved transfer.
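The "flying saucer" artifact comes from tracking velocity commands instantly. A common fix is a first-order lag so the achieved velocity chases the command gradually, as a real drone's would; this sketch and its time constant are assumptions, not the project's exact dynamics model:

```python
import numpy as np

def step_velocity(vel, cmd, dt=0.05, tau=0.4):
    """First-order low-pass tracking of a velocity command.

    tau (response time constant) and dt are assumed values.
    """
    alpha = dt / (tau + dt)          # discrete low-pass coefficient
    return vel + alpha * (cmd - vel)

# Command full speed from rest; velocity converges over ~2 s instead of jumping:
v = np.zeros(3)
for _ in range(100):
    v = step_velocity(v, np.array([2.0, 0.0, 0.0]))
```

Training against even this crude lag forces the policy to anticipate braking distances, which is what carries over to the higher-fidelity simulator.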

What Worked

Limitations

Future Work

Takeaways

Thank You

Questions?