Forest Navigation UAV

Learning-Based Navigation in Dense Forests

Thodoris Evangelakos
Autonomous Agents - INF412
27-02-2026

The Problem

Autonomous UAV navigation in dense forests.

Goal:

Reach a target location
As fast as possible
Without collisions

Particular challenges:

Narrow passages and occlusions
Partial observability (LiDAR only)
High-speed control under real-time constraints
Collision is catastrophic

Success Criteria

We define success strictly:

Success: Reach goal within 30s without collision
Collision: Any contact with obstacle
Time-to-goal: Measured only for successful runs
Minimum clearance: Smallest distance to obstacle
Shield intervention rate: % of steps in which the safety layer modified the action

Speed is desired, but safety is non-negotiable.
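
The criteria above can be computed from per-episode logs. This is a minimal sketch, not the project's evaluation code: the `Episode` record and its field names are hypothetical, and averaging the per-episode minimum clearance across runs is an assumption, since the slide does not say how clearance is aggregated.

```python
from dataclasses import dataclass
from statistics import mean, median

TIME_LIMIT_S = 30.0  # from the success definition above


@dataclass
class Episode:
    """Hypothetical per-episode log record (field names are assumptions)."""
    reached_goal: bool
    collided: bool
    duration_s: float
    min_clearance_m: float  # smallest distance to any obstacle in the episode
    steps: int
    shield_steps: int       # steps in which the shield modified the action


def is_success(ep):
    # Success: goal reached within the time limit and without any collision.
    return ep.reached_goal and not ep.collided and ep.duration_s <= TIME_LIMIT_S


def summarize(episodes):
    successes = [ep for ep in episodes if is_success(ep)]
    total_steps = sum(ep.steps for ep in episodes)
    return {
        "success_rate": len(successes) / len(episodes),
        "collision_rate": sum(ep.collided for ep in episodes) / len(episodes),
        # Time-to-goal is measured only over successful runs.
        "median_time_to_goal_s": (
            median(ep.duration_s for ep in successes) if successes else None
        ),
        # Assumption: average the per-episode minimum clearance.
        "mean_min_clearance_m": mean(ep.min_clearance_m for ep in episodes),
        "shield_intervention_rate": (
            sum(ep.shield_steps for ep in episodes) / total_steps
        ),
    }
```

A run that reaches the goal after the 30 s limit counts as neither a success nor a collision, so the two rates need not sum to 100%.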

System Architecture

Closed-loop pipeline:

Sensors
Observation vector
SAC Policy
Safety Shield
Command
Simulator / Gazebo

Key idea: Learning handles complexity. Shield enforces safety.

Why This Approach?

Why Learning?

Forests have structure and patterns
Hard to hand-engineer every case
Fast runtime inference

Why SAC?

Continuous control
Stable training
Good sample efficiency

Why a Safety Shield?

Learned policies are not guaranteed safe
Shield = deterministic last line of defense (usually)
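
To illustrate what a deterministic shield can look like, here is a minimal sketch based on a stopping-distance check. The slides do not describe the actual shield rule, so the rule itself, the function name `shield`, and the `max_brake` and `margin` values are all assumptions.

```python
def shield(forward_accel, speed, forward_ranges, max_brake=4.0, margin=0.5):
    """Filter a forward-acceleration command (hypothetical rule).

    forward_ranges: LiDAR ranges (m) in a cone ahead of the UAV.
    max_brake (m/s^2) and margin (m) are illustrative values.
    Returns (safe_accel, intervened).
    """
    # Distance needed to stop from the current forward speed, plus a margin.
    stopping_distance = max(speed, 0.0) ** 2 / (2.0 * max_brake) + margin
    if min(forward_ranges) > stopping_distance:
        return forward_accel, False  # command is passed through unchanged
    # Unsafe: brake hard if moving forward, otherwise forbid forward thrust.
    safe = -max_brake if speed > 0.0 else min(forward_accel, 0.0)
    return safe, safe != forward_accel
```

The second return value is what a shield intervention rate would count: it is true only when the command was actually changed.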

Training Strategy

Train in a custom, fast in-memory simulator (fastsim).

Why?

Orders of magnitude faster than Gazebo
16 parallel environments
10M timesteps feasible within project constraints

Then validate in:

Gazebo + ROS2
Procedurally generated forest worlds
Higher density than real forests (2x-10x as dense in tests)

Observations and Actions

Observation (96D vector)

90 LiDAR beams (normalized)
cos(goal angle), sin(goal angle)
Normalized distance
Normalized speed, yaw rate
Normalized height error

Action

Forward acceleration
Yaw acceleration
Vertical acceleration
Scaled to physical limits, then filtered by safety shield
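
The layout above adds up to 90 + 2 + 1 + 2 + 1 = 96 values. A minimal sketch of how such a vector and the action scaling could be assembled is shown below. All normalization constants and acceleration limits are hypothetical, and the policy output is assumed to lie in [-1, 1] per dimension, as is usual for SAC.

```python
import math

N_BEAMS = 90
ACCEL_LIMITS = (3.0, 2.0, 1.0)  # forward, yaw, vertical (hypothetical limits)


def _clip(x, lo, hi):
    return max(lo, min(hi, x))


def build_observation(ranges, goal_angle, goal_dist, speed, yaw_rate, height_err,
                      max_range=20.0, max_goal_dist=100.0, max_speed=8.0,
                      max_yaw_rate=2.0, max_height_err=2.0):
    """Assemble the 96D observation; every max_* constant is an assumption."""
    if len(ranges) != N_BEAMS:
        raise ValueError(f"expected {N_BEAMS} LiDAR beams, got {len(ranges)}")
    lidar = [_clip(r / max_range, 0.0, 1.0) for r in ranges]
    extras = [
        math.cos(goal_angle),
        math.sin(goal_angle),
        _clip(goal_dist / max_goal_dist, 0.0, 1.0),
        _clip(speed / max_speed, -1.0, 1.0),
        _clip(yaw_rate / max_yaw_rate, -1.0, 1.0),
        _clip(height_err / max_height_err, -1.0, 1.0),
    ]
    return lidar + extras


def scale_action(action, limits=ACCEL_LIMITS):
    """Map a policy output in [-1, 1]^3 to physical accelerations."""
    return tuple(_clip(a, -1.0, 1.0) * lim for a, lim in zip(action, limits))
```

Encoding the goal direction as a cosine/sine pair avoids the discontinuity a raw angle would have at ±π.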

Reward Design (High-Level)

Reward encourages:

+ Progress toward goal
+ Speed aligned with goal
- Proximity to obstacles
- Per-step penalty
- Large collision penalty
- Stalling and/or episode truncation

Key lesson: reward shaping matters, otherwise the UAV exploits degenerate behaviors.
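
The terms listed above could be combined per step roughly as follows. This is a sketch only: the weights, the `safe_dist_m` threshold, and the exact form of each term are hypothetical and not the values used in training.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Illustrative weights; the real values are not given in the slides."""
    progress: float = 1.0
    aligned_speed: float = 0.1
    proximity: float = 0.5
    step: float = 0.01
    collision: float = 100.0
    stall: float = 10.0
    success: float = 100.0
    safe_dist_m: float = 2.0


def step_reward(prev_goal_dist, goal_dist, speed, heading_err, min_range,
                collided=False, reached_goal=False, stalled_or_truncated=False,
                w=RewardWeights()):
    # + progress toward the goal (reduction in distance this step)
    r = w.progress * (prev_goal_dist - goal_dist)
    # + speed component pointing at the goal (heading_err in radians)
    r += w.aligned_speed * speed * math.cos(heading_err)
    # - proximity penalty, growing linearly inside the safe distance
    if min_range < w.safe_dist_m:
        r -= w.proximity * (w.safe_dist_m - min_range) / w.safe_dist_m
    # - constant per-step penalty
    r -= w.step
    # terminal terms
    if collided:
        r -= w.collision
    if stalled_or_truncated:
        r -= w.stall
    if reached_goal:
        r += w.success
    return r
```

With these particular weights, flying perpendicular to the goal direction without making progress earns only the per-step penalty.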

Results

Final Performance

85% success rate
11.95% collision rate
Median time to goal: 12.35s
Minimum clearance: 2.806m
Shield interventions: 8.43%

Learning Curves

Reward curve and success/collision trends:

(Show only 1-2 most important plots during presentation.)

Demo

Demo video is linked from the main showcase page.

Key observations:

Controlled navigation
Shield prevents imminent collisions
Smooth goal convergence
Low likelihood of getting stuck between trees

Engineering Challenges

1) Reward Hacking

Agent farmed speed rewards by circling around the map.

Fix:

Virtual barrier around the map
Massive increase in success reward

Result: no more circling.

Engineering Challenges

2) Sim-to-Sim Transfer Gap

Policy trained in fastsim failed in Gazebo.

Cause:

Oversimplified dynamics
Sensor mismatch

Fix:

Matched configurations
Froze height, roll, pitch initially
Refined action interface

Engineering Challenges

3) Computational Bottleneck

Training was too slow due to CPU-bound raycasting and collision checks.

Fix:

Hash grid for trees
Lazy collision checking
Parallelized stepping

Result: reasonable training times.
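
The hash-grid idea can be sketched as follows: trees are bucketed by grid cell so that a collision query only inspects nearby cells instead of every tree on the map. The class name, the tree representation as `(x, y, radius)` tuples, and the cell size are assumptions, and this sketch does not cover the lazy collision checking or the parallel stepping.

```python
import math
from collections import defaultdict


class TreeHashGrid:
    """Uniform 2D hash grid over tree trunks given as (x, y, radius)."""

    def __init__(self, trees, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)
        # Largest trunk radius, used to widen queries so that trunks
        # centred in a neighbouring cell are never missed.
        self.max_radius = max((r for _, _, r in trees), default=0.0)
        for x, y, r in trees:
            self.cells[self._key(x, y)].append((x, y, r))

    def _key(self, x, y):
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def nearby(self, x, y, radius):
        """Yield trees whose centre lies in a cell overlapping the query box."""
        cx0, cy0 = self._key(x - radius, y - radius)
        cx1, cy1 = self._key(x + radius, y + radius)
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                # .get avoids creating empty buckets in the defaultdict
                yield from self.cells.get((cx, cy), ())

    def collides(self, x, y, uav_radius):
        reach = uav_radius + self.max_radius
        return any(
            math.hypot(tx - x, ty - y) <= tr + uav_radius
            for tx, ty, tr in self.nearby(x, y, reach)
        )
```

The same `nearby` lookup could also be used to limit which trunks each LiDAR ray is tested against.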

Engineering Challenges

4) Unrealistic Dynamics

Velocity commands caused “flying saucer” behavior.

Fix:

Switched to acceleration control
Added inertia realism

Result: improved transfer.
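
The difference between the two control modes can be seen in a one-dimensional sketch. Under acceleration control, speed changes by at most a bounded amount per step, so it cannot jump the way it does when a velocity command is applied directly. The limits and the linear drag term are hypothetical and stand in for whatever inertia model the project uses.

```python
def step_forward_speed(speed, accel_cmd, dt,
                       max_accel=3.0, max_speed=8.0, drag=0.1):
    """One Euler step of forward speed under acceleration control.

    max_accel (m/s^2), max_speed (m/s), and drag (1/s) are illustrative.
    """
    accel = max(-max_accel, min(max_accel, accel_cmd))
    # Speed builds up gradually instead of jumping to a commanded value.
    speed = speed + (accel - drag * speed) * dt
    return max(-max_speed, min(max_speed, speed))


def step_velocity_command(speed, velocity_cmd, max_speed=8.0):
    """Velocity control for comparison: the speed is set instantly."""
    return max(-max_speed, min(max_speed, velocity_cmd))
```

Starting from rest with `dt = 0.1`, one step of full acceleration gives about 0.3 m/s, while a full velocity command jumps straight to 8 m/s.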

What Worked

SAC handled continuous control well
Shield enabled aggressive but safe motion
Domain randomization improved robustness
Fast simulator enabled rapid iteration

Limitations

Sim-to-sim gap still present
Shield can be overly conservative
Dynamics still simplified
No wind, no complex aerodynamics
No multi-agent behavior yet

Future Work

Improve physics realism
Refine safety shield (less conservative)
Train in denser, more complex forests
Add vision-based perception
Multi-agent coordination
Real UAV testing

Takeaways

Learning + hard safety layer is powerful
Fast training infrastructure is essential
Reward design and transfer are critical
Promising results, but not yet real-world ready

Thank You

Questions?
