Master's Thesis: Part 1 of 4: Foundation
- Guining Pertin
- Dec 31, 2025
- 4 min read
Updated: Mar 28
//certain sections of the original thesis have been removed
Introduction
This set of four blog posts covers the work I performed for my Master's Thesis at KAIST, titled "Accelerating Policy Learning for Robust Control of Robotic Manipulators and Aerial Vehicles via Physics-Informed Guidance", under Professor Dong Eui Chang. My thesis and the thesis defence were well received by the committee, which included my supervisor Professor Dong Eui Chang, Professor Changdong Yoo, and Professor Kwang-Soo Kim. The full thesis can be found at the KAIST library: THESIS after it is publicly released. It contains in-depth explanations of the preliminary concepts, the simulation and control design (including both equations and intuition), and the experimental results. Please note that these posts do not cover the entire thesis in detail; they are instead built upon my thesis defence slides, which summarize the full approach. Parts of this thesis are still being used in other research work, so some sections have been removed.
Thesis abstract
Deploying autonomous robotic systems in maritime environments presents a dual control challenge: stabilizing vessel-mounted manipulators against stochastic wave motion and navigating aerial vehicles through aerodynamic disturbances. To resolve the conflict between the precision of model-based control and the adaptability of model-free learning, this thesis proposes a unified framework of Physics-Informed Guidance. We investigate this through three complementary methodologies applied to a Maritime Manipulator-Drone Loading System.
For manipulator stabilization, we introduce implicit guidance, utilizing Large Language Models (LLMs) to synthesize expert gain-scheduling heuristics, and explicit guidance, leveraging differentiable physics simulations for rapid gradient-based tuning. For agile drone tracking, we present structural guidance via a residual RL architecture, where a model-free agent augments an approximate optimal controller derived from the Hamilton-Jacobi-Bellman (HJB) equation. Experimental validation confirms that embedding physical priors into the learning loop significantly accelerates convergence and ensures robust performance against substantial environmental and inertial perturbations compared to standard baselines.
The question that started the research

We have a manipulator placed on a ship that handles a payload and transfers it to a drone. The drone loads the payload and flies in an agile manner, tracking a given trajectory while remaining robust to payload mass variations and sea winds. The dynamics are complex, and the environment is extremely hostile, making this a very difficult engineering problem!
Controller designs for both manipulators and quadrotors often follow two lines of approach:
Traditional control: model-based approaches such as feedback linearization and cascade PID, which are precise but may fail to adapt to unmodeled conditions.
Learning-based control: model-free approaches such as reinforcement learning (RL), which can adapt but take an enormous amount of time to learn anything useful.
We would prefer using learning-based approaches for their adaptability, but…
Why should a manipulator have to learn its own kinematics just to understand how it moves?
Why should a drone have to learn hovering just to understand how to fly stably?

Consider the example of a human walking on a tightrope. Humans already know how to walk and balance, so the focus shifts entirely to learning the new task at a higher level of abstraction. The same observation raises questions for robot learning:
Can a robot be fed with prior knowledge?
Can this prior knowledge help learn more important skills?
More importantly, how can we embed such knowledge?
We raise the following question:
Why must a robot learn the laws of physics from scratch?
and argue that true “intelligence” in robotics should not be defined by the ability to learn everything, but by the ability to adapt known models to unmodeled realities. We explore this idea of embedding physical knowledge into learning agents, or using such knowledge to improve them, through the lens of the manipulator-drone loading system.
The three founding ideas
We investigated methods to inject prior knowledge directly into the learning loop and categorize our contributions into three distinct methodologies:
Implicit guidance via language models: (redacted)

Explicit guidance via differentiable simulation: For the same manipulator stabilization task, we investigated a more fundamental acceleration method. Traditional RL treats the environment as non-differentiable, relying on stochastic gradient estimation. Our second contribution explored gradient-based optimization via differentiable simulations. We computed analytic gradients of the performance cost with respect to the control parameters, propagated through the simulation steps, allowing us to optimize these parameters deterministically. We treated this as an exploratory investigation into rapid tuning, comparing its sample efficiency against standard RL.
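To make the idea concrete, here is a minimal, self-contained sketch of gradient-based tuning through a differentiable simulation. This is an illustrative toy (a 1D double integrator under a PD controller, with forward-mode sensitivity propagation through each Euler step), not the thesis's manipulator setup; every constant, gain, and function name here is an assumption chosen for demonstration.

```python
# Toy differentiable simulation: tune PD gains (kp, kd) by analytic
# gradients of the rollout cost, propagated step-by-step through the sim.

def rollout_with_sensitivities(kp, kd, x0=1.0, v0=0.0, dt=0.01, steps=300):
    """Simulate x'' = u with u = -kp*x - kd*v, carrying forward-mode
    sensitivities of (x, v) w.r.t. kp and kd alongside the state.
    Returns the quadratic cost J = sum(dt * x^2) and its analytic gradient."""
    x, v = x0, v0
    sx_kp = sv_kp = sx_kd = sv_kd = 0.0   # d(state)/d(gain) sensitivities
    J = dJ_dkp = dJ_dkd = 0.0
    for _ in range(steps):
        u = -kp * x - kd * v
        # chain rule through the controller at the pre-step state
        du_dkp = -x - kp * sx_kp - kd * sv_kp
        du_dkd = -v - kp * sx_kd - kd * sv_kd
        # Euler step for the state and, identically, for its sensitivities
        x, v = x + dt * v, v + dt * u
        sx_kp, sv_kp = sx_kp + dt * sv_kp, sv_kp + dt * du_dkp
        sx_kd, sv_kd = sx_kd + dt * sv_kd, sv_kd + dt * du_dkd
        # accumulate the cost and its exact gradient
        J += dt * x * x
        dJ_dkp += dt * 2 * x * sx_kp
        dJ_dkd += dt * 2 * x * sx_kd
    return J, dJ_dkp, dJ_dkd

# Deterministic gradient descent on the gains -- no stochastic gradient
# estimation, in contrast to standard RL.
kp, kd, lr = 1.0, 0.1, 0.2
costs = []
for _ in range(40):
    J, gkp, gkd = rollout_with_sensitivities(kp, kd)
    costs.append(J)
    kp -= lr * gkp
    kd -= lr * gkd

print(f"rollout cost: {costs[0]:.4f} -> {costs[-1]:.4f}")
```

Because the cost gradient is computed analytically through every simulation step, each parameter update is deterministic, which is the source of the sample-efficiency gain over policy-gradient estimators.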

Structural guidance via residual control: For the drone trajectory tracking task, the challenge differs: the nominal dynamics are well-known, but the aerodynamic reality involves complex, unmodeled nonlinearities. Our third contribution is a residual RL architecture, which provides structural guidance. Instead of learning a policy from scratch, we derive an approximate optimal controller using the Hamilton-Jacobi-Bellman (HJB) equation via the Nonlinear Systems Toolbox (NST) to serve as a fixed, stability-guaranteeing backbone. We then train an RL agent to learn only the gap between this nominal model and the physical reality. This hybrid architecture ensures the drone maintains the theoretical guarantees of model-based control while acquiring the robustness of data-driven methods without excessive computational burden.
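The residual structure can be sketched with a deliberately simple stand-in: a fixed model-based "backbone" controller for a 1D velocity model, plus a residual term adapted online from the tracking error. This toy is not the thesis's HJB/NST drone controller, and the error-driven update below merely stands in for the RL agent; all constants are assumptions for illustration.

```python
# Toy residual-control structure: a fixed nominal controller handles the
# known dynamics, while a learned residual absorbs what the model misses.

def simulate(adapt_residual, dt=0.01, steps=2000):
    """Track v_ref = 1.0 for a 1D velocity plant with quadratic drag and a
    constant disturbance, both unknown to the nominal controller."""
    v_ref = 1.0
    c_drag, d_bias = 0.8, -0.3     # unmodeled reality
    kp = 4.0                        # gain of the model-based backbone
    v, residual, eta = 0.0, 0.0, 1.0
    for _ in range(steps):
        e = v_ref - v
        u_nom = kp * e              # nominal controller assumes v' = u
        u = u_nom + residual        # residual augments, never replaces
        if adapt_residual:
            residual += eta * e * dt   # stand-in for the learned residual
        # true plant: drag and bias the nominal model does not know about
        v += dt * (u - c_drag * v * abs(v) + d_bias)
    return abs(v_ref - v)

err_nominal = simulate(adapt_residual=False)
err_residual = simulate(adapt_residual=True)
print(f"steady-state error: nominal-only {err_nominal:.4f}, "
      f"with residual {err_residual:.4f}")
```

The nominal-only controller settles with a persistent offset caused by the unmodeled drag and bias, while the residual term learns to cancel exactly that gap; the backbone meanwhile keeps the closed loop stable throughout adaptation, which is the structural guarantee the hybrid architecture relies on.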
All of the above methods involve intricate model-based analysis and design, and then add model-free components to push performance beyond what is achievable by either approach alone. The next three posts cover each approach in detail, along with the results and videos.