2026-02-28
link: https://arxiv.org/pdf/2602.08278
This work introduces a morphology-agnostic policy based on a history-conditioned transformer that can be transferred zero-shot to different dex hand embodiments.
Introduction
Different dex hands have different dynamics, so the same control strategy produces different behavior on each hand. The sim2real gap becomes more severe when multiple embodiments are considered.
DexFormer infers morphology from observation histories, enabling online adaptation to different hands. To train DexFormer, the authors construct varied hands by procedurally randomizing canonical dex hands.
Methodology
The observation includes hand, arm, and object point cloud; the training objective is to maximize the discounted return.
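As a reminder of what the objective measures, here is a minimal sketch of a discounted return for a single episode. The discount factor `gamma` and its default value are assumptions; these notes do not record the value used in the paper.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum of gamma**k * r_k over one episode, accumulated from the last step back."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Three unit rewards with an illustrative gamma of 0.5.
example = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```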
Shared action space
Because hands differ in their number of joints, a shared action space of dimension \(D_F\) is needed.
For hands with missing joints, the corresponding entries are simply zero-padded.
The executed finger action is obtained by temporal smoothing: \(a_t=\lambda a'_t+(1-\lambda) a_{t-1}\).
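The padding and smoothing steps can be sketched as follows. This is an illustration under assumptions, not the paper's implementation: the value of `D_F`, the slot mapping `joint_index`, the helper names `to_shared` and `smooth`, and the value of `lam` are all hypothetical.

```python
import numpy as np

D_F = 6  # assumed shared finger-action dimension (illustrative value)

def to_shared(action, joint_index, d_f=D_F):
    """Place a hand's native action into the shared space; unused slots stay zero."""
    shared = np.zeros(d_f)
    shared[np.asarray(joint_index)] = action
    return shared

def smooth(raw_action, prev_action, lam=0.5):
    """Temporal smoothing: a_t = lam * a'_t + (1 - lam) * a_{t-1}."""
    return lam * raw_action + (1.0 - lam) * prev_action

# A hypothetical 3-joint hand that occupies shared slots 0, 2, and 5.
raw = to_shared(np.array([1.0, -1.0, 0.5]), joint_index=[0, 2, 5])
prev = np.zeros(D_F)              # a_{t-1}, zero at the first step
executed = smooth(raw, prev, lam=0.5)
```

A smaller `lam` weights the previous action more heavily, which yields smoother but slower finger motion.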
Embodiment Generation
New hands are generated by sampling from a distribution defined over morphology parameters of canonical hands, preserving their original topological structure.
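One way to picture this kind of procedural randomization is the sketch below. The parameter names (`link_lengths`, `link_masses`), the uniform scaling distribution, and its range are assumptions chosen for illustration; the notes do not specify which parameters or distribution the paper uses. Topology is preserved here simply because every variant keeps the same keys and array shapes as the canonical hand.

```python
import numpy as np

def sample_hand(canonical, rng, scale_range=(0.8, 1.2)):
    """Scale each morphology parameter by a random factor; keys and shapes are unchanged."""
    low, high = scale_range
    return {
        name: value * rng.uniform(low, high, size=np.shape(value))
        for name, value in canonical.items()
    }

# Hypothetical canonical hand described by per-link parameters.
canonical_hand = {
    "link_lengths": np.array([0.05, 0.03, 0.02]),
    "link_masses": np.array([0.04, 0.02, 0.01]),
}
rng = np.random.default_rng(0)
variants = [sample_hand(canonical_hand, rng) for _ in range(4)]
```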
History conditioned transformer
Let \(h_t=(o_{t-H+1},\dots,o_t)\) be the window of the last \(H\) observations.
After learned position embeddings are added, the observations in the window are fed into a transformer encoder with a causal mask to obtain the action distribution of \(a'_{t}\).
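The windowing and causal masking can be sketched with NumPy as below. This is a toy under assumptions: the window length `H`, the observation size, the random stand-ins for observations and position embeddings, and the single-head attention with identity projections are all illustrative and are not the DexFormer architecture.

```python
import numpy as np
from collections import deque

H = 4  # assumed window length

def causal_mask(h):
    """Boolean (h, h) mask; entry [i, j] is True where position i may attend to j <= i."""
    return np.tril(np.ones((h, h), dtype=bool))

def causal_attention(x, mask):
    """Single-head self-attention with identity projections, for illustration only."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    scores = np.where(mask, scores, -np.inf)       # block attention to future steps
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x, weights

history = deque(maxlen=H)                          # keeps only the last H observations
rng = np.random.default_rng(0)
for t in range(6):
    history.append(rng.normal(size=8))             # stand-in observation o_t
h_t = np.stack(history)                            # shape (H, 8): o_{t-H+1}, ..., o_t
pos_emb = rng.normal(scale=0.02, size=h_t.shape)   # stand-in for learned embeddings
out, weights = causal_attention(h_t + pos_emb, causal_mask(H))
```

In a setup like this, the last row of `out` depends on the whole window, so it is a natural input for a head that parameterizes the distribution of \(a'_t\); whether the paper reads out the last token in this way is an assumption.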