Paper Reading: DexFormer

Paper Reading 2026-02-28

link: https://arxiv.org/pdf/2602.08278

This work introduces a morphology-agnostic policy, built on a history-conditioned transformer, that transfers zero-shot to different dexterous hand embodiments.

Introduction

Different dexterous hands have different dynamics, so the same control strategy produces different behavior on each of them. The sim-to-real gap becomes more severe when multiple embodiments are considered.

DexFormer infers morphology from observation histories, which enables online adaptation to different hands. To train DexFormer, the authors construct a variety of hands by procedurally randomizing canonical dexterous hands.

Methodology

The observation covers the hand, the arm, and the object point cloud. The training objective is to maximize the discounted rewards.
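
For reference, maximizing discounted rewards is usually written as below in standard reinforcement-learning notation. The symbols \(\pi\) (policy), \(r_t\) (reward at step \(t\)), and \(\gamma\) (discount factor) are assumed here and are not taken from the paper:

\[
J(\pi) = \mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t} r_t\right], \qquad \gamma \in [0, 1)
\]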

Shared action space

Because the hands differ, it is crucial to define a shared \(D_F\)-dimensional action space.

For hands with missing joints, the corresponding dimensions are simply zero-padded.
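
A minimal sketch of how zero-padding into a shared space might look. The slot layout (`slots`), the function names, and the example values are all hypothetical; the notes do not say how joints are assigned to the \(D_F\) dimensions.

```python
from typing import List, Sequence


def pad_to_shared(values: Sequence[float], slots: Sequence[int], d_f: int) -> List[float]:
    """Scatter a hand's per-joint values into a d_f-dimensional vector.

    Slots that this hand has no joint for stay at 0.0 (zero-padding).
    """
    if len(values) != len(slots):
        raise ValueError("one slot index is required per joint value")
    shared = [0.0] * d_f
    for value, slot in zip(values, slots):
        shared[slot] = value
    return shared


def extract_from_shared(shared: Sequence[float], slots: Sequence[int]) -> List[float]:
    """Read back only the entries that correspond to joints this hand has."""
    return [shared[slot] for slot in slots]


# Hypothetical example: D_F = 6, and a hand that only has joints in slots 0, 1 and 4.
D_F = 6
slots = [0, 1, 4]
padded = pad_to_shared([0.3, -0.1, 0.7], slots, D_F)
recovered = extract_from_shared(padded, slots)
```

With this layout, every hand presents a vector of the same length to the policy, and each hand only executes the entries that map to joints it actually has.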

The executed finger action is obtained by temporal smoothing, \(a_t=\lambda a'_t+(1-\lambda) a_{t-1}\), where \(a'_t\) is the raw policy output.
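
The smoothing rule is an exponential moving average over actions. A small sketch follows; the function name and the check that \(\lambda\) lies in \([0, 1]\) are my own assumptions, since the notes do not state the range of \(\lambda\).

```python
from typing import List, Sequence


def smooth_action(raw: Sequence[float], prev: Sequence[float], lam: float) -> List[float]:
    """Return a_t = lam * a'_t + (1 - lam) * a_{t-1}, element-wise."""
    if not 0.0 <= lam <= 1.0:  # assumed range, not stated in the notes
        raise ValueError("lam must lie in [0, 1]")
    if len(raw) != len(prev):
        raise ValueError("raw and previous actions must have the same length")
    return [lam * r + (1.0 - lam) * p for r, p in zip(raw, prev)]


# Feed the same raw action twice, starting from a zero action.
prev_action = [0.0, 0.0]
for raw_action in ([1.0, -1.0], [1.0, -1.0]):
    prev_action = smooth_action(raw_action, prev_action, lam=0.5)
```

With \(\lambda=0.5\) the executed action moves halfway toward the raw action at each step, so a smaller \(\lambda\) gives smoother but slower finger motion.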

Embodiment generation

New hands are generated by sampling from a distribution defined over morphology parameters of canonical hands, preserving their original topological structure.
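
A rough sketch of this kind of procedural randomization, under loudly stated assumptions: the parameter names, the canonical values, and the uniform scaling distribution are hypothetical illustrations, not the parameters or distribution used in the paper. Only continuous parameters are perturbed, so the set of links and joints (the topology) is unchanged.

```python
import random
from typing import Dict


def sample_hand(canonical: Dict[str, float], max_rel_change: float,
                rng: random.Random) -> Dict[str, float]:
    """Scale each canonical parameter by an independent uniform factor.

    Factors are drawn from [1 - max_rel_change, 1 + max_rel_change].
    The keys are kept as-is, so the hand's structure is preserved.
    """
    low, high = 1.0 - max_rel_change, 1.0 + max_rel_change
    return {name: value * rng.uniform(low, high) for name, value in canonical.items()}


# Hypothetical canonical hand, lengths in meters.
canonical_hand = {
    "palm_width": 0.09,
    "index_proximal_length": 0.045,
    "thumb_proximal_length": 0.04,
}
rng = random.Random(0)  # fixed seed for reproducibility
variants = [sample_hand(canonical_hand, 0.2, rng) for _ in range(3)]
```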

History-conditioned transformer

Let \(h_t=(o_{t-H+1},\dots,o_t)\) denote the window of the \(H\) most recent observations.
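
One simple way to maintain such a window is a fixed-length queue. The sketch below uses a hypothetical `HistoryBuffer` class; how the window is filled during the first \(H-1\) steps of an episode is not covered in these notes, so this version just returns the observations seen so far.

```python
from collections import deque
from typing import Deque, List, Sequence


class HistoryBuffer:
    """Keeps the H most recent observations, oldest first."""

    def __init__(self, horizon: int) -> None:
        if horizon < 1:
            raise ValueError("horizon must be at least 1")
        # deque(maxlen=...) drops the oldest entry automatically when full.
        self._buf: Deque[List[float]] = deque(maxlen=horizon)

    def push(self, obs: Sequence[float]) -> None:
        self._buf.append(list(obs))

    def window(self) -> List[List[float]]:
        """Return h_t; shorter than H until enough observations have arrived."""
        return list(self._buf)


buf = HistoryBuffer(horizon=3)
for t in range(5):
    buf.push([float(t)])  # stand-in for a real observation vector o_t
h_t = buf.window()
```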

After learned positional embeddings are added, the observations in the window are fed into a transformer encoder with a causal mask, which outputs the action distribution for \(a'_{t}\).
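
To illustrate what the causal mask does, here is a deliberately simplified NumPy sketch: a single attention head with no learned query/key/value projections, and random vectors standing in for learned positional embeddings. It is not the DexFormer architecture; it only shows that position \(i\) in the window cannot attend to later positions.

```python
import numpy as np


def causal_mask(h: int) -> np.ndarray:
    """Boolean (h, h) matrix; [i, j] is True when position i may attend to j (j <= i)."""
    return np.tril(np.ones((h, h), dtype=bool))


def causal_self_attention(x: np.ndarray, pos_emb: np.ndarray) -> np.ndarray:
    """Single-head self-attention over a window x of shape (h, d), with a causal mask."""
    z = x + pos_emb                                   # add positional embeddings
    scores = z @ z.T / np.sqrt(z.shape[1])            # (h, h) scaled dot products
    scores = np.where(causal_mask(len(z)), scores, -np.inf)  # hide future positions
    scores = scores - scores.max(axis=1, keepdims=True)      # numerical stability
    weights = np.exp(scores)                          # masked entries become 0
    weights = weights / weights.sum(axis=1, keepdims=True)   # row-wise softmax
    return weights @ z


rng = np.random.default_rng(0)
H, D = 4, 8                                # hypothetical window length and feature size
window = rng.normal(size=(H, D))           # stand-in for embedded observations
pos_emb = rng.normal(size=(H, D))          # stand-in for learned positional embeddings
out = causal_self_attention(window, pos_emb)
```

Because of the mask, changing the newest observation leaves the outputs at all earlier positions untouched, and the first position can only attend to itself.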