Paper Reading: Quick-Skim Collection

Paper Reading 2026-01-16


2026-01-16

Sim2real Image Translation Enables Viewpoint-Robust Policies from Fixed-Camera Datasets

link: https://arxiv.org/pdf/2601.09605

Image translation: simulated trajectories are plentiful, so image translation is used to turn each simulated image into a realistic-looking (sim2real) image.
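A minimal sketch of how a translator could be applied to simulated data, assuming each trajectory is a list of (image, action) steps and that only the images are translated while the simulated actions are reused unchanged. `Step`, `translate_trajectory`, and the pixel-inverting `fake_translator` are hypothetical stand-ins, not the paper's code.

```python
from dataclasses import dataclass
from typing import Callable, List

import numpy as np


@dataclass
class Step:
    image: np.ndarray   # (H, W, 3) frame rendered by the simulator
    action: np.ndarray  # action recorded in simulation (assumed reusable as-is)


def translate_trajectory(
    traj: List[Step], translator: Callable[[np.ndarray], np.ndarray]
) -> List[Step]:
    """Return a new trajectory whose images are sim2real-translated.

    Assumption: only the observations change; the simulated actions are
    copied unchanged as supervision for the downstream policy.
    """
    return [Step(image=translator(s.image), action=s.action.copy()) for s in traj]


if __name__ == "__main__":
    # Hypothetical stand-in for the trained generator: just inverts pixel values.
    fake_translator = lambda img: 255 - img
    sim_traj = [Step(np.zeros((4, 4, 3), np.uint8), np.array([0.1, 0.2])) for _ in range(3)]
    real_like = translate_trajectory(sim_traj, fake_translator)
    print(len(real_like), real_like[0].image.max())  # prints: 3 255
```

Because translation is per frame, a fast generator matters: every frame of every simulated trajectory has to pass through it.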

MANGO aims to solve two problems of prior methods:

Diffusion-based translation is too slow; MANGO uses a GAN instead.
They fail to generalize to new viewpoints when the real target-domain data is captured from a fixed viewpoint.

Method: pass the simulated image through an encoder-decoder model to obtain the sim2real translated image, trained with the following losses:

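SegNCE loss: computed from ground-truth segmentation, so that each pixel's feature is similar to the features of other pixels of the same segmentation class.
PatchNCE loss: the translated image is encoded again, and its patch features are contrasted with those of the input image.
GAN loss: adversarial loss against real images from the fixed-viewpoint dataset.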
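A minimal NumPy sketch of the three losses, assuming pixel/patch features have already been extracted as (N, C) arrays. PatchNCE follows the usual InfoNCE form (output patch i should match input patch i, other input patches are negatives). The SegNCE form is a supervised-contrastive guess at what the note describes, and the non-saturating GAN loss and temperature `tau=0.07` are assumptions; the paper's exact formulations and loss weights may differ. Function names are hypothetical.

```python
import numpy as np


def _normalize(f):
    # L2-normalize each row so dot products become cosine similarities
    return f / np.linalg.norm(f, axis=1, keepdims=True)


def _logsumexp(x, axis):
    # Numerically stable log-sum-exp; tolerates -inf entries (they contribute 0)
    m = np.max(x, axis=axis, keepdims=True)
    return (m + np.log(np.sum(np.exp(x - m), axis=axis, keepdims=True))).squeeze(axis)


def patch_nce(feat_src, feat_out, tau=0.07):
    """InfoNCE over N patches: re-encoded output patch i (query) should match
    source patch i (positive) rather than source patches j != i (negatives)."""
    q = _normalize(feat_out)           # (N, C) queries from the re-encoded output
    k = _normalize(feat_src)           # (N, C) keys from the source image
    logits = q @ k.T / tau             # (N, N); the diagonal holds the positives
    return float(np.mean(_logsumexp(logits, axis=1) - np.diag(logits)))


def seg_nce(feat, labels, tau=0.07):
    """Supervised-contrastive loss: each pixel feature is pulled toward pixels
    of the same segmentation class and pushed away from other classes.
    Requires at least one anchor with a same-class partner."""
    z = _normalize(feat)               # (N, C)
    labels = np.asarray(labels)
    n = len(labels)
    sim = z @ z.T / tau
    not_self = ~np.eye(n, dtype=bool)
    pos = (labels[:, None] == labels[None, :]) & not_self
    # Denominator excludes each anchor's similarity with itself
    sim_masked = np.where(not_self, sim, -np.inf)
    log_prob = sim - _logsumexp(sim_masked, axis=1)[:, None]
    has_pos = pos.any(axis=1)
    # Average log-probability of the positives, per anchor that has positives
    mean_pos = np.where(pos, log_prob, 0.0).sum(axis=1)[has_pos] / pos.sum(axis=1)[has_pos]
    return float(-np.mean(mean_pos))


def gan_generator_loss(d_fake_logits):
    """Non-saturating generator loss: mean of -log sigmoid(D(G(x)))."""
    return float(np.mean(np.logaddexp(0.0, -np.asarray(d_fake_logits))))
```

With well-separated features, `patch_nce(a, a)` is close to 0 while shuffling the output patches drives it up, and `seg_nce` is low only when the labels agree with the feature clusters.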

Result: with a 35M-parameter ACT policy, performance is roughly comparable to that of the 4.5B-parameter VISTA (another viewpoint-augmentation method), but MANGO is much faster.