TactAlign: Human-to-Robot Policy Transfer via Tactile Alignment

1University of Michigan 2NVIDIA 3Carnegie Mellon University 4UC Berkeley 5University of Washington 6Microsoft Research

This work was partially done during Youngsun Wi's Meta FAIR internship. * Equal advising.

Abstract

We present TactAlign, a cross-sensor tactile alignment method for cross-embodiment human-to-robot policy transfer.

Human demonstrations collected with wearable devices (e.g., tactile gloves) provide fast, dexterous supervision for policy learning, guided by rich, natural tactile feedback. A key challenge, however, is transferring human-collected tactile signals to robots despite differences in sensing modality and embodiment. Existing human-to-robot (H2R) approaches that incorporate touch often assume identical tactile sensors, require paired data, and tolerate little to no embodiment gap between the human demonstrator and the robot, limiting scalability and generality. TactAlign transforms human and robot tactile observations into a shared latent representation using a rectified flow, without paired datasets, manual labels, or privileged information. Our method enables low-cost latent transport guided by pseudo-pairs derived from hand-object interactions.

We demonstrate that TactAlign improves H2R policy transfer across multiple contact-rich tasks (pivoting, insertion, lid closing), generalizes to unseen objects and tasks with ≤ 5 minutes of human data, and enables zero-shot H2R transfer on a highly dexterous task (light bulb screwing).
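To make the alignment concrete, here is a minimal rectified-flow training sketch on pseudo-paired tactile latents. It is a PyTorch illustration under our own assumptions: the network architecture, latent dimension, and pseudo-pairing procedure below are placeholders, not the paper's released implementation.

import torch
import torch.nn as nn

class VelocityField(nn.Module):
    """v_theta(z, t): velocity of a tactile latent moving from human to robot space."""
    def __init__(self, dim=64, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, z, t):
        return self.net(torch.cat([z, t], dim=-1))

def rectified_flow_loss(v, z_human, z_robot):
    # z_human, z_robot: (B, dim) latents treated as pseudo-pairs, e.g.,
    # matched via hand-object interaction cues rather than paired capture.
    t = torch.rand(z_human.shape[0], 1)        # random time in [0, 1)
    z_t = (1 - t) * z_human + t * z_robot      # straight-line interpolant
    target = z_robot - z_human                 # constant velocity along the line
    return ((v(z_t, t) - target) ** 2).mean()

Training the velocity field to regress the straight-line displacement is what lets inference later transport a glove latent toward the robot distribution in only a few integration steps.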

Cross-Sensor Tactile Alignment

UMAP projection of Human and Robot Tactile Features

Rectified flow maps the glove latent distribution to overlap with the robot distribution. Colors denote normalized raw tactile magnitude (0: no contact, 1: highest force/shear), computed separately for glove and robot data.

[Figure panels: (Source) human tactile feature distribution, (Target) robot tactile feature distribution, (Result) overlaid distribution after alignment.]

[Animation: rectified flow from human to robot tactile features, shown over integration steps.]
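The trajectory the animation traces can be reproduced, under the same assumptions as the training sketch above, by integrating the learned velocity field with a handful of Euler steps:

import torch

@torch.no_grad()
def transport(v, z_human, steps=8):
    # Solve dz/dt = v(z, t) forward from t = 0 to t = 1 to map a glove
    # latent into the robot latent distribution.
    z, dt = z_human.clone(), 1.0 / steps
    for k in range(steps):
        t = torch.full((z.shape[0], 1), k * dt)
        z = z + dt * v(z, t)                   # one Euler step
    return z                                   # aligned (robot-frame) latent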

Human-Robot Co-training

We evaluate the effectiveness of TactAlign for H2R policy co-training on three representative contact-rich manipulation tasks: pivoting, insertion, and lid closing. All three tasks require tactile reasoning and begin from a non-contact state, either between the fingertip and the object or between a randomly grasped object and the environment. The pivoting and insertion tasks evaluate generalization to unseen objects within the same task, while the lid closing task additionally tests our alignment module's generalization to an unseen task class not used during training.

For each task, we collect 140–160 human demonstrations: 100 (≈ 30 minutes) use the same object the robot sees (the "seen-by-both" object), and 20 are collected for each additional "human-only" object (≈ 5 minutes per object). For robot data, we collect 50 demonstrations with a single training object (≈ 60 minutes).
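As an illustration of how such a mixture might be sampled during co-training, here is a hypothetical batch sampler; the demo counts come from the text above, but the mixing ratio and dataset interface are our assumptions rather than the paper's recipe.

import random

def sample_cotraining_batch(robot_demos, human_demos, batch_size=32, human_frac=0.5):
    # human_demos carry tactile latents already transported into the robot
    # latent space by the alignment module; robot_demos are used as-is.
    n_human = int(batch_size * human_frac)
    batch = random.choices(human_demos, k=n_human) + \
            random.choices(robot_demos, k=batch_size - n_human)
    random.shuffle(batch)                      # interleave the two sources
    return batch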

[Videos, 2x rollout speed: rollouts on Seen-by-both, Human-only, and Unseen-by-both objects, comparing TactAlign (Ours), No Tactile (Baseline), and Robot Only (Baseline).]

Co-training Results Summary

Comparison of co-training results across methods: Robot Only (blue), No Tactile (orange), No Alignment (green), and TactAlign (red). Results are shown as average success rates and broken down by object category (Seen-by-both, Human-only, Unseen-by-both). We observe that augmenting robot training with human demonstrations, which are substantially easier to collect and ≈4× faster to gather in our setting, markedly improves generalization to unseen objects.

[Bar graphs: average success rate and object-wise success rate.]

Zero-shot Human-to-Robot (No Robot Data)

TactAlign enables dexterous, zero-shot, and generalizable human-to-robot transfer with touch. The light bulb screwing task highlights the following key aspects:

  • Human-level dexterity: Demonstrators rely on rich tactile feedback to guide precise finger motions during screwing.
  • Zero-shot policy transfer: The robot executes the task without task-specific robot demonstrations.
  • TactAlign's task-level generalization: We reuse the tactile alignment learned from pivoting and insertion (see the sketch after this list).
  • Fast data collection: 20 human demonstrations in ~10 minutes.
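A rough sketch of that recipe as we read it, with placeholder names (the aligner and train_policy interfaces are hypothetical, not a real API):

def zero_shot_bulb_screwing(human_demos, aligner, train_policy):
    # aligner: alignment module trained on pivoting/insertion, then frozen.
    # human_demos: ~20 glove demonstrations (~10 minutes) of bulb screwing.
    aligned = []
    for demo in human_demos:
        demo = dict(demo)
        demo["tactile"] = aligner(demo["tactile"])  # reuse frozen alignment
        aligned.append(demo)
    return train_policy(aligned)                    # no robot demonstrations used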

[Videos: Human Demonstration, TactAlign (Ours), No Tactile Alignment (Baseline), and No Tactile (Baseline).]

BibTeX

@article{tactalign,
  author    = {Wi, Youngsun and Yin, Jessica and Xiang, Elvis and Sharma, Akash and Malik, Jitendra and Mukadam, Mustafa and Fazeli, Nima and Hellebrekers, Tess},
  title     = {TactAlign: Human-to-Robot Policy Transfer via Tactile Alignment},
  journal   = {arXiv},
  year      = {2025},
}