# Transformer-Based Vision-Guided Tissue Processing

## Overview
This project demonstrates the automation of tissue packing using ViperX robotic arms and the Action Chunking Transformer (ACT) algorithm. A leader-follower setup was employed to collect demonstration data and train the ACT model for autonomous operation.
## Hardware Configuration

Leader-follower system:

- **Leader:** Two ViperX 300 S arms (750 mm reach, 6 DOF, 750 g payload) recorded tissue-packing motions.
- **Follower:** Two ViperX arms in the ALOHA configuration mirrored the leader's movements via real-time joint-position streaming at 50 Hz (see the streaming sketch below).
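
A minimal sketch of the 50 Hz mirroring loop. The helpers `read_joint_positions` and `command_joint_positions` are hypothetical placeholders for the arm driver's API, not actual Interbotix calls:

```python
import time

CONTROL_HZ = 50          # joint-position streaming rate from above
PERIOD = 1.0 / CONTROL_HZ

def read_joint_positions(arm):
    """Hypothetical placeholder: query current joint angles from a leader arm."""
    raise NotImplementedError

def command_joint_positions(arm, q):
    """Hypothetical placeholder: send a joint-position target to a follower arm."""
    raise NotImplementedError

def mirror_loop(leaders, followers):
    """Stream leader joint positions to the followers at a fixed rate."""
    while True:
        t0 = time.monotonic()
        for leader, follower in zip(leaders, followers):
            command_joint_positions(follower, read_joint_positions(leader))
        # Sleep off the remainder of the control period to hold 50 Hz.
        time.sleep(max(0.0, PERIOD - (time.monotonic() - t0)))
```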
## Dataset Generation

Collected data:

- Joint-angle sequences (from the leader arms)
- Gripper states (open/close)
- Time-synchronized camera feeds (2 viewpoints)

Demonstration style: teleoperated leader arms performed 100+ tissue-folding and box-packing sequences, recorded as one episode each (a recording sketch follows).
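
A minimal sketch of saving one demonstration episode with `h5py`. The key names and shapes mirror the file dump shown in the next section; how `actions`, `qpos`, and `top_images` are captured is left abstract:

```python
import h5py
import numpy as np

def save_episode(path, actions, qpos, top_images):
    """Write one teleoperated episode in the layout shown in the next section.

    actions:    (T, 14) float64 -- joint targets, two arms x (6 joints + gripper)
    qpos:       (T, 14) float64 -- measured joint positions
    top_images: (T, 480, 640, 3) uint8 -- top-camera RGB frames
    """
    with h5py.File(path, "w") as f:
        f.create_dataset("action", data=np.asarray(actions, dtype=np.float64))
        obs = f.create_group("observations")
        obs.create_dataset("qpos", data=np.asarray(qpos, dtype=np.float64))
        images = obs.create_group("images")
        images.create_dataset("top", data=np.asarray(top_images, dtype=np.uint8))
```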
## ACT Implementation

### Key Adaptations for Tissue Packing

- **Chunk size:** 14 actions (a 0.28 s horizon at the 50 Hz control rate) balanced responsiveness and error correction; see the chunk-sampling sketch after the HDF5 listing below.
- **Input modalities** (contents of one episode's HDF5 file):
  - action: <HDF5 dataset "action": shape (149, 14), type "<f8">
  - observations:
    - images:
      - top: <HDF5 dataset "top": shape (149, 480, 640, 3), type "|u1">
    - qpos: <HDF5 dataset "qpos": shape (149, 14), type "<f8">
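
Given that layout, a minimal sketch of sampling a training example with a fixed-size action chunk. Padding the episode tail by repeating the last action is an assumption, not necessarily this project's convention:

```python
import h5py
import numpy as np

CHUNK_SIZE = 14  # actions per predicted chunk, as described above

def sample_chunk(path, t):
    """Return (image, qpos, action_chunk) for timestep t of one episode."""
    with h5py.File(path, "r") as f:
        image = f["observations/images/top"][t]      # (480, 640, 3) uint8
        qpos = f["observations/qpos"][t]             # (14,) float64
        actions = f["action"][t : t + CHUNK_SIZE]    # up to (14, 14)
    # Assumption: pad short tail chunks by repeating the final action.
    if len(actions) < CHUNK_SIZE:
        pad = np.repeat(actions[-1:], CHUNK_SIZE - len(actions), axis=0)
        actions = np.concatenate([actions, pad], axis=0)
    return image, qpos, actions
```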
- **Transformer architecture:**
  - 6-layer encoder (vision + proprioception fusion)
  - 4-layer decoder for action-sequence prediction
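
A skeletal PyTorch sketch of that encoder/decoder split. The 6/4 layer counts and the 14-dim action/qpos sizes come from this section; the embedding width, head count, conv backbone, and learned queries are illustrative assumptions, and the CVAE latent used by full ACT (the KL term in the training protocol below) is omitted for brevity:

```python
import torch
import torch.nn as nn

class ActStylePolicy(nn.Module):
    def __init__(self, d_model=256, nhead=8, chunk_size=14, action_dim=14):
        super().__init__()
        # Illustrative vision backbone: maps 480x640 RGB frames to tokens.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 7, stride=4, padding=3), nn.ReLU(),
            nn.Conv2d(32, d_model, 5, stride=4, padding=2), nn.ReLU(),
        )
        self.qpos_proj = nn.Linear(14, d_model)  # proprioception token
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=6)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=4)
        # Learned queries, one per action in the predicted chunk.
        self.queries = nn.Parameter(torch.randn(chunk_size, d_model))
        self.action_head = nn.Linear(d_model, action_dim)

    def forward(self, image, qpos):
        # image: (B, 3, 480, 640), qpos: (B, 14)
        feats = self.backbone(image)               # (B, d_model, H', W')
        tokens = feats.flatten(2).transpose(1, 2)  # (B, H'*W', d_model)
        # Fuse vision and proprioception in the encoder.
        tokens = torch.cat([self.qpos_proj(qpos)[:, None], tokens], dim=1)
        memory = self.encoder(tokens)
        tgt = self.queries[None].expand(image.size(0), -1, -1)
        return self.action_head(self.decoder(tgt, memory))  # (B, 14, 14)
```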
### Training Protocol

- **Loss:** L1 reconstruction + KL divergence (β = 0.5); a sketch follows this list.
- **Augmentations:**
  - Random lighting variations
  - Synthetic occlusions (simulating tissue wrinkles)
  - Joint-position noise (σ = 0.8°)
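
A minimal sketch of this objective and the joint-noise augmentation, assuming standard CVAE conventions (Gaussian posterior parameterized by `mu` and `logvar`); the names and structure are illustrative, not this project's code:

```python
import math
import torch
import torch.nn.functional as F

BETA = 0.5  # KL weight from the protocol above

def act_loss(pred_actions, target_actions, mu, logvar):
    """L1 reconstruction + beta-weighted KL to a standard normal prior."""
    recon = F.l1_loss(pred_actions, target_actions)
    # Closed-form KL( N(mu, exp(logvar)) || N(0, I) ), averaged over the batch.
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + BETA * kl

def augment_qpos(qpos, sigma_deg=0.8):
    """Joint-position noise (sigma given in degrees, applied in radians)."""
    return qpos + torch.randn_like(qpos) * math.radians(sigma_deg)
```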
Results#
Metric |
Leader Performance |
ACT Follower |
---|---|---|
Success Rate |
92.4% |
85.7% |
Cycle Time |
4.2s/item |
4.8s/item |
Positional Error (σ) |
0.5mm |
1.1mm |
## Conclusion

The system achieved autonomous operation with less than a 15% performance gap relative to human teleoperation (85.7% vs. 92.4% success rate), demonstrating ACT’s effectiveness for deformable-object manipulation. Future work could integrate real-time haptic feedback and dynamic chunk-size adaptation.