Open-vocabulary Pick and Place via Patch-level Semantic Maps

Jia, Mingxi; Huang, Haojie; Zhang, Zhewen; Wang, Chenghao; Zhao, Linfeng; Wang, Dian; Liu, Jason Xinyu; Walters, Robin; Platt, Robert; Tellex, Stefanie

arXiv:2406.15677 (cs)

[Submitted on 21 Jun 2024]

Abstract: Controlling robots through natural language instructions in open-vocabulary scenarios is pivotal for enhancing human-robot collaboration and complex robot behavior synthesis. However, achieving this capability is challenging because it requires a system that can generalize from limited data to a wide range of tasks and environments. Existing methods rely on large, costly datasets and struggle to generalize. This paper introduces Grounded Equivariant Manipulation (GEM), a novel approach that leverages the generative capabilities of pre-trained vision-language models together with geometric symmetries to enable few-shot and zero-shot learning for open-vocabulary robot manipulation tasks. Our experiments demonstrate GEM's high sample efficiency and superior generalization across diverse pick-and-place tasks in both simulation and the real world, showcasing its ability to adapt to novel instructions and unseen objects with minimal data requirements. GEM represents a significant step forward in language-conditioned robot control, bridging the gap between semantic understanding and action generation in robotic systems.

Subjects:	Robotics (cs.RO)
Cite as:	arXiv:2406.15677 [cs.RO]
	(or arXiv:2406.15677v1 [cs.RO] for this version)
	https://doi.org/10.48550/arXiv.2406.15677

Submission history

From: Mingxi Jia
[v1] Fri, 21 Jun 2024 22:49:23 UTC (29,855 KB)