VLA Researcher
TechTree's client
Salary
$80k – $100k
Work mode
On-site
What we're looking for
- 5–8+ years in machine learning research/engineering
- 2+ years in robotics or VLA-related work
- Led or significantly contributed to training large-scale VLM/LLM or multi-modal foundation models
- Hands-on experience training action-conditioned models
- Built and maintained large-scale multi-modal datasets
- Proven track record deploying policies on real robot hardware
- Prior experience designing and running large-scale GPU training jobs
- Published or shipped work related to robotics learning, VLA, or multi-modal modeling
- Experience in a startup or fast-paced research environment
About the role
The role centers on deep learning, with a particular focus on transformers, self-supervised learning, and multi-modal models. It requires strong proficiency in PyTorch, experience with large-scale distributed training, and a deep understanding of vision-language-action (VLA) model design, including policy transformers, diffusion policies, behavior cloning, and temporal modeling.
Knowledge of robotics fundamentals is also required: kinematics/dynamics, manipulation, teleoperation data, and policy deployment on real robots. Experience fusing sensor modalities such as RGB, depth, and proprioception, together with multi-modal representation learning, is necessary.
The work includes building data pipelines for video-language datasets, robot demonstrations, and simulation data using tools such as Isaac Sim and MuJoCo. Expertise across pre-training and post-training, including fine-tuning, evaluation, and optimizing models for inference, is required.
Strong engineering skills in Python, machine learning infrastructure, ROS/ROS2, and simulation tools are needed. Candidates should have led or significantly contributed to training large-scale vision-language models, language models, or multi-modal foundation models, and have hands-on experience training action-conditioned models, building and maintaining large-scale multi-modal datasets, and deploying policies on real robot hardware.
They should also have designed and run large-scale GPU training jobs and published or shipped work related to robotics learning, VLA, or multi-modal modeling. Experience in a startup or fast-paced research environment, delivering end-to-end experiments with rapid iteration, is a plus. The position is based in Warsaw, with a salary of $80k–$100k USD plus equity.
Equity details
Type: equity
Details: negotiable