VLA Researcher

TechTree's client
Salary
$80k – $100k
Work mode
On-site
Warsaw, Poland · Posted 17 weeks ago

What we're looking for

  • 5–8+ years in machine learning research/engineering
  • 2+ years in robotics or VLA-related work
  • Led or significantly contributed to training large-scale VLM/LLM or multi-modal foundation models
  • Hands-on experience training action-conditioned models
  • Built and maintained large-scale multi-modal datasets
  • Proven track record deploying policies on real robot hardware
  • Prior experience designing and running large-scale GPU training jobs
  • Published or shipped work related to robotics learning, VLA, or multi-modal modeling
  • Experience in a startup or fast-paced research environment

About the role

The role centers on deep learning, with a focus on transformers, self-supervised learning, and multi-modal models. It requires strong proficiency in PyTorch and experience with large-scale distributed training, as well as a deep understanding of vision-language-action (VLA) model design, including policy transformers, diffusion policies, behavior cloning, and temporal modeling.

The position also requires knowledge of robotics fundamentals, such as kinematics/dynamics, manipulation, teleoperation data, and policy deployment on real robots. Experience with sensor fusion (RGB, depth, and proprioception) and multi-modal representation learning is necessary.

The role includes developing data pipelines for video-language datasets, robot demonstrations, and simulation data using tools such as Isaac Sim and MuJoCo. Expertise in pre-training and post-training, including fine-tuning, evaluation, and model optimization for inference, is required.

Strong engineering capabilities in Python, machine learning infrastructure, ROS/ROS2, and simulation tools are needed. The role includes leading or significantly contributing to the training of large-scale vision-language models, language models, or multi-modal foundation models. Hands-on experience training action-conditioned models, building and maintaining large-scale multi-modal datasets, and deploying policies on real robot hardware is expected.

The role also involves designing and running large-scale GPU training jobs and publishing or shipping work related to robotics learning, vision-language-action modeling, or multi-modal modeling. Experience in a startup or fast-paced research environment, delivering end-to-end experiments with rapid iteration, is beneficial. The position is based in Warsaw, with a salary of $80k–$100k USD plus equity.

Equity details

Type: equity
Details: negotiable

Ready to apply?

Submit your application today.