VLA Researcher
TechTree's client
Salary
$80k – $100k
Work mode
On-site
What we're looking for
- 5–8+ years in machine learning research/engineering
- 2+ years in robotics or VLA-related work
- Led or significantly contributed to training large-scale VLM/LLM or multi-modal foundation models
- Hands-on experience training action-conditioned models
- Built and maintained large-scale multi-modal datasets
- Proven track record deploying policies on real robot hardware
- Prior experience designing and running large-scale GPU training jobs
- Published or shipped work related to robotics learning, VLA, or multi-modal modeling
- Experience in a startup or fast-paced research environment
About the role
The role centers on deep learning, with a particular focus on transformers, self-supervised learning, and multi-modal models. It requires strong proficiency in PyTorch, experience with large-scale distributed training, and a deep understanding of vision-language-action (VLA) model design, including policy transformers, diffusion policies, behavior cloning, and temporal modeling.
Knowledge of robotics fundamentals is also required: kinematics/dynamics, manipulation, teleoperation data, and policy deployment on real robots. Experience fusing sensor modalities such as RGB, depth, and proprioception, together with multi-modal representation learning, is necessary.
The work includes building data pipelines for video-language datasets, robot demonstrations, and simulation data using tools such as Isaac Sim and MuJoCo. Expertise across pre-training and post-training, including fine-tuning, evaluation, and optimizing models for inference, is required.
Strong engineering skills in Python, machine learning infrastructure, ROS/ROS2, and simulation tools are needed. Candidates should have led or significantly contributed to training large-scale vision-language models, language models, or multi-modal foundation models, and have hands-on experience training action-conditioned models, building and maintaining large-scale multi-modal datasets, and deploying policies on real robot hardware.
They should also have designed and run large-scale GPU training jobs and published or shipped work related to robotics learning, VLA, or multi-modal modeling. Experience in a startup or fast-paced research environment, delivering end-to-end experiments with rapid iteration, is a plus. The position is based in Warsaw, with a salary of $80k–$100k USD plus equity.
Equity details
Type: equity
Details: negotiable