AI Researcher

Required Skills

LangChain
RLHF
Artificial Intelligence
Tech Research

About micro1
micro1 is a data engine that helps AI labs train foundation models and enterprises build AI agents. We provide frontier evaluations and reinforcement learning environments used to improve LLM capabilities, as well as contextual evaluations used to monitor and improve AI agents in enterprise settings. Our data engine includes an AI recruiter agent that sources and vets domain experts, a data platform that enables rapid production of high-quality training data, and a pipeline performance system that ensures both quality and velocity.
Our goal is to have 1 billion people doing meaningful work by contributing their expertise to the development of frontier AI models. We’ve raised $40M+ in funding, and our AI recruiter has powered more than 1 million AI-led interviews as our global network of experts expands to form the human intelligence layer for AGI.

Job Description

Job Type: Full-time

Location: Remote (Anywhere)

Total Compensation: $220k–$320k


Job Summary:

At micro1, we’ve built an AI recruitment engine to help companies hire top global talent. Our AI agent, Zara, autonomously sources and vets candidates, cutting recruitment costs by 87%.


We work with top AI labs to help them build robust human-in-the-loop evaluation pipelines, conduct post-training model evaluations, and identify critical failure modes. Long term, we aim to become the default infrastructure for human oversight of frontier models and create the most reliable pre-vetted global talent pool for AI labs.

We’re hiring a US-based AI Researcher to help push the boundaries of LLM and VLM evaluation. You’ll join our core research team to work on real-world evaluation challenges—designing custom eval sets, identifying model failure modes, and partnering with leading labs to improve model robustness and alignment. If you care about model safety, enjoy fast prototyping, and want to shape how foundation models are benchmarked, this role is for you.


What You’ll Do:

  • Design custom evaluation sets and benchmark tasks for reasoning, math, safety, and planning
  • Develop failure taxonomies for hallucinations, refusal behavior, overconfidence, and jailbreaks
  • Build scalable human-in-the-loop workflows for rubric-based scoring, preference ranking, and adversarial testing
  • Work directly with top AI labs to stress test frontier models and identify hidden failure patterns
  • Prototype automated evaluation pipelines and agent-assisted evaluators to scale human oversight
  • Lead or support the development of GAIA-style evaluations or internal benchmarks for labs
  • Publish internal memos and external papers on findings from research pilots and lab collaborations
  • Collaborate with researchers from top AI labs on joint eval initiatives



What We’re Looking For:

  • Strong Python fundamentals and experience writing production-grade research code
  • Hands-on experience with LangChain, LangGraph, and RLHF or RLAIF evaluation pipelines
  • Familiarity with model evaluation frameworks (e.g., TruthfulQA, MMLU, ARC, MT-Bench, GAIA, VLM evals)
  • Deep understanding of foundation model failure modes and evaluation methodologies
  • Graduate degree in CS or a related field (PhD or Master’s preferred), or equivalent research experience


Bonus If You Have:

  • Publications in top-tier AI conferences or journals (NeurIPS, ICLR, ICML, ACL, etc.)
  • Experience working with top AI labs
  • Competitive gaming experience or deep understanding of multi-agent interaction
  • Exposure to VLM evaluation, multimodal reasoning, or agentic collaboration frameworks


Perks & Details:

  • Work closely with founders and leading AI researchers on problems that matter
  • Drive real impact: your evaluations shape how leading labs measure model safety and capability
  • Hardcore but flexible schedules with a remote global team

Apply now

Please note that after completing the interview process, you’ll be added to our talent pool and considered for this and other roles that match your skills.
