AI Researcher

Required Skills

LangChain
RLHF
Artificial Intelligence
Tech Research

Job Description

Job Title: AI Researcher

Job Type: Full-Time

Location: Remote


Job Summary:

At micro1, we’ve built an AI recruitment engine to help companies hire top global talent. Our AI agent, Zara, autonomously sources and vets candidates, cutting recruitment costs by 87%.


We work with top AI labs to help them build robust human-in-the-loop evaluation pipelines, conduct post-training model evaluations, and identify critical failure modes. Long term, we aim to become the default infrastructure for human oversight of frontier models and create the most reliable pre-vetted global talent pool for AI labs.

We’re hiring an AI Researcher to help push the boundaries of LLM and VLM evaluation. You’ll join our core research team to work on real-world evaluation challenges—designing custom eval sets, identifying model failure modes, and partnering with leading labs to improve model robustness and alignment. If you care about model safety, enjoy fast prototyping, and want to shape how foundation models are benchmarked, this role is for you.


What You’ll Do:

  • Design custom evaluation sets and benchmark tasks for reasoning, math, safety, and planning
  • Develop failure taxonomies for hallucinations, refusal behavior, overconfidence, and jailbreaks
  • Build scalable human-in-the-loop workflows for rubric-based scoring, preference ranking, and adversarial testing
  • Work directly with top AI labs to stress test frontier models and identify hidden failure patterns
  • Prototype automated evaluation pipelines and agent-assisted evaluators to scale human oversight
  • Lead or support the development of GAIA-style evaluations or internal benchmarks for labs
  • Publish internal memos and external papers on findings from research pilots and lab collaborations
  • Collaborate with researchers from top AI labs on joint eval initiatives


What We’re Looking For:

  • Strong Python fundamentals and experience writing production-grade research code
  • Hands-on experience with LangChain, LangGraph, and RLHF or RLAIF evaluation pipelines
  • Familiarity with model evaluation frameworks (e.g., TruthfulQA, MMLU, ARC, MT-Bench, GAIA, VLM evals)
  • Deep understanding of foundation model failure modes and evaluation methodologies
  • Graduate degree in CS or a related field (PhD or Master’s preferred), or equivalent research experience


Bonus If You Have:

  • Publications in top-tier AI conferences or journals (NeurIPS, ICLR, ICML, ACL, etc.)
  • Experience working with top AI labs
  • Competitive gaming experience or deep understanding of multi-agent interaction
  • Exposure to VLM evaluation, multimodal reasoning, or agentic collaboration frameworks


Perks & Details:

  • Work closely with founders and leading AI researchers on problems that matter
  • Drive real impact: your evaluations shape how leading labs measure model safety and capability
  • Hardcore but flexible schedules with a remote global team

Apply now

Please note that by applying and completing our interview process, you will be added to our talent pool. This means you will be considered for this role and for any other roles that match your skills; as a micro1 certified candidate, those opportunities will be sent your way.
