AI Researcher

Required Skills

LangChain
RLHF
Artificial Intelligence
Tech Research

Job Description

Job Title: AI Researcher

Job Type: Full-Time

Location: Remote


Job Summary:

At micro1, we’ve built an AI recruitment engine to help companies hire top global talent. Our AI agent, Zara, autonomously sources and vets candidates, cutting recruitment costs by 87%.


We work with top AI labs to help them build robust human-in-the-loop evaluation pipelines, conduct post-training model evaluations, and identify critical failure modes. Long term, we aim to become the default infrastructure for human oversight of frontier models and create the most reliable pre-vetted global talent pool for AI labs.

We’re hiring an AI Researcher to help push the boundaries of LLM and VLM evaluation. You’ll join our core research team to work on real-world evaluation challenges—designing custom eval sets, identifying model failure modes, and partnering with leading labs to improve model robustness and alignment. If you care about model safety, enjoy fast prototyping, and want to shape how foundation models are benchmarked, this role is for you.


What You’ll Do:

  • Design custom evaluation sets and benchmark tasks for reasoning, math, safety, and planning
  • Develop failure taxonomies for hallucinations, refusal behavior, overconfidence, and jailbreaks
  • Build scalable human-in-the-loop workflows for rubric-based scoring, preference ranking, and adversarial testing
  • Work directly with top AI labs to stress test frontier models and identify hidden failure patterns
  • Prototype automated evaluation pipelines and agent-assisted evaluators to scale human oversight
  • Lead or support the development of GAIA-style evaluations or internal benchmarks for labs
  • Publish internal memos and external papers on findings from research pilots and lab collaborations
  • Collaborate with researchers from top AI labs on joint eval initiatives


What We’re Looking For:

  • Strong Python fundamentals and experience writing production-grade research code
  • Hands-on experience with LangChain, LangGraph, and RLHF or RLAIF evaluation pipelines
  • Familiarity with model evaluation frameworks (e.g., TruthfulQA, MMLU, ARC, MT-Bench, GAIA, VLM evals)
  • Deep understanding of foundation model failure modes and evaluation methodologies
  • Graduate degree in CS or a related field (PhD or Master’s preferred), or equivalent research experience


Bonus If You Have:

  • Publications in top-tier AI conferences or journals (NeurIPS, ICLR, ICML, ACL, etc.)
  • Experience working with top AI labs
  • Competitive gaming experience or deep understanding of multi-agent interaction
  • Exposure to VLM evaluation, multimodal reasoning, or agentic collaboration frameworks


Perks & Details:

  • Work closely with founders and leading AI researchers on problems that matter
  • Drive real impact: your evaluations shape how leading labs measure model safety and capability
  • Hardcore but flexible schedules with a remote global team

Apply now

Please note that by applying and completing our interview process, you will be added to our talent pool. This means you will be considered for this role and for any other roles that match your skills; as a micro1 certified candidate, those opportunities will be sent your way.
