
Chain-of-Thought Data Curator

Profile: STEM-leaning generalist who designs rubrics, evaluates multi-step reasoning, and curates gold-standard CoT datasets.
Job Description
Job Title: Chain‑of‑Thought Data Curator
Job Type: Full-time or part-time, contract
Location: Remote
Job Summary:
Join our client's team as a Chain-of-Thought Data Curator and play a pivotal role in advancing large-language-model reasoning. You will craft and evaluate gold-standard datasets that push the limits of multi-step reasoning in AI, leveraging a STEM-oriented, generalist mindset to create benchmarks that set the industry standard.
Key Responsibilities:
- Develop and curate gold-standard Chain-of-Thought (CoT) datasets across diverse reasoning-heavy tasks.
- Design clear, scalable rubrics and instructions to evaluate and annotate multi-step reasoning processes.
- Write precise, well-structured CoT responses that demonstrate high-level generalist reasoning, with a preference for STEM contexts.
- Critically assess logical flow, correctness, and justification within reasoning chains, ensuring rigor and fidelity.
- Identify and document common model failure types, such as hallucination, shortcut reasoning, and unsupported leaps.
- Collaborate with AI trainers, model evaluators, and RLHF annotators to refine CoT benchmarks and annotation protocols.
- Stress-test the depth and reliability of LLM reasoning across varied benchmarks.
Required Skills and Qualifications:
- Extensive experience creating or curating CoT or instruction-tuning datasets for AI/LLMs.
- Proven ability to design and implement binary or graded rubrics for evaluating multi-step reasoning outputs.
- Robust generalist analytical skills, ideally with a STEM or competitive exam background.
- Exceptional written and verbal communication abilities, with attention to clarity and structure.
- A deep understanding of LLM failure modes and reasoning pitfalls in model outputs.
- Experience balancing fine-grained evaluation criteria with scalable instructions for diverse teams.
- A background in RLHF annotation, AI model evaluation, or prompt engineering is highly valued.
Preferred Qualifications:
- Experience with instruction-tuning, model-evaluation, or advanced prompt-engineering projects.
- Exposure to cross-disciplinary reasoning tasks and datasets.
- Strong track record of collaborating with AI research or data curation teams.