Job Title: AI Evaluation Specialist

Job Type: Contractor

Location: Remote

Job Summary: In this role, you'll apply your expertise to help train next-generation AI systems. Your work will shape how models learn, reason, and perform through high-quality, real-world input.

Key Responsibilities:

Design and implement self-contained evaluation tasks, including prompts, supporting files, and detailed grading rubrics, to assess AI performance on practical computer-based workflows.
Define clear, unambiguous written criteria for what constitutes successful and unsuccessful task completion across diverse administrative and workflow scenarios.
Meticulously observe and document AI agent behaviors, producing crisp, precise summaries and reports in high-quality English.
Iterate and refine evaluation tasks and rubrics based on feedback and team collaboration to ensure robust benchmarking methodologies.
Work cross-functionally across a wide range of domains, adapting evaluation frameworks as project requirements evolve.
Collaborate with the customer's team to share insights and help drive continuous improvement in AI evaluation techniques.
Champion meticulousness, structured observation, and clear written communication throughout all project deliverables.

Required Skills and Qualifications:

Minimum 3 years of experience in roles emphasizing written precision and structured thinking, such as paralegal, executive assistant, junior analyst, librarian, document archival specialist, research assistant, technical writer, or QA analyst.
Native or fluent proficiency in written English, with a demonstrated ability to produce observations that are succinct, specific, and unambiguous.
Proven skill in designing or applying rubric-based evaluation, grading against set criteria, or building structured scoring frameworks.
High attention to detail and the ability to notice subtle patterns or inconsistencies others might miss.
Exceptional written and verbal communication skills, especially for documenting nuanced observations and feedback.
Fluency in navigating computers, common SaaS tools, web browsers, file management, and document editing platforms.
Strong self-direction, with the ability to independently take ownership of ambiguous or loosely defined projects.

Preferred Qualifications:

Prior experience evaluating AI outputs or participating in technology-driven process improvement projects.
Background in developing or refining evaluation rubrics or scoring methodologies.
Comfort working across multiple domains and adapting quickly to new types of workflow challenges.