Mathematics Model Prompt Evaluator Job at SaidGig, Remote

  • SaidGig
  • Remote

Job Description

Role Overview

Expert mathematicians are invited to author and verify high-quality open-ended prompts for AI model evaluation. In this role, you will craft and review challenging, unambiguous mathematical problems across core subdomains, assessing AI reasoning quality and helping establish rigorous evaluation standards for frontier language models.

Task Types

You will be assigned one of two task types:

  • Authoring Task: Create 5 original, open-ended prompts from your assigned subdomain at varying difficulty levels (undergraduate, advanced undergraduate, or graduate/professional). Prompts should call for responses, such as chain-of-thought reasoning or proof construction, whose quality requires human judgment to evaluate.
  • Verification Task: Review 5 authored prompts for clarity, scope alignment, difficulty accuracy, and uniqueness. Edit prompts and difficulty ratings where needed.
Mathematics Subdomains Covered

Probability & Statistics, Algebra (including Linear Algebra), Ordinary/Partial Differential Equations & Dynamical Systems, Geometry, Graph Theory, Number Theory.

Key Responsibilities
  • Author clear, unambiguous, open-ended mathematical prompts that elicit evaluable AI responses.
  • Verify prompts are within the scope of the assigned subdomain and correctly rated for difficulty.
  • Ensure all 5 prompts in a task are sufficiently distinct from one another with varying difficulty levels.
  • Apply expert judgment to assess the depth and quality of mathematical reasoning required.
  • Edit prompts and difficulty assignments where standards are not met.
Ideal Qualifications
  • Master's degree or higher in Mathematics, Applied Mathematics, Statistics, or a closely related field.
  • 2–6 years of professional or research experience in a quantitative field.
  • Strong command of graduate-level mathematical concepts including proof writing, analysis, and formal reasoning.
  • Experience in academic research, mathematical competition design, or quantitative industry roles is a plus.
  • Excellent written English and ability to craft precise, well-scoped technical questions.
Work Terms

Expected commitment: 10+ hours/week. Asynchronous, fully remote work.

Job Tags

Remote job
