PhD Researcher in STEM Model Evaluation Job at SaidGig, Remote

  • SaidGig
  • Remote

Job Description

Role Overview

Contribute to a pioneering project focused on evaluating frontier models by designing and validating complex benchmark tasks in data science, machine learning, finance, and coding. This role emphasizes developing robust, real-world tasks with executable tests, then analyzing model and agent behavior to identify gaps in reasoning and problem solving.

Key Responsibilities
  • Design challenging, real-world STEM problems.
  • Implement each task within an agentic development environment using Python.

Core Qualifications
  • Deep expertise in data science, machine learning, finance, and/or Python-based coding.
  • Current PhD student or recent PhD graduate from a top U.S.-based university.
  • Strong research background in frontier STEM topics.
  • Ability to engage reliably for 30+ hours per week, primarily on weekdays.
  • Demonstrated technical output, such as high-quality open-source contributions, especially in agentic or LLM tooling ecosystems.
  • Comfort with reading and reasoning about agent behavior traces to diagnose failure modes beyond surface-level errors.

More About the Opportunity
  • Initial focus area: agentic workflows for STEM tasks.
  • Familiarity with agentic frameworks and OSS ecosystems is beneficial (e.g., LangChain, MetaGPT, AutoGen, AutoGPT, CrewAI, LlamaIndex, BabyAGI, SuperAGI, CAMEL, AgentGPT, Dify).
  • Deliverables are expected to be reproducible and testable, with clear specifications, deterministic tests where possible, and documented environments.
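As an illustration only, the sketch below shows what a small task with a deterministic, executable test might look like. It assumes a hypothetical finance task (annualized volatility of daily log returns, using an assumed 252-trading-day convention). The names `make_prices`, `reference_solution`, `grade`, `agent_answer`, and `buggy_answer` are invented for this example and are not part of any project tooling. Fixing the random seed makes the test data, and therefore the verdict, reproducible.

```python
import math
import random
import statistics

TRADING_DAYS = 252  # assumed annualization convention for daily data


def make_prices(seed=0, n=60):
    """Hypothetical fixture: deterministic synthetic prices (same seed -> same data)."""
    rng = random.Random(seed)
    prices = [100.0]
    for _ in range(n - 1):
        prices.append(prices[-1] * math.exp(rng.gauss(0.0, 0.01)))
    return prices


def reference_solution(prices):
    """Reference answer: sample std dev of log returns, annualized."""
    returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    mean = sum(returns) / len(returns)
    var = sum((r - mean) ** 2 for r in returns) / (len(returns) - 1)
    return math.sqrt(var) * math.sqrt(TRADING_DAYS)


def grade(candidate, seed=0, rel_tol=1e-9):
    """Executable test: run a candidate on fixed data and compare to the reference."""
    prices = make_prices(seed)
    return math.isclose(candidate(prices), reference_solution(prices), rel_tol=rel_tol)


def agent_answer(prices):
    """Example correct submission (sample standard deviation)."""
    returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    return statistics.stdev(returns) * math.sqrt(TRADING_DAYS)


def buggy_answer(prices):
    """Plausible failure mode: population std dev instead of sample std dev."""
    returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    return statistics.pstdev(returns) * math.sqrt(TRADING_DAYS)


if __name__ == "__main__":
    print(grade(agent_answer))  # True
    print(grade(buggy_answer))  # False
```

The buggy submission differs from the reference by a factor of sqrt((n-1)/n) with n = 59 returns, roughly 0.85%, which a strict tolerance catches; this is the kind of subtle, below-the-surface failure that reading agent traces is meant to diagnose.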

About

A talent marketplace connecting top experts with leading AI labs and research organizations. Our investors include Benchmark, General Catalyst, Adam D’Angelo, Larry Summers, and Jack Dorsey. Thousands of professionals across domains including law, creative fields, engineering, and research have joined to work on groundbreaking projects shaping the future of AI.

Job Tags

Remote job, Summer work, Weekday work
