PhD Researcher in STEM Model Evaluation Job at SaidGig, Remote

  • SaidGig
  • Remote

Job Description

Role Overview

Contribute to a pioneering project focused on evaluating frontier models by designing and validating complex benchmark tasks in data science, machine learning, finance, and coding. This role emphasizes the development of robust, real-world tasks with executable tests, followed by the analysis of model and agent behavior to identify reasoning and problem-solving gaps.

Key Responsibilities
  • Design challenging, real-world STEM problems.
  • Implement each task within an agentic development environment using Python.

Core Qualifications
  • Deep expertise in data science, machine learning, finance, and/or Python-based coding.
  • Current PhD student or recent PhD graduate from a top U.S.-based school.
  • Strong research background in frontier STEM topics.
  • Ability to engage reliably for 30+ hours per week, primarily on weekdays.
  • Demonstrated technical output, such as high-quality open-source contributions, especially in agentic or LLM tooling ecosystems.
  • Comfort with reading and reasoning about agent behavior traces to diagnose failure modes beyond surface-level errors.

More About the Opportunity
  • Initial focus area: agentic workflows for STEM tasks.
  • Familiarity with agentic frameworks and OSS ecosystems is beneficial (e.g., LangChain, MetaGPT, AutoGen, AutoGPT, CrewAI, LlamaIndex, BabyAGI, SuperAGI, CAMEL, AgentGPT, Dify).
  • Deliverables are expected to be reproducible and testable, with clear specifications, deterministic tests where possible, and documented environments.

About

is a talent marketplace connecting top experts with leading AI labs and research organizations. Our investors include Benchmark, General Catalyst, Adam D’Angelo, Larry Summers, and Jack Dorsey. Thousands of professionals across various domains, including law, creative fields, engineering, and research, have joined to work on groundbreaking projects shaping the future of AI.

Job Tags

Remote job, Summer work, Weekday work
