Anthropic Fellows Program for AI safety research: applications open for May & July 2026

The Anthropic Fellows program provides funding and mentorship for engineers and researchers to investigate high-priority AI safety questions.

Program Highlights

In the first cohort, over 80% of fellows produced papers on topics including agentic misalignment, subliminal learning, rapid jailbreak response, and open-source circuits. More than 40% subsequently joined Anthropic as full-time employees.

Applications are now open for cohorts beginning in May and July 2026. The program will expand to cover more fellows across broader safety research areas: scalable oversight, adversarial robustness, AI control, model organisms, mechanistic interpretability, AI security, and model welfare.

What Fellows Work On

Fellows engage in four-month empirical research projects aligned with Anthropic's priorities, aiming to produce public outputs such as papers. Anthropic mentors propose project ideas, which fellows select and develop collaboratively.

Security Research

Fellows have investigated AI misuse risks from cyberattacks. Recent work identified blockchain vulnerabilities worth $4.6M and discovered zero-day exploits. Another project developed "techniques that block entire classes of high-risk jailbreaks after observing only a handful of attacks."

Interpretability

Fellows advanced understanding of large language model internals through new tracing methods. They created attribution graphs revealing model reasoning steps and open-sourced tools enabling researchers to visualize circuits and test hypotheses.

Model Organisms

Fellows explored agentic misalignment by stress-testing models in simulated environments where they could autonomously send emails and access sensitive information. They also studied subliminal learning, where behavioral traits transmit through semantically unrelated data.

Program Details

Compensation:

  • Weekly stipend: $3,850 USD / £2,310 GBP / $4,300 CAD
  • Compute funding: approximately $15,000/month
  • Close mentorship from Anthropic researchers

Duration: Four months (starting May or July 2026)

Career Outcomes: Over 40% of first cohort fellows joined Anthropic full-time; many others transitioned to full-time safety roles at other organizations.

Candidate Requirements

Strong candidates demonstrate:

  • Technical fundamentals: Python proficiency, ability to progress on ambiguous problems, clear thinking about complex technical questions
  • Motivation: Enthusiasm for mitigating catastrophic AI risks and transitioning into empirical AI safety research
  • Execution ability: Quick skill acquisition, effective debugging, project completion despite uncertainty

Previous experience requirements: A PhD, prior ML experience, and published papers are not necessary. Successful fellows have come from physics, mathematics, computer science, cybersecurity, and other quantitative fields.

Application Process

For application details and to apply, visit the Anthropic jobs portal.