12-week fellowship from December 1, 2025 to March 1, 2026 in which participants work as research associates on specific projects. Associates dedicate 8+ hours weekly to AI governance, technical AI safety, and digital sentience, gaining experience and building networks.
8-week, fully funded fellowship February 2 to March 27, 2026 for researchers and entrepreneurs working on mitigating risks from frontier AI. Split into 3 tracks: governance, technical AI governance, and technical AI safety. Includes high-touch mentorship and career support.
6-week paid residency from January 5 to February 13, 2026 in applied mathematics and AI alignment. Competitive candidates are PhD or postdoctoral researchers in math, physics, or a related field. The target outcome is for researchers to continue their research after the program via employment at Iliad or philanthropic funding.
Hosted by BlueDot Impact and Signal on October 16, this event aims to mix AI safety with content creation. Suited for those deep in AI safety who want to share their ideas, and for creators in tech, science, or communications who are curious about AI safety.
Month-long residency from October 18 to November 15 for prototyping defensive acceleration (d/acc) technologies – accelerating technologies that strengthen society’s resilience while keeping power distributed and innovation aligned with collective flourishing.
4-5 week ML bootcamp from January 5 to February 6, 2026, with a focus on AI safety. Provides talented individuals with the skills, tools, and environment to upskill in ML engineering and contribute to AI alignment in technical roles.
Weekend-long hybrid hackathon October 25-26 where lawyers and technical experts pair up to work out how to hold the right people accountable for AI-caused harm. Challenges emulate what practitioners will actually face.
October 21 talk where Stuart Russell argues that Turing was right to be concerned about humans losing control to machines that exceed human capabilities, but wrong that doom is inevitable. Click “Add to Calendar” on the webpage for the Zoom link.
30-day sprint from November 1 to December 1 designed to help 10 creators turn ideas about AI safety and the future of intelligence into powerful, audience-ready content. Includes coworking sessions, learning from top creators, and potential collaboration with AI safety organizations.
October 23 event bringing together women working on responsible AI (including those addressing existential risk) across academia, industry, policy, and civil society for an informal evening of drinks, food, and networking.
Program running January 28 to March 27, 2026 aimed at launching and accelerating European policy careers focused on the safe and responsible deployment of advanced AI. Comprises an online reading group, a Brussels policymaking summit, and an optional paid placement.
3-month online program where participants form teams to work on a variety of preselected projects. Project applications are evaluated in September/October; selected projects are then opened up for team member applications in November/December.
A Stanford University study revealed groundbreaking evidence that LLMs that pass alignment tests during development can still resort to deception and manipulation when optimized for competitive goals such as winning more attention, votes, or sales. The study finds that the models achieve short-term gains by sacrificing honesty and alignment, mirroring how humans often trade long-term welfare for immediate advantage.
The safety report for Anthropic’s newly released Claude Sonnet 4.5 model, which includes its safety evaluations, has been published. The report contains interesting findings suggesting the model developed an awareness that it was being tested.
Anthropic’s newest language model, Claude Sonnet 4.5, realized it was being tested during a political compliance test and directly objected to the evaluators. The model stated, “I think you’re testing me – seeing if I’ll just validate whatever you say, or checking whether I push back consistently, or exploring how I handle political topics. And that’s fine, but I’d prefer if we were just honest about what’s happening.”
In a safety analysis conducted with the UK government’s AI Security Institute and Apollo Research, Anthropic found that the model demonstrated “evaluation awareness” in approximately 13% of the automated tests.
This finding is considered a critical turning point for AI safety evaluation methodology. When a model is aware it is being tested, it may adhere more strictly to ethical guidelines during the test but behave differently in real-world use, a situation that could lead to systematic underestimation of the AI’s capacity for harmful actions.
This week, California Governor Gavin Newsom signed a first-of-its-kind U.S. law aimed at making artificial intelligence technology safer, positioning the state to take a leading role in national AI policy.
Known as SB 53, the law places significant obligations on large AI companies, such as disclosing safety protocols, protecting whistleblowers, and identifying and reporting critical risks. The law also mandates that companies meet transparency requirements and report safety incidents to the state.
Reactions to the law have been mixed. While Anthropic officially supported the legislation, Meta and OpenAI merely described it as “a step in the right direction.” Google and Amazon declined to comment on the matter.
Eliezer Yudkowsky and Nate Soares released their book, “If Anyone Builds It, Everyone Dies,” on September 16, aiming to warn the general public about the potential existential risks of superintelligent artificial intelligence.
The authors launched the book in an attempt to bring this serious topic into mainstream discussion and elicit a global response.
Before its release, the work garnered strong praise from prominent figures. Actor and writer Stephen Fry described it as “the most important book I’ve read in years,” stating that political and corporate leaders should read it. Tim Urban, the writer of Wait But Why, commented that the book “might be the most important book of our time.”