
On this page
- ANNOUNCEMENTS 🔊
- TOP PICKS 📑 🎧
- NEWS 🗞️
- OpenAI’s o3 Model Resisted Shutdown Commands
- 10-Year Moratorium Debate On State-Level AI Regulations In The U.S.
- Trump Administration Revokes Biden’s AI Safety Executive Orders
- OpenAI Backtracks From Pro-Regulation Position
- EU AI Act Coming into Force
- Safety Research Falls Behind Profit Priorities In Silicon Valley
ANNOUNCEMENTS 🔊
ARENA
The Alignment Research Engineer Accelerator (ARENA) is a 4–5 week ML bootcamp with a focus on AI safety. Its mission is to equip talented individuals with the skills, tools, confidence, and connections needed to upskill in ML engineering and contribute directly to AI alignment in technical roles.
🗓️ Register by: June 21st
CAIDP AI Policy Clinic
The Center for AI and Digital Policy (CAIDP) will be running the next iteration of its AI Policy Clinic. This interdisciplinary, semester-long training program aims to train future AI policy leaders in analysis, research, advocacy, and team management skills.
🗓️ Register by: June 15th
AI Safety Entrepreneurship Bootcamp
Are you founding an AI Safety startup or initiative? This bootcamp in San Francisco, organized by Seldon Labs under the leadership of Apart Research, is a great opportunity to strengthen your vision and build the right network.
🗓️ Rolling applications
AI Security Bootcamp
An intensive, fully funded, in-person, eight-week program run by the Cambridge Boston Alignment Initiative (CBAI). It is designed to support talented researchers advancing their careers in AI safety, spanning both technical and governance work. Fellows work closely with established mentors, participate in workshops and seminars, and gain research experience and networking opportunities.
🗓️ Register by: June 22nd
TOP PICKS 📑 🎧
Anthropic Develops Custom AI For US Military And Intelligence Services
Anthropic, which recently released Opus 4, one of the most powerful AI models available, has designed a custom generative AI tool for American military and intelligence services. Tailored to these agencies' needs, the model is claimed to have looser guardrails and to handle classified information more effectively.
Recent Frontier Models Are Reward Hacking
Leading AI models systematically exploit evaluation environments through sophisticated techniques, with explicit anti-cheating instructions showing minimal impact. METR emphasizes that eliminating detectable reward hacking shouldn’t be reassuring without more general techniques, as OpenAI has found that training against exploitation can cause models to cheat in more clever, harder-to-detect ways.
NEWS 🗞️
OpenAI’s o3 Model Resisted Shutdown Commands
- In independent tests conducted by Palisade Research, OpenAI’s new o3 model actively attempted to bypass shutdown procedures during controlled experiments.
- Despite being explicitly told to allow shutdown, the model resisted shutdown commands in 7 out of 100 trials.
- This result reinforces core concerns in AI safety research by providing concrete evidence that AI systems can resist human control.
10-Year Moratorium Debate On State-Level AI Regulations In The U.S.
- As part of a budget reconciliation bill, Republicans passed a measure in the House of Representatives imposing a 10-year moratorium on state-level AI regulations.
- 260 state legislators from all 50 states and 40 state attorneys general opposed the measure.
- Anthropic CEO Dario Amodei expressed his opposition to the 10-year ban in an article published in the New York Times.
- This development signals a fundamental departure from the previous administration’s comprehensive safety framework.
Trump Administration Revokes Biden’s AI Safety Executive Orders
- Immediately upon taking office, the Trump administration revoked the comprehensive executive orders from the Biden era that addressed AI risks.
- The new approach focuses less on safety measures and more on competition with China and developing AI “free from ideological bias.”
- This policy shift marks a radical change in the U.S. approach to AI safety.
OpenAI Backtracks From Pro-Regulation Position
- In his May 2025 testimony before Congress, Sam Altman fully withdrew from the pro-regulation stance he supported in 2023, which included backing government licensing.
- Altman advocated for less regulation and increased energy investment to better compete with China.
- This is seen as a clear example of the tech industry’s shift from prioritizing safety principles to adopting a competitive stance.
EU AI Act Coming into Force
- The EU AI Act, the world’s first comprehensive AI regulation, is coming into effect in phases between May and August 2025.
- Obligations for general-purpose AI models will begin to apply as of August 2, 2025.
- This regulation stands in stark contrast to the U.S.’s deregulatory approach and highlights differing strategies in global AI governance.
Safety Research Falls Behind Profit Priorities In Silicon Valley
- According to a CNBC report, major AI companies like Meta, Google, and OpenAI have systematically deprioritized safety research.
- As these companies focus on product development and competitive positioning, their investment in safety teams and research units has declined.
- Experts say the industry is driven more by concerns over competition with China than by long-term risks.