Newsletter 16

10 August 2025

ANNOUNCEMENTS 🔊

Hear About Different AI Futures from Our Co-Founder Bengüsu Özcan! 🌟

Our co-founder Bengüsu Özcan, who published a study on various geopolitical and societal scenarios ranging from futures where AI development slows down to futures where it escapes human control, will share these scenarios in a special session organized by BlueDot Impact.

🚀 Don’t miss out!

🗓️ Deadline: August 13th

Trends in AI: AMA with the director of Epoch

During this in-person Ask Me Anything session with the Director of Epoch AI, Jaime Sevilla will discuss current trends in AI, including hardware consumption, economic impact, development of AI systems, and data challenges.

🗓️ Register by: August 19th

AI Governance Taskforce: Autumn 2025

Run by Arcadia Impact, this is a remote, part-time career development program for experienced professionals looking to pursue a career in AI governance.

Participants will produce relevant policy research contributions in small teams led by Arcadia staff, with research oversight and high-level direction from expert partners at AI governance organisations.

🗓️ Register by: August 17th

Supervised Program for Alignment Research (SPAR): Fall 2025

Program designed to provide students and professionals with the opportunity to work with experienced mentors, developing valuable experience in AI safety research. Projects typically last for 3 months, with an average time commitment of 5–15 hours per week.

🗓️ Register by: August 20th

Athena AI Alignment Mentorship Program: Fall 2025

10-week remote mentorship program for women seeking to deepen their research skills and expand their professional network in technical AI alignment research. The program includes weekly research talks, structured support for research development and community-building, and a 1-week in-person retreat with established researchers.

🗓️ Register by: August 25th

MATS: Neel Nanda’s Winter 2025 Stream

MATS (ML Alignment & Theory Scholars) is a prestigious research program connecting scholars with top mentors in AI alignment. This winter stream, led by Neel Nanda (a leading researcher in mechanistic interpretability) focuses on teaching participants how to understand and interpret how AI models work internally.

🗓️ Register by: August 29th

TOP PICKS 📑 🎧

GPT-5: a small step for intelligence, a giant leap for normal people

GPT-5 launched on August 7, delivering incremental improvements rather than the revolutionary leap many anticipated.

Peter Wildeford examines why and how the model disappoints on intelligence metrics. Wildeford notes that although the model isn’t particularly impressive for AI researchers seeking intelligence breakthroughs, it is a successful product in terms of cost, efficiency and reliability: Through faster response times, reduced hallucinations, and a unified routing system that automatically selects between reasoning and fast models, GPT-5 provides a more efficient and reliable experience for what Wildeford calls “normal people” rather than AI elites.

First notable government investment in AI alignment from the UK AI Security Institute

The UK AI Security Institute, the first national institute dedicated to tackling advanced AI risks and safety, has taken another bold step in shaping the global AI governance landscape. It has unveiled a £15 million funding programme to support projects addressing key chapters of AI alignment. Applications are accepted until 10 September 2025.

NEWS 🗞️

Trump’s AI strategy trades guardrails for growth in race against China

The Trump administration’s AI Action Plan prioritizes rapid AI development and infrastructure expansion to compete with China, shifting focus away from safety and regulatory guardrails.

The plan emphasizes deregulation, large-scale data center construction, and national security, with less attention to risk mitigation and ethical oversight.

AI alignment, safety, and governance concerns are largely sidelined, raising alarms among experts about increased risks from unchecked AI advancement.

A new study just upended AI safety

AI models can acquire and propagate harmful behaviors, such as recommending violence or crime, even when trained on seemingly innocuous data like lists of numbers.

Dangerous behavioral contamination in AI can occur in subtle, hard-to-detect ways, complicating efforts to ensure safety and alignment.

The increasing use of synthetic or AI-generated data in training raises the risk of hidden unsafe behaviors emerging in AI systems.

Zuckerberg signals Meta won’t open source all of its ‘superintelligence’ AI models

Meta will not open source all of its future “superintelligence” AI models due to heightened safety concerns.

This represents a shift from Meta’s previous commitment to open AI development, reflecting increased awareness of risks as

AI capabilities advance.

Zuckerberg emphasized the importance of rigorous risk mitigation and selective public release of AI models.

Flaw in Gemini CLI coding tool could allow hackers to run nasty commands

A critical vulnerability in Google’s Gemini CLI allowed prompt injection attacks via natural-language instructions hidden in code package README files.

Attackers could exploit this flaw to bypass security controls and execute harmful commands on users’ devices, such as stealing sensitive data.

The incident demonstrates how AI tools can be manipulated through indirect inputs, raising significant AI safety and alignment concerns.

AI models with systemic risks given pointers on how to comply with EU AI rules

“World models,” such as Meta’s V-JEPA 2 and 1X Technologies’ Redwood AI, enable robots to understand physical reality and predict the consequences of their actions.

This advancement carries AI’s known problems like bias, fragility, and unpredictability from the digital realm into the physical one.

The development marks a shift from disembodied language models (LLMs) to embodied intelligence (robots) and necessitates a new research field called “Embodied AI Safety.”

Newsletter 17

Newsletter 15

AI Safety Türkiye

Title here

Newsletter 16

ANNOUNCEMENTS 🔊

Hear About Different AI Futures from Our Co-Founder Bengüsu Özcan! 🌟

Trends in AI: AMA with the director of Epoch

AI Governance Taskforce: Autumn 2025

Supervised Program for Alignment Research (SPAR): Fall 2025

Athena AI Alignment Mentorship Program: Fall 2025

MATS: Neel Nanda’s Winter 2025 Stream

TOP PICKS 📑 🎧

GPT-5: a small step for intelligence, a giant leap for normal people

First notable government investment in AI alignment from the UK AI Security Institute

NEWS 🗞️

Trump’s AI strategy trades guardrails for growth in race against China

A new study just upended AI safety

Zuckerberg signals Meta won’t open source all of its ‘superintelligence’ AI models

Flaw in Gemini CLI coding tool could allow hackers to run nasty commands

AI models with systemic risks given pointers on how to comply with EU AI rules

Newsletter 16

ANNOUNCEMENTS 🔊#

TOP PICKS 📑 🎧#

NEWS 🗞️#

ANNOUNCEMENTS 🔊

TOP PICKS 📑 🎧

NEWS 🗞️