Our co-founder Bengüsu Özcan, who published a study on various geopolitical and societal scenarios ranging from futures where AI development slows down to futures where it escapes human control, will share these scenarios in a special session organized by BlueDot Impact.
During this in-person Ask Me Anything session with the Director of Epoch AI, Jaime Sevilla will discuss current trends in AI, including hardware consumption, economic impact, development of AI systems, and data challenges.
Run by Arcadia Impact, this is a remote, part-time career development program for experienced professionals looking to pursue a career in AI governance.
Participants will produce relevant policy research contributions in small teams led by Arcadia staff, with research oversight and high-level direction from expert partners at AI governance organisations.
Program designed to provide students and professionals with the opportunity to work with experienced mentors, developing valuable experience in AI safety research. Projects typically last for 3 months, with an average time commitment of 5–15 hours per week.
10-week remote mentorship program for women seeking to deepen their research skills and expand their professional network in technical AI alignment research. The program includes weekly research talks, structured support for research development and community-building, and a 1-week in-person retreat with established researchers.
MATS (ML Alignment & Theory Scholars) is a prestigious research program connecting scholars with top mentors in AI alignment. This winter stream, led by Neel Nanda (a leading researcher in mechanistic interpretability) focuses on teaching participants how to understand and interpret how AI models work internally.
GPT-5 launched on August 7, delivering incremental improvements rather than the revolutionary leap many anticipated.
Peter Wildeford examines why and how the model disappoints on intelligence metrics. Wildeford notes that although the model isn’t particularly impressive for AI researchers seeking intelligence breakthroughs, it is a successful product in terms of cost, efficiency and reliability: Through faster response times, reduced hallucinations, and a unified routing system that automatically selects between reasoning and fast models, GPT-5 provides a more efficient and reliable experience for what Wildeford calls “normal people” rather than AI elites.
The UK AI Security Institute, the first national institute dedicated to tackling advanced AI risks and safety, has taken another bold step in shaping the global AI governance landscape. It has unveiled a £15 million funding programme to support projects addressing key chapters of AI alignment. Applications are accepted until 10 September 2025.
The Trump administration’s AI Action Plan prioritizes rapid AI development and infrastructure expansion to compete with China, shifting focus away from safety and regulatory guardrails.
The plan emphasizes deregulation, large-scale data center construction, and national security, with less attention to risk mitigation and ethical oversight.
AI alignment, safety, and governance concerns are largely sidelined, raising alarms among experts about increased risks from unchecked AI advancement.
AI models can acquire and propagate harmful behaviors, such as recommending violence or crime, even when trained on seemingly innocuous data like lists of numbers.
Dangerous behavioral contamination in AI can occur in subtle, hard-to-detect ways, complicating efforts to ensure safety and alignment.
The increasing use of synthetic or AI-generated data in training raises the risk of hidden unsafe behaviors emerging in AI systems.
A critical vulnerability in Google’s Gemini CLI allowed prompt injection attacks via natural-language instructions hidden in code package README files.
Attackers could exploit this flaw to bypass security controls and execute harmful commands on users’ devices, such as stealing sensitive data.
The incident demonstrates how AI tools can be manipulated through indirect inputs, raising significant AI safety and alignment concerns.
“World models,” such as Meta’s V-JEPA 2 and 1X Technologies’ Redwood AI, enable robots to understand physical reality and predict the consequences of their actions.
This advancement carries AI’s known problems like bias, fragility, and unpredictability from the digital realm into the physical one.
The development marks a shift from disembodied language models (LLMs) to embodied intelligence (robots) and necessitates a new research field called “Embodied AI Safety.”