AI Safety Careers #1

Authors: Berke Çelik and Sayhan Yalvaçer


The Pace of AI Progress

The sentence “AI is developing very fast” is now almost a mundane observation. We skim our articles with ChatGPT, generate images, write code, and have our emails summarized or written using AI tools. AI is now a part of our daily lives. But do we really grasp how fast it is progressing?

GPT-3, released in 2020, could form impressive sentences and even write short, simple stories. However, the same model often got even very simple math problems wrong and struggled to follow anything but the simplest instructions. So, even though it could produce some striking outputs, its capabilities were extremely limited.

Today, AI systems pass undergraduate-level biology exams, outscore roughly ninety percent of human test-takers on the bar exam, write professional-level code, and generate coherent, sourced answers to complex research questions. The technology that stumbled on simple math questions five years ago now rivals the average human in many cognitive tasks.

In this article, we examine the progress in AI, its history, and the trends behind it. While progress in AI could bring enormous benefits, it also brings serious risks. As we argue at AI Safety Türkiye, these risks could reach an existential scale: that is, significant progress in artificial general intelligence could lead to irreversible global catastrophes. But to understand and explain these risks and potential catastrophes, it is essential to understand how AI systems work.

Specifically, we look at where AI came from, what fundamental ideas made today’s models possible, and the critical turning points along the way. Although today’s large language models seem new, AI is not a new concept. It is the product of decades of accumulation, competing paradigms, and periods full of disappointment.

Alan Turing, Cahit Arf, and Thinking Machines

The question of whether machines can think is much older than the term artificial intelligence. In the 1600s, Leibniz dreamed of a universal calculator: a mechanism that would reduce all kinds of reasoning to mechanical operations. In the 1830s, Charles Babbage designed the Analytical Engine, a machine capable of carrying out logical operations, though he was never able to complete its construction. This machine, which could arguably be called a computer, had a rudimentary memory and a very limited capacity for operations based on simple arithmetic. In the 1840s, Ada Lovelace suggested that this machine could be programmed to perform a wide variety of logical and complex mathematical operations, far beyond being a sophisticated calculator. However, Lovelace also held that such a machine could only do what it was commanded to do; it would never set goals of its own.

If we skip forward about a century and shift from the history of computers to the history of artificial intelligence proper, we can focus on Alan Turing’s ideas. In his article “Computing Machinery and Intelligence,” published in the journal Mind in 1950, Turing asked the fundamental question of a debate that continues to this day: “Can machines think?” Aware of the difficulty of defining “thinking,” he proposed a simple test, known today as the Turing Test, to answer it: a human converses with two “interlocutors” via written correspondence. One is a human, one is a machine. If the questioner cannot reliably tell which one is the machine, we can accept that the machine “thinks.”

In this article, Turing addressed nine possible objections one by one: theological arguments, the issue of consciousness, more theoretical criticisms (e.g., Gödel’s incompleteness theorems), and Lady Lovelace’s objection that “machines only do what they are told.” He refuted each one or showed their limits. But the most striking part is at the end of the article: Turing discussed how machines could “learn.” Instead of programming the rules one by one, he suggested providing experiences to the machine, like educating a child. Through punishment and reward, by trial and error. In 1950, when neural networks were just taking their first steps, Turing outlined the fundamental idea of today’s machine learning.

A few years before Turing’s paper, another key development had already taken place. In 1943, Warren McCulloch and Walter Pitts published their article “A Logical Calculus of the Ideas Immanent in Nervous Activity.” Inspired by the human brain, they mathematically modeled the “fire or don’t fire” behavior of a single neuron and showed that these simple units (artificial neurons) could perform logical operations by connecting to one another.

These two developments – Turing’s idea of universal computability and McCulloch-Pitts’ demonstration that neurons can perform logical operations – are not just historical facts but important conceptual foundations for today’s AI developments and debates.

1956-1980s: Rule-Based Systems and Initial Expectations

Although the intellectual history of AI goes much further back, AI was formalized as a discipline at the Dartmouth Workshop in 1956. Figures like John McCarthy, Marvin Minsky, and Claude Shannon came together, and the term “artificial intelligence” was first used in connection with this workshop. The basic assumption was this: intelligence is a “thing” that can be codified. Every aspect of human cognitive ability, or any other feature of intelligence, can be defined so precisely that a machine can be made to simulate it. The dominant approach for decades was built on this assumption: rule-based systems.

The concrete form of this approach is called expert systems. The logic goes like this: sit down with the experts of a field, translate their decision processes into rules, and then load these rules into the machine. For example, suppose you want to classify thousands of documents as either finance or production. The system works like this:

If “balance sheet” appears in the document -> Finance

If “assembly line” appears in the document -> Production

Or in chess: if the opponent plays a certain opening (x) -> you play the response (y) that constitutes a strong move against it.

Although this approach led to various advancements, it had serious bottlenecks and limitations. For example, when a document says “earnings report” instead of “balance sheet,” the system may fail to recognize it as a finance document.
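To make this brittleness concrete, here is a minimal sketch of such a keyword-rule classifier in Python. The keywords and labels are illustrative examples invented for this article, not taken from any real expert system:

```python
# A toy keyword-rule classifier. The keywords and labels are illustrative
# examples for this article, not taken from any real expert system.
RULES = {
    "balance sheet": "Finance",
    "assembly line": "Production",
}

def classify(document: str) -> str:
    text = document.lower()
    for keyword, label in RULES.items():
        if keyword in text:
            return label
    return "Unknown"  # no rule matched

print(classify("Quarterly balance sheet attached."))  # -> Finance
print(classify("The assembly line was upgraded."))    # -> Production
print(classify("Earnings report for Q3 is ready."))   # -> Unknown: no rule mentions "earnings report"
```

Every document the rule writer did not anticipate falls through the cracks, and the only fix is to keep adding rules by hand.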

Cahit Arf: Three Boundaries from Erzurum

These debates were not limited to America. Nine years after Turing asked the same question, Cahit Arf gave a public lecture in Erzurum during the first academic year of Atatürk University in 1959: “Can a Machine Think and How Can It Think?”. In this speech, Arf pointed out three points that many people would continue to rack their brains over in the following decades:

First: machines can solve the problems they were designed for. It is possible to build machines that solve very complex problems, but it is extremely difficult to build machines that can adapt. In his own words: “While the human brain perfects itself on its own initiative, the machine remains as it was built.” A machine only does as much as it was designed to do; it stalls when faced with a new problem. That is, the power of such machines extends only as far as the rules and functions we define for them.

Second: it is theoretically possible to design a self-improving machine. In today’s language, a system with self-improvement capacity can be built. That is, theoretically, we can develop machines that can improve themselves and have capacities beyond the rules we define.

Third, and most strikingly: the real difference between human and machine lies in what Arf called “aesthetic judgment.” He meant all kinds of judgments that cannot be expressed with strict rules and that contain uncertainty: finding a piece of music beautiful or not, choosing not to do a given task, a situation “not feeling right.” Their common feature is that they do not follow infallible rules. Arf, noting that machines lacked this, believed the gap might close one day, but added that it might only close “after many years, or perhaps never.”

In 1959, a mathematician in Erzurum drew three boundaries: lack of adaptation, potential for self-improvement, and the inaccessibility of aesthetic judgment. The next forty years of AI research were largely spent trying to overcome the first boundary, the lack of adaptation.

The Real Problem of Rule-Based Systems

Let’s expand a bit on the lack of adaptation Arf mentioned. The fragility of rule-based systems went beyond the problem of “we simply can’t write enough rules.”

Most people learn to ride a bike by trial and error, internalizing through experience the rules of balance that make riding possible. You “know” that a document is a finance document, but if asked how you know, you cannot give a clear list of rules. You feel whether a text is well written; you notice that a facial expression is fake. We are successful in many areas and actions whose rules we could not write down even if asked. This was the real problem with rule-based systems: the required knowledge was not available in rule form to begin with.

This limitation kept AI stuck for decades. Chess-playing programs were written, expert systems that made medical diagnoses were developed, software that optimized industrial processes was produced. They were all impressive, creating real value in specific areas. Where the rules were clear, repeatable, and well bounded, these systems worked quite well. But they were all cast in the same mold: a human sits down and writes the rules one by one, and the system applies them. Every new field, every new problem meant writing rules from scratch. The system did not learn and could not adapt. As Arf said, it remained as it was built.

Neural Networks and Deep Learning

A large part of what humans know cannot be put into rules, and much of what humans learn is not learned through rules at all. So how do humans manage to learn things across so many different fields?

Let’s consider a child learning to distinguish cats. No one gives them a list of rules saying “ears are pointy, nose is small, has whiskers”. They see hundreds of cats, hear their mother say “look, a cat!”, and are corrected when they call a dog a “cat”. After a certain point, they recognize the cat. There are no rules, but the knowledge is there. You didn’t learn that you shouldn’t touch a stove or fire as a child by reading a regulation either. You touched it, you got burned, you didn’t touch it again. Experience, feedback, adjustment. Similarly, people learn their native languages through experience rather than grammar books.

Looking at it from a more mechanical point of view, the human brain consists of approximately 86 billion neurons and trillions of connections. When you learn something, what changes is the strength of these connections: some get stronger, some get weaker. The brain does not keep a list of rules. To put it very simply, the distribution of strengths across these connections itself constitutes the knowledge. What allows you to recognize a cat is not a “cat rule” written in your brain, but the imprint of thousands of cat experiences on these connections. (In reality, synaptic transmission and the human brain are much more complex than this, but the main idea here is important: knowledge is hidden in connection patterns, not in explicit rules.)

Some researchers had started from exactly this point as early as the 1940s. What if we designed machines like this too? Instead of writing the rules ourselves, what if we showed the system examples and let it find the structure on its own? McCulloch and Pitts’ work in 1943 laid the mathematical foundation for this idea. Building on it, Frank Rosenblatt developed the Perceptron in 1958: the first artificial neural network capable of recognizing simple visual patterns. But for decades, this approach remained in the background. Computational power was insufficient, the data needed for training was lacking, and rule-based systems worked relatively well in certain areas (areas where rules could be clearly defined!).

In the late 2000s and early 2010s, three things changed. First, computational power exploded thanks to GPUs (graphics processing units). GPUs were originally designed for video games, but they perfectly suited the type of processing neural networks needed, doing thousands of simple calculations simultaneously. A CPU does a single complex operation very fast; a GPU does thousands of simple operations in parallel. Neural network training is exactly this kind of job. Second, data became abundant thanks to the internet. Third, a few critical algorithmic developments made it possible to train neural networks much more effectively.

Okay, But How Exactly Do Artificial Neural Networks Work? How Can a Machine Learn Things From Examples?

To give a highly simplified example:

You show the system a document and say “this is finance.” Then you show another one, “this is production.” You repeat this with tens of thousands of documents.

There are millions of numerical values inside the system. These are called parameters. Each parameter is a weight the system applies at some decision point. Initially, they are all set randomly, meaning the system knows nothing.

You show the first document, the system makes a guess. It’s most likely wrong. But you had also given the system the correct answer: “this document is finance.” The system compares its own guess with this correct answer and calculates the difference. This is the error signal: the distance between the model’s guess and the actual answer. The greater this distance, the more the model was wrong. Using this error signal, the system adjusts each of the parameters by a very small amount in a direction that will slightly reduce the error next time. Then another document. Guess again, compare again, adjust again. You repeat this millions of times.

After millions of such error corrections, a system that started out with random values comes to harbor rules within itself that no one wrote.

An analogy to make this concrete: you are blindfolded in a rugged terrain and trying to find the lowest valley. You don’t have a map, you can’t see around. All you can do is feel the slope under your feet. With each step, you take a small step in the direction where the ground goes down most steeply. You don’t know where you’ve arrived, but you go a little further down with each step. Thousands of steps later, you are in the valley. The name of this process is gradient descent: adjusting the parameters step by step in a direction that will reduce the error. Just as you can reach a certain point by making better choices without designing a route beforehand, artificial neural networks similarly arrive – through trial and error – at a configuration that no one defined. And this configuration harbors rules within it that no one explicitly wrote.
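To make this loop concrete, here is a toy sketch of gradient descent in Python, with a single parameter and invented numbers. Real networks do the same thing with millions of parameters at once:

```python
# A toy sketch of gradient descent: one parameter, squared error.
# The data, starting value, and learning rate are all invented for illustration.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]  # inputs x and "correct answers" y (here y = 3x)

w = 0.5             # an arbitrary starting value: the "model" knows nothing yet
learning_rate = 0.01

for step in range(1000):
    for x, y in data:
        guess = w * x                   # the model's prediction
        error = guess - y               # the error signal: distance from the correct answer
        gradient = 2 * error * x        # slope of the squared error with respect to w
        w -= learning_rate * gradient   # a small step "downhill"

print(w)  # ends up very close to 3.0, a value no one wrote in by hand
```

Nobody told the program that the answer was 3; the value emerged from thousands of tiny corrections, exactly like the blindfolded walk down the slope.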

Layers

The first neural networks did this process in one go: take the raw input, produce an answer directly. Like trying to go directly from the raw text of a document to the result of “finance” or “production”. It worked, but it was limited. The patterns you can catch in a single step can only be so complex.

Deep learning added layers to this process, making it possible to capture patterns far more complex than simple document categorization. Each layer learns a more abstract representation built on top of the previous one.

Computer vision is a good field for making this concrete: when you show the system hundreds of thousands of face photos, the first layer catches the simplest patterns in the raw pixels: edges, contrasts, light-dark transitions. The second layer combines these simple patterns to form more complex shapes: an eye shape, a nose contour, a mouth curve. The third layer brings these shapes together and starts to recognize faces, expressions, identities.

No one programmed this. No one said “first find the edges, then look for eyes”. The layers learned this spontaneously during the training process. Imagine a model that can distinguish cats from dogs: the people who design this model do not write a set of rules describing how cats can be distinguished from dogs. The model is shown millions of “this is a cat” and “this is a dog” examples, and the model creates the representations that provide this distinction within its own internal structure. What emerges is a knowledge structure that no human has explicitly formulated but that works. And this is what makes deep learning so powerful: the same technique works on images, audio, text, and even protein structures. Because no matter how different the fields are, layers of abstraction organize themselves everywhere.
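Here is a minimal sketch of what “stacked layers” means in code, with arbitrary toy sizes and random, untrained weights. A real network would learn these weights from millions of examples:

```python
import numpy as np

# A toy forward pass through stacked layers. Sizes are arbitrary and the
# weights are random (untrained); a real network learns them from examples.
rng = np.random.default_rng(0)

def layer(inputs, weights):
    # one layer: a weighted sum followed by a simple non-linearity (ReLU)
    return np.maximum(0, inputs @ weights)

x = rng.random(64)               # raw input, e.g. pixel values
w1 = rng.normal(size=(64, 32))   # first layer: low-level patterns ("edges")
w2 = rng.normal(size=(32, 16))   # second layer: combinations of those patterns
w3 = rng.normal(size=(16, 2))    # third layer: the final decision, e.g. cat vs. dog

h1 = layer(x, w1)                # each layer builds on the previous one's output
h2 = layer(h1, w2)
scores = h2 @ w3                 # two scores; the larger one is the network's guess

print(scores)
```

The code only fixes the shape of the pipeline; what each layer ends up representing is decided entirely by training.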

The Language Problem and the Transformer Architecture

But deep learning had a serious limit: language.

In images, the structure of the data is local: the pixel next to a cat’s ear is most likely also a part of the ear. That’s why image recognition models can catch patterns by scanning the picture with a small window – first edges, then shapes, then objects. There is no such luxury in language. The word that determines the meaning of a sentence might be ten words prior. Everything depends on order, context, and distant relationships.

“The trophy didn’t fit into the suitcase because it was too big.” What does “it” refer to in this sentence? The trophy. You solve this instantly because you hold the entire sentence in your mind at the same time. The models of that era had difficulty doing this. They processed the text from left to right, word by word. As they processed each new word, the effect of the previous words weakened step by step – just like a whispered message getting a little more distorted with each pass. Architectures like LSTM developed in later years slowed down this loss, but the problem persisted in long texts. There was a fundamental mismatch between the architecture and the nature of language.

Moreover, the learning task in language is also different. In visual recognition, you give the system examples labeled “this is a cat, this is a dog.” In language models, you give the model a massive amount of text and a single task: predict the next piece.

But the model doesn’t read the text word by word as we see it. Just as an image consists of pixels, text consists of pieces called “tokens”. These are the model’s alphabet: it can represent any text with a limited number of pieces. A token is sometimes a whole word, sometimes a syllable, sometimes a suffix. The model builds the sentence by predicting these tokens one by one. When it sees “The capital of Turkey is…”, the next token is “Ankara.” But when it sees “The students are wait…”, the next token is “ing” – not the whole word, just the continuation. The logic is always the same: predict the next one.
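As a concrete illustration, here is a small Python sketch using the tiktoken tokenizer (one tokenizer among many, and assumed to be installed; other models split text differently, so the exact pieces will vary):

```python
# A quick look at tokenization using the tiktoken library (assumed installed).
# Other models use other tokenizers, so the exact split will differ.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "The students are waiting."
token_ids = enc.encode(text)
pieces = [enc.decode([tid]) for tid in token_ids]

print(token_ids)  # a list of integers: this is all the model ever sees
print(pieces)     # the same text split into the pieces the model predicts one by one
```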

But there was a major obstacle to this “predict the next one” task. Remember: the models of that era processed text from left to right, token by token, and the effect of previous tokens weakened step by step. There is no problem in a short expression like “The capital of Turkey” – but when it had to make a prediction based on information from a few paragraphs ago, the model had already lost that context.

In 2017, a team from Google proposed a radical solution to this problem. They introduced the Transformer architecture in the paper titled “Attention Is All You Need”. The basic idea: instead of processing the text sequentially, enable the model to look at the entire text at the same time. In the sentence “The trophy didn’t fit into the suitcase because it was too big”, when the model looks at the word “it”, it simultaneously looks at all the other words in the sentence and can grasp that “it” points to the “trophy”. This is called the attention mechanism.
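Here is a minimal, untrained sketch of that attention computation in Python, with toy dimensions and random vectors. In a real Transformer the query, key, and value matrices are learned during training:

```python
import numpy as np

# A minimal, untrained sketch of (scaled dot-product) attention with toy sizes.
# In a real Transformer the Q, K, V projections are learned during training.
rng = np.random.default_rng(0)

tokens = ["The", "trophy", "didn't", "fit", "because", "it", "was", "big"]
d = 8                                  # toy embedding size
Q = rng.normal(size=(len(tokens), d))  # queries: what each token is "looking for"
K = rng.normal(size=(len(tokens), d))  # keys: what each token "offers"
V = rng.normal(size=(len(tokens), d))  # values: the information to be mixed

scores = Q @ K.T / np.sqrt(d)          # how strongly each token attends to every other token
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # softmax over each row
output = weights @ V                   # each token's new representation: a weighted mix of all tokens

# With random weights these attention scores are meaningless; in a trained model,
# the row for "it" would put high weight on "trophy".
print(weights[tokens.index("it")].round(2))
```

Because every token can attend to every other token in a single step, the model no longer loses context that sits several paragraphs back.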

It was now possible to process much longer texts with a much richer context. A huge ceiling was lifted. And a question naturally came to mind: what happens if we train this architecture on the text of the entire internet, on a massive scale?

Answer: Models emerge that do many things very well, from history to health, from video production to music production.

The Transformer architecture (and some other developments that took place in subsequent years) made it possible for models like Claude Opus 4.6, which we received help from while writing this article, to emerge. Systems that pass the bar exam, can produce music, can make diagnoses, write code at a professional level, and produce coherent answers to complex research questions.

Can these models perform the things we mentioned flawlessly, in a way that completely eliminates the need for humans? No. Still, we think it is hard to deny that these systems are impressive. These models are reported to detect 12 different cancers from 10 drops of blood with 99% accuracy; 65% of software developers reportedly write code with AI tools every week; and it is reported that close to 100% of OpenAI’s code is now written by AI.

But the same models, in different hands, do different things. In late 2025, a hacker manipulated Anthropic’s Claude model to breach 10 Mexican government institutions, stealing the tax records, voter information, and census data of millions of people.

We will discuss how models with these capabilities work in more detail in the second article.

But let’s take a step back: What is this model doing when it reads a patient’s symptoms and says “probably pneumonia,” or when you ask for the capital of Angola? It is predicting the next token. When it spots a loophole in a contract? It is predicting the next token. When it rewrites a paragraph of this article and improves the flow? It is predicting the next token. When you ask for the population of Angola’s capital or ask a follow-up question like, “Wait, there’s also this finding, maybe it’s not pneumonia,” it is predicting tokens in the exact same way.

At this point, many people ask the following question: do these systems really understand anything, or are they giant prediction machines that have statistically digested all the text produced by humanity? Some critics think it’s the latter and call these systems “stochastic parrots”: mechanisms with no capacity for understanding, simply spitting out the most likely next token.

Maybe they are right. Maybe these systems have learned not how the world works, but how it is described. But perhaps “just predicting” is a misleading framing. Because to predict the next token really well, at some point you have to grasp grammar, context, logic, and world knowledge. Inside the mechanism we call “just predicting,” something nobody programmed is starting to emerge. The question is: is that “thing” genuine understanding, or a very convincing imitation of it?

If you are curious about the answer to this question, you can subscribe to our newsletter and stay tuned for our upcoming articles!