What is AI?
AI, or Artificial Intelligence, refers to the simulation of human intelligence in machines that are programmed to think and learn.
It encompasses a wide range of technologies and techniques that enable computers to perform tasks that typically require human intelligence, such as understanding natural language, recognizing patterns, making decisions, and solving problems.
Types of AI
There are different types of AI, including:
- Machine Learning (ML): A subset of AI that focuses on the development of algorithms that allow computers to learn from and make predictions based on data.
- Natural Language Processing (NLP): A branch of AI that enables computers to understand and process human language.
- Computer Vision: A field of AI that enables computers to interpret and understand visual information from the world.
- Reinforcement Learning: A type of machine learning where an agent learns to make decisions by taking actions in an environment to maximize some notion of cumulative reward.
- Generative AI: A type of AI that can generate new content, such as text, images, or code, based on the patterns it has learned from existing data.
- Deep Learning: A subset of machine learning that uses neural networks with many layers to model complex patterns in data.
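The core idea behind machine learning above, learning from data and then predicting, can be sketched with a tiny example: a one-parameter linear model fit by least squares. The data and numbers below are made up purely for illustration.

```python
# A minimal sketch of the machine-learning idea: fit a model to data,
# then use it to predict unseen inputs. Here, a one-variable linear
# model y = w * x fit by least squares.

def fit_slope(xs, ys):
    """Least-squares slope for a line through the origin: w = sum(x*y) / sum(x*x)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# "Training data": inputs and observed outputs (roughly y = 2x with noise).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

w = fit_slope(xs, ys)   # the "learned" parameter, close to 2.0
prediction = w * 5.0    # predict the output for an unseen input
```

Real machine-learning models work the same way in spirit, just with millions or billions of parameters instead of one.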
Generative AI
Overview
Generative AI models generate new content based on the patterns they have learned from existing data. They can produce text, images, code, and more. They are trained on large datasets and use the context of the input to generate coherent, relevant responses.
Examples of Generative AI models:
- GPT (Generative Pre-trained Transformer): A language model developed by OpenAI that can generate human-like text based on the input it receives.
- Stable Diffusion: A model that can generate high-quality images based on textual descriptions.
How do Transformers work?
- Transformers are a type of neural network architecture that has revolutionized the field of natural language processing (NLP) and generative AI.
- They are designed to handle sequential data, such as text, and can capture long-range dependencies in the data.
- They use a mechanism called “self-attention” to weigh the importance of different parts of the input when generating output.
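The self-attention mechanism described above can be sketched in a few lines of NumPy. This is a simplified single-head version with illustrative shapes and random weights; in a real Transformer, the query/key/value projection matrices are learned during training.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over X of shape (seq_len, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv          # project inputs to queries/keys/values
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # how strongly each position attends to every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row is a probability distribution
    return weights @ V                        # each output is a weighted mix of value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                   # 4 tokens, 8-dimensional embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)           # same shape as X: (4, 8)
```

The softmax rows are the “importance weights”: for each token, they say how much of every other token’s value vector to blend into its output.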
Links:
- Visual Explanation of Transformers
- Attention is All You Need (original paper)
- The Illustrated Transformer (blog post)
What is a GPT?
GPT stands for Generative Pre-trained Transformer. Let’s break that down:
| Word | Meaning |
|---|---|
| Generative | It generates new text — it doesn’t just look things up, it creates responses word by word. |
| Pre-trained | Before you ever use it, the model has already been trained on massive amounts of text from the internet (books, articles, code, websites, etc.). |
| Transformer | The underlying architecture (see above) that allows the model to understand context and relationships between words. |
How does a GPT generate text?
At its core, a GPT works by predicting the next word (technically the next token) over and over again.
Example:
You type: "The capital of France is"
The model looks at all those words and predicts the most likely next word: "Paris".
Then it takes "The capital of France is Paris" and predicts the next token, maybe "." — and so on, until it has generated a full response.
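The generation loop described above can be sketched as follows. A real GPT scores every token in its vocabulary with a neural network; here a hypothetical lookup table stands in for the model so that the loop itself is visible (real tokens are also subword pieces, not whole words).

```python
# Hypothetical "model": maps a context to next-token probabilities.
# This stands in for the neural network that a real GPT uses.
NEXT_TOKEN_PROBS = {
    ("The", "capital", "of", "France", "is"): {"Paris": 0.95, "located": 0.05},
    ("The", "capital", "of", "France", "is", "Paris"): {".": 0.9, ",": 0.1},
}

def generate(prompt_tokens, max_new_tokens=10):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = NEXT_TOKEN_PROBS.get(tuple(tokens))
        if probs is None:  # stand-in for an end-of-sequence condition
            break
        # Greedy decoding: always append the single most likely next token.
        tokens.append(max(probs, key=probs.get))
    return tokens

print(generate(["The", "capital", "of", "France", "is"]))
# ['The', 'capital', 'of', 'France', 'is', 'Paris', '.']
```

Always picking the most likely token is called greedy decoding; real systems often sample from the probabilities instead, which is why the same prompt can produce different responses.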
What does a GPT know?
A GPT does not have a database it searches through. Instead, all the knowledge is baked into its model weights — billions of numbers that were adjusted during training.
This means:
- It can recall facts it saw during training, but it can also get things wrong (called hallucinations).
- It has a knowledge cutoff — it doesn’t know about events that happened after its training data was collected.
- It doesn’t truly understand things the way humans do — it is very good at recognizing and reproducing patterns in language.
What makes GPTs useful for coding?
GPTs have been trained on vast amounts of source code, documentation, and programming discussions. This makes them surprisingly good at:
- Writing code from natural language descriptions
- Explaining code in plain language
- Finding bugs and suggesting fixes
- Translating between programming languages
- Generating tests, documentation, and boilerplate code