In this course, we'll implement GPT from scratch. Along the way, we'll build up the fundamental concepts needed to take on bigger challenges in AI, touching on Python, math, statistics, and specific AI algorithms. Let's get going!
AI is mostly rooted in math and statistics. The simplest AI is basically a regression algorithm: fit a model to data, then use it for analysis and prediction.
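As a warm-up, here's a minimal sketch of that idea: fitting a straight line with ordinary least squares in plain NumPy, then predicting from the fit. The data points are made up for illustration.

```python
import numpy as np

# Toy data: y is roughly 2x + 1 plus a bit of noise (made-up values)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Ordinary least squares: solve for slope and intercept
A = np.stack([x, np.ones_like(x)], axis=1)  # design matrix [x, 1]
slope, intercept = np.linalg.lstsq(A, y, rcond=None)[0]

# Prediction for a new input
x_new = 5.0
print(f"y({x_new}) ~ {slope * x_new + intercept:.2f}")
```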
There are some specific AI algorithms that are worth studying and re-implementing from scratch. Just as any (fullstack) web developer should implement an MVC framework from scratch before using one or several industry-standard frameworks, an AI engineer should implement basic algorithms from scratch before reaching for industry-standard frameworks such as PyTorch.
Some interesting AI algorithms are:
- GPT
- Stable Diffusion
- GAN
How do query, key, and value work in a GPT transformer?
“The query, key, and value concepts come from retrieval systems. For example, when you type a query to search for some video on YouTube, the search engine maps your query against a set of keys (video title, description, etc.) associated with candidate videos in the database, then presents you the best-matched videos (values).”
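To make that analogy concrete, here's a minimal sketch of retrieval by dot-product similarity: a query vector is scored against every key, and the value behind the best-matching key is returned. The vectors and video titles are made-up toy data.

```python
import numpy as np

# Made-up toy "database": each key is a feature vector for a video,
# each value is the video we would return.
keys = np.array([
    [0.9, 0.1, 0.0],   # key for "dog playing fetch"
    [0.1, 0.8, 0.1],   # key for "cat sleeping"
    [0.0, 0.2, 0.9],   # key for "cooking pasta"
])
values = ["dog playing fetch", "cat sleeping", "cooking pasta"]

query = np.array([0.8, 0.2, 0.1])  # what the user searched for

scores = keys @ query          # dot-product similarity, one score per key
best = int(np.argmax(scores))  # index of the best-matching key
print(values[best])            # -> "dog playing fetch"
```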
A token is a vector (e.g. a vector embedded in the BERT space, as will often be the case in this series). While the BERT model we use has dimensionality 768, for simplicity let's say our embedding space is 3-dimensional and each token is just a 3D vector.
```python
import numpy as np

token_dog   = np.array([1.0, -0.5, 0.2])  # Token: "dog"
token_plays = np.array([0.4,  0.1, 0.4])  # Token: "plays"
token_fetch = np.array([0.6, -0.8, 0.7])  # Token: "fetch"
```
Each token embedding is projected into a query, a key, and a value vector via learned weight matrices W_Q, W_K, and W_V. These matrices have dimensions (d_model × d_k) (or d_model × d_v for values), so they are independent of sequence length; it is the attention score matrix Q·Kᵀ that has dimensions (n_tokens × n_tokens).
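Putting it together, here's a minimal sketch of single-head scaled dot-product attention over our three toy tokens. The weight matrices are randomly initialized here purely for illustration (in a real model they are learned), and d_k is set equal to our toy embedding size of 3.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 3  # toy embedding size from above

# Stack the three toy token embeddings into X: shape (n_tokens, d_model)
X = np.array([
    [1.0, -0.5, 0.2],   # "dog"
    [0.4,  0.1, 0.4],   # "plays"
    [0.6, -0.8, 0.7],   # "fetch"
])

# Projections (random here; learned in a real model): (d_model, d_k)
W_Q = rng.normal(size=(d_model, d_model))
W_K = rng.normal(size=(d_model, d_model))
W_V = rng.normal(size=(d_model, d_model))

Q, K, V = X @ W_Q, X @ W_K, X @ W_V

# Attention scores: (n_tokens, n_tokens), scaled by sqrt(d_k)
scores = Q @ K.T / np.sqrt(d_model)

# Row-wise softmax turns scores into attention weights
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)

# Each output token is a weighted mix of the value vectors
output = weights @ V
print(weights)  # (3, 3) attention matrix: one row per query token
print(output)   # (3, 3) attended representations
```

Note that nothing here depends on there being exactly three tokens: the same code works for any sequence length, which is exactly why the projection matrices are sized by the embedding dimension rather than by n_tokens.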
@TODO to-read:
- https://jaketae.github.io/study/gpt/
- https://guyuena.github.io/PyTorch-study-Tutorials/beginner/transformer_tutorial.html
- https://pytorch.org/tutorials/beginner/introyt/modelsyt_tutorial.html?highlight=transformer
- https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Transformers
- https://towardsdatascience.com/build-your-own-transformer-from-scratch-using-pytorch-84c850470dcb
- http://nlp.seas.harvard.edu/annotated-transformer/
- re-visit: digit generation from MNIST with VAEs
Done:
- https://towardsdatascience.com/illustrated-guide-to-transformers-step-by-step-explanation-f74876522bc0
- https://stats.stackexchange.com/questions/421935/what-exactly-are-keys-queries-and-values-in-attention-mechanisms
- https://github.com/hkproj/pytorch-transformer
- https://d2l.ai/chapter_attention-mechanisms-and-transformers/attention-scoring-functions.html
- https://github.com/karpathy/minGPT/blob/master/mingpt/model.py