In my search for quick, approachable tutorials on the internals of GPT, I have forked two repos from karpathy:

https://github.com/piousbox/minGPT and https://github.com/piousbox/nanoGPT

They are surprisingly useful, and clear in their simplicity. I am looking for a lower-level understanding of the algorithms than "make this API call and receive inference," but also for something simple enough that it actually runs on any of my PCs or laptops. For example, you need 16 GB of video RAM just to run GPT-J (which will be described in another article).
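As a quick sanity check before loading a larger model, you can ask PyTorch how much video RAM your GPU actually has. A minimal sketch (device index 0 is assumed):

import torch

if torch.cuda.is_available():
    total = torch.cuda.get_device_properties(0).total_memory  # bytes on GPU 0
    print(f"GPU VRAM: {total / 1024**3:.1f} GB")
else:
    print("No CUDA device available; inference will fall back to CPU.")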

For now, though, I'd like to point out that running a local instance of a GPT, say GPT-2, is very straightforward: running the file below will already generate inference on your local machine.

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

device     = "cuda" if torch.cuda.is_available() else "cpu"
model_type = "gpt2-medium"  # or "gpt2", "gpt2-large", "gpt2-xl", depending on your needs
tokenizer  = GPT2Tokenizer.from_pretrained(model_type)
model      = GPT2LMHeadModel.from_pretrained(model_type)
model.to(device)
model.eval()  # inference only: disables dropout

## Experimenting with different prompts
prompts = [
    "Request: Write an effective sales email letter for a company that offers software consulting and development solutions.\n\n\nResponse: Hello, this is Alex Jones with Wasya Co, a software development solution firm. I am writing to you today to talk about the services that",
    "In one sentence, explain the meaning of life.",
    "What happened in the movie Romeo and Juliet, in terms of the plot?",
]
for prompt in prompts:
    input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)
    # temperature belongs to generate() (with sampling enabled), not to the tokenizer;
    # pad_token_id silences a warning, since GPT-2 has no pad token of its own
    output = model.generate(input_ids, max_length=100, do_sample=True,
                            temperature=1.0, pad_token_id=tokenizer.eos_token_id)[0]
    print(f"Prompt: {prompt}\nOutput: {tokenizer.decode(output, skip_special_tokens=True)}\n")