With all the talk and articles about GPT, a few weeks ago I decided to spend a couple of days coding one from scratch. The training scale can’t be the same (for $$$ reasons), but the model can follow an architecture very similar to the one used by GPT-3. As long as we narrow the training data, …