Why don’t others explain GPT-2 or Transformers like this? Why all the hard black-box algebra?
The world record for compressing enwik8 (100MB, mostly text) is 14.8MB. Compression is a better evaluation than perplexity. Going by OpenAI’s own benchmark, GPT-2 could compress enwik8 to maybe about 12MB. My compressor, explained in the video, is fully understood and gets a score of 21.8MB. Hutter Prize algorithms have already shown that grouping related words (mom/father) helps compression a great deal, and so does boosting recently activated letters or, better yet, words, if they haven’t tried that (similar words are likely to occur again soon). I have yet to add these to my letter predictor. So I understand well how to get a score of about 16MB at least. I think that gives my points a lot of weight.
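To make the link between prediction and compressed size concrete, here is a minimal sketch. It is not the compressor from the video. It assumes an adaptive order-2 letter model with add-one smoothing over a 256-symbol alphabet, and the function name `estimated_compressed_bits` is made up for illustration. The idea is that an ideal arithmetic coder spends about -log2(p) bits on a letter the model predicted with probability p, so a better predictor gives a smaller file.

```python
import math
from collections import defaultdict

def estimated_compressed_bits(text: str, order: int = 2, alphabet_size: int = 256) -> float:
    """Ideal coding cost (in bits) of `text` under an adaptive order-`order` letter model.

    Assumption: add-one (Laplace) smoothing, so unseen letters still get some probability.
    """
    counts = defaultdict(lambda: defaultdict(int))  # context -> next letter -> count
    totals = defaultdict(int)                       # context -> total count
    bits = 0.0
    for i, ch in enumerate(text):
        ctx = text[max(0, i - order):i]
        # Predict first, then update, exactly as a decompressor would have to.
        p = (counts[ctx][ch] + 1) / (totals[ctx] + alphabet_size)
        bits += -math.log2(p)
        counts[ctx][ch] += 1
        totals[ctx] += 1
    return bits

sample = "the cat sat on the mat. " * 20
total_bits = estimated_compressed_bits(sample)
print(f"{len(sample)} bytes in, about {total_bits / 8:.0f} bytes out")
```

Dividing the total by the text length gives bits per letter, and 2 to that power is the per-letter perplexity, so the two scores come from the same predictions.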
Another thing my current code doesn’t do yet is robust non-exact matching for typos, and time-delay similarity of positions, so “hor5e the” should partially activate “that zebra” because horse = zebra to some degree and sits at a similar position in the order, convolutionally passed up to higher layers (by using time-delay “windows”).
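A rough sketch of what that partial activation could look like, under loud assumptions: the vocabulary and the relatedness numbers are invented, spelling similarity comes from Python's `difflib.SequenceMatcher`, and the position discount (`position_decay`) is a simple stand-in for time-delay windows, not a real convolutional layer.

```python
from difflib import SequenceMatcher

# Hypothetical data: a tiny vocabulary and hand-picked relatedness scores.
VOCAB = ["horse", "zebra", "the", "that"]
RELATED = {
    frozenset(("horse", "zebra")): 0.7,
    frozenset(("the", "that")): 0.6,
}

def spelling_sim(a: str, b: str) -> float:
    """Spelling similarity in [0, 1], tolerant of typos."""
    return SequenceMatcher(None, a, b).ratio()

def normalize(word: str):
    """Map a possibly misspelled word to its closest vocabulary word and match strength."""
    best = max(VOCAB, key=lambda v: spelling_sim(word, v))
    return best, spelling_sim(word, best)

def word_sim(a: str, b: str) -> float:
    """Similarity of two words: typo match strength times semantic relatedness."""
    wa, sa = normalize(a)
    wb, sb = normalize(b)
    rel = 1.0 if wa == wb else RELATED.get(frozenset((wa, wb)), 0.0)
    return sa * sb * rel

def activation(query: str, memory: str, position_decay: float = 0.5) -> float:
    """How strongly `query` activates `memory`, discounting words that sit further apart."""
    q, m = query.split(), memory.split()
    total = 0.0
    for i, qw in enumerate(q):
        total += max(word_sim(qw, mw) * position_decay ** abs(i - j)
                     for j, mw in enumerate(m))
    return total / len(q)

print(activation("hor5e the", "that zebra"))  # partial activation
print(activation("hor5e the", "horse the"))   # much stronger activation
```

The typo only weakens the match instead of killing it, and a related word in a nearby position still lights up the stored phrase a little.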
Below I present 4 really interesting things that fit together:
Predicting the next word/letter accurately (or doing translation semantically) lets you navigate the cause=effect maze of possibilities. Try GPT-2 to see: https://talktotransformer.com/
We can see in Facebook’s Blender and PPLM, in the links below, that they use reward dialog goals to “steer” the predicted words down a desired path in the “maze” of thoughts/possibilities, to make the future it wants. I said this before seeing their work. It therefore knows how to kind of correctly explain a way (the How) towards its goals (the Where in the maze). It can take an unseen question (goal, starting condition) and will be forced to answer it while making sure certain things are seen, for example that no one is harmed (end/intermediate conditions).
Now, answering hard questions/problems like cancer or immortality isn’t straightforward. It will need to update/specialize to new domains of data and make new sub-goals that help it get to the root answers it wants. Like me: I wanted immortality, so I took up AI, and then I took up a lot of other things like models, induction, etc. Like robots that learn how to walk using RL, you keep updating the source of data to look into, e.g. some particular motor action, website, or question. It tweaks its way to some specialized data domain. We’re going to give AGI just a few root goal rewards, for food and immortality feature nodes, and let it update to new goals/questions and recursively seek new data from those.
To do that (make new sub-goals), it needs to know what a related goal is. As AGI predicts, it must therefore recognize semantic features or “exit cutoffs” that are semantically related goals or “answers”, and leak neurotransmitter reward to them, because it needs more, and more specific, data. It repeats this until it solves the root goal (sees a satisfactory answer that aligns with a large amount of data). This infection of other related nodes makes them Permanently Active Nodes, so it now has a “thing” for AI and talks a lot about all things AI.
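The reward leak can be sketched as spreading activation over a graph of related nodes. This is only an illustration under assumptions: the node names, relatedness weights, `leak` factor, and `threshold` are all invented, and `spread_reward` is a hypothetical helper, not an existing algorithm from Blender or PPLM.

```python
def spread_reward(root_rewards, related, leak=0.5, threshold=0.1, rounds=3):
    """Leak reward from root goals to related nodes, a few hops deep.

    root_rewards: node -> reward for the hard-wired root goals.
    related: node -> {neighbor: relatedness in [0, 1]} (hypothetical weights).
    Returns the nodes whose reward ends up at or above `threshold`,
    i.e. the ones that would stay "permanently active".
    """
    reward = dict(root_rewards)
    for _ in range(rounds):
        updates = {}
        for node, r in reward.items():
            for neighbor, sim in related.get(node, {}).items():
                leaked = r * sim * leak
                # Keep only the strongest reward a node has received so far.
                if leaked > max(reward.get(neighbor, 0.0), updates.get(neighbor, 0.0)):
                    updates[neighbor] = leaked
        reward.update(updates)
    return {n: r for n, r in reward.items() if r >= threshold}

# Invented example: immortality leaks reward to AI, which leaks it further.
related = {
    "immortality": {"AI": 0.9, "medicine": 0.8},
    "AI": {"models": 0.8, "induction": 0.7},
}
active = spread_reward({"food": 1.0, "immortality": 1.0}, related)
print(sorted(active, key=active.get, reverse=True))
```

Each hop weakens the reward, so sub-goals close to a root goal get a strong pull and distant ones fade out below the threshold.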
So in summary, it predicts by frequency, relevance, recency (temp energy), and reward. And as it predicts, it infects related nodes (with reward transmitter) so it updates to new questions/domains and can specialize deeper there to collect data.
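As a toy illustration of mixing those four signals, the sketch below multiplies them into one score per candidate word and normalizes. The multiplicative mix, the `half_life` recency decay, and every name here are assumptions chosen for simplicity; other ways of combining the signals are just as plausible.

```python
def predict_next(candidates, counts, relatedness, last_seen, reward, step, half_life=50.0):
    """Return a probability for each candidate word.

    counts: word -> how often it followed the current context (frequency).
    relatedness: word -> similarity to the current context, in [0, 1].
    last_seen: word -> step at which it was last activated (recency).
    reward: word -> reward attached to its node.
    """
    scores = {}
    for w in candidates:
        freq = counts.get(w, 0) + 1  # add-one so unseen words stay possible
        rel = relatedness.get(w, 0.0)
        # Recency boost halves every `half_life` steps since the word was last seen.
        rec = 0.5 ** ((step - last_seen[w]) / half_life) if w in last_seen else 0.0
        scores[w] = freq * (1 + rel) * (1 + rec) * (1 + reward.get(w, 0.0))
    total = sum(scores.values())
    return {w: s / total for w, s in scores.items()}

probs = predict_next(
    candidates=["cat", "dog"],
    counts={"cat": 3, "dog": 3},
    relatedness={},
    last_seen={},
    reward={"dog": 1.0},  # invented reward on "dog"
    step=10,
)
print(probs)
```

With equal frequency, the rewarded word wins, which is the "steering" effect in miniature.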