
LLM on SSD

I have a general feeling that an LLM may not absolutely need a large amount of fast memory to run.

There is a paper about this called "LLM in a Flash". I need to read it first.

To be honest, I expected more than "up to twice the size of the available DRAM". What about 10x? 100x? What's the point of using an LLM if you can't use a large one?

The 4-5x (on CPU) and 20-25x (on GPU) increases in inference speed are interesting, though. But that's not the point.
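
From what I understand, the general idea is to keep the weights on flash and only bring into DRAM the small slice of parameters actually needed at each step. Here is a minimal toy sketch of that principle in Python with numpy (not the paper's actual method, and the file name and sizes are made up):

    # Toy sketch: keep a large weight matrix on the SSD as a memory-mapped
    # file and read only the rows needed for the current step, so resident
    # RAM stays far smaller than the full model.
    import numpy as np

    ROWS, COLS = 10_000, 1_024    # hypothetical layer dimensions
    PATH = "weights.bin"          # hypothetical weight file on the SSD

    # One-time setup: write random weights to disk (stand-in for a real checkpoint).
    np.random.default_rng(0).standard_normal((ROWS, COLS), dtype=np.float32).tofile(PATH)

    # Map the file without loading it; the OS pulls in pages only when they are accessed.
    weights = np.memmap(PATH, dtype=np.float32, mode="r", shape=(ROWS, COLS))

    def partial_matvec(x, active_rows):
        # Compute only the output coordinates in active_rows,
        # touching just those rows on disk.
        return weights[active_rows] @ x

    x = np.ones(COLS, dtype=np.float32)
    y = partial_matvec(x, active_rows=[0, 42, 9_999])  # reads ~3 rows, not the whole matrix
    print(y.shape)                                     # (3,)

The OS page cache does the heavy lifting in this sketch; the paper apparently goes much further with sparsity-aware loading, but the principle I'm curious about is the same: DRAM only holds what the current step actually touches.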

The cost of running an LLM

LLMs can't be limited by memory forever, and they can't always run on large, expensive cloud servers. The public doesn't understand the insane ecological and economic cost of running an LLM. It really is a shame to use an LLM for simple requests like "what is the weather today?" or "what is the capital of France?".

And yes, I'm aware of how ironic it is to use an LLM to write about the ecological cost of using an LLM. I'm not sure whether it's funny or sad.

The future

I don't fully understand the paper, so that's it for now, until something else comes up. The paper was published by Apple; if there is value in this research, it will show up in some Apple product in the future.

Last modified: 23 January 2024