IT Home reported on June 7 that ModelBest (面壁智能) released its on-device model MiniCPM 4.0 on the evening of June 6. The company said the new model can run up to 220 times faster in extreme scenarios with its self-developed CPM.cu inference framework, and that it can also be deployed on vLLM, SGLang, LlamaFactory, and other frameworks.
The 8B "Lightning Sparse Edition" in this release uses what the company calls an innovative sparse architecture to "set off an efficiency storm"; the other model, at 0.5B parameters, is billed as "the strongest small cannon, light and nimble."
According to the official introduction, the MiniCPM 4.0 series of LLMs comes in two parameter scales: 8B and 0.5B. A single attention architecture struggles to serve both long-text and short-text scenarios well, so MiniCPM 4.0-8B adopts an "efficient dual-frequency shift" mechanism that automatically switches attention mode according to the characteristics of the task. For demanding long-text and deep-thinking tasks it enables sparse attention to reduce computational complexity; for short-text scenarios it switches to dense attention to preserve accuracy. The result, the company says, is an efficient response across both long and short texts.
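The switching idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the article does not say how MiniCPM 4.0 decides when to switch or what its sparse pattern looks like, so the hypothetical `choose_mode` uses sequence length as the switching signal, and a simple causal sliding window stands in for the model's actual sparse attention.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(q, keys, values, idxs):
    # Scaled dot-product attention of one query over the positions in idxs.
    scale = math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, keys[j])) / scale for j in idxs]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * values[j][d] for w, j in zip(weights, idxs)) for d in range(dim)]

def choose_mode(seq_len, threshold=8):
    # Hypothetical switching rule: long inputs use sparse attention.
    return "sparse" if seq_len > threshold else "dense"

def causal_attention(queries, keys, values, threshold=8, window=4):
    mode = choose_mode(len(queries), threshold)
    outputs = []
    for i, q in enumerate(queries):
        if mode == "dense":
            # Dense: attend to every earlier position, O(n^2) overall.
            idxs = list(range(i + 1))
        else:
            # Sparse stand-in: attend only to the last `window` positions,
            # O(n * window) overall.
            idxs = list(range(max(0, i - window + 1), i + 1))
        outputs.append(attend(q, keys, values, idxs))
    return mode, outputs
```

With the default threshold, a 3-token input takes the dense path and a 12-token input takes the sparse path; the cost per query in the sparse path is bounded by `window` rather than growing with the sequence length, which is the trade-off the article attributes to the mechanism.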
According to IT Home, MiniCPM 4.0 can be deployed on open-source frameworks such as vLLM, SGLang, LlamaFactory, and XTuner. The self-developed CPM.cu is a high-speed on-device inference framework that, according to the company, delivers a 90% reduction in model size along with faster inference through advances in three areas: speculative sampling, model compression and quantization, and the on-device deployment framework. The company claims this makes on-device inference smooth "from innate to lifelong."