AI Breakthrough: LFM2.5 - Game Changer!
Liquid AI recently released LFM2.5-VL-450M, an updated vision-language model designed for compute-constrained environments. The model has a 450M-parameter footprint and adds bounding box prediction, improved instruction following, and expanded multilingual understanding. It is compatible with hardware such as the NVIDIA Jetson Orin and the Samsung S25 Ultra. Testing showed per-image processing times of 233ms for 256x256 inputs on Jetson Orin and 950ms on the Samsung S25 Ultra. The model's performance, as measured by benchmarks including POPE, OCRBench, and MMBench, improved over previous iterations. Ultimately, LFM2.5-VL-450M represents a significant step forward in deploying advanced AI capabilities within resource-limited settings.
LFM2.5-VL-450M: A New Era of Edge Vision-Language Models
Liquid AI has released LFM2.5-VL-450M, a significant update to its vision-language model family, designed for deployment on edge hardware. This new release addresses key limitations of previous models, offering improved performance and capabilities within a compact 450M-parameter footprint.
What are Vision-Language Models (VLMs)?
A vision-language model (VLM) is an AI system capable of processing both images and text. Users can provide a photo and ask questions about it in natural language, and the model will generate a relevant response. Traditional VLMs require substantial GPU memory and cloud infrastructure, making them unsuitable for resource-constrained environments like warehouse robots or retail shelf cameras. LFM2.5-VL-450M represents Liquid AI's solution to this challenge, enabling VLM functionality on edge devices.
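To make the interaction pattern concrete, here is a minimal sketch of sending a photo and a question to a small VLM through the Hugging Face transformers image-text-to-text interface. The checkpoint name, prompt, and image path are illustrative assumptions, not details confirmed by the release.

```python
# Minimal sketch of the photo-plus-question interaction described above,
# using the Hugging Face transformers image-text-to-text API.
# The model ID below is a placeholder, not a confirmed identifier for this release.
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "LiquidAI/LFM2.5-VL-450M"  # hypothetical checkpoint name
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id)

image = Image.open("shelf.jpg")  # any local photo
conversation = [
    {"role": "user", "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "Which shelf slots look empty?"},
    ]},
]

# Build model inputs from the chat turn (image + question) and generate a reply.
inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
)
output = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(output, skip_special_tokens=True)[0])
```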
Model Architecture and Technical Specifications
The foundation of LFM2.5-VL-450M is the LFM2.5-350M language model, coupled with the SigLIP2 NaFlex shape-optimized 86M vision encoder. The model has a 32,768-token context window and a 65,536-token vocabulary. Notably, LFM2.5-VL-450M supports native-resolution processing up to 512x512 pixels without upscaling, preserving non-standard aspect ratios and employing a tiling strategy to handle large images efficiently. This tiling approach splits large images into 512x512 patches and adds a thumbnail encoding to maintain global scene context, a critical feature often missing in simpler tiling methods. Users can dynamically adjust the maximum image tokens and tile count during inference, trading speed against quality based on the available compute budget.
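As a rough illustration of the tiling idea, the sketch below splits an oversized image into 512x512 patches and keeps a downscaled thumbnail for global context. It follows the strategy described above in spirit only; the actual tile grid, overlap handling, and thumbnail resolution used by LFM2.5-VL-450M's preprocessor are not specified here, and the numbers are assumptions.

```python
# Illustrative tiling: split a large image into 512x512 patches and keep a
# thumbnail for global scene context. Tile size and thumbnail size are
# assumptions, not the model's documented preprocessing parameters.
from PIL import Image

TILE = 512

def tile_image(path: str, thumb_size: int = 256):
    img = Image.open(path).convert("RGB")
    w, h = img.size

    # Downscaled copy preserves the global layout; thumbnail() keeps aspect ratio.
    thumbnail = img.copy()
    thumbnail.thumbnail((thumb_size, thumb_size))

    # Crop the full-resolution image into non-overlapping 512x512 patches.
    tiles = []
    for top in range(0, h, TILE):
        for left in range(0, w, TILE):
            box = (left, top, min(left + TILE, w), min(top + TILE, h))
            tiles.append(img.crop(box))
    return thumbnail, tiles

thumb, tiles = tile_image("warehouse_aisle.jpg")
print(f"{len(tiles)} tiles of up to {TILE}x{TILE}px, plus a {thumb.size} thumbnail")
```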
Training and Scaling of LFM2.5-VL-450M
Liquid AI substantially scaled the pre-training phase for LFM2.5-VL-450M. The model was trained on 28TB of token data, up from the 10TB used for the previous LFM2-VL-450M. Post-training, the model leveraged preference optimization and reinforcement learning to refine its grounding, instruction following, and overall reliability across a range of vision-language tasks. This training process resulted in measurable improvements across key benchmarks.
Key Enhancements and Benchmark Results
The most notable addition to LFM2.5-VL-450M is bounding box prediction, reflected in a score of 81.28 on the RefCOCO-M benchmark, up from zero on the previous model. This capability lets the model output structured JSON with normalized coordinates identifying objects in a scene, moving beyond simple image captioning to provide spatial information. Multilingual support was also substantially improved, with MMMB scores rising to 68.09 across Arabic, Chinese, French, German, Japanese, Korean, Portuguese, and Spanish. Instruction following improved as well, with MM-IFEval scores increasing to 45.00, indicating more reliable adherence to constraints within prompts. Function calling support, measured by BFCLv4 at 21.08, was added, allowing the model to invoke external tools in agentic pipelines.

Across a suite of benchmarks, LFM2.5-VL-450M consistently outperformed previous models. Notable scores include 86.93 on POPE, 684 on OCRBench, 60.91 on MMBench (dev en), and 58.43 on RealWorldQA. Two benchmarks showed particularly large gains: MMVet, which tests open-ended visual understanding, improved from 33.85 to 41.10, and CountBench, which evaluates object counting, increased from 47.64 to 73.31. InfoVQA held roughly flat at 43.02 versus 44.56 on the prior model. Language-only benchmarks also improved, with IFEval rising from 51.75 to 61.16 and Multi-IF from 26.21 to 34.63. Despite these gains, LFM2.5-VL-450M does not win across all tasks; the MMMU (val) score dropped slightly from 34.44 to 32.67, indicating that the model isn't the best choice for knowledge-intensive tasks or fine-grained OCR.
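Since the grounding output is structured JSON with normalized coordinates, downstream code can turn it into pixel-space boxes in a few lines. The exact schema the model emits is not documented in this article, so the field names and the [x1, y1, x2, y2] convention below are assumptions used only for illustration.

```python
# Hypothetical grounding output: normalized [x1, y1, x2, y2] boxes in JSON.
# Field names and coordinate convention are assumptions for illustration only.
import json

model_output = '''
[{"label": "forklift", "bbox": [0.12, 0.40, 0.55, 0.92]},
 {"label": "pallet",   "bbox": [0.60, 0.55, 0.88, 0.95]}]
'''

def to_pixel_boxes(raw: str, width: int, height: int):
    """Convert normalized 0-1 boxes to integer pixel coordinates."""
    boxes = []
    for det in json.loads(raw):
        x1, y1, x2, y2 = det["bbox"]
        boxes.append({
            "label": det["label"],
            "box_px": (int(x1 * width), int(y1 * height),
                       int(x2 * width), int(y2 * height)),
        })
    return boxes

for det in to_pixel_boxes(model_output, width=512, height=512):
    print(det)
```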
Performance Across Diverse Hardware Platforms
LFM2.5-VL-450M's design prioritizes efficient operation on edge hardware. The model uses Q4_0 quantization and runs across a range of devices, including the Jetson Orin, AMD Ryzen AI Max+ 395, and Samsung S25 Ultra. Latency measurements illustrate the model's performance: on Jetson Orin, image processing takes 233ms at 256x256 and 242ms at 512x512, keeping per-image latency under 250ms, which is fast enough for real-time video streams. On the Samsung S25 Ultra, latency is 950ms at 256x256 and 2.4 seconds at 512x512, still responsive for interactive applications. On the AMD Ryzen AI Max+ 395, processing takes 637ms at 256x256 and 944ms at 512x512, under one second at the smaller resolution.
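One way to read these latency figures is as a per-frame budget: dividing one second by the per-image latency gives the approximate frame rate each device could sustain. The short script below simply restates the measurements quoted above.

```python
# Convert the reported per-image latencies into approximate sustained frame rates.
# Values are the Q4_0 measurements quoted in the article for 256x256 and 512x512 inputs.
latencies_ms = {
    "Jetson Orin":           {"256x256": 233, "512x512": 242},
    "AMD Ryzen AI Max+ 395": {"256x256": 637, "512x512": 944},
    "Samsung S25 Ultra":     {"256x256": 950, "512x512": 2400},
}

for device, per_res in latencies_ms.items():
    for res, ms in per_res.items():
        fps = 1000.0 / ms  # images per second at this latency
        print(f"{device:>22} @ {res}: {ms:>5} ms/image ~ {fps:.1f} frames/s")
```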
Applications and Use Cases
LFM2.5-VL-450M is particularly well-suited to real-world deployments where low latency, compact structured outputs, and efficient semantic reasoning are crucial. This includes industrial automation, passenger vehicles, agricultural machinery, and warehouses, where compute constraints often limit perception models to bounding-box outputs. LFM2.5-VL-450M goes further, providing grounded scene understanding in a single pass. In a warehouse aisle, for example, it can describe worker actions, forklift movement, and inventory flow while still fitting on existing edge hardware like a Jetson Orin. For wearables and always-on monitoring, devices such as smart glasses, body-worn assistants, dashcams, and security or industrial monitors can benefit from an efficient VLM that produces compact structured outputs.