Multimodal AI Is the Next Frontier for Foundation Model Labs
Single-modality models are no longer sufficient. Leading research labs are rapidly advancing multimodal systems that combine language, vision, audio, and sensor inputs to interpret the world more holistically. For audio in particular, capturing accents, emotions, and natural conversational speech is critical. These systems will require new architectures and massive volumes of high-quality data that reflect real-world complexity.
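To make the idea of combining modalities concrete, here is a minimal sketch of one common pattern, late fusion of per-modality encoders into a shared representation. It is illustrative only, not any particular lab's architecture; the class name `LateFusionModel`, the dimensions, and the random inputs are all hypothetical placeholders.

```python
import torch
import torch.nn as nn

class LateFusionModel(nn.Module):
    """Toy multimodal model: encode each modality separately, then fuse."""

    def __init__(self, text_vocab=10_000, image_dim=2048, audio_dim=128, hidden=512):
        super().__init__()
        # Each modality gets its own encoder mapping raw features to a shared width.
        self.text_encoder = nn.EmbeddingBag(text_vocab, hidden)   # token ids -> mean-pooled embedding
        self.image_encoder = nn.Linear(image_dim, hidden)         # e.g. pre-extracted vision features
        self.audio_encoder = nn.Linear(audio_dim, hidden)         # e.g. pooled spectrogram features
        # Fusion head: concatenate modality embeddings, then project.
        self.fusion = nn.Sequential(
            nn.Linear(3 * hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
        )

    def forward(self, token_ids, image_feats, audio_feats):
        t = self.text_encoder(token_ids)      # (batch, hidden)
        v = self.image_encoder(image_feats)   # (batch, hidden)
        a = self.audio_encoder(audio_feats)   # (batch, hidden)
        return self.fusion(torch.cat([t, v, a], dim=-1))

# Toy usage with random inputs (hypothetical shapes).
model = LateFusionModel()
tokens = torch.randint(0, 10_000, (4, 16))    # batch of 4 sequences, 16 tokens each
images = torch.randn(4, 2048)
audio = torch.randn(4, 128)
print(model(tokens, images, audio).shape)     # torch.Size([4, 512])
```

Production systems differ in where and how fusion happens (early, late, or cross-attention based), but the core point stands: each modality needs its own high-quality training data before the shared representation is useful.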