<p>Advances in machine learning have led to the development of foundation models for atomistic materials chemistry, enabling quantum-accurate descriptions of interatomic forces across chemically diverse compounds at reduced computational cost. Until now, the accuracy and utility of these models have been assessed using descriptors based on formation energies or idealized harmonic atomic vibrations. Yet the rigorous and physically interpretable quantification of their capability to describe both realistic anharmonic atomic dynamics and technologically relevant observables remains a pressing problem. Here, we address this problem by leveraging the Wigner formulation of heat transport and the Grüneisen approach to thermal expansion to connect the atomic-physics awareness of foundation models to their utility in predicting experimentally observable thermomechanical properties, and we present the standards and fine-tuning protocols needed to achieve first-principles accuracy. We apply our framework to a database of 103 solids with diverse compositions and structures, demonstrating that it overcomes the major bottlenecks of current methods for designing heat-management materials (high cost, limited transferability, or lack of physics awareness) and showing its potential to discover materials for next-generation technologies ranging from thermal insulation to neuromorphic computing.</p>