Python's performance won over ML
Python dominates the machine learning landscape. This is partly down to its dedication to performance, or rather the lack of it: I can think of no other mainstream language that ranks performance so low among its priorities. But that is exactly what set Python up well.
Before we got to GPU-driven many-billion-parameter models, people were already doing compute-intensive numerical work in Python, or at least Python was the veneer over highly optimised Fortran or C code that made it far easier to work with.
Python's lack of performance forced libraries to offload compute; importantly, even transferring data between Python and extensions is slow if done in small pieces. This pushed these libraries toward APIs where you hand over the data you need early and infrequently, then work with references to it from Python code. That is fundamentally what a PyTorch tensor or a NumPy array is: a handle to data that lives outside Python.
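A minimal NumPy sketch of that idea: the data crosses into the C layer once, at construction, and the names Python holds afterwards are just references into the same buffer.

```python
import numpy as np

# One coarse transfer: a Python range becomes a C-backed buffer.
data = np.asarray(range(10), dtype=np.float64)

# Python-side names are just handles; a slice is a *view*
# into the same buffer, not a copy back into Python objects.
window = data[2:8]
window += 1.0        # mutates the shared buffer in place, in C

# The change is visible through the original handle,
# because both names reference the same underlying memory.
print(data[2])       # the element that was 2.0 is now 3.0
```

No per-element traffic ever crosses the Python/C boundary here; only the two handles do.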
You perform a few high-level logical operations on masses of data, sorting or summing it for example, so writing this orchestration layer in Python has little impact on speed. The number crunching that would take a long time in Python is handled outside of it.
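A sketch of that division of labour, using NumPy (the variable names are illustrative): a couple of coarse calls from Python replace a per-element interpreter loop.

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.integers(0, 100, size=500_000)

# The Python layer issues a handful of coarse operations...
ordered = np.sort(values)    # the sort runs in compiled code
total = values.sum()         # so does the reduction

# ...instead of a per-element Python loop like this one,
# which pays interpreter overhead on every iteration:
slow_total = 0
for v in values.tolist():
    slow_total += v
```

Both totals agree; the difference is that the loop does half a million trips through the interpreter while `values.sum()` does one.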
This is a perfect match for interacting with GPUs. Even more than CPU number crunching, GPUs depend on you shipping them the data to munch on ahead of time. Users of Python libraries were already used to this mentality, and so were library authors.
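A hedged PyTorch sketch of the same mentality applied to a GPU, assuming PyTorch is installed; it falls back to the CPU when no GPU is present, and the names `a`, `b`, `device` are illustrative.

```python
import torch

# Pick a device; the pattern is identical either way.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Hand the data to the device up front...
a = torch.arange(6, dtype=torch.float32, device=device).reshape(2, 3)
b = torch.ones(3, 2, device=device)

# ...then operate through references; intermediates stay
# on the device and never cross back into Python.
c = (a @ b).sum()

# The only transfer back: one scalar, requested explicitly.
result = c.item()
```

The Python code only ever juggles tensor handles; the matrix multiply and the reduction run wherever the data already lives.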