In the race for AI supremacy, the competition between processor developers is fierce. Google asserts that its latest Tensor chips, also known as Tensor Processing Units (TPUs), are the fastest, most energy-efficient, and most optimized processors on the market, outperforming NVIDIA's widely used A100 chips.
NVIDIA's A100 processors are currently the go-to chips for AI and machine-learning workloads at major software companies, though NVIDIA's newer H100 is even more advanced than the A100.
Google's TPUs have made significant advances. The tech giant recently disclosed new details about the supercomputers it uses for AI model training, claiming that its systems are both faster and more power-efficient than NVIDIA's comparable offerings.
Google developed its proprietary processor, the TPU, which is now in its fourth generation. The company uses TPUs for more than 90% of its AI training, the process of feeding data through models until they can perform useful tasks.
Google recently published an article detailing how it connected over 4,000 TPUs into a supercomputer using its custom-developed optical switches to facilitate connections between individual machines.
As large language models like Google's Bard or OpenAI's ChatGPT have grown in size, enhancing connections has become a critical competitive factor for AI supercomputer creators. These models must be distributed across thousands of processors, which then collaborate for weeks or months to train the model. Google's PaLM model, the company's largest publicly disclosed language model, was trained over 50 days using two 4,000-chip supercomputers.
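To see why those thousands of chips must collaborate, consider the data-parallel portion of distributed training: each chip holds a replica of the model, computes gradients on its own shard of the batch, and the interconnect averages the gradients (an all-reduce) so every replica applies the same update. The sketch below simulates this in plain Python; all function names are hypothetical, and real systems run frameworks such as JAX or TensorFlow on actual TPU hardware rather than anything like this toy loop.

```python
# Minimal sketch of data-parallel training, the scheme used when one model
# is replicated across many chips. Illustrative only, not Google's code.

def local_gradient(w, xs, ys):
    """Gradient of mean-squared error for y = w * x on one device's shard."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def all_reduce_mean(grads):
    """Average per-device gradients, as the interconnect's all-reduce would."""
    return sum(grads) / len(grads)

def train_step(w, batch_x, batch_y, num_devices):
    """Shard the batch, compute gradients 'in parallel', then synchronize."""
    shard = len(batch_x) // num_devices
    grads = [
        local_gradient(w,
                       batch_x[i * shard:(i + 1) * shard],
                       batch_y[i * shard:(i + 1) * shard])
        for i in range(num_devices)
    ]
    return w - 0.05 * all_reduce_mean(grads)

# Fit y = 3x from synthetic data on 4 simulated devices.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [3 * x for x in xs]
w = 0.0
for _ in range(200):
    w = train_step(w, xs, ys, num_devices=4)
print(round(w, 3))  # → 3.0
```

Because the gradient averaging happens once per step across every chip, the speed of the interconnect, not just the chips themselves, bounds how fast training can go, which is why Google highlights its switching fabric.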
Google's supercomputers allow for easy reconfiguration of connections between processors, avoiding issues and optimizing performance. In a blog post about the system, Google Fellow Norm Jouppi and Google Distinguished Engineer David Patterson wrote, "Circuit switching makes it easy to route around failed components... because of this flexibility, we can even change the topology of the supercomputer interconnect to accelerate the performance of an ML (machine learning) model."
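The idea of routing around failed components can be illustrated with a toy sketch: rebuild the interconnect topology, here a simple ring, over only the healthy chips. In Google's system this reconfiguration happens in the optical circuit switches themselves; the function name and ring layout below are purely hypothetical.

```python
# Toy sketch of "routing around failed components": rebuild a ring
# interconnect over only the healthy chips. Conceptual illustration only;
# real TPU pods use optical circuit switches and richer topologies.

def ring_topology(num_chips, failed):
    """Return (src, dst) links forming a ring over the healthy chips."""
    healthy = [c for c in range(num_chips) if c not in failed]
    if len(healthy) < 2:
        return []  # nothing to connect
    return [(healthy[i], healthy[(i + 1) % len(healthy)])
            for i in range(len(healthy))]

# Chips 2 and 5 have failed; the ring simply skips them.
links = ring_topology(8, failed={2, 5})
print(links)  # → [(0, 1), (1, 3), (3, 4), (4, 6), (6, 7), (7, 0)]
```

The same mechanism that skips failed chips can also pick among different healthy topologies, which is the flexibility Jouppi and Patterson describe when they say the interconnect can be reshaped to accelerate a particular model.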
Google has been operating its supercomputer since 2020 in a data center in Mayes County, Oklahoma. The company claims that its chips are up to 1.7 times faster and 1.9 times more power-efficient than a system based on NVIDIA's A100 chip. Google did not compare its fourth-generation processor to NVIDIA's H100, which came to market after Google's chip and is built on newer technology.
Google hinted that a new TPU designed to compete with the NVIDIA H100 is in development but provided no further details, saying only that it has a healthy pipeline of chips in the works.