
Researchers develop solutions to make AI efficient and accessible

Anshumali Shrivastava (Photo by Jeff Fitlow/Rice University)

A team of researchers at Rice University, led by associate professor Anshumali Shrivastava, has developed innovative methods to improve the efficiency and customization of large language models (LLMs), a significant step toward making artificial intelligence (AI) more accessible and sustainable.

The team's research addresses the substantial computational power and energy consumption required by modern AI models, which often render them expensive, environmentally taxing, and inaccessible to smaller organizations. Their work was showcased in three papers presented at the Neural Information Processing Systems (NeurIPS) conference in Vancouver, British Columbia, last December.

"Generative artificial intelligence is still in its infancy when it comes to broader integration," said Shrivastava, who holds positions in computer science, electrical and computer engineering, and statistics, and is a member of Rice's Ken Kennedy Institute. "We have a long way to go till we see the full potential of this technology in play."

One of the key innovations introduced by Shrivastava's team is the "Sketch Structured Transforms (SS1)" method. This technique employs parameter sharing to reduce the memory and computation needs of AI models while maintaining their accuracy. Applied to popular LLMs, SS1 achieved a processing speed increase of over 11 percent without necessitating additional fine-tuning.
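Parameter sharing of this kind generally works by backing a large "virtual" weight matrix with a much smaller bank of shared values, so the layer keeps its shape while its memory footprint shrinks. The sketch below illustrates only that general principle, not the SS1 algorithm itself; the hash-based bucket mapping, the `HashedSharedLinear` class, and all sizes are illustrative assumptions.

```python
# Minimal sketch of weight sharing via hashing. A virtual out_dim x in_dim
# weight matrix is backed by a much smaller parameter bank, so memory
# shrinks while the layer's interface stays the same. This is NOT the SS1
# method; the bucket-hashing scheme is an illustrative assumption.
import numpy as np

class HashedSharedLinear:
    def __init__(self, in_dim, out_dim, bank_size, seed=0):
        rng = np.random.default_rng(seed)
        self.bank = rng.standard_normal(bank_size) * 0.02  # shared parameters
        # Each virtual weight (i, j) maps to a fixed slot in the bank.
        self.idx = rng.integers(0, bank_size, size=(out_dim, in_dim))

    def weight(self):
        # Materialize the virtual weight matrix by table lookup.
        return self.bank[self.idx]

    def forward(self, x):
        return x @ self.weight().T

layer = HashedSharedLinear(in_dim=512, out_dim=512, bank_size=4096)
x = np.random.default_rng(1).standard_normal((2, 512))
print(layer.forward(x).shape)  # (2, 512)
print(layer.bank.size)         # 4096 parameters instead of 512*512 = 262144
```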

Additionally, the researchers developed the "NoMAD Attention" algorithm, enabling LLMs to operate efficiently on standard computer processors (CPUs) instead of the traditionally required graphics processing units (GPUs). This advancement allows AI tools to run directly on devices like phones or laptops, broadening their accessibility. "Our algorithm makes everything run twice as fast without any accuracy loss," noted Tianyi Zhang, a Rice doctoral student and first author on two of the papers.
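The broader family of techniques behind this approach replaces the multiply-add (MAD) operations in attention scoring with table lookups derived from product quantization. The sketch below shows that generic lookup trick in plain NumPy; NoMAD Attention's actual contribution, performing such lookups inside CPU SIMD registers, is not reproduced here, and the simple codebook construction shown is a simplifying assumption.

```python
# Sketch of lookup-based attention scoring via product quantization:
# key sub-vectors are quantized offline, then a query's scores against all
# keys are computed with lookups and additions instead of multiply-adds.
import numpy as np

rng = np.random.default_rng(0)
d, n_keys, n_sub, n_centroids = 64, 128, 8, 16  # 8 subspaces of width 8
sub = d // n_sub

keys = rng.standard_normal((n_keys, d))

# Offline: build a codebook per subspace (centroids sampled from the keys
# for brevity; k-means would be typical) and quantize each key sub-vector
# to its nearest centroid.
codebooks = np.stack([
    keys[rng.choice(n_keys, n_centroids, replace=False), s*sub:(s+1)*sub]
    for s in range(n_sub)
])
codes = np.empty((n_keys, n_sub), dtype=np.int64)
for s in range(n_sub):
    seg = keys[:, s*sub:(s+1)*sub]
    d2 = ((seg[:, None, :] - codebooks[s][None]) ** 2).sum(-1)
    codes[:, s] = d2.argmin(1)

# Online: precompute the query's dot product with every centroid, then
# score all keys using only table lookups and additions.
q = rng.standard_normal(d)
tables = np.stack([codebooks[s] @ q[s*sub:(s+1)*sub] for s in range(n_sub)])
scores = sum(tables[s, codes[:, s]] for s in range(n_sub))

print(np.corrcoef(scores, keys @ q)[0, 1])  # approximate scores track exact ones
```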

Addressing the challenge of managing context memory in large AI models, the team introduced "coupled quantization," a method for compressing memory without compromising response quality. By compressing related pieces of memory together, they achieved significant efficiency gains. "We found that we could shrink the memory down to just one bit per piece of information — basically the smallest possible size — while still preserving the model's performance," Zhang explained.
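Conceptually, "coupling" means quantizing groups of related memory channels together with one shared codebook rather than rounding each value independently, so correlations between channels absorb most of the quantization error. The sketch below demonstrates that idea at a one-bit-per-value budget; the synthetic data, plain k-means codebook, and group size are illustrative assumptions, not the team's actual method.

```python
# Sketch of jointly quantizing coupled channels. Grouping c=8 channels and
# coding each group with one of 256 codebook entries costs 8 bits per
# group, i.e. 1 bit per stored number.
import numpy as np

rng = np.random.default_rng(0)
c, n_codes, n_groups = 8, 256, 4096  # 8 coupled channels, 1 bit per value

# Toy "KV cache" entries with correlated channels (shared latent factor).
latent = rng.standard_normal((n_groups, 1))
data = latent + 0.3 * rng.standard_normal((n_groups, c))

# Fit the shared codebook with a few Lloyd (k-means) iterations.
codebook = data[rng.choice(n_groups, n_codes, replace=False)]
for _ in range(5):
    d2 = ((data[:, None, :] - codebook[None]) ** 2).sum(-1)
    assign = d2.argmin(1)
    for k in range(n_codes):
        members = data[assign == k]
        if len(members):
            codebook[k] = members.mean(0)

codes = assign.astype(np.uint8)   # 1 byte stored per 8 original floats
recon = codebook[codes]           # decompression is a single table lookup
err = np.mean((data - recon) ** 2) / np.mean(data ** 2)
print(f"relative reconstruction error: {err:.3f}")
```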

Shrivastava envisions a future where advanced AI is accessible to all organizations, not just tech giants. "We’re only scratching the surface of what AI can do, and already the energy and computing demands are significant," he said. "If we want a future where AI solves problems in health care, climate science, etc., we need to make it vastly more efficient. It is clear that the next frontier of efficiency in AI will come via algorithms."

The research received support from the National Science Foundation, the Ken Kennedy Institute, Adobe, and VMware.

Anshumali Shrivastava earned his doctorate in computer science from Cornell University in 2015 and holds an Integrated M.Sc. in mathematics and computing from the Indian Institute of Technology, Kharagpur, where he was awarded the Silver Medal for finishing at the top of his program.
 
