
Cloud vs on-premises: which is the best deployment option for LLMs?

Angelo Mosca
18 Mar 2024

Since the emergence of generative AI, a wide range of LLMs with different features has become available, supporting use cases across many areas.

Almost a year after such tools became publicly available, many enterprises are still struggling to evaluate which deployment option best meets their requirements. Is it better to use cloud-native solutions or to opt for an on-premises deployment?

Answering this question is not easy, because many factors must be taken into consideration. The beginning of 2023 witnessed incredible hype around generative AI, LLMs, and their disruptive capabilities, which first captured the attention of consumers. From there, an enormous number of firms, across all industries and business areas, started to explore the power of GenAI, trying to understand its potential impact on their businesses and on the productivity of their workforces.

In this continuously evolving landscape of GenAI players, Google Cloud quickly positioned itself as one of the leaders, releasing to its customers a set of powerful, enterprise-grade, ready-to-use tools for starting to work with generative AI.

In recent months, Google Cloud's generative AI offering has continued to evolve and consolidate around three clear concepts in its strategy: openness, ease of use, and responsibility.

The cloud advantage: Google Cloud offering

Google Cloud's generative AI offering is mainly focused on Vertex AI, which has become the core of all the cloud service provider's (CSP's) AI platform services. In particular, the most relevant GenAI components added to the AI/ML platform are:

  • Vertex AI Model Garden: following Google Cloud's openness and freedom-of-choice mantra, Vertex AI Model Garden is a comprehensive platform that lets customers choose, from a complete set of LLMs (by Google Cloud and third-party providers), the one that best fits the requirements of a specific scenario or use case, tuning and testing it to reach the best ratio between performance and cost. The "garden" is continuously fueled with new models (currently there are more than 130 enterprise-ready models to choose from), such as Gemini, Gemma, and Mistral AI;
  • Vertex AI Search: an easy-to-use service for quickly setting up Google-quality, multi-modal, multi-turn search experiences for customers and employees. It delivers relevant, personalized search experiences in minutes, for enterprise apps or consumer-facing websites, without requiring a technical background or skills;
  • Vertex AI Conversation: like Vertex AI Search, Conversation supports building custom chat and voice bots powered by Google Cloud's generative AI and grounded in specific enterprise data, according to the use case they are built for. It combines deterministic workflows with generative AI to make conversations more dynamic and personalized, thanks to multi-modal support.
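To make the model-access side of this concrete, the sketch below assembles the endpoint URL and JSON body for a text prompt against a Gemini model hosted on Vertex AI. It is a minimal illustration, not production code: the project ID, region, model name, and prompt are placeholder assumptions, and the request shape follows the Vertex AI `generateContent` REST API as of early 2024, which may evolve. In practice most teams would use the official Vertex AI SDK instead of raw HTTP.

```python
# Sketch: build (but do not send) a Vertex AI generateContent request.
# Project, region, and model values below are illustrative assumptions.

def build_generate_content_request(project: str, region: str,
                                   model: str, prompt: str) -> dict:
    """Assemble the endpoint URL and JSON body for a single text prompt."""
    url = (
        f"https://{region}-aiplatform.googleapis.com/v1/projects/{project}"
        f"/locations/{region}/publishers/google/models/{model}:generateContent"
    )
    body = {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        # Generation settings are optional; shown here for illustration.
        "generationConfig": {"temperature": 0.2, "maxOutputTokens": 256},
    }
    return {"url": url, "body": body}

request = build_generate_content_request(
    "my-project", "europe-west1", "gemini-1.0-pro",
    "Summarize our returns policy in two sentences.",
)
```

Sending `request["body"]` to `request["url"]` with an OAuth bearer token (e.g. via `gcloud auth print-access-token`) would return the model's response; swapping the model name is all it takes to try a different entry from Model Garden.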

These products, like all the others in the GenAI technology stack (e.g., Vertex AI Studio), are fully integrated and powered by the different flavors of the Gemini LLM, giving customers access to the very edge of innovation in this area.

The last (but not least) aspect to consider when looking at the Google Cloud GenAI offering is the availability of specific hardware (TPU v5e, L4 GPUs, A100 80GB, and H100) built into Google Cloud services to support GenAI training, tuning, and inference workloads.

With that in mind, it becomes easy to scope the advantages an enterprise could gain by choosing Google Cloud as the platform for running GenAI solutions and LLMs:

  • Up-to-speed innovation: leveraging Google Cloud platform services helps any enterprise adopt the latest innovations in generative AI as soon as they are ready. In the last 12 months, several new LLMs have been announced and launched, and they were integrated into Vertex AI within a few weeks of private/public preview;
  • Managed maintenance: enterprises spend no time or effort on maintenance, because check-ups, updates, and patching are fully managed by the Google Cloud team;
  • Unlimited access: Google Cloud platform services can be accessed without any location restrictions;
  • Flexible scalability: resources can be scaled up and down automatically according to specific needs, without any downtime. They are always ready to serve the use case at hand but, when no longer needed, can be decommissioned without further financial impact;
  • Lower starting costs thanks to economies of scale: the specific hardware (and software) required to support LLMs (GPUs in particular) demands huge upfront investments, which can be avoided by leveraging the economies of scale of a cloud platform.
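The last point is ultimately a break-even calculation: renting capacity costs more per hour, while buying it shifts cost upfront. The sketch below works through that arithmetic with purely illustrative numbers (the hourly rate, purchase price, and running cost are assumptions, not vendor pricing) to show how utilization determines which option is cheaper.

```python
# Back-of-the-envelope break-even between renting a GPU in the cloud
# and buying one on-premises. All figures are illustrative assumptions.

CLOUD_GPU_PER_HOUR = 3.00       # assumed on-demand hourly rate (USD)
ONPREM_GPU_UPFRONT = 25_000.0   # assumed purchase price (USD)
ONPREM_RUNNING_PER_HOUR = 0.50  # assumed power/cooling/ops cost (USD)

def break_even_hours(cloud_hr: float, upfront: float, onprem_hr: float) -> float:
    """Utilization (in hours) at which on-prem total cost matches cloud.

    cloud_hr * h = upfront + onprem_hr * h  =>  h = upfront / (cloud_hr - onprem_hr)
    """
    return upfront / (cloud_hr - onprem_hr)

hours = break_even_hours(CLOUD_GPU_PER_HOUR, ONPREM_GPU_UPFRONT,
                         ONPREM_RUNNING_PER_HOUR)
# Below this utilization, pay-per-use is cheaper; above it, buying wins.
```

With these assumed figures the break-even lands at 10,000 hours, i.e. well over a year of continuous use; a real analysis would also fold in depreciation, staffing, and the committed-use discounts cloud providers offer.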

On-premises solutions and their benefits

While several enterprises have started their generative AI journey by leveraging cloud services in a typical "try-and-buy" approach, those a few steps further ahead are starting to consider on-premises deployment as an alternative to the cloud, for reasons that can be technical, business-related, or regulatory.

Even if it can seem strange at first glance, on-premises deployment can be a good fit in specific scenarios and can bring several advantages to enterprises:

  • Data safety: deploying LLMs and generative AI solutions on-premises gives the enterprise the highest possible level of control over its data, which can be a paramount requirement, particularly in highly regulated industries;
  • Low dependency: with on-premises deployment there is no dependency on cloud provider tools, so choices can (in some cases, though not always) be easily reverted without lock-in concerns;
  • Customization: with full control over the setup, an enterprise can define its needs at a fine-grained level of detail and choose the solutions that address them.

Cloud vs on-premises: How to choose

Considering all these aspects, choosing the right direction can seem tricky. Assuming there is no one-answer-fits-all, some considerations on cloud deployments, and especially on Google Cloud solutions, can help overcome common concerns:

  • Google Cloud is a platform that is secure by design, and thanks to its recently launched sovereignty offering it can help maintain sovereignty over data even for the most sensitive workloads;
  • Thanks to the open philosophy at the foundation of its platform, Google Cloud lets customers freely choose which model to use (including non-Google models), reducing lock-in risk to a minimum;
  • The built-in features of Vertex AI help customers fine-tune and customize LLMs to find the right balance between cost and performance, as well as the right fit for their specific needs.

In the end, Google Cloud platform services offer comprehensive tools that are secure, scalable, cost-optimized, and always up to date in capabilities and features. For this reason, they can fit almost any need, even the most challenging ones.

And if strict security and data-protection requirements are in place, or specific customizations are needed, on-premises deployment can be a valid option to pursue, perhaps only for dedicated workloads.

For this reason, thanks to its strong partnership with Google Cloud and its deep industry knowledge, Capgemini can act as a trusted advisor to enterprise customers that are at the beginning of their GenAI strategy definition and need to evaluate the right path to pursue their objectives and reach their targets. Leveraging the long-term experience gained on real projects in complex contexts, we can support your cloud journey to get the best ROI out of rolling out GenAI solutions.

So how can innovation meet intelligence? We will be exploring this at Google Cloud Next.

Capgemini at Google Cloud Next 2024

Google Cloud Next brings together a diverse mix of developers, decision makers, and cloud enthusiasts with a shared vision for a better business future through technology. As a Luminary Sponsor, Capgemini is committed to elevating the event experience with opportunities to boost learning and engagement and get fresh insight into today’s riveting topics – including generative AI.

Whether the aim is empowering businesses or their people to unlock the power of generative AI, Capgemini is at the forefront of this revolution. Our continuous work in this growing domain means we are equipped to help our partners capitalize on this unique technology and engineer use cases for enhanced, unprecedented customer experiences.

Come by our booth and let’s discuss the possibilities in the world of Generative AI, Cloud, Data/AI, and Software Engineering. Or reach out to us – we would love to hear your perspective on how we can get ready for what comes next.


Angelo Mosca

Principal Consultant, Deputy Head of Southern & Central Europe Google Cloud CoE
A senior cloud advisor with more than 10 years of cross-industry experience, focused on enterprise architecture and cloud strategy definition. Over the last two years, as part of the Southern and Central Europe Cloud CoE team, Angelo has been committed to advising customers on business transformation through cloud adoption, as well as driving overall business development on Google Cloud technology across the whole region.