Gaurav Sharma

The Case Against Cloud-Based Language Models - Costs and the Road Ahead

Updated: Jan 1


I watched the recent Apple launch on May 7th (the iPad with the M4 chip), and it sparked an interesting question: if we have so many powerful machines at our disposal (laptops, tablets, and phones), why are we still not using their local computing power instead of sending every request to a cloud-based LLM?


So I did a quick napkin calculation, and one trend stood out very clearly: Apple plans to use on-device LLMs rather than cloud-based LLMs.


In this article, I will examine this hypothesis in depth. It matters because we, as procurement professionals, evaluate the total cost of ownership (TCO) when making strategic technology investments. When sourcing large language models and their infrastructure, we need to scrutinize these costs in detail rather than relying solely on the cloud providers' pricing.

Rise of LLMs


The rise of LLMs has been nothing short of revolutionary. As we all learn the basics of LLMs as consumers, Chief Procurement Officers (and Procurement Managers) need to develop an expert-level understanding of the costs associated with these technologies.


The base case of a cloud LLM is simple: users send queries to powerful servers in the cloud, which process the requests and return responses. However, as mentioned earlier, this leaves our own devices underutilized, and as we dig deeper, more drawbacks become apparent. Apple's move makes even more sense in this light.



Rise in Computing Power


Computing power is on the rise. Current laptops and smartphones (with their processors and GPUs) are more than capable of running a local LLM. Before we expand on this, let us first understand what "local LLM" means. In simple words:


  • Cloud LLMs run on the cloud provider's servers (like ChatGPT, which runs on OpenAI's own cloud infrastructure) and are accessed remotely over the Internet.

  • Local LLMs run directly on hardware within the user's own on-premises data center or local infrastructure. This could mean your laptop or smartphone, or a server set up by your company.


According to a report by Deloitte, the average computing power of smartphones has increased by over 1000% in the past decade, with similar trends observed in laptops as well.


Cost of Constant Communication - The CPO Concern


Each interaction with a cloud-based LLM incurs network bandwidth, latency, and, most importantly, usage fees from the service provider (OpenAI, for example, charges based on the number of tokens).


Do note that the standard USD 20/month subscription is not going to cover enterprise-level use cases. You will need API access, and that is where token consumption per query comes into the picture.


The question is: Should we opt for a cloud-based LLM solution or bring the power of LLMs on-premises? Let's crunch the numbers.

Let's first analyze the cloud-based LLM scenario.



A.) Cloud LLM Costing Model


Let's assume a scenario for a mid-sized company with 200 potential users.


1.) Cost of LLM itself:


Before we delve into the details, please note there are two parts to the cloud LLM cost: the cost of input tokens (the user's question) and the cost of output tokens (the LLM's response).


Back to our scenario: assuming 200 queries per user per day (roughly 20 queries per hour over a 10-hour working day), the 200 users will generate about 1.2 million queries a month. An average query contains around 50 tokens, so the total input volume will be around 60 million tokens. As of April 2024, OpenAI charges about USD 60 per 1 million input tokens for the GPT-4 (32k) model.


Hence, the total cost of input tokens will be about USD 3,600 per month. Assuming a similar volume of output tokens, the total cost of output tokens will be about USD 7,200 per month. Why the difference?


Under OpenAI's pricing model, output tokens are charged at twice the rate of input tokens.


So the 3-year TCO (for the LLM alone) comes out to USD 10,800/month × 12 months = USD 129,600/year, and USD 129,600 × 3 ≈ USD 389,000.
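
To make this arithmetic easy to reproduce (and to tweak for your own assumptions), here is a minimal Python sketch of the calculation above. The query volume, token counts, and the April 2024 GPT-4 (32k) prices are the assumptions stated in this scenario, not a price quote.

```python
# Napkin math for the cloud LLM usage cost (200-user scenario).
# All figures are scenario assumptions, not a price quote.

USERS = 200
QUERIES_PER_USER_PER_DAY = 200      # ~20 queries/hour over a 10-hour working day
DAYS_PER_MONTH = 30
TOKENS_PER_QUERY = 50               # average input size; output assumed similar

# GPT-4 (32k) list prices as of April 2024 (USD per 1M tokens)
INPUT_PRICE_PER_M = 60.0
OUTPUT_PRICE_PER_M = 120.0          # output tokens billed at twice the input rate

queries_per_month = USERS * QUERIES_PER_USER_PER_DAY * DAYS_PER_MONTH   # 1.2M
input_tokens_m = queries_per_month * TOKENS_PER_QUERY / 1_000_000       # 60M tokens
output_tokens_m = input_tokens_m                                        # assumption

monthly_cost = (input_tokens_m * INPUT_PRICE_PER_M
                + output_tokens_m * OUTPUT_PRICE_PER_M)                 # USD 10,800
three_year_tco = monthly_cost * 12 * 3                                  # ~USD 389k

print(f"Queries per month: {queries_per_month:,.0f}")
print(f"Monthly LLM cost: ${monthly_cost:,.0f}")
print(f"3-year TCO (LLM only): ${three_year_tco:,.0f}")
```

Change any of the constants at the top to match your own user base and query patterns; the rest of the math follows.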

 

However, the costs don't stop there. We must also account for cloud compute costs. This is where the hidden costs lie, and they will take you for a ride when you scale your LLM deployment to more users. Note that we are still using a base-case scenario of a mere 200 users. If you are launching an organization-wide or consumer-facing application, the number of users can grow exponentially!


Following are some typical costs.


2.) Cloud Compute Costs


This applies to AWS, Azure, etc. It is where you will host the user interface for your chatbot or use case.


Pricing for an Amazon Web Services m5.large instance (2 vCPUs and 8 GiB memory): approximately $0.096 per hour per instance.

  • Assuming two instances running 24/7 for high availability: Monthly Cost: $0.096/instance/hour × 2 instances × 24 hours/day × 30 days = $138.24


3.) Data Transfer Costs


Given that each query involves data transfer both to and from the OpenAI API, and assuming generous data payloads:


Estimate per Query: 10 KB sent + 10 KB received = 20 KB per query


Total Data Transfer for 1.2 Million Queries: 1.2 million × 20 KB = 24,000,000 KB or 24,000 MB or 24 GB


AWS Data Transfer Costs (after the first 1 GB free, at $0.09 per GB): $0.09/GB × (24 GB - 1 GB free) = $2.07


4.) Storage Costs


For storing logs, processed data, and potentially other artifacts like temporary data or session states:


  • Estimate: going beyond just logs to include backups and data durability requirements, S3 Standard storage for 50 GB per month: $0.023/GB/month × 50 GB = $1.15
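
The three infrastructure items above (compute, data transfer, and storage) can be folded into one small monthly estimate. The sketch below simply reproduces the figures stated above; the instance type, payload sizes, and storage volume are scenario assumptions and will vary with your architecture.

```python
# Monthly cloud infrastructure estimate for the 200-user scenario.
# Instance type, payload sizes, and storage volume are assumptions from the text.

HOURLY_RATE_M5_LARGE = 0.096     # USD/hour, AWS m5.large (2 vCPU, 8 GiB)
INSTANCES = 2                    # two instances running 24/7 for high availability
HOURS_PER_MONTH = 24 * 30

compute = HOURLY_RATE_M5_LARGE * INSTANCES * HOURS_PER_MONTH             # $138.24

QUERIES_PER_MONTH = 1_200_000
KB_PER_QUERY = 20                # 10 KB sent + 10 KB received
FREE_TIER_GB = 1
TRANSFER_PRICE_PER_GB = 0.09

transfer_gb = QUERIES_PER_MONTH * KB_PER_QUERY / 1_000_000               # 24 GB
data_transfer = max(transfer_gb - FREE_TIER_GB, 0) * TRANSFER_PRICE_PER_GB  # $2.07

STORAGE_GB = 50                  # logs, backups, session state
S3_PRICE_PER_GB = 0.023
storage = STORAGE_GB * S3_PRICE_PER_GB                                   # $1.15

print(f"Compute:       ${compute:,.2f}/month")
print(f"Data transfer: ${data_transfer:,.2f}/month")
print(f"Storage:       ${storage:,.2f}/month")
print(f"Infra total:   ${compute + data_transfer + storage:,.2f}/month")
```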


 

3-YEAR TCO (All Combined) for 200 Users and Cost Simulation for User Scaling



So now we are getting the hang of our TCO. The compute, data transfer, and storage costs are still modest compared with the LLM usage fees. But remember, we are still at only 200 users.


Let's run some scenarios with 200 users, 500 users, 1000 users, and 2000 users!



Ouch! This quickly becomes roughly USD 2.6 million in 3-year TCO for the 2,000-user scenario!
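
If you want to rerun these scaling scenarios yourself, the sketch below simply extrapolates the per-user assumptions from the 200-user case linearly to 500, 1,000, and 2,000 users. My original scenario table may bake in different assumptions (for example, volume discounts or different query rates at scale), so treat this as a sensitivity check rather than a reproduction of the exact figures quoted here.

```python
# Linear extrapolation of the 200-user assumptions to larger user bases.
# A sensitivity check only; real pricing at scale may include volume discounts.

def three_year_tco(users: int) -> float:
    queries_per_month = users * 200 * 30                  # 200 queries/user/day
    input_tokens_m = queries_per_month * 50 / 1_000_000   # 50 tokens per query
    llm_monthly = input_tokens_m * (60 + 120)              # input + output pricing
    transfer_gb = queries_per_month * 20 / 1_000_000       # 20 KB per query
    infra_monthly = 138.24 + 1.15 + max(transfer_gb - 1, 0) * 0.09
    return (llm_monthly + infra_monthly) * 12 * 3

for users in (200, 500, 1000, 2000):
    print(f"{users:>5} users -> 3-year TCO ~ ${three_year_tco(users):,.0f}")
```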

 

B.) On-Prem LLM Costing Model


Now, let's analyze what it would cost us to deploy an on-prem LLM. For an on-premises deployment matching Scenario 4 (2,000 users generating a total of 12 million queries per month), we need to consider various factors, including hardware, software, maintenance, staffing, and operational costs. Here's the detailed breakdown:
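
If you want to build your own version of this breakdown, here is a minimal skeleton of how such an on-prem TCO model can be structured. Every dollar value below is a placeholder to be replaced with your own vendor quotes, since GPU servers, facilities, and staffing costs vary enormously; only the cost categories come from the scenario above.

```python
# Skeleton of a 3-year on-prem TCO model for the 2,000-user scenario.
# Every value below is a placeholder; plug in your own vendor quotes.

capex = {
    "gpu_servers": 0.0,         # inference hardware (one-time)
    "networking_storage": 0.0,  # switches, NAS/SAN (one-time)
}
annual_opex = {
    "software_licenses": 0.0,   # model licensing / MLOps tooling
    "maintenance_support": 0.0,
    "staffing": 0.0,            # ML/infra engineers, fully loaded
    "power_cooling_space": 0.0,
}

YEARS = 3
three_year_tco = sum(capex.values()) + YEARS * sum(annual_opex.values())
print(f"3-year on-prem TCO: ${three_year_tco:,.0f}")
```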

Wow, that's almost 60% cheaper than the cloud LLM option.


We have not even touched upon the primary reasons for moving away from cloud LLMs. I am sure you know quite a few of them already, but I will list them anyway.


Why On-Prem LLMs Are the Future


While the on-prem LLM option is cost-competitive, it offers additional strategic benefits that cloud solutions cannot match:


  • Data privacy/security: Spend data remains within your secure environment.

  • Low latency: No network dependencies for lightning-fast query responses.

  • Model customization: Fine-tune the LLM for your specific spend taxonomy.

  • IP protection: Your data and query patterns remain proprietary.



I hope this napkin calculation provided some food for thought for you as a CPO or Procurement Manager regarding your upcoming and ongoing cloud LLM use cases.


Interestingly, you can now also estimate the true cost of your AI-driven procurement applications (e.g., spend analysis, eSourcing). If you have a user base of only 200 buyers, you should not be paying exorbitant prices; otherwise, you are effectively subsidizing your esteemed supplier's other customers.


What do you think? I claim to be no expert on these costs, and I could have missed many important aspects. Let me know!


Until next time.


Keep Supernegotiating!


If you liked my article, consider supporting my work!




For more such articles, follow Supernegotiate.


