How to Evolve Your Tech Stack to Leverage LLMs

Depending on your specific use case, your approach to LLMs may vary. Here’s a breakdown of the options and best practices.

Although the tech industry is conflicted about AI, everyone can agree on one thing: generative AI is a generational change that will affect every industry.

To remain competitive, companies will need to evolve their products and services to leverage or include AI-powered features alongside their existing offerings.

Companies that already rely on a well-architected and scalable distributed system have a ‘hospitable environment’ for AI applications to flourish, but even they will need to evolve their tech stack, adapt their architecture, and future-proof for constant AI evolution.

Depending on your specific use case, the criticality of the new AI feature, and the available time-to-market, your approach may vary. Here’s a breakdown of the options and best practices:

SaaS LLM provider or OSS LLM?

Let’s say that you’re building a new product that involves human interaction through natural language and you need to leverage LLMs (Large Language Models). This is a mission-critical product that is pivotal to your whole company strategy, where time-to-market is less important than obtaining a specific, high-quality output and maintaining complete control over proprietary data and trade secrets.

In that case, you’ll want to consider developing and refining an LLM in-house, which will mean making a significant investment in computational resources, data, and AI expertise.

However, let’s say that you’re adding a feature to an existing platform that will enhance the user experience but may not be the most critical element. You also want to ship ASAP, maybe to beat a competitor or to capitalize on a market trend. In that case, you may want to select a third-party provider that already offers an off-the-shelf, pre-trained LLM as a SaaS solution.

When considering your options, you want to weigh these pros and cons:

Using a SaaS LLM Provider

Depending on the problem you’re solving, you can accomplish a lot with simple prompt engineering and off-the-shelf LLMs. Starting “simple” and experimenting will also help you assess the best approach for your use case.

Examples: OpenAI, Google

Pros:
  • Quick deployment with pre-trained models
  • Consistent system upgrades and maintenance
  • Scalability (you leverage the provider's infrastructure and expertise)
  • Cost-effective (pay-per-use)

Cons:
  • Limited control and customization over the AI model's behavior and functionality
  • Dependency on the provider and less control (you rely on the vendor's availability and cannot add features on demand)
  • Data privacy and copyright concerns
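As a concrete sketch of how little code the SaaS route can require, here is a minimal, hypothetical payload builder in Python. The model name, roles, and the endpoint in the comment follow OpenAI's chat-completions convention but are assumptions, not a prescription for any specific provider:

```python
def build_chat_request(prompt: str, model: str = "gpt-4o-mini",
                       temperature: float = 0.2) -> dict:
    """Build a chat-completion style request body for a hosted LLM.

    The model name and message schema are illustrative; check your
    provider's API reference for the exact fields it expects.
    """
    return {
        "model": model,
        "temperature": temperature,
        "messages": [
            {"role": "system", "content": "You are a helpful product assistant."},
            {"role": "user", "content": prompt},
        ],
    }

# Sending it is a single authenticated HTTP POST, e.g. (auth header omitted):
# requests.post("https://api.openai.com/v1/chat/completions",
#               json=build_chat_request("Summarize this support ticket: ..."))
```

Keeping the payload construction in one small function like this also makes it trivial to swap models or tweak prompts later.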

Self-hosted with SaaS LLM API integration

The SaaS provider exposes the API and retains proprietary rights to the model; you are the API integrator, building a client that interfaces with their service. You own your data, while delegating portions of it to the provider's LLM for training or inference.

Pros:
  • Quick deployment
  • Customization (the AI feature's code and behavior are based on your logic)
  • Scalability (you leverage the provider's infrastructure and expertise)
  • Cost-effective (pay-per-use), although you incur computation expenses to train or tune the SaaS provider's models on your data

Cons:
  • Skill and expertise required (to train the model)
  • Dependency on the provider and less control (you rely on the vendor's availability and cannot add features on demand)
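Since you own the data and only delegate portions of it to the provider, a common pattern is to pre-process requests in your client before they leave your system. A minimal sketch, assuming a hypothetical redaction step for e-mail addresses (a real deployment would cover many more categories of sensitive data):

```python
import re

# Matches most e-mail addresses; intentionally simple for illustration.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def redact(text: str) -> str:
    """Replace e-mail addresses with a placeholder before the text is
    sent to the SaaS provider's API for training or inference."""
    return EMAIL.sub("[REDACTED_EMAIL]", text)
```

Running every outbound prompt through a gate like this keeps the "data is owned by you" promise enforceable in code rather than in policy documents.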

Self-hosted with an OSS (Open-Source Software) LLM Model

There are many OSS projects in this space, with frequent, exciting new releases (e.g. Meta's release of Llama 2 and the upcoming release of OpenAI's G3PO).

Pros:
  • Flexibility and customization
  • Full control (you own the AI implementation, enabling fine-tuning, extensions, and integration with the existing architecture)
  • Community support
  • Cost-effective

Cons:
  • Resource intensive (significant developer expertise, time, and resources for training models, data preparation, and ongoing maintenance)
  • Training data challenges (in particular, acquiring and preparing large, diverse datasets for training)
  • Potential complexity (in integrating and maintaining the OSS model within the existing distributed architecture)
  • Not enterprise-ready (many open-source projects are developed mainly as research projects, not for commercial use)

Self-hosted, Built In-House LLM

Pros:
  • Full ownership (over the entire AI development lifecycle, ensuring data privacy, protection of proprietary trade secrets, adherence to privacy and security policies, and full integration)
  • Tailored solutions to your product's unique requirements and business goals
  • Long-term flexibility (evolving the AI features alongside the product's growth and changing needs)

Cons:
  • Resource and cost intensive
  • Extensive skill and expertise requirements
  • Multi-year process

It’s worth highlighting that while “cost” is just one line in the Cons above, it carries significant weight:

The costs that went into training ChatGPT […] are estimated to be around $4.6 million when using the lowest-cost GPU cloud provider, excluding R&D and human resourcing costs. You can refer to this article for insights on estimated costs of training LLMs at scale.

Evolving your Distributed System to Embrace AI

Adding AI to your existing software means reviewing and evolving every part of it: your tech stack, data management, system architecture, integrations, security, etc. Even small, everyday things can suddenly become more complex. For example, anyone developing deep learning apps with GPUs knows that Docker images can be enormous, sometimes as large as 10GB.
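On the Docker-image point, one common mitigation is a multi-stage build that keeps compilers and build-time tooling out of the runtime image. A sketch, assuming a Python app; the CUDA base-image tags and paths are illustrative, so pin the versions you actually use:

```dockerfile
# Stage 1: install heavy Python dependencies using the full CUDA devel image.
FROM nvidia/cuda:12.1.0-devel-ubuntu22.04 AS build
RUN apt-get update && apt-get install -y python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN pip3 install --no-cache-dir --target=/opt/deps -r requirements.txt

# Stage 2: copy only the installed packages into the slimmer runtime image.
FROM nvidia/cuda:12.1.0-runtime-ubuntu22.04
RUN apt-get update && apt-get install -y python3 \
    && rm -rf /var/lib/apt/lists/*
COPY --from=build /opt/deps /opt/deps
ENV PYTHONPATH=/opt/deps
COPY app/ /app/
CMD ["python3", "/app/main.py"]
```

The devel image (with compilers and headers) never ships; only the runtime image and your dependencies do, which can cut gigabytes off the final artifact.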

Apart from general changes such as scaling your compute and storage resources to accommodate the AI workloads, there are two specific best practices that I recommend:

(1) Architect for change

The tech industry is changing constantly, and the AI landscape even more quickly. Consider that two of the leading deep learning frameworks — TensorFlow and PyTorch — are only ~7-8 years old, and infrastructure tools like SageMaker and Kubeflow are about ~5-6 years old. New tools and game-changing libraries continue to emerge every few years.

When adding an LLM-based feature to your product, you want to architect it so that you can adjust as your product grows and evolves.

  • Embeddings - LLMs use embeddings to convert words or other content (e.g. images) into numerical vectors, capturing semantic relationships and contextual meaning. Embeddings are typically “learned” using neural networks - so expect continued innovation in this area. Be sure to plan for change so that you can integrate new advances easily, which will help improve your feature’s performance.
  • Models - You want to choose models that are modular, scalable, and capable of handling inputs that are appropriate for the problem you are solving. These models should support transfer learning, allowing you to reuse pre-trained weights or adjust them to suit your specific tasks. Keep in mind that you ideally want to architect your app to support multiple embeddings and models simultaneously.
  • Context window size - The context window size refers to the number of tokens that an LLM considers when predicting the next token in a sequence. A large context window allows you to do more things with prompt engineering and can reduce the need to refine an LLM for a specific use case.
  • Prompt engineering - You want to ensure that your developers can easily experiment with prompts used to interact with the AI model, which provides greater control over the AI's output.

Some LLMs provide additional parameters and extensions to fine-tune the model's behavior. For example, they may include options to adjust the temperature (i.e. the randomness of the generated text), the maximum length of generated responses, or the ability to implement conditional generation, allowing you to control the output of the LLM more precisely for your particular use case.
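Capturing those knobs in a single validated config object keeps experimentation safe and auditable. A sketch, assuming typical parameter names (`temperature`, `max_tokens`, `top_p`); the exact names and valid ranges vary by provider:

```python
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class GenerationConfig:
    """Common LLM sampling parameters, validated once at construction."""
    temperature: float = 0.7   # higher values mean more random output
    max_tokens: int = 256      # upper bound on the generated response length
    top_p: float = 1.0         # nucleus-sampling cutoff, if the provider supports it

    def __post_init__(self) -> None:
        if not 0.0 <= self.temperature <= 2.0:
            raise ValueError("temperature outside the typical 0-2 range")
        if self.max_tokens <= 0:
            raise ValueError("max_tokens must be positive")
```

Because the object is frozen, a config used for an experiment can be logged verbatim (`asdict(cfg)`) and replayed later with confidence that nothing mutated it.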

(2) Track & evaluate performance

Historically, quality has been associated with uptime, availability, or the presence (or lack) of bugs: situations where an issue is evident to the end user. With generative AI, however, errors are harder to pin down: hallucinations are common and yet difficult to detect.

A model can evolve and change over time: it may need to be fine-tuned regularly with fresh data points and user feedback to stay relevant and accurate.

Having a well-thought-out process to track and evaluate the performance of your AI feature, and of each iteration of your LLM, is a key consideration. This may lead you to add MLOps platforms like Weights & Biases to your tech stack.
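A lightweight starting point, before adopting a full MLOps platform, is to score each model iteration against a fixed “golden set” of prompt/expected-answer pairs. A minimal sketch; exact-match is a deliberately crude metric, and real evaluations would add semantic similarity, human review, and regression tracking:

```python
from typing import Callable, Sequence, Tuple

def exact_match_rate(model_fn: Callable[[str], str],
                     golden_set: Sequence[Tuple[str, str]]) -> float:
    """Fraction of golden (prompt, expected) pairs the model answers exactly,
    after normalizing case and surrounding whitespace."""
    hits = sum(
        model_fn(prompt).strip().lower() == expected.strip().lower()
        for prompt, expected in golden_set
    )
    return hits / len(golden_set)
```

Running this on every fine-tune or prompt change gives you a single number to compare iterations, and a regression alarm when a “fresh data” update quietly makes things worse.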

The Future is Iterative

Tech evolves in waves and ripples, from tsunami-level changes like the adoption of the Internet and the recent advancements in AI with LLMs, to smaller waves like the invention of JavaScript, React, etc. The one constant is that it will continue to evolve 😉.

Consider just the hardware aspect: today we’re solving problems with a single machine that would have required an entire cluster in the past. That’s because a single server can now be configured with more memory, storage, and cores than an entire cluster of servers could provide 10 years ago.

It’s easy to imagine a future where AI itself is used to make all the recommendations in this post, but tailored directly to your use case with specific recommendations on how to optimize the design of your system architecture, intelligently allocate resources, etc.

As you evolve your product, team, infrastructure, and software, you’ll need a tool to empower you to keep up with the latest tech waves. A platform that shows you the real-time status of your distributed system and enables easy version control and branching.

We’re not just imagining that future: we’re actively building it. Multiplayer wants to help teams solve bigger and bigger problems by effortlessly designing, developing, and managing distributed systems with the assistance of AI. If that sounds exactly like what you need, sign up for the beta waitlist!