How To Deploy LLMs Part 2: Public vs. Private
In Part 1 of our series, “How To Deploy Large Language Models (LLMs),” we discussed the risks associated with different deployment options. It is important to consider these risks, as they can significantly impact your deployment’s overall effectiveness and reliability.
Here, we delve deeper into the pros and cons of each deployment architecture, examining the options in terms of performance, cost, and capabilities. A thorough analysis of these factors will clarify which deployment strategy best fits your objectives and operational requirements.
External Third-Party LLM API
Pro: Potentially Lower Cost
While it is impossible to make a blanket assessment of the relative cost savings of using a third-party LLM API, compute utilisation is one of the main driving factors. External APIs, such as OpenAI’s GPT-4, often charge more per token than it costs to run an open-source model. However, an open-source model incurs infrastructure costs for hosting whether it is in use or not. As a result, a dedicated deployment can often be more expensive to operate than the third-party API option. An open-source model shared across clients would be less expensive than a dedicated deployment, though the savings depend on the number of clients sharing the service provider’s hosted LLM.
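To make this trade-off concrete, the sketch below compares pay-as-you-go API billing against a fixed hosting bill. All of the figures (the per-token API rate, the monthly GPU cost, the number of clients sharing) are hypothetical placeholders chosen for illustration, not quotes from any provider.

```python
# Illustrative back-of-the-envelope comparison (all prices hypothetical):
# a third-party API bills per token, while self-hosting bills for GPU
# time whether the model is in use or not.

API_PRICE_PER_1K_TOKENS = 0.03   # assumed per-1K-token API rate (USD)
GPU_HOSTING_PER_MONTH = 5_000.0  # assumed dedicated GPU cost per month (USD)

def monthly_api_cost(tokens_per_month: float) -> float:
    """Pay-as-you-go: cost scales with usage."""
    return tokens_per_month / 1_000 * API_PRICE_PER_1K_TOKENS

def monthly_hosting_cost(clients_sharing: int = 1) -> float:
    """Fixed cost, independent of usage; sharing divides it across clients."""
    return GPU_HOSTING_PER_MONTH / clients_sharing

# Break-even volume: below this many tokens per month, the API is cheaper.
break_even = GPU_HOSTING_PER_MONTH / API_PRICE_PER_1K_TOKENS * 1_000
print(f"Break-even at ~{break_even:,.0f} tokens/month")

for tokens in (1e6, 1e8, 1e9):
    print(f"{tokens:>13,.0f} tokens: API ${monthly_api_cost(tokens):>10,.2f}"
          f" vs dedicated ${monthly_hosting_cost():>9,.2f}"
          f" vs shared(10) ${monthly_hosting_cost(10):>8,.2f}")
```

With these illustrative numbers, a dedicated deployment only breaks even at roughly 167 million tokens per month; below that volume, the API is cheaper, and sharing the hosted model across ten clients shifts the break-even point accordingly.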
Pro: Potentially Higher Performance
The performance of any end-user product depends on more than the LLM used. The LLM is only one component in a larger architecture, so various considerations and limiting factors come into play. Benchmarks such as open-source leaderboards and studies are worth consulting, but they are far from conclusive. While models perform differently relative to one another on various tasks (e.g., summarisation or question answering), in general, GPT-4, and likely its future iterations, will outperform open-source models. Often, however, a smaller model supported by a combination of prompting, tuning, and other architectural design will outpace a larger model deployed in a less sophisticated architecture. To evaluate performance accurately, benchmark the end-to-end solutions in which your technology provider deploys or integrates these LLMs; considering performance benchmarks for the LLM alone can be misleading.
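By way of illustration, an end-to-end benchmark might look something like the sketch below; the pipeline objects and the labelled review set are hypothetical placeholders, and the scoring is deliberately simplified to exact-match accuracy.

```python
# Minimal sketch of end-to-end evaluation: score the full pipeline
# (retrieval + prompting + model + post-processing) on a labelled task,
# rather than quoting the LLM's standalone leaderboard numbers.
# `pipeline` and the labelled examples are hypothetical placeholders.
from typing import Callable

def evaluate_pipeline(pipeline: Callable[[str], str],
                      labelled_examples: list[tuple[str, str]]) -> float:
    """Task-level accuracy of the whole solution, not the bare model."""
    correct = sum(1 for doc, label in labelled_examples
                  if pipeline(doc).strip().lower() == label.lower())
    return correct / len(labelled_examples)

# Usage: compare a large model in a simple harness against a smaller
# model wrapped in better prompting/tuning -- the latter can win.
# acc_big   = evaluate_pipeline(simple_gpt4_pipeline, review_set)
# acc_small = evaluate_pipeline(tuned_small_model_pipeline, review_set)
```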
Con: Limited Tuning (Customization)
Tuning (and fine-tuning) involves modifying a portion of the original model so that it performs better at a specific task. With external models such as OpenAI’s GPT, this capability is much more limited: you can only customise the model to the extent the provider’s API allows.
Con: Potential Data Retention and Use Policies
As mentioned previously, external LLM API providers may have data retention and use policies described in their services agreement with your solution provider. Be sure to inquire about this early on.
Con: No Control Over Versioning
Solution providers relying on a third-party LLM API service are at the provider’s mercy regarding any changes or updates to the underlying model. Third-party LLM providers typically update and improve their models continuously. Though this appears positive, it can lead to sudden and unexpected changes in performance on any downstream task that leveraged the outputs of a prior model version. These changes may not be noticeable or particularly critical to most users. However, in the legal services industry, where consistency and repose are critical to defensibility (as in document review), such an update can change a document classification or designation, throwing the defensibility and quality of a prior production into serious question.
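One partial mitigation, where the provider offers it, is to pin a dated model snapshot rather than a floating alias. The sketch below assumes OpenAI’s Python client and a dated snapshot name such as “gpt-4-0613”; even then, the provider decides how long any snapshot remains available.

```python
# Sketch: pinning a dated model snapshot instead of a floating alias,
# assuming the API provider exposes snapshot names. Even a pinned
# snapshot remains available only as long as the provider supports it.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-0613",  # dated snapshot: output behaviour is more stable
    # model="gpt-4",     # floating alias: silently upgraded by the provider
    messages=[{"role": "user", "content": "Classify this document: ..."}],
)
print(response.choices[0].message.content)
```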
Custom LLM Provided by Solution Provider
Pro: Full Control Over Data Retention
The solution provider fully controls the model and exercises complete flexibility to define the data retention and use policies. This allows clients to set the standard, or at least to negotiate these policies with their service provider.
Pro: Full Control Over Versioning
As with data retention and use, a solution provider hosting its own LLM can implement version and update controls for clients, giving them complete control over which model versions are used on their various matters and preserving the client’s posture with respect to work performed with model outputs. For document review, a service provider leveraging these controls can provide assurances that document classifications or scores will remain consistent and unchanged unless the client takes deliberate action.
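In practice, such controls might resemble a per-matter model registry that locks each matter to an explicit model version. The structure below is a hypothetical illustration, not any vendor’s actual implementation.

```python
# Hypothetical per-matter model registry: each matter is locked to an
# explicit model version, and changing it requires a deliberate,
# recorded client action (no silent upgrades).
from dataclasses import dataclass, field

@dataclass
class MatterModelLock:
    matter_id: str
    model_version: str  # e.g. "llama-3-8b-ft-2024-06-01"
    upgrade_history: list[str] = field(default_factory=list)

    def upgrade(self, new_version: str, authorized_by: str) -> None:
        """Version changes only via an explicit, attributable request."""
        self.upgrade_history.append(
            f"{self.model_version} -> {new_version} "
            f"(authorized by {authorized_by})"
        )
        self.model_version = new_version

lock = MatterModelLock("matter-001", "llama-3-8b-ft-2024-06-01")
lock.upgrade("llama-3-8b-ft-2024-09-15", authorized_by="client counsel")
```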
Pro: Higher Tuning Flexibility
Service providers hosting their own LLM have full control over the LLM parameters. This gives them and their clients tremendous optionality to tune and customise the LLM to their specific tasks or data — so long as the service provider has the technical expertise to make that capability available.
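For example, with a self-hosted open-source model, parameter-efficient fine-tuning is a realistic option. The sketch below uses the Hugging Face transformers and peft libraries; the base model name and LoRA hyperparameters are illustrative assumptions, not recommendations.

```python
# Sketch: parameter-efficient (LoRA) fine-tuning of a self-hosted
# open-source model with Hugging Face transformers + peft. The model
# name and hyperparameters here are illustrative placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Meta-Llama-3-8B"  # assumed base model
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Train small low-rank adapters on the attention projections instead of
# touching all of the base model's weights.
config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of weights

# From here, train with your usual loop or transformers.Trainer on
# task-specific data -- something a locked-down external API may not allow.
```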
Con: Potentially Higher Cost
For many reasons, models deployed by solution providers are likely to be based on open-source foundation models, such as Llama-3. These models are cheaper to run per token than OpenAI’s GPT-4. However, as mentioned above, the total cost of ownership can often be significantly higher once you factor in the infrastructure costs of hosting the dedicated environment and GPU resources, regardless of utilisation.
Both external third-party LLM APIs and custom LLMs offered by solution providers have unique benefits and challenges. External APIs provide lower upfront costs and potentially higher performance but have limitations in customisation, data policies, and version control. Custom LLMs offer more flexibility and control but tend to be more expensive to operate.
Understanding these trade-offs is key to choosing the deployment strategy that best meets your specific needs and goals.
Igor Labutov, Vice President, Epiq AI Labs
Igor Labutov is a Vice President at Epiq and co-leads Epiq AI Labs. Igor is a computer scientist with a strong interest in developing machine learning algorithms that learn from natural human supervision, such as natural language. He has more than 10 years of research experience in Artificial Intelligence and Machine Learning. Labutov earned his Ph.D. from Cornell and was a post-doctoral researcher at Carnegie Mellon, where he conducted pioneering research at the intersection of human-centered AI and machine learning. Before joining Epiq, Labutov co-founded LAER AI, where he applied his research to develop transformative technology for the legal industry.
The contents of this article are intended to convey general information only and not to provide legal advice or opinions.