A Specialized, Customizable, Secure Self-Service Tool
Introduction
Large Language Models (LLMs) are a powerful resource, but developing a custom model can be costly, technically complex, and may raise security and privacy issues. Furthermore, running a model is resource-intensive and may require more effort than individuals or small groups can spare. LLM Factory addresses these challenges by providing a secure environment for users to develop their own tailored models and centralized inference for easy application distribution. LLM Factory offers control, flexibility, accessibility, and security.
What is LLM Factory
LLM Factory is a self-service tool developed by the University of Kentucky’s Center for Applied Artificial Intelligence (CAAI).
Use LLM Factory to fine-tune your own LLM. Train a model with your own data in a secure HIPAA-compliant environment controlled by the University of Kentucky. Configure a model to meet your needs and produce more accurate and effective AI applications.
In addition, LLM Factory users can securely query cutting-edge GPTs. Base models like Llama 3 8B and Llama 3.1 450 B are accessible via the chat interface or through OpenAI compatible API.
Leveraging LLM Factory’s OpenAI compatible API keys, users can seamlessly integrate their custom adapters and fine-tuned models into a library of tools. Plus, LLM Factory hosts a range of powerful models, including Whisper for transcribing short audio files and an embeddings model for RAG applications, which can be integrated into secure API requests as well.
For a more technical overview of LLM Factory, please refer to the paper linked in the citation section below.
Citation
arXiv:2402.00913 |
Why use LLM Factory?
Customization and Flexibility
LLM Factory allows users to fine-tune their own LLM. Develop specialized models tailored to a project’s specific needs. By leveraging cutting-edge AI advancements and the latest foundational pre-trained models, users can create models that meet their specific needs.
Security and Privacy
Unlike public platforms, your information and interactions are private on LLM Factory. Datasets uploaded to LLM Factory for adapter training are stored locally on a UK server, and only accessible by you and your team. Fine-tuned models are private, only accessible through a unique configuration ID and personal API Key. All API requests and chat conversations within LLM Factory’s interface are secure.
OpenAI Compatible API
LLM Factory’s API is OpenAI compatible, allowing users to expose their models to external APIs and integrate with a library of tools. This means a range of advanced features, including embeddings, transcriptions, function calling, and more can be easily supported.
Free Access
Mainstream tools like GPT-4 require subscriptions; LLM Factory is free to CAAI collaborators. CAAI can host models at a significantly lower cost because we use on-premises computational resources for inference. To use LLM Factory, simply reach out with a project idea to be granted access.
What supports LLM Factory?
The Center for Applied AI functions within the Institute for Biomedical Informatics. Efforts are supported by the University of Kentucky College of Medicine.
How LLM Factory Works
An Introduction fine-tuning, data security, API Requests, and hardware
CAAI’s LLM Factory operates on the cutting edge of AI advancements, namely Parameter Efficient Fine Tuning (PEFT) and Low Rank Adaption (LoRA). LLM Factory leverages the latest and greatest open-source models, including Llama 3.1 405B and Nemotron-4 340B.
Techniques like Parameter Efficient Fine-Tuning and Low Rank Adaption have revolutionized the way we leverage pre-trained knowledge in LLMs. Layering additional information, called adapters, on top of a base model enables fine-tuning on only a small subset of parameters. Previously, custom model training required significant computational resources. Now, with LoRA methods, we can achieve comparable performance to traditional full fine-tuning, but with significantly reduced memory usage and trainable parameters. This process is more efficient and less expensive. Fine-tuned models developed through LoRA methods are comparable to that of traditional full fine-tuned models (add source??), but they are much easier to create and scale.
Building on the efficiency of PEFT and LoRA, we take fine-tuning to the next level by harnessing the power of the latest state-of-the-art models as foundational base models. These models offer an immense capacity for pre-trained knowledge, requiring equally impressive computing capabilities. CAAI uses an on-site NVIDIA DGX computing cluster, boasting 3.2TB of VRAM, that enables us to host a variety of base models and a multitude of adapters. Rather than hosting a bunch of different, separate models, users interface with the system programmatically. This is computationally and cost efficient.
Users have the power to create LLMs that not only are informed on the vast amounts of data that foundational models have been trained on but that also learn from project specific data. With LLM Factory, users have access to cutting-edge models and the ability to customize those models to meet their unique needs. It’s our hope this drives unprecedented achievements and efficient workflows.
Users can easily integrate their fine-tuned models, or base models available through LLM Factory, with OpenAI’s ecosystem. LLM Factory API Keys are OpenAI compatible, meaning you can seamlessly integrate OpenAI’s capabilities into your applications, services, or projects. LLM Factory’s User Guide covers tool/function calling, embedding, transcription and more. You don’t have to be an experienced developer to use LLM Factory.
LLM Factory offers efficient, scalable fine-tuning on a user-friendly platform controlled by the University of Kentucky. Alternative fine-tuning options, like OpenAI or Anthropic, don’t have clear or customizable data restriction policies. With LLM Factory, your data and interactions are secure. Only you and your team members have access to your trained adapters. Data used for fine-tuning, conversation history through the chat interface, and API calls are all secure. LLM Factory will soon be officially HIPAA compliant, furthering research objectives across disciplines. Individual, private, HIPAA compliant instances are available now, by request! If you would like to learn more, please reach out to ai@uky.edu.
Explore Llama 3 8B, the base model at the foundation of LLM Factory
LLM Chat – Center for Applied AI Hub (uky.edu)
Explore an Adapted Model
LLM Compare – Center for Applied AI Hub (uky.edu)
News & Updates
Currently LLM Factory supports the base model Llama 3 8B. As of May 2024, the self-service tool is available to CAAI collaborators, and we are accepting requests for new projects. If you would like to access LLM Factory or discuss possible collaboration, please contact us! You can fill out our collaboration intake form here.
HIPAA Compliance
LLM Factory V1.3 (July 2024) is not HIPAA Compliant. Private, HIPAA compliant instances are available on an individual basis! If you would like to learn more, please reach out to ai@uky.edu
How to Use LLM Factory
LLM Factory is available on CAAI’s self-service tool website: data.ai.uky.edu
Microsoft is the current Single Sign-On provider, you must use a Microsoft 365 account to access LLM Factory. Before you can get started, you must be granted the necessary permissions from a CAAI Administrator. Please contact us for access.
When your Data.AI account is set-up, please explore the User Guide documentation linked in the menu of LLM Factory. The User Guide provides in-depth instructions for navigating the LLM Factory interface; it covers model training, including data curation and model evaluation, as well as how to implement models using Open-AI compatible API Keys.
Collaborative Projects using LLM Factory
The following tools are in development:
SpeakEZ – transcription, diarization, summarization, theme extraction
KyStats – code generation and querying databases with natural language
AgriGuide – RAG methods and LangChain tools for community and agricultural specific resources, multi-modal chat and image interface
Population Health – distance learning assistant using conversational LLMs and generating synthetic actors
CELT – RAG methods and LangChain tools integrated into a website to help users navigate and find resources