Using Custom Models with NeuralFlow
Three ways to integrate your own fine-tuned or proprietary AI models into NeuralFlow pipelines.
NeuralFlow integrates with all major AI providers, but some pipelines need a fine-tuned or proprietary model instead. The options below cover three cases: a model already served behind an API, a model you package as a container, and a model hosted on Hugging Face.
Option 1: Model API Endpoint
If your model is already deployed behind an API, you can connect it to NeuralFlow as a custom endpoint:
from neuralflow import CustomModel

model = CustomModel(
    name="my-fine-tuned-bert",
    endpoint="https://models.mycompany.com/bert-v2/predict",
    auth={"type": "bearer", "token_env": "MY_MODEL_TOKEN"},
    input_schema={"text": "string"},
    output_schema={"label": "string", "score": "number"},
)
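To make the configuration above more concrete, here is a minimal sketch of what a call to such an endpoint could involve: reading the bearer token from the environment variable named in `token_env`, checking the payload against the flat `input_schema`, and building the HTTP request. This is not NeuralFlow's implementation. The helper names `validate` and `build_request`, the JSON POST body, and the mapping of `"string"` and `"number"` to Python types are all assumptions made for illustration, and only the standard library is used.

```python
import json
import os
import urllib.request

# Assumed mapping from the schema's type names to Python types.
_TYPES = {"string": str, "number": (int, float)}


def validate(payload: dict, schema: dict) -> None:
    """Raise ValueError if payload does not match a flat field -> type schema."""
    missing = schema.keys() - payload.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    for field, type_name in schema.items():
        value = payload[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, _TYPES[type_name]):
            raise ValueError(f"field {field!r} is not a {type_name}")


def build_request(endpoint: str, token_env: str, payload: dict,
                  input_schema: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST request."""
    validate(payload, input_schema)
    # The token is looked up by variable name, so it never appears in code.
    token = os.environ[token_env]
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Under these assumptions, the response body would be parsed with `json.loads` and passed through `validate` with the `output_schema` before the pipeline uses it, so a malformed model response fails early instead of propagating downstream.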
Option 2: Container Deployment
Upload your model as a Docker container, and NeuralFlow handles scaling, monitoring, and load balancing:
# Package your model
nf model package ./my-model --runtime python3.11 --gpu a100
# Deploy to NeuralFlow's infrastructure
nf model deploy my-fine-tuned-bert:v2 --replicas 3 --gpu-type a100
Option 3: Hugging Face Integration
Deploy any Hugging Face model directly:
from neuralflow import HuggingFaceModel

model = HuggingFaceModel(
    repo="myorg/custom-sentiment-v3",
    task="text-classification",
    device="cuda",
)
Performance Tips
- Use GPU instances for transformer models; CPU inference is often 10-50x slower
- Enable request batching for throughput-sensitive pipelines
- Set up auto-scaling rules based on queue depth, not just CPU usage
- Cache responses for deterministic inputs to reduce costs
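The batching and caching tips can be combined in a thin wrapper around the model call. The following is a minimal sketch, not a NeuralFlow feature: `CachedBatchPredictor`, `cache_key`, and the stand-in `fake_predict_batch` function are hypothetical names, and the sketch assumes the model is deterministic (the same input always yields the same output) and accepts a list of inputs per call.

```python
import hashlib
import json
from typing import Callable, Dict, List


def cache_key(payload: dict) -> str:
    """Stable key: serialize with sorted keys so equal payloads hash equally."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class CachedBatchPredictor:
    """Answers repeated inputs from a cache and sends the rest in batches."""

    def __init__(self, predict_batch: Callable[[List[dict]], List[dict]],
                 max_batch_size: int = 8) -> None:
        self._predict_batch = predict_batch
        self._max_batch_size = max_batch_size
        self._cache: Dict[str, dict] = {}
        self.backend_calls = 0  # number of batched calls made to the model

    def predict_many(self, payloads: List[dict]) -> List[dict]:
        keys = [cache_key(p) for p in payloads]
        # Collect unique, uncached payloads in first-seen order.
        pending: Dict[str, dict] = {}
        for key, payload in zip(keys, payloads):
            if key not in self._cache and key not in pending:
                pending[key] = payload
        items = list(pending.items())
        # Send cache misses to the model in chunks of at most max_batch_size.
        for start in range(0, len(items), self._max_batch_size):
            chunk = items[start:start + self._max_batch_size]
            results = self._predict_batch([p for _, p in chunk])
            if len(results) != len(chunk):
                raise ValueError("model returned a different number of results")
            self.backend_calls += 1
            for (key, _), result in zip(chunk, results):
                self._cache[key] = result
        # Return results in the same order as the incoming payloads.
        return [self._cache[key] for key in keys]


def fake_predict_batch(batch: List[dict]) -> List[dict]:
    """Stand-in for a real model: labels text containing 'good' as positive."""
    return [
        {"label": "positive" if "good" in item["text"] else "negative", "score": 1.0}
        for item in batch
    ]
```

A production version would bound the cache (for example with an LRU policy or a TTL) and skip caching entirely for models that sample their outputs, since cached responses are only safe when the mapping from input to output is deterministic.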