Basics

Use the inference endpoint to perform rewriting with your fine-tuned model. The inference endpoint has two modes: full and rewrite.

Full Mode

Full mode runs inference with a standard AI model such as gpt-4o or Claude 3.7 Sonnet and then styles that model's output with your fine-tuned model. This provides a full end-to-end inference process, similar to using the OpenAI or Anthropic API.

Specify the fine-tuned model ID in model.

Provide the list of conversation messages in messages.

Use the full_mode_options parameter to set base_model and base_temperature, which specify the base AI model and the sampling temperature used for inference.
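The sketch below shows what a full-mode request might look like. The model, messages, full_mode_options, base_model, and base_temperature fields come from the description above; the endpoint URL, authentication header, and example model IDs are placeholders, not documented values.

```python
# Minimal sketch of a full-mode request, assuming an HTTPS JSON endpoint
# with bearer-token auth. URL, API key, and model IDs are placeholders.
import requests

API_URL = "https://api.example.com/v1/inference"  # hypothetical endpoint URL
API_KEY = "YOUR_API_KEY"

payload = {
    "model": "your-fine-tuned-model-id",  # your fine-tuned model ID
    "messages": [
        {"role": "user", "content": "Draft a short product update announcement."}
    ],
    "full_mode_options": {
        "base_model": "gpt-4o",       # base AI model used for generation
        "base_temperature": 0.7,      # temperature for the base model
    },
}

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```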

Rewrite Mode

Rewrite mode uses your fine-tuned model to rewrite text that you provide it. In this mode, you will have generated text using an AI model separately, and you submit this AI-generated text to our endpoint to be rewritten.

Specify the fine-tuned model ID in model.

Provide the input text as a string in message.

No conversational turns or instructions to the AI need to be submitted.

If you have a large block of multi-paragraph text, include all of it in a single API call. You don’t need to segment the text into separate calls.

The styled content is returned at response["choices"][0]["message"]["content"].
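Below is a sketch of a rewrite-mode request that submits AI-generated text in the message field and reads the styled result from the response. As above, the endpoint URL, auth header, and model ID are assumptions; only the model and message parameters and the response path are taken from this section.

```python
# Minimal sketch of a rewrite-mode request; URL, API key, and model ID
# are placeholders, not documented values.
import requests

API_URL = "https://api.example.com/v1/inference"  # hypothetical endpoint URL
API_KEY = "YOUR_API_KEY"

# AI-generated text to be restyled; multi-paragraph text can go in one call.
generated_text = (
    "Our quarterly results exceeded expectations.\n\n"
    "Revenue grew across all segments, driven by strong customer demand."
)

payload = {
    "model": "your-fine-tuned-model-id",  # your fine-tuned model ID
    "message": generated_text,            # input text as a single string
}

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=60,
)
response.raise_for_status()

# Styled content is at choices[0].message.content
styled = response.json()["choices"][0]["message"]["content"]
print(styled)
```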

Streaming

Set the stream option to True to enable streaming responses. Streaming can be used in full or rewrite mode.
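A streaming sketch is shown below. It assumes the endpoint returns server-sent-event style lines containing JSON chunks with an OpenAI-like delta structure; the wire format, URL, and auth header are assumptions, so adjust the parsing to match the actual response format.

```python
# Sketch of a streaming rewrite-mode request. The SSE "data: {...}" framing
# and the choices[0].delta.content chunk shape are assumptions.
import json
import requests

API_URL = "https://api.example.com/v1/inference"  # hypothetical endpoint URL
API_KEY = "YOUR_API_KEY"

payload = {
    "model": "your-fine-tuned-model-id",
    "message": "Text to rewrite in your fine-tuned style.",
    "stream": True,  # request a streaming response
}

with requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    stream=True,
    timeout=60,
) as response:
    response.raise_for_status()
    for line in response.iter_lines():
        if not line:
            continue
        chunk = line.decode("utf-8")
        if chunk.startswith("data: "):
            chunk = chunk[len("data: "):]
        if chunk.strip() == "[DONE]":
            break
        data = json.loads(chunk)
        # Assumed chunk shape: {"choices": [{"delta": {"content": "..."}}]}
        delta = data.get("choices", [{}])[0].get("delta", {}).get("content", "")
        print(delta, end="", flush=True)
```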