Getting Started with LLMs for Text Generation

We are going to build a note-to-blog-draft generator that turns rough bullet points into a structured markdown article. It is useful for developers and technical writers who need to publish frequently but want to skip the blank-page problem.

What you'll need

Python 3.10 or newer.
The OpenAI SDK installed with pip install openai.
An Oxlo.ai API key from https://portal.oxlo.ai. Oxlo.ai uses flat per-request pricing, so generating a 2,000-word draft costs the same as a one-line reply. See https://oxlo.ai/pricing for plan details.

Step 1: Instantiate the client

Import the SDK and point it at Oxlo.ai. The base URL and API key are the only changes needed to turn the OpenAI client into an Oxlo.ai client.

from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.oxlo.ai/v1",
    api_key=os.environ.get("OXLO_API_KEY", "YOUR_OXLO_API_KEY")
)

# Sanity check
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say hello"},
    ],
)
print(response.choices[0].message.content)
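
The client above falls back to a placeholder string when OXLO_API_KEY is unset, so a missing key only shows up as an authentication error on the first request. One way to fail earlier is a small loader like the sketch below. The function name load_api_key is hypothetical and not part of the OpenAI SDK or Oxlo.ai; it only reads the environment.

```python
import os
from typing import Mapping, Optional


def load_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key from the environment or raise a clear error.

    Hypothetical helper: `env` defaults to os.environ and exists only so
    the function can be exercised without touching the real environment.
    """
    env = os.environ if env is None else env
    key = env.get("OXLO_API_KEY", "").strip()
    # Treat the tutorial's placeholder value the same as a missing key.
    if not key or key == "YOUR_OXLO_API_KEY":
        raise RuntimeError("Set OXLO_API_KEY before running the script")
    return key
```

If you adopt it, the client could be created with api_key=load_api_key() instead of the os.environ.get call.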

Step 2: Define the system prompt

The system prompt keeps the model in its lane. We want a technical writer that outputs only markdown, not commentary.

SYSTEM_PROMPT = """You are a technical writing assistant.
Your job is to convert rough bullet points into a well-structured markdown blog draft.
Follow these rules:
- Write an H1 title and logical H2 sections.
- Expand bullets into full paragraphs with concrete detail.
- Do not add fluff, emojis, or marketing hyperbole.
- Output only valid markdown.
- If the notes mention code, include a code block with syntax highlighting."""
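
A system prompt is a request, not a guarantee, so it can help to check the returned draft against the rules above. The sketch below is one rough way to do that, assuming a valid draft has exactly one H1, at least one H2, and no unclosed code fence. The function name check_draft_structure is hypothetical, and the check is a heuristic, not a markdown parser.

```python
FENCE = "`" * 3  # the three-backtick code fence marker


def check_draft_structure(markdown: str) -> list[str]:
    """Return a list of structural problems found in a generated draft."""
    problems = []
    in_fence = False
    h1_count = 0
    h2_count = 0
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith(FENCE):
            # Toggle fence state so '#' comments inside code are ignored.
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        if stripped.startswith("# "):
            h1_count += 1
        elif stripped.startswith("## "):
            h2_count += 1
    if h1_count != 1:
        problems.append(f"expected exactly one H1 title, found {h1_count}")
    if h2_count == 0:
        problems.append("expected at least one H2 section")
    if in_fence:
        problems.append("unclosed code fence")
    return problems
```

An empty list means the draft passed these checks. A non-empty list is a reasonable signal to regenerate or to review the draft by hand.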

Step 3: Build the generator function

This function accepts raw notes, packages them with the system prompt, and returns the finished draft. We use Llama 3.3 70B because it follows long instructions reliably, but Oxlo.ai also offers Qwen 3 32B and Kimi K2.6 if you need multilingual or vision support later.

def generate_draft(notes: str, model: str = "llama-3.3-70b") -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Turn these notes into a blog draft:\n\n{notes}"},
        ],
        temperature=0.7,
        max_tokens=4096,
    )
    return response.choices[0].message.content
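
Because max_tokens caps the reply, a long draft can be cut off mid-sentence. In the OpenAI chat completions format, a choice whose finish_reason is "length" stopped at the token limit. The sketch below assumes Oxlo.ai returns that field the same way, which this tutorial does not confirm. The helper name extract_draft is hypothetical.

```python
def extract_draft(response) -> tuple[str, bool]:
    """Return (draft_text, was_truncated) from a chat completion response.

    Assumes an OpenAI-style response object; `was_truncated` is True when
    the model stopped because it reached max_tokens.
    """
    choice = response.choices[0]
    text = choice.message.content or ""  # content can be None
    truncated = choice.finish_reason == "length"
    return text, truncated
```

A caller could use the flag to warn the user, raise max_tokens, or ask the model to continue from where it stopped.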

Step 4: Stream the output

For long drafts, waiting for the full response feels slow. Adding streaming lets you print tokens as they arrive. The Oxlo.ai endpoint supports the standard OpenAI streaming format with no cold starts on popular models.

def generate_draft_stream(notes: str, model: str = "llama-3.3-70b"):
    stream = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Turn these notes into a blog draft:\n\n{notes}"},
        ],
        temperature=0.7,
        max_tokens=4096,
        stream=True,
    )
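    for chunk in stream:
        # Some servers send chunks with an empty choices list (for example a final usage chunk)
        if not chunk.choices:
            continue
        token = chunk.choices[0].delta.content
        if token:
            print(token, end="", flush=True)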
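
The streaming function prints tokens but returns nothing, so the draft is lost once it scrolls past. The sketch below shows one way to print and keep the text at the same time. The names iter_tokens and collect_draft are hypothetical, and the code assumes each chunk follows the OpenAI streaming shape (choices[0].delta.content).

```python
from typing import Iterable, Iterator


def iter_tokens(stream: Iterable) -> Iterator[str]:
    """Yield text fragments from an OpenAI-style streaming response."""
    for chunk in stream:
        # Skip chunks that carry no choices at all.
        if not chunk.choices:
            continue
        token = chunk.choices[0].delta.content
        # The content field is None or empty on role-only and final chunks.
        if token:
            yield token


def collect_draft(stream: Iterable, echo: bool = True) -> str:
    """Optionally print tokens as they arrive and return the full draft."""
    parts = []
    for token in iter_tokens(stream):
        if echo:
            print(token, end="", flush=True)
        parts.append(token)
    return "".join(parts)
```

Passing the object returned by client.chat.completions.create(..., stream=True) to collect_draft would give you the complete markdown as a string, ready to write to a file.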

Run it

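Save the script as draft_generator.py, with SYSTEM_PROMPT and generate_draft_stream from Steps 2 and 4 defined before the final call. Export your key as OXLO_API_KEY, then run python draft_generator.py with the sample notes: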

from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.oxlo.ai/v1",
    api_key=os.environ.get("OXLO_API_KEY", "YOUR_OXLO_API_KEY")
)

sample_notes = """
- Topic: Using Oxlo.ai for long-context text generation
- Oxlo.ai has flat per-request pricing, not per-token
- Good for agents and drafts that get long
- Compatible with OpenAI SDK
- Models: Llama 3.3 70B, DeepSeek V3.2, Kimi K2.6
"""

generate_draft_stream(sample_notes)

With the sample notes above, the generator outputs something like this:

# Using Oxlo.ai for Long-Context Text Generation

Generating long-form content with traditional token-based APIs can get expensive fast. Oxlo.ai takes a different approach with flat per-request pricing.

## Why Per-Request Pricing Matters

When you are building agents or writing first drafts, context windows fill up quickly. With token-based billing, every extra paragraph costs more. Oxlo.ai charges one flat rate per request, so a 100-token prompt and a 10,000-token prompt cost the same.

## OpenAI SDK Compatibility

You do not need to rewrite your stack. Point the official OpenAI SDK at https://api.oxlo.ai/v1, pass your Oxlo.ai API key, and you are ready to go.

## Model Options

Oxlo.ai hosts a range of open-source models. For general writing, Llama 3.3 70B is a solid default. If you need reasoning or coding mixed into the draft, DeepSeek V3.2 or Kimi K2.6 are available on the same endpoint.

Next steps

Wire the generator into a CLI tool that reads notes from a markdown file and writes the draft to stdout. If you publish regularly, consider upgrading to a paid plan at https://oxlo.ai/pricing so you are not limited by the daily request cap on long-form batches.
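
A minimal sketch of that CLI follows, using only the standard library. The function name run_cli and the notes_file argument are hypothetical. The generator is passed in as a callable, so the sketch does not depend on how the API call is made.

```python
import argparse
import sys
from pathlib import Path
from typing import Callable, Optional, Sequence


def run_cli(generate: Callable[[str], str], argv: Optional[Sequence[str]] = None) -> int:
    """Read notes from a file, generate a draft, and write it to stdout.

    `generate` is any function that maps notes to markdown, such as
    generate_draft from Step 3. Returns a process exit code.
    """
    parser = argparse.ArgumentParser(
        description="Turn a notes file into a markdown blog draft."
    )
    parser.add_argument("notes_file", type=Path, help="markdown file with rough notes")
    args = parser.parse_args(argv)

    notes = args.notes_file.read_text(encoding="utf-8").strip()
    if not notes:
        # Avoid spending a request on an empty prompt.
        print("notes file is empty", file=sys.stderr)
        return 1

    sys.stdout.write(generate(notes) + "\n")
    return 0
```

In draft_generator.py this could be called as raise SystemExit(run_cli(generate_draft)) under an if __name__ == "__main__": guard, then run with python draft_generator.py notes.md > draft.md.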