Unlocking the Potential of LLMs for Semantic Role Labeling

Semantic Role Labeling (SRL) is the shallow semantic parsing task that identifies predicate-argument structures in text, answering who did what to whom, where, when, and why. Traditional systems rely on heavy linguistic annotation, pipeline models, and domain-specific feature engineering. Large language models remove much of that overhead: with the right prompt and a capable inference backend, you can extract fine-grained semantic roles from raw text without training custom pipelines. For teams running SRL at scale, the backend choice directly affects cost and latency, especially when processing long documents or running agentic workflows that chain multiple reasoning steps. Oxlo.ai offers a developer-first inference platform with flat per-request pricing and a broad model catalog, which suits these workloads.

What is Semantic Role Labeling

At its core, SRL maps sentences to predicate-argument structures. Given a sentence such as "The committee approved the funding yesterday", an SRL system identifies "approved" as the predicate and assigns roles such as ARG0 (agent, "The committee"), ARG1 (theme, "the funding"), and ARGM-TMP (temporal modifier, "yesterday"). These numbered labels follow PropBank conventions; FrameNet instead uses frame-specific role names. Unlike dependency parsing, which focuses on syntactic heads, SRL targets semantic relations. This makes it valuable for question answering, event extraction, and knowledge base construction.
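The predicate-argument structure described above can be held in a small data structure. The sketch below is one possible representation, not a standard one: the `Argument` and `Predicate` class names and the `by_role` helper are illustrative assumptions, and the annotation is the hand-labeled example from the paragraph above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Argument:
    role: str  # PropBank-style label, e.g. "ARG0" or "ARGM-TMP"
    text: str  # surface span copied from the sentence


@dataclass
class Predicate:
    verb: str
    arguments: list[Argument] = field(default_factory=list)

    def by_role(self, role: str) -> list[str]:
        """Return the text of every argument that carries the given role."""
        return [a.text for a in self.arguments if a.role == role]


# Hand-written annotation of "The committee approved the funding yesterday".
approved = Predicate(
    verb="approved",
    arguments=[
        Argument("ARG0", "The committee"),
        Argument("ARG1", "the funding"),
        Argument("ARGM-TMP", "yesterday"),
    ],
)

print(approved.by_role("ARG0"))  # ['The committee']
```

Keeping the role as a plain string leaves room for whichever label inventory a project adopts.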

From Pipeline Models to Prompts

Classical SRL systems use BiLSTM-CRF architectures or BERT-based token classifiers. These models require annotated training data for each domain and language, and they struggle with implicit arguments or long-range dependencies. LLMs shift the problem from model training to prompt design. A sufficiently capable model can perform zero-shot SRL given a clear definition of roles and a structured output format. Few-shot examples improve consistency, particularly for ambiguous predicates or non-canonical word order.
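As a rough illustration of the shift from training to prompt design, the sketch below assembles a chat message list from a role definition, optional few-shot demonstrations, and the sentence to label. The names `ROLE_GUIDE`, `FEW_SHOT`, and `build_srl_messages` are hypothetical, and the single demonstration is hand-written; it uses a passive sentence to show non-canonical word order.

```python
import json

# Short role definitions placed in the system message (illustrative subset).
ROLE_GUIDE = (
    "ARG0 = agent, ARG1 = patient or theme, "
    "ARGM-TMP = time, ARGM-LOC = location."
)

# Hand-written demonstration; real ones should come from your own domain.
FEW_SHOT = [
    (
        "The report was written by the intern.",
        {"predicates": [{"verb": "written", "arguments": [
            {"role": "ARG1", "text": "The report"},
            {"role": "ARG0", "text": "by the intern"},
        ]}]},
    ),
]


def build_srl_messages(sentence, examples=FEW_SHOT):
    """Build chat messages: role definitions, demonstrations, then the query."""
    messages = [{
        "role": "system",
        "content": "Label semantic roles and answer with JSON only. " + ROLE_GUIDE,
    }]
    for text, annotation in examples:
        # Each demonstration is a user turn followed by the ideal assistant turn.
        messages.append({"role": "user", "content": f"Sentence: {text}"})
        messages.append({"role": "assistant", "content": json.dumps(annotation)})
    messages.append({"role": "user", "content": f"Sentence: {sentence}"})
    return messages
```

Passing `examples=[]` gives the zero-shot variant, which makes it easy to compare the two settings on the same sentences.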

Prompt Engineering and Structured Output

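To get reliable SRL output from an LLM, constrain the response format. JSON mode reduces parsing overhead and makes it far less likely that the model drifts into explanatory prose. Oxlo.ai supports JSON mode and function calling across its chat models, so you can request structured output directly through the API. Note that `json_object` mode in OpenAI-compatible APIs constrains the reply to valid JSON, not to a particular schema, so the field names still have to be spelled out in the prompt and checked on receipt.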

Below is a minimal Python example using the OpenAI SDK with Oxlo.ai. The example sends a sentence and requests a JSON object containing predicates and their arguments.

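import os

import openai

# Point the OpenAI SDK at Oxlo.ai's OpenAI-compatible endpoint.
client = openai.OpenAI(
    api_key=os.environ["OXLO_API_KEY"],  # read the key from the environment
    base_url="https://api.oxlo.ai/v1",
)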

prompt = """Perform semantic role labeling on the sentence below.
Return valid JSON with a "predicates" array. Each predicate must contain
a "verb" field and an "arguments" array. Each argument must have a
"role" and a "text" span.

Sentence: The committee approved the funding for the project yesterday.

Desired schema:
{
  "predicates": [
    {
      "verb": "approved",
      "arguments": [
        {"role": "ARG0", "text": "The committee"},
        {"role": "ARG1", "text": "the funding for the project"},
        {"role": "ARGM-TMP", "text": "yesterday"}
      ]
    }
  ]
}"""

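# JSON mode asks the backend to return a single JSON object.
response = client.chat.completions.create(
    model="qwen3-32b",
    messages=[{"role": "user", "content": prompt}],
    response_format={"type": "json_object"},
)

# The annotation arrives as a JSON string in the first choice.
print(response.choices[0].message.content)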

Running this against Oxlo.ai returns structured annotations without custom training data. You can swap qwen3-32b for llama-3.3-70b or deepseek-r1-671b depending on whether you need general-purpose fluency or deep reasoning for complex, nested clauses.
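Whichever model you pick, it is worth checking the reply before passing it downstream. The sketch below is one way to do that under simple assumptions: `validate_srl` is a hypothetical helper, the expected shape is the "predicates" schema from the prompt above, and a span counts as valid if it appears verbatim in the sentence.

```python
import json


def validate_srl(raw, sentence):
    """Return a list of problems in a model reply; an empty list means it passed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]

    predicates = data.get("predicates") if isinstance(data, dict) else None
    if not isinstance(predicates, list):
        return ['expected an object with a "predicates" list']

    problems = []
    for i, pred in enumerate(predicates):
        if not isinstance(pred, dict):
            problems.append(f"predicate {i}: not an object")
            continue
        verb = pred.get("verb")
        if not isinstance(verb, str) or verb not in sentence:
            problems.append(f"predicate {i}: verb {verb!r} not found in sentence")
        arguments = pred.get("arguments")
        if not isinstance(arguments, list):
            problems.append(f"predicate {i}: missing arguments list")
            continue
        for arg in arguments:
            role = arg.get("role") if isinstance(arg, dict) else None
            text = arg.get("text") if isinstance(arg, dict) else None
            # Assumption: every role label starts with "ARG" (ARG0, ARGM-TMP, ...).
            if not isinstance(role, str) or not role.startswith("ARG"):
                problems.append(f"predicate {i}: unexpected role {role!r}")
            # Substring check: catches spans the model invented or paraphrased.
            if not isinstance(text, str) or text not in sentence:
                problems.append(f"predicate {i}: span {text!r} not found in sentence")
    return problems


sentence = "The committee approved the funding for the project yesterday."
reply = (
    '{"predicates": [{"verb": "approved", "arguments": '
    '[{"role": "ARG0", "text": "The committee"}]}]}'
)
print(validate_srl(reply, sentence))  # []
```

The substring test is deliberately crude. It will not notice a span that is real but attached to the wrong role, so treat it as a first filter, not an accuracy measure.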

Choosing a Model for SRL

Not every SRL workload needs the same capability profile. Oxlo.ai hosts more than 45 models across seven categories, and several are particularly relevant for semantic parsing.

General-purpose accuracy: Llama 3.3 70B is a reliable default for English SRL with strong instruction following.
Deep reasoning: DeepSeek R1 671B MoE excels at disambiguating difficult predicates, light-verb constructions, and nested clauses that require multi-step analysis.
Multilingual and agentic tasks: Qwen 3 32B handles non-English SRL and integrates cleanly into agent workflows that chain tool calls.
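The guidance above can be reduced to a small routing rule. This is a deliberately simplified sketch: `pick_model` and its parameters are hypothetical, the model identifiers are the ones used earlier in this article, and a real deployment would likely weigh cost and latency as well.

```python
def pick_model(language="en", hard_case=False):
    """Choose a model identifier from the workload profile described above."""
    if hard_case:
        # Nested clauses, light-verb constructions, ambiguous predicates.
        return "deepseek-r1-671b"
    if language.lower() != "en":
        # Non-English input goes to the multilingual option.
        return "qwen3-32b"
    # Default for plain English SRL.
    return "llama-3.3-70b"


print(pick_model())                # llama-3.3-70b
print(pick_model(language="de"))   # qwen3-32b
print(pick_model(hard_case=True))  # deepseek-r1-671b
```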