This guide covers accessing publicly available data. Always review a site's robots.txt and Terms of Service before automated access.
## TL;DR
Give your AI agent direct, structured access to SEC EDGAR filings by calling AlterLab's Extract or Search API. The agent receives clean JSON, ready for LLMs, without handling anti-bot measures or parsing HTML.
## Why AI agents need SEC EDGAR data
AI agents benefit from SEC EDGAR data in several concrete ways:
- Regulatory filing monitoring: Track 10‑K, 10‑Q, and 8‑K filings for sentiment analysis or risk detection.
- Earnings data extraction: Pull financial tables and MD&A sections to feed into forecasting models.
- Compliance research: Scan for specific clauses or disclosures across thousands of filings to build a knowledge base for legal‑tech agents.
## Why raw HTTP requests fail for agents
Direct requests to sec.gov often run into obstacles that waste an agent's token budget and slow pipelines:
- Rate limiting: SEC EDGAR enforces per‑IP limits that cause HTTP 429 responses.
- JavaScript rendering: Some pages build their content with client‑side scripts, so a simple GET returns HTML without the filing data.
- Bot detection: Automated traffic may trigger CAPTCHA challenges or get the IP blocked entirely.
- Failed parsing: Agents spend tokens trying to extract data from malformed or incomplete HTML, reducing the useful context window.
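
To make the rate-limiting point concrete, here is a minimal sketch of the retry logic an agent would otherwise have to carry itself. The `fetch` callable, the `fetch_with_backoff` name, and the delay values are illustrative assumptions rather than part of any SEC or AlterLab interface; the sketch only assumes that a throttled request surfaces as status 429, as described above.

```python
import time
from typing import Callable, Tuple


def fetch_with_backoff(
    fetch: Callable[[], Tuple[int, str]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call `fetch` and retry on HTTP 429 with exponentially growing delays.

    `fetch` is a hypothetical callable returning (status_code, body).
    """
    status, body = fetch()
    for attempt in range(1, max_attempts):
        if status != 429:
            break
        # Wait base_delay, then 2x, then 4x, ... before the next attempt.
        sleep(base_delay * 2 ** (attempt - 1))
        status, body = fetch()
    return status, body
```

Every retry costs wall-clock time, and every failed body that reaches the model costs tokens, which is the overhead the sections below avoid.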
## Connecting your agent to SEC EDGAR via AlterLab
AlterLab's Extract API (`/api/v1/extract`) returns structured data directly and handles rendering and anti‑bot measures internally. See the Extract API docs for full schema options.
**Python example**

```python title="agent_sec-gov.py" {3-8}
import alterlab

client = alterlab.Client("YOUR_API_KEY")

# Request structured data from a filing page
result = client.extract(
    url="https://www.sec.gov/ixviewer/ix.html?doc=/Archives/edgar/data/1234567/000123456723000005/tsla-20231231.htm",
    schema={"title": "string", "filedDate": "string", "docType": "string"},
)
print(result.data)  # e.g. {'title': 'TSLA Form 10-K', 'filedDate': '2024-02-08', 'docType': '10-K'}
```

**cURL equivalent**
```bash title="Terminal"
curl -X POST https://api.alterlab.io/api/v1/extract \
  -H "X-API-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.sec.gov/ixviewer/ix.html?doc=/Archives/edgar/data/1234567/000123456723000005/tsla-20231231.htm",
    "schema": {"title": "string", "filedDate": "string", "docType": "string"}
  }'
```

The response is LLM‑ready JSON: no HTML stripping, no regex, no retries. This keeps the agent's context window focused on useful data.
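
Even with structured output, it can be worth checking that every requested field came back before the data enters the agent's context. The following is a minimal sketch under the assumption that the extracted result is a plain dictionary keyed by the schema's field names, as in the example output above; `missing_fields` is a hypothetical helper, not part of the AlterLab SDK.

```python
from typing import Any, Dict, List

# Same schema as in the Extract example above.
SCHEMA = {"title": "string", "filedDate": "string", "docType": "string"}


def missing_fields(data: Dict[str, Any], schema: Dict[str, str]) -> List[str]:
    """Return the schema keys that are absent or empty in the extracted data."""
    return [key for key in schema if data.get(key) in (None, "")]


sample = {"title": "TSLA Form 10-K", "filedDate": "2024-02-08", "docType": "10-K"}
print(missing_fields(sample, SCHEMA))  # prints []
```

An agent could skip or re-request any filing for which this list is non-empty instead of reasoning over partial data.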
## Using the Search API for SEC EDGAR queries
When you need to discover filings before extracting them, the Search API (/api/v1/search) returns a list of matching URLs with metadata. This is useful for building dynamic pipelines that react to new filings.
**Python example – search for recent Apple 10‑Ks**

```python title="agent_sec-search.py" {3-7}
import alterlab

client = alterlab.Client("YOUR_API_KEY")
results = client.search(
    query="Apple Inc 10-K",
    start_date="2023-01-01",
    end_date="2023-12-31",
    limit=5,
)

# Each result carries the filing URL plus metadata such as the filing date
for r in results.data:
    print(r["url"], r["filedAt"])
```

**cURL example**
```bash title="Terminal"
curl -X POST https://api.alterlab.io/api/v1/search \
  -H "X-API-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Apple Inc 10-K",
    "start_date": "2023-01-01",
    "end_date": "2023-12-31",
    "limit": 5
  }'
```

The search output gives you a curated set of URLs to feed into the Extract API, keeping the agent's tool calls minimal and efficient.
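
One way to keep those tool calls minimal is to de-duplicate and order the search results before extraction. The sketch below assumes each result is a dictionary with `url` and `filedAt` keys, as in the example above, and that `filedAt` is an ISO‑formatted date string so that string ordering matches date ordering; `select_urls` is a hypothetical helper name.

```python
from typing import Dict, List


def select_urls(results: List[Dict[str, str]], max_urls: int = 3) -> List[str]:
    """Return de-duplicated URLs, newest filing first, capped at max_urls."""
    # ISO date strings (YYYY-MM-DD) sort chronologically as plain strings.
    ordered = sorted(results, key=lambda r: r["filedAt"], reverse=True)
    seen = set()
    urls = []
    for r in ordered:
        if r["url"] not in seen:
            seen.add(r["url"])
            urls.append(r["url"])
    return urls[:max_urls]
```

Capping the list bounds both the number of Extract calls and the amount of filing data that ends up in the context window.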
## MCP integration
AlterLab provides an MCP server that lets Claude, GPT, or Cursor agents treat the Extract and Search APIs as first‑class tools. See the AlterLab for AI Agents tutorial to get started. This eliminates boilerplate code: the agent simply calls the tool with a URL and receives structured output.
## Building a regulatory filing monitoring pipeline
Here is an end‑to‑end example of an agent that watches for new Tesla filings, extracts key fields, and passes the data to an LLM for summarization.
**Pipeline outline**

- Agent triggers a scheduled tool call to AlterLab Search for recent Tesla 10‑K/10‑Q filings.
- For each result URL, the agent calls AlterLab Extract with a schema targeting `title`, `filedDate`, `docType`, and a custom field for `riskFactors`.
- The clean JSON is inserted into the agent's context window.
- An LLM receives the structured data and produces a short brief: “Tesla filed a 10‑K on 2024‑02‑08 highlighting supply‑chain risks.”
**Python pipeline snippet**

```python title="filing_monitor.py" {5-12}
import alterlab
from openai import OpenAI  # example LLM client

alterlab_client = alterlab.Client("YOUR_API_KEY")
llm_client = OpenAI(api_key="OPENAI_KEY")


def get_latest_tsla_filings():
    """Discover recent Tesla 10-K / 10-Q filings via the Search API."""
    search_res = alterlab_client.search(
        query="Tesla Inc 10-K OR 10-Q",
        start_date="2024-01-01",
        limit=3,
    )
    return search_res.data


def extract_filing_info(url):
    """Pull the key fields from a single filing page via the Extract API."""
    return alterlab_client.extract(
        url=url,
        schema={
            "title": "string",
            "filedDate": "string",
            "docType": "string",
            "riskFactors": "string",
        },
    ).data


def run_pipeline():
    """Search, extract, then ask the LLM for a summary of each filing."""
    filings = get_latest_tsla_filings()
    for f in filings:
        data = extract_filing_info(f["url"])
        prompt = f"Summarize the following SEC filing: {data}"
        response = llm_client.chat.completions.create(
            model="gpt-4-turbo",
            messages=[{"role": "user", "content": prompt}],
        )
        print(response.choices[0].message.content)


if __name__ == "__main__":
    run_pipeline()
```

This pipeline shows how an agent can move from discovery to extraction to reasoning without writing custom parsers or handling anti‑bot measures.
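
Because the pipeline runs on a schedule, it helps to remember which filings were already summarized so that each run only processes new ones. The following is a minimal sketch that keeps the state in a local JSON file; the `filter_new` helper and the state-file approach are illustrative assumptions, and a production agent might use a database instead.

```python
import json
from pathlib import Path
from typing import Iterable, List


def filter_new(urls: Iterable[str], state_file: Path) -> List[str]:
    """Return URLs not seen on earlier runs and record them in state_file."""
    seen = set(json.loads(state_file.read_text())) if state_file.exists() else set()
    # dict.fromkeys drops duplicates within this batch while keeping order.
    fresh = [u for u in dict.fromkeys(urls) if u not in seen]
    state_file.write_text(json.dumps(sorted(seen | set(fresh))))
    return fresh
```

Applied to the URLs returned by the search step, this keeps repeated runs from re-extracting and re-summarizing the same filing.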
<div data-infographic="steps">
<div data-step data-number="1" data-title="Agent requests data" data-description="LLM agent calls AlterLab tool with target URL"></div>
<div data-step data-number="2" data-title="AlterLab fetches + extracts" data-description="Handles anti-bot, returns structured JSON"></div>
<div data-step data-number="3" data-title="Agent uses clean data" data-description="No parsing, no retries — data goes straight to LLM context"></div>
</div>
## Key takeaways
- Use AlterLab's Extract API for immediate structured access to SEC EDGAR pages, bypassing rendering and anti‑bot hurdles.
- Leverage the Search API to build dynamic discovery workflows that feed extraction calls.
- Integrate via AlterLab's MCP server to treat web data as a native tool for LLM agents.
- Always verify robots.txt and rate limits; the responsibility for compliant access rests with the user.
- Cost scales with successful requests; review the pricing page for agent‑oriented estimates.