This guide walks through a complete workflow: pick a crawler, configure a squid, add tasks, run it, and download results — all via the API.
Prerequisites
You’ll need an API key. Find it in your lobstr.io dashboard under API in the sidebar.
Set it as an environment variable to use in the examples below:
export LOBSTR_API_KEY="your_api_key_here"
Step 1: Verify your credentials
Confirm your key is working before proceeding.
import os
import requests
API_KEY = os.environ["LOBSTR_API_KEY"]
headers = {"Authorization": f"Token {API_KEY}"}
response = requests.get("https://api.lobstr.io/v1/me", headers=headers)
user = response.json()
print(f"Logged in as: {user['first_name']} {user['last_name']} ({user['email']})")
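If the key is invalid, this call returns an error status and the JSON won't contain those fields. A minimal guard to place right after the GET, using only standard requests behavior:
# Fail fast if the key was rejected (e.g. HTTP 401).
if not response.ok:
    raise SystemExit(f"Authentication failed: {response.status_code} {response.text}")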
Step 2: Find a crawler
Crawlers define what site you’re scraping. List available crawlers and pick the one you need.
response = requests.get("https://api.lobstr.io/v1/crawlers", headers=headers)
crawlers = response.json()
for crawler in crawlers:
    print(f"{crawler['id']} {crawler['name']}")
Note the id of the crawler you want to use. For example, the Google Maps Reviews crawler.
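You can also pick the ID programmatically. A small sketch, assuming the listing above exposes a name field (the match string is illustrative):
# Select a crawler by name; adjust the match to the crawler you need.
crawler_id = next(c["id"] for c in crawlers if "Google Maps Reviews" in c["name"])
print(f"Using crawler: {crawler_id}")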
Step 3: Create a squid
A squid is your configured scraping project — it ties together a crawler, your settings, and your tasks.
payload = {
    "name": "My first squid",
    "crawler": "CRAWLER_ID"  # from Step 2
}
response = requests.post(
    "https://api.lobstr.io/v1/squids",
    headers={**headers, "Content-Type": "application/json"},
    json=payload
)
squid = response.json()
squid_id = squid["id"]
print(f"Squid created: {squid_id}")
Step 4: Add tasks
Tasks tell the squid what to scrape — typically URLs or search queries. The accepted keys depend on the crawler (use Get Crawler Parameters to check).
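If you want to confirm those keys in code first, you can query the crawler's parameters. The endpoint path below is an assumption for illustration; check the Get Crawler Parameters page of the API reference for the real route:
# NOTE: hypothetical path shown for illustration; see the API reference for the actual route.
response = requests.get(f"https://api.lobstr.io/v1/crawlers/{crawler_id}/params", headers=headers)
print(response.json())  # lists the task fields this crawler accepts
With the accepted keys confirmed, add your tasks: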
payload = {
    "squid": squid_id,
    "tasks": [
        {"url": "https://maps.google.com/?cid=1234567890"},
        {"url": "https://maps.google.com/?cid=0987654321"}
    ]
}
response = requests.post(
    "https://api.lobstr.io/v1/tasks",
    headers={**headers, "Content-Type": "application/json"},
    json=payload
)
result = response.json()
print(f"Added {len(result['tasks'])} tasks ({result['duplicated_count']} duplicates skipped)")
Step 5: Start a run
A run executes all pending tasks in the squid.
payload = {"squid": squid_id}
response = requests.post(
    "https://api.lobstr.io/v1/runs",
    headers={**headers, "Content-Type": "application/json"},
    json=payload
)
run = response.json()
run_id = run["id"]
print(f"Run started: {run_id}")
Step 6: Poll until complete
Check the run status periodically until it reaches a terminal state.
import time
terminal_statuses = {"done", "aborted", "error"}
while True:
    response = requests.get(f"https://api.lobstr.io/v1/runs/{run_id}", headers=headers)
    run = response.json()
    status = run["status"]
    print(f"Status: {status} — {run['total_results']} results so far")
    if status in terminal_statuses:
        print(f"Run finished: {run['done_reason']}")
        break
    time.sleep(10)
Typical runs complete in seconds to a few minutes depending on task count and concurrency. Avoid polling more frequently than every 5 seconds.
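To avoid polling forever if a run stalls, you can add a deadline. A minimal variant of the loop above, using the same fields:
# Same polling loop, but give up after a maximum wait (15 minutes here; tune as needed).
deadline = time.time() + 15 * 60
while time.time() < deadline:
    run = requests.get(f"https://api.lobstr.io/v1/runs/{run_id}", headers=headers).json()
    if run["status"] in terminal_statuses:
        break
    time.sleep(10)
else:
    raise TimeoutError(f"Run {run_id} did not reach a terminal state before the deadline")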
Step 7: Download results
Once the run is done, fetch your data.
response = requests.get(
    "https://api.lobstr.io/v1/results",
    headers=headers,
    params={"squid": squid_id, "limit": 100, "page": 1}
)
data = response.json()
print(f"Total results: {data['total_results']}")
for row in data["data"]:
    print(row)
For large datasets, iterate through pages using the page parameter. See the Pagination guide for details.
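A sketch of that loop, reusing the fields shown above. Stopping on the first empty page is an assumption; you could instead stop once you have collected total_results rows:
# Walk pages until one comes back empty.
all_rows = []
page = 1
while True:
    data = requests.get(
        "https://api.lobstr.io/v1/results",
        headers=headers,
        params={"squid": squid_id, "limit": 100, "page": page}
    ).json()
    rows = data["data"]
    if not rows:
        break
    all_rows.extend(rows)
    page += 1
print(f"Fetched {len(all_rows)} rows in total")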
Complete example
import os
import time
import requests
API_KEY = os.environ["LOBSTR_API_KEY"]
CRAWLER_ID = "YOUR_CRAWLER_ID"
headers = {"Authorization": f"Token {API_KEY}"}
json_headers = {**headers, "Content-Type": "application/json"}
# Create squid
squid = requests.post(
    "https://api.lobstr.io/v1/squids",
    headers=json_headers,
    json={"name": "Quickstart squid", "crawler": CRAWLER_ID}
).json()
squid_id = squid["id"]
# Add tasks
requests.post(
    "https://api.lobstr.io/v1/tasks",
    headers=json_headers,
    json={"squid": squid_id, "tasks": [{"url": "https://example.com"}]}
)
# Start run
run_id = requests.post(
    "https://api.lobstr.io/v1/runs",
    headers=json_headers,
    json={"squid": squid_id}
).json()["id"]
# Poll until done
while True:
    run = requests.get(f"https://api.lobstr.io/v1/runs/{run_id}", headers=headers).json()
    if run["status"] in {"done", "aborted", "error"}:
        break
    time.sleep(10)
# Fetch results
results = requests.get(
    "https://api.lobstr.io/v1/results",
    headers=headers,
    params={"squid": squid_id, "limit": 100, "page": 1}
).json()
print(f"Done — {results['total_results']} results collected")
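To run it, replace YOUR_CRAWLER_ID with the ID from Step 2, save the script (the filename here is illustrative), and execute:
export LOBSTR_API_KEY="your_api_key_here"
python quickstart.py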