Web Scraping

Extract Clean Data from Any URL

JavaScript Rendered · Markdown Output · Language Control · URL Discovery

# pip install supacrawlx
from supacrawlx import Client

client = Client("YOUR_API_KEY")
result = client.web.scrape(
    url="https://example.com/article",
    no_links=False,   # optional: set True to strip markdown links
    lang="en"         # optional: preferred language (ISO 639-1)
)
print(result.name)
print(result.content)
print(result.og_url)
print(result.count_characters)
print(result.urls)   # list of URLs found on the page

Features

What Makes This API Special

JavaScript Rendering

Full Chromium rendering means you get the complete DOM — including content loaded by React, Vue, and other JS frameworks.

Clean Markdown Output

Boilerplate, ads, and nav are stripped automatically. You get just the article or page content in clean markdown.

Link Control (noLinks)

Pass noLinks=true (no_links=True in the Python client) to strip all hyperlinks from the markdown response. Useful when you need only the text, without link syntax.
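
To make the effect concrete, here is a minimal local sketch of what stripping links from markdown looks like. It is an approximation for illustration only, not the service's implementation; the function name strip_markdown_links and the regular expression are assumptions.

```python
import re

# Matches inline markdown links such as [text](https://example.com).
# The negative lookbehind leaves image syntax ![alt](src) untouched.
_MD_LINK = re.compile(r"(?<!!)\[([^\]]*)\]\([^)]*\)")


def strip_markdown_links(markdown: str) -> str:
    """Replace each [text](url) with its visible text only (hypothetical helper)."""
    return _MD_LINK.sub(r"\1", markdown)


sample = "Read the [docs](https://example.com/docs) first."
print(strip_markdown_links(sample))
```

The same helper can be applied to content fetched without noLinks if you want to keep both the linked and the plain version.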

Language Selection (lang)

Request page content in a specific language using ISO 639-1 codes (e.g. en, de, fr). Works on sites that support multiple languages via Accept-Language headers.
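
As a rough sketch of the mechanism, the snippet below validates a two-letter code and builds the corresponding Accept-Language header. The helper name accept_language_headers is hypothetical, and the exact header value the service sends is an assumption.

```python
import re


def accept_language_headers(lang: str) -> dict[str, str]:
    """Build request headers for a two-letter ISO 639-1 code (hypothetical helper)."""
    code = lang.strip().lower()
    # Only the shape is checked here, not whether the code is actually assigned.
    if not re.fullmatch(r"[a-z]{2}", code):
        raise ValueError(f"expected a two-letter ISO 639-1 code, got {lang!r}")
    return {"Accept-Language": code}


print(accept_language_headers("DE"))
```

A site that ignores this header will return its default language regardless of the value passed.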

Open Graph & Metadata

Every response includes the page title (name), description, ogUrl (Open Graph URL), and countCharacters — ready for indexing or display.

URL Discovery

The urls[] array in every response lists all links found on the page. Use it to plan downstream crawls or map a site's internal link graph.
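
One way to use the list for crawl planning is to keep only same-host links. The sketch below assumes urls is a plain list of strings that may contain relative paths; the helper name internal_links is hypothetical.

```python
from urllib.parse import urljoin, urlparse


def internal_links(page_url: str, urls: list[str]) -> list[str]:
    """Return de-duplicated http(s) links that share page_url's host."""
    host = urlparse(page_url).netloc.lower()
    seen: set[str] = set()
    result: list[str] = []
    for raw in urls:
        absolute = urljoin(page_url, raw)  # resolves relative links
        parsed = urlparse(absolute)
        same_host = parsed.netloc.lower() == host
        if parsed.scheme in ("http", "https") and same_host and absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


found = ["/about", "https://example.com/about", "https://other.org/x"]
print(internal_links("https://example.com/article", found))
```

Feeding each returned link back into the scraper and repeating the filter gives a simple breadth-first map of a site.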

Use Cases

Platform-Specific Workflows

AI Training Data

Build web-scale text datasets for fine-tuning LLMs. Extract clean article text from thousands of URLs with a single script.
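
A batch script along these lines might look like the sketch below. The scrape argument is any callable that takes a URL and returns a dict with a "content" key; that shape, the function name build_dataset, and the JSONL output layout are assumptions for illustration.

```python
import json
from typing import Callable, Iterable


def build_dataset(urls: Iterable[str], scrape: Callable[[str], dict], out_path: str) -> int:
    """Scrape each URL and write one JSON object per line to out_path.

    Returns the number of records written; URLs that fail are skipped.
    """
    written = 0
    with open(out_path, "w", encoding="utf-8") as fh:
        for url in urls:
            try:
                text = scrape(url)["content"]
            except Exception as exc:  # keep going on per-URL failures
                print(f"skipped {url}: {exc}")
                continue
            fh.write(json.dumps({"url": url, "text": text}, ensure_ascii=False) + "\n")
            written += 1
    return written
```

With the client from the snippet at the top of the page, scrape could be a small wrapper such as lambda u: {"content": client.web.scrape(url=u).content}.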

Dataset · Fine-tuning · Scale

Competitive Intelligence

Monitor competitor pricing pages, job boards, and product updates without fragile custom scrapers.

Monitoring · Batch · Pricing

RAG Grounding

Feed up-to-date web content into your RAG pipeline for answers grounded in current information.
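
Before indexing, scraped markdown is usually split into smaller pieces. The sketch below shows one simple approach, grouping paragraphs up to a character budget; the helper name chunk_markdown and the default of 1000 characters are assumptions, and real pipelines often split by tokens instead.

```python
def chunk_markdown(content: str, max_chars: int = 1000) -> list[str]:
    """Group blank-line-separated paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars becomes its own oversized chunk.
    """
    chunks: list[str] = []
    current = ""
    for para in (p.strip() for p in content.split("\n\n")):
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            # Adding this paragraph would exceed the budget: close the chunk.
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be embedded and stored alongside the source URL so answers can cite where the text came from.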

RAG · LangChain · Real-time

FAQs

Web Scraping API Questions

What does the noLinks parameter do?

When noLinks is true, all markdown hyperlinks are removed from the content response. You receive plain text only — no [text](url) syntax. Useful for LLM input where links are noise.

How does the lang parameter work?

Pass an ISO 639-1 language code (e.g. en, de, fr, ja) to request the page content in that language. This sets the Accept-Language header on the request, so it only works on sites that support content negotiation.

What is included in the response?

Every scrape response includes: url, content (markdown), name (page title), description (meta description), ogUrl (Open Graph URL), countCharacters (character count of content), and urls[] (all links found on the page).
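
If you work with the raw response rather than the client object, a small typed container keeps the fields straight. The sketch below assumes the response is JSON with exactly the camelCase keys listed above and that only url and content are always present; the class name ScrapeResult is hypothetical.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ScrapeResult:
    url: str
    content: str                 # markdown body
    name: str = ""               # page title
    description: str = ""        # meta description
    og_url: str = ""             # from "ogUrl"
    count_characters: int = 0    # from "countCharacters"
    urls: list[str] = field(default_factory=list)

    @classmethod
    def from_json(cls, payload: str) -> "ScrapeResult":
        """Parse a JSON string, mapping camelCase keys to snake_case fields."""
        data = json.loads(payload)
        return cls(
            url=data["url"],
            content=data["content"],
            name=data.get("name", ""),
            description=data.get("description", ""),
            og_url=data.get("ogUrl", ""),
            count_characters=data.get("countCharacters", 0),
            urls=list(data.get("urls", [])),
        )
```

The snake_case attribute names mirror the ones printed in the Python snippet at the top of the page.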

Is this legal?

Scraping publicly accessible data is generally legal. Always respect robots.txt and terms of service. You are responsible for compliance with the target site's rules.

How fast is it?

Simple pages return in 1–3 seconds. JavaScript-heavy SPAs may take 5–8 seconds for full render. Cached pages are sub-second.

What does each scrape cost?

1 scrape request = 1 credit. Available on all plans.

Ready to Build Something Extraordinary?

Start with 100 free requests. No credit card. No setup fee. Ship your first AI-powered feature today.

