How It Works

From raw Redfin listings to enriched real estate datasets in five steps. Scrape, geocode, deduplicate, enrich, deliver.

01
Redfin search results and detail pages

Scrape Property Listings

The scraper collects property listings from Redfin based on your search parameters: location, price range, property type, and listing status. It supports both search result summaries and full detail pages, including price history, tax history, and property descriptions.

  • Configurable search filters (location, price, beds, baths, sqft, property type)
  • Search mode for listing summaries or Details mode for full property pages
  • Automatic pagination across all result pages
  • Built-in rate limiting and retry logic for reliability
Output: Raw property listings with address, price, beds, baths, sqft, and listing metadata
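The pagination and retry bullets describe a common scraping pattern. The sketch below shows exponential backoff plus a fixed pause between pages. It assumes a hypothetical `fetch_page(page)` callable that returns one page of listings and a has-next flag; the actor's real Redfin request logic is not shown.

```python
import time
from typing import Callable, Iterator

# Hypothetical fetcher: takes a 1-based page number and returns
# (listings_on_that_page, has_next_page). Not the actor's real API.
FetchPage = Callable[[int], tuple[list[dict], bool]]


def fetch_with_retry(fetch_page: FetchPage, page: int,
                     max_attempts: int = 4,
                     base_delay: float = 1.0) -> tuple[list[dict], bool]:
    """Call fetch_page, retrying with exponential backoff on any exception."""
    for attempt in range(max_attempts):
        try:
            return fetch_page(page)
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** attempt)  # waits 1x, 2x, 4x base_delay
    raise ValueError("max_attempts must be at least 1")


def paginate(fetch_page: FetchPage, min_interval: float = 0.5,
             base_delay: float = 1.0) -> Iterator[dict]:
    """Yield listings from every result page, pausing between requests."""
    page = 1
    while True:
        listings, has_next = fetch_with_retry(fetch_page, page, base_delay=base_delay)
        yield from listings
        if not has_next:
            break
        page += 1
        time.sleep(min_interval)  # simple fixed-interval rate limit
```

Passing `min_interval=0` and `base_delay=0` turns off the sleeps, which is convenient when testing against a fake fetcher.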
02
Geocodio geocoding + USPS Pub 28 standardization

Geocode and Normalize Addresses

Every address is geocoded to latitude/longitude coordinates and normalized to the USPS Publication 28 addressing standards. Geocoding also returns the census tract, county FIPS code, and CBSA (metro area) code, which serve as the geographic keys for joining government data in later steps.
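The 11-digit tract GEOID already contains the county FIPS code. It consists of a 2-digit state code, a 3-digit county code, and a 6-digit tract code, so county-level joins can reuse the first five digits. For example, `06` is California and `06075` is San Francisco County. The function name below is illustrative only.

```python
def split_tract_geoid(geoid: str) -> dict[str, str]:
    """Split an 11-digit census tract GEOID into state, county, and tract parts."""
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError(f"expected an 11-digit tract GEOID, got {geoid!r}")
    return {
        "state_fips": geoid[:2],   # 2-digit state FIPS code
        "county_fips": geoid[:5],  # 5-digit state + county FIPS, the county-level join key
        "tract": geoid[5:],        # 6-digit tract code, unique within its county
    }


print(split_tract_geoid("06075010100"))
# {'state_fips': '06', 'county_fips': '06075', 'tract': '010100'}
```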

  • Geocodio API returns lat/lng, census tract (11-digit GEOID), county FIPS, and CBSA
  • Addresses standardized to USPS Pub 28 format (abbreviations, directionals, unit designators)
  • Canonical address keys generated for reliable cross-dataset matching
  • Placekey POI identifiers assigned for entity resolution
Output: Geocoded coordinates, census tract, FIPS code, CBSA code, normalized address, Placekey
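The following is a simplified illustration of Pub 28 normalization and a canonical key. It assumes a small, hand-picked subset of the Publication 28 abbreviation tables and a hypothetical key format: a hash of the normalized street, normalized unit, and 5-digit ZIP. The actor's actual rules and key format are not documented here. A token-by-token mapping like this also cannot tell a directional from a street that is literally named "North".

```python
import hashlib
import re

# Small illustrative subset of USPS Pub 28 standard abbreviations:
# street suffixes, directionals, and secondary unit designators.
PUB28_ABBREVIATIONS = {
    "STREET": "ST", "AVENUE": "AVE", "BOULEVARD": "BLVD", "DRIVE": "DR",
    "ROAD": "RD", "LANE": "LN", "COURT": "CT", "PLACE": "PL",
    "NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W",
    "NORTHEAST": "NE", "NORTHWEST": "NW", "SOUTHEAST": "SE", "SOUTHWEST": "SW",
    "APARTMENT": "APT", "SUITE": "STE", "BUILDING": "BLDG", "FLOOR": "FL",
}


def normalize_address(text: str) -> str:
    """Uppercase, drop periods and commas, and abbreviate token by token."""
    tokens = re.sub(r"[.,]", " ", text.upper()).split()
    return " ".join(PUB28_ABBREVIATIONS.get(tok, tok) for tok in tokens)


def canonical_key(street: str, unit: str, zip_code: str) -> str:
    """Hypothetical canonical key: hash of normalized street, unit, and ZIP5."""
    parts = [normalize_address(street), normalize_address(unit), zip_code.strip()[:5]]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()[:16]


print(normalize_address("123 North Main Street, Apartment 4"))  # 123 N MAIN ST APT 4
```

Because both spellings normalize to the same components, `canonical_key("123 N. Main St.", "Apt 4", "94103-1234")` and `canonical_key("123 North Main Street", "Apartment 4", "94103")` produce the same key.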
03
4-layer deduplication pipeline

Deduplicate Records

Properties are deduplicated using four complementary methods to ensure every record in the output is unique. This matters when scraping overlapping search areas or combining data from multiple sources in Premium mode.

  • Layer 1: Canonical key fingerprinting from normalized address components
  • Layer 2: Fuzzy string matching for address variations and typos
  • Layer 3: Spatial deduplication using coordinate proximity
  • Layer 4: Cross-source merge for Premium multi-platform records
Output: Deduplicated property records, one record per property
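The sketch below is a compact version of layers 1 to 3. It assumes each record carries a hypothetical `key` (canonical fingerprint), a normalized `address` string, and `lat`/`lng`. How the layers combine is also an assumption. Here a fuzzy address match counts only when the coordinates are within a few meters, because coordinates alone cannot separate units in the same building. The thresholds are illustrative, and layer 4 (cross-source merge) is not shown.

```python
import math
from difflib import SequenceMatcher


def haversine_m(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in meters (haversine formula, spherical Earth)."""
    r = 6_371_000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def is_duplicate(a: dict, b: dict,
                 min_similarity: float = 0.95, max_distance_m: float = 10.0) -> bool:
    # Layer 1: identical canonical keys mean the same property.
    if a["key"] == b["key"]:
        return True
    # Layer 2: near-identical normalized addresses (typos, spelling variants).
    similar = SequenceMatcher(None, a["address"], b["address"]).ratio() >= min_similarity
    # Layer 3: geocoded points within a few meters of each other.
    nearby = haversine_m(a["lat"], a["lng"], b["lat"], b["lng"]) <= max_distance_m
    # Assumed combination rule: require both signals before merging.
    return similar and nearby


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first occurrence of each property (pairwise, O(n^2))."""
    kept: list[dict] = []
    for rec in records:
        if not any(is_duplicate(rec, seen) for seen in kept):
            kept.append(rec)
    return kept
```

The threshold matters. "123 N MAIN ST" and "125 N MAIN ST" score about 0.92, so a cutoff of 0.92 would merge two neighboring houses, while 0.95 keeps them apart. At scale, you would typically compare only records that share a ZIP code or spatial grid cell instead of comparing all pairs.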
04
48 fields from 13 government APIs

Enrich with Government Data

Each property record is automatically enriched with data from 13 government APIs. The geographic keys from Step 2 (census tract, FIPS, CBSA, ZIP, coordinates) are used to pull demographics, crime, flood risk, walkability, housing market trends, employment data, and more.

  • Census ACS: demographics, income, vacancy, age — by census tract
  • FBI Crime + FEMA: crime rates, flood zones, disaster history — by county and coordinates
  • Walk Score: walkability, transit, bike scores — by exact coordinates
  • HUD + FHFA + FRED + CFPB: rents, home prices, mortgage rates, loan data — by county and metro
  • BLS + IRS: employment, wages, income, migration — by county, metro, and ZIP
  • EPA: brownfield/superfund proximity — by coordinates
Output: 48 enrichment fields with per-field confidence scoring and granularity metadata
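Each join is a keyed lookup at the source's native geographic resolution. The minimal sketch below uses in-memory tables (a tract-keyed ACS table and a county-keyed HUD table) as stand-ins for the live API calls. The field names, the provenance structure, and the rule that finer geography earns higher confidence are all hypothetical, not the actor's documented scoring.

```python
from dataclasses import asdict, dataclass
from typing import Optional

# Hypothetical confidence rule: finer geography earns a higher tier.
CONFIDENCE_BY_GRANULARITY = {
    "point": "high", "tract": "high", "zip": "medium", "county": "medium", "cbsa": "low",
}


@dataclass
class EnrichedValue:
    value: object
    source: str       # API that supplied the value
    granularity: str  # geographic level the value describes
    confidence: str   # "high" | "medium" | "low"


def lookup(table: dict, key: str, source: str, granularity: str) -> Optional[EnrichedValue]:
    """Return a value with provenance, or None when the key has no data."""
    if key not in table:
        return None
    return EnrichedValue(table[key], source, granularity, CONFIDENCE_BY_GRANULARITY[granularity])


def enrich(record: dict, acs_by_tract: dict, hud_by_county: dict) -> dict:
    tract = record["census_tract"]  # 11-digit GEOID from the geocoding step
    county = tract[:5]              # county FIPS is the GEOID's first five digits
    fields = {
        # Hypothetical field names, one per granularity level.
        "median_household_income": lookup(acs_by_tract, tract, "Census ACS", "tract"),
        "fair_market_rent": lookup(hud_by_county, county, "HUD", "county"),
    }
    return {**record, **{k: (asdict(v) if v is not None else None) for k, v in fields.items()}}
```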
05
Clean, documented output ready to use

Deliver Structured JSON

The final output is a structured JSON dataset where every record contains the original listing data plus all enrichment fields. Each field includes provenance metadata so you know exactly where the data came from and at what geographic resolution.

  • Consistent JSON schema across all records
  • Per-field confidence tiers (high, medium, low) and value provenance
  • Completeness score indicating what percentage of enrichment fields are populated
  • Graceful degradation — if any single API is unavailable, other fields still populate
Output: Production-ready JSON dataset with property data + 48 enrichment fields
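Graceful degradation and the completeness score can be sketched together. The example assumes each data source is a hypothetical callable that returns a dict of fields or raises an exception on failure. It also assumes completeness is simply the percentage of expected fields that end up non-null.

```python
from typing import Callable

# Hypothetical source signature: record in, dict of enrichment fields out.
Source = Callable[[dict], dict]


def enrich_with_fallback(record: dict, sources: dict[str, Source],
                         expected_fields: list[str]) -> dict:
    out = dict(record)
    failed: list[str] = []
    for name, fetch in sources.items():
        try:
            out.update(fetch(record))
        except Exception:
            # One unavailable API only blanks its own fields; keep going.
            failed.append(name)
    for field in expected_fields:
        out.setdefault(field, None)  # keep the schema identical across records
    populated = sum(out[field] is not None for field in expected_fields)
    out["completeness"] = round(100 * populated / len(expected_fields), 1)
    out["failed_sources"] = failed
    return out
```

Filling missing fields with `None` instead of dropping them keeps every record on the same schema, which downstream CSV or database loaders usually expect.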

Technical Specifications

Runtime: Apify platform (Node.js)
Output formats: JSON, CSV, Excel
Deduplication: 4-layer (fingerprint, fuzzy, spatial, cross-source)
Address standard: USPS Publication 28
Geocoding: Geocodio (2,500 lookups/day)
APIs called: 13 government data sources
Enrichment fields: 48 (35 data + 13 granularity metadata)
Confidence scoring: Per-field, with value provenance
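For the CSV and Excel formats, nested per-field objects have to be flattened into columns. The sketch below shows one way to do this, assuming dotted column names such as `walk_score.value`. The actual export column naming is not specified here.

```python
import csv
import io


def flatten(record: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted column names, e.g. walk_score.value."""
    flat = {}
    for key, val in record.items():
        name = f"{prefix}{key}"
        if isinstance(val, dict):
            flat.update(flatten(val, prefix=f"{name}."))
        else:
            flat[name] = val
    return flat


def to_csv(records: list[dict]) -> str:
    rows = [flatten(r) for r in records]
    columns = sorted({col for row in rows for col in row})  # union of all columns
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)  # columns missing from a row are written as empty cells
    return buf.getvalue()
```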

Ready to try it?

Run your first real estate data scrape on Apify. Pay per result, no subscriptions or commitments.