How It Works
From raw Redfin listings to enriched real estate datasets in five steps. Scrape, geocode, deduplicate, enrich, deliver.
Scrape Property Listings
The scraper collects property listings from Redfin based on your search parameters — location, price range, property type, listing status. Both search result summaries and full detail pages are supported, including price history, tax history, and property descriptions.
- Configurable search filters (location, price, beds, baths, sqft, property type)
- Search mode for listing summaries or Details mode for full property pages
- Automatic pagination across all result pages
- Built-in rate limiting and retry logic for reliability
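As a rough illustration of how pagination, rate limiting, and retries can fit together, here is a minimal sketch. It is not the scraper's actual code: `fetch_page`, `fetch_with_retry`, `iter_listings`, and every default value are hypothetical names and numbers chosen for the example.

```python
import time
from typing import Callable, Iterator, List


def fetch_with_retry(fetch: Callable[[], List[dict]], max_attempts: int = 4,
                     base_delay: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> List[dict]:
    """Call fetch(), retrying with exponential backoff when it raises."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ...
    return []  # only reached when max_attempts is 0


def iter_listings(fetch_page: Callable[[int], List[dict]],
                  min_interval: float = 1.0,
                  sleep: Callable[[float], None] = time.sleep) -> Iterator[dict]:
    """Yield listings page by page until a page comes back empty."""
    page = 1
    while True:
        results = fetch_with_retry(lambda: fetch_page(page), sleep=sleep)
        if not results:
            return
        yield from results
        page += 1
        sleep(min_interval)  # crude rate limit between page requests
```

Passing `sleep` as a parameter keeps the sketch testable without real delays; a caller would normally leave it at the default.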
Geocode and Normalize Addresses
Every address is geocoded to latitude/longitude coordinates and normalized to USPS Publication 28, the Postal Service's addressing standard. Geocoding also returns the census tract, county FIPS code, and CBSA metro code. These are the geographic keys used to join government data in later steps.
- Geocodio API returns lat/lng, census tract (11-digit GEOID), county FIPS, and CBSA
- Addresses standardized to USPS Pub 28 format (abbreviations, directionals, unit designators)
- Canonical address keys generated for reliable cross-dataset matching
- Placekey POI identifiers assigned for entity resolution
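To make the idea of a canonical address key concrete, the sketch below normalizes a street line with a small subset of USPS Pub 28 abbreviations and joins the components into one key. The key layout (`street|city|state|zip5`) and the function names are assumptions for illustration, not the product's actual format, and the naive token mapping would mishandle streets whose name is itself a directional or suffix word.

```python
import re

# Small illustrative subset of USPS Pub 28 abbreviations (not the full tables).
SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "BOULEVARD": "BLVD",
            "DRIVE": "DR", "ROAD": "RD", "LANE": "LN"}
DIRECTIONALS = {"NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W"}
UNITS = {"APARTMENT": "APT", "SUITE": "STE"}


def normalize_street(line: str) -> str:
    """Uppercase, strip punctuation, and abbreviate known tokens."""
    tokens = re.sub(r"[.,]", " ", line.upper()).split()
    out = []
    for tok in tokens:
        out.append(SUFFIXES.get(tok) or DIRECTIONALS.get(tok)
                   or UNITS.get(tok) or tok)
    return " ".join(out)


def canonical_key(street: str, city: str, state: str, zip_code: str) -> str:
    """Build a hypothetical 'street|city|state|zip5' matching key."""
    parts = [normalize_street(street), city.upper().strip(),
             state.upper().strip(), zip_code.strip()[:5]]
    return "|".join(re.sub(r"\s+", " ", p) for p in parts)
```

Under these assumptions, `canonical_key("123 North Main Street", "Seattle", "WA", "98101")` and `canonical_key("123 N. Main St.", "seattle", "wa", "98101-1234")` produce the same key, which is what makes cross-dataset matching reliable.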
Deduplicate Records
Properties are deduplicated in four complementary layers so that each property appears only once in the output. This matters when scraping overlapping search areas or combining data from multiple sources in Premium mode.
- Layer 1: Canonical key fingerprinting from normalized address components
- Layer 2: Fuzzy string matching for address variations and typos
- Layer 3: Spatial deduplication using coordinate proximity
- Layer 4: Cross-source merge for Premium multi-platform records
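The first three layers can be sketched as a single pairwise check: exact key match, then fuzzy key similarity, then coordinate distance. This is an illustrative sketch only. The thresholds (0.92 similarity, 15 meters), the record fields (`key`, `lat`, `lng`), and the use of Python's `difflib` are assumptions, and the cross-source merge of Layer 4 is not shown.

```python
import math
from difflib import SequenceMatcher


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two lat/lng points."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def is_duplicate(a, b, fuzzy_threshold=0.92, max_distance_m=15.0):
    """Apply the layers in order, cheapest check first."""
    if a["key"] == b["key"]:  # Layer 1: canonical key
        return True
    ratio = SequenceMatcher(None, a["key"], b["key"]).ratio()
    if ratio >= fuzzy_threshold:  # Layer 2: fuzzy string match
        return True
    # Layer 3: spatial proximity
    return haversine_m(a["lat"], a["lng"], b["lat"], b["lng"]) <= max_distance_m


def deduplicate(records):
    """Keep the first record of each duplicate group (O(n^2) sketch)."""
    unique = []
    for rec in records:
        if not any(is_duplicate(rec, kept) for kept in unique):
            unique.append(rec)
    return unique
```

A production version would need extra guards that this sketch omits, for example to avoid merging distinct units in the same building, which share coordinates and have near-identical address strings.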
Enrich with Government Data
Each property record is automatically enriched with data from 13 government APIs. The geographic keys produced by geocoding in Step 2 (census tract, FIPS, CBSA, ZIP, coordinates) are used to pull demographics, crime, flood risk, walkability, housing market trends, employment data, and more.
- Census ACS: demographics, income, vacancy, age — by census tract
- FBI Crime + FEMA: crime rates, flood zones, disaster history — by county and coordinates
- Walk Score: walkability, transit, bike scores — by exact coordinates
- HUD + FHFA + FRED + CFPB: rents, home prices, mortgage rates, loan data — by county and metro
- BLS + IRS: employment, wages, income, migration — by county, metro, and ZIP
- EPA: brownfield/superfund proximity — by coordinates
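The join logic behind these lookups can be sketched in a few lines. An 11-digit census tract GEOID is built from a 2-digit state FIPS code, a 3-digit county code, and a 6-digit tract code, so the 5-digit county FIPS is simply its prefix. The `enrich` function, the in-memory lookup tables, and all sample values below are hypothetical placeholders, not real API responses.

```python
def split_tract_geoid(geoid: str) -> dict:
    """Derive state and county keys from an 11-digit tract GEOID."""
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError(f"expected 11-digit tract GEOID, got {geoid!r}")
    return {
        "state_fips": geoid[:2],   # 2-digit state code
        "county_fips": geoid[:5],  # state + 3-digit county code
        "tract_geoid": geoid,      # full 11-digit identifier
    }


def enrich(record: dict, tables: dict) -> dict:
    """Merge rows from lookup tables keyed by geographic resolution.

    `tables` maps a key name (e.g. "county_fips") to a dict of
    {key value: fields}; this stands in for the real API calls.
    """
    keys = split_tract_geoid(record["tract_geoid"])
    keys["cbsa"] = record.get("cbsa")
    keys["zip"] = record.get("zip")
    enriched = dict(record)
    for key_name, table in tables.items():
        row = table.get(keys.get(key_name))
        if row:
            enriched.update(row)
    return enriched


# Illustrative values only.
listing = {"address": "123 N MAIN ST", "tract_geoid": "53033005301",
           "cbsa": "42660", "zip": "98101"}
tables = {
    "tract_geoid": {"53033005301": {"median_income": 95000}},
    "county_fips": {"53033": {"flood_zone": "X"}},
}
result = enrich(listing, tables)
```

Keying each table by its own resolution keeps tract-level and county-level data separate, so a missing tract row does not block the county-level fields.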
Deliver Structured JSON
The final output is a structured JSON dataset where every record contains the original listing data plus all enrichment fields. Each field includes provenance metadata so you know exactly where the data came from and at what geographic resolution.
- Consistent JSON schema across all records
- Per-field confidence tiers (high, medium, low) and value provenance
- Completeness score indicating what percentage of enrichment fields are populated
- Graceful degradation — if any single API is unavailable, other fields still populate
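One way to picture per-field provenance, the completeness score, and graceful degradation together is the sketch below. The field layout (`value`, `source`, `resolution`, `confidence`), the resolution-to-tier mapping, and the function names are all invented for illustration; the actual output schema may differ.

```python
# Hypothetical mapping from geographic resolution to confidence tier.
TIER_BY_RESOLUTION = {"coordinates": "high", "tract": "high",
                      "county": "medium", "metro": "low"}


def run_enrichment(record: dict, sources: dict) -> dict:
    """Call each source; skip any that fails so the others still populate.

    `sources` maps a source name to (fetch_function, resolution).
    """
    fields = {}
    for name, (fetch, resolution) in sources.items():
        try:
            values = fetch(record)
        except Exception:
            continue  # source unavailable: leave its fields unpopulated
        for field, value in values.items():
            fields[field] = {
                "value": value,
                "source": name,
                "resolution": resolution,
                "confidence": TIER_BY_RESOLUTION.get(resolution, "low"),
            }
    return fields


def completeness(fields: dict, expected: list) -> float:
    """Percentage of expected enrichment fields that have a value."""
    populated = sum(1 for f in expected
                    if fields.get(f, {}).get("value") is not None)
    return round(100 * populated / len(expected), 1)


def failing_source(record):
    raise TimeoutError("simulated outage")


sources = {
    "census_acs": (lambda r: {"median_income": 95000}, "tract"),
    "fema": (lambda r: {"flood_zone": "X"}, "coordinates"),
    "bls": (failing_source, "county"),
}
fields = run_enrichment({"address": "123 N MAIN ST"}, sources)
score = completeness(fields, ["median_income", "flood_zone", "unemployment_rate"])
```

In this example the simulated `bls` outage leaves `unemployment_rate` unpopulated, so two of three expected fields are present and the score is 66.7.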
Technical Specifications
Ready to try it?
Run your first real estate data scrape on Apify. Pay per result, no subscriptions or commitments.