Live · Dataset

Indian Addresses Raw

4.37M raw Indian address records from MCA & bank branch sources

About

indian-addresses-raw is an open dataset of 4.37 million unstructured Indian addresses drawn from MCA (Ministry of Corporate Affairs) corporate filings and bank branch registries. Every record carries source_type, lifecycle_state, the declared state and pincode, latitude/longitude coordinates where available, and full provenance metadata. With 4,370,606 records, it is the largest publicly available corpus of raw Indian addresses. It is also the upstream source behind the gold-labeled training set and the Qwen3 Indian Address Parser model.

Features

  • 4,370,606 raw address records — largest public Indian address corpus
  • Sources: MCA corporate filings + Indian bank branch registries
  • Per-record source_type, lifecycle_state, and provenance metadata
  • Declared state and pincode fields, plus latitude and longitude where available
  • Cross-linked with indian-addresses-gold and the Qwen3 address parser model
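
Because the state and pincode are declared values and coordinates are present only for some records, a consumer will probably want to screen records before using them. The sketch below shows one way to do that in Python. It rests on several assumptions: the names raw_address, state, pincode, latitude, and longitude, the sample values "mca", "bank_branch", and "active", and the sample addresses are all illustrative and are not taken from the published schema. The bounding box is a rough envelope of India, not an official boundary.

```python
import re
from typing import Optional, TypedDict


# Hypothetical record shape: source_type and lifecycle_state are named in the
# listing; the remaining key names and all types are assumptions.
class AddressRecord(TypedDict):
    raw_address: str
    source_type: str
    lifecycle_state: str
    state: Optional[str]
    pincode: Optional[str]
    latitude: Optional[float]
    longitude: Optional[float]


# Indian PIN codes are six digits and never start with 0.
PINCODE_RE = re.compile(r"[1-9][0-9]{5}")


def has_valid_pincode(record: AddressRecord) -> bool:
    """Return True if the declared pincode is a well-formed six-digit PIN."""
    pincode = record.get("pincode")
    return isinstance(pincode, str) and PINCODE_RE.fullmatch(pincode.strip()) is not None


def has_coordinates(record: AddressRecord) -> bool:
    """Return True if both coordinates exist and fall in a rough India envelope."""
    lat, lon = record.get("latitude"), record.get("longitude")
    if lat is None or lon is None:
        return False
    # Approximate bounding box (assumption), generous on all sides.
    return 6.0 <= lat <= 38.0 and 68.0 <= lon <= 98.0


# Made-up sample rows for illustration only; not real dataset records.
SAMPLE_RECORDS: list[AddressRecord] = [
    {
        "raw_address": "12 MG Road, Bengaluru, Karnataka 560001",
        "source_type": "mca",
        "lifecycle_state": "active",
        "state": "Karnataka",
        "pincode": "560001",
        "latitude": 12.9756,
        "longitude": 77.6050,
    },
    {
        "raw_address": "Main Bazaar, Near Bus Stand",
        "source_type": "bank_branch",
        "lifecycle_state": "active",
        "state": "Punjab",
        "pincode": "14100",  # only five digits, so it fails validation
        "latitude": None,
        "longitude": None,
    },
]

# Keep only records with both a well-formed pincode and usable coordinates.
usable = [r for r in SAMPLE_RECORDS if has_valid_pincode(r) and has_coordinates(r)]
```

Checking the pincode format and the coordinate range separately keeps the two filters independent, so a geocoding task can require coordinates while a parsing task can accept any record with a well-formed pincode.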

Tags

Dataset, Indian Addresses, MCA, NLP, Open Data, Geospatial