Live · Dataset

Indian Addresses Gold

4,834 span-labeled Indian address records: the training data behind the Qwen3-0.6B Indian Address Parser

About

indian-addresses-gold is the gold-labeled training corpus behind the Qwen3-0.6B Indian Address Parser. Each of the 4,834 records contains 13 flat structured fields (house number, building, street, locality, sub-district, district, city, state, pincode, and more), canonical span-offset annotations for each field, and reviewer and review_state provenance fields that track whether the record was reviewed by a human or an LLM. The records were selected as a representative sample of the 4.37M-record raw corpus and then labeled. The dataset is cross-linked with both the raw dataset and the model card on Hugging Face.
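To make the span-offset idea concrete, here is a minimal sketch of how such a record might be read. The record layout, the "text" and "spans" keys, the field names, and the half-open (start, end) character offsets are all illustrative assumptions; the dataset's actual schema may differ.

```python
from typing import Dict, Tuple

# Hypothetical record: the keys and field names below are assumptions made
# for illustration, not the dataset's documented schema.
record = {
    "text": "12 MG Road, Indiranagar, Bengaluru, Karnataka 560038",
    "spans": {
        "house_number": (0, 2),
        "street": (3, 10),
        "locality": (12, 23),
        "city": (25, 34),
        "state": (36, 45),
        "pincode": (46, 52),
    },
}


def extract_fields(rec: dict) -> Dict[str, str]:
    """Slice each field's text out of the raw address using its span."""
    text = rec["text"]
    return {name: text[start:end] for name, (start, end) in rec["spans"].items()}


def spans_are_valid(rec: dict) -> bool:
    """Check that spans are in bounds, non-empty, and do not overlap."""
    length = len(rec["text"])
    ordered: list[Tuple[int, int]] = sorted(rec["spans"].values())
    previous_end = 0
    for start, end in ordered:
        if not (0 <= start < end <= length):
            return False
        if start < previous_end:  # overlaps the preceding span
            return False
        previous_end = end
    return True


fields = extract_fields(record)
print(fields["city"], fields["pincode"])
```

Storing offsets rather than only the extracted strings keeps each label tied to an exact position in the original address, which is what a span-based training setup typically needs.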

Features

  • 4,834 span-labeled records with 13 structured address fields
  • Canonical span-offset format for each field extraction
  • Reviewer & review_state provenance (human vs. LLM reviewed)
  • Training/validation/test splits used to fine-tune the Qwen3 parser
  • Cross-linked with indian-addresses-raw and the Qwen3 model card
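The reviewer and review_state provenance can be used to filter or audit records before training. The sketch below assumes hypothetical values ("human", "llm", "approved", "pending") and a hypothetical "id" key; the values actually used in the dataset are not stated here.

```python
from collections import Counter

# Hypothetical provenance rows: the value vocabulary is an assumption.
records = [
    {"id": 1, "reviewer": "human", "review_state": "approved"},
    {"id": 2, "reviewer": "llm", "review_state": "approved"},
    {"id": 3, "reviewer": "llm", "review_state": "pending"},
    {"id": 4, "reviewer": "human", "review_state": "approved"},
]


def count_by_reviewer(rows: list) -> Counter:
    """Tally how many records each reviewer type handled."""
    return Counter(row["reviewer"] for row in rows)


def select(rows: list, reviewer: str, review_state: str) -> list:
    """Keep only records matching the given reviewer and review state."""
    return [
        row
        for row in rows
        if row["reviewer"] == reviewer and row["review_state"] == review_state
    ]


reviewer_counts = count_by_reviewer(records)
human_approved = select(records, "human", "approved")
print(reviewer_counts, [row["id"] for row in human_approved])
```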

Tags

Dataset, Indian Addresses, NLP, Labeled Data, LoRA, Training Data