Live | Dataset

Indian Addresses Gold

4,834 span-labeled Indian address records: the training data behind the Qwen3-0.6B Indian Address Parser


About

indian-addresses-gold is the gold-labeled training corpus behind the Qwen3-0.6B Indian Address Parser. Each of the 4,834 records contains 13 flat structured fields (including house number, building, street, locality, sub-district, district, city, state, and pincode), canonical span-offset annotations for each field, and reviewer and review_state provenance fields that record whether a record was reviewed by a human or by an LLM. The records were selected as a representative sample of the 4.37M-record raw corpus and then labeled. The dataset is cross-linked with both the raw dataset and the model card on Hugging Face.
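To make the span-offset idea concrete, here is a minimal sketch of how a span-labeled record might be consumed. The record layout below (a `text` string plus a `spans` mapping of field name to `(start, end)` character offsets, end-exclusive) is an assumption for illustration; the dataset's actual schema and field names may differ, so check the dataset card before relying on it.

```python
from typing import Dict, Tuple

# Hypothetical record: the keys, field names, and offsets are illustrative only
# and are not taken from the published dataset schema.
record = {
    "text": "12 MG Road, Indiranagar, Bengaluru, Karnataka 560038",
    "spans": {
        "house_number": (0, 2),
        "street": (3, 10),
        "locality": (12, 23),
        "city": (25, 34),
        "state": (36, 45),
        "pincode": (46, 52),
    },
}


def extract_fields(text: str, spans: Dict[str, Tuple[int, int]]) -> Dict[str, str]:
    """Slice each labeled field out of the raw address using its offsets."""
    fields = {}
    for name, (start, end) in spans.items():
        # Reject offsets that fall outside the text or are empty/inverted.
        if not (0 <= start < end <= len(text)):
            raise ValueError(f"bad span for {name!r}: ({start}, {end})")
        fields[name] = text[start:end]
    return fields


def spans_overlap(spans: Dict[str, Tuple[int, int]]) -> bool:
    """Return True if any two spans share at least one character."""
    ordered = sorted(spans.values())
    return any(prev[1] > cur[0] for prev, cur in zip(ordered, ordered[1:]))


fields = extract_fields(record["text"], record["spans"])
print(fields["locality"], fields["pincode"])
```

Storing offsets rather than only the extracted strings lets a consumer check that every label points at real characters in the original address, which is useful when converting records into token-level training targets.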

Features

4,834 span-labeled records with 13 structured address fields
Canonical span-offset format for each field extraction
reviewer and review_state provenance fields (human vs. LLM reviewed)
Training/validation/test splits used to fine-tune the Qwen3 parser
Cross-linked with indian-addresses-raw and the Qwen3 model card
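As a rough illustration of how the provenance fields could be used, the sketch below tallies records by reviewer type and keeps only the human-reviewed ones. The field names `reviewer` and `review_state` come from the description above, but the values used here (`"human"`, `"llm"`, `"approved"`, `"pending"`) are assumptions, and the sample rows are invented for the example rather than drawn from the dataset.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Invented sample rows; the value vocabulary is an assumption, not the real one.
sample_records: List[Dict[str, str]] = [
    {"id": "a1", "reviewer": "human", "review_state": "approved"},
    {"id": "a2", "reviewer": "llm", "review_state": "approved"},
    {"id": "a3", "reviewer": "llm", "review_state": "pending"},
    {"id": "a4", "reviewer": "human", "review_state": "approved"},
]


def count_by_reviewer(records: Iterable[Dict[str, str]]) -> Counter:
    """Count how many records each reviewer type accounts for."""
    return Counter(rec["reviewer"] for rec in records)


def human_reviewed(records: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only the records whose reviewer field is 'human'."""
    return [rec for rec in records if rec["reviewer"] == "human"]


counts = count_by_reviewer(sample_records)
human_only = human_reviewed(sample_records)
print(counts["human"], counts["llm"], [rec["id"] for rec in human_only])
```

A filter like this is one way to build a stricter evaluation subset, for example scoring the parser only on records a person has checked.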
