Live · Dataset
Indian Addresses Gold
4,834 span-labeled Indian address records: the training data behind the Qwen3-0.6B Indian Address Parser
About
indian-addresses-gold is the gold-labeled training corpus behind the Qwen3-0.6B Indian Address Parser. Each of the 4,834 records contains 13 flat structured fields (house number, building, street, locality, sub-district, district, city, state, pincode, and more), a canonical span-offset annotation for each field, and reviewer and review_state provenance that records whether the record was reviewed by a human or by an LLM. The dataset was created by selecting and labeling a representative sample of the 4.37M-record raw corpus, and it is cross-linked with both the raw dataset and the model card on Hugging Face.
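To make the record layout concrete, here is a minimal sketch of how flat fields and span offsets might sit side by side, and how the two can be checked against each other. The key names (`text`, `fields`, `spans`, `label`, `start`, `end`), the sample address, and the use of end-exclusive character offsets are all assumptions for illustration; the dataset's actual schema may differ, so check the dataset card before relying on any of them.

```python
# Hypothetical record layout; real column names and offset conventions may differ.
record = {
    "text": "12 MG Road, Indiranagar, Bengaluru, Karnataka 560038",
    "fields": {
        "house_number": "12",
        "street": "MG Road",
        "locality": "Indiranagar",
        "city": "Bengaluru",
        "state": "Karnataka",
        "pincode": "560038",
    },
    # Assumed convention: character offsets into `text`, end-exclusive.
    "spans": [
        {"label": "house_number", "start": 0, "end": 2},
        {"label": "street", "start": 3, "end": 10},
        {"label": "locality", "start": 12, "end": 23},
        {"label": "city", "start": 25, "end": 34},
        {"label": "state", "start": 36, "end": 45},
        {"label": "pincode", "start": 46, "end": 52},
    ],
}


def mismatched_spans(record):
    """Return the labels whose span text differs from the flat field value."""
    text = record["text"]
    bad = []
    for span in record["spans"]:
        extracted = text[span["start"]:span["end"]]
        if extracted != record["fields"].get(span["label"]):
            bad.append(span["label"])
    return bad


print(mismatched_spans(record))
```

A check like this is a cheap sanity test to run after loading the data: an empty list means every span slices out exactly the text stored in the matching flat field.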
Features
- 4,834 span-labeled records with 13 structured address fields
- Canonical span-offset format for each field extraction
- Reviewer & review_state provenance (human vs. LLM reviewed)
- Training/validation/test splits used to fine-tune the Qwen3 parser
- Cross-linked with indian-addresses-raw and the Qwen3 model card
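The provenance and split features listed above lend themselves to simple filtering and tallying. The sketch below is an illustration only: the `split` column, the values `"human"`, `"llm"`, `"approved"`, and `"needs_review"`, and the sample rows are hypothetical stand-ins, since the page names the `reviewer` and `review_state` fields but does not list their allowed values.

```python
from collections import Counter

# Hypothetical rows; the field values and the `split` column are assumptions.
rows = [
    {"id": 1, "reviewer": "human", "review_state": "approved", "split": "train"},
    {"id": 2, "reviewer": "llm", "review_state": "approved", "split": "train"},
    {"id": 3, "reviewer": "human", "review_state": "approved", "split": "validation"},
    {"id": 4, "reviewer": "llm", "review_state": "needs_review", "split": "test"},
]


def provenance_counts(rows):
    """Count records per (split, reviewer) pair."""
    return Counter((row["split"], row["reviewer"]) for row in rows)


def human_reviewed(rows):
    """Keep only the records whose reviewer is marked as human."""
    return [row for row in rows if row["reviewer"] == "human"]


counts = provenance_counts(rows)
human_ids = [row["id"] for row in human_reviewed(rows)]
```

Tallying by split and reviewer in this way makes it easy to see, for example, whether an evaluation split contains only human-reviewed labels before using it as a benchmark.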
Tags
Dataset, Indian Addresses, NLP, Labeled Data, LoRA, Training Data