- Use Microsoft Presidio to replace occurrences of person names and of telephone or account-like numbers in the dataset
- Create an unlabelled workspace from the anonymized utterances
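
The anonymization step above might look roughly like the sketch below. It is not the script used in this exercise: the function names `build_presidio_anonymizer` and `anonymize_utterances` are hypothetical, and the entity list (`PERSON`, `PHONE_NUMBER`) is an assumption that would likely need extending to catch account-like numbers. It assumes `presidio-analyzer`, `presidio-anonymizer` and a spaCy English model are installed.

```python
from typing import Callable, Iterable, List


def build_presidio_anonymizer() -> Callable[[str], str]:
    """Return a function that masks names and phone numbers in a string.

    Presidio is imported lazily because loading it (and the spaCy model
    behind it) is slow and memory hungry.
    """
    from presidio_analyzer import AnalyzerEngine
    from presidio_anonymizer import AnonymizerEngine

    analyzer = AnalyzerEngine()      # loads the spaCy NLP model once
    anonymizer = AnonymizerEngine()

    def anonymize(text: str) -> str:
        # Assumed entity list; adjust to the identifiers in your data.
        findings = analyzer.analyze(
            text=text,
            entities=["PERSON", "PHONE_NUMBER"],
            language="en",
        )
        # With no operators given, each finding is replaced by a
        # placeholder naming its entity type.
        return anonymizer.anonymize(text=text, analyzer_results=findings).text

    return anonymize


def anonymize_utterances(
    utterances: Iterable[str], anonymize: Callable[[str], str]
) -> List[str]:
    """Apply an anonymizer to every utterance, preserving order."""
    return [anonymize(utterance) for utterance in utterances]
```

A typical call would be `anonymize_utterances(sample, build_presidio_anonymizer())`, where `sample` is a list of utterance strings. Building the anonymizer once and reusing it avoids reloading the model for every utterance.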
If you check ./data/abcd_unlabelled05.json, you will see the anonymized version of a sample of utterances.
Anonymization is expensive in CPU and memory (it uses a large spaCy language model), so running the script on the full set will take a while.
See the Load Data exercise for how to upload your data.