Chatbot with Privacy

Real-World Case Studies & Examples

It’s helpful to look at successful implementations and research that demonstrate privacy technologies in chatbots or AI assistants, to validate our approach:

  • Google’s Gboard (Federated Learning + DP): While not a chatbot, Google’s keyboard app is a prime example of PETs at scale. Google used federated learning combined with differential privacy to train Gboard’s next-word prediction models on user typing data without ever uploading raw keystrokes. They deployed “more than twenty Gboard language models” with formal privacy guarantees, ensuring that the trained models will not memorize specific users’ data. This shows that large-scale language models can be trained on sensitive data (user input) in a privacy-preserving way and still perform well, and it is proof that accuracy and privacy can coexist when techniques like federated learning and DP are cleverly applied. (A toy sketch of this pattern follows the FedBot example below.)


  • FedBot – Privacy-Preserving Chatbot via Federated Learning: FedBot is a research prototype (2023) that specifically explored an AI customer-support chatbot trained in a privacy-preserving manner. The authors combined federated learning with transformer models to train on customer service data across clients, rather than pooling all data centrally. They note that traditional training would violate user privacy and data regulations, so they kept data on-device. The result was a chatbot that could “deliver personalized and efficient service that meets data privacy regulations”. This case study reinforces that our HR bot could likewise train on decentralized HR data if needed, and still improve over time, all while complying with privacy laws.
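
To make the federated pattern behind Gboard and FedBot concrete, the sketch below averages clipped client updates and adds Gaussian noise before applying them to a shared model. It is a minimal NumPy illustration, not Google’s or FedBot’s actual code; the noise scale is arbitrary, and a real deployment would calibrate it to a formal (ε, δ) differential-privacy budget.

```python
import numpy as np

def local_update(global_w, X, y, lr=0.1):
    """One gradient step on a client's own data (a stand-in for real local training)."""
    grad = X.T @ (X @ global_w - y) / len(y)
    return global_w - lr * grad

def federated_round(global_w, clients, clip=1.0, noise_std=0.1):
    """Average clipped client deltas and add Gaussian noise (DP-FedAvg style)."""
    deltas = []
    for X, y in clients:
        delta = local_update(global_w, X, y) - global_w
        delta *= min(1.0, clip / (np.linalg.norm(delta) + 1e-12))  # bound each client's influence
        deltas.append(delta)
    noise = np.random.normal(0.0, noise_std * clip / len(clients), size=global_w.shape)
    return global_w + np.mean(deltas, axis=0) + noise

# three "clients", each keeping its raw data local; only noisy, clipped updates are shared
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(3)]
w = np.zeros(3)
for _ in range(10):
    w = federated_round(w, clients)
print(w)
```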

  • Amazon & Skyflow: Multi-Tenant Privacy-Safe Chatbot: In 2024, AWS published an architecture for a privacy-preserving chatbot using Amazon Bedrock (LLM) and Skyflow’s data privacy vault. In their design, each tenant’s (or each department’s) data goes through de-identification in the vault before being used by the AI, and the system was able to answer questions without exposing actual PII across tenants. Figure 3 in their post shows a multi-tenant Q&A system where the vault “serves as a privacy gateway, de-identifying sensitive data during the retrieval-augmented generation pipeline.” This real-life architecture is closely aligned with what we’ve proposed: it proved that, using a vault and tokenization, one can build a powerful AI assistant that remains compliant even in scenarios with strict data-isolation needs. While their example was multi-tenant, the same applies to our single-organization HR bot – it ensures any PII goes through a vault, significantly reducing risk.
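
The sketch below mimics that vault pattern with a toy TokenVault class: PII is swapped for opaque tokens before a query reaches retrieval or the LLM, and tokens are swapped back only when the answer is shown to the user. The class, the email-only regex, and the simulated model answer are illustrative stand-ins, not Skyflow’s or Amazon Bedrock’s actual APIs.

```python
import re
import uuid

class TokenVault:
    """Toy stand-in for a data privacy vault: swaps PII for opaque tokens and back."""
    def __init__(self):
        self.store = {}

    def deidentify(self, text):
        # email-only pattern, purely for illustration; a real vault covers many PII types
        def repl(match):
            token = f"PII_{uuid.uuid4().hex[:8]}"
            self.store[token] = match.group(0)
            return token
        return re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", repl, text)

    def reidentify(self, text):
        for token, original in self.store.items():
            text = text.replace(token, original)
        return text

vault = TokenVault()
query = "What is the leave balance for jane.doe@example.com?"

safe_query = vault.deidentify(query)
print(safe_query)  # only this tokenized text ever reaches retrieval and the LLM

# pretend the model echoes the token in its answer
token = safe_query.split()[-1].rstrip("?")
model_answer = f"{token} has 12 days of leave remaining."
print(vault.reidentify(model_answer))  # the token is swapped back before the user sees the answer
```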

  • PrivateGPT (Open Source Project): The PrivateGPT project on GitHub became popular as an example of an AI assistant that runs entirely locally on your data. It lets users ask questions about their documents without any Internet connection, meaning no data ever leaves the local machine. Many hobbyists and professionals tried it out and confirmed that one can use LLMs in a completely isolated way, albeit with some limitations on speed and model size. PrivateGPT’s success (it was trending on GitHub) demonstrates both the demand for and the feasibility of privacy-focused AI. For our use case, it is strong proof that even if we don’t use big cloud APIs, an in-house solution can be “good enough” while guaranteeing that absolutely no data leaks. Its repository and others like it provide reference implementations for embedding documents, doing local inference, etc., which we can draw on if we go the open-source route.
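
To give a flavor of fully local question answering, the sketch below embeds a handful of documents and a query on the local machine and picks the best-matching passage. It assumes the sentence-transformers package with a locally cached embedding model; PrivateGPT uses its own ingestion and inference stack, so treat this only as an illustration of the offline-retrieval idea.

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # assumes the package is installed locally

docs = [
    "Parental leave policy: employees are entitled to 16 weeks of paid leave.",
    "Expense policy: travel costs must be submitted within 30 days.",
]

# the embedding model runs locally once downloaded; no document text leaves the machine
model = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = model.encode(docs, normalize_embeddings=True)

query = "How long is parental leave?"
q_vec = model.encode([query], normalize_embeddings=True)[0]

scores = doc_vecs @ q_vec            # cosine similarity, since the vectors are normalized
print(docs[int(np.argmax(scores))])  # a local LLM would then answer using this passage as context
```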

  • Microsoft Presidio in Practice: Presidio, the PII anonymization framework we mentioned, has been used in various organizations to safeguard data. One public example is a demonstration of using Presidio to scrub sensitive information before feeding prompts to OpenAI models. The approach of a centralized proxy for PII removal has been vetted by practitioners as a way to enforce compliance across all applications using an LLM API. This pattern is reflected in our plan and is proven to reduce the risk of an application accidentally sending credit card numbers or personal data to an AI. The Presidio project itself is actively maintained on GitHub (Microsoft’s repo), showing real-world commitment to integrating PETs with AI systems.
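
A minimal version of that proxy pattern with Presidio might look like the sketch below: analyze the outgoing prompt, anonymize detected entities, and only then hand the text to an external model. The LLM call is left as a placeholder; the analyzer/anonymizer calls follow Presidio’s documented API, but recognizer coverage depends on how the engines are configured.

```python
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

# assumes Presidio and its default spaCy NLP model are installed
analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()

prompt = "Draft an onboarding email for Jane Doe, phone 212-555-0101, starting May 1."

# detect PII entities in the outgoing prompt, then replace them before it leaves the proxy
findings = analyzer.analyze(text=prompt, language="en")
safe_prompt = anonymizer.anonymize(text=prompt, analyzer_results=findings).text

print(safe_prompt)  # detected entities are replaced with placeholders such as <PERSON>
# response = call_llm(safe_prompt)  # hypothetical placeholder for the actual LLM API call
```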

  • Homomorphic Encryption for AI (Emerging): While not yet common in deployed chatbots, there have been pilot projects using Fully Homomorphic Encryption (FHE) to allow AI models to operate on encrypted data. For example, researchers have used FHE to run simple machine learning inferences on sensitive medical data without decrypting it. Companies like IBM offer an FHE toolkit for AI. In a chatbot context, one could envision encrypting a query such that even the server processing it can’t read it – only the result is decrypted for the user. This is bleeding-edge (and currently very slow for large models), but it’s an area to watch. The fact that FHE is being tried in AI shows the lengths we can go to for privacy. In a few years, perhaps an HR chatbot could use FHE to consult a model in the cloud with mathematically guaranteed secrecy of the inputs.
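
To give a feel for homomorphic inference, the sketch below uses the open-source TenSEAL library (CKKS scheme) to compute a linear score on an encrypted feature vector, so the dot product is evaluated without ever seeing the plaintext. This is an illustrative toy under the assumption that TenSEAL is installed; the parameters are common example values rather than tuned, production-grade settings.

```python
import tenseal as ts  # assumes the TenSEAL package is installed

# CKKS context for approximate arithmetic on encrypted real numbers
context = ts.context(ts.SCHEME_TYPE.CKKS, poly_modulus_degree=8192,
                     coeff_mod_bit_sizes=[60, 40, 40, 60])
context.global_scale = 2 ** 40
context.generate_galois_keys()

features = [0.5, 1.2, -0.3]   # sensitive input, encrypted on the client side
weights = [0.1, 0.4, 0.7]     # a simple linear model held by the server

enc_features = ts.ckks_vector(context, features)
enc_score = enc_features.dot(weights)  # computed without decrypting the features

print(round(enc_score.decrypt()[0], 4))  # only the key holder can decrypt the result
```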


Each of these cases – whether deployed products or research prototypes – validates one of the PET approaches we’ve included: federated learning and DP (Google, FedBot), vault-based anonymization (AWS/Skyflow), on-prem isolation (PrivateGPT), and advanced cryptography (FHE research). They give confidence that our strategy is not just theoretical; it is backed by successful implementations. We’ve also included links to repositories and references so you can explore these solutions:

  - Google Gboard with DP: research paper on federated learning with DP
  - FedBot paper: arXiv preprint demonstrating federated chatbot training
  - AWS & Skyflow blog: architecture diagrams and discussion
  - PrivateGPT repo: GitHub code for a local private chatbot
  - Microsoft Presidio: GitHub repository and docs for the anonymization toolkit

Exploring these resources will provide deeper insight into how to implement and fine-tune our HR chatbot with privacy enhancements.
