Used Models
By default, Sonar Legal uses LLMs from four trusted providers: Microsoft Azure, OpenAI, Anthropic, and Google. This offers our users the flexibility to choose or exclude specific models based on their comfort and compliance needs.
Sonar Legal can also optionally integrate with OpenRouter, which automatically routes a request to an alternative provider (such as the Anthropic or Google API) if your primary provider is unavailable.
Below are the confidentiality guarantees from each of our four providers:
Microsoft Azure
Microsoft explicitly states that it does not use your data to train or improve its models. Data submitted to Azure OpenAI Service remains confidential and is used solely for generating the requested output.
“Your prompts (inputs) and completions (outputs), your embeddings, and your training data:
- are NOT available to other customers.
- are NOT available to OpenAI.
- are NOT used to improve OpenAI models.
- are NOT used to train, retrain, or improve Azure OpenAI Service foundation models.
- are NOT used to improve any Microsoft or 3rd party products or services without your permission or instruction.
- Your fine-tuned Azure OpenAI models are available exclusively for your use.
The Azure OpenAI Service is operated by Microsoft as an Azure service; Microsoft hosts the OpenAI models in Microsoft's Azure environment and the Service does NOT interact with any services operated by OpenAI (e.g. ChatGPT, or the OpenAI API).”
— Microsoft Azure Privacy Documentation (Data, privacy, and security for Azure OpenAI Service)
OpenAI
Despite early misconceptions stemming from the initial launch of ChatGPT (when many legal professionals primarily interacted with the web app), OpenAI has since updated its practices. By default, OpenAI does not use API inputs to train or improve its models: your data is processed solely to generate the requested responses, and safeguards ensure that it is not retained or repurposed for training.
Many of the old concerns date from the early days of ChatGPT, when the web app and the API handled data differently. Although some lawyers still rely heavily on the web version (where data handling can differ), those worries are largely outdated for API usage. Any remaining model-improvement learning is limited to aggregated, anonymized usage patterns, much as other major tech companies refine their services based on broad user behavior: overall signals such as query reformulations or click-throughs, rather than stored text inputs.
“You own and control your data
- We do not train our models on your business data by default
- You own your inputs and outputs (where allowed by law)
- You control how long your data is retained (ChatGPT Enterprise)”
— OpenAI Enterprise Privacy Documentation (Enterprise privacy at OpenAI)
Google Vertex AI
Google establishes unambiguous data rights for its users, explicitly stating that it separates customer data from the broader Google or LLM training corpus.
“Essential Commitments
- Your data is your data. The data or content generated by a Generative AI Service prompted by Customer Data (“Generated Output”) is considered Customer Data that Google only processes according to customer's instructions. We continue to maintain that customers control their data and we process it according to the agreement(s) we have with each customer.
- Your data does not train our models. We recognize that customers want their data to be private and not be shared with the broader Google or Large Language Model training corpus. We do not use data that you provide us to train our own models without your permission.
- We provide enterprise-grade privacy and security. We provide Cloud AI offerings such as Vertex AI and foundational models with enterprise-grade safety, security, and privacy baked in from the beginning.”
— Google Cloud Privacy Documentation (Generative AI, Privacy, and Google Cloud)
Anthropic
Anthropic’s policies ensure that customer data is not used for model training without explicit consent. Anthropic implements an opt-in model for data usage, with contractual prohibitions against unauthorized training.
“Customer Content. As between the parties and to the extent permitted by applicable law, Anthropic agrees that Customer owns all Outputs, and disclaims any rights it receives to the Customer Content under these Terms. Anthropic does not anticipate obtaining any rights in Customer Content under these Terms. Subject to Customer’s compliance with these Terms, Anthropic hereby assigns to Customer its right, title and interest (if any) in and to Outputs. Anthropic may not train models on Customer Content from paid Services.”
— Anthropic Terms of Service (Commercial Terms of Service)
OpenRouter (Optional Fallback)
For maximum availability, Sonar Legal can optionally integrate with OpenRouter. In this configuration, if your primary provider (e.g., Azure OpenAI Service) is unavailable, OpenRouter automatically routes your request to an alternative provider (such as Anthropic or Google API). OpenRouter itself does not store any data; it simply passes along your request in accordance with the terms of the underlying providers.
Sonar Legal has opted out of OpenRouter's logging of prompts and completions, which OpenRouter would otherwise use to improve anonymous analytics features such as classification. We have also taken the extra measure of disabling any underlying OpenRouter provider that does not explicitly state that it refrains from using customer data to train or improve its models.
“Users have the ability to opt out of logging prompts and completions, which are used to improve anonymous analytics features like classification.”
— OpenRouter Privacy Policy (OpenRouter Privacy Policy)
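As a rough illustration of the fallback and provider-restriction behavior described above, the sketch below builds a request payload for OpenRouter's OpenAI-compatible chat completions endpoint. The model slugs, the "models" fallback list, and the "provider" preference field are assumptions drawn from OpenRouter's public API conventions, not Sonar Legal's actual implementation; exact field names and slugs should be checked against current OpenRouter documentation.

```python
import json


def build_openrouter_payload(prompt: str) -> dict:
    """Sketch of a fallback-aware OpenRouter request payload (assumed fields)."""
    return {
        # Primary model first; OpenRouter falls through the list in order
        # if the primary provider is unavailable (assumed model slugs).
        "models": [
            "openai/gpt-4o",                # primary
            "anthropic/claude-3.5-sonnet",  # first fallback
            "google/gemini-pro-1.5",        # second fallback
        ],
        "messages": [{"role": "user", "content": prompt}],
        # Restrict routing to providers that do not retain or train on
        # prompts, mirroring the opt-out described above (assumed field).
        "provider": {"data_collection": "deny"},
    }


payload = build_openrouter_payload("Summarize this clause.")
print(json.dumps(payload, indent=2))
```

The payload is only constructed here, never sent; in practice it would be POSTed to the chat completions endpoint with the usual authorization header.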