Is It Safe to Send Proprietary Code to Overseas LLM APIs? What Engineering Leaders Should Know

It depends. Sending code to an overseas LLM API carries risks that range from negligible to significant depending on what's in the code, which jurisdiction the API runs in, and what your organization's obligations are. This post gives you the honest breakdown so you can make an informed decision: grounded in the actual risk, not in fear or in naive assumptions that "it's fine."

The honest answer: it depends

Treating all overseas API calls as dangerous is too blunt a heuristic. Treating them as equivalent to a US-based API call misses real distinctions.

What actually matters:

What data is in your prompts (generic algorithms vs. proprietary business logic vs. secrets)
Which jurisdiction the API runs in and what legal obligations apply there
What the provider's data handling terms say
What your organization's own policies and contractual obligations require

An individual developer using an overseas API to get help with a side project's CRUD endpoint faces a different risk profile than an engineering team at a regulated financial institution sending prompts that contain transaction logic and internal service names. These are the same technical action with very different practical implications.

What actually leaves your machine

Understanding the risk starts with understanding what LLM APIs actually receive:

The prompt text. Everything you type, paste, or have your coding tool inject into the context window. For an AI coding assistant, this typically includes file contents, function bodies, variable names, and business logic.

Metadata. Request headers, timestamps, and in some cases client IP addresses. These have limited sensitivity for most use cases.

Tool call results. If you're running an agentic coding tool, file reads and shell outputs may also flow into the prompt before being sent.

What doesn't leave: Your local file system beyond what's in the active context window, your git history beyond what the tool reads, credentials not in scope. The context window is the boundary: whatever is in that window when the API call is made is what the provider sees.

The practical implication: most of the code you're working on at any given moment will flow through the API. Over a development session, that can be a significant portion of a service or feature.

Jurisdiction and legal exposure

Different countries have different legal frameworks governing what governments can compel companies to disclose about data they process. These frameworks vary in scope, the threshold required for an access request, the degree of judicial oversight involved, and the notification requirements owed to the subject.

This is general legal reality, not a claim about any specific company's behavior. No provider is going to advertise that government access requests affect their platform, and most access requests in any jurisdiction are rare. "Rare" differs from "impossible," though, and for some categories of data (particularly data with national security or economic intelligence value) the legal frameworks in different jurisdictions treat that data very differently.

The operative question is what legal obligations a provider faces if compelled to disclose your data, and whether that framework aligns with your organization's risk tolerance.

Countries differ meaningfully on this. EU law (including GDPR's international transfer restrictions) provides procedural protections that differ materially from those in some other jurisdictions: judicial oversight thresholds, notification obligations, and restrictions on outbound transfers. For your specific situation, consult your legal team; jurisdictional legal questions carry significant nuance.

See our GDPR and LLM data residency guide for the EU-specific framework.

When it's fine, when it isn't

Generally fine:

Open-source code with no proprietary content, where the "damage" from exposure would be zero because the code is already public
Generic algorithm or data structure questions where the code is a simplified illustration, not production IP
Organizations with no regulatory obligations governing data location and no competitive sensitivity in the relevant code
Prototyping and evaluation, where you're testing a capability rather than sending real production code

Worth careful consideration:

Proprietary business logic that represents a competitive advantage
Code adjacent to authentication, authorization, or cryptographic systems
Code that processes regulated data (health records, financial data, personal data of EU subjects)
Anything where your employment agreements, client contracts, or NDAs may restrict where code can be sent for processing
Organizations in regulated industries where compliance requires documented data handling controls

Generally a hard stop without documented controls:

Secrets, API keys, or credentials in scope (this is a hygiene problem first, an API problem second; fix this regardless of where you send code)
Code covered by government or defense contracts with specific data handling requirements
Organizations that have made contractual representations to customers about data processing locations

Reducing exposure without losing capability

You don't have to choose between "use AI coding tools" and "maintain data hygiene." Some practical approaches:

Minimize context. Configure your AI coding tool to limit how much it pulls into the context window. Many tools let you restrict which files are included, or you can manually scope prompts to only the relevant function rather than the whole file.

Sanitize secrets out of context. Environment variables, API keys, and credentials should never appear in code that reaches an LLM prompt. Use .gitignore patterns and IDE configuration to keep these out of scope. This is good practice independent of the API question.

Read the terms. Most major providers have explicit no-training clauses for API usage. Verify this is in the ToS for the API tier you're using, as it sometimes differs between consumer products and the API. Also check whether there are prompt logging policies.

Match tool to task. Not all coding work needs frontier model capability. For internal tooling, documentation, or test generation that doesn't involve sensitive IP, an overseas API may be perfectly acceptable. Reserve the scrutiny for the cases where it's warranted.

Use a provider with contractual data handling commitments. Enterprise tiers from major providers often include DPAs (data processing agreements) that give you enforceable terms, not just a ToS you agreed to by clicking.

See our data sovereignty for AI coding tools post for more on structuring this systematically.

A middle path

For teams that want to use frontier open-weight models but prefer Western inference, managed inference services that run open models on Western infrastructure are the practical middle path.

Models like GLM-5.2 (from Z.ai) and Kimi K2.7 Code (from Moonshot AI) are open-weight, with their parameters publicly available. That means a provider other than the original developer can serve them from Western infrastructure at the same model quality, with a different inference location.

Sota does exactly this: GLM-5.2 and Kimi K2.7 Code run on Cloudflare's network in the US, UK, Germany, Japan, and Australia. Prompts go to Cloudflare's infrastructure, not to Z.ai's or Moonshot's native servers. For teams that want the capability profile of these models without overseas inference, this is the practical solution.

Teams that require complete air-gapped inference on their own hardware will need a different solution. For the majority of teams where the concern is jurisdiction rather than absolute isolation, inference on Cloudflare's Western network provides a clear, verifiable answer to "where does my code go?"

For a full look at open models on Western infrastructure, see our post on open models and Western infrastructure choices.

Get started with Sota to run GLM-5.2 and Kimi K2.7 Code on Cloudflare's Western network (US, UK, Germany, Japan, and Australia) without any changes to your Claude Code workflow.