Why Private Internal Search is Safer Than Public Chatbots for Drafting and Research

The Architecture Is the Argument

There is a category error embedded in how most professional services firms have absorbed generative AI. The error is treating a public chatbot — ChatGPT, Gemini, Claude.ai, the consumer Copilot at copilot.microsoft.com — as if it were a search engine, a word processor, or a research database. It is none of those things. It is a third party to which your staff is submitting communications, files, and questions in a contractually defined relationship governed by that vendor's terms of service.

Every prompt is a transmission. Every uploaded document is a disclosure. The transmission goes to the vendor, may be routed through its model providers, infrastructure hosts, and (in some cases) contracted human reviewers, and is retained according to the vendor's policies — which the vendor can change, and which a court can override.

Internal search is a different architecture entirely. A properly configured internal search system runs over your firm's own document corpus, on infrastructure you control, returning answers and excerpts drawn from your own files. The prompt does not leave the system. The documents are not transmitted to a third party. There is no subprocessor chain and no terms-of-service negotiation. And there is no third-party log for a court to order a vendor to produce; whatever records exist sit in systems the firm administers and answers for directly.

This advisory is not an argument that public chatbots have no place in a professional services firm. It is an argument that the workflows where they are currently being used — drafting, research, summarization, client communication preparation — are the workflows where the architectural mismatch matters most, and where internal search is the more defensible default.

Risk Exposure

A prompt sent to a public chatbot is not equivalent to a Google search. It is a contractually submitted communication to a third-party processor. Under the terms of service of every major consumer AI product, the prompt and any attachments become data the vendor processes, may retain, and — in some products and tiers — may use to improve its models.

If the prompt contains client information, that information has left the firm's control the moment the staff member clicks send.

Five Ways a Prompt Becomes a Disclosure

The following are routine workflows observed in professional services firms in 2025 and 2026. Each describes a real pattern of use, not a hypothetical worst case. In each, the staff member is acting in good faith and in a way the firm has not explicitly prohibited.

The Associate Pastes the Opposing Brief

A junior associate working on a response brief pastes the opposing party's filed brief into a public chatbot and asks for a summary of the strongest arguments and a draft of three counter-positions. The brief is a public court filing; the prompt history, however, now contains the name of the case, the firm's draft strategy, and the associate's working notes — all submitted to a third party with no engagement-letter authorization.

Data exposed: case name, draft strategy, work product

The CPA Uploads the Return

A tax preparer uploads a draft return (or a portion of it) into a chatbot and asks for an explanation of a deduction the client is taking, or for plain-English language to explain the result. The return contains the client's name, address, identifying numbers, dependents, and financial profile. The chatbot now has all of it. The engagement letter the client signed authorized data sharing with named subprocessors of the tax software. The chatbot was not on that list.

Data exposed: client identity, financial profile, identifiers, dependents

The Wealth Advisor Pastes the Statement

A wealth advisor preparing for a client meeting pastes a custodian statement into a chatbot and asks for a summary of asset allocation, performance, and notable concentrations. The statement contains the client's name, account number, balances, holdings, and the firm's branding. The advisor receives a clean summary in seconds. The chatbot vendor now has a non-public portfolio of an identified high-net-worth household. Under SEC Reg S-P as amended, this category of customer information is subject to specific safeguarding obligations, and the advisor's firm has just allowed it to be processed externally without notice to the client or any oversight of the vendor.

Data exposed: nonpublic personal information, account numbers, holdings

The Paralegal Researches the Active Matter

A paralegal asks a public chatbot for a summary of case law on a specific issue arising in an active matter. To get a useful answer, they describe the facts: the parties, the jurisdiction, the procedural posture, and the issue. The facts are now stored against an account that is not subject to any data processing agreement the firm has negotiated. If the chatbot vendor is later sued, in any matter, that query can be produced as discoverable electronically stored information.

Data exposed: matter facts, parties, jurisdiction, work product

The Browser Sidebar Reads Everything

Several major browsers now ship with integrated AI sidebars, such as Copilot in Edge and Gemini in Chrome, and individual staff add third-party AI extensions of their own. When enabled, these features can read the contents of the current tab to provide summaries and answer questions. A staff member with a sidebar enabled who opens a client document in a web-based portal may, depending on the feature's settings, have allowed an AI vendor to read that document. No prompt was sent. The user took no action. The exposure happened because the feature was on.

Data exposed: portal contents, client files, browser context

What January 5, 2026 Should Have Taught the Profession

On January 5, 2026, U.S. District Judge Sidney H. Stein of the Southern District of New York affirmed a magistrate judge's order compelling OpenAI to produce twenty million de-identified ChatGPT conversation logs in the consolidated copyright litigation brought by The New York Times, the Chicago Tribune, and other news organizations. The plaintiffs had originally requested 120 million logs. OpenAI offered twenty million as a sample, representing roughly half of one percent of the tens of billions of consumer logs it had preserved under earlier court order.

The legal reasoning is the part professional services firms need to understand. OpenAI argued that producing the logs would violate user privacy. The court rejected that argument, distinguishing the case from prior precedent involving wiretapped communications. The court's reasoning was direct: ChatGPT users, unlike the subjects of a wiretap, voluntarily submitted their communications to the platform. The privacy interest of a voluntary submitter to a third-party service does not defeat the relevance of those communications in litigation against the service.

The implications go well beyond the four corners of the OpenAI case. The decision indicates that user-submitted prompts to public AI services are discoverable electronically stored information that the AI vendor can be compelled to produce. The conversations are the vendor's records. The users are voluntary submitters. The litigation triggering production need not involve any of the users whose data is produced.

The earlier indefinite preservation order in the same matter, in effect from April through September 2025, ended on September 26, 2025, and OpenAI has returned to its standard thirty-day deletion policy for new consumer ChatGPT and standard API content. But the logs preserved during that window, and a specific subset of historical April-through-September 2025 data the plaintiffs continue to demand, remain held. ChatGPT Enterprise accounts and customers using the Zero Data Retention API were excluded from preservation; Free, Plus, Pro, and Team accounts were not.

Risk Exposure

Prompts sent to consumer-tier ChatGPT between April and September 2025 may, as of this writing, still be retained on OpenAI's systems, regardless of whether the user "deleted" the conversation. New prompts are subject to a thirty-day default retention. Both windows can be extended at any time by future litigation against the vendor, in any subject matter, with no notice to the user.

A firm whose staff used consumer-tier public chatbots in 2025 cannot represent to a client, a regulator, or a court that the prompts containing the client's data have been deleted. The retention status of those prompts is no longer in the firm's hands. It is in the hands of a federal docket.

Why This Matters For Your Practice

The architectural risk lands differently across the three audiences this advisory serves. Each has a regulatory or ethical framework that intersects public chatbot use directly.

For Law Firms

ABA Op. 512: The American Bar Association's Formal Opinion 512, issued in July 2024, requires lawyers using generative AI tools to obtain informed client consent before inputting client information into a tool that may retain or use that information, and to evaluate the tool's confidentiality protections. Public consumer chatbots, given how their terms of service are structured, generally do not satisfy the confidentiality posture the opinion contemplates.
Confidentiality: Model Rule 1.6 governs confidential information acquired during representation. A prompt that includes client-identifying detail and matter facts, submitted to a third party with whom the firm has no confidentiality agreement, is a disclosure within the meaning of the rule unless an exception applies.
Supervision: Model Rules 5.1 and 5.3 impose supervisory obligations on partners and managing lawyers for the conduct of subordinate lawyers and non-lawyer staff. A firm whose written AI policy is silent, ambiguous, or unenforced is supervising in name only.
Discovery Posture: As the January 2026 ruling shows, courts can treat prompts submitted to public chatbots as discoverable electronically stored information. A firm that cannot identify what its staff has submitted to which vendors cannot answer a preservation letter accurately.

For CPA Firms and Tax Practices

IRS Pub. 4557: Tax preparers handling taxpayer information are subject to safeguarding requirements that include written information security plans and controlled disclosure of taxpayer data. A staff member pasting return data into a public chatbot is making a disclosure outside the plan, unless the plan has authorized the specific tool and the engagement letter has informed the client.
FTC Safeguards: The FTC Safeguards Rule, applicable to many tax and accounting practices as financial institutions under the GLBA definition, requires a written information security program covering service providers. A public chatbot to which staff submit client data is functioning as an unapproved service provider, by definition outside the program.
State Boards: State accountancy boards have begun to take notice. Their attention is likely to follow the bar's pattern: written policies, informed consent, and supervisory accountability as the framework, with ignorance of staff conduct satisfying none of them.

For Wealth Advisors and RIAs

Reg S-P: The SEC's amendments to Regulation S-P, with rolling compliance dates through 2026, require covered firms to maintain written policies for the safeguarding of customer information, to oversee service providers, and to notify affected customers within thirty days of a determination that unauthorized access has likely occurred. Public chatbots used to process customer information are service providers the firm has not formally engaged, which makes the oversight obligation impossible to satisfy.
FINRA: FINRA has issued guidance treating generative AI tools as subject to the same supervisory and recordkeeping rules as any other technology used in connection with a member firm's business. Prompts containing customer information become records the firm cannot produce, created in a workflow the firm has neither approved nor supervised.
Fiduciary Duty: For advisors operating under a fiduciary standard, the duty of care extends to the firm's handling of client information. Allowing client portfolio data to flow to a public AI vendor with no contractual relationship and no oversight is difficult to defend as care.

Risk Exposure

In every one of these regulatory frames, the controlling document is the same: a written policy, supervised in practice, that identifies which AI tools are authorized, what data may be submitted to them, and how the firm will document compliance. The absence of that policy — or its presence on paper without enforcement — is the exposure.
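One way to make a written policy enforceable rather than aspirational is to express its core rules as data that tooling can check. The Python sketch below encodes a hypothetical authorized-tool list with tiers and two illustrative identifier patterns, then evaluates a proposed submission against them. The tool names, tier labels, and regular expressions are assumptions for illustration only; pattern matching catches only the most obvious identifiers and is not a substitute for supervision.

```python
import re
from dataclasses import dataclass

# Hypothetical policy registry: tool name -> tier. Names are placeholders.
AUTHORIZED_TOOLS: dict[str, str] = {
    "internal-search": "internal",
    "enterprise-assistant": "enterprise",  # assumes a signed DPA is on file
    "public-chatbot": "consumer",          # approved only for non-client work
}

# Illustrative patterns for client-identifying data. Real detection needs far
# more than regular expressions; these only show where the check sits.
PROHIBITED_PATTERNS: dict[str, re.Pattern] = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "account_number": re.compile(
        r"\b(?:acct|account)\s*(?:no\.?|#)?\s*\d{6,}\b", re.IGNORECASE
    ),
}


@dataclass
class Decision:
    allowed: bool
    reasons: list[str]


def check_submission(tool: str, text: str) -> Decision:
    """Apply the hypothetical written policy to a proposed AI submission."""
    reasons: list[str] = []
    tier = AUTHORIZED_TOOLS.get(tool)
    if tier is None:
        # Unknown tools are blocked outright until the policy owner reviews them.
        reasons.append(f"'{tool}' is not on the authorized tool list")
    elif tier == "consumer":
        # Consumer-tier tools may not receive client-identifying information.
        for label, pattern in PROHIBITED_PATTERNS.items():
            if pattern.search(text):
                reasons.append(f"contains a possible {label}; consumer-tier tools may not receive it")
    return Decision(allowed=not reasons, reasons=reasons)


if __name__ == "__main__":
    print(check_submission("public-chatbot", "Summarize: client SSN 123-45-6789"))
    print(check_submission("internal-search", "Summarize: client SSN 123-45-6789"))
```

The design keeps the policy in one place: when the named policy owner approves a new tool, the change is an edit to the registry, and every check picks it up.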

What Internal Search Actually Is, And What It Isn't

Internal search, in the sense this advisory uses the term, is a system that allows a user to ask natural-language questions and receive answers drawn from the firm's own document corpus — engagement files, work product, research notes, internal policies — using a retrieval architecture that keeps the firm's data on infrastructure the firm controls. The underlying technique is often called retrieval-augmented generation, though the term matters less than the architecture.
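To make the retrieval half of that architecture concrete, here is a minimal Python sketch using only the standard library. It scores documents in a small in-memory corpus by TF-IDF term overlap and returns the best-matching excerpts with their document IDs, so any answer can be traced to a firm file. The corpus contents, document IDs, and scoring scheme are illustrative assumptions; a production system would typically use a proper index or embedding model and pass the excerpts to a locally hosted model for generation. Nothing in the sketch makes a network call.

```python
import math
import re
from collections import Counter

# Hypothetical firm corpus: document ID -> text. In practice this would be
# loaded from the firm's own document management system.
CORPUS: dict[str, str] = {
    "memo-2024-017": "Engagement memo: statute of limitations analysis for breach of contract claims.",
    "policy-ai-001": "Firm policy: client identifying information may not be submitted to consumer AI tools.",
    "research-tax-042": "Research note: home office deduction eligibility and substantiation requirements.",
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(corpus: dict[str, str]) -> tuple[dict[str, Counter], dict[str, float]]:
    """Compute per-document term counts and smoothed inverse document frequencies."""
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in corpus.items()}
    n_docs = len(corpus)
    doc_freq = Counter(term for counts in term_counts.values() for term in counts)
    idf = {term: math.log((1 + n_docs) / (1 + df)) + 1 for term, df in doc_freq.items()}
    return term_counts, idf


def retrieve(query: str, corpus: dict[str, str], k: int = 2) -> list[tuple[str, float, str]]:
    """Return up to k (doc_id, score, text) tuples ranked by TF-IDF overlap.

    Only documents already in the corpus can be returned: retrieval is bounded.
    """
    term_counts, idf = build_index(corpus)
    query_terms = tokenize(query)
    scored = []
    for doc_id, counts in term_counts.items():
        score = sum(counts[t] * idf.get(t, 0.0) for t in query_terms)
        if score > 0:
            scored.append((doc_id, score, corpus[doc_id]))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    for doc_id, score, text in retrieve("home office deduction", CORPUS):
        print(f"{doc_id} ({score:.2f}): {text}")
```

Run as written, the query returns only `research-tax-042`, because no other document in the hypothetical corpus shares a term with it. Returning source IDs alongside text is the point: it is what lets a reviewer check an answer against the firm's own file.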

The architectural commitments that make internal search defensible are specific and worth stating plainly:

No External Prompt: When a user submits a query, the query and the retrieved document excerpts are processed on infrastructure the firm controls: on-premises hardware, a private cloud tenancy with strict configuration, or an air-gapped environment for highly sensitive work. The prompt is not transmitted to a public AI vendor.
No Training Use: The firm's documents are used to answer the firm's questions. They are not used to train a vendor's model. There is no terms-of-service ambiguity to manage, because the firm is not party to a consumer terms-of-service relationship for this workflow.
No Hidden Subprocessors: An internal search system has a known and finite set of components, all of which the firm has chosen and can audit. There is no chain of model providers, infrastructure hosts, and reviewer contractors to map after the fact.
Documented Access: Queries, retrieved documents, and answers are logged in a system the firm administers. The audit trail required by the FTC Safeguards Rule, Reg S-P, and the bar's supervisory rules is generated as a byproduct of normal use, not as a reconstruction after a question is raised.
Bounded Corpus: The system can only retrieve documents the firm has placed in its corpus, and every answer can be traced back to the excerpts it drew on. A generative model can still misstate what those excerpts say, so outputs still need review, but an unsupported claim can be checked against a named source file. The hallucination surface is substantially smaller, and far easier to audit, than that of a general-purpose chatbot.
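The "Documented Access" commitment can be sketched as an append-only log that records who asked what and which documents were retrieved. The Python below writes one JSON line per query and, as an illustrative design choice not prescribed by any of the rules above, chains each record to the SHA-256 hash of the previous line so that an edited or removed earlier record is detectable. The file path, field names, and chaining scheme are assumptions; note that truncating the final records is not detected by the chain alone.

```python
import hashlib
import json
import os
import tempfile
from datetime import datetime, timezone

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first record


def append_audit_record(log_path: str, user: str, query: str, doc_ids: list[str]) -> dict:
    """Append one query record to a JSON Lines audit log and return it."""
    prev_hash = GENESIS_HASH
    try:
        with open(log_path, "r", encoding="utf-8") as f:
            lines = f.read().splitlines()
        if lines:
            # Chain to the exact text of the last record written.
            prev_hash = hashlib.sha256(lines[-1].encode("utf-8")).hexdigest()
    except FileNotFoundError:
        pass  # First record in a new log.

    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "query": query,
        "retrieved_doc_ids": doc_ids,
        "prev_hash": prev_hash,
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def verify_audit_log(log_path: str) -> bool:
    """Return True if every record's prev_hash matches the line before it."""
    expected = GENESIS_HASH
    with open(log_path, "r", encoding="utf-8") as f:
        for line in f.read().splitlines():
            record = json.loads(line)
            if record["prev_hash"] != expected:
                return False
            expected = hashlib.sha256(line.encode("utf-8")).hexdigest()
    return True


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "audit.jsonl")
    append_audit_record(path, "jdoe", "home office deduction", ["research-tax-042"])
    append_audit_record(path, "asmith", "statute of limitations", ["memo-2024-017"])
    print(verify_audit_log(path))  # True
```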

What internal search is not: it is not a substitute for legal research databases like Westlaw or Lexis, which provide access to authoritative external sources under their own license terms. It is not a substitute for tax research platforms, which are licensed reference works. It is a complement to those tools — a way for the firm to ask questions about its own institutional knowledge, work product, and accumulated expertise without exporting that knowledge to a third party in the process.

It is also worth being precise about enterprise-tier AI products. Microsoft 365 Copilot, ChatGPT Enterprise, Google Workspace's Gemini, and similar enterprise offerings ship with stronger contractual commitments than their consumer counterparts — typically no training on customer data, narrower retention, and a data processing agreement the firm signs. These products substantially close the gap, but they do not eliminate it. The firm's data is still processed on the vendor's infrastructure, still subject to subprocessor chains in the vendor's contract, and still potentially within the reach of legal process directed at the vendor. Enterprise-tier AI is a meaningful improvement over consumer-tier AI for many workflows. It is not equivalent to internal search.

What a Basic Internal Standard Looks Like

The remediation is not "ban AI." That position is neither realistic nor advisable; staff will use the tools regardless, and a flat ban produces shadow usage the firm cannot supervise. The remediation is a written, supervised standard that identifies authorized tools, prohibited inputs, and required documentation.

Conduct an inventory of which AI tools — including browser sidebars and personal accounts used on firm devices — are currently in use across the firm. The inventory will surface workflows leadership did not know existed. That is the point.
Issue a written generative AI use policy that names approved tools, prohibits the submission of client-identifying information to consumer-tier products, and assigns a named owner responsible for reviewing new tools before staff adopt them.
Migrate research, drafting, and summarization workflows that currently use public chatbots into internal search or a contracted enterprise-tier alternative with a signed data processing agreement and a documented subprocessor list.
Update engagement letters and client disclosures to address third-party processing of client information by AI tools, in a form consistent with ABA Opinion 512 for law firms and analogous fiduciary or regulatory standards for accounting and wealth practices.
Establish a quarterly review of the AI tool inventory, the subprocessor lists of authorized vendors, and the firm's policy itself. The technology and the regulatory framework are both moving; an annual review is too slow.

None of these steps requires expensive software. The first three require attention, time, and a person willing to ask staff what they are actually doing. The fourth and fifth require a working relationship between operations and counsel. The cost of doing this work is small relative to the cost of not having done it the first time a client, a regulator, or a court asks.
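The inventory and the quarterly review amount to a reconciliation: what is actually in use versus what the policy authorizes. The Python sketch below assumes a hypothetical list of observed usage gathered from staff interviews and device or browser-extension inventories, groups staff by each unauthorized tool, and flags a review as overdue after 92 days, a figure standing in for "quarterly". All tool names, staff identifiers, and the threshold are illustrative assumptions.

```python
from collections import defaultdict
from datetime import date

# Hypothetical inventory records: (staff member, tool observed in use).
OBSERVED_USAGE: list[tuple[str, str]] = [
    ("associate-1", "public-chatbot"),
    ("paralegal-2", "browser-ai-sidebar"),
    ("cpa-3", "internal-search"),
    ("advisor-4", "personal-chatbot-account"),
    ("associate-5", "browser-ai-sidebar"),
]

# Hypothetical authorized list from the written policy.
AUTHORIZED: set[str] = {"internal-search", "enterprise-assistant", "public-chatbot"}

REVIEW_INTERVAL_DAYS = 92  # assumption: roughly one calendar quarter


def reconcile(observed: list[tuple[str, str]], authorized: set[str]) -> dict[str, list[str]]:
    """Map each tool in use that is not authorized to the staff using it."""
    unauthorized = defaultdict(list)
    for staff, tool in observed:
        if tool not in authorized:
            unauthorized[tool].append(staff)
    return dict(unauthorized)


def review_overdue(last_review: date, today: date, max_days: int = REVIEW_INTERVAL_DAYS) -> bool:
    """Return True if more than max_days have passed since the last review."""
    return (today - last_review).days > max_days


if __name__ == "__main__":
    for tool, staff in sorted(reconcile(OBSERVED_USAGE, AUTHORIZED).items()):
        print(f"{tool}: {len(staff)} user(s): {', '.join(staff)}")
    print("Review overdue:", review_overdue(date(2026, 1, 1), date.today()))
```

The output is a short list leadership can act on: which unapproved tools are in use and by whom, which is exactly what the first step is meant to surface.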

About OccuNX

OccuNX is a privacy-first systems and risk consultancy. We work with small and mid-sized professional services firms to map data flows, identify vendor exposure, and reduce unnecessary digital risk. We are not an IT company, a software vendor, or a managed service provider. We do not promise perfect security. We help organizations understand how their data actually moves — and reduce the places it should not go.

Relevant For This Advisory

Law Firms — confidentiality, ABA Opinion 512 alignment, supervisory framework for AI use
CPA & Tax Practices — FTC Safeguards Rule alignment, IRS Pub. 4557 written information security plan integration
Wealth Advisors & RIAs — Reg S-P customer information safeguarding, FINRA supervisory and recordkeeping posture

Relevant Services

AI Data Exposure Analysis — inventory of AI tools in use, identification of unauthorized data flows, written policy framework
Subprocessor Mapping — vendor and subprocessor chain documentation for the firm's authorized AI and SaaS environment
Internal Search Architecture Advisory — guidance on private retrieval systems and on-premise AI infrastructure aligned to professional services firm constraints

Request an AI Data Exposure Analysis or Privacy Risk Review for your firm: occunx.com

This document is for informational purposes only and does not constitute legal, compliance, tax, securities, or professional advice. Consult qualified counsel and advisors for guidance specific to your jurisdiction and practice. References to court orders, regulatory rules, and bar opinions reflect publicly available information as of the issue date; readers should verify current status before relying on any specific point.
