SENISdigital

Copilot Guide for Tax Team

Microsoft 365 Copilot · 13 modules · Tier 1 Hygiene + Tier 2 Fluency

Practical guide for tax professionals — foundational practices for safe and effective use, then task-specific patterns for speed.

How to use this guide

Data security rule: Use Microsoft 365 Copilot Chat in work mode only (purple shield icon). Never paste client data into consumer Copilot, ChatGPT, or personal accounts.

Modules

Tier 1 — Hygiene
H1 · Prompting basics
H2 · Grounding Copilot in your work content
H3 · Failure modes — what goes wrong
H4 · Output handling — from Copilot to client-ready
Tier 2 — Fluency
F0 · Task-fit judgment (cross-cutting)
F1 · Memo and advisory drafting
F2 · Due diligence report sections
F3 · Tax review documentation
F4 · Client letters and emails
F5 · Comparables and data analysis
F6 · Word → PPT transformation
F7 · Table and data extraction from documents
F8 · Multi-document synthesis

Tier 1 — Hygiene

H1 · Prompting basics

Goal: write prompts that get useful answers on the first or second try, instead of vague responses that need three rewrites.

1.1 Why vague prompts fail

Copilot does not know what you are working on, what your client expects, or what "good" looks like for your output. If you don't tell it, it defaults to generic responses that look professional but lack substance.

1.2 The four-part prompt pattern

Most effective prompts contain four elements:

1.3 Worked example

Task: summarize a meeting recording for a tax review engagement.

Weak prompt
"Summarize this meeting."
Strong prompt
"Act as a tax consultant. Summarize this Teams meeting recording for a corporate income tax review engagement. Focus on: tax positions discussed, open follow-ups assigned to our team, client commitments with deadlines. Output as three bullet lists under those headings. Maximum 200 words."
Why the strong version works: Role tells Copilot the lens to apply. Context (engagement type) filters what's relevant. Format constraints prevent rambling output.
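
As a minimal sketch, the strong prompt above can be assembled from labelled parts so that a missing part is caught before the prompt is run. The element names used here (role, task, context, output format) are inferred from the worked example and are an assumption, not an official list; `PromptParts` and `build_prompt` are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass
class PromptParts:
    """Hypothetical container; field names are inferred from the worked example."""
    role: str
    task: str
    context: str
    output_format: str


def build_prompt(parts: PromptParts) -> str:
    """Join the parts into one prompt, refusing to build if any part is blank."""
    fields = {
        "role": parts.role,
        "task": parts.task,
        "context": parts.context,
        "output_format": parts.output_format,
    }
    missing = [name for name, value in fields.items() if not value.strip()]
    if missing:
        raise ValueError(f"Prompt is missing: {', '.join(missing)}")
    return " ".join(value.strip() for value in fields.values())


example = PromptParts(
    role="Act as a tax consultant.",
    task="Summarize this Teams meeting recording",
    context=(
        "for a corporate income tax review engagement. Focus on: tax positions "
        "discussed, open follow-ups assigned to our team, client commitments "
        "with deadlines."
    ),
    output_format=(
        "Output as three bullet lists under those headings. Maximum 200 words."
    ),
)
print(build_prompt(example))
```

The weak prompt ("Summarize this meeting.") would fill only the task field, so `build_prompt` would raise and name the three missing parts.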

1.4 Applied exercise — pick a real task

  1. Choose a real task from your current engagement (e.g., summarize a client email thread, draft a section of a memo, list issues from a meeting note).
  2. Write your prompt using all four elements.
  3. Run it in Copilot Chat (work mode).
  4. Note: did you need to iterate? What was missing?
  5. Share with one peer for feedback on the prompt itself, not the output.


H2 · Grounding Copilot in your work content

Goal: get Copilot to base its answers on your actual files, emails, and meetings — not on general knowledge or guesses.

2.1 What grounding means

By default, Copilot Chat answers from general knowledge. When you reference specific files, emails, meetings, or SharePoint sites, Copilot reads those sources and answers based on them. This is called grounding.

Grounded answers are more accurate and can cite where the information came from. Ungrounded answers are guesses dressed in confident language.

2.2 How to ground in M365 Copilot Chat

2.3 Worked example

Task: extract tax warranties from an SPA stored in SharePoint.

Ungrounded (will fail)
"What tax warranties are typically in an SPA for a Lithuanian target?"
Grounded (uses your file)
"/[SPA_ProjectAlpha.docx] Extract every tax warranty and tax indemnity clause from this SPA. List each as: clause number, one-sentence summary, exposure type (CIT, VAT, WHT, transfer pricing, other). Output as a table."
When grounding misleads: Copilot indexes only what it can access. If a SharePoint site holds outdated drafts alongside the final version, Copilot may cite the wrong one, so always verify which file Copilot used. Copilot also cannot read files you lack permission to open; in that case you'll get "I couldn't find this file," not an error.

2.4 Applied exercise — extract from a real engagement file

  1. Choose one real document from a current engagement: a memo, meeting note, contract excerpt, or report.
  2. Ask Copilot the same factual question two ways: first ungrounded (no file reference), then grounded (with /reference).
  3. Compare the answers. Note specifics, accuracy, and citations.
  4. Verify the grounded answer against the source document yourself.


H3 · Failure modes — what goes wrong and how to catch it

Goal: recognize the specific ways Copilot fails on tax work, so you don't deliver wrong answers to clients.

3.1 The five failure modes that matter most

Hallucinated citations: Copilot will invent statute references, case names, and ruling numbers that look plausible but don't exist. This is the highest-stakes failure for tax work. It happens most when you ask about specific legal authority without grounding.
Stale or wrong jurisdiction: Copilot's general knowledge has a cutoff date and is not jurisdiction-aware by default. Rates, thresholds, and rules change. "What is the CIT rate in Lithuania" without grounding can return outdated numbers stated with full confidence.
Confident-but-wrong reasoning: Copilot writes fluently. A wrong analysis sounds as authoritative as a correct one, and nothing in the tone distinguishes them. Reviewers must verify substance, not style.
Stale Graph data: Copilot indexes M365 content with some delay, so documents saved minutes ago may not be searchable yet. This is not an issue if you reference a file directly. If you ask Copilot to search a SharePoint site, recent additions may be missed.
Selective summarization: When summarizing long documents, Copilot can omit material points without flagging them, and the summary still looks complete. Always verify against the source for anything client-facing.
Hard rule — verify every legal citation: Every statute, case, ruling, rate, or threshold Copilot produces must be verified against an authoritative source before it leaves your hands. This applies even if Copilot "cited" a source — it can hallucinate citations and invent source names.

Authoritative sources: e-tar.lt (Lithuanian legal acts register), EUR-Lex (EU law), official VMI publications, OECD official documents, your firm's verified knowledge base.

3.3 Worked example

Task: ask Copilot for the WHT rate on dividends paid from Lithuania to a Polish parent.

A typical Copilot response will state a rate, may reference "the Lithuania-Poland tax treaty," and may even cite an article number. Some of this will be correct. Some may not be. Things to verify:

3.4 Applied exercise — find a failure on purpose

  1. Ask Copilot a specific tax-technical question without grounding (e.g., a treaty article, a domestic rule citation, a deadline).
  2. Verify every factual claim in the answer against an authoritative source.
  3. Note: how many claims were correct, partial, or wrong? Were any hallucinated?
  4. Share the example anonymously in the team channel — failure examples are training material.


H4 · Output handling — from Copilot answer to client-ready

Goal: turn Copilot output into work product that meets your professional standard, with a clean audit trail.

4.1 The review pass

Treat every Copilot output as a junior draft. Three things to check before using it:

4.2 Citation verification workflow

For any Copilot output containing a legal or factual citation:

  1. Locate the cited source in an authoritative repository (e-tar.lt, EUR-Lex, official VMI page, your firm's library).
  2. Confirm the article, paragraph, or ruling exists.
  3. Confirm it says what Copilot claims it says.
  4. Confirm it is current — no amendments or repeals since the date Copilot referenced.
  5. Replace Copilot's paraphrase with your own verified wording in the final output.
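
The five steps above can be tracked per citation with a small checklist record. This is only a sketch: `CitationCheck` and its field names are hypothetical, and the judgment behind each tick still comes from reading the authoritative source yourself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CitationCheck:
    """Hypothetical record of the five workflow steps for one citation."""
    citation: str
    located_in_authoritative_source: bool = False
    provision_exists: bool = False
    says_what_copilot_claims: bool = False
    is_current: bool = False
    paraphrase_replaced: bool = False

    def outstanding_steps(self) -> List[str]:
        """Return the workflow steps that have not been ticked off yet."""
        steps = {
            "locate in authoritative source": self.located_in_authoritative_source,
            "confirm provision exists": self.provision_exists,
            "confirm it says what Copilot claims": self.says_what_copilot_claims,
            "confirm it is current": self.is_current,
            "replace paraphrase with verified wording": self.paraphrase_replaced,
        }
        return [label for label, done in steps.items() if not done]

    def is_cleared(self) -> bool:
        """A citation is cleared only when every step is done."""
        return not self.outstanding_steps()


# Placeholder citation text, in the style of the guide's [Art X] convention.
check = CitationCheck(citation="[Art X] of the CIT law")
check.located_in_authoritative_source = True
check.provision_exists = True
print(check.is_cleared())         # False
print(check.outstanding_steps())  # the three steps still open
```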
Chat history hygiene: Copilot Chat retains conversation history within the M365 tenant under enterprise data protection. For sensitive engagement work, review and delete chats containing highly sensitive content once the work product is finalized. Chats are subject to the retention, eDiscovery, and audit policies set by your organization. Treat the chat as engagement work product, not as scratch paper.

4.4 Worked example

Task: Copilot drafted a paragraph for a tax memo. Making it client-ready:

4.5 Applied exercise — produce a client-ready output

  1. Use Copilot to draft a real piece of engagement output: a memo paragraph, a client email, a section of a report.
  2. Run all three review passes. Track every change you made.
  3. Note: how much of the original Copilot draft survived?
  4. Discuss with a peer: was Copilot a useful starting point, or did manual rework outweigh the time saved?


Tier 2 — Fluency
What changes at fluency tier: Hygiene measures correctness. Fluency measures efficiency. Each module is task-type focused and independent — pick the task you do most. Every exercise is time-boxed. Log: baseline manual estimate, AI-assisted actual time, iteration count, verdict (repeat / adjust / skip). Some tasks won't speed up — recognising task-fit is part of fluency.
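
One possible shape for the exercise log described above, as a sketch only. The class name `ExerciseLog`, the field names, and the choice of minutes as the unit are assumptions; a spreadsheet with the same four columns works equally well.

```python
from dataclasses import dataclass

ALLOWED_VERDICTS = ("repeat", "adjust", "skip")


@dataclass
class ExerciseLog:
    """Hypothetical log entry for one time-boxed exercise."""
    task: str
    baseline_minutes: float      # baseline manual estimate
    ai_assisted_minutes: float   # AI-assisted actual time
    iteration_count: int
    verdict: str                 # repeat / adjust / skip

    def __post_init__(self) -> None:
        if self.verdict not in ALLOWED_VERDICTS:
            raise ValueError(f"verdict must be one of {ALLOWED_VERDICTS}")
        if self.baseline_minutes <= 0:
            raise ValueError("baseline_minutes must be positive")

    def percent_saved(self) -> float:
        """Time saved as a percentage of the baseline; negative means slower."""
        saved = self.baseline_minutes - self.ai_assisted_minutes
        return round(100 * saved / self.baseline_minutes, 1)


entry = ExerciseLog("Memo section", 75, 30, 2, "repeat")
print(entry.percent_saved())  # 60.0
```

A negative percentage is a legitimate result: it records a task where AI assistance was slower than the manual baseline.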

F0 · Task-fit judgment (cross-cutting)

Goal: decide before starting whether AI will speed up this task — not after spending 20 minutes on prompts.

Where Copilot speeds up tax work

Where Copilot slows you down

Four-question check — before you start:
  1. Is the structure conventional? If no, AI is unlikely to help.
  2. Are the inputs already in M365 (Word, SharePoint, Outlook, Teams)? If no, grounding is hard and the output will be weak.
  3. Does the output need verified citations? If yes, budget verification time and decide whether net-time still wins.
  4. What is the estimated manual time versus the estimated AI-assisted time? If AI is not at least 30% faster including verification, skip.
Honest opt-out: Skipping AI for a task is a valid outcome. Document why you skipped — that is data for the team. Patterns of opt-out reveal where the tool genuinely doesn't fit, and prevent forced use that produces low-quality output and wasted time.
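
The four-question check can be sketched as a small function. Two assumptions are baked in and should be adjusted to taste: "at least 30% faster" is read as "AI-assisted time plus verification time is at most 70% of the manual estimate", and any single concern is treated as a reason to skip. `check_task_fit` and `TaskFit` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaskFit:
    use_ai: bool
    concerns: List[str] = field(default_factory=list)


def check_task_fit(
    conventional_structure: bool,
    inputs_in_m365: bool,
    manual_minutes: float,
    ai_minutes: float,
    verification_minutes: float = 0.0,  # set above 0 when citations need verifying
    min_speedup: float = 0.30,
) -> TaskFit:
    """Apply the four questions and collect the reasons to skip, if any."""
    concerns: List[str] = []
    if not conventional_structure:
        concerns.append("structure is not conventional")
    if not inputs_in_m365:
        concerns.append("inputs are not in M365, so grounding is hard")
    # Question 3: verification time counts toward the AI-assisted total.
    ai_total = ai_minutes + verification_minutes
    # Assumption: "30% faster" means ai_total <= 70% of the manual estimate.
    if ai_total > manual_minutes * (1 - min_speedup):
        concerns.append("not at least 30% faster including verification")
    return TaskFit(use_ai=not concerns, concerns=concerns)


fit = check_task_fit(True, True, manual_minutes=90, ai_minutes=30, verification_minutes=20)
print(fit.use_ai)  # True: 50 minutes against a 63-minute ceiling
```

The returned `concerns` list doubles as the documented reason for an opt-out.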

Applied exercise — weekly task review

  1. List all your tasks from one working week.
  2. For each, mark: AI used (yes/no), AI saved time (yes/no/unclear), would-use-AI-again (yes/no).
  3. Look for patterns: which task types consistently win, which consistently lose?
  4. Share patterns with the team — your data shapes which fluency modules others prioritize.

F1 · Memo and advisory drafting

Goal: produce a first-draft tax memo section in less time than typing it from scratch.

Prompt patterns that hit the standard fast

Weak prompt
"Write a tax memo on the planned restructuring."
Strong prompt
"/[Engagement_Notes.docx] /[Memo_Template_2024.docx] Adapt the memo template using the engagement notes. Sections: Background, Issues, Analysis, Recommendation. Lithuanian CIT and PIT focus. Tone: advisory to a sophisticated GC. Cite article numbers as placeholders [Art X] — I will verify and replace."
Grounds in two real files (notes and template) so structure and substance are anchored. Names the audience and tone so prose lands at the right register. Forces placeholder citations rather than fabricated ones — verification stays your job.

Time-boxed exercise — re-do a memo you've already drafted

  1. Pick a memo or memo section you would normally draft in 60–90 minutes.
  2. Time-box the AI-assisted version: 30 minutes including verification.
  3. Log: baseline estimate, actual time, iteration count, citations needing replacement, % of AI draft surviving in final output.
  4. Verdict: repeat this approach / adjust / skip for this task type. Note what made it fast or slow.

F2 · Due diligence report sections

Goal: convert raw findings into structured DD report sections faster than manual write-up, without losing rigor.

Prompt patterns that hit the standard fast

Weak prompt
"Write the tax DD section."
Strong prompt
"/[DataRoom_Tax_folder] /[DD_Report_Template.docx] Draft the Tax section of the DD report. Sub-sections: CIT, VAT, WHT, transfer pricing, payroll, historical positions. Per finding: one-paragraph description, qualitative exposure (low/medium/high), recommendation. Source-cite to data room file names. Output as Word using the firm's heading styles."

Time-boxed exercise — DD section side-by-side

  1. Pick one DD sub-section (e.g., VAT findings) you have already drafted manually on a real engagement.
  2. Re-do it using AI on the same source materials, time-boxed to half the original time.
  3. Compare both outputs side-by-side: completeness, accuracy, structure.
  4. Log: which findings were caught, missed, or invented by AI? What's the failure pattern?

F3 · Tax review documentation

Goal: produce review working papers and findings memos faster, while preserving audit trail and reviewer-defensibility.

Prompt patterns that hit the standard fast

Weak prompt
"Help me write up the CIT review."
Strong prompt
"/[CIT_Review_Notes.xlsx] /[Workpaper_Template.docx] Draft the CIT review working paper. Per area, document: procedures performed, observations, conclusions. Areas: revenue recognition, deductible expenses, related-party transactions, loss carry-forwards, R&D incentives. Tone: factual, non-advocate. Place [VERIFY] tags wherever a number or rule citation appears."

Time-boxed exercise — write up a real review section

  1. Pick a CIT or VAT review section you would normally write up in working papers.
  2. Time-box the AI-assisted version.
  3. Log: time saved, [VERIFY] tags reviewed, errors caught at verification.
  4. Verdict: which review areas suit AI write-up, which don't? Document the pattern.
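
Before a draft leaves your hands, leftover tags can be found mechanically. The sketch below scans plain text for the two placeholder forms used in this guide's prompts, `[VERIFY]` and `[Art X]`; the function name and the sample draft are hypothetical, and it assumes the draft has been exported or pasted as plain text.

```python
import re
from typing import List, Tuple

# Matches the literal placeholders used in the guide's strong prompts.
PLACEHOLDER_PATTERN = re.compile(r"\[VERIFY\]|\[Art X\]")


def unresolved_placeholders(draft: str) -> List[Tuple[int, str]]:
    """Return (line_number, placeholder) for every tag still in the draft."""
    found = []
    for line_number, line in enumerate(draft.splitlines(), start=1):
        for match in PLACEHOLDER_PATTERN.finditer(line):
            found.append((line_number, match.group()))
    return found


sample_draft = (
    "Losses may be carried forward [VERIFY].\n"
    "The deduction is governed by [Art X] of the CIT law.\n"
    "No open items in this paragraph."
)
print(unresolved_placeholders(sample_draft))
# [(1, '[VERIFY]'), (2, '[Art X]')]
```

An empty result means no tags remain; it does not mean the replaced text is correct.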

F4 · Client letters and emails

Goal: produce client correspondence in the right tone and length, faster than typing from scratch.

Prompt patterns that hit the standard fast

Weak prompt
"Write an email to the client about the deadline."
Strong prompt
"Draft a short email to the CFO. Bullets to convey: VMI extension granted to 30 November; we will deliver draft return by 23 November for their review; one open item — confirmation of dividend payment date. Tone: matter-of-fact, no hedging. Max 120 words. No 'I hope this finds you well.'"
Specifying "no hedging" and a word limit eliminates the most common AI email failures: over-cushioning and over-length. Banning the standard opener forces a direct, professional first line.

Time-boxed exercise — three real client emails

  1. Pick three real client emails you would normally write today.
  2. AI-draft each in under 5 minutes total; review and send.
  3. Log: time saved per email; how many sent without rewriting; client engagement (replies, action taken).
  4. Verdict: which email types suit AI drafting? Which need to stay manual?

F5 · Comparables and data analysis

Goal: faster handling of structured data — extraction, cleaning, comparison, summary tables.

Prompt patterns that hit the standard fast

Weak prompt
"Compare these companies."
Strong prompt
"/[Comparables_List.xlsx] Evaluate each company against transfer pricing comparability criteria: independence (no shareholding >25%), industry (NACE 4-digit match to tested party), geography (EU), data availability (last 3 years). Output: original list + 4 columns marking each criterion (yes/no/insufficient data) + comment column. No additions to the list. Do not invent missing data — use 'insufficient data'."
Critical instruction: do not invent. AI tools default to filling cells. For transfer pricing comparables work, fabricated data is worse than no data. Always include explicit instructions like "do not invent missing data" and "use 'insufficient data' for unknowns."
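
A structural check on the returned table can catch two of the failures named above: companies added to the list and cells filled with something other than the permitted values. This is a sketch under assumptions: the rows are already loaded as dictionaries, and the column names (`company`, `independence`, `industry`, `geography`, `data_availability`) are hypothetical. It does not test whether a "yes" is actually true; that still needs verification against source.

```python
from typing import Dict, List

ALLOWED_VALUES = {"yes", "no", "insufficient data"}
CRITERIA = ("independence", "industry", "geography", "data_availability")


def check_comparables_output(
    original_names: List[str], ai_rows: List[Dict[str, str]]
) -> List[str]:
    """Return a list of structural problems found in the AI-produced table."""
    problems = []
    original = set(original_names)
    returned = {row["company"] for row in ai_rows}
    for name in sorted(returned - original):
        problems.append(f"added company not in original list: {name}")
    for name in sorted(original - returned):
        problems.append(f"company dropped from list: {name}")
    for row in ai_rows:
        for criterion in CRITERIA:
            value = str(row.get(criterion, "")).strip().lower()
            if value not in ALLOWED_VALUES:
                problems.append(
                    f"{row['company']}: unexpected value in {criterion}: {value!r}"
                )
    return problems


names = ["Alpha UAB", "Beta UAB"]
rows = [
    {"company": "Alpha UAB", "independence": "yes", "industry": "yes",
     "geography": "yes", "data_availability": "insufficient data"},
    {"company": "Gamma UAB", "independence": "yes", "industry": "probably",
     "geography": "yes", "data_availability": "yes"},
]
for problem in check_comparables_output(names, rows):
    print(problem)
```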

Time-boxed exercise — real data task with full verification

  1. Pick a real data task: a comparables shortlist, a transaction sample for testing, a list of contracts to categorise.
  2. AI-assist the task; verify every output row against source.
  3. Log: time saved, error rate, type of errors (made-up data, miscategorisation, omission).
  4. Verdict: which data tasks pass the verification overhead test? Which don't?

F6 · Word → PPT transformation

Goal: turn an existing Word memo or report into a presentable slide deck without re-writing everything.

Prompt patterns that hit the standard fast

Weak prompt
"Make a presentation from this memo."
Strong prompt
"Create a 7-slide deck from /[Restructuring_Memo.docx] for a client steering committee. Slides: 1 title, 1 executive summary, 3 issues (one per slide, each with a recommendation), 1 next steps with owner and date, 1 Q&A. Bullets only, max 4 per slide. Speaker notes required. Use the firm's PPT template if referenced."
Common failure mode: bloat. Copilot in PowerPoint tends to produce too many slides and too much text per slide. Constraining slide count and bullets per slide upfront saves more time than fixing the deck afterward.

Time-boxed exercise — Word memo to deck

  1. Take a Word memo or report you have already delivered.
  2. Generate a deck via Copilot; spend max 20 minutes adjusting.
  3. Compare to what you would have built manually.
  4. Log: time saved, restructuring needed, what AI got right and wrong about hierarchy and emphasis.

F7 · Table and data extraction from documents

Goal: pull structured data out of unstructured documents (contracts, emails, scanned text) faster than manual reading.

Prompt patterns that hit the standard fast

Weak prompt
"Get the key info from these contracts."
Strong prompt
"/[Contracts_Folder] For each contract, extract: counterparty name, contract date, term, governing law, tax-related clauses (presence yes/no, clause numbers), termination notice period, change-of-control provision. Output: one row per contract, columns as listed. Mark missing fields as 'not specified' — do not infer or guess."

Time-boxed exercise — real document set extraction

  1. Pick a real document set (contracts, emails, financial statements, prior year filings).
  2. AI-extract a defined field list to a table.
  3. Verify a sample (e.g., 20%) against source documents.
  4. Log: extraction accuracy by field type, time vs manual, types of errors. Some fields will be reliable; others won't. Document the pattern.
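
For step 3, picking the sample at random (and not just checking the first few rows) avoids over-trusting the start of the table. A minimal sketch, assuming each extracted row has an identifier such as a file name; `pick_verification_sample` is a hypothetical helper and the 20% default mirrors the example figure above.

```python
import math
import random
from typing import List, Optional, Sequence


def pick_verification_sample(
    row_ids: Sequence[str], fraction: float = 0.2, seed: Optional[int] = None
) -> List[str]:
    """Choose a random subset of rows to verify against source documents."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ids = list(row_ids)
    if not ids:
        return []
    # Round before ceil so float noise (e.g. 3.0000000000000004) does not add a row.
    size = max(1, math.ceil(round(len(ids) * fraction, 9)))
    rng = random.Random(seed)  # pass a seed to make the selection reproducible
    return sorted(rng.sample(ids, size))


contracts = [f"contract_{n:02d}.pdf" for n in range(1, 11)]
sample = pick_verification_sample(contracts, fraction=0.2, seed=7)
print(len(sample))  # 2
```

Recording the seed alongside the log entry lets a reviewer reproduce which rows were checked.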

F8 · Multi-document synthesis

Goal: synthesize across many sources (data room, prior memos, email threads) into one coherent output.

Prompt patterns that hit the standard fast

Weak prompt
"Summarize all these documents."
Strong prompt
"/[Engagement_Folder_2023] /[Engagement_Folder_2024] Build a chronological timeline of all tax positions taken for this client across both engagements. Per entry: date, position taken, supporting authority cited, document source (file name + section). Flag any contradictions or shifts between 2023 and 2024 positions."
Verification overhead grows with scale. Multi-document synthesis is where AI most often appears to save time but doesn't. Each output point is a claim about a source. With 50+ files, verifying every claim can exceed manual reading time.

Use multi-document synthesis when you would otherwise not do the synthesis at all (too expensive manually) — that's where it adds genuine value, even with verification overhead.

Time-boxed exercise — synthesis you've done manually before

  1. Pick a multi-source synthesis task you have done manually before (e.g., a position paper, a recurring client memo, a status summary).
  2. AI-assist; verify against sources.
  3. Log: completeness vs your manual version, missed points, fabricated points, time delta.
  4. Verdict: at what document count does AI synthesis become net-negative? Verification grows with scale; the break-even is the data point.
