Microsoft Purview Information Protection and Classification: A Practical Guide

Every Microsoft 365 tenant has a hidden data problem.

Over time, SharePoint sites, OneDrive folders, Teams channels, Exchange mailboxes, and shared workspaces collect contracts, customer records, HR files, financial reports, intellectual property, and old spreadsheets that should probably not be sitting where they are.

Most of that content is not labeled.

A lot of it is not reviewed.

And when a tenant-to-tenant migration, compliance audit, merger, divestiture, or AI readiness project begins, the problem becomes very visible.

You cannot protect data you cannot see.

You also cannot migrate sensitive data safely if you do not know what it is, where it lives, who owns it, and how it should be handled.

That is where Microsoft Purview Information Protection becomes important.

This article explains how Microsoft Purview supports data discovery, classification, and sensitivity labeling, how Content Explorer and Activity Explorer show the results, and how all of this feeds migration readiness in Microsoft 365.

What is Microsoft Purview Information Protection?

Microsoft Purview Information Protection helps organizations discover, classify, label, and protect sensitive data across Microsoft 365.

It is Microsoft's current framework for information protection in Microsoft 365 and brings together capabilities such as:

Sensitivity labels
Data classification
Data loss prevention
Content visibility
Activity visibility
Policy-based protection

For Microsoft 365 admins, the value is simple. Purview helps answer questions like:

Where is our sensitive data?

Who is accessing it?

Is it labeled?

Is it protected?

Will it remain protected after migration?

Purview works across key Microsoft 365 workloads such as SharePoint, OneDrive, Exchange, Teams, and endpoints.

It is especially useful because protection can travel with the file. When a sensitivity label applies encryption, the access restrictions stay with the document even after it moves outside its original location, and content markings such as watermarks remain part of the file.

Data discovery vs data classification

Data discovery and data classification are related, but they are not the same thing.

Data discovery is the process of finding content across your Microsoft 365 environment.

It helps identify what exists across:

SharePoint sites
OneDrive accounts
Exchange mailboxes
Teams content
Endpoints
Other connected locations

Data classification is the process of categorizing that content based on sensitivity, business value, or compliance requirements.

For example, content may be classified as:

Public
Internal
Confidential
Highly Confidential
Restricted

Together, discovery and classification answer two important questions:

What sensitive information do we have?
How should that information be handled?

Microsoft Purview helps with both.

It can start surfacing sensitive and labeled content before every policy is fully built. This gives admins an early view of risk across Microsoft 365 and helps them make decisions based on real data instead of assumptions.

The core building blocks of Microsoft Purview classification

Purview classification is not one single feature.

It is a set of connected capabilities that work together.

The most important building blocks are:

Sensitive Information Types
Trainable Classifiers
Exact Data Match
Sensitivity Labels

Let us look at each one.

1. Sensitive Information Types

Sensitive Information Types, often called SITs, are pattern-based classifiers.

They detect data using things like:

Regular expressions
Keyword lists
Checksums
Proximity rules

Microsoft provides many built-in SITs for common sensitive data patterns such as government IDs, financial data, health-related identifiers, and other regulated information.

SITs are useful when the data has a recognizable format.

For example:

Credit card numbers
Passport numbers
Tax IDs
Bank account numbers
National identification numbers
Employee IDs
Customer reference numbers

You can also create custom SITs for your organization-specific identifiers, such as internal project codes, customer numbers, or employee numbers.

Use SITs when the question is:

Does this content contain a specific pattern of sensitive data?
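
The way these detection elements combine can be sketched in a few lines of Python. This is only an illustration of the idea (a regular expression finds candidates, a checksum removes impossible values, and a nearby keyword raises confidence). It is not how Purview defines its built-in credit card SIT. The keyword list, the 60-character window, and the function names are assumptions made for this example.

```python
import re

# Hypothetical SIT-style rule: a 13 to 19 digit candidate (regex), validated
# by a Luhn checksum, with a supporting keyword nearby (proximity).
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")
KEYWORDS = ("card", "visa", "mastercard", "expiry", "cvv")  # assumed list
PROXIMITY_CHARS = 60  # assumed window size, for illustration only


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_matches(text: str) -> list[dict]:
    """Find card-like numbers and grade each match by supporting evidence."""
    matches = []
    lowered = text.lower()
    for hit in CARD_PATTERN.finditer(text):
        digits = re.sub(r"\D", "", hit.group())
        if not luhn_valid(digits):
            continue  # the pattern matched, but the checksum rules it out
        start = max(0, hit.start() - PROXIMITY_CHARS)
        window = lowered[start:hit.end() + PROXIMITY_CHARS]
        has_keyword = any(word in window for word in KEYWORDS)
        matches.append({
            "value": digits,
            "confidence": "high" if has_keyword else "low",
        })
    return matches
```

Calling find_card_matches on a sentence that contains a checksum-valid number near the word "card" returns a high-confidence match. The same number with no supporting keyword returns a low-confidence match, and a number that fails the checksum is dropped.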

2. Trainable Classifiers

Not all sensitive data follows a simple pattern.

Some documents are sensitive because of what they are about, not because they contain a predictable number or format.

Examples include:

Legal contracts
Source code
Resumes
HR documents
Financial planning documents
Customer complaint records
Policy documents

This is where Trainable Classifiers help.

Trainable Classifiers use machine learning to identify content based on examples. Instead of looking only for a pattern, they learn from sample documents and classify similar content.

Microsoft provides several pre-trained classifiers, and organizations can also build custom classifiers.

Use Trainable Classifiers when the question is:

Is this document about a specific topic or business process?
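
To make the "learn from examples" idea concrete, here is a toy text classifier in plain Python. It uses a small naive Bayes model, which is a swapped-in technique chosen for brevity and is not a description of how Purview's classifiers work internally. The class name, the categories, and the four sample sentences are all invented for this sketch. Real classifiers need far more training content.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


class TinyTextClassifier:
    """Multinomial naive Bayes with add-one smoothing, for illustration only."""

    def __init__(self) -> None:
        self.word_counts: dict[str, Counter] = {}
        self.doc_counts: Counter = Counter()
        self.vocabulary: set[str] = set()

    def train(self, samples: list[tuple[str, str]]) -> None:
        """Count words per category from (text, category) examples."""
        for text, category in samples:
            self.doc_counts[category] += 1
            counts = self.word_counts.setdefault(category, Counter())
            for word in tokenize(text):
                counts[word] += 1
                self.vocabulary.add(word)

    def predict(self, text: str) -> str:
        """Return the category with the highest log-probability score."""
        words = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_category, best_score = "", -math.inf
        for category, counts in self.word_counts.items():
            score = math.log(self.doc_counts[category] / total_docs)
            denominator = sum(counts.values()) + len(self.vocabulary)
            for word in words:
                score += math.log((counts[word] + 1) / denominator)
            if score > best_score:
                best_category, best_score = category, score
        return best_category


# Invented sample documents; a real classifier needs many more examples.
SAMPLES = [
    ("This agreement is between the parties and governs liability and termination", "contract"),
    ("The parties agree to the terms of this agreement including indemnity", "contract"),
    ("Experienced engineer with skills in Python seeking a new role", "resume"),
    ("Education and work experience summary with skills and references", "resume"),
]

classifier = TinyTextClassifier()
classifier.train(SAMPLES)
```

After training, classifier.predict takes unseen text and returns the closest category. There is no pattern to match here. The decision comes entirely from which sample documents the new text resembles.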

3. Exact Data Match

Sensitive Information Types can detect data that matches a pattern.

Exact Data Match, or EDM, goes further.

EDM helps detect whether content contains values from a known source of truth, such as a customer database, employee list, patient record system, or CRM export.

For example, a number may look like a customer ID, but EDM can confirm whether that number is actually one of your real customer IDs.

This is useful for high-precision detection where false positives are costly.

Typical EDM use cases include:

Customer records
Patient IDs
Employee identifiers
Account numbers
Membership numbers
Regulated business data

Use EDM when the question is:

Does this content contain one of our actual sensitive records?
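
The difference between "looks like a customer ID" and "is one of our customer IDs" can be sketched as a two-step check: a pattern finds candidates, then a lookup against hashed known values confirms them. This is a simplified illustration, not the Purview EDM implementation. The CUST-000000 format, the fixed salt, and the function names are assumptions for this example.

```python
import hashlib
import re

# Assumption: a fixed salt keeps the sketch short. A real deployment would
# generate and store salts securely.
SALT = b"example-salt"

# Hypothetical customer ID format used only in this sketch.
CUSTOMER_ID_PATTERN = re.compile(r"\bCUST-\d{6}\b", re.IGNORECASE)


def fingerprint(value: str) -> str:
    """Hash a normalised value so the lookup table never stores plain text."""
    normalised = value.strip().upper().encode("utf-8")
    return hashlib.sha256(SALT + normalised).hexdigest()


def build_index(known_values: list[str]) -> set[str]:
    """Build the hashed lookup table from the source of truth."""
    return {fingerprint(value) for value in known_values}


def find_exact_matches(text: str, index: set[str]) -> list[str]:
    """Return only the candidates whose hash exists in the known-value index."""
    candidates = CUSTOMER_ID_PATTERN.findall(text)
    return [c.upper() for c in candidates if fingerprint(c) in index]
```

With an index built from two real IDs, a sentence that mentions one real ID and one made-up ID returns only the real one. The made-up ID still matches the pattern, which is exactly the false positive that exact matching removes.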

4. Sensitivity Labels

Sensitivity Labels are the protection layer.

Classifiers identify sensitive data.

Labels help protect it.

A sensitivity label can apply controls such as:

Encryption
Headers
Footers
Watermarks
Access restrictions
External sharing restrictions
Privacy settings for Teams, Groups, and SharePoint sites

Labels can be applied manually by users, recommended by Office apps, or applied automatically when Purview detects sensitive content.

A simple way to remember the relationship is this:

Classifiers find the data. Labels protect the data.

For many organizations, a simple label taxonomy works best at the start.

For example:

Public
Internal
Confidential
Restricted

A smaller label set is easier to explain, easier to apply, and easier to govern.

Check licensing before planning

Before building a Purview classification plan, check your licensing.

Some basic classification and manual labeling capabilities are broadly available, but advanced features usually require higher-level licensing.

Capabilities such as automatic labeling, Trainable Classifiers, Exact Data Match, Content Explorer, and Activity Explorer may require Microsoft 365 E5, the E5 Compliance add-on, or equivalent licensing.

This matters during planning.

A migration playbook that assumes auto-labeling and EDM are available will fail if the tenant is only licensed for basic capabilities.

Before committing to a rollout plan, confirm what the tenant can actually use.
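
One lightweight way to do that check is to write down what each step of the plan depends on and compare it with what has actually been confirmed in the tenant. The sketch below assumes you maintain both lists by hand. The step names and capability names are hypothetical labels, not Microsoft SKU or feature identifiers.

```python
def find_blocked_steps(
    plan: dict[str, set[str]],
    confirmed_capabilities: set[str],
) -> dict[str, set[str]]:
    """Return each plan step that depends on a capability not yet confirmed."""
    blocked = {}
    for step, required in plan.items():
        missing = required - confirmed_capabilities
        if missing:
            blocked[step] = missing
    return blocked


# Hypothetical plan: step name -> capabilities the step depends on.
PLAN = {
    "Publish labels for manual use": {"manual_labeling"},
    "Auto-label source sites": {"auto_labeling"},
    "Detect real customer records": {"exact_data_match"},
    "Review results": {"content_explorer", "activity_explorer"},
}
```

Passing PLAN together with the set of capabilities you have verified returns only the steps that cannot run yet, along with what each one is missing.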

Content Explorer: seeing what Purview found

Once classification is active, admins need a way to inspect the results.

That is where Content Explorer helps.

Content Explorer gives visibility into classified content across the tenant.

It can show items that have:

Sensitivity labels
Retention labels
Sensitive Information Type matches

Admins can use it to understand where sensitive data lives, which workloads contain the most risk, and whether classification is working as expected.

In practical terms, Content Explorer can help with:

Finding sensitive files across SharePoint and OneDrive
Reviewing labeled content
Checking sensitive information matches
Filtering by location, label, workload, or information type
Validating whether classification results are accurate

This is especially useful before audits and migrations.

It gives the team evidence instead of guesswork.
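
If you export Content Explorer results to a CSV file, a short script can turn the rows into that evidence. The sketch below counts rows by any column. The column names (Location, Workload, SensitiveInfoType) are assumptions for this example, so adjust them to match the headers in your own export.

```python
import csv
import io
from collections import Counter


def summarise_export(csv_text: str, column: str) -> list[tuple[str, int]]:
    """Count exported rows by one column, most frequent value first."""
    counts: Counter = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        counts[row[column]] += 1
    return counts.most_common()


# Invented sample rows standing in for a real export.
SAMPLE_EXPORT = """Location,Workload,SensitiveInfoType
sites/hr,SharePoint,Credit Card Number
sites/hr,SharePoint,Passport Number
users/amy,OneDrive,Credit Card Number
"""
```

Calling summarise_export(SAMPLE_EXPORT, "Workload") groups the three sample rows by workload, and the same call with "SensitiveInfoType" or "Location" answers a different question from the same data.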

Data Classification Content Viewer role

Content Explorer access must be handled carefully because it can expose sensitive information.

Microsoft separates access into role groups.

The two important role groups are:

Content Explorer List Viewer
Content Explorer Content Viewer

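The Content Explorer List Viewer role group, which carries the Data Classification List Viewer role, allows a user to see classified items and their locations, but not open the content.

The Content Explorer Content Viewer role group, which carries the Data Classification Content Viewer role, allows a user to open and read the actual file content.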

This distinction is important.

A reporting analyst may only need list-level visibility.

A compliance investigator may need content-level access.

The Content Viewer role should be treated as a privileged role. Assign it only to named users who need it, log its use, and review membership regularly.

Do not give broad access just because someone is part of the compliance team.
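
A regular membership review can be as simple as comparing the current members of the role group with a list of documented approvals. The sketch below assumes you already have both lists from your own records. The 90-day review interval and the function name are assumptions, not Microsoft guidance.

```python
from datetime import date, timedelta


def review_membership(
    members: set[str],
    approvals: dict[str, date],
    today: date,
    max_age_days: int = 90,  # assumed review interval
) -> dict[str, list[str]]:
    """Flag members with no recorded approval or with an outdated one."""
    cutoff = today - timedelta(days=max_age_days)
    unapproved = sorted(members - set(approvals))
    expired = sorted(
        user for user in members
        if user in approvals and approvals[user] < cutoff
    )
    return {"unapproved": unapproved, "expired": expired}
```

The result separates two different problems: people who were never approved for content-level access, and people whose approval is old enough that it should be confirmed again.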

Activity Explorer: seeing what happens to sensitive data

Content Explorer answers:

What sensitive data do we have?

Activity Explorer answers:

What is happening to that data?

Activity Explorer shows user and system activity related to sensitive and labeled content.

This can include actions such as:

Label applied
Label changed
Label removed
File shared externally
File downloaded
DLP rule matched
Sensitive content accessed

This is useful because classification is not only about inventory.

It is also about behavior.

A file marked Confidential is one thing.

Knowing that it was shared externally, downloaded to an unmanaged device, or had its label removed is far more useful for risk management.

Together, Content Explorer and Activity Explorer give admins both inventory and activity context.
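
The behavior side can be sketched as a filter over exported activity records. The example below flags two risky patterns: a label being removed, and a label being changed to a lower sensitivity. The event field names, the activity names, and the label order are assumptions for this sketch and will differ from the fields in a real Activity Explorer export.

```python
# Assumed label order, from least to most sensitive.
LABEL_PRIORITY = {"Public": 0, "Internal": 1, "Confidential": 2, "Restricted": 3}


def classify_event(event: dict) -> str:
    """Return a reason string for risky events, or an empty string otherwise."""
    activity = event.get("activity")
    if activity == "LabelRemoved":
        return "label removed"
    if activity == "LabelChanged":
        before = LABEL_PRIORITY.get(event.get("old_label"), -1)
        after = LABEL_PRIORITY.get(event.get("new_label"), -1)
        if after < before:
            return "label downgraded"
    return ""


def risky_events(events: list[dict]) -> list[tuple[str, str, str]]:
    """Return (user, file, reason) for every event that needs a second look."""
    flagged = []
    for event in events:
        reason = classify_event(event)
        if reason:
            flagged.append((event["user"], event["file"], reason))
    return flagged
```

A label upgrade or a first-time label application passes through untouched. Only removals and downgrades end up in the review list, which keeps the weekly check short.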

Why classification matters before Microsoft 365 migration

Most Purview discussions focus on steady-state compliance.

But classification becomes even more important during tenant-to-tenant migration.

During migration, organizations often move:

Mailboxes
OneDrive accounts
SharePoint sites
Teams content
Groups
Planner data
Power Platform assets
Power BI content

If sensitive data is not classified before migration, the target tenant may inherit the same unmanaged risk from the source tenant.

A migration is a good opportunity to clean up, classify, archive, delete, and re-govern content before it lands in the new environment.

The goal is not just to move data.

The goal is to move the right data with the right protection.

Phase 1: Pre-migration discovery

Before moving content, run discovery in the source tenant.

A practical approach is to run Purview classification for at least a few weeks before migration planning is finalized.

During this phase:

Enable relevant built-in Sensitive Information Types
Add custom SITs for business-specific identifiers
Publish a small sensitivity label taxonomy
Use Content Explorer to find sensitive data locations
Identify high-risk SharePoint sites, OneDrive accounts, and mailboxes
Review external sharing and access patterns

This phase helps answer a key question:

What is the blast radius if this migration goes wrong?

Many organizations discover that most sensitive data is concentrated in a small number of sites, accounts, or shared mailboxes.

That discovery helps prioritize migration planning.
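
A quick way to test that concentration in your own tenant is to rank locations by their sensitive item count and see how few of them cover most of the total. The sketch below assumes you already have a count per location, for example from a Content Explorer export. The 80 percent threshold and the function name are assumptions.

```python
def locations_covering(counts: dict[str, int], share: float = 0.8) -> list[str]:
    """Return the smallest set of top locations holding `share` of all items."""
    total = sum(counts.values())
    if total == 0:
        return []
    covered = 0
    selected = []
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    for location, count in ranked:
        selected.append(location)
        covered += count
        if covered / total >= share:
            break
    return selected
```

If five locations hold 1,000 sensitive items and two of them hold 850, those two are where migration testing, owner review, and label validation should start.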

Phase 2: Cleanup and scoping

Once you know where sensitive data lives, decide what should happen to it.

Not everything should be migrated.

Some content should move.

Some should be archived.

Some should be deleted.

Some should be reviewed by business owners before any migration begins.

This is the right time to:

Remove stale content
Archive legacy data
Apply retention labels
Reduce oversharing
Fix ownership gaps
Confirm sensitivity labels
Define what content is out of scope
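
The move, archive, delete, or review decision can be written down as explicit rules so that it is applied the same way to every site. The sketch below is one possible rule set, not a recommendation. The three-year staleness threshold, the field names, and the rule order are all assumptions that each organization should replace with its own policy.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ContentItem:
    path: str
    last_modified: date
    has_owner: bool
    is_sensitive: bool
    under_retention: bool


def triage(item: ContentItem, today: date, stale_after_days: int = 1095) -> str:
    """Return 'migrate', 'archive', 'delete', or 'review' for one item."""
    if not item.has_owner:
        return "review"  # ownership gap: a person must decide
    age_days = (today - item.last_modified).days
    if age_days > stale_after_days:
        if item.under_retention:
            return "archive"  # stale, but must be kept
        if item.is_sensitive:
            return "review"  # stale and sensitive: ask the business owner
        return "delete"  # stale, not sensitive, no retention duty
    return "migrate"
```

Writing the rules as code also makes them easy to challenge. If a business owner disagrees with an outcome, the rule that produced it is visible and can be changed in one place.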

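For existing content at rest in SharePoint and OneDrive, service-side auto-labeling is especially useful because it can label files in the background. For Exchange, auto-labeling policies apply to email as it is sent or received, not to messages already sitting in mailboxes.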

If you wait until after migration to label content, you may lose important chain-of-custody context.

Labeling at the source gives you better visibility before content moves.

Phase 3: Migration and label handling

Sensitivity labels need special planning during tenant-to-tenant migration.

Labels and encryption are often tied to the source tenant’s identities, policies, and protection configuration.

Because of this, labels do not always transfer cleanly across tenants.

In many migration scenarios, the practical approach is:

Move the content.
Preserve metadata, permissions, and version history where possible.
Recreate the label taxonomy in the target tenant.
Map source labels to target labels in the migration runbook.
Re-apply or validate labels in the target using Purview policies or scripted approaches.
Validate access behavior after cutover.

Before cutover, confirm these items:

The target tenant has the same or equivalent label taxonomy.
Source labels are mapped to target labels.
Files protected with user-defined permissions, where the person applying the label chose who can access the file, are identified separately.
Auto-labeling policies in the target are scoped carefully.
Protected content is tested after migration.
Validation includes access behavior, not only file counts.

This is important because a migration that reports successful file counts may still fail from a compliance perspective if protection does not work correctly in the target tenant.
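
The label mapping in the runbook can be kept as a simple lookup table and checked against the source inventory before cutover. The sketch below builds a re-labeling plan and lists files whose source label has no target equivalent. The label names in the mapping and the file record fields are hypothetical, and the script only plans the work. It does not apply labels.

```python
# Hypothetical mapping from source tenant labels to target tenant labels.
LABEL_MAP = {
    "Public": "Public",
    "Internal": "Internal",
    "Confidential": "Confidential",
    "Highly Confidential": "Restricted",
}


def plan_relabel(
    files: list[dict],
    label_map: dict[str, str],
) -> tuple[list[tuple[str, str]], list[str]]:
    """Return (path, target label) pairs plus paths with unmapped labels."""
    plan = []
    unmapped = []
    for record in files:
        source_label = record.get("label")
        if source_label is None:
            continue  # unlabeled at source: handled by classification, not mapping
        target_label = label_map.get(source_label)
        if target_label is None:
            unmapped.append(record["path"])
        else:
            plan.append((record["path"], target_label))
    return plan, unmapped
```

An empty unmapped list is a reasonable gate for cutover. Any path that appears there carries a source label the target tenant does not know how to represent yet.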

Phase 4: Post-migration validation

After migration, use Content Explorer and Activity Explorer in the target tenant to validate classification coverage.

Compare source and target results.

Look for gaps such as:

Missing labels
Incorrect label mapping
Files that lost metadata
Permissions that no longer match business rules
Sensitive content moved to unexpected locations
External sharing behavior that changed after migration
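
The first three gaps in that list can be checked mechanically by comparing a source inventory with a target inventory. The sketch below assumes each inventory is a dictionary of file path to label name, with None for unlabeled files, and that paths have already been normalised so the same file has the same key in both tenants. The mapping and function name are hypothetical.

```python
from typing import Optional


def compare_inventories(
    source: dict[str, Optional[str]],
    target: dict[str, Optional[str]],
    label_map: dict[str, str],
) -> dict[str, list[str]]:
    """Report files that are missing, unlabeled, or mislabeled in the target."""
    findings: dict[str, list[str]] = {
        "missing_file": [],
        "missing_label": [],
        "wrong_label": [],
    }
    for path, source_label in source.items():
        if path not in target:
            findings["missing_file"].append(path)
            continue
        if source_label is None:
            continue  # nothing to preserve for files unlabeled at source
        expected = label_map.get(source_label)
        actual = target[path]
        if actual is None:
            findings["missing_label"].append(path)
        elif actual != expected:
            findings["wrong_label"].append(path)
    return findings
```

Matching file counts would pass the first check only. The other two checks catch the case where every file arrived but its protection did not.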

Plan a stabilization period after cutover.

For the first 30 days, review classification and activity reports weekly.

Migration projects often surface edge cases, such as:

Legacy files that were never classified
Service accounts that bypassed expected controls
Third-party connectors that changed metadata
Files protected in a way that does not map cleanly to the target tenant

Post-migration validation is where you prove that the data was not just moved, but moved safely.

Choosing the right data discovery and classification stack

Microsoft Purview is the right foundation for most Microsoft 365 environments.

But in real migration projects, Purview is usually one part of a larger stack.

A practical stack may include:

Microsoft Purview for classification, labels, DLP, and visibility
A Microsoft 365 migration tool for moving workloads and preserving metadata
Power BI or reporting tools for audit and governance dashboards
Security or compliance workflows for investigation and review

When evaluating tools, ask these questions:

Does the tool preserve sensitivity label and protection metadata (Microsoft Purview Information Protection, formerly Microsoft Information Protection)?
Does it support the workloads you actually use?
Can it handle SharePoint, OneDrive, Teams, Exchange, Planner, and Power Platform data?
Does it scale without creating throttling problems?
Does it produce audit evidence your compliance team can trust?
Does it support validation before and after migration?

The goal is not only migration completion.

The goal is governed migration completion.

A practical 30-day Purview classification checklist

If you are starting with Microsoft Purview classification, keep the first 30 days simple.

Here is a practical starting checklist:

Publish a small sensitivity label set.
Use clear label names that business users can understand.
Enable the built-in Sensitive Information Types that match your compliance needs.
Create one or two custom SITs for your internal identifiers.
Run auto-labeling in simulation mode first.
Review matches in Content Explorer before enabling enforcement.
Assign the Content Viewer role only to selected named users.
Review Activity Explorer weekly for the first month.
Document label ownership and escalation paths.
Use classification data to guide migration, audit, and AI-readiness planning.

Do not overcomplicate the first version.

A smaller label taxonomy that people actually use is better than a complex model nobody understands.

Final thoughts

Classification is not a one-time project.

It is an ongoing practice.

The organizations that handle migrations, audits, security reviews, and AI rollouts more smoothly are usually the ones that started labeling early.

They know where sensitive data lives.

They know who can access it.

They know which content should move, which content should stay, and which content should be retired.

Microsoft Purview gives Microsoft 365 admins the visibility and control needed to make that possible.

Start small.

Classify what matters.

Keep the label model simple.

Use Content Explorer and Activity Explorer regularly.

And if you are preparing for a Microsoft 365 tenant migration, do not wait until cutover to think about classification.

By then, it is already too late.