Every Microsoft 365 tenant has a hidden data problem.
Over time, SharePoint sites, OneDrive folders, Teams channels, Exchange mailboxes, and shared workspaces collect contracts, customer records, HR files, financial reports, intellectual property, and old spreadsheets that should probably not be sitting where they are.
Most of that content is not labeled.
A lot of it is not reviewed.
And when a tenant-to-tenant migration, compliance audit, merger, divestiture, or AI readiness project begins, the problem becomes very visible.
You cannot protect data you cannot see.
You also cannot migrate sensitive data safely if you do not know what it is, where it lives, who owns it, and how it should be handled.
That is where Microsoft Purview Information Protection becomes important.
This article explains how Microsoft Purview helps with data discovery, classification, sensitivity labels, Content Explorer, Activity Explorer, and migration readiness in Microsoft 365.
What is Microsoft Purview Information Protection?
Microsoft Purview Information Protection helps organizations discover, classify, label, and protect sensitive data across Microsoft 365.
It is Microsoft's current approach to information protection in Microsoft 365 and brings together capabilities such as:
- Sensitivity labels
- Data classification
- Data loss prevention
- Content visibility
- Activity visibility
- Policy-based protection
For Microsoft 365 admins, the value is simple. Purview helps answer questions like:
Where is our sensitive data?
Who is accessing it?
Is it labeled?
Is it protected?
Will it remain protected after migration?
Purview works across key Microsoft 365 workloads such as SharePoint, OneDrive, Exchange, Teams, and endpoints.
It is especially useful because protection can travel with the file. A sensitivity label can apply encryption, watermarks, access restrictions, and other controls, and those controls stay with the document even after it moves outside its original location.
Data discovery vs data classification
Data discovery and data classification are related, but they are not the same thing.
Data discovery is the process of finding content across your Microsoft 365 environment.
It helps identify what exists across:
- SharePoint sites
- OneDrive accounts
- Exchange mailboxes
- Teams content
- Endpoints
- Other connected locations
Data classification is the process of categorizing that content based on sensitivity, business value, or compliance requirements.
For example, content may be classified as:
- Public
- Internal
- Confidential
- Highly Confidential
- Restricted
Together, discovery and classification answer two important questions:
- What sensitive information do we have?
- How should that information be handled?
Microsoft Purview helps with both.
It can start surfacing sensitive and labeled content before every policy is fully built. This gives admins an early view of risk across Microsoft 365 and helps them make decisions based on real data instead of assumptions.
The core building blocks of Microsoft Purview classification
Purview classification is not one single feature.
It is a set of connected capabilities that work together.
The most important building blocks are:
- Sensitive Information Types
- Trainable Classifiers
- Exact Data Match
- Sensitivity Labels
Let us look at each one.
1. Sensitive Information Types
Sensitive Information Types, often called SITs, are pattern-based classifiers.
They detect data using things like:
- Regular expressions
- Keyword lists
- Checksums
- Proximity rules
Microsoft provides many built-in SITs for common sensitive data patterns such as government IDs, financial data, health-related identifiers, and other regulated information.
SITs are useful when the data has a recognizable format.
For example:
- Credit card numbers
- Passport numbers
- Tax IDs
- Bank account numbers
- National identification numbers
- Employee IDs
- Customer reference numbers
You can also create custom SITs for your organization-specific identifiers, such as internal project codes, customer numbers, or employee numbers.
Use SITs when the question is:
Does this content contain a specific pattern of sensitive data?
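To make the idea concrete, here is a minimal Python sketch of how a pattern-based detector can combine a regular expression, a checksum, and keyword proximity. It is an illustration only, not how Purview implements SITs. The pattern, keyword list, proximity window, and confidence names are all assumptions chosen for the example.

```python
import re

# Candidate pattern: 16 digits, optionally grouped in fours by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")
# Supporting keywords that raise confidence when found near a match (illustrative).
KEYWORDS = ("credit card", "card number", "visa", "mastercard")
PROXIMITY = 50  # characters searched on each side of a match (illustrative value)


def luhn_valid(number: str) -> bool:
    """Return True if the digits in the string pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:  # double every second digit, counting from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_card_numbers(text: str) -> list[dict]:
    """Find checksum-valid card-like numbers and grade each match."""
    lowered = text.lower()
    results = []
    for match in CARD_PATTERN.finditer(text):
        if not luhn_valid(match.group()):
            continue  # right shape, wrong checksum: likely a false positive
        start = max(0, match.start() - PROXIMITY)
        end = match.end() + PROXIMITY
        has_keyword = any(keyword in lowered[start:end] for keyword in KEYWORDS)
        results.append({
            "value": match.group(),
            "confidence": "high" if has_keyword else "low",
        })
    return results
```

In this sketch, the checksum removes numbers that only look like card numbers, and a nearby keyword raises the confidence of the matches that remain. The same layering is the reason a pattern alone is rarely enough for reliable detection.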
2. Trainable Classifiers
Not all sensitive data follows a simple pattern.
Some documents are sensitive because of what they are about, not because they contain a predictable number or format.
Examples include:
- Legal contracts
- Source code
- Resumes
- HR documents
- Financial planning documents
- Customer complaint records
- Policy documents
This is where Trainable Classifiers help.
Trainable Classifiers use machine learning to identify content based on examples. Instead of looking only for a pattern, they learn from sample documents and classify similar content.
Microsoft provides several pre-trained classifiers, and organizations can also build custom classifiers.
Use Trainable Classifiers when the question is:
Is this document about a specific topic or business process?
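The "learn from examples" idea can be sketched with a toy word-frequency classifier in Python. This is a simplified naive Bayes model with equal priors, written only to show the principle. It is not the model Purview uses, and the function names and sample categories are assumptions for the example.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep purely alphabetic words."""
    return [word for word in text.lower().split() if word.isalpha()]


def train(samples: dict[str, list[str]]) -> dict:
    """Count word frequencies per category from example documents."""
    counts = {label: Counter() for label in samples}
    for label, docs in samples.items():
        for doc in docs:
            counts[label].update(tokenize(doc))
    vocabulary = set().union(*counts.values())
    return {"counts": counts, "vocabulary": vocabulary}


def classify(model: dict, text: str) -> str:
    """Pick the category with the highest smoothed log-likelihood."""
    vocab_size = len(model["vocabulary"])
    scores = {}
    for label, counter in model["counts"].items():
        total = sum(counter.values())
        # Add-one smoothing keeps unseen words from zeroing out a category.
        scores[label] = sum(
            math.log((counter[word] + 1) / (total + vocab_size))
            for word in tokenize(text)
        )
    return max(scores, key=scores.get)
```

Trained on a handful of sample contracts and resumes, a model like this assigns a new document to whichever category its vocabulary resembles most. No single pattern or number is required, which is the key difference from a SIT.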
3. Exact Data Match
Sensitive Information Types can detect data that matches a pattern.
Exact Data Match, or EDM, goes further.
EDM helps detect whether content contains values from a known source of truth, such as a customer database, employee list, patient record system, or CRM export.
For example, a number may look like a customer ID, but EDM can confirm whether that number is actually one of your real customer IDs.
This is useful for high-precision detection where false positives are costly.
Typical EDM use cases include:
- Customer records
- Patient IDs
- Employee identifiers
- Account numbers
- Membership numbers
- Regulated business data
Use EDM when the question is:
Does this content contain one of our actual sensitive records?
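The difference between "looks like a customer ID" and "is one of our customer IDs" can be sketched in a few lines of Python. The example below hashes a list of known values and checks pattern matches against that hashed table. It only loosely mirrors the EDM concept and is not the Purview implementation. The CUST-000000 format, the salt, and the function names are hypothetical.

```python
import hashlib
import re

SALT = b"example-salt"  # hypothetical; a real deployment would protect this value
CUSTOMER_ID_PATTERN = re.compile(r"\bCUST-\d{6}\b")  # hypothetical ID format


def hash_value(value: str) -> str:
    """Hash a normalized value so the lookup table holds no clear text."""
    normalized = value.strip().upper().encode("utf-8")
    return hashlib.sha256(SALT + normalized).hexdigest()


def build_lookup(known_values: list[str]) -> set[str]:
    """Build the hashed lookup table from the source-of-truth export."""
    return {hash_value(value) for value in known_values}


def exact_matches(text: str, lookup: set[str]) -> list[str]:
    """Return pattern matches that also exist in the lookup table."""
    candidates = CUSTOMER_ID_PATTERN.findall(text)
    return [candidate for candidate in candidates if hash_value(candidate) in lookup]
```

With a lookup built from CUST-000123 and CUST-004567, a document mentioning CUST-000123 and CUST-999999 yields only CUST-000123. The second value has the right shape but is not a real record, so it is ignored.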
4. Sensitivity Labels
Sensitivity Labels are the protection layer.
Classifiers identify sensitive data.
Labels help protect it.
A sensitivity label can apply controls such as:
- Encryption
- Headers
- Footers
- Watermarks
- Access restrictions
- External sharing restrictions
- Privacy settings for Teams, Groups, and SharePoint sites
Labels can be applied manually by users, recommended by Office apps, or applied automatically when Purview detects sensitive content.
A simple way to remember the relationship is this:
Classifiers find the data. Labels protect the data.
For many organizations, a simple label taxonomy works best at the start.
For example:
- Public
- Internal
- Confidential
- Restricted
A smaller label set is easier to explain, easier to apply, and easier to govern.
Check licensing before planning
Before building a Purview classification plan, check your licensing.
Some basic classification and manual labeling capabilities are broadly available, but advanced features usually require higher-level licensing.
Capabilities such as automatic labeling, Trainable Classifiers, Exact Data Match, Content Explorer, and Activity Explorer may require Microsoft 365 E5, the E5 Compliance add-on, or equivalent licensing.
This matters during planning.
A migration playbook that assumes auto-labeling and EDM are available will fail if the tenant is only licensed for basic capabilities.
Before committing to a rollout plan, confirm what the tenant can actually use.
Content Explorer: seeing what Purview found
Once classification is active, admins need a way to inspect the results.
That is where Content Explorer helps.
Content Explorer gives visibility into classified content across the tenant.
It can show items that have:
- Sensitivity labels
- Retention labels
- Sensitive Information Type matches
Admins can use it to understand where sensitive data lives, which workloads contain the most risk, and whether classification is working as expected.
In practical terms, Content Explorer can help with:
- Finding sensitive files across SharePoint and OneDrive
- Reviewing labeled content
- Checking sensitive information matches
- Filtering by location, label, workload, or information type
- Validating whether classification results are accurate
This is especially useful before audits and migrations.
It gives the team evidence instead of guesswork.
Data Classification Content Viewer role
Content Explorer access must be handled carefully because it can expose sensitive information.
Microsoft separates access into role groups.
The two important role groups are:
- Content Explorer List Viewer
- Content Explorer Content Viewer
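The List Viewer role group, which carries the Data Classification List Viewer role, allows a user to see classified items and their locations, but not open the content.
The Content Viewer role group, which carries the Data Classification Content Viewer role, allows a user to open and read the actual file content.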
This distinction is important.
A reporting analyst may only need list-level visibility.
A compliance investigator may need content-level access.
The Content Viewer role should be treated as a privileged role. Assign it only to named users who need it, log its use, and review membership regularly.
Do not give broad access just because someone is part of the compliance team.
Activity Explorer: seeing what happens to sensitive data
Content Explorer answers:
What sensitive data do we have?
Activity Explorer answers:
What is happening to that data?
Activity Explorer shows user and system activity related to sensitive and labeled content.
This can include actions such as:
- Label applied
- Label changed
- Label removed
- File shared externally
- File downloaded
- DLP rule matched
- Sensitive content accessed
This is useful because classification is not only about inventory.
It is also about behavior.
A file marked Confidential is one thing.
Knowing that it was shared externally, downloaded to an unmanaged device, or had its label removed is far more useful for risk management.
Together, Content Explorer and Activity Explorer give admins both inventory and activity context.
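As a rough illustration of reviewing activity rather than inventory, the Python sketch below filters a list of activity records for label removals and downgrades. The record fields, activity names, and label ranking are assumptions made for the example. They do not reflect the actual Activity Explorer export schema.

```python
# Hypothetical ranking of labels from least to most sensitive.
LABEL_RANK = {"Public": 0, "Internal": 1, "Confidential": 2, "Restricted": 3}


def risky_events(events: list[dict]) -> list[dict]:
    """Keep label removals and downgrades from a list of activity records."""
    flagged = []
    for event in events:
        if event["activity"] == "LabelRemoved":
            flagged.append(event)
        elif event["activity"] == "LabelChanged":
            old_rank = LABEL_RANK[event["old_label"]]
            new_rank = LABEL_RANK[event["new_label"]]
            if new_rank < old_rank:  # moved to a less sensitive label
                flagged.append(event)
    return flagged
```

A change from Confidential to Internal is flagged, while a change from Internal to Restricted is not. Filtering this way keeps a weekly review focused on the events that reduce protection.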
Why classification matters before Microsoft 365 migration
Most Purview discussions focus on steady-state compliance.
But classification becomes even more important during tenant-to-tenant migration.
During migration, organizations often move:
- Mailboxes
- OneDrive accounts
- SharePoint sites
- Teams content
- Groups
- Planner data
- Power Platform assets
- Power BI content
If sensitive data is not classified before migration, the target tenant may inherit the same unmanaged risk from the source tenant.
A migration is a good opportunity to clean up, classify, archive, delete, and re-govern content before it lands in the new environment.
The goal is not just to move data.
The goal is to move the right data with the right protection.
Phase 1: Pre-migration discovery
Before moving content, run discovery in the source tenant.
A practical approach is to run Purview classification for at least a few weeks before migration planning is finalized.
During this phase:
- Enable relevant built-in Sensitive Information Types
- Add custom SITs for business-specific identifiers
- Publish a small sensitivity label taxonomy
- Use Content Explorer to find sensitive data locations
- Identify high-risk SharePoint sites, OneDrive accounts, and mailboxes
- Review external sharing and access patterns
This phase helps answer a key question:
What is the blast radius if this migration goes wrong?
Many organizations discover that most sensitive data is concentrated in a smaller number of sites, accounts, or shared mailboxes.
That discovery helps prioritize migration planning.
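One way to test that concentration claim against your own tenant is to rank locations by the number of sensitive items found. The Python sketch below assumes the discovery results have already been exported into a list of records with a "location" field. That field name and the 80 percent threshold are assumptions for illustration.

```python
from collections import Counter


def rank_locations(items: list[dict]) -> list[tuple[str, int]]:
    """Count sensitive items per location, highest first."""
    counts = Counter(item["location"] for item in items)
    return counts.most_common()


def locations_covering(items: list[dict], share: float = 0.8) -> list[str]:
    """Return the top-ranked locations that together hold `share` of the items."""
    ranked = rank_locations(items)
    total = sum(count for _, count in ranked)
    covered = 0
    selected = []
    for location, count in ranked:
        if covered >= share * total:
            break
        selected.append(location)
        covered += count
    return selected
```

If two sites out of several hundred hold 80 percent of the sensitive matches, those two sites deserve the earliest owner review and the most careful migration wave.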
Phase 2: Cleanup and scoping
Once you know where sensitive data lives, decide what should happen to it.
Not everything should be migrated.
Some content should move.
Some should be archived.
Some should be deleted.
Some should be reviewed by business owners before any migration begins.
This is the right time to:
- Remove stale content
- Archive legacy data
- Apply retention labels
- Reduce oversharing
- Fix ownership gaps
- Confirm sensitivity labels
- Define what content is out of scope
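For existing content at rest, service-side auto-labeling is especially useful because it can label files already stored in SharePoint and OneDrive in the background. For Exchange, auto-labeling policies apply to email as it is sent or received, not to messages already sitting in mailboxes.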
If you wait until after migration to label content, you may lose important chain-of-custody context.
Labeling at the source gives you better visibility before content moves.
Phase 3: Migration and label handling
Sensitivity labels need special planning during tenant-to-tenant migration.
Labels and encryption are often tied to the source tenant’s identities, policies, and protection configuration.
Because of this, labels do not always transfer cleanly across tenants.
In many migration scenarios, the practical approach is:
- Move the content.
- Preserve metadata, permissions, and version history where possible.
- Recreate the label taxonomy in the target tenant.
- Map source labels to target labels in the migration runbook.
- Re-apply or validate labels in the target using Purview policies or scripted approaches.
- Validate access behavior after cutover.
Before cutover, confirm these items:
- The target tenant has the same or equivalent label taxonomy.
- Source labels are mapped to target labels.
- Files encrypted with user-defined permissions are identified separately.
- Auto-labeling policies in the target are scoped carefully.
- Protected content is tested after migration.
- Validation includes access behavior, not only file counts.
This is important because a migration that reports successful file counts may still fail from a compliance perspective if protection does not work correctly in the target tenant.
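The label mapping step in the runbook can be expressed as a small lookup plus an exception list. The Python sketch below is one possible shape for it, assuming a file inventory with "path" and "label" fields. The label names, field names, and function name are hypothetical.

```python
# Source label name -> target label name (hypothetical example mapping).
LABEL_MAP = {
    "General": "Internal",
    "Confidential": "Confidential",
    "Secret": "Restricted",
}


def plan_relabeling(files: list[dict], label_map: dict[str, str]):
    """Split files into a relabel plan and a list that needs manual review."""
    plan = []
    needs_review = []
    for file in files:
        source_label = file.get("label")
        if source_label is None:
            needs_review.append((file["path"], "unlabeled at source"))
        elif source_label not in label_map:
            needs_review.append((file["path"], f"no mapping for {source_label!r}"))
        else:
            plan.append((file["path"], label_map[source_label]))
    return plan, needs_review
```

Keeping unmapped and unlabeled files in a separate review list prevents them from silently arriving in the target tenant without protection.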
Phase 4: Post-migration validation
After migration, use Content Explorer and Activity Explorer in the target tenant to validate classification coverage.
Compare source and target results.
Look for gaps such as:
- Missing labels
- Incorrect label mapping
- Files that lost metadata
- Permissions that no longer match business rules
- Sensitive content moved to unexpected locations
- External sharing behavior that changed after migration
Plan a stabilization period after cutover.
For the first 30 days, review classification and activity reports weekly.
Migration projects often surface edge cases, such as:
- Legacy files that were never classified
- Service accounts that bypassed expected controls
- Third-party connectors that changed metadata
- Files protected in a way that does not map cleanly to the target tenant
Post-migration validation is where you prove that the data was not just moved, but moved safely.
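A source-to-target comparison can be sketched as a simple inventory diff. The Python example below assumes each tenant's inventory has been reduced to a dictionary of file path to label name, with paths that are comparable across tenants. Those assumptions, along with the report keys and function name, are illustrative.

```python
from typing import Optional

# File path -> label name, or None if the file is unlabeled.
Inventory = dict[str, Optional[str]]


def compare_inventories(source: Inventory, target: Inventory,
                        label_map: dict[str, str]) -> dict[str, list[str]]:
    """Report files that are missing, unlabeled, or mislabeled in the target."""
    report = {"missing_file": [], "missing_label": [], "wrong_label": []}
    for path, source_label in sorted(source.items()):
        if path not in target:
            report["missing_file"].append(path)
            continue
        if source_label is None:
            continue  # nothing to verify for files that were unlabeled at source
        expected = label_map.get(source_label)  # None if the label has no mapping
        actual = target[path]
        if actual is None:
            report["missing_label"].append(path)
        elif actual != expected:
            report["wrong_label"].append(path)
    return report
```

A report with empty lists is a stronger sign of a successful migration than a matching file count, because it checks that protection arrived along with the content.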
Choosing the right data discovery and classification stack
Microsoft Purview is the right foundation for most Microsoft 365 environments.
But in real migration projects, Purview is usually one part of a larger stack.
A practical stack may include:
- Microsoft Purview for classification, labels, DLP, and visibility
- A Microsoft 365 migration tool for moving workloads and preserving metadata
- Power BI or reporting tools for audit and governance dashboards
- Security or compliance workflows for investigation and review
When evaluating tools, ask these questions:
- Does the tool preserve Microsoft Information Protection metadata?
- Does it support the workloads you actually use?
- Can it handle SharePoint, OneDrive, Teams, Exchange, Planner, and Power Platform data?
- Does it scale without creating throttling problems?
- Does it produce audit evidence your compliance team can trust?
- Does it support validation before and after migration?
The goal is not only migration completion.
The goal is governed migration completion.
A practical 30-day Purview classification checklist
If you are starting with Microsoft Purview classification, keep the first 30 days simple.
Here is a practical starting checklist:
- Publish a small sensitivity label set.
- Use clear label names that business users can understand.
- Enable the built-in Sensitive Information Types that match your compliance needs.
- Create one or two custom SITs for your internal identifiers.
- Run auto-labeling in simulation mode first.
- Review matches in Content Explorer before enabling enforcement.
- Assign the Content Viewer role only to selected named users.
- Review Activity Explorer weekly for the first month.
- Document label ownership and escalation paths.
- Use classification data to guide migration, audit, and AI-readiness planning.
Do not overcomplicate the first version.
A smaller label taxonomy that people actually use is better than a complex model nobody understands.
Final thoughts
Classification is not a one-time project.
It is an ongoing practice.
The organizations that handle migrations, audits, security reviews, and AI rollouts more smoothly are usually the ones that started labeling early.
They know where sensitive data lives.
They know who can access it.
They know which content should move, which content should stay, and which content should be retired.
Microsoft Purview gives Microsoft 365 admins the visibility and control needed to make that possible.
Start small.
Classify what matters.
Keep the label model simple.
Use Content Explorer and Activity Explorer regularly.
And if you are preparing for a Microsoft 365 tenant migration, do not wait until cutover to think about classification.
By then, it is already too late.



