Unable to view scanned JSON and Parquet Metadata from AWS S3 in MS Purview

Question


Subhash Katipelly

Hello Team,

We are currently scanning JSON and Parquet files from AWS S3 into MS Purview using the AWS S3 connector, but we are unable to view the scanned metadata for these files in Purview.

Could you please help with this?

Regards

Subhash


Answer 1

Hi Subhash Katipelly,

Thank you for reaching out! We understand how frustrating it can be when your JSON and Parquet metadata isn't showing up in Purview after scanning from AWS S3. Let us walk you through the most common causes and how to fix them.

First, the good news: Microsoft Purview's Amazon S3 Multicloud Scanning Connector supports both JSON and Parquet file types for scanning and schema extraction. The feature is available; we just need to make sure a few things are configured correctly on your end.

- Complex or nested data types: Purview's scanner cannot extract the schema of Parquet files that contain complex data types such as MAP, LIST, or STRUCT. The asset is still discovered, but its schema/metadata tab appears blank. Please check whether your files contain any nested structures.
- S3 storage class: Files stored in Glacier storage classes are not supported for schema extraction, classification, or sensitivity labels. Please make sure your files are in S3 Standard storage.
- Integration Runtime (IR) setup: If you're using a Self-Hosted Integration Runtime (SHIR), you must install the 64-bit JRE 11 or OpenJDK on the SHIR machine; without it, Parquet schema extraction fails silently. Also make sure your SHIR is updated to the latest version. If you're using the AWS auto-resolve IR, no additional setup is needed.
- Network access: The S3 connector does not support Purview private endpoints. Your environment must have public internet access to communicate with the Purview service, so please check that no firewall or network rules are blocking outbound connectivity.
- Parquet compression format: For compressed Parquet files, only Snappy compression is supported for schema extraction. If your files use GZIP, LZ4, or ZSTD, the schema won't be extracted.
- Scan status and logs: Navigate to Data Map → Data Sources → your S3 source → Scans tab and check whether the scan completed with a "Succeeded" status. Look for warnings such as "Schema extraction not supported for complex types". For SHIR issues, check the Windows Event Viewer under the Integration Runtime section.
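As a sketch of the storage-class check, the Python below scans an S3 listing for JSON/Parquet objects sitting in Glacier classes. It assumes you already have a response shaped like the one boto3's list_objects_v2 returns (a "Contents" list whose items carry "Key" and "StorageClass"); that call returns at most 1,000 keys per page, so larger buckets need pagination. Treating GLACIER, DEEP_ARCHIVE, and GLACIER_IR as "the Glacier storage classes" and matching files by .json/.parquet extension are assumptions of this example, not documented Purview rules.

```python
# Sketch: flag JSON/Parquet objects stored in Glacier classes.
# Input is a dict shaped like an S3 ListObjectsV2 response.

# Assumption: these values count as "Glacier storage classes".
GLACIER_CLASSES = {"GLACIER", "DEEP_ARCHIVE", "GLACIER_IR"}

# Assumption: JSON/Parquet files are identified by extension only.
SCANNABLE_SUFFIXES = (".json", ".parquet")


def find_unsupported_objects(list_response: dict) -> list[tuple[str, str]]:
    """Return (key, storage_class) pairs for JSON/Parquet objects in Glacier classes."""
    flagged = []
    for obj in list_response.get("Contents", []):
        key = obj["Key"]
        # Default to STANDARD if the field is absent (e.g. hand-built test data).
        storage_class = obj.get("StorageClass", "STANDARD")
        if key.lower().endswith(SCANNABLE_SUFFIXES) and storage_class in GLACIER_CLASSES:
            flagged.append((key, storage_class))
    return flagged


# Hypothetical listing used only for illustration.
sample = {
    "Contents": [
        {"Key": "raw/orders.parquet", "StorageClass": "STANDARD"},
        {"Key": "archive/2021/orders.parquet", "StorageClass": "DEEP_ARCHIVE"},
        {"Key": "raw/events.json", "StorageClass": "GLACIER"},
        {"Key": "raw/readme.txt", "StorageClass": "GLACIER"},
    ]
}
print(find_unsupported_objects(sample))
# [('archive/2021/orders.parquet', 'DEEP_ARCHIVE'), ('raw/events.json', 'GLACIER')]
```

Any key this reports would need to be restored or copied to S3 Standard before Purview can extract its schema.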

To help us narrow this down further, could you share:

- Is your scan completing successfully, or are there any warnings/errors?
- Which IR are you using: auto-resolve or Self-Hosted?
- What S3 storage class are your files in?
- Do your files contain nested/complex data types?
- Are the assets appearing in the Data Map at all (even with an empty schema)?
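For the JSON files, the nested-data question can be answered locally with a short standard-library Python sketch that counts which top-level fields hold objects or arrays. It assumes you've downloaded a sample file in JSON Lines form (one object per line); a file holding a single JSON document would need json.load instead. The complex-type limitation above is stated for Parquet, so this example doesn't establish how Purview treats nested JSON, and inspecting a Parquet schema would need a reader library such as pyarrow.

```python
import json


def nested_fields(record: dict) -> list[str]:
    """Return top-level field names whose values are JSON objects or arrays."""
    return sorted(name for name, value in record.items() if isinstance(value, (dict, list)))


def scan_json_lines(text: str) -> dict[str, int]:
    """Count how often each top-level field holds a nested value across JSON Lines input."""
    counts: dict[str, int] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if not isinstance(record, dict):
            continue  # only JSON objects have named fields
        for name in nested_fields(record):
            counts[name] = counts.get(name, 0) + 1
    return counts


# Hypothetical three-record sample.
sample = (
    '{"id": 1, "customer": {"name": "A"}, "tags": ["x"]}\n'
    '{"id": 2, "customer": {"name": "B"}, "tags": []}\n'
    '{"id": 3, "total": 9.5}'
)
print(scan_json_lines(sample))
# {'customer': 2, 'tags': 2}
```

An empty result means the sampled records are flat; any field it reports is a nested structure worth mentioning in your reply.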

The inability to view JSON and Parquet metadata from AWS S3 in Purview is typically caused by one of the factors above: complex data types in your files, an unsupported storage class, a missing Java runtime on your SHIR, or a compression format other than Snappy. We'd encourage you to go through each checkpoint above; once you share the additional details, we'll be happy to help you resolve this completely.

Hope this helps. If you have any follow-up questions, please let me know. I would be happy to help.

Manoj Kumar Boyini, Microsoft External Staff Moderator

2026-05-09T06:29:35.65+00:00

Hi @Subhash Katipelly

I hope you had a chance to review the information shared earlier, and I hope this information has been helpful! If you still have questions, please let us know what is needed in the comments so the question can be answered.
Subhash Katipelly

2026-05-11T16:41:11.84+00:00

@SAI JAGADEESH KUDIPUDI Can we connect on this to resolve the issue?

Regards

Subhash

Answer 2

JSON and Parquet files in Amazon S3 can be scanned and classified by the Microsoft Purview Multicloud Scanning Connector, but only metadata and classification results are brought into Purview, not the raw file contents.

To verify and explore the scan results for JSON and Parquet files from AWS S3:

Confirm the scan completed successfully
- In the Microsoft Purview portal, go to Data Map.
- Select the registered Amazon S3 data source (or the specific AWS account/bucket).
- Open the Scans tab and verify that the scan has finished and shows as completed. If the scan was just started, note that it can take up to 24 hours before results are fully available in the catalog and Insight reports.
Check that the scan rule set includes JSON/Parquet
- When creating or editing the scan, on Select a scan rule set, either use the default AmazonS3 rule set or a custom rule set.
- If using a custom rule set, ensure that under Select file types the relevant file types (for example, JSON and Parquet extensions) are included. If needed, create a New file type with the correct extension and description so those files are scanned.
Explore scanned assets in the catalog
- After the scan completes and the 24‑hour processing window has passed, go to the Microsoft Purview Unified Catalog.
- Search and filter for the specific S3 bucket or path where the JSON/Parquet files reside.
- The catalog will show assets discovered in that bucket, with metadata and classification information derived from the scan.
Use Insight reports to review classifications
- Open Insight reports in Purview.
- Use the filters (including the Amazon S3 asset type) to view statistics for classification, sensitivity labels, and file types across the scanned S3 content.
- JSON and Parquet files that were successfully scanned will contribute to these statistics.
If assets still do not appear, validate AWS-side configuration
- Ensure the AWS IAM role used for scanning has at least AmazonS3ReadOnlyAccess (or equivalent minimum read permissions) and, if the bucket is encrypted with KMS, KMS Decrypt permissions.
- Confirm that the S3 bucket policy and any service control policies (SCPs) are not blocking AssumeRole, GetBucketLocation, GetObject, ListBucket, and related calls from the Microsoft Purview scanner service.
- Verify the S3 bucket URL in Purview is correctly configured as s3://<BucketName> for the bucket being scanned.
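The last two checks can be sketched in Python: one helper confirms a bucket URL has the s3://<BucketName> form, and another builds an IAM permissions policy granting only the S3 and KMS read calls named above. The bucket name, KMS key ARN, and both helper functions are hypothetical; the policy illustrates the listed permissions and is not the policy Purview's setup generates (AmazonS3ReadOnlyAccess remains the simpler managed option). AssumeRole is governed by the role's trust policy rather than its permissions policy, so it is not modeled here, and the bucket-name check is a simplified version of the S3 general-purpose naming rules.

```python
import json
import re
from typing import Optional
from urllib.parse import urlparse

# Simplified naming check: 3-63 chars of lowercase letters, digits, dots, hyphens,
# starting and ending with a letter or digit.
BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def bucket_from_s3_url(url: str) -> str:
    """Return the bucket name from an s3://<BucketName> URL, or raise ValueError."""
    parsed = urlparse(url)
    if parsed.scheme != "s3":
        raise ValueError(f"expected s3:// scheme, got {parsed.scheme!r}")
    if parsed.path not in ("", "/"):
        raise ValueError("expected a bucket-level URL with no object path")
    if not BUCKET_NAME.fullmatch(parsed.netloc):
        raise ValueError(f"invalid bucket name: {parsed.netloc!r}")
    return parsed.netloc


def minimal_read_policy(bucket: str, kms_key_arn: Optional[str] = None) -> dict:
    """Build an IAM permissions policy allowing the S3 read calls listed above."""
    statements = [
        {   # Bucket-level calls apply to the bucket ARN itself.
            "Effect": "Allow",
            "Action": ["s3:GetBucketLocation", "s3:ListBucket"],
            "Resource": f"arn:aws:s3:::{bucket}",
        },
        {   # Object reads apply to every key in the bucket.
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": f"arn:aws:s3:::{bucket}/*",
        },
    ]
    if kms_key_arn:
        # Only needed when the bucket is encrypted with a KMS key.
        statements.append({"Effect": "Allow", "Action": ["kms:Decrypt"], "Resource": kms_key_arn})
    return {"Version": "2012-10-17", "Statement": statements}


bucket = bucket_from_s3_url("s3://example-data-bucket")  # hypothetical bucket
print(json.dumps(minimal_read_policy(bucket), indent=2))
```

Splitting bucket-level and object-level actions into separate statements mirrors how S3 scopes those permissions to different resource ARNs.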

If all of the above are correctly configured and the scan still does not surface JSON/Parquet assets in the catalog or Insight reports, the guidance is to contact Microsoft support for further investigation.

References:

Amazon S3 Multicloud Scanning Connector for Microsoft Purview
