
Unable to view scanned JSON and Parquet Metadata from AWS S3 in MS Purview

Subhash Katipelly 0 Reputation points
2026-05-07T17:55:14.6766667+00:00

Hello Team,

We are currently scanning JSON and Parquet files from AWS S3 into MS Purview using the AWS S3 connector.

However, we are unable to view the scanned metadata for these files in Purview. Could you please help us with this?

Regards

Subhash

Microsoft Security | Microsoft Purview

2 answers

  1. SAI JAGADEESH KUDIPUDI 3,115 Reputation points Microsoft External Staff Moderator
    2026-05-08T00:32:24.3233333+00:00

    Hi Subhash Katipelly,

    Thank you for reaching out! We understand how frustrating it can be when your JSON and Parquet metadata isn't showing up in Purview after scanning from AWS S3. Let us walk you through the most common causes and how to fix them.

    First, the good news — Microsoft Purview's Amazon S3 Multicloud Scanning Connector does fully support both JSON and Parquet file types for scanning and schema extraction. So the feature is there; we just need to make sure a few things are configured correctly on your end.

    1. Complex or nested data types: Purview's scanner cannot extract schema for Parquet files that contain complex data types like MAP, LIST, or STRUCT. The asset will still be discovered, but the schema/metadata tab will appear blank. Please verify whether your files contain any nested structures.
    2. S3 storage class: Files stored in Glacier storage classes are not supported for schema extraction, classification, or sensitivity labels. Please ensure your files are in S3 Standard storage.
    3. Integration Runtime (IR) setup: If you're using a Self-Hosted Integration Runtime (SHIR), you must install the 64-bit JRE 11 or OpenJDK on the SHIR machine; without this, Parquet schema extraction will fail silently. Also, make sure your SHIR is updated to the latest version. If you're using the AWS auto-resolve IR, no additional setup is needed.
    4. Network access: The S3 connector does not support Purview private endpoints. Your environment must have public internet access to communicate with the Purview service. Please check that no firewall or network rules are blocking outbound connectivity.
    5. Parquet compression format: For compressed Parquet files, only Snappy compression is supported for schema extraction. If your files use GZIP, LZ4, or ZSTD, the schema won't be extracted.
    6. Scan status and logs: Navigate to Data Map → Data Sources → your S3 source → Scans tab and check whether the scan completed with "Succeeded" status. Look for any warnings like "Schema extraction not supported for complex types". For SHIR issues, check the Windows Event Viewer under the Integration Runtime section.
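
The file-level checks in items 1, 2, and 5 can be sketched as a small pre-scan checklist. This is only an illustration in Python, not a Purview API: `FileInfo` and `schema_extraction_blockers` are hypothetical names, the rules simply restate the points above, and treating `GLACIER`, `GLACIER_IR`, and `DEEP_ARCHIVE` as the "Glacier storage classes" is an assumption. The values would have to be collected separately, for example from the S3 object listing and each Parquet file's footer metadata.

```python
from dataclasses import dataclass, field

# Assumptions restating the answer above, not values read from Purview:
NESTED_TYPES = {"MAP", "LIST", "STRUCT"}                     # item 1
GLACIER_CLASSES = {"GLACIER", "GLACIER_IR", "DEEP_ARCHIVE"}  # item 2 (assumed set)
SUPPORTED_CODECS = {"UNCOMPRESSED", "SNAPPY"}                # item 5


@dataclass
class FileInfo:
    """Hypothetical record describing one S3 object to be scanned."""
    key: str
    storage_class: str = "STANDARD"
    compression: str = "UNCOMPRESSED"  # Parquet codec name
    column_types: list = field(default_factory=list)


def schema_extraction_blockers(info):
    """Return the reasons (possibly none) why the schema may come back empty."""
    reasons = []
    if info.storage_class.upper() in GLACIER_CLASSES:
        reasons.append("unsupported storage class: " + info.storage_class)
    if info.compression.upper() not in SUPPORTED_CODECS:
        reasons.append("unsupported compression: " + info.compression)
    nested = sorted({t.upper() for t in info.column_types} & NESTED_TYPES)
    if nested:
        reasons.append("complex column types: " + ", ".join(nested))
    return reasons


# Example with made-up values: a GZIP-compressed file with a STRUCT column.
sample = FileInfo("events/part-0.parquet", compression="GZIP",
                  column_types=["INT64", "STRUCT"])
for reason in schema_extraction_blockers(sample):
    print(reason)
```

An empty list means none of these three file-level causes applies, which points back to the IR, network, or scan-log checks in items 3, 4, and 6.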

    To help us narrow this down further, could you share:

    • Is your scan completing successfully, or are there any warnings/errors?
    • Which IR are you using — auto-resolve or Self-Hosted?
    • What S3 storage class are your files in?
    • Do your files contain nested/complex data types?
    • Are the assets appearing in the Data Map at all (even with an empty schema)?

    In summary, the inability to view JSON and Parquet metadata from AWS S3 in Purview is typically caused by one of the factors above: complex data types in your files, an unsupported storage class, a missing Java runtime on your SHIR, or a compression format other than Snappy. Please go through each checkpoint, and once you share the additional details, we'll be happy to help you resolve this.
    References:

    1. Amazon S3 Multicloud Scanning Connector for Microsoft Purview
    2. Supported data sources and file types in Microsoft Purview
    3. Create and manage a self-hosted integration runtime
    4. Troubleshoot scans and connections in Microsoft Purview

    Hope this helps. If you have any follow-up questions, please let me know. I would be happy to help.



  2. Q&A Assist
    2026-05-07T17:55:27.08+00:00
    AI-generated content. This question contains an answer created with AI from Q&A Assist.

    JSON and Parquet files in Amazon S3 can be scanned and classified by the Microsoft Purview Multicloud Scanning Connector, but only metadata and classification results are brought into Purview, not the raw file contents.

    To verify and explore the scan results for JSON and Parquet files from AWS S3:

    1. Confirm the scan completed successfully
      • In the Microsoft Purview portal, go to Data Map.
      • Select the registered Amazon S3 data source (or the specific AWS account/bucket).
      • Open the Scans tab and verify that the scan has finished and shows as completed. If the scan was just started, note that it can take up to 24 hours before results are fully available in the catalog and Insight reports.
    2. Check that the scan rule set includes JSON/Parquet
      • When creating or editing the scan, on Select a scan rule set, either use the default AmazonS3 rule set or a custom rule set.
      • If using a custom rule set, ensure that under Select file types the relevant file types (for example, JSON and Parquet extensions) are included. If needed, create a New file type with the correct extension and description so those files are scanned.
    3. Explore scanned assets in the catalog
      • After the scan completes and the 24‑hour processing window has passed, go to the Microsoft Purview Unified Catalog.
      • Search and filter for the specific S3 bucket or path where the JSON/Parquet files reside.
      • The catalog will show assets discovered in that bucket, with metadata and classification information derived from the scan.
    4. Use Insight reports to review classifications
      • Open Insight reports in Purview.
      • Use the filters (including the Amazon S3 asset type) to view statistics for classification, sensitivity labels, and file types across the scanned S3 content.
      • JSON and Parquet files that were successfully scanned will contribute to these statistics.
    5. If assets still do not appear, validate AWS-side configuration
      • Ensure the AWS IAM role used for scanning has at least AmazonS3ReadOnlyAccess (or equivalent minimum read permissions) and, if the bucket is encrypted with KMS, KMS Decrypt permissions.
      • Confirm that the S3 bucket policy and any SCP policies are not blocking AssumeRole, GetBucketLocation, GetObject, ListBucket, and related calls from the Microsoft Purview scanner service.
      • Verify the S3 bucket URL in Purview is correctly configured as s3://<BucketName> for the bucket being scanned.
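
As a rough sketch of the AWS-side checks in step 5, the Python below builds a read-only IAM policy document containing the calls named above and checks that a bucket URL has the `s3://<BucketName>` shape. The function names, the bucket name, and the KMS key ARN are hypothetical placeholders, and the regular expression is a simplified reading of S3 bucket naming rules. The managed AmazonS3ReadOnlyAccess policy mentioned above is broader than this; the sketch only shows what "equivalent minimum read permissions" might look like.

```python
import json
import re


def minimal_read_policy(bucket, kms_key_arn=None):
    """Build an illustrative read-only IAM policy document for one bucket."""
    statements = [
        {   # bucket-level calls
            "Effect": "Allow",
            "Action": ["s3:GetBucketLocation", "s3:ListBucket"],
            "Resource": f"arn:aws:s3:::{bucket}",
        },
        {   # object-level reads
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": f"arn:aws:s3:::{bucket}/*",
        },
    ]
    if kms_key_arn:
        # Only needed when the bucket is encrypted with a KMS key.
        statements.append(
            {"Effect": "Allow", "Action": ["kms:Decrypt"], "Resource": kms_key_arn}
        )
    return {"Version": "2012-10-17", "Statement": statements}


def is_bucket_url(url):
    """Check for s3://<BucketName> with no path (simplified naming rules)."""
    return re.fullmatch(r"s3://[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]", url) is not None


# Placeholder bucket name and key ARN, for illustration only.
policy = minimal_read_policy(
    "my-data-bucket", "arn:aws:kms:us-east-1:111122223333:key/example"
)
print(json.dumps(policy, indent=2))
print(is_bucket_url("s3://my-data-bucket"))
```

Comparing the printed document against the role's actual policy, and against any bucket policy or SCP that denies these actions, is one way to spot a missing permission before rerunning the scan.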

    If all of the above are correctly configured and the scan still does not surface JSON/Parquet assets in the catalog or Insight reports, the guidance is to contact Microsoft support for further investigation.

