Confluence Data Connector Set Up

The BigPanda Unified Data Connector syncs data from your Confluence pages to provide context and insight for the AI Incident Assistant, AI Incident Prevention, and AI Detection and Response. Ingested data is securely stored and made available in the IT Knowledge Graph, powering accurate answers, deep analytics, trend analysis, and advanced capabilities.

The connector supports both Confluence Cloud and Confluence Data Center / Server instances.

Data Redaction

You can redact sensitive information ingested via the Unified Data Connector, including PII, PHI, and PCI, giving you confidence that your confidential data remains exclusively within your approved channels. See the Data Redaction documentation for details on managing sensitive data.

Authentication

The Confluence connector supports two authentication methods: HTTP basic authentication with an API token, and Atlassian JSON Web Token (JWT).

Basic authentication

To begin, contact BigPanda support and provide credentials for your Confluence instance. The account provided must have at least read access to every page that will be synced from Confluence.

These credentials are required:

| Field | Description |
| --- | --- |
| URL | The URL of your Confluence instance (for example: https://your-domain.atlassian.net). |
| Username | Username of the account used to access Confluence. |
| API Token | The API token of the account used to access Confluence. |
| Cloud | Whether your instance is Confluence Cloud (True or False). |

The BigPanda team will use your Confluence credentials to set up the data connector. Certain fields and options are customizable depending on your organization's preferences and requirements.
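For reference, basic-auth credentials are combined into a standard HTTP Authorization header. A minimal sketch (the username and token values are placeholders) of how that header is built:

```python
import base64

def basic_auth_header(username: str, api_token: str) -> dict:
    """Build the Authorization header Confluence expects for basic auth.

    The username and API token are joined with a colon and base64-encoded,
    exactly as any HTTP client does for HTTP basic authentication.
    """
    credentials = f"{username}:{api_token}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(credentials).decode("ascii")}

# Placeholder credentials for illustration:
headers = basic_auth_header("svc-bigpanda@example.com", "my-api-token")
```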

Atlassian JWT authentication

You can also choose to authenticate the connection using Atlassian's JSON Web Token (JWT).

  1. Select the Atlassian JWT radio button.

  2. Fill in the required fields:

    1. Key: the JWT public key

    2. Shared secret: the JWT shared secret
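For illustration only (the connector performs signing for you), an Atlassian Connect-style JWT is an HMAC-SHA256-signed token whose qsh claim hashes the canonical request. A hedged sketch, with a placeholder issuer key and shared secret:

```python
import base64
import hashlib
import hmac
import json
import time

def _b64url(data: bytes) -> str:
    # Base64url without padding, as used in JWT segments
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def atlassian_jwt(issuer: str, shared_secret: str, method: str, path: str) -> str:
    # qsh ("query string hash") is the SHA-256 of the canonical request:
    # upper-cased method, request path, and (here, empty) sorted query string.
    qsh = hashlib.sha256(f"{method.upper()}&{path}&".encode()).hexdigest()
    now = int(time.time())
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"iss": issuer, "iat": now, "exp": now + 180, "qsh": qsh}
    signing_input = _b64url(json.dumps(header).encode()) + "." + _b64url(json.dumps(claims).encode())
    signature = hmac.new(shared_secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)

# Placeholder issuer and secret for illustration:
token = atlassian_jwt("my-connector-key", "my-shared-secret", "GET", "/rest/api/content")
```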

Sync Preferences

Provide the following information about your sync preferences to BigPanda:

| Option | Default | Description |
| --- | --- | --- |
| Space keys |  | The unique key(s) of your Confluence space(s). The key can be found in your Confluence URL after /spaces/ (for example: https://yourinstance.atlassian.net/wiki/spaces/<spacekey>/). |
| Mode |  | Determines whether the sync is incremental or historical. Most organizations will need to use both modes. |
| Endpoints fetched |  | The connector can fetch pages, spaces, users, groups, content, and templates. |
| content_types | [page] | List of content types to extract. Add attachment to also extract attachment metadata into a separate page_attachments resource. The default [page] preserves the previous page-only behavior. |
| max_attachment_size_mb | 50 | Maximum attachment size (in MB) eligible for binary upload. Files larger than this skip binary upload; attachment metadata is still emitted. |
| Start date |  | For historical syncs, the date the sync should begin. We recommend syncing one year of data. |
| End Date |  | Used for historical syncs only. Provide an end date to backfill historical syncs. When end_date is provided alongside start_date, the connector automatically uses an isolated pipeline name ({pipeline_name}_backfill) to prevent the backfill from corrupting the incremental sync cursor. |
| Page size | 50 | The number of items requested from Confluence in a single call. |
| Query limit | 60 | How many requests can be made to Confluence per minute. |
| Rate Limit Timeout (MS) | 1000 | The timeout period in milliseconds after the query limit has been reached. |
| CQL filters (Optional) |  | You can use Confluence Query Language (CQL) to set up content filters, extracting only the content that matches your criteria. |
| Disable compression |  | Compression can be disabled to ensure exact accuracy of ingested data. |
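As an example of the CQL filters option, a filter like the following (the space key, content type, and date are placeholders, and exact CQL date syntax may vary by Confluence version) would restrict extraction to recently modified pages in a single space:

```
space = "OPS" and type = page and lastmodified >= "2024/01/01"
```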

Validate space_keys before backfilling

If your Confluence sync returns far fewer records than expected, verify that every space you intend to ingest is listed in Space keys and that the Confluence service account has read access to each of those spaces. The connector only ingests spaces that are both listed in the configuration and accessible to the service account.

Data model considerations

Resources

The Confluence Data Connector creates the following resources:

  • space_content: The per-page content resource. Use this to drive page-level downstream processing.

  • page_attachments: When content_types includes attachment. Per-attachment metadata for Confluence Cloud and Data Center. Use this to correlate attachment context with the parent page.

Existing pipelines that do not include attachment in content_types continue to emit only space_content and are unaffected by attachment extraction.

Page rows

When both page and attachment are configured in content_types, page rows can include:

  • attachment_count: The number of attachments on the page for which the connector has uploaded binaries.

  • attachment_s3_uris: The list of S3 URIs of the uploaded attachment binaries associated with the page.

When content_types is left at its default ([page]), page rows do not carry attachment-context fields and no page_attachments resource is emitted.
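To make the field relationship concrete, a hypothetical page row (all values illustrative, not actual field output) with both content types enabled might look like:

```python
# Hypothetical page row from the space_content resource when
# content_types includes both page and attachment; values are illustrative.
page_row = {
    "id": "123456",
    "title": "Runbook: database failover",
    "attachment_count": 2,
    "attachment_s3_uris": [
        "s3://example-bucket/confluence/123456/diagram.png",
        "s3://example-bucket/confluence/123456/checklist.pdf",
    ],
}

# attachment_count covers only attachments whose binaries were uploaded,
# so it matches the number of S3 URIs carried on the row.
assert page_row["attachment_count"] == len(page_row["attachment_s3_uris"])
```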

Attachments

Attachment rows (page_attachments) can include binary-reference fields such as s3_bucket, s3_key, and s3_uri when the binary upload completes successfully.

Attachments larger than max_attachment_size_mb skip binary upload. Their metadata still flows into the page_attachments resource, but those rows do not carry s3_bucket, s3_key, or s3_uri. Page rows reflect only the attachments whose binaries were successfully uploaded in their attachment_count and attachment_s3_uris.
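A sketch of distinguishing metadata-only attachment rows from fully uploaded ones (all row values are hypothetical):

```python
# Hypothetical page_attachments rows; the second attachment exceeded
# max_attachment_size_mb, so its binary upload was skipped and it has no s3_uri.
attachment_rows = [
    {"page_id": "123456", "file_name": "diagram.png", "size_mb": 1.2,
     "s3_uri": "s3://example-bucket/confluence/123456/diagram.png"},
    {"page_id": "123456", "file_name": "export.zip", "size_mb": 240.0,
     "s3_uri": None},
]

# Metadata-only rows (binary upload skipped or failed) lack an s3_uri.
metadata_only = [r for r in attachment_rows if not r.get("s3_uri")]
uploaded = [r for r in attachment_rows if r.get("s3_uri")]
```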

Cross-chunk attachments

Historical syncs are typically processed in chunks. The Confluence connector tracks the mapping between pages and their attachments across chunks, so:

  • Pages in chunked historical syncs include S3 URIs for attachments that were uploaded in another chunk of the same historical sync. You will not see empty attachment_s3_uris and a zero attachment_count purely because a page and its attachments were processed in different chunks.

  • Repeated or overlapping runs keep one copy of each attachment S3 reference per page before attachment_count is calculated, so replays do not inflate the count.
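The per-page deduplication can be sketched as follows (URIs are hypothetical):

```python
def deduplicated_attachment_count(uri_lists):
    """Merge attachment S3 URIs gathered across chunks or replayed runs,
    keeping one copy of each URI before counting, so overlapping runs
    do not inflate the count. Preserves first-seen order."""
    seen = []
    for uris in uri_lists:
        for uri in uris:
            if uri not in seen:
                seen.append(uri)
    return len(seen), seen

# The same page seen in two overlapping chunks, with one URI repeated:
count, uris = deduplicated_attachment_count([
    ["s3://example-bucket/p1/a.png"],
    ["s3://example-bucket/p1/a.png", "s3://example-bucket/p1/b.pdf"],
])
```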

Troubleshooting

401 Unauthorized when testing or running the connector

  1. Confirm the service account is still active and has not been rotated, locked, or had its password reset since the connector was configured.

  2. Re-test using the credentials directly against the Confluence REST API. If that call also returns 401, the issue is on the Confluence side, not the connector.

  3. Special characters in secrets, including backticks (`), ampersands (&), and other shell-meaningful characters, can be URL-encoded inconsistently by some clients. If your API token or shared secret contains these, regenerate it without them and re-submit.

  4. If you connect through a reverse proxy with a static-IP requirement, confirm the proxy IP has not changed since the connector was set up.
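If you suspect inconsistent encoding of special characters (step 3) is the cause, Python's urllib shows what a consistently percent-encoded token looks like (the token value is hypothetical):

```python
from urllib.parse import quote

# Tokens containing shell- or URL-significant characters (backticks,
# ampersands, etc.) must be percent-encoded consistently when placed in
# URLs; clients differ in how aggressively they encode them.
raw_token = "abc`def&ghi"            # hypothetical token value
encoded = quote(raw_token, safe="")  # encode every non-alphanumeric character
```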

Connector authenticates but returns zero records

A 0-record result usually means authentication succeeded but the service account lacks read access to the target spaces. Confirm the service account has read permission on every space referenced by the pipeline configuration (space_keys).

Grant space-level read access to the BigPanda service account and re-test the connection.

Sudden drop or gap in record volume

Open a support ticket including the affected connector, the date and time the change was first observed, and any source-side changes deployed in the same window (firewall, IP allow-list, ACL, password rotation, Confluence upgrade). Most volume regressions trace back to a source-side change of this kind.

page_attachments row is present but s3_uri is empty

The attachment metadata was extracted successfully, but the binary upload did not complete. Common causes:

  • The file exceeded max_attachment_size_mb (default 50). Raise the limit if the file is expected, or accept metadata-only coverage for very large files.

  • The destination storage was not reachable from the connector run. Re-run the pipeline after confirming the destination is healthy.

Page rows reflect only attachments whose binaries were successfully uploaded, so a missing s3_uri on the attachment row also explains a lower-than-expected attachment_count on the parent page row.

FAQs

How does the Unified Data Connector protect sensitive data and PII?

You can create filters for your spaces and pages to ensure that sensitive data is not included in the sync. Additionally, the Data Redaction feature can automatically scan for PII and remove it, and an optional Skyflow redaction stream can be layered on top for context-aware detection of PII and PHI.

How long does the initial sync take?

The initial sync can take anywhere from a few hours to multiple weeks, depending on the rate limit in place, the amount of data synced, and the sync start date.

How often does the data sync?

The default sync frequency is 2 minutes, but it can be configured anywhere from 1 minute to 24 hours.