Confluence reference and URL handling

Last modified by Raphaël Jakse on 2026/03/23 13:00

Explanation

Warning

This document probably needs to be split into several documents, and also some of it probably needs to be moved to the dev documentation.

Introduction

XWiki and Confluence reference documents quite differently. XWiki references documents using their full names, whereas Confluence has several ways of doing this:

  • from its space key and its page title, in a case-insensitive way
  • from its (stable) page id
  • using special @ keywords like @home, @self, @parent.

Spaces can also be referenced, either through their space key or through special keywords like currentSpace(). (In fact, since we migrate Confluence space home pages to the WebHome page of the target XWiki space, @home requires exactly the same resolution as currentSpace().)

Attachments are occasionally referenced through their id as well (which we mostly don't currently handle).

This makes things difficult to handle at import time, as well as when displaying the imported content, which may reference documents and spaces in macro parameters (including inside CQL queries) and in links we failed to convert.

In this document, we describe how we handle Confluence references and what our strategies are for tackling these difficulties.

Information

We sometimes see @global or @all used in Confluence macro parameters. They respectively refer to "the wiki globally" and "all the spaces". We don't have an equivalent in XWiki. We don't currently have a feature to list all the migrated spaces (although it should not be impossible to build), and it's not even clear it would be the right thing to do (maybe the related feature ought to take other XWiki content into account as well). As for using the main wiki, it would only work if nothing is imported into subwikis.

Confluence references

Overview

Sometimes, Confluence XML can't convert a Confluence reference to a proper XWiki reference. Usually, that's because we lack the information needed to build the correct reference. This information is:

  • where the space in which the referenced page lives is migrated (it is not necessarily at the root of the main wiki, because of the root parameter).
  • the hierarchy between the root of the space and the page (all its ancestor pages)

This information can be missing when, for instance, a link references a page in a space that has not been imported yet.

In this case, Confluence XML issues Confluence references instead. There are currently 3 kinds of Confluence references:

  • confluenceSpace: a reference to a Confluence space, to be converted to the XWiki space reference where the Confluence space was imported
  • confluencePage: a reference to a Confluence page, to be converted to the corresponding XWiki document reference
    • confluencePage:page:SPACE.PAGETITLE for a page with the given space key and page title
    • confluencePage:id:ID for a page with the given page id
  • confluenceAttach: a reference to a Confluence attachment
    • confluenceAttach:spaceHome:SPACE@FILENAME for an attachment on the home page of the given space 
    • confluenceAttach:page:SPACE.PAGETITLE@FILENAME for an attachment on a page with the given space key and page title
    • confluenceAttach:id:ID@FILENAME for an attachment on a page with the given page id
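The reference forms listed above can be parsed with plain string handling. The following Java sketch is only an illustration, not Confluence XML's actual parser, and rests on several assumptions: the ConfluenceRef record is hypothetical, confluenceSpace references are left out because their syntax is not detailed here, page-id references are assumed to take the form confluencePage:id:ID, the space key is assumed to contain no dot (which may not hold for some personal spaces), and an attachment's file name is assumed to follow the last @.

```java
import java.util.Optional;

/** Hypothetical value type for a parsed Confluence reference (illustration only). */
record ConfluenceRef(String type, String kind, String spaceKey, String pageTitle,
                     Long pageId, String fileName) {

    /** Parses a confluencePage or confluenceAttach reference; empty if the form is not recognized. */
    static Optional<ConfluenceRef> parse(String ref) {
        // Split on the first two colons only: page titles may themselves contain colons.
        String[] parts = ref.split(":", 3);
        if (parts.length != 3) {
            return Optional.empty();
        }
        String type = parts[0];
        String kind = parts[1];
        String rest = parts[2];
        String fileName = null;

        if (type.equals("confluenceAttach")) {
            // Assumption: the file name is whatever follows the last '@'.
            int at = rest.lastIndexOf('@');
            if (at < 0) {
                return Optional.empty();
            }
            fileName = rest.substring(at + 1);
            rest = rest.substring(0, at);
        } else if (!type.equals("confluencePage")) {
            return Optional.empty(); // confluenceSpace and unknown types are not handled here
        }

        switch (kind) {
            case "page": {
                // Assumption: the space key contains no dot, so the first '.' ends it.
                int dot = rest.indexOf('.');
                if (dot <= 0) {
                    return Optional.empty();
                }
                return Optional.of(new ConfluenceRef(type, kind, rest.substring(0, dot),
                        rest.substring(dot + 1), null, fileName));
            }
            case "id":
                try {
                    return Optional.of(new ConfluenceRef(type, kind, null, null,
                            Long.parseLong(rest), fileName));
                } catch (NumberFormatException e) {
                    return Optional.empty();
                }
            case "spaceHome":
                // Only attachments use the spaceHome form.
                if (fileName == null || rest.isEmpty()) {
                    return Optional.empty();
                }
                return Optional.of(new ConfluenceRef(type, kind, rest, null, null, fileName));
            default:
                return Optional.empty();
        }
    }
}
```

For instance, parsing confluenceAttach:id:12345@diagram.png yields page id 12345 and file name diagram.png, while confluencePage:page:DOC.Release notes: 2024 yields space key DOC and title "Release notes: 2024".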

Current implementation

Confluence references are currently implemented in the confluence-resource-reference-type-parsers module as resource reference type parsers, which means that they are transparently converted to XWiki references whenever possible when an XWiki document is parsed. This has a few implications:

  • successfully converted Confluence references do not show up in the document's XDOM: it is as if the corresponding XWiki references had always been there instead of the Confluence references.
  • parsing an XWiki document with Confluence references is potentially costly, because the Confluence resolvers are called, and the default Confluence resolvers fire Solr queries
  • when a document containing successfully converted Confluence references is saved, the conversions are persisted automatically (documents are, in a way, automatically "fixed"), and they appear in the diff.

Alternative implementation

It would be possible to handle Confluence references differently and still have them work. For instance, instead of implementing resource reference type parsers, we could have implemented an XHTMLLinkRenderer, which converts links when rendering to HTML rather than when parsing the XDOM. This has a few (blocking) drawbacks:

  • XHTMLLinkRenderer is internal and should not be relied upon: being internal, it could change or disappear at any time
  • The pages won't self-heal when edited and saved: the Confluence references stay (although this can be seen as an advantage too)
  • The handling is specific to the HTML output: you would need to implement something similar for each output format

Should you want to go ahead and implement such an HTML link renderer, you could take inspiration from this untested implementation. You will also need to provide minimal alternative implementations of the Confluence resource reference type parsers, with a higher priority than the standard ones so that they override them (alternatively, make sure the standard ones are not installed), such as:

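import javax.inject.Named;
import javax.inject.Singleton;

import org.xwiki.component.annotation.Component;
import org.xwiki.rendering.listener.reference.ResourceReference;
import org.xwiki.rendering.listener.reference.ResourceType;
import org.xwiki.rendering.parser.ResourceReferenceTypeParser;

/**
 * Keeps confluenceAttach references as-is at parse time so that they reach the
 * renderer unconverted. To take precedence over the standard parser, register it in
 * META-INF/components.txt with a priority lower than the default (1000), for example
 * "500:com.example.CustomConfluenceAttachResourceReferenceTypeParser" (hypothetical package).
 */
@Component
@Named("confluenceAttach")
@Singleton
public class CustomConfluenceAttachResourceReferenceTypeParser implements ResourceReferenceTypeParser
{
    private static final ResourceType CONFLUENCE_ATTACH = new ResourceType("confluenceAttach");

    @Override
    public ResourceType getType()
    {
        return CONFLUENCE_ATTACH;
    }

    @Override
    public ResourceReference parse(String reference)
    {
        // No resolution here: the reference is wrapped unchanged, for the renderer to handle.
        return new ResourceReference(reference, CONFLUENCE_ATTACH);
    }
}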

This would need to be done for each of the three Confluence reference types.

Getting rid of Confluence references

It's not nice to have Confluence references in XWiki content. They mostly work, but they are costly, are not supported by many features and are simply not idiomatic. We also issue Confluence references in places where they are not supported, like the reference parameter of the display macro.

We advise getting rid of Confluence references when possible. No tool exists for this in standard XWiki or in a contrib extension, but the rough idea is:

  • Browse your documents and parse them as XDOM
  • In each document, look for references beginning with confluencePage:, confluenceSpace: or confluenceAttach:, both in resource references (links, images) and in macro parameters
  • Parse them, following the syntax detailed in the Overview section
  • Call the Confluence resolvers
  • Update and save the document
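The macro-parameter part of these steps can be sketched as follows. This is a hypothetical helper, not an existing XWiki API: it works on a plain map of parameter values rather than on XDOM blocks, and the resolver is passed as a function standing in for "parse the reference and call the Confluence resolvers". Only values that start with one of the three prefixes are considered (references embedded inside CQL queries are not handled), and values that cannot be resolved are kept unchanged.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical helper that rewrites Confluence references found in macro parameters. */
final class ConfluenceReferenceFixer {
    private static final List<String> PREFIXES =
            List.of("confluencePage:", "confluenceSpace:", "confluenceAttach:");

    // Stand-in for parsing the reference and calling the Confluence resolvers.
    private final Function<String, Optional<String>> resolver;

    ConfluenceReferenceFixer(Function<String, Optional<String>> resolver) {
        this.resolver = resolver;
    }

    static boolean isConfluenceReference(String value) {
        return value != null && PREFIXES.stream().anyMatch(value::startsWith);
    }

    /** Returns a copy of the parameters where resolvable Confluence references are replaced. */
    Map<String, String> fixParameters(Map<String, String> parameters) {
        Map<String, String> fixed = new LinkedHashMap<>();
        parameters.forEach((name, value) -> fixed.put(name,
                isConfluenceReference(value) ? resolver.apply(value).orElse(value) : value));
        return fixed;
    }
}
```

Comparing the returned map with the input (using equals) tells you whether the document actually needs to be updated.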
Information

If you have the Confluence resource reference type parsers installed, there is no need to browse link references, because the ones that can be fixed are fixed at parse time. You only need to look inside macro parameters. To know whether you need to save a document, you can compare the number of occurrences of "confluence" in its source with the number in the serialized XDOM.
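The occurrence check can be as simple as the following sketch. The class name is hypothetical, and how you obtain the re-serialized content (for instance by rendering the parsed XDOM back to the document's syntax) is left out. The count is case-sensitive on purpose, since the reference prefixes are lowercase.

```java
/** Hypothetical helper comparing "confluence" occurrences before and after re-serialization. */
final class ConfluenceOccurrences {
    private static final String NEEDLE = "confluence";

    /** Counts non-overlapping, case-sensitive occurrences of "confluence". */
    static int count(String content) {
        int count = 0;
        for (int i = content.indexOf(NEEDLE); i >= 0; i = content.indexOf(NEEDLE, i + NEEDLE.length())) {
            count++;
        }
        return count;
    }

    /** True when re-serialization removed some occurrences, i.e. some references were fixed. */
    static boolean needsSave(String source, String reserialized) {
        return count(reserialized) < count(source);
    }
}
```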

Information

You may also want to find Confluence absolute URLs and try converting them as well.

You can take inspiration from this implementation.

Confluence resolvers

We do our best not to rely on the way Confluence references documents and spaces after the content is imported, but that's not completely avoidable:

  • Often, it is not possible to convert links at import time.
  • Some macros and bridges reference Confluence content the Confluence way (space key + page title, or id, or special keywords). This includes macros using CQL, which has properties such as "ancestor", "space", "parent" and "title".
  • We have a strategy to handle old Confluence links that could be lingering in people's emails, chat history and bookmarks.

At import time, content also often links to other spaces, and resolving those links whenever possible is useful too.

We therefore provide interfaces to query the wiki for Confluence entities. The Javadoc is the best reference, but in short:

  • ConfluencePageIdResolver defines a getDocumentById(long id) method to find a document using its Confluence id
  • ConfluencePageTitleResolver defines a getDocumentByTitle(String spaceKey, String title) method to find a document using its Confluence space key and Confluence page title
  • ConfluenceSpaceKeyResolver defines a getSpaceByKey(String spaceKey) method to find where a Confluence space was imported and returns a reference of type EntityType.SPACE
  • ConfluenceSpaceResolver defines two methods:
    • getSpace(EntityReference reference) returns the root of the Confluence space which the given document belongs to
    • getSpaceKey(EntityReference reference) returns the Confluence space key of the space in which the given document lives

The default implementation of each of these interfaces loops over all the available implementations and stops as soon as one finds a result.
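This "first result wins" behaviour can be sketched as below. To keep the example self-contained, PageTitleResolver is a hypothetical stand-in for ConfluencePageTitleResolver that returns an Optional<String> instead of an XWiki reference; check the Javadoc for the real signatures and exceptions. Implementations are tried in the order of the list given to the constructor.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical stand-in for ConfluencePageTitleResolver (real one returns XWiki references). */
interface PageTitleResolver {
    Optional<String> getDocumentByTitle(String spaceKey, String title);
}

/** Asks each implementation in turn and stops at the first one that finds a result. */
final class CompositePageTitleResolver implements PageTitleResolver {
    private final List<PageTitleResolver> delegates;

    CompositePageTitleResolver(List<PageTitleResolver> delegates) {
        this.delegates = List.copyOf(delegates);
    }

    @Override
    public Optional<String> getDocumentByTitle(String spaceKey, String title) {
        for (PageTitleResolver delegate : delegates) {
            Optional<String> result = delegate.getDocumentByTitle(spaceKey, title);
            if (result.isPresent()) {
                return result; // first hit wins; later implementations are not consulted
            }
        }
        return Optional.empty();
    }
}
```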

An implementation of all these interfaces relying on Confluence.Code.ConfluencePageClass objects is provided. By default, Confluence XML adds such an object to each document it imports (this can be disabled). These objects store important information like the page id, the space key and the title of documents coming from Confluence.

This means that by default, you should be able to inject these interfaces as components in your own components, query the wiki and expect this to work.

Warning

This won't work with content imported with the "Store confluence details" parameter of Confluence XML set to false.

This was notably the case for all content imported before mid-2024, except for content imported using the unmaintained application-confluence-migrator starting from version 1.0-rc-2, released in November 2021.

Information

Confluence resolvers are designed to be extended and to welcome alternative implementations, as well as to support multiple implementations at the same time. This means that you can provide your own way of resolving things. For instance, you can write an implementation that uses data extracted from the Confluence database, which might let you bring support for content imported a while ago.

Since Confluence XML also calls Confluence resolvers at import time, this is also a way to help Confluence XML convert links to spaces that have not been imported yet, if you know in advance where each space is going to be imported.
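As an example of such an alternative implementation, here is a hypothetical resolver backed by a migration plan decided in advance. SpaceKeyResolver is a stand-in for ConfluenceSpaceKeyResolver that returns a string instead of an XWiki space reference; a real implementation would be declared as an XWiki component implementing the actual interface, with the signature given in its Javadoc.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical stand-in for ConfluenceSpaceKeyResolver (real one returns an EntityType.SPACE reference). */
interface SpaceKeyResolver {
    Optional<String> getSpaceByKey(String spaceKey);
}

/** Resolves space keys from a plan written before the migration, even for spaces not imported yet. */
final class PlannedSpaceKeyResolver implements SpaceKeyResolver {
    // Confluence space key -> XWiki space where it is (or will be) imported.
    private final Map<String, String> plan;

    PlannedSpaceKeyResolver(Map<String, String> plan) {
        this.plan = Map.copyOf(plan);
    }

    @Override
    public Optional<String> getSpaceByKey(String spaceKey) {
        return Optional.ofNullable(plan.get(spaceKey));
    }
}
```

Returning an empty result for unknown keys lets the other available implementations, such as the one based on Confluence.Code.ConfluencePageClass, try next.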

Warning

Confluence is generally case-insensitive, but the default implementation of the Confluence resolvers is case-sensitive. This is a big limitation, which should be easy to lift as soon as we start indexing string properties case-insensitively in Solr.

Confluence resolvers formerly used HQL, which does have a lower function, but:

  • this doesn't work well in a multi-wiki setting (we would have to loop over all the wikis)
  • lower ruins performance anyway, because it prevents the use of indexes
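The direction hinted at above, indexing a normalized form instead of applying lower at query time, can be illustrated with an in-memory map. This is only an analogy for what case-insensitive indexing in Solr would achieve, not Solr configuration: keys are lowercased once when stored, so lookups remain exact matches that an index can serve. Using toLowerCase(Locale.ROOT) is an assumption; it approximates, but may not exactly match, Confluence's own case folding.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Toy index: normalizes keys once at write time so that reads stay exact-match lookups. */
final class CaseInsensitiveTitleIndex {
    private final Map<String, String> byNormalizedKey = new HashMap<>();

    private static String normalize(String spaceKey, String title) {
        // Locale.ROOT avoids locale-specific surprises such as the Turkish dotless i.
        return spaceKey.toLowerCase(Locale.ROOT) + "\0" + title.toLowerCase(Locale.ROOT);
    }

    void put(String spaceKey, String title, String documentReference) {
        byNormalizedKey.put(normalize(spaceKey, title), documentReference);
    }

    Optional<String> find(String spaceKey, String title) {
        return Optional.ofNullable(byNormalizedKey.get(normalize(spaceKey, title)));
    }
}
```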

Confluence URL conversion

At import time, Confluence XML goes out of its way to convert Confluence absolute URLs to proper XWiki references.

Like Confluence resolvers, this is extensible: outside code can provide URL converters by implementing ConfluenceURLConverter. Confluence XML provides a convenient base class for this, AbstractConfluenceURLConverter.

A URL converter implements the convertURL method, which takes a string and returns the corresponding resource reference, or null if the converter does not know how to convert this URL. The default implementation loops over all the available converters and stops as soon as one can convert the URL. Confluence XML comes with a converter that handles the widespread standard Confluence URLs.
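The converter pattern can be sketched as follows. UrlConverter is a hypothetical stand-in for ConfluenceURLConverter that returns a string instead of a resource reference. StandardConfluenceUrlConverter (also hypothetical) only recognizes two common Confluence URL shapes, /display/SPACE/Page+Title and /pages/viewpage.action?pageId=ID, and emits Confluence reference strings rather than resolving them; the converter shipped with Confluence XML handles more cases.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for ConfluenceURLConverter: returns a reference, or null if unknown. */
interface UrlConverter {
    String convertURL(String url);
}

/** Recognizes two common Confluence URL shapes (illustration only). */
final class StandardConfluenceUrlConverter implements UrlConverter {
    // e.g. https://confluence.example.com/display/DOC/Release+Notes
    private static final Pattern DISPLAY = Pattern.compile("/display/([^/?#]+)/([^/?#]+)");
    // e.g. https://confluence.example.com/pages/viewpage.action?pageId=12345
    private static final Pattern PAGE_ID =
            Pattern.compile("/pages/viewpage\\.action\\?(?:.*&)?pageId=(\\d+)");

    @Override
    public String convertURL(String url) {
        Matcher m = PAGE_ID.matcher(url);
        if (m.find()) {
            return "confluencePage:id:" + m.group(1);
        }
        m = DISPLAY.matcher(url);
        if (m.find()) {
            try {
                // Titles are URL-encoded in display URLs ('+' stands for a space).
                String title = URLDecoder.decode(m.group(2), StandardCharsets.UTF_8);
                return "confluencePage:page:" + m.group(1) + "." + title;
            } catch (IllegalArgumentException e) {
                return null; // malformed percent-encoding: let another converter try
            }
        }
        return null; // not a URL this converter understands
    }
}

/** The first converter returning a non-null result wins. */
final class CompositeUrlConverter implements UrlConverter {
    private final List<UrlConverter> converters;

    CompositeUrlConverter(List<UrlConverter> converters) {
        this.converters = List.copyOf(converters);
    }

    @Override
    public String convertURL(String url) {
        for (UrlConverter converter : converters) {
            String reference = converter.convertURL(url);
            if (reference != null) {
                return reference;
            }
        }
        return null;
    }
}
```

With this composite, https://wiki.example.com/display/DOC/Release+Notes becomes confluencePage:page:DOC.Release Notes, while a non-Confluence URL such as https://example.org/about yields null, leaving the decision to the caller.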

Macro converters can use ConfluenceURLConverter to convert URLs. They need to handle the null case, when the URL could not be converted, so as not to inadvertently drop the URL.
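A minimal way to honour the null case is to fall back to the original URL. MacroUrlParameter is a hypothetical helper, and the converter is passed as a plain function (for example a method reference to convertURL) to keep the sketch independent of the real interface.

```java
import java.util.function.UnaryOperator;

/** Hypothetical helper for macro converters handling URL-valued parameters. */
final class MacroUrlParameter {
    /** Returns the converted reference, or the original URL when the converter returns null. */
    static String convertOrKeep(UnaryOperator<String> converter, String url) {
        String converted = converter.apply(url);
        // Never drop the URL: an unconverted link is better than a missing one.
        return converted != null ? converted : url;
    }
}
```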
