Confluence reference and URL handling
Explanation
Introduction
XWiki and Confluence reference documents quite differently. XWiki references documents using their full names, whereas Confluence has several ways of referencing a page:
- from its space key and its page title, in a case insensitive way
- from its (stable) page id
- using special @ keywords like @home, @self, @parent.
Spaces can also be referenced, either through their space key or through special keywords like currentSpace(). Since we migrate Confluence space home pages to the WebHome page of the target XWiki space, @home requires exactly the same resolution as currentSpace().
Attachments are occasionally referenced through their id as well (which we mostly don't currently handle).
This makes references difficult to handle, both at import time and when displaying the imported content, which may reference documents and spaces in macro parameters (including inside CQL queries) and in links we failed to convert.
In this document, we describe how we handle Confluence references and what our strategies are for tackling these difficulties.
Confluence references
Overview
Sometimes, Confluence XML can't convert a Confluence reference to a proper XWiki reference. Usually, that's because we lack the information needed to build the correct reference. This information is:
- where the space containing the referenced page was migrated to (not necessarily the root of the main wiki, because of the root parameter).
- the hierarchy between the root of the space and the page (all its ancestor pages)
This information can be missing when, for instance, a link references a page in a space that has not been imported yet.
In this case, Confluence XML issues a Confluence reference instead. There are currently three kinds of Confluence references:
- confluenceSpace: a reference to a Confluence space, to be converted to the XWiki space reference where the Confluence space was imported
- confluencePage: a reference to a Confluence page, to be converted to the corresponding XWiki document reference
  - confluencePage:page:SPACE.PAGETITLE for a page with the given space key and page title
  - confluencePage:id:ID for a page with the given page id
- confluenceAttach: a reference to a Confluence attachment
  - confluenceAttach:spaceHome:SPACE@FILENAME for an attachment on the home page of the given space
  - confluenceAttach:page:SPACE.PAGETITLE@FILENAME for an attachment on a page with the given space key and page title
  - confluenceAttach:id:ID@FILENAME for an attachment on a page with the given page id
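To make the formats listed above more concrete, here is a minimal Java sketch that splits a Confluence reference string into its parts. The names ConfluenceRefSketch and ParsedRef are hypothetical and not part of Confluence XML. The sketch also assumes that no escaping is applied, that the space key contains no dot, and that the file name is whatever follows the last @, which may not hold for real content.

```java
import java.util.Optional;

/** Hypothetical helper, not part of Confluence XML: splits a Confluence reference into its parts. */
final class ConfluenceRefSketch
{
    /** Parsed parts; fields that do not apply to a given kind of reference are null. */
    static final class ParsedRef
    {
        final String type;
        final String spaceKey;
        final String pageTitle;
        final Long pageId;
        final String filename;

        ParsedRef(String type, String spaceKey, String pageTitle, Long pageId, String filename)
        {
            this.type = type;
            this.spaceKey = spaceKey;
            this.pageTitle = pageTitle;
            this.pageId = pageId;
            this.filename = filename;
        }
    }

    /** @return the parsed reference, or empty when the string matches none of the known formats */
    static Optional<ParsedRef> parse(String ref)
    {
        int colon = ref.indexOf(':');
        if (colon < 0) {
            return Optional.empty();
        }
        String type = ref.substring(0, colon);
        String rest = ref.substring(colon + 1);
        switch (type) {
            case "confluenceSpace":
                return Optional.of(new ParsedRef(type, rest, null, null, null));
            case "confluencePage":
                return parseTarget(type, rest, null);
            case "confluenceAttach": {
                // Assumption: the file name is everything after the last '@'.
                int at = rest.lastIndexOf('@');
                if (at < 0) {
                    return Optional.empty();
                }
                return parseTarget(type, rest.substring(0, at), rest.substring(at + 1));
            }
            default:
                return Optional.empty();
        }
    }

    private static Optional<ParsedRef> parseTarget(String type, String target, String filename)
    {
        if (target.startsWith("page:")) {
            String spaceAndTitle = target.substring("page:".length());
            // Assumption: the space key ends at the first dot.
            int dot = spaceAndTitle.indexOf('.');
            if (dot < 0) {
                return Optional.empty();
            }
            return Optional.of(new ParsedRef(type, spaceAndTitle.substring(0, dot),
                spaceAndTitle.substring(dot + 1), null, filename));
        }
        if (target.startsWith("id:")) {
            try {
                long id = Long.parseLong(target.substring("id:".length()));
                return Optional.of(new ParsedRef(type, null, null, id, filename));
            } catch (NumberFormatException e) {
                return Optional.empty();
            }
        }
        // The spaceHome form only exists for attachments.
        if (filename != null && target.startsWith("spaceHome:")) {
            return Optional.of(new ParsedRef(type, target.substring("spaceHome:".length()),
                null, null, filename));
        }
        return Optional.empty();
    }
}
```

For instance, ConfluenceRefSketch.parse("confluenceAttach:page:DOCS.Release notes@diagram.png") yields the space key DOCS, the page title "Release notes" and the file name diagram.png, while an unknown prefix yields an empty result.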
Current implementation
Confluence references are currently implemented in the confluence-resource-reference-type-parsers module as resource reference type parsers, which means that they are transparently converted to XWiki references whenever possible when parsing an XWiki document. This has a few implications:
- successfully converted Confluence references do not show up in the document's XDOM: it is as if the corresponding XWiki references had always been there instead of the Confluence references.
- parsing an XWiki document with Confluence references is potentially costly, because the Confluence resolvers are called and the default Confluence resolvers fire Solr queries.
- when saving a document containing successfully converted Confluence references, the conversions happen automatically (documents are, in a way, automatically "fixed"), and the conversions appear in the diff.
Alternative implementation
It would be possible to handle Confluence references differently and still have them work. For instance, instead of implementing resource reference type parsers, we could have implemented an XHTMLLinkRenderer, which converts links when rendering to HTML rather than when parsing the XDOM. This has a few (blocking) drawbacks:
- XHTMLLinkRenderer is internal and should not be relied upon: since it is internal, it could disappear.
- The pages won't self-heal when edited and saved: the Confluence references stay (although this can be seen as an advantage too).
- The handling is specific to the HTML output: you would need to implement something similar for each output format.
Should you want to go ahead and implement such an HTML link renderer, you could take inspiration from . You will also need to provide a minimal, alternative implementation of the Confluence resource reference type parsers, with a higher priority, to override the standard ones (which you can also make sure not to install), such as:
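import javax.inject.Named;
import javax.inject.Singleton;

import org.xwiki.component.annotation.Component;
import org.xwiki.rendering.listener.reference.ResourceReference;
import org.xwiki.rendering.listener.reference.ResourceType;
import org.xwiki.rendering.parser.ResourceReferenceTypeParser;

@Component
@Named("confluenceAttach")
@Singleton
public class CustomConfluenceAttachResourceReferenceTypeParser implements ResourceReferenceTypeParser
{
    private static final ResourceType CONFLUENCE_ATTACH = new ResourceType("confluenceAttach");

    @Override
    public ResourceType getType()
    {
        return CONFLUENCE_ATTACH;
    }

    @Override
    public ResourceReference parse(String reference)
    {
        // Keep the Confluence reference untouched so that the link renderer can resolve it later.
        return new ResourceReference(reference, CONFLUENCE_ATTACH);
    }
}

This would need to be done for each of the three Confluence reference types.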
Getting rid of Confluence references
It's not nice to have Confluence references in XWiki content. They mostly work, but they are costly, many features do not handle them, and they are simply not idiomatic. We also issue Confluence references in places where they are not supported, like the reference parameter of the display macro.
We advise getting rid of Confluence references when possible. No tool exists for this in standard XWiki or in a contrib extension, but the rough idea is:
- Browse your documents and parse them as XDOM
- In each document, look for references beginning with confluencePage:, confluenceSpace: or confluenceAttach: inside references or macro parameters
- Parse them, following the description detailed in the overview section
- Call the confluence resolvers
- Update the document
You can take inspiration from this implementation.
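The steps above can be sketched on a simplified model in which macro parameters are plain strings in a map. This is only an illustration: the class ConfluenceReferenceCleanupSketch is hypothetical, and the resolver function stands in for both the parsing step and the call to the Confluence resolvers. A real cleanup would walk the XDOM of each document instead of a map.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical helper: rewrites Confluence references found in macro parameters. */
final class ConfluenceReferenceCleanupSketch
{
    private static final String[] PREFIXES = {"confluencePage:", "confluenceSpace:", "confluenceAttach:"};

    /** @return true when the value begins with one of the three Confluence reference prefixes */
    static boolean isConfluenceReference(String value)
    {
        for (String prefix : PREFIXES) {
            if (value.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    /**
     * @param parameters the macro parameters to inspect (left unmodified)
     * @param resolver stand-in for parsing the reference and calling the Confluence resolvers:
     *     returns the serialized XWiki reference, or empty when it cannot be resolved
     * @return a copy of the parameters in which resolvable Confluence references are replaced
     */
    static Map<String, String> rewrite(Map<String, String> parameters,
        Function<String, Optional<String>> resolver)
    {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : parameters.entrySet()) {
            String value = entry.getValue();
            if (value != null && isConfluenceReference(value)) {
                // Keep the original value when resolution fails, so that nothing is lost.
                value = resolver.apply(value).orElse(value);
            }
            result.put(entry.getKey(), value);
        }
        return result;
    }
}
```

Returning a copy rather than modifying the map in place makes it easy to compare the result with the input and to save the document only when something actually changed.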
Confluence resolvers
We do our best not to rely on the way Confluence references documents and spaces after importing the content, but that's not completely avoidable:
- often, it is not possible to convert links at import time.
- some macros and bridges reference Confluence content the Confluence way (space key + page title, id, or special keywords). This includes macros using CQL, which has properties such as "ancestor", "space", "parent" and "title".
- we have a strategy to handle old Confluence links that could be lingering in people's emails, chat history and bookmarks.
At import time, links to other spaces also often exist, and resolving links we can resolve at import time is useful too.
We therefore provide interfaces to query the wiki for Confluence things. It is best to have a look at the Javadoc, but in short:
- ConfluencePageIdResolver defines a getDocumentById(long id) method to find a document using its Confluence id
- ConfluencePageTitleResolver defines a getDocumentByTitle(String spaceKey, String title) method to find a document using its Confluence space key and Confluence page title
- ConfluenceSpaceKeyResolver defines a getSpaceByKey(String spaceKey) method to find where a Confluence space was imported and returns a reference of type EntityType.SPACE
- ConfluenceSpaceResolver defines two methods:
  - getSpace(EntityReference reference) returns the root of the Confluence space to which the given document belongs
  - getSpaceKey(EntityReference reference) returns the Confluence space key of the space in which the given document lives
The default implementation of each of these interfaces loops over all the available implementations and stops as soon as one finds a result.
An implementation for all these interfaces relying on Confluence.Code.ConfluencePageClass is provided. These objects are, by default, available for each document imported using Confluence XML (but this can be disabled). They store important information like the page id, the space key and the title of documents coming from Confluence.
This means that, by default, you should be able to inject these interfaces as components in your own components, query the wiki and expect this to work.
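The "loop over all implementations and stop at the first result" behaviour described above can be sketched as follows. The interface PageIdResolverSketch is a simplified stand-in for ConfluencePageIdResolver: it returns a serialized reference as a string, and we assume that null means "not found". The actual signature and error handling are documented in the Javadoc.

```java
import java.util.List;

/** Simplified stand-in for ConfluencePageIdResolver; not the real interface. */
interface PageIdResolverSketch
{
    /** @return the serialized document reference, or null when this resolver does not know the id */
    String getDocumentById(long id);
}

/** Tries each resolver in order and stops at the first one that finds a result. */
final class FirstMatchPageIdResolver implements PageIdResolverSketch
{
    private final List<PageIdResolverSketch> resolvers;

    FirstMatchPageIdResolver(List<PageIdResolverSketch> resolvers)
    {
        this.resolvers = resolvers;
    }

    @Override
    public String getDocumentById(long id)
    {
        for (PageIdResolverSketch resolver : resolvers) {
            String result = resolver.getDocumentById(id);
            if (result != null) {
                // Later resolvers are not consulted once a result is found.
                return result;
            }
        }
        // No resolver knows this id.
        return null;
    }
}
```

With this shape, providing an additional way of resolving Confluence ids only requires adding one more resolver to the list; callers keep using the single combined resolver.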
Confluence URL conversion
At import time, Confluence XML goes out of its way to convert absolute Confluence URLs to proper XWiki references.
Like confluence resolvers, this is extensible and outside code can provide URL converters by implementing ConfluenceURLConverter. Confluence XML provides a convenient base class for doing this, AbstractConfluenceURLConverter.
A URL converter implements the convertURL method, which takes a string and returns the corresponding resource reference, or null if the converter does not know how to convert this URL. The default implementation loops over all the available converters and stops as soon as one can convert the URL. Confluence XML comes with a converter that handles the most widespread standard Confluence URLs.
Macro converters can use ConfluenceURLConverter to convert URLs. They need to make sure to handle the null case, when the URL could not be converted, so as not to accidentally drop the URL.
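The null handling can be sketched like this. UrlConverterSketch is a simplified stand-in for ConfluenceURLConverter that returns a string instead of a resource reference, and MacroUrlParameterSketch is a hypothetical helper, not an existing class.

```java
/** Simplified stand-in for ConfluenceURLConverter; the real method returns a resource reference. */
interface UrlConverterSketch
{
    /** @return the converted reference as a string, or null when the URL is not recognized */
    String convertURL(String url);
}

/** Hypothetical macro converter helper showing the null check. */
final class MacroUrlParameterSketch
{
    static String convertOrKeep(UrlConverterSketch converter, String url)
    {
        String converted = converter.convertURL(url);
        // A null result means "not converted": keep the original URL instead of dropping it.
        return converted != null ? converted : url;
    }
}
```

Falling back to the original URL keeps the parameter usable as an external link, even when no converter recognizes it.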