Wiki source code of Confluence Import Process

Last modified by Vincent Massol on 2026/04/08 13:49

Confluence XML is implemented as an input filter stream. [[ConfluenceInputFilterStream>>https://github.com/xwiki-contrib/confluence/blob/master/confluence-xml/src/main/java/org/xwiki/contrib/confluence/filter/internal/input/ConfluenceInputFilterStream.java]] is instantiated with [[input properties>>https://github.com/xwiki-contrib/confluence/tree/master/confluence-xml/src/main/java/org/xwiki/filter/confluence/input/ConfluenceInputProperties.java]] describing what to import and how.

ConfluenceInputFilterStream then sets up [[ConfluenceXMLPackage>>https://github.com/xwiki-contrib/confluence/tree/master/confluence-xml/src/main/java/org/xwiki/contrib/confluence/filter/input/ConfluenceXMLPackage.java]], which extracts the Confluence package and indexes it.

ConfluenceXMLPackage is designed to handle huge export packages:

* the XML parsing is streamed
* instead of keeping everything in RAM, each object is written to its own Apache Commons Configuration properties file in a temporary directory (a database engine such as SQLite could probably be used instead, and might be even more efficient)
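
The two points above can be illustrated with a minimal sketch. It is not the actual ConfluenceXMLPackage code: the XML layout (##object## elements with an ##id## attribute and ##property## children) is a simplified assumption, the class and method names are hypothetical, and ##java.util.Properties## stands in for Apache Commons Configuration so that the example stays self-contained.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch: stream an export and index each object in its own properties file.
class StreamedIndexSketch
{
    /** Reads the XML event by event and writes one .properties file per object; returns the object count. */
    static int index(String xml, Path directory) throws XMLStreamException, IOException
    {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        int count = 0;
        Properties current = null;
        String id = null;
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                String name = reader.getLocalName();
                if (name.equals("object")) {
                    // A new object starts: only this one is kept in memory.
                    current = new Properties();
                    id = reader.getAttributeValue(null, "id");
                } else if (name.equals("property") && current != null) {
                    String propertyName = reader.getAttributeValue(null, "name");
                    current.setProperty(propertyName, reader.getElementText());
                }
            } else if (event == XMLStreamConstants.END_ELEMENT && reader.getLocalName().equals("object")
                && current != null) {
                // The object is complete: flush it to disk and forget it.
                try (OutputStream out = Files.newOutputStream(directory.resolve(id + ".properties"))) {
                    current.store(out, null);
                }
                current = null;
                count++;
            }
        }
        reader.close();
        return count;
    }

    public static void main(String[] args) throws Exception
    {
        String xml = "<export>"
            + "<object id=\"1\"><property name=\"title\">Home</property></object>"
            + "<object id=\"2\"><property name=\"title\">Child</property></object>"
            + "</export>";
        Path directory = Files.createTempDirectory("confluence-sketch");
        System.out.println(index(xml, directory) + " objects indexed in " + directory);
    }
}
```

Memory use in this sketch is bounded by the largest single object rather than by the size of the export, which is the property the list above describes.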

ConfluenceInputFilterStream is built in the same spirit: it browses the objects in the package and sends them as a stream of events through the filter stream API. If the output filter in use is also streaming, which is normally the case, the whole process is a pipeline that can handle huge imports.
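
As a rough illustration of this pipeline idea, the sketch below pushes each document to a listener as soon as it is read. The ##DocumentFilter## interface and its method names are hypothetical stand-ins, not the real filter stream API.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of an input side that pushes events to an output side.
class StreamingPipelineSketch
{
    /** Hypothetical listener, standing in for the filter exposed by an output filter stream. */
    interface DocumentFilter
    {
        void beginDocument(String name);

        void onContent(String content);

        void endDocument(String name);
    }

    /** Sends each document to the filter as soon as it is read, so only one is held at a time. */
    static void read(Iterator<String[]> documents, DocumentFilter filter)
    {
        while (documents.hasNext()) {
            String[] document = documents.next(); // {name, content}
            filter.beginDocument(document[0]);
            filter.onContent(document[1]);
            filter.endDocument(document[0]);
        }
    }

    public static void main(String[] args)
    {
        List<String[]> pages = List.of(new String[] {"Home", "Welcome"}, new String[] {"FAQ", "Questions"});
        read(pages.iterator(), new DocumentFilter()
        {
            public void beginDocument(String name)
            {
                System.out.println("begin " + name);
            }

            public void onContent(String content)
            {
                System.out.println("  content: " + content);
            }

            public void endDocument(String name)
            {
                System.out.println("end " + name);
            }
        });
    }
}
```

Because the input only hands over one document at a time, a listener that writes each document out immediately keeps the whole chain streaming.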

More precisely, ConfluenceInputFilterStream:

* imports users
* imports groups
* browses spaces

For each space:

* imports the home page
* imports the orphans (pages that have no parent and are not the home page)
* imports the space blog
* imports the space templates
* imports permissions from the space permissions and the home page permissions

For each page:

* imports revisions
** imports page metadata (dates, author, title, ...)
** imports page permissions
** imports content by instantiating the parser for the corresponding syntax (see [[details about the Confluence XHTML Parser>>doc:xwiki:documentation.extensions.dev.confluence.xhtml-parsing.WebHome]]); a page can also be in the old Confluence syntax, or contain plain text that must not be converted (for pages used as data storage)
** imports comments
** imports attachments
** imports labels as tags
* imports children (recursive operation)
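
The traversal described in the three lists above can be sketched as follows. This is only an illustration: the ##Page## and ##Space## classes are hypothetical in-memory stand-ins, the sketch assumes the lists reflect the execution order, and it records step names instead of sending real filter events.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the import order: users, groups, then each space and its page trees.
class ImportOrderSketch
{
    /** Hypothetical page with a revision count and child pages. */
    static class Page
    {
        final String title;
        final int revisions;
        final List<Page> children;

        Page(String title, int revisions, Page... children)
        {
            this.title = title;
            this.revisions = revisions;
            this.children = List.of(children);
        }
    }

    /** Hypothetical space made of a home page and orphan pages. */
    static class Space
    {
        final String key;
        final Page home;
        final List<Page> orphans;

        Space(String key, Page home, List<Page> orphans)
        {
            this.key = key;
            this.home = home;
            this.orphans = orphans;
        }
    }

    /** Imports every revision of a page, then recurses into its children. */
    static void importPage(Page page, List<String> log)
    {
        for (int revision = 1; revision <= page.revisions; revision++) {
            // Metadata, permissions, content, comments, attachments and labels would be sent here.
            log.add("page " + page.title + " revision " + revision);
        }
        for (Page child : page.children) {
            importPage(child, log);
        }
    }

    /** Returns the ordered list of steps performed for the given spaces. */
    static List<String> importAll(List<Space> spaces)
    {
        List<String> log = new ArrayList<>();
        log.add("users");
        log.add("groups");
        for (Space space : spaces) {
            log.add("space " + space.key);
            importPage(space.home, log);
            for (Page orphan : space.orphans) {
                importPage(orphan, log);
            }
            log.add("blog " + space.key);
            log.add("templates " + space.key);
            log.add("permissions " + space.key);
        }
        return log;
    }

    public static void main(String[] args)
    {
        Page home = new Page("Home", 2, new Page("Child", 1));
        Space space = new Space("DEMO", home, List.of(new Page("Orphan", 1)));
        importAll(List.of(space)).forEach(System.out::println);
    }
}
```

The recursion in ##importPage## mirrors the "imports children" step: a page tree of any depth is walked depth first, one page at a time.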

Finally, the Confluence package is closed and its properties files are removed. By default this cleanup is synchronous, but a parameter can make it asynchronous so that the job can end without waiting for the cleanup to finish. Another parameter skips the cleanup altogether, which is mostly useful for debugging: it lets you inspect the extracted properties and skip the parsing phase when you run several imports of the same package.
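
The three cleanup behaviours can be sketched as below. The ##CleanupMode## enum, its values and the ##close## method are hypothetical names chosen for illustration; they are not taken from ConfluenceInputProperties.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Stream;

// Hypothetical sketch of synchronous, asynchronous and skipped cleanup of the extracted files.
class CleanupSketch
{
    /** Hypothetical names for the three behaviours. */
    enum CleanupMode { SYNC, ASYNC, NO }

    /** Deletes a directory tree, children first. */
    static void deleteRecursively(Path directory)
    {
        try (Stream<Path> paths = Files.walk(directory)) {
            paths.sorted(Comparator.reverseOrder()).forEach(path -> {
                try {
                    Files.delete(path);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns a future that completes once the cleanup (if any) is done. */
    static CompletableFuture<Void> close(Path directory, CleanupMode mode)
    {
        switch (mode) {
            case SYNC:
                // The caller waits for the deletion before the method returns.
                deleteRecursively(directory);
                return CompletableFuture.completedFuture(null);
            case ASYNC:
                // The caller can go on while the deletion runs in the background.
                return CompletableFuture.runAsync(() -> deleteRecursively(directory));
            default:
                // NO: keep the files for inspection or for a later import.
                return CompletableFuture.completedFuture(null);
        }
    }

    public static void main(String[] args) throws Exception
    {
        Path directory = Files.createTempDirectory("confluence-cleanup");
        Files.createFile(directory.resolve("1.properties"));
        close(directory, CleanupMode.ASYNC).join();
        System.out.println("still exists: " + Files.exists(directory));
    }
}
```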

The starting point is the [[##read## method in ##ConfluenceInputFilterStream##>>https://github.com/xwiki-contrib/confluence/blob/ecd749e56f3671df9d08ca57912d49d4d38ff42a/confluence-xml/src/main/java/org/xwiki/contrib/confluence/filter/internal/input/ConfluenceInputFilterStream.java#L268]]. This is where any new developer should probably go to start hacking on Confluence XML.
