Wiki source code of Confluence Import Process

Last modified by Vincent Massol on 2026/04/08 13:49

version	line-number	content
1.1	1	Confluence XML is implemented as an input filter stream. [[ConfluenceInputFilterStream>>https://github.com/xwiki-contrib/confluence/blob/master/confluence-xml/src/main/java/org/xwiki/contrib/confluence/filter/internal/input/ConfluenceInputFilterStream.java]] is instantiated with [[input properties>>https://github.com/xwiki-contrib/confluence/tree/master/confluence-xml/src/main/java/org/xwiki/filter/confluence/input/ConfluenceInputProperties.java]] describing what to import and how.
	2
	3	ConfluenceInputFilterStream then sets up [[ConfluenceXMLPackage>>https://github.com/xwiki-contrib/confluence/tree/master/confluence-xml/src/main/java/org/xwiki/contrib/confluence/filter/input/ConfluenceXMLPackage.java]], which extracts the Confluence package and indexes it.
	4
	5	ConfluenceXMLPackage is designed to handle huge export packages:
	6
	7	* the XML parsing is streamed
	8	* instead of keeping everything in RAM, each object is written to its own Apache Commons Configuration properties file in a temporary directory (a database engine such as SQLite could probably be used for this instead, and might be even more efficient)
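
The two points above can be illustrated with a minimal sketch. It is not the actual ConfluenceXMLPackage code: it assumes a simplified, flat XML shape (##object## elements holding text-only ##id## and ##property## children), uses the JDK's StAX reader for streaming, and uses ##java.util.Properties## as a stand-in for Apache Commons Configuration. The class name ##StreamedPackageIndexSketch## and the ##index## method are hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch, not the real ConfluenceXMLPackage implementation.
final class StreamedPackageIndexSketch {
    /**
     * Streams the XML and writes one .properties file per object element.
     * Only one object is held in memory at a time.
     *
     * @return the number of objects written to the directory
     */
    static int index(String xml, Path directory) throws XMLStreamException, IOException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        Properties current = null;
        String type = null;
        int count = 0;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if (name.equals("object")) {
                        // A new object starts: collect its fields in a fresh Properties instance.
                        current = new Properties();
                        type = reader.getAttributeValue(null, "class");
                    } else if (current != null && (name.equals("id") || name.equals("property"))) {
                        // Assumption: fields are text-only elements carrying a "name" attribute.
                        String key = reader.getAttributeValue(null, "name");
                        current.setProperty(key, reader.getElementText());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                    && reader.getLocalName().equals("object") && current != null) {
                    // The object is complete: flush it to disk and forget it.
                    Path file = directory.resolve(type + "-" + current.getProperty("id") + ".properties");
                    try (OutputStream out = Files.newOutputStream(file)) {
                        current.store(out, null);
                    }
                    current = null;
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }
}
```

Calling ##StreamedPackageIndexSketch.index(xml, Files.createTempDirectory("confluence"))## on an export with a ##Page## object of id 42 would leave a ##Page-42.properties## file in the temporary directory. Real exports also contain nested properties and collections, which this sketch does not handle.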
	9
	10	ConfluenceInputFilterStream is built in the same spirit: it browses the content of the package and sends it in a streamed fashion using the filter stream API. If the output filter in use is also streamed, which is normally the case, the whole process is a pipeline that can handle huge imports.
	11
	12	More precisely, ConfluenceInputFilterStream:
	13
	14	* imports users
	15	* imports groups
	16	* browses spaces
	17
	18	For each space:
	19
	20	* imports the home page
	21	* imports the orphans (pages with no parent, other than the home page)
	22	* imports the space blog
	23	* imports the space templates
	24	* imports permissions from the space permissions and the home page permissions
	25
	26	For each page:
	27
	28	* imports revisions
	29	** imports page metadata (dates, author, title, ...)
	30	** imports page permissions
4.3	31	** imports content (by instantiating the parser for the corresponding syntax, see [[details about the Confluence XHTML Parser>>doc:xwiki:documentation.extensions.dev.confluence.xhtml-parsing.WebHome]]). A page can also be in the old Confluence syntax, or contain plain text that is not to be converted (for pages that are used as data storage)
1.1	32	** imports comments
	33	** imports attachments
	34	** imports labels as tags
	35	* imports children (recursive operation)
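
The traversal order described above can be sketched as follows. This is only an illustration of the ordering (home page first, then orphans, each page before its children), not the code of ConfluenceInputFilterStream. The ##SpaceTraversalSketch## class, its ##Page## type and the ##visitSpace## method are hypothetical, and pages are held in memory here for brevity, whereas the real import reads them from the indexed package.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the visiting order, not the real import code.
final class SpaceTraversalSketch {
    /** Hypothetical in-memory stand-in for a page stored in the indexed package. */
    static final class Page {
        final String title;
        final List<Page> children;

        Page(String title, Page... children) {
            this.title = title;
            this.children = Arrays.asList(children);
        }
    }

    /** Returns page titles in the order they would be imported for one space. */
    static List<String> visitSpace(Page home, List<Page> orphans) {
        List<String> order = new ArrayList<>();
        // The home page and its descendants come first.
        visitPage(home, order);
        // Then the orphans: pages with no parent, other than the home page.
        for (Page orphan : orphans) {
            visitPage(orphan, order);
        }
        return order;
    }

    private static void visitPage(Page page, List<String> order) {
        // Revisions (metadata, permissions, content, comments, attachments, tags)
        // would be sent here, before the children.
        order.add(page.title);
        // Children are imported recursively.
        for (Page child : page.children) {
            visitPage(child, order);
        }
    }
}
```

For a home page with children A (itself the parent of A1) and B, plus one orphan, the pages are visited in the order Home, A, A1, B, Orphan.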
	36
3.1	37	Finally, the Confluence package is closed and the property files are removed. By default this cleanup is synchronous, but a parameter can make it asynchronous so that the job does not have to wait for the cleanup to finish. Another parameter skips the cleanup altogether, which is mostly useful for debugging: it lets you inspect the extracted properties and skip the parsing phase if you run several imports of the same package.
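
As a rough illustration of the three cleanup behaviours, here is a minimal sketch. The ##PackageCleanupSketch## class, the ##Cleanup## enum and its values are hypothetical names chosen for this example. They are not the actual parameter names or values of the Confluence XML module, and the real implementation may schedule the asynchronous cleanup differently.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Stream;

// Hypothetical sketch of the cleanup modes, not the real module code.
final class PackageCleanupSketch {
    /** Hypothetical cleanup modes: blocking, in the background, or none at all. */
    enum Cleanup { SYNC, ASYNC, NO }

    /**
     * Closes the package directory according to the mode.
     * The returned future completes once the cleanup, if any, is done.
     */
    static CompletableFuture<Void> close(Path directory, Cleanup mode) {
        if (mode == Cleanup.SYNC) {
            // The caller waits for the deletion before going on.
            deleteRecursively(directory);
            return CompletableFuture.completedFuture(null);
        }
        if (mode == Cleanup.ASYNC) {
            // The deletion runs in the background: the job can end without waiting for it.
            return CompletableFuture.runAsync(() -> deleteRecursively(directory));
        }
        // NO: keep the extracted files for inspection or for a later import.
        return CompletableFuture.completedFuture(null);
    }

    private static void deleteRecursively(Path directory) {
        try (Stream<Path> paths = Files.walk(directory)) {
            // Reverse order deletes files before the directories that contain them.
            paths.sorted(Comparator.reverseOrder()).forEach(path -> {
                try {
                    Files.delete(path);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With the ##NO## mode the directory is left untouched, so a second import of the same package could reuse the extracted files instead of parsing the export again.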
1.1	38
	39	The starting point is the [[##read## method in ##ConfluenceInputFilterStream##>>https://github.com/xwiki-contrib/confluence/blob/ecd749e56f3671df9d08ca57912d49d4d38ff42a/confluence-xml/src/main/java/org/xwiki/contrib/confluence/filter/internal/input/ConfluenceInputFilterStream.java#L268]]. This is where any new developer should probably go to start hacking on Confluence XML.
