Tuesday, December 27, 2011

Using XSL Formatting Objects (XSL-FO) to create PDFs in a .NET process (Part 1)

UPDATE: Because I think there is a lack of affordable, easy-to-use XSL-FO processors, I decided to create an FO processor web service. Note that it is not fully functional yet; I am currently working on the SOAP implementation.
UPDATE 2012-02-13: Because Apache FOP configuration has already been described perfectly elsewhere, I decided not to write a Part 2 of this post for now. If you have questions about configuring Apache FOP, please leave a comment at the bottom of this post.
A year ago I was looking at our Dutch recipe management website and thought: Wouldn't it be great if our users could order their own recipes in a printed book? I had discovered a few Print On Demand (POD) providers that accepted PDF uploads. However, our users' recipes were stored in a SQL Server database. How could I get them into nicely formatted PDF files?
I knew of an XML format that could be converted to PDF: the XSL Formatting Objects (XSL-FO) standard allows for the definition of complex page layouts for print. If I could put the recipe data in XSL-FO, a Formatting Objects processor would probably be able to create PDFs from it.
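To make the idea concrete, here is a minimal sketch of how data could be turned into an XSL-FO document with LINQ to XML. The RecipeFo class, its Build method, and the page dimensions are illustrative assumptions, not code from the actual website; a real recipe book would need a much richer layout.

```csharp
using System;
using System.Xml.Linq;

public static class RecipeFo
{
    // The XSL-FO namespace defined by the W3C Recommendation.
    private static readonly XNamespace Fo = "http://www.w3.org/1999/XSL/Format";

    // Builds a single-flow A4 document with a bold title and one body paragraph.
    // "title" and "text" stand in for recipe data read from the database (hypothetical).
    public static XDocument Build(string title, string text)
    {
        return new XDocument(
            new XElement(Fo + "root",
                new XAttribute(XNamespace.Xmlns + "fo", Fo.NamespaceName),
                new XElement(Fo + "layout-master-set",
                    new XElement(Fo + "simple-page-master",
                        new XAttribute("master-name", "A4"),
                        new XAttribute("page-height", "297mm"),
                        new XAttribute("page-width", "210mm"),
                        new XAttribute("margin-top", "20mm"),
                        new XAttribute("margin-bottom", "20mm"),
                        new XAttribute("margin-left", "20mm"),
                        new XAttribute("margin-right", "20mm"),
                        new XElement(Fo + "region-body"))),
                new XElement(Fo + "page-sequence",
                    new XAttribute("master-reference", "A4"),
                    new XElement(Fo + "flow",
                        new XAttribute("flow-name", "xsl-region-body"),
                        new XElement(Fo + "block",
                            new XAttribute("font-size", "18pt"),
                            new XAttribute("font-weight", "bold"),
                            title),
                        // LINQ to XML escapes special characters such as & and < in the text.
                        new XElement(Fo + "block", text)))));
    }
}
```

Calling RecipeFo.Build("Hello", "Hello world!").Save(@"C:\Fop-1.0\Hello-world.fo") would write a file that could serve as the Hello-world.fo input used later in this post.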
When you are looking for such a processor you will probably evaluate the following products:
Antenna House Formatter
RenderX XEP
Ibex PDF Creator

But those are really expensive! In my case I needed an affordable solution that runs server-side on the .NET Framework and also implements a large part of the XSL-FO Version 1.1 W3C Recommendation. I then tested Apache FOP, and because it was free and looked stable, I decided to see whether I could run it from a .NET process.
The .NET Process class is a useful tool for starting and controlling external applications like Apache FOP. Since FOP runs on the Java Runtime Environment (JRE) we will actually be starting the JRE from .NET. This is kind of funny since the two platforms are considered to be rivals.

Let's assume that I have downloaded and installed both Apache FOP and the JRE. FOP is in the C:\Fop-1.0 folder on my Windows machine and the JRE is in C:\Program Files\Java\jre6. First we make sure that FOP will be able to find the JRE. Find the line in C:\Fop-1.0\fop.bat that starts with 'set LOCAL_FOP_HOME=' and add the following line before it:
set JAVA_HOME=C:\Program Files\Java\jre6

We can pass the following options to the fop.bat file:
-c <path-to-configuration-file> -fo <path-to-xsl-fo-file> <path-to-pdf-output-file>

We assume that all input files are in C:\Fop-1.0. We now create an instance of the ProcessStartInfo class in our .NET application to specify the same command-line arguments:
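// Requires: using System.Diagnostics;
// Verbatim strings (@"...") keep the backslashes in the Windows paths literal.
ProcessStartInfo starter = new ProcessStartInfo(@"C:\Fop-1.0\fop.bat",
  @"-c C:\Fop-1.0\fop.conf -fo C:\Fop-1.0\Hello-world.fo C:\Fop-1.0\Hello-world.pdf");
starter.CreateNoWindow = true;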
starter.UseShellExecute = false;
starter.RedirectStandardError = true;
starter.RedirectStandardOutput = true;
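The argument string above only works because none of the paths contain spaces. A minimal sketch of a helper that quotes each path could look like this. FopCommand and its methods are hypothetical names, and the simple double-quoting assumes that the paths contain no quote characters themselves and that fop.bat passes quoted arguments through unchanged.

```csharp
using System;

public static class FopCommand
{
    // Wraps a path in double quotes so that spaces survive command-line parsing.
    // Assumption: the path itself contains no double quotes.
    public static string Quote(string path)
    {
        return "\"" + path + "\"";
    }

    // Builds the same option layout as shown above: -c <config> -fo <input> <output>.
    public static string BuildArguments(string configPath, string foPath, string pdfPath)
    {
        return string.Format("-c {0} -fo {1} {2}",
            Quote(configPath), Quote(foPath), Quote(pdfPath));
    }
}
```

The result of FopCommand.BuildArguments(...) can be passed as the second argument of the ProcessStartInfo constructor.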
Finally, to run the Apache FOP .bat file and create C:\Fop-1.0\Hello-world.pdf out of C:\Fop-1.0\Hello-world.fo, we start a Process:
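Process process = Process.Start(starter);
// Drain standard output in the background; a full, unread pipe would block FOP.
process.BeginOutputReadLine();
string errorMessage = process.StandardError.ReadToEnd();
process.WaitForExit();
if (!String.IsNullOrEmpty(errorMessage))
{
    // FOP wrote to standard error; the text is in errorMessage.
    // This may be a warning rather than a fatal error, so also check process.ExitCode.
}
process.Close();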
If there are no (syntax) errors in the Hello-world.fo file it will be converted to a full-blown PDF!
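For production use it can help to wrap these steps in one method that collects both output streams and reports the exit code. The following is a sketch under a few assumptions: FopRunner and Run are hypothetical names, and treating a non-zero exit code as failure assumes that fop.bat passes the exit code of the Java process through, which is worth checking for your FOP version.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class FopRunner
{
    // Runs a command and returns its exit code.
    // "log" receives every line written to standard output and standard error.
    public static int Run(string fileName, string arguments, out string log)
    {
        ProcessStartInfo starter = new ProcessStartInfo(fileName, arguments);
        starter.CreateNoWindow = true;
        starter.UseShellExecute = false;
        starter.RedirectStandardError = true;
        starter.RedirectStandardOutput = true;

        StringBuilder buffer = new StringBuilder();
        object gate = new object();
        // Both streams are read asynchronously, so neither pipe can fill up and stall the child.
        DataReceivedEventHandler collect = delegate(object sender, DataReceivedEventArgs e)
        {
            if (e.Data != null)
            {
                lock (gate) { buffer.AppendLine(e.Data); }
            }
        };

        using (Process process = new Process())
        {
            process.StartInfo = starter;
            process.OutputDataReceived += collect;
            process.ErrorDataReceived += collect;
            process.Start();
            process.BeginOutputReadLine();
            process.BeginErrorReadLine();
            // The parameterless overload also waits until the asynchronous readers have finished.
            process.WaitForExit();
            lock (gate) { log = buffer.ToString(); }
            return process.ExitCode;
        }
    }
}
```

A caller could pass the fop.bat path and the argument string from this post, then check both the returned exit code and whether the output PDF exists before treating the conversion as successful.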
In Part 2 of this post I will give more details about the FOP configuration (fop.conf) file and embedding fonts in the generated PDF.
