I recently took on a task where I had to use an HTML web form, whose back-end is PHP, to fill out a PDF form. The entire process is extremely annoying, and I was really hoping to use some kind of self-contained library that I could drop into a shared webhost for a client and be done with it. Unfortunately, at the tail end of 2022, there doesn’t seem to be any such library that handles everything. Below is my research as well as the methodology I ended up going with.
If you’d like, you can skip right to the good stuff and download a set of files here – just note that you’ll need to install PDFtk on your server to use it (spoilers?)
I tried a few different options, and unfortunately the following didn’t really work for me. Maybe this list will prevent others from going down the same rabbit holes.
As you can see, the landscape is a little bleak. Lots of projects that deal with PDFs in one way or another, some with forms in mind, but none that do exactly what I want.
My only solution was to use a system library, PDFtk, which would do the job for me. Basically, the goal is to dump the form data into a file format called FDF, and then to use the PDFtk library to combine the data from the FDF with the PDF form. The main caveat is that PDFtk needs to be installed on the server you’re running the code on. If you have sudo access to your system, this is easy, but if you don’t, your shared host might actually have it installed already (my client’s BlueHost server did). Just check with tech support.
So, the goal is as follows:
You can download and run the sample project from here. Note that you’ll need to install PDFtk – you can find more information about it here, or you can Google “PDFtk install” + your operating system of choice.
I created a sample PDF using Adobe Acrobat Pro. It’s a very simple file that contains each of the types of inputs a form can have in a PDF: a textbox, checkboxes, radio buttons, and a dropdown (called a list box within Acrobat).
Of particular note are the names of these fields, since this is how they will be identified later on. It’s easiest not to use spaces in the names.
Next, I need to make a corresponding HTML file that matches the naming scheme of the PDF file. Easy enough: check out index.html in the sample project. It’s a boilerplate HTML5 document with a bunch of input fields. See the snippet below:
...
<div>
<input type="checkbox" id="Checkbox_2_checkbox" name="Checkbox_2_checkbox">
<label for="Checkbox_2_checkbox">Checkbox 2</label>
</div>
<div>
<input type="radio" id="Radio_Group_1_radio_Choice_1" name="Radio_Group_1_radio" value="Choice_1">
<label for="Radio_Group_1_radio_Choice_1">Choice 1</label>
</div>
<div>
<input type="radio" id="Radio_Group_1_radio_Choice_2" name="Radio_Group_1_radio" value="Choice_2">
<label for="Radio_Group_1_radio_Choice_2">Choice 2</label>
</div>
...
Of note here is that each field’s name ends in _fieldtype, whether that’s _textbox, _checkbox, _radio, or _dropdown. This makes it easier to process in the next step.
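To illustrate the naming scheme, here is a minimal sketch of how a submitted key could be split into the PDF field name and its type. The function name splitFieldName() is hypothetical and not part of the sample project; it only assumes the four suffixes listed above.

```php
<?php
// Hypothetical helper: split "Radio_Group_1_radio" into
// ["Radio_Group_1", "radio"]. Returns null when the key does not end
// in one of the known _fieldtype suffixes.
function splitFieldName(string $key): ?array
{
    $knownTypes = ['textbox', 'checkbox', 'radio', 'dropdown'];

    // The type is whatever follows the last underscore.
    $pos = strrpos($key, '_');
    if ($pos === false) {
        return null;
    }

    $type = substr($key, $pos + 1);
    if (!in_array($type, $knownTypes, true)) {
        return null;
    }

    // Everything before the last underscore is the PDF field name.
    return [substr($key, 0, $pos), $type];
}
```

Splitting on the last underscore means the PDF field name itself may contain underscores, as Radio_Group_1 does.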
At this point, if we were to dump the contents of $_POST in pdfFiller.php, it would look like this:
Array
(
[Name_textbox] => Jake Binstein
[Checkbox_1_checkbox] => on
[Checkbox_2_checkbox] => off
[Radio_Group_1_radio] => Choice_1
[Dropdown_dropdown] => Option_2
)
So, what we want to do is loop through these and dump their data into an FDF file. But what format is the FDF file in? The easiest way to determine that is actually to fill out the PDF yourself, hit save, and run the following command, replacing the file names where necessary:
.\pdftk.exe .\yourPDF.pdf generate_fdf output filledOutData.fdf
That line was written for Windows (hence pdftk.exe), but otherwise should work on all platforms supported by PDFtk. The output, once cleaned up a bit, is as follows:
%FDF-1.2
1 0 obj << /FDF << /Fields [
<< /V (Jake Binstein) /T (Name) >>
<< /V /Yes /T (Checkbox_1) >>
<< /V / /T (Checkbox_2) >>
<< /V /Choice_1 /T (Radio_Group_1) >>
<< /V (Option_2) /T (Dropdown) >>
] >> >> endobj trailer
<< /Root 1 0 R >>
%%EOF
We can ignore the first 2 lines and the final 3 lines, because they are just boilerplate FDF information. We’ll be sure to include them in our code, though. The real things to focus on are the individual fields. The format appears to be, in most cases:
<< /V (Value) /T (Field Name) >>
Exceptions are checkboxes and radio buttons, which use the format /Value instead of (Value). Also note that checking a checkbox requires /Yes whereas unchecking it requires just /.
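The rules above can be sketched as a single formatting function. This is an illustration under stated assumptions, not the code from the sample project: the names fdfEscapeString() and fdfFieldLine() are hypothetical, a checked checkbox is assumed to arrive as "on" (as in the $_POST dump), and /Yes is the export value observed in this particular PDF, so other PDFs may use a different one.

```php
<?php
// Hypothetical helper: escape text for use inside an FDF "(...)" string.
function fdfEscapeString(string $value): string
{
    // Backslashes are doubled first, so the backslashes added in front
    // of parentheses afterwards are not doubled again.
    return str_replace(['\\', '(', ')'], ['\\\\', '\\(', '\\)'], $value);
}

// Hypothetical helper: build one "<< /V ... /T (...) >>" row.
// $type is one of: textbox, checkbox, radio, dropdown.
function fdfFieldLine(string $name, string $type, string $value): string
{
    switch ($type) {
        case 'checkbox':
            // Checked is /Yes, unchecked is a bare slash.
            $v = ($value === 'on') ? '/Yes' : '/';
            break;
        case 'radio':
            // Radio values use the /Value form. Assumption: values only
            // contain letters, digits and underscores; anything else is dropped.
            $v = '/' . preg_replace('/[^A-Za-z0-9_]/', '', $value);
            break;
        default:
            // Textboxes and dropdowns use the (Value) form.
            $v = '(' . fdfEscapeString($value) . ')';
    }

    return '<< /V ' . $v . ' /T (' . fdfEscapeString($name) . ') >>';
}
```

Called with the sample data, fdfFieldLine('Radio_Group_1', 'radio', 'Choice_1') produces the same row as the generate_fdf output shown earlier.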
With that in mind, we can quickly create a basic PHP file that will output an FDF file.
What we need to do now is loop through the different key/value pairs passed to us in $_POST, and depending on the type of field, add a string to the FDF data in the correct format. Below is an example regarding a textbox, but the code files I provide cover all of the field types.
// Loop through the $_POST data, creating a new row in the FDF file for each key/value pair
$fdf = "";
foreach($_POST as $key => $value) {
    // If the user filled nothing in the field, like a text field, just skip it.
    // Note that if the PDF you provide already has text in it by default,
    // doing this will leave the text as-is.
    // If you prefer to remove the text, you should remove the lines below so you
    // overwrite the text with nothing.
    if($value == "") {
        continue;
    }
    // Figure out what kind of field it is by its name,
    // which should be in the format name_fieldtype.
    // Textbox
    if(stringEndsWith($key, "_textbox")) {
        $key = str_replace("_textbox", "", $key);
        // Format:
        // << /V (Text) /T (Fieldname) >>
        // Backslashes in the value are encoded as double backslashes
        $value = str_replace("\\", "\\\\", $value);
        // Parentheses are encoded using \'s in front
        $value = str_replace("(", "\(", $value);
        $value = str_replace(")", "\)", $value);
        $fdf .= "<< /V (" . $value . ")" . " /T (" . $key . ") >>" . "\r\n";
    }
    // Checkbox
    else if(stringEndsWith($key, "_checkbox")) { ... }
    // Radio Button
    else if(stringEndsWith($key, "_radio")) { ... }
    // Dropdown
    else if(stringEndsWith($key, "_dropdown")) { ... }
}
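Note that stringEndsWith() is not a PHP built-in, and its definition is not shown in the excerpt above. A minimal sketch of what such a helper might look like follows; this is an assumption about its behavior, not necessarily the version in the sample project. On PHP 8 and later, the built-in str_ends_with() does the same job.

```php
<?php
// Sketch of a suffix check. Returns true when $haystack ends with $needle.
function stringEndsWith(string $haystack, string $needle): bool
{
    // Prefer the built-in where it exists (PHP 8+).
    if (function_exists('str_ends_with')) {
        return str_ends_with($haystack, $needle);
    }

    // Every string ends with the empty string.
    if ($needle === '') {
        return true;
    }

    // Compare the last strlen($needle) characters against the needle.
    return substr($haystack, -strlen($needle)) === $needle;
}
```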
Once we have our FDF data stored safely in $fdf, we’ll need to output it to a file. I created a folder named output; you may need to change the permissions or ownership of this folder in order to allow PHP to write to it. In addition to the text in $fdf, we’ll also need the boilerplate information that I mentioned above. I hid those away in functions to make things a bit cleaner. Finally, we’ll need a filename. Because I don’t want the file to overwrite itself every time the script is run, I used a timestamp in the name of the file. You may want to add other unique elements to the filename (like data from the form), or some extra checks to make sure that no such file exists already.
// Set location for FDF and PDF files
$outputLocation = "output/";
// Dump FDF data to file
$timestamp = time();
$outputFDF = $outputLocation . $timestamp . ".fdf";
$outputPDF = $outputLocation . $timestamp . ".pdf";
file_put_contents($outputFDF, $fdf);
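The snippet above does not show the boilerplate being added or the "extra checks" on the filename. One possible sketch of both follows. The function names fdfHeader(), fdfFooter(), buildFdfDocument(), and uniqueBaseName() are hypothetical, and the header and footer text is copied from the generate_fdf output shown earlier.

```php
<?php
// Hypothetical helpers wrapping the field rows in the FDF boilerplate.
function fdfHeader(): string
{
    return "%FDF-1.2\r\n1 0 obj << /FDF << /Fields [\r\n";
}

function fdfFooter(): string
{
    return "] >> >> endobj trailer\r\n<< /Root 1 0 R >>\r\n%%EOF\r\n";
}

function buildFdfDocument(string $fieldRows): string
{
    return fdfHeader() . $fieldRows . fdfFooter();
}

// Hypothetical helper: pick a base name (without extension) inside $dir
// that is not already taken by an .fdf or .pdf file. A random suffix is
// added so two submissions in the same second do not collide.
function uniqueBaseName(string $dir): string
{
    do {
        $base = $dir . time() . '_' . bin2hex(random_bytes(4));
    } while (file_exists($base . '.fdf') || file_exists($base . '.pdf'));

    return $base;
}
```

With these in place, the write step could become file_put_contents($base . ".fdf", buildFdfDocument($fdf)), where $base comes from uniqueBaseName("output/").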
Finally, we get to the good part – using PDFtk to combine this FDF form data with the original PDF and outputting a new PDF. If you haven’t yet installed PDFtk, now’s the time – you can find more information about it here, or you can Google “PDFtk install” + your operating system of choice.
The shell command for PDFtk is as follows:
pdftk originalForm.pdf fill_form formData.fdf output filledFormWithData.pdf
Roughly translated, this means: with the PDFtk program, use the file originalForm.pdf. We want to fill that form, and specifically we want to fill it using the FDF data in formData.fdf. Once the form is filled, save its output in filledFormWithData.pdf.
In PHP, we can execute this shell command by way of the exec() function. Finally, to make debugging a bit easier, I output not only where the PDF is on the server, but also a link back home (to easily fill the form out again), and I embed the new PDF as an iframe so it’s easy to see. Any of this can be easily customized.
You should be very careful any time you use the exec() function, as it is executing a shell script on the webserver itself. This may also require additional permissions from your webhost’s tech support if they have configured PHP to not allow the exec() function for security reasons.
// Location of original PDF form
$pdfLocation = "Example.pdf";
// Generate the PDF
exec("pdftk " . $pdfLocation . " fill_form " . $outputFDF . " output " . $outputPDF);
echo "<p>Done! Your application will be reviewed shortly.</p>";
echo "<p>It is stored in: " . $outputPDF . "</p>";
echo "<p><br/><a href='/'>Home</a></p>";
echo "<iframe src='" . $outputPDF . "' width='100%' height='100%'></iframe>";
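Given the warning about exec(), a slightly more defensive variant is sketched below. It quotes each path with escapeshellarg() and checks PDFtk's exit code, which also helps when the FDF is written but no PDF appears. The function names buildPdftkCommand() and runPdftk() are hypothetical, and the sketch assumes pdftk is on the web server's PATH.

```php
<?php
// Hypothetical helper: build the fill_form command with every path quoted,
// so unusual characters in a filename cannot alter the shell command.
function buildPdftkCommand(string $pdf, string $fdf, string $out): string
{
    return 'pdftk ' . escapeshellarg($pdf)
        . ' fill_form ' . escapeshellarg($fdf)
        . ' output ' . escapeshellarg($out)
        . ' 2>&1'; // also capture error messages
}

// Hypothetical helper: run the command and report whether a PDF was produced.
function runPdftk(string $pdf, string $fdf, string $out): bool
{
    $lines = [];
    $exitCode = 0;
    exec(buildPdftkCommand($pdf, $fdf, $out), $lines, $exitCode);

    if ($exitCode !== 0) {
        // Log whatever PDFtk (or the shell) printed, for debugging.
        error_log('pdftk failed (' . $exitCode . '): ' . implode("\n", $lines));
        return false;
    }

    return file_exists($out);
}
```

A call such as runPdftk($pdfLocation, $outputFDF, $outputPDF) could then replace the bare exec() line, with the success message echoed only when it returns true.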
Although it’s unfortunate that there’s no pure PHP, drop-in solution, I found PDFtk to be very easy to work with. Check out the sample project here and let me know what you think. I rearranged the code a bit to make it read nicely, and over-commented so it’s very obvious what I’m doing.
This tutorial is magnificent
I wish FPDM was updated to work with PHP 8.2, as all I am trying to do is fill out existing fields in a PDF and flatten it.
I spent a while trying different solutions, and what I found that fits my needs is a PHP library based around the PDF Toolkit (pdftk) used in this tutorial.
https://github.com/mikehaertl/php-pdftk
If you don’t want to execute commands from PHP like myself, give this library a shot. There may be better solutions as more libraries are updated to work with PHP 8.2, but I found this the best for now. The only issue it has, and this may be a deal breaker for some people, is if you need UTF-8 encoding for non-ASCII characters, it may not be able to flatten PDFs correctly.
I asked my support guys (hosting company) to install PDFtk, got confirmation from them, then uploaded your Sample Project AS IS (https://core.fragolan.com/pdf), updated permissions to the output/ folder to 777… for some reason I just get the FDF file but not the PDF file… ideas?
I see the PDF file at that link just fine!
Do you know how you can replace an image form field in a PDF using this application?
I can replace text easy enough but can’t get an image form field to accept an image from my scripts.