Build a Simple, Interactive Web-Based PDF Merger with Node.js

Compiling reports, merging invoices, and organizing research papers all come down to the same task: combining several PDF documents into a single file. This tutorial walks you through building a simple, interactive web-based PDF merger using Node.js, Express, multer, and pdf-lib. The project suits beginners and intermediate developers who want to expand their Node.js knowledge by building a practical tool.

Why Build a PDF Merger?

While various online PDF merging tools exist, building your own offers several advantages:

  • Learning Experience: You’ll gain hands-on experience with Node.js, Express, and key libraries for file handling and PDF manipulation.
  • Customization: You can tailor the tool to your specific needs, adding features or functionalities that aren’t available in standard tools.
  • Privacy: You control your data. Your files stay on a server you run instead of being uploaded to a third-party service.
  • Efficiency: Automating the process can save time and effort, especially when dealing with frequent PDF merging tasks.

Prerequisites

Before we begin, ensure you have the following installed on your system:

  • Node.js and npm: Node.js is the JavaScript runtime environment, and npm (Node Package Manager) is used to manage project dependencies. You can download them from nodejs.org.
  • A Code Editor: A code editor like Visual Studio Code, Sublime Text, or Atom will make writing and managing your code easier.

Project Setup

Let’s set up our project directory and install the necessary dependencies.

1. Create a Project Directory

Open your terminal or command prompt and create a new directory for your project:

mkdir pdf-merger-app
cd pdf-merger-app

2. Initialize npm

Initialize a new Node.js project using npm. This will create a package.json file to manage your project’s dependencies.

npm init -y

The -y flag accepts the default settings for the project initialization.

3. Install Dependencies

We’ll need the following dependencies:

  • express: A web application framework for Node.js.
  • multer: Middleware for handling multipart/form-data, primarily used for uploading files.
  • pdf-lib: A library for creating and modifying PDF files.

Install these dependencies using npm:

npm install express multer pdf-lib

Building the Server-Side Application (Node.js & Express)

Now, let’s build the server-side application that will handle file uploads, PDF merging, and serving the merged PDF.

1. Create the Server File

Create a file named app.js (or server.js) in your project directory. This file will contain the core server logic.

2. Import Modules and Configure Express

Add the following code to app.js:

const express = require('express');
const multer = require('multer');
const { PDFDocument } = require('pdf-lib');
const fs = require('fs').promises; // Use fs.promises for async file operations
const path = require('path');

const app = express();
const port = 3000; // Or any port you prefer

// Configure multer for file uploads
const storage = multer.diskStorage({
  destination: (req, file, cb) => {
    cb(null, 'uploads/'); // Store uploaded files in an 'uploads' directory
  },
  filename: (req, file, cb) => {
    cb(null, Date.now() + '-' + file.originalname); // Generate unique filenames
  },
});

const upload = multer({ storage: storage });

// Create the 'uploads' directory if it doesn't exist
fs.mkdir('uploads', { recursive: true })
  .then(() => {
    console.log('Uploads directory created or already exists.');
  })
  .catch(err => {
    console.error('Error creating uploads directory:', err);
  });


app.use(express.static('public')); // Serve static files (HTML, CSS, JS) from the 'public' directory

Explanation:

  • We import the necessary modules: express for creating the web server, multer for handling file uploads, pdf-lib for PDF manipulation, fs.promises for asynchronous file system operations, and path for working with file paths.
  • We create an Express application instance (app) and define the port on which the server will listen (port).
  • We configure multer to handle file uploads. The diskStorage option specifies where to store uploaded files (in an ‘uploads’ directory) and how to name them.
  • The `fs.mkdir` call creates the ‘uploads’ directory. With `{ recursive: true }`, any missing parent directories are created as well, and the call does not fail if the directory already exists.
  • `app.use(express.static('public'))` serves static files (like your HTML, CSS, and JavaScript) from a ‘public’ directory.
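The filename callback above relies on Date.now() for uniqueness and passes the client-supplied name through unchanged. As a minimal sketch of a stricter alternative, the hypothetical helper below (safeUploadName is not part of multer) strips any directory parts, replaces unusual characters, and prefixes a random UUID. It assumes a Node.js version that provides crypto.randomUUID().

```javascript
const path = require('path');
const crypto = require('crypto');

// Hypothetical helper (not part of multer): builds a collision-resistant,
// filesystem-safe name from the client-supplied original filename.
function safeUploadName(originalName) {
  // win32.basename treats both "/" and "\" as separators, so directory
  // parts are stripped regardless of the client's operating system.
  const base = path.win32.basename(String(originalName));
  // Replace anything outside a conservative whitelist with "_".
  const cleaned = base.replace(/[^A-Za-z0-9._-]/g, '_');
  return `${crypto.randomUUID()}-${cleaned}`;
}

// Inside multer.diskStorage this could be used as:
//   filename: (req, file, cb) => cb(null, safeUploadName(file.originalname)),
console.log(safeUploadName('..\\..\\evil report.pdf'));
```

The random prefix avoids the case where two uploads with the same name arrive within the same millisecond, which a timestamp alone cannot distinguish.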

3. Create the Upload Route

Add the following code to app.js to handle file uploads and PDF merging:


app.post('/merge', upload.array('pdfFiles', 10), async (req, res) => {
  try {
    if (!req.files || req.files.length === 0) {
      return res.status(400).send('No files were uploaded.');
    }

    const pdfDoc = await PDFDocument.create();

    for (const file of req.files) {
      const pdfBytes = await fs.readFile(file.path);
      const pdf = await PDFDocument.load(pdfBytes);
      const copiedPages = await pdfDoc.copyPages(pdf, pdf.getPageIndices());
      copiedPages.forEach(page => pdfDoc.addPage(page));
    }

    const mergedPdfBytes = await pdfDoc.save();

    // Set headers for the response to trigger a download
    res.setHeader('Content-Type', 'application/pdf');
    res.setHeader('Content-Disposition', 'attachment; filename=merged.pdf');

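    // save() returns a Uint8Array; wrap it in a Buffer so that Express 4
    // sends the raw bytes instead of serializing the array as JSON
    res.send(Buffer.from(mergedPdfBytes));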

  } catch (error) {
    console.error('Error merging PDFs:', error);
    res.status(500).send('Error merging PDFs.');
  } finally {
    // Clean up uploaded files whether merging succeeded or failed
    (req.files || []).forEach(file => {
      fs.unlink(file.path).catch(err => console.error('Error deleting file:', err));
    });
  }
});

Explanation:

  • We define a POST route at /merge.
  • We use upload.array('pdfFiles', 10) to handle the file uploads. The ‘pdfFiles’ argument must match the name attribute of the file input in your HTML form. The second argument, 10, caps the number of files accepted per request.
  • Inside the route handler, we check if any files were uploaded. If not, we return an error.
  • We create a new PDF document using PDFDocument.create().
  • We loop through the uploaded files (req.files). For each file:
      • We read the file’s contents into a buffer using fs.readFile().
      • We load the PDF from the buffer using PDFDocument.load().
      • We copy all the pages from the loaded PDF using pdfDoc.copyPages().
      • We add the copied pages to the merged PDF using pdfDoc.addPage().
  • After processing all files, we save the merged PDF to a byte array using pdfDoc.save().
  • We set the appropriate headers in the response to indicate that we’re sending a PDF file and to prompt the user to download it with a filename of “merged.pdf”.
  • We send the merged PDF bytes as the response.
  • We include an optional cleanup step to delete the uploaded files from the ‘uploads’ directory after merging.
  • We include error handling to catch any exceptions and return an appropriate error response.
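The route trusts that every uploaded file really is a PDF, and an accept attribute on a file input only filters what the browser's file picker shows. One possible server-side check, sketched below, looks for the "%PDF-" signature at the start of the file before handing the bytes to PDFDocument.load(). The helper name looksLikePdf is an assumption for illustration, and the check is deliberately strict: a file with stray bytes before the header, which some viewers tolerate, would be rejected.

```javascript
// Hypothetical helper: returns true when the buffer starts with the
// "%PDF-" signature that PDF files are expected to begin with.
function looksLikePdf(buffer) {
  const signature = Buffer.from('%PDF-', 'latin1');
  return buffer.length >= signature.length &&
    buffer.subarray(0, signature.length).equals(signature);
}

console.log(looksLikePdf(Buffer.from('%PDF-1.7\n')));              // true
console.log(looksLikePdf(Buffer.from('<html>not a pdf</html>')));  // false
```

In the loop, this could run right after fs.readFile(), returning a 400 response when the check fails instead of letting the load call throw.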

4. Start the Server

Add the following code to the end of app.js to start the server:

app.listen(port, () => {
  console.log(`Server listening on port ${port}`);
});

This code starts the Express server and listens for incoming requests on the specified port.

5. Complete app.js

Here’s the complete app.js file:

const express = require('express');
const multer = require('multer');
const { PDFDocument } = require('pdf-lib');
const fs = require('fs').promises; // Use fs.promises for async file operations
const path = require('path');

const app = express();
const port = 3000; // Or any port you prefer

// Configure multer for file uploads
const storage = multer.diskStorage({
  destination: (req, file, cb) => {
    cb(null, 'uploads/'); // Store uploaded files in an 'uploads' directory
  },
  filename: (req, file, cb) => {
    cb(null, Date.now() + '-' + file.originalname); // Generate unique filenames
  },
});

const upload = multer({ storage: storage });

// Create the 'uploads' directory if it doesn't exist
fs.mkdir('uploads', { recursive: true })
  .then(() => {
    console.log('Uploads directory created or already exists.');
  })
  .catch(err => {
    console.error('Error creating uploads directory:', err);
  });


app.use(express.static('public')); // Serve static files (HTML, CSS, JS) from the 'public' directory

app.post('/merge', upload.array('pdfFiles', 10), async (req, res) => {
  try {
    if (!req.files || req.files.length === 0) {
      return res.status(400).send('No files were uploaded.');
    }

    const pdfDoc = await PDFDocument.create();

    for (const file of req.files) {
      const pdfBytes = await fs.readFile(file.path);
      const pdf = await PDFDocument.load(pdfBytes);
      const copiedPages = await pdfDoc.copyPages(pdf, pdf.getPageIndices());
      copiedPages.forEach(page => pdfDoc.addPage(page));
    }

    const mergedPdfBytes = await pdfDoc.save();

    // Set headers for the response to trigger a download
    res.setHeader('Content-Type', 'application/pdf');
    res.setHeader('Content-Disposition', 'attachment; filename=merged.pdf');

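    // save() returns a Uint8Array; wrap it in a Buffer so that Express 4
    // sends the raw bytes instead of serializing the array as JSON
    res.send(Buffer.from(mergedPdfBytes));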

  } catch (error) {
    console.error('Error merging PDFs:', error);
    res.status(500).send('Error merging PDFs.');
  } finally {
    // Clean up uploaded files whether merging succeeded or failed
    (req.files || []).forEach(file => {
      fs.unlink(file.path).catch(err => console.error('Error deleting file:', err));
    });
  }
});

app.listen(port, () => {
  console.log(`Server listening on port ${port}`);
});

Building the Client-Side Application (HTML & JavaScript)

Now, let’s create the HTML and JavaScript code for the user interface.

1. Create the HTML File

Create a directory named public in your project directory. Inside the public directory, create an HTML file named index.html. This file will contain the form for uploading PDF files.

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>PDF Merger</title>
  <style>
    body {
      font-family: sans-serif;
      margin: 20px;
    }
    input[type="file"] {
      margin-bottom: 10px;
    }
    button {
      padding: 10px 20px;
      background-color: #4CAF50;
      color: white;
      border: none;
      cursor: pointer;
    }
    #status {
      margin-top: 10px;
      font-weight: bold;
    }
  </style>
</head>
<body>
  <h2>PDF Merger</h2>
  <form id="mergeForm" enctype="multipart/form-data">
    <input type="file" name="pdfFiles" multiple accept=".pdf"><br>
    <button type="submit">Merge PDFs</button>
  </form>
  <div id="status"></div>
  <script>
    const form = document.getElementById('mergeForm');
    const status = document.getElementById('status');

    form.addEventListener('submit', async (event) => {
      event.preventDefault();
      status.textContent = 'Merging...';

      const formData = new FormData(form);

      try {
        const response = await fetch('/merge', {
          method: 'POST',
          body: formData,
        });

        if (!response.ok) {
          throw new Error(`HTTP error! status: ${response.status}`);
        }

        const blob = await response.blob();
        const url = window.URL.createObjectURL(blob);
        const a = document.createElement('a');
        a.href = url;
        a.download = 'merged.pdf';
        document.body.appendChild(a);
        a.click();
        document.body.removeChild(a);
        window.URL.revokeObjectURL(url);

        status.textContent = 'Merged successfully! Downloading...';

      } catch (error) {
        console.error('Error merging PDFs:', error);
        status.textContent = `Error: ${error.message}`;
      }
    });
  </script>
</body>
</html>

Explanation:

  • We create a basic HTML structure with a title and some styling.
  • We include a form with the enctype="multipart/form-data" attribute, which is essential for uploading files.
  • The form includes a file input (<input type="file" name="pdfFiles" multiple accept=".pdf">) that allows the user to select multiple PDF files. The multiple attribute enables multiple file selection, and the accept=".pdf" attribute restricts the file selection to PDF files.
  • A submit button triggers the merging process.
  • A div element with the ID “status” is used to display status messages to the user.
  • The JavaScript code listens for the form’s submit event.
  • When the form is submitted:
      • We prevent the default form submission behavior (page reload).
      • We create a FormData object to collect the form data, including the uploaded files.
      • We use the fetch API to send a POST request to the /merge endpoint.
      • We handle the response:
          • If the request is successful (status code 200-299), we create a blob from the response, create a download link, and trigger the download.
          • If there’s an error, we display an error message in the status div.
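The submit handler sends whatever the user selected and only learns about problems from the server's response. A small check before calling fetch can give faster feedback. The sketch below is one way to do that; validateSelection is a hypothetical helper, the limit of 10 files mirrors upload.array('pdfFiles', 10) on the server, and the 10 MB per-file cap is an assumption chosen for illustration.

```javascript
// Hypothetical limits: 10 files mirrors upload.array('pdfFiles', 10) on the
// server; the 10 MB per-file cap is an assumption for illustration.
const MAX_FILES = 10;
const MAX_BYTES = 10 * 1024 * 1024;

// Accepts any array-like of objects with `name` and `size` (such as a
// FileList) and returns a list of problems; an empty list means "ok".
function validateSelection(files) {
  const list = Array.from(files);
  const problems = [];
  if (list.length === 0) problems.push('Select at least one PDF file.');
  if (list.length > MAX_FILES) problems.push(`Select at most ${MAX_FILES} files.`);
  for (const file of list) {
    if (!file.name.toLowerCase().endsWith('.pdf')) {
      problems.push(`${file.name} is not a .pdf file.`);
    }
    if (file.size > MAX_BYTES) {
      problems.push(`${file.name} is larger than 10 MB.`);
    }
  }
  return problems;
}

// In the submit handler this might run before fetch():
//   const problems = validateSelection(form.elements.pdfFiles.files);
//   if (problems.length > 0) { status.textContent = problems.join(' '); return; }
```

Client-side checks like this improve the experience but can be bypassed, so the server still needs its own limits.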

2. Create a Simple CSS File (Optional)

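To keep the styles out of the markup, you can move them into a separate file. Create a file named style.css in the public directory and add the following CSS (the same rules as the inline <style> block in index.html):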

body {
  font-family: sans-serif;
  margin: 20px;
}

input[type="file"] {
  margin-bottom: 10px;
}

button {
  padding: 10px 20px;
  background-color: #4CAF50;
  color: white;
  border: none;
  cursor: pointer;
}

#status {
  margin-top: 10px;
  font-weight: bold;
}

Then link this CSS file within the <head> section of index.html and delete the inline <style> block, which would otherwise duplicate these rules:

<link rel="stylesheet" href="style.css">

Running the Application

Now, let’s run the application and test it.

1. Start the Server

Open your terminal, navigate to your project directory (pdf-merger-app), and run the following command to start the Node.js server:

node app.js

You should see a message in the console indicating that the server is running (e.g., “Server listening on port 3000”).

2. Access the Application in Your Browser

Open your web browser and go to http://localhost:3000. You should see the PDF merger application’s user interface.

3. Test the Application

Click the “Choose Files” button and select one or more PDF files. Then, click the “Merge PDFs” button. After a short processing time, the merged PDF file should automatically download to your computer.

Common Mistakes and Troubleshooting

Here are some common mistakes and how to fix them:

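  • Incorrect File Paths: Double-check the paths used in multer.diskStorage and fs.readFile. Relative paths such as 'uploads/' are resolved against the current working directory of the Node.js process, not the location of app.js, so start the server from the project directory.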
  • Missing Dependencies: Make sure you’ve installed all the required dependencies using npm (express, multer, and pdf-lib).
  • CORS Issues: If the page that calls /merge is served from a different origin than the server, you might encounter Cross-Origin Resource Sharing (CORS) issues. This does not happen when index.html is served by the same Express app, as in this tutorial. To allow other origins, use the cors middleware. Install it with npm install cors, then add the following after the line that creates app in app.js: const cors = require('cors'); app.use(cors());
  • Incorrect Form Encoding: Remember to include enctype="multipart/form-data" in your HTML form.
  • File Size Limits: By default, multer does not limit file size, so a single large upload can fill your disk. You can set a limit in the multer options, and uploads that exceed it fail with an error. For example, to set a maximum file size of 10MB:
const upload = multer({
  storage: storage,
  limits: {
    fileSize: 10 * 1024 * 1024, // 10MB
  },
});
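  • Error Handling: Errors raised by multer (for example, when a limit is exceeded) occur before the route handler runs, so the try/catch inside the /merge handler does not see them. Handle them in an Express error-handling middleware, and keep the try/catch for PDF processing and file system errors.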
  • Permissions Issues: Ensure the Node.js process has the necessary permissions to read and write files in the ‘uploads’ directory.
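When multer rejects an upload, the error it passes along carries a code property. As a sketch of how those errors might be turned into useful responses, the hypothetical helper below maps a few of the codes used by multer's MulterError to an HTTP status and message. The status codes and wording are assumptions, not something multer prescribes.

```javascript
// Hypothetical helper: maps an upload error to an HTTP status and message.
// The codes below are ones multer's MulterError uses; anything else is
// treated as a generic server error.
function describeUploadError(err) {
  switch (err && err.code) {
    case 'LIMIT_FILE_SIZE':
      return { status: 413, message: 'One of the files is too large.' };
    case 'LIMIT_FILE_COUNT':
    case 'LIMIT_UNEXPECTED_FILE':
      return { status: 400, message: 'Too many files, or an unexpected form field.' };
    default:
      return { status: 500, message: 'Upload failed.' };
  }
}

// Registered after the routes, an Express error handler could use it like this:
//   app.use((err, req, res, next) => {
//     const { status, message } = describeUploadError(err);
//     res.status(status).send(message);
//   });
```

Because the client script displays the HTTP status on failure, distinct status codes already make the cause easier to spot in the status area.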

Key Takeaways

  • You learned how to set up a basic Node.js and Express server.
  • You gained experience using multer for handling file uploads.
  • You learned how to use pdf-lib to read, merge, and save PDF files.
  • You built a simple and practical web application.

FAQ

1. Can I merge PDFs from URLs instead of uploading them?

Yes, you can modify the code to fetch PDFs from URLs using the fetch function built into Node.js 18 and later, a library like node-fetch, or the built-in http and https modules. You would need to add a way for the user to input the URLs, download each PDF into a buffer, and then pass those buffers to pdf-lib in place of the uploaded files.
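Before the server downloads anything, the user-supplied URLs should be checked. The sketch below shows one conservative approach: a hypothetical parsePdfUrls helper that accepts newline-separated input and keeps only well-formed http and https URLs. The input format is an assumption, and this check alone does not stop a URL from pointing at a host inside your own network, so a real deployment would need further restrictions.

```javascript
// Hypothetical helper: takes newline-separated user input and returns only
// well-formed http(s) URLs, dropping everything else.
function parsePdfUrls(text) {
  const urls = [];
  for (const line of text.split(/\r?\n/)) {
    const candidate = line.trim();
    if (candidate === '') continue;
    try {
      const url = new URL(candidate);
      if (url.protocol === 'http:' || url.protocol === 'https:') {
        urls.push(url.href);
      }
    } catch {
      // new URL() throws on malformed input; skip that line.
    }
  }
  return urls;
}

console.log(parsePdfUrls('https://example.com/a.pdf\nfile:///etc/passwd\nnot a url'));
// [ 'https://example.com/a.pdf' ]
```

Each accepted URL could then be downloaded into a buffer and passed to PDFDocument.load(), just as the route does with the bytes from fs.readFile().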

2. How can I add a progress bar to show the merging progress?

You can implement a progress bar by sending progress updates from the server to the client using Server-Sent Events (SSE) or WebSockets. As each PDF is processed, the server can send an event to the client to update the progress bar’s state.
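For the Server-Sent Events route, the wire format is plain text: each message is a few "field: value" lines followed by a blank line. A minimal sketch of a formatter is shown below. The helper name formatSseEvent, the event name progress, and the payload shape are all assumptions for illustration.

```javascript
// Hypothetical helper: serializes one Server-Sent Events message.
// Each message is a block of "field: value" lines ended by a blank line.
function formatSseEvent(eventName, payload) {
  return `event: ${eventName}\ndata: ${JSON.stringify(payload)}\n\n`;
}

// In an Express handler that has set Content-Type to text/event-stream,
// each processed file could emit one message:
//   res.write(formatSseEvent('progress', { done: i + 1, total: req.files.length }));
console.log(JSON.stringify(formatSseEvent('progress', { done: 1, total: 3 })));
```

On the client, an EventSource listening for the progress event could parse event.data as JSON and update the bar. Since the merged PDF is returned as the response to the POST request, the progress stream would need its own endpoint.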

3. How can I add more advanced features like page selection or PDF optimization?

You can extend the functionality by adding features like:

  • Page Selection: Allow users to specify which pages to merge from each PDF. You’ll need to modify the HTML form to include input fields for page ranges.
  • PDF Optimization: Reduce the file size of the merged PDF. pdf-lib itself offers little control over compression, so this usually means passing the output through a dedicated PDF optimization tool.
  • Password Protection: Add functionality to set passwords for the output PDF.
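For page selection, the user's input has to be converted into the zero-based indices that copyPages() takes. The sketch below assumes a syntax like "1-3,5" with 1-based page numbers; both the syntax and the helper name parsePageRange are assumptions for illustration.

```javascript
// Hypothetical helper: converts a 1-based range string such as "1-3,5" into
// sorted, de-duplicated 0-based page indices, the form copyPages() expects.
function parsePageRange(spec, pageCount) {
  const indices = new Set();
  for (const part of spec.split(',')) {
    const match = part.trim().match(/^(\d+)(?:-(\d+))?$/);
    if (!match) throw new Error(`Invalid page range: "${part.trim()}"`);
    const start = Number(match[1]);
    const end = match[2] === undefined ? start : Number(match[2]);
    if (start < 1 || end < start || end > pageCount) {
      throw new Error(`Page range out of bounds: "${part.trim()}"`);
    }
    for (let page = start; page <= end; page++) indices.add(page - 1);
  }
  return [...indices].sort((a, b) => a - b);
}

console.log(parsePageRange('1-3,5', 10)); // [ 0, 1, 2, 4 ]
```

In the merge loop, the result could replace pdf.getPageIndices(), for example pdfDoc.copyPages(pdf, parsePageRange(spec, pdf.getPageCount())), where spec is the range the user entered for that file.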

4. Why am I getting an error related to “Cannot find module ‘pdf-lib’”?

This typically means that you haven’t installed the pdf-lib package correctly. Make sure you’ve run npm install pdf-lib in your project directory.

5. How can I deploy this application to the cloud?

You can deploy this application to cloud platforms like Heroku, AWS, Google Cloud, or Azure. You’ll need to configure the deployment process to include your Node.js application, dependencies, and any necessary environment variables. Each platform has its own specific instructions for deployment.
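One environment variable that often matters is the port. Platforms such as Heroku tell the application which port to listen on through PORT, so a hard-coded 3000 may not work there. The sketch below shows one way to read it, with a fallback to the tutorial's default; the helper name resolvePort is an assumption.

```javascript
// Hypothetical helper: reads the port from an environment object, falling
// back to 3000 (the value used in this tutorial) when it is missing or invalid.
function resolvePort(env) {
  const parsed = Number.parseInt(env.PORT, 10);
  return Number.isInteger(parsed) && parsed > 0 && parsed < 65536 ? parsed : 3000;
}

const port = resolvePort(process.env);
console.log(`Would listen on port ${port}`);
```

In app.js, this would replace the const port = 3000 line, leaving the app.listen(port, ...) call unchanged.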

Building a PDF merger is a great way to learn and apply Node.js skills. This project provides a solid foundation for understanding file handling, server-side development, and PDF manipulation. By following this tutorial and experimenting with the code, you can build a useful tool and enhance your development abilities. Remember to practice, experiment, and don’t be afraid to explore the many possibilities that Node.js offers.