cepedac
committed
# Split-pdf
Split PDF files: the program reads every page, detects the pages that carry a QR code (each QR code contains a JSON object), and creates one PDF file from the pages between two consecutive QR-code pages.
The QR codes should be in the top-left quarter of your pages.
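
To illustrate the placement constraint, here is a minimal sketch of how the top-left quarter of a rendered page could be isolated before QR detection. It only uses the standard `java.awt.image` API. The class name `QrRegion` and the method `topLeftQuarter` are hypothetical and not taken from this project, and it is an assumption that the program crops pages this way.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of the project: crops a rendered page
// to the region where the QR code is expected.
final class QrRegion {
    private QrRegion() {}

    // Returns a view of the top-left quarter of the page image
    // (left half of the width, top half of the height).
    static BufferedImage topLeftQuarter(BufferedImage page) {
        int width = Math.max(1, page.getWidth() / 2);
        int height = Math.max(1, page.getHeight() / 2);
        return page.getSubimage(0, 0, width, height);
    }
}
```

A QR code printed outside this region would not be visible in the cropped image, which is why its position on the page matters.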
The name of each split PDF file is built from the values in the JSON object of the preceding QR code.
The new files are grouped into sub-directories according to one configurable attribute of that object.
## Compile the project
Install Maven, then go to the project directory and run:
``` bash
mvn clean install
```
## Run the project
With Maven, from the compiled project directory:
```
mvn exec:java -Dexec.mainClass="fr.limos.splitpdf.SplitQRCodePages" -Dexec.args="[arg1] [arg2]"
```
with:
- arg1: input PDF file or input directory (all the PDF files in the directory and its sub-directories are read recursively)
- arg2: output directory
- arg3 (optional): JSON array of strings listing the JSON attributes expected in the QR codes, default = ["student","model","class"]
- arg4 (optional): index of the attribute used to group the split PDF files, possible values: 0 to size-1 (where size is the length of the arg3 list), default = 2 (for "class")
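
The relation between arg3 and arg4 can be sketched as follows. This is only an illustration of the rule described above (defaults and valid index range), not the project's actual code. The class `GroupingOptions` and its method are hypothetical names.

```java
import java.util.List;

// Hypothetical helper, not part of the project: resolves the grouping
// attribute from the optional arg3 (attribute list) and arg4 (index).
final class GroupingOptions {
    static final List<String> DEFAULT_ATTRIBUTES = List.of("student", "model", "class");
    static final int DEFAULT_GROUP_INDEX = 2;

    private GroupingOptions() {}

    // A null argument stands for "not given on the command line".
    static String groupAttribute(List<String> attributes, Integer groupIndex) {
        List<String> names = (attributes == null) ? DEFAULT_ATTRIBUTES : attributes;
        int index = (groupIndex == null) ? DEFAULT_GROUP_INDEX : groupIndex;
        // Valid values are 0 to size-1, as stated for arg4.
        if (index < 0 || index >= names.size()) {
            throw new IllegalArgumentException(
                "index " + index + " is outside 0.." + (names.size() - 1));
        }
        return names.get(index);
    }
}
```

With no optional arguments, the files are grouped by "class". Note that a custom arg3 list with fewer than three attributes requires an explicit arg4, since the default index 2 would be out of range.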
It is possible to delete unwanted first pages from the split files. After splitting, the output contains PDF files whose names start with "second_pages_", which gather only the first pages of all the split PDF files. It also contains CSV files whose names start with "second_pages_mapping", with two columns: the page number (in the PDF file of first pages) and the name of the split PDF file that contains the corresponding first page.
By default, the first page of every split PDF file listed in the CSV file is deleted. To keep the first page of some files, mark them as exceptions in the CSV file: simply add a character such as 'X' in the third column of the corresponding line. The files on these lines will not be modified.
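
As an illustration of the exception rule, the sketch below selects the files whose first page would be deleted from the lines of a mapping file. It rests on several assumptions that the project may not share: the column separator is passed in explicitly, there is no header row, and values are not quoted. The class `SecondPagesMapping` and the file names used with it are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the project: reads mapping lines of the
// form "pageNumber<sep>fileName[<sep>mark]" and keeps the unmarked files.
final class SecondPagesMapping {
    private SecondPagesMapping() {}

    static List<String> filesToModify(List<String> csvLines, String separator) {
        List<String> result = new ArrayList<>();
        for (String line : csvLines) {
            if (line.isBlank()) {
                continue; // ignore empty lines
            }
            // Limit -1 keeps trailing empty columns such as "3,file.pdf,".
            String[] columns = line.split(Pattern.quote(separator), -1);
            if (columns.length < 2) {
                continue; // not a mapping line
            }
            // Any non-blank third column (for example 'X') marks an exception.
            boolean marked = columns.length > 2 && !columns[2].isBlank();
            if (!marked) {
                result.add(columns[1].trim());
            }
        }
        return result;
    }
}
```

For example, with the lines `1,alice-m1-3A.pdf`, `2,bob-m1-3A.pdf,X` and `3,carol-m1-3B.pdf`, only the first and third files are selected.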
Once the exceptions are marked, launch this command with Maven:
```
mvn exec:java -Dexec.mainClass="fr.limos.splitpdf.SecondPagesHandler" -Dexec.args="[arg1] [arg2]"
```
with:
- arg1: input CSV file or input directory (all the CSV files in that directory whose names start with "second_pages_mapping" are read)
- arg2: input directory containing all the split PDF files whose first pages will be deleted
- arg3 (optional): prefix of the sub-directory names, default = 'c'
- arg4 (optional): index of the element in the file names (separator = '-') that is used for the sub-directory names, default = 2
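
The way arg3 and arg4 locate a split file can be sketched like this. It assumes that the sub-directory name is the prefix followed by the selected element of the file name, and that the ".pdf" extension is ignored when splitting. Both points are assumptions, and the class `SubDirectoryName` and the example file name are hypothetical.

```java
// Hypothetical helper, not part of the project: derives the sub-directory
// name of a split PDF file from its file name.
final class SubDirectoryName {
    private SubDirectoryName() {}

    static String of(String fileName, String prefix, int index) {
        // Drop the extension so that the last element is not "3A.pdf".
        String base = fileName.toLowerCase().endsWith(".pdf")
            ? fileName.substring(0, fileName.length() - 4)
            : fileName;
        String[] elements = base.split("-");
        if (index < 0 || index >= elements.length) {
            throw new IllegalArgumentException(
                "no element " + index + " in file name " + fileName);
        }
        return prefix + elements[index];
    }
}
```

With the defaults (prefix 'c', index 2), a file named `alice-m1-3A.pdf` would be looked up in the sub-directory `c3A` under these assumptions.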