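This article shows a few simple examples that use Python and the PyPDF2 library to edit PDF files. The code uses PyPDF2's original API (PdfFileReader, PdfFileWriter), which PyPDF2 3.0 removed in favor of PdfReader and PdfWriter.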

Copy And Encrypt PDF

This example copies an existing PDF page by page and writes a new, password-protected file.

import PyPDF2

fileName = "/Users/weiyang/Desktop/Test.pdf"
newFileName = "/Users/weiyang/Desktop/NewTest.pdf"
file = open( fileName, 'rb' )
reader = PyPDF2.PdfFileReader( file )
writer = PyPDF2.PdfFileWriter()
for pageIndex in range( reader.numPages ):
    writer.addPage( reader.getPage( pageIndex ) )

writer.encrypt( 'bell' )  # password needed to open the new file
newFile = open( newFileName, "wb" )
writer.write( newFile )
newFile.close()
file.close()



If you want to encrypt the original PDF in place, import the os module and add os.rename( newFileName, fileName ) at the end of the snippet above.
A PdfFileReader object can also open an encrypted PDF: call its decrypt( password ) method before reading any pages.
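The two steps just described, unlocking an encrypted PDF and overwriting the original with the encrypted copy, can be sketched as two small helpers. This is only a sketch under stated assumptions: the names unlock and replace_original are hypothetical, and reader is assumed to be a PyPDF2.PdfFileReader (any object exposing isEncrypted and decrypt( password ) behaves the same way). The sketch uses os.replace rather than os.rename because os.rename fails on Windows when the target already exists.

```python
import os


def unlock(reader, password):
    """Decrypt `reader` if it is encrypted; raise ValueError on a bad password."""
    if reader.isEncrypted:
        # In the legacy PyPDF2 API, decrypt() returns 0 when the password
        # matches neither the user password nor the owner password.
        if reader.decrypt(password) == 0:
            raise ValueError("wrong password")
    return reader


def replace_original(new_path, original_path):
    """Overwrite the original PDF with the newly written (encrypted) copy."""
    # os.replace overwrites an existing destination on every platform.
    os.replace(new_path, original_path)


# Usage with the article's variables (assumes PyPDF2's legacy API is installed):
#   reader = unlock( PyPDF2.PdfFileReader( file ), 'bell' )
#   replace_original( newFileName, fileName )   # after newFile.close()
```

Call replace_original only after both files are closed, otherwise the move can fail on some platforms.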

Extract The String Content From PDF

I use the PDF file from the previous example as the test input. PyPDF2 can extract the plain text of a page.
The extracted content is not perfect: one line of text was missing from my output.

import PyPDF2

fileName = "/Users/weiyang/Desktop/Test.pdf"
file = open( fileName, 'rb' )
reader = PyPDF2.PdfFileReader( file )
page = reader.getPage( 0 )
content = page.extractText()
print( content )
file.close()
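The snippet above reads only page 0. A minimal sketch for collecting the text of every page follows. The helper name extract_all_text and its separator parameter are hypothetical, and reader is assumed to be a PdfFileReader as used in the article.

```python
def extract_all_text(reader, separator="\n"):
    """Return the text of every page in `reader`, joined by `separator`."""
    texts = []
    for page_index in range(reader.numPages):
        # extractText() returns one string per page; as noted in the
        # article, the result can be incomplete for some PDFs.
        texts.append(reader.getPage(page_index).extractText())
    return separator.join(texts)


# Usage with the article's variables (assumes PyPDF2's legacy API):
#   reader = PyPDF2.PdfFileReader( file )
#   print( extract_all_text( reader ) )
```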

Combine Two Different PDF Files

Sometimes we want to combine several PDFs into a single file, which is easy to do with PyPDF2.
In the following example, I combine two files, Test1.pdf and Test2.pdf, into a new file, NewTest.pdf.

import PyPDF2

fileName1 = "/Users/weiyang/Desktop/Test1.pdf"
fileName2 = "/Users/weiyang/Desktop/Test2.pdf"
newFileName = "/Users/weiyang/Desktop/NewTest.pdf"

file1 = open( fileName1, 'rb' )
file2 = open( fileName2, 'rb' )

reader1 = PyPDF2.PdfFileReader( file1 )
reader2 = PyPDF2.PdfFileReader( file2 )

writer = PyPDF2.PdfFileWriter()

for pageIndex in range( reader1.numPages ):
    writer.addPage( reader1.getPage( pageIndex ) )

for pageIndex in range( reader2.numPages ):
    writer.addPage( reader2.getPage( pageIndex ) )

newFile = open( newFileName, "wb" )
writer.write( newFile )
newFile.close()
file1.close()
file2.close()
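The two loops above do the same thing for each input, so the pattern extends to any number of files. A minimal sketch, assuming readers is a list of PdfFileReader objects and writer is a PdfFileWriter as in the article; append_all_pages is a hypothetical helper name.

```python
def append_all_pages(writer, readers):
    """Add every page of each reader to `writer`, in order.

    Returns the total number of pages added.
    """
    total = 0
    for reader in readers:
        for page_index in range(reader.numPages):
            writer.addPage(reader.getPage(page_index))
            total += 1
    return total


# Usage with the article's variables (assumes PyPDF2's legacy API):
#   append_all_pages( writer, [ reader1, reader2 ] )
#   writer.write( newFile )
# Keep the source files open until writer.write() has finished, as the
# article's code does.
```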

Add Watermark For PDF

I created a watermark PDF file with a Microsoft Office tool, then used PyPDF2 to add it to every page of NewTest.pdf.



import PyPDF2

fileName = "/Users/weiyang/Desktop/NewTest.pdf"
fileName2 = "/Users/weiyang/Desktop/WaterMark.pdf"
fileName3 = "/Users/weiyang/Desktop/Result.pdf"
file = open( fileName, 'rb' )

reader = PyPDF2.PdfFileReader( file )
waterMarkFile = open( fileName2, "rb" )
waterMarkReader = PyPDF2.PdfFileReader( waterMarkFile )
writer = PyPDF2.PdfFileWriter()

for pageIndex in range( reader.numPages ):
    pageObj = reader.getPage( pageIndex )
    # mergePage draws the watermark page on top of the current page
    pageObj.mergePage( waterMarkReader.getPage( 0 ) )
    writer.addPage( pageObj )

resultFile = open( fileName3, "wb" )
writer.write( resultFile )
resultFile.close()
waterMarkFile.close()
file.close()
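The watermark loop can be wrapped in a reusable helper. This is a sketch, not the article's own code: stamp_all_pages is a hypothetical name, reader and writer are assumed to be the PdfFileReader and PdfFileWriter objects from the example, and watermark_page is assumed to be the single page taken from the watermark PDF.

```python
def stamp_all_pages(reader, watermark_page, writer):
    """Merge `watermark_page` onto every page of `reader` and add it to `writer`.

    Returns the number of pages stamped.
    """
    count = 0
    for page_index in range(reader.numPages):
        page = reader.getPage(page_index)
        # mergePage modifies `page` in place, appending the watermark's
        # content so that it is drawn over the page's existing content.
        page.mergePage(watermark_page)
        writer.addPage(page)
        count += 1
    return count


# Usage with the article's variables (assumes PyPDF2's legacy API):
#   stamp_all_pages( reader, waterMarkReader.getPage( 0 ), writer )
#   writer.write( resultFile )
```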

Result: [screenshot of Result.pdf with the watermark applied]



