SOLVED: Batch extract pages from multiple PDF files


Joined
Feb 22, 2020
Messages
9
Reaction score
0
My association has 1,400 multi-page PDF files in a research library, all in a single directory. Our online search engine works with plain TXT files (no images), each linked back to its PDF file. I'm searching for an application that will either:
* batch extract all pages from the PDF files into individual single-page PDFs, from which the single-page TXT files can be created,
* OR batch extract all pages in multiple PDF files directly into single-page TXT files (no images), retaining the corresponding file names.

I'm running a 2013 27-inch iMac on macOS High Sierra.

Thoughts of others?
 

Joined
Jan 25, 2017
Messages
922
Reaction score
63
Automator has the necessary components to extract text from a PDF. It should be possible to build an app that does what you want.
 
Joined
Jan 25, 2017
Messages
922
Reaction score
63
1. Open Automator, select 'New Document', then select 'Application'.
2. In the leftmost column click 'Files & Folders', and in the adjacent column double click 'Ask for Finder Items'. Tick 'Allow Multiple Selection'.
3. Go back to the leftmost column and select 'PDFs', and in the adjacent column double click 'Extract PDF Text'.
4. Select 'Rich Text' for the output, and choose where to save your text files in the 'Save Output to' box.
5. Save the file, making sure the file format is 'Application'.

When you open the app you just need to navigate to the files you want to process, highlight them and click 'Choose'.
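
For anyone who would rather script this than use Automator, the same batch conversion (one text file per PDF, same base name) can be sketched in Python. This is only a rough sketch under some assumptions: Python 3 has been installed separately (High Sierra does not ship with it), the third-party pypdf package is available, and the function names `pdf_to_text` and `convert_folder` are made up for this illustration. It is not what Automator does internally.

```python
from pathlib import Path
from typing import Callable, List


def pdf_to_text(pdf_path: Path) -> str:
    """Return the text of every page in one PDF, joined by blank lines.

    Assumption: the third-party pypdf package is installed (pip install pypdf).
    """
    from pypdf import PdfReader  # imported lazily; only needed for real PDFs

    reader = PdfReader(str(pdf_path))
    # Image-only pages may yield no text, so fall back to an empty string.
    return "\n\n".join(page.extract_text() or "" for page in reader.pages)


def convert_folder(
    src_dir: Path,
    out_dir: Path,
    extract: Callable[[Path], str] = pdf_to_text,
) -> List[Path]:
    """Write one .txt per PDF found in src_dir, keeping each PDF's base name."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf_path in sorted(src_dir.iterdir()):
        # Accept .pdf and .PDF alike; skip everything else.
        if pdf_path.suffix.lower() != ".pdf":
            continue
        txt_path = out_dir / (pdf_path.stem + ".txt")
        txt_path.write_text(extract(pdf_path), encoding="utf-8")
        written.append(txt_path)
    return written
```

With hypothetical folder names, a call would look like `convert_folder(Path("Library_PDFs"), Path("Library_TXT"))`. Scanned PDFs that contain only images will produce empty text files, since nothing here performs OCR.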
 
Joined
Feb 22, 2020
Messages
9
Reaction score
0
Very interesting. I've never used this utility before. I worked my way through the steps and got as far as running it, but it stops at "Ask for Finder Items". Here I'm stumped; I'm not sure what it needs. Your thoughts?
Regards,
Terry

[Attachment: Screen Shot 2020-02-22 at 11.03.24 AM.png]
 

Joined
Jan 25, 2017
Messages
922
Reaction score
63
Have you saved the file as an app yet? If so, double click the app, and when the Finder box opens, navigate to where your PDFs are and select the ones you want to convert. 'Ask for Finder Items' is basically the computer asking you to select the files to convert.
 
Joined
Feb 22, 2020
Messages
9
Reaction score
0
VOILA! As if by magic! Thank you for your tutorial on Automator. I've been working with a Mac for some 30+ years and never used this utility. I made one change to your formula: I changed the output from RTF to TXT. To my surprise, the resulting app worked its way through the source PDF directory and deposited an equal number of TXT files in the target directory. That it went directly to TXT without extracting individual pages is, I believe, a bonus. I suspect my webmaster will be pleased. Your support is very much appreciated. For my 88-year-old brain, it is a fine opportunity to hop up onto yet another learning curve. This has been a fine example of the internet at its finest! Stay tuned.
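
If single-page TXT files turn out to be needed after all, as in the original request, a per-page split could be sketched in Python along these lines. It assumes Python 3 and the third-party pypdf package are installed; the helper names and the `_p001` naming scheme are invented for this example, so adjust them to whatever the search engine expects.

```python
from pathlib import Path
from typing import Callable, List


def page_texts(pdf_path: Path) -> List[str]:
    """Return the text of each page of a PDF, one list entry per page.

    Assumption: the third-party pypdf package is installed (pip install pypdf).
    """
    from pypdf import PdfReader  # imported lazily; only needed for real PDFs

    reader = PdfReader(str(pdf_path))
    return [page.extract_text() or "" for page in reader.pages]


def page_txt_name(pdf_path: Path, page_number: int) -> str:
    """Build a per-page file name that keeps the PDF's base name."""
    return f"{pdf_path.stem}_p{page_number:03d}.txt"


def split_to_page_files(
    pdf_path: Path,
    out_dir: Path,
    extract: Callable[[Path], List[str]] = page_texts,
) -> List[Path]:
    """Write one .txt file per page of pdf_path into out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # Pages are numbered from 1 so the names match what a PDF viewer shows.
    for number, text in enumerate(extract(pdf_path), start=1):
        target = out_dir / page_txt_name(pdf_path, number)
        target.write_text(text, encoding="utf-8")
        written.append(target)
    return written
```

For a hypothetical file `report.pdf`, this writes `report_p001.txt`, `report_p002.txt`, and so on, so each text file can still be traced back to its source PDF and page.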
 
Joined
Feb 22, 2020
Messages
9
Reaction score
0
A secondary question about Automator: can it generate a script that can be viewed and analyzed?
 
Joined
Jan 25, 2017
Messages
922
Reaction score
63
I suggested using .rtf because it maintains the formatting of the document. If the formatting is not necessary for your process, converting to .txt is fine.
I am not aware of any way to view the Automator action at script level.
 

Joined
Feb 22, 2020
Messages
9
Reaction score
0
Suspicions confirmed about no script capability. For our purposes, the TXT version is quite OK, because format is not a requirement. Again, many thanks.
 