Get both file contents AND file name via Automator


Giraf

New Member
Joined
Jun 15, 2017
Messages
9
Reaction score
0
Hello,

I have about 1000+ Word files in a folder. It is easy to combine the text of all of these files into one big Word file, but I would like Automator to ALSO add the respective file names to this big file, so that I know which file each piece of text originates from.

So, how do you tell Automator to extract the text from each file and always add the file name right after the text (or before it) in the new file that combines all this information?

Thanks for any help,

Cheers,

Phil
 

Anakin Arya

New Member
Joined
Jun 17, 2017
Messages
8
Reaction score
0
Use a shell script in Automator. You can find numerous examples of doing this in bash. At least try once, and if you are not successful, I will write you a shell script.


Well, here is the shell script. Just run it in the same folder containing the files you want to join; run it as a Run Shell Script action in Automator:

ls -ltr | awk '{print $9}' | xargs head

ls -ltr | awk '{print $9}' | xargs head > output_file
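One caveat worth flagging: the file names in the output come from `head` itself, which prints a `==> name <==` banner before each file when given several files, but by default `head` also shows only the first 10 lines of each file. A small sketch (using throw-away demo files under `/tmp`, hypothetical paths) that keeps the banner format while copying whole files:

```shell
mkdir -p /tmp/head_demo && cd /tmp/head_demo
seq 1 12 > long.txt              # 12 lines: plain head would stop at 10
printf 'short\n' > short.txt

# Emit head-style banners, but follow each with the complete file
for f in *.txt; do
    printf '==> %s <==\n' "$f"
    cat "$f"
done > /tmp/head_demo_output
```

The `for f in *.txt` glob also sidesteps the `ls | awk '{print $9}'` step, which breaks on file names containing spaces.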
 

Giraf

New Member
Joined
Jun 15, 2017
Messages
9
Reaction score
0
Thanks so much for your answer! I appreciate it.

On the basis of your first message I started to write a script (as a newbie) and came up with this:
find . -type f -name '*.txt' -print | while IFS= read -r filename; do
    echo "$filename"
    cat "$filename"
done > output.txt

But when I saw your second message I ran your script above, and it works: I see the file name and the contents in the output. Super!

Yet, looking at the output, the issue I am facing now is that I am dealing with Word files instead of txt files. So I have to find a way to extend your script to extract only the plain text. I guess this is getting a bit complicated... Thanks in advance for any insights.
 

Anakin Arya

New Member
Joined
Jun 17, 2017
Messages
8
Reaction score
0
Here is an EASY WAY to convert doc to txt:

http://therandymon.com/index.php?/archives/152-Convert-Word-Docs-to-Text-Using-Mac-Automator.html

http://macscripter.net/viewtopic.php?id=33214

I would recommend the shell way, as follows. Automator workflows are good, but too much work for simple tasks; I suggest using them only when a BASH SCRIPT cannot do the job.

textutil -convert txt -encoding Mac crap.doc

What you want can be done in numerous ways in a bash script. Try the suggested links; if you run into trouble, I will put a solution together.
 

Giraf

New Member
Joined
Jun 15, 2017
Messages
9
Reaction score
0
Great! => I went to http://macscripter.net/viewtopic.php?id=33214 and both scripts work. The one by Mcusr seems the best because it does not require Word to open, but it converts only one file at a time. The script by Trash Man converts multiple files but requires Word to open and save.

When I first apply Trash Man's script and then your "awk" script above, I get what I need. I tried this with four files. So, many many thanks! I am super happy.

To optimize the process, two further questions:
1. How do I adapt Mcusr's script to convert multiple files at the same time?
2. In the text files, I would also like to delete hard returns that might be left over from the Word doc; they are still there in the middle of the text.

Phil
 

Anakin Arya

New Member
Joined
Jun 17, 2017
Messages
8
Reaction score
0
For doing it on multiple files at the same time, you can go two ways:
a. do everything in the background so that Word doesn't open the files (I have to check the AppleScript dictionary to be able to do that)

b. open multiple files at the same time in Word and perform the action. You can limit it to 2-5 threads, so your RAM and resources won't be exhausted. For the threads you can just copy and paste the same code, but change the input to the subsequent file names.

E.g.: make a list of 5 items per repeat loop and pass the subsequent 5 to the next loop, and so forth. Just create 2-5 threads, each thread does 5 files, and the last loop checks for the leftover files and does them.
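The batching idea above can also be sketched with shell background jobs instead of AppleScript threads. In this sketch `cp` is only a stand-in placeholder for the real per-file conversion step, and the demo files under `/tmp` are made up:

```shell
mkdir -p /tmp/batch_demo && cd /tmp/batch_demo
for i in 1 2 3 4 5 6 7; do printf 'doc %s\n' "$i" > "file$i.doc"; done

# Launch up to 5 background jobs, then wait for the whole batch to
# finish before starting the next one; the final wait catches the
# leftover files from the last partial batch.
n=0
for f in *.doc; do
    cp "$f" "${f%.doc}.txt" &   # placeholder for the real conversion
    n=$((n + 1))
    if [ $((n % 5)) -eq 0 ]; then wait; fi
done
wait
```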



Again, for removing the carriage returns, use the tr command:

for removing carriage returns only: tr -d '\r' < infile > outfile
for removing carriage returns as well as newlines: tr -d '\r\n' < infile > outfile

It will remove all the carriage returns.
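A quick check of the first form, with made-up demo paths under `/tmp`:

```shell
# A sample file with Windows-style CRLF line endings
printf 'one\r\ntwo\r\n' > /tmp/crlf_in

# Strip the carriage returns; the newlines are left alone
tr -d '\r' < /tmp/crlf_in > /tmp/crlf_out
```

Afterwards /tmp/crlf_out still has its two lines, but no \r bytes remain.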
 

Anakin Arya

New Member
Joined
Jun 17, 2017
Messages
8
Reaction score
0
I would have opted for this method:

textutil -convert txt -encoding Mac crap.doc

Using it you can run multiple commands in the background by changing the file name variable:

textutil -convert txt -encoding Mac crap1.doc &
textutil -convert txt -encoding Mac crap2.doc &
textutil -convert txt -encoding Mac crap3.doc &
textutil -convert txt -encoding Mac crap4.doc &
textutil -convert txt -encoding Mac crap5.doc &
textutil -convert txt -encoding Mac crap-etc.doc &


All your commands can be in one single bash script, and the result saved in one single file, neatly. An outer wrapper can be provided with AppleScript, so that you just drag and drop the doc files and it churns out the output TEXT file on the Desktop.
 

Giraf

New Member
Joined
Jun 15, 2017
Messages
9
Reaction score
0
I understand. I will experiment with the different methods you propose.

Regarding tr -d '\r' < infile > outfile

Can I also do this for multiple (1000+) files in a folder at the same time? Note that the last carriage return (in fact the newline) has to stay.
 

Anakin Arya

New Member
Joined
Jun 17, 2017
Messages
8
Reaction score
0
I suggest you don't do it for all 1000 at the same time; your system might crash depending on the size of the files. I suggest iterating through a loop of, say, 50-100 at a time before moving forward, so that at any given time only 50 tr processes are running.

For combining text files, you can do all 1000 at the same time, since it adds them one by one, but searching for and removing carriage returns will be RAM intensive.
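For what it's worth, a plain loop keeps resource use flat on its own, because it processes the files strictly one after another; only a single tr process exists at any moment, however many files there are. A minimal sketch with made-up demo files under `/tmp`:

```shell
mkdir -p /tmp/tr_batch && cd /tmp/tr_batch
printf 'a\r\nb\r\n' > one.txt
printf 'c\r\n' > two.txt

# Sequential: each tr finishes before the next one starts, and the
# cleaned output replaces the original file in place.
for f in *.txt; do
    tr -d '\r' < "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done
```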
 

Giraf

New Member
Joined
Jun 15, 2017
Messages
9
Reaction score
0
Dear Anakin,

For two weeks these two commands in a shell script worked like a charm:
ls -ltr | awk '{print $9}' | xargs head
ls -ltr | awk '{print $9}' | xargs head > output_file

but now suddenly I get the following error message:
Run shell script failed 1 error head: Error reading installShield

Can you help me? Why is it suddenly not converting the files anymore?

thanks
Phil
 

Anakin Arya

New Member
Joined
Jun 17, 2017
Messages
8
Reaction score
0
Are you still using MS Word, or just the script? Please check the license for MS Word. Secondly, change the script's permissions, as well as the files' and folder's permissions, to 755: chmod 755 file-names.

What exactly are you doing?
 

Giraf

New Member
Joined
Jun 15, 2017
Messages
9
Reaction score
0
I want to run these 2 commands in a shell script in Automator on about 2000 txt files that are all in one folder.
 

Anakin Arya

New Member
Joined
Jun 17, 2017
Messages
8
Reaction score
0
In Terminal, go to the folder containing all 2000 files and then type:

ls -ltr | awk '{print $9}' | xargs head > output_file

and you should be set.
 

Giraf

New Member
Joined
Jun 15, 2017
Messages
9
Reaction score
0
Thanks

I will try that when I am back at my office.
 

Giraf

New Member
Joined
Jun 15, 2017
Messages
9
Reaction score
0
Dear Anakin,
Thanks so much. It worked like a charm!:)
Have a nice day.
Phil
 