Get both file contents AND file name via automator

Discussion in 'Automator' started by Giraf, Jun 15, 2017.

  1. Giraf

    Giraf New Member

    Joined:
    Jun 15, 2017
    Messages:
    9
    Likes Received:
    0
    Hello,

    I have 1000+ Word files in a folder. It is easy to combine the text of all of these files into one big Word file, but I would like Automator to also add the respective file names to this big file, so that I know which file each piece of text came from.

    So, how do you tell Automator to extract the text from each file and always add the file name right after the text (or before it) in the new file that combines all this information?

    Thanks for any help,

    Cheers,

    Phil
     
    Giraf, Jun 15, 2017
    #1

  2. Giraf

    Anakin Arya New Member

    Joined:
    Jun 17, 2017
    Messages:
    8
    Likes Received:
    0
    Use a shell script in Automator. You can find numerous examples of doing this in bash. At least try once, and if you are not successful, I will write you a shell script.


    Well, here is the shell script. Run it in the same folder that contains the files you want to join, as a Run Shell Script action in Automator. The first command prints to the screen, the second saves the result to output_file:

    ls -ltr | awk '{print $9}' | xargs head

    ls -ltr | awk '{print $9}' | xargs head > output_file
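
    A note on the one-liner above: `head` prints only the first 10 lines of each file by default, and `awk '{print $9}'` cuts off file names that contain spaces. A minimal sketch of an alternative that writes each name followed by the full contents is shown here. The function name `combine_with_names`, the `*.txt` pattern, and the default output name `combined.txt` are assumptions for illustration, not something from the thread.

```shell
#!/bin/bash
# Hypothetical helper: write each file's name followed by its full contents.
combine_with_names() {
  local dir=${1:-.} out=${2:-combined.txt} f
  : > "$out"                          # start with an empty output file
  for f in "$dir"/*.txt; do
    [ -f "$f" ] || continue           # skip directories and unmatched globs
    [ "$f" -ef "$out" ] && continue   # never append the output file to itself
    printf '==> %s <==\n' "$(basename "$f")" >> "$out"
    cat "$f" >> "$out"
    printf '\n' >> "$out"             # blank line between files
  done
}
```

    Calling `combine_with_names . output_file` from inside the folder would write everything to `output_file`.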
     
    Last edited: Jun 17, 2017
    Anakin Arya, Jun 17, 2017
    #2

  3. Giraf

    Giraf New Member

    Thanks so much for your answer! I appreciate it.

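    On the basis of your 1st message I started to write a script (as a newbie) and came up with this:
    # print each file's name, then its contents; skip output.txt so it is not read back into itself
    find . -type f -name '*.txt' ! -name 'output.txt' -print | while IFS= read -r filename; do
    echo "$filename"
    cat "$filename"
    done > output.txt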

    But when seeing your 2nd message I ran your script above and it works. I see the file name and the contents in the output. Super!

    Yet, when I look at the output, the issue I am facing now is that I am dealing with Word files instead of txt files. So, I have to find a way to extend your script to extract only the plain text. I guess this is getting a bit complicated... Thanks in advance for any insights.
     
    Giraf, Jun 17, 2017
    #3
  4. Giraf

    Anakin Arya New Member

    Here is an easy way to convert doc to txt:

    http://therandymon.com/index.php?/archives/152-Convert-Word-Docs-to-Text-Using-Mac-Automator.html

    http://macscripter.net/viewtopic.php?id=33214

    I would recommend the shell way, as follows. Automator workflows are good, but too much work for simple tasks; I suggest using them only when a bash script cannot do the job.

    textutil -convert txt -encoding Mac crap.doc

    What you want can be done in numerous ways in a bash script. Try the suggested links; if you run into trouble, I will put together the solution.
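
    The `textutil` command above handles one file. A minimal sketch of looping it over a whole folder follows. The function name `convert_word_docs` and the choice of UTF-8 rather than the Mac encoding are assumptions for illustration, and `textutil` is available only on macOS.

```shell
#!/bin/bash
# Hypothetical helper: convert every .doc/.docx in a folder to plain text.
convert_word_docs() {
  local dir=${1:-.} f
  for f in "$dir"/*.doc "$dir"/*.docx; do
    [ -f "$f" ] || continue           # skip unmatched globs
    # By default textutil writes the .txt file next to the source file.
    textutil -convert txt -encoding UTF-8 "$f" || echo "failed: $f" >&2
  done
}
```

    Run as `convert_word_docs /path/to/folder`, this processes one file after another, so Word never has to open.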
     
    Anakin Arya, Jun 17, 2017
    #4
  5. Giraf

    Giraf New Member

    Great! I went to http://macscripter.net/viewtopic.php?id=33214 and both scripts work. The one by Mcusr seems the best because it does not require Word to open, but it converts only one file at a time. The script by Trash Man converts multiple files but requires Word to open and save.

    When I first apply Trash Man's script and then your "awk" script above, I get what I need. I tried this with four files. So, many, many thanks! I am super happy.

    To optimize the process, 2 further questions:
    1. How do I adapt Mcusr's script to convert multiple files at the same time?
    2. In the text files, I would also like to delete hard returns left over from the Word doc. They are still there in the middle of the text.

    Phil
     
    Giraf, Jun 17, 2017
    #5
  6. Giraf

    Anakin Arya New Member

    To do it on multiple files at the same time, you can go two ways:
    a. Do everything in the background so that Word doesn't open the files (I have to check the AppleScript dictionary to be able to do that).

    b. Open multiple files at the same time in Word and perform the action. You can limit it to 2-5 threads so that your RAM and other resources won't be exhausted. For threads you can just copy and paste the same code, but change the input to the subsequent file names.

    E.g.: make a list of 5 items per repeat loop, and pass the subsequent 5 to the next loop, and so forth. Just create 2-5 threads where each thread handles 5 files, and make the last loop check for the leftover files and process them.

    Again, for removing the carriage returns, use a shell command:

    for removing carriage returns only: tr -d '\r' < infile > outfile
    for removing carriage returns as well as newlines: tr -d '\r\n' < infile > outfile

    Either one will remove all the carriage returns; the second also joins all lines into one.
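
    To make the difference between the two `tr` commands concrete, here is a small sketch on made-up sample text. The variable names and the helper `join_lines` are hypothetical; the helper shows one way to join all lines while still ending the output with a single newline.

```shell
#!/bin/bash
# Made-up sample text with Windows-style CRLF line endings.
sample=$'first line\r\nsecond line\r\n'

# Carriage returns only: the text keeps its line breaks.
cr_only=$(printf '%s' "$sample" | tr -d '\r')

# Carriage returns and newlines: everything ends up on one line.
joined=$(printf '%s' "$sample" | tr -d '\r\n')

# Hypothetical helper: join the lines of a file but end with one newline.
join_lines() {
  tr -d '\r\n' < "$1"
  printf '\n'
}

printf '%s\n' "$cr_only"
printf '%s\n' "$joined"
```

    Note that a space inside the set, as in `'\r \n'`, would make `tr` delete every space in the text as well.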
     
    Last edited: Jun 17, 2017
    Anakin Arya, Jun 17, 2017
    #6
  7. Giraf

    Giraf New Member

    ok, will do so. Thanks.
     
    Giraf, Jun 17, 2017
    #7
  8. Giraf

    Anakin Arya New Member

    I would have opted for this method:

    textutil -convert txt -encoding Mac crap.doc

    Using this, you can run multiple commands in the background, changing the file name each time:

    textutil -convert txt -encoding Mac crap1.doc &
    textutil -convert txt -encoding Mac crap2.doc &
    textutil -convert txt -encoding Mac crap3.doc &
    textutil -convert txt -encoding Mac crap4.doc &
    textutil -convert txt -encoding Mac crap5.doc &
    textutil -convert txt -encoding Mac crap-etc.doc &
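
    Writing one `&` line per file does not scale to 1000+ files, so here is a minimal sketch that starts the background jobs in a loop and pauses after each batch. The function name `convert_in_batches`, the batch size of 5, and the UTF-8 encoding are assumptions for illustration.

```shell
#!/bin/bash
# Hypothetical helper: convert .doc files in batches of background jobs.
convert_in_batches() {
  local dir=${1:-.} batch=${2:-5} count=0 f
  for f in "$dir"/*.doc; do
    [ -f "$f" ] || continue
    textutil -convert txt -encoding UTF-8 "$f" &   # run in the background
    count=$((count + 1))
    if [ $((count % batch)) -eq 0 ]; then
      wait                                         # let this batch finish
    fi
  done
  wait                                             # wait for the leftovers
}
```

    For example, `convert_in_batches . 5` keeps at most 5 conversions running at once.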


    All your commands can be in one single bash script, with the result saved neatly in one single file. An outer wrapper can be provided with AppleScript, so that you just drag and drop the doc files and it churns out the output text file on the desktop.
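
    A minimal sketch of what such a single script could look like, assuming the `-stdout` option of `textutil` is used so that no intermediate .txt files are needed. The function name `word_folder_to_one_file`, the header format, and the default output path on the Desktop are hypothetical choices, not taken from the thread.

```shell
#!/bin/bash
# Hypothetical helper: convert each Word file and append name + text to one file.
word_folder_to_one_file() {
  local dir=${1:-.} out=${2:-"$HOME/Desktop/combined.txt"} f
  : > "$out"                          # start with an empty output file
  for f in "$dir"/*.doc "$dir"/*.docx; do
    [ -f "$f" ] || continue           # skip unmatched globs
    printf '==> %s <==\n' "$(basename "$f")" >> "$out"
    # -stdout sends the converted text to standard output instead of a file;
    # tr then drops any carriage returns left over from the document.
    textutil -convert txt -encoding UTF-8 -stdout "$f" | tr -d '\r' >> "$out"
    printf '\n' >> "$out"
  done
}
```

    The drag-and-drop wrapper would then only need to pass the folder path to `word_folder_to_one_file`.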
     
    Anakin Arya, Jun 17, 2017
    #8
  9. Giraf

    Giraf New Member

    I understand. I will experiment with the different methods you propose.

    Regarding tr -d '\r' < infile > outfile

    Can I also do this for multiple (1000+) files in a folder at the same time? Note that the last carriage return (in fact the newline) has to stay.
     
    Giraf, Jun 17, 2017
    #9
  10. Giraf

    Anakin Arya New Member

    I suggest you don't do it for all 1000 at the same time; your system might crash, depending on the size of the files. I suggest iterating through them in a loop of, say, 50-100 at a time and then moving forward, so that at any given time only 50 or so tr processes are running.

    For combining text files, you can do all 1000 at the same time, since it will add them one by one, but searching for and removing carriage returns will be RAM intensive.
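
    One way to cap the number of `tr` processes is `xargs -P`. The following is a minimal sketch, with the function name `strip_cr_all`, the `*.txt` pattern, and the default of 4 parallel jobs all being assumptions. It deletes carriage returns only, so every newline, including the last one in each file, stays in place.

```shell
#!/bin/bash
# Hypothetical helper: remove carriage returns from every .txt file in a folder,
# running at most "jobs" tr processes at the same time.
strip_cr_all() {
  local dir=${1:-.} jobs=${2:-4}
  find "$dir" -maxdepth 1 -type f -name '*.txt' -print0 |
    xargs -0 -n 1 -P "$jobs" sh -c '
      [ -f "$1" ] || exit 0           # nothing to do when no file was passed
      tmp="$1.tmp.$$"                 # write to a temporary file first
      tr -d "\r" < "$1" > "$tmp" && mv "$tmp" "$1"
    ' sh
}
```

    Calling `strip_cr_all . 50` would match the limit of 50 suggested above. The files are rewritten in place, so it is worth trying on a copy of the folder first.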
     
    Anakin Arya, Jun 17, 2017
    #10
  11. Giraf

    Giraf New Member

    Dear Anakin,

    For two weeks these two commands in a shell script worked like a charm:
    ls -ltr | awk '{print $9}' | xargs head
    ls -ltr | awk '{print $9}' | xargs head > output_file

    but now suddenly I get the following error message:
    Run shell script failed 1 error head: Error reading installShield

    Can you help me? Why is it suddenly not converting the files anymore?

    thanks
    Phil
     
    Giraf, Jul 2, 2017
    #11
  12. Giraf

    Anakin Arya New Member

    Are you still using MS Word, or just the script? Please check the license for MS Word. Secondly, change the permissions of the script as well as the files and folders to 755: chmod 755 file-names.

    What exactly are you doing?
     
    Anakin Arya, Jul 2, 2017
    #12
  13. Giraf

    Giraf New Member

    I want to run these 2 commands in a shell script in Automator on about 2000 txt files that are all in one folder.
     
    Giraf, Jul 3, 2017
    #13
  14. Giraf

    Anakin Arya New Member

    In Terminal, go to the folder containing all the 2000 files and then type:

    ls -ltr | awk '{print $9}' | xargs head > output_file

    and you should be set.
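
    The thread does not establish why `head` reported "Error reading installShield". One possibility, which is an assumption here and not confirmed by either poster, is that the folder contained an entry of that name that is a directory or otherwise not a readable file, since the one-liner hands every entry to `head`. A minimal sketch for checking this and for running `head` on regular files only follows; both function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: list entries in a folder that are not regular files.
list_non_files() {
  local dir=${1:-.}
  find "$dir" -mindepth 1 -maxdepth 1 ! -type f -print
}

# Hypothetical helper: run head on regular files only, skipping the output file.
head_regular_files() {
  local dir=${1:-.} out=${2:-output_file}
  find "$dir" -maxdepth 1 -type f ! -name "$(basename "$out")" \
    -exec head {} + > "$out"
}
```

    If `list_non_files .` prints anything, those entries are candidates for the error. Unlike the `awk` version, `find` also copes with file names that contain spaces.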
     
    Anakin Arya, Jul 3, 2017
    #14
  15. Giraf

    Giraf New Member

    Thanks

    I will try that when I am back at my office.
     
    Giraf, Jul 3, 2017
    #15
  16. Giraf

    Giraf New Member

    Dear Anakin,
    Thanks so much. It worked like a charm!:)
    Have a nice day.
    Phil
     
    Giraf, Jul 4, 2017
    #16