Tip: post-processing Comments with grep

One of Dorico’s useful features is adding comments to the project, which can then be exported as a webpage. This is really useful for creating a critical commentary when editing music. But how do you get the text from the HTML page into your word processor, in the format you want?

  1. Open the HTML in a browser. Dorico does this automatically when you export the comments.

  2. Select the entire text, copy it and paste it into a text editor.

  3. Here’s the fun part. Using a sophisticated text editor (like BBEdit on a Mac), I’m going to use a “Grep” search and replace to perform several operations on the text. I’ll remove the Author and Date fields, and swap the bar number and instrument. I’ll also replace the tabs that separate the fields with spaces and other punctuation.

So here’s the raw text pasted into my text editor:

Creed

Author	Date	Instrument	Bars	Comment
Ben	Nov 1, 2021, 1:52:03 PM	Bass	26	Dec Bass has two minims instead of semibreve, with underlay 'all things were made'.

It’s easy enough to delete the ‘heading’ lines of “Author Date Instrument…”, which will appear at the start of each Flow’s comments.
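If you'd rather script this step than delete the lines by hand, here is a rough Python sketch. It assumes the heading line is exactly the tab-separated text shown above; the function name `drop_heading_lines` is just something I made up for illustration.

```python
import re

# Matches the tab-separated heading line that precedes each Flow's comments.
# Assumption: the heading is exactly "Author, Date, Instrument, Bars, Comment".
HEADING = re.compile(r"^Author\tDate\tInstrument\tBars\tComment\n?", flags=re.MULTILINE)

def drop_heading_lines(text):
    """Remove every heading line, leaving Flow titles and comments alone."""
    return HEADING.sub("", text)

raw = (
    "Creed\n"
    "\n"
    "Author\tDate\tInstrument\tBars\tComment\n"
    "Ben\tNov 1, 2021, 1:52:03 PM\tBass\t26\tDec Bass has two minims instead of semibreve.\n"
)
print(drop_heading_lines(raw))
```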

Now use the following Grep search command: ^.+M\t
This will find “one or more characters between the start of a line and the letter M, followed by a Tab”.
In other words, it will match everything up to and including the Tab after the M of PM or AM in the Date. (Make sure case sensitive is ON.)

Replace it with nothing, to delete the fields we don’t want.
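For anyone without a grep-capable editor, the same search and replace can be sketched with Python's `re` module, whose syntax is close enough for this pattern. The function name `strip_author_and_date` is hypothetical, and the sample line is the one pasted above. `re.MULTILINE` makes `^` match at the start of every line, and matching is case sensitive by default.

```python
import re

def strip_author_and_date(text):
    """Delete everything from the start of each line up to and
    including the tab that follows the M of AM or PM."""
    return re.sub(r"^.+M\t", "", text, flags=re.MULTILINE)

line = (
    "Ben\tNov 1, 2021, 1:52:03 PM\tBass\t26\t"
    "Dec Bass has two minims instead of semibreve, with underlay 'all things were made'."
)
print(strip_author_and_date(line))
```

Because `.+` is greedy, the match runs to the last “M followed by a tab” on the line, so a comment that itself contains a capital M directly before a tab would be cut short. That is unlikely in practice, but worth knowing.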

Now we have “Bass 26”, separated by tabs. In my editorial notes, I want the bar number first, followed by the instrument name, without tabs.

So the grep code is: ^(.+)\t(\d+)\t
which means “start of line, one or more characters, followed by a tab, then one or more digits, followed by a tab.”
We’ve put some bits in brackets, so we can refer to them in the replace field, which is:
\2, \1: (There should be a space at the end.)

This will swap the order of the bits in brackets, skip the tabs, and replace them with our own choice of punctuation.
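Here is the same swap as a small Python sketch, in case it helps to see the capture groups at work outside an editor. The name `bar_first` is my own invention, and the input is a line that already has the Author and Date fields removed.

```python
import re

def bar_first(text):
    """Turn 'Instrument<tab>Bar<tab>Comment' into 'Bar, Instrument: Comment'."""
    # Group 1 is the instrument, group 2 the bar number; the replacement
    # swaps them and supplies its own punctuation in place of the tabs.
    return re.sub(r"^(.+)\t(\d+)\t", r"\2, \1: ", text, flags=re.MULTILINE)

line = (
    "Bass\t26\t"
    "Dec Bass has two minims instead of semibreve, with underlay 'all things were made'."
)
print(bar_first(line))
```

A Flow title such as “Creed” has no tabs, so it does not match the pattern and is left untouched.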

Screenshot

The processed text can then be pasted into a DTP app or word processor, for styling and layout.

Screenshot

Job done!

I do not have a file with comments in it, so I cannot try this here, but would pasting the raw text into a spreadsheet allow one to delete columns or give one a total mishmash?

Ha!
That’s a nice visual approach, but I think it still requires more manual intervention. (Though I was worried someone would point out an easier way…)

The Flow titles are in the same (first) column as the “Author” field, so you’d have to delete the unwanted columns for each flow individually; or manually restore the Flow titles. Switching the column order is easy, but you’ve still got to replace the tabs with different punctuation after you’ve copied it to your DTP app.

Thank you for the reply.

I got into GREP earlier this year in InDesign and it blew my mind. The sky’s the limit…

If you are going to use a program like Grep, why not import the text into Emacs in the first place? I don’t have a Mac, but I believe the OS is based on Unix, and Emacs should be available in a terminal window.

David

Firstly, I can get the text out of the HTML with copy and paste in the GUI, which would otherwise be another step on the command line.

Next, as can be seen from the images, it’s possible to use Grep patterns within a GUI text editor, which may be easier for those unfamiliar with command line text editors. You still have to get your text into a DTP app, after all.

And for all I know, there may well be text editors for Windows with grep functionality.

And of course, lastly, because vi is the one true Unix editor. :grin:

BBEdit has an excellent grep interface. And it keeps improving. I’ve been using it (or TextWrangler, during its life) longer than any other software, I think – probably 25 years. But of course there are plenty of text editors with grep capability.

Nice tutorial, Ben! Since I have experience with grep and html, I go ahead and work directly on the html, partly because it has become the most convenient way for me to display formatted text that can be read anywhere.

Emacs! Yes!

vi - no!

Ha ha!