XML file crashes Dorico 3

This file opens OK in Sibelius, but crashes Dorico 3. It was created by PDFtoMusic Pro.
Piano Concerto.zip (295 KB)

If it’s any consolation, it crashed Dorico for me, as well. Out of curiosity I tried opening it in Finale and got the following error messages:

XML error in file /Users/vaughan/Downloads/Piano Concerto.xml at line 1,394:
cvc-complex-type.2.4.a: Invalid content was found starting with element ‘beats’. One of ‘{beat-type}’ is expected.

XML error in file /Users/vaughan/Downloads/Piano Concerto.xml at line 2,313:
cvc-complex-type.2.4.a: Invalid content was found starting with element ‘beats’. One of ‘{beat-type}’ is expected.

XML error in file /Users/vaughan/Downloads/Piano Concerto.xml at line 2,335:
cvc-complex-type.2.4.a: Invalid content was found starting with element ‘beats’. One of ‘{beat-type}’ is expected.

XML error in file /Users/vaughan/Downloads/Piano Concerto.xml at line 2,389:
cvc-complex-type.2.4.a: Invalid content was found starting with element ‘beats’. One of ‘{beat-type}’ is expected.

XML error in file /Users/vaughan/Downloads/Piano Concerto.xml at line 2,480:
cvc-complex-type.2.4.a: Invalid content was found starting with element ‘beats’. One of ‘{beat-type}’ is expected.

Further XML validation errors will be ignored.

Cannot handle more than 4 layers in measure 174, staff 6. Notes in higher layers will be ignored. May also occur later in the score.

Finale did finally load the piano concerto but there were a few passages which would need some work. There are obviously some errors in the original file which you should correct in Sibelius before trying to export this piece again. I suspect that some of the problems have to do with beaming over barlines.

The XML file is corrupt. I don’t think the import result in Sibelius is practically usable … look at measures 9, 61…96, 248…252 (content does not fit into the declared measure). There are lots of expressions like this (bar 9) which are invalid:

<time>
	<beats>2</beats>
	<beat-type>8</beat-type>
	<beats>3</beats>
	<beats>8</beats>
</time>
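The pairing problem in a block like this can be checked mechanically. What follows is only a minimal Python sketch using the standard library; the function name `check_time_pairs` is made up for illustration and does not come from any MusicXML tool. It walks the children of a single `<time>` element and reports every place where `<beats>` and `<beat-type>` do not alternate.

```python
import xml.etree.ElementTree as ET


def check_time_pairs(time_xml):
    """Return a list of pairing problems found in one <time> element."""
    problems = []
    expected = "beats"
    for child in ET.fromstring(time_xml):
        if child.tag not in ("beats", "beat-type"):
            continue  # other children (e.g. <interchangeable>) are not checked
        if child.tag != expected:
            problems.append(
                f"found <{child.tag}> where <{expected}> was expected"
            )
        # After <beats> we want <beat-type>, and the other way round.
        expected = "beat-type" if child.tag == "beats" else "beats"
    if expected == "beat-type":
        problems.append("last <beats> has no matching <beat-type>")
    return problems


# The bar 9 block quoted in this thread.
BAR_9 = """
<time>
    <beats>2</beats>
    <beat-type>8</beat-type>
    <beats>3</beats>
    <beats>8</beats>
</time>
"""

if __name__ == "__main__":
    for problem in check_time_pairs(BAR_9):
        print(problem)
```

For the bar 9 block this reports two problems: the second consecutive `<beats>`, and the fact that the final `<beats>` is never closed by a `<beat-type>`.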

Dorico shouldn’t crash, of course, and we’ll prevent this crash in the next update, but the MusicXML file is indeed invalid, and it would be worth you letting the developers at Myriad Software know that PDFtoMusic Pro is exporting invalid MusicXML in this way.

Good idea Daniel! thanks!

Absolutely. PDFtoMusic is potentially a great and very useful product. But it fails miserably so frequently as to not be trusted. We can’t force Myriad to improve their product. But one hopes that if they hear from substantial numbers of users, they might decide it is worth the effort to make the product commercial-quality.

IMO part of the bigger picture is the state of the technical documentation on MusicXML. It is better than it used to be in the sense that there is more of it, but it is very hard to get any global picture of how all the hundreds of individual data elements are supposed to fit together and interact with each other.

Common sense suggests that the snippet posted is invalid in the sense that it doesn’t make sense semantically, but I think you would be struggling to find anything in the documentation that says either “you must define the number of beats exactly once in a “time” block” or “if you define it more than once, the last (or maybe the first?) value is used and the rest are ignored”. Repeat that level of imprecision thousands of times (literally) in the specification, and it’s not surprising that things are in the state they are.

Here is another example that Sibelius imports quite well, and Dorico just hangs. This was just a test, I don’t need this file, but I thought you should know about it, especially since the competition (Sibelius) succeeds at it.
BelkinAdagioSymphonique2.7z (73.7 KB)

Our current development builds of Dorico will open this MusicXML file, but they can’t make much sense of it, I’m afraid. Most of the music fails to import. I know I have inadvertently offended the lovely people at Myriad before by being critical of PDFtoMusic Pro’s output, and it is certainly true that Dorico should be more forgiving where possible of idiosyncratic MusicXML encoding, given that (as Rob rightly points out) the standard is very loosely specified. But I don’t think I am speaking far out of turn in saying that PDFtoMusic Pro’s approach to MusicXML encoding is so dissimilar to that of e.g. Sibelius or Finale that it would require a very large amount of additional work for us to be able to import it reliably, and just at the current moment we do not have the time and resources available to do this.

If you find that Finale and/or Sibelius do a better job of importing MusicXML from PDFtoMusic Pro, then my recommendation would be to use those programs as intermediaries.

FWIW MuseScore 3 throws up some errors about incomplete bars (including weird things like “expected bar length 33/64, actual length 105/192”), but it gets to the end without crashing if you tell it to ignore them.

The end result has obviously gone astray here, for example where it complains about “expected bar length 5/8, actual length 1/1”. Note the gray “phantom rests” and the “+” marks showing some incomplete bars. Not to mention the two slurs over nothing in Vln.2, which actually cover about 40 bars, and the scrambled tempo marking.

I’ve had mixed results trying to use Finale as an intermediary (I haven’t tried Alan Belkin’s file). Often, it apparently reads the MusicXML OK in the sense that it looks right within Finale, but if you export it again you get the same type of problems that were in the original.

I guess Finale’s “laid back” attitude to rhythmic issues is hiding the problems, but not fixing them.

I don’t doubt that the XML standard has some ambiguities, but surely we can all agree no software (Myriad or others) should be putting out files that would have this kind of problem. As you noted, Finale and others might be trying to repair some of these faults behind the scenes when importing junk files. In the end, that might do more harm than good because it ultimately is a “garbage in, garbage out” thing.

My real question is whether or not any of the vendors in the music recognition space are actually very interested in this. If they have concluded the investment is not worth the expected returns, that’s certainly their right to decide. But I do think it is wrong for companies to represent these products as if they are generally useful and more-or-less commercial quality – and continue to take money for them – when the products really don’t do what they say under real world conditions.

For some problems connected with rhythm, the fact that the standard does contain ambiguities is the root cause of the problem. If the standard allows two different ways to interpret the same MusicXML file, there is no way to tell which is the correct one!

Without getting too nerdy, the problem is that notes have lengths defined in terms of “divisions” (similar to MIDI ticks), but the number of divisions per quarter note can be arbitrarily changed anywhere in the file. You can also skip forwards or backwards by any number of divisions. That is useful: for example, you can define the whole of voice 1 on a staff, then skip back to the start of the score and define the whole of voice 2. But it also means that you can (probably inadvertently) change the size of a “division” retrospectively, after you have already skipped over the point where you changed it! (And why would anybody want to do that? Answer: because there are things called “tuplets”, which can mean that a written quarter-note has a different length in different voices on the same staff.)
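To make the divisions mechanism concrete, here is a rough Python sketch of one possible reading, in which every duration is scaled by whichever `<divisions>` value was most recently seen in document order. The measure and the function name `note_onsets` are invented for illustration, and other importers may well read the same file differently, which is exactly the hazard described above.

```python
from fractions import Fraction
import xml.etree.ElementTree as ET

# Hypothetical measure: voice 1 has two quarter notes at 2 divisions per
# quarter; voice 2 has three triplet quarters at 3 divisions per quarter.
MEASURE = """
<measure number="1">
  <attributes><divisions>2</divisions></attributes>
  <note><duration>2</duration><voice>1</voice></note>
  <note><duration>2</duration><voice>1</voice></note>
  <backup><duration>4</duration></backup>
  <attributes><divisions>3</divisions></attributes>
  <note><duration>2</duration><voice>2</voice></note>
  <note><duration>2</duration><voice>2</voice></note>
  <note><duration>2</duration><voice>2</voice></note>
</measure>
"""


def note_onsets(measure_xml):
    """Return ([(voice, onset), ...], final position) in quarter notes.

    Assumes <divisions> is declared before the first duration, and ignores
    chords and grace notes to keep the sketch short.
    """
    divisions = None
    position = Fraction(0)
    onsets = []
    for el in ET.fromstring(measure_xml):
        if el.tag == "attributes":
            text = el.findtext("divisions")
            if text is not None:
                divisions = Fraction(text.strip())
            continue
        text = el.findtext("duration")
        if el.tag not in ("note", "backup", "forward") or text is None:
            continue
        # Scale by the divisions value in force at this point in the file.
        length = Fraction(text.strip()) / divisions
        if el.tag == "backup":
            position -= length
        else:
            if el.tag == "note":
                onsets.append((el.findtext("voice"), position))
            position += length
    return onsets, position


if __name__ == "__main__":
    onsets, end = note_onsets(MEASURE)
    for voice, onset in onsets:
        print(f"voice {voice} starts at quarter {onset}")
    print(f"cursor ends at quarter {end}")
```

Under this reading both voices fill two quarter notes. If the second `<attributes>` block were moved in front of the `<backup>`, the same `<duration>4</duration>` would rewind only 4/3 of a quarter note, and voice 2 would start two thirds of a quarter note into the bar instead of at its beginning.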

As the old map makers said on their charts, “Here be dragons.”

No doubt if you interpret this feature in Myriad’s files “Myriad’s way” they make perfect sense, and so do Sibelius’s files interpreted “Sibelius’s way”. But that isn’t how standards are supposed to work, of course.

Can we ask collectively that the companies publish a DTD (A human/machine readable document that defines the valid XML that they recognize) which they presumably are already using to validate the XML? That way Myriad would have a shot at formatting their XML to the needs of Dorico or Finale, and vice versa. Or one of our enterprising community members could write a translation utility using XSLT.

A DTD doesn’t define semantics. And semantics are the cause of most of the problems.

I don’t think a DTD can even capture basic semantics like “in 3/4 time a bar is supposed to be three quarter-notes long”.
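A check of that kind has to live in code rather than in a DTD. As a rough illustration only, the Python sketch below compares each bar's content against the declared time signature and reports mismatches as fractions of a whole note, in the same spirit as the MuseScore messages quoted earlier. The function name `bar_length_warnings` and the sample part are hypothetical, and the sketch assumes a single simple `<beats>`/`<beat-type>` pair, no grace-note handling beyond skipping them, and no allowance for pickup bars.

```python
from fractions import Fraction
import xml.etree.ElementTree as ET

NOTE = "<note><duration>1</duration></note>"

# Hypothetical part in 3/4: bar 1 holds three quarter notes, bar 2 holds four.
PART = (
    '<part id="P1">'
    '<measure number="1">'
    "<attributes><divisions>1</divisions>"
    "<time><beats>3</beats><beat-type>4</beat-type></time></attributes>"
    + NOTE * 3
    + "</measure>"
    '<measure number="2">' + NOTE * 4 + "</measure>"
    "</part>"
)


def bar_length_warnings(part_xml):
    """Return (measure number, expected, actual) for bars that do not fit.

    Lengths are fractions of a whole note. Pickup bars will be reported too.
    """
    divisions = None
    expected = None
    warnings = []
    for measure in ET.fromstring(part_xml).iter("measure"):
        position = longest = Fraction(0)
        for el in measure:
            if el.tag == "attributes":
                div = el.findtext("divisions")
                if div is not None:
                    divisions = Fraction(div.strip())
                time = el.find("time")
                if time is not None:
                    expected = Fraction(
                        int(time.findtext("beats")),
                        int(time.findtext("beat-type")),
                    )
                continue
            dur = el.findtext("duration")
            if el.tag not in ("note", "backup", "forward") or dur is None:
                continue
            if el.tag == "note" and el.find("chord") is not None:
                continue  # chord notes do not advance the cursor
            # Durations are in divisions per quarter; 4 quarters per whole.
            length = Fraction(dur.strip()) / divisions / 4
            position += -length if el.tag == "backup" else length
            longest = max(longest, position)
        if expected is not None and longest != expected:
            warnings.append((measure.get("number"), expected, longest))
    return warnings


if __name__ == "__main__":
    for number, want, got in bar_length_warnings(PART):
        print(f"bar {number}: expected bar length {want}, actual length {got}")
```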

As Rob says, it is syntax versus semantics. There should not be any XML syntax errors because that is well defined. The standard seems to leave too much room for interpretation.

From the earliest days of UNIX, there was a command, LINT, that analyzed C code. It did the basic syntax checks, of course, but it also did a deeper analysis and flagged things that were likely errors, archaic or deprecated functions, orphaned code and other problems that were more semantic than syntactic.

It seems to me the music technology industry would be well served by an open source project to develop a “MusicXML LINT” program. If a MusicXML file could pass through that scanner without errors or severe warnings, then it could be considered a proper file that any other program should be able to import properly. It seems that the Reaper folks already have much of that code in their DAW. I wonder if they could be persuaded to package just the analysis part of that and place it in the public domain as a starting point.

I’ll stick my neck out here and say that XML is simply the wrong vehicle to attempt to define a music interchange format.

It is (probably) a good enough vehicle to store a serialized version of the definition in a text file, but that is confusing the medium with the message.

In the earliest days of UNIX (when C had only just been invented, and was a much simpler language than it is now) there were indeed tools like LINT. But the published specification of the latest version of C is a 500-page document. The spec of C++ is almost three times as long.

The fact that there is no self-consistent specification written down anywhere for what music notation means doesn’t help, of course. Even a 700-page book like Gould is only an approximate description of modern notation practice (mostly “by example” rather than “by rules”), and it doesn’t cover historical notations (which are still actively used by publishers and performers) at all.

Maybe. But as a practical matter, it is the only reasonable tool widely available today. It will probably be impossible to make MusicXML handle all cases perfectly. But the industry can certainly do a lot better than where we are today.

Just to indicate the height of the mountain to climb: there is already an XSD schema for MusicXML 3.1. I just checked what it says about time signatures (see the invalid snippet in an earlier post).

In fact any number of pairs of “beats” and “beat-type” are permitted, to represent time signatures like 2/4 + 3/8. Also, the “beats” value is not necessarily just a number, so that it can represent something like 2+3+2/8.

But all this is only described in the human-readable annotations, and the machine-readable definitions of “beats” and “beat-type” are both just “string”.

So it is perfectly legal in a MusicXML file to state that the “beats” in a time signature are a copy of the Gettysburg address, and the “beat type” is a copy of the Declaration of Independence.

What it would mean is another question of course (but there is probably an avant-garde composer somewhere who has the answer to that!)
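Because the schema types these values as plain strings, any stricter check has to be done by the reading program. The Python sketch below shows one way that might look; `parse_beats` is a hypothetical helper, and the rule it enforces (whole numbers joined by “+”) is an assumption based on the 2+3+2 example above rather than anything the schema itself requires.

```python
import re

# Assumed grammar: one or more whole numbers separated by "+", e.g. "2+3+2".
_BEATS = re.compile(r"\d+(\+\d+)*")


def parse_beats(text):
    """Return the list of beat groups, or None if the text is not numeric."""
    text = text.strip()
    if not _BEATS.fullmatch(text):
        return None
    return [int(part) for part in text.split("+")]


if __name__ == "__main__":
    for value in ("3", "2+3+2", "Four score and seven years ago"):
        print(repr(value), "->", parse_beats(value))
```

A value that comes back as `None` would pass schema validation but could be flagged as a warning by a stricter reader.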

You have made a very good case that the standard is lacking. To me, that begs for either a better definition (which might take a decade) or a Lint-type tool that could at least flag nonsense as WARNING if not ERROR. Having participated in some industry-level XML negotiations (in the travel industry), I know how tedious any such standards changes can become. I’d like to think a tool like Lint4MusicXML could evolve faster than standards negotiations because it would not be necessary to get something adopted in “the standard” in order to flag it with a WARNING.

I only mention this because it seems Reaper is halfway there already. If several leading companies could get behind that, perhaps there could be some rapid improvement.