Dorico File Format - Commitment to an unencrypted file format

Will Steinberg commit to a policy of an unencrypted file format for Dorico? It is extremely important to my company that they do. If they do not, we would require that every aspect of a created file can be exported in at least an XML format. The reason is that we use external software to add harmonic and melodic agogics using AI. Without this ability our company cannot commit to using this product. For our team it is an essential requirement. The same requirement applies to Cubase.

It appears to be a zip at least, so you might try unzipping it. I know because we use Perforce, which uses the read-only file bit to indicate whether a file is checked out. I tried to save to a file that was not checked out (i.e. read-only) and the message box said “Can’t save the zip file XYZ.dorico”

Otherwise I doubt they’ll want to be constrained to publishing and supporting an external file format like this. It’s constricting and would likely cause trouble in the future as they add new features.

Instead you might look at the scripting API, perhaps combined with importing other external file formats such as MIDI.

I can pretty much guarantee you that the answer to this request will be “no”.

All of the big players in this space have proprietary file formats. Any and all interoperability happens via XML.

Support for XML (import and export) increases with each release, and I highly doubt that trend will slow down until there is full and rich support (which many other users clamor for as well). Whether or not the particulars of what you specifically need from that XML export are implemented to your satisfaction any time soon is anyone’s guess.

But as I said, you can certainly plan on ever-richer XML, but not an open Dorico format. I’ll eat my hat if I’m proven wrong.


I do not expect an external file format to be used. The zip file will be used directly as the source file. Thank you for the response. Very much appreciated.

The big players’ use of encryption forces the interop to go through XML. I am hoping that Steinberg will not encrypt Dorico files, thus allowing interop, albeit undocumented. One obvious application I hope Steinberg will introduce is free software to mark up music in Dorico files for analysis and other purposes. Any thoughts on this?

Will be looking at the zip files tomorrow.

I don’t know that it’s a zip, and it probably is still binary, but it’s common to zip save files and name them as something else (I do this professionally also). It’s usually as a convenient way to combine several output files into one.

I believe this is indeed the case, because there were times on the old forum where people were instructed to change a .dorico file to .zip so the forum would allow them to upload the file. If, therefore, this operation doesn’t do any damage to the file (or indeed, the reversal of it) I would presume it is some form of zip structure but with a custom extension.
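For anyone who wants to check the "is it a zip?" question without guessing, here is a minimal sketch using Python's standard `zipfile` module. It only reports whether a file is a zip container and lists the entry names; it does not read or interpret the entries themselves. The function name `list_archive_entries` and the file name in the usage note are made up for illustration, and nothing here says anything about what Dorico actually stores inside.

```python
import zipfile
from pathlib import Path


def list_archive_entries(path):
    """Return the entry names if `path` is a zip archive, otherwise None.

    Only the archive's table of contents is read; the entries themselves
    are neither extracted nor interpreted.
    """
    path = Path(path)
    if not zipfile.is_zipfile(path):
        return None
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()
```

Usage would look like `list_archive_entries("MyProject.dorico")` (a hypothetical file name). A `None` result means the file is not a zip container at all; a list means it is, whatever the extension says.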

Yes, .dorico files are a zipped collection of various different files, but as stated, the data is in an undocumented binary format. (Not necessarily ‘encrypted’ as such.) Processing the data will be ‘non-trivial’, if not impossible.

When you say “all aspects of the file to be exported in XML”, I think you need to spell out exactly what data you need from the document.

Even if you could decrypt the binary format used for the Dorico project file format, I don’t think you will find it very helpful. You won’t even find any information in the file format about the way the music is notated, since the file format only describes the lowest-level representation of the music, i.e. the streams of events assigned to each instrument. Those events are defined in abstract terms of pitch and duration, and there’s not even a way from the file format to determine what bar or beat each event occurs at without it being extremely convoluted.

In addition, I believe (though I cannot check the precise verbiage at this moment in time, because our web site is currently undergoing maintenance) that reverse-engineering of the Dorico file format is expressly prohibited in the End User License Agreement by which you are bound by installing the software.

However, if you can say more about your specific requirements, it’s quite possible that we will be able to find an approach that will be practical.


From the Steinberg EULA

"A modification of the software is permitted only insofar as far as the software
is capable of such modification in accordance with its intended function. You
may not decompile, disassemble, carry out reverse engineering or try in
another manner to determine the source code of the software, unless this is
permitted by law. Furthermore, you must not modify the binary code of the
software to bypass in any manner the activation function or the use of the
license module (Steinberg key and/or Soft eLicencer). "

I see the phrase “carry out reverse engineering or try in another manner to determine the source code of the software”. From what I understand, working out the data file structure would in no way amount to reverse engineering the product or trying to determine its source code. Could you please clarify? I do not want to get into trouble with your legal team. I do not intend to reverse engineer Steinberg’s software or violate your EULA.

I’m working on exactly what the requirements are for the data - this would be preferable to software adapted to an ad hoc workflow using the Dorico data file. But it would need to include any expression data used to play synthesized music. Some of this expression data is modified, and other data is added, using AI - specifically a neural net trained with data from tensor decompositions of recorded music and performances, to create more realistic instrument sound and articulations. I will work on the details of what the exported XML files would need to contain. I can go into more detail about the goals if that helps you understand what we need, but I’m pretty sure your team knows exactly what I’m writing about. It’s basically everything except the data that is used to make the engraving look pretty - what a loaded term, pretty - anyway, it is the best I could come up with to easily communicate what is needed.

Unfortunately there is a lot of legacy stuff in music synthesis processes that makes such a project difficult, especially in the analysis of data, because phrasing information and dynamic markings are required. MIDI files alone make this ‘impossible’. Most of what we do now is completely synthetic performances of classical-type movie orchestration music. The goal is extremely realistic performances, using training on agogic-type expression data extracted from a reference set of recordings. It is a process of doing semi-automatically something that musicians and audio engineers have to spend insane amounts of time doing - which makes it cost-prohibitive, because using a full symphonic orchestra is less expensive - and even then it needs work. The aim is anything that reduces the cost of producing more realistic orchestrations - so realistic that only synthesis is required, or that a smaller ensemble is enough to meet the production requirements. Any way to cut costs through AI postprocessing and markup reflecting back on the source score information and synthesized sound.

By the way, Dorico is a really amazing product.

Aren’t you simply describing a kind of NotePerformer? As far as I know, NP has no access to Dorico’s internal data structure, yet manages to imitate human playback to a remarkable degree. It doesn’t need the internal data at all, and I guess neither do you. In any case, Steinberg doesn’t seem inclined to give you their specs, and I’d stop insisting they should.

If the XML export provided by Dorico is insufficient (it’s known to be incomplete as yet), you might try to teach the AI to simply read well-engraved music instead. In order to play realistically, the AI would need to know quite a bit about interpretation, musical conventions and traditions, and preferably a bit of music theory.
Just like the professional musicians you’re apparently trying to make jobless. If a full orchestra is less expensive — dammit —, why don’t you take real people? The ‘expression data’ you’re trying to uncover is in the mind of musicians. It’s nowhere in the notation software at all (unless someone puts it there — I hear it’s a lot of work indeed).
You’re right that MIDI is one of the stupidest of all musical representations. It’s about recording the pressing of buttons. If those buttons happen to be pressed by an inspired musician, they may actually happen to produce nice music.


If you worked with an open file format, such as MusicXML, then you’d be able to support many more notation products. Just saying.

And you wouldn’t tie yourself to something that might break every time Steinberg decides to add an attribute to a note / event which required a change to the binary file format.
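To make the MusicXML suggestion concrete, here is a rough sketch of pulling notes and dynamic markings out of an uncompressed MusicXML document with Python's standard `xml.etree.ElementTree`. The element names (`score-partwise`, `part`, `measure`, `note`, `pitch`, `direction`, `dynamics`) come from the MusicXML format; the `SAMPLE` document and the function name `extract_events` are invented for illustration, and a real export from any notation program will contain far more than this handles.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written MusicXML fragment (an assumption, not a real export).
SAMPLE = """<score-partwise version="3.1">
  <part-list><score-part id="P1"><part-name>Piano</part-name></score-part></part-list>
  <part id="P1">
    <measure number="1">
      <attributes><divisions>1</divisions></attributes>
      <direction><direction-type><dynamics><p/></dynamics></direction-type></direction>
      <note><pitch><step>C</step><octave>4</octave></pitch><duration>2</duration></note>
      <note><pitch><step>F</step><alter>1</alter><octave>4</octave></pitch><duration>1</duration></note>
      <note><rest/><duration>1</duration></note>
    </measure>
  </part>
</score-partwise>"""

# Only whole-semitone alterations are given a symbol in this sketch.
ACCIDENTALS = {"-1": "b", "1": "#"}


def extract_events(xml_text):
    """Return dynamics and notes in document order, tagged with the measure number."""
    root = ET.fromstring(xml_text)
    events = []
    for part in root.iter("part"):
        for measure in part.iter("measure"):
            number = measure.get("number")
            for child in measure:
                if child.tag == "direction":
                    # Each child of <dynamics> is a marking such as <p/> or <ff/>.
                    for dynamics in child.iter("dynamics"):
                        for mark in dynamics:
                            events.append(("dynamic", number, mark.tag))
                elif child.tag == "note":
                    duration = int(child.findtext("duration", default="0"))
                    pitch = child.find("pitch")
                    if pitch is None:
                        name = "rest"
                    else:
                        alter = pitch.findtext("alter", default="0").strip()
                        name = (pitch.findtext("step")
                                + ACCIDENTALS.get(alter, "")
                                + pitch.findtext("octave"))
                    events.append(("note", number, name, duration))
    return events
```

Calling `extract_events(SAMPLE)` yields the dynamic marking followed by the three note events for measure 1. The point is that phrasing and dynamics survive in the XML alongside the notes, which is exactly what a plain MIDI file would lose.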

These approaches all have substantial limitations. Producing a Beethoven sonata performance of the caliber of Rudolf Serkin is an example of a goal. As for insisting on the specs, I am not. All I’m asking is whether determining the data structure of the files contained within a Dorico file violates Steinberg’s EULA. I am familiar with the structure of MuseScore’s file format, and I don’t think Dorico’s would be that hard to figure out, although there are possibilities that would make it difficult, like the use of serialized objects. But I will not do this, since there is a possibility of violating the EULA. I will not proceed unless granted permission, which I cannot reasonably expect from Steinberg, so I will move on.

And yes, NotePerformer is a reasonable first-order approximation. In piano music the notes of a chord are often played at slightly different times. For example, playing the leading note slightly before the other notes, by approximately 20-50 milliseconds, changes the perceived tone of the chord. Many performers use these touch-related effects. There are a lot of psychoacoustic aspects to sound, and this is just one of countless examples used by concert pianists. There are subtle gradations of articulation around various notes in the melody, all kinds of subtle things (AI-learned contextual curve fitting for variations). There is also the issue of creating purposeful randomness within certain parameters. Creating a convincing performance that is indistinguishable from a concert pianist’s is a demanding task. This is the sort of thing attempted where I work, with limited success I might add, but we are getting there.
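As a toy illustration of the chord-staggering effect described above (and nothing like the trained model the post refers to), the sketch below moves one note of a chord 20-50 ms ahead of the nominal onset and gives the remaining notes a small bounded random jitter. The function name, the parameters, the 3 ms jitter default and the use of MIDI note numbers as pitch labels are all assumptions made for the example.

```python
import random


def stagger_chord(onset_ms, pitches, lead_pitch,
                  lead_range_ms=(20.0, 50.0), jitter_ms=3.0, rng=None):
    """Return a dict mapping each pitch to a perturbed onset time in ms.

    The lead pitch sounds earlier than the nominal onset by a random amount
    within `lead_range_ms`; every other pitch is shifted by at most
    `jitter_ms` in either direction.
    """
    rng = rng or random.Random()
    if lead_pitch not in pitches:
        raise ValueError("lead_pitch must be one of the chord pitches")
    onsets = {}
    for pitch in pitches:
        if pitch == lead_pitch:
            # Play the lead note ahead of the rest of the chord.
            onsets[pitch] = onset_ms - rng.uniform(*lead_range_ms)
        else:
            # Bounded randomness so the other notes are not perfectly aligned.
            onsets[pitch] = onset_ms + rng.uniform(-jitter_ms, jitter_ms)
    return onsets
```

For example, `stagger_chord(1000.0, [60, 64, 67], lead_pitch=67, rng=random.Random(0))` returns an onset for note 67 somewhere between 950 and 980 ms and onsets within 3 ms of 1000 ms for the other two. Passing a seeded `random.Random` keeps the "purposeful randomness" reproducible between renders.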

At this point I will not inquire further. I think using the XML export will be just fine, with tools and file formats developed internally to add and control all this other crazy stuff. It gets pretty gnarly :(

Thank you. Great recommendation.

In the end, a human performer just looks at the sheet music.

Yes, and that is what makes human performers so amazing. And of course they experiment in the practice room as a sort of lab to find the sound they want, and they also adapt to the concert hall and instrument. All of which makes performance a pretty amazing thing.

I agree that the EULA does not appear to explicitly specify whether or not it is acceptable to attempt to reverse-engineer any of our applications’ file formats. I will seek clarification on this point, but it may take a while to get it. My gut feeling is that it is intended that the prohibition on reverse-engineering the application extends also to its file formats, but I will try to establish that with certainty.

I also agree with other posters in this thread that MusicXML (perhaps ideally also making use of that format’s features for encoding MIDI performance data, from those applications that export it – Dorico is not among them at the present time) would seem on its face (and without a detailed understanding of your specific requirements) to be the ideal format to use for this purpose.

Who? Serkin, Argerich, Lang Lang, Horowitz, Staier? Or their average?


An ‘interpretation’ that will be accepted as another ‘interpretation’ in the same class as that of concert pianists. AI - a term I do not like - is used to enhance human ability in this case, not replace it. Take, for example, what two A players demonstrated when they collaborated using chess engines and won against the strongest chess engine in the world. The game played against the computer was a game played by humans using ‘machines’. The goal here is not automatic AI but AI that aids in producing great interpretations by musicians. Thanks for the great question. I leave a question for you - can a great violinist create piano music on par with a great pianist, using technology to bridge the gap? That is the question I’m interested in, not the general one of an original aesthetic interpretation being produced by ‘AI’. In my case I am experimenting with the idea that tensor decomposition together with machine learning can bridge this gap. I am not interested in a novelty approach of training AI with a set of Horowitz recordings and then inputting sheet music he never recorded. Although it might be fun to see what happens - excluding the fun of probably being sued by the holders of the copyrighted pieces used to train the AI.