Voices or layers

Thank you so much, James. I really appreciate the compliment.

And thank you for contributing to the discussion.

Daniel at Steinberg wrote:

… I can’t see us changing the term from “voice” at this stage, though I will suggest to our documentation team that we cross-reference voices from “layers”.

We are using the shortcut Shift-V to start a new voice or layer…and would rather not have to completely reorganise all of our key commands at this stage.

As a compromise, you might use the plural: Voices 1, Voices 2* and make it clear in the documentation that any number of actual musical voices may be entered in each level. This would preserve both correct musical terminology and Shift-V.

*or Voice(s) 1, Voice(s) 2

This is a very good solution, since it doesn’t assign an alternate meaning to established terminology.

The low level playback code should support the option of independent voices for an instrument, which can be routed separately. As with many other things, it’s quite possible that the support won’t be there in the initial version to control this from the UI as it adds an extra layer of complexity, but please be assured that we are thinking about this use case.

To me, having worked with notation software since the mid-1980s (KCS Level II + Copyist, Cubase, Emagic [now Apple] Notator/Logic, Sibelius, Finale, MuseScore, and many more)…

The term ‘layer’ is a visual thing. Imagine putting different elements of a score on different transparent sheets. When you align all the sheets on top of one another, you get the entire ‘score’. This applies to DTP and 2D paint/illustration software as well.

We also used ‘layers’ when designing marching drills for ‘corps style’ marching band on a USA football field. E.g. odd-numbered ‘sets’ would be drawn on a white page, while even-numbered sets and alphanumeric pages would be on transparent sheets bound into the drill book. When you flipped a transparent ‘layer’ on top of a ‘white layer’ you could see both ‘sets’ in relationship to one another. Flip another transparent page, and you could see ‘special pathing’ information in a new color showing the routes each marcher should follow. Some more complex ‘movements’ might have multiple ‘transparent layers’ before moving to the next ‘white page’. These days many use Pyware on a computer that ‘animates’ the drill and don’t bother with the transparent pages anymore, but still…the term ‘layer’ has to do with the ‘visual’ elements of a presentation, not the ‘auditory’ ones.

Quite a few of the scoring packages I’ve used allow you to define what sort of elements are applied to which ‘layer’ of a score. E.g. you could have the staves, bars, and clefs all on layer 1; the notes, articulations, and hairpins on layer 2; and directional text and dynamic markings on layer 3. Titles, headers, footers, page numbers, and publishing crop marks are often on layers of their own as well. Some might even allow you to have multiple ‘note and stave layers’ that you can overlap so they seem to be only one layer when ‘printed’ on paper.
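To make the idea concrete, here is a minimal sketch (in Python, with invented layer numbers and element names, not taken from any actual scoring package) of the kind of element-to-layer mapping being described:

```python
# Hypothetical element-to-layer mapping of the kind described above.
# Layer numbers and element names are illustrative only.
SCORE_LAYERS = {
    1: ['staves', 'bars', 'clefs'],
    2: ['notes', 'articulations', 'hairpins'],
    3: ['directional text', 'dynamic markings'],
    4: ['titles', 'headers', 'footers', 'page numbers', 'crop marks'],
}

def layer_of(element):
    """Return the layer an element type is drawn on, or None if unassigned."""
    return next((n for n, items in SCORE_LAYERS.items() if element in items), None)

print(layer_of('hairpins'))  # -> 2
```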

Voices, on the other hand, have to do with independent polyphonic melodies that might be sharing a stave. E.g. entering a four-part organ fugue, where different rhythmic lines (or counter-melodies) happen on the same staff. Or scoring drum parts, where the overhead kit pieces get a rhythmic line independent from the snare, and maybe even yet another one for the kick drum. A single ‘voice’ can indeed carry its own built-in ‘harmony’ (such as a run of parallel intervals) while still being treated as one voice.

As far back as I can remember, the term for contrapuntal writing on a single stave in software has been ‘voice’. In music theory class, when doing analysis of Baroque-era contrapuntal pieces, we usually used the terms ‘voice’, ‘melody’, or ‘counter-melody’. E.g. ‘Please examine voice 1 in measure 22 of the Little Fugue in G Minor’ (then we’d look at the topmost voice on the topmost staff). Or ‘Please examine the bass melody (voice 5) in this organ piece with five-part invention’, and we’d look at the bass pedal part written on ledger lines below, but very much a part of the left-hand staff.

So in my mind…unless I’m working with a synth or sampler doing micro-level sound design…
‘layers’ have to do with visual elements of desktop publishing, while ‘voices’ have to do with independent melodies or rhythm groups on a single ‘stave’.

It’s not really a big deal to me what it’s called, but I’ve totally gotten used to the terms ‘voice’ or ‘melody’ when referring to the ‘musical sounds represented by the notes on the page’. In similar fashion, the word ‘layer’ leads me to imagine the ‘visual’ elements of a printed or otherwise ‘displayed’ presentation (be it music, drill/choreography sets, text layouts, or even CG work in video or PowerPoint presentations).

If I recall correctly, Finale will let you do a bit of both.
E.g. you could have up to 4 independent ‘voices or melodies’ on a stave (or 8 on a ‘grand staff’). You might also decide to ‘layer’ another stave on top of an existing one, plus have multiple ‘voices’ on that ‘top’ stave layer as well.

It is true that at some point in the late 1990s, synth and sampler engineers started applying the term ‘layer’ to more complex patches/programs in multitimbral synths. E.g. a default ‘patch’ or ‘program’ might get two oscillators and a bundle of filters that could be applied to it. If you linked two patches together but controlled them over the same MIDI channel, that would become a ‘combi’. Later they added extra ‘polyphony’ over independent MIDI channels, and some people started referring to the practice of copying the same track in a DAW (or echoing a track with AUX MIDI sends, or having a live controller echo MIDI events) and sending it to two different instruments over two different MIDI channels as ‘layering patches’.

Really, the term ‘layer’ didn’t become popular in synth/sampler patch/program design until folks started moving to VST plugins and suddenly got access to nearly unlimited numbers of oscillators. The old terms ‘combi’ and ‘multi patch’ gradually got shifted aside as the flexibility of these synth and sampler engines grew by leaps and bounds. So technically, even when working with low-level sound design tools, the term ‘layer’ is fairly rare. Terms like ‘region’, ‘zone’, ‘combi’, ‘multi’, etc. are far more common names (though some of the documentation might use the word ‘layer’ to help describe what those terms mean). Of course, when one takes to YouTube videos, or buys a set of ‘how to’ books, the term ‘layer’ does start to pop up more often than it should when working with synths and samplers.


Thank you, Brian, for this in-depth contribution.

While Daniel has made it clear that further discussion on this topic is fruitless with respect to changing the terminology in Dorico, I will indulge you by addressing your general points.

I recognize that using the word ‘layer’ as intended by John Ruggero and myself could indeed prove problematic in view of further development of Dorico. To my knowledge, no such conflict presently exists, however. My most pressing reason for suggesting the adoption of Finale’s more recent (and commonly used) terminology was to avoid having to conjure up an entirely new word for this purpose, but you make a good case for this being unavoidable.

While your explanation of the term ‘voices’ in a limited, double-stemmed (or, to be fair, multiple-stemmed) context is entirely correct, this reasoning seems to me to miss the point of the discussion entirely. At least in my own view, there is a need for notation software to be able to distinguish the different musical ‘planes’ of single notes or chords established by multiple stems on the same staff from a single melodic line or chord note, which is what constitutes a voice in traditional musical terminology. As it is, by using ‘voices’ to describe both these aspects, an important level of descriptive precision is inherently lost.

Your statement ‘It’s not really a big deal to me what it’s called’, while in line with certain earlier comments in this thread, nevertheless expresses an indifference that leaves me a bit discouraged. As I’ve stated earlier, I am not a native English speaker. In fact, I am Norwegian, and as such, my self-interest in this matter is fairly limited. I would, however, have expected people who are more invested in English musical terminology than myself to be more in favor of such a distinction than this thread seems to imply. Then again, Sibelius and other scoring software’s long-standing misuse of the term may well be irreversible.

I think I better understand your point, Knut…

You seem to be talking about ‘chord voicing’ as opposed to ‘contrapuntal melodic’ voicing. Typically that is done by a music theory routine that analyzes all the notes sounding on a stave and checks what sort of chord and voicing they form, based on note names and intervals. Cubase Pro can already do this: it can run real-time analysis of what is on a track and do a pretty good job of guessing the corresponding chord symbol (which is, of course, reliant on how the chord is voiced).
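Purely as a toy illustration of the interval-based approach described here (and certainly not Cubase’s actual algorithm), a chord guesser might reduce the sounding notes to intervals above the lowest note and match them against known shapes:

```python
# Toy interval-based chord recognition; real analyzers also weigh inversions,
# spelling, and context. All names here are invented for illustration.
PITCH_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']
TRIADS = {(0, 4, 7): 'maj', (0, 3, 7): 'min', (0, 3, 6): 'dim', (0, 4, 8): 'aug'}

def guess_chord(midi_notes):
    root = min(midi_notes)  # naive: assumes root position
    intervals = tuple(sorted({(n - root) % 12 for n in midi_notes}))
    quality = TRIADS.get(intervals)
    return f"{PITCH_NAMES[root % 12]}{quality}" if quality else None

print(guess_chord([60, 64, 67]))  # C-E-G -> 'Cmaj'
```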

In my mind…when thinking of ‘voices’ in anything based on General MIDI protocols…I just think ‘MIDI channel number’. If it’s on channel 1 in the track, it belongs together, and everything on channel 1 gets its own set of rules for stems and rests. If it’s on channel 2, it belongs together and gets a fresh set of rules for stems and rests…and so forth. Under the hood, this is basically what current generations of scoring software do. Each MIDI event on the stave/track gets a MIDI channel number that matches its ‘voice’.
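A minimal sketch of that channel-equals-voice bookkeeping might look like this (all names invented; no actual scoring package is being quoted):

```python
# Hypothetical score model where the MIDI channel doubles as the 'voice' ID,
# so everything on one channel shares stem/rest rules on the stave.
from dataclasses import dataclass

@dataclass
class NoteEvent:
    pitch: int       # MIDI note number
    start: float     # position in quarter notes
    duration: float
    channel: int     # the 'voice' this event belongs to

def events_for_voice(events, voice):
    """Collect every event that shares one set of stem/rest rules."""
    return [e for e in events if e.channel == voice]

stave = [
    NoteEvent(67, 0.0, 1.0, channel=1),  # voice 1: upper melody
    NoteEvent(60, 0.0, 2.0, channel=2),  # voice 2: counter-line
]
print(events_for_voice(stave, 1))
```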

Dorico may well have a new advantage that the others might not implement for quite some time yet. VST3 allows ‘each note’ to hold on to its own note on/off/velocity/expression data, as opposed to using MIDI channel data. Of course, one will need an instrument capable of doing VST3 note expression (the HALion engine can do this).

As an example…in Cubase Pro with HALion and a patch that implements VST3 note expression, instead of storing an sfz attack as channel data on a CC1 controller lane, you can double-click the note itself: a window pops up, and you can draw that sfz attack/decay curve to apply to the ‘individual note’. From then on, dragging or copying that note would retain the sfz data that was painted into it. Another example might be having two notes divisi-style on the same staff, with one doing a slight pitch bend upwards while the other does a slight pitch bend downwards. Before VST3, the only way to do these sorts of tricks was to use separate staves/tracks/MIDI channels for the two divisi notes; the sequencer just played whatever CC events happened to be on that channel.
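In data terms, the shift is from a channel-wide controller lane to expression curves stored on the note itself. A rough sketch (illustrative only; this is not the actual VST3 SDK API):

```python
# Hypothetical per-note expression, in the spirit of VST3 note expression:
# the curves travel WITH the note when it is dragged or copied.
from dataclasses import dataclass, field

@dataclass
class Note:
    pitch: int
    start: float
    duration: float
    expression: dict = field(default_factory=dict)  # name -> [(time, value), ...]

# An sfz-style attack/decay curve painted onto one note:
sfz_note = Note(64, 0.0, 1.0, expression={'volume': [(0.0, 1.0), (0.2, 0.4)]})

# Two divisi notes on the same staff bending in opposite directions --
# impossible with a single channel-wide pitch-bend lane:
up   = Note(69, 2.0, 2.0, expression={'tuning': [(0.0, 0.0), (1.0, +0.5)]})
down = Note(65, 2.0, 2.0, expression={'tuning': [(0.0, 0.0), (1.0, -0.5)]})
print(sfz_note.expression)
```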

I have no idea if Dorico is taking advantage of the VST3 ‘note expression’ protocol, but it seems to me that this could eventually allow MUCH more flexibility in shifting/cutting/copying/pasting individual notes just about anywhere on the score you like while retaining all kinds of information that used to be more or less lumped together as MIDI CC channel data. I am beginning to envision easy ways to just click groups of notes that you might want in the same ‘auditory plane’ and ‘link them’ into a ‘group’ that could then be named anything YOU like as the composer. In this case, as the end user, you wouldn’t really need to keep track of ‘voice/channel numbers’ anymore. At a global level, you could even assign default colors to the different groups or planes.

Example: Imagine if you could hold down the ctrl key, then click a dozen or so notes one at a time anywhere on the entire score, no matter where they are, then ‘link’ them as a group and give that group a unique name…then ‘connect’ that group to whatever ‘instrument(s)’ you like as its ‘end-point’ where it gets played. When hitting the playback button, the sequencer would then send all the events in all of your groups to their assigned ‘end-points’ based on where they are in the timeline. Of course each group could share elements along the timeline if desired; it’s the ‘end-point’ that controls what you’d hear from a ‘playback’ engine. This would be an incredibly powerful and flexible way to handle playback. VST3 ‘note expression’ gets us a step closer to having that kind of ultimate flexibility.
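As a sketch of how such grouping and routing could hang together (entirely hypothetical; nothing here reflects Dorico’s internals), consider:

```python
# Hypothetical note groups routed to named end-points. The playback engine
# would resolve each end-point to an instrument; all names are invented.
from dataclasses import dataclass

@dataclass(frozen=True)
class Note:
    pitch: int
    start: float
    duration: float

groups = {}     # group name -> set of notes picked anywhere in the score
endpoints = {}  # group name -> playback destination

def link(name, notes):
    groups.setdefault(name, set()).update(notes)

def route(name, endpoint):
    endpoints[name] = endpoint

link('bass pedal', {Note(36, 0.0, 4.0), Note(43, 4.0, 4.0)})
route('bass pedal', 'Organ Pedal 16ft')

def playback_schedule():
    """Flatten every group into (time, pitch, endpoint) for the engine."""
    return sorted((n.start, n.pitch, endpoints[g])
                  for g, notes in groups.items() for n in notes)

print(playback_schedule())
```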

In short, VST3 could someday ‘free the user’ from many of the current limitations that have been introduced over the years simply due to limitations of the General MIDI protocol.

No, not exactly. To clarify, this is what I mean:

The term ‘chord voicing’ is derived from the traditional understanding of ‘voice’, and doesn’t make sense in relation to most scoring software’s definition of voices, since those can be both mono- and polyphonic.
[attached screenshot: notation example]

I see. One only needs 2 channels (or two voices) in most scoring software I’ve ever seen to achieve this effect.

This ‘chord voicing’ concept doesn’t seem very useful either for engraving or analysing keyboard music, where the number of notes can vary arbitrarily from chord to chord.

[attached image: keyboard chords example]

We seem to agree that there is a difference in the meaning of the word ‘voice’ when dealing with harmonic analysis vs counterpoint analysis, but the current standard in scoring software is to go by the counterpoint concept.

A stave voice, in terms of MIDI-based scoring software, does not refer to ‘chord voicing’. It refers to ‘melodic/rhythmic themes’ (even if a melodic line includes harmonies, such as a series of note pairs in parallel 3rds) that share a staff but require independent engraving rules.

In harmonic analysis, ‘voicing’ applies to how a chord is stacked. There may be different sets of rules one can go by in representing chord voicings with shorthand symbols, e.g. Baroque-era figured bass, or West Coast lead-sheet jazz chords.

In contrapuntal analysis, ‘voicing’ refers to ‘melodic themes’. These themes typically need independent engraving rules to be legible when sharing a stave with another melodic ‘theme’, e.g. a fugue for organ, or a drum stave with multiple instruments playing very different rhythmic patterns notated on it. In scoring software, this is what is meant by the term ‘voice’ on a stave. ‘Layers’ exist in some scoring packages as well, where you can literally superimpose another stave on top of an existing one and effectively double the number of voices available.

Consider pipe organ music, where you have more than one contrapuntal idea going on in the same stave. You might even have a different set of STOPS for each ‘theme’ and play them on multiple keyboards. Each melodic ‘theme’ using the same stop settings would be a ‘voice’.

In the keyboard image above, you do not need multiple ‘voices or layers’ at all. All of the notes in the top stave share the same engraving rules, and every note in the bottom stave shares the same engraving rules. There are no contrapuntal themes that require an isolated set of engraving rules with independent stems/rests/etc. This passage has only one ‘theme or voice’.
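To pin down what ‘independent engraving rules’ means here, a minimal sketch (with invented defaults; real programs expose far more options):

```python
# Hypothetical per-voice engraving defaults: a common convention is voice 1
# stems up / voice 2 stems down, each with independently positioned rests.
VOICE_RULES = {
    1: {'stem_direction': 'up',   'rest_offset': +2},  # offsets in staff steps
    2: {'stem_direction': 'down', 'rest_offset': -2},
}

def engraving_rules(voice):
    """Fall back to automatic stemming when only one voice is in play."""
    return VOICE_RULES.get(voice, {'stem_direction': 'auto', 'rest_offset': 0})

print(engraving_rules(2))  # -> {'stem_direction': 'down', 'rest_offset': -2}
```

With only one voice on each stave, as in the keyboard example, every note simply falls under the single default rule set.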

To me, ‘chord voicing’ simply refers to the makeup of a chord, i.e., the number and order of the voices that form it. The texture may vary greatly from one chord to the next without challenging the legitimacy of the term.

The term ‘voice leading’, however, can indeed be problematic in certain contexts of rapidly shifting textures or disjointed harmonic relationships. It’s important to remember, though, that most of our musical nomenclature has been established within certain tonal and harmonic boundaries, and that those terms will not always make much sense when applied to music without a clear tonal centre or harmonic structure.

That said, in the context of keyboard music, a voice may split or join into other voices at any time without necessarily obstructing or conflicting with a clear relationship of voice leading. And while the tonality of your example is certainly free, it’s not difficult to point out a clear voice leading relationship between the chords if not confounded by the literal representation of register and duration in the score.

Brian,

While I certainly appreciate your approach to this question from a more technical, computer-based perspective, I do feel that there is no reason for these finer technical points to influence the front-end terminology of a scoring application in this particular case. After all, we are talking about an application that for the most part will concern itself with music notation within well established and familiar parameters, and I personally think it sufficient that the terminology makes sense analytically.

However, a lot of what you’ve touched upon has made me wish for the day when scoring software technology renders the distinction between what I call layers and voices obsolete. It would be great if voices, in the true sense of the word, could be freely linked together or applied to a certain stem, while at the same time being routed to a unique MIDI channel number without being subject to any particular visual restrictions (such as stem direction). I don’t know if this would be possible, or indeed practical, within current technology, but it sure would be nice if it were.

If my current understanding of VST3 is correct, it should go a long way towards making your vision easier to achieve. Also, from what I hear about Dorico playback being based on ‘end-points’, it sounds like they might be on the right path to creating an environment where you can someday group and link notes any way you like (no matter where they are on a page), ‘name’ those groups anything you like, and tag them for various engraving and playback rules and window dressings. Since VST3 events can carry a bit more data on their own, and fully VST3-compliant instruments aren’t as tied down to ‘channel controller messages’, it seems to me that a lot of doors should open up in the not-so-distant future, making things far more flexible and creative.

That does indeed sound very promising, even if it makes it hard to understand how the current definition of voices in Dorico will fit into this picture. The best thing, I think, would be to rename that entire aspect of the application if such a flexible treatment of voices is ever implemented, but that would probably be problematic for a number of different reasons, and is most likely never going to happen, unfortunately.

The problem may come down to the fact that some composers now think of melodic lines differently than in the past.

During the 18th century and much earlier, composers counterpointed single-note “voices” against each other to produce harmony, as shown by the use of individual stems on every note of a chord to indicate that each note was part of a different voice.


In the 19th century, voices that shared the same rhythm were stemmed together for convenience and aesthetics, but most composers still thought in terms of single-note voices producing harmony. There are composers today who think of a “voice” in the same way and need the term in discussing music of this type.

In the 20th century, however, Impressionistic composers colored melodies by adding notes that were not a true counterpoint but an enhancement of the overtones, similar to octave doubling. This technique is now used in jazz, pop and concert music, and composers who use it have no trouble considering a “voice” to consist of any number of simultaneous notes.

In the example given by Brian Roland, I hear several voices in free keyboard style moving in the same rhythm—not one voice enhanced by coloristic notes. Others might disagree. But if we don’t share the same language, discussing the matter will be difficult.

For that reason, I think that the term “voice” should preserve its standard dictionary definition as a single-line melody. Another term might be used for melodies enhanced by color notes.

Since it makes no difference from the point of view of music engraving whether Roland’s example is a single voice or many, a special engraving term might be used for notes that share a common stem. But that term should not be “voice”, because that term is already taken. “Layer” is one possibility.

It fits because the potential would be there for you, the user, to ‘link objects’ and ‘name them’ any way you like according to your own conventions and wishes.

You could group things any way you like, and name them anything you like, in any language you like.

Imagine you have a full score in front of you. Imagine that you can go through and ctrl-click on a number of notes anywhere in that score, then choose to ‘group’ them and tag that ‘group of objects’ to have any name and eventual purpose you like.

Imagine that later you assign that group to any end-point you like (where end-points link up with your playback engine to choose the instrument(s) that will sound).

Next, imagine you can set rules for these ‘groups’ that have to do with engraving, playback, and more.
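Continuing the earlier sketch, tagging a named group with its own engraving and playback rules might look like this (group and endpoint names are invented for illustration):

```python
# Hypothetical rule tags attached to a named group; the engraving and
# playback engines would each consult their own domain.
group_rules = {
    'bass pedal': {
        'engraving': {'stem_direction': 'down', 'colour': 'blue'},
        'playback':  {'endpoint': 'Organ Pedal 16ft', 'velocity_offset': 10},
    },
}

def rules_for(group, domain):
    """Look up one rule domain ('engraving' or 'playback') for a group."""
    return group_rules.get(group, {}).get(domain, {})

print(rules_for('bass pedal', 'playback'))
```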

At this point, you become free from the actual confines of the ‘stave’ itself. It starts to work more like an object based DTP program.

I honestly do not know how far Dorico plans to go with this approach, but the VST3 protocol, as I understand it, can make it much easier for developers to build a code base and user interface that ‘free the user’ from the confines of the old MIDI-channel-based playback engine.