VST3 HOA Support > 3rd order

If I am not mistaken, VST3 speaker arrangements are based on a bit set, and it seems that in the Speaker Definitions array you do not want to use the same bit index for multiple things (even though I can’t imagine a situation where that would be a problem, since you are using either ambisonics or speaker layouts, never both). As a result, it is not possible for VST3 plug-ins to support > 4th order channel counts, since there are not enough bits left.

Since HOA support is a fairly recent addition, why not correct this in the SDK ASAP, so that kSpeakerACN0 = 0 and we can do 7th order HOA (64 channels) in VST3 plug-ins, like we can with VST2 in Reaper?

const Speaker kSpeakerACN0  = (Speaker)1 << 20;	///< Ambisonic ACN 0
const Speaker kSpeakerACN1  = (Speaker)1 << 21;	///< Ambisonic ACN 1
const Speaker kSpeakerACN2  = (Speaker)1 << 22;	///< Ambisonic ACN 2
const Speaker kSpeakerACN3  = (Speaker)1 << 23;	///< Ambisonic ACN 3
const Speaker kSpeakerACN4  = (Speaker)1 << 38;	///< Ambisonic ACN 4
const Speaker kSpeakerACN5  = (Speaker)1 << 39;	///< Ambisonic ACN 5
const Speaker kSpeakerACN6  = (Speaker)1 << 40;	///< Ambisonic ACN 6
const Speaker kSpeakerACN7  = (Speaker)1 << 41;	///< Ambisonic ACN 7
const Speaker kSpeakerACN8  = (Speaker)1 << 42;	///< Ambisonic ACN 8
const Speaker kSpeakerACN9  = (Speaker)1 << 43;	///< Ambisonic ACN 9
const Speaker kSpeakerACN10 = (Speaker)1 << 44;	///< Ambisonic ACN 10
const Speaker kSpeakerACN11 = (Speaker)1 << 45;	///< Ambisonic ACN 11
const Speaker kSpeakerACN12 = (Speaker)1 << 46;	///< Ambisonic ACN 12
const Speaker kSpeakerACN13 = (Speaker)1 << 47;	///< Ambisonic ACN 13
const Speaker kSpeakerACN14 = (Speaker)1 << 48;	///< Ambisonic ACN 14
const Speaker kSpeakerACN15 = (Speaker)1 << 49;	///< Ambisonic ACN 15
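
To put numbers on the bit budget: a full 3D ambisonic signal of order N has (N + 1)^2 channels. The sketch below assumes only that Speaker is a 64-bit word and that every channel needs its own bit; the helper names hoaChannelCount and fitsInBitset are made up for illustration and are not part of the SDK.

```cpp
// Hypothetical helpers, not part of the VST3 SDK.

// Number of channels in a full 3D ambisonic signal of the given order.
constexpr int hoaChannelCount (int order)
{
	return (order + 1) * (order + 1);
}

// True if every channel of that order can get its own bit when
// `freeBits` bits of the 64-bit Speaker word are still unassigned.
constexpr bool fitsInBitset (int order, int freeBits)
{
	return hoaChannelCount (order) <= freeBits;
}

static_assert (hoaChannelCount (3) == 16, "3rd order: the 16 ACN bits listed above");
static_assert (hoaChannelCount (7) == 64, "7th order would need the whole 64-bit word");
static_assert (!fitsInBitset (7, 63), "a single bit used elsewhere already rules out 7th order");
```

So 7th order only fits if no bit at all is reserved for anything else, which is why it collides with the existing speaker bits.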

Hi
I think we need to adopt a different approach for more than 64 channels. I will make a proposal in which the bitset arrangement gets a new mode that defines a configuration directly by the number of used channels. For now we keep the bitset as it is and support only 3rd order for HOA…
Stay tuned…
Best Regards

sounds good

Are there any updates? The current situation with the VST3 layouts is not really suitable for inputs/outputs that are independent of standard loudspeaker layouts. See this thread: https://sdk.steinberg.net/viewtopic.php?f=4&t=624
This matters especially for Higher Order Ambisonic plug-ins, wave field synthesis, multi-source encoders/renderers, and loudspeaker decoders for an arbitrary number of loudspeakers (e.g. for HOA, VBAP, …).

I am not really familiar with what’s going on within the VST3 standard, but I guess a quick solution would be allowing discrete channels as a default layout, so DAW / plug-in host manufacturers can allow non-standardized loudspeaker layouts.

Opened an issue regarding this matter here: https://github.com/steinbergmedia/vst3sdk/issues/28
With spatial audio technologies like Ambisonics or even WFS, the lack of an arbitrary discrete channel layout is a big problem for plug-in developers and makes VST3 unsuitable for these applications. Porting from VST2 is simply not possible at the moment.

Approaching nearly 2 years later… Have there been any developments regarding this issue?

One issue is that, with the current definition of the bus arrangement, it is not possible to extend to complex speaker arrangements or to new arrangements with many channels, such as 7th-order HOA, without breaking compatibility with already released plug-ins and hosts.
We are thinking of adding a new dedicated interface that defines a new way to describe speaker arrangements. Here is a proposal:

//----------------------------------------------------------------------
namespace Steinberg {
namespace Vst {
	
//------------------------------------------------------------------------
/** \defgroup vst3typedef VST 3 Data Types
*/
/*@{*/
//------------------------------------------------------------------------
// Speaker Arrangements Types
//------------------------------------------------------------------------
typedef uint64 ExtSpeakerArrangement;
/*@}*/

//------------------------------------------------------------------------
// Extended speaker Arrangement Macros
#define SPEAKER_ARRANGEMENT_FORMAT(enumVal, numChannels) \
	(((Speaker)1 << 63) | ((Speaker)(enumVal) << 16) | ((numChannels) & 0xFFFF))
#define SPEAKER_ARRANGEMENT_FORMAT_NUMCHANNELS(speakerArrFormat) \
	(static_cast<uint16> ((speakerArrFormat) & 0xFFFF))
#define SPEAKER_ARRANGEMENT_FORMAT_ENUM(speakerArrFormat) \
	(static_cast<uint16> (((speakerArrFormat) >> 16) & 0xFFFF))
// The mask is parenthesized because != binds tighter than &.
#define IS_SPEAKER_ARRANGEMENT_FORMAT(speakerArrFormat) \
	(((speakerArrFormat) & ((Speaker)1 << 63)) != 0)

enum eSpeakerArrangementIndex : uint16
{
	kESAF_Empty = 0,		///< empty arrangement
	kESAF_Mono,				///< M
	kESAF_Stereo,			///< L R
	kESAF_StereoSurround,	///< Ls Rs
	kESAF_StereoCenter,		///< Lc Rc
	kESAF_StereoSide,		///< Sl Sr
	kESAF_StereoCLfe,		///< C Lfe
	kESAF_StereoTF,			///< Tfl Tfr
	kESAF_StereoTS,			///< Tsl Tsr
	kESAF_StereoTR,			///< Trl Trr
	kESAF_StereoBF,			///< Bfl Bfr

	kESAF_30Cine,			///<  L R C
	kESAF_30Music,			///<  L R S
	kESAF_31Cine,			///<  L R C   Lfe
	kESAF_31Music,			///<  L R Lfe S
	kESAF_40Cine,			///<  L R C   S (LCRS)
	kESAF_40Music,			///<  L R Ls  Rs (Quadro)
	kESAF_41Cine,			///<  L R C   Lfe S (LCRS+Lfe)
	kESAF_41Music,			///<  L R Lfe Ls Rs (Quadro+Lfe)
	kESAF_50,				///<  L R C   Ls Rs
	kESAF_51,				///<  L R C  Lfe Ls Rs
	kESAF_60Cine,			///<  L R C  Ls  Rs Cs
	kESAF_60Music,			///<  L R Ls Rs  Sl Sr
	kESAF_61Cine,			///<  L R C  Lfe Ls Rs Cs
	kESAF_61Music,			///<  L R Lfe Ls  Rs Sl Sr
	kESAF_70Cine,			///<  L R C   Ls  Rs Lc Rc
	kESAF_70Music,			///<  L R C   Ls  Rs Sl Sr
	kESAF_71Cine,			///<  L R C Lfe Ls Rs Lc Rc
	kESAF_71CineFullRear,	///<  L R C Lfe Ls Rs Lcs Rcs
	kESAF_71Music,			///<  L R C Lfe Ls Rs Sl Sr
	kESAF_80Cine,			///<  L R C Ls  Rs Lc Rc Cs
	kESAF_80Music,			///<  L R C Ls  Rs Cs Sl Sr
	kESAF_81Cine,			///<  L R C Lfe Ls Rs Lc Rc Cs
	kESAF_81Music,			///<  L R C Lfe Ls Rs Cs Sl Sr

	/*-----------*/
	/* 3D formats */
	/*-----------*/
	kESAF_80Cube,			///<  L R Ls Rs Tfl Tfr Trl Trr (40_4)
	kESAF_71CineTopCenter,	///<  L R C Lfe Ls Rs Cs Tc
	kESAF_71CineCenterHigh, ///<  L R C Lfe Ls Rs Cs Tfc
	kESAF_71CineFrontHigh,	///<  L R C Lfe Ls Rs Tfl Tfr
	kESAF_71CineSideHigh,	///<  L R C Lfe Ls Rs Tsl Tsr
	kESAF_81MPEG3D,			///<  L R Lfe Ls Rs Tfl Tfc Tfr Bfc 			(41_3_1)
	kESAF_50_4,				///<  L R C Ls Rs Tfl Tfr Trl Trr
	kESAF_51_4,				///<  L R C Lfe Ls Rs Tfl Tfr Trl Trr
	kESAF_70_2,				///<  L R C Ls Rs Sl Sr Tsl Tsr
	kESAF_71_2,				///<  L R C Lfe Ls Rs Sl Sr Tsl Tsr
	kESAF_100,				///<  L R C Ls Rs Tc Tfl Tfr Trl Trr			(50_5 TopC)
	kESAF_101,				///<  L R C Lfe Ls Rs Tc Tfl Tfr Trl Trr		(51_5 TopC)
	kESAF_102,				///<  L R C Lfe Ls Rs Tfl Tfc Tfr Trl Trr Lfe2	(52_5)
	kESAF_110,				///<  L R C Ls Rs Tc Tfl Tfc Tfr Trl Trr		(50_6 TopC)
	kESAF_111,				///<  L R C Lfe Ls Rs Tc Tfl Tfc Tfr Trl Trr	(51_6 TopC)
	kESAF_70_4,				///<  L R C Ls Rs Sl Sr Tfl Tfr Trl Trr
	kESAF_71_4,				///<  L R C Lfe Ls Rs Sl Sr Tfl Tfr Trl Trr
	kESAF_70_6,				///<  L R C Ls Rs Sl Sr Tfl Tfr Trl Trr Tsl Tsr
	kESAF_71_6,				///<  L R C Lfe Ls Rs Sl Sr Tfl Tfr Trl Trr Tsl Tsr
	kESAF_90_6,				///<  L R C Ls Rs Lc Rc Sl Sr Tfl Tfr Trl Trr Tsl Tsr
	kESAF_91_6,				///<  L R C Lfe Ls Rs Lc Rc Sl Sr Tfl Tfr Trl Trr Tsl Tsr

	kESAF_122,				///<  L R C Lfe Ls Rs Lc Rc Tfl Tfc Tfr Trl Trr Lfe2	(72_5)
	kESAF_130,				///<  L R C Ls Rs Sl Sr Tc Tfl Tfc Tfr Trl Trr			(70_6 TopC)
	kESAF_131,				///<  L R C Lfe Ls Rs Sl Sr Tc Tfl Tfc Tfr Trl Trr		(71_6 TopC)
	kESAF_140,				///<  L R Ls Rs Sl Sr Tfl Tfr Trl Trr Bfl Bfr Brl Brr	(60_4_4)
	kESAF_222,				///<  L R C Lfe Ls Rs Lc Rc Cs Sl Sr Tc Tfl Tfc Tfr Trl Trc Trr Lfe2 Tsl Tsr Bfl Bfc Bfr (102_9_3 TopC)

	/** First-Order up to 8th Order with Ambisonic Channel Number (ACN) ordering and SN3D normalization */
	kESAF_Ambi1stOrderACN,
	kESAF_Ambi2cdOrderACN,
	kESAF_Ambi3rdOrderACN,
	kESAF_Ambi4thOrderACN,
	kESAF_Ambi5thOrderACN,
	kESAF_Ambi6thOrderACN,
	kESAF_Ambi7thOrderACN,
	kESAF_Ambi8thOrderACN,

};

//------------------------------------------------------------------------
enum eSpeakerArrangementFormat: uint64
{
	kSAF_Empty		= SPEAKER_ARRANGEMENT_FORMAT (kESAF_Empty, 0),
	kSAF_Mono		= SPEAKER_ARRANGEMENT_FORMAT (kESAF_Mono, 1),
	kSAF_Stereo		= SPEAKER_ARRANGEMENT_FORMAT (kESAF_Stereo, 2),
	kSAF_30Cine		= SPEAKER_ARRANGEMENT_FORMAT (kESAF_30Cine, 3),
	kSAF_30Music	= SPEAKER_ARRANGEMENT_FORMAT (kESAF_30Music, 3),
	kSAF_31Cine		= SPEAKER_ARRANGEMENT_FORMAT (kESAF_31Cine, 4),
	kSAF_31Music	= SPEAKER_ARRANGEMENT_FORMAT (kESAF_31Music, 4),
	kSAF_40Cine		= SPEAKER_ARRANGEMENT_FORMAT (kESAF_40Cine, 4),
	kSAF_40Music	= SPEAKER_ARRANGEMENT_FORMAT (kESAF_40Music, 4),
	kSAF_41Cine		= SPEAKER_ARRANGEMENT_FORMAT (kESAF_41Cine, 5),
	kSAF_41Music	= SPEAKER_ARRANGEMENT_FORMAT (kESAF_41Music, 5),
	kSAF_50			= SPEAKER_ARRANGEMENT_FORMAT (kESAF_50, 5),
	kSAF_51			= SPEAKER_ARRANGEMENT_FORMAT (kESAF_51, 6),
	kSAF_60Cine		= SPEAKER_ARRANGEMENT_FORMAT (kESAF_60Cine, 6),
	kSAF_60Music	= SPEAKER_ARRANGEMENT_FORMAT (kESAF_60Music, 6),
	kSAF_61Cine		= SPEAKER_ARRANGEMENT_FORMAT (kESAF_61Cine, 7),
	kSAF_61Music	= SPEAKER_ARRANGEMENT_FORMAT (kESAF_61Music, 7),
	kSAF_70Cine		= SPEAKER_ARRANGEMENT_FORMAT (kESAF_70Cine, 7),
	kSAF_70Music	= SPEAKER_ARRANGEMENT_FORMAT (kESAF_70Music, 7),
	kSAF_71Cine		= SPEAKER_ARRANGEMENT_FORMAT (kESAF_71Cine, 8),
	kSAF_71CineFullRear = SPEAKER_ARRANGEMENT_FORMAT (kESAF_71CineFullRear, 8),
	kSAF_71Music	= SPEAKER_ARRANGEMENT_FORMAT (kESAF_71Music, 8),
	kSAF_80Cine		= SPEAKER_ARRANGEMENT_FORMAT (kESAF_80Cine, 8),
	kSAF_80Music	= SPEAKER_ARRANGEMENT_FORMAT (kESAF_80Music, 8),
	kSAF_81Cine		= SPEAKER_ARRANGEMENT_FORMAT (kESAF_81Cine, 9),
	kSAF_81Music	= SPEAKER_ARRANGEMENT_FORMAT (kESAF_81Music, 9),

	kSAF_Ambi1stOrderACN	= SPEAKER_ARRANGEMENT_FORMAT (kESAF_Ambi1stOrderACN, 4),
	kSAF_Ambi2cdOrderACN	= SPEAKER_ARRANGEMENT_FORMAT (kESAF_Ambi2cdOrderACN, 9),
	kSAF_Ambi3rdOrderACN	= SPEAKER_ARRANGEMENT_FORMAT (kESAF_Ambi3rdOrderACN, 16),
	kSAF_Ambi4thOrderACN	= SPEAKER_ARRANGEMENT_FORMAT (kESAF_Ambi4thOrderACN, 25),
	kSAF_Ambi5thOrderACN	= SPEAKER_ARRANGEMENT_FORMAT (kESAF_Ambi5thOrderACN, 36),
	kSAF_Ambi6thOrderACN	= SPEAKER_ARRANGEMENT_FORMAT (kESAF_Ambi6thOrderACN, 49),
	kSAF_Ambi7thOrderACN	= SPEAKER_ARRANGEMENT_FORMAT (kESAF_Ambi7thOrderACN, 64),
	kSAF_Ambi8thOrderACN	= SPEAKER_ARRANGEMENT_FORMAT (kESAF_Ambi8thOrderACN, 81),

	kSAF_80Cube				= SPEAKER_ARRANGEMENT_FORMAT (kESAF_80Cube, 8),			
	kSAF_71CineTopCenter	= SPEAKER_ARRANGEMENT_FORMAT (kESAF_71CineTopCenter, 8),
	kSAF_71CineCenterHigh	= SPEAKER_ARRANGEMENT_FORMAT (kESAF_71CineCenterHigh, 8),
	kSAF_71CineFrontHigh	= SPEAKER_ARRANGEMENT_FORMAT (kESAF_71CineFrontHigh, 8),
	kSAF_71CineSideHigh		= SPEAKER_ARRANGEMENT_FORMAT (kESAF_71CineSideHigh, 8),
	kSAF_81MPEG3D			= SPEAKER_ARRANGEMENT_FORMAT (kESAF_81MPEG3D, 9),
	kSAF_50_4				= SPEAKER_ARRANGEMENT_FORMAT (kESAF_50_4, 9),
	kSAF_51_4				= SPEAKER_ARRANGEMENT_FORMAT (kESAF_51_4, 10),
	kSAF_70_2				= SPEAKER_ARRANGEMENT_FORMAT (kESAF_70_2, 9),
	kSAF_71_2				= SPEAKER_ARRANGEMENT_FORMAT (kESAF_71_2, 10),
	kSAF_70_4				= SPEAKER_ARRANGEMENT_FORMAT (kESAF_70_4, 11),
	kSAF_71_4				= SPEAKER_ARRANGEMENT_FORMAT (kESAF_71_4, 12),
	kSAF_70_6				= SPEAKER_ARRANGEMENT_FORMAT (kESAF_70_6, 13),
	kSAF_71_6				= SPEAKER_ARRANGEMENT_FORMAT (kESAF_71_6, 14),
	kSAF_90_6				= SPEAKER_ARRANGEMENT_FORMAT (kESAF_90_6, 15),
	kSAF_91_6				= SPEAKER_ARRANGEMENT_FORMAT (kESAF_91_6, 16),
	
	kSAF_100				= SPEAKER_ARRANGEMENT_FORMAT (kESAF_100, 10),			
	kSAF_101				= SPEAKER_ARRANGEMENT_FORMAT (kESAF_101, 11),			
	kSAF_102				= SPEAKER_ARRANGEMENT_FORMAT (kESAF_102, 12),			
	kSAF_110				= SPEAKER_ARRANGEMENT_FORMAT (kESAF_110, 11),			
	kSAF_111				= SPEAKER_ARRANGEMENT_FORMAT (kESAF_111, 12),			
	kSAF_122				= SPEAKER_ARRANGEMENT_FORMAT (kESAF_122, 14),
	kSAF_130				= SPEAKER_ARRANGEMENT_FORMAT (kESAF_130, 13),
	kSAF_131				= SPEAKER_ARRANGEMENT_FORMAT (kESAF_131, 14),
	kSAF_140				= SPEAKER_ARRANGEMENT_FORMAT (kESAF_140, 14),
	kSAF_222				= SPEAKER_ARRANGEMENT_FORMAT (kESAF_222, 24),
};


//------------------------------------------------------------------------
/** Extended IAudioProcessor interface for a component.
\ingroup vstIPlug vst370
- [plug imp]
- [extends IAudioProcessor]
- [released: 3.7.0]
- [optional]

This interface extends \ref IAudioProcessor::setBusArrangements and \ref IAudioProcessor::getBusArrangement 
by using the ExtSpeakerArrangement type. This new definition supports more complex arrangements,
including ones with more than 64 channels, and is not based on a combination of bitmasks.
\n
\see IAudioProcessor
\see IComponent*/
//------------------------------------------------------------------------
class IExtendedSpeakerArrangement: public FUnknown
{
public:
	//------------------------------------------------------------------------
	/** Try to set (from host) a predefined arrangement for inputs and outputs.
		The host should always deliver the same number of input and output buses as the Plug-in needs 
		(see \ref IComponent::getBusCount).
		The Plug-in returns kResultFalse if the wanted arrangements are not supported.
		If the Plug-in accepts these arrangements, it should modify its buses to match the new arrangements
		(asked by the host with IComponent::getBusInfo () or IExtendedSpeakerArrangement::getExtBusArrangement ()) and then return kResultTrue.
		If the Plug-in does not accept these arrangements, but can adapt its current arrangements (according to the wanted ones),
		it should modify its bus arrangements and return kResultFalse. */
	virtual tresult PLUGIN_API setExtBusArrangements (ExtSpeakerArrangement* inputs, int32 numIns,
												      ExtSpeakerArrangement* outputs, int32 numOuts) = 0;

	/** Gets the bus arrangement for a given direction (input/output) and index.
		Note: IComponent::getBusInfo () and IExtendedSpeakerArrangement::getExtBusArrangement () should always return the same 
		information about the bus arrangements. */
	virtual tresult PLUGIN_API getExtBusArrangement (BusDirection dir, int32 index, ExtSpeakerArrangement& arr) = 0;

//------------------------------------------------------------------------
	static const FUID iid;
};

What do you think about it?
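
To make the proposed encoding concrete, here is a small round-trip sketch. It mirrors the macros above as constexpr functions, with the layout taken from the proposal: bit 63 flags the extended format, bits 16..31 hold the arrangement index, bits 0..15 hold the channel count. The function names and the index value are placeholders of mine, not part of the proposal.

```cpp
#include <cstdint>

// Hypothetical constexpr equivalents of the proposed macros; names are
// illustrative only and do not exist in the VST3 SDK.
using ExtSpeakerArrangement = std::uint64_t;

constexpr ExtSpeakerArrangement kExtFormatFlag = std::uint64_t {1} << 63;

constexpr ExtSpeakerArrangement makeFormat (std::uint16_t index, std::uint16_t numChannels)
{
	return kExtFormatFlag | (std::uint64_t {index} << 16) | numChannels;
}

constexpr bool isExtFormat (ExtSpeakerArrangement arr)
{
	return (arr & kExtFormatFlag) != 0;
}

constexpr std::uint16_t formatIndex (ExtSpeakerArrangement arr)
{
	return static_cast<std::uint16_t> ((arr >> 16) & 0xFFFF);
}

constexpr std::uint16_t formatNumChannels (ExtSpeakerArrangement arr)
{
	return static_cast<std::uint16_t> (arr & 0xFFFF);
}

// Placeholder index; the real value would come from eSpeakerArrangementIndex.
constexpr std::uint16_t kExampleAmbi7thIndex = 70;
constexpr ExtSpeakerArrangement kExampleAmbi7th = makeFormat (kExampleAmbi7thIndex, 64);

static_assert (isExtFormat (kExampleAmbi7th), "flag bit is set");
static_assert (formatIndex (kExampleAmbi7th) == kExampleAmbi7thIndex, "index round-trips");
static_assert (formatNumChannels (kExampleAmbi7th) == 64, "channel count round-trips");
static_assert (!isExtFormat (0x3), "a mask with only low speaker bits set is not flagged");
```

With 16 bits for the count, this layout could describe up to 65535 channels per bus.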

How about a more flexible audio channel layout description like in Audio Units instead?

Hi Stefan,
please describe what you mean by “flexible audio channel layout description like in Audio Units”, thank you.

IExtendedSpeakerArrangement would seem adequate to me. What would happen with the existing setBusArrangements() when a plug-in supports this interface: does it just not get called? Maybe Daniel or Leo could comment, since they work at research institutions that may be dealing with very high-order HOA.

Hi Arne,
I would appreciate getting a layout description consisting of channel labels instead of just an index to predefined ones (same for set, of course).
So basically a vector of audio channel descriptions.
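
A rough sketch of what such a per-channel description could look like. All types and names here are hypothetical and exist neither in the VST3 SDK nor in Audio Units. Fixed-size arrays are used only so the example can be checked at compile time; a real interface would presumably pass a pointer plus a count, or a vector.

```cpp
#include <cstddef>

// Hypothetical types for illustration; nothing here exists in the VST3 SDK.
enum class ChannelLabel
{
	kDiscrete, // no spatial meaning, just "channel n"
	kLeft,
	kRight,
	kCenter,
	kLfe,
	kLeftSurround,
	kRightSurround
};

struct ChannelDescription
{
	ChannelLabel label;
};

// Returns the index of the first channel carrying `label`, or -1 if none does.
template <std::size_t N>
constexpr int findChannel (const ChannelDescription (&layout)[N], ChannelLabel label)
{
	for (std::size_t i = 0; i < N; ++i)
		if (layout[i].label == label)
			return static_cast<int> (i);
	return -1;
}

// 5.1 in the order L R C Lfe Ls Rs
constexpr ChannelDescription k51Layout[] = {
	{ChannelLabel::kLeft}, {ChannelLabel::kRight}, {ChannelLabel::kCenter},
	{ChannelLabel::kLfe}, {ChannelLabel::kLeftSurround}, {ChannelLabel::kRightSurround}};

// Four unlabeled channels, for a bus the host knows nothing about.
constexpr ChannelDescription kFourDiscrete[] = {
	{ChannelLabel::kDiscrete}, {ChannelLabel::kDiscrete},
	{ChannelLabel::kDiscrete}, {ChannelLabel::kDiscrete}};

static_assert (findChannel (k51Layout, ChannelLabel::kLfe) == 3, "Lfe is the 4th channel of 5.1");
static_assert (findChannel (kFourDiscrete, ChannelLabel::kLfe) == -1, "no Lfe in a discrete layout");
```

The same structure covers both cases: labeled speaker layouts and plain N-channel buses.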

Would it be possible to add an option for just N discrete channels, i.e. without specifying an ambisonic bus? I think this is what most developers of multichannel plug-ins would appreciate.

Sometimes it is good to know what the channels are; e.g. you may want to know which channel is the LFE, if only in order to ignore it.

Would IExtendedSpeakerArrangement be equivalent to the VST2 behavior?

Using VST2 and JUCE, we used to declare a number of supported in/out configs, specifying only the number of channels, e.g., {2, 2}, {1, 2}, {6, 2}, {6, 6}…
Then the user could choose their loudspeaker layout directly in the plug-in. That basically worked fine.

Would be nice to have a simple interface like this in VST3.

Thanks,

Charles
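
The channel-count-only scheme Charles describes can be sketched roughly as follows. This is plain C++ for illustration, not JUCE or VST code; ChannelConfig and isConfigSupported are hypothetical names.

```cpp
#include <cstddef>

// Hypothetical sketch of a "channel counts only" negotiation.
struct ChannelConfig
{
	int numInputs;
	int numOutputs;
};

// The example configurations from the post: {2, 2}, {1, 2}, {6, 2}, {6, 6}.
constexpr ChannelConfig kSupportedConfigs[] = {{2, 2}, {1, 2}, {6, 2}, {6, 6}};

// True if the requested in/out channel counts match one declared configuration.
template <std::size_t N>
constexpr bool isConfigSupported (const ChannelConfig (&configs)[N], int numInputs, int numOutputs)
{
	for (std::size_t i = 0; i < N; ++i)
		if (configs[i].numInputs == numInputs && configs[i].numOutputs == numOutputs)
			return true;
	return false;
}

static_assert (isConfigSupported (kSupportedConfigs, 6, 2), "6 in / 2 out is declared");
static_assert (!isConfigSupported (kSupportedConfigs, 2, 6), "2 in / 6 out is not");
```

The host only ever compares channel counts; what each channel means is left to the plug-in and its user.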

I second this entirely. Multichannel audio is not limited just to Ambisonics, we have now other competing formats such as Spatial PCM Sampling (SPS, aka Mach1), T-design geometries (T-format), MPEG-H, Dolby Atmos, etc.
Also, the maximum number of channels should be unlimited: in Wave Field Synthesis it is common to use systems with 128 or even 192 channels. We have a WFS room here in Parma, Italy, inaugurated in 2008 (12 years ago), equipped with a single computer driving 192 channels. The WFS rendering currently cannot be done with a VST3 plug-in, simply because the VST3 format does not allow for such a channel count, whilst the host program (Plogue Bidule) has no problem managing the RME MADI interface providing them.
In some of these formats, such as MPEG-H and Dolby Atmos, we get a mixture of “loudspeaker” channels in fixed positions and a potentially large number of “sound objects”, which are mono tracks encoding sound sources moving around.
In some other cases, each of these moving sources is Ambisonics-encoded (perhaps just at first or second order) or SPS-encoded, representing the directivity of the sound source with a number of channels, which can then be moved in space and oriented in space (6DOF rendering).
In conclusion, what would be important is to allow for maximum flexibility, both on channel number and channel labelling.
Sticking to a reduced set of predefined “speaker layouts” sets us all back by 20 years.
So please, Steinberg, remove all these limitations from the VST3 SDK…

I’d like to voice my support for the approach Oli and Angelo mentioned. It would make things a lot more flexible and allow a lot more formats to be supported.

I would also like to request N discrete channels.

I see there’s a vote feature now. It might be good if everyone still interested in a discrete channel layout, which is definitely necessary for both spatial audio and general research applications, would press that upvote button :)

@Yvan @Arne_Scheffler Please tell us, how we can help to resolve this issue.
I have to admit I am not very familiar with the bare VST SDK, as I’ve always used wrappers like JUCE. However, I am happy to dive deeper into it and contribute to it!

The simplest immediate solution would be to take one currently unused bit, and make it mean “the rest of the word is just number of channels.”

This is essentially ExtSpeakerArrangement from above – but it doesn’t necessarily need any additional interface. Just jam the number of speakers in there, and let the user sort it out.
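
A minimal sketch of that idea, for illustration only. Which bit is actually free in the SDK's Speaker word is not established here, so bit 63 is an assumption, as are all the names below.

```cpp
#include <cstdint>

// Hypothetical encoding; bit 63 as the flag is an assumption, not SDK fact.
using SpeakerArrangement = std::uint64_t;

constexpr SpeakerArrangement kDiscreteFlag = std::uint64_t {1} << 63;

// Flag set: the low bits carry nothing but a channel count.
constexpr SpeakerArrangement makeDiscrete (std::uint32_t numChannels)
{
	return kDiscreteFlag | numChannels;
}

constexpr bool isDiscrete (SpeakerArrangement arr)
{
	return (arr & kDiscreteFlag) != 0;
}

// Channel count for either encoding: the stored number when the flag is set,
// otherwise the number of set bits in the legacy speaker bitmask.
constexpr std::uint32_t channelCount (SpeakerArrangement arr)
{
	if (isDiscrete (arr))
		return static_cast<std::uint32_t> (arr & 0xFFFFFFFFu);
	std::uint32_t count = 0;
	for (; arr != 0; arr &= arr - 1) // clears the lowest set bit each round
		++count;
	return count;
}

static_assert (channelCount (makeDiscrete (192)) == 192u, "e.g. a 192-channel WFS bus");
static_assert (channelCount (0x3) == 2u, "legacy mask with two speaker bits set");
```

One caveat: a host that does not know about the flag would presumably read such a word as an ordinary bitmask, which is the compatibility concern raised earlier in the thread.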