Issues with large bitmaps on Windows using VSTGUI >=4.10

Hi everybody,

There’s an issue on Windows with bitmaps that have a width and/or height of >=32768 pixels, and it has existed ever since the switch to the Direct2D backend. CBitmaps of that size are decoded and initialized properly (the underlying platform bitmap has the correct dimensions, a data pointer and so forth) but are simply not displayed by the Direct2D implementation of CDrawContext::drawBitmap(). This is annoying when using HiDPI graphics with e.g. a CAnimKnob, because you quickly hit said size limit.

I have, however, learned to live with it, as it seems to be some internal/undocumented restriction of the underlying D2D API that performs the drawing. We’re talking about VSTGUI 4.9 here.

Now here’s where it gets a little complicated with VSTGUI 4.10: In VSTGUI 4.9 there is an optional static function for enabling hardware-accelerated rendering, i.e. useD2DHardwareRenderer(). When I experimentally enabled the hardware renderer, I noticed the size limit drops to <16384 pixels, so I never used it.

I see that this feature has moved into the PlatformFactory implementation, namely Win32Factory::useD2DHardwareRenderer (). However, some debugging revealed that this setting is not respected unless the underlying draw context is created using D2DDrawContext::createRenderTarget(), which under normal circumstances isn’t the case.

It seems that draw contexts bridged through iterateRegion in Win32Frame::paint() always have hardware rendering enabled implicitly, which appears to be related to the switch to DirectComposition. In other words, in its current form VSTGUI does not support bitmaps with sizes >=16384 pixels on Windows, regardless of the setting applied through Win32Factory::useD2DHardwareRenderer ().
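
For reference, the limit doesn’t have to be hardcoded; Direct2D reports it per render target via ID2D1Bitmap’s host, ID2D1RenderTarget::GetMaximumBitmapSize (). A minimal sketch (mine, untested; it assumes access to the underlying ID2D1RenderTarget, e.g. from inside the D2DDrawContext implementation):

#include <d2d1.h>

// Returns true if a bitmap of the given dimensions can be drawn directly on
// this render target. GetMaximumBitmapSize () typically reports 16384 in
// hardware mode and a much larger value for software targets.
bool bitmapFitsRenderTarget (ID2D1RenderTarget* target, UINT32 width, UINT32 height)
{
	const UINT32 maxSize = target->GetMaximumBitmapSize ();
	return width <= maxSize && height <= maxSize;
}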

Can you please look into this?

Best,
Ray

Hi Ray,
this seems to be the case, yes.
With the switch to DirectComposition the UI is always rendered via the GPU and it is not possible to turn it off.
I also don’t see a quick workaround for this at first glance.
Looking at the documentation, it seems that 16384 pixels has been the hardware limit for a long time, so I guess this won’t get bigger over time.

So we need other ways to render things like the CAnimKnob, because going back to rendering on the CPU should not be the goal.

There are multiple possible solutions:

  • Tile the bitmap into columns and rows (this will hit the limitations too some day)
  • Use one bitmap per frame (the number of files will grow)
  • Let the GPU calculate a rotation of the knob (may not produce a pixel-perfect result; see the sketch after this list)
  • Use a lower-scaled bitmap and scale it on render (will hit limitations and is not pixel perfect either)
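
Regarding the third option, here is a rough sketch (untested) of what GPU-side rotation could look like with the CGraphicsTransform API; the -135°..+135° range and the helper name are my own assumptions, and the composition order of the transform may need adjusting:

void drawRotatedKnob (CDrawContext* context, CBitmap* frame, const CRect& viewSize, float normValue)
{
	const double angle = -135. + normValue * 270.; // assumed usable rotation range, in degrees
	const CPoint center = viewSize.getCenter ();
	// rotate around the view center: translate to the origin, rotate, translate back
	CGraphicsTransform t;
	t.translate (center.x, center.y);
	t.rotate (angle);
	t.translate (-center.x, -center.y);
	CDrawContext::Transform guard (*context, t);
	frame->draw (context, viewSize);
}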

Any more ideas?

Cheers,
Arne

Thanks for the swift response and the clarification, Arne.

  • Tiling the bitmap would only be a temporary workaround, given that display pixel densities might get larger and size limitations may get even stricter in the future. I sometimes use background graphics that have multiple columns to switch between different states of an animation via the setBackOffset() function, so tiling would interfere here, too. While we’re at it: is there any reason this function has been removed in the newer versions of VSTGUI?
  • Using a bitmap per frame isn’t particularly viable (imagine the resulting editor code, or the case where something has to be changed)
  • Letting the GPU calculate the rotation would be an option for generic controls, but it doesn’t really work for pre-rendered 3D knobs
  • Using a lower resolution and then upscaling the graphic would defeat the whole idea of having high-DPI graphics

I don’t think any of these are really optimal, but I have another idea along the lines of option 2. Given that larger images can be decoded properly and it’s only drawing a slice of an over-sized image that fails, why don’t we slice the background graphic in CAnimKnob::setBackground() and build e.g. a std::vector<SharedPointer<CBitmap>> that contains all the sub-frames (also calling that from the ctor), and then access the appropriate slice in CAnimKnob::draw(), so the CDrawContext only has to deal with rather small frames? That should be fairly easy to do via the CBitmapPixelAccess API, don’t you think?

I also believe this would be the most transparent solution in terms of existing GUI code.

Best,
Ray

I don’t think that using CBitmapPixelAccess or any other high-level VSTGUI API is a good idea here, because this would break the GPU data-flow path entirely. What is needed is a way to split the WICBitmap (the one that can be loaded) into multiple chunks and upload these to the GPU. But I haven’t found a way to do this yet.

Hi Arne,

Just to avoid misunderstandings: I’m not talking about accessing the proper slice on the fly through CBitmapPixelAccess. Instead, I’m suggesting using it to precompute an array of CBitmaps.

Here’s a sketch:

class CAnimKnob : public CKnobBase, public IMultiBitmapControl
{
	...
private:
	std::vector<SharedPointer<CBitmap>> slices;
};

...

void CAnimKnob::setBackground (CBitmap* background)
{
	CKnobBase::setBackground (background);
	if (heightOfOneImage == 0)
		heightOfOneImage = getViewSize ().getHeight ();
	if (background && heightOfOneImage > 0)
		setNumSubPixmaps ((int32_t)(background->getHeight () / heightOfOneImage));

	// Create the slices
	slices.clear ();

	for (int i = 0; i < getNumSubPixmaps (); i++)
	{
		auto slice = makeOwned<CBitmap> (getWidth (), heightOfOneImage);
		// Use CBitmapPixelAccess to memcpy the respective image slice from the background bitmap
		// ...
		slices.push_back (slice);
	}
}

void CAnimKnob::draw (CDrawContext* pContext)
{
	if (!slices.empty ())
	{
		int i = 0;

		float val = getValueNormalized ();
		if (val >= 0.f && heightOfOneImage > 0.)
		{
			CCoord tmp = getNumSubPixmaps () - 1;
			if (bInverseBitmap)
				i = (int)floor ((1. - val) * tmp);
			else
				i = (int)floor (val * tmp);
		}

		slices[i]->draw (pContext, getViewSize (), CPoint (0, 0));
	}
	setDirty (false);
}
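
For completeness, the elided copy step might look roughly like this (untested; pixel-by-pixel for clarity, whereas a real implementation would rather copy whole rows via the platform pixel access):

static SharedPointer<CBitmap> copySlice (CBitmap* src, CCoord width, CCoord height, uint32_t yOffset)
{
	auto dst = makeOwned<CBitmap> (width, height);
	auto srcAccess = owned (CBitmapPixelAccess::create (src));
	auto dstAccess = owned (CBitmapPixelAccess::create (dst));
	if (!srcAccess || !dstAccess)
		return nullptr;
	CColor color;
	for (uint32_t y = 0; y < (uint32_t)height; ++y)
	{
		for (uint32_t x = 0; x < (uint32_t)width; ++x)
		{
			srcAccess->setPosition (x, yOffset + y); // read from the source slice...
			srcAccess->getColor (color);
			dstAccess->setPosition (x, y);           // ...and write into the new frame
			dstAccess->setColor (color);
		}
	}
	return dst; // pixels are flushed to the platform bitmap when dstAccess goes out of scope
}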

I don’t see how that differs from having an array of CBitmaps initialized from multiple resources representing the sub-frames. Or are you pointing out that using the CBitmap::CBitmap (CCoord width, CCoord height) ctor will never upload the image data to the GPU on Windows?

If that’s the case, then this would be yet another, deeper issue that should be addressed IMO; it would also imply that using e.g. said ctor or anything in CBitmapFilters always results in CBitmaps that are rendered in software.

Here is an idea, though: if things have to be passed through the WIC decoder in order to be placed in GPU memory, wouldn’t it be possible to build e.g. a BMP header chunk + image data (or some uncompressed format that supports alpha, perhaps TIFF?) via CBitmapPixelAccess and then pass that through Win32Factory::createBitmapFromMemory () so it’s decoded the usual way? Admittedly, this feels a little hacky, but it would be easy to do and work out of the box using VSTGUI’s existing tools.
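
To illustrate (my sketch, untested; it assumes the 4.10 platform-factory API, i.e. getPlatformFactory (), PlatformBitmapPtr and createBitmapFromMemory ()): a minimal 32-bit BMP wrapper around raw BGRA rows, fed back through the factory. Whether WIC’s BMP decoder preserves alpha for plain BI_RGB data is exactly the open question; a BITMAPV4HEADER with explicit channel masks (or TIFF, as mentioned) would likely be needed:

#include <cstdint>
#include <cstring>
#include <vector>

#pragma pack(push, 1)
struct BMPHeader
{
	// BITMAPFILEHEADER + BITMAPINFOHEADER, packed back to back
	uint16_t bfType = 0x4D42; // 'BM'
	uint32_t bfSize = 0;
	uint16_t bfReserved1 = 0, bfReserved2 = 0;
	uint32_t bfOffBits = 54;
	uint32_t biSize = 40;
	int32_t biWidth = 0, biHeight = 0;
	uint16_t biPlanes = 1, biBitCount = 32;
	uint32_t biCompression = 0; // BI_RGB; alpha handling is the open question
	uint32_t biSizeImage = 0;
	int32_t biXPelsPerMeter = 2835, biYPelsPerMeter = 2835;
	uint32_t biClrUsed = 0, biClrImportant = 0;
};
#pragma pack(pop)

// Wrap one decoded slice (BGRA rows) in a BMP stream and round-trip it
// through the platform factory's regular decoding path.
PlatformBitmapPtr sliceToPlatformBitmap (const uint8_t* bgra, int32_t width, int32_t height, uint32_t bytesPerRow)
{
	BMPHeader h;
	h.biWidth = width;
	h.biHeight = -height; // negative height = top-down row order
	h.biSizeImage = (uint32_t)(width * 4) * (uint32_t)height;
	h.bfSize = h.bfOffBits + h.biSizeImage;

	std::vector<uint8_t> stream (h.bfSize);
	memcpy (stream.data (), &h, sizeof (h));
	uint8_t* dst = stream.data () + h.bfOffBits;
	for (int32_t y = 0; y < height; ++y)
		memcpy (dst + y * width * 4, bgra + y * bytesPerRow, (size_t)width * 4);

	return getPlatformFactory ().createBitmapFromMemory (stream.data (), (uint32_t)stream.size ());
}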

Best,
Ray

Hi,
in the meantime I think the solution is to add the possibility to attach multiple bitmaps to the CAnimKnob class, to make this cross-platform aware. One bitmap contains as many frames as possible. So if you have a frame that is 170 x 170 pixels and you want to create a CAnimKnob with 100 frames, you must split the frames into 2 bitmaps: the first one holds 96 frames and the second one the remaining 4 frames (see the worked example below). This has the benefit that the current optimal way to load bitmaps is unchanged and no extra runtime cost is added when opening the editor. The small disadvantage is that the developer must know about this and prepare the bitmaps accordingly.
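
For illustration, in numbers:

constexpr int32_t maxBitmapSize = 16384; // per-dimension GPU limit
constexpr int32_t frameHeight = 170;
constexpr int32_t numFrames = 100;
constexpr int32_t framesPerBitmap = maxBitmapSize / frameHeight;                       // 96
constexpr int32_t numBitmaps = (numFrames + framesPerBitmap - 1) / framesPerBitmap;    // 2
constexpr int32_t framesInLastBitmap = numFrames - (numBitmaps - 1) * framesPerBitmap; // 4
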
What do you think?

Hi Arne,

With all due respect, I don’t really think this is a workable / elegant solution. Think of the work it would take developers to adapt existing projects, besides the fact you mention yourself: it’s a weird limitation that needs to be documented.
In that case I would even prefer the first solution you suggested, i.e. “stacking” the frames into multiple columns if a single column becomes too tall. Albeit this might break in the future if the limitation becomes smaller on some of the supported platforms, and so might your latest idea; in that regard neither feels very “future-proof”. Again, no disrespect.

What’s wrong with my suggestion, by the way (except that it would take a few milliseconds to slice the image when a new CAnimKnob is created)?

I could also think of a CMultiBitmapContainer that internally uses something like my suggested std::vector<SharedPointer<CBitmap>> storage, can be passed to things like CAnimKnob, CMovieBitmap etc., and can be created either from a set of resources or from a single existing resource that is sliced internally to suit the platform limitations. This would help developers bridge existing projects without much hassle, no?

Best,
Ray

What I don’t like about your solution is the runtime cost. Every use of the plug-in spends work at runtime that the developer would normally only have to do once.
But I’m open to your solution; if you’d like to provide a working prototype, I will take the time to integrate it.

Okay, gotcha. With the suggested CMultiBitmapContainer the data could be cached, though, e.g. once the first editor instance is created. Depending on the API it could also support different tilings, or it could be created from multiple resources as you suggested in your initial post. It could even support the rows/columns approach, too, so I don’t think it needs to be either/or. I particularly don’t think an irregular tiling feels very intuitive/elegant.

Something along those lines:

class CMultiBitmapContainer : public CBaseObject
{
public:
	// Create the image container from multiple resources
	CMultiBitmapContainer (const std::vector<CResourceDescription>& descriptors, double scaleFactor = 1.0)
	{
		for (const auto& desc : descriptors)
		{
			auto bitmap = makeOwned<CBitmap> (desc);
			if (bitmap)
			{
				auto pbm = bitmap->getPlatformBitmap ();
				if (pbm)
					pbm->setScaleFactor (scaleFactor);
				subPixMaps.push_back (bitmap);
			}
		}
	}

	// Slice an existing image according to the given tiling
	CMultiBitmapContainer (SharedPointer<CBitmap>& bitmap, int numRows, int numColumns)
	{
		for (int row = 0; row < numRows; ++row)
		{
			for (int col = 0; col < numColumns; ++col)
			{
				// Slice the source image via CBitmapPixelAccess according to the provided tiling
				// ...
			}
		}
	}

	~CMultiBitmapContainer ();

	// Return the number of sub-frames
	int getNumSubPixmaps () const { return (int)subPixMaps.size (); }
	// Return the i-th sub-frame (by value, so the out-of-range case can return nullptr)
	SharedPointer<CBitmap> getSubPixmap (int i) const { return i < getNumSubPixmaps () ? subPixMaps[i] : nullptr; }

private:
	std::vector<SharedPointer<CBitmap>> subPixMaps; // Sliced sub-frames
};

OK, but how does this make life easier for existing projects? This must be done manually by every developer, or am I missing something?

Sorry, it’s sometimes a little hard to express these things precisely.

My idea was to extend CAnimKnob, CMovieBitmap etc. so that they accept a CMultiBitmapContainer instead of a CBitmap as a background. That way only projects in which the graphics exceed the size limitations need to be adapted, e.g. like this:

BEFORE:

auto hKnob = makeOwned<CBitmap>("knob.png");
CRect size(0, 0, hKnob->getWidth(), hKnob->getHeight() / 101);
auto knob = new CAnimKnob(size, listener, tag, hKnob);

AFTER:

auto hKnob = makeOwned<CBitmap>("knob.png");
CRect size(0, 0, hKnob->getWidth(), hKnob->getHeight() / 101);
auto hMultiBitmapContainer = makeOwned<CMultiBitmapContainer>(hKnob, 101, 1); 
auto knob = new CAnimKnob(size, listener, tag, hMultiBitmapContainer);

Only one line needs to be added where it’s required, and there’s no need for restructuring projects / re-slicing resources etc.

Now that I write about it…I think that from a software design point of view it’d be even more elegant to use a similar idea where CMultiBitmapContainer becomes a subtype of CBitmap that does the slicing internally but behaves just like a CBitmap in terms of its API (including a suitable draw method that looks up the appropriate slice). That way the respective draw methods wouldn’t need multiple code paths. Something like class CLargeBitmap : public CBitmap ...

So it simply becomes the following, and everything else is handled internally:

auto hKnob = makeOwned<CLargeBitmap>("knob.png");
CRect size(0, 0, hKnob->getWidth(), hKnob->getHeight() / 101);
auto knob = new CAnimKnob(size, listener, tag, hKnob);
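
A minimal shape for such a class (hypothetical sketch, assuming CBitmap::draw is, or is made, virtual):

class CLargeBitmap : public CBitmap
{
public:
	using CBitmap::CBitmap; // same construction paths as CBitmap

	// Routes drawing to the slice(s) covering 'offset' when the platform
	// limit was exceeded; falls back to CBitmap::draw otherwise
	void draw (CDrawContext* context, const CRect& rect,
	           const CPoint& offset = CPoint (0, 0), float alpha = 1.f) override;

private:
	std::vector<SharedPointer<CBitmap>> slices; // populated only when limits are exceeded
};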

Or yet another idea: provide a CLargeBitmap class that uses software rendering, so it’s up to the plug-in developer to find their own workaround when GPU-accelerated drawing is required under all circumstances.

Anyway, just some food for thought.

EDIT:

Perhaps the above idea should be handled on the D2DBitmap implementation layer, i.e. the underlying data object “slices itself” once the size limitations are exceeded and uses an appropriate draw implementation to access the correct portion(s). That way all the control/view-specific as well as existing plug-in code could remain as is. Maybe that’s the idea you were referring to in post no. 3?

This would be the ideal solution, but as far as I know there’s no way to get a part of a WICBitmap onto the GPU without extracting it completely to memory first.

After thinking about it some more: what about patching CBitmap so that the tiling strategy you mentioned earlier (potentially multiple tiles of 16384 pixels in length, or whatever the specific platform limit is, plus remainder tiles at the borders) is applied there using CBitmapPixelAccess, as outlined in my CMultiBitmapContainer sketch, and then patching the draw function so that it’s capable of drawing across tile boundaries if required, depending on the offset?

This way everything would be 100% transparent in terms of existing code, and most of the time it would work like it currently does. Only in cases where the bitmap size exceeds the platform limit would it need to create multiple PlatformBitmap tiles.

I don’t see what’s so bad about having to decode the entire bitmap once and then memcpy a couple of megabytes around when the bitmap is created…an encoded stream isn’t meant to be randomly accessible, so I don’t see a way around it anyway, except for having the developer perform the tiling “offline”, which I don’t think is particularly beautiful conceptually and is a PITA for those who have to adapt dozens of existing projects.
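
To make the boundary-crossing part concrete, here is a rough sketch (untested; Tile and drawTiled are names I made up for illustration):

struct Tile
{
	CRect srcRect;                 // tile's position within the full logical bitmap
	SharedPointer<CBitmap> bitmap; // tile-sized platform bitmap
};

void drawTiled (CDrawContext* context, const std::vector<Tile>& tiles,
                const CRect& dest, const CPoint& offset)
{
	// requested source rect in full-bitmap coordinates
	CRect source (offset.x, offset.y,
	              offset.x + dest.getWidth (), offset.y + dest.getHeight ());
	for (const auto& tile : tiles)
	{
		CRect overlap (tile.srcRect);
		overlap.bound (source); // clip the tile to the requested source area
		if (overlap.isEmpty ())
			continue;
		// destination portion covered by this tile
		CRect d (dest.left + (overlap.left - source.left),
		         dest.top + (overlap.top - source.top), 0, 0);
		d.setWidth (overlap.getWidth ());
		d.setHeight (overlap.getHeight ());
		// offset of the overlap within the tile's own bitmap
		CPoint tileOffset (overlap.left - tile.srcRect.left,
		                   overlap.top - tile.srcRect.top);
		tile.bitmap->draw (context, d, tileOffset);
	}
}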

Let me know if I should sketch it out in code a little more to convey my idea more clearly.

Just imagine when we have screens with a scale factor of 4 or 5. Say you have a stunning-looking big knob with a dimension of 150x150 pixels per frame for the unscaled case, and you use 100 frames for the knob. So the bitmap has a dimension of 15000x150 pixels. The bitmap for the 2x factor has 30000x300 pixels; the memory footprint of that uncompressed image is ~34 MB. For the 4x factor (60000x600) we are already at ~137 MB. But now we may find that turning the knob doesn’t look good anymore with 100 frames, so we start adding more frames until it looks nice. Just imagine we use 500 frames now: that becomes ~687 MB for 4x. If the user has two screens, one at 4x and one at 2x, we would need ~859 MB just for one image. And normally a plug-in has more than just one of these bitmaps.
This just does not feel right.

I think there must be some way for Direct2D or Direct3D to load uncompressed image data. So one possible solution could be to decompress the bitmap to disk once and then upload the needed part of the image to the GPU from storage.

Looking at the API, there is ID2D1Bitmap::CopyFromMemory. So a solution could be to decompress the bitmap to disk somewhere and then map the file into memory for uploading it to the GPU.
Since I don’t think one could upload all frames to the GPU (not enough memory), this will come with a rendering trade-off.
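
For illustration, a rough sketch of that path (untested; the function name and pixel layout are assumptions, and a real implementation must align the mapping offset to the system allocation granularity):

#include <windows.h>
#include <d2d1.h>

// Map one pre-decoded frame from the on-disk cache and upload it to the GPU.
ID2D1Bitmap* uploadFrame (ID2D1RenderTarget* target, HANDLE fileMapping,
                          UINT32 frameIndex, UINT32 frameWidth, UINT32 frameHeight)
{
	const UINT32 pitch = frameWidth * 4; // tightly packed BGRA rows
	const UINT64 offset = (UINT64)frameIndex * pitch * frameHeight;
	void* src = MapViewOfFile (fileMapping, FILE_MAP_READ,
	                           (DWORD)(offset >> 32), (DWORD)(offset & 0xFFFFFFFF),
	                           (SIZE_T)((UINT64)pitch * frameHeight));
	if (!src)
		return nullptr;
	D2D1_BITMAP_PROPERTIES props = {};
	props.pixelFormat.format = DXGI_FORMAT_B8G8R8A8_UNORM;
	props.pixelFormat.alphaMode = D2D1_ALPHA_MODE_PREMULTIPLIED;
	D2D1_SIZE_U size = { frameWidth, frameHeight };
	ID2D1Bitmap* bitmap = nullptr;
	if (SUCCEEDED (target->CreateBitmap (size, props, &bitmap)))
		bitmap->CopyFromMemory (nullptr, src, pitch); // nullptr rect = whole bitmap
	UnmapViewOfFile (src);
	return bitmap;
}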

Hi Arne,

Gotcha, thanks for your feedback. My point was that the copying around only has to be done once if done right, e.g. when the module is loaded. Writing to VRAM could probably be done in bursts, so I don’t know whether the latency would be an issue, but I’m not trying to argue.

I guess the best trade-off would be solution 1 from your initial post (rows and columns). The CAnimKnob/CMovieBitmap etc. ctors could then be extended with optional rows/columns arguments, the draw methods could be adapted easily, and the re-organization overhead on the developers’ end would be minimal. 16384 x 16384 pixels should be plenty of room to play with in terms of high-resolution images / animations, no?
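
The extension could be as small as an extra defaulted parameter on the existing ctor (hypothetical signature, modeled on the current one):

CAnimKnob (const CRect& size, IControlListener* listener, int32_t tag,
           int32_t subPixmaps,          // number of frames
           CCoord heightOfOneImage,
           CBitmap* background,
           const CPoint& offset = CPoint (0, 0),
           int32_t numColumns = 1);     // new: frames wrap into this many columns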

Best,
Ray

Yes, I think so too.
I will try to get this into the next release. Or if you’d like to create a PR on GitHub…

Hi Arne,

As far as a potential PR goes, I’m quite busy at the moment, but do you have an estimate of when the next SDK update will be released?

I’m thinking about adapting the IMultiBitmapControl definition/implementation to feature the following API extensions / changes and then adapting CAnimKnob, CMovieBitmap, CAutoAnimation and so forth (untested!):

class IMultiBitmapControl
{
public:
	virtual ~IMultiBitmapControl () {}

	virtual void setWidthOfOneImage (const CCoord& width) { widthOfOneImage = width; }
	virtual CCoord getWidthOfOneImage () const { return widthOfOneImage; }

	virtual void setHeightOfOneImage (const CCoord& height) { heightOfOneImage = height; }
	virtual CCoord getHeightOfOneImage () const { return heightOfOneImage; }

	virtual void setNumSubPixmaps (int32_t numSubPixmaps) { subPixmaps = numSubPixmaps; }
	virtual int32_t getNumSubPixmaps () const { return subPixmaps; }

	virtual void setSize (CPoint size) { if (size.x > 0. && size.y > 0.) { width = size.x; height = size.y; } }

	// Required / Feasible?
	virtual void autoComputeSizeOfOneImage ();

	// Note: invalid () assumes the deriving class is also a CView
	virtual void setBackOffset (const CPoint& point) { if (backOffset != point) { backOffset = point; invalid (); } }
	virtual void setInverseBitmap (bool bVal = true) { if (bInverseBitmap != bVal) { bInverseBitmap = bVal; invalid (); } }

	// Compute the proper offset within the background image (hardcoded so the images are arranged in columns)
	virtual CPoint getCPointForValue (float value) const
	{
		CPoint where = backOffset;
		if (value >= 0.f && value <= 1.f
		&& width > 0. && height > 0.
		&& widthOfOneImage > 0. && heightOfOneImage > 0.)
		{
			if (bInverseBitmap)
				value = 1.f - value;

			// vertical position of the frame as if all frames were in one column
			CCoord tmp = floor (value * (getNumSubPixmaps () - 1)) * heightOfOneImage;

			where.y += fmod (tmp, height);
			where.x += fmod (floor (tmp / height) * widthOfOneImage, width);
		}
		return where;
	}
	
protected:
	IMultiBitmapControl () :
		width (0),
		height (0),
		widthOfOneImage (0),
		heightOfOneImage (0),
		subPixmaps (0),
		bInverseBitmap (false)		
	{
	}
	
	CCoord width;
	CCoord height;
	 
	CCoord widthOfOneImage;
	CCoord heightOfOneImage;
	int32_t subPixmaps;
	
	CPoint backOffset{};
	bool bInverseBitmap;
};

Then in e.g. CAnimKnob one could do something like this:

void CAnimKnob::draw (CDrawContext *pContext)
{
	if (getDrawBackground ())
	{
		CPoint where = getCPointForValue (getValueNormalized ());

		getDrawBackground ()->draw (pContext, getViewSize (), where);
	}
	setDirty (false);
}

Let me know if that makes sense.

We currently hacked draw() for CAnimKnob to use a 16xn table of frames if the number of frames is >= 16 (index mapping sketched below).
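
The index mapping for such a table could look like this (illustrative sketch, variable names made up):

// map a normalized value to a cell in a 16-column frame table
int32_t frameIndex = (int32_t)floor (value * (numFrames - 1));
CPoint where ((frameIndex % 16) * frameWidth,   // column
              (frameIndex / 16) * frameHeight); // row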

Arne added a fix for this issue in the form of CMultiFrameBitmap, see here: vstgui/cbitmap.h at develop · steinbergmedia/vstgui · GitHub - so it seems it didn’t make its way into the latest SDK update.