You are probably trying to do too much configuring, but it’s hard to say from here. Playback works exactly as it does in a regular session. The system uses skeuomorphic design: it’s modeled after an actual studio, with an engineer in the control room wearing headphones and a performer in a recording booth wearing headphones. Conceptually, everything follows from there.
Whatever your experience has been, you should start from scratch with an empty project by running Setup VST Connect from the VST Cloud menu in Cubase. Do not turn on any track monitors, and make sure that you are not monitoring an input channel via your audio interface.
You have to read the manual. Sorry, I know many people are annoyed by hearing that, but in this case you must. I don’t mean looking something up when you are in trouble, but reading it from end to end; it’s not all that long. If any images don’t match current versions, don’t worry; the concept is the same.
It sounds like you don’t understand how the routing works, so study the explanation and images at http://connectvst.com. Don’t skim it, and don’t use it only as a reference when things go wrong, if that’s what you’re doing.
Here are a couple of hints, and I’m just repeating what I’ve read here:
VST Connect handles everything.
No track monitors allowed.
No direct monitoring of the audio interface inputs.