audio – How to patch sources in ffmpeg?

I’m sure this has been covered before, but it’s not making sense to me.

If I want to capture the far right of 3 screens (3 wide, 1 high) at 30fps, all of which are 1920×1080, I do this:

    -f x11grab -r 30 -s 1920x1080 -i :0.0+3840,0 

(based loosely on this)

That works, and produces a silent video. So far, so good.

Now I want to add a mono soundtrack to it, taken from channel 3 of a 32-channel USB interface, so as a first step I tried this:

    -f alsa -ac 32 -i plughw:CARD=XUSB,DEV=0 
    -f x11grab -r 30 -s 1920x1080 -i :0.0+3840,0 

(based loosely on this)

I expected that to give me a video file with 32 uncompressed audio channels. Once I saw that working, I could add one more line to the command to select just the channel that I want, and then another line or two to compress the audio and video. But as it is, it still gives me a silent video, and a bunch of “ALSA buffer xruns” in the terminal while it’s running.
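To make the goal concrete, this is roughly the command I expect to end up with, written as a dry-run script. It is an untested sketch, not something I have working: the `pan` filter for picking the channel, the `-thread_queue_size` values (a guess at the xruns), the explicit `-map` options, the codecs, and the output name `recording.mkv` are all my assumptions, and I don't know whether `pan` copes with a 32-channel input that has no standard layout.

```shell
#!/bin/sh
# Sketch only: prints the ffmpeg arguments by default; set RUN=1 to execute.
# Everything beyond the two input fragments above (pan filter, queue sizes,
# stream mapping, codecs, output name) is an assumption, not a known-good recipe.

AUDIO_DEV="plughw:CARD=XUSB,DEV=0"   # same ALSA device as above
CHANNELS=32                          # channels the interface delivers
WANTED=3                             # 1-based channel to keep
OUT="recording.mkv"                  # hypothetical output file name

# pan numbers input channels from zero, so channel 3 is c2.
PAN="pan=mono|c0=c$((WANTED - 1))"

# Input 0 is the audio device, input 1 is the screen grab.
set -- ffmpeg \
    -f alsa -thread_queue_size 1024 -ac "$CHANNELS" -i "$AUDIO_DEV" \
    -f x11grab -thread_queue_size 1024 -r 30 -s 1920x1080 -i :0.0+3840,0 \
    -map 1:v -map 0:a \
    -af "$PAN" \
    -c:v libx264 -preset veryfast -c:a aac -b:a 128k \
    "$OUT"

if [ "${RUN:-0}" = "1" ]; then
    "$@"                     # actually start the capture
else
    printf '%s\n' "$*"       # dry run: show the arguments (unquoted)
fi
```

By default it only prints the argument list for inspection; setting `RUN=1` would actually start the capture.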

I can’t re-patch the hardware to channel 1 (cropped screenshot shown) because channels 1&2 are a stereo pair for a different, simultaneous use, and that receiving app only cares about 1&2. So the broadcast must go there, and I need to pick channel 3 or higher to be the mono soundtrack of the additional recorded video.


I can’t use the broadcast app to record this, because the broadcast needs to be different from the recording, and that app only does one stream. If it weren’t already tied up, I could use it for this recording (with the audio patched to 1&2), and it would be dead simple.

But since all the components of the recording already exist, I figured I could just add some lines to the startup script to pull it all together behind the scenes. When the event is done, “Oh look! There’s the recording too! And it’s different from the broadcast, just like we wanted.”

I can’t imagine that no one has documented this particular use as a working example, albeit with possibly different numbers, but I can’t seem to find it.

My specific case is a meeting with some remote participants, with the broadcast feeding the remote people without looping their feeds back to them, and the recording needs to include everyone.
But I can see a nearly identical configuration being used for gaming, software demonstrations, etc.

Recording audio alone does work, using arecord:

    --device=plughw:CARD=XUSB,DEV=0 --channels=32 --file-type=wav --format=S32_LE --rate=48000 

That gives me a 32-channel WAV file, and every channel in it is correct according to Audacity.
(32 channels of S32_LE at 48000 Hz is the only format this interface supports; it just is what it is.)
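Since that file already exists, one offline check I can think of is pulling channel 3 out of it with ffmpeg, which would at least separate the channel-selection question from the live-capture one. This is a sketch with assumed names: `capture32.wav` stands in for the arecord output, `pan_for_channel` is a helper I made up, and I have not confirmed that `pan` accepts this file's 32-channel layout.

```shell
#!/bin/sh
# Sketch: extract one channel from the 32-channel WAV written by arecord.
# File names are placeholders; ffmpeg only runs if the input file exists.

IN="capture32.wav"            # hypothetical name of the arecord output
WANTED=3                      # 1-based channel number
OUT="channel${WANTED}.wav"    # mono result

# Build a pan filter string for a 1-based channel number
# (pan counts input channels from zero).
pan_for_channel() {
    printf 'pan=mono|c0=c%d' "$(( $1 - 1 ))"
}

FILTER=$(pan_for_channel "$WANTED")

if [ -f "$IN" ] && command -v ffmpeg >/dev/null 2>&1; then
    # Keep 32-bit PCM so the result is comparable with the source in Audacity.
    ffmpeg -i "$IN" -af "$FILTER" -c:a pcm_s32le "$OUT"
else
    echo "would run: ffmpeg -i $IN -af '$FILTER' -c:a pcm_s32le $OUT"
fi
```

If the mono file matches channel 3 in Audacity, the remaining problem is in the live ALSA capture rather than in the channel selection.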

So that gives me a little bit of reassurance that it can work somehow. I just can’t seem to find a decent example of taking channel 3 or higher as the mono soundtrack for a separate video source.