Mark Oliver's World

Posted: 01/04/2022

Microsoft Teams Compliance Recording - Part 2

This is the second post in a series on Microsoft Teams Compliance Recording.

In the first post, I gave an overview of what Compliance Recording with MSTeams entails. With this post, I will talk a bit about the process of building a compliance policy bot.

The steps your bot needs to take are:

  • Connect to your Teams tenant
  • Initialise the Graph API
  • Set up your certificate
  • Set up the webhooks for Teams to contact us with call notifications and call updates
  • Set up the data socket tunnel for the audio and video to be sent
  • Initialise the media platform
  • Listen on the HTTP endpoints supplied to MSTeams

If everything is configured correctly, you will be able to receive audio and video.

So, if we assume that your policy bot can receive call notifications and call updates, then the next stages are:

  • Pump the audio event queue quickly
  • Pump the video event queue quickly

The importance of the "quickly" part above should not be overlooked.
The audio events are received at 50 events per second. If the events are not processed quickly enough, the MS media platform becomes unhealthy, and starts dropping audio packets.

The recommendation is that the audio packet event handler does very little. It should take the packets, and complete the event as soon as possible. Even writing the data to disk is too slow.
The best thing to do is to write the data packets to a memory-based queue that is then serviced in the background, outside of the event processing.
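To make the pattern concrete, here is a minimal sketch in Python (a real Teams bot would use the platform's own SDK, so treat this purely as an illustration). `PacketPump`, `on_packet` and `sink` are hypothetical names of my own, not part of any Microsoft API. The event handler only copies the payload into an in-memory queue, and a background thread does the slow work.

```python
import queue
import threading


class PacketPump:
    """Hypothetical helper: hands packets from the event handler to a worker."""

    def __init__(self, sink):
        self._queue = queue.Queue()   # unbounded, so put_nowait never blocks
        self._sink = sink             # slow work (disk, network) goes here
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def on_packet(self, data):
        """Event handler body: copy the payload, enqueue it, return."""
        self._queue.put_nowait(bytes(data))

    def _drain(self):
        # Runs on the background thread, outside the event processing.
        while True:
            item = self._queue.get()
            if item is None:          # sentinel posted by stop()
                return
            self._sink(item)

    def stop(self):
        """Process everything already queued, then stop the worker."""
        self._queue.put(None)
        self._worker.join()
```

An unbounded queue keeps the handler from ever blocking, at the cost of memory growing if the background worker falls behind for long, so it is worth monitoring the queue depth in a real bot.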

The same is true for video, but in this case we receive 30 packets of video data per second. However, the video packets are much larger than the audio packets, so although we receive fewer of them, each one costs more to process.
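As a rough guide, the event rates quoted above translate into an average time budget per event. The small sketch below just does that arithmetic; `budget_ms` is a hypothetical helper name, and the real limit depends on how much buffering the media platform tolerates, which I am not assuming anything about here.

```python
AUDIO_EVENTS_PER_SECOND = 50  # rate quoted in this post
VIDEO_EVENTS_PER_SECOND = 30  # rate quoted in this post


def budget_ms(events_per_second):
    """Average time available per event, in milliseconds, before falling behind."""
    return 1000.0 / events_per_second


audio_budget = budget_ms(AUDIO_EVENTS_PER_SECOND)  # 20.0 ms per audio event
video_budget = budget_ms(VIDEO_EVENTS_PER_SECOND)  # about 33.3 ms per video event
```

A handler that regularly takes longer than this on average cannot keep up with the incoming stream.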

My preference for this work is to create an async background file writer (for the audio packets), which stores the packets in memory on a concurrent queue. A background thread then runs and pushes the audio packets, with their timestamps, into a file.
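A sketch of such a writer might look like the following. This is not my production code, and the on-disk record layout (an 8-byte timestamp, a 4-byte length, then the payload) is simply an assumption chosen for the example. `BackgroundAudioWriter` and `read_packets` are hypothetical names.

```python
import queue
import struct
import threading

# Assumed record layout (not from the Teams SDK): little-endian signed 64-bit
# timestamp, unsigned 32-bit payload length, followed by the payload bytes.
_HEADER = struct.Struct("<qI")


class BackgroundAudioWriter:
    def __init__(self, path):
        self._queue = queue.Queue()
        self._file = open(path, "wb")
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def enqueue(self, timestamp, payload):
        """Called from the audio event handler; only touches memory."""
        self._queue.put_nowait((timestamp, bytes(payload)))

    def _run(self):
        # Background thread: the only place that touches the disk.
        while True:
            item = self._queue.get()
            if item is None:  # sentinel posted by close()
                return
            timestamp, payload = item
            self._file.write(_HEADER.pack(timestamp, len(payload)))
            self._file.write(payload)

    def close(self):
        """Write out everything still queued, then close the file."""
        self._queue.put(None)
        self._thread.join()
        self._file.close()


def read_packets(path):
    """Read back a list of (timestamp, payload) tuples in file order."""
    packets = []
    with open(path, "rb") as f:
        while True:
            header = f.read(_HEADER.size)
            if len(header) < _HEADER.size:
                break
            timestamp, length = _HEADER.unpack(header)
            packets.append((timestamp, f.read(length)))
    return packets
```

Storing the timestamp alongside every packet is what makes it possible to rebuild the timeline later, rather than relying on the order of arrival alone.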

Now for audio, reading the data back out to be played is not trivial, but it is simpler than video (see future posts). We receive a constant stream of audio with no gaps (unless it is muted), so we can just play back the audio packets one at a time, sequentially, at the speed they were recorded.

We should, however, use the timestamp that was supplied with each packet to ensure the audio is spaced out with the correct silence and synchronised with any other recorded elements (like the video).
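Here is a small sketch of how the timestamps could be used to put the silence back. The details are assumptions for the sake of the example: 16 kHz, 16-bit mono PCM, and timestamps counted in 100-nanosecond units. Check what your media platform actually supplies. `assemble_pcm` is a hypothetical helper name.

```python
def assemble_pcm(packets, sample_rate=16_000, bytes_per_sample=2,
                 ticks_per_second=10_000_000):
    """Join (timestamp, payload) packets into one PCM stream.

    Wherever the next packet starts later than the previous one ended,
    the gap is filled with zero-valued samples (silence).
    """
    out = bytearray()
    expected = None  # timestamp at which the next packet should start
    for timestamp, payload in sorted(packets, key=lambda p: p[0]):
        if expected is not None and timestamp > expected:
            missing = (timestamp - expected) * sample_rate // ticks_per_second
            out.extend(b"\x00" * (missing * bytes_per_sample))
        out.extend(payload)
        samples = len(payload) // bytes_per_sample
        expected = timestamp + samples * ticks_per_second // sample_rate
    return bytes(out)
```

With these assumptions a 20 ms packet is 640 bytes, so two packets whose timestamps are 40 ms apart come out as 640 bytes of audio, 640 bytes of silence, then another 640 bytes of audio. The same timestamps can then be used to line the audio up against the video.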


Thanks for reading this post.

If you want to reach out, catch me on Twitter!

I am always open to mentoring people, so get in touch.