Skip to content

Fully configure frame processors when they are used directly on an audio stream#679

Open
1egoman wants to merge 27 commits into
mainfrom
frame-processor-on-audio-stream
Open

Fully configure frame processors when they are used directly on an audio stream#679
1egoman wants to merge 27 commits into
mainfrom
frame-processor-on-audio-stream

Conversation

@1egoman

@1egoman 1egoman commented May 20, 2026

Copy link
Copy Markdown
Contributor

Updates the python sdk so that FrameProcessor-based noise cancellation providers can be used directly on AudioStream, without having to go through the agent's RoomIO to be able to initialize itself with credentials.

For example, with this change, something like the below becomes possible:

stream = rtc.AudioStream.from_track(                                                                                                                   
    track=track,
    sample_rate=SAMPLE_RATE,                                             
    num_channels=CHANNELS,
    noise_cancellation=ai_coustics.audio_enhancement(model=ai_coustics.EnhancerModel.QUAIL_VF_L)  ,
) 

The way this works - Tracks now keep track of which room they are part of (holding a weakref value). When the room a track is in changes, it computes new frame processor options and sends these to any AudioStreams which are associated with the track.

The noise_cancellation_leave_open parameter allows the agents sdk to call this from_track method with a frame processor which remains open across the whole session, and won't be auto-closed when the track is closed.

This goes along with livekit/agents#5867, which removes the relevant event handling logic in the agents sdk. I will follow up with a node version of this once the python one is in a good state.

Todo

  • Add some tests for this newly added behavior

@1egoman 1egoman force-pushed the frame-processor-on-audio-stream branch from 3e5a9ab to f62c247 Compare May 26, 2026 15:15
Comment thread livekit-rtc/livekit/rtc/track.py Outdated
@1egoman 1egoman marked this pull request as ready for review May 26, 2026 21:25
devin-ai-integration[bot]

This comment was marked as resolved.

Comment thread livekit-rtc/livekit/rtc/audio_stream.py Outdated
num_channels: int = 1,
frame_size_ms: int | None = None,
noise_cancellation: Optional[NoiseCancellationOptions | FrameProcessor[AudioFrame]] = None,
noise_cancellation_leave_open: bool = False,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
noise_cancellation_leave_open: bool = False,

Can we move that inside NoiseCancellationOptions?

@1egoman 1egoman May 28, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, no - this is important to the FrameProcessor[AudioFrame] side of that noise_cancellation union. Open to putting it somewhere else but it needs to be settable in the FrameProcessor path.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, not sure if it's a good idea, but could it be a field on the FrameProcessor interface instead?

Then we could add it to NoiseCancellationOptions and new FrameProcessors would be able to set it on the processor itself

@1egoman 1egoman May 29, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a setting that a frame processor would always want to have set or not have set, so I'm not sure that would really make sense either.

For context, the reason this is here is so the agents sdk can reuse a single FrameProcessor across multiple underlying tracks. Previously, this wasn't a problem in the way this used to work, because the agents sdk had the responsibility of closing the FrameProcessor, so it could easily do it at room disconnection time. But in order to support the ability to use FrameProcessors directly on an AudioStream, calling close needs to be pushed down deeper than the agents sdk layer. This flag allows the caller to explictly tell AudioStream that they will manage cleaning up the FrameProcessor so that both use cases can continue to work.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this flag is not really configuring the noise suppression behavior, but how AudioStream deals with its own noise suppression, maybe the naming of noise_cancellation_leave_open is a bit confusing ?

how about close_noise_cancellation_on_stream_close or manage_noise_cancellation_processor ?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a setting that a frame processor would always want to have set or not have set

it could stay undefined by default? 🤷
I understand however that it feels a bit weird for it to live on the processor if the processor itself doesn't really use the field.

We shortly discussed also the option to introduce a restart method on the processor. I think this could still be a viable alternative?

@1egoman 1egoman Jun 1, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shortly discussed also the option to introduce a restart method on the processor. I think this could still be a viable alternative?

It could, but the con there is it's a breaking api change to FrameProcessor.

Just generally, I want to understand what folks' concerns are in more detail. Is it just the noise_cancellation_ prefix naming like shijing suggested (I think out of the two suggestions, I like manage_noise_cancellation_processor better)? Or is there something deeper behavior wise that is concerning?

FWIW, two fairly similar patterns I found:

  • LiveKitAPI conditionally controls aiohttp.ClientSession cleanup here based on whether the user passes a custom session or uses an inbuilt session.
  • The LocalAudioTrack has a userProvidedTrack parameter which is used to control whether the track is cleaned up or not here.

@1egoman 1egoman Jun 1, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Talked to @lukasIO in a 1:1 and he confirmed his concern was mostly with the naming, not with the broad approach, which is helpful.

A few other name ideas, in addition to shijing's suggestions (close_noise_cancellation_on_stream_close / manage_noise_cancellation_processor) - some of these would involve flipping the flag:

  • shared_noise_cancellation
  • noise_cancellation_externally_managed
  • auto_close_noise_cancellation
  • owns_noise_cancellation

Out of the above, I think I like auto_close_noise_cancellation the best:

# Usage within agents sdk:
AudioStream.from_track(
    # ...
    noise_cancellation=frame_processor,
    auto_close_noise_cancellation=False,
)

I'm going to update the pull request to use it for now in 8d5e656.


Another possible idea: maybe something like the below could be a different way to package the same data which could better contain it. In a world like this, noise_cancellation would be of type Union[NoiseCancellationOptions, FrameProcessorOptions, FrameProcessor]:

AudioStream.from_track(
    # ...
    noise_cancellation=FrameProcessorOptions(frame_processor=self, leave_open=True)
)

Do any of these ideas look better than the current state?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@theomonnom @xianshijing-lk Lukas gave a a general 👍 to the rename to auto_close_noise_cancellation addressing his concern. Do either of you have further concerns beyond what this rename could accomplish with this approach?

If not / I don't hear anything in the next few days I think I am good to merge this.

1egoman added 18 commits June 8, 2026 12:23
…io stream

And extracting metadata from that room that can be fed into the frame
processor.
The agents sdk can pass this opt-out flag so that it can reuse the frame
processor across many audio tracks
Need to think about this a bit more, this pattern as written won't work,
since the FrameProcessor today can't have a set of no-op credentials
pushed.
…Processor methods, and use them when moving a track out of a room
1egoman added 3 commits June 8, 2026 12:24
@1egoman 1egoman force-pushed the frame-processor-on-audio-stream branch from 8d5e656 to 8e3a461 Compare June 8, 2026 16:27
devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

1egoman added 3 commits June 8, 2026 13:06
…ll reconnect

When the room does a full reconnect, make sure the audiostream metadata
gets a new push with the updated track sid
Comment on lines +781 to +789
if republished.track is not None:
# Keep the local-track invariant (track.sid == publication.sid,
# set at publish_track) intact across republish, then re-push
# metadata so any attached FrameProcessor learns the new
# publication SID / credentials. _set_room with the same room
# is a no-op for the token_refreshed listener but re-fans the
# metadata to every registered AudioStream.
republished.track._info.sid = republished.sid
republished.track._set_room(self)

@1egoman 1egoman Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to any reviewers: the Devin bot surfaced this case where when a full reconnect happens, tracks would not have their metadata kept up to date. I'm not sure if republished.track._info.sid = republished.sid is a good pattern, but wanted to give an opportunity for folks to weigh in on these final late breaking changes before merging.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

didn't dive too deep into this, just a thought: if it's about the metadata being outdated, shouldn't we override all of the track's info ?

devin-ai-integration[bot]

This comment was marked as resolved.

1egoman added 2 commits June 18, 2026 15:55
…handlers once

Previously, calling track._set_room(None) twice would call the handlers
twice.

@devin-ai-integration devin-ai-integration Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 new potential issue.

Open in Devin Review

Comment on lines +762 to 770
if unpublished.track is not None:
unpublished.track._set_room(None)
# Mirror track_unsubscribed: drop the publication's track
# reference. This also makes unpublish_track's own
# _set_room(None) a no-op when it loses the race (its
# `publication._track is not None` guard short-circuits),
# avoiding a redundant clear.
unpublished._track = None
self.emit("local_track_unpublished", unpublished)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚩 local_track_unpublished event now emits with publication.track = None — behavioral change

Before this PR, the local_track_unpublished room-event handler emitted the event with publication.track still set (the track was only nulled later by unpublish_track). Now at room.py:769, unpublished._track = None is set BEFORE the emit at room.py:770. This is intentional (the comment explains it mirrors track_unsubscribed and prevents a redundant _set_room(None) in the race), and existing callers in this repo only access publication.sid. However, any downstream consumer (e.g. agents SDK) that accesses publication.track inside a local_track_unpublished callback will now always see None, where previously it was sometimes set. Unlike track_unsubscribed (which passes the track as a separate argument), this event provides no alternative way to access the track.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is sort of the same thing that was surfaced here just flipped around?

I don't really have the depth of knowledge on this code to know for sure what to do here. Here's what a LLM had to say about this whole situation:

Is it actually a breaking change? Not meaningfully.

  Three reasons:

  1. It was never reliable before. Because of the documented unpublish race, unpublish_track could null publication._track before the room event handler
  ran. So pre-PR, a callback reading publication.track already saw None sometimes (race-dependent). Any correct consumer already had to handle None. We've
  gone from "racy/sometimes-set" to "always None" — not from "guaranteed-set" to "None."
  2. No consumer reads it. I grepped both repos for local_track_unpublished handlers:
    - python-sdks: tests/rtc/test_e2e.py and examples/basic_room.py — both read only publication.sid.
    - agents SDK: doesn't subscribe to local_track_unpublished at all.        

  So there's zero actual breakage today.                                      
  3. The event never offered guaranteed track access anyway. Unlike track_unsubscribed (which passes the track as a separate emit arg),
  local_track_unpublished only emits the publication. So publication.track was the only route, and it was always racy.

  Net: it's an observable behavior change to a public event's payload, but not a break that affects any real consumer, and it removes only an
  already-unreliable capability. I'd be comfortable merging as-is (optionally noting it in a changelog).

  If you want to be conservative, two options                                 

  - Reorder: null after the emit (one line). _on_room_event is synchronous and emit is synchronous, so the null still completes before any other coroutine
  (including unpublish_track) can interleave — meaning the redundant-clear avoidance still holds, and the callback can read publication.track in the
  non-racy case (restoring the exact pre-PR behavior). This is the minimal way to make Devin's concern moot without changing the event signature.
  - Add the track as a second emit arg (like track_unsubscribed): gives reliable access, but it's a public event-signature change — bigger, and I would not
  fold that into this PR.                                                     

  My recommendation: leave it as-is (nulling before emit is the more honest semantic — "the track is gone"), since nothing consumes it. But if you'd rather
  not introduce any behavior delta at all, the after-emit reorder is a safe one-liner

I'm tempted to leave it alone since this very minor "breaking change" I think is actually now in practice more correct. But I'm open to alternate perspectives here.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't the alternative be to simply null it after emitting the event? for sync event handlers that should keep the previous behaviour?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants