Skip to content

Conversation

@pelwell
Copy link
Contributor

@pelwell pelwell commented Dec 18, 2025

commit 2c1fd53 upstream.

Data loss on serial line was observed during communication through serial ports ttyAMA1 and ttyAMA2 interconnected via RS485 transcievers. Both ports are in one BCM2711 (Compute Module CM40) and they share the same interrupt line.

The problem is caused by long waiting for tx queue flush in the function pl011_rs485_tx_stop. Udelay or mdelay are used to wait. The function is called from the interrupt handler. If multiple devices share a single interrupt line, late processing of pending interrupts and data loss may occur. When operation of both devices are synchronous, collisions are quite often.

This rework is based on the method used in tty/serial/imx.c Use hrtimer instead of udelay and mdelay calls.
Replace simple bool variable rs485_tx_started by 4-state variable rs485_tx_state.

Tested-by: Lino Sanfilippo [email protected]

Link: https://lore.kernel.org/r/[email protected]

@nbuchwitz
Copy link
Contributor

hrtimer_init(&uap->trigger_start_tx, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
hrtimer_init(&uap->trigger_stop_tx, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
uap->trigger_start_tx.function = pl011_trigger_start_tx;
uap->trigger_stop_tx.function = pl011_trigger_stop_tx;

is missing in pl011_axi_probe() in order to make this work with rp1

@pelwell pelwell force-pushed the rs485fix branch 2 times, most recently from 1a1d94f to b647e7c Compare January 23, 2026 14:54
@pelwell
Copy link
Contributor Author

pelwell commented Jan 23, 2026

Like this?

@nbuchwitz
Copy link
Contributor

Yes, that's the same as what I tested here.

There is still a significant performance regression (essentially unusable) with our RS485-based backplane driver. I would really appreciate holding off on the merge until we have investigated this further. Otherwise, we would need to revert this downstream.

@pelwell pelwell marked this pull request as draft January 23, 2026 15:51
@pelwell
Copy link
Contributor Author

pelwell commented Jan 23, 2026

It's going nowhere for now.

@nbuchwitz
Copy link
Contributor

nbuchwitz commented Jan 27, 2026

I've put together a workaround that brings back stable functionality for our serdev-based backplane driver:

https://gitlab.com/revolutionpi/linux/-/commit/c3cae2df785ea604b49addfd2f7d75159e91475a

Since we'd like to get as much as possible into upstream, I'm still figuring out how to frame this in a way that makes sense for mainline. Before posting to the mailing list, I wanted to check if you or Jonathan have any thoughts on this.
Any feedback would be much appreciated!

@nbuchwitz
Copy link
Contributor

I recently learned that converting a PR to draft status disables notifications for subscribers. Could you confirm whether you received my earlier comment, @pelwell

@pelwell
Copy link
Contributor Author

pelwell commented Feb 2, 2026

I recently learned that converting a PR to draft status disables notifications for subscribers.

Does it? That doesn't sound correct.

Yes, I did see the comment, but I was in the middle of something engrossing and haven't got back to it. I did take a look at the patch and thought it was not a simple "yes" (which is why you asked the question). But in practice, I trust that you wouldn't be proposing a complex solution if you knew of a simple one, and you've spent far longer looking at this problem than I have.

Turn your patch into a PR and we can encourage people to try it.

@6by9
Copy link
Contributor

6by9 commented Feb 2, 2026

There are almost always compromises with RS485 transceivers on UARTS that give a FIFO empty rather than shift register empty interrupt.

The only solution I've seen reliably work for 2-wire RS485 without spinning is to stuff an extra byte into the FIFO when you hit the end of a message. Disable RTS when that extra byte is loaded into the shift register and therefore it is effectively discard it because the RS485 TX isn't active. (That worked on Z8530 and 8250 UARTs)
You would need to ensure that an extra data submitted to the UART during that dummy byte period wasn't immediately loaded into the FIFO and enabled RTS.

I haven't fully thought through whether that is another option here.

@nbuchwitz
Copy link
Contributor

There are almost always compromises with RS485 transceivers on UARTS that give a FIFO empty rather than shift register empty interrupt.

The only solution I've seen reliably work for 2-wire RS485 without spinning is to stuff an extra byte into the FIFO when you hit the end of a message. Disable RTS when that extra byte is loaded into the shift register and therefore it is effectively discard it because the RS485 TX isn't active. (That worked on Z8530 and 8250 UARTs) You would need to ensure that an extra data submitted to the UART during that dummy byte period wasn't immediately loaded into the FIFO and enabled RTS.

I haven't fully thought through whether that is another option here.

Thanks! Bit of a bummer if we lose DMA - it finally works with RP1 and PL011. But yeah, worth exploring.

miroslav-ondra and others added 3 commits February 3, 2026 14:42
commit 2c1fd53 upstream.

Data loss on serial line was observed during communication through
serial ports ttyAMA1 and ttyAMA2 interconnected via RS485 transcievers.
Both ports are in one BCM2711 (Compute Module CM40) and they share
the same interrupt line.

The problem is caused by long waiting for tx queue flush in the function
pl011_rs485_tx_stop. Udelay or mdelay are used to wait.
The function is called from the interrupt handler. If multiple devices
share a single interrupt line, late processing of pending interrupts
and data loss may occur. When operation of both devices are synchronous,
collisions are quite often.

This rework is based on the method used in tty/serial/imx.c
Use hrtimer instead of udelay and mdelay calls.
Replace simple bool variable rs485_tx_started by 4-state variable
rs485_tx_state.

Tested-by: Lino Sanfilippo <[email protected]>
Signed-off-by: Miroslav Ondra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
pl011_axi_probe was missing the equivlent hrtimer initialisation
from commit 2c1fd53 ("serial: amba-pl011: Fix RTS handling in RS485 mode")
and commit 8cb4418 ("serial: amba-pl011: Switch to use hrtimer_setup()")
resulting in the kernel blowing up as soon as pl011_rs485_stop_tx
tried to use them.

Add the hrtimer initialisation.

Fixes: 120c89e ("serial: pl011: rp1 uart support")
Signed-off-by: Dave Stevenson <[email protected]>
The hrtimer-based TX drain introduced in commit 2c1fd53 ("serial:
amba-pl011: Fix RTS handling in RS485 mode") can cause RX errors in
fast request/response RS485 protocols. The hrtimer callback latency
(often 50-100µs per iteration) delays RX re-enable, causing the first
bytes of slave responses to be lost.

Add a configurable bounded polling mode that spins for TX completion
with a timeout, falling back to hrtimer only if the timeout is exceeded.
This is controlled by two new device tree properties:

  rs485-tx-drain-poll    - enable bounded polling mode
  rs485-tx-drain-timeout-us - explicit timeout in microseconds (optional)

When rs485-tx-drain-poll is set without an explicit timeout, the timeout
is auto-calculated as (fifo_size + 1) * character_time, accounting for
the FIFO depth plus one character in the shift register. This adapts
automatically to baud rate changes.

The default behavior (no DT properties) remains unchanged, preserving
the pure hrtimer approach for systems where shared IRQ latency is the
primary concern.

Signed-off-by: Nicolai Buchwitz <[email protected]>
@pelwell
Copy link
Contributor Author

pelwell commented Feb 3, 2026

Updated with @nbuchwitz's commit from #7221.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants