Re: [PATCH v16 1/2] pwm: add microchip soft ip corePWM driver

From: Conor Dooley
Date: Tue Apr 11 2023 - 09:57:25 EST


Hey Uwe,

On Tue, Apr 11, 2023 at 12:55:47PM +0200, Uwe Kleine-König wrote:
> On Tue, Apr 11, 2023 at 09:56:34AM +0100, Conor Dooley wrote:
> > Add a driver that supports the Microchip FPGA "soft" PWM IP core.
> >
> > Signed-off-by: Conor Dooley <conor.dooley@xxxxxxxxxxxxx>
> > ---
> > drivers/pwm/Kconfig | 10 +
> > drivers/pwm/Makefile | 1 +
> > drivers/pwm/pwm-microchip-core.c | 509 +++++++++++++++++++++++++++++++
> > 3 files changed, 520 insertions(+)
> > create mode 100644 drivers/pwm/pwm-microchip-core.c
> >
> > diff --git a/drivers/pwm/Kconfig b/drivers/pwm/Kconfig
> > index dae023d783a2..f42756a014ed 100644
> > --- a/drivers/pwm/Kconfig
> > +++ b/drivers/pwm/Kconfig
> > @@ -393,6 +393,16 @@ config PWM_MEDIATEK
> > To compile this driver as a module, choose M here: the module
> > will be called pwm-mediatek.
> >
> > +config PWM_MICROCHIP_CORE
> > + tristate "Microchip corePWM PWM support"
> > + depends on SOC_MICROCHIP_POLARFIRE || COMPILE_TEST
> > + depends on HAS_IOMEM && OF
> > + help
> > + PWM driver for Microchip FPGA soft IP core.
> > +
> > + To compile this driver as a module, choose M here: the module
> > + will be called pwm-microchip-core.
> > +
> > config PWM_MXS
> > tristate "Freescale MXS PWM support"
> > depends on ARCH_MXS || COMPILE_TEST
> > diff --git a/drivers/pwm/Makefile b/drivers/pwm/Makefile
> > index 7bf1a29f02b8..a65625359ece 100644
> > --- a/drivers/pwm/Makefile
> > +++ b/drivers/pwm/Makefile
> > @@ -34,6 +34,7 @@ obj-$(CONFIG_PWM_LPSS_PCI) += pwm-lpss-pci.o
> > obj-$(CONFIG_PWM_LPSS_PLATFORM) += pwm-lpss-platform.o
> > obj-$(CONFIG_PWM_MESON) += pwm-meson.o
> > obj-$(CONFIG_PWM_MEDIATEK) += pwm-mediatek.o
> > +obj-$(CONFIG_PWM_MICROCHIP_CORE) += pwm-microchip-core.o
> > obj-$(CONFIG_PWM_MTK_DISP) += pwm-mtk-disp.o
> > obj-$(CONFIG_PWM_MXS) += pwm-mxs.o
> > obj-$(CONFIG_PWM_NTXEC) += pwm-ntxec.o
> > diff --git a/drivers/pwm/pwm-microchip-core.c b/drivers/pwm/pwm-microchip-core.c
> > new file mode 100644
> > index 000000000000..0a69ec376c51
> > --- /dev/null
> > +++ b/drivers/pwm/pwm-microchip-core.c
> > @@ -0,0 +1,509 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * corePWM driver for Microchip "soft" FPGA IP cores.
> > + *
> > + * Copyright (c) 2021-2023 Microchip Corporation. All rights reserved.
> > + * Author: Conor Dooley <conor.dooley@xxxxxxxxxxxxx>
> > + * Documentation:
> > + * https://www.microsemi.com/document-portal/doc_download/1245275-corepwm-hb
> > + *
> > + * Limitations:
> > + * - If the IP block is configured without "shadow registers", all register
> > + * writes will take effect immediately, causing glitches on the output.
> > + * If shadow registers *are* enabled, a write to the "SYNC_UPDATE" register
> > + * notifies the core that it needs to update the registers defining the
> > + * waveform from the contents of the "shadow registers".
>
> You only write once to the sync update register (i.e. during probe). So
> that register specifies that a period should be completed before a new
> setting becomes active, right?

Correct.

> Even with sync update this is still racy, right?

I assume the period ticking over as we are updating the values is your
concern here. I'm not sure that there's all that much we can do about
that, so I guess I shall update the comment.
Perhaps writing out period_steps and prescale should be done after
computing the new duty cycle to reduce the possible window since that
may require an expensive division on a 32-bit arch?
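
To make the reordering idea concrete, here is a rough userspace sketch. It is
not the driver code: the register indices, struct fake_pwm, fake_write() and
apply_sketch() are all made up for illustration, and the plain multiplication
stands in for mul_u64_u64_div_u64(). The only point is that the division
happens before the first register write.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical register indices, not the driver's MCHPCOREPWM_* offsets. */
enum fake_reg { REG_PRESCALE, REG_PERIOD, REG_POSEDGE, REG_NEGEDGE, REG_COUNT };

struct fake_pwm {
	uint32_t regs[REG_COUNT];
	enum fake_reg order[REG_COUNT];	/* records the order of the writes */
	unsigned int nwrites;
};

static void fake_write(struct fake_pwm *p, enum fake_reg reg, uint32_t val)
{
	assert(p->nwrites < REG_COUNT);	/* the log only holds one apply */
	p->regs[reg] = val;
	p->order[p->nwrites++] = reg;
}

/* Plain-C stand-in for the mul_u64_u64_div_u64() based duty calculation. */
static uint64_t calc_duty_steps(uint64_t duty_ns, uint64_t clk_rate, uint8_t prescale)
{
	/* No overflow handling in this sketch, so reject inputs that would wrap. */
	assert(clk_rate && duty_ns <= UINT64_MAX / clk_rate);
	return duty_ns * clk_rate / ((prescale + 1) * NSEC_PER_SEC);
}

static void apply_sketch(struct fake_pwm *p, uint64_t duty_ns, uint64_t clk_rate,
			 uint8_t prescale, uint8_t period_steps)
{
	/* Do the potentially slow division before touching any register... */
	uint64_t duty_steps = calc_duty_steps(duty_ns, clk_rate, prescale);

	/* ...so that the four writes land as close together as possible. */
	fake_write(p, REG_PRESCALE, prescale);
	fake_write(p, REG_PERIOD, period_steps);
	fake_write(p, REG_POSEDGE, 0);	/* normal polarity, duty != 0 assumed */
	fake_write(p, REG_NEGEDGE, (uint32_t)duty_steps);
}
```

This only shrinks the window, it does not close it: the period can still tick
over between any two of the writes.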

> > + * - The IP block has no concept of a duty cycle, only rising/falling edges of
> > + * the waveform. Unfortunately, if the rising & falling edges registers have
> > + * the same value written to them the IP block will do whichever of a rising
> > + * or a falling edge is possible. I.E. a 50% waveform at twice the requested
> > + * period. Therefore to get a 0% waveform, the output is set to the max high/low
> > + * time depending on polarity.
> > + * If the duty cycle is 0%, and the requested period is less than the
> > + * available period resolution, this will manifest as a ~100% waveform (with
> > + * some output glitches) rather than 50%.
>
> The last paragraph refers to negedge = 0, posedge = 0 and period_steps =
> 0?

Yes. Although, I did some poking around with it just now & that actually
only happens if prescale is also 0.
If it is non-zero, you get to see some other "interesting behaviour" where
the period becomes gigantic - for example @ prescale = 0x3, the period
becomes about a quarter of a second w/ a 50% duty cycle. clk_rate is
62.5 MHz. I'd need to dig out the RTL to justify that one!

I've just gone and made apply() return -EINVAL for this, which is what the
subsystem does for requests with a zero period.

> > + * - The PWM period is set for the whole IP block not per channel. The driver
> > + * will only change the period if no other PWM output is enabled.
> > + */
>
> > +static void mchp_core_pwm_enable(struct pwm_chip *chip, struct pwm_device *pwm,
> > + bool enable, u64 period)
> > +{
> > + struct mchp_core_pwm_chip *mchp_core_pwm = to_mchp_core_pwm(chip);
> > + u8 channel_enable, reg_offset, shift;
> > +
> > + /*
> > + * There are two adjacent 8 bit control regs, the lower reg controls
> > + * 0-7 and the upper reg 8-15. Check if the pwm is in the upper reg
> > + * and if so, offset by the bus width.
> > + */
> > + reg_offset = MCHPCOREPWM_EN(pwm->hwpwm >> 3);
> > + shift = pwm->hwpwm & 7;
> > +
> > + channel_enable = readb_relaxed(mchp_core_pwm->base + reg_offset);
> > + channel_enable &= ~(1 << shift);
> > + channel_enable |= (enable << shift);
> > +
> > + writel_relaxed(channel_enable, mchp_core_pwm->base + reg_offset);
> > + mchp_core_pwm->channel_enabled &= ~BIT(pwm->hwpwm);
> > + mchp_core_pwm->channel_enabled |= enable << pwm->hwpwm;
> > +
> > + /*
> > + * Notify the block to update the waveform from the shadow registers.
> > + * The updated values will not appear on the bus until they have been
> > + * applied to the waveform at the beginning of the next period.
> > + * This is a NO-OP if the channel does not have shadow registers.
> > + */
>
> The code doesn't match the comment. I think that is a relict from the
> times when we thought that a trigger was necessary to update the
> operating settings from the shadow registers?

Yeah, I read this back to myself before sending v15 & thought that it
didn't need to be changed. I think the first line should go.

>
> > + if (mchp_core_pwm->sync_update_mask & (1 << pwm->hwpwm))
> > + mchp_core_pwm->update_timestamp = ktime_add_ns(ktime_get(), period);
> > +}
> > +
> > +static void mchp_core_pwm_wait_for_sync_update(struct mchp_core_pwm_chip *mchp_core_pwm,
> > + unsigned int channel)
> > +{
> > + /*
> > + * If a shadow register is used for this PWM channel, and iff there is
> > + * a pending update to the waveform, we must wait for it to be applied
> > + * before attempting to read its state. Reading the registers yields
> > + * the currently implemented settings & the new ones are only readable
> > + * once the current period has ended.
> > + */
> > +
> > + if (mchp_core_pwm->sync_update_mask & (1 << channel)) {
> > + ktime_t current_time = ktime_get();
> > + s64 remaining_ns;
> > + u32 delay_us;
> > +
> > + remaining_ns = ktime_to_ns(ktime_sub(mchp_core_pwm->update_timestamp,
> > + current_time));
> > +
> > + /*
> > + * If the update has gone through, don't bother waiting for
> > + * obvious reasons. Otherwise wait around for an appropriate
> > + * amount of time for the update to go through.
> > + */
> > + if (remaining_ns <= 0)
> > + return;
> > +
> > + delay_us = DIV_ROUND_UP_ULL(remaining_ns, NSEC_PER_USEC);
> > + fsleep(delay_us);
> > + }
>
> There is no way to query the hardware if there is still an update
> pending, right?

Hah, no. This IP is about as old as I am & appears to have been written
with the aim of keeping FPGA utilisation to a minimum. No such luxuries!
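
Since there is nothing to poll, the wait is purely time-based. The arithmetic
in the quoted mchp_core_pwm_wait_for_sync_update() boils down to something
like the sketch below, where plain nanosecond integers stand in for ktime_t
and sync_update_delay_us() is a made-up name. A return value of 0 corresponds
to the early return, anything else to the fsleep() argument.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_USEC 1000LL

/*
 * Userspace model only: returns how many microseconds to sleep before the
 * pending update can be assumed to have been applied, or 0 if the deadline
 * recorded at enable time has already passed.
 */
static uint32_t sync_update_delay_us(int64_t update_timestamp_ns, int64_t now_ns)
{
	int64_t remaining_ns;

	assert(now_ns >= 0);	/* monotonic time is assumed non-negative */

	remaining_ns = update_timestamp_ns - now_ns;
	if (remaining_ns <= 0)
		return 0;

	/* Round up, as DIV_ROUND_UP_ULL() does in the patch. */
	return (uint32_t)((remaining_ns + NSEC_PER_USEC - 1) / NSEC_PER_USEC);
}
```

Rounding up matters here: rounding down could end the sleep just before the
period boundary and read back the old settings.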

> Maybe that's possible implicitly by memoizing the
> expected read value? For me the current approach is fine enough though.
> This can be addressed in the future if needed.
>
> > +static u64 mchp_core_pwm_calc_duty(const struct pwm_state *state, u64 clk_rate,
> > + u8 prescale, u8 period_steps)
> > +{
> > + u64 duty_steps, tmp;
> > +
> > + /*
> > + * Calculate the duty cycle in multiples of the prescaled period:
> > + * duty_steps = duty_in_ns / step_in_ns
> > + * step_in_ns = (prescale * NSEC_PER_SEC) / clk_rate
> > + * The code below is rearranged slightly to only divide once.
> > + */
> > + tmp = (prescale + 1) * NSEC_PER_SEC;
> > + duty_steps = mul_u64_u64_div_u64(state->duty_cycle, clk_rate, tmp);
> > +
> > + return duty_steps;
> > +}
> > +
> > +static void mchp_core_pwm_apply_duty(struct pwm_chip *chip, struct pwm_device *pwm,
> > + const struct pwm_state *state, u64 duty_steps,
> > + u16 period_steps)
> > +{
> > + struct mchp_core_pwm_chip *mchp_core_pwm = to_mchp_core_pwm(chip);
> > + u8 posedge, negedge;
> > + u8 first_edge = 0, second_edge = duty_steps;
> > +
> > + /*
> > + * Setting posedge == negedge doesn't yield a constant output,
> > + * so that's an unsuitable setting to model duty_steps = 0.
> > + * In that case set the unwanted edge to a value that never
> > + * triggers.
> > + */
> > + if (duty_steps == 0)
> > + first_edge = period_steps + 1;
> > +
> > + if (state->polarity == PWM_POLARITY_INVERSED) {
> > + negedge = first_edge;
> > + posedge = second_edge;
> > + } else {
> > + posedge = first_edge;
> > + negedge = second_edge;
> > + }
> > +
> > + writel_relaxed(posedge, mchp_core_pwm->base + MCHPCOREPWM_POSEDGE(pwm->hwpwm));
> > + writel_relaxed(negedge, mchp_core_pwm->base + MCHPCOREPWM_NEGEDGE(pwm->hwpwm));
>
> Is this racy with sync update implemented in the firmware? A comment
> about how the sync update is implemented would be good.

Unless this is a different fear of racing, see above.

> > +}
> > +
> > +static int mchp_core_pwm_calc_period(const struct pwm_state *state, unsigned long clk_rate,
> > + u16 *prescale, u16 *period_steps)
> > +{
> > + u64 tmp;
> > + u32 remainder;
> > +
> > + /*
> > + * Calculate the period cycles and prescale values.
> > + * The registers are each 8 bits wide & multiplied to compute the period
> > + * using the formula:
> > + * (prescale + 1) * (period_steps + 1)
> > + * period = -------------------------------------
> > + * clk_rate
> > + * so the maximum period that can be generated is 0x10000 times the
> > + * period of the input clock.
> > + * However, due to the design of the "hardware", it is not possible to
> > + * attain a 100% duty cycle if the full range of period_steps is used.
> > + * Therefore period_steps is restricted to 0xfe and the maximum multiple
> > + * of the clock period attainable is (0xff + 1) * (0xfe + 1) = 0xff00
> > + *
> > + * The prescale and period_steps registers operate similarly to
> > + * CLK_DIVIDER_ONE_BASED, where the value used by the hardware is that
> > + * in the register plus one.
> > + * It's therefore not possible to set a period lower than 1/clk_rate, so
> > + * if tmp is 0, abort. Without aborting, we will set a period that is
> > + * greater than that requested and, more importantly, will trigger the
> > + * neg-/pos-edge issue described in the limitations.
> > + */
> > + tmp = mul_u64_u64_div_u64(state->period, clk_rate, NSEC_PER_SEC);
> > + if (!tmp)
> > + return -EINVAL;
> > +
> > + if (tmp >= MCHPCOREPWM_PERIOD_MAX) {
> > + *prescale = MCHPCOREPWM_PRESCALE_MAX;
> > + *period_steps = MCHPCOREPWM_PERIOD_STEPS_MAX;
> > +
> > + return 0;
> > + }
> > +
> > + /*
> > + * There are multiple strategies that could be used to choose the
> > + * prescale & period_steps values.
> > + * Here the idea is to pick values so that the selection of duty cycles
> > + * is as finegrain as possible.
> > + * This "optimal" value for prescale can be calculated using the maximum
> > + * permitted value of period_steps, 0xfe.
> > + *
> > + * period * clk_rate
> > + * prescale = ------------------------- - 1
> > + * NSEC_PER_SEC * (0xfe + 1)
> > + *
> > + * However, we are purely interested in the integer upper bound of this
> > + * calculation, so this division should be rounded up before subtracting
> > + * 1
> > + *
> > + * period * clk_rate
> > + * ------------------- was precomputed as `tmp`
> > + * NSEC_PER_SEC
> > + */
> > + *prescale = DIV64_U64_ROUND_UP(tmp, MCHPCOREPWM_PERIOD_STEPS_MAX + 1) - 1;
>
> If state->period * clk_rate is 765000000001 you get tmp = 765 and then
> *prescale = 2. However roundup(765000000001 / (1000000000 * 255)) - 1 is
> 3. The problem here is that you're rounding down in the calculation of
> tmp. Of course this is constructed because 765000000001 is prime, but
> I'm sure you get the point :-)

Hold that thought for a moment..

> Also we know that tmp is < 0xff00, so we don't need a 64 bit division
> here.

Neither here nor below, true.

> > + /*
> > + * Because 0xff is not a permitted value some error will seep into the
> > + * calculation of prescale as prescale grows. Specifically, this error
> > + * occurs where the remainder of the prescale calculation is less than
> > + * prescale.
> > + * For small values of prescale, only a handful of values will need
> > + * correction, but overall this applies to almost half of the valid
> > + * values for tmp.
> > + *
> > + * To keep the algorithm's decision making consistent, this case is
> > + * checked for and the simple solution is to, in these cases,
> > + * decrement prescale and check that the resulting value of period_steps
> > + * is valid.
> > + *
> > + * period_steps can be computed from prescale:
> > + * period * clk_rate
> > + * period_steps = ----------------------------- - 1
> > + * NSEC_PER_SEC * (prescale + 1)
> > + *
> > + */
> > + div_u64_rem(tmp, (MCHPCOREPWM_PERIOD_STEPS_MAX + 1), &remainder);
> > + if (remainder < *prescale) {
> > + u16 smaller_prescale = *prescale - 1;
> > +
> > + *period_steps = div_u64(tmp, smaller_prescale + 1) - 1;
> > + if (*period_steps < 255) {
> > + *prescale = smaller_prescale;
> > +
> > + return 0;
> > + }
> > + }

...so in your prime case above, we would initially compute a prescale
value that is too large, and then wind up hitting the test of the
remainder here, thereby realising that the smaller prescale value is a
better fit?
Perhaps that's not an acceptable way to handle the issue though.

> I don't understand that part. It triggers for tmp = 511. So you prefer
>
> prescale = 1
> period_steps = 254
>
> yielding period = 510 / clkrate over
>
> prescale = 2
> period_steps = 170
>
> yielding 513 / clkrate. I wonder why.

Because 513 > 511 & 254 > 170!
Is the aim not to produce a period that is less than or equal to that
requested? The aim of this driver is to pick a prescale/period_steps
combo that satisfies that constraint, while also trying to maximise the
"finegrainness" of the duty cycle.
The latter should be stated in a comment above.
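
Spelled out as a userspace sketch, the two criteria look roughly like this.
struct candidate, period_ticks(), fits() and prefer() are names invented for
this mail rather than anything in the driver, and periods are expressed in
input-clock ticks, i.e. the same unit as tmp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct candidate {
	uint16_t prescale;
	uint16_t period_steps;
};

/* Period in input-clock ticks: (prescale + 1) * (period_steps + 1). */
static uint32_t period_ticks(struct candidate c)
{
	return (uint32_t)(c.prescale + 1) * (c.period_steps + 1);
}

/* A candidate is only acceptable if it does not exceed the request. */
static bool fits(struct candidate c, uint32_t requested_ticks)
{
	assert(requested_ticks > 0);
	return period_ticks(c) <= requested_ticks;
}

/*
 * Returns true if a should be chosen over b: first prefer the candidate that
 * does not overshoot the requested period, then the one with more
 * period_steps, since that gives finer-grained duty cycles.
 */
static bool prefer(struct candidate a, struct candidate b, uint32_t requested_ticks)
{
	bool a_fits = fits(a, requested_ticks);
	bool b_fits = fits(b, requested_ticks);

	if (a_fits != b_fits)
		return a_fits;

	return a.period_steps > b.period_steps;
}
```

For tmp = 511, {1, 254} gives 510 ticks and fits, while {2, 170} gives 513
ticks and overshoots, so the first one wins on both counts.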


> Also tmp = 511 is the only value
> where this triggers. There is a mistake somewhere (maybe on my side).

It should trigger for any value 255 * n < x < 256 * n, no?
Say for tmp of 767:
*prescale = DIV64_U64_ROUND_UP(767, 254 + 1) - 1 = DIV64_U64_ROUND_UP(3.00784...) - 1 = 3
remainder = 0.00784.. * (254 + 1) = 2
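
Rather than trusting my mental arithmetic, the outer test can be modelled in
userspace along these lines. This is only a sketch of the two quoted
expressions: initial_prescale(), remainder_check_fires() and count_fires()
are invented names, and ordinary integer division replaces the kernel's
DIV64_U64_ROUND_UP() and div_u64_rem().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PERIOD_STEPS_MAX 0xfe	/* mirrors MCHPCOREPWM_PERIOD_STEPS_MAX */

/* Userspace stand-in for DIV64_U64_ROUND_UP(tmp, 0xfe + 1) - 1. */
static uint16_t initial_prescale(uint32_t tmp)
{
	assert(tmp > 0);
	return (tmp + PERIOD_STEPS_MAX) / (PERIOD_STEPS_MAX + 1) - 1;
}

/* True if the "remainder < *prescale" test from the patch fires for tmp. */
static bool remainder_check_fires(uint32_t tmp)
{
	uint32_t remainder = tmp % (PERIOD_STEPS_MAX + 1);

	return remainder < initial_prescale(tmp);
}

/* Count the values in [lo, hi] for which the test fires. */
static unsigned int count_fires(uint32_t lo, uint32_t hi)
{
	unsigned int n = 0;

	for (uint32_t tmp = lo; tmp <= hi; tmp++)
		n += remainder_check_fires(tmp);

	return n;
}
```

By this model the test fires for 511 and for 766 and 767, but not for 512 or
768, which matches the 255 * n < x < 256 * n expectation. It also fires for
exact multiples such as 510 and 765, although there the decremented prescale
gives a period_steps well above 254, so the inner check rejects it.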

Am I going nuts? Wouldn't be the first time that I've made a hames of
things here, there are 16 versions for a reason after all.

Cheers,
Conor.
