Dynamic pricing

To optimize revenues and reduce waste, Wasteless’s pricing engine employs a branch of machine learning called Reinforcement Learning. This allows our engine to quickly learn how consumers respond to dynamic pricing so it can then find the optimal discount policy.

Pricing perishables is inherently a dynamic decision.

This is because decisions made in the present affect future stock conditions. If a retailer chooses low prices today, goods will sell faster, but there might not be enough inventory for the future.

On the other hand, if prices are set too high, the retailer sells fewer goods now, but carries over more inventory into the next period.

Dynamic programming is a mathematical technique that was developed to solve these types of problems. Wasteless uses state-of-the-art methods to solve high-dimensional dynamic programming problems and automatically maps inventory stock and time of day into a series of optimal prices.
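To make the idea concrete, here is a minimal sketch of backward induction for a single perishable product. It is an illustration under simplifying assumptions, not Wasteless's actual engine: at most one unit sells per period, unsold stock is worth nothing at expiry, and the sale probability for each price is known in advance. The names `solve_pricing_dp` and `sale_prob`, and all the numbers, are hypothetical.

```python
def solve_pricing_dp(prices, sale_prob, max_stock, periods):
    """Backward induction over (period, stock) for a toy pricing problem.

    Assumptions (illustrative only): at most one unit sells per period,
    sale_prob[p] is the known chance of a sale at price p, and any stock
    left after the last period is worthless.
    """
    # value[t][s]: expected revenue from period t onward with s units left
    value = [[0.0] * (max_stock + 1) for _ in range(periods + 1)]
    # policy[t][s]: best price in period t with s units (None if s == 0)
    policy = [[None] * (max_stock + 1) for _ in range(periods)]

    for t in range(periods - 1, -1, -1):
        for s in range(1, max_stock + 1):
            best_price, best_value = None, float("-inf")
            for p in prices:
                q = sale_prob[p]
                # Sell now and continue with s - 1 units, or carry s units over
                v = q * (p + value[t + 1][s - 1]) + (1 - q) * value[t + 1][s]
                if v > best_value:
                    best_price, best_value = p, v
            value[t][s] = best_value
            policy[t][s] = best_price
    return value, policy


# Hypothetical demand: higher prices sell less often
sale_prob = {1.0: 0.9, 2.0: 0.6, 3.0: 0.3}
value, policy = solve_pricing_dp([1.0, 2.0, 3.0], sale_prob,
                                 max_stock=2, periods=2)
```

In this toy setting, `policy[t][s]` is the mapping from time and stock level to a price described above. With one unit and two periods left, the sketch holds out for the higher price first, because an unsold unit can still be sold later.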

However, before these problems can be solved, our statisticians and economists must estimate a number of parameters. For example:

How exactly do consumers trade off freshness (expiration date) against discounts?

How do consumers respond to low inventory?

When is the next shipment of goods set to arrive?

How many units will it include, and at what cost?

What are the relationships between the products, and which products are paired with an alternative choice?

Clearly, the answers to these questions will determine the optimal set of discounts implemented by Wasteless.

Static pricing: a single price point

Dynamic pricing: multiple price points

Reinforcement Learning

How can Wasteless learn these parameters as quickly as possible, so that prices can start being optimized?

How can the retailer trade off price experimentation against the current expected value of setting the optimal price?

Enter Reinforcement Learning (RL)

A branch of machine learning, RL solves both the dynamic program and the learning problem. The core idea of RL is to update the parameter estimates using a randomized policy that experiments more early on and then converges to the optimal policy. The policy is trained to preserve a degree of randomness, so it can keep updating and learning as consumer trends emerge and preferences change over time. As a result, retailers see a boost in revenues, improved inventory management and, of course, a significant reduction in food waste.
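The explore-then-converge behavior can be sketched with an epsilon-greedy rule, one of the simplest RL-style strategies. This is a deliberately simplified stand-in, not the method Wasteless uses: it ignores inventory and time, and the class name `EpsilonGreedyPricer` and its parameters (`eps_start`, `eps_min`, `decay`) are hypothetical. The exploration rate starts high, decays after every observation, and never drops below a floor, so some randomness is always preserved.

```python
import random


class EpsilonGreedyPricer:
    """Toy epsilon-greedy price selection (illustrative assumption only)."""

    def __init__(self, prices, eps_start=1.0, eps_min=0.05, decay=0.99, seed=None):
        self.prices = list(prices)
        self.eps = eps_start      # current probability of experimenting
        self.eps_min = eps_min    # floor that keeps some randomness forever
        self.decay = decay        # multiplicative decay per observation
        self.counts = {p: 0 for p in self.prices}
        self.mean_revenue = {p: 0.0 for p in self.prices}
        self.rng = random.Random(seed)

    def choose_price(self):
        # Explore: try a random price with probability eps
        if self.rng.random() < self.eps:
            return self.rng.choice(self.prices)
        # Exploit: pick the price with the best average revenue so far
        return max(self.prices, key=lambda p: self.mean_revenue[p])

    def update(self, price, revenue):
        # Incremental running mean of the revenue observed at this price
        self.counts[price] += 1
        n = self.counts[price]
        self.mean_revenue[price] += (revenue - self.mean_revenue[price]) / n
        # Experiment less over time, but never below the floor
        self.eps = max(self.eps_min, self.eps * self.decay)
```

A caller would alternate `choose_price()` and `update(price, revenue)` for each pricing period. Because `eps` never falls below `eps_min`, the pricer keeps sampling other prices occasionally and can notice when consumer response shifts.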
