This is because decisions made in the present affect future stock conditions. If a retailer chooses low prices today, goods will sell faster, but there might not be enough inventory for the future.
On the other hand, if prices are set too high, the retailer sells fewer goods now, but carries over more inventory into the next period.
However, before it’s possible to solve these problems, our statisticians and economists must have knowledge of various parameters:
Singlee price points
Multiple price points
Pulled from a branch of machine learning, RL solves both the dynamic program and the learning problem. The core concept of RL is to update the parameters by using a random policy that conducts experiments more early on, but then converges with the optimal policy. The policy is trained to preserve a degree of randomness, and as such, it can update and continue to learn as consumer trends emerge and preferences change over time. This way, retailers will note a boost in revenue, improved inventory management, and of course, significantly less waste.