Research Article, Volume 11(8), DOI: 10.37532/2320-6756.2023.11(8).367
Total Rejection Sampling and the Reduction of the Wave Function
- Noureddine Kermiche
Western Digital Corporation, Irvine, CA 92612,
Received date: 22-July-2023, Manuscript No. tspa-23-107715; Editor assigned: 24-July-2023, Pre-QC No. tspa-23-107715 (PQ); Reviewed: 10-August-2023, QC No. tspa-23-107715 (Q); Revised: 12-August-2023, Manuscript No. tspa-23-107715 (R); Published: 31-August-2023, DOI: 10.37532/2320-6756.2023.11(8).367
Citation: Kermiche N. Total Rejection Sampling and the Reduction of the Wave Function. J. Phys. Astron. 2023;11(8):367.
We propose a new exact sampling algorithm based on classical statistical theory. The proposed method processes an infinite number of rejected samples in parallel before accepting a single exact sample from a target distribution using a winner-take-all mechanism. The algorithm is applied to sampling from general distributions and to sampling from the wave function density distribution that lies at the heart of the measurement problem in quantum physics.
Keywords: Exact sampling; Rejection sampling; Reduction of the wave function; Quantum measurement problem
I. Introduction

Assuming that the reduction of the quantum wave function is a real physical process implies that nature has found the perfect algorithm for statistical sampling from density distributions. In the double-slit experiment, the locations of the scintillations appearing on the photographic plate are independent random samples drawn from a predictable density distribution derived from the deterministic equations governing the interference patterns of propagating waves. This mysterious sampling algorithm appears to be instantaneous, does not waste extra energy, and does not suffer from the curse of dimensionality that plagues all known classical sampling algorithms. And regardless of whether the reduction of the wave function is a real physical process or not, this sampling ability is what gives quantum computers their potential advantage over their classical counterparts.
The mysterious sampling algorithm that nature has apparently found cannot be duplicated using known classical sampling algorithms. Sampling during the reduction of the wave function is apparently 100% efficient since we always get one scintillation on the photographic plate for every wave packet. The 100% efficiency requirement eliminates classical rejection sampling algorithms as a tentative explanation, since rejection sampling, with its rejection/acceptance rule, is never 100% efficient. The locations in a sequence of scintillations are completely independent of each other, which also eliminates all sequential sampling algorithms such as Markov Chain Monte Carlo (MCMC). Quantum sampling due to the reduction of the wave function is also exact, which eliminates all approximate classical sampling algorithms as possible models.
In this study, we construct a new classical sampling algorithm that tries to mimic quantum sampling. We introduce a new way of performing exact sampling from a target density distribution that works in any dimension. We show that emulating quantum sampling can be achieved by accepting one sample out of an infinite number of rejected samples that are processed all at once. And while the processing of the infinite number of rejected samples uses only local information, choosing the winner-take-all (WTA) sample requires a non-local (global) optimization operation. Physical implementations of the WTA mechanism, as well as the noise sources that power the algorithm, are also discussed.
The algorithm is applied to sampling from general distributions and to sampling from a wave function density distribution. By analogy, this study suggests that the quantum sampling algorithm may be 0% efficient rather than apparently 100% efficient, because an infinite number of samples are rejected every time a single sample is accepted from the target distribution. The appearance of 100% efficiency could be due to processing an infinite amount of non-local information both in parallel and all at once. However, this claim is contingent on assuming that the reduction of the wave function is a real physical process, as mentioned previously, and not simply a good “cookbook” recipe [3, 4].
II. Total Rejection Sampling
Instead of using a classical random sampling strategy, where the sampling algorithm tests, accepts, or rejects one sample at a time, we propose a new approach where all tentative samples are accessible at the same time. We will show that the sample accepted from all these tentative samples, using a minimization criterion, is an exact sample from the target distribution.
Total rejection sampling algorithm
Theorem 1: Given a continuous probability density function f(x), x ∈ Rn, that is known up to a multiplicative constant, and given a uniformly distributed random function U(x) over [0, 1] where the random variables U(x) for different x values are identically and independently distributed, the random variable Y obtained by finding the location x of the minimum in Rn of the ratio U(x)/f(x) has the probability density function f(x)/∫Rn f(x′)dx′.
Theorem 2: Given a probability density function f(x), x ∈ Rn, that is known up to a multiplicative constant, and given a positive random function X(x) where the random variables X(x) for different x values are independently distributed over (0, ∞) with identical exponential density a·e^(−a·s), a > 0, the random variable Y obtained by finding the location x of the minimum in Rn of the ratio X(x)/f(x) has the probability density function f(x)/∫Rn f(x′)dx′.
Proofs for both theorems can be found in the Appendix.
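As a concrete illustration, the following minimal Python sketch (not from the paper) applies Theorem 2 on a discretized 1-D grid, with a made-up two-bump target known only up to a constant. On a finite grid, the exponential winner-take-all race is exact: this is the classical property that the minimizer of independent exponentials scaled by weights wins with probability proportional to its weight.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D target on a grid, known only up to a multiplicative
# constant (an asymmetric mixture of two Gaussian bumps).
xs = np.linspace(-4.0, 4.0, 201)
f = np.exp(-0.5 * (xs - 1.5) ** 2) + 0.5 * np.exp(-0.5 * (xs + 1.5) ** 2)
p = f / f.sum()                                  # normalized, for checking only

# Theorem 2 on the grid: attach an independent exponential noise value to
# every location and accept the single winner-take-all location that
# minimizes X(x)/f(x).  Repeating the draw gives independent samples.
n_draws = 20_000
x_noise = rng.exponential(scale=1.0, size=(n_draws, xs.size))
winners = np.argmin(x_noise / f, axis=1)

freq = np.bincount(winners, minlength=xs.size) / n_draws
print(np.max(np.abs(freq - p)))                  # small: winner frequencies match p
```

Note that every accepted sample costs one full grid of rejected noise values, which is exactly the "total rejection" trade-off the theorems describe.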
What the two theorems describe is the least efficient sampling algorithm possible. To extract one exact sample from a target distribution, we need to evaluate the distribution everywhere, and we also need a “white noise” generated everywhere. The ratio between the noise and the target distribution is evaluated locally, but the minimum operation that finds the sample location is non-local (global).
Finding a physical process that can implement the proposed algorithms is a tall order due to the requirement of collecting information everywhere, the non-locality of the minimum operation, and the fact that pure “white noises” are ideal constructs that do not exist in nature.
Besides the uniform white noise used in Theorem 1 and the exponential white noise used in Theorem 2, we could not generalize the proposed algorithm to other noise sources. It is possible that other distributions could be used, but computer simulations of many examples using other distributions indicate that a general white noise will not work with the proposed algorithm.
In the experimental section of this study, we will show that a good approximation of a target distribution can be achieved using a small finite number of tentative samples, and that the generated distribution converges to the ideal target distribution as the number of tentative rejected samples grows to cover all x ∈ Rn.
An example of a probability distribution that can be used for Theorem 2 is the distribution of the sum of the squares of 2 normal random variables. For normal distributions with zero mean and unit variance, the resulting distribution is called a chi-squared distribution (with 2 degrees of freedom, which is an exponential distribution).
Chi-squared random variables can be obtained using the following method, which combines the Central Limit Theorem with a Fourier series of white phase noise with uniform distribution. The initial application of this idea, generating Gaussian white noise from uniform random phases, was proposed by Lanczos and Gellai [5]. A complete mathematical treatment of the result is given by Peligrad and Wu [6].
Theorem 3: Given n independent and uniformly distributed random angles φ1, …, φn over [0, 2π), and given A(x) and B(x) as the real and imaginary parts of the normalized discrete Fourier transform of the random phasors e^(iφk), then as n → ∞, A(x) and B(x) converge to independent identically distributed normal random variables with mean 0 and variance 1/2, and S(x) = A(x)² + B(x)² converges to independent identically distributed exponential random variables with density e^(−s), a scaled chi-squared distribution with 2 degrees of freedom.
The limiting exponential white noise, labeled S(x), will be used for the rest of this study. In the following, we focus on applying Theorem 2 while assuming that the white noise is generated using Theorem 3.
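A short sketch of Theorem 3, assuming NumPy's FFT (scaled by 1/√n) as the normalized discrete Fourier transform: the squared magnitude of the DFT of n random unit phasors approaches the exponential white noise S(x). For an exponential distribution the mean and standard deviation coincide, which the sketch checks empirically.

```python
import numpy as np

rng = np.random.default_rng(1)

# Theorem 3 sketch: the normalized DFT of n random unit phasors.
n = 4096
phases = rng.uniform(0.0, 2.0 * np.pi, size=n)
spectrum = np.fft.fft(np.exp(1j * phases)) / np.sqrt(n)

A, B = spectrum.real, spectrum.imag   # each approaches N(0, 1/2) for large n
S = A ** 2 + B ** 2                   # approaches an Exp(1) white noise

# An exponential distribution has equal mean and standard deviation.
print(S.mean(), S.std())
```

By Parseval's relation the sample mean of S is exactly 1 here; the standard deviation fluctuates around 1 as the exponential limit predicts.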
III. Reduction of the Wave Function
Other ratio options and the measurement problem
Finding the location of the minimum in Rn of the ratio S(x)/f(x) will give the same statistics as finding the maximum in Rn of the ratio f(x)/S(x), or as finding the maximum in Rn of the bounded ratio:

R(x) = f(x)/(f(x) + S(x))
This new ratio is bounded between 0 and 1. For numerical or physical implementations of the algorithm, this bounded ratio is preferable, but it is not necessary.
Using the definition of probability from quantum mechanics, given a wave function ψ(x) of unit norm and the white noise S(x) from Theorem 3, the bounded ratio is:

R(x) = |ψ(x)|²/(|ψ(x)|² + S(x))
Theorem 2 guarantees that the random location that maximizes this ratio will have the distribution |ψ(x)|², which is exactly the sampling required by the quantum measurement problem.
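This quantum sampling loop can be sketched as follows, with a hypothetical two-packet amplitude standing in for ψ(x): each simulated wave packet draws a fresh exponential white noise S(x), forms the bounded ratio R(x), and records one winner-take-all "scintillation". The histogram of hits reproduces the normalized |ψ(x)|².

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 1-D "wave function" on a screen grid: two overlapping
# Gaussian amplitudes (any psi works; only |psi|^2 matters here).
xs = np.linspace(-6.0, 6.0, 301)
psi = np.exp(-0.5 * (xs - 2.0) ** 2) + np.exp(-0.5 * (xs + 2.0) ** 2)
density = np.abs(psi) ** 2

# One "scintillation" per wave packet: draw a fresh exponential white
# noise S(x), form the bounded ratio R(x), and accept its maximizer.
n_packets = 20_000
hits = np.empty(n_packets, dtype=np.int64)
for i in range(n_packets):
    S = rng.exponential(size=xs.size)
    R = density / (density + S)          # bounded in (0, 1)
    hits[i] = np.argmax(R)               # winner-take-all location

freq = np.bincount(hits, minlength=xs.size) / n_packets
p = density / density.sum()
print(np.max(np.abs(freq - p)))          # small: hits follow |psi(x)|^2
```

Maximizing R(x) is the same event as minimizing S(x)/|ψ(x)|², so Theorem 2 applies unchanged.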
Physical collapse models
To address the physical implementation of the proposed method when sampling from the wave function density, we start by looking at existing physical collapse models of the reduction of the wave function. In most physical collapse models, the reduction of the wave function occurs randomly in space and at random times. The reduction is modeled as Gaussian “hits” at random locations xi. The wave function undergoes a local shrinking through the multiplication,

ψ(x) → ψ(x)·exp(−(x − xi)²/(2d²))
where d is a new constant of nature. And because the multiplication invalidates the unit norm of the wave function, the wave function undergoes a non-local normalization:

ψ(x) → ψ(x)/(∫Rn |ψ(x′)|² dx′)^(1/2)
These models have a circular-argument problem, since the locations xi are sampled using the probability density |ψ(x)|² obtained from the same wave function that is collapsing. These models do not attempt to explain the mysterious quantum sampling algorithm but merely use the same mechanism to describe it. The main criticism of these models is based on an economic principle: if nature knows how to sample from |ψ(x)|² anytime and anywhere, why waste samples in empty space, with all the added complications such as energy non-conservation and new constants of nature, instead of sampling only once at the photographic plate?
Physical implementation of winner-take-all mechanism
Consider again the ratio proposed previously, with the white noise S(x) defined earlier:

R(x) = |ψ(x)|²/(|ψ(x)|² + S(x))
As stated earlier, Theorem 2 guarantees that the random location that maximizes this random ratio will have the distribution |ψ(x)|². If we define ψ0(x) = ψ(x) at time step 0, the non-local winner-take-all operation can be replaced by the following recursive relation at time step i, given the ratio R(x) computed at time step 0:

ψi+1(x) = R(x)·ψi(x)
followed by the normalization step

ψi+1(x) → ψi+1(x)/(∫Rn |ψi+1(x′)|² dx′)^(1/2)
The sequence of wave functions ψi(x) converges to a Dirac delta function at the maximum pointer, whose location has distribution |ψ(x)|², which solves the measurement problem without the circular argument seen in the previous physical collapse models. However, we did not solve the non-locality issue, because we merely replaced the original non-local maximum operation with another non-local normalization. The new sampling proposal based on Theorems 1 and 2 is non-local by construction, and finding a local shortcut may not be possible. The issue of non-locality could be similar to the dilemma encountered every time we try to explain quantum physics using classical approaches.
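A discretized sketch of this recursion (illustrative, with an assumed oscillatory ψ): repeated multiplication by the fixed ratio R(x) followed by renormalization transfers probability mass toward the location that maximizes R(x); since R(x) is largest at the winner, the winner's normalized mass can only grow from step to step.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical oscillatory wave function on a grid.
xs = np.linspace(-6.0, 6.0, 301)
psi = np.exp(-0.25 * xs ** 2) * np.cos(2.0 * xs)
density = psi ** 2

S = rng.exponential(size=xs.size)        # white noise of Theorem 3
R = density / (density + S)              # bounded ratio, fixed at step 0
jmax = int(np.argmax(R))

# Recursive winner-take-all: repeated multiplication by R followed by
# renormalization; mass at the winning pointer is non-decreasing.
rho = density / density.sum()            # |psi_0|^2, normalized
mass_before = rho[jmax]
for _ in range(2000):
    rho = rho * R
    rho = rho / rho.sum()

print(rho[jmax] >= mass_before)          # prints True: concentration toward the winner
```

The iteration tracks |ψi(x)|² directly; squaring the amplitude recursion gives the same repeated multiplication by a fixed positive function followed by renormalization.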
Using some function of R(x) as a non-linear random potential in the Schrödinger equation to induce the reduction of the wave function could be an alternative idea for future studies, to see whether the non-local constraints of Theorems 1 and 2 can be relaxed.
S(x) being a white noise is another complication, since pure white noises, as stated previously, are ideal constructs that do not exist in nature. We could not find a way to make the algorithm work with colored noises, but that does not mean it cannot be done with new extensions of the ideas proposed in this study.
IV. Numerical Example for Sampling from General Distributions
In the following, we use Theorem 1 to sample from the function

f(θ) = I0·(sin(πa·sin θ/λ)/(πa·sin θ/λ))²·(sin(Nπd·sin θ/λ)/sin(πd·sin θ/λ))²

The function is derived from the interference pattern of N slits illuminated at wavelength λ. We set a = λ as the width of each slit and d = 4λ as the distance between the slits.
The function is defined for the angle θ in the range −π/2 < θ < π/2.
The sharpness of the distribution is controlled by changing the number of slits N. I0 is a normalization constant. f(θ) is evaluated using a small θ step size (0.0001).
Instead of using all the samples of θ as stated in Theorem 1 and constructing a histogram of the locations that minimize the ratio U(θ)/f(θ), we tried to find the minimum ratio using only 2, 4, 8, and 16 random samples in the range −π/2 < θ < π/2. The quality of the obtained histogram is compared against the asymptotic condition where one sample is picked out of all samples in the range.
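The finite-subset experiment can be reproduced in outline with the following sketch (a coarser θ grid and sample counts chosen for speed, not to match the figures): as the number of tentative samples per accepted sample grows from 2 to 16, the total-variation distance between the generated histogram and the target shrinks.

```python
import numpy as np

rng = np.random.default_rng(4)

# Assumed N-slit intensity with a = lambda and d = 4*lambda; I0 is omitted
# since Theorem 1 only needs the target up to a multiplicative constant.
N = 2
theta = np.arange(-np.pi / 2, np.pi / 2, 1e-3)
s = np.sin(theta)
envelope = np.sinc(s) ** 2                 # np.sinc(x) = sin(pi*x)/(pi*x); a = lambda
num = np.sin(4.0 * np.pi * N * s)
den = np.sin(4.0 * np.pi * s)              # d = 4*lambda
grating = np.where(np.abs(den) < 1e-12, float(N ** 2),
                   (num / np.where(den == 0.0, 1.0, den)) ** 2)
f = envelope * grating
p = f / f.sum()

def subset_sampling(n_tentative, n_draws):
    """Theorem 1 with a finite subset: draw n_tentative random locations,
    attach uniform noise, keep the location minimizing U/f."""
    idx = rng.integers(0, theta.size, size=(n_draws, n_tentative))
    u = rng.uniform(size=(n_draws, n_tentative))
    winners = np.argmin(u / f[idx], axis=1)
    return idx[np.arange(n_draws), winners]

def tv_distance(samples):
    freq = np.bincount(samples, minlength=theta.size) / samples.size
    return 0.5 * np.sum(np.abs(freq - p))

tv2 = tv_distance(subset_sampling(2, 100_000))
tv16 = tv_distance(subset_sampling(16, 100_000))
print(tv2, tv16)   # the approximation improves with more tentative samples
```

The `np.where` guards handle the removable singularities of the grating factor, whose limiting value at the zeros of the denominator is N².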
FIG. 1-3 show clearly that the generated distribution converges to the target distribution as the number of samples increases from 2 to 16 and approaches the asymptotic condition of Theorem 1 where all samples are used.
Figure 1: A soft distribution for N = 1. The generated distribution is a good approximation to the target distribution after only 8 samples are used.
Figure 2: A less soft distribution for N = 2. The generated distribution gets close to the target distribution after 16 samples are used. The number of samples needed to approximate the target distribution is larger than for the softer distribution with N = 1 seen in FIG. 1.
Figure 3: A sharp distribution for N = 4. The generated distribution is close to the target distribution only if we use a larger number of samples than the 16 used in FIG. 2. The number of samples required to approximate the target distribution is larger for distributions with sharp and isolated peaks.
Even with a limited number of rejected samples, the efficiency of the proposed sampling is not necessarily better than that of existing rejection sampling algorithms. However, the new sampling algorithm has an advantage because a sample is always returned after a fixed number of rejections. The quality of the distribution improves as the number of samples increases, since the new sampling algorithm is exact in the asymptotic limit where all samples are used as tentative rejected samples.
V. Conclusion

We proposed a new way of looking at algorithmic sampling from a target distribution. We presented numerical examples to show how the proposed method can be applied to practical problems. However, applying the proposed algorithm to sampling from difficult high-dimensional and sharp distributions is still a work in progress.
The proposed algorithm was applied to the measurement problem in quantum physics. We do not know whether the reduction of the wave function is a real physical process or not. And even if we assume that it is, we do not know whether the present study contributes to resolving the measurement problem. However, we did introduce a new classical sampling algorithm that tries to mimic quantum sampling. The proposed algorithm is powered by white noise and uses non-local information. The algorithm's physical limitations were highlighted. Despite these limitations, we were able to use the algorithm to address the circular-argument problem that plagues most collapse theories, where the random locations of the “hits” that cause the reduction of the wave function are samples drawn from the same collapsing wave function that the collapse algorithm is trying to explain.
We believe that this study could open new avenues for looking at a vexing problem that has been staring at us for so long: if the reduction of the wave function is real then how does nature know how to sample from the wave function density?
Appendix

Proof of Theorem 1
We start by analyzing discrete distributions before generalizing to continuous distributions. Assume we have a discrete binary distribution that takes the pointer value 1 with probability p1 and the pointer value 2 with probability p2, with p1 + p2 = 1. U1 and U2 are independent and identically uniformly distributed random variables between 0 and 1. A new random variable Y is set to value 1 if U1/p1 < U2/p2 and is set to value 2 otherwise. Using the independence of U1 and U2, a quick calculation shows that the probability that Y is equal to value 1 is:

P(Y = 1) = ∫[0,1] P(U2 > u·p2/p1) du = ∫[0,1] max(0, 1 − u·p2/p1) du   (1)
Equivalently, with the change of variables v = u/p1, we have,

P(Y = 1) = p1·∫[0,1/p1] max(0, 1 − v·p2) dv
The reader can verify by symmetry that the same integration yields the following probability for pointer 2:

P(Y = 2) = p2·∫[0,1/p2] max(0, 1 − v·p1) dv = 1 − P(Y = 1)
Now we are going to assume that the discrete distribution can take m possible values p1, …, pm, where the sum of the pj is equal to some unknown constant. We could set the constant to 1, but the probability normalization is not necessary for the following calculations to yield normalized pointer probabilities. The probability that the random variable Y is equal to a pointer k between 1 and m, where Uk/pk < Uj/pj for all j ≠ k, can be derived by generalizing (1) from the binary pointer case to m pointers:

P(Y = k) = ∫[0,1] Π(j≠k) max(0, 1 − u·pj/pk) du   (2)
Using the change of variables v = u/pk, we can write (2) as:

P(Y = k) = pk·gk, with gk = ∫[0,1/pk] Π(j≠k) max(0, 1 − v·pj) dv   (3)
We are going to show that for large m, the term gk is a constant that does not depend on pk. We use the Leibniz integral rule to compute the derivative of gk with respect to pk:

∂gk/∂pk = −(1/pk²)·Π(j≠k) max(0, 1 − pj/pk)   (4)
For large m, the product in (4) approaches the infinite product,

Π(x ∈ Rn) max(0, 1 − f(x)/f(xk))
Since f(x) is continuous and f(x) > 0 for all x ∈ Rn, we can always find x ∈ Rn such that f(x) ≥ f(xk). This means that there is always a term in the infinite product that is identically equal to zero. Thus, for large m, the derivative (4) is identically equal to 0 and gk is a constant that does not depend on pk. And since (2) is a normalized probability, we have for large m,

gk = 1/Σ(j=1..m) pj
And (3) can be written as,

P(Y = k) = pk/Σ(j=1..m) pj   (5)
The extension of (5) to a probability density, where the random variable Y takes the continuous pointer x ∈ Rn that minimizes U(x)/f(x) over all x ∈ Rn, is:

pY(x) = f(x)/∫Rn f(x′) dx′
This completes the proof.
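A quick Monte Carlo sketch (with assumed values p1 = 1/3 and p2 = 2/3) confirms the binary step (1), and also shows why the large-m limit is needed: the finite binary race with uniform noise does not return p1 exactly.

```python
import numpy as np

rng = np.random.default_rng(6)

# Binary uniform race: Y = 1 when U1/p1 < U2/p2.  Equation (1) gives
# P(Y = 1) = integral_0^1 max(0, 1 - u*p2/p1) du = p1/(2*p2) for p1 <= p2,
# which differs from p1: uniform noise is exact only in the continuum limit.
p1, p2 = 1.0 / 3.0, 2.0 / 3.0
n_trials = 200_000
u1 = rng.uniform(size=n_trials)
u2 = rng.uniform(size=n_trials)
win1 = np.mean(u1 / p1 < u2 / p2)
print(win1, p1 / (2.0 * p2))   # both near 0.25, not 1/3
```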
Proof of Theorem 2
As we did in proving Theorem 1, we start the proof by analyzing discrete distributions before generalizing to continuous distributions. Assume we have a discrete binary distribution that takes the pointer value 1 with probability p1 and the pointer value 2 with probability p2, with p1 + p2 not necessarily normalized to 1. X1 and X2 are independent and identically distributed positive random variables with exponential density a·e^(−a·s), with a > 0. A new random variable Y is set to value 1 if X1/p1 < X2/p2 and is set to value 2 otherwise. Using the independence of X1 and X2, a quick calculation shows that the probability that Y is equal to value 1 is:

P(Y = 1) = ∫[0,∞) P(X2 > s·p2/p1)·a·e^(−a·s) ds   (6)
Equivalently, we have,

P(Y = 1) = ∫[0,∞) a·e^(−a·s·(1 + p2/p1)) ds
The integration yields the normalized probabilities:

P(Y = 1) = p1/(p1 + p2), P(Y = 2) = p2/(p1 + p2)
Now we are going to assume that the discrete distribution can take m possible values p1, …, pm, where the sum of the pj is equal to some unknown constant. We could set the constant to 1, but the probability normalization is not necessary for the following calculations to yield normalized pointer probabilities. The probability that the random variable Y is equal to a pointer k between 1 and m, where Xk/pk < Xj/pj for all j ≠ k, can be derived by generalizing (6) from the binary pointer case to m pointers:

P(Y = k) = ∫[0,∞) (Π(j≠k) e^(−a·s·pj/pk))·a·e^(−a·s) ds   (7)
The development of the integrals yields the following normalized probability for any number of pointers,

P(Y = k) = pk/Σ(j=1..m) pj   (8)
The extension of (8) to a probability density, where the random variable Y takes the continuous pointer x ∈ Rn that minimizes X(x)/f(x) over all x ∈ Rn, is,

pY(x) = f(x)/∫Rn f(x′) dx′
This completes the proof.
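The closed form (8) can be checked with a quick Monte Carlo sketch of the exponential race (illustrative, with arbitrary unnormalized weights): the winner frequencies match pk/Σpj even for a small finite number of pointers.

```python
import numpy as np

rng = np.random.default_rng(5)

# Unnormalized pointer weights; (8) predicts win probabilities p_k / sum(p).
p = np.array([0.5, 1.0, 2.5, 4.0])
n_trials = 200_000

# Y = argmin_k X_k / p_k with X_k iid exponential (any rate a > 0 works).
x = rng.exponential(scale=1.0, size=(n_trials, p.size))
y = np.argmin(x / p, axis=1)

freq = np.bincount(y, minlength=p.size) / n_trials
print(freq, p / p.sum())   # the two vectors agree closely
```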
A fully rigorous proof would involve measure theory when going from discrete to continuous random variables and from discrete noise to a white noise defined everywhere. However, the application of both theorems to numerical sampling from various target distributions provides practical validation of both theorems.
References

1. CPSC 540: Machine Learning.
2. An Introduction to MCMC for Machine Learning.
3. Leggett AJ. Probing quantum mechanics towards the everyday world: where do we stand? Physica Scripta. 2002;2002(T102):69.
4. Leggett AJ. Testing the limits of quantum mechanics: motivation, state of play, prospects. J. Phys.: Condens. Matter. 2002;14(15):R415.
5. Lanczos C, Gellai B. Fourier analysis of random sequences. Comput. Math. Appl. 1975;1(3-4):269-76.
6. Peligrad M, Wu WB. Central limit theorem for Fourier transforms of stationary processes.
7. Ghirardi GC, Rimini A, Weber T. Unified dynamics for microscopic and macroscopic systems. Phys. Rev. D. 1986;34(2):470.
8. Bell JS. On the impossible pilot wave. In: Speakable and Unspeakable in Quantum Mechanics. Cambridge University Press; 1987. Chapter 17.