Dear Editor,

We would like to thank the referees for their very detailed comments, which we have tried to address in an updated manuscript. Like referee #3, we believe that, compared with currently used schemes with extremely high rates, we present a much simpler physical setup for generating random numbers based on quantum processes. We therefore ask you to consider our updated manuscript, in which we address the points raised by the referees as follows:

1. Use of min-entropy vs. Shannon entropy (point 1 of referee 3 and point 7 of referee 2):
For comparison, we added an expression for the min-entropy (new equation 5), which leads to an estimate of 13.4 bits per sample, compared with 14.1 bits of Shannon entropy per sample. While this reduces the entropy entering the randomness extractor by 0.7 bits, it is still much more than the 8 bits we extract per step, and, as suggested by referee 3, it does not require the i.i.d. assumption for consecutive samples that is necessary when using the Shannon entropy. We also edited figure 4 to make clear which maximal probability we use to evaluate the min-entropy.
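For a Gaussian distribution, the gap between Shannon entropy and min-entropy does not depend on its width: H - H_min = (1/2) log2(e) ≈ 0.72 bits, consistent with the 0.7 bits quoted above. The short pure-Python sketch below checks this for a zero-mean Gaussian quantized to integer ADC codes. The width sigma_lsb (in LSB) is a hypothetical parameter, not a value fitted to figure 4, and the approximation assumes sigma_lsb >> 1.

```python
import math

def gaussian_entropies(sigma_lsb, span=12.0):
    """Return (Shannon entropy, min-entropy) in bits of a zero-mean Gaussian
    with standard deviation sigma_lsb, quantized to integer ADC codes.
    Codes beyond +/- span*sigma are neglected (negligible probability)."""
    n = int(math.ceil(span * sigma_lsb))
    norm = 1.0 / (sigma_lsb * math.sqrt(2.0 * math.pi))
    # Probability of each code, approximated by the density at the code value
    probs = [norm * math.exp(-0.5 * (k / sigma_lsb) ** 2) for k in range(-n, n + 1)]
    total = sum(probs)
    probs = [p / total for p in probs]  # renormalize to a proper distribution
    h_shannon = -sum(p * math.log2(p) for p in probs if p > 0.0)
    h_min = -math.log2(max(probs))      # min-entropy: only the most likely code counts
    return h_shannon, h_min
```

For example, gaussian_entropies(1000.0) returns two values whose difference is 0.72 bits to within numerical precision; the min-entropy uses only the maximal probability, which is why it is the more conservative figure.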

2. Comparison to other work, especially the very recent, faster quantum random number generators based on phase randomization (referee 3, point 2):
We had already pointed this out in the introduction, but have now also added a reference to it in the summary, where we explicitly state that the advantage of our system is its hardware simplicity. The higher rates achieved in references 17, 23 and 25 rely strongly on faster ADCs, which could also be adopted in the scheme we present in this work. We have also included more recent references.

3. Reliance on beam splitter accuracy (referee 2, points 1, 4 and 13):
First, we added a quantitative statement on the level of balancing we achieve by rotating the laser diode (its output is linearly polarized, so its orientation provides an easily accessible balancing parameter): the resulting common-mode rejection, derived from the residual DC component at the summation node, amounts to 50 dB. We characterized the classical intensity fluctuations of the laser (blue trace in figure 2) and found them never to exceed the shot-noise level by more than 10 dB. Thus, the residual classical noise, whether from etalon effects as suggested by referee 2 or from electrical noise pickup in the laser diode itself (we point out the usual peak around 100 MHz from FM radio stations), is much smaller than the electronic noise. Furthermore, polarization drifts of laser diodes are very small due to their construction (we certainly did not observe any during the experiment), so we believe that this initial alignment, together with the ability to monitor any imbalance via the DC component, makes this much less of a problem than comments 1 and 4 of referee 2 perhaps suggest.
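The argument in point 3 is simple dB arithmetic. Assuming that both the 10 dB excess of classical noise and the 50 dB common-mode rejection are power ratios, the residual classical noise after balanced subtraction lies about 40 dB, i.e. a factor of 10^-4, below the shot-noise level. A minimal sketch with hypothetical helper names:

```python
def db_to_power_ratio(db):
    """Convert a level in dB to a linear power ratio."""
    return 10.0 ** (db / 10.0)

def residual_classical_fraction(excess_classical_db, cmrr_db):
    """Residual classical noise power after balanced detection, as a fraction
    of the shot-noise power (assumes both inputs are power ratios in dB)."""
    return db_to_power_ratio(excess_classical_db - cmrr_db)
```

With the numbers from point 3, residual_classical_fraction(10.0, 50.0) evaluates to 1e-4.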

4. Simplicity of our setup compared with phase-diffusion RNGs (point 2 of referee 2):
We respect the referee's view that the physical mechanism of phase-diffusion RNGs is simpler, but would like to leave this assessment to the reader. Our standpoint is that the only relevant optical component we require is a single polarizing beam splitter, which, together with the orientation of the laser diode, also serves as an alignment tool to balance the two photoreceivers. Specifically, we need no fiber launching optics, no high-speed pulsed laser driver (with the associated electronic characterization of chirp effects and the like), and none of the "few" fiber components. Hence we maintain that our setup offers great overall system simplicity compared with other schemes.

5. Concern that vacuum does not enter the empty port (point 3 of referee 2):
We are not sure why the referee believes that a lot of light enters the dark port of the beam splitter (more accurately, the mode complementary to the local oscillator mode). We place a light block near this port, which we believe can be considered part of a "safe perimeter" that we should be allowed to assume if we were to follow an adversarial line of argument, as in quantum key distribution. Perhaps the referee has in mind residual local oscillator light scattered from the inactive surfaces of the beam splitter cube; however, direct reflections certainly do not reach the photodetectors, and what little scattered light does reach them arrives in a spatial mode with very little overlap with the local oscillator mode. We measured the noise spectrum for several local oscillator power levels and find the dependence described by Eq. (1), but believe that a detailed discussion would detract from the main line of argument of this paper. We therefore maintain that what we measure are, to a sufficiently large extent, the vacuum fluctuations of the electromagnetic field. By far the dominant other noise source is the electronic noise of the amplifier chain.

6. Clarification of the transimpedance amplifier (referee 2, point 5):
We now give a more detailed description of the transimpedance amplifier (AD8015, followed by two MAR6 wideband gain blocks). The gain profile of the chain closely follows the two noise traces in figure 2 with light on the photodiodes, so we did not show an additional, similar trace for the gain profile. The 20 MHz figure approximately refers to a cutoff caused by the transformer and by another high-pass due to AC-coupling components in the signal path, and we clearly state the bandwidth range of 20-120 MHz in the caption of figure 1. With the amplifier models now explicitly specified, it becomes clearer that the bandwidth, as for any practical transimpedance amplifier, is of course not infinite. The problem with an independent characterization of the amplifier gain is its strong dependence on the way current is injected at frequencies of 10-200 MHz. The uncertainty of the effective transimpedance (540 kOhm ± 118 kOhm) appears rather large (about 20%), but it is mostly determined by the relatively large spread of individual component properties. A more careful direct measurement would require replacing the photoreceivers with well-defined current injection circuitry; we did this, but there the uncertainty about additional parasitic capacitances probably leads to an even larger uncertainty. We find that the theoretical shot-noise limit falls within 1.5 dB (about 40%) of what we observe, which is compatible with the uncertainty in the effective transimpedance, since the detected noise power scales with the square of the transimpedance. We believe that the additional technical details will convince the reader that nothing unusual is going on with this optical receiver, as the referee's comment perhaps insinuates, but we do not want to go overboard with a detailed characterization, in the interest of keeping the main message of our manuscript clear.
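Because the detected noise power scales with the square of the transimpedance, a relative transimpedance uncertainty δ maps onto a power factor of (1 ± δ)^2. The hypothetical helpers below, which are not part of the analysis in the manuscript, show that δ = 0.2 corresponds to roughly +1.6 dB / -1.9 dB, while 1.5 dB corresponds to a power ratio of about 1.41, i.e. the "about 40%" quoted in point 6.

```python
import math

def db_to_power_ratio(db):
    """Convert a level in dB to a linear power ratio."""
    return 10.0 ** (db / 10.0)

def noise_power_db_spread(rel_transimpedance_error):
    """Return (low, high) change in dB of the predicted noise power for a
    relative transimpedance error, using P proportional to R_t**2."""
    low = 10.0 * math.log10((1.0 - rel_transimpedance_error) ** 2)
    high = 10.0 * math.log10((1.0 + rel_transimpedance_error) ** 2)
    return low, high
```

The spread is asymmetric in dB because of the logarithm, so a symmetric ±20% transimpedance uncertainty brackets the observed 1.5 dB deviation on either side.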

7. Residual correlations in the signal (referee 2, points 6 and 17):
A noise spectrum is in fact connected to the correlation function by the Wiener-Khinchin theorem, which states that the autocorrelation function is given by the Fourier transform of the power spectral density. We now cite this in the revised manuscript to make the connection explicit. For the noise spectrum we record (ranging roughly from 20 to 100 MHz), this results in a characteristic correlation time t_c on the order of 10 ns. As correctly pointed out by referee 2, for any finite-bandwidth system the correlations do not vanish even for t >> t_c. To illustrate this, we have extended the correlation measurements presented in figure 3 to a much larger data set (by a factor of 1000), which makes these residual correlations visible even on longer time scales; the largest correlation, at short time differences, is around 10^-3. We point this out explicitly and state that this residual correlation is much smaller than those observed in previously reported experiments (new references 23, 36, 37). This motivates our assumption in the entropy estimation that adjacent samples are, to a good approximation, independent. We also attempted a direct evaluation of the conditional Shannon entropy from the data, but reached the limits of our processing capabilities.
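As a hedged illustration of the Wiener-Khinchin connection in point 7, the following sketch (assuming NumPy) estimates a normalized autocorrelation from the periodogram of a sample record, i.e. as the inverse FFT of the power spectrum, zero-padded to twice the record length to avoid circular wrap-around. It is not necessarily how the correlations in figure 3 were computed, and the synthetic test signals are hypothetical.

```python
import numpy as np

def autocorrelation_via_psd(samples, max_lag):
    """Normalized autocorrelation r[k]/r[0] for k = 0..max_lag, obtained
    from the power spectrum |X(f)|^2 (Wiener-Khinchin relation)."""
    x = np.asarray(samples, dtype=float)
    x = x - x.mean()                      # remove the DC component
    n = x.size
    spectrum = np.fft.rfft(x, 2 * n)      # zero-padded to avoid wrap-around
    power = np.abs(spectrum) ** 2         # periodogram (unnormalized PSD)
    acf = np.fft.irfft(power, 2 * n)[: max_lag + 1]
    return acf / acf[0]

# Hypothetical test signals: white Gaussian noise, and a 4-sample moving
# average of it as a crude stand-in for band-limited noise.
rng = np.random.default_rng(1)
white = rng.standard_normal(1 << 16)
smooth = np.convolve(white, np.ones(4) / 4, mode="valid")
```

For the moving average, the expected correlation is (4 - k)/4 for lags k < 4 and zero beyond, so the spectral width directly sets the correlation length. A physical band-pass filter has an impulse response without a hard cutoff, so its correlations decay but never vanish exactly, in line with the remark of referee 2.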

8. Digitizer noise (referee 2, point 8):
The noise specification of the digitizer we use (AD9269-65 from Analog Devices) is relatively complex: on the one hand, the device data quote an equivalent input noise of 2.8 LSB; on the other hand, an overall nonlinearity of 2.2 to 6.5 LSB. Since the physical noise of the digitizer is of similar origin to the noise in the amplifier, we believe it is perfectly justified to treat it in the same way, i.e., to consider it entirely part of the electronic noise. The topic discussed in the suggested reference (our ref. 26) is a possible influence of an adversary on the noise properties of the digitizer. Should this be the case, the adversary would also have the liberty to fabricate the digitizer output entirely, which is probably a much larger, and very practical, concern compared with manipulated digitizer noise: most digitizers suitable for this purpose can generate, e.g., a ramp or a pseudorandom pattern to test the high-speed communication with downstream electronics. Therefore, this core element of the randomness generation process must be assumed to be trusted. A separate trust model for the noise of the ADC seems, at the very least, a bit superficial. The other concern is nonlinearity in the conversion process (or, for that matter, in the amplifier chain). We perhaps see an aspect of this in figure 4, in the slightly non-Gaussian distribution of the observed noise. The Gaussian model of the distribution is certainly not perfect (just as any other *practical* experimental system is subject to imperfections), but we expect its impact on the final result to be minimal. In our case, we believe, supported by the randomness tests, that the safety margin of the hashing in the randomness extraction is sufficiently large to take care of this.

9. Concerns about the randomness extraction process (referee 2, points 9 and 10):
The statement of the Santha/Vazirani paper holds for any randomness extraction or hashing process, or, in a quantum key distribution context, for any privacy amplification step, but it is not a reasonable argument against our mechanism for cleaning up the distribution.
Coming up with a quantitatively solid set of assumptions that make any practical randomness extractor meaningful is not trivial either. However, we feel that the referee's comment on the "seat-of-the-pants" character of our extractor is somewhat off the mark. We make it very clear that flattening is only one aspect, and that it is necessary to *reduce* the number of random bits we extract according to the injected entropy, just as in the privacy amplification step of every practical QKD protocol that needs to correct for errors. In fact, our extractor has the same structure as the hashing mechanisms used for privacy amplification, and can be viewed as a matrix method with a uniform and "sufficiently random" distribution of the (binary-valued) matrix entries.
While we do not yet have a complete proof that our mechanism is secure, we would like to point out that we at least outline why we believe it is, based on the work of Krawczyk (now our reference 42). There, he essentially shows that a matrix filled with the binary values generated by an LFSR sequence can be used as a "good" static hashing matrix, which may even be known to an adversary, similar to the static matrices of even lower complexity used, e.g., in Toeplitz hashing. The low linear complexity that plagues LFSR-generated sequences, and makes them unsuitable as pseudorandom number generators in most cases, does not pose a problem when populating the hashing matrix. In fact, a single output bit of our algorithm can be seen as the product of the bit string generated by our physical process with a single row of matrix entries generated by the LFSR sequence "stored" in the register of length 63. If we did not xor the new physical random numbers into the LFSR feedback line, but instead continued to use this uninterrupted sequence to form further rows of the hashing matrix, we should be fine according to the Krawczyk reference, as long as we keep the hash short enough. The point where we believe we cannot simply use Krawczyk's argument is that we replace the next entries of the continuous LFSR sequence with values xor-ed with the new physically generated random bits. The result is a sequence that is still as uniform as an LFSR sequence, but less deterministic, and certainly "more random" than a deterministic LFSR sequence, which is the typical requirement for the entries of a hashing matrix.
This line of reasoning is, of course, not a proof, only a motivation for using this method. This is where the randomness test suite comes in, especially those components that test for low linear complexity in a bit stream, i.e., that try to detect the predictability associated with LFSR sequences. These tests certainly detect the absence of a noisy signal at the input of our extractor, because they are designed for exactly this, and they also detect physical input with insufficient entropy. Again, this is not a proof of the quality of the extractor, but a strong indication that at least the typical problems of LFSR sequences are absent from its output. Needless to say, no algorithmic test can certify randomness, and one can question the validity of such tests on this ground in the first place, but they are certainly able to detect the typical flaws one would expect from bad extractors, and we do not see them. Like the original Krawczyk hashing method, our method has the big advantage of an extremely simple implementation. For many practical applications, being lean on technological resources is a serious advantage, which is one of the reasons we believe this work may also interest readers from other areas of Applied Physics Letters.
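The matrix view in point 9 can be checked on a toy model. A minimal sketch follows, assuming a 63-bit Fibonacci LFSR with the hypothetical recurrence a[n] = a[n-62] ^ a[n-63] (characteristic polynomial x^63 + x + 1), each physical bit xor-ed into the feedback, 16-bit samples fed MSB first, and one output bit emitted per two input bits (8 output bits per 16-bit sample). Only the register length and the 8 bits per step come from the letter; the taps, bit ordering and choice of output bit are assumptions, and this is not a transcription of Eqs. (5-7). For a zero initial state the map from input to output is linear over GF(2), so each output bit is the inner product of the input string with one row of a fixed binary matrix whose entries are LFSR-generated.

```python
MASK63 = (1 << 63) - 1  # 63-bit register (length from the text)

def samples_to_bits(samples, width=16):
    """Unpack ADC samples into bits, most significant bit first (assumed order)."""
    return [(s >> k) & 1 for s in samples for k in range(width - 1, -1, -1)]

def lfsr_extract(input_bits, state=0, ratio=2):
    """Hypothetical LFSR compressor: each input bit is xor-ed into the
    feedback; one output bit is emitted every `ratio` input bits.
    Returns (output_bits, final_state)."""
    out = []
    for i, b in enumerate(input_bits):
        # Assumed taps: stages 62 and 63, i.e. a[n] = a[n-62] ^ a[n-63]
        feedback = ((state >> 61) ^ (state >> 62)) & 1
        new_bit = feedback ^ (b & 1)              # inject the physical bit
        state = ((state << 1) | new_bit) & MASK63
        if (i + 1) % ratio == 0:
            out.append(new_bit)                   # hypothetical output tap
    return out, state
```

A usage example: lfsr_extract(samples_to_bits([0x1234, 0xBEEF])) yields 16 output bits from two 16-bit samples. With zero input and a nonzero initial state, the output reduces to a plain (decimated) LFSR sequence, which corresponds to the uninterrupted-sequence case discussed above.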

10. Use of "PRNG" (referee 2, point 11): 
We agree that using the acronym "PRNG" for physical random number generators is misleading, as it is the established abbreviation for pseudorandom number generators. However, the term "true" random number generator, popular as it may be, seems quite wrong to us unless someone comes up with a solid definition of "true" randomness. We would feel much more comfortable using "hardware random number generator" without introducing an extra acronym; this makes the distinction from algorithm-based generators while avoiding the pretension associated with statements about the "truth" of physical models.

11. Typos/extra words (referee 2, points 12 and 14):
The typos were corrected.

12. Equations (5-7) vs. (8-10) (referee 2, point 15):
Equations (5-7) are the simplest representation of the compression algorithm we implement: a single LFSR whose feedback is the xor of the input stream and the LFSR feedback taps. Equations (8-10) represent a parallelized version of these equations, which does exactly the same but performs 16 steps in a single clock cycle. We keep (8-10) because this is what we implemented, but we would also like to keep (5-7), because they represent the same process and are easier to analyze.
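Why a parallel form can do "exactly the same" as 16 serial steps follows from linearity over GF(2): after 16 clocks, every bit of the new state is the xor of a fixed subset of old state bits and input bits, which can be precomputed once. The sketch below demonstrates this equivalence under hypothetical taps (a[n] = a[n-62] ^ a[n-63]) and an assumed MSB-first input order; it does not reproduce the actual Eqs. (8-10).

```python
MASK63 = (1 << 63) - 1
STEPS = 16  # input bits consumed per clock cycle (from the text)

def serial_step(state, bit):
    """One LFSR clock with the input bit xor-ed into the feedback
    (hypothetical taps: a[n] = a[n-62] ^ a[n-63])."""
    feedback = ((state >> 61) ^ (state >> 62)) & 1
    return ((state << 1) | (feedback ^ bit)) & MASK63

def serial_block(state, word):
    """Apply STEPS serial clocks, feeding the 16-bit word MSB first."""
    for k in range(STEPS - 1, -1, -1):
        state = serial_step(state, (word >> k) & 1)
    return state

# Images of single state bits and single input bits after 16 clocks.
STATE_IMAGES = [serial_block(1 << j, 0) for j in range(63)]
INPUT_IMAGES = [serial_block(0, 1 << k) for k in range(STEPS)]

def parallel_block(state, word):
    """All 16 steps at once: xor the precomputed images of the set bits."""
    acc = 0
    for j in range(63):
        if (state >> j) & 1:
            acc ^= STATE_IMAGES[j]
    for k in range(STEPS):
        if (word >> k) & 1:
            acc ^= INPUT_IMAGES[k]
    return acc
```

In hardware, each output bit of parallel_block corresponds to a fixed xor tree, which is why the parallel form fits into a single clock cycle while the serial form remains easier to analyze.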

13. Origin of equation (1) (referee 2, point 16):
Equation (1) gives the total shot-noise power in the full bandwidth B. In the heterodyne-oriented language of referee 2, this probably corresponds to the double-sided power. We measured it with a conventional spectrum analyzer at a fixed resolution bandwidth of B = 1 kHz, as stated. We added a reference to the early work of Walter Schottky on the origin of Eq. (1).
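Assuming Eq. (1) has the standard Schottky form, the mean-square shot-noise current in a bandwidth B is <i^2> = 2 e I B, where for balanced detection I is the total photocurrent of both photodiodes. Its linear dependence on I, and hence on the local oscillator power, means that doubling the power raises the noise power by 10 log10(2) ≈ 3 dB. A minimal sketch; the single flat-gain transimpedance stage is a hypothetical simplification of the actual amplifier chain.

```python
import math

E_CHARGE = 1.602176634e-19  # elementary charge in coulomb (exact SI value)

def shot_noise_current_sq(photocurrent_a, bandwidth_hz):
    """Mean-square shot-noise current <i^2> = 2 e I B (Schottky formula)."""
    return 2.0 * E_CHARGE * photocurrent_a * bandwidth_hz

def shot_noise_vrms(photocurrent_a, bandwidth_hz, transimpedance_ohm):
    """RMS noise voltage after an ideal transimpedance stage R_t
    (hypothetical simplification: flat gain, no further stages)."""
    return math.sqrt(shot_noise_current_sq(photocurrent_a, bandwidth_hz)) * transimpedance_ohm
```

For instance, comparing shot_noise_vrms(100e-6, 1e3, 540e3) and shot_noise_vrms(200e-6, 1e3, 540e3) gives a power ratio of exactly 2; the photocurrents here are illustrative values, not measured ones.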


To make space for the min-entropy discussion, the discussion of the residual correlations, and an extended comparison with other QRNG schemes, we shortened the discussion of the total random bit rate achievable with this system.

With this, we hope to have addressed the concerns of the referees, and look forward to your reply.

With best regards, on behalf of all authors,

Christian Kurtsiefer