REFEREE REPORT(S):

Referee: 1

COMMENTS TO THE AUTHOR(S)
The authors have made a significant revision to this work, and have in particular completely rewritten the introduction. In my opinion, the new version is much clearer, adequately addresses my concerns from the previous version, and can be accepted. My feeling is that the novelty and interest of the approach taken outweigh any potential remaining technical issues.

I do feel that the paper would still benefit from a little more explanation relating to point 3 in my original report, where my concern was that the authors did not seem to show that the NCD measure is a good approximation of the NID. I believe the claim in the authors' reply that performing closely enough to the Shannon limit should (via eqn (7)) imply this; however, I think that a few more words about this at the start of Sec 3 would be helpful.

A couple of new typos should also be fixed: in the 2nd-to-last sentence in the abstract ("photons polarisation-entangled") and in Sec 2, para 3 (starts with truncated text "between them").

Referee: 2

COMMENTS TO THE AUTHOR(S)
With the authors' revisions and responses to my comments, I now have a better understanding of the paper. The paper does develop an interesting, novel Bell inequality, and successfully tests it experimentally. However, the paper continues to have a serious problem on the theoretical side. The motivation in the first part of section 2, including Figure 1, is confused and basically irrelevant to the eventual inequalities that are important (eqs. (6) and (9)). The overall effect of this disconnect is to make the theoretical developments feel like a rushed attempt to justify publishing a CHSH-type experiment that might not be especially noteworthy otherwise. This is unfortunate, because there really are some interesting theoretical developments here. With a complete re-write of the first half of section 2, this paper could be publishable.
Successfully executing this re-write may require some new ideas for how to motivate the reader that eqs. (6) and (9) are worth testing, and it would probably also require changes to other parts of the paper to make the paper cohere with these new ideas. In my first comment, I outline why I feel this way. The subsequent comments are also important, but have the potential to be addressed by changing a few sentences locally.

C1) All Bell inequalities specify the number of parties, the number of settings, and the number of outcomes. For instance, the standard CHSH inequality is a (2,2,2) Bell inequality: the 2 parties are Alice and Bob, the 2 settings are usually labelled a and a' for Alice and b and b' for Bob, and the 2 outcomes could be "detected in channel 1" and "detected in channel 2."

First, let's look at the key inequality of the paper, expression (9). This is in fact a (2,2,2^N) inequality: two parties (Alice and Bob), two settings (a0 and a1 for Alice, b0 and b1 for Bob), and 2^N outcomes. There are 2^N outcomes because each of the four quantities x_a0, x_a1, y_b0, y_b1 is a bit string of length N. The natural way to test such a Bell inequality would be to randomly choose settings at Alice and Bob, record N outcomes, then randomly switch the settings, record N more outcomes, repeat, etc. Let's call this a fixed-setting, string-output (FSSO) experiment. Modulo some natural assumptions (fair sampling, etc.), this turns out to be reasonably close to the actual experimental procedure that was implemented by the authors to test the inequality.

Now, let's consider Figure 1. This indicates an experiment where the settings are toggled before each detection window. This could be a natural experiment for testing a (2,2,2) Bell inequality. Alternatively, the figure also suggests considering a whole string of inputs that yields a whole string of outputs. This would require a (2,2^N,2^N) Bell inequality.
But there is no natural way to shoehorn the experiment of Figure 1 into a test of inequality (9). Trying to do this would require you to take the input string (which, if we look only at the subscripts of the a's, is something like 000010011101111), divide it into its constituent 0s and 1s, and analyse the output string that corresponds to the 0s separately from the output string that corresponds to the 1s. You would never do this with the output of a Turing machine; for instance, in general the output from the input string 00001 could be something completely different from the output from the input string 10000. The authors' uniform complexity assumption (p. 6, first paragraph) would require the output (on the 0s) to have the same Kolmogorov complexity in both cases.

If we assume that the output bits in Figure 1 are, conditioned on the settings, i.i.d. (i.e., locally random uncorrelated noise with a possible bias towards 0 or 1), then on average the Kolmogorov complexity reduces to the Shannon entropy, and applying the authors' method to such an experiment will indeed result in a violation for appropriately chosen i.i.d. quantum states. But now the Turing machine model in Figure 1 is completely uninteresting, because the Turing machine can only output noise, and furthermore the i.i.d. assumption means that the Braunstein and Caves inequality is applicable to the scenario – and a much better fit.

The reason the authors can claim that they "relax the i.i.d. assumption" is that in an FSSO experiment you would have to apply the i.i.d. assumption to each individual bit in order to apply the Braunstein and Caves inequality, whereas equation (9) requires no such assumption. One still needs to assume that there is some uniformity from string to string (hence the uniform complexity assumption), but the claim that the i.i.d. assumption, at least, can be relaxed is justified.
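For reference, the i.i.d. reduction invoked above is the standard textbook relation between expected Kolmogorov complexity and Shannon entropy (stated here informally; this is background material, not a result of the paper under review):

```latex
% For an i.i.d. source X_1, X_2, ... with per-symbol entropy H(X),
% the expected Kolmogorov complexity per symbol converges to H(X):
\lim_{N\to\infty} \frac{1}{N}\,\mathbb{E}\big[K(X_1 X_2 \cdots X_N)\big] = H(X)
```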
This is why I believe that much of section 2, from the beginning to just before equation (2), needs to be eliminated. This includes the removal of Figure 1, though equation (2) is relevant and needs to be kept. With the space this frees up, I suggest the authors spend some time thinking more about why counterfactual definiteness/uniform complexity is a natural assumption for an FSSO-style experiment, and write up what they come up with. In my opinion, an FSSO experiment is the only type of experiment for which inequality (9) is relevant. Note that many important experiments have been FSSO experiments; see Giustina et al 2013, Nature 497, 227, and Christensen et al 2013, PRL 111, 130406. As I mentioned in my first round of comments, the only other analysis I am aware of that analyzes FSSO experiments in a natural manner is Knill et al. 2015, PRA 91 032105. I recognize that this requires a major revision of the paper. If the authors believe that the argument in this comment is wrong, I suggest that they look to a different referee to justify publication of the manuscript.

C2) Page 2, line 26: what is "labelling," and why do we need to worry about it in some Bell inequalities but not others? When I hear "labelling," this sounds to me like the process of assigning different names to different possible experimental outcomes, and clearly any Bell inequality must keep track of different possible experimental outcomes with labels of this sort. So I'm missing something.

C3) Page 2, line 45: The discussion of Shannon's source coding theorem is not correct. The theorem does not refer to "infinite data strings," but to properties of finite bit strings of length N in the asymptotic limit of large N.
Furthermore, the reason that "one faces a problem finding a suitable algorithm…" is not so much that the theorem only holds in the asymptotic limit, but rather that Shannon's source coding theorem is non-constructive: it proves the existence of good coding schemes, but does not indicate how to find them (short of an exhaustive search of an exponentially large set of possibilities).

C4) Page 5, lines 40 and 60, and again page 12, line 41: Repeated citations of [12], a paper that was recently published in Phys. Rev. A, are unnecessary and distracting. In particular, the claim that all Bell arguments use "counterfactual definiteness" – with the sole exception of the argument of [12] – is likely overblown, and even if it is true it is not especially relevant to the current paper. However, this claim is echoed each time [12] is cited. If the authors want to acknowledge [12], which does seem to have some relevance due to the use of Kolmogorov complexity (and not the claims about counterfactual definiteness), a single citation would be sufficient.

C5) Page 6, top: The formulation of the "uniform complexity" assumption represents a nice improvement over the previous version of this paper. Any additional thoughtful discussion of why this is a natural assumption could only help the paper (see also comment C1).

C6) On page 6, line 15, it is asserted that uniform complexity can in principle be tested experimentally. In fact, some of the authors' data may address this: on page 12, line 12, an uncertainty on S is reported after repeating the experiment 10 times. This means that there are 10 values of NCD(x0,y0), 10 of NCD(x0,y1), 10 of NCD(x1,y0), and 10 of NCD(x1,y1). If uniform complexity holds, and C() is a good estimate of K(), then these 10 values should be somewhat uniform for each choice of NCD(xi,yj). It would be nice if the authors could report to what degree they see uniformity in each instance of NCD(xi,yj) over the 10 runs.

C7) I still don't understand Section 2.2.1.
The first sentence asks us to consider all possible N-bit strings and the average Kolmogorov complexity thereof. Why is that relevant to a Bell experiment, where some bit strings may be more likely than others? I suspect this section is trying to reduce Kolmogorov complexity to Shannon entropy by assuming an i.i.d. scenario, but I don't follow.

C8) On page 6, line 37, it is asserted that 8.6 degrees is the optimal angle. It seems as though this was determined numerically by an analysis of the curve in figure 2 – is this correct? Alternatively, if this figure comes from a calculation in another paper, it should be cited.

C9) Page 11, line 19: putting the basis in a fixed setting for multiple consecutive events can be considered a feature of an experiment testing (9), as opposed to a bug.

The following comments are minor and/or optional.

M1) In the abstract, perhaps "polarization-entangled photons" instead of "photons polarization-entangled."

M2) Page 2, line 48: I prefer "…prove that a given compression algorithm is the best."

M3) Page 3, line 12: citing [6] is OK if the authors want to keep it, but I think it is also OK to drop this citation; Gill's work is about closing the memory loophole, which is very different from what is going on in the present paper.

M4) Page 11, line 15: it could be better to be more explicit about what fair sampling means, e.g. "we assume that the set of photons that result in simultaneous detection events at both ends of the experiment represents a fair sample of all photons."

M5) Page 11, line 38: I feel a concatenation rather than an interleaving may have been more natural here, if we want to think of our outputs as coming in the form of a sequential output from a Turing machine. However, consideration of this alternative viewpoint is optional. I suppose it shouldn't make a difference if the compression software is good enough.
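As an aside, the uniformity check suggested in comment C6 is straightforward to implement. The sketch below is purely illustrative: it assumes zlib as the compressor C() (any real compressor would do), and the function and variable names are hypothetical, not taken from the authors' code.

```python
import statistics
import zlib


def C(s: bytes) -> int:
    """Compressed length of s: a computable stand-in for Kolmogorov complexity K."""
    return len(zlib.compress(s, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance, the computable proxy for the NID."""
    cx, cy = C(x), C(y)
    return (C(x + y) - min(cx, cy)) / max(cx, cy)


def uniformity_report(runs):
    """runs maps a setting pair, e.g. 'x0y0', to a list of (x, y) bit-string
    pairs, one per experimental run.  Returns (mean, spread) of NCD per setting."""
    report = {}
    for setting, pairs in runs.items():
        vals = [ncd(x, y) for x, y in pairs]
        report[setting] = (statistics.mean(vals), statistics.pstdev(vals))
    return report
```

If uniform complexity holds and C() tracks K() well, the spread reported for each of the four setting pairs should be small relative to its mean.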
M6) Page 12, line 34: I don't really know what to make of the sentence "We do not resort to statistical methods that are alien to the concept of computation." This notion could be expanded upon, or the sentence could just be deleted.

COMMENTS FROM EDITORIAL BOARD:

Associate Editor
Comments to the Author:
The second referee maintains that there are several substantial points that must be addressed before this article can be published. Please consider carefully all of these points and revise your manuscript accordingly. Please also consider the suggestions of the first referee when revising your manuscript.