To clarify, which part of my comment do you think is "almost certainly untrue"? What I describe above is how their approach works. It's summarised in the paper; see Section 4 ("Approach"):

1. Pre-train a transformer-based language model on GitHub code with standard language modelling objectives. This model can reasonably represent the space of human coding, which greatly reduces the problem search space.

2. Fine-tune the model on our dataset of competitive programming data, using GOLD (Pang and He, 2020) with tempering (Dabre and Fujita, 2020) as the training objective. This further reduces the search space, and compensates for the small amount of competitive programming data by leveraging pre-training.

3. Generate a very large number of samples from our models for each problem.

4. Filter the samples to obtain a small set of candidate submissions (at most 10), to be evaluated on the hidden test cases, by using the example tests and clustering to pick samples based on program behaviour.
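To make step 4 concrete, here is a toy sketch of the filter-then-cluster idea. It is my own illustration, not DeepMind's code: the candidates are plain Python callables standing in for generated programs, and `filter_and_cluster`, `probe_inputs` and `max_submissions` are hypothetical names I made up for the example. The probe inputs are simply assumed to be given.

```python
from collections import defaultdict

CRASHED = "<crashed>"  # marker for a candidate that raised an exception


def run(program, x):
    """Run one candidate on one input, treating any exception as a crash."""
    try:
        return program(x)
    except Exception:
        return CRASHED


def filter_and_cluster(samples, example_tests, probe_inputs, max_submissions=10):
    # Filter: keep only candidates that pass every example test.
    passing = [
        s for s in samples
        if all(run(s, x) == expected for x, expected in example_tests)
    ]

    # Cluster: candidates with identical outputs on the probe inputs
    # are treated as behaviourally equivalent.
    clusters = defaultdict(list)
    for s in passing:
        signature = tuple(repr(run(s, x)) for x in probe_inputs)
        clusters[signature].append(s)

    # Pick one representative per cluster, largest clusters first.
    ranked = sorted(clusters.values(), key=len, reverse=True)
    return [cluster[0] for cluster in ranked[:max_submissions]]


# Toy problem: "double the input", with a single example test.
samples = [
    lambda x: x * 2,   # correct
    lambda x: x + x,   # correct, same behaviour as the first
    lambda x: x ** 2,  # passes the example test (2 -> 4) but is wrong
    lambda x: x + 1,   # fails the example test
]
picked = filter_and_cluster(samples, example_tests=[(2, 4)], probe_inputs=[0, 1, 3])
```

In this toy run the two correct candidates fall into one cluster and the squaring candidate into another, so two submissions survive out of four samples. The example tests discard most of the wrong samples, and the clustering stops behaviourally identical ones from using up several of the 10 submission slots.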

>> Since you're claiming this feat by Alphacode is comparable in difficulty to writing bubblesort (which you could write in 5 minutes), it shouldn't take you a lot of effort to produce something comparable.

What I meant was that the way they announced AlphaCode is like claiming that bubblesort is a novel approach to sorting lists, not that the effort needed to create their system is comparable to writing bubblesort. I think if you read my comment again more carefully you will find that this is the first interpretation that comes to mind. Otherwise, I apologise if my comment was unclear.