Show HN: Neural-hash-collider – Find target hash collisions for NeuralHash (github.com/anishathalye)
623 points by anishathalye on Aug 19, 2021 | hide | past | favorite | 351 comments


The README (https://github.com/anishathalye/neural-hash-collider#how-it-...) explains in a bit more detail how these adversarial attacks work. This code pretty much implements a standard adversarial attack against NeuralHash. One slightly interesting part was replacing the thresholding with a differentiable approximation. I figured I'd share this here in case anyone is interested in seeing what the code to generate adversarial examples looks like; I don't think anyone in the big thread on the topic (https://github.com/AsuharietYgvar/AppleNeuralHash2ONNX/issue...) has shared attack code for NeuralHash in particular yet.
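To illustrate the idea of replacing the thresholding with a differentiable approximation (this is a toy sketch, not the actual collider code): below, a random linear map stands in for the real network, and each bit's hard threshold is relaxed to a logistic loss so gradient descent can push the hash toward a chosen target. All dimensions and names are invented for the example.

```python
import math
import random

random.seed(0)

# Toy stand-in for a perceptual hash: a fixed random linear map whose
# outputs are thresholded at zero to produce hash bits. (NeuralHash uses
# a deep network; this only illustrates the attack structure.)
DIM, BITS = 16, 8
W = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(BITS)]

def logits(x):
    return [sum(w[i] * x[i] for i in range(DIM)) for w in W]

def hard_hash(x):
    return [1 if v >= 0 else 0 for v in logits(x)]

def sigmoid(v):
    v = max(min(v, 30.0), -30.0)  # clamp to avoid overflow
    return 1.0 / (1.0 + math.exp(-v))

def attack(x, target, steps=20000, lr=0.01):
    """Gradient descent on a differentiable surrogate of the hash.

    Each bit contributes a logistic loss log(1 + exp(-s*v)), where s is
    the desired sign (+1/-1) and v the pre-threshold logit; the gradient
    of that loss with respect to v is -s * sigmoid(-s*v).
    """
    x = list(x)
    for _ in range(steps):
        ls = logits(x)
        if [1 if v >= 0 else 0 for v in ls] == target:
            break  # the hard (thresholded) hash already matches
        for j, (v, t) in enumerate(zip(ls, target)):
            s = 2 * t - 1
            g = -s * sigmoid(-s * v)
            for i in range(DIM):
                x[i] -= lr * g * W[j][i]
    return x

start = [0.01 * random.gauss(0, 1) for _ in range(DIM)]
target = [1, 0, 1, 1, 0, 0, 1, 0]
adv = attack(start, target)
print(hard_hash(adv) == target)
```

The real attack additionally penalizes the distance from the original image so the adversarial result still looks like the starting picture; that term is omitted here for brevity.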


Nicely done. Thank you for sharing.

I'd like to share the following paper for anyone else who may be interested. It is about watermarking rather than a preimage attack.

"Adversarial Embedding: A robust and elusive Steganography and Watermarking technique" https://arxiv.org/abs/1912.01487

Unfortunately, the existence of invisible watermarking demonstrates a separate attack on the hash. Instead of a preimage attack, this might be able to change the hash of an image that is suspected of already being a match. A true-positive would be changed into a false-negative.


Well done! Here is my version that uses SciPy's L-BFGS-B optimizer: https://gist.github.com/unrealwill/d64d653a7626b825ef332aa3b...


Ongoing related threads:

Apple defends anti-child abuse imagery tech after claims of ‘hash collisions’ - https://news.ycombinator.com/item?id=28225706 - Aug 2021 (401 comments)

Hash collision in Apple NeuralHash model - https://news.ycombinator.com/item?id=28219068 - Aug 2021 (662 comments)

Convert Apple NeuralHash model for CSAM Detection to ONNX - https://news.ycombinator.com/item?id=28218391 - Aug 2021 (177 comments)

(I just mean related to this particular project. To list the threads related to larger topic would be...too much.)


The integrity of this entire system now relies on the security of the CSAM hash database, which has just dramatically increased in value to potential attackers.

All it would take now is for one CSAM hash to become publicly known; then someone uploads colliding iPhone wallpapers to wallpaper download sites. That many false positives would overload whatever administrative capacity there is to review reports within a matter of days.


There's no need for someone to get the entire CSAM database. If they go on the darknet and find enough images (or hashes) that would trip Apple's system, that would be enough. I'd assume any publicly available image on the darknet would likely also be in the CSAM database.


Exactly. It would be trivial for anyone to compile a list of possible known CSAM hashes. It would be illegal to do so, but only one person has to do it, and then the list of probable-positive hashes can be distributed legally around the web.


A human doesn't need to see the material. One can automate this by crawling websites with known bad material and computing the hash of every image. Surely some of those hashes are of CSAM; we don't know which, but the list is still just as dangerous.


No, there’s another private hash function that also has to match the known CSAM image for an image to be considered a match.

That one can’t be figured out through this technique.


But surely if they're doing this in anticipation of E2EE user data, that procedure becomes moot?

So either they have no intention to actually protect user data, or the system is trivially broken; either way a pretty damning look for Apple.


No, because the second hash is only performed after the first hash has already matched, and so, using the threshold secret sharing crypto, the server has learned the escrowed encryption key for the image. (Well, for its “visual derivative”, at any rate.)


So, Apple retains the ability to decrypt E2EE data? That’s… worse?


Only E2EE data that matches known CSAM hashes, and only when 30 matches have been found, and only the “visual derivative” of those images.
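The "only when 30 matches have been found" behavior comes from the threshold secret-sharing layer. A minimal sketch using textbook Shamir secret sharing over a prime field (Apple's actual construction is a more elaborate private-set-intersection protocol; the prime, threshold, and key below are made up for illustration):

```python
import random

P = 2**61 - 1  # a Mersenne prime; the field for this toy scheme
random.seed(1)

def make_shares(secret, threshold, n):
    # Random polynomial of degree threshold-1 with constant term = secret;
    # each "matching photo" would carry one share in its safety voucher.
    coeffs = [secret] + [random.randrange(P) for _ in range(threshold - 1)]
    def f(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    # Lagrange interpolation evaluated at x = 0 recovers the constant term.
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

key = 123456789
shares = make_shares(key, threshold=30, n=100)
print(reconstruct(shares[:30]) == key)  # 30 shares suffice
print(reconstruct(shares[:29]) == key)  # one short of the threshold: no luck
```

With fewer than 30 shares, interpolation yields an unrelated value, so the server learns nothing about the key until the threshold is crossed.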


They couldn't possibly be using the "other perceptual hash algorithm commonly used for this stuff," PhotoDNA, could they?

I mean, hopefully not, but at this point, it's reasonable to call just about everything into question on the topic.


Their goal is to minimize false positives, so that would not be in their best interest.


Which means all it takes now is one disillusioned Apple employee to leak the details of that private hash function, and the whole system is compromised.


One disillusioned employee that has access to the secret algorithm.


Before they make it to human review, photos in decrypted vouchers have to pass the CSAM match against a second classifier that Apple keeps to itself.


To assume CP is reviewed manually is simply wrong. You don't want to put such a weight on an individual. You want to automate it as much as possible, with as few false positives (and false negatives) as possible.

For example, in the case of a wallpaper, let's say it's the Windows XP wallpaper. There's no human skin color in it at all, so you can easily be reasonably sure it isn't CP. You wouldn't need advanced ML for that.

And they can have multiple checksums, just like a tarball or package can have a CRC32, an MD5, and a SHA512. Just because one of these matches doesn't mean the others do. The only problem is keeping these hash DBs secret, but that could very well be a reason the scanning isn't done locally.
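The multiple-checksum point can be sketched with the standard library (these are cryptographic/checksum digests of raw bytes, unlike the perceptual hash under discussion; the data here is made up):

```python
import hashlib
import zlib

def digests(data: bytes) -> dict:
    # Three independent digests of the same bytes, like a release tarball
    # might publish. Colliding one (trivial for CRC32, practical for MD5)
    # does nothing to make the others match.
    return {
        "crc32": format(zlib.crc32(data) & 0xFFFFFFFF, "08x"),
        "md5": hashlib.md5(data).hexdigest(),
        "sha512": hashlib.sha512(data).hexdigest(),
    }

def matches(candidate: bytes, expected: dict) -> bool:
    # Only count it as a match if *every* digest agrees.
    return digests(candidate) == expected

original = b"example image bytes"
expected = digests(original)
print(matches(original, expected))
print(matches(b"tampered bytes", expected))
```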


Apple explicitly states that it is reviewed manually.


To assume it’s never reviewed manually is absolutely terrifying.


That is not what I asserted. I asserted that it is only done when other options are exhausted.


A lot has been said about using this as an attack vector by possibly poisoning a victim's iPhone with an image that matches a CSAM hash.

But could this not also be used to circumvent the CSAM scanning by converting images that are in the CSAM database to visually similar images that won't match the hash anymore? That would effectively defeat the CSAM scanning Apple and others are trying to put into place completely and render the system moot.

One could argue that these spoofed images could also be added to the CSAM database, but what if you spoof them to have hashes of extremely common images (like common memes)? Adding memes to the database would render the whole scheme unmanageable, no?

Or am I missing something here?

So we'd end up with a system that:

1. Can't be reliably used to track actual criminal offenders (they'd just be able to hide) without rendering the whole database useless.

2. Can be used to attack anyone by making it look like they have criminal content on their iPhones.


> Or am I missing something here?

Wouldn't it be easier for offenders to avoid Apple products? That requires no special computer expertise and involves no risk on their part.


That would be more effective! But that's not something Apple is trying to solve here (or could ever solve).

They are trying to prevent CSAM images from being stored and distributed using Apple products. If that goal is easily circumvented, the whole motivation for this (anti-)feature becomes invalid.

In a way, this architecture could potentially even make Apple products more attractive to CSAM distributors, since they now have a known way to fly under the radar (something that is arguably harder/riskier on other image-sharing platforms, where the matching happens server-side).

One reasonable strategy Apple could adopt against that is to constantly fine-tune the NeuralHash algorithm, hopefully catching more and more offenders. If that works reasonably well, it might deter criminals from the platform, because an image that flies under the radar now might not fly under the radar in the future.

NB. I'm not trying to say Apple is doing the right thing here, especially since the above arguments put the efficacy of this architecture under scrutiny.


> They are trying to prevent CSAM images from being stored and distributed using Apple products.

If what Apple is aiming for is a more complete version of E2EE on their servers, maybe that's just an unintended consequence of the implementation, and the very reason why they're surprised that this received so much pushback. If Apple wanted to offer encryption for all user files in iCloud and leave no capability to decrypt the files themselves, they'd still need to be able to detect CSAM to protect themselves from liability. In that case, scanning on the device would be the only way to make it work.

If that were the case, I still wouldn't believe that moving the scan to the device fundamentally changes anything. Apple has to conduct a scan regardless, or they'll become a viable option for criminals to store CSAM. But in Apple's view, their implementation would mean they'd likely be the first cloud company that could claim to have zero knowledge of the data on their servers while still satisfying the demands of the law.

Supposing that's the case, maybe what it would demonstrate is that no matter how you slice it, trying to offer a fully encrypted, no-knowledge solution for storing user data is fundamentally incompatible with societal demands.

But since Apple didn't provide such an explanation, we can only guess at their strategy. They could have done a much better job of describing their motivations, instead of hoping that public sentiment would let this pass the way all the other scanning mechanisms actually did in the past.


Allegedly other image sharing services already do CSAM detection after the upload. Switching handset OS removes any image-level defensive capability the abusers may employ, so they might face a higher risk using these other services.


What we're describing at this point is effectively the same as a system of automatically flagging users as potential criminals based on something as manipulable as a filename.


In addition to the attacks, such as converting legit image to be detected as CSAM (false positive) or circumventing detection of the real CSAM image (false negative), which have been widely discussed in HN, I think this can also be used to mount a DOS attack or to censor any images.

It works like this. First, find your target images, which are either widely available (like internet memes, for the DOS attack) or images you want to censor. Then, compute their NeuralHashes. Next, use the hash-collision tool to make real CSAM images have the same NeuralHash as the target images. Finally, report these adversarial CSAM images to the government. The result is that the attacker successfully adds the targeted NeuralHashes to the CSAM database, and people who store these legitimate images will then be flagged.


Really naive question. What's to stop apple from using two distinct and separate visual hashing algorithms? Wouldn't the collision likelihood decrease drastically in that scenario?

Again, really naive but it seems like if you have two distinct multi-dimensional hashes it would be much harder to solve the gradient descent problem.


I'm fairly sure they do, actually. It was in one of the articles earlier today that Apple has a distinct, secret algorithm they perform on suspected CSAM server side after it gets flagged by the client side neural hash. Then only after 30 such images from a single user are identified as CSAM by both algorithms will they be sent to a human reviewer who will confirm their contents. Then, finally, law enforcement will be alerted.

There has been a lot of hyperbole going around and the original premise that this is a breach of privacy is still true, but in my opinion the actual repercussions of attacks and collisions are being grossly exaggerated. One would have to create a collision with known CSAM for both algorithms (one of which is secret) which also overlaps with a legal porn image that could be misconstrued as CSAM by a human reviewer, or at the very least create and distribute hundreds of double collisions to DOS the reviewers.


But relying on an algorithm staying secret is security-by-obscurity 101. You can rely on a cryptographic key staying secret; you can't rely on the design of an algorithm staying secret (I do agree there's a little blurring of these lines with large, trained models, but the gist remains - you can't just hope that nobody sees the structure/weights of your second hash function).


Given that training the same network (with the same structure) will result in different weights and different hashes with high probability, I would argue that the weights actually have all the important properties of a secret key. You would just need to treat them as one operationally, i.e., make sure as few people as possible have access, and use truly random instead of pseudorandom numbers during training.


Why can you rely on a cryptographic key staying secret, but not on a trained model staying secret? Both are pieces of binary data that you keep on your own machines and do not give to others.

They are exactly the same case.


There's a massive difference. In fact, the data describing the trained model is not at all analogous to a cryptographic key.

A cryptographic key is a piece of information which, as long as it remains secret, should be sufficient to protect the confidentiality and integrity of your system. This means that your system should remain secure even if your adversary knows everything else apart from the key, including the details of the algorithm you use, the hardware you have, and even all your previous plaintexts and ciphertexts (inputs and outputs). If the key fails to have this property, your cryptosystem is broken.

The trained model (or the weights of a NN) does not have this property at all. Keeping the model secret does not ensure the confidentiality or integrity of the system. E.g. just knowing some inputs and outputs of the secret model allows you to train your own classifier which behaves similarly enough to let you find perceptual hash collisions. If you treat your model as a cryptosystem, this would be a known-plaintext attack: any system vulnerable to these is considered completely and utterly broken.
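The known-plaintext point can be demonstrated at toy scale. In the sketch below (every dimension and model is invented for the example), a surrogate is trained only on input/output pairs of a "secret" thresholding model, then checked against the secret model on inputs it never saw:

```python
import random

random.seed(2)

DIM, BITS = 8, 4
SECRET_W = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(BITS)]

def black_box_hash(x):
    # The "secret" model: the attacker only observes inputs and outputs.
    return [1 if sum(w[i] * x[i] for i in range(DIM)) >= 0 else 0
            for w in SECRET_W]

def predict(weights, x):
    return [1 if sum(w[i] * x[i] for i in range(DIM)) >= 0 else 0
            for w in weights]

# Query the black box, then fit one perceptron per output bit.
queries = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(2000)]
labels = [black_box_hash(x) for x in queries]

surrogate = [[0.0] * DIM for _ in range(BITS)]
for _ in range(10):  # epochs
    for x, bits in zip(queries, labels):
        for j in range(BITS):
            pred = 1 if sum(surrogate[j][i] * x[i] for i in range(DIM)) >= 0 else 0
            if pred != bits[j]:
                sign = 1 if bits[j] == 1 else -1
                for i in range(DIM):
                    surrogate[j][i] += sign * x[i]

# Agreement with the secret model on fresh inputs it never trained on.
fresh = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(500)]
agree = sum(black_box_hash(x) == predict(surrogate, x) for x in fresh)
print(agree / 500)
```

On this toy, the surrogate tracks the secret model closely on unseen inputs; real black-box attacks on deep models use the same idea with heavier machinery.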

You'd have to keep all of the following secret: the model, all its inputs, all its outputs. If you manage to do that, this might be secure. Might. But probably not. See also Part 2 of my FAQ, which happens to cover this question. [1]

[1] https://news.ycombinator.com/item?id=28232625


> E.g. just knowing some inputs and outputs of the secret model allows you to train your own classifier which behaves similarly enough to let you find perceptual hash collisions.

This seems highly unlikely. You could train a model to find those exact known hashes, but I highly doubt you could get it to accurately find any other unknown hash.

> You'd have to keep all of the following secret: the model, all its inputs, all its outputs.

These are all, in fact, secret.


> This seems highly unlikely. You could train a model to find those exact known hashes, but I highly doubt you could get it to accurately find any other unknown hash.

Your "highly doubt" is baseless. Black box attacks (where you create adversarial examples only using some inputs and outputs, but not the model) on machine learning models are not new. They have been demonstrated countless times [1]. You don't need to know the network at all.

> These are all, in fact, secret.

This is not the case, since regular, unprivileged Apple employees can and will look at the inputs and outputs of the model (the visual derivatives and their hashes). It's also irrelevant.

You insist that there is some kind of analogy between "keeping the model secret" and keeping a "cryptographic key" secret. There is no such analogy. It makes no sense. It is simply not there. Keeping a cryptographic key secret keeps confidentiality and integrity. Keeping your model secret accomplishes neither of these.

[1] https://towardsdatascience.com/adversarial-attacks-in-machin...


> Your "highly doubt" is baseless. Black box attacks (where you create adversarial examples only using some inputs and outputs, but not the model) on machine learning models are not new. They have been demonstrated countless times [1]. You don't need to know the network at all.

This is not a machine learning model as such, though, and is used differently than they are.

> This is not the case, since regular, unprivileged Apple employees can and will look at the inputs and outputs of the model

Can they?


Security by obscurity has protected a lot of things successfully...


In this context, security by obscurity is literally what protects private keys.


> It was in one of the articles earlier today that Apple has a distinct, secret algorithm they perform on suspected CSAM server side

But then they still need to upload the original image to the server, so what was the reason for doing the scanning client-side when they upload it anyway?


>and what was the reason for doing the scanning client-side then when they still upload it?

Probably so that China cannot force Apple to hand over arbitrary images in iCloud (or all images in iCloud). With Apple's design the only images China can get from you are malicious images that people send you. If Apple scanned every image serverside without any clientside scanning, then theoretically China could get all newly-uploaded iCloud images.


They do: https://news.ycombinator.com/item?id=28230029

They also keep the second hash function private.


They do. The system isn't vulnerable to these collision attacks; the people saying it is are just not aware of how the system works.


It's common for two unrelated images to come up as false positives when comparing hashes across different, unrelated perceptual hashing methods.
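For scale: if hash bits behaved like independent fair coins (they don't — a perceptual hash's bits are deliberately correlated with image content, which is exactly where the false positives come from), exact collisions would essentially never occur by chance. A back-of-the-envelope birthday bound, assuming the 96-bit NeuralHash length:

```python
from math import comb

BITS = 96                 # NeuralHash output length
n_images = 1_000_000_000  # a billion images, for scale

# Expected number of exact collisions among n uniformly random hashes
# (birthday bound): roughly C(n, 2) / 2**BITS.
expected = comb(n_images, 2) / 2**BITS
print(expected)  # well under 1e-10: negligible under this uniform model
```

The gap between this negligible figure and observed cross-method false positives is the non-uniformity of real perceptual hashes.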


To me the most interesting findings from this fiasco were:

1. People actually do use generally publicly available services to store and distribute CP (as suggested by the number of reports made by Facebook)

2. A lot of people evidently use iCloud Photo Library to store images of things other than pictures they took themselves. This is not really surprising; I've learned that the answer to "does anybody ever...?" questions is always "yes". It is a bit weird, though, since the photo library is terrible for this use case.


> 2. A lot of people evidently use iCloud Photo Library to store images of things other than pictures they took themselves

Not in the context of CSAM, but this is iOS’s “appliance” user interface coming back to bite it. The iOS photos app doesn’t appear to have a way to show the user only the photos they took.

Apps like Twitter, browsers, chat apps, and screenshots all get to save their photos in the photo library. I believe iOS 15 has a way to filter photos by originating app, but for most users currently, it's hard not to use iCloud Photo Library for photos they didn't take themselves.

Interestingly, users save chat history including images, from apps like iMessage and WhatsApp, on iCloud too. I’m not sure what happens to e2e encryption for backed up data.


I also remember that WhatsApp has a default where it stores any image viewed in a conversation in the library.


Yes, originally WhatsApp used to auto-save images received in chats into the photo library. But they added a setting for that years ago, not sure what the default is now.


My iCloud Photo Library is full of pictures saved from the Web and screenshots.

That is just the way it works on iOS and it is really annoying to have random cute dog pictures saved from Reddit crop up in "Your Furry Friends"-Compilations.


Before Files, I never actually saved any pictures, and I painstakingly, routinely deleted screenshots.

The first is now a bit easier since the share sheet also has "save to files" (only some of the time though, for no apparent reason). The second is a bit easier as there at least is a Screenshot automatic album.

But yes, I see why people do this, but I wish Apple provided a better way to not pollute the iCloud library.


> (as suggested by the amount of reports done by Facebook)

The number of reports is not the number of actual incidents. It could be that Facebook's algorithms are really shitty and have millions of false positives. NCMEC and similar organizations like to brag about the number of reports because Big Number Good.


Even though Apple scanning our images is a horrible privacy practice, I don't get why 𝚜̶𝚘̶ ̶𝚖̶𝚊̶𝚗̶𝚢̶ some people think this is an ineffective idea.

Surely you can easily fabricate innocent images whose NeuralHash matches the database. But how are you going to send them to victims and convince them to save them to their photo library? The moment you send one via WhatsApp, FB will stop you because (they think) it is a problematic image. And even if the image did land, it has to look like some cats and dogs or the receiver will just ignore it. (Even worse, the receiver may report you.) And even if your image does look like cats and dogs, it has to pass another automatic test at the server side that uses another obfuscated, constantly updated algorithm. After that, even more tests if Apple really wants them.

That means your image needs to collide ≥ three times, one open, one obfuscated, and one Turing.

Gmail scans your attachments and most people are cool with it. I highly doubt that Apple has any reason to withdraw this.


It's a good question, and unfortunately you're probably right.

It boils down to this: If you can prevent [some organisation] from potentially destroying civilisation, how much effort would be too much effort, and how much uncertainty is too much uncertainty?

For most, there's a trade off. If someone believes that the technology is sufficient for any country to implement a brutal civil rights destruction campaign, and that this is 50% likely, and all you need to do is upload some harmless images to your icloud to thwart it, why wouldn't you? For example, maybe a certain political party could regain power in a massive way and start doing away with gay rights, locking up anyone with pictures of men kissing each other on their phones. Of course, other countries have other types of extremists that look like they might take over government, or have already taken over governments. In those countries, tools like this are already in place, and this new one could be very powerful.

So if you could upload some innocuous images and save hundreds of lives in 20 or more countries in the world, even if you don't fear for your own country, would you?

Assuming you said yes, the question is, could everyone who agrees give Apple enough false positives, force enough human moderators to have to inspect the images, that the whole scheme becomes financially un-viable on Apple's end?

Of course, the only thing we could do here is slow this "progress" down. Maybe we can use that extra time to share the message that "this kind of technology is not ok" and that being naive about what this tech will be used for is almost guaranteed to kill more people than it saves.

But while so few people are thinking of it in these terms, or if everyone believes the attempt to be futile, it can't work. It's like anything - by the time people realise there's a problem, it's often too late to fix the problem.


> save hundreds of lives, would you?

Sure, why not.

But let me ask some questions, because at this point I am not sure whether people want Apple's system to be robust or jammable. If our fear is that Apple will tune the system to detect pictures of two men kissing, wouldn't an easily jammable system work in our favor, because we could DDoS it, or threaten to, anytime we want?


It's a tricky one, certainly.

There is probably a graph somewhere showing how much effort to fix is too much effort, vs how much bad will in the community this project is inducing, vs how much value this project has.

So no, Apple's system being jammable is a great "booby prize" right up to the point where Apple fixes the algorithms, or the Chinese government starts reporting these false positives as bugs and saying that Apple must fix them before its devices can be sold in China.

And so, one has to assume that if the algorithms can be fixed, they will be fixed. If we can DDoS Apple's human checking capability while it's still young, we might be able to prevent more resource being sunk into it - though I agree, that's unlikely. If we can do all that AND make it clear that this is going to cause nothing but bad will, and if we can get enough governments to regulate against it, then there may be hope.

So your question was, why do people think this software isn't going to work, and the answer is we really hope it isn't going to work and really hope that enough people get onboard with the efforts to subvert it and really hope that the message sinks in to Apple that this was a doomed project that they should abandon for good.

But with enough money and time, there is no problem that can't be overcome. So even if it is a currently jammable child-protection system, in the future it will almost certainly become a robust human-rights-violation system. In the space between those two points there is hope. Slim hope, but hope nonetheless.


I really appreciate this comment, as anytime a new security issue creates a fuss, I feel like I'm the only one wondering what the real attack vector is. I'm genuinely glad people are so thoroughly investigating this new Apple policy, but at the same time I feel like I'm the only one dumb enough not to understand what I should be actually concerned about.


It's all theater. We need to stop talking about it and get trusted security researchers picked at random to be deployed to Apple for an audit. They sign an NDA, they work in a clean room with no way to exfiltrate data, they get full access to all the algorithms, source code, trained networks, all test data, access to Apple's infra to test as they please, etc. And then they need to be the definitive authority on this matter to give us info and suggestions. Not Apple, not PR departments, not the EU parliament, not the US gov. And most certainly not us.


Almost nobody is arguing the effectiveness of the idea. That would be missing the point entirely.


The first comment the algorithm chose to show me contains the substring "and this system is transparently broken". Isn't that a remark on the effectiveness of the idea, or did I misread?

(In case you have a problem with me saying "so many", that I can fix.)


> I don't get why 𝚜̶𝚘̶ ̶𝚖̶𝚊̶𝚗̶𝚢̶ some people think this is an ineffective idea.

Because such architectural flaws become absolute train wrecks when scaled. Remember the Clipper Chip? This is like that: cryptographers pointing out fundamental flaws that may seem like minor issues to most of the users who were going to be compelled to use it - but at scale those flaws result in the direct opposite of the stated objectives.

It feels weird having to explain scalability on HN... everyone here should know that if your little scheme is struggling pre-rollout then trying to power through will only magnify your troubles. So it is hard to account for that blind spot that defenders of this thing seem to have.


Unless Apple implements this with a backdoor on iCloud, the worst case scenario here is that they receive millions of false positives per day and terminate the program after two weeks.

The scalability issue seems to work in our favor because, perhaps, the normal usage will overwhelm the human reviewers Apple prepared and we don't even need to send troll images.

And for the entire time, our data remains untouched.


So Apple customers go from paying for a status symbol luxury item to paying for the privilege of participating in this, at best, pointless exercise wherein they rely upon the goodwill and competence of Apple's employees to not get them swatted. The mental gymnastics needed in order to guard one's ego on this issue can't be healthy either.


Some people sync iMessages. That might even be the default.


I believe you're being downvoted because this is thoroughly covered in the thread. I suggest you read it all again.


I did my best skimming comments the algorithm showed me, including related posts' comments. But man am I bad at reading comprehension.

If I miss anything then surely I am willing to be corrected. But so far I don't see comments that show us how to penetrate the four-layer system (local hash check, semantic check by user, on-server hash check, and human reviewer).


Ah ok! Here's the relevant part of the thread for that. https://news.ycombinator.com/item?id=28229832


That NSFW picture is, in honor of Sean Lock, a challenging wank


So, what does Apple get out of all this, except negative attention, erosion of their image, possible privacy lawsuits, etc?

I just don't understand what Apple's motivation would have been here. Surely this fallout could have been anticipated?


Is it like other stories I hear on HN where one guy or team is trying to get a promotion so keeps pushing their project? And people were afraid to oppose it? I’m baffled how this kept going up to implementation even in the birthplace of the reality distortion field.


Something this major, company brand changing, would have required a lot of executive sign-offs, presumably even Cook's personal blessing.


Cynical speculation:

Apple have decided their position of not being able to provide access to law enforcement is becoming a liability. They're probably under intense pressure from several governments on that front.

This is a way to intentionally let their hand be forced into scanning for arbitrary hashes on devices at the behest of governments, taking pressure off Apple and easing their relations with governments. They take a PR hit now, but it's not too bad since it's ostensibly about fighting child abuse, and Apple's heart is clearly in the right place. When later, inevitably, the hashes start to include other material, Apple can say their hands are tied on the matter - they can no longer use the "can't do it" defense and are forced to comply. This is much simpler than having to fight about it all the time.


My guess is that internally, they've realized they have a big CP problem on iCloud. That's a huge liability.


I have serious doubts about that. CSAM really isn't an issue in the US, culturally and legally.


Really? It isn't an issue?


It is?


It gets the FBI off their back about not doing enough to stop the spread of CP.


I find it hard to believe CSAM was so pervasive on iDevices that they'd feel compelled to do something about it.

As far as we know (and I'm sure lots of eyeballs are looking now) Android doesn't do this.

And frankly, why would Apple care that the FBI isn't cozy with them? Their entire brand is "security and privacy", which kind of goes against most three-letter agencies anyway.


At least according to [1]:

"Last year, for instance, Apple reported 265 cases to the National Center for Missing & Exploited Children, while Facebook reported 20.3 million, according to the center’s statistics. That enormous gap is due in part to Apple’s decision not to scan for such material, citing the privacy of its users."

If you were a law enforcement agency and noticed this discrepancy, would you believe that you'd be letting some number of child abusers get away because of that difference in 20 million reports? iCloud probably doesn't have the same level of adoption as Facebook, but the gap is still very large.

[1] https://www.nytimes.com/2021/08/05/technology/apple-iphones-...


I agree CSAM isn't likely to be pervasive in the photo libraries on iOS devices.

Android does not do on-device scanning, but Google does scan photos after they are uploaded to their cloud photo service. It's not on-device scanning, but the effect is functionally identical: photos that are being uploaded to the cloud are being scanned for CSAM. The only real distinction is who owns the CPU which computes the hash.

I doubt it's the FBI pressuring Apple. My suspicion is it's fear of the US Congress passing worse, even more privacy-invading laws under the guise of combating CSAM. If Apple's lobbyists can show that iPhones are already searching for CSAM, arguments for such laws get weaker.


> Android doesn't do on-device scanning, but Google does scan photos after they are uploaded to their cloud photo service.

So did Apple, and pretty much all cloud hosting providers.

This on-device scanning is what's new, and very out of character for Apple.

> If Apple's lobbyists can show that iPhones are already searching for CSAM, arguments for such laws get weaker.

I'm not aware of any big anti-CSAM push being made by Congress. CSAM just isn't really a big issue in the US, the existing laws, and culture, are pretty effective already.


CSAM can never be a policy issue with two sides, because everyone is in agreement that we need to protect children. The higher powers want to prevent child abuse, and CSAM is directly tied to child abuse. When people argue that "think of the children" can be weaponized to attack their freedoms, they wouldn't dare try to argue against the premise that children are harmed because of CSAM - not because the arguments will fall on the deaf ears of some governmental agents trying to push an agenda, but because the premise itself is sound.

As a result, people will focus their arguments instead on the technological flaws in the current implementation of on-device scanning or slippery slope arguments that are unlikely to become reality, the feature will be added anyway with no political opposition, and in the end Apple and/or the government will get what they want, for what they consider the greater good.

I think that absolute privacy in society as a whole isn't attainable with those values in place, and it raises many questions regarding to what extent the Internet should remain free from moderation. Are there really no kinds of information that are so fundamentally damaging that they should not be allowed to exist on someone's hard drive? If not, who will be in control of moderating that information? Maybe we will have to accept that some tradeoffs between privacy and stability need to be made for the collective good, in limited circumstances.


There is a lower limit to privacy (as a human right) – below which societies would cease to be "free" (liberal democracies?). But that's not a discussion people seem to want to have when talking about their good intentions of fighting against horrible things.


> I'm not aware of any big anti-CSAM push being made by Congress.

Right now. The best time for Apple to do this is when it cannot be painted as a defensive move against any specific legislation. The CSAM argument has been used many times in the past, and it's certain to be used many more times in the future.


Apple did not scan uploaded images, and Apple has never scanned iPhone images. Last year Apple reported 245 cases to the National Center for Missing & Exploited Children, while Facebook reported ~50M and Google ~4M; Microsoft reported as well, and Apple was at the bottom of the list.

https://www.nytimes.com/2020/02/07/us/online-child-sexual-ab...


We do know that Apple has been scanning email attachments sent via iCloud email. I don't think it's ever been claimed that Apple has ever scanned anyone's iCloud Photo Library.

Ethics aside, on-device scanning has the benefit of Constitutional protection, at least in the USA. Because the searching is being performed on private property, any attempt by the Government to try to expand the scope of searches would be a clear-cut 4th Amendment violation.

(Whereas if the scanning is done in the cloud, Government can compel searches and that would fall under the "third party doctrine" which is an end-run around the 4th Amendment.)


> Because the searching is being performed on private property,

Is it though? A device someone bought two years ago suddenly starts reporting its owner to the FBI's anti-CSAM unit without the owner's realistic consent, which does seem like an end-run around unpermissioned government searches. It's not reasonable to say "throw away your $1200 device if you don't consent", is it? Nor can a person reasonably avoid iOS updates that force this feature to be active.

> any attempt by the Government to try to expand the scope of searches

We've seen private companies willfully censor individuals at the government's behest under the current administration - will Apple begin expanding the search and reporting mechanisms just to stay in whatever administration's good grace?

Like I said, this is extremely out of character, and very off-brand for Apple. Why would someone trust Apple going forward? Even Google's Android doesn't snitch on its owners to law enforcement... Setting aside all the ways for nefarious actors to abuse this system and sic LE on innocent individuals.


> Is it though?

Yes. Your phone is your private property, just like your house or your car. Searching your private property requires a warrant or reasonable suspicion, otherwise it's a 4th Amendment violation.

This twitter thread is worth a read.

https://twitter.com/pwnallthethings/status/14248736290037022...


The 4th amendment only protects you from the government searching your property. Otherwise, the Microsoft telemetry which reports back to Microsoft what software you have installed and what apps you are running would be illegal.


So, what does a person do if they do not consent to this search? Tough?

You can't realistically avoid the iOS update. Apple has effectively given consent on your behalf... How will that fly?


If you do not consent to having your photos scanned for CSAM, turn off iCloud Photo Library. Same as how you opt out of CSAM scanning of your photo library on Android.

If you're concerned about other forms of scanning compelled by the Government, you never consented to the search. So even if Apple complied, the search is invalid and cannot be used to prosecute you.


> If you're concerned about other forms of scanning, you didn't consent—so even if Apple complied, the search is invalid and cannot be used to prosecute you.

This is a dangerously false understanding of the law. Stop giving legal advice. You are not a lawyer.


Are you saying that if the US Government compelled Apple to scan millions of citizens' private property for non-CSAM images, this would not be a clear-cut violation of the Fourth Amendment?

I'm curious, do you think that the Third Party Doctrine applies here?


I think the realistic danger here is that the US Government no longer needs to compel this type of activity. Reference Twitter and Facebook/Instagram voluntarily censoring content at the mere suggestion of the current administration/party in power.


Let's be practical for a minute. What specific image would Apple voluntarily search for on behalf of the US Government? I sincerely can't think of anything.


Images, leaked government files, anti-administration phrases, unflattering memes of the president, statements that contradict the government's current stance, etc.

All things current social media companies seem willing to censor at the suggestion of the administration.


You seriously think Apple would voluntarily search private devices for images which aren't illegal and don't even hint at any action which is illegal?

I don't think you're being serious.


Why not? Facebook and Twitter have done exactly that in the past year. Why is it far-fetched for Apple, given this astonishing reversal of course on branding?

The only realistic alternative to Apple is Android... And Google is pretty darn transparent in their spying on users. Apple just did a 180 degree about-face on all the branding they've built over the last decade. Why should anyone trust Apple again?

Look, this whole neural-hash thing took what, 2 weeks for people to fabricate collisions? This just illustrated how poorly conceived and ill-thought the entire plan was from Apple. It's not beyond reason to assume any of these things given the evidence we currently have.


It does not appear one can opt certain folders out of this scan. If you enable iCloud backups, it scans the entire shebang.

As previously mentioned, Android doesn't scan all photos on your device... Google scans content uploaded to their servers. Which is reasonable... It's their servers, they can host what they want. Your iPhone is your iPhone.


> If you enable iCloud backups, it scans the entire shebang.

Citation?


Do I need one? Where in iOS can you choose which folders to opt-into CSAM scanning? I only see an all-or-nothing option for iCloud photos.


Yes, you do need a citation, because I've not heard Apple (or anyone else) claim that iCloud Backups or iCloud Drive are being subject to CSAM scans.

From everything I've read, from Apple and other sources, if the photo is about to be uploaded to iCloud Photo Library then it is scanned for CSAM. If it's not, it isn't.


How does one choose individual photos to not upload?


You store the photos you want to keep private in another app. I'm sure there are lots in the App Store.

Still waiting on that citation.


If the default behavior is not to exclude photo rolls from this new feature, I'm not sure where the argument exists. Telling iOS users they should download some app to keep photos private is absurd.


If a photo is about to be uploaded to iCloud Photo Library then it is scanned for CSAM. If it's not, it isn't.

Still waiting on that citation.


Are we arguing the same thing? How does one opt-out a specific photo? It's not possible as far as I know.


I've no idea what your point is. I've tried offering answers for all these random questions, but I'm still waiting for you to offer a citation for the claim you made earlier.


> I agree CSAM isn't likely to be pervasive in the photo libraries on iOS devices.

Where does this assumption come from? Because of iOS's lower market share? Are you implying they are more prevalent on Android devices? On desktop computers? I don't understand the logic.


The assumption comes from a general observation that most normal people don't tend to use their photo library to store legal porn (other than home made) and I haven't seen any argument for why CSAM aficionados are expected to be any less careful. There are surely plenty of apps out there for keeping separately encrypted vaults of files/photos, and I'm sure many are very easy to use.

You don't have to be particularly tech savvy to know it's a bad idea to co-mingle your deepest darkest secrets alongside photos of your mum and last night's dinner. Especially when discovering those secrets would lead to estrangement, or prison.

As for the few who might be doing it currently, that's likely to plummet quickly. If you think Apple's move caused waves in the Hacker News crowd, just imagine how much it has blown up in the CSAM community right now. I dare say it's probably all they've been talking about for the past two weeks.


Yeah, they’re probably all saying “well I know for sure I’ll never use an Apple device for my CP from now on!” From Apple’s point of view, that’s mission accomplished.


It is more believable they are introducing this tech for larger international security reasons still kept under wraps.


Possible avoidance of being told how they have to do it later.

They will have to do it either way, and the fact that they are even telling us how they plan to do it is more than we can say for every other cloud service.

This is better than all the alternatives at this point. Like it or not. If you don't like it, you might need to get up to speed on what the other services you may already be using are doing.


Government coercion.


They had to have been coerced.


Apple is scanning files locally before they are uploaded to iCloud, in order to avoid having to store unencrypted photos within iCloud while still detecting CSAM. All the other storage providers already scan all the images uploaded to their servers. I guess you can decide which is better. Here is Google's report on it:

https://transparencyreport.google.com/child-sexual-abuse-mat...


> in order to avoid storing unencrypted photos within iCloud

To be clear, Apple does not utilize E2E encryption in iCloud. They can (and already do) scan iCloud contents.


Apple has said this is not the final version of the hashing algorithm they will be using: https://www.vice.com/en/article/wx5yzq/apple-defends-its-ant...


Does it matter? Unless they're going to totally change the technology I don't see how they can do anything but buy time until it's reverse engineered. After all, the code runs locally.

If Apple wants to defend this they should try to explain how the system will work even if generating adversarial images is trivial.


Apple has outlined[1] multiple levels of protection in place for this:

1. You have to reach a threshold of matches before your account is flagged.

2. Once the threshold is reached, the matched images are checked against a different perceptual hash algorithm on Apple servers. This means an adversarial image would have to trigger a collision on two distinct hashing algorithms.

3. If both hash algorithms show a match, then “visual derivative” (low-res versions) of the images are inspected by Apple to confirm they are CSAM.

Only after these three criteria are met is your account disabled and referred to NCMEC. NCMEC will then do their own review of the flagged images and refer to law enforcement if necessary.

[1]: https://www.apple.com/child-safety/pdf/Security_Threat_Model...
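The three stages above can be sketched in code. This is a hypothetical illustration only: the function names, the `Match` type, and the dict-based "hash database" are all made up, and Apple's real pipeline uses threshold secret sharing and a private server-side perceptual hash, neither of which is public. (The threshold of 30 is the figure discussed elsewhere in this thread.)

```python
from dataclasses import dataclass

THRESHOLD = 30  # illustrative; Apple has only described "a threshold of matches"

@dataclass
class Match:
    derivative: bytes  # the low-res "visual derivative" carried in the voucher

def review_account(matches, server_side_hash, known_hashes, reviewer_says_csam):
    # 1. Nothing happens until the account crosses the match threshold.
    if len(matches) < THRESHOLD:
        return "no action"
    # 2. Re-check every match with a second, server-only perceptual hash,
    #    so an adversarial image must fool two distinct algorithms.
    confirmed = [m for m in matches
                 if server_side_hash(m.derivative) in known_hashes]
    if not confirmed:
        return "no action"
    # 3. Human reviewers inspect the visual derivatives of confirmed matches.
    if any(reviewer_says_csam(m.derivative) for m in confirmed):
        return "refer to NCMEC"
    return "no action"
```

Each stage short-circuits: a collision against only the on-device hash dies at step 2, and a step-2 collision that doesn't look like CSAM dies at step 3, at least in theory.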


I do want to note that decrypting the low-res images would have to happen before step 2.


Doesn't disabling the account kind also defeat the whole purpose?

I mean assuming the purpose is to catch child abusers and not merely to use this particular boogeyman to introduce a back door for later use.


Will the high-resolution images be collected and used as evidence? Or just the visual derivatives? That's not clear.


Currently, most likely.

I don’t believe Apple has said whether or not they send them in their initial referral to NCMEC, but law enforcement could easily get a warrant for them. iCloud Photos are encrypted at rest, but Apple has the keys.

(Many have speculated that this CSAM local scanning feature is a precursor to Apple introducing full end-to-end encryption for all of iCloud. We’ll see.)


NeuralHash collisions are interesting, but the way Apple is implementing their scanner it's impossible to extract the banned hashes directly from the local database.

There are other ways to guess what the hashes are, but I can't think of legal ones.

> Matching-Database Setup. The system begins by setting up the matching database using the known CSAM image hashes provided by NCMEC and other child-safety organizations. First, Apple receives the NeuralHashes corresponding to known CSAM from the above child-safety organizations. Next, these NeuralHashes go through a series of transformations that includes a final blinding step, powered by elliptic curve cryptography. The blinding is done using a server-side blinding secret, known only to Apple. The blinded CSAM hashes are placed in a hash table, where the position in the hash table is purely a function of the NeuralHash of the CSAM image. This blinded database is securely stored on users’ devices. The properties of elliptic curve cryptography ensure that no device can infer anything about the underlying CSAM image hashes from the blinded database.

https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...
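A toy sketch of that blinding step, with modular exponentiation standing in for elliptic-curve scalar multiplication purely for illustration. The group, the 96-bit hash stand-in, and all names are made up; the point is only the property the quoted paragraph describes: the published table depends on a server-side secret, so a device holding the table cannot recover the underlying hashes.

```python
import hashlib
import secrets

P = 2**127 - 1  # a Mersenne prime standing in for the elliptic-curve group

def neural_hash(image_bytes):
    # stand-in for NeuralHash: take 96 bits of a cryptographic digest
    return int.from_bytes(hashlib.sha256(image_bytes).digest()[:12], "big")

class Server:
    def __init__(self, csam_hashes):
        self._secret = secrets.randbelow(P - 3) + 2  # server-side blinding secret
        # the distributable database contains only blinded values
        self.blinded_db = {self.blind(h) for h in csam_hashes}

    def blind(self, h):
        # deterministic one-way transform keyed by the secret exponent;
        # recovering h from blind(h) would require solving a discrete log
        return pow(h % P or 2, self._secret, P)

server = Server({neural_hash(b"known-bad-1"), neural_hash(b"known-bad-2")})
```

Since only the server can evaluate `blind`, the device can compute a NeuralHash locally but cannot test it against the table itself, which is why the actual protocol layers a private-set-intersection scheme on top.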


You can extract the hashes with a few hours spent on the darknet. Doing that is certainly illegal and not to mention VERY morally wrong, but criminals exist and criminals won't hesitate to abuse this as a mechanism for framing, extortion, or ransom.

It's also possible for someone (Attacker A) to go on the darknet and get a list of 96-bit neural hashes, and then publish or sell this list somewhere to another party, Attacker B. The second party would never have to interact with CSAM.

Imagine Ransomware v2: We have inserted 29 photos of CSAM-matching material into your photo library. Pay X monero to this address in 30 minutes, or we will insert 2 additional photos, which will cross the threshold and may result in serious and life-changing consequences to you[1].

The difference here (versus the status quo) is that an easily-broken perceptual hash enables the attacker to never send or possess any CSAM images[2]. From my experience of being a victim of various hackers, I know a lot of them won't touch CSAM because they know it's wrong, but they'll salivate at an opportunity to weaponise automated CSAM scanning.

[1]: If you think Apple's human review will mitigate this attack, you can permute legal pornography to match CSAM signatures. If Apple's reviewers see 30 CSAM matches and the visual derivatives look like porn, they will be legally required to report it to NCMEC (a statutory quasi-government agency staffed by the FBI), even if all the photos are actually of consenting adults.

[2]: If you never possess nor touch CSAM, it might be harder for you to be charged with CP offenses. You might be looking at CFAA, blackmail, or extortion charges, while your victim faces child pornography charges. This is basically an "amplification attack" on the real-world judicial system.


One of the grosser clean room designs.

It's certainly possible, but I posit that the exploit chain necessary to get the capability to inject photos onto an arbitrary user's iPhone is valuable enough that it's more likely to be used for spying by repressive regimes than straight up blackmail-- and if you had such a capability, why bother with hash-colliding permutations of legal pornography? Why not plant CSAM directly onto the user's device?

Nearly all cloud storage services implement a scanner like this, and permit the same level of blackmail with a simpler attack chain, such as phishing Dropbox credentials to inject illegal material.

I think the more interesting attacks are governments colluding to add politically motivated non-CSAM material to the lists and then requiring Apple allow them to perform the human review to discover dissidents.


If you plant CSAM, you must possess, distribute, and transmit CSAM. That creates moral and legal barriers. The median ransomware actor probably finds CSAM repulsive and wrong.

If you plant material that merely matches CSAM hashes, you do none of that. The median ransomware actor might find this to be the fastest way to collect a thousand monero.

Also, you can distribute 30 media items per message via WhatsApp. There is a configurable setting for WhatsApp to save all received photos to your iCloud photo library. No exploits needed; you could probably weaponise this via a WhatsApp bot.


> If Apple's reviewers see 30 CSAM matches and the visual derivatives look like porn

Even worse, just get a "teen" porn screengrab, pass it through the collider and you have pretty much a smoking gun


The "visual derivative" is not something any of us have been shown an example of either. Whatever it is, I suspect you only need to be vaguely in the same ballpark (I would wager humanoid shaped skin tones maybe).

So I suspect it would be easier than that (particularly since this whole hashing scheme has been surrounded by a lot of clear garbage: "1 in a trillion" -> on-demand collisions in a couple of weeks?).


I think visual derivative is just a beating-around-the-bush way of saying “thumbnail.”


Those are their words, which they feel no need to elaborate on. Obviously they seem to just be doing the "technically the truth" thing - which shows that someone realized no one would like hearing what it actually is.


Yeah, it was very pointedly, awkwardly worded. It’s intended for human reviewers to distinguish a false positive from a real positive. An eigenvector-mapped image isn’t going to do that, a heavily Gaussian-blurred image isn’t going to do that; it needs to be something that a minimum-wage person who’s only been trained for a day or two can distinguish as “CSAM” or “not CSAM”, and that means it’s a thumbnail of sorts.


This attack does not work, as Apple uses two hashing algorithms, one on the device which is now public, and one which is secret. You would have to collide both, which would be hard enough if you knew what the second one was, which you don't.


To defeat this, all you need to be is a state actor with a database of child porn at your disposal (which is stored for exactly the purpose of training detection systems). Then you run the hashing algorithm against images you know are in the database (Apple suggested that they would accept suggestions by some kind of multi-Country vote). Then you can pull out the hashes and figure out how to trigger false positives on the important vectors.

Next, embed your images in sites of interest, like:

* A meme in some group

* A document or 'leak'

* An email to a journalist

Wait for somebody to save it to their Apple device. Wait for it to be flagged, and then use that as 'reasonable means to conduct a search'. When asking for a warrant, the agency would say something like "we detected possible CSAM on a device, the likelihood of a false match is extremely low" - a judge will hardly press further.

You now essentially have a weapon where you can search any Apple device in the name of preventing the distribution of CSAM.

Failing that, you could just have `document_leak.pdf` and download a file that is both a valid PDF and a child porn image, depending on which program you open it with.


There's already a problem that Apple can't verify the hashes. Say a government wants to investigate a certain set of people. Those people probably share specific memes and photos. Add those hashes to the list and now you have reasonable cause to investigate these people.

Honestly this even adds to the danger of hash collisions because now you can get someone on a terrorist watch list as well as the kiddy porn list.


Apple is the one doing the first line of investigation.


This doesn’t work, for two reasons:

1) There’s no way to know the perceptual hash value of Apple’s private NeuralHash function that is run server-side on the derivative of the image to verify a hit really is CSAM. So while you could cause a collision with the on-device NeuralHash if you possessed illegal content, you wouldn’t know whether you successfully faked Apple’s private implementation.

2) An Apple reviewer must verify the image is illegal before it’s passed along to law enforcement.


Is this an example of homomorphic encryption? Checking for hashes in a 'blinded' table, I mean.


Some people seem to be confused why a hash collision of a cat and a dog matters. Here's a potential attack: share (legal) NSFW pictures that are engineered to have a hash collision with CSAM to get someone else in trouble. The pictures are flagged as CSAM, and they also look suspicious to a human reviewer (maybe not enough context in the image to identify the subject's age). To show that this can be done with real NSFW pictures, here is an example, using an NSFW image from a subreddit's top posts of all time.

Here is the image (NSFW!): https://i.ibb.co/Ct64Cnt/nsfw.png

Hash: 59a34eabe31910abfb06f308


Does anyone save porn to their personal photo libraries? Especially porn as suspicious as the image you posted?


Going by what some people on Reddit say, it seems to be the case. https://old.reddit.com/r/datahoarder/search?q=porn&restrict_...

Probably not the weird image I posted, which looks obviously suspicious. But maybe someone will make a program to find "cleaner" hash collisions that don't look suspicious.


https://github.com/AsuharietYgvar/AppleNeuralHash2ONNX/issue...

I posted some examples that look like totally normal images. They're no harder to produce; you just need to noise-shape the gradient descent so that the introduced noise has a spectrum similar to the image. E.g. just feeding back a Gaussian-highpassed version of the error signal is sufficient.
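A minimal numpy sketch of that noise-shaping idea. Everything here is illustrative: the real NeuralHash is a CNN followed by hyperplane LSH, and this stand-in replaces the whole network with a random linear projection plus tanh (the same differentiable-thresholding trick the collider uses). A box-blur highpass stands in for the Gaussian highpass.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64  # toy 64x64 grayscale "image"

# Stand-in for the hash: sign of 96 random projections, smoothed with tanh
# so the objective is differentiable.
W = rng.normal(size=(96, N * N)) / N

def highpass(g, k=5):
    # Subtract a local box-blur average, keeping only high frequencies so
    # the perturbation hides in texture rather than in smooth image regions.
    img = g.reshape(N, N)
    pad = np.pad(img, k // 2, mode="edge")
    low = np.mean([pad[i:i + N, j:j + N] for i in range(k) for j in range(k)],
                  axis=0)
    return (img - low).ravel()

target = np.sign(rng.normal(size=96))  # hash bits we want to collide with
x = rng.uniform(0, 1, N * N)           # starting image, flattened

def loss(x):
    return 0.5 * np.sum((np.tanh(W @ x) - target) ** 2)

start = loss(x)
for _ in range(200):
    h = np.tanh(W @ x)
    grad = W.T @ ((h - target) * (1 - h ** 2))  # d(loss)/dx
    x -= 0.1 * highpass(grad)  # feed back only the high-passed error signal
end = loss(x)
```

A real attack would also clip pixels to a valid range and keep a distortion penalty against the original image; this sketch only shows the spectral shaping of the update.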


The CSAM detection system supposes people save actual child porn to their personal photo libraries.


Keep in mind that you have to also collide with another perceptual hash function that only Apple has to trigger a match.


> Keep in mind that you have to also collide with another perceptual hash function that only Apple has to trigger a match.

If it's another neural network I wouldn't be shocked if the adversarial preimages worked across both-- it's not uncommon for blackbox generalization to work for adversarial examples. It would be very likely if someone (maybe the attacker) made their own version of neuralhash and then generated examples that passed both theirs and apple's public one.

Privacy-wise, if there were two perceptual hash functions, Apple should have used the more restrictive one on the devices too - because even if they decide not to report you, your privacy is still invaded if they inspect your images at all.

The neuralhash function is extremely easy to attack. We should not have any confidence in the competence of its authors, so we shouldn't expect their undisclosed mechanism to provide a great deal of protection.

A secret second hash also will not be secret against a state attacker who will have access to this function by virtue of being trusted to create the databases for Apple.

There is, however, a very simple technique they could use that would provide almost perfect protection: They could stop invading the privacy of their users and refrain from scanning their private content!


Does Google Chrome scan downloaded images?


You seem to be assuming a human cannot tell the difference between some random NSFW content and some legit known CSAM, 30 times over. Try again.


Apple's reviewers don't have access to the original CSAM to know if it's a match or not. That stays with NCMEC. If they see some legal porn that looks like it could be illegal, they'd likely flag it as a match.


Roughly thirty images is the threshold for the system to activate, but would they need to review all thirty images to pass it on or would they just need to verify one image looks visually like CSAM in order to pass it along?

It seems unlikely that, if there was anything they verified as CSAM, they wouldn't pass it on just because they also found a false positive among those thumbnails.


Great work. I hope people keep hacking at the system to undermine its credibility. This idea is just beyond insane, and the plan to have manual checks of users' photos on their own devices sounds like what China is doing - not great.


I am strongly against Apple’s decision to do on-device CSAM detection, but: wasn’t there a secondary hash whose database is not shared? In theory you need to collide with both to truly defeat the design, right?


You just need a sample of something that is evidently so pervasive that we're building a nationwide dragnet to stop it.


I doubt finding it would really be that hard, if you wanted to be on a list somewhere in some government database, but even armed with a full image that is in the NCMEC database, the problem is that the second hashing algorithm runs on the server and presumably relies on security through obscurity… so it would be hard to collide with it on purpose unless you are an insider. That’s my understanding, although details have been a bit shaky at times.


I don't see how.

They're hashing on feature space (so trivial cropping and such doesn't defeat this) but they have two totally separate methods of matching those hashes? Doesn't sound right to me...


Apparently the images in question would get sent to the server, and all calculation happens there.

> In a call with reporters regarding the new findings, Apple said its CSAM-scanning system had been built with collisions in mind, given the known limitations of perceptual hashing algorithms. In particular, the company emphasized a secondary server-side hashing algorithm, separate from NeuralHash, the specifics of which are not public. If an image that produced a NeuralHash collision were flagged by the system, it would be checked against the secondary system and identified as an error before reaching human moderators.

https://www.theverge.com/2021/8/18/22630439/apple-csam-neura...

For one reason or another Apple really wants to create this precedent, so it’s only natural they’re doing every last thing to make the feature hard to defeat.


Hard to exploit is better phrasing.


Yes, but it’s much easier to just ignore that and proclaim how weak Apple’s system is.


This is just getting wilder and wilder by the day, how spectacularly this move has backfired. As others have commented, at this point all you need is someone willing to sell you the CSAM hashes on the darknet, and this system is transparently broken.

Until that day, just send known CSAM to any person you'd like to get in trouble (make sure they have iCloud sync enabled), be it your neighbour or a political figure, and start a PR campaign accusing the person of being investigated for it. The whole concept is so inherently flawed it's crazy they haven't been sued yet.


The "send known CSAM" attack has existed for a while but never made sense. However, this technology enables a new class of attacks: "send legal porn, collided to match CSAM perceptual hashes".

With the previous status quo:

1. The attacker faces charges of possessing and distributing child pornography

2. The victim may be investigated and charged with child pornography if LEO is somehow alerted (which requires work, and can be traced to the attacker).

Poor risk/reward payoff, specifically the risk outweighs the reward. So it doesn't happen (often).

---

With the new status quo of lossy, on-device CSAM scanning and automated LEO alerting:

1. The attacker never sends CSAM, only material that collides with CSAM hashes. They will be looking at charges of CFAA, extortion, and blackmail.

2. The victim will be automatically investigated by law enforcement, due to Apple's "Safety Voucher" system. The victim will be investigated for possessing child pornography, particularly if the attacker collides legal pornography that may fool a reviewer inspecting a 'visual derivative'.

Great risk/reward payoff. The reward dramatically outweighs the risk, as you can get someone in trouble for CSAM without ever touching CSAM yourself.

If you think ransomware is bad, just imagine CSAM-collision ransomware. Your files will be replaced* with legal pornography that is designed specifically to collide with CSAM hashes and result in automated alerting to law enforcement. Pay X monero within the next 30 minutes, or quite literally, you may go to jail, and be charged with possessing child pornography, until you spend $XXX,XXX on lawyers and expert testimony that demonstrates your innocence.

* Another delivery mechanism for this is simply sending collided photos over WhatsApp, as WhatsApp allows for up to 30 media images in one message, and has settings that will automatically add these images to your iCloud photo library.


Before they make it to human review, photos in decrypted vouchers have to pass the CSAM match against a second classifier that Apple keeps to itself. Presumably, if it doesn’t match the same asset, it won’t be passed along. This is explained towards the end of the threat model document that Apple posted to its website. https://www.apple.com/child-safety/pdf/Security_Threat_Model...


What happens if someone leaks or guesses the weights on that "secret" classifier? The whole system is so ridiculous even before considering the amount of shenanigans the FBI could pull by putting in non-CSAM hashes.


For better or worse, opaque server-side CSAM models are the norm in the cloud photo hosting world. I imagine that the consequences would be roughly the same as if Google's, Facebook's or Microsoft's "secret classifiers" were leaked.


But in the cloud setting they have the plaintext of what was uploaded. The attack described above is about abusing the lack of information Apple has, so that they will report an innocent user to the authorities.


The voucher that Apple can decrypt once enough positives have been received contains a scaled-down version of the original. How else would Apple be able to even run a second hash function on the same picture?


Can't they just make a new one and recompute the 2nd secret hash on the whole data set fairly easily?

Also, the whole point is that it's fairly easy to create a fake image that collides with one hash, but doing it for two is exponentially harder. It's hard to see how you could have an image that collides with both hashes (of the same image, mind you).


Two hash models are functionally equivalent to a particular type of single double-sized hash model. So it shouldn't be any harder to recompute against a 2nd hash, if that 2nd hash were public.

Of course, it won't be public (and if it ever became public they'd replace it with a different secret hash).
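If both hashes were public, "recompute against both" really is just summing the two losses in the same gradient-descent loop. A toy sketch of that claim, using random linear models as stand-ins for the two hash networks and a tanh surrogate for the thresholding (as the collider does); nothing here is Apple's actual model:

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 64, 16                          # input dim, bits per hash

# Two stand-in "hash models": sign(W @ x) gives an H-bit hash.
W1 = rng.normal(size=(H, D)) / np.sqrt(D)
W2 = rng.normal(size=(H, D)) / np.sqrt(D)
t1 = rng.choice([-1.0, 1.0], size=H)   # target bits for model 1
t2 = rng.choice([-1.0, 1.0], size=H)   # target bits for model 2

x = np.zeros(D)
lr = 0.1
for _ in range(2000):
    # tanh is the differentiable surrogate for the sign threshold.
    y1, y2 = np.tanh(W1 @ x), np.tanh(W2 @ x)
    # Gradients of 0.5*(tanh(Wx) - t)^2; the joint loss is just their sum,
    # i.e. one loss over the concatenated 2H-bit hash.
    g1 = (y1 - t1) * (1.0 - y1 ** 2)
    g2 = (y2 - t2) * (1.0 - y2 ** 2)
    x -= lr * (W1.T @ g1 + W2.T @ g2)

assert np.array_equal(np.sign(W1 @ x), t1)   # collides with hash 1
assert np.array_equal(np.sign(W2 @ x), t2)   # ...and with hash 2
```

The optimizer doesn't care whether it's satisfying one 2H-bit constraint or two H-bit constraints; it's the secrecy of the second model, not its existence, that adds difficulty.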


If you have both models it is easy. If Apple manages to keep the server model private then it is hard.


You don’t need to have the weights. “Transfer attack” is a thing.


You can still hack someone's phone and upload actual CSAM images. That exposes the attacker to additional charges, but they're already facing extortion and all that anyway. I don't understand the "golly gee whizz, they'd have to commit a severe felony first in order to launch that kind of attack" argument.

Don't know why this hasn't already been used on other cloud services, but maybe it will be now that it's been more widely publicized.


How... exactly did they train that CSAM classifier, seeing as the training data would be illegal to possess? I'd be most interested in an answer to that one. They are willing to make that training data set a matter of public record on the first trial, yes?

Or are we going to say secret evidence is just fine nowadays? Bloody mathwashing.


They didn't train a classifier, just a hashing function.


honestly asking — why is it illegal?


It may not be, so honestly I think my objection is best dismissed. Once I ran down the actual chain I mostly sorted things out with a cooler head.

However, the line of thinking was if Apple has a secondary classifier to run against visual derivatives, the intent is it can say "CSAM/Not CSAM". Since the NeuralHash can collide, that means they'd need something to take in the visual derivatives, and match it vs an NN trained on actual CSAM. Not hashes. Actual.

Evidence, as far as I'm aware, is admitted to the public record, and the link needs to exist and be documented in a public and auditable way. To me that implies any results of an NN would require the initial training set to be included for replicability, if we were really out to maintain the full integrity of the chain of evidence used as justification for locking someone away. That means a snapshot of the actual training source material, which means large CSAM dump snapshots being stored for each case using Apple's classifier as evidence. Even if you handwave the government being blessed to hold onto all that CSAM as fitting comfortably in the law enforcement action exclusions, it's still littering digital storage somewhere with a lot of CSAM. Also, Apple would have to update their model over time, which would require retraining, which would require sending that CSAM source material somewhere other than NCMEC or the FBI (unless both those agencies now rent out ML training infrastructure for you to do your training on, leveraging their legal carve-out, and I've seen no mention of that).

Thereby, I feel that logistically speaking, someone is committing an illegal act somewhere, but no one wants to rock the boat enough to figure it out, because it's more important to catch pedophiles than muck about with blast craters created by legislation.

I need to go read the legislation more carefully, so just take my post as a grunt of frustration at how it seems like everyone just wants an excuse/means to punish pedophiles, but no one seems to be making a fuss over the devil in the details, which should really be the core issue in this type of thing, because it's always the parts nobody reads or bothers articulating that come back to haunt you in the end.


i did a bit of reading as well and came across this. you might find it useful or interesting: https://www.law.cornell.edu/uscode/text/18/2258A at the end (h1-4), it details that providers must preserve the information they submit and also take steps to limit access to only people who need it. in this sense then, it’s not illegal for companies to possess csam. it’s not a big leap to then assume that storing csam for the development of detection software is legal (or at least has been thoroughly cleared with the courts, which is about the same). photodna was developed twelve years ago, and i can’t find anything about microsoft ever being charged with possession or distribution of cp.


Interesting!

Thank you, that was what I was looking for that closes the gap somewhat.


Somehow this didn't solidify my trust in Apple! By this standard you can probably mount a half-decent defence of "ignorance" even if you are caught sending the colliding material. Add this whole debacle on top of what's going on in the EU parliament, and 2021 has been WILD for privacy.


It seems like I'm not going to sleep tonight.

Sure, there is hyperbole in OP's comment (CSAM ransomware and automated law enforcement aren't a thing yet), but we're a few steps from that reality.

Even worse, how long will it take until other cloud storage services such as Dropbox, Amazon S3, Google Drive et al implement the same features? Or worse, required by law to do so?

This sounds like the start of an exodus from the cloud, at least in the non-developer consumer space.


Cloud services generally already do this, for example, here is Google's report:

https://transparencyreport.google.com/child-sexual-abuse-mat...


Yeah, I was talking in hyperbole, but the possible attack vectors this system enables are so powerful that I felt it warranted. Under this system you are able to artificially DDoS organizations that verify whether CP was sent, by sending legitimate, low-res porn whose hash has been modified. You can trigger legitimate investigations by sending CSAM through WhatsApp or through social engineering. You can also fuck with Apple by sending obvious spam.

* With regard to the legislative branch: they can even mandate changes to this system that they aren't allowed to disclose. Once this system is in place, what is stopping governments from forcing in other sets of hashes for matching?


And this is just one step away from Apple and Microsoft building this scanning into the OS itself (into the kernel/filesystem code, why not?!). This is beyond insane. Stallman was right. Our devices aren't ours anymore.

Now, to be fair, there would be a secondary private hash algorithm running on Apple's servers to minimize the impact of hash collisions, but what's important is that once a file matches a hash locally, the file isn't yours anymore -- it will be uploaded unencrypted to Apple's servers and examined. How easy would it be to shift focus from CSAM into piracy to "protect intellectual property"? Or some other matter?


Yup. As others have pointed out, if Apple were willing to lie about the extent of this system and its inception date, why should we suddenly trust that they won't extend its functionality? They themselves explicitly state that the program will be extended, so if this is the starting point, I don't think I will be around for the ride.

It's a shame as I really love some of their privacy-minded features (e.g. precision of access to the phone's sensors and/or media).


> Even worse, how long will it take until other cloud storage services such as Dropbox, Amazon S3, Google Drive et al implement the same features? Or worse, required by law to do so

They already do this. Google and Facebook have even issued reports detailing their various success rates…


So, everyone is going to turn off their iCloud sync and they won’t be a target anymore?


Well, according to the reports that are generally the source of these collisions, the hashing code has been on devices since around December 2020 (iOS 14.3).

https://old.reddit.com/r/MachineLearning/comments/p6hsoh/p_a...

If Apple hasn't been honest about WHEN it was built in and added to their code base, why would anyone take their word for HOW it's being used, or for many of the other statements they are putting in their documents, at least until they are verified.


It doesn't necessarily mean that it will stop them from being a target, because Apple says this[1]:

> This program is ambitious, and protecting children is an important responsibility. These efforts will evolve and expand over time.

[1] https://www.apple.com/child-safety/


> This program is ambitious, and protecting children is an important responsibility. These efforts will evolve and expand over time.

"Think of the children" is the most recognizable trope in TV and film. They couldn't have phrased that to be more Orwellian.


Yes, until they add local scanning to macOS / iOS / iPad OS.


The attacker faces no charges because the colliding image can be a harmless meme.


LEO is not alerted automatically, where’d you get that idea?


They'd more or less have to be. Well, not necessarily 'police', but NCMEC.

I did work in automating abuse detection years back, and the US govt clearly tells you that you are not to open/confirm suspected, reported, or happened-upon cp. There are a lot of other seemingly weird laws and rules around it.


Those laws don’t apply if it’s part of the reporting process. Apple’s stated that they do a manual review to decide whether to send a report to NCMEC or not, just like other companies do.


Of course they do. If they didn't, every seedy pedo would be in the process of making a "report." It's probably also why Apple is using 'visual derivatives' for confirmation, rather than the image, though I can't find info on exactly how low resolution 'visual derivatives' are.

It is of course possible that companies may get some special sign off from LE/NCMEC to do this kind of work - I won't argue with you on that as I truly don't know. I can just tell you my company did not, and was very harshly told how to proceed despite knowing the nature of what we were trying to accomplish. But, we weren't anywhere near Apple big.

I remember chatting with our legal team, who made it explicit that the laws didn't cover carve-outs; basically, 'seeing' was illegal. But as you can imagine, police didn't come busting down our doors for happening upon it and reporting it. If you have links to law where this is not the case, I'll gladly eat crow. I've never looked myself and relied on what the lawyers had said.


They will be if you collide a low-res image that resembles CSAM.

Why would the person doing manual review risk their job if they're unsure? Naturally they will just play it safe and report the images.


Not resembles. The adversarial image has to match a private perceptual hash function of the same CSAM image that the NeuralHash function matched before a human reviewer ever looks at it.
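For intuition on what any perceptual hash (neural or otherwise) is doing, here is the classic textbook "average hash" in a few lines. To be clear, this is not Apple's private function, which is unpublished; it's just the generic technique:

```python
import numpy as np

def average_hash(img, hash_size=8):
    """Toy perceptual hash: block-average the image down to
    hash_size x hash_size, then threshold each cell against the mean."""
    h, w = img.shape
    bh, bw = h // hash_size, w // hash_size
    blocks = img[:bh * hash_size, :bw * hash_size].reshape(
        hash_size, bh, hash_size, bw)
    small = blocks.mean(axis=(1, 3))
    return (small > small.mean()).flatten()

def hamming(a, b):
    return int(np.count_nonzero(a != b))

rng = np.random.default_rng(1)
img = rng.random((64, 64))                                   # "original"
noisy = np.clip(img + rng.normal(0.0, 0.01, img.shape), 0, 1)  # perturbed
unrelated = rng.random((64, 64))                             # different image

h_img = average_hash(img)
# Small perturbations barely move the hash; unrelated images land far away.
assert hamming(h_img, average_hash(noisy)) < hamming(h_img, average_hash(unrelated))
```

NeuralHash replaces the block-averaging with a learned feature extractor, but the match-under-small-perturbations property (and the threshold step) is the same idea.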


Do you have any material on this private function?


Not beyond the documents Apple has shared. Presumably it will be kept that way given it prevents an adversarial attack against it.


Why wait? Just send them the pictures on Facebook Messenger or Gmail or Dropbox today.


I can't tell if you are being sarcastic. In case you are not, isn't the act of sending those pictures completely illegal?


People here are proposing intentionally creating image assets which collide with perceptual hashes of known CSAM (ignoring whether that is legal or ethical) and sharing those assets to effectively SWAT unaware targets.


They still seem to be under the impression that a neuralhash collision would be enough to do this, which it isn’t.


Oh, I think I misunderstood you. I thought you meant instead of "sending images that collide with perceptual hashes of known CSAM", why not "send actual CSAM via 'Facebook Messenger or Gmail or Dropbox', and since those services also use some other detection algorithm, it will also incriminate the receiver."


Those services will take your account through the same, if not more invasive, process if you are found with a hash match like the ones being proposed in these comments. Unlike Apple, they’ve built interfaces that surface all your account activity to reviewers.


> Unlike Apple, they’ve built interfaces that surface all your account activity to reviewers.

You can't know this without independent audits.


In some ways, you can start to see the value in Apple’s system which lets the device user inspect what is stored in the associated data for later review.


I haven't seen anyone proposing actually doing it, but I think a lot of people are rightly pointing out that bad actors, black hats and the Russian mob are going to have a field day with their ability to do so.


I’m not sure how you can conclude the speculation is “right” without engaging with the fact that this hypothetical is addressed directly in the threat model document and hasn’t been pulled off successfully against any of the other services which do similar scanning. Why can’t I buy kompromat as a service for your Gmail account?


Nah that's so 2020, 2021 is all about low resolution legitimate porn being transformed to match CSAM. Get with the times!


Those will trip up 2020’s systems as well!


But why low resolution porn?


So that you are able to bypass the manual reviews. It still looks like CSAM, but it isn't.


Imagine being a parent who took pictures of their own children bathing naked in the backyard.

I don't know about you, but my parents certainly have lots of embarrassing pictures of me in their photo album.

There will be so many false positives in that system, it's ridiculous. It doesn't necessarily have to be a false colliding hash; there are legitimate use cases that, by definition, are impossible to train neural nets on unless Apple is using the data illegally.


That’s not how Apple’s system works. It’s not an image classifier. Only actual images that are derivatives of known CSAM images (a database of 250k images) will match. Random images of kids will not match those at any greater frequency than any other image.


Counter-question: At what point is child porn actually child porn, socially and statistically speaking?

If I share that picture of my child with my friends and loved ones on Facebook - at what "scale" is it considered to be added to that database as child porn?

1k shares? 10k? Who's eligible to decide that? The judiciary? I think this scenario is a constitutional crisis, because there's no good solution to it in terms of law and order.


I think you're underestimating the severity of child abuse by orders of magnitude. CSAM is a database of child rape, not child nudity.


For now. You don't know what will be added next.

China will demand it to include pictures of the Tiananmen massacre.


Well that’s a pretty orthogonal concern to the above comment that was worried about getting flagged for sharing pics of their own kids


idk what's in the database, whether it's rape or nudes or both. Although depictions of sexual acts versus simple nudity seems like a logical place to draw a line, all the lines on adult pornography are arbitrarily drawn based on "community standards", and we're only a few decades away from state-level bans on any nudes as "porn" in the US, including artistic photos. (Not to mention anti-sodomy laws).

Even if what's in the database is 100% violently criminal as you suggest, and even if it remains limited to that material, we already have a process in place that denies the accused of even seeing the evidence against them if a hash matches. What a horrific, orwellian situation if someone sent you hash matches, the police raid your house and now you can't even see what they think they have or prove your own innocence.


You would presumably have the 30+ images on your device or in iCloud to prove your innocence.

For you to get caught up in this dragnet, 30+ images have to match NeuralHashes of known illegal images, thumbnails of those images have to also produce a hit when run through a private hash function that only Apple has, and two levels of reviewers have to confirm the match as well.


What does Apple even do in this situation? That media won't match known CSAM, but if you modify childhood images so that its hash matches CSAM, what does Apple do. There are just SO MANY things that can and will go wrong as people try to exploit this system.


You can’t modify your childhood images so their hash matches CSAM, because the visual derivative won’t match.


In the digital age, I certainly wouldn't be taking such pictures, let alone uploading them to cloud storage. Not because of any concerns about neural hashing, but simply because I wouldn't want such pictures of my children getting leaked / stolen / hacked.


Why would anyone save CSAM to their photo library?


A hash collision allows you to create material that matches CSAM signatures, without being CSAM. This opens up a new class of attacks.

Specifically, many criminal actors don't touch CSAM because it's wrong. But some of these criminal actors will happily abuse legal systems, e.g. SWATTing.


I would gladly have a mobile phone full of memes that have been modified to match, just for the lulz. I honestly think every meme should be put through just to have "illegal memes"


Illegal memes. Finally. Illegal Pepe will be the crowning jewel of my rare Pepe collection.


Holy cow. An Illegal Pepe is just too good not to have.


Maybe this is how 4chan finally demolishes itself.


> A hash collision allows you to create material that matches CSAM signatures, without being CSAM.

This is not correct. Hash collisions won’t match the visual derivative.


Sorry, this is not even wrong.

The visual derivative is just a resized, very-low-resolution version of the uploaded image. "Matching the visual derivative" is completely meaningless. The visual derivative is not matched against anything, and there is no "original" visual derivative to match against.

If enough signatures match, Apple employees can decrypt the visual derivatives and see if these extremely low resolution images look to the naked eye like they could come from CSAM. If so, they alert the authorities. Given a way to obtain hash collisions, generating non-CSAM images that pass the visual derivative inspection is completely trivial.


> Sorry, this is not even wrong.

Probably a mistake to say things like this, when the public documentation contradicts you.

> The visual derivative is not matched against anything, and there is no "original" visual derivative to match against.

Bullshit.

Here is the relevant paragraph from Apple’s documentation:

“as an additional safeguard, the visual derivatives themselves are matched to the known CSAM database by a second, independent perceptual hash. This independent hash is chosen to reject the unlikely possibility that the match threshold was exceeded due to non-CSAM images that were adversarially perturbed to cause false NeuralHash matches against the on-device encrypted CSAM database. If the CSAM finding is confirmed by this independent hash, the visual derivatives are provided to Apple human reviewers for final confirmation.”

https://www.apple.com/child-safety/pdf/Security_Threat_Model...


I just want to be clear if I understand this... many images can result in the same hash, but the hash can and will be reversible into one image? And that image is a low res porn photo derived from the algorithm's guesswork? So once a hash matches they don't check if there was a collision and the photo is completely unrelated, they just see the CG porn? If that's the case then why even look at the derived image?


No, this is not what's going on at all. The employees never see the original photos in the government CSAM hash database. Apple doesn't even have these photos: it's precisely the kind of content that they don't want to store on their servers. If some conditions are satisfied, the employees gain access to the visual derivatives (low-resolution copies) of your photos, and they judge whether these look like they could plausibly be related to CSAM materials.

The exact details of the algorithm are not public, but based on the technical summary that Apple provided, it almost certainly goes something like this.

Your device generates a secret number X. This secret is split into multiple fragments using a sharing scheme. Your device uses this secret number every time you upload a photo to iCloud, as follows:

1. Your device hashes the photo using a (many-to-one, hence irreversible) perceptual hash.

2. Your device also generates a fixed-size low resolution version of your image (the "visual derivative"). The visual derivative is encrypted using the secret X.

3. Your device encrypts some of your personally identifying information (device ids, Apple account, phone number, etc.) using X.

4. The hash, the encrypted visual derivative, and the encrypted personally identifying information are combined into what Apple calls the "safety voucher". A fragment of your key is attached to the safety voucher, and the voucher is sent to Apple over the internet. The safety vouchers are sent in a "blinded" way (with another encryption key derived using a Private Set Intersection scheme detailed in the technical summary), so that Apple cannot link them to specific files, devices or user accounts unless there's a match.

5. Apple receives the safety voucher. If the hash in the received safety voucher matches that of known CSAM content in the government-provided hash database (as determined by the private set intersection scheme), the voucher is saved and stored by Apple, and the fragment of your secret key X is revealed and saved. (You'd assume that they filter out / discard your voucher if there's no match; but the technical summary doesn't explicitly confirm this; this means that they may store and use it in the future to run further scans).

6. If your account uploads a large number of matching vouchers, then Apple will gather enough fragments to reassemble your entire secret key X. Now that they know your secret key, they can use it to decrypt the "visual derivatives" stored in all your saved vouchers.

7. An Apple employee will then inspect the "visual derivatives", and if your photos look like CSAM (more precisely, this employee can't rule out by visual inspection that your photos are CSAM-related), they will proceed to use your secret key X (which they now know) to decrypt the personally revealing information contained in your safety voucher, and report you to the authorities.

Keep in mind that the employee looking at the visual derivative does not, and cannot, know what the original image is supposed to look like. The only judgment they get to make is whether the low-resolution visual derivative of your photo looks like it can plausibly be CSAM-related or not. Plainly speaking, they will check if a small, say 48x48 pixel, thumbnail of your photo looks vaguely like naked people or not.
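The threshold mechanism in steps 5 and 6 can be illustrated with textbook Shamir secret sharing over a prime field. This is a simplification of what's in Apple's technical summary: the real system combines the sharing with the PSI layer so that shares only become usable on matching vouchers, but the reassembly idea is the same:

```python
import random

P = 2**127 - 1  # a Mersenne prime; toy field modulus

def split(secret, n, k):
    """Split `secret` into n shares; any k of them reconstruct it.
    Shares are points on a random degree-(k-1) polynomial with f(0)=secret."""
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    def f(x):
        acc = 0
        for c in reversed(coeffs):
            acc = (acc * x + c) % P
        return acc
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x=0 over the field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % P
                den = (den * (xi - xj)) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

device_key = random.randrange(P)
# One share rides along with each uploaded photo's safety voucher; the
# server cannot recover the key until it holds the threshold (here 30).
shares = split(device_key, n=100, k=30)
assert reconstruct(shares[:30]) == device_key  # 30 matches: key revealed
assert reconstruct(shares[:29]) != device_key  # 29 matches: still hidden
```

Below the threshold, any subset of shares is consistent with every possible key, which is what makes the "Apple learns nothing until enough matches accumulate" claim information-theoretic rather than just procedural.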


> The exact details of the algorithm are not public,

The relevant parts are.

> but based on the technical summary that Apple provided, it almost certainly goes something like this.

It doesn’t go like that. You are simply wrong.


Seems like that would rule out using the system to detect ‘tank man’ images.


That bit you quoted seems to be actually correct. It does not mention visual derivatives at all.

That said, I think your statement is a bit too strong, but generally true. A hash collision is not going to inherently be visually confusing. However, you claim that it is impossible for an image to be both visually confusing and a hash collision, which seems unlikely. The real question is going to be how much more effort it takes to do both.


I didn’t claim it was impossible, just that hash collisions won’t match both.

Also, the information needed to create a full match simply is not available.


Are those not the same statement?

Unless you're relying on it being computationally infeasible, but I'm not sure we know enough to consider that true at this point. Usually when we make statements on those grounds we do so with substantial proof. I don't think we know enough to do so here. I'm not even sure how feasible it is when you throw DL into the mix.


From the docs: “as an additional safeguard, the visual derivatives themselves are matched to the known CSAM database by a second, independent perceptual hash. This independent hash is chosen to reject the unlikely possibility that the match threshold was exceeded due to non-CSAM images that were adversarially perturbed to cause false NeuralHash matches against the on-device encrypted CSAM database.”


> Are those not the same statement?

No.


Most people wouldn't, of course. In this scenario you'd get someone to download the CSAM unknowingly. If they have iCloud sync enabled, it automatically uploads to iCloud, thereby triggering the system. At that point the authorities will be alerted by Apple, and you can inform media outlets. They in turn will ask law enforcement, who will confirm the investigation, and the reputation of the person investigated will be tarnished.


Also as dannyw pointed out, you don't even have to send CSAM to trigger the system. If they found you you would still be charged, but not with possession of CSAM.


What exactly would you be charged with? Why would law enforcement even be involved in a case of false positives?


The sender would of course be charged with wasting police efforts, defamation attempts, etc. In the case of false positives the receiver of course wouldn't be charged; it's more about the fact that this system can be manipulated with too much ease. Even if you're not charged, an investigation takes time away from already limited law enforcement resources. I'm also not interested in buying products from a company that blatantly spies on me. Today it's CSAM, but as others have pointed out, the hashes can be changed to look for anything.


Do you mean charging the sender of the trick images or the receiver?


Well, that depends on the situation. Regardless, the sender would be charged if found, but if they were able to get legitimate CSAM onto the receiver's phone, the receiver could possibly be charged too, or at least investigated. Just the idea of getting investigated in these kinds of attacks, much less being publicly exposed as under investigation, is a horrible thought.


You specifically said someone would be sent known CSAM. How would that get added to their photo library?


I meant *could*. My point is that social engineering is a clear weak link in this system. They could also be sent regular photos whose hash matches the database, or this repo could be used to transform a regular pornographic photo's hash, making manual confirmation on Apple's part hard.


What kind of social engineering would lead an innocent person to save known CSAM to their photo library?


None needed. You could just send a photo to the target through WhatsApp, and the photo would be automatically synced with iCloud.


Wouldn't the photo be scanned for CSAM by WhatsApp first?


WhatsApp messages are end-to-end encrypted, so no.


What kind of social engineering would lead an innocent person to install malware on their devices? Or do you think people like that want to take part in an illegal DDoS botnet?


I think there’s a difference between “I’ll click this totally legit button to protect my computer from viruses” and “I’ll save this picture of a child being raped to my photo library.”

A lot of people may not know how to avoid malware. But I don’t think very many of them would be so inept as to accidentally long press on child porn and tap “Add to Photos”.


... and "I'll save this picture of a hilarious kitten to my photo collection"...

Fixed it for you.

The image to be saved doesn't have to be disturbing at all to trigger a hash collision.

The linked repo has code to modify an image to generate a hash collision with another unrelated image.

That's the whole point.


If some commenters can be believed about their experience with the database, there are a bunch of completely innocuous images in it because they're from the same photosets or distributed alongside CSAM.

Is that enough to cause an investigation? Maybe, maybe not, but I wouldn't want it to be a risk.


Photos in the database are classified for their content. Only images classified as A1 (A: prepubescent minor, 1: sex act) are being included in the hash set on iOS. So this doesn't even include A2 (2: lascivious exhibition), B1 or B2 (B: pubescent minor) let alone images which are in the database and aren't classified as any of A1, A2, B1 or B2.

While I've no doubt that there's a lot of "before and after" images (which are still technically CSAM even if they're not strictly child porn) and possibly many innocuous images, they would not have been flagged as "A1".

I'm sure there's probably still a few images flagged as A1 which shouldn't be in the database at all, but that number is going to be small. How many of these incorrectly flagged images are going to make their way into your photo library? One? Two?

You need 30 in order for your account to be flagged.


If someone is deliberately targeting you with them, 30 isn't very hard to reach.


I think it’s implausible that someone can become aware of 30 images which are miscategorised as A1 CSAM. How would this malicious entity discover them? What’s the likelihood that this random array of odd images could make it into a target’s photo library?

And what’s the likelihood that a human reviewer will see these 30 odd images and press the “yep it’s CSAM” button?

More likely, as soon as Apple’s human reviewers see these oddball images, they’re going to investigate, mark those hashes as invalid, then contact their upstream data supplier, who will fix their data, and now those implausible images are useless.


Lending your phone to someone for a call, then a quick airdrop. Legitimate-looking emails with buttons. There's probably a list somewhere of proven attack vectors.


I posted another comment that was misunderstood as well. Folks, no one is proposing to download actual CSAM images to your photo lib. You could be duped thinking you downloaded an image of a beautiful sunset which was carefully manipulated to match the hash of an actual CSAM image.


The even worse part here is that not only could it impact an image of a beautiful sunset, which would fail the human check, it could impact a low quality version of legal porn, which could easily pass the human check and get passed on to law enforcement.

A sufficiently advanced catfishing attack could probably take advantage of this to get someone raided and have all their electronics confiscated.

Just send someone a zip of photos and let them extract it...


This is the really scary part. Of course, getting someone to download blobs that correlate to CSAM would be one thing, but downloading regular photos that have nefarious hashes is a trend /pol/ could start in an afternoon.


The parent was proposing to “just send known CSAM”.

But OK, say someone sends you a sunset that fools the hasher. Then what? Of course one match won’t do anything, so you’d need to download however many matching sunsets. Then what? The Apple reviewer would see they’re sunsets and you’d challenge the flag saying they’re sunsets. And if somehow NCMEC got involved, they’d see they’re just sunsets. And if law enforcement got involved, they’d see they’re just sunsets.

These proofs of concept might seem interesting from a ML pov, but all they do is just highlight why Apple put so many layers of checking into this.


> But OK, say someone sends you a sunset that fools the hasher. Then what? Of course one match won’t do anything, so you’d need to download however many matching sunsets. Then what?

A real attack would be to take legal porn images and make them collide with illegal images, so when a human goes to review the scaled down derivative images, those images very well look like they could be CSAM. Since there are many of them, they'd get sent to law enforcement. Then law enforcement would raid the victim's home and take all of their electronic devices in order to determine if they can be charged with a crime or not.


This is where the "fog of war" kicks in. What with doors being busted down, police departments making press releases, etc., I can easily imagine that the victim could be prosecuted, convicted and sent away because no one understood the subtlety that their legal porn was not in fact CSAM.


The fog of war is largely in the realm of post-puberty minors, photos of which are not being included in Apple's corpus of hashes. I find it difficult to believe that anyone could mistake or otherwise "fog of war" a photograph of an adult and a prepubescent minor.

And that's assuming someone develops a hash collision which doesn't substantially mangle the photograph like the example offered on Github.

Specifically, only images categorised as "A1" are being included in the hash set on iOS. The category definitions are:

  A = prepubescent minor
  B = pubescent minor
  1 = sex act
  2 = "lascivious exhibition"
The categories are described in further detail (ugh) in this PDF, page 22: https://www.prosecutingattorneys.org/wp-content/uploads/Pres...


> Specifically, only images categorised as "A1" are being included in the hash set on iOS.

Do we know that for sure?

Apple has changed their mind enough times in the last week and a half that I'm convinced they're in full on defensive "wing it and say whatever will get people off our backs!" mode.

You can't read the threat modeling PDF and conclude that it was run through the normal Apple document review process. It reads nothing like a standard Apple document - it reads like a bunch of sleep deprived people were told to whip it up and publish it.


That document is over six years old. It has nothing to do with Apple.


I don't really want to do the research, so I'll take your word for it.

But by fog of war I was thinking more like the victim already has some sleazy (though marginally legal) stuff on their computer, or a search led to a find of pot in their house, or they lied to try and get out of the rap, or perhaps the FBI offered them a deal and they took it because they saw no way out, or perhaps they were simply an unlikable individual who the jury took a dislike to.

Basically that things are not always clear cut, and they come out of the wrong side of things, in a situation created by Apple's surveillance.


Even if I grant all of the above, I don't see how any of that is impacted by the distinction between on-cloud scanning and on-device scanning of photos which are being uploaded to the cloud.

Surveillance is surveillance. It's a bit more obnoxious that a CPU which I paid money for is being used to compute the hashes instead of some CPU in a server farm somewhere (which I indirectly paid for) but the outcome is the same. The risk of being SWAT-ed is the same.


It would still be mentally draining to be accused of CP. Can you imagine how terrified one would be if they saw a warning message with a blurred sunset? I don't know exactly how the system works, but from Apple's press release, it hides the image and gives a warning to the user. This would not go well on social media.


Remember, while you are refuting all this to each party, you are actually in the process of defending yourself against one of the worst criminal accusations possible. Your life will be investigated, your devices will be investigated - the amount of stress and reputational harm this causes is insane.


The point isn't to trick NCMEC, but rather create a DoS attack so no actual triggers can get through the noise.


I thought the point was to SWAT some innocent person? The goal keeps changing.


But who would want that?

We all want privacy but it seems odd to try to DoS this, with high risk for yourself and very little to gain.

Might be useful when the system turns into mass political surveillance tho.


As I've commented elsewhere, DoS can be easily mitigated by implementing another layer with basic object recognition to filter out false positive collisions.


> You could be duped thinking you downloaded an image of a beautiful sunset

If it was anything like the image used to demonstrate this technique on Github, it's unlikely that anyone would describe that sunset as "beautiful". They'd be more likely to describe it as "bugger, this JPEG file is corrupted."


Attacks never get worse over time.

It was quite literally less than 24h from "Oh, hey, I can collide this grey blob with a dog!" to "Hey, this thing that looks like cat hashes to the same thing as this dog!"

You really think this is going to end at this proof of concept stage?


Of course it will get better. But it's not going to end at "Hey, this photograph of a sunset is visually unchanged" while now matching CSAM. That's just not plausible. It's not how these classifiers work.

Regardless, this whole thing is moot because there are two classifiers, only one of which has been made public. Before any matches can make it to human review, photos in decrypted vouchers have to pass the CSAM match against a second classifier that Apple keeps to itself.


Match the first classifier, and your file gets uploaded unencrypted to Apple. Which is fine if it's probable CSAM. But what if they switch efforts to combat, say, piracy?


So your concern is that Apple will start doing something evil at any moment without your consent. That's been true of any computer platform since the advent of software updates. You can construct such hypotheticals about any company you like.


That’s not how the technology works. The files are never decrypted. Instead, if enough hashes match, a “visual derivative” is revealed. What a “visual derivative” is hasn’t been explained, but most people seem to think it’s a low-res version of the file.


Yes but that would be harmless because the visual derivative wouldn’t match.


Except that it isn’t. The hashes don’t enable an attack.


Ok so now all we have to do is get a phone, load it with adversarial images that have hashes from the CSAM database and we wait and see what happens. Basically a honeypot. Get some top civil rights attorneys involved. Take the case to the Supreme Court. Get precedence set right.

Lawfare


The adversarial images have to match both the NeuralHash output of CSAM, plus another private perceptual hash that points to the same image that only Apple has access to, plus a human reviewer needs to agree it is CSAM, and this has to happen for 30 images.


Do you think the reviewer will dismiss the alert if only 29 images look like CSAM and the last one looks like a Beagle? What if only 1 looks like CSAM and the other 29 are animal pictures? It's a safe bet that they will report your account for the 1 that looks like CSAM.


30 images are required to match known bad NeuralHash’s before Apple has any access to look at any of those 30 images.


Where would you get the CSAM hashes?


Give it a few days, and you'll probably find someone selling a list of CSAM neural hashes on darknet marketplaces.


Or tweeting out a bunch of them. They're just 12 byte numbers.


I bet there's a list of hashes already up in Pastebin.


The client has to be able to check for them in some way - just run that algorithm against every image you can scrape from Tor/Freenet and I suspect you'll have results rather quickly.

Or you can probably just wait a minute and pay an... enterprising individual to sell you such a list on a darknet market, or perhaps even find one posted on the clearnet soon enough.


No, the client doesn’t have access to the CSAM hashes. And matches are verified on the server, not on the client.


The poster meant the algorithm to compute the hash has to be on the local device. And it's already been found.

https://old.reddit.com/r/MachineLearning/comments/p6hsoh/p_a...


Indeed, if they're proposing to only decrypt select images the client needs to know pass/fail at some point. Whether that's before or after sending the hashes to Apple's server really doesn't matter as bulk checks will likely be a part of API anyways. We'll have to wait for further reverse engineering to get full details here though.


That was scary fast. Is there a point in using this algorithm for its intended purpose now?


If the intended purpose is to lead by example and eventually mandate code on every computing device (phone and computer) that scans all files against a government provided database, then yes, that purpose still exists and this algorithm still works for it.

Just wait and watch - I guarantee you that Apple will be talking about CSAM in at least one anti-trust legal battle about why they shouldn't be broken up. Because a walled garden means they can oppress citizens on behalf of governments better.


Yes, because this isn’t a weakness in the design. There is nothing scary fast about it. It was obvious and anticipated in the threat model.


Why does it matter? The photo looks nothing like the target.

If someone looks at the two images, wouldn’t they see they’re not the same, and therefore that the original image was mistakenly linked with the target?


Apple's reviewers, by law, cannot look at the target. No one except NCMEC is allowed to possess the target (CSAM material).

So Apple will be looking at a low-res grayscale image of whatever the collided image is, which could be legal adult pornography (let's say: a screengrab of legal "teen" 18+ porn), but the CSAM filter tells it that it's abuse material!

What would you do as the Apple reviewer?

(Hint: You only have one option, as you are legally mandated to report).


But NCMEC will then review the reports and see that it doesn't match the target.


> Apple's reviewers, by law, cannot look at the target

This is false.

> No one except NCMEC is allowed to possess the target (CSAM material).

False. No one is allowed to knowingly possess it, without taking certain actions forthwith when they become aware of it. Obviously, prior to it being reviewed as it is, neither the reviewer nor Apple has knowledge that it is actual or even particularly likely CSAM.


I think you misread what was meant by "target".

Yeah, Apple might be able to look at the uploaded image. But the reviewers don't have a copy of the original image added to the database, which is the "target".


You’re correct, but the uploaded image is sufficient, as it would be obvious to anyone reviewing it that it isn’t CSAM material.

If it was, then would it matter if it wasn’t the original?


That’s the point; you can’t identify that an image is CSAM just by looking at it.


You could pollute the pool and overwhelm their human review process, making it untenable to operate. And that's if you just wanted to pollute it with obvious non-CSAM content.


Well done! Hopefully all of this progress toward demonstrating how easy it is to manipulate neural hash will get Apple to rollback the update...


Counter-point: hijinks like this are defeated by including the original image instead of the image derivative in associated data. At that point, the system works in the exact same way as the photo scanning status quo.


Can they though? To the general public the optics of rolling back the update now would be that they are not fighting CSAM.


Can someone explain the profile of criminals they expect to catch with this system? People who are tech-savvy enough to go on the darknet and find CSAM content but simultaneously stupid enough to upload these images to iCloud?

And they think there are enough of these people to create this very complicated system and risk a PR disaster?


Here in Denmark, a 15-year-old girl and boy were filmed while having sex, and the video spread around among teenagers, apparently mostly through Facebook Messenger.

In 2018, the police indicted 1000 of them (tracking them down with Facebook's help). Legal results were a child-porn law judgement for 334 of them, and simpler penalties for 400.

The child-porn judgement was mostly suspended sentences, but it precludes working with children (as a teacher or even sports trainer if children under 15yo are involved) for between 10 and 20 years.

If there was a system that would have caught it sooner, prior to sharing, the spread would be minimized. The police took 3 years to form a plan to indict the 1000+ people.


Facebook reported 68.1 million CSAM images last year. If these people were such criminal masterminds, why are Facebook’s numbers so high?


Is it possible to host this online as a meme filter?

I think every meme should get pumped through this, just for lulz.


So when do we start the protest? Everyone could plant a bunch of these false positives on their devices. If enough people did it, it'd cost them.


So is this going to be used for DDOSing their photo verifying service?


Can someone explain to me what NeuralHash is?


NeuralHash is a hashing algorithm made by Apple to create hashes from images. Where other hashing algorithms would look at the pixel values, NeuralHash creates hashes based on the visual features of an image.

You can read more about it here: https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...
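A toy illustration (my own sketch, not Apple's algorithm — this is the classic "average hash" idea, which NeuralHash conceptually generalizes by swapping the hand-crafted pipeline for a neural network):

```python
# Toy "perceptual hash": downscale, then threshold each cell against the
# mean brightness. Hash bits come from visual features, not raw bytes.
def average_hash(pixels, size=4):
    # pixels: 2D list of grayscale values; naive box downscale to size x size
    h, w = len(pixels), len(pixels[0])
    cells = []
    for i in range(size):
        for j in range(size):
            block = [pixels[y][x]
                     for y in range(i * h // size, (i + 1) * h // size)
                     for x in range(j * w // size, (j + 1) * w // size)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    return tuple(1 if c >= mean else 0 for c in cells)

# A small uniform brightness change leaves the hash untouched, unlike a
# cryptographic hash, where flipping one input bit scrambles the output.
img = [[(x * y) % 256 for x in range(8)] for y in range(8)]
bumped = [[min(255, p + 3) for p in row] for row in img]
print(average_hash(img) == average_hash(bumped))  # True
```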


thank you very much


Isn't it sufficient that they change the function every day or something?


how long until they start scanning a device's framebuffer in realtime?

why stop at CSAM? Pirated material like movies next?


> how long until they start scanning a device's framebuffer in realtime?

Some smart TVs do automated content recognition so the manufacturers can spy on what you're watching and sell the data to the highest bidders.


Roku's "privacy" policy is a hoot to read for stuff like this.

It's basically, "If we've come up with a way to grab it, we do. And send it to our servers. And do what we want with it."

It literally includes:

> We may receive information about the browser and devices you use to access the Internet, including our services, such as device types and models, unique identifiers (including, for Roku Devices, the Advertising Identifier associated with that device), IP address, operating system type and version, browser type and language, Wi-Fi network name and connection data, and information about other devices connected to the same network.

Emphasis mine. They literally have given themselves permission to nmap your LAN and upload the results!


They are not doing this for the fun of it. If they didn't have to, they would not do it at all.

You have made a huge leap from scanning for pre-existing CSAM while in transit to a cloud service to scanning frame buffers on device in real-time. You should get some type of Olympic medal for such a leap.

This tech is to catch the lowest possible hanging fruit of the dumbest of all CSAM-sharing/saving folks as required by law.


Real-time framebuffer hashing might be a big leap, but what about local filesystem scanning built into the OS?


I give it 5 years


I've seen a lot of comments "muddying the waters" (intentionally or not) about whether hash colliders like the one demonstrated above can be used to carry out an attack. So I wrote up a quick FAQ addressing the most common points.

Part 1/2

Q: I heard that Apple employees inspect a "visual derivative" of your photos before reporting you to the authorities. Doesn't this mean that, even if you modify images so their hash matches CSAM, the visual derivative won’t match?

A: No. "Matching the visual derivative" is completely meaningless. The visual derivative of your photo cannot be matched against anything, and there is no such thing as an "original" visual derivative to match against. Let me elaborate.

The visual derivative is nothing more than a low resolution thumbnail of the photo that you uploaded. In this context, a "derivative" simply refers to a transformed, modified or adapted version of your photo. So a "visual derivative" of your photo means simply a transformed version of your photo that still identifiably looks like the photo you uploaded to iCloud.

This thumbnail is never matched against known CSAM thumbnails. The thumbnail cannot be matched against known CSAM thumbnails, most importantly because Apple doesn't possess a database of such thumbnails. Indeed, the whole point of this exercise is that Apple really doesn't want to store CSAM on their servers!

Instead, an Apple employee looks at the thumbnails derived from your photos. The only judgment call this employee gets to make is whether it can be ruled out (based on the way the thumbnail looks) that your uploaded photo is CSAM-related. As long as the thumbnail contains a person, or something that looks like the depiction of a person (especially in a vaguely violent or vaguely sexual context, e.g. with nude skin or with injuries) they will not be able to rule out this possibility based on the thumbnail alone. You can try it yourself: consider three perfectly legal and work-safe thumbnails of a famous singer [1]. The singer is underage in precisely one of the three photos. Can you tell which one?

All in all, there is no "matching" of the visual derivatives. There is a visual inspection, which means that if you reach a certain threshold, a person will look at thumbnails of your photos. Given the ability to produce hash collisions, an adversary can easily generate photos that fail visual inspection. This can be accomplished straightforwardly by using perfectly legal violent or sexual material to produce the collision (e.g. most people would not suspect foul play if they got a photo of genitals from their Tinder date). But more sophisticated attacks [2] are also possible, especially since the computation of the visual derivative happens on the client, so it can and will be reverse engineered.

Q: I heard that there is a second hash function that Apple keeps secret. Isn't it unlikely that an adversarial image would trigger a collision on two distinct hashing algorithms?

A: No, it's not unlikely at all.

The term "hash function" is a bit of a misnomer. When people hear "hash", they tend to think about cryptographic hash functions, such as SHA256 or BLAKE3. When two messages have the same hash value, we say that they collide. Fortunately, cryptographic hash functions have several good properties associated with them: for example, there is no known way to generate a message that yields a given predetermined hash value, no known way to find two different messages with the same hash value, and no known way to make a small change to a message without changing the corresponding hash value. These properties make cryptographic hash functions secure, trustworthy and collision-resistant even in the face of powerful adversaries. Generally, when you decide to use two unrelated cryptographic hash algorithms instead of one, you make finding a collision at least twice as difficult for the adversary.

However, the hash functions that Apple uses for identifying CSAM images are not "cryptographic hash functions" at all. They are "perceptual hash functions". The purpose of a perceptual hash is the exact opposite of a cryptographic hash: two images that humans see/hear/perceive (hence the term perceptual) to be the same or similar should have the same perceptual hash. There is no known perceptual hash function that remains secure and trustworthy in any sense in the face of (even unsophisticated) adversaries. Most importantly, it is not guaranteed that using two unrelated perceptual hash functions makes finding collisions more difficult. In fact, in many contexts, these adversarial attacks tend to transfer: if they work against one model, they often work against other models as well [3].
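A quick demonstration of the avalanche property of cryptographic hashes described above, using Python's standard hashlib:

```python
import hashlib

def bits(data):
    # SHA-256 digest as a 256-character bit string
    return bin(int(hashlib.sha256(data).hexdigest(), 16))[2:].zfill(256)

a = bits(b"the same photo")
b = bits(b"the same photp")  # a single character changed
flipped = sum(x != y for x, y in zip(a, b))
print(flipped)  # roughly half of the 256 output bits differ
```

A perceptual hash has to do the exact opposite: a one-character (or one-pixel) change should flip zero bits, and that goal is fundamentally at odds with collision resistance.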

To make matters worse, a second, secret hash function can be used only after the collision threshold has been passed (otherwise, it would have to be done on the device, but then it cannot be kept secret). Since the safety voucher is not linked directly to a full resolution photo, the second hashing has to be performed on the tiny "visual derivative", which makes collisions all the more likely.

Apple's second hash algorithm is kept secret (so much so that the whitepapers released by Apple do not claim and do not confirm its existence!). This means that we don't know how well it works. We can't even rule out the second hash algorithm being a trivial variation (or completely identical) to the first hash algorithm. Moreover, it's unlikely that the second algorithm was trained on a completely different dataset than the first one (e.g. because there are not many such hash algorithms that work well; moreover, the database of known CSAM content is really quite small compared to the large datasets that good machine learning algorithms require, so testing is necessarily limited). This suggests that transfer attacks are likely to work.


FAQ Part 2/2

Q: If the second, secret hash algorithm is based on a neural network, can we think of its weights (coefficients) as some kind of secret key in the cryptographical sense?

A: Absolutely not. If (as many suspect) the second hash algorithm is also based on some feature-identifying neural network, then we can't think of the weights as a key that (when kept secret) protects the confidentiality and integrity of the system.

Due to the way perceptual hashing algorithms work, having access to the outputs of the algorithm is sufficient to train a high-fidelity "clone" that allows you to generate perfect adversarial examples, even if the weights of the clone are completely different from the secret weights of the original network.

If you have access to both the inputs and the outputs, you can do much more: by choosing them carefully [4], you can eventually leak the actual secret weights of the network. Any of these attacks can be executed by an Apple employee, even one who has no privileged access to the actual secret weights.
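A sketch of the chosen-input idea (purely illustrative, assuming a linear toy model rather than anything Apple ships): if the "secret" hash were h(x) = sign(W @ x) and you could observe the pre-threshold outputs on inputs of your choosing, the weights fall out immediately. A real network is nonlinear, but enough input/output pairs play the same role.

```python
import numpy as np

rng = np.random.default_rng(0)
W_secret = rng.standard_normal((8, 16))  # the "secret key" weights

def oracle(x):
    # query access to pre-threshold outputs; no access to W_secret itself
    return W_secret @ x

# Feeding the standard basis vectors reads off W's columns one at a time.
W_recovered = np.column_stack([oracle(e) for e in np.eye(16)])
print(np.allclose(W_secret, W_recovered))  # True
```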

Even if you have proof positive that nobody could have accessed the secret weights directly, the entire key might have been leaked anyway! Thus, keeping the weights secret from unauthorized parties does not suffice to protect the confidentiality and integrity of the system, which means that we cannot think of the weights as a kind of secret key in the cryptographical sense.

Q: I heard that it's impossible to determine Apple's CSAM image hashes from the database on the device. Doesn't this make a hash attack impossible?

A: No. The scheme used by Apple (sketched in the technical summary [6]) ensures that the device doesn't _learn_ the result of the match purely from the interaction with server, and that the server doesn't learn information about images whose hash the server doesn't know. The claim that it's "impossible to determine Apple's CSAM image hashes from the database on the device" is a very misleading rephrasing of this, and not true.

Q: Doesn't Apple claim that there is only a one in one trillion chance per year of incorrectly flagging a given account?

A: Apple does claim this, but experts on photo analysis technologies have been calling bullshit [8] on their claim since day one.

Moreover, even if the claimed rate was reasonable (which it isn't), it was derived without adversarial assumptions, and using it is incredibly misleading in an adversarial context.

Let me explain through an example. Imagine that you play a game of craps against an online casino. The casino will throw a virtual six-sided die, secretly generated using Microsoft Excel's random number generator. Your job is to predict the result. If you manage to predict the result 100 times in a row, you win and the casino will pay you $1000000000000 (one trillion dollars). If you fail to predict the result of a throw, you lose and pay the casino $1 (one dollar).

In an ordinary, non-adversarial context, the probability that you win the game is much less than one in one trillion, so this game is very safe for the casino. But this number, one in one trillion, is based on naive assumptions that are completely meaningless in an adversarial context. If your adversary has a decent knowledge of mathematics at the high school level, the serial correlation in Excel's generator comes into play, and the relevant probability is no longer one in one trillion. It's 1 in 216 instead! When faced with a class of sophomore math majors, the casino will promptly go bankrupt.
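For concreteness (assuming, as in the analogy, the attacker only needs to guess the first three throws before the correlation lets them predict the rest):

```python
# Naive vs. adversarial odds from the casino analogy.
naive = (1 / 6) ** 100   # guessing 100 independent fair throws
adversarial = 1 / 6**3   # guess 3 throws, then exploit the serial
                         # correlation to predict the remaining 97
print(naive)             # astronomically below one in one trillion
print(adversarial)       # ~0.00463, i.e. 1 in 216
```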

Q: Aren't these attacks ultimately detectable? Wouldn't I be exonerated by the exculpatory evidence?

A: Maybe. IANAL. I wouldn't want to take that risk. Matching hashes are probably not sufficient to convict you, and possibly not sufficient to take you into custody, but they're more than sufficient to make you a suspect. Reasonable suspicion is enough to get a warrant, which means that your property may be searched, your computer equipment may be hauled away and subjected to forensic analysis, etc. It may be sufficient cause to separate you from your children. If you work with children, you'll be fired for sure. It'll take years to clear your name.

And if they do charge you, it will be in Apple's best interest not to admit to any faults in their algorithm, and to make it as opaque to the court as possible. The same goes for NCMEC.

Q: Why should I trust you? Where can I find out more?

A: You should not trust me. You definitely shouldn't trust the people defending Apple using the claims above. Read the EFF article [7] to learn more about the social dangers of this technology. Consult Apple's Threat Model Summary [5], and the CSAM Detection Technical Summary [6]: these are biased sources, but they provide sketches of the algorithms and the key factors that influenced the current implementation. Read HackerFactor [8] for an independent expert perspective about the credibility of Apple's claims. Judge for yourself.

[1] https://imgur.com/a/j40fMex

[2] https://graphicdesign.stackexchange.com/questions/106260/ima...

[3] https://arxiv.org/abs/1809.02861

[4] https://en.wikipedia.org/wiki/Chosen-plaintext_attack

[5] https://www.apple.com/child-safety/pdf/Security_Threat_Model...

[6] https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...

[7] https://www.eff.org/deeplinks/2021/08/apples-plan-think-diff...

[8] https://www.hackerfactor.com/blog/index.php?/archives/929-On...


You are glossing over how an adversary can generate an image that meets the following requirements:

  a) hashes to the same value as known csam image A with the public NeuralHash algorithm 

  b) has a derivative (e.g. lower res thumbnail) that when processed with a _private_ perceptual hash algorithm also matches known csam image A.
What is your proposal for solving b? For a, it’s possible to iteratively generate NeuralHashes that get closer and closer to the value you are attempting to equal, while that isn’t possible for step b.


…and? Does OP think reviewers will think a picture of a cat is CSAM?


The input image is a parameter, so you could start with some image that would be easier to confuse at the low resolutions the reviewers use. The missing piece in this case is that the database of target hashes is unknown.


Easier to confuse, or even something like a pic of a 20 year old where it's effectively impossible to be sure from the image itself.


Hi, this looks interesting but I have no idea what any of this means lol. Is this some way of hiding a picture within a picture, or am I way off the mark?


Apple began scanning for CSAM with Neuralhash. This allows you to turn an image into a specific neuralhash thus possibly triggering its (automatic) CSAM detection. Imagine if a picture of a cat could cause Apple to think you have CP on your device.


Well, you'd have to do it 30 times to trigger the system, and then someone at apple moderation would look at those 30 pictures of cats and hit "next" vs "supervisor"


Good that there’s some human supervision. But, I know I have more than 30 photos of my dog. Also don’t like the idea of false positives auto-sharing some of my camera roll.


It's only if you back it up to iCloud, the signatures of the CP used as references are rotated, and they're also not public. The chances of you randomly triggering the system is effectively 0 unless you're uploading CP to your iCloud.


Wait, wasn't all the hullabaloo over this scanning not requiring an upload to iCloud anymore?

They're scanning anything you upload to iCloud (and have been for some time) but now also scan everything on your device too.


No. They calculate a hash on the device, but they only do it as part of the iCloud upload. So whether the hashing happens on the device or on the server, the same images get hashed either way.


Photos of your dog are not going to trigger it. Someone would need to engineer the 30 photos of your dog tweaked to hash to a particular value, and then convince you to save them to your device and then upload to iCloud. And then some portion/abstraction of the dog photo would need to convince a reviewer they were looking at CSAM.

The more likely path to trouble is legal NSFW material that's been engineered.


You can be pretty sure they'll report your account if at least 1 low-res thumbnail ("visual derivative") looks like an image of naked people/a sexual act.


You're not trying hard enough on how to bypass the human component methinks.

Use porn as the base images. The more petite, flat and young looking, the better. The moderators are already going to be tuned in to csam, so all you need to do is to give them a slight push.


Oh damn, that’s crazy. Very cool project. Thanks for sharing and the explanation.


What are you trying to prove here? It takes a human not noticing that a glitchy image of a cat is not the same as a picture of a dog, 30 times.

Yeah, collisions are technically possible. Apple has accounted for that. What is your point?

Hashes are at the core of a lot of tech, and collisions are way easier and more likely in many of those cases, but suddenly this is an issue for y'all?


I don't see how this is fixable on their end.

Several people have suggested simply layering several different perceptual hash systems, with the assumption that it's difficult to find an image that collides in all of them. This is pretty suspect - there's a reason we hold a decades-long competition to select secure hash functions. Basically, a function can't generally achieve cryptographic properties (like collision resistance, or difficulty of preimage computation) without being specifically designed for it. By its nature, any perceptual hash function is trivially not collision resistant, and any set of neural models is highly unlikely to be preimage-resistant.

The really tough thing to swallow for me is the "It was never supposed to be a cryptographic hash function! It was always going to be easy to make a collision!" line. If this was such an obvious attack, why wasn't it mentioned in any of the 6+ security analyses? Why wasn't it mentioned as a risk in the threat model?


> First, as an additional safeguard, the visual derivatives themselves are matched to the known CSAM database by a second, independent perceptual hash. This independent hash is chosen to reject the unlikely possibility that the match threshold was exceeded due to non-CSAM images that were adversarially perturbed to cause false NeuralHash matches against the on-device encrypted CSAM database.

https://www.apple.com/child-safety/pdf/Security_Threat_Model...


And that hash function is kept private, so you can’t just iterate to find a match of both hashes.


Correct, gradient descent attacks like this depend on being able to differentiate your hash function.
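A toy version of that attack in numpy (my own sketch, not the collider's actual code: a random linear map stands in for the network, and a hinge-style surrogate stands in for the differentiable approximation of the thresholding):

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((8, 16))     # toy differentiable "feature map"
target = rng.choice([-1.0, 1.0], 8)  # hash bits we want to collide with
x = rng.standard_normal(16) * 0.1    # starting "image"

for _ in range(2000):
    margins = target * (W @ x)       # positive margin = bit already matches
    violated = margins < 1.0         # bits not yet robustly matched
    if not violated.any():
        break
    # gradient step on a smooth relaxation of the thresholded hash;
    # the hard sign() is not differentiable, the hinge surrogate is
    x += 0.05 * (target[violated] @ W[violated])

print(np.array_equal(np.sign(W @ x), target))  # True
```

Without access to gradients of the real hash function (because its weights are private), this loop has nothing to descend on, which is why the secret second hash blocks this particular technique.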


Honestly missed this.

Is that security-through-obscurity? If the model and weights for the second hash function became public, then we could still construct a collision on both functions, right?


It’s not security through obscurity. The implementation is not necessarily a secret; it’s the model configuration which can only be differentiated if known. Since part of the threat is adversarial embeddings, it’s perfectly fine to assume you can keep a private model confidential that adversaries can’t differentiate.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: