MAI-Thinking-1

keeda · 2026-06-02T19:55:52 1780430152

> Second, clean data. MAI-Thinking-1 was trained on clean and appropriately licensed data, with AI-generated content excluded from pre-training. This matters for quality, provenance, and control. If we cannot account for what shaped a model, we cannot fully understand its behavior or credibly improve it.

Shots fired?

It would be interesting to see how far "clean data" can go on the scaling laws.

foresterre · 2026-06-02T21:20:54 1780435254

I would really like to see what "appropriately licensed data" means. Cannot imagine they didn't copy all open repos on GitHub, and can't imagine they asked for permission, or are reproducing license texts from these repos now. It sounds hand-wavy.

P.S. A fairly basic website otherwise, but it unfortunately seems to be hacking scroll for no good reason.

ralph84 · 2026-06-03T01:27:12 1780450032

Presumably their position remains that training on public repos is fair use and doesn't require a license. If it doesn't require a license it's still "appropriately licensed".

stingraycharles · 2026-06-02T21:24:50 1780435490

I assume they took the actual repos’ licenses into account. I don’t understand why they should ask for permission when the license would already allow for it.

foresterre · 2026-06-02T22:07:28 1780438048

Almost all licenses impose requirements on redistributing copies of the work, or derivatives thereof. Even permissive licenses do. It's very little to ask when open source devs provided thousands of hours of free work.

For example, the Apache 2.0 license requires, in section 4(c) alone:

  You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works;

Just because they're tokenized and transformed into a probabilistic mapping, doesn't suddenly mean that they weren't copied.

I find it unethical that they (likely) just ingest the IP of all open source repos without asking, and, importantly, without any attribution.

Let me also note that I'm not against LLMs in general. But I do think training on open source must be opt-in, and I look forward to a world with actually ethical and traceable models (i.e. with a record of what they were trained on, like a bill of materials (BOM)).

stingraycharles · 2026-06-04T00:55:32 1780534532

But that’s what I meant by taking it into account. They would likely only use BSD and MIT licensed repos, which is a lot.

rocqua · 2026-06-02T21:30:18 1780435818

Which licenses allow usage for training? MIT, BSD, etc likely do. But I would expect it gets weird for all the various copyleft licences.

cortesoft · 2026-06-02T21:56:47 1780437407

Why would it get weird for those?

rzmmm · 2026-06-02T22:02:40 1780437760

Theoretically it mandates that derivative works use the same license, but it's unclear if that applies to LLM outputs.

VortexLain · 2026-06-02T22:34:08 1780439648

Recently, GitHub has changed their terms of service to use all user data for AI training unless users explicitly opt out. This is probably the way Microsoft has obtained "appropriately licensed data".

mattnewton · 2026-06-02T22:53:27 1780440807

this is almost certainly too recent to have been used for training data, no? Unless they optimistically included most repos somehow?

supermdguy · 2026-06-02T21:33:47 1780436027

It's interesting because their last model series (Phi) was based around the thesis that high-quality synthetic data is better than a large pre-training corpus.

vdfs · 2026-06-02T20:21:56 1780431716

I doubt any lab would say otherwise, they all _claim_ to use licensed data

keeda · 2026-06-02T20:41:18 1780432878

Maybe, but Microsoft, through their partnership with OpenAI, is already involved in major copyright lawsuits. That is probably a driving force for this move, actually... I doubt they would want to tempt fate while those lawsuits are on-going.

vanuatu · 2026-06-02T22:05:49 1780437949

all the labs "clean" their pretraining data, and you can keep your pretraining data minimally AI-generated while still spamming synthetic post-training data

swalsh · 2026-06-02T21:45:25 1780436725

I'd assume it's not up to par with Qwen-3.5 then, which has been distilling Claude, and the quality of the model is probably a direct result of that.

onlyrealcuzzo · 2026-06-02T19:59:07 1780430347

I'm interested how much "Clean Data" is synthetic data from "unclean" models...

bicx · 2026-06-02T20:53:58 1780433638

So, laundered data?

ertgbnm · 2026-06-02T20:19:14 1780431554

> with AI-generated content excluded from pre-training.

> without distillation from third-party models

sounds like zero unless they are lying.

zamalek · 2026-06-02T20:31:12 1780432272

> with AI-generated content excluded from pre-training.

Though this is largely impossible these days, unless they pre-trained on pre-AI era data.

stymaar · 2026-06-02T22:16:35 1780438595

That could be. Just use pre-training for language understanding and let the post-training on synthetic data do the heavy lifting.

saghm · 2026-06-02T21:20:19 1780435219

"how many of those shapes are rectangles?" "sounds like zero unless they are squares"

Adding "unless" to a statement makes it vacuous if the latter clause is weaker than the first clause. I find it hard to believe that a company willing to violate licenses would have scruples about lying about it.

rocqua · 2026-06-02T21:35:05 1780436105

Not vacuous, but tautological. Which is different, because tautologies can actually be quite directly informative. Whereas vacuous truths tend to be oblique.

Also, “Microsoft is lying” is not a logically stronger statement, because they might be lying about something other than whether they distilled or trained on AI output.

chongli · 2026-06-02T21:28:53 1780435733

> Adding "unless" to a statement makes it vacuous if the latter clause is weaker than the first clause

I think that's the point. "How do I say they're lying without outright saying they're lying?"

It's a common rhetorical trick.

Leynos · 2026-06-03T08:45:18 1780476318

Or the speaker is just not in the mood to argue with someone whose response will be, "you trust anything Microsoft say?"

xavriley · 2026-06-02T20:01:47 1780430507

“We trained it from the ground up on enterprise grade, clean and commercially licensed data, without distillation from third-party models.”

azinman2 · 2026-06-02T20:13:04 1780431184

aka all of GitHub OSS

rurban · 2026-06-03T04:47:18 1780462038

Not OSS only, likely also the enterprise private repos, with a lot of business secrets.

ChicagoDave · 2026-06-02T21:17:51 1780435071

Yeah this is exactly what I was thinking.

andai · 2026-06-02T22:02:25 1780437745

Interesting. Wasn't their previous attempt (Phi) trained mostly on synthetic data?

__natty__ · 2026-06-02T21:18:04 1780435084

It's good there is a new player on the market, though I take benchmark tables with a grain of salt. As for the model presentation, it's funny to see how clearly their website is inspired by other AI company blogs, with the extra innovation of a hijacked scrollbar.

jampekka · 2026-06-02T21:39:16 1780436356

The benchmarks are a bit of a disaster? It's at about DeepSeek V3.2 level, but with about 50% more parameters. Loses handily to the also smaller GLM-5.1, and even worse to the similarly sized Kimi K2.6.

sailingparrot · 2026-06-02T21:50:27 1780437027

Yes and no. Yes from a user PoV, I don't really see a great reason to use this other than for enterprises that care about using a model not trained on copyrighted data (not sure what the market really is for this anymore, feels like this concern has been forgotten by most customers).

From a strategic PoV for MS, all the models you cited are distilling GPT/Claude/Gemini and wouldn't be anywhere near as good as they are without this distillation, which in turn means you are dependent on OAI/Anthropic/G first shipping a good model to generate data for your training. This MAI model is trained from scratch with no synthetic data or distillation. So in terms of benchmarks it's obviously much harder to get strong scores, and thus not a disaster if they can keep on improving.

usef- · 2026-06-02T21:54:40 1780437280

They claim to not be training to the benchmarks at all. It'll be interesting to see how it stacks up in actual use.

nojito · 2026-06-02T23:10:51 1780441851

No distillation. Comparing it to DeepSeek or GLM doesn't make much sense.

pixeldash928 · 2026-06-02T18:52:17 1780426337

Looks like the OAI divergence is finally taking place. Seems like the comparisons are mainly with Opus 4.6 and GPT 5.4 though. Still, exciting to see a new frontier player.

i_have_an_idea · 2026-06-02T20:35:43 1780432543

Is it a frontier player though, or perhaps a new benchmaxxed model? People were saying similar things about Grok but it ultimately amounted to little.

wasabi991011 · 2026-06-02T20:55:06 1780433706

"preferred by humans over Sonnet 4.6" makes it pretty clearly not benchmaxxed though.

At least when you define benchmaxxed as "good in benchmarks but not human preference".

dude250711 · 2026-06-02T21:24:36 1780435476

Post 4.6 Anthropic models do not exactly have a stellar reputation, so that choice is smart.

Centigonal · 2026-06-02T21:19:59 1780435199

> MAI-Thinking-1 is a 35B-active, ~1T-total parameters, sparse Mixture of Experts model, a smaller inference footprint than much larger models.

This seemingly nonsensical sentence (of course this will have a smaller inference footprint than larger models) suggests this model's competitors have larger inference footprints and total parameter sizes.

dr_kiszonka · 2026-06-03T08:41:43 1780476103

When would a larger model have a smaller inference footprint? If the larger was MoE and the smaller was dense?

Centigonal · 2026-06-03T16:10:04 1780503004

yes, MoE reduces the inference compute requirements (inference memory reqs remain the same)

rajveerb · 2026-06-04T03:13:37 1780542817

As someone who has spent quite a lot of time on inference, I would add a small note:

Deployment looks very different for MoE than dense style models so I would say that it is more nuanced than "inference memory reqs remain the same". Memory can be very different for MoE style models.
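
A rough back-of-the-envelope sketch of the compute/memory split discussed in this subthread. It assumes bf16 weights (2 bytes per parameter) and the common rule of thumb of roughly 2 FLOPs per active parameter per token for a forward pass. The function name is made up for illustration, and KV cache, attention cost, and how experts are placed across devices are all ignored, which is exactly where the nuance above comes in.

```python
def moe_inference_estimate(total_params, active_params, bytes_per_param=2):
    """Return (weight_bytes, flops_per_token) for a sparse MoE model.

    Assumptions (not from the announcement): bf16 weights and ~2 FLOPs per
    active parameter per token; KV cache and attention cost are ignored.
    """
    # All experts must be resident somewhere, so weight memory tracks TOTAL parameters.
    weight_bytes = total_params * bytes_per_param
    # Only the routed experts run for a given token, so compute tracks ACTIVE parameters.
    flops_per_token = 2 * active_params
    return weight_bytes, flops_per_token


# Figures quoted in the thread: ~1T total parameters, 35B active.
weight_bytes, flops = moe_inference_estimate(10**12, 35 * 10**9)
# Hypothetical dense model with the same total parameter count, for comparison.
_, dense_flops = moe_inference_estimate(10**12, 10**12)

print(f"weights: {weight_bytes / 1e12:.1f} TB")
print(f"MoE FLOPs/token: {flops:.2e}")
print(f"dense FLOPs/token: {dense_flops:.2e}")
```

Under these assumptions the weight footprint is set by the total parameter count while per-token compute is set by the active count; real deployments shift the memory picture further through expert parallelism and batching.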

Alifatisk · 2026-06-02T21:14:57 1780434897

> MAI-Thinking-1 is built with enterprise readiness in mind. It supports long context with a 256k token window

Isn’t 1M becoming the norm?

vb-8448 · 2026-06-02T21:32:38 1780435958

1M is only marketing; in my experience quality noticeably drops above 150k.

Claude Code will suggest that you start a new session or compact if you go above 100k.

Bolwin · 2026-06-03T15:34:55 1780500895

In my experience above 60k quality noticeably drops.

30k for open source models

stingraycharles · 2026-06-02T21:26:07 1780435567

Yes it is, but I can imagine that they want to start out a bit smaller to see how well things scale, and/or did not yet have the time to work on optimizing for the large context windows.

droidjj · 2026-06-02T21:28:12 1780435692

I struggle to get quality results from the frontier models at contexts > 256k anyway.

stingraycharles · 2026-06-02T21:58:44 1780437524

Yup, same experience, it’s because attention basically has quadratic complexity in the context length. So at large context windows, they need to compress the attention (e.g. group multiple tokens together), which then leads to a loss in accuracy.

It’s almost always better to keep your context windows small.
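
A minimal sketch of why long contexts get expensive, assuming plain full self-attention, where every token attends to every other token and the score matrix has one entry per query-key pair (per head, per layer). The function names are illustrative, and "256k" and "1M" are taken as 256,000 and 1,000,000 tokens. Causal masking roughly halves the count but leaves the scaling unchanged.

```python
def attention_score_entries(context_len):
    """Number of query-key pairs in one full (uncompressed) attention matrix."""
    return context_len * context_len


def relative_cost(longer, shorter):
    """How many times more score entries the longer context needs."""
    return attention_score_entries(longer) / attention_score_entries(shorter)


# Going from a 256k window to a 1M window is ~3.9x more tokens,
# but the number of pairwise scores grows with the square of that factor.
print(relative_cost(1_000_000, 256_000))
```

This is the cost that sparse or compressed attention schemes try to avoid, at the price of no longer scoring every pair exactly.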

aesthesia · 2026-06-03T05:44:40 1780465480

What's interesting is that although they don't seem to be releasing the model weights, they have published a technical report (https://microsoft.ai/wp-content/uploads/2026/06/main_2026060...) that's more extensive than the typical open-weights model gets.

dang · 2026-06-02T22:07:33 1780438053

Related ongoing thread:

MAI-Code-1-Flash - https://news.ycombinator.com/item?id=48374466 - June 2026 (131 comments)

BeetleB · 2026-06-02T20:39:24 1780432764

Based on the first table, why would I pick this over GLM?

missedthecue · 2026-06-02T21:00:02 1780434002

Because your employer might make you exclusively use enterprise copilot.

BeetleB · 2026-06-02T21:30:11 1780435811

As long as my employer is footing the bill, fine.

For personal stuff this release is not noteworthy.

deflator · 2026-06-04T13:38:21 1780580301

Does this mean that work created with it can be copyrighted? Since the courts ruled that the inclusion of pilfered IP was the reason other models' work cannot be copyrighted, I would think so! In that case, this is a completely different beast. It can maybe be used for things that need a durable copyright.

lordmauve · 2026-06-02T20:06:46 1780430806

We need to see DeepSWE scores. SWE Bench Pro is junk.

hartator · 2026-06-02T20:56:30 1780433790

I like it so much when a website hijacks the way my scroll works. This is truly innovative.

campital · 2026-06-02T22:06:51 1780438011

Yeah, you might get disoriented and throw up if they didn't smooth it out.

wmf · 2026-06-02T20:11:27 1780431087

At least there shouldn't be any complaints about benchmaxing this time.

i_have_an_idea · 2026-06-02T20:37:36 1780432656

Just because it is performing rather poorly by comparison, it doesn’t mean it isn’t benchmaxxed. It can still be worse than it appears.

wasabi991011 · 2026-06-02T20:56:25 1780433785

It isn't benchmaxxed because they are using human preference as an evaluation.

bossyTeacher · 2026-06-02T20:03:28 1780430608

7 models launched. 5 models in the dropdown. Only 4 actually usable :(

About time Microsoft joined the fray. After the OpenAI divorce, it really looked like Microsoft was going to become another Uber.

giancarlostoro · 2026-06-02T20:11:25 1780431085

They still own 27% of OpenAI, this IPO will feed them a lot of easy cash.

adt · 2026-06-02T22:25:42 1780439142

https://lifearchitect.ai/models-table/

kstenerud · 2026-06-02T20:19:51 1780431591

They've hijacked scrolling. They've hijacked the spacebar. It flickers like crazy when I try to move through the article. Trying to get through it is an exercise in madness.

t-sauer · 2026-06-02T20:35:53 1780432553

I do not understand how scroll hijacking is still a thing. Who thinks this is a better experience?

maelito · 2026-06-02T20:43:09 1780432989

Designers.

bensyverson · 2026-06-02T22:19:19 1780438759

As a designer, let me tell you: scroll jacking is not good design

AirMax98 · 2026-06-02T20:23:44 1780431824

I normally don't comment on matters of taste like this, but wow this is brutal. It's like someone threw the site in a vat of molasses.

grassfedgeek · 2026-06-02T21:02:21 1780434141

Even without flicker it is very distracting. Why do people think this is a good idea?

aniceperson · 2026-06-02T20:37:23 1780432643

there is also a gap between the header and the top of the page... they should ask the ai to make it better a few more times...

blisstonia · 2026-06-02T20:52:26 1780433546

I gave up after the first scroll.

vcryan · 2026-06-02T20:59:29 1780433969

It really looks like they used Claude to design this webpage. I guess the color taupe is the marker of good AI today.

Handy-Man · 2026-06-02T21:10:52 1780434652

Inflection AI

basilikum · 2026-06-03T01:16:37 1780449397

Why is microsoft.ai hosted on an ASN called WPEngine and not by Microsoft themselves?

kaicianflone · 2026-06-02T21:25:53 1780435553

Is that a pretext zoom effect when changing screen dimensions? Very cool.

euphetar · 2026-06-02T23:03:02 1780441382

Honestly, a lame release of mediocre models.

I was most excited about the "frontier tuning." Like, it will actually watch you do stuff and learn to do it for you? That would be actually interesting.

But no, it's just a data labelling interface: https://learn.microsoft.com/en-us/microsoft-365/copilot/copi.... You have to provide the instruction and give feedback, and there is a whole UI with an hour-long wait between steps. So basically they want you to do the labelling to train a model, or at least that's how it looks from the outside.

Also the mission statement of Humanist AI is the most boring, but tries to sound way too grand. Like "all the cool labs have a mission statement, so we should also have one" vibes

gigatexal · 2026-06-02T21:31:09 1780435869

Anyone believing those benchmark numbers from a 35B model?

jeffdn · 2026-06-02T21:34:13 1780436053

It says right at the top, 35B active, 1T total.

simjnd · 2026-06-02T18:42:18 1780425738

Absolutely disgusting scroll jacking, even when "Accessibility mode" is turned on

dang · 2026-06-02T19:38:05 1780429085

I'm sure most of us agree, but:

"Please don't complain about tangential annoyances—e.g. article or website formats, name collisions, or back-button breakage. They're too common to be interesting."

https://news.ycombinator.com/newsguidelines.html

simjnd · 2026-06-02T20:39:22 1780432762

Forgot about this, my bad!

throwawayffffas · 2026-06-02T23:08:08 1780441688

Meh, 1T parameters no weights? I am running a better model right now on 40GB of VRAM.