No, you are just oversimplifying the issue. Of course, if you were regularly riding before and very fit and change to an e-bike out of laziness, the net effect might be negative, tho even that part is not conclusively proven.
But if you found biking way too exhausting, maybe living in a hilly area, riding an e-bike is ten times better than doing nothing. Would it be even better to ride a non-e-bike? Maybe. Would it happen? Probably not.
we're now in a world where the youth of developed countries are way more overweight than they were before, because of a radical shift in diet and exercise
upon reading that using electric bikes (that are basically mopeds disguised as bikes, i.e. which do not induce "real" physical activity due to assistance from the motor) are the healthiest thing to ever happen to teenagers, I have the feeling that the title is being a bit over the top
I'm not even talking about being fit or anything, just that actual bike riding (normal bikes) or just walking maybe would be actual good news. I understand that people in hilly areas benefit from ebikes, but is this the majority of the people mentioned here? or is it just that we're all like "well, I guess it's better than staying inside all the time eating doritos"? That's why I'm saying that this is a sign that we've given up, we're counting this as some sort of win, which I don't think it is
> upon reading that using electric bikes (that are basically mopeds disguised as bikes, i.e. which do not induce "real" physical activity due to assistance from the motor) are the healthiest thing to ever happen to teenagers, I have the feeling that the title is being a bit over the top
Maybe you should read more than the title then? Like, say... the article? Just a suggestion.
that line of "someone says something I don't agree with so I will insinuate they didn't read the article" is tired, seriously
an excerpt from the article you pretend I didn't read (which I did)
But let’s be honest: even throttle-only riding is more active than sitting on a couch
the article is literally saying that
it's also saying things about mental well-being, the sense of community, which I can get behind of course, but there's some sense of a race to the bottom in our expectations
> that line of "someone says something I don't agree with so I will insinuate they didn't read the article" is tired, seriously
Except... you clearly didn’t read it, or you read it and didn’t understand it. I actually gave you the benefit of the doubt by assuming it was the former.
> the article is literally saying that
The article is literally saying a lot more than just that.
You cherry picked one point out of a multi-faceted discussion about health benefits, ignored the rest, and then acted like that narrow takeaway was the entire message. Pro tip: “health” doesn’t just mean physical exertion.
So again, either you didn’t read the article, or you didn’t get it.
> it's also saying things about mental well-being, the sense of community, which I can get behind of course
Cool. But no one’s asking whether you can “get behind it” and nor does anyone care. The point is those are part of what makes it beneficial.
Being daft on purpose? I haven't heard that using an alternative browser suddenly increases the traffic that a user generates by several orders of magnitude to the point where it can significantly increase hosting cost. A web scraper on the other hand easily can and they often account for the majority of traffic especially on smaller sites.
So your comparison is at least naive assuming good intentions or malicious if not.
Double-blind review is a mirage that does not hold up. While I was in academia I reviewed a paper that turned out to be a blatant case of plagiarism. It was a clear Level 1 copy according to the IEEE plagiarism levels (Uncredited Verbatim Copy of more than 50% of a single paper). I submitted all of these findings with the original paper and what parts were copied (essentially all of it) as my review.
A few days later I got an email from the author (some professor) who wanted to discuss this with me, claiming that the paper was written by some of his students who were not credited as authors. They were inexperienced, made a mistake, yaddah yaddah yaddah. I forwarded the mail to the editors and never heard about this case again. I don't expect that anything happened; the corrective actions for a level-1 violation are pretty harsh and would have been hard to miss.
The fact that this person was able to obtain my name and contact info shattered any trust I had in the "blind" part of the double-blind review process.
The other two reviewers had recommended accepting the paper without revisions, by the way.
This seems like an issue of administration rather than an issue with the idea of a double-blind review. If you conduct a review that isn't properly blinded, and doesn't have an observable effect, can it really be called a double blind review?
Maybe it's more that a non-idealistic model of the real world, and common direct experience, show that incentives strongly favor an administrative approach that compromises the double blind.
Unless there's a better way to do it, I think this shows a need for better structures for governance and auditing of review boards... Information and science care not for our human folly, it's up to us to seek and execute them properly.
I remember attending ACL one year, where the conference organisers ran an experiment to test the effectiveness of double blind reviews. They asked reviewers to identify the institution that submitted the anonymised paper. Roughly 50% of reviewers were able to correctly identify the institutions. I think there was a double digit percentage of being able to predict authors.
The organisers then made the argument that double blind was working because 50% of papers were not identified correctly! I was amazed that even with strong evidence that double blind was not working, the organisers were still able to convince themselves to continue with business as usual.
You're saying "not working" when you only have presented evidence for "not perfect".
That experiment showed that even when asked to put effort into identifying the source of an anonymized paper—something that most reviewers probably don't put any conscious effort into normally—the anonymization was having a substantial effect compared to not anonymizing the papers.
Am I missing some obvious reason why double-blind reviews should only be attempted if the blinding can be achieved with a near-perfect success rate, or are you just setting the bar unreasonably high?
The subtext to this whole comment chain is that you need to have hands-on experience with qualitative to quantitative conversions if you want to reason about the scientific process.
> Am I missing some obvious reason why double-blind reviews should only be attempted if the blinding can be achieved with a near-perfect success rate, or are you just setting the bar unreasonably high?
OP thinks you are looking at either signal or noise, instead of determining where the signal begins for yourself.
When we criticize without proposing a fix or alternative, we promote the implicit alternative of tearing something down without fixing it. This is often much worse than letting the imperfect thing stand. So here's a proposal: do what we do in software.
No, really: we have the same problem in software. Software developers under high pressure to move tickets will often resort to the minor fraud of converting unfinished features into bugs by marking them complete when they are not in fact complete. This is very similar to the minor fraud of an academic publishing an overstated / incorrect result to stay competitive with others doing the same. Often it's more efficient in both cases to just ignore the problem, which will generally self-correct with time. If not, we have to think about intervention -- but in software this story has played out a thousand times in a thousand organizations, so we know what intervention looks like.
Acceptance testing. That's the solution. Nobody likes it. Companies don't like to pay for the extra workers and developers don't like the added bureaucracy. But it works. Maybe it's time for some fraction of grant money to go to replication, and for replication to play a bigger role in gating the prestige indicators.
> This is very similar to the minor fraud of an academic publishing an overstated / incorrect result to stay competitive with others doing the same.
I completely disagree.
For one, academic standards of publishing are not at all the same as the standards for in-house software development. In academia, a published result is typically regarded as a finished product, even if the result is not exhaustive. You cannot push a fix to the paper later; an entirely new paper has to be written and accepted. And this is for good reason: the paper represents a time-stamp of progress in the field that others can build off of. In the sciences, projects can range from 6 months to years, so a literature polluted with half-baked results is a big impediment to planning and resource allocation.
A better comparison for academic publishing would be a major collaborative open source project like the Linux kernel. Any change has to be thoroughly justified and vetted before it is merged because mistakes cause other people problems and wasted time/effort. Do whatever you like with your own hobbyist project, but if you plan for it to be adopted and integrated into the wider software ecosystem, your code quality needs to be higher and you need to have your interfaces speced out. That's the analogy for academic publishing.
The problems in modern academic publishing are almost entirely caused by the perverse incentives of measuring academic status by publication record (number of publications and impact factor). Lowering publishing standards so academics can play this game better is solving the wrong problem. Standards should be even higher.
Seems to me that the review worked, you caught the plagiarism, even though the other two missed it. It's disturbing that somehow the paper author found your contact information though!
I think the complaint is that you only get told that you require a paid plan AFTER signing up. At least from a brief look at the Play Store page and your website, it is not mentioned prominently anywhere.
That seems like a very dark pattern and is, honestly, pretty scummy.
This is not a dark pattern, it's just a constraint that the app stores place on the pricing disclosure that is very non-intuitive. You have to mark your app as "free" to download if you charge a recurring subscription fee. You can only mark it as paid if there is a one time fee to download the app.
Our FAQ and pricing pages all list that it is a paid-only app. All of our ads explain that it's subscription based. We're very transparent about it with anyone who asks. If there's somewhere else where you think we can list it to make it more clear I'm happy to add it, just not sure where that would be.
What FAQ and pricing pages? Your website makes no mention of pricing at all.
Edit: The "dark" pattern is in the registration flow. It doesn't mention that the app requires a subscription anywhere until after you've created an account. Surely you could add a disclaimer before creating your account? This has nothing to do with the App Store.
Edit 2: I'm not saying you intended to implement a dark pattern. Just perhaps a UX oversight.
Edit 3: The download page would be another great place to put this info, since that's the primary CTA on the home page (there's 4 prominent download buttons).
Looks like we used to have it in the description on the app store along with the FAQ but a team member made the decision to remove it because of complaints about it being inconsistent with the way Apple was localizing pricing to different currencies and regions.
We can't hit people with the paywall before they've registered because we need to assign the trial to their user record. We've tried adding more language during onboarding but no one reads any of it, they just click through.
You're mistaking challenges in building a global app for malicious intent. I left a job paying a lot more to do this because I wanted to help people.
We'll add something back to the FAQ on this, thank you all for pointing it out.
I don't see any mention of the price in the FAQ[0], which I had to guess the url of because it doesn't have a link anywhere on the homepage. Trying to guess the url of the pricing page doesn't yield any results.
With LLMs I am in this strange place where I read all of these amazing things that they supposedly all can do. And I have no doubt in my mind that they probably can. And then I ask ChatGPT a simple question such as:
Here is a logic puzzle that I need some help solving: Samantha is a girl and has two brothers and four sisters. Alex is a man and also one of Samantha's brothers. How many brothers and sisters does Alex have? Assume that Samantha and Alex share all siblings.
And I get back a very well written, multi-step response that leaves no doubt in anyone's mind that:
To solve this logic puzzle:
Samantha has 2 brothers and 4 sisters.
This means there are 7 children in total (Samantha, her 2 brothers, and her 4 sisters).
Alex is one of Samantha's brothers. Since Samantha and Alex share all siblings, Alex has:
1 brother (the other brother besides himself).
4 sisters.
Final Answer:
Alex has 1 brother and 4 sisters.
Maybe it's like with Apple and I am using it wrong.
To get back to the "intern" comparison: I could usually tell when an intern was struggling, there just were human telltale signs. When AI is wrong, it still presents its results with the confidence of someone who is extremely deep in the Dunning-Kruger hole but can still write like a years-long expert on the topic.
One of the tricks of using AI effectively is getting a good feel / mental model of it. And that mental model is in fact extremely weird, because you have this thing, which quacks like a human, has an immense (superhuman) amount of knowledge, and can read/write very quickly (tokens / s), but at the same time, logic wise, is often extremely dumb, and is happy to bullshit with full confidence.
What I've learned is that it's good primarily for tasks like the following:
- Tasks which take time to do, but are then easy to verify.
- Tasks which effectively boil down to translating something from one format to another. Which might e.g. be "read this technical document and implement it in code, as for style, look at these sample code files as a reference”.
- Tasks which are about exploring unknown unknowns. E.g. I write down a design, and then I ask the AI to roast it. The point is not that all the points it'll make are good and I need to please the AI, it's that out of the 20 points it will list, 2-3 might both make sense and be things I hadn't thought of myself.
Finally, AI requires good writing skills, and asking questions in an unbiased way, otherwise the AI will gladly hallucinate to reinforce your bias.
Logic exercises which are easy to verify are a moderately good fit for "reasoning models" which will go through many iterations of an LLM and basically write out the whole reasoning process. In practice though, this can be very expensive to get good results with.
Have you tried asking it to do something useful rather than ask it to solve gotcha word problems?
> I read all of these amazing things that they supposedly all can do.
You seem to be implying people are confused (or lying?) about the things they are able to get LLMs to do.
If you give it an honest effort to solve some real problems you are facing then you may be able to speak with more authority. Often it comes down to prompting skill. Try to read about different prompting approaches as that may help you.
In general, you need to be specific about what you need, and you need to give all relevant details. Like the post author said, treat it like a junior programmer or an intern.
> Have you tried asking it to do something useful rather than ask it to solve gotcha word problems?
What you call a "gotcha word problem", I'd compare to typical math problems where you need to understand a text, extract the required information, solve the issue, and then present your results. Maybe this is a toy example, but compared to reading the specs of some microprocessors, this is rather easy. These AIs are apparently able to solve school or even college level math problems. Shouldn't my example be a walk in the park, then? Especially since it's a large LANGUAGE model?
> You seem to be implying people are confused (or lying?) about the things they are able to get LLMs to do.
I am merely stating observations and was hoping for an explanation. What good does it do me if I accuse people of lying?
> Often it comes down to prompting skill. Try to read about different prompting approaches as that may help you.
"You are using it wrong" it is, then. So how do I spot a good-sounding but wrong answer, whether it came about due to my apparent lack of prompting skills or something else? They all sound equally good, it just starts "being wrong" at some point.
> In general, you need to be specific about what you need, and you need to give all relevant details.
What details should I have added in the given example? The prompt was probably more comprehensive and detailed than if this task was given in primary school.
> Like the post author said, treat it like a junior programmer or an intern.
I would, if it acted like a junior programmer or like an intern. For them, you can usually see if they are unsure or making things up (if they do these things). For an AI I've yet to see something like "hey, I might be wrong about this, but this is my best effort, maybe we can have a look together."
I just copy/pasted your exact logic puzzle into Claude-3.5-sonnet and it solved it right away. Here is the response:
Let me help you solve this step by step.
First, let's identify what we know about Samantha:
Samantha has 2 brothers
Samantha has 4 sisters
We also know that:
Alex is one of Samantha's brothers
Alex and Samantha share all siblings
Now, from Alex's perspective:
Alex is one of Samantha's brothers, so he has 1 other brother (since Samantha has 2 brothers total)
Alex has Samantha as a sister, plus her other 4 sisters
So Alex has 5 sisters total (Samantha + her 4 sisters)
And Alex has 1 brother (the other brother besides himself)
Therefore, Alex has:
1 brother
5 sisters
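For anyone who wants to check the answer mechanically rather than trust either model, here is a minimal sketch that just enumerates the family and counts. The names other than Samantha and Alex are placeholders I made up; the puzzle does not name the other siblings.

```python
# Family from the puzzle: Samantha (girl) has 2 brothers and 4 sisters.
# Only "Samantha" and "Alex" are named in the puzzle; the rest are hypothetical labels.
family = {
    "Samantha": "F",
    "Alex": "M",
    "Brother2": "M",
    "Sister1": "F",
    "Sister2": "F",
    "Sister3": "F",
    "Sister4": "F",
}


def count_siblings(person):
    """Return (brothers, sisters) of `person`, assuming everyone shares all siblings."""
    others = [sex for name, sex in family.items() if name != person]
    return others.count("M"), others.count("F")


print("Samantha:", count_siblings("Samantha"))
print("Alex:", count_siblings("Alex"))
```

The only trick in the puzzle is that Samantha is not in her own sister count but is in Alex's, which is exactly what excluding `person` from the tally captures.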
why not ask it to produce said program, and then evaluate it on that output, rather than proxy it via asking a logic puzzle?
It's like doing a coding interview when hiring an employee with these brain teaser puzzles, and when they fail you disqualify them, rather than asking them to do a real task that would be something they'd encounter on the job.
I have. I was feeling lazy one day and used it to write a small python script to graph some data.
The back and forth wasn’t fun and it flat out refused to use seaborn for some reason, but it worked and was fine overall.
I then used aider+claude to help me work with yjs. Led me down a rabbit hole based on an incorrect description of the yjs sync protocol. Took 2 days to untangle everything. Yjs is fairly new though, so I didn’t fault it too much.
I then tried using it for work to deal with some surprisingly intricate back button logic. Again, incorrect understanding (on both our parts) of the underlying API caused a few days of headache. I would’ve been better off just reading the docs than trying to use an AI assistant.
Using AI actually frustrated me to the point where it convinced me to suck it up and just read the Specs and sources of the tools I’m using. I’ve been doing that for a few months now and just RTFM is better for me than AI assistants have been.
Been on this train for a while, but have had some success getting AI assistants to RTFM for me and tell me what it says, always making them cite where they are getting their info. It's only slightly better than just grepping, I suspect. I am also extremely suspicious of people's claims of wild success. It's very easy to do some experimentation with it over the course of a few months and see where the pain points are, and it's exactly what you described - if I'm not a domain expert and I lean into these tools, I can definitely prototype stuff more quickly than without, but when you run into these "misunderstandings" all the gains are tossed right out of the window, with the additional frustration on top of that (which limits the ability to troubleshoot). And if I am a domain expert - why would I need these tools at all? They're not likely, at least at this point, to do much to accelerate you. In the one or two areas where I consider myself an expert, they are positively a hindrance, at least in their current state.
How often do you need to solve a problem like that in practice? Try asking it something that actually comes up in your day-to-day and you might have a different experience.
I asked ChatGPT last year to give me the top five breakfast restaurants in Houston according to Reddit, sources included, no fake links. It gave me three or four real restaurants and one fake one. All (all!) of the links were fake.
I do this every time I'm in a different city. Most of the data that I use is from the last three years.
Just tried it and ChatGPT’s 4o model got it wrong like you said until I added the line “think step by step”, which is always a good idea for these kinds of logic puzzles.
ChatGPT o1 got the answer correct with no tweaks to the prompt.
The question you've got to ask yourself is what does that show? What would it mean to you if it got that right? Would you question your priors and deeply think about how to make use of LLMs despite their shortcomings, or are you just trolling and will just wait for the next Rs in strawberry question to come out for you to stochastically parrot?
I mean, an LLM's ability to solve a logic puzzle seems like a weird way to gauge how it could be USEFUL to you (even if it otherwise would be a good test of how 'intelligent' the LLM is).
Do you search Google or Reddit and wish you could just 'get the answer' instead of wading into pages/posts?
Do you compare two long documents together and not want to invest a few hours into a close reading of them?
Do you write code that consists of trivial functions or trivial text manipulation?
Do you want a 3-hour podcast summarized into a few bullet points for a particular audience?
Do you want to send a saucy limerick to your friend on their birthday?
Do you want to compare Kant's view on <topic> with <new_metaphysical_school_of_thought>?
Do you want to analyze 250k rows in an Excel file of user support tickets and summarize the top issues?
etc etc etc.
Totally fine if you don't do any of these things, but these are the things most people are using LLMs for.
> Maybe it's like with Apple and I am using it wrong.
Well, yeah. You are. It's built to answer questions people actually ask, not solve new logic puzzles.
Despite what the marketing says, it's not a perfect-infinite-knowledge oracle. You should think of it more as a really, really big database with all of the Internet's "knowledge". When you ask it "2 + 2 = ?", it isn't parsing those into numbers and math operations, it's searching its database for occurrences on the Internet where someone answered the question "2 + 2 = ?" and filling in the closest answer it found. If you ask it what "120938120938120931 + 1209389120381208390" is, it'll probably get it wrong, because no one has asked that before. But you should probably be using a calculator instead.
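To make the "use a calculator" point concrete: exact integer addition is a deterministic job for ordinary code. A minimal Python sketch with the two numbers from above (the variable names are just mine):

```python
# Python integers have arbitrary precision, so this sum is computed exactly,
# with no rounding and no guessing.
a = 120938120938120931
b = 1209389120381208390

total = a + b
print(total)
```

This says nothing about how an LLM arrives at its answer internally; it only shows that the task has a cheap, exact alternative.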
If you ask it something it hasn't seen before such as your logic puzzle, it's not going to parse it like a person would and synthesize an answer. It's going to try to find something similar to what it's seen before and return that. Odds are good this will be a wrong answer, since it's not addressing what you actually asked.
However, if you ask it something it has seen before, like a programming problem, it will return something appropriate. It turns out the Internet is pretty big, so it has seen a lot of stuff, and so often works pretty well. Hence the success you're seeing from others who are using it as-intended, i.e., asking it real questions, not logic puzzles.
It's not very hard to come up with a scenario that has never been put on the Internet, so it's pretty easy to make it dig up the "wrong" answer and do something stupid, as you've found.
The real trouble is that it can't tell you whether it's guessing, or found an actual match. Hence the "confidently wrong" thing, which absolutely destroys user trust. If it's confidently wrong about this thing I know a lot about, how can I trust it to be accurate for something I know little about?
Ok, maybe logic puzzles are "unfair", but there are other situations where you ask it a question that is similar (but different in an important way) to a problem it was trained on, and then it will provide the answer to what it "thinks" you asked, not what you actually asked. Hence, hallucinations.
Oh yeah it's got all sorts of problems. To be honest, I've personally never found it terribly useful. But, I find the "it can't answer this basic math/logic problem" criticism really dull. It's not built for that.
> Well, yeah. You are. It's built to answer questions people actually ask, not solve new logic puzzles.
Isn't the causality inverted here? It's trained on questions people have asked before, so that's what it's better at. New logic puzzles illustrate this flaw
Maybe it’s beside the point, but ChatGPT gives this now
> ”First, list out Samantha’s siblings explicitly:
• Brothers (2 total): Alex + one other brother
• Sisters (4 total): Samantha + three other sisters
Since Alex shares all siblings with Samantha, let’s see it from Alex’s perspective. Alex himself is one of the 2 brothers. Therefore, from Alex’s point of view:
• He has 1 other brother (the second brother besides himself).
• He has 4 sisters (including Samantha).
Thus, Alex has 1 brother and 4 sisters.”
This just moves the goalpost. I’m sure someone can give the next example where it fails. I find it useless as well, but at the same time it really feels like criticizing a talking dog about their lack of understanding.
> Why would I want a clock that is, on average, a half minute off?
Because in 99.9% of the cases I don't care about the seconds, it takes away space in the top status bar, and the constant changing of seconds in the top-left corner of the screen is distracting. And for the remaining 0.1% of cases, there is the clock app that shows seconds.
What benefit do you gain in daily life by having the time down to the second? The argument "so it's not half a minute off on average" seems a bit self-referential.
> What benefit do you gain in daily life by having the time down to the second?
I commute by public transport and am sometimes cutting it fine, so knowing whether it is hh:mm:05 or hh:mm:55 does make a difference in how much I have to hurry up sometimes.
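The "half a minute off on average" figure can be sanity-checked with a quick sketch. It assumes the minutes-only clock simply drops the seconds (no rounding) and that you glance at it at a uniformly random second; both are my assumptions, not something stated above.

```python
from statistics import mean

# Assumption: the display truncates, so at hh:mm:ss it shows hh:mm
# and lags the true time by ss seconds.
lag_by_second = list(range(60))  # possible lags: 0..59 seconds

average_lag = mean(lag_by_second)
print("average lag in seconds:", average_lag)

# The two cases from the commute example: hh:mm:05 versus hh:mm:55.
print("lag at :05 ->", lag_by_second[5], "s; lag at :55 ->", lag_by_second[55], "s")
```

Both glances show the same hh:mm on a minutes-only clock, even though the actual times are 50 seconds apart.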
Cool story bro. Let me tell the guys at work that we should just run .NET on our multi-core digital signal processors.
And we should also teach them not to get their delicate timing out of whack because they'll immediately stop dropping incoming samples if any of them gets interrupted by a debugger.
I'm sure I want it. Unfortunately my house is too close to the center of my backyard.
My neighbours are too close too. Any stray parts would cut into the proceeds of ticket sales to the bunker at the center where you can watch live streams of the parcels hitting on top of you.
It wouldn't. Small objects, like satellites, decelerate to terminal velocity. The impact of one of those would be the same as if they fell from an aeroplane.