Hacker News

> How many human beings do you personally know who were able to solve a dynamic programming problem at first sight without ever having seen anything but greedy algorithms?

Zero, which is why if a trained network could do it, that would be "impressive" to me, given my personal biases.

> If you could get a machine that takes in all of Github and can solve "any" DP problem you describe in natural language with a couple of examples, that is AI above and beyond what many humans can do, which is "awesome" no matter how you put it.

I agree with you that such a machine would be awesome, and AlphaCode is certainly a great step closer towards that ideal. However, I would like a number that measures the "awesomeness" of the machine (not an Elo rating, because that depends on a human reference), so I have a benchmark to refer to when the next improvement arrives.



I understand wanting to look at different metrics to gauge progress, but what is the issue with this?

> not elo rating because that depends on a human reference


The Turing Test (https://en.wikipedia.org/wiki/Turing_test) for artificial intelligence required the machine to convince a human questioner that it was a human. Since then, most AI benchmarks have relied on human performance as the reference point to showcase a method's prowess. I don't find this appealing because:

1) It's an imprecise target: believers can always hype and skeptics can always downplay improvements. Humans can do lots of different things somewhat well at the same time, so a machine beating human-level performance in one field (like identifying digits) says little about other fields (like identifying code vulnerabilities).

2) Elo ratings and similar metrics are measurements of skill, and can be brute-forced to some extent, equivalent to grinding up levels in a video game. Brute-forcing a solution is "bad", but how do we know a new method is "better/more elegant/more efficient"? For algorithms we have Big-O notation, so we know brute force < bubble sort < quick sort; perhaps there is an analogue for machine learning.
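To make the Big-O ordering concrete, here is a toy sketch (my own illustration, not from the comment) that counts comparisons for an O(n^2) bubble sort versus an O(n log n) merge sort on the same input:

```python
import random

def bubble_sort_comparisons(xs):
    """Bubble sort a copy of xs; return the number of comparisons made."""
    a = list(xs)
    count = 0
    for i in range(len(a)):
        for j in range(len(a) - 1 - i):
            count += 1
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return count

def merge_sort(xs):
    """Return (sorted list, number of comparisons made)."""
    if len(xs) <= 1:
        return list(xs), 0
    mid = len(xs) // 2
    left, c1 = merge_sort(xs[:mid])
    right, c2 = merge_sort(xs[mid:])
    merged, count = [], c1 + c2
    i = j = 0
    while i < len(left) and j < len(right):
        count += 1
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, count

random.seed(0)
data = [random.randint(0, 999) for _ in range(512)]
bc = bubble_sort_comparisons(data)        # exactly n*(n-1)/2 comparisons
sorted_data, mc = merge_sort(data)        # roughly n*log2(n) comparisons
print(bc, mc)
```

For n = 512, bubble sort makes over 130,000 comparisons while merge sort needs a few thousand: both produce identical output, so only a cost measure like this, not the result, distinguishes them.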

I would like performance comparisons that focus on quantities unique to machines. I don't evaluate a processor's addition with reference to human addition, so why not treat machine intelligence similarly?

There are many interesting quantities with which we can compare ML models. Energy usage is a popular metric, but we can also compare the structure of the network, the code used, the hardware, the amount of training data, the amount of training time, and the similarity between training and test data. I think a combination of these would be useful to look at every time a new model arrives.


Using my previous chess analogy, the world's smartest chess bot has played a million games to beat the average grandmaster, who has played fewer than 10,000 games in her lifetime. So while they both will have the same Elo rating, which is a measure of how good they are at the narrow domain of chess, there is clearly something superior about how the human grandmaster learns from just a few data points, i.e. strong generalization vs the AI's weak generalization. Hence the task-specific Elo rating does not give enough context to understand how well a model adapts to uncertainty. For instance, a Roomba would beat a human hands down if there were an Elo rating for vacuuming floors.
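This blind spot is visible in the standard Elo update rule itself (a minimal sketch of the usual formula, not anything from AlphaCode): the update depends only on game outcomes, so it never asks how many games, or how much training data, it took to reach a given rating.

```python
def expected_score(r_a, r_b):
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def update_elo(r_a, r_b, score_a, k=32):
    """A's new rating after one game (score_a: 1 win, 0.5 draw, 0 loss)."""
    return r_a + k * (score_a - expected_score(r_a, r_b))

# A 1600-rated player who beats a 2000-rated player gains a large chunk
# of rating, but nothing in the formula distinguishes a human who has
# played 10,000 games from a bot that has played a million.
print(round(update_elo(1600, 2000, 1.0), 1))
```

Both players' ratings move by the same rule regardless of sample efficiency, which is exactly why a task-specific Elo says nothing about strong vs weak generalization.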



