





[2011.04.28] The science of science

2011-5-5 08:27 | Posted by: Somers | Views: 10113 | Comments: 12 | Original author: 胖白兔

Summary: How to use the web to understand the way ideas evolve


Apr 28th 2011 | from the print edition










from the print edition | Science and Technology

This translation was provided by the translator 胖白兔.







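Quote BTnuts 2011-5-5 16:47
Last edited by BTnuts on 2011-5-5 16:56

For example, “Big Bang” and “black hole” often will co-occur, but not as often as each does with “galaxy”. Suggested translation: 'For example, "Big Bang" and "black hole" often appear together, but each of these two terms appears more readily together with "galaxy".'

This captures the intuition that the first three terms, but not the fourth, are part of a single topic. Suggested translation: 'This situation gives us the sense that the first three terms, but not the fourth, belong to some common topic.'

Of course, much depends on how narrow you want a topic to be. Suggested translation: 'Of course, this depends largely on how precise you want your topic to be.'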

Quote 天各一方 2011-5-5 21:48
He starts with defining topics as sets of words that tend to crop up in the same document.
                ~ Just a few presumptuous remarks, offered in the spirit of discussion; please forgive anything I got wrong ~
Quote aubreychen 2011-5-5 21:51
COMPUTER scientists have long tried to foist order on the explosion of data that is the internet.
In this sentence, "that" refers to the whole preceding clause, "COMPUTER scientists have long tried to foist order on the explosion of data".
Quote astrolanguage 2011-5-7 13:17
Reply to aubreychen's post

If we follow your reading, "COMPUTER scientists have long tried to foist order on the explosion of data" would be the subject of a subject clause. In that case "that" would have to come at the start of the sentence, giving: That COMPUTER scientists have long tried to foist order on the explosion of data is the Internet.
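Quote aubreychen 2011-5-7 13:45
Reply to astrolanguage's post

Right. I originally thought "that" referred to the whole preceding sentence. Looking at it again, it refers to "explosion of data": this explosion of data is the internet. The sentence is not talking about the data on the internet; rather, the explosion of this data is itself the internet. It is indeed a subordinate clause, but not a relative clause modifying "data"; its head is "explosion".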
Quote astrolanguage 2011-5-7 14:09
Reply to aubreychen's post

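Quote 胖白兔 2011-5-7 15:16
Quote 胖白兔 2011-5-7 15:32
Last edited by 胖白兔 on 2011-5-7 15:33

Reply to BTnuts's post

For example, “Big Bang” and “black hole” often will co-occur, but not as often as each does with “galaxy”. Suggested translation: 'For example, "Big Bang" and "black hole" often appear together, but each of these two terms appears more readily together with "galaxy".'
My view: "not as often as" means "not often" and carries a negative sense, while "each" refers to "Big Bang" or "black hole".

This captures the intuition that the first three terms, but not the fourth, are part of a single topic. Suggested translation: 'This situation gives us the sense that the first three terms, but not the fourth, belong to some common topic.'
My view: "captures the intuition" is a single unit meaning intuitive cognition. Clearly "are part of a single topic" modifies "captures", and "are" is plural. "This", being singular, refers to this act of capturing.

Of course, much depends on how narrow you want a topic to be. Suggested translation: 'Of course, this depends largely on how precise you want your topic to be.'
I left "how narrow" out of my translation here; rendering it as "精确" (precise) is more appropriate.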
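Quote BTnuts 2011-5-7 16:29
For example, “Big Bang” and “black hole” often will co-occur, but not as often as each does with “galaxy”. Suggested translation: 'For example, "Big Bang" and "black hole" often appear together, but each of these two terms appears more readily together with "galaxy".'
My view: "not as often as" means "not often" and carries a negative sense, while "each" refers to "Big Bang" or "black hole".
Exactly: "not as often as" means "less often than", which is precisely why these two terms appear more readily together with "galaxy".
Literal translation: 'For example, "Big Bang" and "black hole" often appear together, but not as frequently as each of the two appears together with "galaxy".' That rendering is a little convoluted, so I changed it to 'each of these two terms appears more readily together with "galaxy".'

This captures the intuition that the first three terms, but not the fourth, are part of a single topic. Suggested translation: 'This situation gives us the sense that the first three terms, but not the fourth, belong to some common topic.'
My view: "captures the intuition" is a single unit meaning intuitive cognition. Clearly "are part of a single topic" modifies "captures", and "are" is plural. "This", being singular, refers to this act of capturing.
"captures the intuition" is a single unit, and the "that" after it introduces an appositive clause explaining what the intuition is. The subject of "are" is "the first three terms". "This" refers to the phenomenon described just before it: For example, “Big Bang” and “black hole” often will co-occur, but not as often as each does with “galaxy”. Neither, however, would be expected to pop up next to “genome”.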

Quote Jackyang 2011-5-11 21:54
He starts with defining topics as sets of words that tend to crop up in the same document. For example, “Big Bang” and “black hole” often will co-occur, but not as often as each does with “galaxy”. Neither, however, would be expected to pop up next to “genome”. This captures the intuition that the first three terms, but not the fourth, are part of a single topic. Of course, much depends on how narrow you want a topic to be. But Dr Blei’s model, which he developed with John Lafferty, of Carnegie Mellon University, allows for that.

Quote 飞龙在天 2011-5-14 19:55
“black hole” often will co-occur, but not as often as each does with “galaxy”

Everyone is debating this so heatedly, but the grammar is the real culprit: "not as often as" is a variant of "as often as not", meaning "often".
Quote join_soon 2011-5-17 00:07
Reply to 胖白兔's post

extremely interesting article, everybody should read it!!

Organising the web
The science of science
How to use the web to understand the way ideas evolve

Apr 28th 2011 | from the print edition

COMPUTER scientists have long tried to foist order on the explosion of data that is the internet. One obvious way is to group information by topic, but tagging it all comprehensively by hand is impossible. David Blei, of Princeton University, has therefore been trying to teach machines to do the job.

He starts with defining topics as sets of words that tend to crop up in the same document. For example, “Big Bang” and “black hole” often will co-occur, but not as often as each does with “galaxy”. Neither, however, would be expected to pop up next to “genome”. This captures the intuition that the first three terms, but not the fourth, are part of a single topic. Of course, much depends on how narrow you want a topic to be. But Dr Blei’s model, which he developed with John Lafferty, of Carnegie Mellon University, allows for that.
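The co-occurrence idea in this paragraph can be illustrated with a few lines of Python. This is only a toy sketch: the seven "documents" and the helper names `cooccurrence_counts` and `together` are invented for illustration and do not come from Dr Blei's work. Each document is reduced to the set of terms it contains, and the code counts how many documents contain both members of each pair.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each document is the set of terms it mentions.
documents = [
    {"big bang", "galaxy"},
    {"black hole", "galaxy"},
    {"big bang", "black hole", "galaxy"},
    {"big bang", "galaxy"},
    {"black hole", "galaxy"},
    {"genome", "protein"},
    {"genome", "gene"},
]

def cooccurrence_counts(docs):
    """For every unordered pair of terms, count the documents containing both."""
    counts = Counter()
    for terms in docs:
        # Sorting gives each pair one canonical (alphabetical) key.
        for a, b in combinations(sorted(terms), 2):
            counts[(a, b)] += 1
    return counts

counts = cooccurrence_counts(documents)

def together(a, b):
    """Number of documents in which both a and b appear."""
    return counts[tuple(sorted((a, b)))]

print(together("big bang", "black hole"))  # co-occur, but less often ...
print(together("big bang", "galaxy"))      # ... than each does with "galaxy"
print(together("genome", "galaxy"))        # never seen together
```

In this made-up corpus the two cosmology terms appear together once, each appears with "galaxy" three times, and "genome" never appears with any of them, which mirrors the pattern the paragraph describes.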

The user decides how fine-grained he wants the analysis to be by picking the number of topics. The computer then creates a virtual bin for each topic and begins to read the documents to be analysed. After removing common words that it finds evenly spread through the original documents, it assigns each of the remaining ones, at random, to a bin. The computer then selects pairs of words in a bin to see if they co-occur more often than they would by chance in the original documents. If so, the association is preserved. If not, the words (together with others to which they have already been tied) are dropped at random into another bin. Repeat this process and networks of linked words will emerge. Repeat it enough and each network will correspond with a single bin.
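The article describes the procedure only informally, so the following is a deliberately simplified stand-in rather than Dr Blei and Dr Lafferty's actual model. It keeps just one ingredient from the paragraph above, the test of whether two words co-occur more often than they would by chance, and then groups linked words into networks. The "lift" ratio, the union-find grouping, the threshold and the toy documents are all assumptions made for this sketch.

```python
from itertools import combinations

def lift(a, b, docs):
    """Observed co-occurrence rate of a and b, divided by the rate
    expected if the two words appeared independently of each other."""
    n = len(docs)
    n_a = sum(1 for d in docs if a in d)
    n_b = sum(1 for d in docs if b in d)
    n_ab = sum(1 for d in docs if a in d and b in d)
    if n_a == 0 or n_b == 0:
        return 0.0
    return (n_ab / n) / ((n_a / n) * (n_b / n))

def linked_groups(docs, threshold=1.0):
    """Link every pair of words whose lift exceeds the threshold,
    then return the resulting connected groups of words."""
    vocab = sorted(set().union(*docs))
    parent = {w: w for w in vocab}

    def find(w):
        # Union-find lookup with path halving.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for a, b in combinations(vocab, 2):
        if lift(a, b, docs) > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for w in vocab:
        groups.setdefault(find(w), set()).add(w)
    return sorted(groups.values(), key=lambda g: sorted(g))

# Hypothetical documents, already stripped of common words.
docs = [
    {"orbit", "mars", "dust"},
    {"orbit", "mars"},
    {"mars", "dust"},
    {"gene", "genome"},
    {"gene", "genome", "protein"},
    {"genome", "protein"},
]

groups = linked_groups(docs)
for group in groups:
    print(sorted(group))
```

Unlike the procedure in the article, this sketch does not fix the number of topics in advance or reassign words at random; the number of groups simply falls out of which pairs pass the chance test.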

And it works. When Dr Blei and Dr Lafferty asked their software to find 50 topics in papers published in Science between 1980 and 2002, the words it threw up as belonging together were instantly recognisable as being related. One topic included “orbit”, “dust”, “Jupiter”, “line”, “system”, “solar”, “gas”, “atmospheric”, “Mars” and “field”. Another contained “computer”, “methods”, “number”, “two”, “principle”, “design”, “access” and “processing”.

All of which is interesting as a way of dealing with information overload, and tagging papers so that they can be searched in a more useful way. But Dr Blei found himself wondering if his method could yield any truly novel insights into the scientific method. And he thinks it can. In tandem with Sean Gerrish, a doctoral student at Princeton, he has now produced a version that not only peruses text for topics, but also tracks how these topics evolve, by looking at how the patterns in each topic bin change from year to year.
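As a way of handling excess information, all of these operations are very interesting. By tagging papers, they can be retrieved in a more useful way. But Dr Blei kept wondering whether his method could yield some truly novel, deep insights [into the scientific method] /and be integrated into the scientific method/. In the end he thinks this is entirely feasible. His collaborator Sean Gerrish, a doctoral student at Princeton, has developed a certain version of the software [a new version] that can not only obtain topics by perusing text, but can also track how these topics evolve by observing how the patterns in each topic's bin change from year to year.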

The new version is able to trace a topic over time. For example, a 1903 paper with the evocative title “The Brain of Professor Laborde” was correctly assigned to the same topic bin as “Reshaping the Cortical Motor Map by Unmasking Latent Intracortical Connections”, published in 1991. This allows important shifts in terminology to be tracked down to their origins, which offers a way to identify truly ground-breaking work—the sort of stuff that introduces new concepts, or mixes old ones in novel and useful ways that are picked up and replicated in subsequent texts. So a paper’s impact can be determined by looking at how big a shift it creates in the structure of the relevant topic.
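One way to picture "how big a shift" a topic undergoes is to compare the topic's word distribution in consecutive years. The sketch below is an illustration under stated assumptions, not the Blei-Gerrish method: the yearly word lists are invented, and total variation distance is simply one convenient choice of measure.

```python
from collections import Counter

def word_distribution(words):
    """Turn a list of words into relative frequencies that sum to 1."""
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def total_variation(p, q):
    """Total variation distance between two word distributions:
    0 means identical, 1 means no shared vocabulary at all."""
    vocab = set(p) | set(q)
    return 0.5 * sum(abs(p.get(w, 0.0) - q.get(w, 0.0)) for w in vocab)

# Hypothetical contents of one topic bin in three successive years.
topic_by_year = {
    1990: ["brain", "nerve", "brain", "cortex"],
    1991: ["brain", "cortex", "cortex", "map"],
    1992: ["cortex", "map", "map", "plasticity"],
}

years = sorted(topic_by_year)
shifts = {
    (a, b): total_variation(
        word_distribution(topic_by_year[a]),
        word_distribution(topic_by_year[b]),
    )
    for a, b in zip(years, years[1:])
}

for (a, b), shift in shifts.items():
    print(a, "->", b, "shift:", shift)
```

A year with an unusually large shift would be a natural place to look for the papers that introduced the new vocabulary, which is the intuition behind using the size of the shift as a measure of impact.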

In effect, Dr Blei and Mr Gerrish have devised an alternative to the citation indices beloved of scientific publishers. These reflect how often a particular publication or author is cited as a source by others. High scores are treated as a proxy for high impact. But a proxy is all they are.
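For comparison, the core of a citation index fits in a few lines. This is a minimal sketch with made-up paper names; real indices apply further rules (for example about self-citation or time windows) that are not modelled here.

```python
from collections import Counter

# Hypothetical reference lists: each paper maps to the papers it cites.
references = {
    "paper_a": ["paper_c"],
    "paper_b": ["paper_a", "paper_c"],
    "paper_c": [],
    "paper_d": ["paper_a", "paper_b", "paper_c"],
}

def citation_counts(refs):
    """Count how many papers cite each paper (one count per citing paper)."""
    counts = Counter({paper: 0 for paper in refs})
    for cited in refs.values():
        counts.update(set(cited))  # set() ignores repeated citations in one paper
    return counts

counts = citation_counts(references)
for paper, n in counts.most_common():
    print(paper, n)
```

The sketch makes the limitation in the text concrete: a paper with an empty or missing reference list contributes nothing to anyone's score, however much it drew on earlier work.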

Dr Blei and Mr Gerrish are not claiming their method is necessarily a better proxy. But it can cast its net more widely, depending on the set of documents fed into it at the beginning. Citation indices, which work only where publications refer to their sources explicitly, form a tiny nebula in the digital universe. News articles, blog posts and e-mails often lack a systematic reference list that could be used to make a citation index. Yet they, too, are part of what makes an idea influential.

Besides, despite academia’s pretensions to objectivity, it is as subject to political considerations as any area of human endeavour. Many authors cite colleagues, bosses and mentors out of courtesy or supplication rather than because such citations are strictly required. More rarely, an author may undercite. Albert Einstein’s original paper on special relativity, for example, had no references at all, even though it drew heavily on previous work. The upshot is that the Blei-Gerrish method may get closer to the real ebb and flow of scientific ideas and thus, in its way, offer a more scientific approach to science.


