<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>Thinking Machine</title>
    <link rel="alternate" type="text/html" href="http://www.perkowitz.net/blog/tech/" />
    <link rel="self" type="application/atom+xml" href="http://www.perkowitz.net/blog/tech/atom.xml" />
   <id>tag:www.perkowitz.net,2006:/blog/tech//4</id>
    <link rel="service.post" type="application/atom+xml" href="http://www.perkowitz.net/lib/mt/mt-atom.cgi/weblog/blog_id=4" title="Thinking Machine" />
    <updated>2006-06-06T06:55:45Z</updated>
    <subtitle>computers, people, and interaction</subtitle>
    <generator uri="http://www.sixapart.com/movabletype/">Movable Type 3.2</generator>
 
<entry>
    <title>Registration and Unintended Consequences</title>
    <link rel="alternate" type="text/html" href="http://www.perkowitz.net/blog/tech/2006/04/registration_and_unintended_co.php" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.perkowitz.net/lib/mt/mt-atom.cgi/weblog/blog_id=4/entry_id=2209" title="Registration and Unintended Consequences" />
    <id>tag:www.perkowitz.net,2006:/blog/t//4.2209</id>
    
    <published>2006-04-21T01:30:20Z</published>
    <updated>2006-06-06T06:55:45Z</updated>
    
    <summary>Usually the law of unintended consequences decrees that things will be much worse than expected, but every now and then there&apos;s a pleasant surprise. Story from Topix via Greg Linden. I won&apos;t rehash the story in too much detail, but...</summary>
    <author>
        <name>Mike Perkowitz</name>
        <uri>http://www.perkowitz.net</uri>
    </author>
            <category term="Community" />
            <category term="Reputation" />
            <category term="Spam" />
            <category term="Web" />
            <category term="Web Stuff" />
    
    <content type="html" xml:lang="en" xml:base="http://www.perkowitz.net/blog/tech/">
        <![CDATA[<p>Usually the law of unintended consequences decrees that things will be much worse than expected, but every now and then there's a pleasant surprise. Story from <a href="http://blog.topix.net/archives/000106.html">Topix</a> via <a href="http://glinden.blogspot.com/2006/03/removing-registration-and-topixnet.html">Greg Linden</a>. I won't rehash the story in too much detail, but the upshot is that Topix got rid of their registration requirement on their user forums and not only did participation increase dramatically (as expected), spam rate actually decreased (pleasant surprise). </p>

<p>The thing is, requiring registration has some unintended consequences of its own. In particular, lots of people won't do it. And, on the other hand, spammers, trolls, and immature jerks are happy to do it. So (proportionately at least), you end up with more crappy posts. These observations are from the <a href="http://wakaba.c3.cx/shii/shiichan">"2ch principles"</a>; 2ch is an anything-goes web forum in Japan, and people have modeled their forums on 2ch's, having observed the benefits of open posting.</p>

<p>Another one of the mentioned principles is "anonymity counters vanity". The idea is that registered users will become cliqueish, protecting their turf and jockeying over pride and identity. An open, more anonymous system essentially gives less reward to pride of identity, and posts are more topic-focused. Though this seems to counter another bit of web conventional wisdom, which is that reputation is an incentive for good behavior. The 2ch observation is that, at least, reputation is a mixed blessing; people will behave better when they want to protect their reputation, but they will also spend more time maintaining, promoting, and jockeying over that reputation. The cost/benefit tradeoff is probably different for different uses. On eBay, reputation is needed to provide for some trust when making costly exchanges; on a message forum, perhaps it's just a distraction.</p>]]>
        
    </content>
</entry>
<entry>
    <title>Blog Spamming Gets Worse</title>
    <link rel="alternate" type="text/html" href="http://www.perkowitz.net/blog/tech/2005/10/blog_spamming_gets_worse.php" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.perkowitz.net/lib/mt/mt-atom.cgi/weblog/blog_id=4/entry_id=2210" title="Blog Spamming Gets Worse" />
    <id>tag:www.perkowitz.net,2005:/blog/t//4.2210</id>
    
    <published>2005-10-26T06:34:59Z</published>
    <updated>2006-06-06T06:55:46Z</updated>
    
    <summary>CNET reports on a sudden flurry of blog spam activity, apparently due to one clever and obnoxious spammer. It&apos;s not email spam, comment spam, or trackback spam, but thousands and thousands of blogs created on Blogspot with links to the...</summary>
    <author>
        <name>Mike Perkowitz</name>
        <uri>http://www.perkowitz.net</uri>
    </author>
            <category term="Blogging" />
            <category term="Web Stuff" />
    
    <content type="html" xml:lang="en" xml:base="http://www.perkowitz.net/blog/tech/">
        <![CDATA[<p>CNET reports on a <a href="http://news.com.com/Tempted+by+blogs,+spam+becomes+splog/2100-1032_3-5903409.html?tag=nl.caro">sudden flurry of blog spam activity</a>, apparently due to one clever and obnoxious spammer. It's not email spam, comment spam, or trackback spam, but thousands and thousands of blogs created on Blogspot with links to the spammer's sites embedded in text snagged from popular blogs. </p>

<p>It's pretty annoying, not to mention depressing, what with the apparently eternal arms race we're locked into here. But the most surprising thing in the article to me was that Google doesn't seem to do any serious account verification on its Blogspot service (or didn't until last week). It's not like <a href="http://en.wikipedia.org/wiki/CAPTCHA">captchas</a> are top secret advanced technology -- pretty much everyone uses them now. Where was Google?</p>

<p>The other thing I don't quite get is: why did all these crap blogs create so much trouble? I mean, it's the nature of the web that there's all kinds of crap out there -- some spammer adding more crap sites shouldn't make it appreciably worse. This isn't like spam mail or comment spam, where someone is shoving their message into your inbox. The problem here seems to have come from the fact that the spammer cleverly made all his fake blogs highly appealing to search engines. They started to appear in people's search results and RSS feeds (why? do people have RSS feeds of open searches?), and that caused the problem. </p>

<p>So, why? You know, if people had been doing that search on Google, I don't think they would have gotten all those crap results, because Google takes into account reputation (in terms of incoming links) in its results rankings -- crap spam sites that are only linked to by other crap spam sites shouldn't get a reputation boost. So to Technorati and PubSub and so on: do what Google does. Not that Google is infallible (see above about captcha), but these blog search engines are talking about blocking Blogspot from their results. Which will work precisely as long as spammers don't crack other blog hosting sites. Anyway, this is probably going to get worse before it gets better.</p>]]>
        
    </content>
</entry>
<entry>
    <title>KDD2004: Data Mining and Spam</title>
    <link rel="alternate" type="text/html" href="http://www.perkowitz.net/blog/tech/2004/08/kdd2004_data_mining_and_spam.php" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.perkowitz.net/lib/mt/mt-atom.cgi/weblog/blog_id=4/entry_id=2211" title="KDD2004: Data Mining and Spam" />
    <id>tag:www.perkowitz.net,2004:/blog/t//4.2211</id>
    
    <published>2004-08-23T23:11:52Z</published>
    <updated>2006-06-06T06:55:46Z</updated>
    
    <summary>Pedro Domingos (from UW) presented a paper (co-authored with a number of UWers) about data mining in the presence of an adversary who is deliberately trying to deceive the data miner. This was the big hit of the conference so...</summary>
    <author>
        <name>Mike Perkowitz</name>
        <uri>http://www.perkowitz.net</uri>
    </author>
    
    <content type="html" xml:lang="en" xml:base="http://www.perkowitz.net/blog/tech/">
        <![CDATA[<p>Pedro Domingos (from <a href="http://www.cs.washington.edu/">UW</a>) presented a paper (co-authored with a number of UWers) about data mining in the presence of an adversary who is deliberately trying to deceive the data miner. This was the big hit of the conference so far. He made the point that this happens in many cases -- spam detection, intrusion detection, counterterrorism, etc -- where there is an adversary who can alter the data to prevent the data miner detecting what he seeks to detect. He argued that this problem has not been addressed before in the data mining field but is interesting and important.<br />
</p>]]>
        <![CDATA[<p>The basic contribution of the paper was to formalize this situation as a game, where the miner and the adversary alternate turns -- the miner creates a system to extract data (e.g. a spam detector), and then the adversary alters the data specifically to foil the detector (e.g. the way spammers act to foil anti-spam filters). It's an adversarial game where each side is trying to maximize his own utility function. Unfortunately, the problem is incredibly intractable (doubly exponential) because the space of possible detection algorithms and adversarial responses is exponential. Instead, they defined an iterated game and presented an algorithm that adapts a naive Bayes classifier to account for the way the spammer adversary tries to get around the classifier. Their approach makes many assumptions but is an interesting first step.</p>
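
<p>To make the alternating-turns idea concrete, here is a minimal sketch of one round of such a game. It is not the algorithm from the paper: the word probabilities, the padding strategy, and the miner's countermove (simply ignoring the adversary's favorite padding word) are all made-up assumptions for illustration.</p>

```python
import math

# Toy per-word likelihoods, invented for this sketch (not from the paper).
P_WORD_GIVEN_SPAM = {"viagra": 0.30, "free": 0.20, "meeting": 0.01, "report": 0.01}
P_WORD_GIVEN_HAM = {"viagra": 0.001, "free": 0.02, "meeting": 0.20, "report": 0.15}
PRIOR_SPAM = 0.5


def spam_log_odds(words):
    """Naive Bayes log-odds of spam vs. ham; unknown words are ignored."""
    score = math.log(PRIOR_SPAM / (1 - PRIOR_SPAM))
    for w in words:
        if w in P_WORD_GIVEN_SPAM:
            score += math.log(P_WORD_GIVEN_SPAM[w] / P_WORD_GIVEN_HAM[w])
    return score


def is_spam(words):
    """Miner's first move: a plain naive Bayes filter."""
    return spam_log_odds(words) > 0


def most_ham_like_word():
    """The word whose presence pulls the score hardest toward ham."""
    return min(P_WORD_GIVEN_SPAM,
               key=lambda w: P_WORD_GIVEN_SPAM[w] / P_WORD_GIVEN_HAM[w])


def adversary_pad(words, budget):
    """Adversary's move: append the most ham-like word until the filter passes."""
    pad = most_ham_like_word()
    padded = list(words)
    for _ in range(budget):
        if not is_spam(padded):
            break
        padded.append(pad)
    return padded


def adversary_aware_is_spam(words):
    """Miner's countermove (a crude stand-in): score the message as if the
    adversary's padding word had never been added."""
    pad = most_ham_like_word()
    return is_spam([w for w in words if w != pad])
```

<p>With these invented numbers, the spammer needs three copies of "meeting" to slip ["viagra", "free"] past the plain filter, and the adversary-aware version catches the padded message again. The next turn would belong to the adversary, which is exactly why the full game gets so expensive to solve.</p>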

<p>Response to the talk was strong. The room was standing-room-only, and the questions went on and on until the session was broken up. People were still discussing the problem during the next break; it's a fertile idea. Identifying and formalizing the problem is a great step; I expect more work will follow in this area.<br />
</p>]]>
    </content>
</entry>
<entry>
    <title>Learning Models of Human Behavior</title>
    <link rel="alternate" type="text/html" href="http://www.perkowitz.net/blog/tech/2004/07/learning_models_of_human_behav.php" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.perkowitz.net/lib/mt/mt-atom.cgi/weblog/blog_id=4/entry_id=2212" title="Learning Models of Human Behavior" />
    <id>tag:www.perkowitz.net,2004:/blog/t//4.2212</id>
    
    <published>2004-07-16T22:57:42Z</published>
    <updated>2006-06-06T06:55:44Z</updated>
    
    <summary>Wouldn&apos;t it be great if someone could develop a way to mine the web to figure out which objects are typically used in a given activity? Maybe one way that they could do it would be to look at how...</summary>
    <author>
        <name>Mike Perkowitz</name>
        <uri>http://www.perkowitz.net</uri>
    </author>
            <category term="Learning/Data Mining" />
    
    <content type="html" xml:lang="en" xml:base="http://www.perkowitz.net/blog/tech/">
        <![CDATA[<p>Wouldn't it be great if <a href="http://seattleweb.intel-research.net/projects/activity/">someone</a> could develop a way to mine the web to figure out which objects are typically used in a given activity?  Maybe one way that they could do it would be to look at how often an object term shows up in a Google query when paired with the related activity.  If you compared that to how often the object showed up in general then maybe you could get a probability of the object's use when performing that activity.</p>
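
<p>As a rough sketch of that comparison (the hit counts, the index size, and the function names are all invented for illustration; this is not the project's actual method), the arithmetic might look like this:</p>

```python
def use_probability(hits_object_and_activity, hits_activity):
    """Estimate P(object | activity) as a ratio of search hit counts."""
    if hits_activity == 0:
        return 0.0
    return hits_object_and_activity / hits_activity


def lift(hits_object_and_activity, hits_activity, hits_object, total_pages):
    """Compare the conditional estimate to how often the object shows up in general."""
    base_rate = hits_object / total_pages
    if base_rate == 0:
        return 0.0
    return use_probability(hits_object_and_activity, hits_activity) / base_rate


# Hypothetical hit counts; a real system would get these from a search engine.
TOTAL_PAGES = 1000000
ACTIVITY_HITS = 1000                      # pages mentioning "making tea"
OBJECT_HITS = {"teabag": 20000, "vacuum": 50000}
PAIR_HITS = {"teabag": 600, "vacuum": 5}  # pages mentioning both

teabag_lift = lift(PAIR_HITS["teabag"], ACTIVITY_HITS, OBJECT_HITS["teabag"], TOTAL_PAGES)
vacuum_lift = lift(PAIR_HITS["vacuum"], ACTIVITY_HITS, OBJECT_HITS["vacuum"], TOTAL_PAGES)
```

<p>With these made-up counts, "teabag" comes out about 30 times more common alongside "making tea" than it is in general, while "vacuum" comes out about ten times less common.</p>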

<p>Of course this would completely fail if there were web-pages that mentioned activities in the same breath as objects which were completely unrelated.  For example, if there were a web-page somewhere that suggested "I like to eat tea-bags when I use the toilet", or "I frequently find that my television viewing is enhanced by sleeping with a vacuum", or "last night I changed a baby's diaper with a wooden spoon and a jar of peanut butter strapped to my dog".  Fortunately for such a hypothetical research project, though, there aren't any web-pages that say things like that....</p>]]>
        
    </content>
</entry>
<entry>
    <title>What is Programming by Demonstration?</title>
    <link rel="alternate" type="text/html" href="http://www.perkowitz.net/blog/tech/2004/07/what_is_programming_by_demonst.php" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.perkowitz.net/lib/mt/mt-atom.cgi/weblog/blog_id=4/entry_id=2213" title="What is Programming by Demonstration?" />
    <id>tag:www.perkowitz.net,2004:/blog/t//4.2213</id>
    
    <published>2004-07-16T22:49:32Z</published>
    <updated>2006-06-06T06:55:44Z</updated>
    
    <summary>In the AI field people generally know what you mean by the term &quot;Programming by Demonstration.&quot; When you pin them down on the definition it seems to settle on: A computer learning a macro from what you are doing repetitively...</summary>
    <author>
        <name>Mike Perkowitz</name>
        <uri>http://www.perkowitz.net</uri>
    </author>
            <category term="Artificial Intelligence" />
    
    <content type="html" xml:lang="en" xml:base="http://www.perkowitz.net/blog/tech/">
        <![CDATA[<p>In the AI field people generally know what you mean by the term "Programming by Demonstration."  When you pin them down on the definition it seems to settle on: A computer learning a macro from what you are doing repetitively in a text editor.</p>

<p>But at a higher, more general level, what is it?  All a computer can do is execute a program, so any machine learning is "programming" a computer.  And all learning is learning from demonstrations of things.  So from at least a linguistic standpoint, the phrase Programming by Demonstration seems pretty meaningless.  Judging from the body of work that calls itself PBD, though, I would say that it has the following qualities:</p>

<p> 1) "It" learns from a very small number of examples (like maybe 1).<br />
 2) "It" learns a procedural language from the examples.<br />
 3) "It" operates in a discrete environment without non-determinism.</p>

<p>Discuss amongst yourselves...</p>]]>
        
    </content>
</entry>
<entry>
    <title>Similar words, categories, and ontologies</title>
    <link rel="alternate" type="text/html" href="http://www.perkowitz.net/blog/tech/2004/06/similar_words_categories_and_o.php" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.perkowitz.net/lib/mt/mt-atom.cgi/weblog/blog_id=4/entry_id=2214" title="Similar words, categories, and ontologies" />
    <id>tag:www.perkowitz.net,2004:/blog/t//4.2214</id>
    
    <published>2004-06-03T10:01:17Z</published>
    <updated>2006-06-06T06:55:46Z</updated>
    
    <summary>An interesting post on grammar in Agoraphilia raises some interesting questions about categories and ontologies. Perhaps it&apos;s just because we&apos;ve been thinking a lot about ontologies at Intel Research lately, with a view to classifying everyday objects that people use...</summary>
    <author>
        <name>Mike Perkowitz</name>
        <uri>http://www.perkowitz.net</uri>
    </author>
            <category term="Cognitive Science" />
    
    <content type="html" xml:lang="en" xml:base="http://www.perkowitz.net/blog/tech/">
        <![CDATA[<p><a href="http://agoraphilia.blogspot.com/2004_05_23_agoraphilia_archive.html#108554563919409999">An interesting post on grammar</a> in Agoraphilia raises some interesting questions about categories and ontologies. Perhaps it's just because we've been thinking a lot about ontologies at Intel Research lately, with a view to classifying everyday objects that people use in their daily activities. In his post, he also references <a href="http://www.sis.pitt.edu/~mbsclass/hall_of_fame/rosch.htm">Eleanor Rosch</a>, a pioneering cognitive scientist who has thought a lot about categories. All of which reminds me that beyond the practical questions of using ontologies in applications, there are all these interesting issues to philosophize about.</p>]]>
        
    </content>
</entry>
<entry>
    <title>WWW2004 thoughts</title>
    <link rel="alternate" type="text/html" href="http://www.perkowitz.net/blog/tech/2004/05/www2004_thoughts.php" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.perkowitz.net/lib/mt/mt-atom.cgi/weblog/blog_id=4/entry_id=2215" title="WWW2004 thoughts" />
    <id>tag:www.perkowitz.net,2004:/blog/t//4.2215</id>
    
    <published>2004-05-27T01:53:52Z</published>
    <updated>2006-06-06T06:55:46Z</updated>
    
    <summary>WWW is always an interesting conference. The range of relevant topics is quite wide, from cache-and-network type stuff for optimizing performance to speculative artificial-intelligence-type ideas, to sociological analysis and theory of what people actually do on the web. And so,...</summary>
    <author>
        <name>Mike Perkowitz</name>
        <uri>http://www.perkowitz.net</uri>
    </author>
            <category term="Web Stuff" />
    
    <content type="html" xml:lang="en" xml:base="http://www.perkowitz.net/blog/tech/">
        <![CDATA[<p>WWW is always an interesting conference. The range of relevant topics is quite wide, from cache-and-network type stuff for optimizing performance to speculative artificial-intelligence-type ideas, to sociological analysis and theory of what people actually do on the web. And so, going from one poster to another, or slipping out of your usual track into some other talk can be surprising, with the sometimes benefit of jolting you into a new idea. </p>

<p>The other interesting thing about WWW is that it does represent, in some ways, much of the brainpower at the center of web developments. Many of the people involved in standards and so on are there, and many of today's papers will be tomorrow's hot new ideas. On the other hand, so much of what happens on the web and affects it for regular users makes no appearance among the pointy-headed types at all. Some of it is just secretive (e.g., Google is well-represented at these things, but they never talk about what they're doing) and some of it just pays no attention to research papers (e.g., most ecommerce, publishing, and daily stuff that people use).</p>

<p>The tension here appears all the time, for example in the contrast between all the cool research ideas people have for search and data extraction, and what people actually do every day. Or the contrast between what WWWers hope to do with the semantic web and the reality of how much attention span most people have for such complexity. Or in the way that search engine response time and ease-of-use have basically eclipsed many clever ideas that would be too costly to add. </p>

<p>If pressed, I'd say this kind of contrast appears in many areas of computer science, the tension between what researchers can think of and what people actually will/do use. But it's all much more obvious at WWW, perhaps just because it's so widely used and develops so quickly and seems more like a force of nature (or, at least, an organic entity like a city or a nation) than a human-designed artifact. A lot of the time, we are just trying to keep up with its relentless development.<br />
</p>]]>
        
    </content>
</entry>
<entry>
    <title>memex</title>
    <link rel="alternate" type="text/html" href="http://www.perkowitz.net/blog/tech/2004/05/memex.php" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.perkowitz.net/lib/mt/mt-atom.cgi/weblog/blog_id=4/entry_id=2216" title="memex" />
    <id>tag:www.perkowitz.net,2004:/blog/t//4.2216</id>
    
    <published>2004-05-26T22:28:58Z</published>
    <updated>2006-06-06T06:55:46Z</updated>
    
    <summary>memex</summary>
    <author>
        <name>Mike Perkowitz</name>
        <uri>http://www.perkowitz.net</uri>
    </author>
    
    <content type="html" xml:lang="en" xml:base="http://www.perkowitz.net/blog/tech/">
        <![CDATA[<p>"Consider a future device for individual use, which is a sort of mechanized private file and library. It needs a name, and to coin one at random, "memex" will do. A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory."</p>

<p>-- Vannevar Bush, 1945</p>

<p>(quoted by Udi Manber at www 2004, and by many others)<br />
</p>]]>
        
    </content>
</entry>
<entry>
    <title>www2004: themes</title>
    <link rel="alternate" type="text/html" href="http://www.perkowitz.net/blog/tech/2004/05/www2004_themes.php" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.perkowitz.net/lib/mt/mt-atom.cgi/weblog/blog_id=4/entry_id=2217" title="www2004: themes" />
    <id>tag:www.perkowitz.net,2004:/blog/t//4.2217</id>
    
    <published>2004-05-26T21:22:52Z</published>
    <updated>2006-06-06T06:55:46Z</updated>
    
    <summary>themes at www2004: semantic web, learning/information extraction, search....</summary>
    <author>
        <name>Mike Perkowitz</name>
        <uri>http://www.perkowitz.net</uri>
    </author>
            <category term="Web Stuff" />
    
    <content type="html" xml:lang="en" xml:base="http://www.perkowitz.net/blog/tech/">
        <![CDATA[<p>themes at www2004: semantic web, learning/information extraction, search. <br />
</p>]]>
        <![CDATA[<p>One big theme at www2004 was the semantic web. As I've indicated earlier, I'm skeptical, or at least unsure about how it might develop and be taken up by people other than wonky researchers. But it was definitely pervasive at the conference, appearing prominently in Tim Berners-Lee's keynotes and essentially representing a paper track throughout. Topics ranged from practical application-focused stuff like semantic email, semantic browsers, semantic search, etc. to more esoteric papers about languages, specifications, and theories. I didn't catch enough of this stuff to really know what the state of the art is, but it's possible that tools will begin to filter out to the general public in a significant way. I do buy that we'll see certain communities using it, e.g. scientific communities that are sharing lots of data.</p>

<p>Another theme was various approaches to learning, classification, and information extraction. There was perhaps more focus on unsupervised techniques than in the past (which I find encouraging; my bias in many real-world applications is against any requirement of human labeling). Approaches to classifying web pages included comparing to a pre-specified topic hierarchy (e.g. from Yahoo!), analyzing page structure (i.e. for the visual/structure cues people use to find headlines and information), and breaking a page into component "blocks" and determining the importance of each block. Approaches to extracting information from the web included various ways of (shallowly) analyzing sentence structure to determine facts, such as looking for indicative phrases, learning probabilistic phrase structures, and figuring out entities and types of facts about them.</p>

<p>Search was of course on everybody's mind. Aside from scoring mentions in the invited talks, papers explored everything from link graph analysis to combining full-text search with database-type search to other stuff I didn't get to see. Of course, the impending (or at least much-hyped) Google-Microsoft-Yahoo battle over search was a constant subtext.</p>

<p>Of course, WWW is a widely diverse conference; I focused on a few particular topics, and I'm sure other people would report something totally different as the themes of the conference.</p>]]>
    </content>
</entry>
<entry>
    <title>Smart glasses detect eye contact</title>
    <link rel="alternate" type="text/html" href="http://www.perkowitz.net/blog/tech/2004/05/smart_glasses_detect_eye_conta.php" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.perkowitz.net/lib/mt/mt-atom.cgi/weblog/blog_id=4/entry_id=2218" title="Smart glasses detect eye contact" />
    <id>tag:www.perkowitz.net,2004:/blog/t//4.2218</id>
    
    <published>2004-05-25T00:14:11Z</published>
    <updated>2006-06-06T06:55:46Z</updated>
    
    <summary>New Scientist: Smart glasses detect eye contact Now this is an interesting idea. Though even aside from the laughably ugly glasses, it only works within a meter -- and if you can&apos;t tell if someone is making eye contact with...</summary>
    <author>
        <name>Mike Perkowitz</name>
        <uri>http://www.perkowitz.net</uri>
    </author>
            <category term="Gadgets" />
    
    <content type="html" xml:lang="en" xml:base="http://www.perkowitz.net/blog/tech/">
        <![CDATA[<p><a href="http://www.newscientist.com/news/news.jsp?id=ns99995015">New Scientist: Smart glasses detect eye contact</a></p>

<p>Now this is an interesting idea. Though even aside from the laughably ugly glasses, it only works within a meter -- and if you can't tell if someone is making eye contact with you from a meter away, you need more help than smart glasses can provide. </p>]]>
        
    </content>
</entry>
<entry>
    <title>www2004 talk</title>
    <link rel="alternate" type="text/html" href="http://www.perkowitz.net/blog/tech/2004/05/www2004_talk.php" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.perkowitz.net/lib/mt/mt-atom.cgi/weblog/blog_id=4/entry_id=2219" title="www2004 talk" />
    <id>tag:www.perkowitz.net,2004:/blog/t//4.2219</id>
    
    <published>2004-05-24T21:38:17Z</published>
    <updated>2006-06-06T06:55:44Z</updated>
    
    <summary>My research group at Intel had a paper titled Mining Models of Human Activities from the Web in WWW 2004. I gave the talk at the conference on Friday. Here is the powerpoint....</summary>
    <author>
        <name>Mike Perkowitz</name>
        <uri>http://www.perkowitz.net</uri>
    </author>
            <category term="Learning/Data Mining" />
            <category term="Web Stuff" />
    
    <content type="html" xml:lang="en" xml:base="http://www.perkowitz.net/blog/tech/">
        <![CDATA[<p>My <a href="http://seattleweb.intel-research.net/projects/guide/">research group</a> at Intel had a paper titled <a href="http://seattleweb.intel-research.net/projects/guide/pubs/papers/www04_guide.pdf">Mining Models of Human Activities from the Web</a> in <a href="http://www2004.org">WWW 2004</a>. I gave the talk at the conference on Friday. Here is the <a href="http://seattleweb.intel-research.net/projects/guide/pubs/talks/www2004.ppt">powerpoint</a>. <br />
</p>]]>
        
    </content>
</entry>
<entry>
    <title>www2004: Udi Manber talk</title>
    <link rel="alternate" type="text/html" href="http://www.perkowitz.net/blog/tech/2004/05/www2004_udi_manber_talk.php" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.perkowitz.net/lib/mt/mt-atom.cgi/weblog/blog_id=4/entry_id=2220" title="www2004: Udi Manber talk" />
    <id>tag:www.perkowitz.net,2004:/blog/t//4.2220</id>
    
    <published>2004-05-21T21:38:23Z</published>
    <updated>2006-06-06T06:55:44Z</updated>
    
    <summary>Udi Manber, formerly of Yahoo and academia, now running Amazon&apos;s search engine offshoot a9. His topic was &quot;Customer-centric innovations in search and e-commerce&quot;. He made some observations about search in general, talked about some Amazon and a9 projects, and ended...</summary>
    <author>
        <name>Mike Perkowitz</name>
        <uri>http://www.perkowitz.net</uri>
    </author>
            <category term="Web Stuff" />
    
    <content type="html" xml:lang="en" xml:base="http://www.perkowitz.net/blog/tech/">
        <![CDATA[<p>Udi Manber, formerly of Yahoo and academia, now running Amazon's search engine offshoot <a href="http://www.a9.com/">a9</a>. His topic was "Customer-centric innovations in search and e-commerce". He made some observations about search in general, talked about some Amazon and a9 projects, and ended with some what-ifs.</p>]]>
        <![CDATA[<p>Some observations about search:<br />
- Ease of use. It's very important, but the fact that most users do simple searches of a few keywords is a barrier to more advanced techniques. e.g., maybe the engines could offer a lot more, but only if people are willing to make complex query specifications<br />
- Relevancy. Hard to measure, changes all the time, not well-understood, highly person- and context-dependent. <br />
- Anecdotes lead you astray. Often search engine designers will be motivated by anecdotes about things that don't work, forgetting what the great mass of people are trying to do.<br />
- Quality. Response speed and scaling to web-size are (mostly) solved.</p>

<p>Amazon's "search inside the book"<br />
- What Udi and some of the a9 team did before leaving Amazon for a9<br />
- 120k books, 33M pages<br />
- Books scanned (bindings just cut off) and OCRed<br />
- Took six months to build</p>

<p>Personalization at a9<br />
- Personal search history<br />
- Search results mark what's new since last search, what you clicked before<br />
- Diary</p>

<p>What If<br />
- Ask "what if", ignoring some current problems, and then perhaps figure out how to get there<br />
- What if we had an hour from every user? What would we tell them, what would we design for them?<br />
- What if everyone became an author? All the way from 1 bit (I like it/I don't on a review) to a book. How would we publish and consume all that?<br />
- What if everyone were trustworthy, cooperative, and working in the community interest? How would we harness everyone's contributions?<br />
- What if all media were accessible from one place (though distributed storage)? Assume copyright/compensation solved. How search, organize, and access it all? What tools would we need?</p>]]>
    </content>
</entry>
<entry>
    <title>www2004: finding new news</title>
    <link rel="alternate" type="text/html" href="http://www.perkowitz.net/blog/tech/2004/05/www2004_finding_new_news.php" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.perkowitz.net/lib/mt/mt-atom.cgi/weblog/blog_id=4/entry_id=2221" title="www2004: finding new news" />
    <id>tag:www.perkowitz.net,2004:/blog/t//4.2221</id>
    
    <published>2004-05-21T20:32:25Z</published>
    <updated>2006-06-06T06:55:46Z</updated>
    
    <summary>[...regarding a paper on figuring out how novel articles are in comparison to previously seen articles, for the purpose of presenting users with the maximally new information...] Don Patterson: I don&apos;t think this is a real problem. Mike Perkowitz: whys...</summary>
    <author>
        <name>Mike Perkowitz</name>
        <uri>http://www.perkowitz.net</uri>
    </author>
            <category term="Web Stuff" />
    
    <content type="html" xml:lang="en" xml:base="http://www.perkowitz.net/blog/tech/">
        <![CDATA[<p>[...regarding a paper on figuring out how novel articles are in comparison to previously seen articles, for the purpose of presenting users with the maximally new information...]</p>

<p><b>Don Patterson</b>: I don't think this is a real problem.<br />
<b>Mike Perkowitz</b>: whys that<br />
<b>Don Patterson</b>: Because news is almost all the same.<br />
<b>Mike Perkowitz</b>: true<br />
<b>Don Patterson</b>: Tons of places just repeat what others say.<br />
<b>Mike Perkowitz</b>: but what about a story breaking over time and you want to catch the scoop<br />
</p>]]>
        <![CDATA[<p><b>Don Patterson</b>: Why can't you just monitor an AP feed?<br />
<b>Mike Perkowitz</b>: though i guess as soon as cnn has it, everyone else will follow<br />
<b>Don Patterson</b>: Exactly<br />
<b>Mike Perkowitz</b>: well<br />
<b>Mike Perkowitz</b>: think about reading blogs of people's opinions on some story<br />
<b>Mike Perkowitz</b>: perhaps this would help you get a range of opinions<br />
<b>Don Patterson</b>: But can this system identify novelty in opinion?<br />
<b>Mike Perkowitz</b>: not sure<br />
<b>Mike Perkowitz</b>: also, is named entity tagging a solved problem???<br />
<b>Don Patterson</b>: It can't possibly be.<br />
<b>Mike Perkowitz</b>: right<br />
<b>Don Patterson</b>: Especially at the word level.<br />
<b>Don Patterson</b>: How do you know John Doe is an entity?<br />
<b>Don Patterson</b>: And not a john and a female deer?<br />
<b>Mike Perkowitz</b>: maybe they just look for the cap letters<br />
<b>Don Patterson</b>: That's probably likely.<br />
<b>Don Patterson</b>: A bunch of heuristics to manage the POLs<br />
<b>Mike Perkowitz</b>: what does POL stand for<br />
<b>Don Patterson</b>: People, Objects and Locations.<br />
<b>Mike Perkowitz</b>: ok<br />
<b>Don Patterson</b>: This just begs the question of news anyway.<br />
<b>Don Patterson</b>: My solution to news overload was to unsubscribe from all newspapers.<br />
<b>Don Patterson</b>: To not watch CNN...<br />
<b>Mike Perkowitz</b>: sure<br />
<b>Don Patterson</b>: and to subscribe to a two-week news magazine<br />
<b>Don Patterson</b>: :<br />
<b>Mike Perkowitz</b>: nice tasteless animation of the pizza delivery guy being blown up<br />
<b>Don Patterson</b>: US News and World Report<br />
<b>Don Patterson</b>: Ha!<br />
<b>Don Patterson</b>: Wasn't that black girl in a MS ad?<br />
<b>Don Patterson</b>: Great moments at work<br />
<b>Don Patterson</b>: But the news magazine has *editors*!<br />
<b>Mike Perkowitz</b>: not sure<br />
<b>Mike Perkowitz</b>: well some people want to disintermediate the editors<br />
<b>Don Patterson</b>: So what I have done is delegated my news analysis and summarization to professionals who seem to do a good job.<br />
<b>Mike Perkowitz</b>: yeah<br />
<b>Don Patterson</b>: Well, so I'll extreme...<br />
<b>Don Patterson</b>: I also watch all the news feeds from BBC to keep track of buzz<br />
<b>Don Patterson</b>: But I use the editors to summarize and ensure coverage of the important facts and stories.<br />
<b>Don Patterson</b>: It's the same as stock trading.<br />
<b>Don Patterson</b>: With a dial-up connection and 2 hours on the weekend, you are not going to do a better job than a team of dedicated professionals working 80 hours a week.<br />
<b>Mike Perkowitz</b>: it may depend on whether you can find a magazine that shares your interests/viewpoint<br />
<b>Don Patterson</b>: True. But there is a magazine for everyone view and topic.<br />
<b>Don Patterson</b>: every view.<br />
<b>Don Patterson</b>: This just screams MS cash cow.<br />
<b>Mike Perkowitz</b>: how do they make money off it<br />
<b>Mike Perkowitz</b>: i guess<br />
<b>Don Patterson</b>: It makes MSN Newsbot better - it's a premium service for newsbot...<br />
<b>Mike Perkowitz</b>: oh they have a paid news summary service?<br />
<b>Mike Perkowitz</b>: or you mean they could?<br />
<b>Don Patterson</b>: I've never checked, but he mentioned it at the beginning.<br />
<b>Don Patterson</b>: Google has one, MS must be interested.<br />
<b>Mike Perkowitz</b>: i didn't know there was a pay version of googlenews<br />
<b>Don Patterson</b>: sorry, there isn't.<br />
<b>Don Patterson</b>: I was suggesting that MS could justify a paid news service with a feature like this.<br />
<b>Mike Perkowitz</b>: for busy professionals<br />
<b>Don Patterson</b>: you've got it!<br />
<b>Mike Perkowitz</b>: i doubt it would work enough that anyone would pay for it<br />
<b>Don Patterson</b>: One person would pay for it.<br />
<b>Don Patterson</b>: and then email it to everyone.<br />
<b>Mike Perkowitz</b>: ha<br />
<b>Mike Perkowitz</b>: instead of this, give me a news reader that looks at all the foods and combines all the novel info into one summary<br />
<b>Mike Perkowitz</b>: FEEDs<br />
<b>Don Patterson</b>: CNN has a breaking news service which is generated by editors.<br />
<b>Mike Perkowitz</b>: it seems like it would be hard to beat CNN<br />
<b>Don Patterson</b>: They will email you the garbage they think is interesting.<br />
<b>Mike Perkowitz</b>: at least for straight news<br />
<b>Don Patterson</b>: The other advantage of US News is that there is a mean of one week of reflection that the editors can apply before saying what is really important.<br />
</p>]]>
    </content>
</entry>
<entry>
    <title>www2004: Rick Rashid talk</title>
    <link rel="alternate" type="text/html" href="http://www.perkowitz.net/blog/tech/2004/05/www2004_rick_rashid_talk.php" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.perkowitz.net/lib/mt/mt-atom.cgi/weblog/blog_id=4/entry_id=2222" title="www2004: Rick Rashid talk" />
    <id>tag:www.perkowitz.net,2004:/blog/t//4.2222</id>
    
    <published>2004-05-20T17:23:48Z</published>
    <updated>2006-06-06T06:55:46Z</updated>
    
    <summary>The first talk this morning was from Rick Rashid, of Microsoft Research. His general theme was &quot;Empowering the Individual&quot;. Highlight of the talk: a 1994 Microsoft promo video clip about &quot;digital convergence&quot; to the tune of &quot;Surfin&apos; USA&quot; (with badly rewritten...</summary>
    <author>
        <name>Mike Perkowitz</name>
        <uri>http://www.perkowitz.net</uri>
    </author>
            <category term="Web Stuff" />
    
    <content type="html" xml:lang="en" xml:base="http://www.perkowitz.net/blog/tech/">
        <![CDATA[<p>The first talk this morning was from Rick Rashid, of Microsoft Research. His general theme was "Empowering the Individual". Highlight of the talk: a 1994 Microsoft promo video clip about "digital convergence" to the tune of "Surfin' USA" (with badly rewritten lyrics, believe you me). Anyway, he had three basic themes: democratization of information, getting your life back, and bending things around you to your will. His talk came across more like a laundry list of MSR projects, though, with some themes perhaps detectable.</p>]]>
        <![CDATA[<p>First up was SkyServer, a sort of global standardization and sharing of astronomical data from telescopes all over. This allows astronomers worldwide to do their work at any time of the day or night, even when it's cloudy, without direct access to major equipment. Which is great. And equally, students and anybody else can use the data as well. Similar to TerraServer, their project to provide satellite photos of as much of the earth as possible.</p>

<p>Similar in a way, but with more of a self-publishing flavor, is World-Wide Media Exchange (WWMX), an attempt to build a shared collection of location-tagged media. People can submit photos, stories, etc., tagged with time and location. Then anyone can browse by location for interesting media.</p>

<p>He talked a bit about Wallop, MSR's social networking thingy. In introducing it he said something like "we wanted to combine social networking with blogging and sharing of media" and I thought "isn't that what LiveJournal is?" Wallop, I think, adds explicit support for images and metadata; he talked about how his Wallop page has pictures of him and his wife and Neal Stephenson, with the pictures annotated as to who was who. I can see LJ adding support for metadata (and photo hosting) and enabling this stuff in perhaps a more organic way -- i.e., you can put this stuff in your journal or not, and if it's there people can search/browse for it, with access control based on friend lists etc. I'd like to try out Wallop, but it's just in limited testing for now.</p>

<p>Next, he talked about something he called "human scale computing". Soon, a terabyte of storage will be affordable for many people. With 1TB, you could record everything you say for your entire life, or take a picture every minute, or record a year of full video. You'd never need to delete anything. What changes when this is true? Interesting. He mentioned the "Stuff I've Seen" project, which basically turns all your computer stuff and everything (and everyone) you interact with into searchable stuff. </p>
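
<p>A rough back-of-envelope sketch of what those 1TB claims imply. The numbers here are illustrative assumptions and not figures from the talk: a decimal terabyte, an 80-year lifespan, and continuous recording around the clock. The sketch works out the average data rate each scenario could afford.</p>

```python
# Back-of-envelope check of the "1 TB" claims. Every constant below is an
# illustrative assumption, not a number given in the talk.
TERABYTE = 10**12                    # bytes (decimal terabyte)
SECONDS_PER_YEAR = 365 * 24 * 3600   # ignoring leap years
LIFETIME_YEARS = 80                  # assumed lifespan


def bytes_per_second(total_bytes: int, seconds: int) -> float:
    """Average data rate that exactly fills total_bytes over the given time."""
    return total_bytes / seconds


# One year of continuous video squeezed into 1 TB.
video_rate = bytes_per_second(TERABYTE, SECONDS_PER_YEAR)

# A whole lifetime of continuous audio squeezed into 1 TB.
audio_rate = bytes_per_second(TERABYTE, LIFETIME_YEARS * SECONDS_PER_YEAR)

# One picture per minute for a whole lifetime.
pictures = LIFETIME_YEARS * 365 * 24 * 60
bytes_per_picture = TERABYTE / pictures

print(f"video budget:   {video_rate * 8 / 1000:.0f} kbit/s")
print(f"audio budget:   {audio_rate * 8 / 1000:.1f} kbit/s")
print(f"picture budget: {bytes_per_picture / 1000:.1f} kB each")
```

<p>Under these assumptions the budgets come out to roughly 254 kbit/s for a year of video, about 3 kbit/s for lifelong audio, and about 24 kB per picture. Since nobody talks around the clock, recording only actual speech would leave a more generous audio budget than the continuous figure suggests.</p>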

<p>Finally, he talked a bit about MS's smart personal objects (e.g., the SPOT watch). I was surprised to hear that their FM sub-band network (or whatever it's called) can reach 80% of the US population. I didn't know it was built out.</p>

<p>I've been hearing that MS is thinking about bringing together the web and people's own stuff, treating it as one big data collection and improving search/browse over it. So what he talked about fit into this. I have to admit, the possibility of integrating some of these things (think itunes meets soulseek meets livejournal meets a9 or something) sounds good to me, though I'm not sure what MS wants to do with it is what I'd want to do.<br />
</p>]]>
    </content>
</entry>
<entry>
    <title>New co-blogger, Don Patterson</title>
    <link rel="alternate" type="text/html" href="http://www.perkowitz.net/blog/tech/2004/05/new_coblogger_don_patterson.php" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.perkowitz.net/lib/mt/mt-atom.cgi/weblog/blog_id=4/entry_id=2223" title="New co-blogger, Don Patterson" />
    <id>tag:www.perkowitz.net,2004:/blog/t//4.2223</id>
    
    <published>2004-05-19T22:06:42Z</published>
    <updated>2006-06-06T06:55:46Z</updated>
    
    <summary>I&apos;ve added Don Patterson as a thinking machine co-blogger. He&apos;s a grad student at UW CSE, and we&apos;ve worked together on activity inference at Intel Research. I&apos;ll let him make any additional introduction if he likes....</summary>
    <author>
        <name>Mike Perkowitz</name>
        <uri>http://www.perkowitz.net</uri>
    </author>
    
    <content type="html" xml:lang="en" xml:base="http://www.perkowitz.net/blog/tech/">
        <![CDATA[<p>I've added <a href="http://www.cs.washington.edu/homes/djp3/homepage">Don Patterson</a> as a thinking machine co-blogger. He's a grad student at <a href="http://www.cs.washington.edu/">UW CSE</a>, and we've worked together on activity inference at <a href="http://seattle.intel-research.net">Intel Research</a>. I'll let him make any additional introduction if he likes.</p>]]>
        
    </content>
</entry>

</feed> 

