From: Portwin, K (Katie) Sent: 31 May 2007 18:10 To: Lawrence, BN (Bryan); Matthews, BM (Brian) Subject: Claddier Ping: trackback/whitelist model. Document and Questions.
I have documented the proposed technical architecture of the "ping" mechanism, as discussed at today's Claddier meeting. Please find the document attached; I would be grateful for your comments.
In particular, the last section contains some questions.
Katie
From: Bryan Lawrence [b.n.lawrence@…] Sent: 31 May 2007 22:03 To: Portwin, K (Katie) Cc: Matthews, BM (Brian) Subject: Re: Claddier Ping: trackback/whitelist model. Document and Questions.
hi Katie
Tried to edit the doc, but my version of O-O munged the figure up all over the text. So, my answers:
0) I don't think the match URL adds much value; BM may differ, but either way, why not put it in version 2 (or later :-)?
1) Two methods on the registry seem fairly obvious: (a) give me the whitelist (all of it, in XML), and (b) is this host in the whitelist (yes/no).
2) I don't really see the point in publishing an "I support trackback" document. That requires a GET, and I'll find that out straight away with a GET to the resource in question. Seems redundant.
3) Not sure I understand this one. The ping target is defined in the trackback ping itself; what value is there in adding a DC version of the thing you are pinging?
4) I did that at 0 :0-)
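Answer 1's two registry methods could look something like the following. This is a minimal Python sketch, not an agreed interface: the class name `WhitelistRegistry`, its method names, the `<whitelist>`/`<host>` element names and the example hosts are all hypothetical.

```python
import xml.etree.ElementTree as ET


class WhitelistRegistry:
    """Hypothetical central registry offering (a) the full whitelist as XML
    and (b) a yes/no membership test for a single host."""

    def __init__(self, hosts):
        # Store lower-cased hosts so comparisons are case-insensitive.
        self._hosts = {host.strip().lower() for host in hosts}

    def whitelist_xml(self) -> str:
        # Method (a): serialise every entry (element names are assumptions).
        root = ET.Element("whitelist")
        for host in sorted(self._hosts):
            ET.SubElement(root, "host").text = host
        return ET.tostring(root, encoding="unicode")

    def is_whitelisted(self, host: str) -> bool:
        # Method (b): plain yes/no answer.
        return host.strip().lower() in self._hosts


registry = WhitelistRegistry(["repo.example.org", "192.0.2.10"])
print(registry.is_whitelisted("REPO.example.org"))  # True
print(registry.whitelist_xml())
```

Method (b) is just a membership test over the same data, so a client that caches the output of (a) can answer it locally.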
Cheers Bryan
From: Portwin, K (Katie) Sent: 01 June 2007 10:25 To: Lawrence, BN (Bryan) Cc: Matthews, BM (Brian) Subject: Re: Claddier Ping: trackback/whitelist model. Document and Questions.
Hi Bryan,
Thanks for your answers.
> 1) Two methods on the registry seem fairly obvious: (a) give me the whitelist
Understood about the 2 methods a) and b).
Any thoughts on what this XML doc should look like for (a)? (Is there any existing ontology for making these kinds of declarations?)
> 2) I don't really see the point in publishing an "I support trackback" document. That
> requires a GET, and I'll find that out straight away with a GET to the
> resource in question. Seems redundant.
I was thinking that the doc would contain some information about the trackback support, e.g. here is my IP address to go on your whitelist, here is my URL prefix (if we were doing that). Otherwise, how would the central or local registry get populated?
> 3) Not sure I understand this one. The ping target is defined in the trackback
> ping itself; what value is there in adding a DC version of the thing you
> are pinging?
Understood; it is redundant in the context of the whole ping process. I guess I was just being squeamish about declarative information being encoded in the transport layer. E.g. some day someone might want to use the 'ping' contents/process/code to do something other than ping (e.g. publish it as a feed). Maybe this is feature-itis.
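For reference, a TrackBack ping is an HTTP POST of form-encoded fields (`url` is required; `title`, `excerpt` and `blog_name` are optional), and the reply is a small XML document whose `<error>` element is `0` on success. The sketch below assumes the project's overloaded ping simply adds one extra form field, here called `metadata`; that field name and the function names are hypothetical, and the HTTP request itself is omitted.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlencode


def build_ping_body(citing_url, title="", excerpt="", blog_name="", metadata=None):
    """Form-encode a trackback ping; `metadata` is the hypothetical overload."""
    fields = {"url": citing_url}  # the only field TrackBack requires
    for name, value in (("title", title), ("excerpt", excerpt), ("blog_name", blog_name)):
        if value:
            fields[name] = value  # optional fields are sent only when set
    if metadata is not None:
        fields["metadata"] = metadata
    return urlencode(fields)


def parse_ping_body(body):
    # Receiving side: keep the first value of each form field.
    return {name: values[0] for name, values in parse_qs(body).items()}


def ping_succeeded(response_xml):
    # A TrackBack response reports <error>0</error> on success.
    root = ET.fromstring(response_xml)
    return (root.findtext("error") or "").strip() == "0"
```

Keeping the extra metadata in its own field means a receiver that ignores it still sees an ordinary trackback ping.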
Katie.
From: Matthews, BM (Brian) Sent: 01 June 2007 17:26 To: Portwin, K (Katie); Lawrence, BN (Bryan) Subject: FW: Claddier Ping: trackback/whitelist model. Document and Questions.
Comments inlined prefixed with <BM>.
Original Message From: Portwin, K (Katie) Sent: 01 June 2007 10:25 To: Lawrence, BN (Bryan) Cc: Matthews, BM (Brian) Subject: Re: Claddier Ping: trackback/whitelist model. Document and Questions.
Hi Bryan,
Thanks for your answers.
> 1) Two methods on the registry seem fairly obvious: (a) give me the whitelist
Understood about the 2 methods a) and b).
Any thoughts on what this XML doc should look like for (a)? (Is there any existing ontology for making these kinds of declarations?)
<BM> A quick look on Google doesn't give much of a clue. Do they just keep IPs?
Testing on IPs sounds rather rigid - services move around from machine to machine quite often.
Surely the whitelist should hold more than acceptable IP addresses and domain names. A repository admin is going to want to know who is sending the stuff, for usage reporting at least, I would have thought.
So ultimately
- IP address
- domain name
- Repository Name
- Repository contact
- Public Key
But the last at least would be for the future
</BM>
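A whitelist entry carrying the fields listed above might be serialised along these lines. This is a sketch under assumptions: the element names (`entry`, `ip_address`, and so on) are invented for illustration rather than taken from any standard schema, and `public_key` is optional because it is "for the future".

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class WhitelistEntry:
    ip_address: str
    domain_name: str
    repository_name: str
    repository_contact: str
    public_key: Optional[str] = None  # "for the future"

    def to_element(self) -> ET.Element:
        entry = ET.Element("entry")
        for f in fields(self):
            value = getattr(self, f.name)
            if value is not None:  # unset optional fields are omitted
                ET.SubElement(entry, f.name).text = value
        return entry

    @classmethod
    def from_element(cls, entry: ET.Element) -> "WhitelistEntry":
        # findtext() returns None for a missing child, e.g. no public_key.
        return cls(**{f.name: entry.findtext(f.name) for f in fields(cls)})
```

Further optional fields can be added later with a default of `None`, which is one cheap way to leave room for extension.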
> 2) I don't really see the point in publishing an "I support trackback" document. That
> requires a GET, and I'll find that out straight away with a GET to the
> resource in question. Seems redundant.
I was thinking that the doc would contain some information about the trackback support, e.g. here is my IP address to go on your whitelist, here is my URL prefix (if we were doing that). Otherwise, how would the central or local registry get populated?
<BM> Yes this would be the info to go into the Whitelist </BM>
> 3) Not sure I understand this one. The ping target is defined in the trackback
> ping itself; what value is there in adding a DC version of the thing you
> are pinging?
Understood; it is redundant in the context of the whole ping process. I guess I was just being squeamish about declarative information being encoded in the transport layer. E.g. some day someone might want to use the 'ping' contents/process/code to do something other than ping (e.g. publish it as a feed). Maybe this is feature-itis.
<BM> With Bryan on this one. Let's keep it simple. </BM>
<BM> On item 0/4: old-fashioned s/w engineer here, so why do work if you just throw it away later? In a bandwidth-rich world I guess it does not matter, but it still strikes me as wasteful.
On the other hand, omitting it would also be attractive: you would start tracking back to repositories you didn't previously know, and so improve the sum total of cross-citations (nice). But then you would still be blocked by the whitelist (not so nice).
A more sophisticated protocol would be to send a request for permission to track back on discovering a citation from a source we have not seen before. This would need a manual check of the request to confirm its legitimacy. Maybe feature-itis again at this point. </BM>
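One reading of the permission-request idea, seen from the receiving repository: a request from an unknown source is parked as pending until an administrator has checked it by hand. The names `PermissionGate`, `Decision`, `handle_request` and `review` are hypothetical.

```python
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"
    PENDING = "pending"
    REJECT = "reject"


class PermissionGate:
    """Hypothetical receiver-side gate for 'may I track back to you?' requests."""

    def __init__(self, whitelist):
        self.whitelist = set(whitelist)
        self.pending = set()  # awaiting a manual legitimacy check
        self.rejected = set()

    def handle_request(self, source):
        if source in self.whitelist:
            return Decision.ACCEPT
        if source in self.rejected:
            return Decision.REJECT
        self.pending.add(source)  # first contact: park it for an administrator
        return Decision.PENDING

    def review(self, source, approve):
        # Called once an administrator has checked the request by hand.
        self.pending.discard(source)
        if approve:
            self.whitelist.add(source)
        else:
            self.rejected.add(source)
```

Approval simply adds the source to the whitelist, so later pings from it go through the existing check unchanged.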
<BM> Another issue we touched on yesterday in the corridor: what to do if the receiver repository is down? Do we put the ping in a pending queue and retry, or just give up? Resending is a "nice to have". </BM>
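If we do retry rather than give up, a simple policy is a bounded number of attempts with exponentially growing delays. In this sketch `send(target, body)` is a hypothetical callable returning `True` on delivery, the current time is passed in explicitly (in seconds) so the logic can be tested without waiting, and the default limits are placeholders.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class PendingPing:
    due: float  # heap ordering uses only this field
    target: str = field(compare=False)
    body: str = field(compare=False)
    attempts: int = field(default=0, compare=False)


class RetryQueue:
    """Hypothetical in-memory retry queue with exponential back-off."""

    def __init__(self, send, max_attempts=5, base_delay=60.0):
        self.send = send  # callable(target, body) -> True if delivered
        self.max_attempts = max_attempts
        self.base_delay = base_delay
        self._heap = []
        self.given_up = []

    def submit(self, target, body, now):
        heapq.heappush(self._heap, PendingPing(now, target, body))

    def run_due(self, now):
        # Attempt every ping whose due time has arrived.
        while self._heap and self._heap[0].due <= now:
            ping = heapq.heappop(self._heap)
            ping.attempts += 1
            if self.send(ping.target, ping.body):
                continue  # delivered
            if ping.attempts >= self.max_attempts:
                self.given_up.append(ping)  # stop retrying
            else:
                # Wait base_delay, 2*base_delay, 4*base_delay, ... between tries.
                ping.due = now + self.base_delay * 2 ** (ping.attempts - 1)
                heapq.heappush(self._heap, ping)
```

Capping the attempts keeps a permanently dead receiver from being pinged forever; the `given_up` list is where a "nice to have" resend could pick up later.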
Katie.
<BM> Thanks B </BM>
From: Portwin, K (Katie) Sent: 04 June 2007 11:01 To: Matthews, BM (Brian); Lawrence, BN (Bryan) Subject: RE: Claddier Ping: trackback/whitelist model. Document and Questions.
Hi Brian,
Thank you very much for your comments.
Just to summarise the position that has coalesced, then:
1. "How should the central registry implement support for query-for-whitelist"
- Support 'return the whitelist'
- Support 'is this host in the whitelist yes/no'
- Format of whitelist to include IP address, domain name, Repository Name, Repository contact, and room for extension
Just one question left: the format. Is there a standard, or shall we create our own? If the latter, shall I make an accompanying OWL and/or XSD?
2. "Does each registry also need to publish an 'I support trackback' document?"
No.
3. "In the metadata=X overloaded ping, should X contain info only about the citing resource?"
No.
4. "Requirement to match URL prefix? As above."
I think you are both agreed on No (at least, not on the grounds of wasted requests, in a bandwidth-rich world). However, a nice-to-have would be a protocol aimed at confirming the legitimacy of previously unknown citation sources. Should I put this on the "phase 2" list for now?
5. How should we handle delivery failures?
My view is that ideally we would persist a queue of undelivered letters; JMS might be a good solution. Should I put this on the "phase 2" list too?
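JMS is one option for the persistent queue. The underlying idea, that undelivered pings survive a restart and are retried later, can also be sketched with a plain JSON file. The class name, file layout and `send` callable are hypothetical; a production system would more likely use a message broker.

```python
import json
import os
import tempfile


class DeadLetterStore:
    """Hypothetical file-backed store of undelivered pings."""

    def __init__(self, path):
        self.path = path

    def load(self):
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as fh:
            return json.load(fh)

    def save(self, pings):
        # Write to a temporary file, then rename it over the old one, so a
        # crash mid-write should not leave a half-written queue behind.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(pings, fh)
        os.replace(tmp_path, self.path)

    def add(self, target, body):
        pings = self.load()
        pings.append({"target": target, "body": body})
        self.save(pings)

    def retry_all(self, send):
        # Keep only the pings that still could not be delivered.
        remaining = [p for p in self.load() if not send(p["target"], p["body"])]
        self.save(remaining)
        return remaining
```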
If you agree with the above two suggestions, I think I have solid requirements for everything except the very first question: the format of the whitelist.
Thanks,
Katie.
From: Matthews, BM (Brian) Sent: 04 June 2007 11:13 To: Portwin, K (Katie); Lawrence, BN (Bryan) Subject: RE: Claddier Ping: trackback/whitelist model. Document and Questions.
The Movable Type whitelist just seems to be a "full or partial domain or IP address on a line by itself." http://www.sixapart.com/movabletype/docs/3.2/k_preinstalled_plugins/spamlookup/lookup_whitelists.html
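A matcher for a whitelist in that one-entry-per-line style might look like this. The matching rules are assumptions, not Movable Type's documented behaviour: a domain entry also matches its sub-domains, a partial IPv4 entry such as `192.0.2.` matches on whole octets, and blank lines and `#` comments are skipped.

```python
import ipaddress


def parse_whitelist(text):
    # One entry per line; blank lines and '#' comments are skipped (assumption).
    entries = []
    for line in text.splitlines():
        line = line.strip().lower()
        if line and not line.startswith("#"):
            entries.append(line)
    return entries


def _is_ip(value):
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def host_matches(host, entry):
    host = host.strip().lower()
    if _is_ip(host):
        # Partial IPv4 entry such as "192.0.2." matches on whole octets.
        return host == entry or host.startswith(entry.rstrip(".") + ".")
    # A domain entry also matches its sub-domains ("example.org" -> "repo.example.org").
    domain = entry.lstrip(".")
    return host == domain or host.endswith("." + domain)


def is_whitelisted(host, entries):
    return any(host_matches(host, entry) for entry in entries)
```

Matching on whole labels and octets avoids accidental hits such as `evilexample.org` against `example.org`.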
So yes to OWL/XSD.
The rest seems reasonable: do a rapid development to get something going, and then add features.
B
On Friday 08 June 2007 12:33:05 Matthews, BM (Brian) wrote:
Bryan,
Talking this over with Katie yesterday, I still have reservations about the whitelist mechanism, as it uses IPs as the distinguishing characteristic. We were running into problems with this in testing: Katie, coming in via VPN, was being assigned a dynamic IP address and so failing the whitelist. IP addresses move and change quite dynamically (though arguably a repository is quite stable, you don't want to have to inform all those, potentially unknown, whitelists if you decide to change your host machine). Fundamentally, it is testing the wrong thing: the validity of a machine rather than a service. So in the long run I think there should be a different mechanism, but I am not sure what would be both flexible and hard to spoof. We'll leave it as it is for the moment, but it's worth thinking about.
Brian
From: Bryan Lawrence b.n.lawrence@… Sent: 09 June 2007 20:20 To: Matthews, BM (Brian) Cc: Portwin, K (Katie) Subject: Re: Claddier Ping: trackback/whitelist model. Document and Questions.
Fair criticisms, though perhaps of how the whitelisting is done rather than of whitelisting per se; but as you say, worth thinking about ...
Cheers Bryan
From: Portwin, K (Katie) Sent: 11 June 2007 13:10 To: Lawrence, BN (Bryan); Matthews, BM (Brian) Subject: RE: Claddier Ping: trackback/whitelist model. Document and Questions.
I was just wondering what other approaches to spam detection are available (and what approach the big blog engines are considering), and found 'pingback':
See: http://en.wikipedia.org/wiki/Pingback
I.e., when you receive a ping, first follow the URL of the supposedly citing resource and scrape that HTML page, looking for the supposed link to you.
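The pingback-style check could be sketched as below. It implements only the "does the citing page link to me?" test, not the Pingback XML-RPC protocol; the function names are hypothetical, and ignoring trailing slashes when comparing URLs is an assumption.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class _LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def page_links_to(html, target_url):
    collector = _LinkCollector()
    collector.feed(html)
    collector.close()
    wanted = target_url.rstrip("/")  # ignore trailing slashes (assumption)
    return any(href.strip().rstrip("/") == wanted for href in collector.hrefs)


def verify_ping_source(source_url, target_url, timeout=10):
    # Fetch the supposedly citing page and look for a link back to us.
    with urlopen(source_url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return page_links_to(html, target_url)
```

Note that this only establishes that a link exists; it says nothing about who actually sent the ping.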
(However, couldn't a spammer easily overcome this check by setting up a page with a link for every spam trackback they send?
Also, this approach may stop malicious spam, but it does not help with 'accidental' spam. E.g., what about genuine trackbacks from WordPress or Blogger, when someone links to our repository article in their blog post? We don't want these, but they are not 'spam' in the normal sense.
I think that perhaps this approach is focusing on linking and sidesteps the question of actually /identifying/ the ping-er, which is what we really need.)
KP.
