11 "bugs" found in 15 minutes


Jelle Hermsen

13 Jul 2011

1:28 p.m.

Hey everybody, scratch my idea to create a MAS or web crawler to automate the search. Searching manually is faster. Using the list of local authority websites I found 11 "bugs" in about 5 minutes, just going through the 'A' part.

http://www.aalten.nl/index.php?simaction=content&mediumid=1&pagid=15...
http://www.albrandswaard.nl/index.php?simaction=content&mediumid=1&p...
http://www.alkmaar.nl/eCache/23220/Aanvraagformulieren
http://www.almere.nl/dienstverlening/logo
http://www.alphenaandenrijn.nl/Smartsite.shtml?id=12248
http://www.ameland.nl/index.php?simaction=content&mediumid=1&pagid=5...
http://www.amersfoort.nl/smartsite.shtml?id=189546
http://www.amsterdam.nl/jeugd_onderwijs/onderwijsbeleid/publicaties/publicat...
http://www.annapaulowna.nl/index.php?mediumid=1&pagid=196&simaction=...
http://www.appingedam.nl/index.php?simaction=content&mediumid=1&pagi...
http://www.arnhem.nl/content.jsp?objectid=113726

Maybe we can split the ODS up alphabetically into manageable chunks and make it a distributed humanoid search system? Who's in?
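One way the alphabetical split could look, as a minimal sketch only: it assumes the ODS has already been exported to a plain list of URLs, and the function name `chunk_by_first_letter` and the `example.nl` address are made up for illustration.

```python
from collections import defaultdict
from urllib.parse import urlparse


def chunk_by_first_letter(urls):
    """Group URLs by the first letter of the host name, ignoring a leading 'www.'."""
    chunks = defaultdict(list)
    for url in urls:
        host = urlparse(url).netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        if host:  # skip entries without a host name
            chunks[host[0].upper()].append(url)
    # Return the chunks in alphabetical order so they are easy to hand out.
    return dict(sorted(chunks.items()))


if __name__ == "__main__":
    sample = [
        "http://www.arnhem.nl/content.jsp?objectid=113726",
        "http://www.example.nl/",  # hypothetical entry, not from the list
        "http://www.almere.nl/dienstverlening/logo",
    ]
    for letter, members in chunk_by_first_letter(sample).items():
        print(letter, len(members))
```

Each volunteer could then take one or more letters.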

Cheerio, Jelle

On 07/13/2011 12:36 PM, Jelle Hermsen wrote:

...

...
I still have a pretty big database of government website urls lying around somewhere, so I might be able to automate some of the "hunt":) I'm very interested.

It's attached to this e-mail. If I have some time I might quickly whip up a tiny multi agent system to automatically search the websites.


Sam Geeraerts

13 Jul

10:56 p.m.

Jelle Hermsen wrote:

...

Maybe we can split the ODS-up alphabetically into manageable chunks and make it a distributed humanoid search system? Who's in?

I can help search for issues. I think I'll leave contacting them up to you natives, though. :)

Sam Geeraerts

14 Jul

8:39 p.m.

Jelle Hermsen wrote:

...

Maybe we can split the ODS-up alphabetically into manageable chunks and make it a distributed humanoid search system? Who's in?

I can help search for issues. I think I'll leave contacting them up to you natives, though. :)

Matthias Kirschner

15 Jul

2:20 p.m.

* Jelle Hermsen jelle@fsfe.org [2011-07-13 13:28:26 +0200]:

...

Scratch my idea to create a MAS, or webcrawler to automate the search. Searching manually is faster.

I don't know if the attached script might help with that. Would be interested in your view.

Regards, Matthias

-- Matthias Kirschner - FSFE - Fellowship Coordinator, German Coordinator Free Software is important to you? Join today! (fsfe.org/join) Weblog (blogs.fsfe.org/mk) - Contact (fsfe.org/about/kirschner)

Jelle Hermsen

2:50 p.m.

Thanks Matthias!

Yep, that script would definitely do the job. One problem is that it uses the Google search API, and that bugger has a bit of a nasty EULA, which only allows you to do 100 or so automated queries per day. That's not really a big problem in this case, but since we'll also need to find the information so we can contact the specific local government in question there's already a bit of manual labor involved. Adding a manual search using "site:" doesn't make the manual part that much bigger.

I do however see that this might be perceived as a "brain-dead" job. Doesn't really worry me much. I have worked as a code-monkey for 2 years at a firm, banging out Typo3 and Joomla sites round the clock, so "brain-dead" doesn't really bother me much :-) But anyone who wants to chip in with half a brain can use this script. I tried it and it works pretty well. It gives an error when it doesn't find anything, but I guess the programmer was in a bit of a pessimistic mood when he made it ;)

It needs the simplejson module. To install this on Debian Squeeze you can use:

sudo aptitude install python-simplejson

You can run it like this (I know Almere's website has Adobe Reader advertisements on it):

python find-acrobat-commercial.py almere.nl

Cheerio, Jelle

On 07/15/2011 02:20 PM, Matthias Kirschner wrote:

...

Jelle Hermsen jelle@fsfe.org [2011-07-13 13:28:26 +0200]:

...
Scratch my idea to create a MAS, or webcrawler to automate the search. Searching manually is faster.

I don't know if the attached script might help with that. Would be interested in your view.

Regards, Matthias

Matthias Kirschner

2:54 p.m.

Hi Jelle,

just a short reply:

* Jelle Hermsen jelle@fsfe.org [2011-07-15 14:50:54 +0200]:

...

I do however see that this might be perceived as a "brain-dead" job.

I think the most important thing is that we fix the http://fsfe.org/campaigns/pdfreaders/buglist.en.html and then continue with others whenever we have fun doing so (e.g. at a bug fixing party with drinks and pizza :) ).

All the best, Matthias

Jelle Hermsen

2:58 p.m.

...

I think most important is that we fix the http://fsfe.org/campaigns/pdfreaders/buglist.en.html and than continue with others whenever we have fun doing so (e.g. at a bug fixing party with drinks and pizza :) )

Yeah, you're probably right. I have been known for jumping the gun :)

Sam Geeraerts

19 Jul

5:22 p.m.

Matthias Kirschner wrote:

...

Jelle Hermsen jelle@fsfe.org [2011-07-13 13:28:26 +0200]:

...
Scratch my idea to create a MAS, or webcrawler to automate the search. Searching manually is faster.

I don't know if the attached script might help with that. Would be interested in your view.

Note that Adobe Acrobat Reader was renamed to Adobe Reader many years ago, so it would be useful to include a search pattern for it.

Matthias Kirschner

20 Jul

9:08 a.m.

Hi Sam,

* Sam Geeraerts samgee@fsfe.org [2011-07-19 17:22:18 +0200]:

...

...
...
Scratch my idea to create a MAS, or webcrawler to automate the search. Searching manually is faster.

I don't know if the attached script might help with that. Would be interested in your view.

Note that Adobe Acrobat Reader was renamed to Adobe Reader many years ago, so it would be useful to include a search pattern for it.

Thanks for the notice. I CCed Ole, who wrote the script.

Regards, Matthias

Ole Tange

11:05 a.m.

On Wed, Jul 20, 2011 at 9:08 AM, Matthias Kirschner mk@fsfe.org wrote:

...

Hi Sam,

Sam Geeraerts samgee@fsfe.org [2011-07-19 17:22:18 +0200]:

...
Note that Adobe Acrobat Reader was renamed to Adobe Reader many years ago, so it would be useful to include a search pattern for it.

Thanks for the notice. I CCed Ole who wrote the scrit.

According to http://en.wikipedia.org/wiki/Adobe_Acrobat the names have been Adobe Reader and Acrobat Reader. Searching for "Reader" is probably not a good idea, but doing both a search for 'Adobe' and a search for 'Acrobat' would probably make sense.

I just tested searching for Adobe and that gives a lot of false positives (Adobe Flash), however, if I instead search for "Adobe Reader" that works better.

Should we put the script on a public git repository so we can maintain it together?

/Ole

Matthias Kirschner

1:47 p.m.

* Ole Tange tange@gnu.org [2011-07-20 11:05:44 +0200]:

...

Should we put the script on a public git repository so we can maintain it together?

That sounds good.

Thanks a lot, Matthias

Matthias Kirschner

21 Jul

10:07 a.m.

Hi Ole,

* Matthias Kirschner mk@fsfe.org [2011-07-20 13:47:53 +0200]:

...

Ole Tange tange@gnu.org [2011-07-20 11:05:44 +0200]:

...
Should we put the script on a public git repository so we can maintain it together?

That sounds good.

Is it possible to generate an output text file with:

- Domain name, one or two example URLs of the advertisement

With this format it is easier for others to follow-up.

Regards, Matthias

Ole Tange

12:32 p.m.

On Thu, Jul 21, 2011 at 10:07 AM, Matthias Kirschner mk@fsfe.org wrote:

...

Matthias Kirschner mk@fsfe.org [2011-07-20 13:47:53 +0200]:

...

Ole Tange tange@gnu.org [2011-07-20 11:05:44 +0200]:

...
Should we put the script on a public git repository so we can maintain it together?

That sounds good.

I have requested a git account at Savannah, but apparently they need to manually approve it - so it may take some time.

...

Is it possible to generate a output text file with:

Domain Name, one or two example URL of the advertisement

With this format it is easier for others to follow-up.

This version outputs:

domain name \t Whether it is only in Google's cache or still exists \t URL \t title of page

It imports directly into LibreOffice as TSV, just name the output file foo.csv.
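For anyone who wants to produce or post-process the same layout, a minimal sketch of writing those four tab-separated columns with Python's `csv` module. This is not the script's actual code; the function name `write_hits`, the status label `live` and the example URL are assumptions for illustration.

```python
import csv
import io


def write_hits(rows, out):
    """Write (domain, status, url, title) tuples as tab-separated lines."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for domain, status, url, title in rows:
        writer.writerow([domain, status, url, title])


if __name__ == "__main__":
    buf = io.StringIO()
    # Hypothetical row: the status label and the URL are made up.
    write_hits([("um.dk", "live", "http://um.dk/example", "Example page")], buf)
    print(buf.getvalue(), end="")
```

Using the `csv` module rather than joining with "\t" by hand means a title that itself contains a tab or a quote is still written as one field.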

I have run it on um.dk which is attached.

/Ole

Sam Geeraerts

10:08 p.m.

Ole Tange wrote:

...

This version outputs:

domain name \t Whether it is only in Google's cache or still exists \t URL \t title of page

It imports directly into LibreOffice as TSV, just name the output file foo.csv.

I have run it on um.dk which is attached.

I suppose there's no way around the limited number of queries? I don't have a Google account and I don't plan on creating one. But 100 queries is probably enough most of the time. Can we avoid needing double the number of queries by using an OR clause in the query?

Ole Tange

22 Jul

2:15 a.m.

On Thu, Jul 21, 2011 at 10:08 PM, Sam Geeraerts samgee@fsfe.org wrote:

...

I suppose there's no way around the limited number of queries?

It is possible to raise the limit by paying.

Maybe Seeks can be used instead of Google? I, however, have no experience in programming stuff against Seeks.

...

I don't have a Google account and I don't plan on creating one. But 100 queries is probably enough most of the time. Can we undo needing double the amount of queries by using an OR clause in the query?

I could not get the OR clause to work.

However, the script now pauses, so if you run the script serially (i.e. not in parallel) then it should stay below the 100/day. So leave it running for a few days to slowly work its way through the domains.
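The pacing idea can be sketched roughly as follows. This is an illustration, not the code in the script: it assumes one query per domain and takes the 100/day figure from this thread; `run_serially` and its `search` argument are hypothetical names.

```python
import time

DAILY_LIMIT = 100  # figure quoted in this thread, not checked against the EULA
SECONDS_PER_DAY = 24 * 60 * 60


def pause_between_queries(daily_limit=DAILY_LIMIT):
    """Seconds to wait between queries so a full day holds at most daily_limit."""
    return SECONDS_PER_DAY / daily_limit


def run_serially(domains, search, sleep=time.sleep):
    """Call search(domain) for each domain, one at a time, pausing in between."""
    delay = pause_between_queries()
    results = {}
    for index, domain in enumerate(domains):
        if index:  # no pause needed before the very first query
            sleep(delay)
        results[domain] = search(domain)
    return results
```

With a limit of 100 per day the pause comes to 864 seconds, a bit over 14 minutes per query. If the script issues more than one query per domain, the pause would have to grow accordingly.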

/Ole

Sam Geeraerts

24 Jul

1:58 p.m.

Ole Tange wrote:

...

Maybe Seeks can be used instead of Google? I, however, have no experience in programing stuff against Seeks.

Me neither.

...

I could not get the OR clause to work.

My test with 'site:'+domain+' acrobat OR "adobe reader" -filetype:pdf' seemed to work.
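Wrapped in a small helper, that query string looks like the sketch below. The names `build_query` and `encoded_query` are made up here, and how the script actually passes the query to the search API is not shown in this thread, so the URL-encoding step is only an assumption.

```python
from urllib.parse import quote_plus


def build_query(domain):
    """Combine both search terms in a single query for one domain."""
    return 'site:' + domain + ' acrobat OR "adobe reader" -filetype:pdf'


def encoded_query(domain):
    """URL-encode the query so it can be used as a request parameter."""
    return quote_plus(build_query(domain))


if __name__ == "__main__":
    print(build_query("almere.nl"))
    print(encoded_query("almere.nl"))
```

One query per domain instead of two would halve the number of requests counted against the daily limit.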

...

However, the script now pauses, so if you run the script in serial (i.e not parallel) then it should stay below the 100/day. So leave running for a few days to slowly work its way through the domains.

I think it's useful to limit the number of search results per requested domain. We only really want to know if the website still has at least one issue and maybe just get a few examples to point out to the website maintainers when we contact them.

Ole Tange

9:50 p.m.

On Sun, Jul 24, 2011 at 1:58 PM, Sam Geeraerts samgee@fsfe.org wrote:

...

Ole Tange wrote:

...
Maybe Seeks can be used instead of Google? I, however, have no experience in programing stuff against Seeks.

Me neither.

I just asked the Seeks people: Seeks is a meta-frontend so it should work.

...

...
I could not get the OR clause to work.

My test with 'site:'+domain+' acrobat OR "adobe reader" -filetype:pdf' seemed to work.

Great.

...

...
However, the script now pauses, so if you run the script in serial (i.e not parallel) then it should stay below the 100/day. So leave running for a few days to slowly work its way through the domains.

I think it's useful to limit the number of search results per requested domain. We only really want to know if the website still has at least one issue and maybe just get a few examples to point out to the website maintainers when we contact them.

That is not a good service: If they fix the problem for the 2 examples and for all future pages, we will still be getting back to them next time we do this. I think it is a much better service to help them find what pages have the problem - we are most likely better at doing that than they are anyway.

/Ole

Matthias Kirschner

25 Jul

11:55 a.m.

Hi Ole,

* Ole Tange tange@gnu.org [2011-07-24 21:50:09 +0200]:

...

That is not a good service: If they fix the problem for the 2 examples and for all future pages, we will still be getting back to them next time we do this. I think it is a much better service to help them find what pages have the problem - we are most likely better at doing that than they are anyway.

Sounds reasonable, especially if we send them e-mails, where we can attach all the URLs.

Thanks, Matthias

Sam Geeraerts

1:35 p.m.

Ole Tange wrote:

...

On Sun, Jul 24, 2011 at 1:58 PM, Sam Geeraerts samgee@fsfe.org wrote:

I just asked the Seeks people: Seeks is a meta-frontend so it should work.

There seems to be an API [1].

...

...
I think it's useful to limit the number of search results per requested domain. We only really want to know if the website still has at least one issue and maybe just get a few examples to point out to the website maintainers when we contact them.

That is not a good service: If they fix the problem for the 2 examples and for all future pages, we will still be getting back to them next time we do this. I think it is a much better service to help them find what pages have the problem - we are most likely better at doing that than they are anyway.

I was assuming that they'd realize that their website can have multiple references or that the reference would be handled by one component of a CMS. Your reasoning makes sense and is safer.

[1] http://seeks-project.info/wiki/index.php/Seeks_JSON_Search_API

Ole Tange

24 Jul

10:39 p.m.

On Wed, Jul 20, 2011 at 11:05 AM, Ole Tange tange@gnu.org wrote:

...

On Wed, Jul 20, 2011 at 9:08 AM, Matthias Kirschner mk@fsfe.org wrote:

...

Should we put the script on a public git repository so we can maintain it together?

git clone git://git.savannah.nongnu.org/pdfcom.git

I found both Sam and Matthias in Savannah, so you are added as members.

/Ole

Sam Geeraerts

25 Jul

1:47 p.m.

Ole Tange wrote:

...

git clone git://git.savannah.nongnu.org/pdfcom.git

I found both Sam and Matthias in Savannah, so you are added as members.

Damn, I've been able to avoid learning git until now. I guess I'll have to look into it. :)

By the way, I should have mentioned this earlier, but the name "PDF commercial identifier" is a bit unfortunate. "Commercial" is probably meant in the sense of "advertisement", but it gives the impression that it looks for "commercial software". That suggests that only non-free software can be commercial, which is of course not true.

P.S.: I thought it was about time I subscribe to this list, but I can't find any information webpage about it. [1] or [2] would be a logical place to find something like that, IMO.

[1] http://fsfe.org/campaigns/pdfreaders/pdfreaders.en.html [2] http://lists.fsfe.org/

Matthias Kirschner

2:13 p.m.

* Sam Geeraerts samgee@fsfe.org [2011-07-25 13:47:02 +0200]:

...

Ole Tange wrote:

...
git clone git://git.savannah.nongnu.org/pdfcom.git

I found both Sam and Matthias in Savannah, so you are added as members.

Damn, I've been able to avoid learning git until now. I guess I'll have to look into it. :)

Sorry, we are forcing you to learn new stuff to participate ;)

...

By the way, I should have mentioned this earlier, but the name "PDF commercial identifier" is a bit unfortunate. "Commercial" is probably meant in the sense of "advertisement", but it gives the impression that it looks for "commercial software". That suggests that only non-free software can be commercial, which is of course not true.

That's true. Ole, is it OK if we brainstorm about a cool name? Things like GATA: "Government Ads Terminator Assistant" come to mind ;)

...

P.S.: I thought it was about time I subscribe to this list, but I can't find any information webpage about it. [1] or [2] would be a logical place to find something like that, IMO.

That's because at the beginning this was mainly a list for internal coordination. As you want to be active here, please subscribe under: https://lists.fsfe.org/mailman/listinfo/pdfreaders

All the best, Matthias

Ole Tange

3:47 p.m.

On Mon, Jul 25, 2011 at 2:13 PM, Matthias Kirschner mk@fsfe.org wrote:

...

Sam Geeraerts samgee@fsfe.org [2011-07-25 13:47:02 +0200]:

...
Ole Tange wrote:

...
git clone git://git.savannah.nongnu.org/pdfcom.git

I found both Sam and Matthias in Savannah, so you are added as members.

Damn, I've been able to avoid learning git until now. I guess I'll have to look into it. :)

Do initial checkout:

$ git clone yourlogin@git.sv.gnu.org:/srv/git/pdfcom.git

Daily work:

$ git pull
<<do your changes>>
(If others are busy checking in too, then get their changes:
$ git pull
)
$ git commit -a
$ git push

The only hard part I found, coming from other VCSs, is that 'commit' does not send the changes to the server: you also have to push.

...

...
By the way, I should have mentioned this earlier, but the name "PDF commercial identifier" is a bit unfortunate. "Commercial" is probably meant in the sense of "advertisement", but it gives the impression that it looks for "commercial software". That suggests that only non-free software can be commercial, which is of course not true.

You are right: "PDF advertisement finder" would clearly be less ambiguous.

...

That's true. Ole is it ok, if we brainstrom about a cool name? Things like GATA: "Government Adds Terminator Assistant" come into my mind ;)

Sure. If I were you I would choose a name so that if people knew the function but not the name then they would be able to find it using Google. So PDF should probably be part of the name.

As I do bioinformatics and we work with DNA then the letters A, C, G, and T take on a whole new meaning - thus words containing only A, C, G, and T are always assumed to be DNA related.

/Ole

Sam Geeraerts

8:32 p.m.

Ole Tange wrote:

...

Do initial checkout:

$ git clone yourlogin@git.sv.gnu.org:/srv/git/pdfcom.git

Daily work:

$ git pull <<do your changes>> (If others are busy checking in, too. Then get their changes: $ git pull ) $ git commit -a $ git push

The only hard part I found from coming form other VCSs is that 'commit' does not send the changes to the server: You also have to push.

Cool. So, much like the bzr I'm more used to.

...

Sure. If I were you I would choose a name so that if people knew the function but not the name then they would be able to find it using Google. So PDF should probably be part of the name.

Makes sense.

Sam Geeraerts

8:29 p.m.

Matthias Kirschner wrote:

...

That's true. Ole is it ok, if we brainstrom about a cool name? Things like GATA: "Government Adds Terminator Assistant" come into my mind ;)

Yay, brainstorming! Let me go crazy for a bit:

- PRONTO: PDF Readers Ought Not To Offend
- PReSTo: PDF Reader Search Tool
- PROSIT: PDF Reader On Site Investigation Tool
- PROBE: PdfReaders.Org {Baring,Batch,Bettering,Bias-finding} Equipment, PdfReaders.Org Bug Evincer
- PDFreeder

...

That's because at the beginning this was mainly a list for internal coordination. As you want to be active here, please subscribe under: https://lists.fsfe.org/mailman/listinfo/pdfreaders

Subscription request pending.

Matthias Kirschner

26 Jul

10:03 a.m.

New subject: New list member Sam Geeraerts (was: Re: 11 "bugs" found in 15 minutes)

* Sam Geeraerts samgee@fsfe.org [2011-07-25 20:29:36 +0200]:

...

...
That's because at the beginning this was mainly a list for internal coordination. As you want to be active here, please subscribe under: https://lists.fsfe.org/mailman/listinfo/pdfreaders

Subscription request pending.

I just approved your request. Can you write a short introduction for the rest here?

Thanks, Matthias

Matthias Kirschner

10:30 a.m.

New subject: New list member Sam Geeraerts (was: Re: 11 "bugs" found in 15 minutes)

* Matthias Kirschner mk@fsfe.org [2011-07-26 10:03:50 +0200]:

...

I just approved your request. Can you write a firm introducation for the rest here?

Do you also want to have access to the pdfreaders.org website? Then you could help to improve things there when we get requests.

Thanks, Matthias

Sam Geeraerts

11:31 a.m.

New subject: New list member Sam Geeraerts

Matthias Kirschner wrote:

...

Do you also want to have access to the pdfreaders.org website. Than you could help to improve things there when we get requests.

I think I'll pass (for now) and stay in the fringes. I'm busy enough as it is. :)

Matthias Kirschner

11:53 a.m.

New subject: New list member Sam Geeraerts

* Sam Geeraerts samgee@fsfe.org [2011-07-26 11:31:08 +0200]:

...

Matthias Kirschner wrote:

...
Do you also want to have access to the pdfreaders.org website. Than you could help to improve things there when we get requests.

I think I'll pass (for now) and stay in the fringes. I'm busy enough as it is. :)

Ok. I understand.

Regards, Matthias

Sam Geeraerts

11:24 a.m.

New subject: New list member Sam Geeraerts

Matthias Kirschner wrote:

...

I just approved your request. Can you write a firm introducation for the rest here?

My name is Sam Geeraerts. I live in Belgium. I've been using free software for over 10 years. I'm one of the maintainers of gNewSense. I've been looking with interest at FSFE for some years, but I've only recently actually become a member, mainly because I never got around to doing it. The two most important FSFE activities for me personally are currently the Dutch branch and the PDFReaders campaign. It's one of the best free software campaign ideas I've seen so far.

Matthias Kirschner

10:15 a.m.

Hi Sam,

* Sam Geeraerts samgee@fsfe.org [2011-07-25 20:29:36 +0200]:

...

Matthias Kirschner wrote:

...
That's true. Ole is it ok, if we brainstrom about a cool name? Things like GATA: "Government Adds Terminator Assistant" come into my mind ;)

Yay, brainstorming! Let me go crazy for a bit:

PRONTO: PDF Readers Ought Not To Offend

PReSTo: PDF Reader Search Tool

PROSIT: PDF Reader On Site Investigation Tool

PROBE: PdfReaders.Org {Baring,Batch,Bettering,Bias-finding} Equipment, PdfReaders.Org Bug Evincer

PDFreeder

Cool, they all sound nice. I think I like PROSIT most :) What do others think? New ideas? Comments on the suggestions?

All the best, Matthias

Nicolas JEAN

5:19 p.m.

Le 26/07/2011 10:15, Matthias Kirschner a écrit :

...

Sam Geeraerts samgee@fsfe.org [2011-07-25 20:29:36 +0200]:

...
Yay, brainstorming! Let me go crazy for a bit:

PRONTO: PDF Readers Ought Not To Offend

PReSTo: PDF Reader Search Tool

PROSIT: PDF Reader On Site Investigation Tool

PROBE: PdfReaders.Org {Baring,Batch,Bettering,Bias-finding} Equipment, PdfReaders.Org Bug Evincer

PDFreeder

Cool, they all sind nice. I think I like PROSIT most :) What do other think? New ideas? Comments to the suggestions?

PROSIT is very good in my opinion too. I would put an 's' to "Reader", what do you think?

Cheers, Nico

-- Nicolas JEAN - Intern, Web coordinator Free Software Foundation Europe (http://fsfe.org) Is Free Software important to you? Join us! (http://fsfe.org/join)

Matthias Kirschner

6:05 p.m.

* Nicolas JEAN nicoulas@fsfe.org [2011-07-26 17:19:49 +0200]:

...

PROSIT is very good in my opinion too. I would put an 's' to "Reader", what do you think?

Yes, that makes sense. Matthias

pdfreaders@lists.fsfe.org

32 comments

6 participants

participants (6)

Jelle Hermsen
Matthias Kirschner
Nicolas JEAN
Ole Tange
Sam Geeraerts
Sam Geeraerts