root/gootrude/trunk/README

Revision 53, 3.6 kB (checked in by mbr, 4 months ago)

minor wording fix

Gootrude - The Search Engine Results Trender

Gootrude accepts a set of search terms, submits them to Yahoo and (optionally)
Google, and graphs the numeric result returned for each term over time.  This
makes changes in search results visible at a glance, and a large change can
indicate significant usage of the related search terms on high profile
websites around the Internet.  Moving averages are supported, and Gnuplot is
(currently) used as the graphing software.  Gootrude is an open source answer
to the problem that search engines do not generally expose trends in how they
build their search indexes for arbitrary search terms.  All information
collected by Gootrude is returned by Yahoo through normal usage of the Yahoo
website - Gootrude just collects this information and graphs it over time.

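As a rough illustration of the moving-average idea mentioned above, the
following shell sketch computes a simple N-point moving average with awk.
The two-column "label value" input layout and the function name moving_avg
are assumptions made for this example only; the actual layout of Gootrude's
data files and the averaging method it uses are not described in this README.

```shell
# moving_avg WINDOW: read "label value" lines on stdin and print each label
# followed by the mean of the last WINDOW values seen so far.
# NOTE: the two-column input layout is an assumption for illustration only.
moving_avg() {
    awk -v n="$1" '
        {
            i = NR % n                     # slot in the circular buffer
            if (NR > n) sum -= buf[i]      # drop the value leaving the window
            buf[i] = $2
            sum += $2
            count = (NR < n) ? NR : n      # fewer than n values at the start
            printf "%s %.2f\n", $1, sum / count
        }'
}
```

For example, feeding the three lines "d1 100", "d2 200" and "d3 600" to
moving_avg 2 prints 100.00, 150.00 and 400.00 next to the three labels.
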
Gootrude is released under the GPL as free and open source software.  Please
email Michael Rash (mbr[at]cipherdyne[dot]org) with any questions or concerns.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
WARNING:  According to the Google Terms of Service, automated web queries of
Google are not permitted.  Hence, Gootrude uses Yahoo as the default search
engine for all queries, but it supports Google searches if you choose to
enable that feature.  However, you do this EXCLUSIVELY AT YOUR OWN RISK -
YOU HAVE BEEN WARNED - Google may choose to block your queries.  That said,
keep in mind that Google is probably mostly interested in making sure that
the quality of its services is as high as possible, and so it doesn't want
huge numbers of automated queries against its infrastructure.  So, if you
choose to configure Gootrude to query Google, please make sure that Gootrude
is not run more than once per day (as per its design), and that the number of
search terms is not high (say, fewer than 20).
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

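The advice above about keeping the number of Google search terms low can be
checked mechanically.  The sketch below counts the entries in a
searchterms.conf-style file (format described under QUICK START, step 3)
whose engine field starts with "google:".  The function names
count_google_terms and warn_if_too_many are hypothetical helpers for this
example and are not part of Gootrude.

```shell
# count_google_terms FILE: count entries whose second bracketed field
# names the google engine, e.g. "[google:link]".
count_google_terms() {
    # grep -c prints 0 but exits non-zero when nothing matches; mask that.
    grep -c -E '^\[[^]]*\][[:space:]]+\[google:' "$1" || true
}

# warn_if_too_many FILE [LIMIT]: print a warning and return 1 when the
# number of Google entries reaches LIMIT (default 20, following the
# "fewer than 20" advice in the warning above).
warn_if_too_many() {
    limit=${2:-20}
    n=$(count_google_terms "$1")
    if [ "$n" -ge "$limit" ]; then
        echo "warning: $n Google search terms (keep it under $limit)" >&2
        return 1
    fi
    return 0
}
```

Running warn_if_too_many searchterms.conf before enabling Google queries
gives a quick sanity check; it does not query any search engine itself.
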
QUICK START:

The basic deployment scenario for Gootrude on a Linux system is:

0) Gootrude requires the following software to be installed:

perl, and the Date::Calc CPAN module
wget
gnuplot

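One way to confirm these prerequisites before going further is sketched
below.  The helper names missing_cmds and has_perl_module are hypothetical
and exist only in this example; they rely on the standard "command -v"
shell builtin and on perl's -M switch.

```shell
# missing_cmds CMD...: print the name of each command not found on PATH.
missing_cmds() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# has_perl_module NAME: succeed if perl can load the named module.
has_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}
```

For instance, missing_cmds perl wget gnuplot prints nothing when all three
programs are installed, and has_perl_module Date::Calc returns a non-zero
status when the module cannot be loaded.
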
1) Download and extract the Gootrude source tarball within a directory, say
/home/mbr/src:

$ cd /home/mbr/src
$ wget http://www.cipherdyne.org/gootrude/download/gootrude-0.1.tar.gz
$ tar xfz gootrude-0.1.tar.gz

2) Edit the searchterms.conf file:

$ cd gootrude-0.1
$ vim searchterms.conf

3) Within the searchterms.conf file, add the search terms that you want to
trend over time according to the following format (with the brackets):

[search term] [search_engine:type] [file]

Here are two examples in the searchterms.conf file:

[Linux "highspeed firewalls"] [yahoo:count] [Linux_highspeed_firewalls.dat]
[http://www.slashdot.org] [google:link] [slashdot_links.dat]

The first search term, [Linux "highspeed firewalls"], is input (with quotes)
into Yahoo as a normal search (as defined by [yahoo:count]), and the results
will be stored in the file [Linux_highspeed_firewalls.dat] (within the
gootrude_plot/ directory - see the GOOTRUDE_PLOT_DIR variable in the
gootrude.conf file).

The second search term, [http://www.slashdot.org], queries Google for the
number of backward links to http://www.slashdot.org (see the [google:link]
type) and places the results in the [slashdot_links.dat] file (also within
the gootrude_plot/ directory).

All Gootrude data is stored within the gootrude_plot/ directory, and .png
graphics files are also created within this directory.

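To make the three-field format from step 3 concrete, here is a minimal
sketch that splits each entry into its bracketed parts using sed.  This is
not how Gootrude itself reads the file (its parser is not shown in this
README); the function name parse_searchterms and the "|" output separator
are choices made for this example, and it assumes that no field contains a
"]" character.

```shell
# parse_searchterms: read searchterms.conf-style lines on stdin and print
# the three bracketed fields as "term|engine:type|file".
# Lines that do not match the three-field layout are silently skipped.
parse_searchterms() {
    sed -n -E 's/^\[([^]]*)\][[:space:]]+\[([^]]*)\][[:space:]]+\[([^]]*)\][[:space:]]*$/\1|\2|\3/p'
}
```

Applied to the two example entries above, parse_searchterms prints:

Linux "highspeed firewalls"|yahoo:count|Linux_highspeed_firewalls.dat
http://www.slashdot.org|google:link|slashdot_links.dat
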
4) It is recommended to run the gootrude script on a regular basis out of
cron like so (adjust the path to match the directory where Gootrude is
installed):

0 1 * * * cd /home/mbr/src/gootrude && ./gootrude

This will run Gootrude once per day at 1am.
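
Since the warning above asks that Gootrude not be run more than once per
day, a small wrapper can guard against accidental extra runs (for example,
a manual run on top of the cron job).  The sketch below is an assumption
about how such a guard could look, not a feature of Gootrude; the function
name run_once_per_day and the stamp file location are hypothetical.

```shell
# run_once_per_day STAMP_FILE CMD [ARGS...]: run CMD only if STAMP_FILE does
# not already hold today's date, and record the date when CMD succeeds.
run_once_per_day() {
    stamp=$1
    shift
    today=$(date +%Y-%m-%d)
    if [ -f "$stamp" ] && [ "$(cat "$stamp")" = "$today" ]; then
        echo "already ran today ($today); skipping" >&2
        return 0
    fi
    # Only update the stamp if the command itself succeeded.
    "$@" && echo "$today" > "$stamp"
}
```

A call such as run_once_per_day "$HOME/.gootrude_last_run" ./gootrude, made
from the Gootrude directory, would then run the script at most once per
calendar day.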