A log statistics system plays an important role in analyzing user behavior on a site; keyword statistics for visits referred by search engines, in particular, are a very effective source of user behavior data. As the Internet has developed over the years, web log statistics tools have become increasingly mature and feature-rich. Many of them are open source, and AWStats is one of the best.
Brief installation instructions are as follows:
Install
Download the installation package from http://sourceforge.net/projects/awstats/ and unpack it:
GNU/Linux: tar zxf awstats-version.tgz
By default, the AWStats scripts and static files are in the wwwroot directory. Deploy all the files in its cgi-bin directory to a cgi-bin/ subdirectory, for example /home/apache/cgi-bin/awstats/:
mv awstats-version/wwwroot/cgi-bin /path/to/apache/cgi-bin/awstats
Copy the icon directory and the other static file directories to the web document root, for example /home/apache/htdocs/, so that they are published.
Additional batch-update scripts are in the tools directory; they can also be placed in the cgi-bin/awstats/ directory.
Upgrade the definitions of the major domestic search engines and spiders, and install the GeoIP C library:
download it from http://www.maxmind.com/download/geoip/api/c/, then unpack, compile and install it.
Then install the Perl module with perl -MCPAN -e 'install "Geo::IP"', or the pure Perl version with perl -MCPAN -e 'install "Geo::IP::PurePerl"'.
Download the GeoIP and GeoIPCityLite data packages, then unpack them and deploy them to the awstats directory:
http://www.maxmind.com/download/geoip/database/GeoLiteCity.dat.gz
http://www.maxmind.com/download/geoip/database/GeoIP.dat.gz
Configuration
Rename the default awstats.model.conf to common.conf.
Then modify the following configuration options:
LoadPlugin="decodeutfkeys"
LoadPlugin="geoip GEOIP_STANDARD /home/apache/chedong.com/cgi-bin/awstats/GeoIP.dat"
LoadPlugin="geoip_city_maxmind GEOIP_STANDARD /home/apache/chedong.com/cgi-bin/awstats/GeoLiteCity.dat"
Create a data directory under awstats to hold the statistics output.
Then set up each site's configuration file following this example:
Include "common.conf"
LogFile="/home/apache/logs/access_log.%YYYY-24%MM-24%DD-24"
SiteDomain="www.chedong.com"
HostAliases="chedong.com"
DefaultFile="index.html"
DirData="/home/apache/cgi-bin/awstats/data/"
Content summary: an introduction to using AWStats, with some notes on improving its configuration. Fortunately, starting with AWStats version 6.3, Chinese users basically only need to enable LoadPlugin="decodeutfkeys" in the configuration file, and keyword statistics for Chinese search engines then work without problems. Currently, the "# Minor chinese search engines" section of the default definitions contains only three search engines: 'baidu.com', 'search.sina.com' and 'search.sohu.com'. A patch with definitions for the major domestic search engines and spiders is available (after unpacking it, simply overwrite the lib directory of the original program).
AWStats: Advanced Web Statistics
AWStats is a Perl-based web log analysis tool that has developed rapidly on SourceForge. Compared with Webalizer, another excellent open source log analysis tool, AWStats has the following advantages:
Friendly interface: the interface language is chosen automatically according to the browser's language (a Simplified Chinese version is available);
Reference output sample: http://www.chedong.com/cgi-bin/awstats/awstats.pl?config=chedong
Based on Perl, it handles cross-platform use very well: the system itself runs on GNU/Linux or on Windows (with ActivePerl installed), and it directly supports Apache (combined) logs and IIS logs (after the IIS log format is adjusted). Webalizer also has a Windows version, but it currently lacks maintenance;
A single AWStats installation can therefore produce unified statistics for all the web servers of one site, both GNU/Linux/Apache and Windows/IIS;
Relatively high efficiency: AWStats produces far more statistics than Webalizer, yet still runs at about 1/3 of Webalizer's speed, which is sufficient for a site with millions of visits a day;
Convenient configuration and customization: the default configuration is flexible yet sensible, so only 3 or 4 settings need to be changed before it can run, and many plug-ins are available for modification and extension;
AWStats is designed to count real "human visits" precisely, so it filters out many search engine robot visits, and its numbers may therefore be lower than those of other log statistics tools. Visits from inside the company can also be filtered out with IP filtering settings;
It provides many extended statistics: the ExtraXXXX series of settings can generate parameter analysis for specific applications, which is very useful for product analysis.
For more comparisons with other tools such as Webalizer and analog, see:
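The "human visits" filtering described above can be pictured as a per-hit check on the User-Agent and the client IP. The sketch below is a minimal illustration, not AWStats's own code: the robot signatures are a tiny hypothetical subset standing in for AWStats's much larger robot list, and 10.0.0.0/8 is an assumed internal network used as an example of the IP filtering (in the spirit of AWStats's SkipHosts setting).

```python
import ipaddress

# Hypothetical subset of robot User-Agent signatures; AWStats ships a far
# larger list, this is only for illustration.
ROBOT_SIGNATURES = ("googlebot", "baiduspider", "slurp", "msnbot")

# Assumed company-internal network whose visits should not be counted.
INTERNAL_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]


def is_robot(user_agent: str) -> bool:
    """True if the User-Agent contains one of the known robot signatures."""
    ua = user_agent.lower()
    return any(sig in ua for sig in ROBOT_SIGNATURES)


def is_internal(client_ip: str) -> bool:
    """True if the client IP belongs to one of the internal networks."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in INTERNAL_NETWORKS)


def is_human_visit(client_ip: str, user_agent: str) -> bool:
    """Keep a hit only if it is neither a robot nor an internal visit."""
    return not is_robot(user_agent) and not is_internal(client_ip)
```

Because every hit passes through such a filter before counting, the totals naturally come out lower than those of tools that count every request.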
http://awstats.sourceforge.net/#COMPARISON
AWStats installation notes
AWStats works as follows:
Log analysis: the results of analyzing the logs are archived into an AWStats database (plain text files);
Output, in one of two forms:
reading the statistics database through the CGI program;
running a background script that exports the output as static files.
The following are two examples of log statistics for a single site:
one with output through CGI on GNU/Linux;
one with static page export on Windows 2000.
Download/install
Download the installation package from http://sourceforge.net/projects/awstats/ and unpack it:
GNU/Linux: tar zxf awstats-version.tgz
By default, the AWStats scripts and static files are in the wwwroot directory. Deploy the awstats.pl program in its cgi-bin directory to /home/apache/cgi-bin/awstats/:
mv awstats-version/wwwroot/cgi-bin /path/to/apache/cgi-bin/awstats
Copy the icon directory and the other static file directories to the web document root, /home/apache/htdocs/, so that they are published.
Additional batch-update scripts are in the tools directory; they can also be placed in the cgi-bin/awstats/ directory.
Windows 2000: run in background-script mode. Unpack the package directly and move it to the D:\AWStats directory.
Copy the icon directory to the IIS publishing directory: inetpub/icon
Data source: log format and daily log rotation
For Apache, the log format is easy to set: just use the combined format. Rotating the log by day is a little more work: install the cronolog tool and configure the log to be split by day:
CustomLog "|/usr/local/sbin/cronolog /path/to/apache/logs/access_log.%Y%m%d" combined
For example: logs/access_log.20030326
If the logs are compressed, AWStats can decompress them on the fly during analysis with LogFile="gzip -d < /home/apache/logs/access_log.%YYYY-24%MM-24%DD-24.gz |".
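The gzip -d pipe simply hands AWStats the decompressed log line by line, without unpacking it to disk. For readers who want to post-process the same compressed logs themselves, a minimal Python equivalent is sketched below; read_log_lines is a hypothetical helper name, not part of AWStats.

```python
import gzip
from typing import BinaryIO, Iterator, Union


def read_log_lines(source: Union[str, BinaryIO]) -> Iterator[str]:
    """Yield the lines of a gzip-compressed log without unpacking it to disk,
    much as `gzip -d < access_log.gz |` streams them to AWStats."""
    # latin-1 maps every byte to a character, so unusual bytes in user
    # agents or referers never raise a decoding error.
    with gzip.open(source, mode="rt", encoding="latin-1") as fh:
        for line in fh:
            yield line.rstrip("\n")
```

Usage: iterate with for line in read_log_lines("/home/apache/logs/access_log.20030326.gz"), where the path is only an example following the cronolog naming above.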
For IIS: daily log rotation is already well handled by default, but the default IIS log format is not well suited to AWStats statistics.
It is therefore best to clear all the log fields and then enable exactly the following ones:
Date: date
Time: time
Client IP address: c-ip
User name: cs-username
Method: cs-method
URI stem: cs-uri-stem
Protocol status: sc-status
Bytes sent: sc-bytes
Protocol version: cs-version
User agent: cs(User-Agent)
Referer: cs(Referer)
Compared with the IIS default settings,
the following fields are removed:
Server IP address
Server port
URI query
and the following are added:
Bytes sent
Protocol version
Referer
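With these fields, each IIS log line is a space-separated record whose column order is announced by a #Fields: directive at the top of the file (W3C extended log format); IIS writes spaces inside values such as the User-Agent as "+", so splitting on spaces is safe. The sketch below is an illustration, not AWStats's own parser: it maps each line to a dictionary keyed by the field names listed above, and parse_w3c is a hypothetical name.

```python
from typing import Dict, Iterable, Iterator, List


def parse_w3c(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Parse W3C extended log lines, naming the columns from the most
    recent #Fields: directive."""
    fields: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
            continue
        if line.startswith("#"):
            continue  # other directives (#Software, #Date, ...) are ignored
        values = line.split(" ")
        # Skip lines that do not match the declared field layout.
        if fields and len(values) == len(fields):
            yield dict(zip(fields, values))
```

A log configured as above would declare "#Fields: date time c-ip cs-username cs-method cs-uri-stem sc-status sc-bytes cs-version cs(User-Agent) cs(Referer)", and each record then exposes keys such as "c-ip" and "cs(Referer)".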
Naming convention for configuration files: awstats.sitename.conf
The main AWStats program, awstats.pl, automatically loads the configuration file of a site based on the site name: awstats.sitename.conf.
For example, running ./awstats.pl -config=chedong loads the awstats.chedong.conf configuration file in the same directory.
If -config is not specified, awstats.conf in the current directory or /etc/awstats.conf is used as the default configuration file.
So it is best to rename the default awstats.model.conf to awstats.yoursite.conf, for example awstats.chedong.conf.
For statistics on multiple sites, the AWStats configuration-file inclusion feature is very useful. We can put the shared settings in one file, include it at the top of each site-specific configuration file with the Include directive (supported since version 5.4), and then override the corresponding properties of the shared configuration with further settings, for example:
Include "common.conf"
LogFile="/path/to/bbs/access_log"
SiteDomain="bbs.chedong.com"
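The include-then-override behaviour can be modelled as a small resolver: directives are read top to bottom, an Include pulls in another file's settings at that point, and any later Key="value" line replaces the earlier value. This is a simplified sketch for illustration, not AWStats's own parser. It assumes the Include "file" form, and it keeps only one value per key, whereas AWStats itself allows some keys, such as LoadPlugin, to repeat. load_config is a hypothetical name.

```python
import re
from typing import Callable, Dict

INCLUDE_RE = re.compile(r'^Include\s+"([^"]+)"')
SETTING_RE = re.compile(r'^(\w+)\s*=\s*"?([^"]*)"?')


def load_config(name: str, read: Callable[[str], str], depth: int = 0) -> Dict[str, str]:
    """Resolve a configuration file; later settings override earlier ones,
    including settings pulled in by Include."""
    if depth > 10:
        raise ValueError("Include nesting too deep")
    settings: Dict[str, str] = {}
    for raw in read(name).splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments
        inc = INCLUDE_RE.match(line)
        if inc:
            # Included settings land here; lines after the Include win.
            settings.update(load_config(inc.group(1), read, depth + 1))
            continue
        kv = SETTING_RE.match(line)
        if kv:
            settings[kv.group(1)] = kv.group(2)
    return settings
```

The read argument keeps the sketch testable; for real files, pass a function that reads the named file from disk, for example one built on pathlib.Path.read_text.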
Minimal configuration changes: LogFile, SiteDomain, LogFormat
To analyze Apache logs on GNU/Linux, only two options need to be changed: LogFile and SiteDomain.
GNU/Linux LogFile="/path/to/apache/logs/access_log.%YYYY-24%MM-24%DD-24"
Windows 2000 LogFile="d:\iis_logs\W3SVC3\ex%YY-24%MM-24%DD-24.log"
These tags expand to the year, month and day of 24 hours ago, so each run processes the previous day's log file;
SiteDomain="www.chedong.com"
The site's domain name. It is empty by default, and AWStats refuses to run while it is empty;
Analyzing IIS logs requires one more change:
LogFormat=2
The default value 1 means Apache logs; 2 means IIS logs.
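The date tags in LogFile are what let one configuration line follow a log that is rotated daily. The sketch below models the expansion, assuming (per the explanation above) that %YYYY-n, %YY-n, %MM-n and %DD-n mean the four-digit year, two-digit year, month and day of n hours before the run. AWStats documents further tags that this sketch does not handle, and expand_logfile is a hypothetical name.

```python
import re
from datetime import datetime, timedelta

# strftime directive for each date tag used in the LogFile examples.
_FORMATS = {"YYYY": "%Y", "YY": "%y", "MM": "%m", "DD": "%d"}
# YYYY is listed before YY so that the longer tag wins.
_TAG_RE = re.compile(r"%(YYYY|YY|MM|DD)-(\d+)")


def expand_logfile(pattern: str, now: datetime) -> str:
    """Replace each %TAG-n with that date component of (now - n hours)."""
    def substitute(match: "re.Match") -> str:
        tag, hours = match.group(1), int(match.group(2))
        return (now - timedelta(hours=hours)).strftime(_FORMATS[tag])
    return _TAG_RE.sub(substitute, pattern)
```

For a run at 8:10 on 27 March 2003, access_log.%YYYY-24%MM-24%DD-24 resolves to access_log.20030326, the same name cronolog gives the previous day's log, and the IIS pattern ex%YY-24%MM-24%DD-24.log resolves to ex030326.log. Because the offset is subtracted as a full timestamp, month and year boundaries are handled correctly.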
Other notes:
By default, AWStats does not filter out .swf files and counts each .swf request as a page view, so if the .swf files on your site are mainly advertisements, it is best to filter them out.
Log analysis
./awstats.pl -update -config=sitename -lang=cn
For example: ./awstats.pl -update -config=chedong
This automatically loads the configuration file awstats.chedong.conf.
Statistical output
GNU/Linux http://localhost/cgi-bin/awstats/awstats.pl?config=chedong
Windows 2000 http://localhost/awstats/awstats.chedong.html
Running log statistics automatically
On GNU/Linux, add the following crontab entry (crontab -e) to run every day at 8:10:
# update awstats
10 8 * * * (cd /path/to/apache/cgi-bin/awstats/; ./awstats.pl -update -config=chedong)
On Windows 2000, schedule the following command to run every day at 8:10:
D:\Perl\bin\perl.exe d:\AWStats\tools\awstats_buildstaticpages.pl -update -config=chedong -lang=cn -dir=c:\inetpub\awstats -awstatsprog=d:\awstats\wwwroot\cgi-bin\awstats.pl
Multi-site log statistics
AWStats comes with a batch processing tool, tools/awstats_updateall.pl, which runs the statistics for every configuration file in a directory. The remaining work is therefore mainly synchronizing the logs.
For multiple sites, many configuration options are repeated, and modifying and maintaining each configuration file separately would be very troublesome. Since version 5.4, AWStats has supported including configuration files, so we can create a shared configuration file, such as common.conf,
and then, in the configuration of each site, override only the options that differ from the shared defaults:
awstats.bbs.chedong.conf
Include "chedong.common.conf"
LogFile="/path/to/bbs_log"
SiteDomain="bbs.chedong.com"
awstats.www.chedong.conf
Include "chedong.common.conf"
LogFile="/path/to/www_log"
SiteDomain="www.chedong.com"
HostAliases="chedong.com"
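The batch step performed by tools/awstats_updateall.pl can be pictured as: find every awstats.NAME.conf in the configuration directory and run awstats.pl -update -config=NAME for each one. The sketch below only builds those command lines; it is a simplified model of the idea, not a port of awstats_updateall.pl. The function name, the default ./awstats.pl path and the skipping of the awstats.model.conf template are assumptions. Shared files such as chedong.common.conf are not picked up because their names do not start with "awstats.".

```python
import re
from pathlib import Path
from typing import List

# awstats.bbs.chedong.conf -> config name "bbs.chedong"
CONF_RE = re.compile(r"^awstats\.(.+)\.conf$")


def update_commands(conf_dir: str, awstats_pl: str = "./awstats.pl") -> List[List[str]]:
    """Build one 'awstats.pl -update -config=NAME' command per site config."""
    commands = []
    for path in sorted(Path(conf_dir).iterdir()):
        match = CONF_RE.match(path.name)
        if not match or match.group(1) == "model":
            continue  # not a site config, or the awstats.model.conf template
        commands.append([awstats_pl, "-update", f"-config={match.group(1)}"])
    return commands
```

Each command could then be run from the AWStats directory, for example with subprocess.run(cmd, cwd="/path/to/apache/cgi-bin/awstats", check=True).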
Description of statistical indicators
Visitors: counted by unique visitor IP address; one IP represents one visitor;
Visits: a visitor may come several times in one day (for example, once in the morning and once in the afternoon), so visits are counted by unique IP within a given period of time (for example, 1 hour);
Pages: the total number of plain page requests, excluding images, CSS, JavaScript files, etc.; if a page uses several frames, each frame counts as a page request;
Files: the total number of file requests from browsers, including images, CSS, JavaScript, etc. When a user requests a page that contains images and similar resources, the browser makes several file requests to the server, so the number of files is generally much larger than the number of pages;
Bytes: the total amount of data transmitted to the client;
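To make these definitions concrete, the sketch below computes visitors, visits, pages, files and bytes from already-parsed hits. It is an approximation under stated assumptions, not AWStats's exact algorithm: a new visit is assumed to start when an IP has been idle for more than one hour, and pages are told apart from files by a hypothetical extension list in the spirit of AWStats's NotPageList option, with .swf included as suggested above for ad-heavy sites. The Hit type and summarize function are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List

# Hypothetical list of extensions that count as files but not as pages.
NOT_PAGE_EXTENSIONS = {"css", "js", "gif", "jpg", "jpeg", "png", "ico", "swf"}
VISIT_TIMEOUT = timedelta(hours=1)  # assumed idle gap that ends a visit


@dataclass
class Hit:
    when: datetime
    ip: str
    uri: str
    bytes_sent: int


def is_page(uri: str) -> bool:
    """A hit is a page unless its extension is in NOT_PAGE_EXTENSIONS."""
    last = uri.rsplit("/", 1)[-1]
    ext = last.rsplit(".", 1)[-1].lower() if "." in last else ""
    return ext not in NOT_PAGE_EXTENSIONS


def summarize(hits: List[Hit]) -> Dict[str, int]:
    last_seen: Dict[str, datetime] = {}
    visits = 0
    for hit in sorted(hits, key=lambda h: h.when):
        previous = last_seen.get(hit.ip)
        if previous is None or hit.when - previous > VISIT_TIMEOUT:
            visits += 1  # first hit from this IP, or it was idle too long
        last_seen[hit.ip] = hit.when
    return {
        "visitors": len(last_seen),  # unique IPs
        "visits": visits,
        "pages": sum(1 for h in hits if is_page(h.uri)),
        "files": len(hits),  # every request counts as a file
        "bytes": sum(h.bytes_sent for h in hits),
    }
```

For example, one IP that loads a page and its logo, then comes back five hours later, counts as one visitor with two visits, two pages and three files.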
Data from the REFERER: the referer field in the log records the address of the page visited just before the requested page. So if a user reaches the site by clicking a search engine result, the log contains the address of the user's query on that search engine, and the keywords of the query can be extracted by parsing that address, for example:
2003-03-26 15:43:58 123.123.123.123 - GET /index.html 200 192 HTTP/1.1 Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+NT+5.0) http://www.google.com/search?q=chedong
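Extracting the keyword from such a referer is essentially URL query parsing. The sketch below uses Python's urllib.parse; the host-to-parameter table is a hypothetical two-entry example (Google's q parameter appears in the log line above, and wd is assumed here for Baidu), whereas AWStats keeps its own, much longer search engine definitions. parse_qs also decodes percent-encoded UTF-8 keywords, which is comparable in spirit to what the decodeutfkeys plugin does for Chinese queries. search_keywords is a hypothetical name.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Hypothetical mapping of search engine host -> keyword query parameter.
KEYWORD_PARAMS = {"www.google.com": "q", "www.baidu.com": "wd"}


def search_keywords(referer: str) -> Optional[str]:
    """Return the search phrase carried in a search engine referer, if any."""
    parts = urlsplit(referer)
    param = KEYWORD_PARAMS.get(parts.hostname or "")
    if param is None:
        return None  # not a known search engine
    # parse_qs turns '+' into spaces and decodes %XX sequences as UTF-8.
    values = parse_qs(parts.query).get(param)
    return values[0] if values else None
```

Applied to the referer in the sample line, search_keywords returns "chedong"; referers from ordinary pages return None.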
AWStats has fairly complete search engine keyphrase and keyword statistics: it can identify more than 300 robots (crawlers) worldwide, as well as most mainstream international search engines and the local-language search engines of many regions.
Hacking AWStats
Installing the geolocation plug-ins:
GeoIP and Geo::IPfree (awstats 5.5+)
GeoIP and Geo::IPfree are both free country/IP mapping tables, and they are more accurate and faster than determining the country through reverse DNS resolution. The GeoIP API is free and its default database is free, but its data update service is paid. Geo::IPfree publishes not only its code but also its database.
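A mapping table beats reverse DNS because a lookup is just a search over sorted address ranges, with no network round trip. The sketch below shows that idea with Python's bisect module. The ranges and the XA/XB/XC codes are made-up placeholders built from documentation address blocks, not real GeoIP or Geo::IPfree data, and the real libraries use their own compact file formats; country_of is a hypothetical name.

```python
import bisect
import ipaddress
from typing import List, Optional, Tuple


def _ip(text: str) -> int:
    """IPv4 address as an integer, so ranges can be compared numerically."""
    return int(ipaddress.IPv4Address(text))


# Made-up, non-overlapping ranges with placeholder codes; NOT real data.
RANGES: List[Tuple[int, int, str]] = sorted([
    (_ip("192.0.2.0"), _ip("192.0.2.255"), "XA"),
    (_ip("198.51.100.0"), _ip("198.51.100.255"), "XB"),
    (_ip("203.0.113.0"), _ip("203.0.113.255"), "XC"),
])
STARTS = [start for start, _end, _code in RANGES]


def country_of(ip: str) -> Optional[str]:
    """Binary-search the sorted table for the range that contains `ip`."""
    value = _ip(ip)
    i = bisect.bisect_right(STARTS, value) - 1  # last range starting <= ip
    if i >= 0 and value <= RANGES[i][1]:
        return RANGES[i][2]
    return None  # the address falls between or outside the known ranges
```

Each lookup costs O(log n) comparisons against an in-memory table, which is why such tables are fast enough to run on every hit in a large log.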
GeoIP installation:
First download the GeoIP C library, unpack it and run:
%./configure; make
#make install
Then download the GeoIP Perl library, unpack it and run:
%perl Makefile.PL; make
#make install
Geo::IPfree installation:
After downloading and unpacking Geo::IPfree, run:
%perl Makefile.PL
%make
#make install
Configuration: enable the GeoIP-related plug-ins in the configuration file:
LoadPlugin="geoip GEOIP_STANDARD /home/apache/chedong.com/cgi-bin/awstats/GeoIP.dat"
LoadPlugin="geoip_city_maxmind GEOIP_STANDARD /home/apache/chedong.com/cgi-bin/awstats/GeoLiteCity.dat"
MaxMind currently provides the GeoIP and GeoIPCityLite databases for free; they can be downloaded again every month from the following addresses to keep them up to date:
http://www.maxmind.com/download/geoip/database/GeoLiteCity.dat.gz
http://www.maxmind.com/download/geoip/database/GeoIP.dat.gz