1.9:Stats
Contents |
How Denora currently works
Upon installation, there is the main denora.sql file which needs to be loaded, and an ircd-specific .sql file, which adds proper mode_* fields to the chan and user tables.
We have the following tables:
Admin
- admin: denora admin users from admin.db and from config file
Network
- current: current values (users, chans, daily users, servers, opers), one record per type
- maxvalues: max values (users, channels, servers, opers), one record per type
- ctcp: global ctcp stats (version, current count, overall count)
- tld: global tld stats (country code, country name, current count, overall count)
- glines: type, user, host, setby, setat, expires, reason
- sgline: mask, reason
- sqline: mask, reason
- spamfilters: target, action, setby, expires, setat, duration, reason, regex
Servers
- server: info (server name, hops, linked to, connected time, online Y/N, lastsplit, version, uptime, motd, uline) and stats (current users, max users, max user time, ping, highest ping, max ping time, last ping time, ircop kills, server kills, opers, max opers, max oper time)
Channels
- chan: channels with basic stats (current users, max users, max user time, kick count) and information (topic, topic author, topic time) and all the mode_* fields (set to Y/N)
- chanbans: one record per ban mask / channel
- chanexcept: one record per except mask / channel
- chaninvites: one record per invite mask / channel
- chanquiet: one record per quiet mask / channel
Users
- user: users with following info: (nickname, hop count, real name, hostname, cloaked hostname, nickip, username, swhois, account (for some ircds), connectedtime, server id, server, away Y/N, away message, ctcp version, online Y/N, last quit time, last quit message, country code, country) and all the mode_* fields (set to Y/N)
Daily Stats
These tables get updated once every hour per internal cron event
- stats: stores user count (one record per day, one column per hour)
- channelstats: same as above but for channels
- serverstats: same as above but for servers
Chanstats
- cstats: contains one record per channel and type (type being day, week, month, total) with following information (amount of letters, words, lines, actions, smileys, kicks, modes, topics, last spoken time, time added) and one column per hour for written lines
- ustats: contains one record per stats username, channel (including the 'global' channel which means network-wide) and type (type being day, week, month, total) with following information (amount of letters, words, lines, actions, smileys, kicks, kicked, modes, topics, ljoin, wasted time, last spoke time, time added) and one column per hour for written lines
n-n Tables
- ison: n-n relationship which says which nick id is in which channel ip and with which channel modes (one column per mode)
- aliases: n-n relationship between nicknames and unames (stats usernames) with ignore flag
Current problems with Denora
There are a few problems with the current Denora implementation which I would like to discuss
Inconsistency
Naming of tables / columns is inconsistent across the database. This makes it a bit confusing at times when coding.
Unoptimized
There are many columns which are not well defined. For example some channel names set to varchar of length 255. Columns should have the right type and the maximum needed length set, not more. Also, I don't like it that we have one column per hour (see daily stats tables and chanstats tables) and one column per mode (see chan, user and ison tables).
Incompatible between ircds
The fact that we use one column per mode implies that the schema is not compatible between all ircds. Moving between ircds involves tinkering with the db schema.
High query load
There are a number of situations where one event triggers a series of SQL queries which, especially on larger networks, have a bad impact on performance. I will list those that come to my mind:
- When a user joins the network, this involves adding (or updating) him to the user table, alter the users record in the current table (in some cases also in the maxvalues table), alter the current users value in the server table for the user's server, updating the tld table, updating the ctcp table, and in some cases altering the ison table to update nick/uname relationships
- When a user joins a channel, this involves updating the ison table, updating the channel record in the chan table, then if the channel is chanstats enabled create 4 records in the ustats table for that channel and for global, if not already present
- When a user says something in a chanstats monitored channel, this involves updating 4 records in the cstats table and 8 records in the ustats table
How Anope could do better
As you can see above, there is some crazy shit going on in Denora. I will now try to explain how Anope can do better, and propose an optimized DB schema. There will be three things to focus on:
- Better performance
- Yet more information
- Flexibility (one scheme fits all)
The stats implementation in Anope should be divided into two modules: StatServ and ChanStats. StatServ will be collecting stats in the background (servers, channels, users, ctcp versions, countries) while ChanStats will extend BotServ in that the bot will track channel activity. This feature can be enabled per channel via a /bs command, while each user must enable it via a /ns command that allows botserv to track him. Users with chanstats disabled will be considered as anonymous in chanstats monitored channels.
Here is my proposal for the StatServ SQL schema:
Admins
Since Anope has OperServ, we do not need any special stat admin users now.
Network
- We do not need the current and maxvalues tables, since we can easily sum up the values from the other tables. This saves us 1-2 queries per user connect/quit, channel creation, and so on.
CREATE TABLE `anope_ss_clients` ( `id` int(11) NOT NULL AUTO_INCREMENT, `version` varchar(255) NOT NULL, `count` int(11) NOT NULL, `overall` int(11) NOT NULL, PRIMARY KEY (`id`), UNIQUE KEY `version` (`version`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; CREATE TABLE `anope_ss_countries` ( `id` int(11) NOT NULL AUTO_INCREMENT, `code` varchar(3) NOT NULL, `name` varchar(32) NOT NULL, `count` int(11) NOT NULL, `overall` int(11) NOT NULL, PRIMARY KEY (`id`), UNIQUE KEY `code` (`code`), UNIQUE KEY `name` (`name`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; CREATE TABLE `anope_ss_spamfilter` ( `id` int(11) NOT NULL AUTO_INCREMENT, `target` varchar(16) NOT NULL, `action` varchar(16) NOT NULL, `set_by` varchar(31) NOT NULL, `set_time` int(11) NOT NULL, `expiry_time` int(11) NOT NULL, `reason` varchar(127) NOT NULL, `regex` varchar(255) NOT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
- This is almost the same as in denora now.
CREATE TABLE `anope_ss_lines` ( `id` int(11) NOT NULL AUTO_INCREMENT, `type` varchar(2) NOT NULL, `user` varchar(31) DEFAULT NULL, `host` varchar(255) DEFAULT NULL, `mask` varchar(255) DEFAULT NULL, `set_by` varchar(255) DEFAULT NULL, `set_time` int(11) DEFAULT NULL, `expiry_time` int(11) DEFAULT NULL, `reason` varchar(255) DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
- This combines the glines, sgline and sqline tables into one. Actually this allows any type of *lines to be stored here, using the type field.
- To be honest, I don't know if the ss_lines and the ss_spamfilter features are of any meaningful use. Maybe they could be implemented as a separate module at a later point.
Servers
CREATE TABLE `anope_ss_servers` ( `id` int(11) NOT NULL AUTO_INCREMENT, `name` varchar(64) NOT NULL, `hops` tinyint(3) NOT NULL, `comment` varchar(255) NOT NULL, `linked_id` int(11) DEFAULT NULL, `link_time` int(11) DEFAULT NULL, `online` tinyint(1) NOT NULL DEFAULT '1', `split_time` int(11) DEFAULT NULL, `version` varchar(127) NOT NULL, `uptime` mediumint(9) NOT NULL, `motd` text NOT NULL, `users_current` smallint(6) NOT NULL, `users_max` smallint(6) NOT NULL, `users_max_time` int(11) NOT NULL, `ping` smallint(6) NOT NULL, `ping_max` smallint(6) NOT NULL, `ping_time` int(11) NOT NULL, `ping_max_time` int(11) NOT NULL, `ulined` tinyint(1) NOT NULL DEFAULT '0', `kills_ircops` mediumint(9) NOT NULL, `kills_server` mediumint(9) NOT NULL, `opers` tinyint(4) NOT NULL, `opers_max` tinyint(4) NOT NULL, `opers_max_time` int(11) NOT NULL, PRIMARY KEY (`id`), UNIQUE KEY `name` (`name`), KEY `linked_id` (`linked_id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
- Not much difference to Denora here.
Channels
CREATE TABLE `anope_ss_channels` ( `id` int(11) NOT NULL AUTO_INCREMENT, `name` varchar(64) NOT NULL, `users_max` smallint(6) NOT NULL DEFAULT '0', `users_max_time` int(11) NOT NULL, `topic` text, `topic_author` varchar(31) DEFAULT NULL, `topic_time` int(11) DEFAULT NULL, `kick_count` mediumint(9) NOT NULL DEFAULT '0', `ban_count` mediumint(9) NOT NULL DEFAULT '0', `modes` varchar(255) DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
- Instead of having one field per mode, we just have one modes field which contains a string of the modes. Looking for a specific mode can be done by using
... WHERE modes LIKE '%m%' - We do not have a users_current value, we can gather that by summing up the users in the channel in the anope_ss_channels_users table. This saves us a query per user join/part and one per channel on user quit.
CREATE TABLE `anope_ss_channels_users` ( `user_id` int(11) NOT NULL, `channel_id` int(11) NOT NULL, `modes` varchar(8) NOT NULL, PRIMARY KEY (`user_id`,`channel_id`), KEY `channel_id` (`channel_id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; ALTER TABLE `anope_ss_channels_users` ADD CONSTRAINT `anope_ss_channels_users_ibfk_2` FOREIGN KEY (`channel_id`) REFERENCES `anope_ss_channels` (`id`) ON DELETE CASCADE ON UPDATE CASCADE, ADD CONSTRAINT `anope_ss_channels_users_ibfk_1` FOREIGN KEY (`user_id`) REFERENCES `anope_ss_users` (`id`) ON DELETE CASCADE ON UPDATE CASCADE;
- This links the users to the channels using foreign keys, meaning that deleting a user or a channel will automatically delete them from this table without the nedd of additional queries.
- As above, only one modes field containing the +vhoaq stuff.
CREATE TABLE `anope_ss_channels_masks` ( `id` int(11) NOT NULL AUTO_INCREMENT, `channel_id` int(11) NOT NULL, `mode` char(1) NOT NULL, `mask` varchar(255) NOT NULL, PRIMARY KEY (`id`), UNIQUE KEY `channel_id_2` (`channel_id`,`mode`,`mask`), KEY `channel_id` (`channel_id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; ALTER TABLE `anope_ss_channels_masks` ADD CONSTRAINT `anope_ss_channels_masks_ibfk_1` FOREIGN KEY (`channel_id`) REFERENCES `anope_ss_channels` (`id`) ON DELETE CASCADE ON UPDATE CASCADE;
- This combines the bans / excepts / invites into one table
- Deleting a channel will automatically delete the masks, saving 3 queries.
Users
CREATE TABLE `anope_ss_users` ( `id` int(11) NOT NULL AUTO_INCREMENT, `nickname` varchar(31) NOT NULL, `nickserv_display` varchar(255) DEFAULT NULL, `server_id` int(11) DEFAULT NULL, `realname` varchar(51) NOT NULL, `username` varchar(64) NOT NULL, `hostname` varchar(64) NOT NULL, `account` varchar(100) NOT NULL, `vhost` varchar(64) NOT NULL, `swhois` varchar(255) NOT NULL, `ip` varchar(32) NOT NULL, `connect_time` int(11) NOT NULL, `away` tinyint(1) NOT NULL DEFAULT '0', `away_message` varchar(255) DEFAULT NULL, `away_time` int(11) DEFAULT NULL, `client_id` int(11) DEFAULT NULL, `online` tinyint(1) NOT NULL DEFAULT '1', `quit_time` int(11) DEFAULT NULL, `quit_message` varchar(255) DEFAULT NULL, `country_id` int(11) DEFAULT NULL, PRIMARY KEY (`id`), KEY `server_id` (`server_id`), KEY `client_id` (`client_id`), KEY `country_id` (`country_id`), KEY `nickserv_display` (`nickserv_display`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; ALTER TABLE `anope_ss_users` ADD CONSTRAINT `anope_ss_users_ibfk_4` FOREIGN KEY (`nickserv_display`) REFERENCES `anope_ns_core` (`display`) ON DELETE SET NULL ON UPDATE CASCADE, ADD CONSTRAINT `anope_ss_users_ibfk_1` FOREIGN KEY (`server_id`) REFERENCES `anope_ss_servers` (`id`) ON DELETE SET NULL ON UPDATE CASCADE, ADD CONSTRAINT `anope_ss_users_ibfk_2` FOREIGN KEY (`client_id`) REFERENCES `anope_ss_clients` (`id`) ON DELETE SET NULL ON UPDATE CASCADE, ADD CONSTRAINT `anope_ss_users_ibfk_3` FOREIGN KEY (`country_id`) REFERENCES `anope_ss_countries` (`id`) ON DELETE SET NULL ON UPDATE CASCADE;
- There is a foreign key to the nickserv display name
- Instead of saving the server name, we store the server id which is a foreign key as well
- We do not need the number of hops, we can get them from the server table
- We do not need to store country code and country name, we store the country_id from the foreign country table
- We do not store the client version, but use the client id from the client table
Hourly Stats
CREATE TABLE `anope_ss_hourly` ( `id` int(11) NOT NULL AUTO_INCREMENT, `day` date NOT NULL, `hour` tinyint(2) NOT NULL, `servers` tinyint(4) NOT NULL DEFAULT '0', `channels` smallint(6) NOT NULL DEFAULT '0', `users` mediumint(9) NOT NULL DEFAULT '0', `opers` tinyint(4) NOT NULL DEFAULT '0', PRIMARY KEY (`id`), UNIQUE KEY `day` (`day`,`hour`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
- This combines the user/channel/server hourly tables into one. This saves just 2 queries per hour :)
- This table allows for much easier and flexible stats gathering (simpler and faster queries) since we do not need to sum up 24 columns but can simply use queries like
SELECT AVG(users) WHERE YEAR(day) = 2011.
BotServ Chanstats
CREATE TABLE `anope_bs_chanstats` ( `id` int(11) NOT NULL AUTO_INCREMENT, `chanserv_name` varchar(255) NOT NULL DEFAULT , `nickserv_display` varchar(255) DEFAULT NULL, `day` date NOT NULL, `hour` tinyint(2) NOT NULL, `letters` int(10) unsigned NOT NULL DEFAULT '0', `words` int(10) unsigned NOT NULL DEFAULT '0', `lines` int(10) unsigned NOT NULL DEFAULT '0', `actions` int(10) unsigned NOT NULL DEFAULT '0', `smileys_happy` int(10) unsigned NOT NULL DEFAULT '0', `smileys_sad` int(10) unsigned NOT NULL DEFAULT '0', `smileys_other` int(10) unsigned NOT NULL DEFAULT '0', `kicks` int(10) unsigned NOT NULL DEFAULT '0', `kicked` int(10) unsigned NOT NULL DEFAULT '0', `modes` int(10) unsigned NOT NULL DEFAULT '0', `topics` int(10) unsigned NOT NULL DEFAULT '0', PRIMARY KEY (`id`), UNIQUE KEY `chanserv_name` (`chanserv_name`,`nickserv_display`,`day`,`hour`), KEY `nickserv_display` (`nickserv_display`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; ALTER TABLE `anope_bs_chanstats` ADD CONSTRAINT `anope_bs_chanstats_ibfk_2` FOREIGN KEY (`nickserv_display`) REFERENCES `anope_ns_core` (`display`) ON DELETE CASCADE ON UPDATE CASCADE, ADD CONSTRAINT `anope_bs_chanstats_ibfk_1` FOREIGN KEY (`chanserv_name`) REFERENCES `anope_cs_info` (`name`) ON DELETE CASCADE ON UPDATE CASCADE;
- This merges the ustats and cstats tables.
- Channel name and nickname are linked to the respective table. The "NULL" nickname is the anonymous one, meaning that not-registered users, or users with chanstats disabled, will be handled anonymously and all added to the same record.
- Only 1 query instead of 12 per message!! This will dramatically reduce the sql load ;)
- This allows for hourly stats, in addition to daily, weekly, monthly, yearly and total which was available before. It is totally flexible which period you want to look at, it all depends on the query you make. You could for example look at how busy the channel was the whole year between 20:00 and 21:00.
- To get the stats for the complete channel, you would need to sum up the records. Same goes for user stats.
- I will have to evaluate if creating VIEWS, functions or stored procedures will increase performance when reading stats and which is the best way to do this. Stay tuned... :)
GeoIP
To gather country information based on the ip address, Denora makes use of the GeoIP country lite database. In Anope we could go a step further and implement the GeoIP City Lite database, which would allow us to store geolocation data for the users. That would make cool things possible like showing the users on a Google Map. The question would be: is it better to use the GeoIP library and the proprietary .db file, or shall we import their csv file to mysql and do the stuff with mysql queries? I suspect from a performance perspective using the .db files would be faster, but we should check that before implementing.