FusionAnalytics is a platform for simplifying complex data analysis. Part of its task is managing the storage requirements of analysis data. First installs typically take a “the more the better” approach to data collection. However, busy or poorly written systems can generate a lot of data.
This technote discusses where the data comes from, what data is needed, and the different techniques for reducing data volume requirements.
FusionAnalytics includes the “Analytics for FusionReactor” data analysis application. In a typical scenario, data is generated by FusionReactor, stored in log files, then sent over an HTTP interface to FusionAnalytics. FusionAnalytics stores the files and, after processing them into a database (DB), archives them.
This gives us several points where data needs to be stored, and options for managing that incoming & stored volume.
The first logical option is to look at reducing the amount of data coming into FusionAnalytics. The largest data volumes come from two log sets: the request logs and the JDBC logs. The request logs contain an entry for each web application request hitting the server. The JDBC logs contain an entry for every JDBC (SQL) query executed. Typically, each request produces one or more JDBC query entries, so the JDBC log is usually the largest.
Once you’re comfortable with your FusionReactor/FusionAnalytics setup, a simple way to reduce the data is to log only slow JDBC queries, because fast queries are typically of less concern for analysis.
To enable this, log in to FusionReactor and click JDBC->JDBC Settings. Then, under “JDBC Logging (Log File)”, enter a number in the “Only queries slower than: (ms)” box. Whilst every system is different, values of 50 or 100 ms should be a good starting point.
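To get a feel for how much a given threshold might cut the JDBC log, the following minimal sketch counts how many entries would survive a “slower than” filter. The sample durations and the function name `estimate_log_reduction` are hypothetical and for illustration only; real figures depend entirely on your own query timings.

```python
def estimate_log_reduction(durations_ms, threshold_ms):
    """Return (kept, total, fraction_dropped) for a slow-query threshold.

    Only queries strictly slower than threshold_ms are counted as logged.
    """
    total = len(durations_ms)
    kept = sum(1 for d in durations_ms if d > threshold_ms)
    dropped = (total - kept) / total if total else 0.0
    return kept, total, dropped


# Hypothetical sample of query durations in milliseconds.
sample = [2, 5, 8, 12, 40, 55, 90, 120, 300, 1500]

for threshold in (50, 100):
    kept, total, dropped = estimate_log_reduction(sample, threshold)
    print(f"threshold={threshold}ms keeps {kept}/{total} entries "
          f"({dropped:.0%} dropped)")
```

Running the same calculation over a sample of your own query durations gives a rough idea of whether 50 ms or 100 ms is the better starting point.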
An alternative (or additional) option is to stop automatically sending JDBC log data from FusionReactor (FR) to FusionAnalytics (FA). Doing this keeps the (typically larger) JDBC log data on the application server only, which keeps the FA log archive & DB data volume requirements lower.
To enable this, log in to FusionReactor and click Plugins->Active Plugins. Then, under the “FusionReactor Log Rotator” plugin, click “Configuration”.
Change the “JDBC Log” option from “Transfer and Archive” to “Archive only”.
Approximately 5-10% of all FA data is required for a) the log queue (incoming, queued & processing) and b) the log archive.
Without altering the content of incoming logs, their size can be reduced by compressing them before they’re sent to FA. Compression is enabled by default, but it is worth verifying that it has not been turned off.
To check this, log in to FusionReactor and click Analytics->FusionAnalytics Targets. Ensure “Sent to targets by this instance (uncompressed)” is NOT selected.
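Log files tend to compress well because their lines are highly repetitive. The sketch below illustrates the effect using Python's standard gzip module. This is an assumption for illustration: this technote does not state which compression format FusionReactor uses, and the log lines are invented, not the real FusionReactor log format.

```python
import gzip


def compression_ratio(raw: bytes) -> float:
    """Return compressed size divided by original size (smaller is better)."""
    if not raw:
        return 1.0
    return len(gzip.compress(raw)) / len(raw)


# Hypothetical, repetitive log lines (not the real FusionReactor format).
lines = [
    f"2024-01-01 12:00:{i % 60:02d} SELECT * FROM orders WHERE id = {i}\n"
    for i in range(5000)
]
raw = "".join(lines).encode("utf-8")

print(f"raw size: {len(raw)} bytes, "
      f"compressed/original ratio: {compression_ratio(raw):.2f}")
```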
Old log archives can be deleted after processing. Processed logs can be kept for a period of time for auditing or for rebuilding a broken DB. If neither of these is a concern, you can minimize the volume of archived log files kept.
Log in to FusionAnalytics:DataCollector (FADC). “Stop” the application. Click “Configure”. Under the “FusionAnalytics Processed Log File Management” header, enable the option and set how long you want to keep log files for. Alternatively, you can choose a strategy based on data volume (disk space). To keep a minimal set of logs, set the retention period to “1 day” / “1 MB”.
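The difference between the time-based and the volume-based strategy can be sketched as follows. This is not how FADC implements retention; the `ArchivedLog` type, the function names and the sample files are all hypothetical, and the code only selects candidates without deleting anything.

```python
from dataclasses import dataclass


@dataclass
class ArchivedLog:
    name: str
    age_days: float
    size_bytes: int


def expired_by_age(logs, max_age_days):
    """Time-based strategy: logs older than the retention period expire."""
    return [log for log in logs if log.age_days > max_age_days]


def expired_by_size(logs, max_total_bytes):
    """Volume-based strategy: keep the newest logs that fit in the budget.

    Once a log no longer fits, it and every older log expire.
    """
    kept_bytes = 0
    over_budget = False
    expired = []
    for log in sorted(logs, key=lambda entry: entry.age_days):  # newest first
        if over_budget or kept_bytes + log.size_bytes > max_total_bytes:
            over_budget = True
            expired.append(log)
        else:
            kept_bytes += log.size_bytes
    return expired


# Hypothetical archive contents.
logs = [
    ArchivedLog("a.log", age_days=0.5, size_bytes=400_000),
    ArchivedLog("b.log", age_days=2, size_bytes=500_000),
    ArchivedLog("c.log", age_days=10, size_bytes=300_000),
]

print([log.name for log in expired_by_age(logs, max_age_days=1)])
print([log.name for log in expired_by_size(logs, max_total_bytes=1_000_000)])
```

With these sample files, a one-day period expires both older logs, whereas a 1,000,000-byte budget expires only the oldest one.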
Typically, the value of high-resolution (high-detail) data degrades rapidly with time. For example, it’s rarely useful to see the exact DB queries that ran on the same day in the previous year. However, the low-resolution (aggregate) data can in many ways become more useful (e.g. for capacity planning: show me a chart of how many DB queries I’ve run over the last year).
High-resolution data takes a lot of DB space. Deleting it sooner (while keeping the low-resolution, aggregate data) can save a lot of disk space.
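As an illustration of why aggregate data is so much smaller, the sketch below collapses per-query records into one count per day. The record layout and the function name `daily_query_counts` are assumptions for illustration, not the FusionAnalytics schema.

```python
from collections import Counter
from datetime import datetime


def daily_query_counts(records):
    """Collapse (timestamp, sql) records into a {date: query_count} mapping."""
    return Counter(timestamp.date() for timestamp, _sql in records)


# Hypothetical high-resolution records: one row per executed query.
records = [
    (datetime(2024, 1, 1, 9, 0), "SELECT * FROM orders"),
    (datetime(2024, 1, 1, 9, 5), "SELECT * FROM customers"),
    (datetime(2024, 1, 2, 14, 30), "UPDATE orders SET status = 'shipped'"),
]

for day, count in sorted(daily_query_counts(records).items()):
    print(day, count)
```

However many queries run in a day, the aggregate needs only one row for that day, which is why it stays cheap to keep for long periods.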
For example, on a busy system receiving over 10,000,000 requests per day, changing high-resolution storage from 1 year to 1 week reduced DB size by more than 98%.
Log in to FusionAnalytics:DataCollector (FADC). “Stop” the application. Click “Configure”. Under the “FusionAnalytics DB Data Management” header, enable the option and set how long you want to keep high/low resolution data for.
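A back-of-the-envelope check shows that a figure of this order is plausible. The sketch assumes a constant daily data volume and considers only the high-resolution data; the function name `high_res_saving` is hypothetical.

```python
def high_res_saving(old_retention_days, new_retention_days):
    """Fraction of high-resolution data no longer stored, assuming a
    constant daily volume."""
    return 1 - new_retention_days / old_retention_days


saving = high_res_saving(old_retention_days=365, new_retention_days=7)
print(f"Estimated high-resolution saving: {saving:.1%}")
```

Under these assumptions, keeping 7 days instead of 365 retains roughly 2% of the high-resolution rows, which matches the scale of the saving observed above.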
Whilst specifics are too detailed for this technote, it’s worth noting that your DB can likely be configured to optimize/return space to the host system. Some of these options would also benefit performance. For example, using our recommended SQL Server DB: