[FNS-30] Reducing data volume requirements of FusionAnalytics

Description

Background

FusionAnalytics is a platform for simplifying complex data analysis. Part of its task is managing the storage requirements of analysis data. Typically, first installs use a “the more the better” mindset. However, busy or poorly written systems can generate a lot of data.
This technote discusses where the data comes from, what data is needed, and the different techniques for reducing data volume requirements.

Where is the data?

FusionAnalytics includes the “Analytics for FusionReactor” data analysis application. In a typical scenario, data is generated by FusionReactor, stored in log files, then sent to FusionAnalytics over an HTTP interface. FusionAnalytics stores the files, processes them into a DB, then archives them.

This gives us several points where data needs to be stored, and options for managing that incoming & stored volume.

Reducing incoming data

The first logical option is to reduce the amount of data coming into FusionAnalytics. The largest data volumes come from two log sets: the request logs and the JDBC logs. The request logs contain an entry for each web application request hitting the server. The JDBC logs contain an entry for every JDBC (SQL) query executed. Typically each request executes one or more JDBC queries, so the JDBC log is usually the largest.

Only log slow JDBC queries

Once you’re comfortable with your FusionReactor/FusionAnalytics setup, a simple method to reduce the data is to only log slow JDBC queries, because fast queries are typically of less concern for analysis.

How To?

To enable this, log in to FusionReactor and click JDBC->JDBC Settings. Then under “JDBC Logging (Log File)” enter a number in the “Only queries slower than: (ms)” box. Whilst every system is different, values of 50ms or 100ms should be a good starting point.
Documentation: https://intergral.atlassian.net/wiki/display/FR452/JDBC+Settings

  • For: Simple to implement
  • For: Requires less disk space on application server
  • For: Shorter data transfer time & less bandwidth used between FR & FA
  • For: Data still available in FR web interface
  • For: FA log file data volume requirements lowered
  • For: FA DB data volume requirements lowered
  • Against: Can hide issues with high volume of fast queries
  • Against: Fast queries won’t be logged to disk, even on the application server.
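To get a feel for how much a threshold might save, the sketch below applies a “slower than” filter to a list of query durations. The sample durations are hypothetical, and the strict “greater than” comparison is an assumption about how the threshold is applied; check the FusionReactor documentation for the exact behavior.

```python
def filter_slow_queries(durations_ms, threshold_ms):
    """Return only the durations a 'slower than' threshold would keep.

    Assumption: a query is logged when its duration is strictly
    greater than the threshold.
    """
    return [d for d in durations_ms if d > threshold_ms]


# Hypothetical sample: many fast queries and a few slow ones.
sample = [2, 3, 5, 8, 12, 40, 75, 120, 450, 3]
kept = filter_slow_queries(sample, 50)
reduction = 1 - len(kept) / len(sample)
print(f"kept {len(kept)} of {len(sample)} entries ({reduction:.0%} fewer)")
# prints: kept 3 of 10 entries (70% fewer)
```

Running the same calculation against durations taken from your own JDBC logs gives a rough idea of the saving before you change the setting.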

Don’t automatically send JDBC data to FA

An alternative (or even additional) option is to not automatically send JDBC log data to FA from FR. Doing this keeps the (typically larger) JDBC log data on the application server only. This helps by keeping the FA log archive & DB data volume requirements lower.

How To?

To enable this, log in to FusionReactor and click Plugins->Active Plugins. Then under the “FusionReactor Log Rotator” plugin, click “Configuration”.
Change the “JDBC Log” option from “Transfer and Archive” to “Archive only”.
Documentation: https://intergral.atlassian.net/wiki/display/FR452/FusionReactor+Log+Rotator+Plugin

  • For: Shorter data transfer time & less bandwidth used between FR & FA
  • For: Data still available in FR web interface
  • For: FA log file data volume requirements lowered
  • For: FA DB data volume requirements lowered
  • Against: Requires manual action to get FA to analyze JDBC data.
  • Against: Can result in the client wanting to keep log archive data on the application server for a longer period of time (e.g. trading FA server log/DB space for application server space).

Reducing stored LOG files (on the FA server)

Approximately 5-10% of all FA data volume is taken up by a) the log queue (incoming, queued & processing) and b) the log archive.

Reducing the storage requirements for incoming logs

Without altering the content of incoming logs, their size can be reduced by compressing them before they’re sent to FA. This is enabled by default, but it is worth checking that it has not been switched off.

How To?

To enable this, log in to FusionReactor and click Analytics->FusionAnalytics Targets. Ensure “Sent to targets by this instance (uncompressed)” is NOT selected.
Documentation: https://intergral.atlassian.net/wiki/display/FR452/FusionAnalytics+Settings

  • For: Simple to implement
  • For: Shorter data transfer time & less bandwidth used between FR & FA
  • For: Data still available in FR web interface
  • For: FA log file data volume requirements lowered
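Log files are repetitive text, so they usually compress well. The sketch below measures this on a made-up log line using gzip from the Python standard library. Both the log format and the use of gzip are assumptions for illustration only; this technote does not state which compression format FusionReactor uses.

```python
import gzip


def compression_ratio(raw: bytes) -> float:
    """Return compressed size divided by original size."""
    return len(gzip.compress(raw)) / len(raw)


# Hypothetical log line; real FusionReactor log formats differ.
line = b"2012-10-08 14:36:00 GET /index.cfm 200 35ms\n"
raw = line * 1000
ratio = compression_ratio(raw)
print(f"compressed size is {ratio:.1%} of the original")
```

Real logs vary more from line to line than this sample, so expect a less dramatic ratio in practice.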

Deleting old archives

Old log archives can be deleted after processing. Processed logs can instead be kept for a period of time, either for auditing or for rebuilding a broken DB. If neither of these is a concern, you can minimize the volume of archived log files kept.

How To?

Log in to FusionAnalytics:DataCollector (FADC). “Stop” the application. Click “Configure”. Under the “FusionAnalytics Processed Log File Management” header, enable the option and set how long you want to keep log files for. Alternatively, you can choose a data volume (disk space) based strategy. To keep a minimal set of logs, set the retention period to “1 day” / “1 MB”.
Documentation: https://intergral.atlassian.net/wiki/pages/viewpage.action?pageId=22479170#ApplicationConfigurationFADC-filemanagment

  • For: Simple to implement
  • For: FA log file data volume requirements lowered
  • Against: Log files no longer available for auditing or should FA DB fail
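The time-based retention strategy can be pictured as a simple age check. The sketch below is not FADC’s implementation; the function name, file names and timestamps are all hypothetical, and only the age-based strategy is shown.

```python
from datetime import datetime, timedelta


def expired_archives(archives, now, max_age_days):
    """Return the names of archives older than max_age_days.

    archives: iterable of (name, processed_at) tuples.
    """
    cutoff = now - timedelta(days=max_age_days)
    return [name for name, processed_at in archives if processed_at < cutoff]


# Hypothetical archive names and processing times.
now = datetime(2012, 10, 8, 14, 36)
archives = [
    ("request-2012-10-01.zip", datetime(2012, 10, 1, 0, 0)),
    ("request-2012-10-08.zip", datetime(2012, 10, 8, 0, 0)),
]
print(expired_archives(archives, now, max_age_days=1))
# prints: ['request-2012-10-01.zip']
```

With a one day retention period, only the archive processed within the last 24 hours survives, which is why this setting removes the audit trail almost immediately.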

Reducing DB size (FA server DB)

Deleting historical high resolution data

Typically the value of the high resolution (high detail) data degrades rapidly with time. For example, it’s rarely useful to see the exact DB queries that ran on the same day in the previous year. However, the low resolution (aggregate) data can in many ways become more useful over time (e.g. capacity planning: show me a chart of how many DB queries I’ve run over the last year).
High resolution data takes a lot of DB space. Deleting it sooner (but keeping the low resolution, aggregate data) can save a lot of disk space.
For example, on a busy system receiving over 10,000,000 requests per day, changing high-resolution storage from 1 year to 1 week gave a DB size saving of better than 98%.
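The size of that saving is roughly what simple arithmetic predicts. The sketch below is a back-of-envelope estimate that assumes the load is constant over time and that high resolution data dominates the DB size; it ignores the aggregate data that is kept. The function name is hypothetical.

```python
def high_res_saving(old_days: int, new_days: int) -> float:
    """Estimate the fraction of high resolution data removed when
    retention is shortened, assuming a constant daily volume."""
    return 1 - new_days / old_days


saving = high_res_saving(365, 7)
print(f"{saving:.1%}")
# prints: 98.1%
```

Keeping 7 days out of 365 retains about 1.9% of the high resolution data, which is consistent with the measured figure above.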

How To?

Log in to FusionAnalytics:DataCollector (FADC). “Stop” the application. Click “Configure”. Under the “FusionAnalytics DB Data Management” header, enable the option and set how long you want to keep high/low resolution data for.
Documentation: https://intergral.atlassian.net/wiki/pages/viewpage.action?pageId=22479170#ApplicationConfigurationFADC-datamanagment

  • For: Simple to implement
  • For: FA DB data volume requirements lowered
  • Against: Detailed data is only available for a shorter period of time

DB optimization

Whilst specifics are too detailed for this technote, it’s worth noting that your DB can likely be configured to optimize/return space to the host system. Some of these options would also benefit performance. For example, using our recommended SQL Server DB:

  • Set the recovery model to “Simple”
  • Set “Auto Shrink” to “True”
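On SQL Server, both settings can also be applied with ALTER DATABASE statements instead of through the management UI. The sketch below only builds the statements as text so they can be reviewed before a DBA runs them. The database name “FusionAnalytics” is a placeholder and the helper function is hypothetical; substitute the name of your own FA database.

```python
def space_saving_statements(db_name: str) -> list:
    """Build the T-SQL for the two settings listed above.

    The name is wrapped in square brackets, with any closing
    bracket doubled, so that it is treated as a single identifier.
    """
    quoted = "[" + db_name.replace("]", "]]") + "]"
    return [
        f"ALTER DATABASE {quoted} SET RECOVERY SIMPLE;",
        f"ALTER DATABASE {quoted} SET AUTO_SHRINK ON;",
    ]


# "FusionAnalytics" is a placeholder database name.
for stmt in space_saving_statements("FusionAnalytics"):
    print(stmt)
```

Note that the “Simple” recovery model removes the ability to restore to a point in time from transaction log backups, so confirm it fits your backup strategy first.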

Further details

For more information including our installation & configuration service or product consulting, please contact

Issue Details

Type: Technote
Issue Number: FNS-30
Components: DataCollector
Environment:
Resolution: Fixed
Added: 08/10/2012 14:36:00
Affects Version: 1.0.4
Fixed Version: 1.1.0
Server:
Platform:
Related Issues: None