Uploaded image for project: 'CiviCRM'
  1. CiviCRM
  2. CRM-21079

Datetime logs don't supports timezones

    Details

    • Type: Bug
    • Status: Done/Fixed
    • Priority: Major
    • Resolution: Fixed/Completed
    • Affects Version/s: 4.7.0
    • Fix Version/s: None
    • Component/s: Core CiviCRM
    • Labels:
    • Versioning Impact:
      Patch (backwards-compatible bug fixes)
    • Documentation Required?:
      None
    • Funding Source:
      Needs Funding
    • Verified?:
      No

      Description

      DON'T PANIC.

       

      MySQL supports several ways of tracking dates and times, such as DATETIME and TIMESTAMP columns. The differences are subtle. For many organizations which work within one timezone, they're even indistinguishable. But for other organizations spread across multiple timezones, confusing DATETIME and TIMESTAMP can lead to inaccurate or confusing data.

      Unfortunately, some unknown person working on Civi v1.x in the antiquity of the early 2000's made a mistake: they flagged a column as DATETIME when the more correct and maintainable choice would have been TIMESTAMP. (They may have had reasons – such as CRM-16431, a long-standing flaw in DB_DataObject which had prevented correct handling of TIMESTAMP}}s.) Then other people saw the precedent and emulated it. Eventually, a large number of fields were created as {{DATETIME when TIMESTAMP would have been more appropriate.

      For a demonstration of what happens when these are confused, see the discussion of "civicrm_log.modified_date" in https://github.com/civicrm/org.civicrm.doctorwhen/blob/master/doc/PROBLEM.md

      Risk Profile

      There's been some awareness and some trepidation about this problem. For example, CRM-9683 and CRM-18516 fixed incorrect DATETIME fields in the context of CiviMail and group/ACL caches. But even there, it took a long time for the fix to go into the mainstream repos – and (for CiviMail) the current fix only helps new installations. Why the slow uptake? A couple reasons:

      • The problem is obscure. Most users don't look critically at log datetimes. Moreover, your own datetimes always seem accurate (and so would datetimes involving all other people in your timezone). You only get a problem when looking at datetimes generated in a different timezone.
      • The timestamp/timezone problem-space has many edges – involving SQL schema, SQL configuration, PHP configuration, CMS conventions, CMS addons, user-preferences, each organization's geography, etc. It's hard for most people to keep it all straight.
      • Third-party scripts/integrations may interpret the existing DATETIME fields in some way which would change if the database were fixed. (This doesn't mean that those scripts work correctly today. Maybe they're perfect, or maybe the symptoms are just too subtle.)

      Breadth

      All of the following tables have columns which are suspect. However, one would need to audit them individually to determine their correctness.

       * civicrm_activity
       * civicrm_batch
       * civicrm_campaign
       * civicrm_survey
       * civicrm_dashboard_contact
       * civicrm_subscription_history
       * civicrm_contribution
       * civicrm_contribution_page
       * civicrm_contribution_recur
       * civicrm_action_log
       * civicrm_custom_group
       * civicrm_dashboard
       * civicrm_email
       * civicrm_file
       * civicrm_log
       * civicrm_setting
       * civicrm_tag
       * civicrm_uf_group
       * civicrm_event
       * civicrm_participant
       * civicrm_financial_item
       * civicrm_financial_trxn
       * civicrm_payment_token
       * civicrm_mailing_abtest
       * civicrm_pledge
       * civicrm_pledge_payment
       * civicrm_price_field
       * civicrm_queue_item

       

      Resolution Process

      Generally, when updating Civi's schema, one would use an automated upgrade step. That's problematic for migrating DATETIME=>TIMESTAMP:

      • You shouldn't force a schema change in a minor upgrade – there are plenty of people happily chugging along with DATETIME. If making a change is risky, then they'd be unhappy about making the change right now. Let them do it whenever they take on another major testing-cycle.
      • You shouldn't wait until a major upgrade – if you actually collect data across many timezones, and if you want the data to be accurate, then you should go ahead and switch to TIMESTAMP. Waiting for a major upgrade would just be procrastinating.
      • At time of writing, it's not clear what data-cleanup logic makes the most sense.
      • New sites can be probably go ahead with improved schema – they don't have the cleanup questions.

      Instead, this issue proposes a more incremental, self-directed approach:

      Open Issue: Determine timezone adjustment logic

      Suppose we have a column like "civicrm_log.modified_date". It is initially flagged as DATETIME, and each value was stored relative to the active timezone for the user/session. But what happens if you change it to TIMESTAMP (which is stored relative to GMT)? Do we need to clean this data?

      • Maybe you just don't care. No one cared before today! It doesn't matter if historical records are wrong by a couple hours.
      • If all users/sessions use the same timezone, then maybe you should do a simple adjustment (adding or subtracting a fixed #hours). Or maybe that's the default behavior when you adjust the schema.
      • If users/sessions have multiple timezones, then what? Maybe:
        • Pick the most common timezone. Interpret all records as if they were part of that timezone.
        • Pick a middle-ground timezone. Interpret all records as if they were part of that timezone.
        • For each log record, look at the corresponding user's preferences and infer the most likely timezone.

      DoctorWhen is patch-welcome for implementing any of these cleanup strategies.

      Open Issue: Incrementally evaluate use-cases and update default schema

      There are currently ~30 tables with DATETIME columns. However, the code/business-processes/dataflows around each one are different – eg "civicrm_log.modified_date" and "civicrm_activity.activity_date_time" are quite different.

      For each one, I suggest this process:

      • Think about or investigate the use-case. Figure out what would make sense.
        • If keeping DATETIME makes more sense, note that here in JIRA.
        • If changing to TIMESTAMP makes more sense, proceed...
      • Prepare a new PR:
        • Update "xml/schema" and run GenCode to update the DAO. This will change the default on new deployments.
        • Update "CRM_Utils_Check_Component_Timestamps::getConvertedTimestamps()". This will display an advisory on upgraded deployments.
        • Run "Doctor When" on your dev site to change the schema.
        • Test whatever use-cases you can think of which touch this field.
        • In the PR description, document the scenarios you considered/tested.
        • (If there were things which you couldn't test, which you had doubts about, or which should be communicated to users.... document those in the PR as well.)
      • After the PR is merged and eventually published, new sites will use the updated schema, and existing deployments will be encouraged to switch.

       

      General Comments

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                Unassigned
                Reporter:
                timotten Tim Otten
              • Votes:
                1 Vote for this issue
                Watchers:
                5 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: