Wednesday, January 10, 2018

How to work with dates and time with Python

https://opensource.com/article/17/5/understanding-datetime-python-primer

Get a better understanding of datetime in Python with this primer.

How to work with dates and time with Python
Image by : 
Matteo Ianeselli. Modified by Opensource.com. CC-BY-3.0.
When trying to make things work with the datetime module, most Python users have faced a point when we resort to guess-and-check until the errors go away. datetime is one of those APIs that seems easy to use, but requires the developer to have a deep understanding of what a few things actually mean. Otherwise, given the complexity of date- and time-related issues, introducing unexpected bugs is easy.

Time standards

The first concept to grasp when working with time is a standard that defines how we can measure units of time. The same way we have standards to measure weight or length that define kilograms or meters, we need an accurate way to define what a 'second' means. We can then use other time references—such as days, weeks, or years—using a calendar standard as multiples of the second (see Gregorian Calendar as an example).

UT1

One of the simplest ways to measure a second is as a fraction of the day, given that we can reliably guarantee the sun will rise and set every day (in most places). This gave birth to Universal Time (UT1), the successor of GMT (Greenwich Mean Time). Today, we use stars and quasars to measure how long it takes for the Earth to perform a full rotation around the sun. Even if this seems precise enough, it still has issues; due to the gravitational pull of the moon, tides, and earthquakes, the days change length all year long. Although this is not a problem for most applications, it becomes a non-trivial problem when we require really precise measurements. GPS triangulation is a good example of a time-sensitive process, in which being a second off results in a completely different location on the globe.

TAI

As a result, the International Atomic Time (TAI) was designed to be as precise as possible. Using atomic clocks in multiple laboratories across the earth, we get the most accurate and constant measure of the second, which allows us to compute time intervals with the highest accuracy. This precision is both a blessing and a curse as TAI is so exact that it deviates from UT1 (or what we call civil time). This means that we will eventually have our clock noon deviate substantially from the solar noon.

UTC

That led to the development of Coordinated Universal Time (UTC), which brought together the best of the both units. UTC uses the measurement of a second as defined by TAI. This allows for accurate measurement of time while introducing leap seconds to ensure that the time does not deviate from UT1 by more than 0.9 seconds.

How all this plays together on your computer

With all this background, you should now be able to understand how the operating system is serving time at any given moment. While the computer doesn't have an atomic clock inside but uses an internal clock synchronized with the rest of the world via Network Time Protocol (NTP).
In Unix-like systems, the most common way to measure time is by using POSIX time, which is defined as the number of seconds that have elapsed the Unix epoch (Thursday, January 1, 1970), without taking leap seconds into account. As POSIX time does not handle leap seconds (nor does Python), some companies have defined their own way of handling time by smearing the leap second across the time around it through their NTP servers (see Google time as an example).

Time zones

Time zones maps
I've explained what UTC is and how it allows us to define dates and times, but countries like to have their wall time noon match with the solar time for noon, so the sun is at the top of the sky at 12 PM. That is why UTC defines offsets, so we can have 12 AM with an offset of +4 hours from UTC. This effectively means that the actual time without offset is 8 AM.
Governments define the standard offset from UTC that a geographical position follows, effectively creating a time zone. The most common database for time zones is known as the Olson Database. This can be retrieved in Python using dateutil.tz:
>>> from dateutil.tz import gettz >>> gettz("Europe/Madrid")
The result of gettz gives us an object that we can use to create time-zone-aware dates in Python:
>>> import datetime as dt >>> dt.datetime.now().isoformat() '2017-04-15T14:16:56.551778'  # This is a naive datetime >>> dt.datetime.now(gettz("Europe/Madrid")).isoformat() '2017-04-15T14:17:01.256587+02:00'  # This is a tz aware datetime, always prefer these
We can see how to get the current time via the now function of datetime. On the second call we pass a tzinfo object which sets the time zone and displays the offset in the ISO string representation of that datetime.
Should we want to use just plain UTC in Python 3, we don't need any external libraries:
>>> dt.datetime.now(dt.timezone.utc).isoformat() '2017-04-15T12:22:06.637355+00:00'

DST

Once we grasp all this knowledge, we might feel prepared to work with time zones, but we must be aware of one more thing that happens in some time zones: Daylight Saving Time (DST).
The countries that follow DST will move their clocks one hour forward in spring, and one hour backward in autumn to return to the standard time of the time zone. This effectively implies that a single time zone can have multiple offsets, as we can see in the following example:
>>> dt.datetime(2017, 7, 1, tzinfo=dt.timezone.utc).astimezone(gettz("Europe/Madrid")) '2017-07-01T02:00:00+02:00' >>> dt.datetime(2017, 1, 1, tzinfo=dt.timezone.utc).astimezone(gettz("Europe/Madrid")) '2017-01-01T01:00:00+01:00'
This gives us days that are made of 23 or 25 hours, resulting in really interesting time arithmetic. Depending on the time and the time zone, adding a day does not necessarily mean adding 24 hours:
>>> today = dt.datetime(2017, 10, 29, tzinfo=gettz("Europe/Madrid")) >>> tomorrow = today + dt.timedelta(days=1) >>> tomorrow.astimezone(dt.timezone.utc) - today.astimezone(dt.timezone.utc) datetime.timedelta(1, 3600)  # We've added 25 hours
When working with timestamps, the best strategy is to use non DST-aware time zones (ideally UTC+00:00).

Serializing your datetime objects

The day will come that you need to send your datetime objects in JSON and you will get the following:
>>> now = dt.datetime.now(dt.timezone.utc) >>> json.dumps(now) TypeError: Object of type 'datetime' is not JSON serializable
There are three main ways to serialize datetime in JSON:

String

datetime has two main functions to convert to and from a string given a specific format: strftime and strptime. The best way is to use the standard ISO_8601 for serializing time-related objects as string, which is done by calling isoformat on the datetime object:
>>> now = dt.datetime.now(gettz("Europe/London")) >>> now.isoformat() '2017-04-19T22:47:36.585205+01:00'
To get a datetime object from a string that was formatted using isoformat with a UTC time zone, we can rely on strptime:
>>> dt.datetime.strptime(now_str, "%Y-%m-%dT%H:%M:%S.%f+00:00").replace(tzinfo=dt.timezone.utc) datetime.datetime(2017, 4, 19, 21, 49, 5, 542320, tzinfo=datetime.timezone.utc)
In this example, we are hard-coding the offset to be UTC and then setting it once the datetime object has been created. A better way to fully parse the string including the offset is by using the external library dateutil:?
>>> from dateutil.parser import parse >>> parse('2017-04-19T21:49:05.542320+00:00') datetime.datetime(2017, 4, 19, 21, 49, 5, 542320, tzinfo=tzutc()) >>> parse('2017-04-19T21:49:05.542320+01:00') datetime.datetime(2017, 4, 19, 21, 49, 5, 542320, tzinfo=tzoffset(None, 3600))
Note, once we serialize and de serialize, we lose the time zone information and keep only the offset.

Integer

We are able to store a datetime as an integer by using the number of seconds that passed since a specific epoch (reference date). As I mentioned earlier, the most-known epoch in computer systems is the Unix epoch, which references the first second since 1970. This means that 5 represents the fifth second on January 1, 1970.
The Python standard library provides us with tools to get the current time as Unix time and to transform between datetime objects and their int representations as Unix time.
Getting the current time as an integer:
>>> import datetime as dt >>> from dateutil.tz import gettz >>> import time >>> unix_time = time.time()
Unix time to datetime:
>>> unix_time 1492636231.597816 >>> datetime = dt.datetime.fromtimestamp(unix_time, gettz("Europe/London")) >>> datetime.isoformat() '2017-04-19T22:10:31.597816+01:00'
Getting the Unix time given a datetime:
>>> time.mktime(datetime.timetuple()) 1492636231.0 >>> # or using the calendar library >>> calendar.timegm(datetime.timetuple())

Objects

The last option is to serialize the object itself as an object that will give special meaning at decoding time:
import datetime as dt from dateutil.tz import gettz, tzoffset def json_to_dt(obj):     if obj.pop('__type__', None) != "datetime":         return obj     zone, offset = obj.pop("tz")     obj["tzinfo"] = tzoffset(zone, offset)     return dt.datetime(**obj) def dt_to_json(obj):     if isinstance(obj, dt.datetime):         return {             "__type__": "datetime",             "year": obj.year,             "month" : obj.month,             "day" : obj.day,             "hour" : obj.hour,             "minute" : obj.minute,             "second" : obj.second,             "microsecond" : obj.microsecond,             "tz": (obj.tzinfo.tzname(obj), obj.utcoffset().total_seconds())         }     else:         raise TypeError("Cant serialize {}".format(obj))
Now we can encode JSON:
>>> import json >>> now = dt.datetime.now(dt.timezone.utc) >>> json.dumps(now, default=dt_to_json)  # From datetime '{"__type__": "datetime", "year": 2017, "month": 4, "day": 19, "hour": 22, "minute": 32, "second": 44, "microsecond": 778735, "tz": "UTC"}' >>> # Also works with timezones >>> now = dt.datetime.now(gettz("Europe/London")) >>> json.dumps(now, default=dt_to_json) '{"__type__": "datetime", "year": 2017, "month": 4, "day": 19, "hour": 23, "minute": 33, "second": 46, "microsecond": 681533, "tz": "BST"}'
And decode:
>>> input_json='{"__type__": "datetime", "year": 2017, "month": 4, "day": 19, "hour": 23, "minute": 33, "second": 46, "microsecond": 681533, "tz": "BST"}' >>> json.loads(input_json, object_hook=json_to_dt) datetime.datetime(2017, 4, 19, 23, 33, 46, 681533, tzinfo=tzlocal()) >>> input_json='{"__type__": "datetime", "year": 2017, "month": 4, "day": 19, "hour": 23, "minute": 33, "second": 46, "microsecond": 681533, "tz": "EST"}' >>> json.loads(input_json, object_hook=json_to_dt) datetime.datetime(2017, 4, 19, 23, 33, 46, 681533, tzinfo=tzfile('/usr/share/zoneinfo/EST')) >>> json.loads(input_json, object_hook=json_to_dt).isoformat() '2017-04-19T23:33:46.681533-05:00'

Wall times

After this, you might be tempted to convert all datetime objects to UTC and work only with UTC datetimes and fixed offsets. Even if this is by far the best approach for timestamps, it quickly breaks for future wall times.
We can distinguish two main types of time points: wall times and timestamps. Timestamps are universal points in time not related to anywhere in particular. Examples include the time a star is born or when a line is logged to a file. Things change when we speak about the time "we read on the wall clock." When we say "see you tomorrow at 2," we are not referring to UTC offsets, but to tomorrow at 2 PM in our local time zone, no matter what the offset is at this point. We cannot just map those wall times to timestamps (although we can for past ones) because, for future occurrences, countries might change their offset, which happens more frequently than you might think.
For those situations, we need to save the datetime with the time zone to which it refers, and not the offset.

Differences when working with pytz

Since Python 3.6, the recommended library to get the Olson database is dateutil.tz, but it used to be pytz.
They might seem similar, but, in some situations, their approaches to handling time zones is quite different. Getting the current time is simple as well:
>>> import pytz >>> dt.datetime.now(pytz.timezone("Europe/London")) datetime.datetime(2017, 4, 20, 0, 13, 26, 469264, tzinfo=<DstTzInfo 'Europe/London' BST+1:00:00 DST>)
A common pitfall with pytz it to pass a pytz time zone as a tzinfo attribute of a datetime:
>>> dt.datetime(2017, 5, 1, tzinfo=pytz.timezone("Europe/Helsinki")) datetime.datetime(2017, 5, 1, 0, 0, tzinfo=<DstTzInfo 'Europe/Helsinki' LMT+1:40:00 STD>) >>> pytz.timezone("Europe/Helsinki").localize(dt.datetime(2017, 5, 1), is_dst=None) datetime.datetime(2017, 5, 1, 0, tzinfo=<DstTzInfo 'Europe/Helsinki' EEST+3:00:00 DST>)
We always should call localize on the datetime objects we build. Otherwise, pytz will assign the first offset it finds for the time zone.
Another major difference can be found when performing time arithmetic. While we saw that the additions worked in dateutil as if we were adding wall time in the specified time zone, when the datetime has a pytz tzinfo instance, absolute hours are added and the caller must call normalize after the operation, as it won't handle DST changes. For example:
>>> today = dt.datetime(2017, 10, 29) >>> tz = pytz.timezone("Europe/Madrid") >>> today = tz.localize(dt.datetime(2017, 10, 29), is_dst=None) >>> tomorrow = today + dt.timedelta(days=1) >>> tomorrow datetime.datetime(2017, 10, 30, 0, 0, tzinfo=<DstTzInfo 'Europe/Madrid' CEST+2:00:00 DST>) >>> tz.normalize(tomorrow) datetime.datetime(2017, 10, 29, 23, 0, tzinfo=<DstTzInfo 'Europe/Madrid' CET+1:00:00 STD>)
Note that with the pytz tzinfo, it has added 24 absolute hours (23 hours on the wall time).
The following table resumes the way to get either wall/timestamps arithmetic with both pytz and dateutil:
  pytz dateutil
wall time obj.tzinfo.localize(obj.replace(tzinfo=None) + timedelta, is_dst=is_dst) obj + timedelta
absolute time obj.tzinfo.normalize(obj + timedelta) (obj.astimezone(pytz.utc) + timedelta).astimezone(obj.tzinfo)
Note that adding wall times can lead to unexpected results when DST changes occur.
Finally, dateutil plays nicely with the fold attribute added in PEP0495 and provides backward compatibility if you are using earlier versions of Python.

Quick tips

After all this, how should we avoid the common issues when working with time?
  • Always use time zones. Don't rely on implicit local time zones.
  • Use dateutil/pytz to handle time zones.
  • Always use UTC when working with timestamps.
  • Remember that, for some time zones, a day is not always made of 24 hours.
  • Keep your time zone database up to date.
  • Always test your code against situations such as DST changes.

Libraries worth mentioning

  • dateutil: Multiple utilities to work with time
  • freezegun: Easier testing of time-related applications
  • arrow/pendulum: Drop-in replacement of the standard datetime module
  • astropy: Useful for astronomical times and working with leap seconds
Mario Corchero will be speaking at PyCon 2017, delivering his talk, It's time for datetime, in Portland, Oregon.

No comments:

Post a Comment