https://opensource.com/article/17/5/understanding-datetime-python-primer
Get a better understanding of datetime in Python with this primer.
When
trying to make things work with the datetime module, most Python users
have faced a point when we resort to guess-and-check until the errors go
away. datetime is one of those APIs that seems easy to use, but
requires the developer to have a deep understanding of what a few things
actually mean. Otherwise, given the complexity of date- and
time-related issues, introducing unexpected bugs is easy.
In Unix-like systems, the most common way to measure time is by using POSIX time, which is defined as the number of seconds that have elapsed the Unix epoch (Thursday, January 1, 1970), without taking leap seconds into account. As POSIX time does not handle leap seconds (nor does Python), some companies have defined their own way of handling time by smearing the leap second across the time around it through their NTP servers (see Google time as an example).
I've explained what UTC is and how it allows us to define dates and
times, but countries like to have their wall time noon match with the
solar time for noon, so the sun is at the top of the sky at 12 PM. That
is why UTC defines offsets, so we can have 12 AM with an offset of +4
hours from UTC. This effectively means that the actual time without
offset is 8 AM.
Governments define the standard offset from UTC that a geographical position follows, effectively creating a time zone. The most common database for time zones is known as the Olson Database. This can be retrieved in Python using dateutil.tz:
Should we want to use just plain UTC in Python 3, we don't need any external libraries:
The countries that follow DST will move their clocks one hour forward in spring, and one hour backward in autumn to return to the standard time of the time zone. This effectively implies that a single time zone can have multiple offsets, as we can see in the following example:
The Python standard library provides us with tools to get the current time as Unix time and to transform between datetime objects and their int representations as Unix time.
Getting the current time as an integer:
We can distinguish two main types of time points: wall times and timestamps. Timestamps are universal points in time not related to anywhere in particular. Examples include the time a star is born or when a line is logged to a file. Things change when we speak about the time "we read on the wall clock." When we say "see you tomorrow at 2," we are not referring to UTC offsets, but to tomorrow at 2 PM in our local time zone, no matter what the offset is at this point. We cannot just map those wall times to timestamps (although we can for past ones) because, for future occurrences, countries might change their offset, which happens more frequently than you might think.
For those situations, we need to save the datetime with the time zone to which it refers, and not the offset.
They might seem similar, but, in some situations, their approaches to handling time zones is quite different. Getting the current time is simple as well:
Another major difference can be found when performing time arithmetic. While we saw that the additions worked in dateutil as if we were adding wall time in the specified time zone, when the datetime has a pytz tzinfo instance, absolute hours are added and the caller must call normalize after the operation, as it won't handle DST changes. For example:
The following table resumes the way to get either wall/timestamps arithmetic with both pytz and dateutil:
Note that adding wall times can lead to unexpected results when DST changes occur.
Finally, dateutil plays nicely with the fold attribute added in PEP0495 and provides backward compatibility if you are using earlier versions of Python.
Time standards
The first concept to grasp when working with time is a standard that defines how we can measure units of time. The same way we have standards to measure weight or length that define kilograms or meters, we need an accurate way to define what a 'second' means. We can then use other time references—such as days, weeks, or years—using a calendar standard as multiples of the second (see Gregorian Calendar as an example).UT1
One of the simplest ways to measure a second is as a fraction of the day, given that we can reliably guarantee the sun will rise and set every day (in most places). This gave birth to Universal Time (UT1), the successor of GMT (Greenwich Mean Time). Today, we use stars and quasars to measure how long it takes for the Earth to perform a full rotation around the sun. Even if this seems precise enough, it still has issues; due to the gravitational pull of the moon, tides, and earthquakes, the days change length all year long. Although this is not a problem for most applications, it becomes a non-trivial problem when we require really precise measurements. GPS triangulation is a good example of a time-sensitive process, in which being a second off results in a completely different location on the globe.TAI
As a result, the International Atomic Time (TAI) was designed to be as precise as possible. Using atomic clocks in multiple laboratories across the earth, we get the most accurate and constant measure of the second, which allows us to compute time intervals with the highest accuracy. This precision is both a blessing and a curse as TAI is so exact that it deviates from UT1 (or what we call civil time). This means that we will eventually have our clock noon deviate substantially from the solar noon.UTC
That led to the development of Coordinated Universal Time (UTC), which brought together the best of the both units. UTC uses the measurement of a second as defined by TAI. This allows for accurate measurement of time while introducing leap seconds to ensure that the time does not deviate from UT1 by more than 0.9 seconds.How all this plays together on your computer
With all this background, you should now be able to understand how the operating system is serving time at any given moment. While the computer doesn't have an atomic clock inside but uses an internal clock synchronized with the rest of the world via Network Time Protocol (NTP).In Unix-like systems, the most common way to measure time is by using POSIX time, which is defined as the number of seconds that have elapsed the Unix epoch (Thursday, January 1, 1970), without taking leap seconds into account. As POSIX time does not handle leap seconds (nor does Python), some companies have defined their own way of handling time by smearing the leap second across the time around it through their NTP servers (see Google time as an example).
Time zones
Governments define the standard offset from UTC that a geographical position follows, effectively creating a time zone. The most common database for time zones is known as the Olson Database. This can be retrieved in Python using dateutil.tz:
The result of gettz gives us an object that we can use to create time-zone-aware dates in Python:>>> from dateutil.tz import gettz >>> gettz("Europe/Madrid")
We can see how to get the current time via the now function of datetime. On the second call we pass a tzinfo object which sets the time zone and displays the offset in the ISO string representation of that datetime.>>> import datetime as dt >>> dt.datetime.now().isoformat() '2017-04-15T14:16:56.551778' # This is a naive datetime >>> dt.datetime.now(gettz("Europe/Madrid")).isoformat() '2017-04-15T14:17:01.256587+02:00' # This is a tz aware datetime, always prefer these
Should we want to use just plain UTC in Python 3, we don't need any external libraries:
>>> dt.datetime.now(dt.timezone.utc).isoformat() '2017-04-15T12:22:06.637355+00:00'
DST
Once we grasp all this knowledge, we might feel prepared to work with time zones, but we must be aware of one more thing that happens in some time zones: Daylight Saving Time (DST).The countries that follow DST will move their clocks one hour forward in spring, and one hour backward in autumn to return to the standard time of the time zone. This effectively implies that a single time zone can have multiple offsets, as we can see in the following example:
This gives us days that are made of 23 or 25 hours, resulting in really interesting time arithmetic. Depending on the time and the time zone, adding a day does not necessarily mean adding 24 hours:>>> dt.datetime(2017, 7, 1, tzinfo=dt.timezone.utc).astimezone(gettz("Europe/Madrid")) '2017-07-01T02:00:00+02:00' >>> dt.datetime(2017, 1, 1, tzinfo=dt.timezone.utc).astimezone(gettz("Europe/Madrid")) '2017-01-01T01:00:00+01:00'
When working with timestamps, the best strategy is to use non DST-aware time zones (ideally UTC+00:00).>>> today = dt.datetime(2017, 10, 29, tzinfo=gettz("Europe/Madrid")) >>> tomorrow = today + dt.timedelta(days=1) >>> tomorrow.astimezone(dt.timezone.utc) - today.astimezone(dt.timezone.utc) datetime.timedelta(1, 3600) # We've added 25 hours
Serializing your datetime objects
The day will come that you need to send your datetime objects in JSON and you will get the following:There are three main ways to serialize datetime in JSON:>>> now = dt.datetime.now(dt.timezone.utc) >>> json.dumps(now) TypeError: Object of type 'datetime' is not JSON serializable
String
datetime has two main functions to convert to and from a string given a specific format: strftime and strptime. The best way is to use the standard ISO_8601 for serializing time-related objects as string, which is done by calling isoformat on the datetime object:To get a datetime object from a string that was formatted using isoformat with a UTC time zone, we can rely on strptime:>>> now = dt.datetime.now(gettz("Europe/London")) >>> now.isoformat() '2017-04-19T22:47:36.585205+01:00'
In this example, we are hard-coding the offset to be UTC and then setting it once the datetime object has been created. A better way to fully parse the string including the offset is by using the external library dateutil:?>>> dt.datetime.strptime(now_str, "%Y-%m-%dT%H:%M:%S.%f+00:00").replace(tzinfo=dt.timezone.utc) datetime.datetime(2017, 4, 19, 21, 49, 5, 542320, tzinfo=datetime.timezone.utc)
Note, once we serialize and de serialize, we lose the time zone information and keep only the offset.>>> from dateutil.parser import parse >>> parse('2017-04-19T21:49:05.542320+00:00') datetime.datetime(2017, 4, 19, 21, 49, 5, 542320, tzinfo=tzutc()) >>> parse('2017-04-19T21:49:05.542320+01:00') datetime.datetime(2017, 4, 19, 21, 49, 5, 542320, tzinfo=tzoffset(None, 3600))
Integer
We are able to store a datetime as an integer by using the number of seconds that passed since a specific epoch (reference date). As I mentioned earlier, the most-known epoch in computer systems is the Unix epoch, which references the first second since 1970. This means that 5 represents the fifth second on January 1, 1970.The Python standard library provides us with tools to get the current time as Unix time and to transform between datetime objects and their int representations as Unix time.
Getting the current time as an integer:
Unix time to datetime:>>> import datetime as dt >>> from dateutil.tz import gettz >>> import time >>> unix_time = time.time()
Getting the Unix time given a datetime:>>> unix_time 1492636231.597816 >>> datetime = dt.datetime.fromtimestamp(unix_time, gettz("Europe/London")) >>> datetime.isoformat() '2017-04-19T22:10:31.597816+01:00'
>>> time.mktime(datetime.timetuple()) 1492636231.0 >>> # or using the calendar library >>> calendar.timegm(datetime.timetuple())
Objects
The last option is to serialize the object itself as an object that will give special meaning at decoding time:Now we can encode JSON:import datetime as dt from dateutil.tz import gettz, tzoffset def json_to_dt(obj): if obj.pop('__type__', None) != "datetime": return obj zone, offset = obj.pop("tz") obj["tzinfo"] = tzoffset(zone, offset) return dt.datetime(**obj) def dt_to_json(obj): if isinstance(obj, dt.datetime): return { "__type__": "datetime", "year": obj.year, "month" : obj.month, "day" : obj.day, "hour" : obj.hour, "minute" : obj.minute, "second" : obj.second, "microsecond" : obj.microsecond, "tz": (obj.tzinfo.tzname(obj), obj.utcoffset().total_seconds()) } else: raise TypeError("Cant serialize {}".format(obj))
And decode:>>> import json >>> now = dt.datetime.now(dt.timezone.utc) >>> json.dumps(now, default=dt_to_json) # From datetime '{"__type__": "datetime", "year": 2017, "month": 4, "day": 19, "hour": 22, "minute": 32, "second": 44, "microsecond": 778735, "tz": "UTC"}' >>> # Also works with timezones >>> now = dt.datetime.now(gettz("Europe/London")) >>> json.dumps(now, default=dt_to_json) '{"__type__": "datetime", "year": 2017, "month": 4, "day": 19, "hour": 23, "minute": 33, "second": 46, "microsecond": 681533, "tz": "BST"}'
>>> input_json='{"__type__": "datetime", "year": 2017, "month": 4, "day": 19, "hour": 23, "minute": 33, "second": 46, "microsecond": 681533, "tz": "BST"}' >>> json.loads(input_json, object_hook=json_to_dt) datetime.datetime(2017, 4, 19, 23, 33, 46, 681533, tzinfo=tzlocal()) >>> input_json='{"__type__": "datetime", "year": 2017, "month": 4, "day": 19, "hour": 23, "minute": 33, "second": 46, "microsecond": 681533, "tz": "EST"}' >>> json.loads(input_json, object_hook=json_to_dt) datetime.datetime(2017, 4, 19, 23, 33, 46, 681533, tzinfo=tzfile('/usr/share/zoneinfo/EST')) >>> json.loads(input_json, object_hook=json_to_dt).isoformat() '2017-04-19T23:33:46.681533-05:00'
Wall times
After this, you might be tempted to convert all datetime objects to UTC and work only with UTC datetimes and fixed offsets. Even if this is by far the best approach for timestamps, it quickly breaks for future wall times.We can distinguish two main types of time points: wall times and timestamps. Timestamps are universal points in time not related to anywhere in particular. Examples include the time a star is born or when a line is logged to a file. Things change when we speak about the time "we read on the wall clock." When we say "see you tomorrow at 2," we are not referring to UTC offsets, but to tomorrow at 2 PM in our local time zone, no matter what the offset is at this point. We cannot just map those wall times to timestamps (although we can for past ones) because, for future occurrences, countries might change their offset, which happens more frequently than you might think.
For those situations, we need to save the datetime with the time zone to which it refers, and not the offset.
Differences when working with pytz
Since Python 3.6, the recommended library to get the Olson database is dateutil.tz, but it used to be pytz.They might seem similar, but, in some situations, their approaches to handling time zones is quite different. Getting the current time is simple as well:
A common pitfall with pytz it to pass a pytz time zone as a tzinfo attribute of a datetime:>>> import pytz >>> dt.datetime.now(pytz.timezone("Europe/London")) datetime.datetime(2017, 4, 20, 0, 13, 26, 469264, tzinfo=<DstTzInfo 'Europe/London' BST+1:00:00 DST>)
We always should call localize on the datetime objects we build. Otherwise, pytz will assign the first offset it finds for the time zone.>>> dt.datetime(2017, 5, 1, tzinfo=pytz.timezone("Europe/Helsinki")) datetime.datetime(2017, 5, 1, 0, 0, tzinfo=<DstTzInfo 'Europe/Helsinki' LMT+1:40:00 STD>) >>> pytz.timezone("Europe/Helsinki").localize(dt.datetime(2017, 5, 1), is_dst=None) datetime.datetime(2017, 5, 1, 0, tzinfo=<DstTzInfo 'Europe/Helsinki' EEST+3:00:00 DST>)
Another major difference can be found when performing time arithmetic. While we saw that the additions worked in dateutil as if we were adding wall time in the specified time zone, when the datetime has a pytz tzinfo instance, absolute hours are added and the caller must call normalize after the operation, as it won't handle DST changes. For example:
Note that with the pytz tzinfo, it has added 24 absolute hours (23 hours on the wall time).>>> today = dt.datetime(2017, 10, 29) >>> tz = pytz.timezone("Europe/Madrid") >>> today = tz.localize(dt.datetime(2017, 10, 29), is_dst=None) >>> tomorrow = today + dt.timedelta(days=1) >>> tomorrow datetime.datetime(2017, 10, 30, 0, 0, tzinfo=<DstTzInfo 'Europe/Madrid' CEST+2:00:00 DST>) >>> tz.normalize(tomorrow) datetime.datetime(2017, 10, 29, 23, 0, tzinfo=<DstTzInfo 'Europe/Madrid' CET+1:00:00 STD>)
The following table resumes the way to get either wall/timestamps arithmetic with both pytz and dateutil:
pytz | dateutil | |
wall time | obj.tzinfo.localize(obj.replace(tzinfo=None) + timedelta, is_dst=is_dst) | obj + timedelta |
absolute time | obj.tzinfo.normalize(obj + timedelta) | (obj.astimezone(pytz.utc) + timedelta).astimezone(obj.tzinfo) |
Finally, dateutil plays nicely with the fold attribute added in PEP0495 and provides backward compatibility if you are using earlier versions of Python.
Quick tips
After all this, how should we avoid the common issues when working with time?- Always use time zones. Don't rely on implicit local time zones.
- Use dateutil/pytz to handle time zones.
- Always use UTC when working with timestamps.
- Remember that, for some time zones, a day is not always made of 24 hours.
- Keep your time zone database up to date.
- Always test your code against situations such as DST changes.
Libraries worth mentioning
- dateutil: Multiple utilities to work with time
- freezegun: Easier testing of time-related applications
- arrow/pendulum: Drop-in replacement of the standard datetime module
- astropy: Useful for astronomical times and working with leap seconds
No comments:
Post a Comment