How to Use Python to Plot Time Series for Data Science

Updated: 2016-03-26 7:35:10
From the book: Python Essentials For Dummies

Nothing is truly static, especially in data science. When you view most data with Python, you see an instant of time — a snapshot of how the data appeared at one particular moment. Of course, such views are both common and useful. However, sometimes you need to view data as it moves through time — to see it as it changes. Only by viewing the data as it changes can you expect to understand the underlying forces that shape it.

Representing time on axes

Many times, you need to present data over time. The data could come in many forms, but generally you have some type of time tick (one unit of time), followed by one or more features that describe what happens during that particular tick. The following example shows a simple set of days and sales on those days for a particular item in whole (integer) amounts.

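import datetime as dt
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

start_date = dt.datetime(2015, 7, 1)
end_date = dt.datetime(2015, 7, 10)
daterange = pd.date_range(start_date, end_date)
rows = []
for single_date in daterange:
    # Keep the real timestamp plus a random whole-number sale (0 to 49).
    row_s = pd.Series({'Time': single_date,
                       'Sales': int(50 * np.random.rand())})
    # The string index label controls how the date appears on the plot.
    row_s.name = single_date.strftime('%b %d')
    rows.append(row_s)
# DataFrame.append() was removed in pandas 2.0, so build the frame in one step.
df = pd.DataFrame(rows)
df['Sales'] = df['Sales'].astype(int)
# loc[] replaces the ix[] indexer (removed in pandas 1.0); plot the numeric Sales column.
df.loc['Jul 01':'Jul 07', ['Sales']].plot()
plt.ylim(0, 50)
plt.xlabel('Sales Date')
plt.ylabel('Sale Value')
plt.title('Plotting Time')
plt.show()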

The example stores the information in a DataFrame. The source of the information could be anything, but the example generates it randomly. Notice that the example calls date_range() to create one timestamp per day between the starting and ending dates, which makes the data easy to process in a for loop.
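The loop adds one row at a time, which is easy to follow, but the same table can also be built in a single step. The following is a minimal sketch, assuming a recent pandas and NumPy; the seeded default_rng() generator and the seed value are choices made for this illustration and aren't part of the original example.

```python
import numpy as np
import pandas as pd

# One timestamp per day, July 1 through July 10, 2015.
daterange = pd.date_range('2015-07-01', '2015-07-10')

# Seeded generator so the sketch is repeatable (the seed is arbitrary).
rng = np.random.default_rng(0)

# Real timestamps in the Time column, string labels in the index.
df = pd.DataFrame(
    {'Time': daterange,
     'Sales': rng.integers(0, 50, size=len(daterange))},
    index=daterange.strftime('%b %d'))
print(df.head())
```

Building the frame from whole columns gives Sales an integer dtype right away, so no conversion is needed before plotting.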

An essential part of this example is the creation of individual rows. Each row has an actual time value so that you don’t lose information. However, notice that the index (row_s.name property) is a string. This string should appear in the form that you want the dates to appear when presented in the plot.

Label-based slicing lets you select a range of dates from the entries available. Older pandas releases did this with ix[]; that indexer was removed in pandas 1.0, and loc[] is its replacement. Notice that this example plots only some of the generated data (July 1 through July 7). It then adds a title and axis labels to the plot and displays it onscreen. Here’s typical output from the randomly generated data.

Use line graphs to show the flow of data over time.
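Selecting rows by label differs from ordinary Python slicing in one respect: the end label is included in the result. Here is a minimal sketch, assuming a current pandas release where loc[] is the label-based indexer; the sales figures are made up for illustration.

```python
import pandas as pd

daterange = pd.date_range('2015-07-01', '2015-07-10')
# Made-up sales figures, one per day, used only for illustration.
sales = [12, 7, 33, 21, 45, 9, 18, 27, 3, 40]
df = pd.DataFrame({'Time': daterange, 'Sales': sales},
                  index=daterange.strftime('%b %d'))

# Label slices with loc[] include both endpoints.
week = df.loc['Jul 01':'Jul 07', ['Sales']]
print(len(week))        # 7 rows: Jul 01 through Jul 07
print(week.index[-1])   # Jul 07
```

Passing the column as a list (['Sales']) keeps the result a DataFrame, so a later call to plot() still produces a legend entry for the column.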

Plotting trends over time

As with any other data presentation, sometimes you really can’t see what direction the data is headed in without help. The following example starts with the plot from above and adds a trendline to it:

import datetime as dt
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

start_date = dt.datetime(2015, 7, 1)
end_date = dt.datetime(2015, 7, 10)
daterange = pd.date_range(start_date, end_date)
rows = []
for single_date in daterange:
    # Keep the real timestamp plus a random whole-number sale (0 to 49).
    row_s = pd.Series({'Time': single_date,
                       'Sales': int(50 * np.random.rand())})
    row_s.name = single_date.strftime('%b %d')
    rows.append(row_s)
# DataFrame.append() was removed in pandas 2.0, so build the frame in one step.
df = pd.DataFrame(rows)
df['Sales'] = df['Sales'].astype(int)
# loc[] replaces the ix[] indexer, which was removed in pandas 1.0.
df.loc['Jul 01':'Jul 10', ['Sales']].plot()
# to_numpy() replaces as_matrix() (removed in pandas 1.0); a single
# column is already one-dimensional, so flatten() isn't needed.
x = np.arange(len(df))
sales = df['Sales'].to_numpy()
z = np.polyfit(x, sales, 1)
p = np.poly1d(z)
# Draw the fitted line at the same x positions used for the fit.
plt.plot(x, p(x), 'm-')
plt.ylim(0, 50)
plt.xlabel('Sales Date')
plt.ylabel('Sale Value')
plt.title('Plotting Time')
plt.legend(['Sales', 'Trend'])
plt.show()

Because the data appears within a DataFrame, you must export it as a NumPy array before you can use it as input to polyfit(). Older pandas releases did this with as_matrix() followed by flatten(); as_matrix() was removed in pandas 1.0, and to_numpy() is its replacement. Likewise, the trendline is drawn from plain NumPy arrays rather than from the DataFrame.
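To see what polyfit() and poly1d() return, it helps to fit data whose answer is known in advance. The sketch below uses hypothetical sales values that fall exactly on the line y = 3x + 5, so the fitted slope and intercept should come back as approximately 3 and 5.

```python
import numpy as np

x = np.arange(10)          # day positions 0 through 9
sales = 3 * x + 5          # hypothetical values on a perfect line

# Degree 1 asks for a straight line; the result is [slope, intercept].
slope, intercept = np.polyfit(x, sales, 1)

# poly1d() wraps the coefficients in a callable polynomial.
trend = np.poly1d([slope, intercept])

print(slope, intercept)    # approximately 3.0 and 5.0
print(trend(9))            # approximately 32.0
```

With random sales data the slope is rarely so tidy, but its sign still tells you whether sales are generally rising or falling over the period.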

When you plot the initial data, the call to plot() automatically generates a legend for you. Matplotlib doesn’t automatically add the trendline to that legend, so you must create a new legend that names both lines. Here’s typical output from this example using randomly generated data.

Add a trendline to show the average direction of change in a chart or graph.

About This Article

About the book author:

John Paul Mueller is a freelance author and technical editor. He has writing in his blood, having produced 100 books and more than 600 articles to date. The topics range from networking to home security and from database management to heads-down programming. John has provided technical services to both Data Based Advisor and Coast Compute magazines.

Luca Massaron is a data scientist specialized in organizing and interpreting big data and transforming it into smart data by means of the simplest and most effective data mining and machine learning techniques. Because of his job as a quantitative marketing consultant and marketing researcher, he has been involved in quantitative data since 2000 with different clients and in various industries, and is one of the top 10 Kaggle data scientists.