Portfolio Project: Online Retail Exploratory Data Analysis with Python¶

Overview¶

In this project, you will step into the shoes of an entry-level data analyst at an online retail company, helping interpret real-world data to help make a key business decision.

Case Study¶

In this project, you will be working with transactional data from an online retail store. The dataset contains information about customer purchases, including product details, quantities, prices, and timestamps. Your task is to explore and analyze this dataset to gain insights into the store's sales trends, customer behavior, and popular products.

By conducting exploratory data analysis, you will identify patterns, outliers, and correlations in the data, allowing you to make data-driven decisions and recommendations to optimize the store's operations and improve customer satisfaction. Through visualizations and statistical analysis, you will uncover key trends, such as the busiest sales months, best-selling products, and the store's most valuable customers. Ultimately, this project aims to provide actionable insights that can drive strategic business decisions and enhance the store's overall performance in the competitive online retail market.

Prerequisites¶

Before starting this project, you should have some basic knowledge of Python programming and Pandas. In addition, you may want to use the following packages in your Python environment:

pandas
numpy
seaborn
matplotlib

These packages should already be installed in Coursera's Jupyter Notebook environment, however if you'd like to install additional packages that are not included in this environment or are working off platform you can install additional packages using !pip install packagename within a notebook cell such as:

!pip install pandas
!pip install matplotlib

Project Objectives¶

Describe data to answer key questions to uncover insights
Gain valuable insights that will help improve online retail performance
Provide analytic insights and data-driven recommendations

Dataset¶

The dataset you will be working with is the "Online Retail" dataset. It contains transactional data of an online retail store from 2010 to 2011. The dataset is available as a .xlsx file named Online Retail.xlsx. This data file is already included in the Coursera Jupyter Notebook environment, however if you are working off-platform it can also be downloaded here.

The dataset contains the following columns:

InvoiceNo: Invoice number of the transaction
StockCode: Unique code of the product
Description: Description of the product
Quantity: Quantity of the product in the transaction
InvoiceDate: Date and time of the transaction
UnitPrice: Unit price of the product
CustomerID: Unique identifier of the customer
Country: Country where the transaction occurred

Tasks¶

You may explore this dataset in any way you would like - however if you'd like some help getting started, here are a few ideas:

Load the dataset into a Pandas DataFrame and display the first few rows to get an overview of the data.
Perform data cleaning by handling missing values, if any, and removing any redundant or unnecessary columns.
Explore the basic statistics of the dataset, including measures of central tendency and dispersion.
Perform data visualization to gain insights into the dataset. Generate appropriate plots, such as histograms, scatter plots, or bar plots, to visualize different aspects of the data.
Analyze the sales trends over time. Identify the busiest months and days of the week in terms of sales.
Explore the top-selling products and countries based on the quantity sold.
Identify any outliers or anomalies in the dataset and discuss their potential impact on the analysis.
Draw conclusions and summarize your findings from the exploratory data analysis.

Task 1: Load the Data¶

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_excel("Online Retail.xlsx")
print(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 541909 entries, 0 to 541908
Data columns (total 8 columns):
 #   Column       Non-Null Count   Dtype         
---  ------       --------------   -----         
 0   InvoiceNo    541909 non-null  object        
 1   StockCode    541909 non-null  object        
 2   Description  540455 non-null  object        
 3   Quantity     541909 non-null  int64         
 4   InvoiceDate  541909 non-null  datetime64[ns]
 5   UnitPrice    541909 non-null  float64       
 6   CustomerID   406829 non-null  float64       
 7   Country      541909 non-null  object        
dtypes: datetime64[ns](1), float64(2), int64(1), object(4)
memory usage: 33.1+ MB
None

Null customer Id's are filled with '0' for now, to be filtered later. Also will make a revenue column, and remove non-product data from our set before checking out key statistics:¶

df["CustomerID"] = df["CustomerID"].fillna(0).astype("int32")
df['revenue'] = df['Quantity'] * df['UnitPrice']

#remove outlier orders with crazy price, must be mistake. this one basket has a price of 600 instead of its usual $3
df= df[~((df['StockCode'] == 22502) & (df['UnitPrice'] > 600))]

#remove postage, manuals, and bank charges, ect -- we just want to see sales data.
df = df[~df['StockCode'].isin(['M','DOT','BANK CHARGES', 'CRUK','AMAZONFEE','POST',
                               'PADS','B','D','C2','gift_0001_30','gift_0001_40',
                               'gift_0001_50','gift_0001_20','gift_0001_10','S'])]





unique_items = df['StockCode'].nunique()
unique_customers = df["CustomerID"].nunique()
unique_countries = df['Country'].nunique()
total_units_sold = df['Quantity'].sum()
total_revenue = df['revenue'].sum()

print(f"Unique products:  {unique_items}")
print(f"Unique customers:  {unique_customers}")
print(f"Unique countires:  {unique_countries}")
print(f"total units sold:  {total_units_sold}")
print(f"total revenue:  ${total_revenue}")
# Check for duplicate rows

Unique products:  4054
Unique customers:  4364
Unique countires:  38
total units sold:  5166077
total revenue:  $9753091.930000002

Below compares sales/revenue by day-of-week to see any patterns, and check month over month growth. Saturday has no sales data, and the monthly chart indicates a strong end of year growth:¶

df["DayOfWeek"] = df["InvoiceDate"].dt.day_name()
df["Month"] = df["InvoiceDate"].dt.month_name()

rev_by_day = df.groupby('DayOfWeek')['revenue'].sum()
rev_by_month = df.groupby('Month')['revenue'].sum()


# Sales by Day of the Week
plt.figure(figsize=(10, 4))
sales_by_day = rev_by_day.reindex(["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"])
sales_by_day.plot(kind="bar", color="skyblue")

plt.title("Revenue by Day of the Week")
plt.xlabel("Day")
plt.ylabel("Total Revenue")
plt.xticks(rotation=45)
plt.show()


# Sales by Month
plt.figure(figsize=(10, 4))
sales_by_month = rev_by_month.reindex(["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"])
sales_by_month.plot(kind="bar", color="lightgreen")

plt.title("Revenue by Month")
plt.xlabel("Month")
plt.ylabel("Total Revenue")
plt.xticks(rotation=45)
plt.show()

Which countries give the most business? The answer is UK by far. We can try to expand marketing to the other top countries:¶

countries = df.groupby('Country')['revenue'].sum().sort_values(ascending=False).head(5)
print(countries)
plt.figure(figsize=(12, 7))

# Plot the pie chart with labels
plt.figure(figsize=(12, 7))
countries.plot(kind="pie", autopct='%1.1f%%')

# Add title and show the chart
plt.title("Top 5 Countries by Revenue")
plt.ylabel("")  # Removes default y-axis label
plt.show()

countries.plot(kind="bar")
plt.ylabel('Revenue')
plt.xlabel('Product ID')

Country
United Kingdom    8.262101e+06
Netherlands       2.834795e+05
EIRE              2.596635e+05
Germany           2.008084e+05
France            1.821634e+05
Name: revenue, dtype: float64

<Figure size 864x504 with 0 Axes>

Text(0.5, 0, 'Product ID')

Below shows top products by quantity and revenue. The biggest quantity is not always the biggest revenue:¶

products = df.groupby('Description')['revenue'].sum().sort_values(ascending=False).head(15)
products2 = df.groupby('Description')['Quantity'].sum().sort_values(ascending=False).head(15)

productss = df.groupby('StockCode').agg({
    'revenue': 'sum',  
    'Description': 'first',  # Keeps the first description for each StockCode
    'Quantity': 'sum'
}).sort_values(by='revenue', ascending=False).head(15)

print(productss)

# Plot the pie chart with labels
plt.figure(figsize=(12, 7))
products.plot(kind="pie", autopct='%1.1f%%')
# Add title and show the chart
plt.title("Top 15 Products by Revenue")
plt.ylabel("")  # Removes default y-axis label
plt.show()

products.plot(kind="bar")
plt.ylabel('Revenue')
plt.xlabel('Product ID')


# Plot the pie chart with labels
plt.figure(figsize=(12, 7))
products2.plot(kind="pie", autopct='%1.1f%%')
# Add title and show the chart
plt.title("Top 15 Products by Quantity")
plt.ylabel("")  # Removes default y-axis label
plt.show()

products2.plot(kind="bar")
plt.ylabel('Quantity')
plt.xlabel('Product ID')

             revenue                         Description  Quantity
StockCode                                                         
22423      164762.19            REGENCY CAKESTAND 3 TIER     12980
47566       98302.98                       PARTY BUNTING     18022
85123A      97894.50  WHITE HANGING HEART T-LIGHT HOLDER     38830
85099B      92356.03             JUMBO BAG RED RETROSPOT     47363
23084       66756.59                  RABBIT NIGHT LIGHT     30646
22086       63791.94     PAPER CHAIN KIT 50'S CHRISTMAS      18902
84879       58959.73       ASSORTED COLOUR BIRD ORNAMENT     36221
79321       53768.06                       CHILLI LIGHTS     10343
22197       50987.47                SMALL POPCORN HOLDER     56450
23298       42700.02                      SPOTTY BUNTING      8534
22386       41619.66             JUMBO BAG PINK POLKADOT     21009
23203       40991.38                             mailout     20013
21137       40596.96            BLACK RECORD COVER FRAME     11616
22720       37413.44   SET OF 3 CAKE TINS PANTRY DESIGN       7286
23284       36565.39       DOORMAT KEEP CALM AND COME IN      5264

Text(0.5, 0, 'Product ID')

below takes a look at the outliers in quantity, unitprice, and revenue. negative values were due to returns and canceled out when summing:¶

plt.figure(figsize=(12, 5))
sns.boxplot(x=df["Quantity"])
plt.title("Outliers in Quantity")
plt.show()

plt.figure(figsize=(12, 5))
sns.boxplot(x=df["UnitPrice"])
plt.title("Outliers in Unit Price")
plt.show()

sns.boxplot(x=df["revenue"])
plt.title("Outliers in Revenue")
plt.show()

Look below to see the country UK's top 25 orders sorted by revenue¶

UK = df[df['Country'] == 'United Kingdom']
UK.sort_values(by='revenue', ascending = False).head(25)
#UK.sort_values(by='Quantity', ascending = True).head(25)

Below explores the top 15 customers on a pie chart, including all of the null values previously filled with 0:¶

top_cust_rev = df.groupby('CustomerID')['revenue'].sum().sort_values(ascending=False).head(15)
top_cust_rev1 = df.groupby('CustomerID').agg({
    'revenue':'sum',
    'Quantity':'sum',
    'Country':'first'}).sort_values(by = 'revenue',ascending = False).head(15)
print(top_cust_rev1)

plt.figure(figsize=(14, 10))
top_cust_rev.plot(kind='pie', autopct='%1.1f%%')
plt.title("Top 15 CustomerID's including all Nulls as 0")
plt.ylabel("")

               revenue  Quantity         Country
CustomerID                                      
0           1506048.11    264391  United Kingdom
14646        278778.02    197132     Netherlands
18102        259657.30     64124  United Kingdom
17450        189735.53     69041  United Kingdom
14911        128882.13     77098            EIRE
12415        123638.18     77242       Australia
14156        113855.32     57013            EIRE
17511         88138.20     63014  United Kingdom
16684         65920.12     49391  United Kingdom
13694         62924.10     61808  United Kingdom
15311         59419.34     37720  United Kingdom
16029         58193.85     32203  United Kingdom
13089         57385.88     30787  United Kingdom
15061         54250.34     28638  United Kingdom
14096         53258.43     16336  United Kingdom

Text(0, 0.5, '')

Filtering the null values out of the top customer chart:¶

tops_cust= df[~df['CustomerID'].isin([0])] 

top_custs = tops_cust.groupby('CustomerID')['revenue'].sum().sort_values(ascending=False).head(15)

plt.figure(figsize=(12, 7))
top_custs.plot(kind='pie',autopct='%1.1f%%')
plt.title('Top 15 CustomerIDs Excluding all nulls')
plt.ylabel("")

#lets see where our top customers are from, and their favorite products
t_cust = tops_cust.groupby('CustomerID').agg({'revenue':'sum','Quantity':'sum','UnitPrice':'first','Country':'first', 'Description': lambda x: x.mode().iloc[0] }).sort_values(by='revenue', ascending = False).head(25)
print(t_cust)

              revenue  Quantity  UnitPrice         Country  \
CustomerID                                                   
14646       278778.02    197132       0.29     Netherlands   
18102       259657.30     64124       4.78  United Kingdom   
17450       189735.53     69041       6.60  United Kingdom   
14911       128882.13     77098       9.95            EIRE   
12415       123638.18     77242       1.85       Australia   
14156       113855.32     57013       4.25            EIRE   
17511        88138.20     63014       8.50  United Kingdom   
16684        65920.12     49391       2.10  United Kingdom   
13694        62924.10     61808       1.06  United Kingdom   
15311        59419.34     37720       4.25  United Kingdom   
16029        58193.85     32203       4.95  United Kingdom   
13089        57385.88     30787      10.95  United Kingdom   
15061        54250.34     28638       0.36  United Kingdom   
14096        53258.43     16336       2.46  United Kingdom   
17949        53215.74     27572      10.95  United Kingdom   
15769        51823.72     27660       2.55  United Kingdom   
14298        50862.44     58021       1.25  United Kingdom   
14088        50415.49     12593       5.95  United Kingdom   
17841        40333.22     22850       2.10  United Kingdom   
13798        36351.42     23514       2.55  United Kingdom   
16422        34311.00     33584       1.25  United Kingdom   
12931        33544.81     23577       0.72  United Kingdom   
16013        33453.45     13942       4.25  United Kingdom   
15838        33362.28      9008       0.03  United Kingdom   
17389        31317.48      7442       2.95  United Kingdom   

                                  Description  
CustomerID                                     
14646                     SPACEBOY LUNCH BOX   
18102                 CREAM HEART CARD HOLDER  
17450              COOK WITH WINE METAL SIGN   
14911                REGENCY CAKESTAND 3 TIER  
12415                    DOLLY GIRL LUNCH BOX  
14156                REGENCY CAKESTAND 3 TIER  
17511                 JUMBO BAG RED RETROSPOT  
16684         GREEN REGENCY TEACUP AND SAUCER  
13694                        JUMBO BAG APPLES  
15311               EDWARDIAN PARASOL NATURAL  
16029                 JUMBO BAG RED RETROSPOT  
13089           SILVER HANGING T-LIGHT HOLDER  
15061         CLOTHES PEGS RETROSPOT PACK 24   
14096       SET OF 3 WOODEN HEART DECORATIONS  
17949                          POPCORN HOLDER  
15769                          JUMBO BAG OWLS  
14298                 FENG SHUI PILLAR CANDLE  
14088        ANTIQUE GLASS DRESSING TABLE POT  
17841                           CHILLI LIGHTS  
13798                    HOT BATHS METAL SIGN  
16422                 PAPER BUNTING RETROSPOT  
12931           RED TOADSTOOL LED NIGHT LIGHT  
16013                           CHILLI LIGHTS  
15838                 JUMBO BAG RED RETROSPOT  
17389             CREAM SWEETHEART MINI CHEST

Lets see a interactive map of the countries we do business with:¶

!pip install plotly
import plotly.express as px

#revenue by country
country_revenue = df.groupby("Country")["revenue"].sum().reset_index()

fig = px.choropleth(
    country_revenue,
    locations="Country",
    locationmode="country names",
    color="revenue",
    color_continuous_scale="reds",
    title="Revenue by Country"
)


fig.show()

Collecting plotly
  Downloading plotly-5.18.0-py3-none-any.whl (15.6 MB)
     |████████████████████████████████| 15.6 MB 30.8 MB/s            
Requirement already satisfied: packaging in /opt/conda/lib/python3.7/site-packages (from plotly) (20.1)
Collecting tenacity>=6.2.0
  Downloading tenacity-8.2.3-py3-none-any.whl (24 kB)
Requirement already satisfied: six in /opt/conda/lib/python3.7/site-packages (from packaging->plotly) (1.14.0)
Requirement already satisfied: pyparsing>=2.0.2 in /opt/conda/lib/python3.7/site-packages (from packaging->plotly) (2.4.7)
Installing collected packages: tenacity, plotly
Successfully installed plotly-5.18.0 tenacity-8.2.3
WARNING: You are using pip version 21.3.1; however, version 24.0 is available.
You should consider upgrading via the '/opt/conda/bin/python3 -m pip install --upgrade pip' command.

Below we take a look at our best selling products, as well as the products which bring in the most revenue:¶

## analysis removing all customer null entries


products123 = tops_cust.groupby('Description')['revenue'].sum().sort_values(ascending=False).head(15)

products1234 = tops_cust.groupby('Description').agg({
    'revenue':'sum',
    'Country': 'max'}).sort_values(by ='revenue',ascending=False).head(15)
print(products1234)

# Plot the pie chart with labels
plt.figure(figsize=(12, 7))
products123.plot(kind="pie", autopct='%1.1f%%')
# Add title and show the chart
plt.title("Top 15 Products by Revenue")
plt.ylabel("")  # Removes default y-axis label
plt.show()

products123.plot(kind="bar")
plt.ylabel('Revenue')
plt.xlabel('Product ID')

                                      revenue         Country
Description                                                  
REGENCY CAKESTAND 3 TIER            132870.40     Unspecified
WHITE HANGING HEART T-LIGHT HOLDER   93823.85  United Kingdom
JUMBO BAG RED RETROSPOT              83236.76  United Kingdom
PARTY BUNTING                        67687.53  United Kingdom
ASSORTED COLOUR BIRD ORNAMENT        56499.22     Unspecified
RABBIT NIGHT LIGHT                   51137.80  United Kingdom
CHILLI LIGHTS                        45936.81  United Kingdom
PAPER CHAIN KIT 50'S CHRISTMAS       41500.48  United Kingdom
BLACK RECORD COVER FRAME             39009.38  United Kingdom
JUMBO BAG PINK POLKADOT              36473.01  United Kingdom
SPOTTY BUNTING                       35056.44  United Kingdom
DOORMAT KEEP CALM AND COME IN        34312.60     Unspecified
WOOD BLACK BOARD ANT WHITE FINISH    34307.06  United Kingdom
SET OF 3 CAKE TINS PANTRY DESIGN     32607.80  United Kingdom
JAM MAKING SET WITH JARS             31658.47     Unspecified

Text(0.5, 0, 'Product ID')

Notice, the stock items have different pricepoints throughout the year. so it would be a good idea to see which price points proved to generate the most revenue. We need more data, such as profit, to really see the real optimal price. the below graph shows the top 20 products and their optimal pricing based on quantity sold: (see revenue graph in the next cell)¶

#we noticed prices for same products are different at times, 
#we want to find out what price performed best, to see what prices we can get away with

#FIND MOST SALES PER PRICE POINT
price_performance = df.groupby(["StockCode", "UnitPrice"])["Quantity"].sum().reset_index()
best_prices = price_performance.loc[price_performance.groupby("StockCode")["Quantity"].idxmax()]

# Show first few rows
print(best_prices.head())

# Plot top 20 best-performing prices as a bar chart
plt.figure(figsize=(12, 6))
top_products = best_prices.sort_values(by="Quantity", ascending=False).head(20)
plt.bar(top_products["StockCode"].astype(str), top_products["UnitPrice"], color="skyblue")

plt.xlabel("Product StockCode")
plt.ylabel("Best Performing Price")
plt.title("Best Performing Prices for Top 20 Products")
plt.xticks(rotation=90)
plt.show()

   StockCode  UnitPrice  Quantity
1      10002       0.85       824
5      10080       0.39       301
7      10120       0.21       193
11     10125       0.85       966
15     10133       0.42      2046

Below is the graph filtering the different prices of our top 20 products BY REVENUE. It goes through sales at each pricepoint and finds the pricepoint that led to the greatest revenue for that item. the X axis is the total revenue generated at the optimal pricepoint in the data label.¶

# Find best price by revenue
price_performance = df.groupby(["StockCode", "UnitPrice"])["Quantity"].sum().reset_index()
price_performance["Revenue"] = price_performance["UnitPrice"] * price_performance["Quantity"]
best_prices = price_performance.loc[price_performance.groupby("StockCode")["Revenue"].idxmax()]

# Select top 20 revenue-generating products
top_products = best_prices.sort_values(by="Revenue", ascending=False).head(20)

# Create the bar chart
plt.figure(figsize=(13, 8))
bars = plt.bar(top_products["StockCode"].astype(str), top_products["Revenue"], color="green")

# Add labels above bars (show price instead of revenue)
for bar, price in zip(bars, top_products["UnitPrice"]):
    yval = bar.get_height()  # Get revenue (height of the bar)
    plt.text(bar.get_x() + bar.get_width()/2, yval + (yval * 0.02), f"${price:.2f}", ha="center", va="bottom", fontsize=10)

# Formatting
plt.xlabel("Product StockCode")
plt.ylabel("Total Revenue")
plt.title("Best Performing Prices for Top 20 Products")
plt.xticks(rotation=90)
plt.show()

curious on that item (22423 stockcode) that produced the most revenue:¶

df['StockCode'] = df['StockCode'].astype(str)
top_item = df[df['StockCode'] == '22423']

prices = top_item.groupby('UnitPrice')['Quantity'].sum().sort_values(ascending=False)
print(prices.head(50))

UnitPrice
10.95    7846
12.75    3796
24.96     783
25.49     287
12.72     251
4.00       30
5.00       20
9.86        5
32.04       4
0.00      -42
Name: Quantity, dtype: int64

Maximum unit prices for items¶

ppp=df.groupby('StockCode').agg({'UnitPrice':'max','Description':'first','Quantity':'sum'}).sort_values(by = 'UnitPrice', ascending=False)
print(ppp.head(25))

           UnitPrice                          Description  Quantity
StockCode                                                          
22655         295.00          VINTAGE RED KITCHEN CABINET        39
22656         295.00         VINTAGE BLUE KITCHEN CABINET        18
22826         195.00        LOVE SEAT ANTIQUE WHITE METAL        54
22828         165.00         REGENCY MIRROR WITH SHUTTERS        10
22827         165.00   RUSTIC  SEVENTEEN DRAWER SIDEBOARD        28
22823         125.00        CHEST NATURAL WOOD 20 DRAWERS        24
22833         100.00          HALL CABINET WITH 3 DRAWERS        76
23064          99.96               CINDERELLA CHANDELIER        142
21473          99.96    SWEETHEART CREAM STEEL TABLE RECT        27
22016          83.33  Dotcomgiftshop Gift Voucher £100.00         1
22803          82.50             IVORY EMBROIDERED QUILT         54
21769          79.95          VINTAGE POST OFFICE CABINET         6
21686          78.29           MEDINA STAMPED METAL STOOL        36
84078A         76.55     SET/4 WHITE RETRO STORAGE CUBES        380
22929          65.00               SCHOOL DESK AND CHAIR         16
84632          59.95     DECORATIVE HANGING SHELVING UNIT         5
22503          59.53            CABIN BAG VINTAGE PAISLEY        80
22504          58.29          CABIN BAG VINTAGE RETROSPOT       242
21476          58.29   STEEL SWEETHEART ROUND TABLE CREAM        41
84616          58.29            SILVER ROCCOCO CHANDELIER        11
17003          57.60                  BROCADE RING PURSE      23053
22769          51.02         CHALKBOARD KITCHEN ORGANISER        -5
22764          49.96   RUSTIC WOODEN CABINET, GLASS DOORS         7
23485          49.96        BOTANICAL GARDENS WALL CLOCK        244
84963B         49.95          BLUE PAINTED KASHMIRI CHAIR         3

Below I was curious about a top customer's (14646 on customer pie chart) favorite items:¶

df['CustomerID'] = df['CustomerID'].astype(str)
t = df[df['CustomerID'] == '14646']
tt = t.groupby('Description').agg({'Quantity':'sum','UnitPrice':'max','revenue':'sum'}).sort_values(by='Quantity',ascending=False)
print(tt.head(50))

                                     Quantity  UnitPrice  revenue
Description                                                      
RABBIT NIGHT LIGHT                       4801       2.08  9568.48
SPACEBOY LUNCH BOX                       4492       1.95  7415.40
PACK OF 72 RETROSPOT CAKE CASES          4104       0.55  1726.80
DOLLY GIRL LUNCH BOX                     4096       1.65  6758.40
ROUND SNACK BOXES SET OF4 WOODLAND       3120       2.55  7956.00
RED TOADSTOOL LED NIGHT LIGHT            2376       1.65  3459.60
WOODLAND CHARLOTTE BAG                   2300       0.72  1656.00
RED RETROSPOT CHARLOTTE BAG              2100       0.72  1512.00
5 HOOK HANGER RED MAGIC TOADSTOOL        2016       1.45  2923.20
JUMBO BAG RED RETROSPOT                  2000       1.79  3468.00
SPACEBOY BIRTHDAY CARD                   1872       0.36   673.92
GINGHAM HEART DECORATION                 1872       0.72  1347.84
MINI PAINT SET VINTAGE                   1728       0.55   950.40
PLASTERS IN TIN SPACEBOY                 1632       1.45  2366.40
ROUND SNACK BOXES SET OF 4 FRUITS        1584       2.55  4039.20
PLASTERS IN TIN WOODLAND ANIMALS         1536       1.45  2227.20
JUMBO BAG WOODLAND ANIMALS               1500       1.79  2629.00
PACK OF 60 SPACEBOY CAKE CASES           1440       0.42   604.80
CHARLOTTE BAG PINK POLKADOT              1400       0.72  1008.00
PACK OF 60 MUSHROOM CAKE CASES           1320       0.42   554.40
CHARLOTTE BAG APPLES DESIGN              1301       0.85   936.85
LUNCH BAG WOODLAND                       1300       1.45  1885.00
LUNCH BAG RED RETROSPOT                  1300       1.45  1885.00
CARD DOLLY GIRL                          1296       0.36   466.56
SPACEBOY BEAKER                          1201       1.25  1297.25
 DOLLY GIRL BEAKER                       1201       1.25  1297.25
60 CAKE CASES DOLLY GIRL DESIGN          1200       0.42   504.00
LUNCH BAG SPACEBOY DESIGN                1200       1.45  1740.00
FOLKART ZINC HEART CHRISTMAS DEC         1152       0.72   829.44
CHARLOTTE BAG DOLLY GIRL DESIGN          1100       0.72   792.00
GUMBALL COAT RACK                        1044       2.10  2192.40
PACK OF 12 TRADITIONAL CRAYONS           1008       0.36   362.88
FOOD CONTAINER SET 3 LOVE HEART          1008       1.65  1663.20
FILIGRIS HEART WITH BUTTERFLY            1008       1.25  1095.84
CHARLOTTE BAG SUKI DESIGN                1002       0.85   721.70
CHARLOTTE BAG VINTAGE ALPHABET           1001       0.85   720.85
STRAWBERRY LUNCH BOX WITH CUTLERY         967       2.55  2033.85
PACK OF 20 NAPKINS RED APPLES             960       0.72   668.16
60 TEATIME FAIRY CAKE CASES               960       0.42   403.20
72 SWEETHEART FAIRY CAKE CASES            960       0.42   403.20
SET/10 RED POLKADOT PARTY CANDLES         912       1.25   975.84
SET/10 BLUE POLKADOT PARTY CANDLES        864       1.06   915.84
SET/10 PINK POLKADOT PARTY CANDLES        864       1.06   915.84
PLASTERS IN TIN VINTAGE PAISLEY           864       1.45  1252.80
10 COLOUR SPACEBOY PEN                    864       0.72   622.08
PACK OF 60 DINOSAUR CAKE CASES            840       0.42   352.80
LUNCH BAG PINK POLKADOT                   820       1.65  1193.00
LUNCH BAG APPLE DESIGN                    801       1.65  1161.65
DOUGHNUT LIP GLOSS                        800       1.04   832.00
SPACEBOY CHILDRENS BOWL                   793       1.25   847.25

	InvoiceNo	StockCode	Description	Quantity	InvoiceDate	UnitPrice	CustomerID	Country	revenue	DayOfWeek	Month
540421	581483	23843	PAPER CRAFT , LITTLE BIRDIE	80995	2011-12-09 09:15:00	2.08	16446	United Kingdom	168469.60	Friday	December
61619	541431	23166	MEDIUM CERAMIC TOP STORAGE JAR	74215	2011-01-18 10:01:00	1.04	12346	United Kingdom	77183.60	Tuesday	January
348325	567423	23243	SET OF TEA COFFEE SUGAR TINS PANTRY	1412	2011-09-20 11:05:00	5.06	17450	United Kingdom	7144.72	Tuesday	September
52711	540815	21108	FAIRY CAKE FLANNEL ASSORTED COLOUR	3114	2011-01-11 12:55:00	2.10	15749	United Kingdom	6539.40	Tuesday	January
160546	550461	21108	FAIRY CAKE FLANNEL ASSORTED COLOUR	3114	2011-04-18 13:20:00	2.10	15749	United Kingdom	6539.40	Monday	April
52709	540815	85123A	WHITE HANGING HEART T-LIGHT HOLDER	1930	2011-01-11 12:55:00	2.55	15749	United Kingdom	4921.50	Tuesday	January
467804	576365	22086	PAPER CHAIN KIT 50'S CHRISTMAS	688	2011-11-14 17:55:00	6.95	0	United Kingdom	4781.60	Monday	November
160542	550461	85123A	WHITE HANGING HEART T-LIGHT HOLDER	1930	2011-04-18 13:20:00	2.40	15749	United Kingdom	4632.00	Monday	April
52772	540818	48185	DOORMAT FAIRY CAKE	670	2011-01-11 12:57:00	6.75	15749	United Kingdom	4522.50	Tuesday	January
244419	558526	23173	REGENCY TEAPOT ROSES	540	2011-06-30 11:01:00	8.15	17949	United Kingdom	4401.00	Thursday	June
160544	550461	48185	DOORMAT FAIRY CAKE	670	2011-04-18 13:20:00	6.35	15749	United Kingdom	4254.50	Monday	April
291249	562439	84879	ASSORTED COLOUR BIRD ORNAMENT	2880	2011-08-04 18:06:00	1.45	12931	United Kingdom	4176.00	Thursday	August
52134	540689	22470	HEART OF WICKER LARGE	1284	2011-01-11 08:43:00	3.21	17450	United Kingdom	4121.64	Tuesday	January
533812	581115	22413	METAL SIGN TAKE IT OR LEAVE IT	1404	2011-12-07 12:20:00	2.75	15195	United Kingdom	3861.00	Wednesday	December
16438	537659	21623	VINTAGE UNION JACK MEMOBOARD	600	2010-12-07 16:43:00	6.38	18102	United Kingdom	3828.00	Tuesday	December
348319	567423	23113	PANTRY CHOPPING BOARD	756	2011-09-20 11:05:00	5.06	17450	United Kingdom	3825.36	Tuesday	September
52710	540815	21175	GIN + TONIC DIET METAL SIGN	2000	2011-01-11 12:55:00	1.85	15749	United Kingdom	3700.00	Tuesday	January
348323	567423	22722	SET OF 6 SPICE TINS PANTRY DESIGN	852	2011-09-20 11:05:00	4.25	17450	United Kingdom	3621.00	Tuesday	September
447035	574941	22197	POPCORN HOLDER	1820	2011-11-07 17:42:00	1.95	0	United Kingdom	3549.00	Monday	November
160545	550461	21175	GIN + TONIC DIET METAL SIGN	2000	2011-04-18 13:20:00	1.69	15749	United Kingdom	3380.00	Monday	April
447040	574941	22086	PAPER CHAIN KIT 50'S CHRISTMAS	478	2011-11-07 17:42:00	6.95	0	United Kingdom	3322.10	Monday	November
52771	540818	47556B	TEA TIME TEA TOWELS	1300	2011-01-11 12:57:00	2.55	15749	United Kingdom	3315.00	Tuesday	January
411229	572209	23556	LANDMARK FRAME COVENT GARDEN	300	2011-10-21 12:08:00	10.95	18102	United Kingdom	3285.00	Friday	October
411235	572209	23557	LANDMARK FRAME BAKER STREET	300	2011-10-21 12:08:00	10.95	18102	United Kingdom	3285.00	Friday	October
411234	572209	23553	LANDMARK FRAME CAMDEN TOWN	300	2011-10-21 12:08:00	10.95	18102	United Kingdom	3285.00	Friday	October