Blogs

Don’t be Led Astray on Kaggle!

Analysing a cleaned "Diabetes Health Indicators" dataset for its usefulness in classifying patients revealed deep flaws in both the data and the methodology used by fellow Kagglers, and emphasised the importance of justifying decisions at each stage of the data science process. The lessons learnt, complete with visualisations, can be read in the linked blog.

NHS English Prescribing Data (EPD) Analysis using Python (Part 2)

In part 2, I show how much more efficient it is to extract the data using SQL before cleaning. I also share highlights from the exploratory data analysis and build a simple distributed program in preparation for computationally expensive tasks.
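To give a flavour of the SQL-first idea, here is a minimal sketch rather than the code from the blog itself. It assumes the portal exposes CKAN's `datastore_search_sql` action at the URL shown, and the resource ID and column names in the usage note are hypothetical placeholders. Selecting columns and filtering rows in the query means less data is downloaded and less is left to clean afterwards.

```python
import requests

# Assumption: the SQL endpoint of the CKAN portal being queried.
CKAN_SQL_URL = "https://opendata.nhsbsa.net/api/3/action/datastore_search_sql"


def build_query(resource_id: str, columns: list[str], where: str) -> str:
    """Build a SELECT that keeps only the columns and rows of interest.

    Plain string formatting is used, so pass trusted values only.
    Standard CKAN (PostgreSQL) quotes table names with double quotes;
    some portals expect a different quoting style.
    """
    column_list = ", ".join(columns)
    return f'SELECT {column_list} FROM "{resource_id}" WHERE {where}'


def run_query(sql: str) -> list[dict]:
    """Send the SQL to the portal and return the matching records."""
    response = requests.get(CKAN_SQL_URL, params={"sql": sql}, timeout=60)
    response.raise_for_status()
    return response.json()["result"]["records"]
```

For example, `run_query(build_query("EPD_202001", ["BNF_CODE", "ITEMS"], "ITEMS > 10"))` would request two columns of the filtered rows, assuming a resource and columns with those names exist.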

This blog can also be found on the NHS Python Community website.

NHS English Prescribing Data (EPD) Analysis using Python (Part 1)

Part 1 focuses on using Requests and Pandas to extract data from the open data portal (CKAN) API. The time, memory and storage efficiency of the method were explored.
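The general pattern looks something like the sketch below, which is illustrative and not the code from the blog. It assumes the portal's CKAN action API lives at the base URL shown and uses the standard `datastore_search` action; the function names are my own, and any resource ID passed in is up to the reader.

```python
import pandas as pd
import requests

# Assumption: base URL of the CKAN action API for the portal being queried.
CKAN_BASE_URL = "https://opendata.nhsbsa.net/api/3/action"


def records_to_frame(payload: dict) -> pd.DataFrame:
    """Turn the JSON body of a CKAN datastore response into a DataFrame."""
    if not payload.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    return pd.DataFrame(payload["result"]["records"])


def fetch_page(resource_id: str, limit: int = 1000, offset: int = 0) -> pd.DataFrame:
    """Fetch one page of rows from a datastore resource."""
    response = requests.get(
        f"{CKAN_BASE_URL}/datastore_search",
        params={"resource_id": resource_id, "limit": limit, "offset": offset},
        timeout=60,
    )
    response.raise_for_status()
    return records_to_frame(response.json())
```

Calling `fetch_page` with a small `limit` first is a cheap way to inspect the columns, and `df.memory_usage(deep=True).sum()` gives a rough idea of the memory footprint before pulling a full month of data.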

This blog can also be found on the NHS Python Community website.