Python – Async download of files

The following code is a beginner's exploration of asynchronous file downloads, written to try to improve the download time of files from a specific website.


Tasks:

The tasks are as follows (a rough synchronous sketch of these steps appears after the list):

  1. Visit https://digital.nhs.uk/data-and-information/publications/statistical/mental-health-services-monthly-statistics

  2. Extract the latest publication link. It is the top link under Latest Statistics and is the first match returned by the CSS selector .cta__button.


N.B. This link updates each month, so for the next monthly publication (e.g. 8 Apr 2021) it will point to the page for Mental Health Services Monthly Statistics, Performance January, Provisional February 2021.

  3. Visit the extracted link, https://digital.nhs.uk/data-and-information/publications/statistical/mental-health-services-monthly-statistics/performance-december-2020-provisional-january-2021, and, from there, extract the download links for all the files listed under Resources (currently 17 files).


  4. Finally, download all those files, using the retrieved URLs, and save them to the location specified by the folder variable.
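For reference, this is a rough synchronous sketch of those four steps. It uses the requests library, which the actual async script below does not; the selectors are the ones described above.

# Rough synchronous sketch of the four steps above, assuming the
# requests library (not used in the actual async script below).
import requests
from bs4 import BeautifulSoup as bs

base = 'https://digital.nhs.uk'
landing = base + ('/data-and-information/publications/statistical'
                  '/mental-health-services-monthly-statistics')

# Steps 1-2: find the latest publication link via the .cta__button selector.
soup = bs(requests.get(landing).text, 'lxml')
latest = base + soup.select_one('.cta__button')['href']

# Step 3: collect the download links listed under Resources.
soup = bs(requests.get(latest).text, 'lxml')
file_urls = [a['href'] for a in soup.select('.attachment a')]

# Step 4: download each file sequentially (the async version below
# performs these downloads concurrently instead).
for url in file_urls:
    name = url.split('/')[-1]
    with open(name, 'wb') as f:
        f.write(requests.get(url).content)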


Set-up:

Python 3.9.0 (64-bit), Windows 10


Request:

I would appreciate any suggested improvements to this code. For example, should I have refactored the coroutine fetch_download_links into two coroutines, each with its own ClientSession: one to get the initial link to the resources page, and a second to retrieve the actual resource links?
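To illustrate, this is one possible shape for that split (a sketch only; the helper names are hypothetical, and here a single session is passed into both coroutines rather than each opening its own):

# Sketch of the split described above; get_latest_publication and
# get_resource_links are hypothetical names, and a single shared
# session is passed in rather than one session per coroutine.
import aiohttp
from bs4 import BeautifulSoup as bs

async def get_latest_publication(session: aiohttp.ClientSession, url: str) -> str:
    # Resolve the landing page to the latest publication link.
    r = await session.get(url, ssl=False)
    soup = bs(await r.text(), 'lxml')
    return 'https://digital.nhs.uk' + soup.select_one('.cta__button')['href']

async def get_resource_links(session: aiohttp.ClientSession, url: str) -> list:
    # Collect the file links listed under Resources.
    r = await session.get(url, ssl=False)
    soup = bs(await r.text(), 'lxml')
    return [a['href'] for a in soup.select('.attachment a')]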


mhsmsAsynDownloads.py

import time
import os
from bs4 import BeautifulSoup as bs
import aiohttp
import aiofiles
import asyncio
import urllib.parse

async def fetch_download_links(url: str) -> list:
    async with aiohttp.ClientSession() as session:
        # Fetch the landing page and pull out the latest publication link.
        r = await session.get(url, ssl=False)
        html = await r.text()
        soup = bs(html, 'lxml')
        link = 'https://digital.nhs.uk' + soup.select_one('.cta__button')['href']

        # Fetch the publication page and collect the Resources download links.
        r = await session.get(link, ssl=False)
        html = await r.text()
        soup = bs(html, 'lxml')
        files = [i['href'] for i in soup.select('.attachment a')]

        return files


async def place_file(source: str) -> None:
    async with aiohttp.ClientSession() as session:
        # Derive the file name from the last segment of the URL,
        # decoding any percent-escapes (e.g. %20 -> space).
        file_name = source.split('/')[-1]
        file_name = urllib.parse.unquote(file_name)
        r = await session.get(source, ssl=False)
        content = await r.read()

    async with aiofiles.open(folder + file_name, 'wb') as f:
        await f.write(content)

    
async def main():
    tasks = []
    urls = await fetch_download_links('https://digital.nhs.uk/data-and-information/publications/statistical/mental-health-services-monthly-statistics')

    # Schedule one download per URL and run them all concurrently.
    for url in urls:
        tasks.append(place_file(url))

    await asyncio.gather(*tasks)

        
folder = 'C:/Users/<User>/OneDrive/Desktop/testing/'

if __name__ == '__main__':
    t1 = time.perf_counter()
    print("process started...")
    asyncio.run(main())
    # Open the download folder in Explorer once all downloads finish.
    os.startfile(folder[:-1])
    t2 = time.perf_counter()
    print(f'Completed in {t2-t1} seconds.')
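
As a note on the urllib.parse.unquote call in place_file: it decodes percent-escapes in the file name taken from the last segment of the URL (the file name below is hypothetical):

from urllib.parse import unquote

# '%20' in the URL segment decodes to a space (hypothetical file name).
print(unquote('MHSDS%20Monthly%20File.xlsx'))  # -> MHSDS Monthly File.xlsx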

References/Notes:

  1. I wrote this after watching an async tutorial on YouTube by Andrei Dumitrescu; the example above is my own.
  2. https://docs.aiohttp.org/en/v0.20.0/client.html
  3. https://docs.aiohttp.org/en/stable/client_advanced.html
  4. The data is publicly available.