A simple guide to web scraping in Power Automate
If you’re a technophobe, you probably want to avoid coding at all costs.
Thankfully, Microsoft Power Automate empowers you to create powerful workflows without a jot of code. No-code (and low-code) solutions are all the rage in 2022, so it’s time to jump on the bandwagon!
I must confess: I’m new to Power Automate, and you probably are too. This guide will only give a glimpse of the magical things you can do with this tool, and it’ll focus on web scraping.
The reasons for web scraping are varied. Examples include:
- market research
- price comparison
- competition monitoring
- lead generation
- sports betting odds analysis
Power Automate fills the void Excel has as a web scraper. If you’ve ever used the From Web option in the Data tab, you’ll know how restrictive it is. Unless your chosen website displays tabular data in HTML tables, you’re stuck. VBA can circumvent these limitations, but it’s fiddly and bothersome.
What are we going to scrape?
We’re going to scrape the basic profile information for every single Microsoft MVP. There are 3,121!
The website uses pagination as well, which makes it a good introductory example.
A Microsoft Most Valuable Professional (MVP) is an award for individuals who’ve exhibited outstanding technical expertise and a passion for sharing knowledge in their technical community.
Prerequisites
Power Automate for desktop comes preinstalled on Windows 11, so just search for it. For prior versions, download it from the Microsoft Store.
Given the nature of what we’re doing, you’ll also require the web browser add-on/extension:
This guide assumes you’re using Microsoft Edge, so adapt any instructions for Google Chrome, if necessary.
More information about the installation process is found on Microsoft’s website.
Let’s scrape the data
1.
Open Power Automate and select New flow.
2.
Enter ‘Microsoft MVP Data’ as the Flow name, and then select Create.
3.
In the Browser automation dropdown, drag Launch new Microsoft Edge into the main pane for a pop-up window to appear.
4.
Paste https://mvp.microsoft.com/en-us/MvpSearch into the Initial URL field.
Keep the default options for Launch mode and Window state, as we want a fresh regular-sized browser window to open.
Select Save.
5.
Select Run to open the website.
6.
Select Recorder to open the Recorder window.
7.
Select Record to start scraping.
8.
Right-click on the first profile image and select Extract element value > Src.
Details of your latest action appends to the previous one in Recorded actions.
There are also several options available:
- Temporarily halt any actions from registering.
- Remove all actions from the list.
- Remove the unwanted action from the list.
9.
Extract the name, Award Category, and Country or region using the same process as the previous step, except choosing the Text option instead.
10.
Right-click the second profile image and select Extract element value > Src.
Boom! The elements of the other profiles all get selected. Power Automate knows what you’re trying to do, eh?
11.
Right-click the Next page symbol and select Set this element as pager to ensure each page is cycled through.
12.
At this point, the Recorded actions has updated to reflect your intentions. The four values of each profile are stored in the variable OutputData
.
Select Done to confirm the actions.
13.
Power Automate added an unnecessary Launch Microsoft Edge action when the recorder started. To remove this, right-click on it and choose Delete.
14.
Double-click on Extract data from web page to launch a new window.
15.
In the Store data mode dropdown, choose Excel spreadsheet to update the variable to ExcelInstance
, and then select Save.
16.
In the Excel dropdown, drag Save Excel into the main pane.
17.
In the Save mode dropdown, choose Save document as, select a file destination and name for the Document path. Then select Save.
18.
In the Excel dropdown, drag Close Excel into the main pane.
19.
In the Before closing Excel dropdown, choose Save document and then select Save.
20.
Save your flow and then select Run to execute it.
Sit back and enjoy!
Microsoft Edge opens the website and the MVP profile data is extracted from all 174 pages. The process takes about 13 minutes.
Once finished, you’ll see a glimpse of the data as Excel briefly opens and closes.
21.
Find the new Excel file in your chosen destination and open it.
That’s it! You can now prepare and transform the raw data to carry out analysis.
Final words
Wasn’t that easy? Power Automate is Microsoft’s hidden gem, and there’s no doubt it’ll gain in popularity as the word spreads.
There are lots of features I haven’t explored yet, so I’m looking forward to discovering more ways to automate tasks.
For now, what will you be scraping?
Great insights in this guide! As a<a href=”https://www.planethive.ai/services/expert-web-scraping”>Web Scraping Expert</a>, I can confirm that Power Automate is an excellent tool for automating data extraction tasks. Keep up the good work!
This doesn’t work for me. It opens the excel page every time, and overwrites the existing data with the new ones. How do i fix this?