US Labor Market Analysis – The Best Cities to Find a Job
Check out my article on Medium titled “You’re not going to find a job in the popular cities (here’s what you should do instead)” which explains the back story behind this analysis. The dashboard below is fully interactive and automatically updated every Saturday with the latest data from Indeed.com and the Bureau of Labor Statistics.
This was a challenging project because I wanted to achieve a user experience with a lot of flexibility while at the same time keeping the report fast and responsive. I achieved my goals by implementing several DAX design patterns.
With this dashboard, I wanted to be able to easily filter by different percentile ranges for a variety of measures. For example, I can just as easily ask the question “what counties are in the top 1% for unemployment rate” as I can ask “what counties are in the bottom 5% for labor force size.” To accomplish this, I implemented the Dynamic Segmentation pattern which allows me to recalculate one measure (for example Openings:Unemployed) for all counties that fall in a given percentile range when calculated by another measure (i.e. Job Find Index).
While I achieved my desired user experience I soon noticed a critical fault. Selecting different percentiles, changing measures, or drilling down in the dashboard was slowing things down substantially. Each recalculation took about 5 seconds.
I dug deeply into learning about the inner workings of Power BI and determined that the best approach would be to pre-calculate every possible county/measure/percentile combination ahead of time. As such, the heavy lifting would happen only once when the report is refreshed at the data source. I implemented a calculated table which works as follows: for each possible combination (i.e. the Cartesian product of my counties table, my percentile ranges table, and my “selected measures” table), Power BI pre-calculates whether the county qualifies for that given measure/percentile. That results in 3220 counties * 8 percentile ranges * 5 measures = 128,800 rows. While this may seem like a lot, it’s actually very efficient for the PowerBI storage engine to process (apparently it can handle billions of rows easily.) Each time the user selects a different configuration in the dashboard, the storage engine simply locates that row in the calculated table and returns whether the county qualifies or not and the slower Power BI formula engine is largely bypassed. The performance boost was incredible – each change on the report is recalculated almost instantly now.