PowerBI.tips

The Rise of the Notebook Engineer – Ep. 381

Mike, Tommy, and Seth talk about the rise of the notebook engineer in Fabric: why notebooks are becoming the default interface for data work, and what good notebook practices look like in real teams.

News & Announcements

Episode Transcript

0:36 Good morning, and welcome back to the Explicit Measures Podcast with Tommy, Seth, and Mike. Good morning, everyone. Good morning, gentlemen, and a happy Tuesday. All right, well, I've got to pick on you guys a little bit. Just before we went onto the podcast I was trying to say a quote from, I don't know, Transformers or something like that, and I got it totally wrong, and in the little chat window Tommy continued to ridicule me, and Seth's like, it is Transformers, you got it all wrong. Like, as they say on Cybertron, let's

1:10 roll. It was all wrong, it was all not right. I was trying to say let's start the podcast, and I was trying to go like, Autobots, roll out, or whatever the phrase is. Dude, I loved watching Transformers on Saturday mornings. Oh man, that was the best show. Great, when the truck would drive down the road and it's the same scene over and over again. It's the same thing with Scooby-Doo: Scooby-Doo runs and it's the same scene six times, and it's just the song over and over. Man, I love that

1:41 stuff. Scooby-Doo and Transformers were like my thing. I also liked quite a bit of Care Bears. Care Bears, Care Bear Stare. I'm getting a bit of nostalgia lately, because all of a sudden my daughter was watching a show on her tablet and it was Garfield, and I was like, when I was your age, that's when cartoons started. I didn't watch much Garfield. Garfield and Friends,

2:12 obviously. I miss that Arnold guy, Hey Arnold. Hey Arnold, I just remember the girl with the yellow hair that went straight out sideways. Yeah, what was her name, Helga? Yeah, sure, and she called him football head. Awesome. All right, enough of the memories. Today's topic here is quite a pointed one. Tommy was saying this is not a very Christmassy article; this was more of a, it has a dark vibe to it, maybe

2:42 potentially more of a Halloween one, at least the artwork in the very beginning. It's called The Rise of the Notebook Engineer. Someone sent this to me and said, hey, this would be a great topic, you guys should talk about this one, and I was like, okay, let's just tease this out a bit, and this is 100% right. So today we're going to discuss the rise of the notebook engineer and just unpack what this means. I have a lot of points that I think I disagree with in this article, but there are also valid points in here as well, so I don't know how we

3:12 grapple with this one. This is going to be an interesting topic to unpack. The article will be in the description of the video, and it's also in the chat window as well if you're following along this morning. Anyways, let's just do some regular announcements here. Tommy, we have our news article here: source control integration for the SQL database in Microsoft Fabric. As I've talked about in the past, I say words that people don't understand. If I said that sentence to normal people,

3:43 90% of them would be like, what are you talking about, Tommy? What are we talking about, an icebreaker? Yeah, what an icebreaker. This is a good holiday party topic; start with this one when you go to your holiday parties this year. What's this talking about? I also think this is going to be some of the last news we get from the Microsoft team; they're already putting out articles saying holiday recap. But yeah, we now have, I believe it's in preview, I didn't even check that, but of course it's in preview: SQL databases, Fabric databases, have source control, and

4:13 it looks like it's pretty full-fledged in terms of the ability to branch out to a new workspace, to do all your commits to version control. Everything you'd expect from what is available in source control in Fabric now is available with the Fabric databases. Obviously there are going to be some limitations, but they did not hold back in terms of getting that out the door with the CI/CD capabilities. Getting started, it's pretty neat. There's actually a Microsoft.Build.Sql SQL project

4:45 environment that you can actually use with command-line or CLI commands, or your terminal, so they went full-fledged here, way more than our dataflows. But I think they understood, too, if you're building, and I'm saying this understanding there are trepidations with some of the words I'm about to say, understanding how important version control is, they are understanding that a SQL database is going to be a core part of what I think a lot of people are planning to

5:15 do. I agree with your point, Tommy, and I will definitely echo the idea of, wow, other teams in Microsoft take note: how quickly the tool was released, and how quickly CI/CD and Git were supported right behind it. But I think we could give the SQL team a bit of a pass here as well. If you think about what SQL's been doing, they've been building this stuff with Git integration inside Azure for a long time, so Git integration for SQL databases is not a

5:46 new thing; it's been around for a while. So it's more like incorporating something they already own and know how to work with, as opposed to dataflows, which really had no concept of a Git integration experience. So that is quite an advantage, I think, for the SQL database team. One thing I will point out here that I really like: there are two options. You can commit to source control, which means you create the database and you can commit, here the Fabric database commits

6:16 the items into the Git integration. Or, update from source control, which I think is going to be quite impactful: you can make changes directly in the repo and then it will pull those changes in. Add a new constraint, change the table definition. It would be great to have these other experiences where you can update things directly from Git control. So I think that's a very useful feature, one that I'm looking forward to using. Seth, you being the SQL guy

6:49 on the crew here, from SQL land, what's your, yeah, from SQL land, from past tense. Yes, I still dabble. Exactly. So what are your thoughts on SQL things? Are there any major milestones or nuggets that you need SQL to have, that you are hoping it will continue to get, as they bring that SQL experience directly into Fabric? I don't think anything is left off that I'm

7:21 aware of, right? It's not like you're pushing the bounds exponentially outside of some of the rudimentary things: we can schedule a job, we can execute stored procedures, we can build the things we need to do for data engineering. So I haven't noticed anything that's just a complete miss right off the rip. One thing I will just point out here, and this is not a pain point yet, I don't know if this is automatable yet, I'm hoping that we can get there. One

7:52 thing I was looking for, I was researching. I'm learning; I'm not a SQL developer, so these are things that I'm stepping into a bit more. Have you heard about the bacpac files that come out of SQL? If you do a backup of a SQL database, you get these things called bacpac files. There are also BAK files, yeah. Is that BAC-PAC? It's a backup file, but it's pronounced backpack, right? It's a fun name. I think the name is

8:23 fun. Oh, whatever, the backup file. So there's the backup file that you would use to go restore your database, right? So you take a backup of the file. There's an automated way that you can turn on a 7-day backup of the SQL database, and it'll just store backups for your database, and you can restore back into the SQL database, which I thought was really interesting. I like that idea of having the backup feature already built into the system. My challenge was, sometimes

8:50 companies have a SQL Server and they're only willing to give you a backup file of their data from the third-party system, whatever that may be, and it would be really nice if I could store my backup files inside a lakehouse. And I thought to myself, why doesn't the SQL Server actually just have an entire, like, it doesn't matter how many days you back it up. If I have the lakehouse and it's part of my Fabric environment, why not let me just spool up a folder for SQL backup files, go to my SQL environment, point to

9:23 this as the folder in this lakehouse where I want to store my backups, and tell it how many days I want: 30, 60, 90, it shouldn't matter. I should be able to add backup files there and then restore from those backup files directly on the SQL Server. I just feel like that would have been an experience that I would like to see in the tool. Maybe it'll get there, I'm not sure. But I feel like we still have to sift out, when you bring these new tools into Fabric, there's this experience of, okay, we have SQL, we
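
The retention setup Mike sketches here, a lakehouse folder of backup files pruned to the last 30/60/90 days, could be scripted in a few lines of Python. This is purely a hypothetical sketch of the idea, not a Fabric feature: the folder path, the .bacpac naming, and the assumption that the lakehouse Files area is reachable as a local path are all made up for illustration.

```python
import os
import time

def prune_backups(folder: str, keep_days: int) -> list[str]:
    """Delete backup files older than keep_days; return the names removed.

    Assumes (hypothetically) that the lakehouse Files area is mounted as a
    local path and each backup is a .bacpac file whose modification time
    reflects when it was taken.
    """
    cutoff = time.time() - keep_days * 24 * 60 * 60
    removed = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if name.endswith(".bacpac") and os.path.getmtime(path) < cutoff:
            os.remove(path)  # past the retention window: drop it
            removed.append(name)
    return removed
```

Run on a schedule against the backup folder, e.g. `prune_backups("/lakehouse/default/Files/sql_backups", keep_days=30)` (path hypothetical), and anything older than the window goes away.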

9:54 now have Fabric, and there are probably new features that should exist because the two tools are so uniquely different now. And I think this is the give and take of having all these tools readily available at the push of a button: we don't have the full customization that you'd normally have if you're doing it in another environment, where you could do that pretty easily, but you'd also have to have a lot of skill on the back end. All the tools, the notebooks, the SQL databases, they're not necessarily lacking full-fledged features, but we don't have

10:26 the same customization that you would normally have available. Like with SQL databases, yes, there are users and roles, or there are roles that I can create, but still, everything goes through Active Directory. I can't create a basic user and have someone log in that way; it has to be through Microsoft Entra, yes, or Active Directory. So I think with a lot of these tools we're just going to have to deal, right now, with 98% of what most people are trying to achieve being available in these products, but that

10:57 extra customization that you would normally do, had you lived that life in the past, is just not going to be there now, or maybe in the future. So yeah, I agree. I want to pivot here just slightly. One thing I just want to point out: there's an event coming up. I'll see if I can get the link here very quickly as we talk about this. The Milwaukee user group is going to meet; we're going to fire things up. We have a meeting this Friday at the Milwaukee user group, so if you are in the Milwaukee area, or you're

11:27 listening from the Milwaukee area, we're going to do a Friday user group for Power BI and Fabric. We're going to go over DAX query view, so we're going to spend some time in DAX query view writing some formulas, exploring where you can see it, do some things in the service, do some things in Desktop, just a quick overview of what you can do there. We might explore some of the new INFO functions as well; there are some new INFO functions to automatically document your model. We'll go through those as well and point out a couple of things. So just want to call that out. I will go get the Meetup item here, and we'll put

11:58 that link in the chat window if you want to join. We will be at Milwaukee Tool; they'll be the host of our event this year. So is it hybrid, or do we just do it in person? We're going to try to do hybrid. I'm going to try and drop in a link, do a little bit of video for just the tutorial portion of it. If it doesn't go well, we'll just ditch the online portion and we'll just do live. So we're going to attempt to do both; we'll see how it goes. Again, it depends on good strong internet and all the other things that

12:28 come along with, say, a hybrid event, but we'll do the best we can to broadcast as best we can onto the internet. If nothing else, maybe I'll record it and you can catch it back later on. Anyways, that being said, I will go grab the link for the event. If you want to join us, we would love to have more people joining the event. Please make sure you sign up if you are going to come, just because we need to know a count of people, and we need to let the people at the front desk know who you are and how many people are going to show up. All right, that being said, I don't have

12:58 any other, well, I have one more item here. Do we have a little bit more time for another topic? Sure, okay. Always time. All right. So people are very excited about the SQL analytics endpoint and Lakehouse, and people are now, I believe, building a lot more things inside the Power BI ecosystem. One thing that I have found, or discovered as I've been working in the system, is the SQL analytics endpoint does not refresh as fast as the data you put into the lakehouse. Let me describe: if you

13:28 have a notebook, and your notebook is writing new tables or updating existing tables in your lakehouse, those tables are getting a new definition: there's a JSON file that describes the Parquet tables, the Delta tables, and then that file can be read by other tools. So Desktop or the SQL analytics endpoint read those definition files to go find the latest version of a table. So over time I'm writing data to a table: table version one, version two, version three. What happens is there's a

13:59 slight delay. I don't know what the delay is; I've seen it range anywhere between five minutes to like an hour, depending on the queue of how busy things are, maybe, in Fabric. But the tables created from the lakehouse are versioned higher than what the SQL analytics endpoint observes from the table. I don't know what's going on there, but apparently there's a hack where you can run a SQL analytics refresh, or a resync of the SQL table, looking at the data table.

14:30 So I believe there's an API eventually coming that might help us out with this, or we need an API to help us out with this. But for now, just be aware: if you're writing data into the lakehouse and you're expecting a very fast turnaround from when the table's been written to the SQL analytics endpoint being immediately able to read that data, there is a bit of a lag between the version of the Delta table and the version that the SQL analytics endpoint is reading. So just be aware of that. Curious, is that a cache, or is it also affecting, like,

15:01 Direct Lake? Like, if I were to pull data into a Power BI model? Everything through the SQL, so my understanding is, and what I've been doing in some testing, is when the SQL analytics endpoint is trying to talk to the Delta table, there's a stored definition of what version that table is on. The SQL analytics endpoint is not updating as fast as the actual Delta table is updating, so there's a little bit of a lag or lead time. What I found to work around this was: if you're going to read data from the lakehouse and you need it to

15:31 be fast, like, I'm going to write the table and then I'm going to immediately read the table again, do it inside a notebook, because notebooks obey that time frame and will immediately read the most recent version of that data with no lag or lead time. So just FYI about that. I think this is going to be fixed. I'm trying to put my finger on it; someone had a hack, there's a SQL analytics command you can run to actually make it refresh, but

16:01 just refresh itself. So it's almost like the same thing with a semantic model: you want the semantic model to reframe, go point at the latest data, reframe the data, go get the most recent version. That's what we want for the SQL analytics endpoint; it's not quite there yet. And I've had this happen with multiple customers, where we're trying to read data and write it quickly into the lakehouse, and the SQL analytics endpoint is behind what the actual version of the table is, which breaks your process sometimes. So just be aware of this. I think it's getting worked out, I think it'll be solved, but

16:35 it's not super fluid at this point. The way you're describing things, it sounds like they just have it on a job on the back end. Exactly right, yes, that's my understanding. It sounds like there's a job running through: hey, I'm just going to scan through these tables, figure out which ones are new, okay, let's just refresh those datasets. Speaking of opportunities for real time, okay, maybe you put it in your own system. I was saying, like, the SQL analytics, real time to the data that's being written in other places, like there's an

17:06 opportunity for real time there. You go, that's what they're trying to, that's what the whole real-time buzz has been about. Anyways, just for your, yeah, and to be clear, what we're doing is batch loading, right? We're doing, like, a single, kick off a job, do a bunch of things, and at the end of those things have something accomplished with the final table. These are things that are not happening at midnight; they're happening multiple times throughout the day, and if you're relying upon that SQL analytics endpoint to have the most recent view of a lakehouse table, there's going to be a bit

17:36 of a delay there, so just plan

17:37 accordingly. That's something I've got myself stuck on, and we've had to re-engineer around it by using more notebooks and things as well. So just FYI about that; that's my news from the street here. There are probably going to be more topics around this one as it gets resolved in the future. Anyways, enough of that. Anything else for news or beats from the street that we want to talk about? Pretty good. All right. For December, yeah, this is great. Yeah, to your point, Tommy, usually by this time in other
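
Until a proper resync API lands, one defensive pattern for the batch jobs Mike describes is to poll until the endpoint reports the Delta table version you just wrote, and fail loudly on timeout instead of reading stale data. This is only an illustrative sketch: `get_endpoint_version` is a hypothetical caller-supplied check (e.g. some query against the endpoint), and the expected version number comes from your own write step.

```python
import time

def wait_for_sync(get_endpoint_version, expected_version: int,
                  timeout_s: float = 600, poll_s: float = 1) -> bool:
    """Poll until the SQL analytics endpoint reports at least the Delta
    table version we just wrote, or give up after timeout_s seconds.

    get_endpoint_version is a caller-supplied function (hypothetical here)
    returning the table version the endpoint currently sees.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if get_endpoint_version() >= expected_version:
            return True  # endpoint has caught up; safe to read
        time.sleep(poll_s)
    return False  # still lagging; fail loudly rather than read stale data
```

A job would call `wait_for_sync(check, expected_version=v)` right after the write, and abort, or fall back to reading through a notebook, when it returns False.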

18:07 years, what's being developed slows down substantially. There's not been a lot of slowdown, and I'm actually happy for a Christmas break through December and January; maybe I can catch up a little bit and figure out what the heck has been developed so far. There are so many things coming out now it's almost overwhelming to keep up with everything. All right, well, that being said, let's jump into our main topic, The Rise of the Notebook Engineer. Here's our link for today, our topic for today; we'll jump into that one

18:39 directly. All right, Tommy, you read a little bit of the article; give us maybe a bit of an overview of where we're going, and then maybe we can dive into some specific areas. It's a heated article. Yeah, there are calls to Spartacus, there are calls to banding together as a culture here; there's no holding back. So the article we're talking about is by Daniel Beach, and it's really a criticism of notebook usage, the amount they're being used in production life cycles, being able to create them so

19:11 easily. Interestingly enough, there's no mention here of Fabric; this is really just about the Databricks environment, but obviously I think this is even more paramount with what can be created in Fabric. And we're really just dealing with the calls to, we have to have a better system rather than just notebook creation, whether it's a skill challenge or the lack thereof, whether it's a technology challenge or vendors. And I don't know, there are a few places, like I

19:42 said, there's no holding back in terms of Daniel's opinions here. Let's take this from the Fabric point of view, where notebook creation is easier than ever. It is maybe one of the easiest things to do in Fabric: available in a lakehouse, available in a SQL data, or actually not a SQL database yet, but available in a lakehouse, available in a warehouse, to bring data into Fabric through the notebook. And again, Mike, your big calling card has been, well, a lot of people are going to

20:14 transition to the notebook, a lot of people are going to migrate from the Power Query side of things into the notebook. But there's a challenge here, and I think that's probably a good place to start: there's obviously a point of contention that's happening, and I assume it's more than just Daniel, in terms of the ability to use notebooks, and more importantly to use them in a production environment. So let's start there. Yeah, so this is an interesting article. Where I'm coming from is more of

20:44 that business-user area, right? So I'm looking at this article through the lens of, I have been a business user, I'm stepping into data engineering. I'm typically used to Excel; I haven't been given a lot of tools, I use what I can to make things work. Maybe I'm doing Access, maybe I'm doing Excel. For me, the business person, when I started this, I thought, man, Power Query is the best thing ever. I'm going to use Power Query all day long, this is great, helps me get stuff done, I can engineer what I want, I can automate. For me, the aha moment was

21:16 everything can be automated. This article is almost ripping apart the idea of, okay, once you get yourself to dataflows, and then once you get from dataflows into notebooks, everyone just sits in notebooks and they're just building lots of data engineering things. And my gripe with this article a little bit, this is one of my pet peeves here, is he talks a lot about life cycles, life cycle management. I could do the same thing a notebook does 10 times more efficiently, and I could do it writing my

21:47 normal data engineering code, whatever that may be. But it doesn't really allude to what tools you're using to do data engineering. I felt like there was a whole gap here: he was very much ragging on the notebook engineer, but he wasn't giving a comparable, what is the expected tooling for a data engineer to use to produce these results of shaping and transforming and moving around data? And so to me it was like, okay, I understand your point, you're ragging on the notebook engineer, and everyone thinks they're now a data

22:17 engineer because they know how to run a notebook, but what is the equivalent tool that you should be using as a data engineer? What are those tools? And I didn't really see anything in the article that gave any alternatives, so I felt like it was the pot calling the kettle black here a bit: don't do notebooks, notebook engineering is not the right way to go, and then no response to what you should be using. Yeah, in terms of general areas of opportunity in

22:48 the article, I would agree, the same thing: I would have liked to have seen, it was a lot of rant against notebooks, and I thought it was a little lackluster in providing alternative solutions, of saying, hey, here are some toolsets that you should be using, outside of just general broad strokes of software engineering principles or best practices or things that you should be doing. Right, yep. And we can all rally around, and maybe it's the structure of

23:19 that itself that he's pitting against notebooks, where we've had tons of conversation, like, how do you create a CI/CD pipeline, how do you do version control, how do you standardize your process? And it's plug-and-play with notebooks, versus some of the more fundamental approaches of using different toolsets that allow for very rigid deployment and checks and

23:49 balances and testing and all the things that produce an expected outcome in a repeatable way, right? Yeah, you're using functions and objects and different things. I could argue you could do the same thing in notebooks, right? Like, you can reuse objects in notebooks and whatever. Yep. So yeah, there's a little distance in there, just from the standpoint that, I don't understand, I think he's coming from that realm of very rigid, structured things

24:20 that are deployed, against more of the Wild West of notebooks. And it's like, even what we were talking about in the intro: yeah, Fabric is introducing these things, and you can start to utilize them and whatever, but it's not, hey, you're going to start here. It speaks more of pure data engineering practices versus where I think we find ourselves, and he points it out, yes, where you're more

24:50 of the developer, data developer, right? You cross these boundaries, and yeah, with these new toolsets, organizations are 100% in the realm of get it done. Yep, right. And I think that fundamentally also changes with the size of the data, right? When you're in big data systems like Databricks, or like Fabric even, the volumes of data that you're interacting with, that's where I would have been like, okay,

25:20 I would love to see what alternatives there are that you're using in this local pipeline to push things, or what direction is there outside of notebooks, if you have such a strong opinion on not liking them. And I'm going to take a slight step back, because, Mike, something that you mentioned in the beginning that I think we're going to have the internal argument on: just because you're coming from the business side of things, you said Excel or Access, and now I'm creating notebooks, hey Mom, I'm an engineer. That's

25:52 probably not necessarily the case and true. Again, I think I'm going to have the internal argument with you, that to take someone from the business and then to put them in a Python environment, or even a SQL environment, in a notebook is still more difficult than the user interfaces available in Power Query, or heck, even in a pipeline, which at least has a user interface rather than a language. But now there's another introduction here, outside of the skill barrier to get into notebooks: the process side. One of the biggest things that we've dealt

26:23 of the biggest things that we’ve dealt with with poor powerbi development was

26:26 with with poor powerbi development was the poor to your point scaling or automation of reports and models where there’s so much overlap of the data there and I think we’re going to see so much more of this fruition not fruition but we’re gonna see so much more of this buildup of gunk in fabric because you’re gonna deal with not only are you gonna have lack of good code in notebooks in fabric again I’m taking this from a fabric point of view but we’re going to see a lot of poor notebook naming a lot

26:56 of overlap. And I think the notebook creation, and these people who are now saying they're engineers, or are tasked with engineering jobs because Fabric exists, without the proper knowledge of the process nor the skill level. And that's just saying, if you can create notebooks right now, if you actually have some skill, at least having that knowledge, which I think we're still really going to lack, again, taking from the normal Power BI or Fabric user, not someone with

27:28 a background in Python or in notebook creation. I guess I agree with you to some degree, but I also feel like, in this information age where we are right now, there are a number of free classes and educational things, and I keep wanting to go back to, I understand your point, Tommy, but I'm going back to this regular thought in my head, which is: Microsoft is continually democratizing,

27:58 bringing to everyone the ability to do data engineering at some level. You may hold a tighter rein on what that term, data engineering, means, but regardless, Microsoft is expecting, or the language they're using is, that data is part of your business and anyone can shape it, transform it, and get it ready to go into reporting. I think this is a monumental shift as to where we've been, where

28:29 most of those keys, to that type of tooling, were what you had to pay for. If I think about what happened before Fabric: we had to go into Azure, spin up various tools, know how all the tools work together; there was a lot more configuration. I think Microsoft has taken the right approach here. And I don't know if you guys saw it, but there was an email, I guess sent out to everyone, an advertising thing for the Microsoft Fabric conference in Las Vegas. Did you guys see that email come

29:00 out? They're doing the Microsoft Fabric conference inside, I think it's inside the T-Mobile Arena, which is a huge building in Las Vegas. So the Microsoft Fabric community is taking over the T-Mobile Arena out in Las Vegas. It's huge; apparently that place seats 20,000 people, and they're pushing for a really big event. They're excited about this. I think the message here is Microsoft is here to play in the

29:32 data game: we are building this incredible new platform that everyone can get their hands on. And so they're not just tackling, to this guy's article's point, we're not just tackling the people that are only doing the data engineering in articles; we're going after everyone. Everyone's going to be able to leverage data at some degree, and I think this is really interesting, and that's what I'm trying to unpack here. I agree that if you have just a brand-new notebook engineer, yeah, you're probably not going to get the most efficient thing, and there are a lot of other considerations I'd like to

30:03 think through, which is where I think the community needs to step up a bit and give some more information about which notebook experience is the fastest. We've been hearing now about Polars, and what's the other one, DuckDB; these are two pure-Python-oriented engines, and now we have Python notebooks. So we now have a Spark-based notebook, we now have a Python-based notebook, and we have a T-SQL-based notebook. All these notebooks execute on top of different compute engines, and now you have to know which

30:34 engine to use. If you're doing mega big data, Spark's probably your option, but if you're not doing super big data and you're just moving around smaller, medium-sized tables, maybe just the Python one would be better and cheaper to run in terms of compute units. So maybe this data engineering role is becoming more selective in that notebook experience, because when you say notebooks, it's a huge topic of information. So anyways, those are my thoughts there around that
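The engine-choice question the hosts raise can be made concrete. Here is a minimal sketch, in plain Python, of the kind of rough heuristic a team might write down for itself; the function name and the row-count threshold are hypothetical illustrations, not anything Fabric ships:

```python
def pick_notebook_engine(approx_rows: int, needs_distributed_compute: bool = False) -> str:
    """Hypothetical rule of thumb: Spark pays off for genuinely big data,
    while smaller and medium-sized tables usually run cheaper (in capacity
    units) on a single-node Python notebook using Polars, DuckDB, or pandas."""
    if needs_distributed_compute or approx_rows > 100_000_000:
        return "spark"
    return "python"  # single-node: Polars / DuckDB / pandas

print(pick_notebook_engine(250_000))        # python
print(pick_notebook_engine(2_000_000_000))  # spark
```

The exact cutoff would depend on your capacity SKU and data shape; the point is only that the decision can be written down and agreed on rather than made ad hoc per notebook.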

31:04 part of it. It's interesting about the creation of notebooks though, too. One of the things that Daniel says, again not holding back here: if you don't agree with me, meaning you abuse notebooks and encourage others to follow in your wickedness. Yes, that's me. This is why I was reading this article thinking he's literally talking to me right now. And again, the funny thing here is we're not even mentioning Fabric. So the democratization that you're talking about, I don't know if Daniel even knows exists, or even this

31:35 whole article; if you read this, it's not even coming from the angle that this is available not just to other engineers, this is available to everyone, as you're saying. And this is where I'm really having trouble agreeing. I've converted to notebooks myself; I love the experience, I learned the language, I implemented it in part of the processes and what I do on projects. But that being said, I've also seen myself go through the process, and to try

32:06 to imagine an organization that did not have this background, of people who were, let's say, just doing Power BI and doing models, even if they were doing an advanced version, to now say we're going to go notebook-centric, right? That's going to be the main way we process, store, and output data. We don't have a lot of good practices, articles, and governance around that in our world, in the Power BI

32:36 world, in, we'll say, the Microsoft world, right? Obviously in Databricks there are, but for the majority of people coming from the Microsoft realm, we don't have governance or best practices or user groups organized around a team around notebooks, and I think that's where the concern is for me. Well, it's where his concern is too, I think. Yeah, and I think it comes from a good place, right? Like, I think it does. I think the structure for a

33:09 repeatable outcome, right, checks and balances: the software engineering process has that built and baked in, right? And how much of that is getting skipped when you just use notebooks? When you have access to all the data, when you can manipulate it, when you're building your medallion architecture, all these things in notebooks, how many of those checks and balances are you building into that

33:40 process? Because it's a manual and intentional effort, versus working with platforms that force you to do that. And what is the result of that? The result is bad data sets, reprocessing data. There are a number of issues that come out of, I should say, notebook-only solutions. But the counterpoint to that
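One way to make those checks and balances an intentional part of a notebook rather than a manual afterthought is a small, explicit validation step at the end of a transformation. A sketch in plain Python, with made-up rule names, assuming rows arrive as a list of dicts:

```python
def run_checks(rows, checks):
    """Apply named data-quality predicates to every row and
    report failures instead of letting bad data flow downstream."""
    failures = []
    for name, predicate in checks.items():
        bad = sum(1 for r in rows if not predicate(r))
        if bad:
            failures.append((name, bad))
    return failures

rows = [{"id": 1, "price": 9.99}, {"id": 2, "price": -1.0}]
failures = run_checks(rows, {
    "price_non_negative": lambda r: r["price"] >= 0,
    "id_present": lambda r: r.get("id") is not None,
})
print(failures)  # [('price_non_negative', 1)]
```

In a real notebook the failure list would raise or stop the pipeline; the shape of the idea is the same whether the rows are a Python list, a pandas DataFrame, or a Spark DataFrame.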

34:10 I think, in my mind, is, well, let me finish this thought, and it dovetails. I think the strategies in today's day and age dovetail with what Mike was saying, right? Because these toolsets have become wider, more easily adopted by more business folks, that doesn't mean that the teams that are responsible for creating the strategy and the

34:42 implementation of key data structures are not following those pure data engineering rules. They absolutely are. Yeah, there may be some things that they're skipping, right, if you're only in a notebooks environment, but that coincides with people using these tools to get things done. And to your point, Tommy, I think that's going to create an immense amount of noise in these systems, because people have access to things while they have zero knowledge of data engineering

35:13 principles. But is the business getting more value because they have access to it, and because they can do the things that they are, through this analysis, through these other things? And Daniel makes an argument that notebooks are good for analysis. I think, to your point, you're just creating a volume of garbage in the ecosystem, related to naming and things, but I think that's where you need to be diligent about how you build these

35:43 structures within these new types of environments. And that dovetails into my last point: there's a negativity towards notebooks, and maybe it's because there fundamentally isn't a structure there, right? It's an I-can-do-whatever-I-want thing, and that does have risk to it, but that doesn't mean that every implementation from a data engineering perspective that

36:13 uses notebooks is a bad one, and I think he plays that up a little bit in the article. What he doesn't pay homage to is, even if you follow some software engineering best practices, or you get some of those pieces, because not every software engineering process has them all, you can build the most convoluted, insanely difficult-to-understand solutions that are, I would argue,

36:43 worse than what you could have in a notebook, because at least I can clearly understand in that environment where things are going, what the lineage is between things, etc. Whereas in previous architectures that I've seen, you're using multiple different tools, you've got to know there's this set of jobs over here, and then there's this thing over here, and then you're doing this tool here, and your deployment process, how good is that? Just because you have one doesn't mean that it's like, oh, it's not deploying all

37:13 that it’s like oh it’s not deploying all the changes to all the environments you have to do this one and then that one and like how complex is that solution when all all we want at the end of the day is to refresh some data right so I I think there ‘s a balance here I think he makes some good points I think some of the the challenges we’ve had around notebooks and ensuring we have proper cicd or where can we put some of these test cases in or how do you deliver a repeatable product every time

37:44 are valid. But it also speaks of structure, structure, structure, and it's like, yeah, that's not going to save you from bad development. Totally. And I'm going to give data flows a little more love here, because if Daniel's not going to give the solution here, let's talk about the world we live in in Fabric right now. We have pipelines, notebooks, or data flows; those are really the options to integrate data in, and I don't think I'm missing any major one there, but those are really the

38:14 three options; we have to play in the sandbox. There are still five articles in the Microsoft Fabric documentation about notebooks; that's it. Two of them have to do with VS Code, the other has to do with .NET and C# development in notebooks for an API. That's all. While with Dataflows Gen2 and Gen1, we still have an entire section, I believe over 15 articles, devoted to data flow development, both on the skill and process and governance around data flows.

38:44 Those have been available for a while, and they've continued to update those. So we're still lacking documentation and a process here. And then two, again, not saying notebooks are, to Daniel's point, the devil or something to be frowned upon, but if I'm going into Fabric development and I'm bringing in a non-skilled or non-technical team, like a marketing team, I'm going to start with the data flow, because I'm imagining it's not a big data system, or it's this incredible influx.

39:17 system or it’s this incredible influx I can teach them even the the idea of data engineering with a data flow gen one I don’t even have to do gen two on just getting data in storing it and how this process works it can upgrade to a notebook but I think to this point and again from someone who’s really enjoying notebooks it’s not always the best case for every situation we solve pipelines and they’re pretty they’re pretty filled out in terms of what they’re available in fabric where I

39:47 they’re available in fabric where I don’t think they were lacking too many features and activities in pipelines data flows Gen 2 for all the things we’ve talked about from beginner to Mid skill level can do a ton of things from the advanced point of view notebooks are awesome but again you are already coming with the prerequisite that you have the skill and yes you can learn and I totally get that we have co-pilot we have ai that can do a ton of things but

40:17 again, I think about what the end goal is for that team or that user. Are they going to be working in notebooks for 40% of their time, 50% of their time, in comparison to just trying to get data in, while there's still two other products, data flows and pipelines, with a user interface that can do a ton? Now we can get into the cost, we can get into the process of Fabric, and I think it's a very valid argument, but leaning towards where

40:47 Daniel's going here, at least in our playground, we still have two very good, suitable products for people who may not have that experience or haven't worked in that environment before. I'm going back to my question here, which is: in this space, let's just take the lens here, right? We know there's a motion Microsoft is making here, that data should be engineered, moved, transformed, shaped by anyone, right? That's the experience here. This is something we can all agree upon; this is the direction the market is going,

41:19 and it's going to be going this way more and more. I also think, again, my very big-picture brain here zooms out and says: look, the more we get APIs, the more data becomes digitized, the more the world keeps making more data, we're going to need different experiences to then engineer and shape that data. I'm not going to go to a SQL database and try and run Whisper against a SQL database with a bunch of video files in it. I'm just not; it's just not what I'm going to do, right?

41:49 it’s not what I’m going to do right If this gentle if Daniel’s talking mean if this gentle if Daniel’s talking about data Engineering in the traditional sense and maybe he’s thinking like land everything in a SQL data data warehouse and then use store procedures and then check them in and all the things totally understand you’re writing code get it but like if you take that mindset and put it on top of okay we need to adopt a new storage system I think the challenge for me here is SQL hadn’t in the past and this is until fabric showed up SQL hadn’t in the past had the ability to talk to a

42:19 lakehouse, so we had to reformulate what these other things can do. And so now we're shifting the mindset here; we're taking this storage-and-compute thing and splitting it apart. And I think what's happening right now is we're in a place where the new patterns and standards have not necessarily developed. There's a couple of companies I think leading the helm here; I think Databricks is setting the pattern for what a medallion architecture is. Microsoft was initially not calling it medallion architecture, Microsoft was not calling it bronze, silver, gold; they were trying

42:50 to use a different term, they were trying to use some other language initially when this rolled out, and what we have found is they're falling back into the standard medallion-like architecture. And I think a lot of people are not actually studying what this means; it's the same things you were doing previously with stored procedures and staging and production tables. It may be a little bit more enhanced now, but I think with a little bit of education and reading and studying around this one, I

43:20 think this is going to be an industry standard where you're going to use notebooks. But as organizations choose on this, as experts unpack notebooks, you're eventually going to get to a point where here are some good patterns you should follow, and that becomes the standard. So I think we're in this growth stage right now, where I see a lot of people trying to get a lot of new notebooks, there's a lot of new compute showing up, and we haven't quite dialed in what the best practices are in this new world. So my point is, I agree from a
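The point that bronze/silver/gold is essentially the staging-and-production-table pattern people already know from stored procedures can be sketched without any framework at all. Plain Python, with made-up table contents, just to show the shape of the pattern:

```python
# Minimal, framework-free sketch of the bronze/silver/gold idea:
# raw landing (bronze) -> cleaned/conformed (silver) -> reporting (gold).
bronze = [  # raw zone: keep data exactly as received, bad rows included
    {"product": " Widget ", "qty": "3"},
    {"product": "Gadget", "qty": "oops"},
]

def to_silver(rows):
    """Clean and conform: trim strings, cast types, drop unparseable rows."""
    out = []
    for r in rows:
        try:
            out.append({"product": r["product"].strip(), "qty": int(r["qty"])})
        except ValueError:
            pass  # in a real notebook you would log or quarantine this row
    return out

def to_gold(rows):
    """Aggregate for reporting: total quantity per product."""
    totals = {}
    for r in rows:
        totals[r["product"]] = totals.get(r["product"], 0) + r["qty"]
    return totals

silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # {'Widget': 3}
```

Swap the lists for Delta tables and the functions for notebook cells and you have the same pattern the hosts describe; the discipline is in the layer boundaries, not the tooling.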

43:52 data perspective, right. I think, directionally, Fabric supports this method, the data mesh methodology, right? Like, do standards and a

44:03 singular process and things coming through the same systems make sense from a central data perspective? Sure, yeah, but that's not the direction that's being pushed, right? It is much more, to your point, can you create a data architecture that is a data mesh, where you have access to the APIs, who owns those different data artifacts? Are we building key infrastructure of data engineering in all areas of business, then? Or are we

44:34 just saying ownership of that data belongs to somebody, and regardless of how we got there, it's the data set that's available to the rest of the organization that's important? Well, I would argue in today's day and age, the vast majority of companies are going to literally not care: you're responsible for delivering a key product; how you get there, I don't care. Right, right, but it's not like every system or

45:07 but it’s not like every every system or every I would I would argue the vast majority of companies don’t do that right A A lot of it is centralized a lot of it is in one or two places or should or could be Consolidated for delivery and data access right so the the data point is an interesting one and I know we’re getting your time but I’ll conclude with some thoughts here because I gave a face Seth because

45:37 I was going back through my own experience, where they don't care how you get there as long as you get there. Well, let's say I'm on a business intelligence team and I'm the only one doing notebooks. Well, then that's the same situation where I was the only one doing Power Apps, and when I decided to jump ship there was a lot of training to teach people Power Apps, because I was the only one with that skill. And I did initially go down the road of learning Power Apps; I thought it was a great solution, but again, the knowledge was in one

46:08 particular person. And maybe it's not the best analogy, because I think obviously notebooks are much more universal, but I think there's a really good point here. As we really dive into Fabric, some questions to ask ourselves for organizations and governance: is notebooks a self-service product that we're delivering to the business, or is it more managed? Is this part of some skill set or Center of Excellence that we're now introducing notebooks into? These are questions that I'm really beginning to ask, and as more people and more

46:39 organizations begin to adopt Fabric, these are questions that are going to come up. Now, I don't necessarily know; just like a lot of things in our world, it depends, and I think there's not going to be a one-size-fits-all. But I think, also, Mike, and I'm going to give the notebooks the last bit of love here, this should absolutely be something that you're pushing from the notebook experience, if it's available, if you have the right resources on a team, to make this, I don't say a priority, but something that

47:10 don’t say a priority but something that we want we’re going to encourage and really push an organization to do if the resources both from a skill and people are there and I think there’s also a a note here as well right Microsoft is incentivizing us to move to notebooks because we’ve already paid for the compute and honestly if we look at like the comparison between data flows and like pipelines data flows and notebooks those are the ways I’m getting data in and I’m bringing it to the lake those those are the three main pipeline pieces there’s going to be more

47:40 pipeline pieces there’s going to be more of them in the future there’s going to be more areas where I can then leverage and shape and move data around that’s that’s going to be a thing but that being said if I look at those tools I want to focus that’s the data engineering team and I think if you put again to this article here I think where we have a little a little bit of an out here with notebooks is notebooks has the ability to be a regular part of your data Engineering Process you just can’t ignore software engineering practices you can’t ignore check-in checkout you can’t ignore good quality data

48:11 can’t ignore good quality data quality checking on top of your information and if you incorporate that into notebooks I’ve had notebooks running for years the same notebook just clipping along doing its thing building stuff and you you learn how to build a a robust system and I’m going to go back to we’re going to end here so just can wrap up things here I think my final thought here is really around measuring your value of what the notebook does to the cost and of either training your people or bringing in other tooling to do the same experience

48:42 And I think right now, the way I see it, notebooks add a lot of value, but as a business you do need some specific thinking about how am I going to use a notebook, what happens when it's going to fail, what happens when new data comes in. So as long as you can really incorporate that data engineering process into the business... At the end of the day, if I have a master product table, that master product table better be updated and current every day, no matter what. If we can do that with notebooks, great. If that's less costly
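That "the master product table better be updated and current every day" requirement is exactly the kind of thing a notebook can enforce explicitly rather than hope for. A minimal sketch using only the standard library; the 24-hour window and the way you would obtain the refresh timestamp are assumptions for illustration:

```python
from datetime import datetime, timedelta, timezone

def is_current(last_refreshed: datetime, max_age_hours: int = 24) -> bool:
    """Return True if the table was refreshed within the allowed window."""
    age = datetime.now(timezone.utc) - last_refreshed
    return age <= timedelta(hours=max_age_hours)

# In a real notebook, last_refreshed would come from table metadata,
# and a stale table would fail the run loudly instead of publishing.
fresh = datetime.now(timezone.utc) - timedelta(hours=2)
stale = datetime.now(timezone.utc) - timedelta(hours=30)
print(is_current(fresh))  # True
print(is_current(stale))  # False
```

The check is trivial, which is the point: the freshness guarantee the business cares about becomes a line of code the notebook runs every day, not an assumption.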

49:13 notebooks great if that’s less costly than running a a power query data flow or it’s work it’s used in conjunction with a pipeline great that’s that’s what it should be doing so I think you find what works for your business and what skills your team has I I definitely am a proponent of if you’re already writing SQL scripts notebooks won’t be too far away from you writing other code just in python or py spark it’s not too far away and I know we now have tsql notebooks so if you really love writing SQL you can still write that there anyways there’s

49:45 options. Yeah, I think my final thought is, in all actuality, as I'm coming around on this, I think Daniel is in part having the same conversation we do, which is just a lot around how do we standardize our environment. Yeah, the benefit of notebooks is the analysis is so much closer to the implementation, and in data we're constantly interrogating data and figuring out what the solution is. And whether you choose a notebook or other systems or

50:16 whatever we have at our disposal: good data architecture and strategy is already, and should remain, important. Software engineering principles should be kept in mind, but that's a team implementation and an org-adopted thing that they're accepting, versus just saying that people who use notebooks can't follow those same good practices. Albeit, there are organizations that will say, yeah, that's tech debt, that's fine, give me the

50:47 that’s Tech debt that’s fine give me the give me the value upfront and then it’s the the team’s responsibility to build the repeatability the security the things in the back behind the scenes but I think where we’re at we can produce value to the business so much faster and the business always wants that so it’s it’s the trade-off between things we know we need to do versus the value that the business sees awesome tell me any final thoughts I already said them we’re good okay we’re good so thank you very much for

51:17 listening. We know this is an interesting topic; thank you for your ears. We know you could be doing a lot of other things, like reading a book, reading about data science, actually doing real work for your business or your company, so we appreciate you spending an hour with us. We hope some of this gave some insight to you. I don't think notebooks are going away; I think notebooks are here to stay. I do agree with the article that we probably need more rigor around teaching people better patterns in notebooks, but I don't think it's a reason to shy away from notebooks at all. I think Microsoft is giving us a great ecosystem that will be super useful,

51:49 with notebooks being a very pivotal part of that ecosystem. That being said, we really don't advertise this podcast at all; we would love it if you would just share this with somebody else. If you liked this topic or found this interesting, please share it on social media; let somebody else know you found this topic interesting. We'd love to have more people in the discussion. Thank you, chat, for jumping in and making lots of comments; we appreciate your comments as well, and have been trying to respond to those and interact as well. So thank you all very much; we appreciate it. Tommy, where else can you find the podcast? You can find us on Apple, Spotify, wherever you get your podcasts. Make sure to subscribe

52:20 and leave a rating; it helps out a ton. Do you have a question, an idea, or a topic that you want us to talk about in a future episode? Head over to power bi. podcast, leave your name and a great question. And finally, join us live every Tuesday and Thursday a.m. Central, and join the conversation on all PowerBI.tips social media channels. Oh, he had that in one breath. Oh, I almost tripped, almost stumbled on the way out. Thank you all so much; we appreciate you, and we'll see you next time.
