Anyway, the OP's question is complicated to answer.
You need to do quite a lot of complicated stuff to get it running fast (using the stack to copy, preshifting your graphics, lots of loop unrolling).
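For anyone unsure what "preshifting" means, here is a rough Python sketch of the idea, not anyone's actual code. It assumes a 1-bit-per-pixel bitmap (as in Spectrum screen memory) and tiles 16 pixels (2 bytes) wide; the function names are made up for illustration. You build 8 copies of each tile, copy n shifted right by n pixels and one byte wider so nothing falls off the end, so at draw time you only copy bytes and never shift.

```python
def preshift_row(row_bytes, shift):
    """Shift one 1bpp bitmap row right by `shift` pixels (0-7).

    The result is one byte wider than the input so no pixels are lost.
    """
    value = int.from_bytes(row_bytes, "big") << 8  # append an empty byte on the right
    value >>= shift
    return value.to_bytes(len(row_bytes) + 1, "big")


def preshift_tile(tile_rows):
    """Return 8 copies of a tile; copy n has every row shifted right by n pixels."""
    return [[preshift_row(row, s) for row in tile_rows] for s in range(8)]
```

On the real machine you would do this once (at startup or offline) and at draw time pick the copy matching the fine scroll position, e.g. scroll_x % 8 (my assumption about how the copy gets chosen). It costs roughly 8x the memory per tile, which is the usual trade-off.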
Joefish's 128K scroller, linked to earlier, has its source available, and I would have a look at that. (That was when I gave up on my own scroller, since I saw Joefish had basically already done what I was trying to do anyway, and there was a gnarly bug when I tried to make my tiles 48x16 instead of 16x16 that I never got around to fixing.) His code is better than mine, although mine was my first attempt at scrolling.
I also borrowed part of my line copy routine from another poster on WoS. It stores each row of the preshifted graphics 256 bytes after the previous row, so instead of adding anything to HL when moving down a row you can just INC H, which was a good idea. I can't remember whether Joefish did that in his code or not. It did make it harder to debug, though, as I remember lol
You don't end up wasting space, because you can place other graphics offset by the width (so my first preshifted tile is at someaddress, its next line down is at someaddress+256, etc., and the next tile is at someaddress+16, etc.).
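To make the layout concrete, here is a small Python model of it (a sketch, not the WoS poster's routine). The 256-byte row stride and the +16 per tile come from the post; the 0x8000 base, 16 rows per tile, and 16 bytes per tile per row are my assumptions for illustration, and next_row just mimics what INC H does to HL.

```python
ROW_STRIDE = 256   # each row of a tile sits 256 bytes after the previous row
TILE_STRIDE = 16   # the next tile starts 16 bytes along (assumed slot width)


def tile_row_address(base, tile_index, row):
    """Address of a given row of a given tile in the interleaved layout."""
    return base + row * ROW_STRIDE + tile_index * TILE_STRIDE


def next_row(address):
    """Model of Z80 INC H on HL: bump the high byte, leave the low byte alone."""
    high, low = address >> 8, address & 0xFF
    return (((high + 1) & 0xFF) << 8) | low
```

With base 0x8000, tile 2 row 0 is at 0x8020 and next_row(0x8020) gives 0x8120, which is exactly tile 2 row 1, so moving down a row never touches L. With these assumed sizes, 16 tiles of 16 rows fill 0x8000 to 0x8FFF with no gaps, which is the "no wasted space" point.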
Or you can use an offscreen buffer, which is easy, but I expect you won't be able to run faster than 16.7 fps then (which isn't that bad tbh, especially if it's your first attempt at a scroller).
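Where a figure like 16.7 fps comes from: if you sync updates to the 50 Hz (PAL) frame interrupt, the frame rate can only be 50/n, where n is the number of whole frames one update takes. 50/3 is about 16.7, so my reading (an assumption) is that building the buffer plus copying it to the screen spills into a third frame. A quick Python sketch, using the 48K figure of 69888 T-states per frame:

```python
import math

PAL_FRAME_RATE_HZ = 50          # nominal Spectrum display refresh
T_STATES_PER_FRAME_48K = 69888  # CPU cycles available per frame on a 48K machine


def frames_needed(t_states_of_work):
    """Whole frames an update occupies if every update waits for the next interrupt."""
    return max(1, math.ceil(t_states_of_work / T_STATES_PER_FRAME_48K))


def scroll_fps(t_states_of_work):
    """Resulting frame rate: always 50 divided by a whole number."""
    return PAL_FRAME_RATE_HZ / frames_needed(t_states_of_work)
```

So anything between roughly 140K and 210K T-states of work per update lands on 16.7 fps, and you only get to 25 fps by squeezing it under two frames (the 128K machines have slightly more T-states per frame, so the thresholds shift a little).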
EDIT: Ghosts 'n Goblins doesn't preshift; IIRC it just rotates the tiles by the scroll delta each frame. I think the tiles repeat every 4 cells, though, which is quite restrictive, but you can draw them at any width on the screen via a load of unrolled code: one routine to draw a repeated tile 30 cells wide, another to draw one 29 cells wide, and so on.
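A rough Python model of the rotate-instead-of-preshift idea (my sketch from the description above, not the game's actual code). It assumes each pixel row of the repeating pattern is 4 cells = 4 bytes = 32 pixels at 1 bit per pixel. Each frame you rotate every pattern row by the scroll delta with wrap-around, then stamp the pattern across the visible width; draw_row is a loop standing in for the per-width unrolled routines, and both function names are hypothetical.

```python
PATTERN_BYTES = 4  # pattern repeats every 4 character cells (32 pixels)


def rotate_row_left(row, delta):
    """Rotate a 32-pixel pattern row left by `delta` pixels, wrapping around."""
    bits = PATTERN_BYTES * 8
    value = int.from_bytes(row, "big")
    delta %= bits
    mask = (1 << bits) - 1
    value = ((value << delta) | (value >> (bits - delta))) & mask
    return value.to_bytes(PATTERN_BYTES, "big")


def draw_row(pattern_row, width_cells):
    """Repeat the 4-byte pattern across `width_cells` screen bytes for one pixel row."""
    return bytes(pattern_row[i % PATTERN_BYTES] for i in range(width_cells))
```

Because the pattern wraps every 32 pixels, rotating it in place is cheap and needs no extra memory, which is the trade against preshifting; the cost is exactly the restriction mentioned, since every tile in that layer has to fit the 4-cell repeat.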