You seem very set on this goal, so I won’t try to dissuade you any further. In my opinion, it’s not a feasible project beyond the short term, but nothing is impossible.
I wouldn’t expect to see something like this on a stock-script marketplace at any price point. If you ever do, I would urge extreme caution, because something like this will require frequent updates and probably ongoing support as well. A new script will be extremely buggy, because the number of edge cases when crawling across the web (particularly for search engine indexing purposes) is staggering.
Let me explain some potential hurdles.
First, the reason I say PHP is a horrible language choice for this kind of crawling is that, out of the box, it can only do one thing at a time. There are workarounds, but none of them are great. Something like Node.js is simple to program with and can have hundreds of page fetches in flight at once thanks to its non-blocking I/O, though even that is still far from the most efficient option.
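To make that concrete, here is a minimal Node.js/TypeScript sketch of batched concurrent fetching. The URLs and the concurrency cap are placeholders, not a real crawl frontier, and it assumes Node 18+ for the built-in fetch:

```ts
// Toy concurrent crawler: fetch pages in batches so only `concurrency`
// requests are in flight at any moment.
const urls: string[] = [
  "https://example.com/",
  "https://example.org/",
  // ...thousands more in a real crawl frontier
];

async function fetchPage(url: string): Promise<string | null> {
  try {
    const res = await fetch(url);
    return res.ok ? await res.text() : null;
  } catch {
    return null; // network failures are routine on the open web
  }
}

async function crawlAll(list: string[], concurrency = 50): Promise<void> {
  for (let i = 0; i < list.length; i += concurrency) {
    const batch = list.slice(i, i + concurrency);
    const pages = await Promise.all(batch.map(fetchPage));
    // ...hand each non-null page in `pages` to the parser/indexer here
  }
}

crawlAll(urls).catch(console.error);
```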
You should also know that many websites need a JavaScript-capable browser to render correctly. Google now renders pages with a headless Chromium (the same kind of browser Puppeteer drives) so it can index client-side-rendered sites. A script can do the same, but you’ll exhaust your server’s resources quickly as you begin to scale up, because every rendered page costs real CPU and memory.
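As a rough sketch of what that looks like (assuming the puppeteer npm package is installed; the URL is a placeholder):

```ts
// Render a client-side page with headless Chromium via Puppeteer.
// Each browser instance easily costs tens to hundreds of MB of RAM,
// which is why this approach gets expensive fast at crawl scale.
import puppeteer from "puppeteer";

async function renderPage(url: string): Promise<string> {
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: "networkidle2", timeout: 30_000 });
    return await page.content(); // HTML after client-side JS has run
  } finally {
    await browser.close();
  }
}

renderPage("https://example.com/").then((html) => console.log(html.length));
```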
Additionally, you will quickly run into issues with your user agent: Cloudflare (and similar services) may start blocking your requests. You can apply to have your crawler recognized as a verified bot, which is a fairly simple process, but you’ll need to get off the ground and build a reputation first.
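Until then, the best you can do is identify yourself clearly and consistently. The usual convention is a stable user-agent string with a bot name, version, and a URL explaining what the crawler does (the names and URL below are placeholders):

```ts
// Conventional crawler identification. All names/URLs here are made up.
const CRAWLER_USER_AGENT = "MySearchBot/0.1 (+https://mysearch.example/bot)";

async function politeFetch(url: string): Promise<Response> {
  return fetch(url, { headers: { "User-Agent": CRAWLER_USER_AGENT } });
}
```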
A database for this kind of project can get extremely demanding. It’s not as simple as storing a copy of each page’s text. Indices must be built from fragments of the text (essentially an inverted index mapping terms to the pages that contain them); this data adds up quickly and can require significant compute to both insert and query.
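Here is a toy in-memory version of the idea, just to show where the volume comes from. A real index also stores positions, term frequencies, rankings, and so on, which multiplies the storage and write load:

```ts
// Toy inverted index: each term maps to the set of document IDs containing it.
type DocId = number;

const index = new Map<string, Set<DocId>>();

function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

function addDocument(id: DocId, text: string): void {
  for (const term of tokenize(text)) {
    let postings = index.get(term);
    if (!postings) {
      postings = new Set();
      index.set(term, postings);
    }
    postings.add(id);
  }
}

function search(term: string): DocId[] {
  return [...(index.get(term.toLowerCase()) ?? [])];
}

addDocument(1, "PHP crawler example");
addDocument(2, "Node.js crawler at scale");
console.log(search("crawler")); // [1, 2]
```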
If your search engine ever does maintain a large number of pages, it will most likely be crawling 24/7. Remember that you also need to periodically re-fetch pages already in your index to pick up changes. This means constant inserts and updates to the database.
The problem with constant “upserts” into the database is that they often block queries. This can get quite serious, quite quickly, and it’s a fundamental reason that search engines don’t update their user-facing databases in real time. In the beginning, one strategy is to maintain two or three databases: one serving reads, one taking writes, and optionally one buffering between them, synced nightly.
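A rough sketch of the write side of that split, assuming your write database exposes some bulk-load path (COPY, LOAD DATA, bulk insert, etc.); the `flush` callback and batch size below are placeholders:

```ts
// Buffer crawl results and flush them in large batches, instead of hitting
// the user-facing database with an upsert for every page fetched.
interface PageRecord {
  url: string;
  fetchedAt: Date;
  text: string;
}

class WriteBuffer {
  private pending: PageRecord[] = [];

  constructor(
    private flush: (batch: PageRecord[]) => Promise<void>,
    private maxSize = 1_000,
  ) {}

  async add(record: PageRecord): Promise<void> {
    this.pending.push(record);
    if (this.pending.length >= this.maxSize) {
      const batch = this.pending;
      this.pending = [];
      await this.flush(batch); // one big write instead of thousands of small ones
    }
  }
}

// The crawler writes into the buffer; a separate nightly job rebuilds or
// swaps the read database from the accumulated batches.
const buffer = new WriteBuffer(async (batch) => {
  console.log(`would bulk-load ${batch.length} pages into the write DB`);
});
```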
If I imagine something like this shipping as a stock script, I wouldn’t expect it to scale very far. Its authors would most likely not account for these hurdles, which you can hit fairly early on. It may be worthwhile consulting with some freelancers, but oversight will be required to make sure they build something that can scale well, depending on how much you intend to grow.
In other words, the platform you need is ultimately going to depend on how far you want to be able to scale. I’m not aware of any scripts on CodeCanyon that fit this at any scale, but regardless of what you find, you should analyze it carefully with your future in mind.