Hey all. This is just an idea. Quite a way from implementation.
An old uni lecturer of mine has given me his “source code comparison” system.
This can compare millions of lines of source code and find copied bits really quickly, even if variables/functions are renamed or moved around.
This was used to compare different Linux/UNIX versions and find borrowed bits of code. Written in C, it’s top-notch and runs super fast.
Anywho, it can be adapted to support HTML/PHP/JS, and I had the idea of comparing themes/plugins here on the market, mostly to help reviewers find duplicate submissions really easily.
It would ignore common things like jQuery, while flagging less common blocks of code that come from a different author account.
Well just an idea. What do people think? Any cool ideas?
It would probably be nice to run this during the review process so that no themes with borrowed code could be uploaded, BUT there is one big flaw… there are millions of free code snippets / scripts out there which authors use, so this would need to stop those from showing up (popular scripts like sliders, JS scripts, etc.).
Yep that would be an interesting one to tackle! Some sort of threshold whereby a certain percentage of a theme has to match another theme, or ignore a snippet if it appears in a large number of themes (e.g. prettyPhoto js) across multiple authors.
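To make the threshold idea a bit more concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: themes are represented as dicts with a set of precomputed "fingerprints" (hashes of small code chunks), and the names `common_fingerprints`, `overlap`, `flag_matches`, `max_authors` and `threshold` are made up here, not part of the actual C tool.

```python
from collections import defaultdict


def common_fingerprints(themes, max_authors=3):
    """Return fingerprints seen under more than max_authors distinct authors."""
    authors_by_fp = defaultdict(set)
    for theme in themes:
        for fp in theme["fingerprints"]:
            authors_by_fp[fp].add(theme["author"])
    return {fp for fp, authors in authors_by_fp.items() if len(authors) > max_authors}


def overlap(candidate, other, ignored):
    """Share of the candidate's uncommon fingerprints that also occur in other."""
    mine = candidate["fingerprints"] - ignored
    if not mine:
        return 0.0
    return len(mine & (other["fingerprints"] - ignored)) / len(mine)


def flag_matches(candidate, themes, threshold=0.6, max_authors=3):
    """List (theme name, overlap) for other authors' themes at or above threshold."""
    ignored = common_fingerprints(themes, max_authors)
    hits = []
    for other in themes:
        if other["author"] == candidate["author"]:
            continue  # an author reusing their own code is not a copy
        score = overlap(candidate, other, ignored)
        if score >= threshold:
            hits.append((other["name"], score))
    return hits


# Toy data: "pp1"/"pp2" stand in for a shared library such as prettyPhoto.
themes = [
    {"name": "Alpha", "author": "alice", "fingerprints": {"pp1", "pp2", "a1", "a2", "a3"}},
    {"name": "Beta", "author": "bob", "fingerprints": {"pp1", "pp2", "b1", "b2"}},
    {"name": "Gamma", "author": "carol", "fingerprints": {"pp1", "pp2", "c1"}},
    {"name": "Delta", "author": "dave", "fingerprints": {"pp1", "pp2", "d1"}},
    {"name": "Copy", "author": "eve", "fingerprints": {"pp1", "pp2", "a1", "a2", "a3", "e1"}},
]

print(flag_matches(themes[-1], themes))
```

In this toy data the shared library chunks appear under five authors, so they are ignored, and only the theme that shares most of its remaining code with the submission gets flagged. The right values for the two cut-offs would have to be tuned on real marketplace data.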
I have to do that a lot. I have ideas in my head and have to bash my head in to stop thinking about working on them… some do come to fruition, but most get scrapped.
There are a lot of obstacles to overcome but it would be helpful for Envato to be able to eliminate the copies at the review stage.
Interesting idea.
Question: how is the code compared? Is it pulled from a database, or are the files scanned directly every time?
Maybe this can be a community project?
Where we set up a site and you just give the program two different theme links to compare?
It tokenizes all the code first into smaller cached files, and then compares those using some fancy logic. No database. Although you would need one to keep track of all these raw files and provide a nice web interface to the C backend.
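For anyone wondering what "tokenize, then compare" could look like, here is a minimal Python sketch. It is not the lecturer's algorithm (the "fancy logic" is not described in this thread); it swaps in a simple stand-in: replace every non-keyword identifier with a placeholder so renames do not matter, hash runs of `k` tokens, and compare the resulting sets with Jaccard similarity. The keyword list, the regex and the function names are all assumptions.

```python
import hashlib
import re

# Hypothetical keyword list; a real tokenizer would be language-specific.
KEYWORDS = {"function", "return", "var", "if", "else", "for", "while", "echo"}
TOKEN_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*|\d+|\S")


def tokenize(source):
    """Split source into tokens, replacing non-keyword identifiers with ID."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if re.match(r"[A-Za-z_$]", tok) and tok not in KEYWORDS:
            tokens.append("ID")
        else:
            tokens.append(tok)
    return tokens


def fingerprints(tokens, k=5):
    """Hash every run of k consecutive tokens; this set could be cached per file."""
    return {
        hashlib.sha1(" ".join(tokens[i:i + k]).encode()).hexdigest()
        for i in range(len(tokens) - k + 1)
    }


def similarity(a, b):
    """Jaccard similarity of two fingerprint sets, from 0.0 to 1.0."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


original = "function add(a, b) { var sum = a + b; return sum; }"
renamed = "function plus(x, y) { var total = x + y; return total; }"
other = "for (var i = 0; i < 10; i++) { echo i; }"

fp_original = fingerprints(tokenize(original))
fp_renamed = fingerprints(tokenize(renamed))
fp_other = fingerprints(tokenize(other))

print(similarity(fp_original, fp_renamed))  # renamed copy: same token stream
print(similarity(fp_original, fp_other))    # unrelated snippet
```

Because the fingerprints are kept as a set, moving a function to a different place in the file changes very little. The per-file sets are the kind of thing that could be cached on disk, with a database only tracking which set belongs to which theme and author.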
Maybe it would be a nice community project to start with.
It’s a very good idea. I am just curious how it’ll behave for themes built on the underscore framework. I’m sure those themes share much the same code base, and I’m afraid the system will flag each theme built on underscore as very similar to the others built on the same framework.
I believe that is irrelevant, because many themes use jQuery libraries or open source scripts, such as the well-known parallax script at https://github.com/IanLunn/jQuery-Parallax. Another example is the Slightly Modded Options Framework, and there are many, many other scripts from GitHub. Big companies use such scripts on their websites too. I do not think it’s something bad.
And some authors even specify this in the credits. Envato’s terms clearly do not permit tutorials in themes, but they do not prohibit developing premium products with free open source scripts. At least, that is how I understand the terms.
Let’s say that we somehow exclude these open source scripts from verification. Your idea could still be easily fooled: just rename classes and prefixes, or even reorganize and rework the code. Two years ago my university had anti-plagiarism software for HTML and C# that compared submissions against an internal library of other years’ student work and against tutorials from search engines. I want to tell you that students found thousands of ways to fool it. At one point there was even a competition to see who could get the lowest plagiarism percentage with plagiarized work.
I think it’s all about originality of concept, design, simplicity of use, user experience and user interaction.
This is my opinion: your idea is good, but it’s impossible to implement correctly.
Hmm, you might be missing the point here. Its main goal is to find the 1-to-1 copies that tend to slip through the cracks (i.e. only some colors and the logo are changed), not to block someone because they use a free framework or library.