Automatic comparison of themes/templates/plugins to find duplicates

Hey all. This is just an idea. Quite a way from implementation.

An old uni lecturer of mine has given me his “source code comparison” system.

This can compare millions of lines of source code and find copied bits really quickly, even if variables/functions are renamed or moved around.

This was used to compare different Linux/UNIX versions and find borrowed bits of code. Written in C, it’s top-notch and runs super fast.

Anywho, it can be adapted to support HTML/PHP/JS, and I had the idea of comparing themes/plugins here on the market, mostly to help reviewers find duplicate submissions really easily.

It would ignore common things like jQuery, while flagging less common blocks of code that come from a different author account.

Well just an idea. What do people think? Any cool ideas?

It would probably be nice to run this during the review process so no themes with borrowed code could be uploaded, BUT there is one big flaw… there are millions of free code snippets / scripts out there which authors use, so this would need to be able to stop these from showing up (popular scripts like sliders / JS scripts etc.).

Yep that would be an interesting one to tackle! Some sort of threshold whereby a certain percentage of a theme has to match another theme, or ignore a snippet if it appears in a large number of themes (e.g. prettyPhoto js) across multiple authors.
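To make that a bit more concrete, here is a rough Python sketch of both filters. Everything in it is made up for illustration: the names `common_fingerprints` and `flag_duplicates`, the `corpus` layout, and the `max_authors` / `threshold` values are all assumptions, and the "fingerprints" are just opaque strings standing in for whatever a real comparison tool would produce.

```python
from collections import defaultdict

def common_fingerprints(corpus, max_authors=3):
    """Fingerprints used by more than max_authors distinct authors.

    corpus maps theme name -> (author, set of fingerprints).
    These are treated as shared libraries (e.g. a popular JS plugin).
    """
    authors_by_fp = defaultdict(set)
    for author, fps in corpus.values():
        for fp in fps:
            authors_by_fp[fp].add(author)
    return {fp for fp, authors in authors_by_fp.items()
            if len(authors) > max_authors}

def flag_duplicates(new_author, new_fps, corpus, max_authors=3, threshold=0.6):
    """Return (theme, overlap) pairs whose overlap meets the threshold.

    Overlap is the share of the submission's distinctive fingerprints
    (common ones removed) that also appear in the other theme.
    Themes by the same author are skipped.
    """
    distinctive = new_fps - common_fingerprints(corpus, max_authors)
    if not distinctive:
        return []
    flagged = []
    for name, (author, fps) in corpus.items():
        if author == new_author:
            continue
        overlap = len(distinctive & fps) / len(distinctive)
        if overlap >= threshold:
            flagged.append((name, overlap))
    return sorted(flagged, key=lambda item: -item[1])
```

For example, with three themes that all bundle the same two library fingerprints and `max_authors=2`, a new submission that also copies most of one theme's own fingerprints would be flagged against that theme only, while one that shares nothing but the library code would not be flagged at all. The actual cut-off values would need tuning against real data.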

eh just getting an idea out of my head

I have to do that a lot. I get ideas in my head and have to bash my head in to stop thinking about working on them… some do come to fruition, but most get scrapped :smile:

There are a lot of obstacles to overcome but it would be helpful for Envato to be able to eliminate the copies at the review stage.

Interesting idea.
Question: how is the code compared? Is it pulled from a database, or are the files scanned directly every time?
Maybe this can be a community project?
Where we set up a site and you just give the program two different theme links to compare?

It tokenizes all the code first into smaller cached files, and then compares those using some fancy logic. No database, although you would need one to keep track of all these raw files and provide a nice web interface to the C backend.
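For anyone wondering what "tokenize, then compare" can look like, here is a minimal Python sketch of one common approach (k-gram fingerprints with Jaccard similarity). To be clear, this is not the actual tool and I am not claiming it works this way; the keyword list, the regex, and the function names are all assumptions made up for the example.

```python
import re

# Hypothetical, deliberately tiny keyword list; a real tokenizer would
# use the full keyword set of each language.
KEYWORDS = {"function", "return", "if", "else", "for", "while", "var", "echo"}

# Names (including PHP-style $vars), integers, or any single other symbol.
TOKEN_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*|\d+|\S")

def tokenize(source):
    """Split source into tokens, replacing names with ID and numbers with NUM."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if tok in KEYWORDS:
            tokens.append(tok)
        elif re.match(r"[A-Za-z_$]", tok):
            tokens.append("ID")   # renamed variables/functions look identical
        elif tok.isdigit():
            tokens.append("NUM")
        else:
            tokens.append(tok)
    return tokens

def fingerprints(tokens, k=5):
    """Set of every run of k consecutive tokens (k-grams)."""
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}

def similarity(a, b, k=5):
    """Jaccard similarity of the two sources' k-gram sets, from 0.0 to 1.0."""
    fa = fingerprints(tokenize(a), k)
    fb = fingerprints(tokenize(b), k)
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)
```

Because every name collapses to `ID`, `function add(a, b) { return a + b; }` and `function sum(x, y) { return x + y; }` produce the same token stream. And because the k-grams are kept as a set, moving whole functions around in a file changes little. The cached files would then just be the stored fingerprint sets.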

Maybe it would be a nice community project to start with :slight_smile:

It’s a very good idea. I am just curious how it’ll behave for themes built on the underscore framework. I’m sure those themes share very much the same code base, and I’m afraid the system will flag each theme built on underscore as very similar to the others built on the same framework.

Yep, PHP is open source code (mostly), so it’s hard to tell if someone is a thief :smile:

I believe that is irrelevant, because many themes use jQuery libraries or open source scripts, such as the famous parallax script https://github.com/IanLunn/jQuery-Parallax. Another example is the Slightly Modded Options Framework, and there are many, many other scripts from GitHub. Big companies use such scripts on their websites too. I do not think it’s something bad.

Some authors even specify this in the credits. Envato’s terms clearly state that tutorials are not permitted in themes, but they do not prohibit developing premium products with free open source scripts. That’s how I understand the terms.

Let’s say that we somehow exclude these open source scripts from the check. Your idea could still be easily fooled: just rename classes and prefixes, or even reorganize and rework the code. Two years ago my university had anti-plagiarism software for HTML and C# that compared submissions against an internal library of previous years’ student work and against tutorials from search engines. I want to tell you that students found thousands of ways to fool it. :smile: :sunglasses: At one point there was even a competition to see who could get the lowest plagiarism percentage with plagiarized work.

I think it’s all about originality of concept, design, simplicity of use, user experience and user interaction.

This is my opinion: your idea is good, but it’s impossible to implement correctly.

Looks like you’re complicating open source doctrines :frowning: though your idea is valid :slight_smile: Better to plan a site like envatoleaks (wikileaks) for TF and CC :stuck_out_tongue:

We need to draw a clear line between inspiration and copying

image

This has been said before by Picasso :smile:

It’s a good idea, as long as the script doesn’t say the code is copied from wordpress.org or getbootstrap.com :stuck_out_tongue:

If I’m not mistaken, the JavaScript code that is used in themes on Envato is not GPL.

@dSKY JavaScript is GPL / MIT.
Like jQuery: https://jquery.org/license/

You are free to use any jQuery Foundation project in any other project (even commercial projects) as long as the copyright header is left intact.

Hmm, might be missing the point here. Its main goal is to find the 1 to 1 copies that tend to slip through the cracks (i.e. only some colors and the logo are changed), not to block someone because they use a free framework or library.
