Wednesday, September 9, 2015

Why copycats don't work

Over the last 12 months here in Miami we've seen a lot of tech start-up action. New companies pop up every week. New entrepreneurial programs and organizations are established. New incubators and accelerators launch.

However, a lot of the ideas we are presented with are basically copycats of already successful companies. Things like "the Uber of XYZ service," "a better, faster and more niche eBay," "a more niche social media network," "a more secure and we-don't-sell-your-e-mail-addresses-and-phone-numbers social media network," "a better Yelp combined with an MLM system" or "a safer Tinder" cross our desks every month.

While we admire the initiative, the truth of the matter is that these kinds of businesses will be difficult to grow. Here are some of the reasons:

1. They face fierce competition from bigger and better companies

It is extremely difficult to divert current users of Facebook from Facebook just by telling them that you are not selling their data. People like Facebook and they are used to it. Despite security concerns, some defects and the pessimistic projections of naysayers, Facebook's audience keeps growing, and it will keep growing for a while.

Same thing with eBay and Amazon. It is a tough job to make people use an alternate platform just because it's cheaper or more niche. To convince them you need to throw in a ton of freebies and much more business value and, most important of all, show credibly that you can scale, that you won't crash at a certain threshold and that you will do a better customer service job than they do over the long term. If you pull that off, you will become an acquisition target anyway, which is not bad at all; if you can sell, by all means you should.

And the idea of a Yelp type of service for fewer categories, operated by a new and unknown company, that *combines* the subscription model with an MLM type of system is straight-up stupid.

2. They usually do not address the real pain or the actual need in the marketplace

See, the real pain of Facebook's users is not that Facebook sells their data. For some people that is important, but for most of them it is not, because they don't look at their personal data as a super valuable asset, which is why they keep signing up for Facebook despite all the privacy lawsuits.

Facebook's users would potentially benefit from more fun features, more engagement, an easier way to connect with more target audiences, an easier way to find lost friends (like me right now: I am looking for an old friend I have not talked to in 17 years and I can't find him on Facebook; I wish Facebook had a way to find him for me by interfacing with other online services where he may be a subscriber), more personalized content etc. That's where a lot of added value would be for this type of business.

It is very important to identify the real issue / the real opportunity in the marketplace. Founders of copycats usually fail to do that.

3. They are generally not well executed as businesses

Execution is everything in business. Or almost everything. It's 99% of your business success; I'd say the idea and the availability of funds are under 1% in importance. How you run your business, how you sell, what you sell, to whom you sell, how well done your product is, how many word-of-mouth referrals you get and how much people like you and your company and perceive you as somebody who delivers a lot of value into their lives: all of those are key elements.

The reality is that most founders do not execute well. They cut corners on product quality, they lack sales skills, they do not have the right relationships, re-sellers or distribution channels, and they are unlikely to build critical mass.

4. They are hard to fund

Because of some of these reasons (and potentially others) copycats are very hard to fund. Investors are very knowledgeable (many times they are entrepreneurs themselves) and they spot a copycat very quickly. If you have a copycat and really believe in it, I'd say fund it 100% with your own money, because it will be a hell of a job to raise it from investors.

5. They do not deliver true innovation

In the tech world innovation is key. Come up with something new and useful and you have a pretty strong play. But a copycat does not have the "meat" of the true innovation that was initially in place, hence it will never command the respect and admiration the original innovation did.

And, of course, we are always presented with the counter-example of ... Google. Google was founded at a time when five other major search engines existed and was still very successful. However, people who say that tend to overlook a few things: back in 1998-1999 Google was still a very early player in a relatively new and unexplored market (search engines); the founders were data scientists who did a stellar job categorizing "the world's information," a much better job than everybody else at the time and since; and the company introduced, and implemented very well, a multitude of innovative concepts, such as PageRank.

Back to Google: the company was founded in Silicon Valley at a very unique and favorable moment in tech history, a moment we are unlikely to see again in our lifetime. Business-wise, Google got a little lucky too (as lucky as you can get with a lot of hard work and a good strategy) when its paid ads product, Google AdWords, took off big time despite concerns about spam, security and invasion of privacy. Almost two decades after its inception, Google (which employs some of the best engineers in the world) is still trying hard to innovate and to bring new products and ideas to the marketplace, most of which still fail. (I don't know how many people realize how many products and services from Google's current portfolio are actually profitable, while everybody knows about the giant success of AdWords.)

So I would advise caution when comparing a lot of these new start-ups with Google.

I would stay innovative and humble, and continue to try hard. Make less buzz and do more work.

Have a great day ahead!

Adrian Corbuleanu
Miami Beach, FL

Monday, July 6, 2015

On big data and data science

These days there is a lot of talk in Miami's start-up and mid-size organization communities about big data. Usually viewed as something "new" and "cool" that you can do with your data, it prompts people to ask generic questions such as "Do you guys do big data?" or even to state outright, "Oh, I see, you guys do big data!"

I would like to clarify right here: we currently do not do big data. While our average project collects and processes significant amounts of data, and while we do implement BI and analytics, it would be a stretch to pretend that we are currently doing big data or data science.


Fig. 1 Big Data is a lot of Data!

I am writing this article to educate the public on what big data actually is, how it gets collected and processed, what kinds of resources and technologies you would need in order to implement big data, and what kinds of projects, initiatives and budgets big data is for. I will also touch on aspects of data science, as data science often goes hand in hand with big data.

This article does not aim to be comprehensive; however, if you are really interested in big data, it will give you a starting point.

According to Wikipedia, "Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate. Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, and information privacy."

Big data applications collect and process exceptionally large amounts of data, i.e. hundreds of terabytes, petabytes or even exabytes of data of great variety (not just alphanumeric data or binary pictures). These data are generated at great velocity and come with no guaranteed quality. Data sets are generally not easily linkable or immediately related to each other.

Examples of applications that involve big data are the ones under way in various government agencies (including those under the Obama Administration's Big Data Initiative), applications implemented by large consumer-product or social media companies (trying to understand consumer behavior in order to increase sales), by car manufacturers (who collect a lot of technical data on their deployed fleets of vehicles in order to predict a component failure and replace the part right on time), by national insurance companies, by telcos and by professional sports teams and franchises. Companies like eBay, Google, Facebook, IBM and SAP have the budgets and practical reasons to implement big data. GM does big data. And of course the Government does big data.

An example of an unusual or unstructured piece of data would be a "like" triggered by a certain post on social media, or a shortened URL. How do you collect, store and analyze that in conjunction with other likes by the same (or a different) person on the same (or a different) type of post? Remember, posts can be of any kind: pictures, videos, texts, binaries, URLs etc.
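To make that concrete, here is a minimal sketch in Python of what a single "like" event might look like as a schemaless document, the kind of record a store like MongoDB would hold. Every field name and value below is hypothetical:

```python
# One "like" event as a schemaless document; all fields are made up,
# just to show how irregular and loosely related such records can be.
like_event = {
    "user_id": "u-182734",
    "post_id": "p-99812",
    "post_type": "video",              # could be picture, text, url, binary...
    "shared_url": "http://bit.ly/...", # a shortened URL, if one was attached
    "timestamp": "2015-07-06T14:32:11Z",
}
```

Two such events rarely share the same shape, which is exactly why joining and correlating them is hard.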

The objective of the systems handling these data is generally to collect and process them, to correlate them, to find patterns in these large amounts of unstructured data, and to design predictive algorithms or software that, with a certain degree of accuracy, can predict the behavior of certain systems or processes.


For example, at CERN (the place where the web was born, which is actually a nuclear physics lab hosting the world's most powerful particle accelerator) scientists run experiments that try to explain some of the secrets of our universe. Their big data playground looks like this: about 60,000 processors distributed across thousands of computers in more than 100 data centers across the globe, collecting and processing some 30 petabytes of data.

Fig. 2 Supercomputer
Whether it's a supercomputer like Pleiades or a massively parallel, distributed, networked system, you will need very serious infrastructure to handle big data. Your average shared (or even private) cloud host will most likely not be able to handle big data projects: such hosts are generally designed for general-purpose business applications and lack the processing horsepower, memory and storage capacity these kinds of projects demand.

Processing big data and coming up with relevant, useful results also involves specialized technologies. Their names sound like this: Hadoop (used at Yahoo), MapReduce, Hive, Pig and MongoDB. Predictive analytics are written in packages like MATLAB, Mathematica, RapidMiner or Pervasive. Machine learning mechanisms are written in things like R, dlib or ELKI. Your average MySQL database with some back-end PHP code will not be able to successfully handle big data. While you can implement some traditional scripting on the data collection and data communication side of things, you will need specialized tools and complicated formulas to dig into the data you collected.
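To give a flavor of the MapReduce idea that Hadoop implements, here is a toy, single-machine sketch in Python. The event data is made up; on a real cluster the map and reduce phases would run across thousands of nodes:

```python
from collections import defaultdict

# Made-up raw events; in production these would stream in from logs
# scattered across many machines.
events = [("post1", "like"), ("post2", "like"), ("post1", "like")]

# Map phase: emit one (key, value) pair per relevant event.
mapped = [(post_id, 1) for post_id, action in events if action == "like"]

# Shuffle phase: group values by key (Hadoop does this across the cluster).
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce phase: aggregate each group independently (hence in parallel).
likes_per_post = {key: sum(values) for key, values in groups.items()}
print(likes_per_post)  # {'post1': 2, 'post2': 1}
```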

On the business side, last year we had an interesting discussion with the CTO of the Dolphins at the Dolphins' stadium. As they are currently running a big data initiative to collect and dig into data from fans, with the goals of improving the fan experience at the stadium and increasing ticket sales, they were struggling with a very simple notion: "The big question is ... what question to ask the system?" And that is obviously the crux: what question(s) to ask, and how to design the system so that you actually see a benefit in increased sales. So before collecting large amounts of data and designing, think about your questions and your objectives. Or maybe the question will emerge; it will jump at you as you start digging into those quadrillions of bits and pieces ...

Now, a few words about data science, as it relates to big data and is many times used in conjunction with it. Here we go, back to a simple definition; I quote from Wikipedia: "Data Science is the extraction of knowledge from large volumes of data that are structured or unstructured which is a continuation of the field data mining and predictive analytics [...]."

So what are some of the things your programmers have to be competent in to say they are data scientists? Here we go, just a few key concepts.

To do data science you have to know statistics. Concepts to start with are as follows:

- parameter estimation
- confidence intervals
- p-values
I will not bother you with long definitions but, just as an example: "confidence interval (CI) is a type of interval estimate of a population parameter. It is an observed interval (i.e., it is calculated from the observations), in principle different from sample to sample, that frequently includes the value of an unobservable parameter of interest if the experiment is repeated. How frequently the observed interval contains the parameter is determined by the confidence level or confidence coefficient." [quote from Wikipedia]

Fig. 3 Graphical representation of p-value. Authors Repapetilto @ Wikipedia and Chen-Pan Liao @ Wikipedia
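As a small illustration of both concepts, here is a sketch using Python's scipy; the sample is randomly generated, standing in for real observations:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
sample = rng.normal(loc=5.0, scale=2.0, size=30)  # stand-in for real measurements

# 95% confidence interval for the population mean, via the t distribution
mean = sample.mean()
sem = stats.sem(sample)  # standard error of the mean
ci_low, ci_high = stats.t.interval(0.95, df=sample.size - 1, loc=mean, scale=sem)

# One-sample t-test: p-value for the hypothesis that the true mean equals 4.0
t_stat, p_value = stats.ttest_1samp(sample, 4.0)
print(ci_low, ci_high, p_value)
```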

To do data science you have to be very familiar with cost functions:

- log-loss
- DCG
- NDCG
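Log-loss gets a detailed explanation below; DCG and NDCG, which score entire ranked lists rather than single predictions, are compact enough to sketch right away in Python (the relevance judgments are made up):

```python
import numpy as np

def dcg(relevances):
    """Discounted cumulative gain: relevance at position i is divided by log2(i + 1)."""
    rel = np.asarray(relevances, dtype=float)
    discounts = np.log2(np.arange(2, rel.size + 2))  # positions 1..n -> log2(i + 1)
    return float(np.sum(rel / discounts))

def ndcg(relevances):
    """DCG normalized by the ideal (sorted) ranking, so a perfect ordering scores 1.0."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

print(ndcg([3, 2, 3, 0, 1]))  # hypothetical relevance judgments for 5 ranked results
```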

Here is a fairly intuitive (even if not that straightforward) explanation of log-loss, or logarithmic loss, by data scientist Alice Zheng: "Log-loss measures the accuracy of a classifier. It is used when the model outputs a probability for each class, rather than just the most likely class. Log-loss is a “soft” measurement of accuracy that incorporates the idea of probabilistic confidence. It is intimately tied to information theory: log-loss is the cross entropy between the distribution of the true labels and the predictions. Intuitively speaking, entropy measures the unpredictability of something. Cross entropy incorporates the entropy of the true distribution plus the extra unpredictability when one assumes a different distribution than the true distribution. So log-loss is an information-theoretic measure to gauge the “extra noise” that comes from using a predictor as opposed to the true labels. By minimizing the cross entropy, one maximizes the accuracy of the classifier."


And another explanation, by software engineer Artem Onuchin:


"Log-loss can be useful when your goal is not only say if an object belongs to class A or class B, but provide its probability (say object belong to class A with probability 30%). Good example of case where log-loss can be useful is predicting CTR or click probability in on-line advertising: Google uses log loss as CTR prediction metric." 

Let's say you are collecting data on millions of posted pictures and you want to write an image classifier that can tell the difference between a banana and ... a boat, for example. We know that pictures are nothing but matrices of pixels with different colors and levels of illumination, but we still need to implement the formulas that classify these objects.
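A real classifier for that task would involve serious feature engineering, but the skeleton looks something like this scikit-learn sketch; the pixel data here is random noise standing in for decoded images:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Pretend setup: 200 images already decoded into 64x64 grayscale pixel matrices,
# flattened one image per row; label 0 means "banana", 1 means "boat".
rng = np.random.default_rng(42)
X = rng.random((200, 64 * 64))   # random stand-in for real pixel data
y = rng.integers(0, 2, size=200)

clf = LogisticRegression(max_iter=1000).fit(X, y)
probs = clf.predict_proba(X[:3])  # class probabilities, exactly what log-loss scores
```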

To do data science you also have to be competent in machine learning and be able to understand concepts like:

- classification
- regression
- ranking
- overfitting
- convex optimization
- trees

"Overfitting occurs when a statistical model describes random error or noise instead of the underlying relationship. Overfitting generally occurs when a model is excessively complex, such as having too many parameters relative to the number of observations. A model that has been over fit will generally have poor predictive performance, as it can exaggerate minor fluctuations in the data." [quote from Wikipedia]


To do data science you have to be conversant in some of the following technologies and tools:


- R
- Python
- Mathematica
- Weka
- Kaggle

R is a specialized programming language and environment used in statistics and data mining. It is a fairly straightforward, open source, command-line language which provides powerful statistical functions that do not exist in other programming languages, but it is a "different kind of language" than the ones your average programmer is used to.


Fig. 4 Data Types in R by Assoc. Prof. Roger Peng from Johns Hopkins School of Public Health

To do data science you also have to have a good understanding of algorithms and their complexity, things like:

- eigenvectors
- singular values
- PCA
- LDA
- Gibbs sampling
- bottlenecks

The complexity of algorithms is crucial when crunching tons of data and trying to find the ... un-findable. No matter how many hardware resources you have at hand (and hardware resources are never infinite nor cheap at this level), the problems you have to solve will often blow up exponentially. And it's never good to have exponential algorithms running.
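As a taste of the linear algebra side, here is a short numpy sketch that computes PCA via the singular value decomposition (the eigenvectors of the covariance matrix fall out of the SVD); the data is random, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))        # hypothetical: 1000 samples, 50 features
Xc = X - X.mean(axis=0)                # center each feature

# Rows of Vt are the principal axes (eigenvectors of the covariance matrix);
# the singular values in S encode how much variance each axis carries.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
projected = Xc @ Vt[:2].T              # reduce 50 dimensions down to 2
explained = (S ** 2) / (S ** 2).sum()  # fraction of variance per component
```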



And, somehow, with all this science, you also have to have a certain feel for the expected behavior of the user, reasonable ranges, top-level engagement etc. Yes, data science is a science and an art at the same time.

So, please, next time you think big data, think in terms of exabytes of data collected and processed, sophisticated machine learning mechanisms, thousands of servers and a lot of storage, large corporations or the Government. And think of some of the most brilliant minds on Planet Earth writing predictive analysis code, and millions of dollars in research and development budgets. [And, yes, maybe us one day too, but not now :)]

For everything else that's "new" and "cool" please check out our website http://wittywebnow.com.

Make it a great day!

Adrian Corbuleanu
Miami, FL
http://wittywebnow.com

Note: To document this blog post I used online resources from the following sites. I thank them for making this information available.

1. http://wikipedia.com
2. http://quora.com
3. http://linkedin.com

Monday, June 22, 2015

New and old requirements on shopping carts and e-commerce solutions

As we have recently released a couple of shopping carts and e-commerce solutions, we have had a chance to validate some of the newer and older requirements when designing these kinds of solutions.

First off, we will say that, while we occasionally design mid-size marketplaces, most of our shopping carts address the needs of small businesses; hence these are shopping carts with fewer than 50 products. We implement shopping carts in a variety of technologies, ranging from Ruby on Rails gems to WooCommerce plug-ins for WordPress.

1. PCI compliance

The need for PCI compliance is not new. The credit card processing industry has been implementing software security, specific procedures and standards for years. The bigger your company is and the higher the number of transactions you perform, the stricter the requirements are.

What is new is that we get requests from start-ups and other entities (such as educational institutions) to provide PCI compliant solutions. That is usually a challenge, because the budgets of these kinds of entities generally do not allow for detailed audits and specific procedures. Making a cart PCI compliant does not only involve packing it with an SSL certificate or passing the buck to a PCI compliant gateway (even if these things are steps toward achieving PCI compliance). There are also specific training, internal procedures and manual procedures involved.

2. Products dashboards

Even smaller companies nowadays need the flexibility to add, change and delete products in their stores, edit prices and descriptions, update pictures etc. They also have to be able to do that using non-technical staff, so they ask us to create product dashboards and management consoles.

3. Custom carts

From pictures to pricing, from layouts to integration with payment processors, all of our carts are quite custom designed.

Our clients ask for that rather than going with more generic solutions (often mislabeled as more secure or higher performance) such as Magento, Volusion or BigCommerce. We like it that way too. There is no need to overload a small business with bells and whistles when they don't need them and actually can't afford them. Plus, the effort spent on configuring a pre-packaged e-commerce solution is oftentimes bigger than the effort of integrating a well done open source cart.

4. SEO friendliness

In the era of $25-$30 Google PPC clicks, more and more companies are coming back to the roots of SEO and organic search. They feel that investing long term in the web-based real estate their cart sits on is more important than driving immediate results by paying, per click, the nightly rates of beachside hotels.

Hence most of the requirements we have had for shopping carts lately involved designing the cart with SEO friendliness in mind. And that's everything: from how we name the pages, to how we place the copy on the page, to how we name the pictures of the products. Meta tags too, even if they are less important lately than they used to be.
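As an example of the naming part, here is a small Python sketch of the kind of helper we might use to turn a product name into an SEO-friendly URL slug (the helper name and the sample product are hypothetical):

```python
import re
import unicodedata

def slugify(name: str) -> str:
    """Turn a product name into a lowercase, hyphenated, ASCII-only URL slug."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")

# e.g. "Girls' Navy Polo Shirt (Size 8)" -> "girls-navy-polo-shirt-size-8"
print(slugify("Girls' Navy Polo Shirt (Size 8)"))
```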

5. Pictures quality

Picture quality is more important than ever. Unless you have a professional, superb quality, good size picture of your product, do not even think of putting it up. In an era of minimalist websites and responsive design, it is a challenge to produce beautiful pictures that also load fast and scale well on all kinds of devices, including laptops, desktops and mobile devices.
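One workable approach (a sketch, assuming the Pillow imaging library, JPEG sources and hypothetical breakpoint widths) is to pre-generate several sizes of each product photo and let the page pick the right one via srcset:

```python
from PIL import Image  # Pillow

WIDTHS = [320, 640, 1280]  # hypothetical responsive breakpoints

def make_variants(path: str) -> None:
    """Save one resized JPEG per breakpoint, e.g. photo-640w.jpg."""
    original = Image.open(path).convert("RGB")
    stem = path.rsplit(".", 1)[0]
    for width in WIDTHS:
        height = round(original.height * width / original.width)
        variant = original.resize((width, height))
        variant.save(f"{stem}-{width}w.jpg", quality=82, optimize=True)

make_variants("polo-shirt.jpg")  # hypothetical file
```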

6. New industries / verticals

Interestingly enough, we have witnessed some new industries getting into shopping carts and e-commerce solutions. One of them is the education sector, with quite a few schools implementing carts for parents to order uniforms or even contribute to the school budget with direct donations.

As these kinds of solutions were traditionally adopted more by retailers, wholesalers or companies implementing marketplaces for their own variety of products, this is something new that we salute and appreciate.

Make it a great day!

Adrian Corbuleanu
Miami Beach, FL
http://wittywebnow.com