Omgili blog

September 8, 2009

Omgili Opinions Firefox Extension

Filed under: News, Omgili, opinions — ran @ 3:56 pm

I am happy to let you know about a new Firefox extension we released. Omgili Opinions is a very unobtrusive addition to your Firefox browser. It looks for discussions and opinions about the page you are currently browsing, and if it finds any, it shows you a small button near the address bar. To view those discussions, simply click on that button.

If you want to find deeper conversations and insights about a product you found on Amazon, or you want to see what people think about a certain movie on IMDb or a video clip on YouTube, and the comments there aren’t very insightful – let Omgili find the most interesting discussions and communities about any page you are browsing.

So check it out: http://opinions.omgili.com/

Here are some screenshots:

As usual, we welcome your feedback.

Thank you,

Ran Geva

Omgili CEO

November 2, 2008

So “Who will be the next president?”

Filed under: Buzz, Elections, US Presidential Elections, opinions — ran @ 5:30 pm

For the past year and a half we have had this question on the front page of Omgili. We followed the hopefuls before anyone knew it was going to be McCain against Obama, with Palin and Biden as running mates. In three days we will all know the answer, but in the meantime we can offer our best educated guesses.
Every day new polls predict a victory for Obama over McCain. The problem is that people might not always tell the whole truth about their decision (we’re all familiar with the Bradley effect), so the results are certainly not definitive.
I was curious whether I could find more clues about the outcome of the elections. It is true that people might be too “Politically Correct” to tell the truth, but on online forums, anonymity might uncloak more accurate results.
I ran two sets of queries on Omgili Buzz Graphs – one for Obama vs McCain, to see the detailed and overall buzz about the two candidates in the past month, and a second for discussions mentioning only one candidate without the other.

Let’s take a look at the first query graphs:

As you can see, the number of discussions about Barack Obama is indeed higher than the number about McCain. We can also see that the number of discussions is rising over time as we approach Election Day. I must say that even though the number of discussions is higher for Obama, the margin is quite small (20%-30%). I expected a much bigger delta.

The really interesting figure is in the second query. It is only natural that when you speak of one candidate you will also mention the other. I wanted to see how many discussions are taking place around a single candidate, Obama or McCain. In other words, who is more “interesting”?

Now these results have a much more distinct trend. First of all, there is MUCH more discussion about Barack Obama than about McCain. Furthermore, as we approach Election Day, the number of discussions about Obama is increasing rapidly whereas with McCain the number is decreasing (up to 400%).
Brand monitoring and Web 2.0 based research are very hot topics these days. I am not saying this is qualified research, but it will surely be interesting to see whether there is a correlation between the above graphs and the outcome of the elections.

July 15, 2008

OpinionatR.com – Everybody’s Got an Opinion

Filed under: News, Omgili, opinions — ran @ 10:51 am

I am happy to announce the release of OpinionatR.com. OpinionatR is a search engine focused around… well… opinions. Thinking about buying a camera? Why not get opinions from thousands of real people who bought one or are thinking about buying one?
What do people think about the new iPhone, about the Stock Market or the situation in North Korea? It is easy to find opinions. Just use the friendly Opinionatr.com URL:

http://www.opinionatr.com/opinions/<Your Topic>
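If you want to build such a URL from a script, a minimal sketch could look like the one below. The function name `opinions_url` is made up for illustration, and percent-encoding the topic is an assumption on my part: the post does not say how the site expects spaces or special characters to be written.

```python
from urllib.parse import quote

# URL prefix taken from the pattern shown in the post.
BASE_URL = "http://www.opinionatr.com/opinions/"


def opinions_url(topic: str) -> str:
    """Build the friendly URL for a topic.

    Assumption: the topic is percent-encoded (a space becomes %20).
    The site's real handling of spaces is not documented in the post.
    """
    return BASE_URL + quote(topic.strip(), safe="")


if __name__ == "__main__":
    for topic in ("iPhone", "Stock Market", "North Korea"):
        print(opinions_url(topic))
```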

As usual, your feedback is always appreciated.

Enjoy,

Ran Geva

May 15, 2008

The Google – Killer (for real)

Filed under: opinions — ran @ 2:28 pm

Every now and then, a new search engine (usually in stealth mode) promises to be the next Google. Heck, even better than Google! Much better!!! They don’t want to be called “Google Killers” (high expectations?) but that’s exactly the buzz they are trying to create.

Usually the buzzing is around a new semantic technology or sometimes other words such as “Human Search” or “Collective Intelligence” and other terms a CEO blurbs to a VC.

The common premise is simple: the Internet is big. Very, very big. Ever expanding. Too much information. From time to time, Google doesn’t bring back relevant results. We (the buzz search company) will offer better relevancy because we have an ex-googler on our team – hooray.

Now, don’t get me wrong, I wish good luck to us all (I own a search engine – although no ex-googlers on our team). We all try to bring something to the table, slice, dice and hopefully create something helpful and useful for our users. The cynical tone in the above lines just conveys my feelings about the unnecessary buzz, and the sheer amount of it, around these “new disruptive technologies” that in most cases turn out to be nothing at all.

What I want to see

The next “Google-Killer” shouldn’t be a search engine that brings back more relevant results. The next search engine shouldn’t be a Search Engine at all. (…pause for surprise and suspense).

The next “Google-Killer/Slayer/Buyer” should use the data it has covered to create a knowledge base from which it can deduce new information, and answer users’ queries not with pointers to articles, but with new information it created on top of the knowledge it accumulated.

An answer to the question “Is Elvis Presley an alien?” should not return references to elvisinfonet.com and toppun.com articles. I don’t want to read that – I have A.D.D. Give me the answer! Probably something like:

“Probably not. Elvis Presley is a performer, a singer and an actor. He was born on January 8, 1935 and died on August 16, 1977. It has been popularly suggested that he was abducted by aliens, or that he is actually an alien who faked his own death so he could return to his home planet.

In Mostly Harmless by Douglas Adams, Elvis is discovered by Ford Prefect and Arthur Dent working as a bar singer on an alien planet, and owning a large pink spaceship.”

Of course, all the sources from which the system deduced the information should be referenced as hyperlinks inside the text.

Cool! This is buzzworthy. When you think about it, it’s quite simple… The system (let’s call it G-Dooms-Day) should recognize the keywords “Elvis Presley” as a name (using the semantic technology mentioned above), run an internal search on its database, extract all the knowledge about it and structure the relationships between the entities it finds, do the same for the keyword “Alien”, compare the two and send the results your way – simple!
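Just to make the steps above a little more concrete, here is a toy sketch. Everything in it is hypothetical: the `FACTS` table is hand-written and stands in for knowledge a real system would have to extract from the documents it covered, entity recognition is a plain substring match, and the "comparison" is a simple set check. It only shows the shape of the pipeline (recognize, look up, compare, answer), not a way to actually build it.

```python
# Hypothetical, hand-written facts. A real system would have to derive
# these from crawled documents and keep the sources for each fact.
FACTS = {
    "elvis presley": {"performer", "singer", "actor"},
    "alien": {"extraterrestrial"},
}


def recognize_entities(question: str) -> list[str]:
    """Return known entity names found in the question, in order of appearance.

    Assumption: entity recognition is reduced to a substring match.
    """
    text = question.lower()
    found = [name for name in FACTS if name in text]
    return sorted(found, key=text.find)


def is_a(subject: str, category: str) -> bool:
    """Compare the two entities: does the subject have every trait of the category?"""
    return FACTS[category] <= FACTS[subject]


def answer(question: str) -> str:
    """Answer an "Is X a Y?" question from the toy knowledge base."""
    entities = recognize_entities(question)
    if len(entities) != 2:
        return "I don't know."
    subject, category = entities
    return "Probably." if is_a(subject, category) else "Probably not."


if __name__ == "__main__":
    print(answer("Is Elvis Presley an alien?"))
```

The hard parts, of course, are exactly the ones this sketch skips: building the fact table automatically and understanding the question.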

Obviously it is not simple; actually, I think it borders on the impossible. But hey, you wanted a disruptive technology, and here it is. And I will tell you another thing: the day this technology becomes a reality, Sergey & Larry will cash in their Gstocks and run for the hills (to build a huge mansion with the money they just got).

This is my vision for the Google-Killer, and I hope I won’t get disappointed when those stealth search companies uncloak to reveal their uber-cool search engine.

 

August 22, 2007

The (Near) Future of Search Engines

Filed under: opinions — ran @ 10:07 am
Charles Knight referred me to Kaila Colbin’s blog post about what the future (2010) search engine should look like. I was honored to be asked for my opinion, so I sat down and wrote my thoughts about what the future search engine should be. Granted, this is not exactly what I was asked to do, but I ended up with a four-page document, so I decided to post it on our blog. I hope you find it interesting.

The (Near) Future of Search Engines

We have a saying that “Prophecy was given to the fools” (loosely translated from Hebrew) that basically means that only fools try to foresee the future. Since I don’t consider myself a fool (at least not at this point in my life) I will start by saying that I have no idea how search engines will work but I have some ideas about how they should.
After the previous unnecessary paragraph I will dive right into the subject at hand. Basically all proprietary search engines are built out of four parts: data collection (crawlers), data storage (indexing), data extraction (searching) and data presentation.

Data Collection – the Crawler:

The crawler is the part that goes over the information and collects it. In my view, this is one of the most important components in a search engine. The crawler has control over:

  1. The amount of data being collected.
  2. The quality of the data.

Data Quantity:

The coverage of a search engine is a key element in its overall quality. We, as users, would like our search engine to be aware of every piece of information out there. Currently, Google has a massive number of crawlers indexing the web, but even they admit they stand no chance of covering what is really out there. I don’t really believe we can calculate the percentage of what is really covered: we know the volume of what we crawled, but we don’t know what we missed, so we cannot solve this simple formula.
The future of the crawlers should be in diverting resources from hundreds of centralized crawlers to the millions of users surfing the web. Let the users’ computers act as crawlers. This will save bandwidth and be much more efficient in finding the “dark parts” of the web. I actually think Google is already using this method when we use some of its free software. They of course ask our permission to do so, as they should. The problem with this method lies with the indexer, and I will get to that later.

Data Quality:

The information is out there, there is a whole ocean of it – no worries there. Let’s say we were able to collect it and successfully store it. If we are unable to analyze it correctly, then we will have problems retrieving it in a relevant way. Not all web pages were created equal; different pages have different roles. Inside the pages, different segments of text serve different purposes.
It is very important to rank web pages by importance according to their contextual meaning (i.e., PageRank), but it is just as important to distinguish between the different parts of the text inside these pages and to analyze their meaning.
The future search engine should include crawlers that “understand” the page structure and detect the role of each part.
I will supply an example:
A simple Wikipedia page has titles and paragraphs, and some words are marked with underline or bold, so we can calculate the importance of the document's parts from its structure. We can assume that titles are important and that bold and underlined words are significant. The frequency of a word in a paragraph, relative to the diversity of words in the text, serves another important role. The above considerations (and more) could be applied to many web pages that should be categorized as articles.
Now what about web pages from discussion-based sites? The structure and PageRank importance of these pages are completely different. The crawler should analyze the page as a discussion and not as an article, since a discussion has a title, a topic, a date and replies. The considerations taken into account should be applied to the separate parts of the discussion. Text inside the 22nd reply does not have the same importance as text in the 1st reply or the topic. Many times, the discussion page includes text that has nothing to do with the discussion at hand. The crawler should be able to distinguish between these document segments and textual attributes. The better it does the job, the more relevant the results we will get.
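The example above can be sketched roughly as follows. This is only an illustration under loud assumptions: the role names, the weights in `ROLE_WEIGHTS` and the 1/position decay for replies are all invented here, and the page is assumed to be already split into labeled segments (which is the hard part the crawler would have to solve).

```python
from collections import Counter

# Assumed weights: titles count most, bold/underlined words next,
# plain paragraph text least. Unknown roles (e.g. ads, navigation) count zero.
ROLE_WEIGHTS = {"title": 3.0, "bold": 2.0, "paragraph": 1.0}


def score_article(segments):
    """Score the words of an article given (role, text) segments."""
    scores = Counter()
    for role, text in segments:
        weight = ROLE_WEIGHTS.get(role, 0.0)
        for word in text.lower().split():
            scores[word] += weight
    return scores


def score_discussion(topic, replies):
    """Score the words of a discussion: the topic is weighted like a title,
    and each reply is weighted by 1/position, so the 22nd reply counts far
    less than the 1st (the decay function is an assumption)."""
    scores = Counter()
    for word in topic.lower().split():
        scores[word] += ROLE_WEIGHTS["title"]
    for position, text in enumerate(replies, start=1):
        weight = 1.0 / position
        for word in text.lower().split():
            scores[word] += weight
    return scores


if __name__ == "__main__":
    article = [("title", "search engines"), ("paragraph", "a search example")]
    print(score_article(article).most_common(3))
    print(score_discussion("camera advice", ["buy camera", "camera"]).most_common(3))
```

The point of the two functions is that the same word gets a different score depending on the type of page it was found on and where in the page it sits.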

User intervention

I am not a big fan of users affecting the quality of the results. A good result for your query doesn’t have to be a good one for mine. It might work for a very simple query with one or two words, but certainly, when the query becomes more complex, even a human wouldn’t necessarily understand what one is asking for. User intervention in rating and grading a document can cause damage and skew the results since it cannot be normalized into a known and predictable algorithm.

Data Categorization

The crawler should apply a categorization scheme to each document. This categorization will help the user guide the search engine to better focus the results in case the focus was in the wrong place. The categories should match today’s “vertical” search engines; for example blogs (technological, gossip, etc.), news (world, regional, science, etc.), discussions (reviews, opinions, Q&A, etc.) and so on.

Bottom line

The crawler is the intelligent agent that collects and analyzes the data. There should be millions of crawlers, analyzing every piece of data on the web. The analysis should take into account the structure of the page, its category and prioritize its internal content accordingly.

Data Storage and Indexing

The future search engine should be updated about the content on the world wide web in real-time. Today, search engines prioritize the resources they crawl, in order to stay updated with the most important content in real time, but this is not enough and the road is long.
As mentioned above, once the crawlers collect the data they should pass it on for storage. If we accept the idea that the users’ computers should act as crawlers, data from millions of computers will flow into one or more centralized data centers. A vast amount of processing power is needed in order to process data at this rate. It’s like trying to count all the water molecules of Niagara Falls as they splash in (or maybe I’m vastly exaggerating, but you get the idea).

Bottom line

More data centers and more computational power should be invested in indexing incoming data in order to meet the real-time demand.

Data Extraction

The search and extraction of the data that has been crawled and stored has very much to do with the quality of the crawling. If the crawler was able to correctly analyze the page's importance with regard to its contextual meaning, and to determine the page structure by using categorization methods, it will be much easier to retrieve the relevant document for a certain query.
There are many debates about how a user should interact with the machine – in our case the search engine. Currently, we as users need to rewrite the way we speak in order to interact with a search engine. Many times we need to completely rephrase a sentence so a search engine will “understand” it correctly (hopefully). No doubt, there is a need to solve this problem. Again, this is not a problem when the search query is simple and contains only a few straightforward keywords. Unfortunately, queries are often not that simple, and a good search engine should be able to analyze the meaning of our search query. Semantic search engines are on the right path.
I also think previous queries in the immediate time frame should be taken into account. When a user tries again after failing to receive good results on a previous attempt, he/she is “hinting” to the search engine where to focus and what to fix.
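As a rough illustration of using previous queries as a hint, here is a minimal sketch. The five-minute window, the function names and the output shape are all assumptions made for this example; how an engine would actually use the extra context terms (boosting, re-ranking, suggestions) is left open.

```python
# Assumption: "immediate time frame" means the last five minutes.
SESSION_WINDOW_SECONDS = 300


def recent_terms(history, now, window=SESSION_WINDOW_SECONDS):
    """Collect terms from queries issued within the window, newest first.

    history is a list of (timestamp_in_seconds, query_text) pairs.
    """
    terms = []
    for timestamp, query in sorted(history, reverse=True):
        if now - timestamp > window:
            continue  # too old to count as part of the current attempt
        for term in query.lower().split():
            if term not in terms:
                terms.append(term)
    return terms


def with_context(query, history, now):
    """Return the current query terms plus context terms from recent queries."""
    current = query.lower().split()
    context = [t for t in recent_terms(history, now) if t not in current]
    return {"terms": current, "context": context}


if __name__ == "__main__":
    history = [(0, "jaguar"), (900, "jaguar speed"), (950, "jaguar car top speed")]
    print(with_context("jaguar xk", history, now=1000))
```

Here the earlier attempts tell the engine that this user means the car, something the query "jaguar xk" might not make clear on its own, while the very first query is ignored because it falls outside the time window.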

Bottom line

The future search engine should try to understand free text and use semantic methods to better “understand” what the user is looking for. It can also interact with the user by suggesting more keywords in order to narrow down the field of interest.
The same query could return different (and more accurate) results for different users if the engine takes into account the previous queries the user entered (in the immediate time frame).

Data Presentation

If a search engine conquered all the technological barriers but the presentation layer isn’t good enough, then it has failed. The presentation is the place where the user sees the results returned in response to a query. If the results weren’t good enough, the search engine should supply an easy way to refine them. The user has very little tolerance, so the interface should be highly intuitive and fast.
Giving too much information around the results is a big mistake, since the user will get lost. Too much flash and animation is cool and fun but not in the long run. When a user is looking for information it should be to the point.
The preview snippets should be easy to expand further, since this is where our eyes are focused when we check the results’ relevancy.
I really don’t like image previews of a website – they break the design flow, colors are mixed, the preview images are usually outdated, and even when they are not, the user cannot read the text on the page. For reading the text we have preview snippets that are ordered by relevancy and marked when needed.

Summary

The future search engine should be a simple one on the front end. It should be fast and intuitive, and the user should be able to easily guide the engine to refine the results when needed. The back end should be adjusted to handle massive amounts of information retrieved by millions of crawlers and analyzed according to the type of the document.

Combining all things said above should create an easy to use, intuitive, up-to-date “Find-Engine”.

Cheers,
Ran Geva,
Omgili CEO/Founder.
