Omgili blog

August 28, 2007

Weekly Buzz

Filed under: Buzz — Yoav Pridor @ 5:07 pm

USA – Hope for a better future


The Video tab on Omgili Buzz brings us the greatest videos of the week. This week there is one amazing video that is simply in a league of its own. I use the word amazing because it’s incredibly hilarious and, at the same time, painfully sad.
Watch as the South Carolina representative to the Miss Teen USA Pageant is asked a basic question. To view, click here.
This video provoked a tremendous number of online discussions. Sample some; they are pretty funny as well.

Sport is bad for you

The Buzz cloud for the past week brought up the name Eddie Griffin. I immediately thought it was the comedian, but the chatter was all about the terrible news of the untimely death of Eddie Griffin, the 25-year-old NBA player who was, until recently, with the Minnesota Timberwolves. Reading the discussions regarding Griffin’s death, I learned about a talented kid who was mostly involved in violence, alcohol abuse and drunk driving. Griffin spent most of his professional basketball career in rehabilitation, in court and under injunctions. He died when his SUV drove straight into a speeding train; some think it was suicide.

Look at some of my recent posts. There was Barry Bonds, who is believed to be breaking records thanks to the influence of illegal drugs. There was Mike Vick, with the dog fighting allegations. It seems that nowadays athletes, at least successful professional ones, are generally a bad influence.

I think there was a time when young kids could genuinely admire these superstars.

What happened?

My Birthday

I had a birthday on Thursday. This is something that has been happening every year for quite a few years now, so I figured that by now it should create some kind of buzz. So I went to work:
First I made a Buzz Graph for the term Birthday. To my astonishment, there was no special chatter peak on my birthday (weird!?). It turns out that there is one birthday that is discussed online more than any other: the birthday of the United States of America. Check out the graph:


Then I made a Buzz Graph for the date of my birthday and Bingo! It seems like a lot of people talk about my birth date on that specific day:

I guess it’s an event after all (-:

Thanks for reading, for using Omgili and for your feedback,
Yoav Pridor, Omgili

August 22, 2007

The (Near) Future of Search Engines

Filed under: opinions — ran @ 10:07 am
Charles Knight referred me to Kaila Colbin’s blog post about what the future (2010) search engine should look like. I was honored to be asked for my opinion, so I sat down and wrote my thoughts about what the future search engine should be. Granted, this is not exactly what I was asked to do, but I ended up with a four-page document, so I decided to post it on our blog. I hope you find it interesting.

The (Near) Future of Search Engines

We have a saying that “Prophecy was given to the fools” (loosely translated from Hebrew) that basically means that only fools try to foresee the future. Since I don’t consider myself a fool (at least not at this point in my life) I will start by saying that I have no idea how search engines will work but I have some ideas about how they should.
After the previous unnecessary paragraph I will dive right into the subject at hand. Basically all proprietary search engines are built out of four parts: data collection (crawlers), data storage (indexing), data extraction (searching) and data presentation.

Data Collection – the Crawler:

The crawler is the part that goes over the information and collects it. In my view, this is one of the most important components in a search engine. The crawler has control over:

  1. The amount of data being collected.
  2. The quality of the data.

Data Quantity:

The coverage of a search engine is a key element in its overall quality. We, as users, would like our search engine to be aware of every piece of information out there. Currently, Google has a massive number of crawlers indexing the web, but even they admit they stand no chance of covering what is really out there. I don’t really believe we can calculate the percentage of what is really covered: we know the volume of what we crawled, but we don’t know what we missed, so we cannot solve even this simple formula.
The future of the crawlers should be in diverting resources from hundreds of centralized crawlers to the millions of users surfing the web. Let the users’ computers act as crawlers. This will save bandwidth and be much more efficient in finding the “dark parts” of the web. I actually think Google is already utilizing this method when we use some of its free software. They of course ask our permission to do so, as they should. The problem with this method is with the indexer, and I will refer to that later.

Data Quality:

The information is out there. There is a whole ocean of it, so no worries there. Let’s say we were able to collect it and successfully store it. If we are unable to correctly analyze it, then we will have problems retrieving it in a relevant way. Not all web pages were created equal; different pages have different roles. Inside the pages, different segments of text serve different purposes.
It is very important to distinguish between web pages by their importance according to contextual meaning (i.e. PageRank), but it’s just as important to distinguish between the different parts of the text inside these pages and to analyze their meaning.
The future search engine should include crawlers that “understand” the page structure and detect the role of each part.
I will supply an example:
A simple Wikipedia page has titles and paragraphs, and different words are marked by underline/bold notations, so we can calculate the importance of the document from its structure. We can assume that titles are important and that bold and underlined words are significant. The frequency of a word in a paragraph relative to the diversity of words in the text serves another important role. The above considerations (and more) could be applied to many web pages that should be categorized as articles.
Now what about web pages from discussion-based sites? The structure and PageRank importance of these pages is completely different. The crawler should analyze the page as a discussion and not as an article, since a discussion has a title, a topic, a date and replies. The considerations taken into account should be applied to the separate parts of the discussion. Text inside the 22nd reply does not have the same importance as text in the 1st reply or the topic. Many times, the discussion page includes text that has nothing to do with the discussion at hand. The crawler should be able to distinguish between these document segments and textual attributes. The better it does the job, the more relevant the results we will get.
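To make the example a bit more concrete, here is a minimal Python sketch of scoring words by the role of the segment they appear in. The weights, the 1/position decay for replies and all the function names are illustrative assumptions of mine, not a description of how any real search engine works.

```python
import re
from collections import Counter

# Hypothetical weights per segment role (assumed values, for illustration only).
ARTICLE_WEIGHTS = {"title": 3.0, "emphasis": 2.0, "paragraph": 1.0}
DISCUSSION_TITLE_WEIGHT = 3.0
DISCUSSION_TOPIC_WEIGHT = 2.0


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def reply_weight(position):
    """Assumed decay: the reply at 1-based `position` counts 1/position."""
    return 1.0 / position


def score_article(segments):
    """Score words in an article given (role, text) pairs."""
    scores = Counter()
    for role, text in segments:
        for word in tokenize(text):
            scores[word] += ARTICLE_WEIGHTS[role]
    return scores


def score_discussion(title, topic, replies):
    """Score words in a discussion: title and topic first, then decaying replies."""
    scores = Counter()
    for word in tokenize(title):
        scores[word] += DISCUSSION_TITLE_WEIGHT
    for word in tokenize(topic):
        scores[word] += DISCUSSION_TOPIC_WEIGHT
    for position, reply in enumerate(replies, start=1):
        for word in tokenize(reply):
            scores[word] += reply_weight(position)
    return scores
```

With these assumed weights, a word that appears only in the 22nd reply contributes 1/22 of what the same word would contribute in the 1st reply, which is the kind of difference in importance described above.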

User intervention

I am not a big fan of users affecting the quality of the results. A good result for your query doesn’t have to be a good one for mine. It might work for a very simple query with one or two words, but certainly, when the query becomes more complex, even a human wouldn’t necessarily understand what one is asking for. User intervention in rating and grading a document can cause damage and skew the results since it cannot be normalized into a known and predictable algorithm.

Data Categorization

The crawler should apply a categorization scheme to each document. This categorization will help the user guide the search engine to better focus the results in case the focus was in the wrong place. The categories should match today’s “vertical” search engines; for example blogs (technological, gossip, etc.), news (world, regional, science, etc.), discussions (reviews, opinions, Q&A, etc.) and so on.
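A very rough Python sketch of such a categorization step follows. The category names come from the examples above; the hint phrases and the simple counting rule are assumptions made purely for illustration, and a real crawler would need far richer signals.

```python
# Hypothetical hint phrases per category (assumed, not taken from any real crawler).
CATEGORY_HINTS = {
    "blogs": ("posted by", "filed under", "permalink"),
    "news": ("reuters", "associated press", "breaking"),
    "discussions": ("reply", "quote", "joined:", "thread"),
}


def categorize(page_text):
    """Return the category whose hint phrases occur most often in the page."""
    text = page_text.lower()
    counts = {
        category: sum(text.count(hint) for hint in hints)
        for category, hints in CATEGORY_HINTS.items()
    }
    best = max(counts, key=counts.get)
    # If no hint matched at all, refuse to guess.
    return best if counts[best] > 0 else "uncategorized"
```

The label returned here is what the user could later use to steer the engine, for example by asking for results from discussions only.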

Bottomline

The crawler is the intelligent agent that collects and analyzes the data. There should be millions of crawlers, analyzing every piece of data on the web. The analysis should take into account the structure of the page and its category, and prioritize its internal content accordingly.

Data Storage and Indexing

The future search engine should be updated about the content on the world wide web in real-time. Today, search engines prioritize the resources they crawl, in order to stay updated with the most important content in real time, but this is not enough and the road is long.
As mentioned above, once the crawlers collect the data they should pass it on for storage. If we accept the idea that the users’ computers should act as crawlers, data from millions of computers will flow into one or more centralized data centers. A vast amount of processing power is needed in order to process data at this rate. It’s like trying to count all the water molecules of Niagara Falls as they splash in (or maybe I’m vastly exaggerating, but you get the idea).

Bottomline

More data centers and more computational power should be invested in indexing incoming data in order to meet the real-time demand.

Data Extraction

The search and extraction of the data that has been crawled and stored has very much to do with the quality of the crawling. If the crawler was able to correctly analyze the page’s importance with regard to its contextual meaning, and to determine the page structure by using a categorization method, it will be much easier to retrieve the relevant document for a certain query.
There are many debates about how a user should interact with the machine, in our case the search engine. Currently, we as users need to change the way we speak in order to interact with a search engine. Many times we need to completely rephrase a sentence so a search engine will (hopefully) “understand” it correctly. No doubt, there is a need to solve this problem. Again, it is not an issue when the search query is simple and contains only a few straightforward keywords. Unfortunately, queries are often not that simple, and a good search engine should be able to analyze the meaning of our search query. Semantic search engines are on the right path.
I also think previous queries in the immediate time frame should be taken into account. When a user tries again after failing to receive good results on a previous attempt, he/she is “hinting” to the search engine where to focus and what to fix.

Bottomline

The future search engine should try to understand free text and use semantic methods to better “understand” what the user is looking for. It can also interact with the user by suggesting more keywords in order to narrow down the field of interest.
The same query could return different (and more accurate) results for different users if the engine takes into account the previous queries the user entered (in the immediate time frame).
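As a minimal sketch of what taking previous queries into account could look like, the Python class below keeps a short history per user and returns the words from recent queries as extra “hints” next to the current query. The class name, the five-minute window and the plain whitespace splitting are all assumptions for illustration.

```python
import time


class QuerySession:
    """Remembers a user's recent queries (hypothetical helper, for illustration)."""

    def __init__(self, window_seconds=300):
        # Assumed meaning of "immediate time frame": the last five minutes.
        self.window_seconds = window_seconds
        self.history = []  # list of (timestamp, query) pairs

    def add(self, query, now=None):
        """Record a query; `now` can be passed explicitly to make testing easy."""
        now = time.time() if now is None else now
        self.history.append((now, query))

    def expand(self, query, now=None):
        """Return (query_terms, hint_terms) for the current query."""
        now = time.time() if now is None else now
        terms = query.lower().split()
        hints = []
        for timestamp, past_query in self.history:
            if now - timestamp > self.window_seconds:
                continue  # too old to count as a hint
            for word in past_query.lower().split():
                if word not in terms and word not in hints:
                    hints.append(word)
        return terms, hints
```

For example, a user who searched for “jaguar speed” a minute before searching for “jaguar car” would get “speed” as a hint, while a query from an hour earlier would be ignored. How the hints are then used for ranking is left open here.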

Data Presentation

If a search engine conquered all the technological barriers but the presentation layer isn’t good enough, then it has failed. The presentation is the place where the user sees the results returned in response to a query. If the results weren’t good enough, the search engine should supply an easy way to refine them. The user has very little tolerance, so the interface should be highly intuitive and fast.
Giving too much information around the results is a big mistake, since the user will get lost. Too much flash and animation is cool and fun, but not in the long run. When a user is looking for information, it should be to the point.
The preview snippets should be easy to expand further, since this is where our eyes are focused when we check the results’ relevance.
I really don’t like the image previews of a website: they break the design flow, colors are mixed, the preview images are usually outdated, and even when they are not, the user cannot read the text on the page. For reading the text we have preview snippets that are ordered by relevance and marked when needed.

Summary

The future search engine should be a simple one on the front end. It should be fast and intuitive, and the user should be able to easily guide the engine to refine the results when needed. The back end should be built to handle the massive amount of information retrieved by millions of crawlers and analyzed according to the type of the document.

Combining all things said above should create an easy to use, intuitive, up-to-date “Find-Engine”.

Cheers,
Ran Geva,
Omgili CEO/Founder.

August 21, 2007

Weekly Buzz

Filed under: Buzz — Yoav Pridor @ 5:47 pm

Bourne for the Buzz


It took a week, but the new “Bourne” film has finally hit the buzz. “The Bourne Ultimatum”, the third and final movie in the Bourne series, opened in the States at the beginning of August and in most of Europe last week, and seems to be doing very well.

The discussions about the film are rather favorable and most of them recommend going to see it. This movie, like the previous one, was directed by Paul Greengrass, who is said to have made a wonderful film that “builds on the success of its predecessors rather than destroy them”.

Enjoy.

Cheney Vs. Cheney

The most buzzed video of the week is a 1994 clip of Dick Cheney speaking against the option of an all-out invasion of Iraq. In the clip, Cheney explains the potential outcomes of such an invasion, and it actually sounds like he’s describing the current situation in Iraq (and Afghanistan). Until 1993, Dick Cheney served as Secretary of Defense in the administration of George Bush (the father), during the first war in Iraq, and in 1994 he briefly considered running for the presidency. The clip originates in an interview held with Cheney in 1994 at the American Enterprise Institute, where he was working at the time.
As of today, Dick Cheney is the Vice President of The United States and there are 4,107 dead American service men and women since the beginning of the war in March 2003.

Breaking TV records

I was surprised to find “High School Musical 2” in very large bold print on the Buzz Cloud. It seems that this TV movie sequel aired on Friday, the 17th of August on the Disney Channel, and broke quite a few viewing records. High School Musical 2 was the most-watched basic cable television program of all time. With 17.2 million viewers, it was also the most-watched basic cable movie of all time, the highest-rated television program ever for children age 6 to 11 and the second-highest-rated television program of all time (behind only the 2004 Super Bowl) for viewers age 9 to 14. It also was the most-viewed Friday television telecast, cable or broadcast, in the past five years. Now the $7 million musical sequel will be shown in 100 countries.

The new king of baseball

In my “Weekly Buzz” post from August 7th, I wrote a small entry about Barry Bonds’ 755th home run, which tied him with Hank Aaron for the Major League record for career home runs.
In that entry I predicted that he might break the record in the following week, and that this would get the buzz going once more. I didn’t follow up on it in the following week, but it turns out that Bonds broke the record the next day. Check out the Omgili Buzz Graph for Barry Bonds for that week:

Thanks for reading, for using Omgili and for your feedback,
Yoav Pridor, Omgili

August 12, 2007

Weekly Buzz

Filed under: Buzz — Yoav Pridor @ 11:32 am

Blame it on Global Warming


Global Warming is very big in the Omgili Buzz Cloud this week. Last time the online discussion about this subject soared, it was due to the “Live Earth” event. This time, there seems to be more than one reason, although there is a main theme.
The reason that seems to be most closely tied to this specific timing is, oddly enough, the collapse of the bridge in Minneapolis. The forums are very busy discussing the possibility that the collapse was caused by global warming. I personally think we can blame everything on global warming. That way we’ll have something to blame anything on, and we’ll save the tremendous amount of energy we all spend on finding who is to blame. That in itself may slow down global warming.

Hello Kitty badge of shame

An abnormal peak in the buzz over “Hello Kitty” grabbed my attention this week. I researched the term and found the following phrases being associated with it: “Thai Police” and “BANGKOK, Thailand”. Digging into the discussions, I found a very bizarre story.
It seems that the Thai police have come to the conclusion that the regular reprimands and punishments inflicted on police personnel who fail in their duties or don’t obey the rules are not doing the job. So they came up with a harsh new sanction: Thai police men and women who fail to uphold their duties will be forced to wear a pink “Hello Kitty” badge.
Imagine the disgrace (-:

Thanks for reading, for using Omgili and for your feedback,
Yoav Pridor, Omgili

August 7, 2007

Weekly Buzz

Filed under: Buzz — Yoav Pridor @ 7:25 pm

Barry Bonds Continued

Last week’s “Barry Bonds” topic got replaced at the top of the buzz chart by the term “Hall of fame”. Bonds hit his 755th career home run, tying Hank Aaron’s Major League record. There’s really big opposition in online discussions to Bonds joining the baseball Hall of Fame, because of his alleged history of steroid use. Will he break the record this week? Very probably. I anticipate a peak in the buzz when that happens.

“Tomorrow” never dies

Tom Snyder, one of American TV’s icons, died on July 29th at the age of 71. He died from complications of leukemia.
Snyder started out as a radio talk show host and gained national fame as the host of Tomorrow with Tom Snyder (more commonly known as The Tomorrow Show), which aired late nights after The Tonight Show on NBC from 1973 to 1982. It was a talk show unlike the usual late-night fare, with Snyder, cigarette in hand, alternating between asking hard-hitting questions and offering personal observations that made the interview closer to a conversation.
Peak moments with Snyder on Tomorrow included John Lennon’s final televised interview, in April 1975, and Irish rock band U2’s first American television appearance in June 1981. Also memorable was the 1980 cigarette smoke-filled appearance of Public Image Ltd.’s John Lydon (aka Johnny Rotten) and Keith Levene, whose thoroughly uncooperative twelve-minute appearance on the show acquired a long-term notoriety. “Weird Al” Yankovic’s first television appearance was on the show in April 1981.
The show was canceled in 1982, to make way for the up-and-coming young comedian, David Letterman.

Brooklyn bridge is falling down?

There were a lot of references to the Minneapolis bridge collapse tragedy in online discussions this week. At first the discussion was news related. People linked to news coverage of the event, mostly on CNN but also on Fox News, MSNBC and local networks.
Later on, the discussions brought up an interesting angle. One of the terms that popped up in many of these discussions was “Brooklyn Bridge”. I went into a few of those conversations to find out what the connection is. Most of these discussions were being held by worried New Yorkers who know how old the Brooklyn Bridge is (construction started in 1870; it opened for traffic in 1883).
But somebody actually went and checked: Check out this story from the New York Times. The headline reads: “Brooklyn Bridge Is One of 3 With Poor Rating”. Should we be worried? I hope the bridge’s next maintenance, which is scheduled to begin in 2010, will be re-scheduled after this.

Thanks for reading, for using Omgili and for your feedback,
Yoav Pridor, Omgili