Michal Hrušecký

...about me, Linux and OpenSource

openSUSE Search

During the last two weeks I was, among other things, investigating how to implement search across all openSUSE web pages. As part of our Umbrella project, we want all our websites to look unified, and searching through all of them is part of this goal. So what did I try and what are my conclusions? Let's see...

Customized Google Search

The first idea would be to use Google search. They offer customized search for anyone's website. They are good at searching and we wouldn't need to implement anything ourselves. Another upside of this approach is that we wouldn't need any infrastructure for it. They would let us use their machines.

But it has some downsides as well. One minor thing is that it will index all our sites regardless of their content. So wiki pages may come up as less relevant than a comment on someone's blog. I don't think we really want this.

The major downside, as I see it, is the agreement. I'm not a lawyer, but I didn't like it. Let's say that the requirement to display some advertisements and Google's preferred links is OK. But according to the agreement we can't customize the results as much as we want; we can only provide a theme and they may use it somehow (the agreement gives no details). Another thing I found quite disturbing is that they can use our logo and trademarks forever to promote their products. Well, I don't think we mind right now, but forever is a long time. Maybe I just didn't understand the agreement well, but I'm quite sure that we don't want to accept it without discussing it with some skilled lawyers.

Last but not least, as an Open Source community we should try to go for an open solution. So I decided to check some Open Source engines available on the internet...

Open Source Solutions

I took a brief look at Xyzse and Swish++. The disadvantage I found was that their latest versions seem to have been released sometime in 2008. This doesn't have to be bad, but I think that something more alive may be better. The other thing I didn't like was that it seemed I would need to hardcode some search limits when compiling the packages (at least it looked that way from the installation instructions, which required editing some headers manually).

YaCy

YaCy is a really interesting search engine. Its main innovative idea is that it is decentralized. You just run one peer, connect it to the network and then search across all peers in that network. You don't need any big server; you can make it work with everybody just indexing their own website. Really interesting idea. One small thing I personally didn't like was that it is written in Java (I don't really speak Java). It was quite easy to try, as it starts its own web server, but it looked like it wouldn't be easy to customize. It would be great to use it for my own web pages, but I think we want something else for openSUSE.

Datapark Search Engine

The last search engine I want to talk about is Datapark Search Engine. It is an Open Source engine written in C. For data storage it can use MySQL, PostgreSQL or SQLite. It can be used as a CGI on the web, as an Apache module or through its PHP bindings. The results page is highly customizable. It's just an HTML template that gets read and filled with results. So it wouldn't be any problem to create a Bento theme for it and integrate it with the rest of our websites.
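
To make the "template that gets read and filled with results" idea concrete, here is a minimal sketch in Python. It is only an illustration of the general technique and does not use Datapark's own template syntax; the placeholder names ($query, $rows, $url, $title), the render_results helper and the example URL are all made up for this sketch.

```python
from html import escape
from string import Template

# Hypothetical templates: the placeholder names are assumptions for this sketch,
# not the syntax of any real search engine.
PAGE = Template(
    "<html><body><h1>Results for $query</h1>\n<ul>\n$rows\n</ul></body></html>"
)
ROW = Template('<li><a href="$url">$title</a></li>')


def render_results(query, results):
    """Fill the templates with (title, url) pairs, escaping everything shown to the user."""
    rows = "\n".join(
        ROW.substitute(url=escape(url, quote=True), title=escape(title))
        for title, url in results
    )
    return PAGE.substitute(query=escape(query), rows=rows)


if __name__ == "__main__":
    # Example URL is illustrative only.
    print(render_results("MySQL", [("MySQL on the wiki", "http://wiki.example.org/MySQL")]))
```

A theme would then be nothing more than a different pair of templates, which is why this kind of design is easy to restyle.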

Another interesting feature is that it allows us to tag all servers and create a hierarchical category list to make searching just part of our infrastructure easier. I haven't tried this feature yet, but I think we can use it. We can also give some extra points to the most relevant sites (I think the wiki deserves this).
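
As a rough idea of what tags and "extra points" could mean for the result order, here is a small Python sketch. It is not how Datapark implements these features; the hostnames, the weights, the tag names and the rank function are assumptions chosen only to show the principle of filtering by tag and multiplying a base relevance score by a per-site weight.

```python
from urllib.parse import urlparse

# Hypothetical per-site weights: hostnames and numbers are assumptions for this sketch.
SITE_WEIGHT = {"wiki.example.org": 2.0, "blogs.example.org": 1.0}


def rank(results, tag=None, weights=SITE_WEIGHT, default_weight=1.0):
    """Order results by base score times site weight, optionally limited to one tag.

    Each result is a dict with the keys "url", "score" and "tag".
    """
    selected = [r for r in results if tag is None or r["tag"] == tag]

    def boosted(result):
        host = urlparse(result["url"]).hostname
        return result["score"] * weights.get(host, default_weight)

    return sorted(selected, key=boosted, reverse=True)


if __name__ == "__main__":
    hits = [
        {"url": "http://blogs.example.org/post", "score": 0.6, "tag": "blog"},
        {"url": "http://wiki.example.org/MySQL", "score": 0.4, "tag": "wiki"},
    ]
    for hit in rank(hits):
        print(hit["url"])
```

With these made-up numbers the wiki page wins even though its base score is lower, which is the effect I would like to get for our wiki.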

The last very interesting feature is that it can index pretty much anything. It doesn't have to be only web pages. Anybody can write their own plugin that knows how to handle some specialized format. If I want to be able to search among the RPMs on the Build Service, I can write a simple filter to make it possible. Then a search for MySQL wouldn't show only wiki pages dedicated to MySQL and related blog posts, but also the RPMs of MySQL itself. Pretty interesting, isn't it? I'm not really sure whether we want this, but we can do it with this search engine ;-)
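
Such a filter could be as small as the following Python sketch, which turns package metadata into plain text that an indexer could treat like any other page. How the metadata is read from the RPM and how the filter is registered with the search engine are deliberately left out; the rpm_to_text function and the dictionary keys are hypothetical.

```python
def rpm_to_text(meta):
    """Render package metadata (a dict) as plain text suitable for indexing.

    The keys "name", "version", "summary" and "description" are assumptions
    made for this sketch; empty or missing fields are skipped.
    """
    lines = [
        f"{meta['name']} {meta['version']}",
        meta.get("summary", ""),
        meta.get("description", ""),
    ]
    return "\n".join(line.strip() for line in lines if line.strip()) + "\n"


if __name__ == "__main__":
    print(rpm_to_text({"name": "mysql", "version": "5.1", "summary": "A database server"}), end="")
```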

Conclusion

I think we should use Datapark Search Engine, because it's Open Source, it has categories and tags, it can give extra points to sites we like and it's highly customizable. If I missed something interesting that we should evaluate, please let me know. There are many interesting projects out there and I tried only a few of them. Although I think I found what I was looking for, any comments and suggestions are welcome...

#1 Re: openSUSE Search

darix / Monday 15 March 2010 3:46pm

Did you poke yaloki about it?

He is working on Solr for the package search.

#2 Re: openSUSE Search

Michal Hrušecký / Monday 15 March 2010 4:06pm

Not yet, poking him now... Thanks for the suggestion ;-)

#3 Re: openSUSE Search

Pascal Bleser / Monday 15 March 2010 4:56pm

You obviously missed the best solution by far: Apache Solr :)

#4 Re: openSUSE Search

Chris / Monday 15 March 2010 8:36pm

+1 for Solr. The open source thing that is better isn't written yet.

#5 Re: openSUSE Search

David / Thursday 18 March 2010 8:21am

YaCy has a really powerful API if you want to hide ("customize") it from users.
