Simon Wistow ([info]deflatermouse) wrote in [info]news,

The search for enlightenment

Since I was outed in the State of the Goat post I figured it was probably a good time to let you know what's going on with search and LJ.

In short ...

  • We're building a search system for LJ: journals, entries, communities, people, or everything on the site.
  • Our new search will respect your privacy settings.
  • We tried talking to partners like Google and Yahoo about using their search stuff, but that didn't work out.
  • LJ's new search is going to be built on 100% Free-as-in-Speech Software.
  • It'll take a few months to finish everything up and to index the billions of entries already in LJ, but we'll start testing things before then.
  • We wanted to let you know about it before it launches because our New Year's resolution was to have fewer surprises. Except for Ninja attacks, which must by their very nature remain a surprise.


If you want more details then read below the cut.



As part of our New Year's resolutions we at LiveJournal have vowed to stop ringing the buzzers of our neighbours and then running away before they open the door. Oh, and because it's one of the things users most commonly request, we're going to be more proactive in telling you what we're working on for the future. Which is probably slightly more relevant to you. This will have to suffice until Brad can finish working on his telepathy device so that finally we can all live in some sort of huge borg-like hivemind and you'll know exactly what we're up to, which will, conveniently, be exactly what you want. Hurrah!

So, Search then. First of all, you're probably wondering why LJ doesn't have search at the moment. There's a simple reason for that. We're lazy. No, wait, that's the reason why nobody's cleaned up the Nacho Hat we had for Cinco de Mayo last year that's still sitting in the kitchen and has kind of gone all fuzzy and grey.

The real reason is that LJ is big. Really big. Kind of like the Nacho Hat, come to think of it.

Seriously though, put it this way - a while back Yahoo! and Google were crowing about the fact that they had around 19 billion items in their index. And these are big companies with billions in the bank and large search teams.

Then there's us. In contrast we have a mere 3 billion posts and comments. Oh, and another 16 added every second. We don't, on the other hand, have a vast, inexhaustible budget (since Brad took the 7 or so billion he got from selling his soul to Six Apart and built a house out of bundles of 100 dollar bills from which he plots new ways to take over the world. And works on his telepathy device) or, indeed, a large team of people. We have me, working on indexing posts and comments, and Brad and Mischa, beavering away rewriting the directory and user searches to be shiny and spiffy.

However, despite these limitations, for the last couple of months we've been building a search engine. It uses one third of San Francisco's water supply to cool and it has a dedicated Nuclear Reactor just to provide the power to index the word "depressed". The phrase "my parents don't understand me" takes up so much physical disk space that we had to hire the hangars at Moffett Field. For those wanting more technical details, it's written using 100% Free Software although, in a bit of a departure from our usual Perlish nature, it uses Lucene from the Apache Foundation for various technical reasons which we may go into later. For those wanting really technical details, it works sort of like Google's GFS and MapReduce, except changed to be more optimised for blog posts and comment threads.
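For the curious, here is a toy sketch of the map-and-reduce idea behind building a search index. It is only an illustration in Python, not the actual LJ code (which uses Lucene); the function names `map_phase`, `reduce_phase` and `build_index` and the sample posts are all made up for this example.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Emit one (term, doc_id) pair per distinct word in a post or comment."""
    for term in set(text.lower().split()):
        yield term, doc_id


def reduce_phase(pairs):
    """Group the emitted pairs by term into sorted lists of document ids."""
    postings = defaultdict(set)
    for term, doc_id in pairs:
        postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def build_index(docs):
    """Run the map step over every document, then reduce into one index."""
    pairs = (
        pair
        for doc_id, text in docs.items()
        for pair in map_phase(doc_id, text)
    )
    return reduce_phase(pairs)


# Hypothetical sample data, keyed by an invented document id.
docs = {
    1: "I am so depressed",
    2: "my parents don't understand me",
    3: "depressed again",
}
index = build_index(docs)
```

In a real distributed setup the map step would run on many machines at once and the pairs would be shuffled to reducers by term; here everything runs in one process to keep the idea visible.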

We haven't finalised the features yet - to be honest I'll be glad just to get the data into the index. Current, conservative guesstimates have it taking 3 months just to back index what we already have. What it will do is respect your privacy and pay attention to the waxes and wanes of your friends list. It will let you search date ranges and restrict your search to individual journals, communities, people or entries and their comments. Apart from that we'll just have to see where we go from there. Don't worry - we got ideas (and to us that's dear)[*].
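To make the privacy and filtering ideas above a bit more concrete, here is a minimal sketch in Python. It assumes three security levels ("public", "friends", "private") and checks the friends list at query time, so that changes to it take effect immediately. The `Entry` class, the `can_view` and `search` functions, and the sample data are all hypothetical and not the real LJ implementation.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Entry:
    journal: str    # journal (user or community) the entry belongs to
    posted: date
    security: str   # assumed labels: "public", "friends" or "private"
    text: str


def can_view(entry, viewer, friends_of):
    """Return True if `viewer` is allowed to see `entry`."""
    if entry.security == "public" or viewer == entry.journal:
        return True
    if entry.security == "friends":
        # Looked up at query time, so friends list changes apply at once.
        return viewer in friends_of.get(entry.journal, set())
    return False


def search(entries, term, viewer, friends_of, journal=None, start=None, end=None):
    """Naive substring search restricted by journal, date range and privacy."""
    results = []
    for entry in entries:
        if term.lower() not in entry.text.lower():
            continue
        if journal is not None and entry.journal != journal:
            continue
        if start is not None and entry.posted < start:
            continue
        if end is not None and entry.posted > end:
            continue
        if can_view(entry, viewer, friends_of):
            results.append(entry)
    return results


# Hypothetical sample data.
entries = [
    Entry("alice", date(2007, 1, 10), "public", "Nacho hat update"),
    Entry("alice", date(2007, 1, 15), "friends", "Secret nacho recipe"),
    Entry("bob", date(2006, 12, 1), "private", "nacho diary"),
]
friends_of = {"alice": {"bob"}}
```

For example, `search(entries, "nacho", "carol", friends_of)` only returns the public entry, while the same search as `bob` also returns alice's friends-only entry and his own private one.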

You may be wondering why we didn't outsource to Google or Yahoo! or someone else. Or use an existing search engine package. And the answer is - we tried. It didn't work out. Trust us, it would have been a lot easier and we would have done it if we could, but indexing something like LiveJournal and getting the most out of the metadata (explicit or implicit) requires a more custom approach.

We can't give you an exact time frame for when it's all going to be unveiled - we hope soon. We're currently doing internal testing and then we'll probably roll it out to permanent and paid accounts first to see if it falls over horribly under real load. Then we'll roll it out everywhere. Hopefully that will mean we find all the bugs first and you, my dear LJers, get pure 100% distilled awesomeness.

So that's it - feel free to comment and ask questions and we'll try and answer them as best we can without either incriminating ourselves or making promises we can't keep.

- Simon (aka [info]deflatermouse)

[*] Spot the musical reference, pop fans.


  • 1207 comments

[info]decadence1

January 19 2007, 19:58:10 UTC 5 years ago

Brave stuff! Sounds impressive.

[info]dzvinkaya

January 19 2007, 20:22:09 UTC 5 years ago

First, dammit (in Russian). And I don't care! :o)

[info]buddahz

January 19 2007, 19:58:12 UTC 5 years ago

first)

[info]anomie666

January 19 2007, 19:58:14 UTC 5 years ago

First?

[info]mommadona

January 19 2007, 23:51:06 UTC 5 years ago

ur lame

[info]anomie666

5 years ago

[info]mommadona

5 years ago

[info]pompomelo

5 years ago

[info]blindkit

January 19 2007, 19:58:16 UTC 5 years ago

Cool,first!)

[info]blindkit

January 19 2007, 19:58:43 UTC 5 years ago

lol =)

[info]scottique

January 19 2007, 19:58:20 UTC 5 years ago

Hooray! Finally finally!

Thanks for the note about privacy. Nobody wants another freaking Facebook.

[info]kellzilla

January 19 2007, 20:35:27 UTC 5 years ago

Holy crap, it's you! :D

[info]mel06

5 years ago

[info]mel06

5 years ago

[info]burr86

January 19 2007, 19:58:32 UTC 5 years ago

YAY SIMON :D

[info]slyfoot

January 19 2007, 22:47:11 UTC 5 years ago

Yay for SEARCH!!

Best [info]news post EVER.

[info]betweenclark

January 19 2007, 19:58:35 UTC 5 years ago

First page!

[info]mommadona

January 19 2007, 23:51:16 UTC 5 years ago

ur lame

[info]forecaster15

January 19 2007, 19:58:48 UTC 5 years ago

What about pirate attacks? Will we get warning about those?

[info]veroz

January 19 2007, 20:05:54 UTC 5 years ago

Yes. Pirates are not as suave and sexy as ninjas are.

[info]veroz

5 years ago

[info]tajasel

5 years ago

[info]c4bl3fl4m3

5 years ago

[info]halkeye

5 years ago

[info]gemfyre

5 years ago

[info]kightp

5 years ago

[info]starshine2night

January 19 2007, 19:58:51 UTC 5 years ago

Cool.

[info]erinzdad

January 19 2007, 19:58:57 UTC 5 years ago

The Ninja remark shall brand you as a geek/nerd for evermore. Kudos!

[info]kellzilla

January 19 2007, 20:36:02 UTC 5 years ago

As if working for LJ wouldn't. :D

[info]peshwengi

5 years ago

[info]jennifer

January 19 2007, 19:58:57 UTC 5 years ago

I can't wait for this. :D

[info]smarties_2087

January 25 2007, 17:58:28 UTC 5 years ago

*squeak* I love that show!

[info]flamingo_killer

January 19 2007, 19:59:09 UTC 5 years ago

Thank you for keeping the Ninjas in mind. It would be horrible to ruin their fun.

[info]sprite_fairy

January 20 2007, 00:38:06 UTC 5 years ago

I like ninjas! :-D

[info]pheret1

January 19 2007, 19:59:17 UTC 5 years ago

If we don't want our page indexed, will it respect our "tell robots and spiders to go away" setting?

[info]cahwyguy

January 19 2007, 20:13:02 UTC 5 years ago

I actually hope there are two settings. There's a difference between Google indexing things, and an internal LJ index that would respect things like "Friends" or "Custom Groups" settings on knowing when to display things.

[info]burr86

5 years ago

[info]burr86

5 years ago

[info]yrena

5 years ago

[info]pheret1

5 years ago

[info]helzebel

5 years ago

[info]nobody_

5 years ago

[info]zach

January 19 2007, 19:59:26 UTC 5 years ago

Okay.

[info]babyboo93003

January 20 2007, 23:31:50 UTC 5 years ago

whos this

[info]unloud

5 years ago

[info]forecaster15

January 19 2007, 20:00:03 UTC 5 years ago

P.S. SEARCH!! Yay!!!!

[info]tk_creations

January 19 2007, 20:00:16 UTC 5 years ago

first page!

[info]staceywoo

January 19 2007, 20:09:19 UTC 5 years ago

fancy meeting you here :o

[info]mommadona

5 years ago

[info]mommadona

5 years ago

[info]the_dude_xxi

January 19 2007, 20:00:30 UTC 5 years ago

balls.

[info]travis

January 19 2007, 22:28:11 UTC 5 years ago

your mouth.

[info]poke_me_an_die

January 19 2007, 20:00:55 UTC 5 years ago

Awesome

x

[info]forever

January 19 2007, 20:01:06 UTC 5 years ago

It sounds interesting, I'll be looking forward to hearing more about it as it progresses.

[info]yenesi

January 19 2007, 20:38:24 UTC 5 years ago

Hey, what's your icon from? It looks so familiar, but for the life of me I cannot remember...

[info]forever

5 years ago

[info]civilbloodshed

January 19 2007, 20:01:17 UTC 5 years ago

The hivemind already exists. However, its sole purpose is to propagate pornography (of any kind).

[info]duskwuff

January 19 2007, 22:38:22 UTC 5 years ago

Rule #34, dear sir.

[info]arohanui

5 years ago

Deleted comment

[info]kinomym

January 21 2007, 00:35:40 UTC 5 years ago

hi, how are you, you're pretty

[info]dapperderp

January 19 2007, 20:01:56 UTC 5 years ago

Haha. Great post.

Search *bounce bounce*

[info]lafemmezilla

January 19 2007, 20:02:17 UTC 5 years ago

Sounds very cool! Thanks for letting us in on the plan. :)

[info]stella_tweety

January 20 2007, 09:27:07 UTC 5 years ago

hello!!!

hello!I would like to contact with me!

[info]hellsop

January 19 2007, 20:02:17 UTC 5 years ago

Okay, this is the best news release in a long time... I'm still chuckling about 'dedicated Nuclear Reactor just to provide the power to index the word "depressed".'

[info]museumfreak

January 19 2007, 20:05:43 UTC 5 years ago

+1

[info]ex_intheroom347

January 19 2007, 20:02:25 UTC 5 years ago

yay! i am sick of having to search through old entries for stuff.

[info]codestothestars

January 20 2007, 19:58:51 UTC 5 years ago

ICONNNN.