del.icio.us web archive bookmarklet updated
A while ago, I posted a small bookmarklet here, that allows you to look up a stale del.icio.us link on the Wayback Machine at http://archive.org/.
With the rewrite of "del.icio.us" into "delicious", I had to adapt the Javascript side of the bookmarklet. Now it works again. The link to the bookmarklet didn't have to change, for completeness, here it is again. Just drag it to your bookmark bar:
The source can be found here:
git clone http://werkbold.de/~felix/git/archivecgi.git/
Object roles
During a recent refactoring of the LimeWire code base we pulled the search trie based keyword index out of the FileManagerImpl class and put it into its own class SharedFilesKeywordIndexImpl.
The new SharedFilesKeywordIndexImpl listens for file manager add/remove events and updates its search trie accordingly. The file manager is no longer coupled to the index. The implementation is abstracted away behind the SharedFilesKeywordIndex interface which is used in the components that need it:
public interface SharedFilesKeywordIndex extends FileEventListener {
public abstract Response[] query(QueryRequest request);
}
If you wanted to implement a Gnutella client with a different behavior that allowed other clients to still browse your client's shared files but not to search for them, you might be tempted to implement your own NullSharedFilesKeywordIndex that just didn't return any responses. And indeed, what a nice workaround this would be, at least better than before where you would have had to subclass FileManagerImpl.
So what's the problem here?
Looking at the LimeWire code to find out where SharedFilesKeywordIndex is used, we find out that we also just disabled the clients functionality that allows the user to search through their own shared files from the user interface. The class is used in this context too.
The problem is that the SharedFilesKeywordIndex doesn't represent the actual outward behavior of the program but constitutes internal behavior that is used in different ways to produce outward behavior. Thus, the keyword index must be agnostic of its callers and can't make decisions as to whether deliver responses or not.
So where's the right place to change external behavior?
If you imagine all the objects in a program as part of a graph that visualizes their respective dependencies on each other, you should only replace the leaf nodes in that graph, if you want to make sure you only change the outward behavior in one place. As soon as you replace an inner node object, you run the risk of changing the behavior of more than one leaf node.
Back to our example, we find out that the class StandardMessageRouter contains the external behavior that we want to change: It holds the SharedFilesKeywordIndex and queries it when it receives an incoming query request. Ideally, this logic should be moved out of the message router class and just be a small message listener implementation. This would be the new leaf node that depends on the keyword index node.
Creating small classes like this with a clear separation of concerns in mind, also takes away the fear of not reusing enough code. These small classes often become trivial to implement and reimplement, and just tie two other components together.
The drawback of this approach is the myriad of classes and interfaces that need to be introduced and the extra burden that is put on wiring classes together and the desire to make that wiring code, which will be substantial in size, reusable too.
I think it is still a good idea, to identify the roles that classes play inside the program graph and in the case of singletons to make sure they are not swapped out for something else if they inner nodes of the dependency graph.
It might even be interesting to come up with a nomenclature for object roles (INTERNAL, EXTERNAL), and annotate classes with them.
P.S.: Looking at the code snippet of the SharedFilesKeywordIndex above, you'll notice a problem. It extends the file manager listener interface and thus gives away a detail of the default implementation that happens to be listen to file manager events.
Turing machine in Prolog
The Halting Problem for a Turing machine can be easily reduced to a Prolog program, thus showing that prolog programs are computationally tantamount to Turing Machines.
Given: A deterministic Turing machine. Reduce the halting problem on an empty input to a Prolog program.
delta(q, _, _, _). // for all q \in Q_F
delta(q, e, [H|L], R) :- delta(q', H, L, [e'|R]). // for all rules moving left after writing e'
delta(q, e, L, [H|R]) :- delta(q', H, [e'|L], R). // for all rules moving right after writing e'
delta(q, e, [], R) :- delta(q', blank, [], [e'|R]). // reached end of left ribbon half: generate blanks
delta(q, e, L, []) :- delta(q', blank, [e'|L], []). // reached end of right ribbon half: generate blanks
The goal is then:
delta(q_0, blank, [], []). // q_0 is the initial state
Cascading if-statements vs. boolean expressions
It's such a good feeling once you finally figured out the right combination to evaluate boolean state and you all put it in a method like this:
public boolean isComplicatedConditionTrue() {
return a < CONSTANT_1 && b * 5 > CONSTANT_6 && b < a....;
}
But how easy is this code to read? What are the drawbacks, how could you write it differently?
Let's see:
public boolean isComplicatedConditionTrue() {
if (a >= CONSTANT_1) {
return false;
}
if (b * 5 <= CONSTANT_6) {
return false;
}
if (b >= a) {
return false;
}
...
return true;
}
A drawback that jumps at you is the fact that it is more lines of code. But there are also benefits to it:
Although it goes against the clever programmer's pride to write code so inelegantly and verbosely, in my opinion it's easier to interpret the code when reading it. Think of the poor guy that has to maintain the boolean expression that took you 3 trials to figure out correctly.
You can easily debug this code and see where the the condition is not met and your debugger can easily step through each line.
Also, it is very easy to add logging statements into each branch to find out which values caused the expression to be false.
As to testing, I'm not sure how good coverage tools are right now when it comes to breaking down boolean expressions and mapping them back to source code, but this could would make their job easier too.
Currently, I'm heavily leaning towards preferring this style when I know I'm going to end up debugging this code at some point, and I can't really see a major drawback with it. But, what do you think?
Conditionally set undefined bash variable
Bash allows you to conditionally initialize a variable if it is not defined already. This makes it easy to add some flexibility to shell scripts and basically allows for named arguments to a script:
This is echo.sh:
#!/bin/bash
TEXT=${TEXT:-hello}
echo $TEXT
If we call it like this:
./echo.sh
We'll get:
hello
Now let's call it with the named argument:
TEXT="good bye" ./echo.sh
And we'll get:
good bye
Exclude .svn folders from Emacs grep-find
Follow these steps:
M-x customize-group
Type:
grep
Go to section "Grep Find Command"
Change it to:
find . -not -path "*svn*" -type f -print0 | xargs -0 -e grep -n -e
Save it for future sessions
del.icio.us backed by The Wayback Machine
Did you ever have a del.icio.us bookmark to a great article, but the article is gone?
Other social bookmarking sites provide snapshotting and archiving of pages that their users bookmark, but del.icio.us doesn't yet. Since man is a creature of habit, I stayed loyal to del.icio.us regardless of this shortcoming.
But maybe this little bookmarklet can help you with finding that long lost article. When activated, it adds extra links to all the bookmarks on the page, that allow you to query the The Wayback Machine of archive.org. The bookmarklet calls a CGI script asynchronously that tries to find the two page snapshots of the respective bookmarked page and adds them as links to the del.icio.us page.



Drag the following link onto your bookmark bar to install the bookmarklet:
The Javscript and Python code are here:
- archive.js (only tested on Firefox and Konqueror)
- geturl.py (needs beautifulsoup and json)
Python upload script for Launchpad
Last fall, we refactored LimeWire to use a GNU Gettext based translation system and Launchpad for translating LimeWire. We documented the results in this blog post.
Looking back at the last couple of months, the results have been good in several respects: It has become a lot easier for developer to mark messages for translation, and our community of translators at Launchpad is doing a wonderful job. Many new translations have been started and existing ones are updated regularly. Thank you!
What was still missing back then, was an automated way for us to upload the translation template (in gettext speak, the pot file) to Launchpad. This has been remedied in the meantime with the following python script.
Download the two python files you see there into the same directory and execute upload-template.py. It will print its usage text to help you get started.
Intelligent vs. smart
I'm not a native speaker, but in my own little language world I like to discern between what "intelligent" and what "smart" means.
For me, "intelligent" comprises the ability to comprehend, to memorize and to think logically and solve problems, basically it comprises the ability to be smart.
So what does "smart" mean for me then? It means the ability to closely approximate reality with one's own mental models. And this in turn also means to be able to change one's models quickly based on new data as it comes in. It is crucial to not ignore data and to rely on the right facts and not follow one's instincts which might be trained behavior from childhood.
An example could be catching an airplane and what the optimal time is for getting to the aiport. I approximate reality only from one side when it comes to leaving for the airport and taking an airplane, I try to be there very early. One argument why I don't try to be smart by finding the best time is that I don't do it often enough and the risk is too high to actually miss a flight payed out of my own pocket.
You can find other examples when looking at code that beautifully mirrors the nature of the problem that it solves. The programmer had a good understanding of the reality of the problem and was able to map it to code. Under- and overapproximations are more often the case though:
An overapproximation would be unnecessarily complicated code that covers cases or input data never realistically happen and should be considered abnormal. Here, the programmer didn't understand the "world" fully and compensated by accounting for more than there actually is. This usually causes the code to be very complex and hard to test since some of the branches can only be covered in tests by setting up very specific input. It is also hard for someone else to recognize this overapproximation, since it requires a very clear understanding of the problem and confidence that those cases will never occur. After all, you're dealing with proving the absence of them which is more difficult than proving their existence.
But why bother, with a too complex approximation? Because it is not efficient, it's like being at the airport a day early just to make sure you make it at all. Someone smarter will have more concise code and more efficient code (less branches) that accomplishes the same task.
An underapproximation is much easier to discover but not necessarily easier to fix: the programmer didn't account for a case that actually can occur and if the program fails hard (by throwing an exception) with enough information about the input, the world view of the code can be adapted. The tricky part here is keeping the design of the code in sync with reality. If the design is already an accurate approximation of the problem, it is easy to just add code to handle the unexpected case. But, if the design suffers from an underapproximation, you have to refactor, yay.
So, is there any strategy towards over- and underapproximating the world? One could be to "binary search". In the case of catching the airplane you would pick extremely early and extremely late times and thus figure out when to be at airport at the latest. This of course involves missing a flight or two.
The general principle we could take away from this is, that to be smart you have to be able and willing to fail when underapproximating and to retry when overapproximating. And being smart is mainly about creating an environment that allows that.
Getting rid of toString() boilerplate code
Today at work, I started discussing cutting down on boilerplate code with Tim with respect to Object.toString().
We went through a few iterations, first there was:
@Override
public String toString() {
return toString("nodeId", nodeId, "dst", dst, "message", message, "type", type);
}
String toString(Object...args) {
StringBuilder builder = new StringBuilder();
for (int i = 0; i < args.length; i+=2) {
if (i + 1 < args.length) {
builder.append(args[i]).append(": ").append(args[i+1]);
} else {
builder.append(args[i]);
}
if (i < args.length) {
builder.append("\n");
}
}
return builder.toString();
}
Putting this into a utils class would cut down on code a lot. But notice, how you still have to type everything twice, and your IDE can only give you limited completion support when typing all those strings
The next idea was to use reflection and just put out all fields. But this doesn't give you fine grained control over what is put out and the toString() method wouldn't convey that information neither.
Tim suggested introducing annotations to mark fields that should be part of of the output of toString() or mark the ones that should not be. But we agreed, that in both cases field declarations would become a bit noisy.
Googling for other solutions showed that one of the Apache projects had a ToStringBuilder and Bob Lee also wrote one for Guice. But in both cases, there was still the name-value duplication.
What we ended up with is a method, that can be used as follows:
@Override
public String toString() {
return StringUtils.toString(this, nodeId, dst, message, type);
}
And it generates the following output:
MessageDispatcherEvent {nodeId="AFF46FDE...", dst="172.34.4.4:4905", message="PingRequest", type="MESSAGE_RECEIVED"}
And here is the implementation:
private static ThreadLocal<IdentityHashMap<Object, Object>> threadLocal =
new ThreadLocal<IdentityHashMap<Object, Object>>();
public static String toString(Object thiz, Object...args) {
boolean cleanUp = false;
try {
IdentityHashMap<Object, Object> handledObjects = threadLocal.get();
if (handledObjects == null) {
cleanUp = true;
handledObjects = new IdentityHashMap<Object, Object>();
threadLocal.set(handledObjects);
}
if (handledObjects.containsKey(thiz)) {
return "circular structure";
}
handledObjects.put(thiz, thiz);
Map<String, String> fields = new LinkedHashMap<String, String>();
for (Field field : thiz.getClass().getDeclaredFields()) {
try {
boolean accessible = field.isAccessible();
field.setAccessible(true);
Object value = field.get(thiz);
field.setAccessible(accessible);
if (args.length == 0 || Arrays.asList(args).contains(value)) {
fields.put(field.getName(), String.valueOf(value));
}
} catch (IllegalArgumentException e) {
e.printStackTrace();
} catch (IllegalAccessException e) {
e.printStackTrace();
}
}
return thiz.getClass().getSimpleName() + " " + fields.toString();
} finally {
if (cleanUp) {
threadLocal.set(null);
}
}
}
Keeping track of already processed objects in ThreadLocal is necessary to avoid circular reference structures. Printing the correct fields relies on the equals() implementation of their values and if there are several fields with value null, they will be all added to the output, because there is no way of discerning which field should be added. Still pretty neat :)
Thanks to Tim for pointing out the problem with circular data structures!
JavaOne talks: summary and talks III
Design Patterns Reconsidered
This talk was given by Alex Miller. His agenda covered the Singleton, the Template Method, the Proxy, and the Visitor pattern.
Singleton
The problems with the singleton pattern are well known by now. It introduces hidden coupling between components and makes mocking out components for unit tests impossible or at least very difficult. In Java the singleton property is also not held as soon as a multiple class loaders come into play. Initializaton order of components is another problem.
The solution is of course using a dependency injection framework, such as guice, until the language maybe provides support for it.
Template Method
The template method provides pluggable strategies or algorithms through subclassing instead of using composition as in the Strategy pattern. Heave use of it can currently be found in LimeWire's downloader code, where we have hierarchies like:
AbstractDownloader > ManagedDownloaderImpl > MagnetDownloaderImpl
The code is hard to maintain, since template method documents the intent of its API poorly. Quickly, it becomes unclear in which state a method in a subclass is called from the super class. Also, it might not be clear which methods have to be overridden. Inheritance is a very strong form of coupling.
The solution is to use composition and the strategy pattern.
Proxy
Alex skipped over this section due to time constraints. From reading the slides, he would have touched upon certain constraints of the Java language.
It's tedious to write static proxy classes, writing a dynamic proxy can help with this, but proxies can cause problems when certain players rely on object identity or getClass(). Other solutions are AOP or code generation.
Visitor
The visitor pattern is used for operations over a composite hierarchy. It frees the algorithm having knowledge of the the structure of the underlying data structure. He then restates the "Expression Problem" formulated by Philip Wadler, in that it is hard to add new cases to a data type and new functions over the data type without recompiling the visitors.
JavaOne talks: summary and talks II
Closures
Neil Gafter presented his draft for closures. An advantage of using closures to operate on collections instead of iterating over them is that the library code can easily swap out the implementation and use a multithreaded algorithm:
My only concern with that was that this would allocate a lot of intermediate collections to allow for the fluent style of chaining filters, maps, and reductions.
The draft requires a new type to be introduced: Nothing, and the type Void would be used more.
I didn't like how single method interfaces and closures were mixed.
Java Modules
There is JSR for introducing modules/components to the Java platform, to make deployment easier and address the issues covered by OSGi right now.
This would introduce a new visibility keyword "module" that allows interfaces to define methods only visible within the same module.
Also an initialization class can be specified which is probably called when the module is initialized, this would give an entry point for components to properly initialize themselves.
Module properties are defined using annotations directly in Java code.
A blog to read: http://blogs.sun.com/abuckley
GWT vs. JavaFX vs. Flex
parleys.com did a case study reimplementing their website in GWT, JavaFX and Flex. I'll give a short +/- list of the points that were mentioned:
GWT
+ Java Developers can write the frontend code too.
- GWT apps/site are not indexed by Google, but Google is working on that.
+ GWT FX is a nice effects extension toolkit that gives you transitions and animations for GWT components.
Flex/AIR
+ AIR can talk to Flex application in the browser.
+ It just works, 4+ years old technology. [I would have to add, that on Linux, I could only find an Alpha version so far]
+ Many built-in animations.
+ Many components, extensible.
+ URL changes in the browser reflect the page in the Flex app in the browser, allows for deep-linking and bookmarking views inside the app.
+ AIR provides means to cache data locally using sqlite or the filesystem.
- Enterprise development tools are only slowly catching up, rudimentary unit and functional testing, only commercial UI testing, not static code analysis.
- Inherently single-threaded.
- Flex app is caught in the browser, can't break out of that sandbox, like Java Applets that can in the future be dragged on to the desktop.
- No server socket support, so no p2p applications possible yet.
- No google indexing.
JavaFX
- No OSX support yet.
+ Animations on the scene graph possible.
+ Binding paradigm, similar to Flex.
+ Quick prototyping possible, thanks to Adobe plug-ins.
+ Only one code base: Browser, desktop, backend, all Java/JavaFX.
+ Can make use of the JVM eco system, abundance of Open Source libraries.
- Not so many built-in transitions yet.
- It's targeted for designers, but no OSX support yet!
- No deep-linking.
- No notion of Online/Offline mode.
- Not indexable by Google.
+ Possible higher reuse of old backends and models and just UI makeover.
Matching regular expressions with Prolog
Here is an implementation for matching regular expressions with Prolog. It provides concatenation, the kleene star and negation:
The alphabet used is Sigma = {a, b}. Complexity is worse than O(n^4) which could be achieved by using dynamic programming and caching intermediate results. n denotes the length of the expression plus the length of the word to be matched.
t(a, [a]).
t(b, [b]).
t(kleene(_R), []).
t(unify(R, _S), EXP) :- t(R, EXP).
t(unify(_R, S), EXP) :- t(S, EXP).
t(not(R), EXP) :- t(R, EXP),!, fail.
t(not(_R), _EXP).
t(concat(R, S), T) :- t(concat(R, S), [], T).
t(kleene(R), [H|T]) :- t(kleene(R), [H], T).
t(concat(R, S), H, T) :- t(R, H), t(S, T).
t(concat(R, S), L, [H|T]) :- append(L, [H], L1), t(concat(R, S), L1, T).
t(kleene(R), H, T) :- t(R, H), t(kleene(R), T).
t(kleene(R), L, [H|T]) :- append(L, [H], L1), t(kleene(R), L1, T).
A goal for the regular expression a*b matching the text aaab looks like this:
t(concat(kleene(a), b), [a, a, a, b]).
AOP for asserting class invariants
Someone probably thought of this already and I haven't even used AOP yet, but...
It'd be nice use AOP to assert class invariants.
This would mean before and after each method is called, some AOP code would be executed that asserts the object is in the correct state and all class invariants are held.
It would be best if you could put the assertion code in the same class:
public class ClassWithInvariants {
@NonNull
private Field field;
public void workWithField() {
...
field = computedValue();
}
}
This would even use an annotation to specify a class invariant for the field field. The advantage over using a pluggable type system would be here, that illegal return values from other methods that compute the new Field could be caught at runtime.
I wonder if I fixed many bugs, where this would actually have helped.
A more intricate case could look like this:
public class ClassWithNontrivialInvariant {
private Field field;
@SuppressWarning("unused")
@Invariant
private void assertInvariant() {
assert field != null;
assert !field.isEmpty();
}
}
A question would be if it was possible to get the stack trace for the offending method call or otherwise enable or ease debugging.
JavaOne talks: summary and thoughts
Here's a list of talks I saw at JavaOne 2008 and the notes I took. I'm just throwing my notes up here without double checking the spelling of words or the facts.
Desktop Track Overview
Nice UIs to check out:
Sun will host a JavaScript file named deployJava.js.
The JavaScript library can be used in browsers to easily generate HTML code for embedding an applet or alternatively presenting a link to download the latest version of the JRE if the runtime requirements of the applet are not met.
It also provides methods to detect the currently available Java version using either native browser methods provided by the new Java browser plugins for version 6u10 or a JavaScript fallback on old platforms.
The new Java browser plugins will have the JVM running out-of-process which will allow different applets in the same browser to have different VM arguments. Speed and stability will increase and the applet can be dragged out of the browser and keep running as a desktop app, even after the browser has been closed.
Eclipse Moons
The next Eclipse release will be called Ganymede and will come out soon.
A demo with a multi-interface application sharing 90% of its model code using OSGi bundles. The frontends were RCP, RAP and eRCP.
Upcoming Java Language Features
Blog by Alex Buckley: http://blogs.sun.com/abuckley/
New language features:
try {
} catch (Exception1, Exception2 e) {
// to avoid boilerplate exception handling code
}
and
try {
method(); // that only throws E1 and E2
} catch (final Throwable t) {
throw t; // will rethrow E1 and E2, so those will have to be declared
}
More in the next entry.
Rename JavaOne 2008 slides in Python
Another time, that I played around with Python, to get more familiar with the language and its libraries.
This time, I wrote a small script to rename the slides from JavaOne 2008 to have nice filenames. When you download them, they have names like:OT_TS-5638_295638_191-1_v1.pdf
I used the HTML page with the list of talks to extract a mapping from the ids (in this case TS-5638) to the titles of the talks.
For parsing the HTML I first tried out html5lib, but it couldn't handle the huge page and bailed out reporting too many levels of recursion. So I chose BeautifulSoup at which I had looked before once.
Here's the code:
#!/usr/bin/python
# Script to rename JavaOne 2008 slides to their titles
#
# See usage() on how to call it
#
import os
import sys
from BeautifulSoup import BeautifulSoup
import logging
logging.basicConfig(level=logging.INFO)
def usage():
"""Prints the usage of this small program.
"""
print """
Usage:
./slidesrenamer.py sessions.catalog.jsp directory_with_slides
"""
def parse_dict(html_file):
dict = {}
file = open(html_file)
document = BeautifulSoup(file.read())
rows = document.findAll("tr")
for row in rows:
columns = row.findAll("td")
if len(columns) >= 4:
a = columns[3].findAll("a")
if a and len(a) == 1 and a[0].string:
key = columns[2].contents[0].replace(" ", "")
value = a[0].string
logging.debug("%s => %s" % (key, value))
dict[key] = sanitize(value)
return dict
def sanitize(filename):
return filename.replace("/", "_")
def rename_files(directory, id_title_dict):
for file in os.listdir(directory):
for key, value in id_title_dict.items():
if file.find(key) != -1:
print "Renaming %s to %s" % (file, key + " - " + value + ".pdf")
os.rename(directory + "/" + file,
directory + "/" + key + " - " + value + ".pdf")
break
def main(argv):
"""Builds dictionary and renames files.
"""
if len(argv) != 2:
usage()
sys.exit(1)
id_title_dict = parse_dict(argv[0])
logging.debug(id_title_dict)
rename_files(argv[1], id_title_dict)
if __name__ == "__main__":
main(sys.argv[1:])
My findings:
- string.strip() has strange semantics, not just removing surrounding whitespace, but large parts of the strings that I tried to clean. Maybe, because they were unicode strings, that is what BeautifulSoup gives you back.