Month: January 2009

  • Splitting a String in Perl

    A user recently asked:

    How do I take a string in Perl and split it up into an array with entries two characters long each?

    Ultimately I want to turn something like this

    F53CBBA476

    into an array containing

    F5 3C BB A4 76

    This was my answer:

    @array = ( $string =~ m/../g );

    In list context, the pattern-matching operator with the /g modifier behaves in a special way in Perl. It applies the pattern iteratively, each time matching against the remainder of the text after the previous match. It then returns a list of all the substrings that matched (or of the captured groups, if the pattern contains any).
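    Here is a self-contained version of that one-liner. The sample string comes from the question; the odd-length `$odd` string and the `.{1,2}` variant are my own additions for illustration, shown because `m/../g` silently skips a trailing single character.

```perl
use strict;
use warnings;

my $string = 'F53CBBA476';

# In list context, m//g returns every match, scanning left to right
# and resuming each attempt where the previous match ended.
my @array = ( $string =~ m/../g );
print "@array\n";    # F5 3C BB A4 76

# Caveat: with an odd-length string, m/../g drops the final character.
# Allowing a last chunk of one character keeps it.
my $odd    = 'F53CB';
my @chunks = ( $odd =~ m/.{1,2}/g );
print "@chunks\n";   # F5 3C B
```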

    On my blog, I’m posting the StackOverflow answers I’ve written that earned the “Good Answer” badge. This was my answer to “How can I split a string into chunks of two characters each in Perl?”

  • Understanding Unfamiliar Databases

    A user recently asked:

    What kind of approaches and techniques can you employ to become familiar with an existing database if you are tasked with supporting and/or modifying it? How can you easily and effectively ramp up your knowledge of a database you have never seen before?

    Here was my reply:

    • The first thing I do is create an Entity-Relationship Diagram (ERD). Sometimes you can simply describe the metadata with command-line tools, but to save time, there are tools that can generate a diagram automatically.

    • Second, I examine each table and column to make sure I learn the meaning of what it stores.

    • Third, I examine each relationship to make sure I understand how the tables relate to one another.

    • Fourth, I read any views or triggers to understand custom data integrity enforcement or cascading operations.

    • Fifth, I read any stored procedures, as well as the SQL access privileges, if any are defined.

    • Sixth, I read through the parts of the application code that use the database. That’s where some additional business rules and data integrity rules are enforced.
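    If no diagramming tool is handy, a few lines of Perl can turn foreign-key metadata into a Graphviz DOT file for the ERD in the first step. This is a minimal sketch: the `%foreign_keys` hash, its table names, and the `erd_to_dot` helper are hypothetical. In practice you would fill the hash from the database catalog (for example `information_schema`, or DBI’s `foreign_key_info` method) rather than typing it in.

```perl
use strict;
use warnings;

# Hypothetical metadata: child table => [ [fk_column, parent_table], ... ].
# In practice, read this from the database catalog instead.
my %foreign_keys = (
    orders      => [ [ 'customer_id', 'customers' ] ],
    order_items => [ [ 'order_id', 'orders' ], [ 'product_id', 'products' ] ],
);

# Build a Graphviz DOT graph: one edge per foreign key, pointing from
# the referencing (child) table to the referenced (parent) table.
sub erd_to_dot {
    my ($fks) = @_;
    my @lines = ('digraph erd {', '  rankdir=LR;');
    for my $child (sort keys %$fks) {
        for my $fk (@{ $fks->{$child} }) {
            my ($column, $parent) = @$fk;
            push @lines, qq{  "$child" -> "$parent" [label="$column"];};
        }
    }
    push @lines, '}';
    return join("\n", @lines) . "\n";
}

my $dot = erd_to_dot(\%foreign_keys);
print $dot;
```

    Assuming you save the script as `erd.pl` (a hypothetical name), running `perl erd.pl | dot -Tpng -o erd.png` renders the diagram with Graphviz.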

    On my blog, I’m posting the StackOverflow answers I’ve written that earned the “Good Answer” badge. This was my answer to “What are the Best Ways to Understand an Unfamiliar Database?”

  • Why Should You Use an ORM?

    A user recently asked for good arguments in favor of using Object/Relational Mapping technology:

    If you were to motivate [sic] the “pro’s” of why you would use an ORM to management/client, what would the reasons be?

    Try and keep one reason per answer so that we can see what gets voted up as the best reasons.

    I offered four answers. The first three got the most votes, but my last answer got little interest.

    1. Speeding development. For example, eliminating repetitive code like mapping query result fields to object members and vice-versa.
    2. Making data access more abstract and portable. ORM implementation classes know how to write vendor-specific SQL, so you don’t have to.
    3. Supporting OO encapsulation of business rules in your data access layer. You can write (and debug) business rules in your application language of preference, instead of clunky trigger and stored procedure languages.
    4. Generating boilerplate code for basic CRUD operations (Create, Read, Update, Delete). Some ORM frameworks can inspect database metadata directly, read metadata mapping files, or use declarative class properties.
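    To make the fourth point concrete, here is a minimal sketch of the kind of CRUD boilerplate an ORM generates from metadata. It is not any particular framework’s code: the `bugs` table, its columns, and the `crud_sql` helper are hypothetical, and the `?` placeholders assume a DBI-style driver.

```perl
use strict;
use warnings;

# Hypothetical table metadata; an ORM might read this from the database
# catalog, a mapping file, or declarations in the class.
my %table = (
    name    => 'bugs',
    key     => 'bug_id',
    columns => [qw(bug_id summary status)],
);

# Generate parameterized Create/Read/Update/Delete statements.
sub crud_sql {
    my ($t) = @_;
    my @cols    = @{ $t->{columns} };
    my @nonkey  = grep { $_ ne $t->{key} } @cols;
    my $collist = join(', ', @cols);
    return {
        create => "INSERT INTO $t->{name} ($collist) VALUES ("
                  . join(', ', ('?') x @cols) . ")",
        read   => "SELECT $collist FROM $t->{name} WHERE $t->{key} = ?",
        update => "UPDATE $t->{name} SET "
                  . join(', ', map { "$_ = ?" } @nonkey)
                  . " WHERE $t->{key} = ?",
        delete => "DELETE FROM $t->{name} WHERE $t->{key} = ?",
    };
}

my $sql = crud_sql(\%table);
print "$sql->{$_}\n" for qw(create read update delete);
```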

    There are lots of other reasons for and against using ORM frameworks. Generally, I’m not a fan of ORMs, because their benefits don’t seem to make up for their complexity and their tendency to perform slowly. Their chief value is in reducing the time spent on repetitive development tasks.

    Hibernate, for example, is about 800,000 lines of code (Java and XML), and it’s complex enough that I doubt it’s easier to learn or to use than SQL. Besides, there seem to be fundamental tasks, such as a simple JOIN, that are impossible to do through the entity interface. Please correct me if I’m wrong, but I’ve been searching tutorials and examples, and I haven’t found a way to fetch a joined result set from two entities without writing a custom query in HQL (Hibernate’s abstract version of SQL).

    I was also led to a blog post by Glenn Block, titled “Ten advantages of an ORM (Object Relational Mapper).” I disagree with Block on several points. He cites some traits of ORMs as advantages where I see them as defects. He also cites features that are not specific to ORMs; they could be achieved with any type of data access library.

    Update: Upon request, here are some specific comments on Glenn Block’s list of advantages of an ORM:

    1. Facilitates implementing the Domain Model pattern

    Not necessarily. I can design Domain Model classes containing plain SQL as easily as I can design classes that operate on the database via an ORM layer. Keep in mind that ActiveRecord is not a Domain Model.

    2. Huge reduction in code.

    Depends. When executing simple CRUD operations against a single table, yes. When executing complex queries, most ORM implementations fail spectacularly compared to the simplicity of using SQL queries.

    3. Changes to the object model are made in one place.

    This is not a benefit of an ORM. Many people use ORM interfaces inexpertly, so when the database structure changes, they still have to update many places in their application to reflect the change. But instead of redesigning SQL queries, they have to redesign usage of the ORM. There is no net win. They could structure their application using plain SQL queries and still be as likely to achieve the benefit of DRY.

    4. Rich query capability.

    Absolutely wrong.

    5. You can navigate object relationships transparently.

    This is definitely a negative rather than a positive. When you want a result set to include rows from dependent tables, do a JOIN. Doing the “lazy-load” approach, executing additional SQL queries internally when you reference columns of related tables, is usually less efficient. Leaving it up to the ORM internals deprives you of the opportunity to decide which solution is better.
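    The cost difference is easiest to see by counting round trips. In this sketch, `run_query` is a hypothetical stub that only logs SQL instead of talking to a database, and the `orders`/`customers` tables are illustrative. Lazy loading issues one query for the parent rows plus one per related row (often called the “N+1 queries” problem), while the JOIN fetches the same rows at once.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a database handle: it records each SQL
# statement instead of executing it, so we can count round trips.
my @log;
sub run_query {
    my ($sql, @binds) = @_;
    push @log, $sql;
    return;
}

# Illustrative data: order_id => customer_id (each order has its own customer).
my %order_customer = (101 => 7, 102 => 8, 103 => 9);

# Lazy loading: fetch the orders, then fetch each order's customer
# separately the first time it is referenced.
run_query('SELECT order_id, customer_id FROM orders');
for my $order_id (sort keys %order_customer) {
    run_query('SELECT * FROM customers WHERE customer_id = ?',
              $order_customer{$order_id});
}
my $lazy_queries = @log;

# Explicit JOIN: the same rows come back in a single query.
@log = ();
run_query('SELECT o.order_id, c.* FROM orders o '
        . 'JOIN customers c ON c.customer_id = o.customer_id');
my $join_queries = @log;

print "lazy load: $lazy_queries queries; JOIN: $join_queries query\n";
```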

    6. Data loads are completely configurable …

    This is not a benefit of an ORM. It is actually easier to achieve this using plain SQL.

    7. Concurrency support.

    Again, not a benefit of an ORM.

    8. Cache management.

    This has nothing to do with using an ORM. I can cache data using SQL.
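    For example, a plain hash is enough to cache query results with no ORM involved. This sketch is hypothetical: `fetch_from_db` stands in for a real DBI call, and the cache never expires entries, which a real application would need to handle.

```perl
use strict;
use warnings;

# Hypothetical query function standing in for a real DBI call; it counts
# how many times the database is actually hit.
my $db_hits = 0;
sub fetch_from_db {
    my ($sql, @binds) = @_;
    $db_hits++;
    return [ "row for @binds" ];
}

# Simple in-process cache of SQL results, keyed by the statement text
# plus its bind values.
my %cache;
sub cached_query {
    my ($sql, @binds) = @_;
    my $key = join("\0", $sql, @binds);
    $cache{$key} //= fetch_from_db($sql, @binds);
    return $cache{$key};
}

my $sql = 'SELECT * FROM products WHERE product_id = ?';
cached_query($sql, 42) for 1 .. 3;   # first call hits the database
cached_query($sql, 43);              # different bind value, new hit
print "database hits: $db_hits\n";   # database hits: 2
```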

    9. Transaction management and Isolation.

    Also has nothing to do with using an ORM versus a more direct DAL.

    10. Key Management.

    Ditto.

    On my blog, I’m posting the StackOverflow answers I’ve written that earned the “Nice Answer” or “Good Answer” badges. This was my answer to “Why Should You Use An ORM?”

  • Is This Legal?

    A user recently asked a question about GPL compatibility with his company’s commercial software offerings:

    I work for a software / design firm and I recently found out that our “in house” CMS is actually [based on software] licensed under the GPL Ver 2. I would like to know if it is ethical / legal to be selling this to clients.

    Don’t act on any legal advice you read on a forum like StackOverflow — including mine. 🙂

    Here’s a passage about GPL from Wikipedia (emphasis mine):

    The terms and conditions of the GPL are available to anybody receiving a copy of the work that has a GPL applied to it (“the licensee”). Any licensee who adheres to the terms and conditions is given permission to modify the work, as well as to copy and redistribute the work or any derivative version. The licensee is allowed to charge a fee for this service, or do this free of charge. This latter point distinguishes the GPL from software licenses that prohibit commercial redistribution. The FSF argues that free software should not place restrictions on commercial use, and the GPL explicitly states that GPL works may be sold at any price.

    However, if your company is distributing the software under another license that is not compatible with the GPL, then it is violating the terms of the GPL.

    On my blog, I’m posting the StackOverflow answers I’ve written that earned the “Good Answer” badge. This was my answer to “Is This Legal? (GPL Software/ Licensing Issues)”

  • Learn to Program in 21 Days

    A user recently asked:

    Has anyone “learned how to program in 21 days?”

    I’m not a fan of these learn how to program in X amount of days books. Some even boast, learn how to program in 24 hours. This is a joke and an insult to me as a software engineer who went through a rigorous discipline in computer science and mathematics.

    So a question to the community, have you benefited from these become a programmer quick books?

    No, it’s impossible to learn how to program in 24 hours or 21 days.

    See “Teach Yourself Programming in Ten Years,” an article by Peter Norvig (Director of Research at Google, Inc.).

    If you already have good fundamental skills at programming, and you just need a tutorial-style book to guide you through learning a new API, then these kinds of books may be helpful.

    Even then, the level of expertise will be shallow. It will take many months (at least) to become really proficient. But the quick-introduction books are useful to give you a taste of the range of functionality in a language or API.

    On my blog, I’m posting the StackOverflow answers I’ve written that earned the “Good Answer” badge. This was my answer to “Has anyone ‘learned how to program in 21 days?’”

  • Best. Perl Script. Ever.

    A user recently asked:

    What has been your best programming experience so far?

    The most successful program I’ve ever written was this Perl script:

    map(($r=$_,map(($y=$r-$_/3,$l[24-$r]
    .=(' ','@')[$y**2-20*$y+($_**2)/3<0]),(0..30)),),(0..24));
    print join("\n", map(reverse($_).$_, @l)), "\n";
    

    I wrote this for a woman I was dating in 2001. Writing a Perl script for my girlfriend is not as geeky as it sounds, at least in this case. She’s also a software developer, and she was taking a Perl class at the time.

    I consider this script a great success because she married me in 2007!

    I’ll leave it as an exercise for the reader to run the script in a console window and see its output (I promise it’s not a Trojan Horse or any other kind of evil trick).

    On my blog, I’m posting the StackOverflow answers I’ve written that earned the “Good Answer” badge. This was based on my answer to “What is your best programming experience?”

  • The Next-Gen Databases

    A user recently asked:

    I’m learning traditional Relational Databases (with PostgreSQL) and doing some research I’ve come across some new types of databases. CouchDB, Drizzle, and Scalaris to name a few, what is going to be the next database technologies to deal with?

    SQL is a language for querying and manipulating relational databases. It is defined by an international standard (ISO/IEC 9075). Although the standard is revised periodically, the revisions seem to always work within the relational database paradigm.

    Here are a few new data storage technologies that are getting attention currently:

    • CouchDB is a non-relational database. They call it a document-oriented database.
    • Amazon SimpleDB is also a non-relational database accessed in a distributed manner through a web service. Amazon also has a distributed key-value store called Dynamo, which powers some of its S3 services.
    • Dynomite and Kai are open source solutions inspired by Amazon Dynamo.
    • BigTable is a proprietary data storage solution used by Google, and implemented using their Google File System technology. Google’s MapReduce framework uses BigTable.
    • Hadoop is an open-source technology inspired by Google’s MapReduce, and it serves a similar need: distributing very large-scale data storage and processing across many servers.
    • Scalaris is a distributed transactional key/value store. Also not relational, and does not use SQL. It’s a research project from the Zuse Institute in Berlin, Germany.
    • RDF is a standard for storing semantic data, in which data and metadata are interchangeable. It has its own query language SPARQL, which resembles SQL superficially, but is actually totally different.
    • Vertica is a highly scalable column-oriented analytic database designed for distributed (grid) architecture. It does claim to be relational and SQL-compliant. It can be used through Amazon’s Elastic Compute Cloud.
    • Greenplum is a high-scale data warehousing DBMS, which implements both MapReduce and SQL.
    • XML isn’t a DBMS at all; it’s an interchange format. But some DBMS products work with data in XML format.
    • ODBMS, or Object Databases, are for managing complex data. There don’t seem to be any dominant ODBMS products in the mainstream, perhaps because of lack of standardization. Standard SQL is gradually gaining some OO features (e.g. extensible data types and tables).
    • Drizzle is a relational database, drawing a lot of its code from MySQL. It includes various architectural changes designed to manage data in a scalable “cloud computing” system architecture. Presumably it will continue to use standard SQL with some MySQL enhancements.
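    Several of the systems above (Hadoop, Greenplum, and Google’s own stack) are built around the MapReduce programming model. As a rough illustration only, here is a toy, single-process word count in Perl that walks through the map, shuffle, and reduce phases. It is not Hadoop’s API; a real framework would run each phase in parallel across many machines.

```perl
use strict;
use warnings;

# Toy, single-process illustration of the MapReduce model (not Hadoop's API).
my @documents = ('the cat sat', 'the dog sat', 'the end');

# Map phase: emit a (word, 1) pair for every word in every document.
my @pairs;
for my $doc (@documents) {
    push @pairs, [ $_, 1 ] for split ' ', $doc;
}

# Shuffle phase: group the emitted values by key.
my %grouped;
push @{ $grouped{ $_->[0] } }, $_->[1] for @pairs;

# Reduce phase: combine each key's list of values into a single result.
my %counts;
for my $word (keys %grouped) {
    my $sum = 0;
    $sum += $_ for @{ $grouped{$word} };
    $counts{$word} = $sum;
}

print "$_ $counts{$_}\n" for sort keys %counts;
```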

    Relational databases have weaknesses, to be sure. People have been arguing that they don’t handle all data modeling requirements since the day the relational model was first introduced.

    Year after year, researchers come up with new ways of managing data to satisfy special requirements: either requirements to handle data relationships that don’t fit into the relational model, or else requirements of high-scale volume or speed that demand data processing be done on distributed collections of servers, instead of central database servers.

    Even though these advanced technologies do great things to solve the specialized problems they were designed for, relational databases are still a good general-purpose solution for most business needs. SQL isn’t going away.

    On my blog, I’m posting the StackOverflow answers I’ve written that earned the “Good Answer” badge. This was my answer to “The Next-Gen Databases.”

  • Verifying a Company Uses Best Practices

    A user recently asked how to use the Joel Test in an interview, to confirm that a software company practices what they preach with regard to professional software development habits:

    I’ve got an interview with a company that claims to score a 12 on the Joel Test. […] What are some ways of determining if they really implement all 12 points? Are there any particular questions I can ask?

    It’s reasonable to say, “show me.” Ask them for examples and concrete details of their support for the Joel Test subjects. Since they claim they score all 12 points, they are obviously proud of it. People tend to like to show off, so they’ll probably be eager to share more details.

    If you ask more specific questions, it’ll become apparent from their descriptions whether they really have those good practices.

    We can think of many specific follow-up questions to the basic questions. Each item below starts with the Joel Test question, and my follow-ups, er, follow:

    1. Do you use source control? What source control system do you use? Why did you pick that one? What is your branch/release policy? What are your tag naming conventions? Do you organize your tree by code vs. tests at the top with all modules under each directory, or do you organize by module at the top with code and tests under each module directory?
    2. Can you make a build in one step? What tools do you use to make builds? How long does it take to go from a clean checkout to an installation image? What would it take to modify the build? Is it integrated into your testing harness? What would it take to duplicate a build environment? Are the build scripts and tools also under source control?
    3. Do you make daily builds? What software testing tools do you use for daily builds? Do you use a Continuous Integration tool? If so, which one? How do you identify who “broke the build?” What is your test coverage?
    4. Do you have a bug database? What bug tracker software do you use? Why did you pick that one? What customizations did you apply to it? Can you show me trends of rate of bugs logged or bugs fixed per month? How does a change in source control get associated with the relevant bug?
    5. Do you fix bugs before writing new code? What is your bug triage process? Who is involved in prioritizing bugs? How many bugs did you fix in the last release of your product? Do you do bug hunts with bounties for finding critical bugs?
    6. Do you have an up-to-date schedule? Can I see it? How far are you ahead of/behind schedule right now? How do you do estimating? How accurate a method has that turned out to be?
    7. Do you have a spec? Can I read one? Do you have a spec template? Can I see that? Who writes the specs? Who reviews and approves the specs?
    8. Do programmers have quiet working conditions? Can I see the cubicle or work area for the position I’m interviewing for? (or an equivalent work area)
    9. Do you use the best tools money can buy? What tools do you use? Are you up to date on versions? What tools do you want that you don’t have yet? Why not?
    10. Do you have testers? How many? Can I meet one? Do testers do black-box or white-box testing?
    11. Do new candidates write code during their interview? What code would you like me to write? What are you looking for by seeing my code?
    12. Do you do hallway usability testing? How frequently? Can I see a report documenting one of your usability testing sessions? Can you give me an example of something you changed in the product as a result of usability testing?

    Beware if their answers to the specific follow-up questions are evasive, like “um yeah, we are committed to doing more best practices and we’ll be looking to you to help us effect changes toward that goal.” If they’re so committed to it, why don’t they have anything to show for it yet? Probably because, like many companies, they throw “best practices” out the window when the schedule is in jeopardy.

    On my blog, I’m posting the StackOverflow answers I’ve written that earned the “Good Answer” badge. This was my answer to “Administering the Joel Test.”

  • Do I really need version control?

    A user recently asked:

    I read all over the internet (various sites and blogs) about version control. How great it is and how all developer NEED to use it because is a god bless.

    Here is the question: do I really need this? … I usually work alone (freelancer) and I had no client that asked me to use svn (but never is too late for this, right?). So, should I start and struggle to learn to use svn (or something similar?) Or it’s just a waste of time?

    Here’s a scenario that may illustrate the usefulness of source control even if you work alone.

    Your client asks you to implement an ambitious modification to the website. It’ll take you a couple of weeks, and involve edits to many pages. You get to work.

    You’re 50% done with this task when the client calls and tells you to drop what you’re doing to make an urgent but more minor change to the site. You’re not done with the larger task, so it’s not ready to go live, and the client can’t wait for the smaller change. But he also wants the minor change to be merged into your work for the larger change.

    Maybe you are working on the large task in a separate folder containing a copy of the website. Now you have to figure out how to do the minor change in a way that can be deployed quickly. You work furiously and get it done. The client calls back with further refinement requests. You do this too and deploy it. All is well.

    Now you have to merge it into the work in progress for the major change. What did you change for the urgent work? You were working too fast to keep notes. And you can’t just diff the two directories easily now that both have changes relative to the baseline you started from.

    The above scenario shows that source control can be a great tool, even if you work solo. Source control can solve many problems for you, such as the following:

    • You can use branches to work on longer-term tasks and then merge the branch back into the main line when it’s done.
    • You can compare whole sets of files to other branches or to past revisions to see what’s different.
    • You can track work over time (which is great for reporting and invoicing by the way).
    • You can recover any revision of any file based on date or on a milestone that you defined.
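    The missing baseline is the crux of the scenario above. A version control system keeps the common ancestor of both lines of work, and comparing each side against it tells you which side changed what. The following Perl sketch illustrates that decision at the file level; the snapshots are hypothetical hashes rather than real directories, the `merge_plan` helper is my own illustration, and it assumes no files were added or deleted.

```perl
use strict;
use warnings;

# Hypothetical snapshots of the site as { path => content }: the baseline
# you started from, your half-finished large change, and the urgent fix.
# Assumes the same set of files exists in all three snapshots.
my %base   = ('index.html' => 'v1',     'about.html' => 'v1',     'shop.html' => 'v1');
my %large  = ('index.html' => 'v2-big', 'about.html' => 'v1',     'shop.html' => 'v2-big');
my %urgent = ('index.html' => 'v1',     'about.html' => 'v2-fix', 'shop.html' => 'v2-fix');

# File-level three-way comparison: each side is compared against the
# common baseline to decide which version of each file to keep.
sub merge_plan {
    my ($base, $mine, $theirs) = @_;
    my %plan;
    for my $path (sort keys %$base) {
        my ($orig, $m, $t) = ($base->{$path}, $mine->{$path}, $theirs->{$path});
        $plan{$path} =
              $m eq $t    ? 'identical on both sides'
            : $m eq $orig ? 'take the urgent fix'
            : $t eq $orig ? 'keep the large change'
            :               'CONFLICT: changed on both sides';
    }
    return \%plan;
}

my $plan = merge_plan(\%base, \%large, \%urgent);
print "$_: $plan->{$_}\n" for sort keys %$plan;
```

    Subversion and Git do this bookkeeping for you, and they also merge non-conflicting edits within a single file line by line, so only true conflicts need your attention.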

    For solo work, I recommend Subversion. CVS is all but antiquated, and Git is more useful for distributed teams. A good book is Pragmatic Version Control Using Git by Travis Swicegood.

    On my blog, I’m posting the StackOverflow answers I’ve written that earned the “Good Answer” badge. This was my answer to “Do I really need version control?”