关于配置文件的设计考虑(zz)

2011年5月22日阅读(208)

一些观点：

Never use Singleton, this is nothing but a honking huge global variable whose initialization is bound to the lifetime of your process not the resource it manages.

What you want is an object injected into every scope that depends on it, in other words the pattern you are looking for is Dependency Injection.

For the simplest of apps, just use command-line parameters and a config object.

For slightly more complex apps, allow a default config object to be created from a config-file, and then modified by the command-line parameters.

For moderately more complex apps, integrate with your platform/language/OS native registry facility to provide the defaults.

For production applications, use an IoC-DI container and have the config/db/etc objects built externally to your application classes and injected at runtime.

The key is to avoid having a process-static config object, as this will interfere with testing, soft-restarts, and make migrating to more flexible configuration approaches more difficult.

需要考虑的问题：

where to put the defaults (hardcode them, or in a separate file)
how to deal with incompatible options
be strict or be gentle ?
command line or configuration file ?
what if the default changes depending on other configuration options?
how to communicate the defaults to an external program ?
how to structure properly a configuration file depending on the nature of the options (mandatory, optional, mandatory if)
how to handle the configuration on the program side, with different languages

Perhaps this is something for the SO community wiki to begin to build. Maybe we could start with a survey of the approaches different applications take, along some dimensions (strict, gentle), and the requirements which drove those choices. For example, strict approaches, with data types and schema validation, may be trying to satisfy other mission critical requirements where user misconfiguration can lead to disaster (security vulnerabilities, corrupted data, etc).

For example, Git seems like a great example of the extreme top end of the scale for complexity of command line tools. It has a huge number of command line options, rich resource files (global ~/.gitconfig and repo-specific .git/config), specialized local configuration (.gitignore), etc.

Some of the dimensions the survey could address:

Data typing in the config file. Is everything a string, or can values be given types? Is typing implicit (JSON) or explicit (XML Schema)?
Early (schema validation) vs late (checking values at time of use).
Flat vs. hierarchical. Internal complexity and modularity may force the configuration model to become more hierarchical and compartmentalized, even perhaps introducing scope to configuration variables. Apache and Nginx are two examples.
Verbosity, weight of the configuration syntax and schema, and other concerns for the humans who have to edit these files. (I personally dread editing some XML-based configuration files).
Techniques for exposing the configuration to the application. For example, once you’ve parsed a config file, how do you present the model to the app internally? Is it a generic model (e.g. DOM) or do groups of keys get translated into application classes, or does it reduce overhead to just use simple key-value retrieval via hashmaps?

The Portland Pattern Repository Wiki has a configuration file page which discusses some of the same issues. I haven’t yet come across a comprehensive resource on the design of software configuration interfaces, but would love to see one.

http://c2.com/cgi/wiki?ConfigurationFiles

Configuration files are increasingly being used to control the behaviour of application programs. For a small program with a well defined behaviour, command line parameters can be used to alter the behaviour of a program without having to modify and recompile its source code. However, larger and more complex applications, like most of the existing business-critical web applications, carry out a large number of broadly varying tasks. These applications have lots of potential configurations and behaviours. Designers have resorted to using configuration files that are read during initialisation time or monitored during run-time to alter the application’s behaviour.

Now, the drawbacks of this approach are as follows.

The application program needs to write its own interpreter (which can become increasingly complex depending upon the complexity of the configuration) to interpret the configuration and control the application.
The application may need to develop a new language to describe the configuration. The application programmer is then required to write a parser to parse this configuration.
All the source code that is needed to control these various configurations (i.e. to dynamically support the various permutations and combinations of features provided by the application) tends to be excessively large and hence is more difficult to maintain. (see ConfigurationManagement, ManagementOfSoftwareConfigurations)

There are some ways to dodge these problems:
Use a lightweight, simple, proven ScriptingLanguage that comes with a lightweight embedded interpreter. This scripting language will call back into the application, configuring it procedurally, and will support abstractions by default. LuaLanguage is a reasonable choice, but there are many others (ForthLanguage, JavaScript, etc.). It’s important that the application can be selective about which capabilities it gives to the script. It’s also really nice if the interpreter provides good error reports, and even better if an embedded debugger is readily available.
Keep configurations to just one file. However, also support ‘includes’ or other form of composition, so that different configurations can ‘share’ a lot of code. Both of these simplify ConfigurationManagement, and includes of scripts that allow abstraction supports a great deal more sharing (since one essentially can have parameterized configurations in the include files, and parameterize them from the root file).
Allowing configurations to spill into multiple files simply by virtue of their location makes ConfigurationManagement for more than one configuration combinatorially more difficult. Essentially, one starts needing a directory-tree per configuration, and a shared file must carefully be duplicated to each branch. It’s also painful simply by extension of HierarchicalFileSystems being painful. I’ve made this mistake. Don’t do it!
If your application has a PluginArchitecture, consider use of a PolicyInjection framework to help select between competing implementations. This form of abstraction can greatly ease portable configurations because configurations do not need to be ‘aware’ of exactly which implementations for a given feature are available.

Some will also suggest ExtensibleMarkupLanguage (XML) or various other options like YamlAintMarkupLanguage (YAML). I guess they’ve somehow made these succeed without feeling disgusted with the resulting product. Here are my feelings on these subjects:
The result isn’t reusable anyway, since the content is specific to the application.
The schema verification is expensive, and rarely helps uncover real configuration problems.
The syntax is inefficient, bulky, and repetitive. You can’t abstract out the boring, repetitive stuff with loops over condensed data. The bulk factor adds to the ConfigurationManagement burden. The lack of abstraction doesn’t even offer useful features – for example, it doesn’t really help you trace a configuration problem to a certain location.
Since you lack abstraction, you also need more ConfigurationFiles for slightly different ‘variations’.
The interpreters are inefficient and bulky. Essentially, they need to ‘remember’ a lot of information about structure. There are variations on the parsers that are streaming, but if you use them you’ll just be configuring your product by procedural commands reacting to the stream anyway. Might as well start with a language that is designed as a stream of commands that remembers only what is needed, and thereby keep everything really lightweight.
Most application programming languages support the procedural paradigm (events with side-effects). Not so many support LogicProgramming or the more declarative approaches. By using a procedural scripting language to configure the application, you will be struggling with one less Framework.
Most application-configurations need to support command-events anyway, e.g. so you can add a button and configure what that button does when clicked. So, if you do go with XML or YAML or a similar solution, you will still be including a ScriptingLanguage. The end result is a bulkier product than it reasonably could have been.
Since you’re going to have scripts called from the application, you’ll end up adding them to the configuration somewhere. Scripts are a pain to work with in XML. Not only is it difficult to trace them to your source, but your code will likely be full of ‘<’ and ‘>’ and ‘&’ symbols. If your ScriptingLanguage is capable of keeping ‘function’ objects around, a reasonable approach is to allow those to be attached to the application buttons (perhaps with capabilities provided through the input object to avoid SideEffects from ‘events’ before the application is fully built).

If you need ‘declarative’ ConfigurationFiles (e.g. for GatedCommunityPattern), I happily suggest you start with DataLog. Alternatively, have the initial configuration build the declarative configuration as a large value or set of tables. It’s what you’d have had to do with XML or YAML streaming anyway. Seriously, the abstraction that comes from a real ScriptingLanguage will greatly help you avoid ConfigurationManagement issues from exploding, and will allow a great deal of reuse. YouAreGonnaNeedIt.

As always GreenspunsTenthRuleOfProgramming applies: EmacsLisp.

In the course of developing a simulation app in Java the required many numeric configuration parameters, I developed the OverridableCodeConstants patterns, which I backed with an XML file accessed through the http://jakarta.apache.org/commons/configuration implementation, which provides a means to EasyFindMutableConfigFilesInJava.

— BenHutchison?

What we need is a simple solution that would enable us to dynamically control and configure the behaviour of complex applications. The solution needs to be more reliable and less cumbersome than configuration files and should lead to a much smaller code base.

— AmitShanbhag

After being repeatedly repulsed by ApacheTomcat configuration files I have moderated my praise for XML config files (though the encoding really doesn’t matter) somewhat. If the config files are simple then it can work. But the Tomcat config files are anything but simple. There are billions of options and locations and poor documentation. I desperately need a GUI to help with the configuration. A GUI should encode all the things I need to do and could do, but don’t know about. For example, how the hell do I turn on debugging so I can trace servlet requests? What magic context needs to be placated?

As you say, this has not much to do with ExtensibleMarkupLanguage; a ConfigFileFromHell? will be such no matter what format it’s rendered in.

Putting Tomcat — or any other app, for that matter — config files in XML would at least make it easy to create XSLT translations and Schema to make sure the config isn’t hosed from the get-go. Also, XML can be edited directly using a slew of available editors. Hmm. I might attack my Apache and Tomcat configs to move them into XML…

Editing is fine. Knowing what options are available isn’t.

When I saw this entry, my first thought was "Tomcat config files". And regardless if it is easy to edit XML files with any editor – a configuration program definitely is a must taking the 1000001 options of Tomcat into account. Having a configuration program which includes all options however, makes the file format, in which configurations are saved, unimportant. Although XML still has the advantage then, that you can edit it with any editor and don’t NEED the configuration program.

Well I am glad I have never had to configure Tomcat, but I still agree that XML config files can suck. Why are people so hung up on XML? Most config files have a few uniquely named properties that can be assigned values – nothing more. so why insist on firing up an XML parser for that? Simple properties files can even be accumulated into one if a bit of care is taken when defining namespaces. That is often impossible in XML.

As was pointed out XML isn’t really the problem. I think XML is as good as any format. The problem is if it’s not something you know intimately, complex configuration files for a complex product are virtually unusable. So regardless of the encoding a GUI needs to be available to encode the knowledge of what can be done.

But once you have a GUI to edit the config file, why bother with XML? A less verbose format can do everything XML can do if you remove the requirement that humans will read and edit it by hand.

Though I prefer property files to XML, the real problem isn’t that XML config files can suck; instead it’s that badly-designed configuration mechanisms suck. A poorly designed GUI can be just as unusable as an obstruse XML file.

The best approach is to design the configuration mechanism, whether it’s a file or a GUI, to be as intuitive as possible to the person configuring the system. You can always write code that converts the configuraton into whatever internal format is most convenient.

其他参考资料：

http://josefbetancourt.wordpress.com/2010/10/24/cascadeconfigpattern/

http://stackoverflow.com/questions/3285329/singleton-registry-or-config-files-for-db-handler

http://stackoverflow.com/questions/4934284/configuration-files-and-defaults-magnum-opus

http://www.config4star.org/#main-overview

技术专题

duanple

关于配置文件的设计考虑(zz)

duanple

关于配置文件的设计考虑(zz)

You Might Also Like

关于内存泄露

正则表达式原理及引擎实现

浮点数有效数字与其表示的关系

duanple

分布式理论(4):Leases 一种解决分布式缓存一致性的高效容错机制

The Anatomy of a Large-Scale Hypertextual Web Sear